Introducing Phylogenetics in a Clinical Proteomics Context
Phylogenetics is a framework originally developed in evolutionary biology to infer the relationships among organisms based on shared characteristics. At its core, phylogenetic analysis uses algorithms to interpret multivariate data — traditionally genetic sequences — to construct trees (cladograms) that reflect similarity, divergence, and inferred hierarchical relationships.
Instead of assuming linear or single-marker comparisons, phylogenetics embraces multi-dimensional pattern structure and constructs models that reflect complex similarity landscapes. This makes it especially powerful for high-dimensional datasets where relationships are subtle, hierarchical, or non-linear.
Key aspects of the phylogenetic paradigm include:
- Multivariate comparison: simultaneous use of many features to infer relationships
- Relational structure: output is a tree (cladogram) showing relative proximity and divergence
- Pattern amplification: subtle patterns can be revealed through phylogenetic optimization rather than single-marker thresholds
Our Application: Phylogenetics Meets Cancer Proteomics
Traditional cancer diagnostics often rely on threshold-based biomarkers or univariate analyses of individual proteins. In contrast, our team has applied phylogenetic methods to cancer proteomics data to uncover systemic relational structure across high-dimensional molecular profiles.
By treating proteomic profiles like “character sets” in evolutionary analysis, we can construct proteome-based cladograms that group samples according to their global similarity patterns, not just individual marker intensities.
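To make the "character set" idea concrete, here is a minimal sketch under stated assumptions. It bins each protein's intensity into a small number of ordinal character states, measures pairwise Hamming distance between samples, and joins them with UPGMA (average-linkage clustering). UPGMA is a simple distance-based stand-in chosen for brevity; it is not necessarily the inference method used in the work described here. The sample names, intensity values, the three-state binning, and the function names `discretize`, `hamming`, and `upgma` are all hypothetical.

```python
from itertools import combinations

def discretize(profiles, n_states=3):
    """Bin each protein's intensities into ordinal character states 0..n_states-1."""
    names = list(profiles)
    n_features = len(profiles[names[0]])
    chars = {name: [] for name in names}
    for j in range(n_features):
        column = [profiles[name][j] for name in names]
        lo, hi = min(column), max(column)
        width = (hi - lo) / n_states or 1.0  # fall back to 1.0 for constant columns
        for name in names:
            state = min(int((profiles[name][j] - lo) / width), n_states - 1)
            chars[name].append(state)
    return chars

def hamming(a, b):
    """Count the characters at which two samples differ."""
    return sum(x != y for x, y in zip(a, b))

def upgma(chars):
    """Build a nested-tuple tree by average-linkage (UPGMA) clustering."""
    names = list(chars)
    trees = {i: name for i, name in enumerate(names)}  # cluster id -> subtree
    sizes = {i: 1 for i in trees}
    dist = {(i, j): hamming(chars[names[i]], chars[names[j]])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(trees) > 1:
        a, b = min(dist, key=dist.get)  # closest pair of clusters
        merged_size = sizes[a] + sizes[b]
        for c in trees:
            if c in (a, b):
                continue
            d_ac = dist.pop((min(a, c), max(a, c)))
            d_bc = dist.pop((min(b, c), max(b, c)))
            # Size-weighted average distance to the merged cluster
            dist[(c, next_id)] = (sizes[a] * d_ac + sizes[b] * d_bc) / merged_size
        del dist[(a, b)]
        trees[next_id] = (trees.pop(a), trees.pop(b))
        sizes[next_id] = merged_size
        next_id += 1
    return next(iter(trees.values()))

# Hypothetical intensities for four proteins in four samples (illustrative only)
profiles = {
    "tumor_A": [9.1, 2.0, 7.5, 1.1],
    "tumor_B": [8.7, 2.3, 7.9, 0.9],
    "normal_A": [3.2, 6.8, 2.1, 5.5],
    "normal_B": [2.9, 7.1, 2.5, 5.0],
}

tree = upgma(discretize(profiles))
print(tree)
```

In this toy input the two tumor samples and the two normal samples pair up before the pairs are joined, because the grouping depends on the whole pattern of character states rather than on any one protein. Real proteomic data would also need normalization, handling of missing values, and a considered choice of binning and tree-building method.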
In preliminary work, this approach has yielded promising results:
- Distinct clustering of proteomic profiles corresponding to known clinical or biological subtypes
- Ability to resolve nuanced relationships between samples that conventional statistical or machine learning models did not reveal
- Identification of coherent proteomic structure that correlates with disease state, progression, or phenotype
Master Cladogram Strategy for Clinical Deployment
We are now proposing a clinical deployment strategy based on a reference framework we call a Master Cladogram:
- Build Master Cladograms
  For each cancer type, we will assemble proteomic data from ~1000 patients representing diverse genetic backgrounds, tumor subtypes, stages, and clinical outcomes.
- Construct a Robust Reference Tree
  Using phylogenetic inference methods, we will generate a master cladogram for each cancer type that encapsulates the multivariate proteomic landscape of that disease.
- Prospective Patient Classification
  When a new patient provides a blood sample:
  - The sample is processed into a proteomic profile
  - The profile is projected onto the existing master cladogram
  - The patient’s position within the tree reflects underlying biological affinity to known proteomic signatures
- Diagnosis and Insight
  The placement in the master cladogram becomes a diagnostic and potentially prognostic indicator, leveraging the accumulated relational structure from the reference cohort.
  - If the sample branches near profiles associated with aggressive disease, that informs risk
  - If it groups with less advanced signatures, that informs a different clinical interpretation
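The classification step above can be sketched as follows, with one loud caveat: this is a nearest-neighbor stand-in for placement, not true phylogenetic placement onto a tree. It ranks reference samples by Euclidean distance to the new profile and reports the most common clinical label among the closest few. The reference entries, their labels, the choice of `k`, and the function names `euclidean` and `place_sample` are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length intensity vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def place_sample(new_profile, reference, k=3):
    """Return the k nearest reference samples and their most common label."""
    ranked = sorted(
        reference,
        key=lambda name: euclidean(new_profile, reference[name]["profile"]),
    )
    nearest = ranked[:k]
    votes = Counter(reference[name]["label"] for name in nearest)
    return nearest, votes.most_common(1)[0][0]

# Hypothetical reference cohort (illustrative values and labels only)
reference = {
    "ref_01": {"profile": [9.0, 2.1, 7.6], "label": "aggressive"},
    "ref_02": {"profile": [8.5, 2.4, 7.2], "label": "aggressive"},
    "ref_03": {"profile": [4.0, 5.9, 3.1], "label": "less_advanced"},
    "ref_04": {"profile": [3.6, 6.3, 2.8], "label": "less_advanced"},
}

new_patient = [8.8, 2.0, 7.0]
nearest, label = place_sample(new_patient, reference, k=3)
print(nearest, label)
```

A tree-aware implementation would instead insert the new profile onto a branch of the fixed master cladogram and read off the surrounding clade; the neighbor vote here only approximates that idea of "branching near" known signatures.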
Why This Matters
This strategy provides:
- Highly contextualized diagnosis: placement relative to roughly a thousand reference profiles per cancer type, not lone biomarkers
- Scalable and interpretable structure: rooted in evolutionary logic rather than black-box optimization
- Adaptability: as more data accumulate, the master cladogram becomes more informative
- Cross-cancer comparability: each cancer type can have its own master tree, enabling modular use