The Gist of It!

Introducing Phylogenetics in a Clinical Proteomics Context

Phylogenetics is a framework originally developed in evolutionary biology to infer the relationships among organisms based on shared characteristics. At its core, phylogenetic analysis uses algorithms to interpret multivariate data (traditionally genetic sequences) and construct trees, such as cladograms, that reflect similarity, divergence, and inferred hierarchical relationships.

Rather than comparing samples one marker at a time, phylogenetics considers many characters jointly and builds a tree that summarizes the overall pattern of similarity among samples. This makes it well suited to high-dimensional datasets where relationships are subtle, hierarchical, or non-linear.

Key aspects of the phylogenetic paradigm include:

  • Multivariate comparison: simultaneous use of many features to infer relationships
  • Relational structure: output is a tree (cladogram) showing relative proximity and divergence
  • Pattern amplification: optimizing the tree over all features at once can reveal subtle patterns that single-marker thresholds miss
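
To make the first two points concrete, here is a minimal sketch of how many features per sample can be turned into a tree. It uses UPGMA (average-linkage clustering on Euclidean distances), a simple distance-based tree method chosen only for illustration; it is not necessarily the inference method used in our work. The sample names, feature values, and the function names upgma and to_newick are all hypothetical.

```python
from itertools import combinations
from math import dist


def upgma(profiles):
    """Build a tree by average-linkage (UPGMA) clustering.

    profiles: dict mapping sample name -> list of feature values.
    Returns a nested tuple of sample names, e.g. (("A", "B"), "C").
    """
    # Each key is a (sub)tree; each value lists the samples under it.
    clusters = {name: [name] for name in profiles}

    def avg_distance(pair):
        # Mean Euclidean distance over all cross-cluster sample pairs.
        a, b = pair
        members_a, members_b = clusters[a], clusters[b]
        total = sum(
            dist(profiles[x], profiles[y]) for x in members_a for y in members_b
        )
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        # Join the two closest clusters into a new internal node.
        a, b = min(combinations(clusters, 2), key=avg_distance)
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters))


def to_newick(tree):
    """Serialize the nested-tuple tree as a Newick topology string."""
    if isinstance(tree, tuple):
        return "(" + ",".join(to_newick(child) for child in tree) + ")"
    return tree


# Toy data (made up): four samples, three features each.
profiles = {
    "S1": [1.0, 2.0, 0.5],
    "S2": [1.1, 2.1, 0.4],
    "S3": [5.0, 0.2, 3.0],
    "S4": [5.2, 0.1, 3.1],
}
tree = upgma(profiles)
print(to_newick(tree) + ";")
```

The output is a relational structure rather than a score: samples that are close across all features end up as neighbors in the tree, regardless of any single feature's value.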

Our Application: Phylogenetics Meets Cancer Proteomics

Traditional cancer diagnostics often rely on threshold-based biomarkers or univariate analyses of individual proteins. In contrast, our team has applied phylogenetic methods to cancer proteomics data to uncover systemic relational structure across high-dimensional molecular profiles.

By treating proteomic profiles like “character sets” in evolutionary analysis, we can construct proteome-based cladograms that group samples according to their global similarity patterns, not just individual marker intensities.
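
As a rough illustration of what treating a profile as a character set can mean, the sketch below converts continuous protein intensities into binary character states and compares samples by the number of characters at which they differ. The median cutoff, the two-state encoding, the toy intensities, and the names encode_characters and hamming are assumptions made for this example, not a description of our actual encoding.

```python
from statistics import median


def encode_characters(profiles):
    """Turn continuous protein intensities into binary character states.

    profiles: dict mapping sample name -> list of intensities, with the
    same protein order in every sample.
    Returns dict mapping sample name -> tuple of 0/1 states, where 1 means
    the intensity is above that protein's median across the cohort.
    """
    columns = zip(*profiles.values())  # one column per protein
    cutoffs = [median(column) for column in columns]
    return {
        name: tuple(int(value > cutoff) for value, cutoff in zip(values, cutoffs))
        for name, values in profiles.items()
    }


def hamming(a, b):
    """Count the characters at which two encoded profiles differ."""
    return sum(x != y for x, y in zip(a, b))


# Toy data (made up): four samples, four proteins each.
profiles = {
    "A": [8.0, 1.0, 5.0, 2.0],
    "B": [7.5, 1.2, 5.5, 2.1],
    "C": [2.0, 6.0, 1.0, 2.2],
    "D": [2.5, 6.5, 1.5, 1.9],
}
encoded = encode_characters(profiles)
for name, states in encoded.items():
    print(name, states)
print("A vs B:", hamming(encoded["A"], encoded["B"]))
print("A vs C:", hamming(encoded["A"], encoded["C"]))
```

Once profiles are expressed as character states, they can be passed to character-based tree-building methods in the same way sequence data would be. A real pipeline would likely need more than two states and a principled choice of cutoffs.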

In preliminary work, this approach has yielded promising results:

  • Distinct clustering of proteomic profiles corresponding to known clinical or biological subtypes
  • Ability to resolve nuanced relationships between samples that conventional statistical or machine learning models can miss
  • Identification of coherent proteomic structure that correlates with disease state, progression, or phenotype

Master Cladogram Strategy for Clinical Deployment

We are now proposing a clinical deployment strategy based on a reference framework we call a Master Cladogram:

  1. Build Master Cladograms
    For each cancer type, we will assemble proteomic data from ~1000 patients representing diverse genetic backgrounds, tumor subtypes, stages, and clinical outcomes.
  2. Construct a Robust Reference Tree
    Using phylogenetic inference methods, we will generate a master cladogram for each cancer type that encapsulates the multivariate proteomic landscape of that disease.
  3. Prospective Patient Classification
    When a new patient provides a blood sample:
    • The sample is processed into a proteomic profile
    • The profile is projected onto the existing master cladogram
    • The patient’s position within the tree reflects underlying biological affinity to known proteomic signatures
  4. Diagnosis and Insight
    The placement in the master cladogram becomes a diagnostic and potentially prognostic indicator, leveraging the accumulated relational structure from the reference cohort.
    • If the sample branches near profiles associated with aggressive disease, that informs risk
    • If it groups with less advanced signatures, that informs a different clinical interpretation
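
The classification step in items 3 and 4 can be sketched in a simplified form. The example below does not perform true placement onto a tree; as a stand-in, it finds the reference profiles closest to the new sample and reports the most common clinical label among them. The function name place_sample, the parameter k, the labels, and all values are hypothetical.

```python
from collections import Counter
from math import dist


def place_sample(new_profile, reference, labels, k=3):
    """Find the k nearest reference samples and their majority label.

    new_profile: list of feature values for the new patient.
    reference:   dict mapping reference sample name -> feature values.
    labels:      dict mapping reference sample name -> clinical label.
    Returns (neighbor_names, majority_label).
    """
    # Rank reference samples by Euclidean distance to the new profile.
    ranked = sorted(reference, key=lambda name: dist(new_profile, reference[name]))
    neighbors = ranked[:k]
    votes = Counter(labels[name] for name in neighbors)
    return neighbors, votes.most_common(1)[0][0]


# Toy reference cohort (made up): two features per sample.
reference = {
    "R1": [1.0, 2.0], "R2": [1.2, 1.9], "R3": [0.9, 2.2],
    "R4": [5.0, 0.5], "R5": [5.1, 0.4], "R6": [4.8, 0.7],
}
labels = {
    "R1": "indolent", "R2": "indolent", "R3": "indolent",
    "R4": "aggressive", "R5": "aggressive", "R6": "aggressive",
}

neighbors, call = place_sample([4.9, 0.6], reference, labels)
print(neighbors, call)
```

In an actual deployment, the new profile would be attached to a branch of the fixed master cladogram, and the interpretation would come from the clade it joins rather than from a simple neighbor vote.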

Why This Matters

This strategy provides:

  • Highly contextualized diagnosis: placement relative to a reference cohort of roughly 1000 profiles per cancer type, not lone biomarkers
  • Scalable and interpretable structure: rooted in evolutionary logic rather than black-box optimization
  • Adaptability: as more data accumulates, the master cladogram becomes more informative
  • Cross-cancer comparability: each cancer type can have its own master tree, enabling modular use