Quartets enable statistically consistent estimation of cell lineage trees under an unbiased error and missingness model
Cancer progression and treatment can be informed by reconstructing its evolutionary history from tumor cells. Although many methods exist to estimate evolutionary trees (called phylogenies) from molecular sequences, traditional approaches assume the input data are error-free and the output tree is f...
Main Authors: | Han, Yunheng; Molloy, Erin K. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10691101/ https://www.ncbi.nlm.nih.gov/pubmed/38041123 http://dx.doi.org/10.1186/s13015-023-00248-w |
_version_ | 1785152670360666112 |
---|---|
author | Han, Yunheng Molloy, Erin K. |
author_facet | Han, Yunheng Molloy, Erin K. |
author_sort | Han, Yunheng |
collection | PubMed |
description | Cancer progression and treatment can be informed by reconstructing its evolutionary history from tumor cells. Although many methods exist to estimate evolutionary trees (called phylogenies) from molecular sequences, traditional approaches assume the input data are error-free and the output tree is fully resolved. These assumptions are challenged in tumor phylogenetics because single-cell sequencing produces sparse, error-ridden data and because tumors evolve clonally. Here, we study the theoretical utility of methods based on quartets (four-leaf, unrooted phylogenetic trees) in light of these barriers. We consider a popular tumor phylogenetics model, in which mutations arise on a (highly unresolved) tree and then (unbiased) errors and missing values are introduced. Quartets are then implied by mutations present in two cells and absent from two cells. Our main result is that the most probable quartet identifies the unrooted model tree on four cells. This motivates seeking a tree such that the number of quartets shared between it and the input mutations is maximized. We prove an optimal solution to this problem is a consistent estimator of the unrooted cell lineage tree; this guarantee includes the case where the model tree is highly unresolved, with error defined as the number of false negative branches. Lastly, we outline how quartet-based methods might be employed when there are copy number aberrations and other challenges specific to tumor phylogenetics. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13015-023-00248-w. |
format | Online Article Text |
id | pubmed-10691101 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-106911012023-12-02 Quartets enable statistically consistent estimation of cell lineage trees under an unbiased error and missingness model Han, Yunheng Molloy, Erin K. Algorithms Mol Biol Research Cancer progression and treatment can be informed by reconstructing its evolutionary history from tumor cells. Although many methods exist to estimate evolutionary trees (called phylogenies) from molecular sequences, traditional approaches assume the input data are error-free and the output tree is fully resolved. These assumptions are challenged in tumor phylogenetics because single-cell sequencing produces sparse, error-ridden data and because tumors evolve clonally. Here, we study the theoretical utility of methods based on quartets (four-leaf, unrooted phylogenetic trees) in light of these barriers. We consider a popular tumor phylogenetics model, in which mutations arise on a (highly unresolved) tree and then (unbiased) errors and missing values are introduced. Quartets are then implied by mutations present in two cells and absent from two cells. Our main result is that the most probable quartet identifies the unrooted model tree on four cells. This motivates seeking a tree such that the number of quartets shared between it and the input mutations is maximized. We prove an optimal solution to this problem is a consistent estimator of the unrooted cell lineage tree; this guarantee includes the case where the model tree is highly unresolved, with error defined as the number of false negative branches. Lastly, we outline how quartet-based methods might be employed when there are copy number aberrations and other challenges specific to tumor phylogenetics. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13015-023-00248-w. 
BioMed Central 2023-12-01 /pmc/articles/PMC10691101/ /pubmed/38041123 http://dx.doi.org/10.1186/s13015-023-00248-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Han, Yunheng Molloy, Erin K. Quartets enable statistically consistent estimation of cell lineage trees under an unbiased error and missingness model |
title | Quartets enable statistically consistent estimation of cell lineage trees under an unbiased error and missingness model |
title_sort | quartets enable statistically consistent estimation of cell lineage trees under an unbiased error and missingness model |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10691101/ https://www.ncbi.nlm.nih.gov/pubmed/38041123 http://dx.doi.org/10.1186/s13015-023-00248-w |
work_keys_str_mv | AT hanyunheng quartetsenablestatisticallyconsistentestimationofcelllineagetreesunderanunbiasederrorandmissingnessmodel AT molloyerink quartetsenablestatisticallyconsistentestimationofcelllineagetreesunderanunbiasederrorandmissingnessmodel |
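
The description above states that quartets are implied by mutations present in two cells and absent from two cells, and that the target tree should share as many quartets as possible with the input mutations. The sketch below illustrates only the first part (extracting and counting quartets from a mutation matrix with missing values). It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dictionary encoding (1 = present, 0 = absent, `None` = missing), and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def quartet(a, b, c, d):
    """Unrooted quartet ab|cd, stored so that ab|cd == cd|ab == ba|dc."""
    return frozenset({frozenset({a, b}), frozenset({c, d})})


def quartets_implied_by(mutation):
    """Yield every quartet implied by one mutation.

    `mutation` maps a cell name to 1 (present), 0 (absent), or None (missing).
    A quartet ab|cd is implied when the mutation is present in a and b
    and absent from c and d; cells with missing values contribute nothing.
    """
    present = sorted(cell for cell, state in mutation.items() if state == 1)
    absent = sorted(cell for cell, state in mutation.items() if state == 0)
    for a, b in combinations(present, 2):
        for c, d in combinations(absent, 2):
            yield quartet(a, b, c, d)


def count_quartets(mutations):
    """Count how many mutations support each quartet."""
    counts = Counter()
    for mutation in mutations:
        counts.update(quartets_implied_by(mutation))
    return counts


# Hypothetical toy input: four cells, four mutations (one with a missing value).
mutations = [
    {"A": 1, "B": 1, "C": 0, "D": 0},     # implies AB|CD
    {"A": 1, "B": 1, "C": 0, "D": None},  # only one absent cell: no quartet
    {"A": 0, "B": 0, "C": 1, "D": 1},     # implies CD|AB, the same as AB|CD
    {"A": 1, "B": 0, "C": 1, "D": 0},     # implies AC|BD (e.g. due to an error)
]

counts = count_quartets(mutations)
best, support = counts.most_common(1)[0]
```

On this toy input, `best` is the quartet AB|CD with `support` equal to 2, while the conflicting quartet AC|BD is supported once. A tree-search step that maximizes the number of shared quartets would be built on top of such counts and is not shown here.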