Quartets enable statistically consistent estimation of cell lineage trees under an unbiased error and missingness model

Bibliographic Details
Main authors: Han, Yunheng; Molloy, Erin K.
Format: Online, Article, Text
Language: English
Published: BioMed Central 2023
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10691101/
https://www.ncbi.nlm.nih.gov/pubmed/38041123
http://dx.doi.org/10.1186/s13015-023-00248-w
collection PubMed
description Cancer progression and treatment can be informed by reconstructing its evolutionary history from tumor cells. Although many methods exist to estimate evolutionary trees (called phylogenies) from molecular sequences, traditional approaches assume the input data are error-free and the output tree is fully resolved. These assumptions are challenged in tumor phylogenetics because single-cell sequencing produces sparse, error-ridden data and because tumors evolve clonally. Here, we study the theoretical utility of methods based on quartets (four-leaf, unrooted phylogenetic trees) in light of these barriers. We consider a popular tumor phylogenetics model, in which mutations arise on a (highly unresolved) tree and then (unbiased) errors and missing values are introduced. Quartets are then implied by mutations present in two cells and absent from two cells. Our main result is that the most probable quartet identifies the unrooted model tree on four cells. This motivates seeking a tree such that the number of quartets shared between it and the input mutations is maximized. We prove an optimal solution to this problem is a consistent estimator of the unrooted cell lineage tree; this guarantee includes the case where the model tree is highly unresolved, with error defined as the number of false negative branches. Lastly, we outline how quartet-based methods might be employed when there are copy number aberrations and other challenges specific to tumor phylogenetics. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13015-023-00248-w.
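The quartet idea summarized in the description can be illustrated with a small brute-force sketch. It is written in Python, is not the authors' software, and rests on assumptions made here for illustration: each mutation is a hypothetical dictionary mapping a cell name to 1 (present), 0 (absent) or None (missing); a mutation contributes one quartet ab|cd for every pair of cells a, b that carry it and every pair c, d that lack it, while missing entries contribute nothing; and a candidate unrooted tree is supplied as a list of its bipartitions, each given by one side. The function names (`implied_quartets`, `dominant_quartet`, `quartet_score`, `displays`) are illustrative. `dominant_quartet` tallies support for the three possible unrooted topologies on four cells, and `quartet_score` counts how many mutation-implied quartets a candidate tree displays, which is the quantity the description proposes to maximize. Searching tree space for the maximizing tree is not shown.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

# Illustrative representation (an assumption, not from the paper): a quartet
# ab|cd is an unordered pair of unordered cell pairs, so ab|cd == dc|ba.
Quartet = FrozenSet[FrozenSet[str]]
# One mutation: cell name -> 1 (present), 0 (absent) or None (missing).
Column = Dict[str, Optional[int]]


def make_quartet(a: str, b: str, c: str, d: str) -> Quartet:
    return frozenset({frozenset({a, b}), frozenset({c, d})})


def implied_quartets(column: Column) -> List[Quartet]:
    """All quartets ab|cd with a, b carrying the mutation and c, d lacking it.
    Cells with a missing value (None) imply nothing."""
    present = sorted(cell for cell, v in column.items() if v == 1)
    absent = sorted(cell for cell, v in column.items() if v == 0)
    return [make_quartet(a, b, c, d)
            for a, b in combinations(present, 2)
            for c, d in combinations(absent, 2)]


def dominant_quartet(matrix: Sequence[Column],
                     cells: Tuple[str, str, str, str]
                     ) -> Tuple[Quartet, Dict[Quartet, int]]:
    """Count support for the three unrooted topologies on four cells and
    return the most frequent one (ties go to the first topology listed)."""
    a, b, c, d = cells
    topologies = [make_quartet(a, b, c, d),
                  make_quartet(a, c, b, d),
                  make_quartet(a, d, b, c)]
    counts = {q: 0 for q in topologies}
    for column in matrix:
        # Restrict the mutation to the four cells; absent keys count as missing.
        restricted = {x: column.get(x) for x in cells}
        for q in implied_quartets(restricted):
            counts[q] += 1
    return max(topologies, key=lambda q: counts[q]), counts


def displays(splits: Sequence[FrozenSet[str]], q: Quartet) -> bool:
    """A tree given as one side of each bipartition displays ab|cd iff some
    split puts {a, b} on one side and {c, d} on the other."""
    p1, p2 = tuple(q)
    return any((p1 <= s and not (p2 & s)) or (p2 <= s and not (p1 & s))
               for s in splits)


def quartet_score(splits: Sequence[FrozenSet[str]],
                  matrix: Sequence[Column]) -> int:
    """Number of mutation-implied quartets (with multiplicity) the tree displays."""
    return sum(displays(splits, q)
               for column in matrix
               for q in implied_quartets(column))


if __name__ == "__main__":
    # Toy data invented for illustration; the last column mimics an error.
    matrix = [
        {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0},
        {"A": 1, "B": 1, "C": 0, "D": 0, "E": None},
        {"A": 0, "B": None, "C": 0, "D": 1, "E": 1},
        {"A": 1, "B": 0, "C": 1, "D": 0, "E": None},
    ]
    best, counts = dominant_quartet(matrix, ("A", "B", "C", "D"))
    print(sorted(sorted(pair) for pair in best))  # [['A', 'B'], ['C', 'D']]
    print(quartet_score([frozenset({"A", "B"}), frozenset({"D", "E"})], matrix))  # 5
    print(quartet_score([frozenset({"A", "C"}), frozenset({"D", "E"})], matrix))  # 3
```

On the toy matrix, AB|CD is the best-supported quartet on cells A through D, and the tree with splits {A, B} and {D, E} scores 5 against 3 for the alternative with splits {A, C} and {D, E}. Enumerating C(p, 2) · C(q, 2) quartets per mutation (p cells present, q absent) keeps the logic transparent but grows quickly with the number of cells, so this sketch only suits small examples.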
format Online
Article
Text
id pubmed-10691101
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
journal Algorithms Mol Biol
publication date 2023-12-01
rights © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research