
Online Phylogenetics with matOptimize Produces Equivalent Trees and is Dramatically More Efficient for Large SARS-CoV-2 Phylogenies than de novo and Maximum-Likelihood Implementations

Phylogenetics has been foundational to SARS-CoV-2 research and public health policy, assisting in genomic surveillance, contact tracing, and assessing emergence and spread of new variants. However, phylogenetic analyses of SARS-CoV-2 have often relied on tools designed for de novo phylogenetic infer...


Bibliographic Details
Main authors: Kramer, Alexander M, Thornlow, Bryan, Ye, Cheng, De Maio, Nicola, McBroome, Jakob, Hinrichs, Angie S, Lanfear, Robert, Turakhia, Yatish, Corbett-Detig, Russell
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10627557/
https://www.ncbi.nlm.nih.gov/pubmed/37232476
http://dx.doi.org/10.1093/sysbio/syad031
author Kramer, Alexander M
Thornlow, Bryan
Ye, Cheng
De Maio, Nicola
McBroome, Jakob
Hinrichs, Angie S
Lanfear, Robert
Turakhia, Yatish
Corbett-Detig, Russell
collection PubMed
description Phylogenetics has been foundational to SARS-CoV-2 research and public health policy, assisting in genomic surveillance, contact tracing, and assessing emergence and spread of new variants. However, phylogenetic analyses of SARS-CoV-2 have often relied on tools designed for de novo phylogenetic inference, in which all data are collected before any analysis is performed and the phylogeny is inferred once from scratch. SARS-CoV-2 data sets do not fit this mold. There are currently over 14 million sequenced SARS-CoV-2 genomes in online databases, with tens of thousands of new genomes added every day. Continuous data collection, combined with the public health relevance of SARS-CoV-2, invites an “online” approach to phylogenetics, in which new samples are added to existing phylogenetic trees every day. The extremely dense sampling of SARS-CoV-2 genomes also invites a comparison between likelihood and parsimony approaches to phylogenetic inference. Maximum likelihood (ML) and pseudo-ML methods may be more accurate when there are multiple changes at a single site on a single branch, but this accuracy comes at a large computational cost, and the dense sampling of SARS-CoV-2 genomes means that these instances will be extremely rare because each internal branch is expected to be extremely short. Therefore, it may be that approaches based on maximum parsimony (MP) are sufficiently accurate for reconstructing phylogenies of SARS-CoV-2, and their simplicity means that they can be applied to much larger data sets. Here, we evaluate the performance of de novo and online phylogenetic approaches, as well as ML, pseudo-ML, and MP frameworks for inferring large and dense SARS-CoV-2 phylogenies. Overall, we find that online phylogenetics produces similar phylogenetic trees to de novo analyses for SARS-CoV-2, and that MP optimization with UShER and matOptimize produces equivalent SARS-CoV-2 phylogenies to some of the most popular ML and pseudo-ML inference tools. 
MP optimization with UShER and matOptimize is thousands of times faster than presently available implementations of ML, and online phylogenetics is faster than de novo inference. Our results therefore suggest that parsimony-based methods like UShER and matOptimize represent an accurate and more practical alternative to established ML implementations for large SARS-CoV-2 phylogenies and could be successfully applied to other similar data sets with particularly dense sampling and short branch lengths.
format Online
Article
Text
id pubmed-10627557
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10627557 2023-11-07 Syst Biol Regular Manuscripts Oxford University Press 2023-05-26 /pmc/articles/PMC10627557/ /pubmed/37232476 http://dx.doi.org/10.1093/sysbio/syad031 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of the Society of Systematic Biologists. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Online Phylogenetics with matOptimize Produces Equivalent Trees and is Dramatically More Efficient for Large SARS-CoV-2 Phylogenies than de novo and Maximum-Likelihood Implementations
topic Regular Manuscripts