Online Phylogenetics using Parsimony Produces Slightly Better Trees and is Dramatically More Efficient for Large SARS-CoV-2 Phylogenies than de novo and Maximum-Likelihood Approaches

Phylogenetics has been foundational to SARS-CoV-2 research and public health policy, assisting in genomic surveillance, contact tracing, and assessing emergence and spread of new variants. However, phylogenetic analyses of SARS-CoV-2 have often relied on tools designed for de novo phylogenetic infer...

Bibliographic Details
Main Authors: Thornlow, Bryan, Kramer, Alexander, Ye, Cheng, De Maio, Nicola, McBroome, Jakob, Hinrichs, Angie S., Lanfear, Robert, Turakhia, Yatish, Corbett-Detig, Russell
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9128781/
https://www.ncbi.nlm.nih.gov/pubmed/35611334
http://dx.doi.org/10.1101/2021.12.02.471004
author Thornlow, Bryan
Kramer, Alexander
Ye, Cheng
De Maio, Nicola
McBroome, Jakob
Hinrichs, Angie S.
Lanfear, Robert
Turakhia, Yatish
Corbett-Detig, Russell
collection PubMed
description Phylogenetics has been foundational to SARS-CoV-2 research and public health policy, assisting in genomic surveillance, contact tracing, and assessing emergence and spread of new variants. However, phylogenetic analyses of SARS-CoV-2 have often relied on tools designed for de novo phylogenetic inference, in which all data are collected before any analysis is performed and the phylogeny is inferred once from scratch. SARS-CoV-2 datasets do not fit this mould. There are currently over 10 million sequenced SARS-CoV-2 genomes in online databases, with tens of thousands of new genomes added every day. Continuous data collection, combined with the public health relevance of SARS-CoV-2, invites an “online” approach to phylogenetics, in which new samples are added to existing phylogenetic trees every day. The extremely dense sampling of SARS-CoV-2 genomes also invites a comparison between likelihood and parsimony approaches to phylogenetic inference. Maximum likelihood (ML) methods are more accurate when there are multiple changes at a single site on a single branch, but this accuracy comes at a large computational cost, and the dense sampling of SARS-CoV-2 genomes means that these instances will be extremely rare because each internal branch is expected to be extremely short. Therefore, it may be that approaches based on maximum parsimony (MP) are sufficiently accurate for reconstructing phylogenies of SARS-CoV-2, and their simplicity means that they can be applied to much larger datasets. Here, we evaluate the performance of de novo and online phylogenetic approaches, and ML and MP frameworks, for inferring large and dense SARS-CoV-2 phylogenies. Overall, we find that online phylogenetics produces similar phylogenetic trees to de novo analyses for SARS-CoV-2, and that MP optimizations produce more accurate SARS-CoV-2 phylogenies than do ML optimizations. 
Since MP is thousands of times faster than presently available implementations of ML and online phylogenetics is faster than de novo, we therefore propose that, in the context of comprehensive genomic epidemiology of SARS-CoV-2, MP online phylogenetics approaches should be favored.
format Online
Article
Text
id pubmed-9128781
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-9128781 2022-05-25 Online Phylogenetics using Parsimony Produces Slightly Better Trees and is Dramatically More Efficient for Large SARS-CoV-2 Phylogenies than de novo and Maximum-Likelihood Approaches Thornlow, Bryan Kramer, Alexander Ye, Cheng De Maio, Nicola McBroome, Jakob Hinrichs, Angie S. Lanfear, Robert Turakhia, Yatish Corbett-Detig, Russell bioRxiv Article Cold Spring Harbor Laboratory 2022-05-18 /pmc/articles/PMC9128781/ /pubmed/35611334 http://dx.doi.org/10.1101/2021.12.02.471004 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title Online Phylogenetics using Parsimony Produces Slightly Better Trees and is Dramatically More Efficient for Large SARS-CoV-2 Phylogenies than de novo and Maximum-Likelihood Approaches
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9128781/
https://www.ncbi.nlm.nih.gov/pubmed/35611334
http://dx.doi.org/10.1101/2021.12.02.471004