
An Evaluation of Phylogenetic Workflows in Viral Molecular Epidemiology

The use of viral sequence data to inform public health intervention has become increasingly common in the realm of epidemiology. Such methods typically utilize multiple sequence alignments and phylogenies estimated from the sequence data. Like all estimation techniques, they are error prone, yet the...


Bibliographic Details
Main Authors: Young, Colin; Meng, Sarah; Moshiri, Niema
Format: Online Article Text
Language: English
Published: MDPI 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9032411/
https://www.ncbi.nlm.nih.gov/pubmed/35458504
http://dx.doi.org/10.3390/v14040774
collection PubMed
description The use of viral sequence data to inform public health intervention has become increasingly common in the realm of epidemiology. Such methods typically utilize multiple sequence alignments and phylogenies estimated from the sequence data. Like all estimation techniques, they are error prone, yet the impacts of such imperfections on downstream epidemiological inferences are poorly understood. To address this, we executed multiple commonly used viral phylogenetic analysis workflows on simulated viral sequence data, modeling Human Immunodeficiency Virus (HIV), Hepatitis C Virus (HCV), and Ebolavirus, and we computed multiple measures of accuracy, motivated by transmission-clustering techniques. For multiple sequence alignment, MAFFT consistently outperformed MUSCLE and Clustal Omega, in both accuracy and runtime. For phylogenetic inference, FastTree 2, IQ-TREE, RAxML-NG, and PhyML had similar topological accuracies, but branch lengths and pairwise distances were consistently most accurate in phylogenies inferred by RAxML-NG. However, FastTree 2 was the fastest, by orders of magnitude, and when the other tools were used to optimize branch lengths along a fixed FastTree 2 topology, the resulting phylogenies had accuracies that were indistinguishable from their original counterparts, but with a fraction of the runtime.
id pubmed-9032411
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9032411 2022-04-23 An Evaluation of Phylogenetic Workflows in Viral Molecular Epidemiology Young, Colin Meng, Sarah Moshiri, Niema Viruses Article MDPI 2022-04-08 /pmc/articles/PMC9032411/ /pubmed/35458504 http://dx.doi.org/10.3390/v14040774 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).