An Evaluation of Phylogenetic Workflows in Viral Molecular Epidemiology
The use of viral sequence data to inform public health intervention has become increasingly common in the realm of epidemiology. Such methods typically utilize multiple sequence alignments and phylogenies estimated from the sequence data. Like all estimation techniques, they are error prone, yet the...
Main Authors: | Young, Colin; Meng, Sarah; Moshiri, Niema |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9032411/ https://www.ncbi.nlm.nih.gov/pubmed/35458504 http://dx.doi.org/10.3390/v14040774 |
_version_ | 1784692637261889536 |
---|---|
author | Young, Colin Meng, Sarah Moshiri, Niema |
author_sort | Young, Colin |
collection | PubMed |
description | The use of viral sequence data to inform public health intervention has become increasingly common in the realm of epidemiology. Such methods typically utilize multiple sequence alignments and phylogenies estimated from the sequence data. Like all estimation techniques, they are error prone, yet the impacts of such imperfections on downstream epidemiological inferences are poorly understood. To address this, we executed multiple commonly used viral phylogenetic analysis workflows on simulated viral sequence data, modeling Human Immunodeficiency Virus (HIV), Hepatitis C Virus (HCV), and Ebolavirus, and we computed multiple methods of accuracy, motivated by transmission-clustering techniques. For multiple sequence alignment, MAFFT consistently outperformed MUSCLE and Clustal Omega, in both accuracy and runtime. For phylogenetic inference, FastTree 2, IQ-TREE, RAxML-NG, and PhyML had similar topological accuracies, but branch lengths and pairwise distances were consistently most accurate in phylogenies inferred by RAxML-NG. However, FastTree 2 was the fastest, by orders of magnitude, and when the other tools were used to optimize branch lengths along a fixed FastTree 2 topology, the resulting phylogenies had accuracies that were indistinguishable from their original counterparts, but with a fraction of the runtime. |
format | Online Article Text |
id | pubmed-9032411 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9032411 2022-04-23 An Evaluation of Phylogenetic Workflows in Viral Molecular Epidemiology Young, Colin Meng, Sarah Moshiri, Niema Viruses Article MDPI 2022-04-08 /pmc/articles/PMC9032411/ /pubmed/35458504 http://dx.doi.org/10.3390/v14040774 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | An Evaluation of Phylogenetic Workflows in Viral Molecular Epidemiology |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9032411/ https://www.ncbi.nlm.nih.gov/pubmed/35458504 http://dx.doi.org/10.3390/v14040774 |