PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences

Next-generation sequencing of pathogen quasispecies within a host yields data sets of tens to hundreds of unique sequences. However, the full data set often contains thousands of sequences, because many of those unique sequences have multiple identical copies. Data sets of this size represent a comp...

Bibliographic Details
Main Authors: Boskova, Veronika, Stadler, Tanja
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7530608/
https://www.ncbi.nlm.nih.gov/pubmed/32492139
http://dx.doi.org/10.1093/molbev/msaa136
_version_ 1783589600471547904
author Boskova, Veronika
Stadler, Tanja
author_facet Boskova, Veronika
Stadler, Tanja
author_sort Boskova, Veronika
collection PubMed
description Next-generation sequencing of pathogen quasispecies within a host yields data sets of tens to hundreds of unique sequences. However, the full data set often contains thousands of sequences, because many of those unique sequences have multiple identical copies. Data sets of this size represent a computational challenge for currently available Bayesian phylogenetic and phylodynamic methods. Through simulations, we explore how large data sets with duplicate sequences affect the speed and accuracy of phylogenetic and phylodynamic analysis within BEAST 2. We show that using unique sequences only leads to biases, and using a random subset of sequences yields imprecise parameter estimates. To overcome these shortcomings, we introduce PIQMEE, a BEAST 2 add-on that produces reliable parameter estimates from full data sets with increased computational efficiency as compared with the currently available methods within BEAST 2. The principle behind PIQMEE is to resolve the tree structure of the unique sequences only, while simultaneously estimating the branching times of the duplicate sequences. Distinguishing between unique and duplicate sequences allows our method to perform well even for very large data sets. Although the classic method converges poorly for data sets of 6,000 sequences when allowed to run for 7 days, our method converges in slightly more than 1 day. In fact, PIQMEE can handle data sets of around 21,000 sequences with 20 unique sequences in 14 days. Finally, we apply the method to a real, within-host HIV sequencing data set with several thousand sequences per patient.
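The description above distinguishes unique sequences from their identical copies. As a minimal sketch of that distinction only, the Python snippet below collapses a list of reads into unique sequences with copy counts. It is not PIQMEE's implementation or input format; the function name `collapse_duplicates` and the toy reads are hypothetical, introduced purely for illustration.

```python
from collections import Counter


def collapse_duplicates(sequences):
    """Group identical sequences and return (sequence, copy_count) pairs.

    Pairs are ordered from most to least abundant; ties keep the order
    in which the sequences were first seen.
    """
    counts = Counter(sequences)
    return counts.most_common()


# Hypothetical toy reads (not data from the paper).
reads = ["ACGT", "ACGT", "ACGA", "ACGT", "TCGA", "ACGA"]

for sequence, copies in collapse_duplicates(reads):
    print(sequence, copies)
```

In this toy case, six reads reduce to three unique sequences. According to the description, it is the tree structure of such unique sequences that the method resolves, while the copy counts inform the branching times of the duplicates.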
format Online
Article
Text
id pubmed-7530608
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7530608 2020-10-07 PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences Boskova, Veronika Stadler, Tanja Mol Biol Evol Resources Oxford University Press 2020-06-03 /pmc/articles/PMC7530608/ /pubmed/32492139 http://dx.doi.org/10.1093/molbev/msaa136 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Resources
Boskova, Veronika
Stadler, Tanja
PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences
title PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences
title_full PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences
title_fullStr PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences
title_full_unstemmed PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences
title_short PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences
title_sort piqmee: bayesian phylodynamic method for analysis of large data sets with duplicate sequences
topic Resources
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7530608/
https://www.ncbi.nlm.nih.gov/pubmed/32492139
http://dx.doi.org/10.1093/molbev/msaa136
work_keys_str_mv AT boskovaveronika piqmeebayesianphylodynamicmethodforanalysisoflargedatasetswithduplicatesequences
AT stadlertanja piqmeebayesianphylodynamicmethodforanalysisoflargedatasetswithduplicatesequences