PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences
Main authors: | Boskova, Veronika; Stadler, Tanja |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Subjects: | Resources |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7530608/ https://www.ncbi.nlm.nih.gov/pubmed/32492139 http://dx.doi.org/10.1093/molbev/msaa136 |
author | Boskova, Veronika; Stadler, Tanja |
---|---|
collection | PubMed |
description | Next-generation sequencing of pathogen quasispecies within a host yields data sets of tens to hundreds of unique sequences. However, the full data set often contains thousands of sequences, because many of those unique sequences have multiple identical copies. Data sets of this size represent a computational challenge for currently available Bayesian phylogenetic and phylodynamic methods. Through simulations, we explore how large data sets with duplicate sequences affect the speed and accuracy of phylogenetic and phylodynamic analysis within BEAST 2. We show that using unique sequences only leads to biases, and using a random subset of sequences yields imprecise parameter estimates. To overcome these shortcomings, we introduce PIQMEE, a BEAST 2 add-on that produces reliable parameter estimates from full data sets with increased computational efficiency as compared with the currently available methods within BEAST 2. The principle behind PIQMEE is to resolve the tree structure of the unique sequences only, while simultaneously estimating the branching times of the duplicate sequences. Distinguishing between unique and duplicate sequences allows our method to perform well even for very large data sets. Although the classic method converges poorly for data sets of 6,000 sequences when allowed to run for 7 days, our method converges in slightly more than 1 day. In fact, PIQMEE can handle data sets of around 21,000 sequences with 20 unique sequences in 14 days. Finally, we apply the method to a real, within-host HIV sequencing data set with several thousand sequences per patient. |
id | pubmed-7530608 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Mol Biol Evol |
published | 2020-06-03 |
rights | © The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
topic | Resources |
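The description notes that PIQMEE resolves the tree structure of the unique sequences only, while estimating branching times for their duplicate copies. The data reduction behind that distinction can be illustrated with a minimal Python sketch, assuming an alignment given as a name-to-sequence mapping. This is not code from PIQMEE or BEAST 2: the function `collapse_duplicates` and the toy alignment are hypothetical, and the sketch shows only how identical sequences are grouped with their copy counts, not the phylodynamic inference itself.

```python
from typing import Dict, List, Tuple


def collapse_duplicates(
    alignment: Dict[str, str],
) -> List[Tuple[str, str, int, List[str]]]:
    """Group identical sequences into unique sequences with copy counts.

    Hypothetical helper (not part of PIQMEE or BEAST 2).
    `alignment` maps a sequence name to its aligned sequence string.
    Returns (representative_name, sequence, copy_count, member_names)
    tuples, one per unique sequence, in order of first appearance
    (dicts preserve insertion order in Python 3.7+).
    """
    groups: Dict[str, List[str]] = {}
    for name, seq in alignment.items():
        # Identical strings share one bucket; the list records every copy.
        groups.setdefault(seq, []).append(name)
    return [(names[0], seq, len(names), names) for seq, names in groups.items()]


if __name__ == "__main__":
    # Hypothetical toy alignment: 6 sequences, 3 of them unique.
    toy = {
        "s1": "ACGT", "s2": "ACGT", "s3": "ACGA",
        "s4": "ACGT", "s5": "ACGA", "s6": "TCGT",
    }
    uniques = collapse_duplicates(toy)
    n_total = sum(count for _, _, count, _ in uniques)
    print(f"{len(uniques)} unique of {n_total} sequences")
    for rep, seq, count, members in uniques:
        print(rep, seq, count, members)
```

On the toy alignment, the sketch reports 3 unique sequences among 6. In the setting described in the abstract, only those unique sequences would need a resolved tree topology, while the additional copies would contribute branching times. The sketch assumes duplicates are defined by exact string equality; how PIQMEE itself treats ambiguity codes or gaps is not stated in this record.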