Cargando…

PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences

Next-generation sequencing of pathogen quasispecies within a host yields data sets of tens to hundreds of unique sequences. However, the full data set often contains thousands of sequences, because many of those unique sequences have multiple identical copies. Data sets of this size represent a comp...

Descripción completa

Detalles Bibliográficos
Autores principales:	Boskova, Veronika, Stadler, Tanja
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	Oxford University Press 2020
Materias:	Resources
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7530608/ https://www.ncbi.nlm.nih.gov/pubmed/32492139 http://dx.doi.org/10.1093/molbev/msaa136

_version_	1783589600471547904
author	Boskova, Veronika Stadler, Tanja
author_facet	Boskova, Veronika Stadler, Tanja
author_sort	Boskova, Veronika
collection	PubMed
description	Next-generation sequencing of pathogen quasispecies within a host yields data sets of tens to hundreds of unique sequences. However, the full data set often contains thousands of sequences, because many of those unique sequences have multiple identical copies. Data sets of this size represent a computational challenge for currently available Bayesian phylogenetic and phylodynamic methods. Through simulations, we explore how large data sets with duplicate sequences affect the speed and accuracy of phylogenetic and phylodynamic analysis within BEAST 2. We show that using unique sequences only leads to biases, and using a random subset of sequences yields imprecise parameter estimates. To overcome these shortcomings, we introduce PIQMEE, a BEAST 2 add-on that produces reliable parameter estimates from full data sets with increased computational efficiency as compared with the currently available methods within BEAST 2. The principle behind PIQMEE is to resolve the tree structure of the unique sequences only, while simultaneously estimating the branching times of the duplicate sequences. Distinguishing between unique and duplicate sequences allows our method to perform well even for very large data sets. Although the classic method converges poorly for data sets of 6,000 sequences when allowed to run for 7 days, our method converges in slightly more than 1 day. In fact, PIQMEE can handle data sets of around 21,000 sequences with 20 unique sequences in 14 days. Finally, we apply the method to a real, within-host HIV sequencing data set with several thousand sequences per patient.
format	Online Article Text
id	pubmed-7530608
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-75306082020-10-07 PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences Boskova, Veronika Stadler, Tanja Mol Biol Evol Resources Next-generation sequencing of pathogen quasispecies within a host yields data sets of tens to hundreds of unique sequences. However, the full data set often contains thousands of sequences, because many of those unique sequences have multiple identical copies. Data sets of this size represent a computational challenge for currently available Bayesian phylogenetic and phylodynamic methods. Through simulations, we explore how large data sets with duplicate sequences affect the speed and accuracy of phylogenetic and phylodynamic analysis within BEAST 2. We show that using unique sequences only leads to biases, and using a random subset of sequences yields imprecise parameter estimates. To overcome these shortcomings, we introduce PIQMEE, a BEAST 2 add-on that produces reliable parameter estimates from full data sets with increased computational efficiency as compared with the currently available methods within BEAST 2. The principle behind PIQMEE is to resolve the tree structure of the unique sequences only, while simultaneously estimating the branching times of the duplicate sequences. Distinguishing between unique and duplicate sequences allows our method to perform well even for very large data sets. Although the classic method converges poorly for data sets of 6,000 sequences when allowed to run for 7 days, our method converges in slightly more than 1 day. In fact, PIQMEE can handle data sets of around 21,000 sequences with 20 unique sequences in 14 days. Finally, we apply the method to a real, within-host HIV sequencing data set with several thousand sequences per patient. Oxford University Press 2020-06-03 /pmc/articles/PMC7530608/ /pubmed/32492139 http://dx.doi.org/10.1093/molbev/msaa136 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle	Resources Boskova, Veronika Stadler, Tanja PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences
title	PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences
title_full	PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences
title_fullStr	PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences
title_full_unstemmed	PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences
title_short	PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences
title_sort	piqmee: bayesian phylodynamic method for analysis of large data sets with duplicate sequences
topic	Resources
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7530608/ https://www.ncbi.nlm.nih.gov/pubmed/32492139 http://dx.doi.org/10.1093/molbev/msaa136
work_keys_str_mv	AT boskovaveronika piqmeebayesianphylodynamicmethodforanalysisoflargedatasetswithduplicatesequences AT stadlertanja piqmeebayesianphylodynamicmethodforanalysisoflargedatasetswithduplicatesequences

PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences

Ejemplares similares