V-Phaser 2: variant inference for viral populations

BACKGROUND: Massively parallel sequencing offers the possibility of revolutionizing the study of viral populations by providing ultra-deep sequencing (tens to hundreds of thousands-fold coverage) of complete viral genomes. However, differentiation of true low-frequency variants from sequencing errors...

Bibliographic Details
Main authors: Yang, Xiao; Charlebois, Patrick; Macalalad, Alex; Henn, Matthew R; Zody, Michael C
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3907024/
https://www.ncbi.nlm.nih.gov/pubmed/24088188
http://dx.doi.org/10.1186/1471-2164-14-674
author Yang, Xiao
Charlebois, Patrick
Macalalad, Alex
Henn, Matthew R
Zody, Michael C
collection PubMed
description BACKGROUND: Massively parallel sequencing offers the possibility of revolutionizing the study of viral populations by providing ultra-deep sequencing (tens to hundreds of thousands-fold coverage) of complete viral genomes. However, differentiating true low-frequency variants from sequencing errors remains challenging. RESULTS: We developed a software package, V-Phaser 2, for inferring intrahost diversity within viral populations. This program adds three major new methodologies to the state of the art: a technique to efficiently utilize paired-end read data for calling phased variants, a new strategy to represent and infer length polymorphisms, and an inline filter for erroneous calls arising from systematic sequencing artifacts. We have also heavily optimized memory and run-time performance. This combination of algorithmic and technical advances allows V-Phaser 2 to fully utilize extremely deep paired-end sequencing data (such as that generated by Illumina sequencers) to accurately infer low-frequency intrahost variants in viral populations in reasonable time on a standard desktop computer. V-Phaser 2 was validated and compared to both QuRe and the original V-Phaser on three datasets obtained from two viral populations: a mixture of eight known strains of West Nile Virus (WNV) sequenced on both 454 Titanium and Illumina MiSeq, and a mixture of twenty-four known strains of WNV sequenced only on 454 Titanium. V-Phaser 2 outperformed the other two programs in both sensitivity and specificity while using more than five-fold less time and memory. CONCLUSIONS: We developed V-Phaser 2, a publicly available software tool that enables the efficient analysis of ultra-deep sequencing data produced by common next-generation sequencing platforms for viral populations. V-Phaser 2 can be accessed via http://www.broadinstitute.org/scientific-community/science/projects/viral-genomics/v-phaser-2 and is freely available for academic use.
format Online
Article
Text
id pubmed-3907024
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3907024 2014-02-12
V-Phaser 2: variant inference for viral populations
Yang, Xiao; Charlebois, Patrick; Macalalad, Alex; Henn, Matthew R; Zody, Michael C
BMC Genomics
Methodology Article
BioMed Central 2013-10-03
/pmc/articles/PMC3907024/ /pubmed/24088188 http://dx.doi.org/10.1186/1471-2164-14-674
Text en
Copyright © 2013 Yang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title V-Phaser 2: variant inference for viral populations
topic Methodology Article
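
The abstract in this record frames the core problem as telling true low-frequency variants apart from sequencing errors at very deep coverage. The Python sketch below illustrates that problem with a simple one-sided binomial test. It is not V-Phaser 2's actual model, which this record does not describe; the function names, the uniform per-base error rate, and the significance threshold are all assumptions chosen for illustration.

```python
import math


def log_binom_pmf(k, n, p):
    """Natural log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term from k upward."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(log_binom_pmf(i, n, p))
        total += term
        # Past the mean the terms only shrink, so stop once they are negligible.
        if i > n * p and term <= total * 1e-17:
            break
    return min(total, 1.0)


def is_likely_variant(alt_count, depth, error_rate, alpha=1e-6):
    """Hypothetical helper: True if alt_count reads are unlikely to be error alone.

    error_rate is an assumed uniform per-base error probability and alpha an
    assumed significance threshold; neither value comes from the paper.
    """
    return upper_tail(alt_count, depth, error_rate) < alpha


if __name__ == "__main__":
    depth, error_rate = 50_000, 0.001  # illustrative values only
    for alt_count in (50, 250):
        call = is_likely_variant(alt_count, depth, error_rate)
        print(alt_count, "variant" if call else "consistent with error")
```

At 50,000-fold coverage with an assumed 0.1% error rate, about 50 mismatching reads are expected from error alone, so a count near 50 is no evidence of a variant, whereas 250 reads (0.5%) lies far outside what error would produce under this simple model. Real data violate the uniform-error assumption, which is why the abstract mentions a separate filter for systematic sequencing artifacts.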