Accurate viral population assembly from ultra-deep sequencing data

Motivation: Next-generation sequencing technologies sequence viruses with ultra-deep coverage, thus promising to revolutionize our understanding of the underlying diversity of viral populations. While the sequencing coverage is high enough that even rare viral variants are sequenced, the presence of...

Bibliographic Details
Main authors: Mangul, Serghei; Wu, Nicholas C.; Mancuso, Nicholas; Zelikovsky, Alex; Sun, Ren; Eskin, Eleazar
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4058922/
https://www.ncbi.nlm.nih.gov/pubmed/24932001
http://dx.doi.org/10.1093/bioinformatics/btu295
author Mangul, Serghei
Wu, Nicholas C.
Mancuso, Nicholas
Zelikovsky, Alex
Sun, Ren
Eskin, Eleazar
collection PubMed
description Motivation: Next-generation sequencing technologies sequence viruses with ultra-deep coverage, thus promising to revolutionize our understanding of the underlying diversity of viral populations. While the sequencing coverage is high enough that even rare viral variants are sequenced, the presence of sequencing errors makes it difficult to distinguish between rare variants and sequencing errors. Results: In this article, we present a method to overcome the limitations of sequencing technologies and assemble a diverse viral population that allows for the detection of previously undiscovered rare variants. The proposed method consists of a high-fidelity sequencing protocol and an accurate viral population assembly method, referred to as Viral Genome Assembler (VGA). The proposed protocol is able to eliminate sequencing errors by using individual barcodes attached to the sequencing fragments. Highly accurate data in combination with deep coverage allow VGA to assemble rare variants. VGA uses an expectation–maximization algorithm to estimate abundances of the assembled viral variants in the population. Results on both synthetic and real datasets show that our method is able to accurately assemble an HIV viral population and detect rare variants previously undetectable due to sequencing errors. VGA outperforms state-of-the-art methods for genome-wide viral assembly. Furthermore, our method is the first viral assembly method that scales to millions of sequencing reads. Availability: Our tool VGA is freely available at http://genetics.cs.ucla.edu/vga/ Contact: serghei@cs.ucla.edu; eeskin@cs.ucla.edu
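The description states that VGA uses an expectation–maximization algorithm to estimate the abundances of assembled variants, but this record gives no details of the model. The following is therefore only a generic sketch of EM abundance estimation, not VGA's implementation. It assumes error-free reads, that each read is equally likely to come from any variant it matches (no length or coverage normalization), and that the read-to-variant compatibility is already known. The names `estimate_abundances` and `compat` are hypothetical.

```python
def estimate_abundances(compat, n_variants, n_iter=200, tol=1e-9):
    """Generic EM sketch (assumption: not the VGA implementation).

    compat[r] is the collection of variant indices that read r is
    consistent with. Returns a list of estimated variant frequencies.
    """
    # Start from a uniform guess over all variants.
    freqs = [1.0 / n_variants] * n_variants
    for _ in range(n_iter):
        # E-step: split each read across its compatible variants,
        # in proportion to the current frequency estimates.
        counts = [0.0] * n_variants
        for variants in compat:
            total = sum(freqs[v] for v in variants)
            if total == 0.0:
                continue  # read matches no variant with nonzero frequency
            for v in variants:
                counts[v] += freqs[v] / total
        assigned = sum(counts)
        if assigned == 0.0:
            break  # nothing could be assigned; keep the current estimate
        # M-step: renormalize the fractional read counts into frequencies.
        new_freqs = [c / assigned for c in counts]
        delta = max(abs(a - b) for a, b in zip(freqs, new_freqs))
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


if __name__ == "__main__":
    # Toy input: 6 reads unique to variant 0, 2 unique to variant 1,
    # and 2 reads compatible with both.
    reads = [{0}] * 6 + [{1}] * 2 + [{0, 1}] * 2
    print(estimate_abundances(reads, 2))
```

On the toy input the ambiguous reads are shared in proportion to the evidence from the unique reads, so the estimates settle near 0.75 and 0.25.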
format Online
Article
Text
id pubmed-4058922
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4058922 2014-06-18 Bioinformatics Ismb 2014 Proceedings Papers Committee Oxford University Press 2014-06-15 2014-06-11 /pmc/articles/PMC4058922/ /pubmed/24932001 http://dx.doi.org/10.1093/bioinformatics/btu295 Text en
© The Author 2014. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Accurate viral population assembly from ultra-deep sequencing data
topic Ismb 2014 Proceedings Papers Committee
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4058922/
https://www.ncbi.nlm.nih.gov/pubmed/24932001
http://dx.doi.org/10.1093/bioinformatics/btu295