
De novo assembly of viral quasispecies using overlap graphs

A viral quasispecies, the ensemble of viral strains populating an infected person, can be highly diverse. For optimal assessment of virulence, pathogenesis, and therapy selection, determining the haplotypes of the individual strains can play a key role. As many viruses are subject to high mutation a...


Bibliographic Details
Main authors: Baaijens, Jasmijn A., Aabidine, Amal Zine El, Rivals, Eric, Schönhuth, Alexander
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5411778/
https://www.ncbi.nlm.nih.gov/pubmed/28396522
http://dx.doi.org/10.1101/gr.215038.116
author Baaijens, Jasmijn A.
Aabidine, Amal Zine El
Rivals, Eric
Schönhuth, Alexander
collection PubMed
description A viral quasispecies, the ensemble of viral strains populating an infected person, can be highly diverse. For optimal assessment of virulence, pathogenesis, and therapy selection, determining the haplotypes of the individual strains can play a key role. As many viruses are subject to high mutation and recombination rates, high-quality reference genomes are often not available at the time of a new disease outbreak. We present SAVAGE, a computational tool for reconstructing individual haplotypes of intra-host virus strains without the need for a high-quality reference genome. SAVAGE makes use of either FM-index–based data structures or ad hoc consensus reference sequence for constructing overlap graphs from patient sample data. In this overlap graph, nodes represent reads and/or contigs, while edges reflect that two reads/contigs, based on sound statistical considerations, represent identical haplotypic sequence. Following an iterative scheme, a new overlap assembly algorithm that is based on the enumeration of statistically well-calibrated groups of reads/contigs then efficiently reconstructs the individual haplotypes from this overlap graph. In benchmark experiments on simulated and on real deep-coverage data, SAVAGE drastically outperforms generic de novo assemblers as well as the only specialized de novo viral quasispecies assembler available so far. When run on ad hoc consensus reference sequence, SAVAGE performs very favorably in comparison with state-of-the-art reference genome-guided tools. We also apply SAVAGE on two deep-coverage samples of patients infected by the Zika and the hepatitis C virus, respectively, which sheds light on the genetic structures of the respective viral quasispecies.
format Online
Article
Text
id pubmed-5411778
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-5411778 2017-11-01 De novo assembly of viral quasispecies using overlap graphs Baaijens, Jasmijn A. Aabidine, Amal Zine El Rivals, Eric Schönhuth, Alexander Genome Res Method
Cold Spring Harbor Laboratory Press 2017-05 /pmc/articles/PMC5411778/ /pubmed/28396522 http://dx.doi.org/10.1101/gr.215038.116 Text en © 2017 Baaijens et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title De novo assembly of viral quasispecies using overlap graphs
topic Method