A binning tool to reconstruct viral haplotypes from assembled contigs

Bibliographic Details
Main authors: Chen, Jiao; Shang, Jiayu; Wang, Jianrong; Sun, Yanni
Format: Online Article, Text
Language: English
Published: BioMed Central 2019
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6829986/
https://www.ncbi.nlm.nih.gov/pubmed/31684876
http://dx.doi.org/10.1186/s12859-019-3138-1

Abstract

BACKGROUND: Infections by RNA viruses such as influenza and HIV still pose a serious threat to human health despite extensive research on viral diseases. One challenge for producing effective prevention and treatment strategies is high intra-species genetic diversity. As different strains may have different biological properties, characterizing this genetic diversity is important for vaccine and drug design. Next-generation sequencing technology enables comprehensive characterization of both known and novel strains and has been widely adopted for sequencing viral populations. However, genome-scale reconstruction of haplotypes is still a challenging problem. In particular, haplotype assembly programs often produce contigs rather than full genomes. As a mutation in one gene can mask the phenotypic effects of a mutation at another locus, clustering these contigs into genome-scale haplotypes is still needed.

RESULTS: We developed a contig binning tool, VirBin, which clusters contigs into different groups so that each group represents a haplotype. Commonly used features based on sequence composition and contig coverage cannot effectively distinguish viral haplotypes because of their high sequence similarity and the heterogeneous sequencing coverage of RNA viruses. VirBin applies prototype-based clustering to regions that are more likely to contain mutations specific to a haplotype. The tool was tested on multiple simulated sequencing datasets with different haplotype abundance distributions and contig sizes, and also on mock quasispecies sequencing data. In benchmarks against other contig binning tools, VirBin showed superior sensitivity and precision in contig binning for viral haplotype reconstruction.

CONCLUSIONS: In this work, we presented VirBin, a new contig binning tool for distinguishing contigs from different viral haplotypes with high sequence similarity. It competes favorably with other tools on viral contig binning. The source code is available at: https://github.com/chjiao/VirBin.
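
The claim that sequence-composition features cannot separate closely related haplotypes can be illustrated with a toy computation. The sketch below is not part of VirBin and does not reproduce its method; it assumes tetranucleotide (4-mer) frequency vectors as the composition feature, L1 distance as the comparison, and synthetic random sequences in place of real viral genomes. The helpers `kmer_profile`, `l1_distance`, and `mutate` are hypothetical names introduced only for this illustration.

```python
import random
from collections import Counter
from itertools import product

K = 4  # tetranucleotide frequencies, a common composition feature (assumed choice)
BASES = "ACGT"
ALL_KMERS = ["".join(p) for p in product(BASES, repeat=K)]


def kmer_profile(seq):
    """Normalized K-mer frequency vector of seq over all 4**K possible K-mers."""
    total = len(seq) - K + 1
    counts = Counter(seq[i:i + K] for i in range(total))
    return [counts[kmer] / total for kmer in ALL_KMERS]


def l1_distance(p, q):
    """Sum of absolute differences between two frequency vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def mutate(seq, n_snps, rng):
    """Return a copy of seq with n_snps substitutions at distinct random positions."""
    bases = list(seq)
    for pos in rng.sample(range(len(seq)), n_snps):
        # Pick a base different from the current one so every SNP is a real change.
        bases[pos] = rng.choice([b for b in BASES if b != bases[pos]])
    return "".join(bases)


rng = random.Random(42)
genome_len = 3000
hap_a = "".join(rng.choice(BASES) for _ in range(genome_len))
hap_b = mutate(hap_a, 30, rng)  # about 1% divergence: a closely related haplotype
unrelated = "".join(rng.choice(BASES) for _ in range(genome_len))

d_haplotypes = l1_distance(kmer_profile(hap_a), kmer_profile(hap_b))
d_unrelated = l1_distance(kmer_profile(hap_a), kmer_profile(unrelated))
print(f"L1 distance, haplotype A vs haplotype B: {d_haplotypes:.3f}")
print(f"L1 distance, haplotype A vs random seq:  {d_unrelated:.3f}")
```

In this setup each substitution alters at most four overlapping 4-mers, so the distance between the two haplotypes can never exceed 2 × 4 × 30 / 2997 ≈ 0.08, while two independent random sequences of the same length are expected to differ considerably more. Any binning signal that separates such haplotypes therefore has to come from the few variant positions themselves, which is consistent with the abstract's motivation for focusing on regions likely to contain haplotype-specific mutations.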

Published in BMC Bioinformatics (Methodology Article), BioMed Central, 2019-11-04.

© The Author(s) 2019. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.