A binning tool to reconstruct viral haplotypes from assembled contigs
BACKGROUND: Infections by RNA viruses such as influenza and HIV still pose a serious threat to human health despite extensive research on viral diseases. One challenge in producing effective prevention and treatment strategies is the high intra-species genetic diversity. As different strains may have diff...
Main authors: | Chen, Jiao; Shang, Jiayu; Wang, Jianrong; Sun, Yanni |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6829986/ https://www.ncbi.nlm.nih.gov/pubmed/31684876 http://dx.doi.org/10.1186/s12859-019-3138-1 |
_version_ | 1783465686523183104 |
---|---|
author | Chen, Jiao Shang, Jiayu Wang, Jianrong Sun, Yanni |
author_facet | Chen, Jiao Shang, Jiayu Wang, Jianrong Sun, Yanni |
author_sort | Chen, Jiao |
collection | PubMed |
description | BACKGROUND: Infections by RNA viruses such as influenza and HIV still pose a serious threat to human health despite extensive research on viral diseases. One challenge in producing effective prevention and treatment strategies is the high intra-species genetic diversity. As different strains may have different biological properties, characterizing this genetic diversity is important for vaccine and drug design. Next-generation sequencing technology enables comprehensive characterization of both known and novel strains and has been widely adopted for sequencing viral populations. However, genome-scale reconstruction of haplotypes is still a challenging problem. In particular, haplotype assembly programs often produce contigs rather than full genomes. As a mutation in one gene can mask the phenotypic effects of a mutation at another locus, clustering these contigs into genome-scale haplotypes is still needed. RESULTS: We developed a contig binning tool, VirBin, which clusters contigs into different groups so that each group represents a haplotype. Commonly used features based on sequence composition and contig coverage cannot effectively distinguish viral haplotypes, because the haplotypes are highly similar in sequence and the sequencing coverage of RNA viruses is heterogeneous. VirBin applies prototype-based clustering to regions that are more likely to contain mutations specific to a haplotype. The tool was tested on multiple simulated sequencing data sets with different haplotype abundance distributions and contig sizes, and also on mock quasispecies sequencing data. Benchmarks against other contig binning tools showed that VirBin achieves higher sensitivity and precision in contig binning for viral haplotype reconstruction. CONCLUSIONS: In this work, we presented VirBin, a new contig binning tool for distinguishing contigs from different viral haplotypes with high sequence similarity. It competes favorably with other tools on viral contig binning. The source code is available at: https://github.com/chjiao/VirBin. |
format | Online Article Text |
id | pubmed-6829986 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-68299862019-11-08 A binning tool to reconstruct viral haplotypes from assembled contigs Chen, Jiao Shang, Jiayu Wang, Jianrong Sun, Yanni BMC Bioinformatics Methodology Article BACKGROUND: Infections by RNA viruses such as Influenza, HIV still pose a serious threat to human health despite extensive research on viral diseases. One challenge for producing effective prevention and treatment strategies is high intra-species genetic diversity. As different strains may have different biological properties, characterizing the genetic diversity is thus important to vaccine and drug design. Next-generation sequencing technology enables comprehensive characterization of both known and novel strains and has been widely adopted for sequencing viral populations. However, genome-scale reconstruction of haplotypes is still a challenging problem. In particular, haplotype assembly programs often produce contigs rather than full genomes. As a mutation in one gene can mask the phenotypic effects of a mutation at another locus, clustering these contigs into genome-scale haplotypes is still needed. RESULTS: We developed a contig binning tool, VirBin, which clusters contigs into different groups so that each group represents a haplotype. Commonly used features based on sequence composition and contig coverage cannot effectively distinguish viral haplotypes because of their high sequence similarity and heterogeneous sequencing coverage for RNA viruses. VirBin applied prototype-based clustering to cluster regions that are more likely to contain mutations specific to a haplotype. The tool was tested on multiple simulated sequencing data with different haplotype abundance distributions and contig sizes, and also on mock quasispecies sequencing data. The benchmark results with other contig binning tools demonstrated the superior sensitivity and precision of VirBin in contig binning for viral haplotype reconstruction. 
CONCLUSIONS: In this work, we presented VirBin, a new contig binning tool for distinguishing contigs from different viral haplotypes with high sequence similarity. It competes favorably with other tools on viral contig binning. The source codes are available at: https://github.com/chjiao/VirBin. BioMed Central 2019-11-04 /pmc/articles/PMC6829986/ /pubmed/31684876 http://dx.doi.org/10.1186/s12859-019-3138-1 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Chen, Jiao Shang, Jiayu Wang, Jianrong Sun, Yanni A binning tool to reconstruct viral haplotypes from assembled contigs |
title | A binning tool to reconstruct viral haplotypes from assembled contigs |
title_full | A binning tool to reconstruct viral haplotypes from assembled contigs |
title_fullStr | A binning tool to reconstruct viral haplotypes from assembled contigs |
title_full_unstemmed | A binning tool to reconstruct viral haplotypes from assembled contigs |
title_short | A binning tool to reconstruct viral haplotypes from assembled contigs |
title_sort | binning tool to reconstruct viral haplotypes from assembled contigs |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6829986/ https://www.ncbi.nlm.nih.gov/pubmed/31684876 http://dx.doi.org/10.1186/s12859-019-3138-1 |
work_keys_str_mv | AT chenjiao abinningtooltoreconstructviralhaplotypesfromassembledcontigs AT shangjiayu abinningtooltoreconstructviralhaplotypesfromassembledcontigs AT wangjianrong abinningtooltoreconstructviralhaplotypesfromassembledcontigs AT sunyanni abinningtooltoreconstructviralhaplotypesfromassembledcontigs AT chenjiao binningtooltoreconstructviralhaplotypesfromassembledcontigs AT shangjiayu binningtooltoreconstructviralhaplotypesfromassembledcontigs AT wangjianrong binningtooltoreconstructviralhaplotypesfromassembledcontigs AT sunyanni binningtooltoreconstructviralhaplotypesfromassembledcontigs |