HGA: de novo genome assembly method for bacterial genomes using high coverage short sequencing reads
BACKGROUND: Current high-throughput sequencing technologies generate large numbers of relatively short and error-prone reads, making the de novo assembly problem challenging. Although high quality assemblies can be obtained by assembling multiple paired-end libraries with both short and long insert...
Main author: | Al-okaily, Anas A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4779561/ https://www.ncbi.nlm.nih.gov/pubmed/26945881 http://dx.doi.org/10.1186/s12864-016-2515-7 |
_version_ | 1782419639454662656 |
---|---|
author | Al-okaily, Anas A. |
author_facet | Al-okaily, Anas A. |
author_sort | Al-okaily, Anas A. |
collection | PubMed |
description | BACKGROUND: Current high-throughput sequencing technologies generate large numbers of relatively short and error-prone reads, making the de novo assembly problem challenging. Although high quality assemblies can be obtained by assembling multiple paired-end libraries with both short and long insert sizes, the latter are costly to generate. Recently, the GAGE-B study showed that a remarkably good assembly quality can be obtained for bacterial genomes by state-of-the-art assemblers run on a single short-insert library with very high coverage. RESULTS: In this paper, we introduce a novel hierarchical genome assembly (HGA) methodology that takes further advantage of such very high coverage by independently assembling disjoint subsets of reads, combining assemblies of the subsets, and finally re-assembling the combined contigs along with the original reads. CONCLUSIONS: We empirically evaluated this methodology for 8 leading assemblers using 7 GAGE-B bacterial datasets consisting of 100 bp Illumina HiSeq and 250 bp Illumina MiSeq reads, with coverage ranging from 100x to ∼200x. The results show that for all evaluated datasets and using most evaluated assemblers (that were used to assemble the disjoint subsets), HGA leads to a significant improvement in the quality of the assembly based on N50 and corrected N50 metrics. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2515-7) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4779561 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-47795612016-03-07 HGA: denovo genome assembly method for bacterial genomes using high coverage short sequencing reads Al-okaily, Anas A. BMC Genomics Research Article BACKGROUND: Current high-throughput sequencing technologies generate large numbers of relatively short and error-prone reads, making the de novo assembly problem challenging. Although high quality assemblies can be obtained by assembling multiple paired-end libraries with both short and long insert sizes, the latter are costly to generate. Recently, GAGE-B study showed that a remarkably good assembly quality can be obtained for bacterial genomes by state-of-the-art assemblers run on a single short-insert library with very high coverage. RESULTS: In this paper, we introduce a novel hierarchical genome assembly (HGA) methodology that takes further advantage of such very high coverage by independently assembling disjoint subsets of reads, combining assemblies of the subsets, and finally re-assembling the combined contigs along with the original reads. CONCLUSIONS: We empirically evaluated this methodology for 8 leading assemblers using 7 GAGE-B bacterial datasets consisting of 100 bp Illumina HiSeq and 250 bp Illumina MiSeq reads, with coverage ranging from 100x– ∼200x. The results show that for all evaluated datasets and using most evaluated assemblers (that were used to assemble the disjoint subsets), HGA leads to a significant improvement in the quality of the assembly based on N50 and corrected N50 metrics. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2515-7) contains supplementary material, which is available to authorized users. BioMed Central 2016-03-05 /pmc/articles/PMC4779561/ /pubmed/26945881 http://dx.doi.org/10.1186/s12864-016-2515-7 Text en © Al-okaily. 
2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Al-okaily, Anas A. HGA: denovo genome assembly method for bacterial genomes using high coverage short sequencing reads |
title | HGA: denovo genome assembly method for bacterial genomes using high coverage short sequencing reads |
title_full | HGA: denovo genome assembly method for bacterial genomes using high coverage short sequencing reads |
title_fullStr | HGA: denovo genome assembly method for bacterial genomes using high coverage short sequencing reads |
title_full_unstemmed | HGA: denovo genome assembly method for bacterial genomes using high coverage short sequencing reads |
title_short | HGA: denovo genome assembly method for bacterial genomes using high coverage short sequencing reads |
title_sort | hga: denovo genome assembly method for bacterial genomes using high coverage short sequencing reads |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4779561/ https://www.ncbi.nlm.nih.gov/pubmed/26945881 http://dx.doi.org/10.1186/s12864-016-2515-7 |
work_keys_str_mv | AT alokailyanasa hgadenovogenomeassemblymethodforbacterialgenomesusinghighcoverageshortsequencingreads |
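
The description field above summarizes HGA as assembling disjoint subsets of reads and evaluating the result by N50. The following is a minimal Python sketch of two of those pieces only: splitting reads into disjoint subsets and computing N50 from contig lengths. The function names `partition_reads` and `n50`, the round-robin split, and the toy inputs are illustrative assumptions and are not taken from the HGA implementation; the assembly and merging steps themselves rely on external assemblers and are not shown. Each list element is treated as one unit, so paired-end reads would need to be passed as pairs to stay together.

```python
from typing import List, Sequence


def partition_reads(reads: Sequence[str], k: int) -> List[List[str]]:
    """Split reads into k disjoint subsets (round-robin; an assumed scheme)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    subsets: List[List[str]] = [[] for _ in range(k)]
    for i, read in enumerate(reads):
        # Each read goes to exactly one subset, so the subsets never overlap.
        subsets[i % k].append(read)
    return subsets


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50: the largest length L such that contigs of
    length >= L cover at least half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs given


if __name__ == "__main__":
    toy_reads = ["ACGT", "CGTA", "GTAC", "TACG", "ACGG"]  # hypothetical reads
    print(partition_reads(toy_reads, 2))
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

A higher N50 for the same genome generally indicates a more contiguous assembly, which is the sense in which the description reports an improvement; the corrected N50 it mentions additionally requires alignment to a reference and is outside this sketch.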