HGA: denovo genome assembly method for bacterial genomes using high coverage short sequencing reads

BACKGROUND: Current high-throughput sequencing technologies generate large numbers of relatively short and error-prone reads, making the de novo assembly problem challenging. Although high quality assemblies can be obtained by assembling multiple paired-end libraries with both short and long insert...

Bibliographic Details
Main Author: Al-okaily, Anas A.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4779561/
https://www.ncbi.nlm.nih.gov/pubmed/26945881
http://dx.doi.org/10.1186/s12864-016-2515-7
_version_ 1782419639454662656
author Al-okaily, Anas A.
author_facet Al-okaily, Anas A.
author_sort Al-okaily, Anas A.
collection PubMed
description BACKGROUND: Current high-throughput sequencing technologies generate large numbers of relatively short and error-prone reads, making the de novo assembly problem challenging. Although high quality assemblies can be obtained by assembling multiple paired-end libraries with both short and long insert sizes, the latter are costly to generate. Recently, the GAGE-B study showed that remarkably good assembly quality can be obtained for bacterial genomes by state-of-the-art assemblers run on a single short-insert library with very high coverage. RESULTS: In this paper, we introduce a novel hierarchical genome assembly (HGA) methodology that takes further advantage of such very high coverage by independently assembling disjoint subsets of reads, combining the assemblies of the subsets, and finally re-assembling the combined contigs along with the original reads. CONCLUSIONS: We empirically evaluated this methodology for 8 leading assemblers using 7 GAGE-B bacterial datasets consisting of 100 bp Illumina HiSeq and 250 bp Illumina MiSeq reads, with coverage ranging from 100x to ∼200x. The results show that, for all evaluated datasets and for most of the evaluated assemblers (used to assemble the disjoint subsets), HGA leads to a significant improvement in assembly quality based on the N50 and corrected N50 metrics. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2515-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4779561
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4779561 2016-03-07 HGA: denovo genome assembly method for bacterial genomes using high coverage short sequencing reads Al-okaily, Anas A. BMC Genomics Research Article BioMed Central 2016-03-05 /pmc/articles/PMC4779561/ /pubmed/26945881 http://dx.doi.org/10.1186/s12864-016-2515-7 Text en © Al-okaily. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title HGA: denovo genome assembly method for bacterial genomes using high coverage short sequencing reads
topic Research Article
work_keys_str_mv AT alokailyanasa hgadenovogenomeassemblymethodforbacterialgenomesusinghighcoverageshortsequencingreads
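
Illustrative note: the abstract in this record mentions two ideas that are easy to show in code, namely splitting the reads into disjoint subsets and scoring an assembly by N50. The Python sketch below is not the authors' implementation. The function names `partition_reads` and `n50` are hypothetical, the random assignment of reads to subsets is an assumption (the abstract does not say how the subsets are chosen), and a real pipeline would have to keep the two mates of a paired-end read in the same subset. N50 is computed using its standard definition: the largest length L such that contigs of length at least L account for at least half of the total assembly length.

```python
import random
from typing import List, Sequence


def partition_reads(reads: Sequence[str], k: int, seed: int = 0) -> List[List[str]]:
    """Split reads into k disjoint subsets of near-equal size.

    Assumption: reads are assigned at random; the abstract does not
    specify the partitioning scheme used by HGA.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    # Striding over the shuffled list yields subsets that do not overlap
    # and that together contain every read exactly once.
    return [shuffled[i::k] for i in range(k)]


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of an assembly given its contig lengths (0 if empty)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0
```

For example, contigs of lengths 2, 3, 4, 5 and 6 total 20 bases; the two longest (6 and 5) already cover 11 of them, so the N50 is 5. The corrected N50 named in the abstract is a related metric computed after accounting for assembly errors and is not reproduced here.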