GAPPadder: a sensitive approach for closing gaps on draft genomes with short sequence reads

BACKGROUND: Closing gaps in draft genomes is an important post-processing step in genome assembly. It leads to more complete genomes, which benefits downstream genome analysis such as annotation and genotyping. Several tools have been developed for gap closing. However, these tools do not fully utili...

Bibliographic Details
Main Authors: Chu, Chong; Li, Xin; Wu, Yufeng
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6551238/
https://www.ncbi.nlm.nih.gov/pubmed/31167639
http://dx.doi.org/10.1186/s12864-019-5703-4
author Chu, Chong
Li, Xin
Wu, Yufeng
collection PubMed
description BACKGROUND: Closing gaps in draft genomes is an important post-processing step in genome assembly. It leads to more complete genomes, which benefits downstream genome analysis such as annotation and genotyping. Several tools have been developed for gap closing. However, these tools do not fully utilize the information contained in the sequence data. For example, while it is known that many gaps are caused by genomic repeats, existing tools often ignore many sequence reads that originate from a repeat-related gap. RESULTS: We compare GAPPadder with GapCloser, GapFiller and Sealer on one bacterial genome, human chromosome 14 and the human whole genome, using paired-end and mate-pair reads with both short and long insert sizes. Empirical results show that GAPPadder can close more gaps than these existing tools. Besides closing gaps on draft genomes assembled only from short sequence reads, GAPPadder can also be used to close gaps for draft genomes assembled with long reads. We show GAPPadder can close gaps on the bed bug genome and the Asian sea bass genome, which are assembled partially and fully with long reads, respectively. We also show GAPPadder is efficient in both time and memory usage. CONCLUSION: In this paper, we propose a new approach called GAPPadder for gap closing. The main advantage of GAPPadder is that it uses more information in sequence data for gap closing. In particular, GAPPadder finds and uses reads that originate from repeat-related gaps. We show that these repeat-associated reads are useful for gap closing, even though they are ignored by all existing tools. Other main features of GAPPadder include utilizing the information in sequence reads with different insert sizes and performing two-stage local assembly of gap sequences. The results show that our method can close more gaps than several existing tools. The software tool, GAPPadder, is available for download at https://github.com/Reedwarbler/GAPPadder.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-5703-4) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6551238
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6551238 2019-06-07 GAPPadder: a sensitive approach for closing gaps on draft genomes with short sequence reads Chu, Chong Li, Xin Wu, Yufeng BMC Genomics Research BioMed Central 2019-06-06 /pmc/articles/PMC6551238/ /pubmed/31167639 http://dx.doi.org/10.1186/s12864-019-5703-4 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title GAPPadder: a sensitive approach for closing gaps on draft genomes with short sequence reads
topic Research