gapFinisher: A reliable gap filling pipeline for SSPACE-LongRead scaffolder output

Unknown sequences, or gaps, are present in many published genomes across public databases. Gap filling is an important finishing step in de novo genome assembly, especially in large genomes. The gap filling problem is nontrivial and while there are many computational tools partially solving the prob...

Bibliographic Details
Main authors: Kammonen, Juhana I., Smolander, Olli-Pekka, Paulin, Lars, Pereira, Pedro A. B., Laine, Pia, Koskinen, Patrik, Jernvall, Jukka, Auvinen, Petri
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6733440/
https://www.ncbi.nlm.nih.gov/pubmed/31498807
http://dx.doi.org/10.1371/journal.pone.0216885
author Kammonen, Juhana I.
Smolander, Olli-Pekka
Paulin, Lars
Pereira, Pedro A. B.
Laine, Pia
Koskinen, Patrik
Jernvall, Jukka
Auvinen, Petri
author_sort Kammonen, Juhana I.
collection PubMed
description Unknown sequences, or gaps, are present in many published genomes across public databases. Gap filling is an important finishing step in de novo genome assembly, especially in large genomes. The gap filling problem is nontrivial and while there are many computational tools partially solving the problem, several have shortcomings as to the reliability and correctness of the output, i.e. the gap filled draft genome. SSPACE-LongRead is a scaffolding tool that utilizes long reads from multiple third-generation sequencing platforms in finding links between contigs and combining them. The long reads potentially contain sequence information to fill the gaps created in the scaffolding, but SSPACE-LongRead currently lacks this functionality. We present an automated pipeline called gapFinisher to process SSPACE-LongRead output to fill gaps after the scaffolding. gapFinisher is based on the controlled use of a previously published gap filling tool FGAP and works on all standard Linux/UNIX command lines. We compare the performance of gapFinisher against two other published gap filling tools PBJelly and GMcloser. We conclude that gapFinisher can fill gaps in draft genomes quickly and reliably. In addition, the serial design of gapFinisher makes it scale well from prokaryote genomes to larger genomes with no increase in the computational footprint.
format Online
Article
Text
id pubmed-6733440
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6733440 2019-09-20 PLoS One Research Article
Public Library of Science 2019-09-09 /pmc/articles/PMC6733440/ /pubmed/31498807 http://dx.doi.org/10.1371/journal.pone.0216885 Text en © 2019 Kammonen et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title gapFinisher: A reliable gap filling pipeline for SSPACE-LongRead scaffolder output
topic Research Article