Scaffolder - software for manual genome scaffolding

BACKGROUND: The assembly of next-generation short-read sequencing data can result in a fragmented non-contiguous set of genomic sequences. Therefore a common step in a genome project is to join neighbouring sequence regions together and fill gaps. This scaffolding step is non-trivial and requires ma...

Bibliographic Details
Main Authors: Barton, Michael D; Barton, Hazel A
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3464138/
https://www.ncbi.nlm.nih.gov/pubmed/22640820
http://dx.doi.org/10.1186/1751-0473-7-4
author Barton, Michael D
Barton, Hazel A
collection PubMed
description BACKGROUND: The assembly of next-generation short-read sequencing data can result in a fragmented non-contiguous set of genomic sequences. Therefore a common step in a genome project is to join neighbouring sequence regions together and fill gaps. This scaffolding step is non-trivial and requires manually editing large blocks of nucleotide sequence. Joining these sequences together also hides the source of each region in the final genome sequence. Taken together these considerations may make reproducing or editing an existing genome scaffold difficult. METHODS: The software outlined here, “Scaffolder,” is implemented in the Ruby programming language and can be installed via the RubyGems software management system. Genome scaffolds are defined using YAML - a data format which is both human and machine-readable. Command line binaries and extensive documentation are available. RESULTS: This software allows a genome build to be defined in terms of the constituent sequences using a relatively simple syntax. This syntax further allows unknown regions to be specified and additional sequence to be used to fill known gaps in the scaffold. Defining the genome construction in a file makes the scaffolding process reproducible and easier to edit compared with large FASTA nucleotide sequences. CONCLUSIONS: Scaffolder is easy-to-use genome scaffolding software which promotes reproducibility and continuous development in a genome project. Scaffolder can be found at http://next.gs.
format Online
Article
Text
id pubmed-3464138
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3464138 2012-10-05 Scaffolder - software for manual genome scaffolding Barton, Michael D Barton, Hazel A Source Code Biol Med Software Review BioMed Central 2012-05-28 /pmc/articles/PMC3464138/ /pubmed/22640820 http://dx.doi.org/10.1186/1751-0473-7-4 Text en Copyright ©2012 Barton and Barton; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Scaffolder - software for manual genome scaffolding
topic Software Review