
Multi-CSAR: a multiple reference-based contig scaffolder using algebraic rearrangements

Bibliographic Details
Main Authors: Chen, Kun-Tze; Shen, Hsin-Ting; Lu, Chin Lung
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6311912/
https://www.ncbi.nlm.nih.gov/pubmed/30598087
http://dx.doi.org/10.1186/s12918-018-0654-y
collection PubMed
description BACKGROUND: One of the important steps in the process of assembling a genome sequence from short reads is scaffolding, in which the contigs in a draft genome are ordered and oriented into scaffolds. Currently, several scaffolding tools based on a single reference genome have been developed. However, a single reference genome alone may not be sufficient for a scaffolder to generate correct scaffolds of a target draft genome, especially when the evolutionary relationship between the target and reference genomes is distant or some rearrangements occur between them. This motivates the need to develop scaffolding tools that can order and orient the contigs of the target genome using multiple reference genomes. RESULTS: In this work, we utilize a heuristic method to develop a new scaffolder called Multi-CSAR that is able to accurately scaffold a target draft genome based on multiple reference genomes, each of which does not need to be complete. Our experimental results on real datasets show that Multi-CSAR outperforms two other multiple reference-based scaffolding tools, Ragout and MeDuSa, in terms of several average metrics, such as sensitivity, precision, F-score, genome coverage, NGA50, scaffold number and running time. CONCLUSIONS: Multi-CSAR is a multiple reference-based scaffolder that can efficiently produce more accurate scaffolds of a target draft genome by referring to multiple complete and/or incomplete genomes of related organisms. Its stand-alone program is available for download at https://github.com/ablab-nthu/Multi-CSAR.
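The description lists F-score alongside sensitivity and precision. A minimal sketch, assuming the conventional definition of F-score as the harmonic mean of sensitivity and precision (this record does not state the paper's exact formulas or how sensitivity and precision are counted for scaffolds). The function name f_score and the example values are hypothetical and are not taken from Multi-CSAR or its reported results.

```python
def f_score(sensitivity: float, precision: float) -> float:
    """Harmonic mean of sensitivity and precision (assumed conventional F1 definition).

    Returns 0.0 when both inputs are 0, to avoid division by zero.
    """
    if sensitivity + precision == 0:
        return 0.0
    return 2 * sensitivity * precision / (sensitivity + precision)


if __name__ == "__main__":
    # Hypothetical values for illustration only; not results reported in the paper.
    print(f_score(0.8, 0.9))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a scaffolder cannot obtain a high F-score by maximizing only sensitivity or only precision.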
id pubmed-6311912
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source BMC Syst Biol (Research), BioMed Central, 2018-12-31 (record pubmed-6311912, 2019-01-07)
rights © The Author(s) 2018. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research