Multi-CSAR: a multiple reference-based contig scaffolder using algebraic rearrangements
BACKGROUND: One of the important steps in the process of assembling a genome sequence from short reads is scaffolding, in which the contigs in a draft genome are ordered and oriented into scaffolds. Currently, several scaffolding tools based on a single reference genome have been developed. However,...
Main authors: | Chen, Kun-Tze; Shen, Hsin-Ting; Lu, Chin Lung |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6311912/ https://www.ncbi.nlm.nih.gov/pubmed/30598087 http://dx.doi.org/10.1186/s12918-018-0654-y |
_version_ | 1783383699708968960 |
---|---|
author | Chen, Kun-Tze Shen, Hsin-Ting Lu, Chin Lung |
author_sort | Chen, Kun-Tze |
collection | PubMed |
description | BACKGROUND: One of the important steps in assembling a genome sequence from short reads is scaffolding, in which the contigs in a draft genome are ordered and oriented into scaffolds. Several scaffolding tools based on a single reference genome have been developed. However, a single reference genome alone may not be sufficient for a scaffolder to generate correct scaffolds of a target draft genome, especially when the evolutionary relationship between the target and reference genomes is distant or when rearrangements have occurred between them. This motivates the development of scaffolding tools that can order and orient the contigs of the target genome using multiple reference genomes. RESULTS: In this work, we use a heuristic method to develop a new scaffolder called Multi-CSAR that can accurately scaffold a target draft genome based on multiple reference genomes, none of which needs to be complete. Our experimental results on real datasets show that Multi-CSAR outperforms two other multiple reference-based scaffolding tools, Ragout and MeDuSa, on many average metrics, such as sensitivity, precision, F-score, genome coverage, NGA50, scaffold number and running time. CONCLUSIONS: Multi-CSAR is a multiple reference-based scaffolder that can efficiently produce more accurate scaffolds of a target draft genome by referring to multiple complete and/or incomplete genomes of related organisms. Its stand-alone program is available for download at https://github.com/ablab-nthu/Multi-CSAR. |
format | Online Article Text |
id | pubmed-6311912 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6311912 2019-01-07 Multi-CSAR: a multiple reference-based contig scaffolder using algebraic rearrangements Chen, Kun-Tze Shen, Hsin-Ting Lu, Chin Lung BMC Syst Biol Research BioMed Central 2018-12-31 /pmc/articles/PMC6311912/ /pubmed/30598087 http://dx.doi.org/10.1186/s12918-018-0654-y Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Chen, Kun-Tze Shen, Hsin-Ting Lu, Chin Lung Multi-CSAR: a multiple reference-based contig scaffolder using algebraic rearrangements |
title | Multi-CSAR: a multiple reference-based contig scaffolder using algebraic rearrangements |
title_sort | multi-csar: a multiple reference-based contig scaffolder using algebraic rearrangements |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6311912/ https://www.ncbi.nlm.nih.gov/pubmed/30598087 http://dx.doi.org/10.1186/s12918-018-0654-y |