BESST - Efficient scaffolding of large fragmented assemblies

BACKGROUND: The use of short reads from High Throughput Sequencing (HTS) techniques is now commonplace in de novo assembly. Yet, obtaining contiguous assemblies from short reads is challenging, thus making scaffolding an important step in the assembly pipeline. Different algorithms have been propose...

Bibliographic Details
Main Authors:	Sahlin, Kristoffer; Vezzi, Francesco; Nystedt, Björn; Lundeberg, Joakim; Arvestad, Lars
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4262078/ https://www.ncbi.nlm.nih.gov/pubmed/25128196 http://dx.doi.org/10.1186/1471-2105-15-281

_version_	1782348377099337728
author	Sahlin, Kristoffer; Vezzi, Francesco; Nystedt, Björn; Lundeberg, Joakim; Arvestad, Lars
author_facet	Sahlin, Kristoffer; Vezzi, Francesco; Nystedt, Björn; Lundeberg, Joakim; Arvestad, Lars
author_sort	Sahlin, Kristoffer
collection	PubMed
description	BACKGROUND: The use of short reads from High Throughput Sequencing (HTS) techniques is now commonplace in de novo assembly. Yet, obtaining contiguous assemblies from short reads is challenging, thus making scaffolding an important step in the assembly pipeline. Different algorithms have been proposed but many of them use the number of read pairs supporting a linking of two contigs as an indicator of reliability. This reasoning is intuitive, but fails to account for variation in link count due to contig features. We have also noted that published scaffolders are only evaluated on small datasets using output from only one assembler. Two issues arise from this. Firstly, some of the available tools are not well suited for complex genomes. Secondly, these evaluations provide little support for inferring a software’s general performance. RESULTS: We propose a new algorithm, implemented in a tool called BESST, which can scaffold genomes of all sizes and complexities and was used to scaffold the genome of P. abies (20 Gbp). We performed a comprehensive comparison of BESST against the most popular stand-alone scaffolders on a large variety of datasets. Our results confirm that some of the popular scaffolders are not practical to run on complex datasets. Furthermore, no single stand-alone scaffolder outperforms the others on all datasets. However, BESST fares favorably to the other tested scaffolders on GAGE datasets and, moreover, outperforms the other methods when library insert size distribution is wide. CONCLUSION: We conclude from our results that information sources other than the quantity of links, as is commonly used, can provide useful information about genome structure when scaffolding. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2105-15-281) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4262078
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4262078 2014-12-11 BESST - Efficient scaffolding of large fragmented assemblies. Sahlin, Kristoffer; Vezzi, Francesco; Nystedt, Björn; Lundeberg, Joakim; Arvestad, Lars. BMC Bioinformatics. Methodology Article. BioMed Central 2014-08-15 /pmc/articles/PMC4262078/ /pubmed/25128196 http://dx.doi.org/10.1186/1471-2105-15-281 Text en © Sahlin et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	BESST - Efficient scaffolding of large fragmented assemblies
title_full	BESST - Efficient scaffolding of large fragmented assemblies
title_fullStr	BESST - Efficient scaffolding of large fragmented assemblies
title_full_unstemmed	BESST - Efficient scaffolding of large fragmented assemblies
title_short	BESST - Efficient scaffolding of large fragmented assemblies
title_sort	besst - efficient scaffolding of large fragmented assemblies
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4262078/ https://www.ncbi.nlm.nih.gov/pubmed/25128196 http://dx.doi.org/10.1186/1471-2105-15-281

Similar Items