LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads

BACKGROUND: Owing to the complexity of the assembly problem, we do not yet have complete genome sequences. The difficulty in assembling reads into finished genomes is exacerbated by sequence repeats and the inability of short reads to capture sufficient genomic information to resolve those problemat...

Bibliographic Details
Main Authors:	Warren, René L., Yang, Chen, Vandervalk, Benjamin P., Behsaz, Bahar, Lagman, Albert, Jones, Steven J. M., Birol, Inanç
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4524009/ https://www.ncbi.nlm.nih.gov/pubmed/26244089 http://dx.doi.org/10.1186/s13742-015-0076-3

_version_	1782384152567349248
author	Warren, René L.; Yang, Chen; Vandervalk, Benjamin P.; Behsaz, Bahar; Lagman, Albert; Jones, Steven J. M.; Birol, Inanç
author_facet	Warren, René L.; Yang, Chen; Vandervalk, Benjamin P.; Behsaz, Bahar; Lagman, Albert; Jones, Steven J. M.; Birol, Inanç
author_sort	Warren, René L.
collection	PubMed
description	BACKGROUND: Owing to the complexity of the assembly problem, we do not yet have complete genome sequences. The difficulty in assembling reads into finished genomes is exacerbated by sequence repeats and the inability of short reads to capture sufficient genomic information to resolve those problematic regions. In this regard, established and emerging long read technologies show great promise, but their current associated higher error rates typically require computational base correction and/or additional bioinformatics pre-processing before they can be of value.
RESULTS: We present LINKS, the Long Interval Nucleotide K-mer Scaffolder algorithm, a method that makes use of the sequence properties of nanopore sequence data and other error-containing sequence data, to scaffold high-quality genome assemblies, without the need for read alignment or base correction. Here, we show how the contiguity of an ABySS Escherichia coli K-12 genome assembly can be increased greater than five-fold by the use of beta-released Oxford Nanopore Technologies Ltd. long reads and how LINKS leverages long-range information in Saccharomyces cerevisiae W303 nanopore reads to yield assemblies whose resulting contiguity and correctness are on par with or better than that of competing applications. We also present the re-scaffolding of the colossal white spruce (Picea glauca) draft assembly (PG29, 20 Gbp) and demonstrate how LINKS scales to larger genomes.
CONCLUSIONS: This study highlights the present utility of nanopore reads for genome scaffolding in spite of their current limitations, which are expected to diminish as the nanopore sequencing technology advances. We expect LINKS to have broad utility in harnessing the potential of long reads in connecting high-quality sequences of small and large genome assembly drafts.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13742-015-0076-3) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4524009
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-45240092015-08-05 LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads Warren, René L. Yang, Chen Vandervalk, Benjamin P. Behsaz, Bahar Lagman, Albert Jones, Steven J. M. Birol, Inanç Gigascience Research BACKGROUND: Owing to the complexity of the assembly problem, we do not yet have complete genome sequences. The difficulty in assembling reads into finished genomes is exacerbated by sequence repeats and the inability of short reads to capture sufficient genomic information to resolve those problematic regions. In this regard, established and emerging long read technologies show great promise, but their current associated higher error rates typically require computational base correction and/or additional bioinformatics pre-processing before they can be of value. RESULTS: We present LINKS, the Long Interval Nucleotide K-mer Scaffolder algorithm, a method that makes use of the sequence properties of nanopore sequence data and other error-containing sequence data, to scaffold high-quality genome assemblies, without the need for read alignment or base correction. Here, we show how the contiguity of an ABySS Escherichia coli K-12 genome assembly can be increased greater than five-fold by the use of beta-released Oxford Nanopore Technologies Ltd. long reads and how LINKS leverages long-range information in Saccharomyces cerevisiae W303 nanopore reads to yield assemblies whose resulting contiguity and correctness are on par with or better than that of competing applications. We also present the re-scaffolding of the colossal white spruce (Picea glauca) draft assembly (PG29, 20 Gbp) and demonstrate how LINKS scales to larger genomes. CONCLUSIONS: This study highlights the present utility of nanopore reads for genome scaffolding in spite of their current limitations, which are expected to diminish as the nanopore sequencing technology advances. 
We expect LINKS to have broad utility in harnessing the potential of long reads in connecting high-quality sequences of small and large genome assembly drafts. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13742-015-0076-3) contains supplementary material, which is available to authorized users. BioMed Central 2015-08-04 /pmc/articles/PMC4524009/ /pubmed/26244089 http://dx.doi.org/10.1186/s13742-015-0076-3 Text en © Warren et al. 2015 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Warren, René L. Yang, Chen Vandervalk, Benjamin P. Behsaz, Bahar Lagman, Albert Jones, Steven J. M. Birol, Inanç LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads
title	LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads
title_sort	links: scalable, alignment-free scaffolding of draft genomes with long reads
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4524009/ https://www.ncbi.nlm.nih.gov/pubmed/26244089 http://dx.doi.org/10.1186/s13742-015-0076-3
work_keys_str_mv	AT warrenrenel linksscalablealignmentfreescaffoldingofdraftgenomeswithlongreads AT yangchen linksscalablealignmentfreescaffoldingofdraftgenomeswithlongreads AT vandervalkbenjaminp linksscalablealignmentfreescaffoldingofdraftgenomeswithlongreads AT behsazbahar linksscalablealignmentfreescaffoldingofdraftgenomeswithlongreads AT lagmanalbert linksscalablealignmentfreescaffoldingofdraftgenomeswithlongreads AT jonesstevenjm linksscalablealignmentfreescaffoldingofdraftgenomeswithlongreads AT birolinanc linksscalablealignmentfreescaffoldingofdraftgenomeswithlongreads
