Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches

BACKGROUND: Improved DNA sequencing methods have transformed the field of genomics over the last decade. This has become possible due to the development of inexpensive short read sequencing technologies which have now resulted in three generations of sequencing platforms. More recently, a new fourth...

Bibliographic Details
Main Authors:	Cherukuri, Yesesri; Janga, Sarath Chandra
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001211/ https://www.ncbi.nlm.nih.gov/pubmed/27556636 http://dx.doi.org/10.1186/s12864-016-2895-8

_version_	1782450431286312960
author	Cherukuri, Yesesri; Janga, Sarath Chandra
author_facet	Cherukuri, Yesesri; Janga, Sarath Chandra
author_sort	Cherukuri, Yesesri
collection	PubMed
description	BACKGROUND: Improved DNA sequencing methods have transformed the field of genomics over the last decade. This has become possible due to the development of inexpensive short-read sequencing technologies, which have now resulted in three generations of sequencing platforms. More recently, a new fourth generation of Nanopore-based single-molecule sequencing technology was developed, based on the MinION® sequencer, which is portable, inexpensive and fast. It is capable of generating reads of length greater than 100 kb. Though it has many specific advantages, the two major limitations of the MinION reads are high error rates and the need for the development of downstream pipelines. The algorithms for error correction have already emerged, while development of pipelines is still at a nascent stage. RESULTS: In this study, we benchmarked available assembler algorithms to find an appropriate framework that can efficiently assemble Nanopore sequenced reads. To address this, we employed genome-scale Nanopore sequenced datasets available for the E. coli and yeast genomes. In order to comprehensively evaluate multiple algorithmic frameworks, we included assemblers based on de Bruijn graphs (Velvet and ABySS), Overlap Layout Consensus (OLC) (Celera) and greedy extension (SSAKE) approaches. We analyzed the quality and accuracy of the assemblies as well as the computational performance of each of the assemblers included in our benchmark. Our analysis revealed that the OLC-based algorithm, Celera, could generate a high-quality assembly with ten times higher N50 and mean contig values as well as one-fifth the total number of contigs compared to other tools. Celera was also found to exhibit an average genome coverage of 12 % in the E. coli dataset and 70 % in the yeast dataset, as well as relatively shorter run times. In contrast, the de Bruijn graph-based assemblers Velvet and ABySS generated assemblies of moderate quality, in less time when there was no limitation on the memory allocation, while the greedy extension-based algorithm SSAKE generated an assembly of very poor quality but with genome coverage of 90 % on the yeast dataset. CONCLUSION: OLC can be considered a favorable algorithmic framework for the development of assembler tools for Nanopore-based data, followed by de Bruijn-based algorithms, as they require run times shorter than or similar to those of OLC-based algorithms to generate an assembly, irrespective of the memory allocated for the task. However, a few improvements must be made to the existing de Bruijn implementations in order to generate an assembly of reasonable quality. Our findings should help stimulate the development of novel assemblers for handling Nanopore sequence data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2895-8) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5001211
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5001211 2016-09-06 Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches Cherukuri, Yesesri Janga, Sarath Chandra BMC Genomics Research BACKGROUND: Improved DNA sequencing methods have transformed the field of genomics over the last decade. This has become possible due to the development of inexpensive short read sequencing technologies which have now resulted in three generations of sequencing platforms. More recently, a new fourth generation of Nanopore based single molecule sequencing technology, was developed based on MinION(®) sequencer which is portable, inexpensive and fast. It is capable of generating reads of length greater than 100 kb. Though it has many specific advantages, the two major limitations of the MinION reads are high error rates and the need for the development of downstream pipelines. The algorithms for error correction have already emerged, while development of pipelines is still at nascent stage. RESULTS: In this study, we benchmarked available assembler algorithms to find an appropriate framework that can efficiently assemble Nanopore sequenced reads. To address this, we employed genome-scale Nanopore sequenced datasets available for E. coli and yeast genomes respectively. In order to comprehensively evaluate multiple algorithmic frameworks, we included assemblers based on de Bruijn graphs (Velvet and ABySS), Overlap Layout Consensus (OLC) (Celera) and Greedy extension (SSAKE) approaches. We analyzed the quality, accuracy of the assemblies as well as the computational performance of each of the assemblers included in our benchmark. Our analysis unveiled that OLC-based algorithm, Celera, could generate a high quality assembly with ten times higher N50 & mean contig values as well as one-fifth the number of total number of contigs compared to other tools. Celera was also found to exhibit an average genome coverage of 12 % in E. coli dataset and 70 % in Yeast dataset as well as relatively lesser run times. In contrast, de Bruijn graph based assemblers Velvet and ABySS generated the assemblies of moderate quality, in less time when there is no limitation on the memory allocation, while greedy extension based algorithm SSAKE generated an assembly of very poor quality but with genome coverage of 90 % on yeast dataset. CONCLUSION: OLC can be considered as a favorable algorithmic framework for the development of assembler tools for Nanopore-based data, followed by de Bruijn based algorithms as they consume relatively less or similar run times as OLC-based algorithms for generating assembly, irrespective of the memory allocated for the task. However, few improvements must be made to the existing de Bruijn implementations in order to generate an assembly with reasonable quality. Our findings should help in stimulating the development of novel assemblers for handling Nanopore sequence data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2895-8) contains supplementary material, which is available to authorized users. BioMed Central 2016-08-22 /pmc/articles/PMC5001211/ /pubmed/27556636 http://dx.doi.org/10.1186/s12864-016-2895-8 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Cherukuri, Yesesri Janga, Sarath Chandra Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches
title	Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches
title_full	Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches
title_fullStr	Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches
title_full_unstemmed	Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches
title_short	Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches
title_sort	benchmarking of de novo assembly algorithms for nanopore data reveals optimal performance of olc approaches
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001211/ https://www.ncbi.nlm.nih.gov/pubmed/27556636 http://dx.doi.org/10.1186/s12864-016-2895-8
work_keys_str_mv	AT cherukuriyesesri benchmarkingofdenovoassemblyalgorithmsfornanoporedatarevealsoptimalperformanceofolcapproaches AT jangasarathchandra benchmarkingofdenovoassemblyalgorithmsfornanoporedatarevealsoptimalperformanceofolcapproaches
