Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches
Main authors: | Cherukuri, Yesesri; Janga, Sarath Chandra |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001211/ https://www.ncbi.nlm.nih.gov/pubmed/27556636 http://dx.doi.org/10.1186/s12864-016-2895-8 |
_version_ | 1782450431286312960 |
---|---|
author | Cherukuri, Yesesri Janga, Sarath Chandra |
collection | PubMed |
description | BACKGROUND: Improved DNA sequencing methods have transformed the field of genomics over the last decade. This has been made possible by the development of inexpensive short-read sequencing technologies, which have now resulted in three generations of sequencing platforms. More recently, a new fourth generation of Nanopore-based single-molecule sequencing technology was developed based on the MinION(®) sequencer, which is portable, inexpensive and fast. It is capable of generating reads longer than 100 kb. Though it has many specific advantages, the two major limitations of MinION reads are their high error rates and the need to develop downstream pipelines. Algorithms for error correction have already emerged, while the development of pipelines is still at a nascent stage. RESULTS: In this study, we benchmarked available assembler algorithms to find an appropriate framework that can efficiently assemble Nanopore-sequenced reads. To address this, we employed genome-scale Nanopore-sequenced datasets available for the E. coli and yeast genomes, respectively. In order to comprehensively evaluate multiple algorithmic frameworks, we included assemblers based on de Bruijn graph (Velvet and ABySS), Overlap Layout Consensus (OLC) (Celera) and greedy extension (SSAKE) approaches. We analyzed the quality and accuracy of the assemblies, as well as the computational performance of each assembler included in our benchmark. Our analysis revealed that the OLC-based algorithm, Celera, could generate a high-quality assembly with ten times higher N50 and mean contig length values, as well as one-fifth the total number of contigs, compared to the other tools. Celera was also found to exhibit an average genome coverage of 12 % on the E. coli dataset and 70 % on the yeast dataset, as well as relatively short run times. In contrast, the de Bruijn graph-based assemblers Velvet and ABySS generated assemblies of moderate quality in less time when memory allocation was not limited, while the greedy extension-based algorithm SSAKE generated an assembly of very poor quality but with a genome coverage of 90 % on the yeast dataset. CONCLUSION: OLC can be considered a favorable algorithmic framework for the development of assembler tools for Nanopore-based data, followed by de Bruijn-based algorithms, as they require run times shorter than or similar to those of OLC-based algorithms to generate an assembly, irrespective of the memory allocated for the task. However, a few improvements must be made to the existing de Bruijn implementations in order to generate an assembly of reasonable quality. Our findings should help stimulate the development of novel assemblers for handling Nanopore sequence data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2895-8) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5001211 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5001211 2016-09-06 Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches Cherukuri, Yesesri Janga, Sarath Chandra BMC Genomics Research BioMed Central 2016-08-22 /pmc/articles/PMC5001211/ /pubmed/27556636 http://dx.doi.org/10.1186/s12864-016-2895-8 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001211/ https://www.ncbi.nlm.nih.gov/pubmed/27556636 http://dx.doi.org/10.1186/s12864-016-2895-8 |
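The abstract compares assemblers by N50, mean contig values, and total number of contigs. The sketch below shows how these summary statistics are conventionally computed from a list of contig lengths. It assumes the standard N50 definition: the length L such that contigs of length L or longer together contain at least half of all assembled bases. It also reads "mean contig" as mean contig length. The function name `assembly_stats` and the toy lengths are hypothetical, and this is not the evaluation code used in the study.

```python
def assembly_stats(contig_lengths):
    """Return (contig count, mean contig length, N50) for a list of contig lengths.

    `assembly_stats` is a hypothetical helper for illustration only.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    mean_length = total / len(contig_lengths)
    n50 = 0
    running = 0
    # Walk contigs from longest to shortest; N50 is the length of the contig
    # at which the running total first reaches half of all assembled bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return len(contig_lengths), mean_length, n50


if __name__ == "__main__":
    # Hypothetical toy assembly; these lengths are not data from the study.
    count, mean_length, n50 = assembly_stats([100, 80, 60, 40, 20])
    print(count, mean_length, n50)  # prints: 5 60.0 80
```

For the toy lengths 100, 80, 60, 40 and 20 (300 bases in total), the two longest contigs already hold 180 bases, which is at least half of the total, so N50 is 80. Under this definition, an assembly with fewer, longer contigs raises N50 and mean contig length while lowering the contig count.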