NoDe: a fast error-correction algorithm for pyrosequencing amplicon reads

BACKGROUND: The popularity of new sequencing technologies has led to an explosion of possible applications, including new approaches in biodiversity studies. However, each of these sequencing technologies suffers from sequencing errors originating from different factors. For 16S rRNA metagenomics stu...

Bibliographic Details
Main Authors: Mysara, Mohamed; Leys, Natalie; Raes, Jeroen; Monsieurs, Pieter
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4403973/
https://www.ncbi.nlm.nih.gov/pubmed/25888405
http://dx.doi.org/10.1186/s12859-015-0520-5
_version_ 1782367419065434112
author Mysara, Mohamed
Leys, Natalie
Raes, Jeroen
Monsieurs, Pieter
author_facet Mysara, Mohamed
Leys, Natalie
Raes, Jeroen
Monsieurs, Pieter
author_sort Mysara, Mohamed
collection PubMed
description BACKGROUND: The popularity of new sequencing technologies has led to an explosion of possible applications, including new approaches in biodiversity studies. However, each of these sequencing technologies suffers from sequencing errors originating from different factors. For 16S rRNA metagenomics studies, the 454 pyrosequencing technology is one of the most frequently used platforms, but sequencing errors still lead to important data analysis issues (e.g. in clustering into taxonomic units and biodiversity estimation). Moreover, retaining a higher portion of the sequencing data by preserving as much of the read length as possible while maintaining the error rate within an acceptable range will have important consequences at the level of taxonomic precision. RESULTS: The new error correction algorithm proposed in this work - NoDe (Noise Detector) - is trained to identify those positions in 454 sequencing reads that are likely to have an error, and subsequently clusters those error-prone reads with correct reads, resulting in an error-free representative read. A benchmarking study with other denoising algorithms shows that NoDe can detect up to 75% more errors in a large-scale mock community dataset, and does so at a low computational cost compared to the second-best algorithm considered in this study. The positive effect of NoDe in 16S rRNA studies was confirmed by the beneficial effect on the precision of the clustering of pyrosequencing reads into operational taxonomic units. CONCLUSIONS: NoDe was shown to be a computationally efficient denoising algorithm for pyrosequencing reads, producing the lowest error rates in an extensive benchmarking study with other denoising algorithms. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0520-5) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4403973
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4403973 2015-04-21 NoDe: a fast error-correction algorithm for pyrosequencing amplicon reads Mysara, Mohamed Leys, Natalie Raes, Jeroen Monsieurs, Pieter BMC Bioinformatics Methodology Article BACKGROUND: The popularity of new sequencing technologies has led to an explosion of possible applications, including new approaches in biodiversity studies. However, each of these sequencing technologies suffers from sequencing errors originating from different factors. For 16S rRNA metagenomics studies, the 454 pyrosequencing technology is one of the most frequently used platforms, but sequencing errors still lead to important data analysis issues (e.g. in clustering into taxonomic units and biodiversity estimation). Moreover, retaining a higher portion of the sequencing data by preserving as much of the read length as possible while maintaining the error rate within an acceptable range will have important consequences at the level of taxonomic precision. RESULTS: The new error correction algorithm proposed in this work - NoDe (Noise Detector) - is trained to identify those positions in 454 sequencing reads that are likely to have an error, and subsequently clusters those error-prone reads with correct reads, resulting in an error-free representative read. A benchmarking study with other denoising algorithms shows that NoDe can detect up to 75% more errors in a large-scale mock community dataset, and does so at a low computational cost compared to the second-best algorithm considered in this study. The positive effect of NoDe in 16S rRNA studies was confirmed by the beneficial effect on the precision of the clustering of pyrosequencing reads into operational taxonomic units. CONCLUSIONS: NoDe was shown to be a computationally efficient denoising algorithm for pyrosequencing reads, producing the lowest error rates in an extensive benchmarking study with other denoising algorithms.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0520-5) contains supplementary material, which is available to authorized users. BioMed Central 2015-03-15 /pmc/articles/PMC4403973/ /pubmed/25888405 http://dx.doi.org/10.1186/s12859-015-0520-5 Text en © Mysara et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Mysara, Mohamed
Leys, Natalie
Raes, Jeroen
Monsieurs, Pieter
NoDe: a fast error-correction algorithm for pyrosequencing amplicon reads
title NoDe: a fast error-correction algorithm for pyrosequencing amplicon reads
title_full NoDe: a fast error-correction algorithm for pyrosequencing amplicon reads
title_fullStr NoDe: a fast error-correction algorithm for pyrosequencing amplicon reads
title_full_unstemmed NoDe: a fast error-correction algorithm for pyrosequencing amplicon reads
title_short NoDe: a fast error-correction algorithm for pyrosequencing amplicon reads
title_sort node: a fast error-correction algorithm for pyrosequencing amplicon reads
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4403973/
https://www.ncbi.nlm.nih.gov/pubmed/25888405
http://dx.doi.org/10.1186/s12859-015-0520-5
work_keys_str_mv AT mysaramohamed nodeafasterrorcorrectionalgorithmforpyrosequencingampliconreads
AT leysnatalie nodeafasterrorcorrectionalgorithmforpyrosequencingampliconreads
AT raesjeroen nodeafasterrorcorrectionalgorithmforpyrosequencingampliconreads
AT monsieurspieter nodeafasterrorcorrectionalgorithmforpyrosequencingampliconreads