NoDe: a fast error-correction algorithm for pyrosequencing amplicon reads
BACKGROUND: The popularity of new sequencing technologies has led to an explosion of possible applications, including new approaches in biodiversity studies. However, each of these sequencing technologies suffers from sequencing errors originating from different factors. For 16S rRNA metagenomics stu...
Main authors: | Mysara, Mohamed; Leys, Natalie; Raes, Jeroen; Monsieurs, Pieter |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4403973/ https://www.ncbi.nlm.nih.gov/pubmed/25888405 http://dx.doi.org/10.1186/s12859-015-0520-5 |
_version_ | 1782367419065434112 |
---|---|
author | Mysara, Mohamed; Leys, Natalie; Raes, Jeroen; Monsieurs, Pieter |
author_sort | Mysara, Mohamed |
collection | PubMed |
description | BACKGROUND: The popularity of new sequencing technologies has led to an explosion of possible applications, including new approaches in biodiversity studies. However, each of these sequencing technologies suffers from sequencing errors originating from different factors. For 16S rRNA metagenomics studies, 454 pyrosequencing is one of the most frequently used platforms, but sequencing errors still cause important data analysis issues (e.g., in clustering into taxonomic units and in biodiversity estimation). Moreover, retaining a higher portion of the sequencing data by preserving as much of the read length as possible, while keeping the error rate within an acceptable range, will have important consequences for taxonomic precision. RESULTS: The new error-correction algorithm proposed in this work, NoDe (Noise Detector), is trained to identify those positions in 454 sequencing reads that are likely to contain an error, and subsequently clusters those error-prone reads with correct reads, resulting in an error-free representative read. A benchmarking study with other denoising algorithms shows that NoDe can detect up to 75% more errors in a large-scale mock community dataset, at a low computational cost compared with the second-best algorithm considered in this study. The positive effect of NoDe in 16S rRNA studies was confirmed by its beneficial effect on the precision of clustering pyrosequencing reads into operational taxonomic units. CONCLUSIONS: NoDe was shown to be a computationally efficient denoising algorithm for pyrosequencing reads, producing the lowest error rates in an extensive benchmarking study with other denoising algorithms. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0520-5) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4403973 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4403973 2015-04-21 NoDe: a fast error-correction algorithm for pyrosequencing amplicon reads Mysara, Mohamed; Leys, Natalie; Raes, Jeroen; Monsieurs, Pieter BMC Bioinformatics Methodology Article BioMed Central 2015-03-15 /pmc/articles/PMC4403973/ /pubmed/25888405 http://dx.doi.org/10.1186/s12859-015-0520-5 Text en © Mysara et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | NoDe: a fast error-correction algorithm for pyrosequencing amplicon reads |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4403973/ https://www.ncbi.nlm.nih.gov/pubmed/25888405 http://dx.doi.org/10.1186/s12859-015-0520-5 |