IPED: a highly efficient denoising tool for Illumina MiSeq Paired-end 16S rRNA gene amplicon sequencing data

Bibliographic Details
Main Authors: Mysara, Mohamed; Leys, Natalie; Raes, Jeroen; Monsieurs, Pieter
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4850673/
https://www.ncbi.nlm.nih.gov/pubmed/27130479
http://dx.doi.org/10.1186/s12859-016-1061-2
author Mysara, Mohamed
Leys, Natalie
Raes, Jeroen
Monsieurs, Pieter
collection PubMed
description BACKGROUND: The development of high-throughput sequencing technologies has revolutionized the field of microbial ecology via the sequencing of phylogenetic marker genes (e.g. 16S rRNA gene amplicon sequencing). Denoising, the removal of sequencing errors, is an important step in preprocessing amplicon sequencing data. The increasing popularity of the Illumina MiSeq platform for these applications requires the development of appropriate denoising methods. RESULTS: The newly proposed denoising algorithm IPED includes a machine learning method that predicts potentially erroneous positions in sequencing reads based on a combination of quality metrics. Subsequently, this information is used to group those error-containing reads with correct reads, resulting in error-free consensus reads. This is achieved by masking potentially erroneous positions during this clustering step. Compared to the second-best algorithm available, IPED detects twice as many errors. Reducing the error rate had a positive effect on the clustering of reads into operational taxonomic units, with an almost perfect correspondence between the number of clusters and the theoretical number of species present in the mock communities. CONCLUSION: Our algorithm IPED is a powerful denoising tool for correcting sequencing errors in Illumina MiSeq 16S rRNA gene amplicon sequencing data. Apart from significantly reducing the error rate of the sequencing reads, it also has a beneficial effect on their clustering into operational taxonomic units. IPED is freely available at http://science.sckcen.be/en/Institutes/EHS/MCB/MIC/Bioinformatics/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1061-2) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4850673
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4850673 2016-05-12 IPED: a highly efficient denoising tool for Illumina MiSeq Paired-end 16S rRNA gene amplicon sequencing data Mysara, Mohamed Leys, Natalie Raes, Jeroen Monsieurs, Pieter BMC Bioinformatics Research Article BioMed Central 2016-04-29 /pmc/articles/PMC4850673/ /pubmed/27130479 http://dx.doi.org/10.1186/s12859-016-1061-2 Text en © Mysara et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title IPED: a highly efficient denoising tool for Illumina MiSeq Paired-end 16S rRNA gene amplicon sequencing data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4850673/
https://www.ncbi.nlm.nih.gov/pubmed/27130479
http://dx.doi.org/10.1186/s12859-016-1061-2