From reads to operational taxonomic units: an ensemble processing pipeline for MiSeq amplicon sequencing data

Bibliographic Details
Main authors:	Mysara, Mohamed; Njima, Mercy; Leys, Natalie; Raes, Jeroen; Monsieurs, Pieter
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2017
Subjects:	Technical Note
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5466709/ https://www.ncbi.nlm.nih.gov/pubmed/28369460 http://dx.doi.org/10.1093/gigascience/giw017

author	Mysara, Mohamed Njima, Mercy Leys, Natalie Raes, Jeroen Monsieurs, Pieter
collection	PubMed
description	The development of high-throughput sequencing technologies has provided microbial ecologists with an efficient approach to assess bacterial diversity at an unprecedented depth, particularly with the recent advances in the Illumina MiSeq sequencing platform. However, analyzing such high-throughput data poses important computational challenges, requiring specialized bioinformatics solutions at different stages of the processing pipeline, such as assembly of paired-end reads, chimera removal, correction of sequencing errors, and clustering of the sequences into Operational Taxonomic Units (OTUs). Individual algorithms addressing each of those challenges have been combined into various bioinformatics pipelines, such as mothur, QIIME, LotuS, and USEARCH. Using a set of well-described bacterial mock communities, state-of-the-art pipelines for Illumina MiSeq amplicon sequencing data are benchmarked in terms of the number of sequences retained, computational cost, error rate, and quality of the OTUs. In addition, a new pipeline called OCToPUS is introduced, which makes an optimal combination of different algorithms. Large variability is observed between the different pipelines with respect to the monitored performance parameters, where in general the number of retained reads is found to be inversely proportional to the quality of the reads. By contrast, OCToPUS achieves the lowest error rate, the minimum number of spurious OTUs, and the closest correspondence to the existing community, while retaining the largest number of reads when compared to other pipelines. The newly introduced pipeline translates Illumina MiSeq amplicon sequencing data into high-quality and reliable OTUs, with improved performance and accuracy compared to the currently existing pipelines.
format	Online Article Text
id	pubmed-5466709
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-5466709 2017-06-19 Gigascience Technical Note Oxford University Press 2017-01-18 /pmc/articles/PMC5466709/ /pubmed/28369460 http://dx.doi.org/10.1093/gigascience/giw017 Text en © The Author 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	From reads to operational taxonomic units: an ensemble processing pipeline for MiSeq amplicon sequencing data
topic	Technical Note
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5466709/ https://www.ncbi.nlm.nih.gov/pubmed/28369460 http://dx.doi.org/10.1093/gigascience/giw017