FlowClus: efficiently filtering and denoising pyrosequenced amplicons

BACKGROUND: Reducing the effects of sequencing errors and PCR artifacts has emerged as an essential component in amplicon-based metagenomic studies. Denoising algorithms have been designed that can reduce error rates in mock community data, but they change the sequence data in a manner that can be i...

Bibliographic Details
Main authors: Gaspar, John M, Thomas, W Kelley
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4380255/
https://www.ncbi.nlm.nih.gov/pubmed/25885646
http://dx.doi.org/10.1186/s12859-015-0532-1
author Gaspar, John M
Thomas, W Kelley
collection PubMed
description BACKGROUND: Reducing the effects of sequencing errors and PCR artifacts has emerged as an essential component in amplicon-based metagenomic studies. Denoising algorithms have been designed that can reduce error rates in mock community data, but they change the sequence data in a manner that can be inconsistent with the process of removing errors in studies of real communities. In addition, they are limited by the size of the dataset and the sequencing technology used. RESULTS: FlowClus uses a systematic approach to filter and denoise reads efficiently. When denoising real datasets, FlowClus provides feedback about the process that can be used as the basis to adjust the parameters of the algorithm to suit the particular dataset. When used to analyze a mock community dataset, FlowClus produced a lower error rate compared to other denoising algorithms, while retaining significantly more sequence information. Among its other attributes, FlowClus can analyze longer reads being generated from all stages of 454 sequencing technology, as well as from Ion Torrent. It has processed a large dataset of 2.2 million GS-FLX Titanium reads in twelve hours; using its more efficient (but less precise) trie analysis option, this time was further reduced, to seven minutes. CONCLUSIONS: Many of the amplicon-based metagenomics datasets generated over the last several years have been processed through a denoising pipeline that likely caused deleterious effects on the raw data. By using FlowClus, one can avoid such negative outcomes while maintaining control over the filtering and denoising processes. Because of its efficiency, FlowClus can be used to re-analyze multiple large datasets together, thereby leading to more standardized conclusions. FlowClus is freely available on GitHub (jsh58/FlowClus); it is written in C and supported on Linux. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0532-1) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4380255
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4380255 2015-04-01
FlowClus: efficiently filtering and denoising pyrosequenced amplicons
Gaspar, John M
Thomas, W Kelley
BMC Bioinformatics
Methodology Article
BioMed Central 2015-03-27
/pmc/articles/PMC4380255/
/pubmed/25885646
http://dx.doi.org/10.1186/s12859-015-0532-1
Text en
© Gaspar and Thomas; licensee BioMed Central. 2015
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title FlowClus: efficiently filtering and denoising pyrosequenced amplicons
topic Methodology Article