
Implications of Pyrosequencing Error Correction for Biological Data Interpretation

There has been a rapid proliferation of approaches for processing and manipulating second generation DNA sequence data. However, users are often left with uncertainties about how the choice of processing methods may impact biological interpretation of data. In this report, we probe differences in ou...


Bibliographic Details
Main authors: Bakker, Matthew G., Tu, Zheng J., Bradeen, James M., Kinkel, Linda L.
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431371/
https://www.ncbi.nlm.nih.gov/pubmed/22952965
http://dx.doi.org/10.1371/journal.pone.0044357
_version_ 1782242074932805632
author Bakker, Matthew G.
Tu, Zheng J.
Bradeen, James M.
Kinkel, Linda L.
author_facet Bakker, Matthew G.
Tu, Zheng J.
Bradeen, James M.
Kinkel, Linda L.
author_sort Bakker, Matthew G.
collection PubMed
description There has been a rapid proliferation of approaches for processing and manipulating second generation DNA sequence data. However, users are often left with uncertainties about how the choice of processing methods may impact biological interpretation of data. In this report, we probe differences in output between two different processing pipelines: a de-noising approach using the AmpliconNoise algorithm for error correction, and a standard approach using quality filtering and preclustering to reduce error. There was a large overlap in reads culled by each method, although AmpliconNoise removed a greater net number of reads. Most OTUs produced by one method had a clearly corresponding partner in the other. Although each method resulted in OTUs consisting entirely of reads that were culled by the other method, there were many more such OTUs formed in the standard pipeline. Total OTU richness was reduced by AmpliconNoise processing, but per-sample OTU richness, diversity and evenness were increased. Increases in per-sample richness and diversity may be a result of AmpliconNoise processing producing a more even OTU rank-abundance distribution. Because communities were randomly subsampled to equalize sample size across communities, and because rare sequence variants are less likely to be selected during subsampling, fewer OTUs were lost from individual communities when subsampling AmpliconNoise-processed data. In contrast to taxon-based diversity estimates, phylogenetic diversity was reduced even on a per-sample basis by de-noising, and samples switched widely in diversity rankings. This work illustrates the significant impacts of processing pipelines on the biological interpretations that can be made from pyrosequencing surveys. This study provides important cautions for analyses of contemporary data, for requisite data archiving (processed vs. non-processed data), and for drawing comparisons among studies performed using distinct data processing pipelines.
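The subsampling argument in the description above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two communities are hypothetical toy data (not data from the study), the function names are invented here, and the Shannon index and Pielou's evenness are used as stand-ins because the abstract does not say which diversity and evenness measures were applied. The sketch only shows why a more even OTU rank-abundance distribution tends to lose fewer OTUs when communities are randomly subsampled to a common depth.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Draw `depth` reads at random, without replacement, from OTU counts."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    return Counter(rng.sample(pool, depth))


def richness(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts):
    """Shannon diversity index H' (natural log)."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)


def pielou_evenness(counts):
    """Pielou's evenness J' = H' / ln(S); defined as 0 when S <= 1."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


# Hypothetical communities: 50 OTUs and 1000 reads each.
# `skewed` has one dominant OTU and 49 singletons; `even` has 20 reads per OTU.
skewed = {"otu0": 951, **{f"otu{i}": 1 for i in range(1, 50)}}
even = {f"otu{i}": 20 for i in range(50)}

rng = random.Random(0)   # fixed seed so the sketch is repeatable
depth = 200              # assumed common subsampling depth

sub_skewed = rarefy(skewed, depth, rng)
sub_even = rarefy(even, depth, rng)

for name, full, sub in (("skewed", skewed, sub_skewed),
                        ("even", even, sub_even)):
    print(name,
          "richness before:", richness(full),
          "after:", richness(sub),
          "evenness after:", round(pielou_evenness(sub), 3))
```

Both toy communities start with the same total richness, but the skewed one is expected to lose most of its singleton OTUs on subsampling, while the even one is expected to retain nearly all of its OTUs.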
format Online
Article
Text
id pubmed-3431371
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3431371 2012-09-05 Implications of Pyrosequencing Error Correction for Biological Data Interpretation Bakker, Matthew G. Tu, Zheng J. Bradeen, James M. Kinkel, Linda L. PLoS One Research Article Public Library of Science 2012-08-30 /pmc/articles/PMC3431371/ /pubmed/22952965 http://dx.doi.org/10.1371/journal.pone.0044357 Text en © 2012 Bakker et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Bakker, Matthew G.
Tu, Zheng J.
Bradeen, James M.
Kinkel, Linda L.
Implications of Pyrosequencing Error Correction for Biological Data Interpretation
title Implications of Pyrosequencing Error Correction for Biological Data Interpretation
title_full Implications of Pyrosequencing Error Correction for Biological Data Interpretation
title_fullStr Implications of Pyrosequencing Error Correction for Biological Data Interpretation
title_full_unstemmed Implications of Pyrosequencing Error Correction for Biological Data Interpretation
title_short Implications of Pyrosequencing Error Correction for Biological Data Interpretation
title_sort implications of pyrosequencing error correction for biological data interpretation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431371/
https://www.ncbi.nlm.nih.gov/pubmed/22952965
http://dx.doi.org/10.1371/journal.pone.0044357