PyroClean: Denoising Pyrosequences from Protein-Coding Amplicons for the Recovery of Interspecific and Intraspecific Genetic Variation

High-throughput parallel sequencing is a powerful tool for the quantification of microbial diversity through the amplification of nuclear ribosomal gene regions. Recent work has extended this approach to the quantification of diversity within otherwise difficult-to-study metazoan groups. However, nu...

Bibliographic Details
Main authors: Ramirez-Gonzalez, Ricardo, Yu, Douglas W., Bruce, Catharine, Heavens, Darren, Caccamo, Mario, Emerson, Brent C.
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3585932/
https://www.ncbi.nlm.nih.gov/pubmed/23469211
http://dx.doi.org/10.1371/journal.pone.0057615
_version_ 1782261238879748096
author Ramirez-Gonzalez, Ricardo
Yu, Douglas W.
Bruce, Catharine
Heavens, Darren
Caccamo, Mario
Emerson, Brent C.
author_facet Ramirez-Gonzalez, Ricardo
Yu, Douglas W.
Bruce, Catharine
Heavens, Darren
Caccamo, Mario
Emerson, Brent C.
author_sort Ramirez-Gonzalez, Ricardo
collection PubMed
description High-throughput parallel sequencing is a powerful tool for the quantification of microbial diversity through the amplification of nuclear ribosomal gene regions. Recent work has extended this approach to the quantification of diversity within otherwise difficult-to-study metazoan groups. However, nuclear ribosomal genes present both analytical challenges and practical limitations that are a consequence of the mutational properties of nuclear ribosomal genes. Here we exploit useful properties of protein-coding genes for cross-species amplification and denoising of 454 flowgrams. We first use experimental mixtures of species from the class Collembola to amplify and pyrosequence the 5′ region of the COI barcode, and we implement a new algorithm called PyroClean for the denoising of Roche GS FLX pyrosequences. Using parameter values from the analysis of experimental mixtures, we then analyse two communities sampled from field sites on the island of Tenerife. Cross-species amplification success of target mitochondrial sequences in experimental species mixtures is high; however, there is little relationship between template DNA concentrations and pyrosequencing read abundance. Homopolymer error correction and filtering against a consensus reference sequence reduced the volume of unique sequences to approximately 5% of the original unique raw reads. Filtering of remaining non-target sequences attributed to PCR error, sequencing error, or numts further reduced unique sequence volume to 0.8% of the original raw reads. PyroClean reduces or eliminates the need for an additional, time-consuming step to cluster reads into Operational Taxonomic Units, which facilitates the detection of intraspecific DNA sequence variation. PyroCleaned sequence data from field sites in Tenerife demonstrate the utility of our approach for quantifying evolutionary diversity and its spatial structure. 
Comparison of our sequence data to public databases reveals that we are able to successfully recover both interspecific and intraspecific sequence diversity.
format Online
Article
Text
id pubmed-3585932
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-35859322013-03-06 PyroClean: Denoising Pyrosequences from Protein-Coding Amplicons for the Recovery of Interspecific and Intraspecific Genetic Variation Ramirez-Gonzalez, Ricardo Yu, Douglas W. Bruce, Catharine Heavens, Darren Caccamo, Mario Emerson, Brent C. PLoS One Research Article High-throughput parallel sequencing is a powerful tool for the quantification of microbial diversity through the amplification of nuclear ribosomal gene regions. Recent work has extended this approach to the quantification of diversity within otherwise difficult-to-study metazoan groups. However, nuclear ribosomal genes present both analytical challenges and practical limitations that are a consequence of the mutational properties of nuclear ribosomal genes. Here we exploit useful properties of protein-coding genes for cross-species amplification and denoising of 454 flowgrams. We first use experimental mixtures of species from the class Collembola to amplify and pyrosequence the 5′ region of the COI barcode, and we implement a new algorithm called PyroClean for the denoising of Roche GS FLX pyrosequences. Using parameter values from the analysis of experimental mixtures, we then analyse two communities sampled from field sites on the island of Tenerife. Cross-species amplification success of target mitochondrial sequences in experimental species mixtures is high; however, there is little relationship between template DNA concentrations and pyrosequencing read abundance. Homopolymer error correction and filtering against a consensus reference sequence reduced the volume of unique sequences to approximately 5% of the original unique raw reads. Filtering of remaining non-target sequences attributed to PCR error, sequencing error, or numts further reduced unique sequence volume to 0.8% of the original raw reads. 
PyroClean reduces or eliminates the need for an additional, time-consuming step to cluster reads into Operational Taxonomic Units, which facilitates the detection of intraspecific DNA sequence variation. PyroCleaned sequence data from field sites in Tenerife demonstrate the utility of our approach for quantifying evolutionary diversity and its spatial structure. Comparison of our sequence data to public databases reveals that we are able to successfully recover both interspecific and intraspecific sequence diversity. Public Library of Science 2013-03-01 /pmc/articles/PMC3585932/ /pubmed/23469211 http://dx.doi.org/10.1371/journal.pone.0057615 Text en © 2013 Ramirez-Gonzalez et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Ramirez-Gonzalez, Ricardo
Yu, Douglas W.
Bruce, Catharine
Heavens, Darren
Caccamo, Mario
Emerson, Brent C.
PyroClean: Denoising Pyrosequences from Protein-Coding Amplicons for the Recovery of Interspecific and Intraspecific Genetic Variation
title PyroClean: Denoising Pyrosequences from Protein-Coding Amplicons for the Recovery of Interspecific and Intraspecific Genetic Variation
title_full PyroClean: Denoising Pyrosequences from Protein-Coding Amplicons for the Recovery of Interspecific and Intraspecific Genetic Variation
title_fullStr PyroClean: Denoising Pyrosequences from Protein-Coding Amplicons for the Recovery of Interspecific and Intraspecific Genetic Variation
title_full_unstemmed PyroClean: Denoising Pyrosequences from Protein-Coding Amplicons for the Recovery of Interspecific and Intraspecific Genetic Variation
title_short PyroClean: Denoising Pyrosequences from Protein-Coding Amplicons for the Recovery of Interspecific and Intraspecific Genetic Variation
title_sort pyroclean: denoising pyrosequences from protein-coding amplicons for the recovery of interspecific and intraspecific genetic variation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3585932/
https://www.ncbi.nlm.nih.gov/pubmed/23469211
http://dx.doi.org/10.1371/journal.pone.0057615
work_keys_str_mv AT ramirezgonzalezricardo pyrocleandenoisingpyrosequencesfromproteincodingampliconsfortherecoveryofinterspecificandintraspecificgeneticvariation
AT yudouglasw pyrocleandenoisingpyrosequencesfromproteincodingampliconsfortherecoveryofinterspecificandintraspecificgeneticvariation
AT brucecatharine pyrocleandenoisingpyrosequencesfromproteincodingampliconsfortherecoveryofinterspecificandintraspecificgeneticvariation
AT heavensdarren pyrocleandenoisingpyrosequencesfromproteincodingampliconsfortherecoveryofinterspecificandintraspecificgeneticvariation
AT caccamomario pyrocleandenoisingpyrosequencesfromproteincodingampliconsfortherecoveryofinterspecificandintraspecificgeneticvariation
AT emersonbrentc pyrocleandenoisingpyrosequencesfromproteincodingampliconsfortherecoveryofinterspecificandintraspecificgeneticvariation