
Removing Noise From Pyrosequenced Amplicons

BACKGROUND: In many environmental genomics applications a homologous region of DNA from a diverse sample is first amplified by PCR and then sequenced. The next generation sequencing technology, 454 pyrosequencing, has allowed much larger read numbers from PCR amplicons than ever before. This has rev...


Bibliographic Details
Main authors: Quince, Christopher; Lanzen, Anders; Davenport, Russell J; Turnbaugh, Peter J
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3045300/
https://www.ncbi.nlm.nih.gov/pubmed/21276213
http://dx.doi.org/10.1186/1471-2105-12-38
author Quince, Christopher
Lanzen, Anders
Davenport, Russell J
Turnbaugh, Peter J
collection PubMed
description BACKGROUND: In many environmental genomics applications a homologous region of DNA from a diverse sample is first amplified by PCR and then sequenced. The next generation sequencing technology, 454 pyrosequencing, has allowed much larger read numbers from PCR amplicons than ever before. This has revolutionised the study of microbial diversity, as it is now possible to sequence a substantial fraction of the 16S rRNA genes in a community. However, there is a growing realisation that, because of the large read numbers and the lack of consensus sequences, it is vital to distinguish noise from true sequence diversity in these data. Failing to do so leads to inflated estimates of the number of types or operational taxonomic units (OTUs) present. Three sources of error are important: sequencing error, PCR single base substitutions and PCR chimeras. We present AmpliconNoise, a development of the PyroNoise algorithm that is capable of separately removing 454 sequencing errors and PCR single base errors. We also introduce a novel chimera removal program, Perseus, that exploits the sequence abundances associated with pyrosequencing data. We use data sets where samples of known diversity have been amplified and sequenced to quantify the effect of each of the sources of error on OTU inflation and to validate these algorithms.
RESULTS: AmpliconNoise outperforms alternative algorithms, substantially reducing per base error rates for both the GS FLX and the latest Titanium protocol. All three sources of error lead to inflation of diversity estimates. In particular, chimera formation has a hitherto unrealised importance which varies according to amplification protocol. We show that AmpliconNoise allows accurate estimates of OTU number. Just as importantly, AmpliconNoise generates the right OTUs even at low sequence differences. We demonstrate that Perseus has very high sensitivity, able to find 99% of chimeras, which is critical when these are present at high frequencies.
CONCLUSIONS: AmpliconNoise followed by Perseus is a very effective pipeline for the removal of noise. In addition, the principles behind the algorithms, namely the inference of true sequences using Expectation-Maximization (EM) and the treatment of chimera detection as a classification or 'supervised learning' problem, will be equally applicable to new sequencing technologies as they appear.
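The abstract says that chimera removal can exploit sequence abundances. The following is a minimal Python sketch of that general idea only; it is not the Perseus algorithm, whose details are not given in this record. It rests on assumptions made purely for illustration: a chimera is rarer than both of its parents, all sequences have equal length, and a chimera is an exact join of a prefix of one parent with a suffix of another. The names `find_two_parent_split` and `flag_chimeras` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (left parent, right parent, breakpoint index); a hypothetical result type
Hit = Tuple[str, str, int]


def find_two_parent_split(query: str, parents: List[str]) -> Optional[Hit]:
    """Return the first (left, right, breakpoint) such that
    query == left[:breakpoint] + right[breakpoint:], or None.

    Assumption: exact matching on equal-length sequences, no alignment.
    """
    n = len(query)
    for left in parents:
        for right in parents:
            if left == right or len(left) != n or len(right) != n:
                continue
            # Try every internal breakpoint position.
            for bp in range(1, n):
                if query == left[:bp] + right[bp:]:
                    return left, right, bp
    return None


def flag_chimeras(abundance: Dict[str, int]) -> Dict[str, Hit]:
    """Flag sequences that can be built from two strictly more abundant ones.

    Assumption: a chimera is rarer than both parents, so only sequences
    with a higher count (and not already flagged) are tried as parents.
    """
    flagged: Dict[str, Hit] = {}
    ranked = sorted(abundance.items(), key=lambda kv: kv[1], reverse=True)
    for seq, count in ranked:
        parents = [s for s, c in ranked if c > count and s not in flagged]
        hit = find_two_parent_split(seq, parents)
        if hit is not None:
            flagged[seq] = hit
    return flagged


if __name__ == "__main__":
    # Invented toy counts, not data from the paper.
    counts = {"AAAAAAAA": 100, "CCCCCCCC": 80, "AAAACCCC": 5, "GGGGGGGG": 3}
    print(flag_chimeras(counts))
```

With the invented counts above, only `AAAACCCC` is flagged, with parents `AAAAAAAA` and `CCCCCCCC` and breakpoint 4. Restricting candidate parents to more abundant sequences keeps the search small and prevents a genuine abundant sequence from being explained away by rarer ones. Exact matching, by contrast, cannot handle noisy reads or sequences of different lengths.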
format Text
id pubmed-3045300
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3045300 2011-03-01 Removing Noise From Pyrosequenced Amplicons Quince, Christopher Lanzen, Anders Davenport, Russell J Turnbaugh, Peter J BMC Bioinformatics Research Article BioMed Central 2011-01-28 /pmc/articles/PMC3045300/ /pubmed/21276213 http://dx.doi.org/10.1186/1471-2105-12-38 Text en Copyright ©2011 Quince et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Removing Noise From Pyrosequenced Amplicons
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3045300/
https://www.ncbi.nlm.nih.gov/pubmed/21276213
http://dx.doi.org/10.1186/1471-2105-12-38