AmpliCI: a high-resolution model-based approach for denoising Illumina amplicon data

Bibliographic Details
Main authors: Peng, Xiyu; Dorman, Karin S
Format: Online article, text
Language: English
Published: Oxford University Press, 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7850112/
https://www.ncbi.nlm.nih.gov/pubmed/32697845
http://dx.doi.org/10.1093/bioinformatics/btaa648
Collection: PubMed
Abstract:
MOTIVATION: Next-generation amplicon sequencing is a powerful tool for investigating microbial communities. A main challenge is to distinguish true biological variants from errors caused by amplification and sequencing. In traditional analyses, such errors are eliminated by clustering reads within a sequence similarity threshold, usually 97%, and constructing operational taxonomic units, but the arbitrary threshold leads to low resolution and high false-positive rates. Recently developed ‘denoising’ methods have proven able to resolve single-nucleotide amplicon variants, but they still miss low-frequency sequences, especially those near more frequent sequences, because they ignore the sequencing quality information.
RESULTS: We introduce AmpliCI, a reference-free, model-based method for rapidly resolving the number, abundance and identity of error-free sequences in massive Illumina amplicon datasets. AmpliCI considers the quality information and allows the data, not an arbitrary threshold or an external database, to drive conclusions. AmpliCI estimates a finite mixture model, using a greedy strategy to gradually select error-free sequences and approximately maximize the likelihood. AmpliCI has better performance than three popular denoising methods, with acceptable computation time and memory usage.
AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/DormanLab/AmpliCI.
SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.
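The RESULTS paragraph summarizes the core idea: reads are modeled as a finite mixture over error-free sequences, per-base quality scores inform the error probabilities, and sequences are added greedily while the likelihood improves. The Python sketch below illustrates that idea under simplifying assumptions that are ours, not the paper's: equal-length reads, substitutions only, a uniform mismatch probability of e/3 derived from the Phred score, mixing proportions fit by plain EM, and a hypothetical fixed log-likelihood gain threshold (`min_gain`) as the stopping rule. AmpliCI's actual error model and acceptance tests differ; the function names here are hypothetical, and the real implementation is in the source code at https://github.com/DormanLab/AmpliCI.

```python
import math
from typing import List, Sequence


def phred_to_error(q: int) -> float:
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def read_loglik(read: str, quals: Sequence[int], hap: str) -> float:
    """log P(read | haplotype) under a toy substitution-only model.

    A matching base has probability 1 - e and a mismatch e / 3 (uniform over
    the other three nucleotides), with e taken from the Phred score and capped
    at 0.75, the error rate of a uniformly random base call.
    """
    total = 0.0
    for base, q, true_base in zip(read, quals, hap):
        e = min(phred_to_error(q), 0.75)
        total += math.log(1 - e) if base == true_base else math.log(e / 3)
    return total


def _log_terms(pi: List[float], row: List[float]) -> List[float]:
    """log(pi_k) + log P(read | haplotype k) for every component k."""
    return [math.log(p) + x for p, x in zip(pi, row)]


def mixture_loglik(ll: List[List[float]], n_iter: int = 50) -> float:
    """Fit mixing proportions by EM for fixed haplotypes and return the
    mixture log-likelihood; ll[r][k] is log P(read r | haplotype k)."""
    n, n_comp = len(ll), len(ll[0])
    pi = [1.0 / n_comp] * n_comp
    for _ in range(n_iter):
        counts = [0.0] * n_comp
        for row in ll:
            terms = _log_terms(pi, row)
            m = max(terms)
            w = [math.exp(t - m) for t in terms]
            s = sum(w)
            for k in range(n_comp):
                counts[k] += w[k] / s  # E-step: responsibility of component k
        # M-step, with a floor so an unused haplotype never produces log(0).
        pi = [max(c / n, 1e-12) for c in counts]
        norm = sum(pi)
        pi = [p / norm for p in pi]
    result = 0.0
    for row in ll:  # log-sum-exp over components for each read
        terms = _log_terms(pi, row)
        m = max(terms)
        result += m + math.log(sum(math.exp(t - m) for t in terms))
    return result


def greedy_select(reads: Sequence[str], quals: Sequence[Sequence[int]],
                  candidates: Sequence[str], min_gain: float = 10.0) -> List[str]:
    """Start from candidates[0] (assumed to be the most abundant sequence)
    and greedily add the candidate that most raises the mixture
    log-likelihood, stopping when the best gain falls below min_gain."""
    ll_all = [[read_loglik(r, q, h) for h in candidates]
              for r, q in zip(reads, quals)]

    def score(cols: List[int]) -> float:
        return mixture_loglik([[row[k] for k in cols] for row in ll_all])

    selected = [0]
    best = score(selected)
    while len(selected) < len(candidates):
        gain, j = max((score(selected + [c]) - best, c)
                      for c in range(len(candidates)) if c not in selected)
        if gain < min_gain:
            break
        selected.append(j)
        best += gain
    return [candidates[k] for k in selected]
```

For example, eight reads of `ACGTACGT` and three of `ACGAACGT`, all at quality 30, return both sequences, because the second component raises the log-likelihood by roughly 17.6; the same call on ten reads of `ACGTACGT` alone keeps only the first sequence. Using quality scores is what lets a low-frequency sequence one substitution away from a frequent one be distinguished from a high-confidence sequencing error.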
Record ID: pubmed-7850112
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Bioinformatics
Publication date: 2020-07-22
Copyright: © The Author(s) 2020. Published by Oxford University Press.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Topic: Original Papers