AmpliCI: a high-resolution model-based approach for denoising Illumina amplicon data
MOTIVATION: Next-generation amplicon sequencing is a powerful tool for investigating microbial communities. A main challenge is to distinguish true biological variants from errors caused by amplification and sequencing. In traditional analyses, such errors are eliminated by clustering reads within a...
Main authors: | Peng, Xiyu; Dorman, Karin S |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7850112/ https://www.ncbi.nlm.nih.gov/pubmed/32697845 http://dx.doi.org/10.1093/bioinformatics/btaa648 |
_version_ | 1783645405857185792 |
---|---|
author | Peng, Xiyu Dorman, Karin S |
author_sort | Peng, Xiyu |
collection | PubMed |
description | MOTIVATION: Next-generation amplicon sequencing is a powerful tool for investigating microbial communities. A main challenge is to distinguish true biological variants from errors caused by amplification and sequencing. In traditional analyses, such errors are eliminated by clustering reads within a sequence similarity threshold, usually 97%, and constructing operational taxonomic units, but the arbitrary threshold leads to low resolution and high false-positive rates. Recently developed ‘denoising’ methods have proven able to resolve single-nucleotide amplicon variants, but they still miss low-frequency sequences, especially those near more frequent sequences, because they ignore the sequencing quality information. RESULTS: We introduce AmpliCI, a reference-free, model-based method for rapidly resolving the number, abundance and identity of error-free sequences in massive Illumina amplicon datasets. AmpliCI considers the quality information and allows the data, not an arbitrary threshold or an external database, to drive conclusions. AmpliCI estimates a finite mixture model, using a greedy strategy to gradually select error-free sequences and approximately maximize the likelihood. AmpliCI has better performance than three popular denoising methods, with acceptable computation time and memory usage. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/DormanLab/AmpliCI. SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-7850112 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7850112 2021-02-03 AmpliCI: a high-resolution model-based approach for denoising Illumina amplicon data Peng, Xiyu Dorman, Karin S Bioinformatics Original Papers Oxford University Press 2020-07-22 /pmc/articles/PMC7850112/ /pubmed/32697845 http://dx.doi.org/10.1093/bioinformatics/btaa648 Text en © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | AmpliCI: a high-resolution model-based approach for denoising Illumina amplicon data |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7850112/ https://www.ncbi.nlm.nih.gov/pubmed/32697845 http://dx.doi.org/10.1093/bioinformatics/btaa648 |