
Accurate estimation of molecular counts from amplicon sequence data with unique molecular identifiers


Bibliographic Details
Main authors: Peng, Xiyu, Dorman, Karin S
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9891248/
https://www.ncbi.nlm.nih.gov/pubmed/36610988
http://dx.doi.org/10.1093/bioinformatics/btad002
_version_ 1784881103030452224
author Peng, Xiyu
Dorman, Karin S
collection PubMed
description MOTIVATION: Amplicon sequencing is widely applied to explore heterogeneity and rare variants in genetic populations. Resolving true biological variants and quantifying their abundance is crucial for downstream analyses, but measured abundances are distorted by stochasticity and bias in amplification, plus errors during polymerase chain reaction (PCR) and sequencing. One solution attaches unique molecular identifiers (UMIs) to sample sequences before amplification. Counting UMIs instead of sequences provides unbiased estimates of abundance. While modern methods improve over naïve counting by UMI identity, most do not account for UMI reuse or collision, and they do not adequately model PCR and sequencing errors in the UMIs and sample sequences.
RESULTS: We introduce Deduplication and Abundance estimation with UMIs (DAUMI), a probabilistic framework to detect true biological amplicon sequences and accurately estimate their deduplicated abundance. DAUMI recognizes UMI collision, even on highly similar sequences, and detects and corrects most PCR and sequencing errors in the UMI and sampled sequences. DAUMI performs better on simulated and real data compared to other UMI-aware clustering methods.
AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/DormanLab/AmpliCI.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
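The "naïve counting by UMI identity" that the description contrasts with DAUMI can be illustrated with a minimal sketch. This is not DAUMI and not code from the AmpliCI repository: the function name `naive_umi_counts` and the toy reads are hypothetical, and the sketch assumes each read has already been split into a (UMI, sequence) pair.

```python
from collections import defaultdict


def naive_umi_counts(reads):
    """Count distinct UMIs per sequence from (umi, sequence) pairs.

    Hypothetical helper for illustration only. It treats every distinct
    UMI string as one original molecule, so it neither corrects errors
    nor detects UMI collision.
    """
    umis_by_sequence = defaultdict(set)
    for umi, sequence in reads:
        umis_by_sequence[sequence].add(umi)
    return {seq: len(umis) for seq, umis in umis_by_sequence.items()}


# Toy data (made up): six reads covering two distinct sequences.
reads = [
    ("AAAC", "ACGT"),
    ("AAAC", "ACGT"),  # same UMI and sequence: counted once (PCR duplicate)
    ("GGTA", "ACGT"),
    ("CCTT", "ACGA"),
    ("CCTT", "ACGA"),
    ("CCTA", "ACGA"),  # one base from CCTT: counted as a separate molecule
]

counts = naive_umi_counts(reads)
print(counts)
```

In this toy input each sequence has three reads but two distinct UMIs. The last read shows the weakness the description points to: if `CCTA` is really an erroneous copy of `CCTT`, naïve counting inflates the estimate, and if two different molecules happened to receive the same UMI, it would undercount.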
format Online
Article
Text
id pubmed-9891248
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9891248 2023-02-02 Bioinformatics Original Paper Oxford University Press 2023-01-05 Text en © The Author(s) 2023. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Accurate estimation of molecular counts from amplicon sequence data with unique molecular identifiers
topic Original Paper