Accurate estimation of molecular counts from amplicon sequence data with unique molecular identifiers

Bibliographic Details
Main authors: Peng, Xiyu; Dorman, Karin S
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9891248/
https://www.ncbi.nlm.nih.gov/pubmed/36610988
http://dx.doi.org/10.1093/bioinformatics/btad002
Description
Summary: MOTIVATION: Amplicon sequencing is widely applied to explore heterogeneity and rare variants in genetic populations. Resolving true biological variants and quantifying their abundance is crucial for downstream analyses, but measured abundances are distorted by stochasticity and bias in amplification, plus errors during polymerase chain reaction (PCR) and sequencing. One solution attaches unique molecular identifiers (UMIs) to sample sequences before amplification. Counting UMIs instead of sequences provides unbiased estimates of abundance. While modern methods improve over naïve counting by UMI identity, most do not account for UMI reuse or collision, and they do not adequately model PCR and sequencing errors in the UMIs and sample sequences. RESULTS: We introduce Deduplication and Abundance estimation with UMIs (DAUMI), a probabilistic framework to detect true biological amplicon sequences and accurately estimate their deduplicated abundance. DAUMI recognizes UMI collision, even on highly similar sequences, and detects and corrects most PCR and sequencing errors in the UMI and sampled sequences. DAUMI performs better on simulated and real data compared to other UMI-aware clustering methods. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/DormanLab/AmpliCI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
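
Illustrative note: the summary contrasts counting reads with counting UMIs. The sketch below shows only the naive baseline it mentions (counting distinct UMIs per sequence by exact identity). It is not DAUMI and is not taken from the AmpliCI code; the function name `naive_umi_counts` and the toy reads are assumptions made up for illustration.

```python
from collections import defaultdict


def naive_umi_counts(reads):
    """Count distinct UMIs seen with each sequence (exact matching only).

    reads: iterable of (umi, sequence) string pairs, one per sequenced read.
    Returns a dict mapping each sequence to its number of distinct UMIs.
    """
    umis_by_sequence = defaultdict(set)
    for umi, sequence in reads:
        umis_by_sequence[sequence].add(umi)
    return {seq: len(umis) for seq, umis in umis_by_sequence.items()}


# Hypothetical toy reads: the first molecule was amplified into three reads.
reads = [
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACGTAC"),
    ("CCTA", "ACGTAC"),
    ("GGAT", "ACGTTC"),
]

counts = naive_umi_counts(reads)
for sequence, n_molecules in counts.items():
    print(sequence, n_molecules)
```

In this toy input, the sequence `ACGTAC` appears in four reads but carries only two distinct UMIs, so the naive estimate is two original molecules. Because matching is exact, a single PCR or sequencing error in a UMI would create a spurious extra molecule, and two molecules that happen to share a UMI would be merged; these are the error and collision cases the summary says DAUMI is designed to address.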