
TRUmiCount: correctly counting absolute numbers of molecules using unique molecular identifiers

MOTIVATION: Counting molecules using next-generation sequencing (NGS) suffers from PCR amplification bias, which reduces the accuracy of many quantitative NGS-based experimental methods such as RNA-Seq. This is true even if molecules are made distinguishable using unique molecular identifiers (UMIs)...


Bibliographic Details
Main authors: Pflug, Florian G, von Haeseler, Arndt
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6157883/
https://www.ncbi.nlm.nih.gov/pubmed/29672674
http://dx.doi.org/10.1093/bioinformatics/bty283
_version_ 1783358342001852416
author Pflug, Florian G
von Haeseler, Arndt
collection PubMed
description MOTIVATION: Counting molecules using next-generation sequencing (NGS) suffers from PCR amplification bias, which reduces the accuracy of many quantitative NGS-based experimental methods such as RNA-Seq. This is true even if molecules are made distinguishable using unique molecular identifiers (UMIs) before PCR amplification, and distinct UMIs are counted instead of reads: Molecules that are lost entirely during the sequencing process will still cause underestimation of the molecule count, and amplification artifacts like PCR chimeras create phantom UMIs and thus cause over-estimation. RESULTS: We introduce the TRUmiCount algorithm to correct for both types of errors. The TRUmiCount algorithm is based on a mechanistic model of PCR amplification and sequencing, whose two parameters have an immediate physical interpretation as PCR efficiency and sequencing depth and can be estimated from experimental data without requiring calibration experiments or spike-ins. We show that our model captures the main stochastic properties of amplification and sequencing, and that it allows us to filter out phantom UMIs and to estimate the number of molecules lost during the sequencing process. Finally, we demonstrate that the phantom-filtered and loss-corrected molecule counts computed by TRUmiCount measure the true number of molecules with considerably higher accuracy than the raw number of distinct UMIs, even if most UMIs are sequenced only once as is typical for single-cell RNA-Seq. AVAILABILITY AND IMPLEMENTATION: TRUmiCount is available at http://www.cibiv.at/software/trumicount and through Bioconda (http://bioconda.github.io). SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.
format Online
Article
Text
id pubmed-6157883
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6157883 2018-10-01 TRUmiCount: correctly counting absolute numbers of molecules using unique molecular identifiers Pflug, Florian G von Haeseler, Arndt Bioinformatics Original Papers
Oxford University Press 2018-09-15 2018-04-16 /pmc/articles/PMC6157883/ /pubmed/29672674 http://dx.doi.org/10.1093/bioinformatics/bty283 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title TRUmiCount: correctly counting absolute numbers of molecules using unique molecular identifiers
topic Original Papers
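The abstract describes a two-parameter model (PCR efficiency and sequencing depth) that is used to filter phantom UMIs and to correct for lost molecules. The following is a minimal illustrative sketch of that idea, not the TRUmiCount implementation. It assumes a simple branching process for PCR (each copy is duplicated with probability `efficiency` per cycle), Poisson-distributed read counts, a read-count threshold as the phantom filter, and parameters that are already known rather than estimated from data. All function names, the number of cycles, and the threshold are hypothetical choices made for the example.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def amplify(efficiency, cycles, rng):
    """Assumed PCR model: every copy is duplicated with probability `efficiency` per cycle."""
    copies = 1
    for _ in range(cycles):
        copies += sum(rng.random() < efficiency for _ in range(copies))
    return copies


def reads_for_molecule(efficiency, cycles, depth, rng):
    """Reads for one UMI: Poisson with mean `depth` times the normalized family size."""
    expected = (1.0 + efficiency) ** cycles  # mean family size after `cycles` cycles
    return poisson(depth * amplify(efficiency, cycles, rng) / expected, rng)


def estimate_loss(efficiency, cycles, depth, threshold, trials, rng):
    """Monte Carlo estimate of the fraction of true molecules with fewer than `threshold` reads."""
    lost = sum(
        reads_for_molecule(efficiency, cycles, depth, rng) < threshold
        for _ in range(trials)
    )
    return lost / trials


if __name__ == "__main__":
    rng = random.Random(0)
    efficiency, cycles, depth, threshold = 0.6, 8, 3.0, 2  # hypothetical values
    true_molecules = 2000
    reads = [reads_for_molecule(efficiency, cycles, depth, rng) for _ in range(true_molecules)]
    observed = sum(r >= threshold for r in reads)  # UMIs that survive the threshold filter
    loss = estimate_loss(efficiency, cycles, depth, threshold, 5000, rng)
    print("observed UMIs:", observed)
    print("loss-corrected count:", round(observed / (1.0 - loss)))
```

Dividing the filtered UMI count by one minus the estimated loss is what compensates for molecules that never reach the threshold. The same threshold is what would discard low-read phantom UMIs in real data, although the sketch does not simulate phantoms.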