Unique Molecular Identifiers reveal a novel sequencing artefact with implications for RNA-Seq based gene expression analysis

Bibliographic Details
Main authors: Sena, Johnny A., Galotto, Giulia, Devitt, Nico P., Connick, Melanie C., Jacobi, Jennifer L., Umale, Pooja E., Vidali, Luis, Bell, Callum J.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6120941/
https://www.ncbi.nlm.nih.gov/pubmed/30177820
http://dx.doi.org/10.1038/s41598-018-31064-7
collection PubMed
description Attaching Unique Molecular Identifiers (UMIs) to RNA molecules in the first step of sequencing library preparation establishes a distinct identity for each input molecule. This makes it possible to eliminate the effects of PCR amplification bias, which is particularly important where many PCR cycles are required, for example, in single-cell studies. After PCR, molecules sharing a UMI are assumed to be derived from the same input molecule. In our single-cell RNA-Seq studies of Physcomitrella patens, we discovered that reads sharing a UMI, and therefore presumed to be derived from the same mRNA molecule, frequently map to different but closely spaced locations. This behaviour occurs in all such libraries that we have produced, and in multiple other UMI-containing RNA-Seq data sets in the public domain. This apparent paradox, that reads of identical origin map to distinct genomic coordinates, may be partially explained by PCR stutter, which is often seen in low-entropy templates and those containing simple tandem repeats. In the absence of UMIs, this artefact is undetectable. We show that the common assumption that sequence reads having different mapping coordinates are derived from different starting molecules does not hold. Unless taken into account, this artefact is likely to result in over-estimation of certain transcript abundances, depending on the counting method employed.
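The dependence on counting method described in the abstract can be illustrated with a minimal Python sketch. The read tuples, gene names, UMI sequences, and coordinates below are hypothetical toy values, not data from the paper, and the two functions are illustrative counting strategies, not the authors' pipeline or the algorithm of any particular deduplication tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# A read is (gene, umi, mapping_position). Hypothetical toy representation.
Read = Tuple[str, str, int]


def count_by_umi_and_position(reads: Iterable[Read]) -> Dict[str, int]:
    """Count each distinct (UMI, mapping position) pair as a separate molecule."""
    seen: Dict[str, Set[Tuple[str, int]]] = defaultdict(set)
    for gene, umi, pos in reads:
        seen[gene].add((umi, pos))
    return {gene: len(keys) for gene, keys in seen.items()}


def count_by_umi_only(reads: Iterable[Read]) -> Dict[str, int]:
    """Count all reads sharing a UMI within a gene as one molecule, ignoring position."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for gene, umi, _pos in reads:
        seen[gene].add(umi)
    return {gene: len(umis) for gene, umis in seen.items()}


# Hypothetical reads: one UMI observed at three closely spaced start positions.
reads = [
    ("geneA", "ACGTAC", 1000),
    ("geneA", "ACGTAC", 1002),  # same UMI, shifted mapping start
    ("geneA", "ACGTAC", 1003),
    ("geneA", "TTGCAA", 1500),
    ("geneB", "GGATCC", 200),
]

print(count_by_umi_and_position(reads))  # {'geneA': 4, 'geneB': 1}
print(count_by_umi_only(reads))          # {'geneA': 2, 'geneB': 1}
```

The first function encodes the assumption the abstract says does not hold (different coordinates imply different molecules), so the shifted same-UMI reads inflate the count for geneA. The second absorbs those positional shifts, but as a simplification it ignores UMI sequencing errors and chance collisions where two distinct molecules carry the same UMI.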
id pubmed-6120941
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6120941 2018-09-06 Sci Rep Article
Nature Publishing Group UK 2018-09-03 /pmc/articles/PMC6120941/ /pubmed/30177820 http://dx.doi.org/10.1038/s41598-018-31064-7 Text en © The Author(s) 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
topic Article