Unique Molecular Identifiers reveal a novel sequencing artefact with implications for RNA-Seq based gene expression analysis
Attaching Unique Molecular Identifiers (UMI) to RNA molecules in the first step of sequencing library preparation establishes a distinct identity for each input molecule. This makes it possible to eliminate the effects of PCR amplification bias, which is particularly important where many PCR cycles...
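Main authors: | Sena, Johnny A.; Galotto, Giulia; Devitt, Nico P.; Connick, Melanie C.; Jacobi, Jennifer L.; Umale, Pooja E.; Vidali, Luis; Bell, Callum J. |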
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6120941/ https://www.ncbi.nlm.nih.gov/pubmed/30177820 http://dx.doi.org/10.1038/s41598-018-31064-7 |
_version_ | 1783352355021914112 |
---|---|
author | Sena, Johnny A. Galotto, Giulia Devitt, Nico P. Connick, Melanie C. Jacobi, Jennifer L. Umale, Pooja E. Vidali, Luis Bell, Callum J. |
author_facet | Sena, Johnny A. Galotto, Giulia Devitt, Nico P. Connick, Melanie C. Jacobi, Jennifer L. Umale, Pooja E. Vidali, Luis Bell, Callum J. |
author_sort | Sena, Johnny A. |
collection | PubMed |
description | Attaching Unique Molecular Identifiers (UMI) to RNA molecules in the first step of sequencing library preparation establishes a distinct identity for each input molecule. This makes it possible to eliminate the effects of PCR amplification bias, which is particularly important where many PCR cycles are required, for example, in single cell studies. After PCR, molecules sharing a UMI are assumed to be derived from the same input molecule. In our single cell RNA-Seq studies of Physcomitrella patens, we discovered that reads sharing a UMI, and therefore presumed to be derived from the same mRNA molecule, frequently map to different, but closely spaced locations. This behaviour occurs in all such libraries that we have produced, and in multiple other UMI-containing RNA-Seq data sets in the public domain. This apparent paradox, that reads of identical origin map to distinct genomic coordinates may be partially explained by PCR stutter, which is often seen in low-entropy templates and those containing simple tandem repeats. In the absence of UMI this artefact is undetectable. We show that the common assumption that sequence reads having different mapping coordinates are derived from different starting molecules does not hold. Unless taken into account, this artefact is likely to result in over-estimation of certain transcript abundances, depending on the counting method employed. |
format | Online Article Text |
id | pubmed-6120941 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-61209412018-09-06 Unique Molecular Identifiers reveal a novel sequencing artefact with implications for RNA-Seq based gene expression analysis Sena, Johnny A. Galotto, Giulia Devitt, Nico P. Connick, Melanie C. Jacobi, Jennifer L. Umale, Pooja E. Vidali, Luis Bell, Callum J. Sci Rep Article Attaching Unique Molecular Identifiers (UMI) to RNA molecules in the first step of sequencing library preparation establishes a distinct identity for each input molecule. This makes it possible to eliminate the effects of PCR amplification bias, which is particularly important where many PCR cycles are required, for example, in single cell studies. After PCR, molecules sharing a UMI are assumed to be derived from the same input molecule. In our single cell RNA-Seq studies of Physcomitrella patens, we discovered that reads sharing a UMI, and therefore presumed to be derived from the same mRNA molecule, frequently map to different, but closely spaced locations. This behaviour occurs in all such libraries that we have produced, and in multiple other UMI-containing RNA-Seq data sets in the public domain. This apparent paradox, that reads of identical origin map to distinct genomic coordinates may be partially explained by PCR stutter, which is often seen in low-entropy templates and those containing simple tandem repeats. In the absence of UMI this artefact is undetectable. We show that the common assumption that sequence reads having different mapping coordinates are derived from different starting molecules does not hold. Unless taken into account, this artefact is likely to result in over-estimation of certain transcript abundances, depending on the counting method employed. 
Nature Publishing Group UK 2018-09-03 /pmc/articles/PMC6120941/ /pubmed/30177820 http://dx.doi.org/10.1038/s41598-018-31064-7 Text en © The Author(s) 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Sena, Johnny A. Galotto, Giulia Devitt, Nico P. Connick, Melanie C. Jacobi, Jennifer L. Umale, Pooja E. Vidali, Luis Bell, Callum J. Unique Molecular Identifiers reveal a novel sequencing artefact with implications for RNA-Seq based gene expression analysis |
title | Unique Molecular Identifiers reveal a novel sequencing artefact with implications for RNA-Seq based gene expression analysis |
title_full | Unique Molecular Identifiers reveal a novel sequencing artefact with implications for RNA-Seq based gene expression analysis |
title_fullStr | Unique Molecular Identifiers reveal a novel sequencing artefact with implications for RNA-Seq based gene expression analysis |
title_full_unstemmed | Unique Molecular Identifiers reveal a novel sequencing artefact with implications for RNA-Seq based gene expression analysis |
title_short | Unique Molecular Identifiers reveal a novel sequencing artefact with implications for RNA-Seq based gene expression analysis |
title_sort | unique molecular identifiers reveal a novel sequencing artefact with implications for rna-seq based gene expression analysis |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6120941/ https://www.ncbi.nlm.nih.gov/pubmed/30177820 http://dx.doi.org/10.1038/s41598-018-31064-7 |
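
The description in this record notes that reads sharing a UMI can map to different but closely spaced coordinates, and that this may inflate transcript counts depending on the counting method. The following is a minimal sketch of that effect, not the authors' method: the reads are hypothetical toy data, the function names are invented for illustration, and the 10-base window is an arbitrary assumed value rather than a figure from the article.

```python
from collections import defaultdict

# Hypothetical toy reads as (UMI, chromosome, mapping position).
# These values are illustrative only and are not data from the article.
reads = [
    ("AACGT", "chr1", 1000),
    ("AACGT", "chr1", 1000),  # PCR duplicate at the same coordinate
    ("AACGT", "chr1", 1003),  # same UMI, shifted by a few bases
    ("AACGT", "chr1", 1007),  # same UMI, shifted again
    ("GGTCA", "chr1", 1000),  # different UMI at the same locus
    ("AACGT", "chr1", 5000),  # same UMI far away: counted separately
]


def count_by_umi_and_position(reads):
    """Count one molecule per distinct (UMI, chromosome, position) triple."""
    return len(set(reads))


def count_with_position_window(reads, window=10):
    """Count molecules, merging same-UMI reads on the same chromosome whose
    positions lie within `window` bases of the previous read in the cluster.

    `window` is an assumed, arbitrary tolerance chosen for illustration.
    """
    positions = defaultdict(list)
    for umi, chrom, pos in reads:
        positions[(umi, chrom)].append(pos)

    molecules = 0
    for pos_list in positions.values():
        previous = None
        for pos in sorted(pos_list):
            # Start a new molecule only when the gap exceeds the window.
            if previous is None or pos - previous > window:
                molecules += 1
            previous = pos
    return molecules


if __name__ == "__main__":
    print("strict (UMI + exact position):", count_by_umi_and_position(reads))
    print("windowed (UMI + nearby positions):", count_with_position_window(reads))
```

In this toy input, strict counting treats each shifted coordinate as a separate molecule, while the windowed count collapses the closely spaced reads that share a UMI. Whether a positional tolerance is appropriate, and how wide it should be, depends on the library and the analysis.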