Identification and removal of sequencing artifacts produced by mispriming during reverse transcription in multiple RNA-seq technologies

The quality of RNA sequencing data relies on specific priming by the primer used for reverse transcription (RT-primer). Nonspecific annealing of the RT-primer to the RNA template can generate reads with incorrect cDNA ends and can cause misinterpretation of data (RT mispriming). This kind of artifac...

Bibliographic Details
Main authors: Shivram, Haridha; Iyer, Vishwanath R.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6097653/
https://www.ncbi.nlm.nih.gov/pubmed/29950518
http://dx.doi.org/10.1261/rna.066217.118
_version_ 1783348338774507520
author Shivram, Haridha
Iyer, Vishwanath R.
collection PubMed
description The quality of RNA sequencing data relies on specific priming by the primer used for reverse transcription (RT-primer). Nonspecific annealing of the RT-primer to the RNA template can generate reads with incorrect cDNA ends and can cause misinterpretation of data (RT mispriming). This kind of artifact in RNA-seq based technologies is underappreciated and currently no adequate tools exist to computationally remove them from published data sets. We show that mispriming can occur with as little as two bases of complementarity at the 3′ end of the primer followed by intermittent regions of complementarity. We also provide a computational pipeline that identifies cDNA reads produced from RT mispriming, allowing users to filter them out from any aligned data set. Using this analysis pipeline, we identify thousands of mispriming events in a dozen published data sets from diverse technologies including short RNA-seq, total/mRNA-seq, HITS-CLIP, and GRO-seq. We further show how RT mispriming can lead to misinterpretation of data. In addition to providing a solution to computationally remove RT-misprimed reads, we also propose an experimental solution to completely avoid RT-mispriming by performing RNA-seq using thermostable group II intron derived reverse transcriptase (TGIRT-seq).
format Online
Article
Text
id pubmed-6097653
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-6097653 2019-09-01
Identification and removal of sequencing artifacts produced by mispriming during reverse transcription in multiple RNA-seq technologies
Shivram, Haridha
Iyer, Vishwanath R.
RNA
Method
Cold Spring Harbor Laboratory Press 2018-09
/pmc/articles/PMC6097653/ /pubmed/29950518 http://dx.doi.org/10.1261/rna.066217.118
Text en
© 2018 Shivram and Iyer; Published by Cold Spring Harbor Laboratory Press for the RNA Society http://creativecommons.org/licenses/by-nc/4.0/
This article is distributed exclusively by the RNA Society for the first 12 months after the full-issue publication date (see http://rnajournal.cshlp.org/site/misc/terms.xhtml). After 12 months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title Identification and removal of sequencing artifacts produced by mispriming during reverse transcription in multiple RNA-seq technologies
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6097653/
https://www.ncbi.nlm.nih.gov/pubmed/29950518
http://dx.doi.org/10.1261/rna.066217.118
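
The description field above notes that mispriming can occur with as little as two bases of complementarity at the 3′ end of the RT-primer. As a rough illustration only (this is not the authors' pipeline), the Python sketch below flags a read as a mispriming candidate when the template sequence immediately downstream of its cDNA end begins with the reverse complement of the primer's last k bases. The primer sequence, the function names, and the assumption that the downstream sequence has already been extracted from an aligned data set are all hypothetical.

```python
# Illustrative sketch only; not the pipeline described in the article.
# All names and sequences here are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_site_motif(primer: str, k: int = 2) -> str:
    """Template-sense motif that the last k bases of the primer can anneal to."""
    return reverse_complement(primer[-k:])


def is_mispriming_candidate(downstream: str, primer: str, k: int = 2) -> bool:
    """Flag a read whose downstream template sequence starts with the motif.

    `downstream` is assumed to be the template-sense sequence that
    immediately follows the aligned cDNA end of the read.
    """
    return downstream.upper().startswith(primer_site_motif(primer, k))


if __name__ == "__main__":
    toy_primer = "ACGTTGCA"  # hypothetical RT-primer, written 5' to 3'
    for downstream in ("TGAAC", "AAGGC"):
        print(downstream, is_mispriming_candidate(downstream, toy_primer))
```

A two-base motif matches by chance at roughly one in sixteen positions, so a check like this can only nominate candidates; a real filter would need further evidence, such as the intermittent downstream complementarity mentioned in the description.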