
Misuse of RPKM or TPM normalization when comparing across samples and sequencing protocols



Bibliographic Details
Main authors: Zhao, Shanrong; Ye, Zhan; Stanton, Robert
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press, 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7373998/
https://www.ncbi.nlm.nih.gov/pubmed/32284352
http://dx.doi.org/10.1261/rna.074922.120
Collection: PubMed
Description: In recent years, RNA-sequencing (RNA-seq) has emerged as a powerful technology for transcriptome profiling. For a given gene, the number of mapped reads depends not only on its expression level and gene length but also on the sequencing depth. To normalize for these dependencies, RPKM (reads per kilobase of transcript per million reads mapped) and TPM (transcripts per million) are used to measure gene or transcript expression levels. A common misconception is that RPKM and TPM values are already normalized and thus should be comparable across samples or RNA-seq projects. However, RPKM and TPM represent the relative abundance of a transcript among a population of sequenced transcripts, and therefore depend on the composition of the RNA population in a sample. Quite often, it is reasonable to assume that total RNA concentration and distribution are very close across the compared samples. Nevertheless, the sequenced RNA repertoires may differ significantly under different experimental conditions and/or across sequencing protocols; in such cases, the proportions of gene expression are not directly comparable. In this review, we illustrate typical scenarios in which RPKM and TPM are unintentionally misused, and we hope to raise scientists’ awareness of this issue when comparing them across samples or sequencing protocols.
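To make the compositional point in the description concrete, here is a minimal Python sketch; it is not code from the review. It assumes the standard definitions RPKM = reads / (gene length in kb × total mapped reads in millions) and TPM = (reads per kilobase) / Σ(reads per kilobase) × 10^6. The gene names, lengths, and read counts are hypothetical toy values, and "total mapped reads" is simplified to the sum over the listed genes.

```python
"""Illustrative sketch: RPKM and TPM are relative (compositional) measures.

Gene names, lengths and read counts below are hypothetical toy values,
not data from the review.
"""
from typing import Dict


def rpkm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Reads per kilobase of transcript per million mapped reads."""
    # Simplification (assumption): total mapped reads = sum over listed genes.
    per_million = sum(counts.values()) / 1e6
    return {
        gene: counts[gene] / ((lengths_bp[gene] / 1e3) * per_million)
        for gene in counts
    }


def tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Transcripts per million: length-normalize first, then scale to sum to 1e6."""
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1e3) for gene in counts}
    total_rate = sum(rates.values())
    return {gene: rate / total_rate * 1e6 for gene, rate in rates.items()}


lengths = {"gene_a": 2000, "gene_b": 1000, "gene_c": 500}
# gene_a and gene_b have identical read counts in both samples;
# only gene_c takes up a larger share of the reads in sample_2.
sample_1 = {"gene_a": 400, "gene_b": 300, "gene_c": 300}
sample_2 = {"gene_a": 400, "gene_b": 300, "gene_c": 1300}

tpm_1, tpm_2 = tpm(sample_1, lengths), tpm(sample_2, lengths)
rpkm_1, rpkm_2 = rpkm(sample_1, lengths), rpkm(sample_2, lengths)

for gene in lengths:
    print(f"{gene}: TPM {tpm_1[gene]:10.1f} -> {tpm_2[gene]:10.1f}; "
          f"RPKM {rpkm_1[gene]:10.1f} -> {rpkm_2[gene]:10.1f}")
```

In this toy run, gene_a has 400 reads in both samples, yet its TPM falls from about 181,818 to about 64,516 and its RPKM halves from 200,000 to 100,000, only because gene_c claims a larger share of the sequenced reads. Each sample's TPM values sum to one million, so a rise in one transcript's share necessarily lowers the values of the others.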
Record ID: pubmed-7373998
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Published in: RNA (Review), 2020-08. © 2020 Zhao et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society. This article, published in RNA, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.