TPM, FPKM, or Normalized Counts? A Comparative Study of Quantification Measures for the Analysis of RNA-seq Data from the NCI Patient-Derived Models Repository

BACKGROUND: In order to correctly decode phenotypic information from RNA-sequencing (RNA-seq) data, careful selection of the RNA-seq quantification measure is critical for inter-sample comparisons and for downstream analyses, such as differential gene expression between two or more conditions. Sever...

Bibliographic Details
Main authors: Zhao, Yingdong, Li, Ming-Chung, Konaté, Mariam M., Chen, Li, Das, Biswajit, Karlovich, Chris, Williams, P. Mickey, Evrard, Yvonne A., Doroshow, James H., McShane, Lisa M.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8220791/
https://www.ncbi.nlm.nih.gov/pubmed/34158060
http://dx.doi.org/10.1186/s12967-021-02936-w
author Zhao, Yingdong
Li, Ming-Chung
Konaté, Mariam M.
Chen, Li
Das, Biswajit
Karlovich, Chris
Williams, P. Mickey
Evrard, Yvonne A.
Doroshow, James H.
McShane, Lisa M.
collection PubMed
description BACKGROUND: In order to correctly decode phenotypic information from RNA-sequencing (RNA-seq) data, careful selection of the RNA-seq quantification measure is critical for inter-sample comparisons and for downstream analyses, such as differential gene expression between two or more conditions. Several methods have been proposed and continue to be used. However, a consensus has not been reached regarding the best gene expression quantification method for RNA-seq data analysis. METHODS: In the present study, we used replicate samples from each of 20 patient-derived xenograft (PDX) models spanning 15 tumor types, for a total of 61 human tumor xenograft samples available through the NCI patient-derived model repository (PDMR). We compared the reproducibility across replicate samples based on TPM (transcripts per million), FPKM (fragments per kilobase of transcript per million fragments mapped), and normalized counts using coefficient of variation, intraclass correlation coefficient, and cluster analysis. RESULTS: Our results revealed that hierarchical clustering on normalized count data tended to group replicate samples from the same PDX model together more accurately than TPM and FPKM data. Furthermore, normalized count data were observed to have the lowest median coefficient of variation (CV), and highest intraclass correlation (ICC) values across all replicate samples from the same model and for the same gene across all PDX models compared to TPM and FPKM data. CONCLUSION: We provided compelling evidence for a preferred quantification measure to conduct downstream analyses of PDX RNA-seq data. To our knowledge, this is the first comparative study of RNA-seq data quantification measures conducted on PDX models, which are known to be inherently more variable than cell line models. 
Our findings are consistent with what others have shown for human tumors and cell lines and add further support to the thesis that normalized counts are the best choice for the analysis of RNA-seq data across samples. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12967-021-02936-w.
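As a reading aid for the abstract above, the following is a minimal sketch of how FPKM, TPM, and the coefficient of variation are conventionally computed from raw fragment counts and transcript lengths. It is not the authors' pipeline: the function names, the toy numbers, and the use of the sample standard deviation for the CV are all assumptions made for illustration. Normalized counts and the intraclass correlation are left out because the abstract does not say which normalization method or ICC form was used.

```python
from statistics import mean, stdev


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million fragments mapped.

    counts: fragment counts per gene for ONE sample (hypothetical input).
    lengths_bp: transcript lengths in base pairs, same order as counts.
    """
    total = sum(counts)  # total mapped fragments in the sample
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r * 1e6 / scale for r in rates]


def coefficient_of_variation(values):
    """CV = standard deviation / mean (sample standard deviation assumed)."""
    return stdev(values) / mean(values)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    counts = [100, 200, 700]
    lengths = [1000, 2000, 7000]
    print("FPKM:", fpkm(counts, lengths))
    print("TPM :", tpm(counts, lengths))
    # CV of one gene's expression across hypothetical replicate samples.
    print("CV  :", coefficient_of_variation([10.0, 12.0, 11.0]))
```

By construction the TPM values of a sample always sum to one million, whereas the sum of FPKM values differs from sample to sample; this is one commonly cited reason the two measures behave differently in cross-sample comparisons.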
format Online
Article
Text
id pubmed-8220791
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8220791 2021-06-24 J Transl Med Research BioMed Central 2021-06-22 /pmc/articles/PMC8220791/ /pubmed/34158060 http://dx.doi.org/10.1186/s12967-021-02936-w Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title TPM, FPKM, or Normalized Counts? A Comparative Study of Quantification Measures for the Analysis of RNA-seq Data from the NCI Patient-Derived Models Repository
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8220791/
https://www.ncbi.nlm.nih.gov/pubmed/34158060
http://dx.doi.org/10.1186/s12967-021-02936-w