NaRnEA: An Information Theoretic Framework for Gene Set Analysis

Bibliographic Details
Main Authors: Griffin, Aaron T.; Vlahos, Lukas J.; Chiuzan, Codruta; Califano, Andrea
Format: Online Article Text
Language: English
Published: MDPI 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10048242/
https://www.ncbi.nlm.nih.gov/pubmed/36981431
http://dx.doi.org/10.3390/e25030542
author Griffin, Aaron T.
Vlahos, Lukas J.
Chiuzan, Codruta
Califano, Andrea
collection PubMed
description Gene sets are being increasingly leveraged to make high-level biological inferences from transcriptomic data; however, existing gene set analysis methods rely on overly conservative, heuristic approaches for quantifying the statistical significance of gene set enrichment. We created Nonparametric analytical-Rank-based Enrichment Analysis (NaRnEA) to facilitate accurate and robust gene set analysis with an optimal null model derived using the information theoretic Principle of Maximum Entropy. By measuring the differential activity of ~2500 transcriptional regulatory proteins based on the differential expression of each protein’s transcriptional targets between primary tumors and normal tissue samples in three cohorts from The Cancer Genome Atlas (TCGA), we demonstrate that NaRnEA critically improves on two widely used gene set analysis methods: Gene Set Enrichment Analysis (GSEA) and analytical-Rank-based Enrichment Analysis (aREA). We show that the NaRnEA-inferred differential protein activity is significantly correlated with differential protein abundance inferred from independent, phenotype-matched mass spectrometry data in the Clinical Proteomic Tumor Analysis Consortium (CPTAC), confirming the statistical and biological accuracy of our approach. Additionally, our analysis crucially demonstrates that the sample-shuffling empirical null models leveraged by GSEA and aREA for gene set analysis are overly conservative, a shortcoming that is avoided by the newly developed Maximum Entropy analytical null model employed by NaRnEA.
format Online
Article
Text
id pubmed-10048242
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10048242 2023-03-29
NaRnEA: An Information Theoretic Framework for Gene Set Analysis
Griffin, Aaron T.; Vlahos, Lukas J.; Chiuzan, Codruta; Califano, Andrea
Entropy (Basel)
Article
MDPI 2023-03-21
/pmc/articles/PMC10048242/ /pubmed/36981431 http://dx.doi.org/10.3390/e25030542
Text en
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title NaRnEA: An Information Theoretic Framework for Gene Set Analysis
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10048242/
https://www.ncbi.nlm.nih.gov/pubmed/36981431
http://dx.doi.org/10.3390/e25030542