NaRnEA: An Information Theoretic Framework for Gene Set Analysis
Gene sets are being increasingly leveraged to make high-level biological inferences from transcriptomic data; however, existing gene set analysis methods rely on overly conservative, heuristic approaches for quantifying the statistical significance of gene set enrichment. We created Nonparametric an...
Main Authors: | Griffin, Aaron T.; Vlahos, Lukas J.; Chiuzan, Codruta; Califano, Andrea |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10048242/ https://www.ncbi.nlm.nih.gov/pubmed/36981431 http://dx.doi.org/10.3390/e25030542 |
_version_ | 1785014134676389888 |
---|---|
author | Griffin, Aaron T. Vlahos, Lukas J. Chiuzan, Codruta Califano, Andrea |
author_facet | Griffin, Aaron T. Vlahos, Lukas J. Chiuzan, Codruta Califano, Andrea |
author_sort | Griffin, Aaron T. |
collection | PubMed |
description | Gene sets are being increasingly leveraged to make high-level biological inferences from transcriptomic data; however, existing gene set analysis methods rely on overly conservative, heuristic approaches for quantifying the statistical significance of gene set enrichment. We created Nonparametric analytical-Rank-based Enrichment Analysis (NaRnEA) to facilitate accurate and robust gene set analysis with an optimal null model derived using the information theoretic Principle of Maximum Entropy. By measuring the differential activity of ~2500 transcriptional regulatory proteins based on the differential expression of each protein’s transcriptional targets between primary tumors and normal tissue samples in three cohorts from The Cancer Genome Atlas (TCGA), we demonstrate that NaRnEA critically improves on two widely used gene set analysis methods: Gene Set Enrichment Analysis (GSEA) and analytical-Rank-based Enrichment Analysis (aREA). We show that the NaRnEA-inferred differential protein activity is significantly correlated with differential protein abundance inferred from independent, phenotype-matched mass spectrometry data in the Clinical Proteomic Tumor Analysis Consortium (CPTAC), confirming the statistical and biological accuracy of our approach. Additionally, our analysis crucially demonstrates that the sample-shuffling empirical null models leveraged by GSEA and aREA for gene set analysis are overly conservative, a shortcoming that is avoided by the newly developed Maximum Entropy analytical null model employed by NaRnEA. |
format | Online Article Text |
id | pubmed-10048242 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10048242 2023-03-29 NaRnEA: An Information Theoretic Framework for Gene Set Analysis Griffin, Aaron T. Vlahos, Lukas J. Chiuzan, Codruta Califano, Andrea Entropy (Basel) Article Gene sets are being increasingly leveraged to make high-level biological inferences from transcriptomic data; however, existing gene set analysis methods rely on overly conservative, heuristic approaches for quantifying the statistical significance of gene set enrichment. We created Nonparametric analytical-Rank-based Enrichment Analysis (NaRnEA) to facilitate accurate and robust gene set analysis with an optimal null model derived using the information theoretic Principle of Maximum Entropy. By measuring the differential activity of ~2500 transcriptional regulatory proteins based on the differential expression of each protein’s transcriptional targets between primary tumors and normal tissue samples in three cohorts from The Cancer Genome Atlas (TCGA), we demonstrate that NaRnEA critically improves on two widely used gene set analysis methods: Gene Set Enrichment Analysis (GSEA) and analytical-Rank-based Enrichment Analysis (aREA). We show that the NaRnEA-inferred differential protein activity is significantly correlated with differential protein abundance inferred from independent, phenotype-matched mass spectrometry data in the Clinical Proteomic Tumor Analysis Consortium (CPTAC), confirming the statistical and biological accuracy of our approach. Additionally, our analysis crucially demonstrates that the sample-shuffling empirical null models leveraged by GSEA and aREA for gene set analysis are overly conservative, a shortcoming that is avoided by the newly developed Maximum Entropy analytical null model employed by NaRnEA. MDPI 2023-03-21 /pmc/articles/PMC10048242/ /pubmed/36981431 http://dx.doi.org/10.3390/e25030542 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Griffin, Aaron T. Vlahos, Lukas J. Chiuzan, Codruta Califano, Andrea NaRnEA: An Information Theoretic Framework for Gene Set Analysis |
title | NaRnEA: An Information Theoretic Framework for Gene Set Analysis |
title_full | NaRnEA: An Information Theoretic Framework for Gene Set Analysis |
title_fullStr | NaRnEA: An Information Theoretic Framework for Gene Set Analysis |
title_full_unstemmed | NaRnEA: An Information Theoretic Framework for Gene Set Analysis |
title_short | NaRnEA: An Information Theoretic Framework for Gene Set Analysis |
title_sort | narnea: an information theoretic framework for gene set analysis |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10048242/ https://www.ncbi.nlm.nih.gov/pubmed/36981431 http://dx.doi.org/10.3390/e25030542 |
work_keys_str_mv | AT griffinaaront narneaaninformationtheoreticframeworkforgenesetanalysis AT vlahoslukasj narneaaninformationtheoreticframeworkforgenesetanalysis AT chiuzancodruta narneaaninformationtheoreticframeworkforgenesetanalysis AT califanoandrea narneaaninformationtheoreticframeworkforgenesetanalysis |