PrismEXP: gene annotation prediction from stratified gene-gene co-expression matrices

Bibliographic Details
Main authors: Lachmann, Alexander; Rizzo, Kaeli A.; Bartal, Alon; Jeon, Minji; Clarke, Daniel J. B.; Ma’ayan, Avi
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9979837/
https://www.ncbi.nlm.nih.gov/pubmed/36874981
http://dx.doi.org/10.7717/peerj.14927
Description
Summary:
BACKGROUND: Gene-gene co-expression correlations measured by mRNA-sequencing (RNA-seq) can be used to predict gene annotations based on the co-variance structure within these data. In our prior work, we showed that uniformly aligned RNA-seq co-expression data from thousands of diverse studies is highly predictive of both gene annotations and protein-protein interactions. However, the performance of the predictions varies depending on whether the gene annotations and interactions are cell type and tissue specific or agnostic. Tissue and cell type-specific gene-gene co-expression data can be useful for making more accurate predictions because many genes perform their functions in unique ways in different cellular contexts. However, identifying the optimal tissues and cell types to partition the global gene-gene co-expression matrix is challenging.
RESULTS: Here we introduce and validate an approach called PRediction of gene Insights from Stratified Mammalian gene co-EXPression (PrismEXP) for improved gene annotation predictions based on RNA-seq gene-gene co-expression data. Using uniformly aligned data from ARCHS4, we apply PrismEXP to predict a wide variety of gene annotations including pathway membership, Gene Ontology terms, as well as human and mouse phenotypes. Predictions made with PrismEXP outperform predictions made with the global cross-tissue co-expression correlation matrix approach on all tested domains, and training using one annotation domain can be used to predict annotations in other domains.
CONCLUSIONS: By demonstrating the utility of PrismEXP predictions in multiple use cases, we show how PrismEXP can be used to enhance unsupervised machine learning methods to better understand the roles of understudied genes and proteins. To make PrismEXP accessible, it is provided via a user-friendly web interface, a Python package, and an Appyter.
AVAILABILITY: The PrismEXP web-based application, with pre-computed PrismEXP predictions, is available from: https://maayanlab.cloud/prismexp; PrismEXP is also available as an Appyter: https://appyters.maayanlab.cloud/PrismEXP/; and as a Python package: https://github.com/maayanlab/prismexp.
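
The general idea in the summary, scoring a gene for an annotation by its co-expression with genes that already carry that annotation, first on one global matrix and then on stratified subsets of samples, can be illustrated with a small sketch. This is not the PrismEXP package or its API. The function names, the toy data, and the use of a sample label as the stratification are assumptions made for illustration; this record does not describe how the published method partitions samples or which model it trains on the resulting features.

```python
import numpy as np


def gene_set_scores(expr, gene_set_idx):
    """Mean Pearson correlation of every gene with the members of a gene set.

    expr: (n_genes, n_samples) expression matrix (hypothetical input).
    gene_set_idx: row indices of genes already annotated with the term;
                  use at least two members so every gene has a defined score.
    """
    corr = np.corrcoef(expr)        # gene-gene correlation matrix
    np.fill_diagonal(corr, np.nan)  # exclude each gene's correlation with itself
    return np.nanmean(corr[:, gene_set_idx], axis=1)


def stratified_features(expr, sample_labels, gene_set_idx):
    """One score column per sample stratum (assumed here: a tissue-like label)."""
    labels = np.asarray(sample_labels)
    strata = sorted(set(labels.tolist()))
    cols = [gene_set_scores(expr[:, labels == s], gene_set_idx) for s in strata]
    return strata, np.column_stack(cols)


# Toy data: genes 0-2 share a latent signal, genes 3-5 are independent noise.
rng = np.random.default_rng(0)
n_samples = 40
signal = rng.normal(size=n_samples)
expr = rng.normal(size=(6, n_samples))
expr[:3] = signal + 0.1 * rng.normal(size=(3, n_samples))

known_members = [0, 1]  # genes assumed to carry the annotation already
global_scores = gene_set_scores(expr, known_members)

sample_labels = ["A"] * 20 + ["B"] * 20  # hypothetical strata
strata, features = stratified_features(expr, sample_labels, known_members)

print("global scores:", np.round(global_scores, 2))
print("strata:", strata, "feature matrix shape:", features.shape)
```

In this toy setting, gene 2 receives a higher score than the unrelated genes because it is co-expressed with the known members. The per-stratum columns are the kind of feature matrix a downstream supervised model could be trained on, which is one plausible reading of how stratified co-expression could improve on a single global matrix.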