PrismEXP: gene annotation prediction from stratified gene-gene co-expression matrices

Bibliographic Details
Main authors: Lachmann, Alexander; Rizzo, Kaeli A.; Bartal, Alon; Jeon, Minji; Clarke, Daniel J. B.; Ma’ayan, Avi
Format: Online article (text)
Language: English
Published: PeerJ Inc. 2023
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9979837/
https://www.ncbi.nlm.nih.gov/pubmed/36874981
http://dx.doi.org/10.7717/peerj.14927
collection PubMed
Description:
BACKGROUND: Gene-gene co-expression correlations measured by mRNA sequencing (RNA-seq) can be used to predict gene annotations based on the covariance structure within these data. In our prior work, we showed that uniformly aligned RNA-seq co-expression data from thousands of diverse studies are highly predictive of both gene annotations and protein-protein interactions. However, prediction performance varies depending on whether the gene annotations and interactions are cell type- and tissue-specific or agnostic. Tissue- and cell type-specific gene-gene co-expression data can yield more accurate predictions because many genes function in distinct ways in different cellular contexts. However, identifying the optimal tissues and cell types by which to partition the global gene-gene co-expression matrix is challenging.

RESULTS: Here we introduce and validate PRediction of gene Insights from Stratified Mammalian gene co-EXPression (PrismEXP), an approach for improving gene annotation predictions based on RNA-seq gene-gene co-expression data. Using uniformly aligned data from ARCHS4, we apply PrismEXP to predict a wide variety of gene annotations, including pathway membership, Gene Ontology terms, and human and mouse phenotypes. PrismEXP predictions outperform those made with the global cross-tissue co-expression correlation matrix approach in all tested domains, and a model trained on one annotation domain can be used to predict annotations in other domains.

CONCLUSIONS: By demonstrating the utility of PrismEXP predictions in multiple use cases, we show how PrismEXP can enhance unsupervised machine learning methods to better understand the roles of understudied genes and proteins. To make PrismEXP accessible, it is provided via a user-friendly web interface, a Python package, and an Appyter.

AVAILABILITY: The PrismEXP web-based application, with pre-computed PrismEXP predictions, is available from https://maayanlab.cloud/prismexp. PrismEXP is also available as an Appyter (https://appyters.maayanlab.cloud/PrismEXP/) and as a Python package (https://github.com/maayanlab/prismexp).
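The co-expression scoring idea summarized above can be illustrated with a small NumPy sketch. It assumes a genes × samples expression matrix and a precomputed assignment of samples to tissue or cell-type groups; the labels, gene indices, and function names are hypothetical and are not the API of the PrismEXP Python package. Each gene is scored for an annotation by its mean correlation to the annotation's known members, once on the global matrix and once per sample partition. PrismEXP learns how to combine the per-partition scores; the unweighted average at the end is only a placeholder for that trained step.

```python
import numpy as np


def correlation_matrix(expr: np.ndarray) -> np.ndarray:
    """Gene-gene Pearson correlation for a genes x samples expression matrix."""
    return np.corrcoef(expr)  # rows are treated as variables (genes)


def gene_set_scores(corr: np.ndarray, member_idx: list[int]) -> np.ndarray:
    """Guilt-by-association score: mean correlation of every gene to the set members.

    Each member's correlation with itself (always 1.0) is masked out so that
    known members are not trivially ranked first. Assumes >= 2 members.
    """
    sub = corr[:, member_idx].astype(float)  # copy of the member columns
    for col, gene in enumerate(member_idx):
        sub[gene, col] = np.nan
    return np.nanmean(sub, axis=1)


def stratified_scores(expr: np.ndarray, sample_labels: np.ndarray,
                      member_idx: list[int]) -> np.ndarray:
    """One score column per sample partition (hypothetical tissue/cell-type groups)."""
    columns = []
    for label in np.unique(sample_labels):
        part = expr[:, sample_labels == label]  # keep only this partition's samples
        columns.append(gene_set_scores(correlation_matrix(part), member_idx))
    return np.column_stack(columns)  # shape: genes x partitions


# Synthetic demo: 50 genes x 120 samples split into 3 hypothetical groups of 40.
rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 120))
labels = np.repeat([0, 1, 2], 40)
# Genes 0-5 share a hidden factor, but only within group 0's samples.
expr[0:6, 0:40] += 4 * rng.normal(size=40)

members = [0, 1, 2, 3]  # a known "annotation"; genes 4 and 5 are unlabeled
global_scores = gene_set_scores(correlation_matrix(expr), members)
X = stratified_scores(expr, labels, members)
# Placeholder for the learned combination step: an unweighted average.
combined = X.mean(axis=1)
```

In this synthetic example, genes 4 and 5 stand out in the group-0 column of `X`, where the shared factor is active, and look like noise in the other two columns. Keeping one score per partition as a separate feature is what would allow a trained model to weight the informative contexts instead of diluting them in a single global correlation.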
id pubmed-9979837
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Published online: 2023-02-27 in PeerJ (PeerJ Inc.)
Rights: © 2023 Lachmann et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
topic Bioinformatics