PrismEXP: gene annotation prediction from stratified gene-gene co-expression matrices
BACKGROUND: Gene-gene co-expression correlations measured by mRNA-sequencing (RNA-seq) can be used to predict gene annotations based on the co-variance structure within these data. In our prior work, we showed that uniformly aligned RNA-seq co-expression data from thousands of diverse studies is hig...
Main Authors: | Lachmann, Alexander; Rizzo, Kaeli A.; Bartal, Alon; Jeon, Minji; Clarke, Daniel J. B.; Ma’ayan, Avi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2023 |
Subjects: | Bioinformatics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9979837/ https://www.ncbi.nlm.nih.gov/pubmed/36874981 http://dx.doi.org/10.7717/peerj.14927 |
_version_ | 1784899799328227328 |
---|---|
author | Lachmann, Alexander Rizzo, Kaeli A. Bartal, Alon Jeon, Minji Clarke, Daniel J. B. Ma’ayan, Avi |
author_facet | Lachmann, Alexander Rizzo, Kaeli A. Bartal, Alon Jeon, Minji Clarke, Daniel J. B. Ma’ayan, Avi |
author_sort | Lachmann, Alexander |
collection | PubMed |
description | BACKGROUND: Gene-gene co-expression correlations measured by mRNA-sequencing (RNA-seq) can be used to predict gene annotations based on the co-variance structure within these data. In our prior work, we showed that uniformly aligned RNA-seq co-expression data from thousands of diverse studies is highly predictive of both gene annotations and protein-protein interactions. However, the performance of the predictions varies depending on whether the gene annotations and interactions are cell type and tissue specific or agnostic. Tissue and cell type-specific gene-gene co-expression data can be useful for making more accurate predictions because many genes perform their functions in unique ways in different cellular contexts. However, identifying the optimal tissues and cell types to partition the global gene-gene co-expression matrix is challenging. RESULTS: Here we introduce and validate an approach called PRediction of gene Insights from Stratified Mammalian gene co-EXPression (PrismEXP) for improved gene annotation predictions based on RNA-seq gene-gene co-expression data. Using uniformly aligned data from ARCHS4, we apply PrismEXP to predict a wide variety of gene annotations including pathway membership, Gene Ontology terms, as well as human and mouse phenotypes. Predictions made with PrismEXP outperform predictions made with the global cross-tissue co-expression correlation matrix approach on all tested domains, and training using one annotation domain can be used to predict annotations in other domains. CONCLUSIONS: By demonstrating the utility of PrismEXP predictions in multiple use cases, we show how PrismEXP can be used to enhance unsupervised machine learning methods to better understand the roles of understudied genes and proteins. To make PrismEXP accessible, it is provided via a user-friendly web interface, a Python package, and an Appyter. AVAILABILITY: The PrismEXP web-based application, with pre-computed PrismEXP predictions, is available from: https://maayanlab.cloud/prismexp; PrismEXP is also available as an Appyter: https://appyters.maayanlab.cloud/PrismEXP/; and as a Python package: https://github.com/maayanlab/prismexp. |
format | Online Article Text |
id | pubmed-9979837 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9979837 2023-03-03 PrismEXP: gene annotation prediction from stratified gene-gene co-expression matrices Lachmann, Alexander Rizzo, Kaeli A. Bartal, Alon Jeon, Minji Clarke, Daniel J. B. Ma’ayan, Avi PeerJ Bioinformatics BACKGROUND: Gene-gene co-expression correlations measured by mRNA-sequencing (RNA-seq) can be used to predict gene annotations based on the co-variance structure within these data. In our prior work, we showed that uniformly aligned RNA-seq co-expression data from thousands of diverse studies is highly predictive of both gene annotations and protein-protein interactions. However, the performance of the predictions varies depending on whether the gene annotations and interactions are cell type and tissue specific or agnostic. Tissue and cell type-specific gene-gene co-expression data can be useful for making more accurate predictions because many genes perform their functions in unique ways in different cellular contexts. However, identifying the optimal tissues and cell types to partition the global gene-gene co-expression matrix is challenging. RESULTS: Here we introduce and validate an approach called PRediction of gene Insights from Stratified Mammalian gene co-EXPression (PrismEXP) for improved gene annotation predictions based on RNA-seq gene-gene co-expression data. Using uniformly aligned data from ARCHS4, we apply PrismEXP to predict a wide variety of gene annotations including pathway membership, Gene Ontology terms, as well as human and mouse phenotypes. Predictions made with PrismEXP outperform predictions made with the global cross-tissue co-expression correlation matrix approach on all tested domains, and training using one annotation domain can be used to predict annotations in other domains. CONCLUSIONS: By demonstrating the utility of PrismEXP predictions in multiple use cases we show how PrismEXP can be used to enhance unsupervised machine learning methods to better understand the roles of understudied genes and proteins. To make PrismEXP accessible, it is provided via a user-friendly web interface, a Python package, and an Appyter. AVAILABILITY. The PrismEXP web-based application, with pre-computed PrismEXP predictions, is available from: https://maayanlab.cloud/prismexp; PrismEXP is also available as an Appyter: https://appyters.maayanlab.cloud/PrismEXP/; and as Python package: https://github.com/maayanlab/prismexp. PeerJ Inc. 2023-02-27 /pmc/articles/PMC9979837/ /pubmed/36874981 http://dx.doi.org/10.7717/peerj.14927 Text en © 2023 Lachmann et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics Lachmann, Alexander Rizzo, Kaeli A. Bartal, Alon Jeon, Minji Clarke, Daniel J. B. Ma’ayan, Avi PrismEXP: gene annotation prediction from stratified gene-gene co-expression matrices |
title | PrismEXP: gene annotation prediction from stratified gene-gene co-expression matrices |
title_full | PrismEXP: gene annotation prediction from stratified gene-gene co-expression matrices |
title_fullStr | PrismEXP: gene annotation prediction from stratified gene-gene co-expression matrices |
title_full_unstemmed | PrismEXP: gene annotation prediction from stratified gene-gene co-expression matrices |
title_short | PrismEXP: gene annotation prediction from stratified gene-gene co-expression matrices |
title_sort | prismexp: gene annotation prediction from stratified gene-gene co-expression matrices |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9979837/ https://www.ncbi.nlm.nih.gov/pubmed/36874981 http://dx.doi.org/10.7717/peerj.14927 |
work_keys_str_mv | AT lachmannalexander prismexpgeneannotationpredictionfromstratifiedgenegenecoexpressionmatrices AT rizzokaelia prismexpgeneannotationpredictionfromstratifiedgenegenecoexpressionmatrices AT bartalalon prismexpgeneannotationpredictionfromstratifiedgenegenecoexpressionmatrices AT jeonminji prismexpgeneannotationpredictionfromstratifiedgenegenecoexpressionmatrices AT clarkedanieljb prismexpgeneannotationpredictionfromstratifiedgenegenecoexpressionmatrices AT maayanavi prismexpgeneannotationpredictionfromstratifiedgenegenecoexpressionmatrices |
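
The abstract in this record contrasts predictions from a single global co-expression correlation matrix with predictions from matrices stratified by tissue or cell type. The following is a minimal sketch of that contrast only, not the PrismEXP implementation: the function names (`pearson`, `set_score`, `stratified_scores`), the toy expression values, and the sample partition `STRATA` are all hypothetical, and scoring a gene by its mean correlation with the members of an annotated gene set is an assumption made for illustration. How PrismEXP combines per-stratum scores into a final prediction is not described in this record and is left out.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def set_score(expr, gene, gene_set, samples):
    """Mean correlation of `gene` with the other genes in `gene_set`,
    computed over the given sample indices only (assumed scoring rule)."""
    target = [expr[gene][s] for s in samples]
    others = [g for g in gene_set if g != gene]
    corrs = [pearson(target, [expr[g][s] for s in samples]) for g in others]
    return sum(corrs) / len(corrs)


def stratified_scores(expr, gene, gene_set, strata):
    """Return one score from all samples pooled ("global") plus one per stratum."""
    all_samples = [s for group in strata.values() for s in group]
    scores = {"global": set_score(expr, gene, gene_set, all_samples)}
    for name, samples in strata.items():
        scores[name] = set_score(expr, gene, gene_set, samples)
    return scores


# Hypothetical toy data: expression of two genes across six samples.
EXPR = {
    "A": [1, 2, 3, 1, 2, 3],
    "B": [2, 4, 6, 6, 4, 2],
}
# Hypothetical partition of the sample indices into two strata.
STRATA = {"s1": [0, 1, 2], "s2": [3, 4, 5]}

scores = stratified_scores(EXPR, "A", ["B"], STRATA)
```

In this toy data, gene A tracks gene B perfectly in stratum `s1` and inversely in `s2`, so the two per-stratum scores are 1 and -1 while the pooled score is 0. This shows how a context-specific relationship can cancel out in a global matrix yet remain visible once the samples are stratified.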