Potential Schizophrenia Disease-Related Genes Prediction Using Metagraph Representations Based on a Protein-Protein Interaction Keyword Network: Framework Development and Validation
Main authors: | Yu, Shirui; Wang, Ziyang; Nan, Jiale; Li, Aihua; Yang, Xuemei; Tang, Xiaoli |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications, 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10687686/; https://www.ncbi.nlm.nih.gov/pubmed/37966892; http://dx.doi.org/10.2196/50998 |
author | Yu, Shirui; Wang, Ziyang; Nan, Jiale; Li, Aihua; Yang, Xuemei; Tang, Xiaoli |
collection | PubMed |
description | BACKGROUND: Schizophrenia is a serious mental disease. With increased research funding for this disease, schizophrenia has become one of the key areas of focus in the medical field. Searching for associations between diseases and genes is an effective approach to studying complex diseases, which may enhance research on schizophrenia pathology and lead to the identification of new treatment targets. OBJECTIVE: The aim of this study was to identify potential schizophrenia risk genes by employing machine learning methods to extract the topological characteristics of proteins and their functional roles in a protein-protein interaction (PPI)-keywords (PPIK) network, and thereby to understand their complex disease-causing properties. To this end, a PPIK-based metagraph representation approach is proposed. METHODS: To enrich the PPI network, we integrated keywords describing protein properties and constructed a PPIK network. We extracted features that describe the topology of this network through metagraphs. We further transformed these metagraphs into vectors and represented each protein by a series of vectors. We then trained and optimized our model using random forest (RF), extreme gradient boosting, light gradient boosting machine, and logistic regression models. RESULTS: Comprehensive experiments demonstrated good performance of the proposed method, with area under the receiver operating characteristic curve (AUC) values between 0.72 and 0.76. Our model also outperformed baseline methods for overall disease protein prediction, including the random walk with restart, average commute time, and Katz models. Compared with the PPI network used by the baseline models, complementing it with keywords in the PPIK network improved performance (AUC) by 0.08 on average, and the metagraph-based method improved the AUC by 0.30 on average over the baseline methods.
Based on the overall performance of the four models, RF was selected as the best model for disease protein prediction, with precision, recall, F1-score, and AUC values of 0.76, 0.73, 0.72, and 0.76, respectively. We mapped the predicted proteins to the IDs of their encoding genes and identified the top 20 genes as the most probable schizophrenia risk genes, including EYA3, CNTN4, HSPA8, LRRK2, and AFP. We further validated these results against metagraph features and evidence from the literature, performed a feature analysis, and used the literature to interpret the associations between the predicted genes and the disease. CONCLUSIONS: The metagraph representation based on the PPIK network framework was found to be effective for identifying potential schizophrenia risk genes. The results are quite reliable, as evidence supporting our predictions can be found in the literature. Our approach can provide further biological insight into the pathogenesis of schizophrenia. |
id | pubmed-10687686 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | JMIR Form Res (JMIR Formative Research); Original Paper; published 2023-11-15 |
license | ©Shirui Yu, Ziyang Wang, Jiale Nan, Aihua Li, Xuemei Yang, Xiaoli Tang. Originally published in JMIR Formative Research (https://formative.jmir.org), 15.11.2023. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included. |
title | Potential Schizophrenia Disease-Related Genes Prediction Using Metagraph Representations Based on a Protein-Protein Interaction Keyword Network: Framework Development and Validation |
topic | Original Paper |
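The RESULTS in the description field compare the metagraph model against network-propagation baselines, including random walk with restart (RWR). For readers unfamiliar with that baseline, the following is a minimal, generic Python sketch of RWR on a toy graph; it is not the authors' implementation. The function name `random_walk_with_restart`, the restart probability of 0.3, the convergence tolerance, and the toy edge list are illustrative assumptions. Known disease proteins act as seeds, each iteration computes p ← (1 - r)·W·p + r·p0 with W the column-normalized adjacency matrix, and the remaining proteins are ranked by their steady-state visiting probability.

```python
from typing import Dict, List, Set, Tuple


def random_walk_with_restart(
    edges: List[Tuple[str, str]],
    seeds: Set[str],
    restart: float = 0.3,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Generic RWR on an undirected, unweighted graph (illustrative sketch)."""
    # Adjacency lists for the undirected graph.
    neighbors: Dict[str, List[str]] = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    nodes = sorted(neighbors)

    # Restart vector p0: uniform over the seed nodes present in the graph.
    present = [n for n in nodes if n in seeds]
    if not present:
        raise ValueError("no seed node occurs in the graph")
    p0 = {n: (1.0 / len(present) if n in seeds else 0.0) for n in nodes}

    p = dict(p0)
    for _ in range(max_iter):
        # p_next = (1 - restart) * W p + restart * p0, where W is the
        # column-normalized adjacency matrix: each node splits its
        # walking mass evenly among its neighbors.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(neighbors[u])
            for v in neighbors[u]:
                nxt[v] += share
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p


# Hypothetical toy network: "A" plays the role of a known disease protein (seed).
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("B", "E")]
seeds = {"A"}
scores = random_walk_with_restart(toy_edges, seeds)

# Rank the non-seed nodes as candidate disease proteins.
ranking = sorted((n for n in scores if n not in seeds), key=scores.get, reverse=True)
print(ranking)
```

In this toy network the seed A is linked only to B, so B ranks first, while D, three steps away from the seed, ranks last. RWR scores depend only on the network topology and the seed set, whereas the approach described in the abstract adds keyword nodes and metagraph features before training a classifier.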