Potential Schizophrenia Disease-Related Genes Prediction Using Metagraph Representations Based on a Protein-Protein Interaction Keyword Network: Framework Development and Validation

Bibliographic Details
Main Authors: Yu, Shirui; Wang, Ziyang; Nan, Jiale; Li, Aihua; Yang, Xuemei; Tang, Xiaoli
Format: Online Article Text
Language: English
Published: JMIR Publications 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10687686/
https://www.ncbi.nlm.nih.gov/pubmed/37966892
http://dx.doi.org/10.2196/50998
author Yu, Shirui
Wang, Ziyang
Nan, Jiale
Li, Aihua
Yang, Xuemei
Tang, Xiaoli
collection PubMed
description BACKGROUND: Schizophrenia is a serious mental disease. With increased research funding, it has become one of the key areas of focus in the medical field. Searching for associations between diseases and genes is an effective approach to studying complex diseases; it may enhance research on schizophrenia pathology and lead to the identification of new treatment targets.
OBJECTIVE: The aim of this study was to identify potential schizophrenia risk genes by using machine learning methods to extract the topological characteristics of proteins and their functional roles in a protein-protein interaction (PPI)-keywords (PPIK) network, and to understand the disease-causing properties of this complex disease. To this end, a PPIK-based metagraph representation approach is proposed.
METHODS: To enrich the PPI network, we integrated keywords describing protein properties and constructed a PPIK network. We extracted features that describe the topology of this network through metagraphs. We then transformed these metagraphs into vectors and represented each protein with a series of vectors. Finally, we trained and optimized four models: random forest (RF), extreme gradient boosting, light gradient boosting machine, and logistic regression.
RESULTS: In comprehensive experiments, the proposed method achieved an area under the receiver operating characteristic curve (AUC) between 0.72 and 0.76. Our model also outperformed baseline methods for overall disease protein prediction, including the random walk with restart, average commute time, and Katz models. Compared with the PPI network used by the baseline models, adding keywords in the PPIK network improved the AUC by 0.08 on average, and the metagraph-based method improved the AUC by 0.30 on average over the baseline methods. Based on the overall performance of the four models, RF was selected as the best model for disease protein prediction, with precision, recall, F1-score, and AUC values of 0.76, 0.73, 0.72, and 0.76, respectively. We mapped the predicted proteins to the IDs of their encoding genes and identified the top 20 genes as the most probable schizophrenia risk genes, including EYA3, CNTN4, HSPA8, LRRK2, and AFP. We further validated these predictions through an analysis of the metagraph features and used evidence from the literature to interpret the correlation between the predicted genes and diseases.
CONCLUSIONS: The metagraph representation based on the PPIK network framework was found to be effective for identifying potential schizophrenia risk genes. The results appear reliable, as evidence supporting our predictions can be found in the literature. Our approach can provide more biological insights into the pathogenesis of schizophrenia.
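As an illustration of one baseline named in the abstract, the following is a minimal sketch of random walk with restart on a toy graph, run once on protein-protein edges only and once with keyword nodes added. It is not the authors' implementation: the protein names (`P1` to `P5`), the keyword node `kw:A`, the restart probability of 0.3, and all function names are hypothetical, and the abstract does not state how the baselines were configured.

```python
from collections import defaultdict


def build_graph(edges):
    """Return an undirected adjacency map {node: set(neighbours)}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    W is the column-normalised adjacency matrix, so each node splits its
    probability mass evenly among its neighbours. p0 is uniform over seeds.
    """
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in adj}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in adj}
        for u, nbrs in adj.items():
            share = (1.0 - restart) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in adj)
        p = nxt
        if delta < tol:
            break
    return p


def rank_candidates(scores, seeds):
    """Rank non-seed protein nodes by score; keyword nodes are excluded."""
    cands = [n for n in scores if n not in seeds and not n.startswith("kw:")]
    return sorted(cands, key=lambda n: scores[n], reverse=True)


# Toy data (hypothetical): a chain of five proteins, plus one keyword
# annotation shared by the seed protein P1 and the distant protein P5.
ppi_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]
keyword_edges = [("P1", "kw:A"), ("P5", "kw:A")]
seeds = {"P1"}  # stands in for a known disease protein

ppi_scores = random_walk_with_restart(build_graph(ppi_edges), seeds)
ppik_scores = random_walk_with_restart(build_graph(ppi_edges + keyword_edges), seeds)

print("PPI ranking: ", rank_candidates(ppi_scores, seeds))
print("PPIK ranking:", rank_candidates(ppik_scores, seeds))
```

In this toy example the shared keyword gives the walk a two-step route from `P1` to `P5`, so `P5` scores higher on the keyword-enriched graph than on the protein-only chain. That is the intuition behind enriching a PPI network with keywords, shown here only for a baseline-style ranking and not for the metagraph features the study proposes.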
format Online
Article
Text
id pubmed-10687686
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-10687686 2023-11-30 JMIR Form Res Original Paper JMIR Publications 2023-11-15 /pmc/articles/PMC10687686/ /pubmed/37966892 http://dx.doi.org/10.2196/50998 Text en ©Shirui Yu, Ziyang Wang, Jiale Nan, Aihua Li, Xuemei Yang, Xiaoli Tang. Originally published in JMIR Formative Research (https://formative.jmir.org), 15.11.2023. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.
title Potential Schizophrenia Disease-Related Genes Prediction Using Metagraph Representations Based on a Protein-Protein Interaction Keyword Network: Framework Development and Validation
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10687686/
https://www.ncbi.nlm.nih.gov/pubmed/37966892
http://dx.doi.org/10.2196/50998