A novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning


Bibliographic Details
Main authors: Azadifar, Saeid; Ahmadi, Ali
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9563530/
https://www.ncbi.nlm.nih.gov/pubmed/36241966
http://dx.doi.org/10.1186/s12859-022-04954-x
collection PubMed
description BACKGROUND: Selecting and prioritizing candidate disease genes is necessary before conducting laboratory studies, because identifying disease genes among a large number of candidates with laboratory methods is costly and time-consuming. Many machine learning-based gene prioritization methods exist. They differ in the feature vectors used for genes, the structure of the datasets used, and the learning model. The main challenges for these methods are constructing a suitable feature vector for genes, choosing a learning model that works on data with varied, non-Euclidean structures such as graphs, and coping with the lack of negative data. Graph neural networks have recently emerged in machine learning and related fields and have demonstrated superior performance on a broad range of problems.
METHODS: This study presents a new semi-supervised learning method based on graph convolutional networks, using a novel feature vector construction for each gene. First, three feature vectors are constructed for each gene using terms from the Gene Ontology (GO) database. Then, a graph convolutional network is trained on these vectors using protein–protein interaction (PPI) network data to identify candidate disease genes. The model learns hidden-layer representations that encode both local graph structure and node features. The method is characterized by simultaneously considering the topological information of the biological network (e.g., PPI) and other sources of evidence. Finally, a validation is performed to demonstrate the efficiency of the method.
RESULTS: Several experiments are performed on 16 diseases to evaluate the performance of the proposed method. The experiments demonstrate that the proposed method achieves the best results in terms of precision, area under the ROC curve (AUC), and F1-score when compared with eight state-of-the-art network-based and machine learning-based disease gene prioritization methods.
CONCLUSION: This study shows that the proposed semi-supervised learning method appropriately classifies and ranks candidate disease genes using a graph convolutional network and an innovative method for creating three feature vectors per gene, based on the molecular function, cellular component, and biological process terms from GO data.
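The pipeline described in the abstract (GO-derived feature vectors per gene, propagated over a PPI graph by a graph convolutional network, trained with labels on only some genes) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the four-gene toy PPI matrix, the binary GO indicator blocks, the concatenation of the three vectors, the two-layer architecture, the random untrained weights, and the treatment of one gene as a negative example. The propagation rule is the commonly used renormalized form D^-1/2 (A + I) D^-1/2; the record does not state which rule the paper uses.

```python
import numpy as np

def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degrees are >= 1 after self-loops
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_scores(adj, features, w1, w2):
    """Two-layer GCN forward pass producing one score in (0, 1) per gene."""
    a_norm = normalized_adjacency(adj)
    hidden = np.maximum(a_norm @ features @ w1, 0.0)  # ReLU hidden layer
    logits = (a_norm @ hidden @ w2).ravel()
    return 1.0 / (1.0 + np.exp(-logits))              # sigmoid

def masked_bce(scores, labels, mask):
    """Binary cross-entropy computed on labeled genes only (semi-supervised)."""
    s = np.clip(scores[mask], 1e-9, 1.0 - 1e-9)
    y = labels[mask]
    return float(-np.mean(y * np.log(s) + (1.0 - y) * np.log(1.0 - s)))

# Hypothetical toy PPI network over 4 genes (symmetric, unweighted).
ppi = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

# Hypothetical binary GO-term indicators, one block per GO namespace.
mf = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)              # molecular function
cc = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)  # cellular component
bp = np.array([[1, 1], [1, 0], [0, 1], [0, 0]], dtype=float)              # biological process
features = np.hstack([mf, cc, bp])  # assumption: the three vectors are concatenated

rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.5, size=(features.shape[1], 4))  # untrained weights
w2 = rng.normal(scale=0.5, size=(4, 1))

scores = gcn_scores(ppi, features, w1, w2)

# Only genes 0 and 1 are labeled; gene 1 is assumed negative for illustration.
labels = np.array([1.0, 0.0, 0.0, 0.0])
mask = np.array([True, True, False, False])
loss = masked_bce(scores, labels, mask)

ranking = np.argsort(-scores)  # candidate genes ordered by descending score
```

In a real setting the weights would be fitted by minimizing the masked loss with gradient descent, and the unlabeled genes would then be ranked by their scores. The sketch stops at the forward pass to keep the propagation step visible.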
format Online
Article
Text
id pubmed-9563530
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9563530 2022-10-15 A novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning. Azadifar, Saeid; Ahmadi, Ali. BMC Bioinformatics. Research. BioMed Central 2022-10-14 /pmc/articles/PMC9563530/ /pubmed/36241966 http://dx.doi.org/10.1186/s12859-022-04954-x Text en
© The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research