A novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning
BACKGROUND: Selecting and prioritizing candidate disease genes is necessary before conducting laboratory studies, because identifying disease genes from a large number of candidates using laboratory methods is very costly and time-consuming. There are many machine learning-based gene prioriti...
Main authors: | Azadifar, Saeid; Ahmadi, Ali |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9563530/ https://www.ncbi.nlm.nih.gov/pubmed/36241966 http://dx.doi.org/10.1186/s12859-022-04954-x |
_version_ | 1784808425655369728 |
---|---|
author | Azadifar, Saeid Ahmadi, Ali |
author_facet | Azadifar, Saeid Ahmadi, Ali |
author_sort | Azadifar, Saeid |
collection | PubMed |
description | BACKGROUND: Selecting and prioritizing candidate disease genes is necessary before conducting laboratory studies, because identifying disease genes from a large number of candidates using laboratory methods is very costly and time-consuming. There are many machine learning-based gene prioritization methods. These methods differ in several respects, including the feature vectors of genes, the datasets used and their structures, and the learning model. Creating a suitable feature vector for genes, building an appropriate learning model on data with varied, non-Euclidean structures such as graphs, and the lack of negative data are important challenges for these methods. Graph neural networks have recently emerged in machine learning and related fields and have demonstrated superior performance on a broad range of problems. METHODS: In this study, a new semi-supervised learning method based on graph convolutional networks is presented, using a novel construction of feature vectors for each gene. In the proposed method, we first construct three feature vectors for each gene using terms from the Gene Ontology (GO) database. Then, we train a graph convolutional network on these vectors using protein–protein interaction (PPI) network data to identify candidate disease genes. Our model learns hidden layer representations that encode both the local graph structure and the features of nodes. The method is characterized by the simultaneous consideration of topological information from the biological network (e.g., PPI) and other sources of evidence. Finally, a validation is carried out to demonstrate the efficiency of our method. RESULTS: Several experiments are performed on 16 diseases to evaluate the performance of the proposed method. The experiments demonstrate that the proposed method achieves the best results in terms of precision, area under the ROC curve (AUC), and F1-score when compared with eight state-of-the-art network-based and machine learning-based disease gene prioritization methods. CONCLUSION: This study shows that the proposed semi-supervised learning method appropriately classifies and ranks candidate disease genes using a graph convolutional network and an innovative method for creating three feature vectors for genes based on the molecular function, cellular component, and biological process terms from GO data. |
format | Online Article Text |
id | pubmed-9563530 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9563530 2022-10-15 A novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning Azadifar, Saeid Ahmadi, Ali BMC Bioinformatics Research BACKGROUND: Selecting and prioritizing candidate disease genes is necessary before conducting laboratory studies, because identifying disease genes from a large number of candidates using laboratory methods is very costly and time-consuming. There are many machine learning-based gene prioritization methods. These methods differ in several respects, including the feature vectors of genes, the datasets used and their structures, and the learning model. Creating a suitable feature vector for genes, building an appropriate learning model on data with varied, non-Euclidean structures such as graphs, and the lack of negative data are important challenges for these methods. Graph neural networks have recently emerged in machine learning and related fields and have demonstrated superior performance on a broad range of problems. METHODS: In this study, a new semi-supervised learning method based on graph convolutional networks is presented, using a novel construction of feature vectors for each gene. In the proposed method, we first construct three feature vectors for each gene using terms from the Gene Ontology (GO) database. Then, we train a graph convolutional network on these vectors using protein–protein interaction (PPI) network data to identify candidate disease genes. Our model learns hidden layer representations that encode both the local graph structure and the features of nodes. The method is characterized by the simultaneous consideration of topological information from the biological network (e.g., PPI) and other sources of evidence. Finally, a validation is carried out to demonstrate the efficiency of our method. RESULTS: Several experiments are performed on 16 diseases to evaluate the performance of the proposed method. The experiments demonstrate that the proposed method achieves the best results in terms of precision, area under the ROC curve (AUC), and F1-score when compared with eight state-of-the-art network-based and machine learning-based disease gene prioritization methods. CONCLUSION: This study shows that the proposed semi-supervised learning method appropriately classifies and ranks candidate disease genes using a graph convolutional network and an innovative method for creating three feature vectors for genes based on the molecular function, cellular component, and biological process terms from GO data. BioMed Central 2022-10-14 /pmc/articles/PMC9563530/ /pubmed/36241966 http://dx.doi.org/10.1186/s12859-022-04954-x Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Azadifar, Saeid Ahmadi, Ali A novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning |
title | A novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning |
title_full | A novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning |
title_fullStr | A novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning |
title_full_unstemmed | A novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning |
title_short | A novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning |
title_sort | novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9563530/ https://www.ncbi.nlm.nih.gov/pubmed/36241966 http://dx.doi.org/10.1186/s12859-022-04954-x |
work_keys_str_mv | AT azadifarsaeid anovelcandidatediseasegeneprioritizationmethodusingdeepgraphconvolutionalnetworksandsemisupervisedlearning AT ahmadiali anovelcandidatediseasegeneprioritizationmethodusingdeepgraphconvolutionalnetworksandsemisupervisedlearning AT azadifarsaeid novelcandidatediseasegeneprioritizationmethodusingdeepgraphconvolutionalnetworksandsemisupervisedlearning AT ahmadiali novelcandidatediseasegeneprioritizationmethodusingdeepgraphconvolutionalnetworksandsemisupervisedlearning |
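
The description field of this record summarizes the method as training a graph convolutional network (GCN) on GO-derived gene feature vectors over a PPI network. As a minimal sketch of that idea, assuming the standard two-layer GCN propagation rule with a symmetrically normalized adjacency matrix (the record does not confirm the authors' exact architecture), the pure-Python example below pushes toy data through an untrained network. The four-gene graph, the random features standing in for GO-term vectors, the layer sizes, and all function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}: add self-loops, then scale by node degree."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_scores(adj, features, w1, w2):
    """Two-layer GCN forward pass: sigmoid(A_norm * relu(A_norm * X * W1) * W2)."""
    a_norm = normalize_adjacency(adj)
    # Layer 1: mix each gene's features with its neighbours', project, apply ReLU.
    hidden = [[max(v, 0.0) for v in row] for row in matmul(matmul(a_norm, features), w1)]
    # Layer 2: propagate again and project to a single logit per gene.
    logits = matmul(matmul(a_norm, hidden), w2)
    return [1.0 / (1.0 + math.exp(-row[0])) for row in logits]


random.seed(0)

# Hypothetical PPI graph: a path of four genes, 0 - 1 - 2 - 3.
adj = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]

# Hypothetical 6-dimensional feature vector per gene (stand-in for GO-term features).
features = [[random.random() for _ in range(6)] for _ in range(4)]

# Untrained random weights: 6 input features -> 3 hidden units -> 1 score.
w1 = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(6)]
w2 = [[random.gauss(0.0, 1.0)] for _ in range(3)]

scores = gcn_scores(adj, features, w1, w2)

# Rank genes from highest to lowest score.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Because the weights are random, the printed ranking carries no biological meaning; the sketch only shows how node features and network topology are combined in one pass. In a semi-supervised setting the weights would be fitted with a loss computed only on the genes that have labels, while unlabeled genes still take part in the propagation step.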