
Disease gene prediction with privileged information and heteroscedastic dropout

MOTIVATION: Recently, machine learning models have achieved tremendous success in prioritizing candidate genes for genetic diseases. These models are able to accurately quantify the similarity between diseases and genes based on the intuition that similar genes are more likely to be associated with sim...


Bibliographic Details
Main authors:	Shu, Juan; Li, Yu; Wang, Sheng; Xi, Bowei; Ma, Jianzhu
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2021
Subjects:	Systems Biology and Networks
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8275341/ https://www.ncbi.nlm.nih.gov/pubmed/34252957 http://dx.doi.org/10.1093/bioinformatics/btab310

_version_	1783721693781426176
author	Shu, Juan; Li, Yu; Wang, Sheng; Xi, Bowei; Ma, Jianzhu
author_sort	Shu, Juan
collection	PubMed
description	MOTIVATION: Recently, machine learning models have achieved tremendous success in prioritizing candidate genes for genetic diseases. These models are able to accurately quantify the similarity between diseases and genes based on the intuition that similar genes are more likely to be associated with similar diseases. However, the genetic features these methods rely on are often hard to collect due to high experimental cost and various other technical limitations. Existing solutions to this problem significantly increase the risk of overfitting and decrease the generalizability of the models. RESULTS: In this work, we propose a graph neural network (GNN) version of the Learning under Privileged Information paradigm to predict new disease-gene associations. Unlike previous gene prioritization approaches, our model does not require the genetic features to be the same at the training and test stages. If a genetic feature is hard to measure and therefore missing at the test stage, our model can still efficiently incorporate its information during the training process. To implement this, we develop a Heteroscedastic Gaussian Dropout algorithm, where the dropout probability of the GNN model is determined by another GNN model with a mirrored GNN architecture. To evaluate our method, we compared it with four state-of-the-art methods on the Online Mendelian Inheritance in Man dataset to prioritize candidate disease genes. Extensive evaluations show that our model improves prediction accuracy over other methods when all the features are available. More importantly, our model can still make accurate predictions when >90% of the features are missing at the test stage. AVAILABILITY AND IMPLEMENTATION: Our method is implemented with Python 3.7 and PyTorch 1.5.0; the method and data are freely available at: https://github.com/juanshu30/Disease-Gene-Prioritization-with-Privileged-Information-and-Heteroscedastic-Dropout.
format	Online Article Text
id	pubmed-8275341
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-8275341 2021-07-13 Disease gene prediction with privileged information and heteroscedastic dropout. Shu, Juan; Li, Yu; Wang, Sheng; Xi, Bowei; Ma, Jianzhu. Bioinformatics. Systems Biology and Networks. Oxford University Press 2021-07-12 /pmc/articles/PMC8275341/ /pubmed/34252957 http://dx.doi.org/10.1093/bioinformatics/btab310 Text en © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Disease gene prediction with privileged information and heteroscedastic dropout
topic	Systems Biology and Networks
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8275341/ https://www.ncbi.nlm.nih.gov/pubmed/34252957 http://dx.doi.org/10.1093/bioinformatics/btab310
