
Identifying disease genes using machine learning and gene functional similarities, assessed through Gene Ontology

Identifying disease genes from a vast amount of genetic data is one of the most challenging tasks in the post-genomic era. Moreover, complex diseases present highly heterogeneous genotypes, which makes biological marker identification difficult. Machine learning methods are widely used to identify these marker...


Bibliographic Details
Main authors: Asif, Muhammad, Martiniano, Hugo F. M. C. M., Vicente, Astrid M., Couto, Francisco M.
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6287949/
https://www.ncbi.nlm.nih.gov/pubmed/30532199
http://dx.doi.org/10.1371/journal.pone.0208626
_version_ 1783379707072348160
author Asif, Muhammad
Martiniano, Hugo F. M. C. M.
Vicente, Astrid M.
Couto, Francisco M.
author_facet Asif, Muhammad
Martiniano, Hugo F. M. C. M.
Vicente, Astrid M.
Couto, Francisco M.
author_sort Asif, Muhammad
collection PubMed
description Identifying disease genes from a vast amount of genetic data is one of the most challenging tasks in the post-genomic era. Moreover, complex diseases present highly heterogeneous genotypes, which makes biological marker identification difficult. Machine learning methods are widely used to identify these markers, but their performance is highly dependent upon the size and quality of available data. In this study, we demonstrated that machine learning classifiers trained on gene functional similarities, using Gene Ontology (GO), can improve the identification of genes involved in complex diseases. For this purpose, we developed a supervised machine learning methodology to predict complex disease genes. The proposed pipeline was assessed using Autism Spectrum Disorder (ASD) candidate genes. A quantitative measure of gene functional similarities was obtained by employing different semantic similarity measures. To infer the hidden functional similarities between ASD genes, various types of machine learning classifiers were built on quantitative semantic similarity matrices of ASD and non-ASD genes. The classifiers trained and tested on ASD and non-ASD gene functional similarities outperformed previously reported ASD classifiers. For example, a Random Forest (RF) classifier achieved an AUC of 0.80 for predicting new ASD genes, which was higher than that of the previously reported classifier (0.73). Additionally, this classifier was able to predict 73 novel ASD candidate genes that were enriched for core ASD phenotypes, such as autism and obsessive-compulsive behavior. The predicted genes were also enriched for ASD co-occurring conditions, including Attention Deficit Hyperactivity Disorder (ADHD). We also developed a KNIME workflow implementing the proposed methodology, which allows users to configure and execute it without machine learning or programming skills. Machine learning is an effective and reliable technique for deciphering ASD mechanisms by identifying novel disease genes, and this study further demonstrated that its performance can be improved by incorporating a quantitative measure of gene functional similarities. Source code and the workflow of the proposed methodology are available at https://github.com/Muh-Asif/ASD-genes-prediction.
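As a rough illustration of the idea in the abstract (score genes by their functional similarity to known disease genes, then evaluate the ranking with AUC), the following is a minimal sketch under stated assumptions. It is not the authors' pipeline: the gene names and GO identifiers are hypothetical toy data, plain Jaccard overlap of GO term sets stands in for the semantic similarity measures mentioned in the abstract, and a mean-similarity score stands in for the Random Forest classifier. The authors' actual code is in the repository linked above.

```python
# Toy GO annotations. Gene names and GO IDs are hypothetical, not data from the paper.
ANNOTATIONS = {
    "geneA": {"GO:1", "GO:2", "GO:3"},
    "geneB": {"GO:2", "GO:3", "GO:4"},
    "geneC": {"GO:1", "GO:2"},
    "geneX": {"GO:7", "GO:8"},
    "geneY": {"GO:8", "GO:9"},
    "geneZ": {"GO:3", "GO:7"},
}
# Assumed labels for the toy example: known disease genes vs. the rest.
POSITIVES = {"geneA", "geneB", "geneC"}


def jaccard(a, b):
    """Set overlap of two GO term sets; a simple stand-in for a semantic similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(annotations):
    """Return the sorted gene list and the gene-by-gene similarity matrix."""
    genes = sorted(annotations)
    matrix = [[jaccard(annotations[g], annotations[h]) for h in genes] for g in genes]
    return genes, matrix


def score_gene(gene, positives, annotations):
    """Mean similarity of a gene to the known positive genes, excluding itself."""
    others = [p for p in sorted(positives) if p != gene]
    return sum(jaccard(annotations[gene], annotations[p]) for p in others) / len(others)


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


genes, matrix = similarity_matrix(ANNOTATIONS)
scores = [score_gene(g, POSITIVES, ANNOTATIONS) for g in genes]
labels = [1 if g in POSITIVES else 0 for g in genes]
print("AUC on toy data:", auc(scores, labels))
```

In a setting closer to the one the abstract describes, the rows of the similarity matrix would serve as feature vectors for a supervised classifier, and the AUC would be estimated on held-out genes rather than on the same toy set.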
format Online
Article
Text
id pubmed-6287949
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6287949 2018-12-28 Identifying disease genes using machine learning and gene functional similarities, assessed through Gene Ontology Asif, Muhammad Martiniano, Hugo F. M. C. M. Vicente, Astrid M. Couto, Francisco M. PLoS One Research Article Public Library of Science 2018-12-10 /pmc/articles/PMC6287949/ /pubmed/30532199 http://dx.doi.org/10.1371/journal.pone.0208626 Text en © 2018 Asif et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Asif, Muhammad
Martiniano, Hugo F. M. C. M.
Vicente, Astrid M.
Couto, Francisco M.
Identifying disease genes using machine learning and gene functional similarities, assessed through Gene Ontology
title Identifying disease genes using machine learning and gene functional similarities, assessed through Gene Ontology
title_full Identifying disease genes using machine learning and gene functional similarities, assessed through Gene Ontology
title_fullStr Identifying disease genes using machine learning and gene functional similarities, assessed through Gene Ontology
title_full_unstemmed Identifying disease genes using machine learning and gene functional similarities, assessed through Gene Ontology
title_short Identifying disease genes using machine learning and gene functional similarities, assessed through Gene Ontology
title_sort identifying disease genes using machine learning and gene functional similarities, assessed through gene ontology
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6287949/
https://www.ncbi.nlm.nih.gov/pubmed/30532199
http://dx.doi.org/10.1371/journal.pone.0208626