
“Guilt by association” is not competitive with genetic association for identifying autism risk genes


Bibliographic Details
Main authors: Gunning, Margot; Pavlidis, Paul
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8342445/
https://www.ncbi.nlm.nih.gov/pubmed/34354131
http://dx.doi.org/10.1038/s41598-021-95321-y
author Gunning, Margot
Pavlidis, Paul
collection PubMed
description Discovering genes involved in complex human genetic disorders is a major challenge. Many have suggested that machine learning (ML) algorithms using gene networks can be used to supplement traditional genetic association-based approaches to predict or prioritize disease genes. However, questions have been raised about the utility of ML methods for this type of task due to biases within the data, and poor real-world performance. Using autism spectrum disorder (ASD) as a test case, we sought to investigate the question: can machine learning aid in the discovery of disease genes? We collected 13 published ASD gene prioritization studies and evaluated their performance using known and novel high-confidence ASD genes. We also investigated their biases towards generic gene annotations, like number of association publications. We found that ML methods which do not incorporate genetics information have limited utility for prioritization of ASD risk genes. These studies perform at a comparable level to generic measures of likelihood for the involvement of genes in any condition, and do not out-perform genetic association studies. Future efforts to discover disease genes should be focused on developing and validating statistical models for genetic association, specifically for association between rare variants and disease, rather than developing complex machine learning methods using complex heterogeneous biological data with unknown reliability.
format Online
Article
Text
id pubmed-8342445
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8342445 2021-08-06 “Guilt by association” is not competitive with genetic association for identifying autism risk genes. Gunning, Margot; Pavlidis, Paul. Sci Rep. Article.
Nature Publishing Group UK 2021-08-05 /pmc/articles/PMC8342445/ /pubmed/34354131 http://dx.doi.org/10.1038/s41598-021-95321-y Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title “Guilt by association” is not competitive with genetic association for identifying autism risk genes
topic Article