
Forecasting risk gene discovery in autism with machine learning and genome-scale data

Genetics has been one of the most powerful windows into the biology of autism spectrum disorder (ASD). It is estimated that a thousand or more genes may confer risk for ASD when functionally perturbed; however, only around 100 genes currently have sufficient evidence to be considered true “autism ri...


Bibliographic Details
Main authors: Brueggeman, Leo; Koomar, Tanner; Michaelson, Jacob J.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7067874/
https://www.ncbi.nlm.nih.gov/pubmed/32165711
http://dx.doi.org/10.1038/s41598-020-61288-5
collection PubMed
description Genetics has been one of the most powerful windows into the biology of autism spectrum disorder (ASD). It is estimated that a thousand or more genes may confer risk for ASD when functionally perturbed; however, only around 100 genes currently have sufficient evidence to be considered true “autism risk genes”. Massive genetic studies are currently underway, producing data to implicate additional genes. This approach, although necessary, is costly and slow-moving, making identification of putative ASD risk genes with existing data vital. Here, we approach autism risk gene discovery as a machine learning problem, rather than a genetic association problem, by using genome-scale data as predictors to identify new genes with similar properties to established autism risk genes. This ensemble method, forecASD, integrates brain gene expression, heterogeneous network data, and previous gene-level predictors of autism association into an ensemble classifier that yields a single score indexing evidence of each gene’s involvement in the etiology of autism. We demonstrate that forecASD has substantially better performance than previous predictors of autism association in three independent trio-based sequencing studies. Studying forecASD-prioritized genes, we show that forecASD is a robust indicator of a gene’s involvement in ASD etiology, with diverse applications to gene discovery, differential expression analysis, eQTL prioritization, and pathway enrichment analysis.
format Online
Article
Text
id pubmed-7067874
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7067874 2020-03-22 Forecasting risk gene discovery in autism with machine learning and genome-scale data Brueggeman, Leo Koomar, Tanner Michaelson, Jacob J. Sci Rep Article
Nature Publishing Group UK 2020-03-12 /pmc/articles/PMC7067874/ /pubmed/32165711 http://dx.doi.org/10.1038/s41598-020-61288-5 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Forecasting risk gene discovery in autism with machine learning and genome-scale data
topic Article