Cargando…

A genome-wide association study coupled with machine learning approaches to identify influential demographic and genomic factors underlying Parkinson’s disease

Background: Despite the recent success of genome-wide association studies (GWAS) in identifying 90 independent risk loci for Parkinson’s disease (PD), the genomic underpinning of PD is still largely unknown. At the same time, accurate and reliable predictive models utilizing genomic or demographic f...

Descripción completa

Detalles Bibliográficos
Autores principales: Rahman, Md Asad, Liu, Jinling
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Frontiers Media S.A. 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10570619/
https://www.ncbi.nlm.nih.gov/pubmed/37842648
http://dx.doi.org/10.3389/fgene.2023.1230579
_version_ 1785119809157988352
author Rahman, Md Asad
Liu, Jinling
author_facet Rahman, Md Asad
Liu, Jinling
author_sort Rahman, Md Asad
collection PubMed
description Background: Despite the recent success of genome-wide association studies (GWAS) in identifying 90 independent risk loci for Parkinson’s disease (PD), the genomic underpinning of PD is still largely unknown. At the same time, accurate and reliable predictive models utilizing genomic or demographic features are desired in the clinic for predicting the risk of Parkinson’s disease. Methods: To identify influential demographic and genomic factors associated with PD and to further develop predictive models, we utilized demographic data, incorporating 200 variables across 33,473 participants, along with genomic data involving 447,089 SNPs across 8,840 samples, both derived from the Fox Insight online study. We first applied correlation and GWAS analyses to find the top demographic and genomic factors associated with PD, respectively. We further developed and compared a variety of machine learning (ML) models for predicting PD. From the developed ML models, we performed feature importance analysis to reveal the predictability of each demographic or the genomic input feature for PD. Finally, we performed gene set enrichment analysis on our GWAS results to identify PD-associated pathways. Results: In our study, we identified both novel and well-known demographic and genetic factors (along with the enriched pathways) related to PD. In addition, we developed predictive models that performed robustly, with AUC = 0.89 for demographic data and AUC = 0.74 for genomic data. Our GWAS analysis identified several novel and significant variants and gene loci, including three intron variants in LMNA (p-values smaller than 4.0e-21) and one missense variant in SEMA4A (p-value = 1.11e-26). Our feature importance analysis from the PD-predictive ML models highlighted some significant and novel variants from our GWAS analysis (e.g., the intron variant rs1749409 in the RIT1 gene) and helped identify potentially causative variants that were missed by GWAS, such as rs11264300, a missense variant in the gene DCST1, and rs11584630, an intron variant in the gene KCNN3. Conclusion: In summary, by combining a GWAS with advanced machine learning models, we identified both known and novel demographic and genomic factors as well as built well-performing ML models for predicting Parkinson’s disease.
format Online
Article
Text
id pubmed-10570619
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-105706192023-10-14 A genome-wide association study coupled with machine learning approaches to identify influential demographic and genomic factors underlying Parkinson’s disease Rahman, Md Asad Liu, Jinling Front Genet Genetics Background: Despite the recent success of genome-wide association studies (GWAS) in identifying 90 independent risk loci for Parkinson’s disease (PD), the genomic underpinning of PD is still largely unknown. At the same time, accurate and reliable predictive models utilizing genomic or demographic features are desired in the clinic for predicting the risk of Parkinson’s disease. Methods: To identify influential demographic and genomic factors associated with PD and to further develop predictive models, we utilized demographic data, incorporating 200 variables across 33,473 participants, along with genomic data involving 447,089 SNPs across 8,840 samples, both derived from the Fox Insight online study. We first applied correlation and GWAS analyses to find the top demographic and genomic factors associated with PD, respectively. We further developed and compared a variety of machine learning (ML) models for predicting PD. From the developed ML models, we performed feature importance analysis to reveal the predictability of each demographic or the genomic input feature for PD. Finally, we performed gene set enrichment analysis on our GWAS results to identify PD-associated pathways. Results: In our study, we identified both novel and well-known demographic and genetic factors (along with the enriched pathways) related to PD. In addition, we developed predictive models that performed robustly, with AUC = 0.89 for demographic data and AUC = 0.74 for genomic data. Our GWAS analysis identified several novel and significant variants and gene loci, including three intron variants in LMNA (p-values smaller than 4.0e-21) and one missense variant in SEMA4A (p-value = 1.11e-26). Our feature importance analysis from the PD-predictive ML models highlighted some significant and novel variants from our GWAS analysis (e.g., the intron variant rs1749409 in the RIT1 gene) and helped identify potentially causative variants that were missed by GWAS, such as rs11264300, a missense variant in the gene DCST1, and rs11584630, an intron variant in the gene KCNN3. Conclusion: In summary, by combining a GWAS with advanced machine learning models, we identified both known and novel demographic and genomic factors as well as built well-performing ML models for predicting Parkinson’s disease. Frontiers Media S.A. 2023-09-29 /pmc/articles/PMC10570619/ /pubmed/37842648 http://dx.doi.org/10.3389/fgene.2023.1230579 Text en Copyright © 2023 Rahman and Liu. https://creativecommons.org/licenses/by/4.0/This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Rahman, Md Asad
Liu, Jinling
A genome-wide association study coupled with machine learning approaches to identify influential demographic and genomic factors underlying Parkinson’s disease
title A genome-wide association study coupled with machine learning approaches to identify influential demographic and genomic factors underlying Parkinson’s disease
title_full A genome-wide association study coupled with machine learning approaches to identify influential demographic and genomic factors underlying Parkinson’s disease
title_fullStr A genome-wide association study coupled with machine learning approaches to identify influential demographic and genomic factors underlying Parkinson’s disease
title_full_unstemmed A genome-wide association study coupled with machine learning approaches to identify influential demographic and genomic factors underlying Parkinson’s disease
title_short A genome-wide association study coupled with machine learning approaches to identify influential demographic and genomic factors underlying Parkinson’s disease
title_sort genome-wide association study coupled with machine learning approaches to identify influential demographic and genomic factors underlying parkinson’s disease
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10570619/
https://www.ncbi.nlm.nih.gov/pubmed/37842648
http://dx.doi.org/10.3389/fgene.2023.1230579
work_keys_str_mv AT rahmanmdasad agenomewideassociationstudycoupledwithmachinelearningapproachestoidentifyinfluentialdemographicandgenomicfactorsunderlyingparkinsonsdisease
AT liujinling agenomewideassociationstudycoupledwithmachinelearningapproachestoidentifyinfluentialdemographicandgenomicfactorsunderlyingparkinsonsdisease
AT rahmanmdasad genomewideassociationstudycoupledwithmachinelearningapproachestoidentifyinfluentialdemographicandgenomicfactorsunderlyingparkinsonsdisease
AT liujinling genomewideassociationstudycoupledwithmachinelearningapproachestoidentifyinfluentialdemographicandgenomicfactorsunderlyingparkinsonsdisease