Interaction-Based Feature Selection Algorithm Outperforms Polygenic Risk Score in Predicting Parkinson’s Disease Status

Polygenic risk scores (PRS) aggregating results from genome-wide association studies are the state of the art in the prediction of susceptibility to complex traits or diseases, yet their predictive performance is limited for various reasons, not least of which is their failure to incorporate the eff...

Bibliographic Details
Main Authors: Cope, Justin L., Baukmann, Hannes A., Klinger, Jörn E., Ravarani, Charles N. J., Böttinger, Erwin P., Konigorski, Stefan, Schmidt, Marco F.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8564367/
https://www.ncbi.nlm.nih.gov/pubmed/34745218
http://dx.doi.org/10.3389/fgene.2021.744557
_version_ 1784593603267395584
author Cope, Justin L.
Baukmann, Hannes A.
Klinger, Jörn E.
Ravarani, Charles N. J.
Böttinger, Erwin P.
Konigorski, Stefan
Schmidt, Marco F.
collection PubMed
description Polygenic risk scores (PRS) aggregating results from genome-wide association studies are the state of the art in the prediction of susceptibility to complex traits or diseases, yet their predictive performance is limited for various reasons, not least of which is their failure to incorporate the effects of gene-gene interactions. Novel machine learning algorithms that use large amounts of data promise to find gene-gene interactions in order to build models with better predictive performance than PRS. Here, we present a data preprocessing step by using data-mining of contextual information to reduce the number of features, enabling machine learning algorithms to identify gene-gene interactions. We applied our approach to the Parkinson’s Progression Markers Initiative (PPMI) dataset, an observational clinical study of 471 genotyped subjects (368 cases and 152 controls). With an AUC of 0.85 (95% CI = [0.72; 0.96]), the interaction-based prediction model outperforms the PRS (AUC of 0.58 (95% CI = [0.42; 0.81])). Furthermore, feature importance analysis of the model provided insights into the mechanism of Parkinson’s disease. For instance, the model revealed an interaction of previously described drug target candidate genes TMEM175 and GAPDHP25. These results demonstrate that interaction-based machine learning models can improve genetic prediction models and might provide an answer to the missing heritability problem.
format Online
Article
Text
id pubmed-8564367
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8564367 2021-11-04 Front Genet Genetics Frontiers Media S.A. 2021-10-20 /pmc/articles/PMC8564367/ /pubmed/34745218 http://dx.doi.org/10.3389/fgene.2021.744557 Text en Copyright © 2021 Cope, Baukmann, Klinger, Ravarani, Böttinger, Konigorski and Schmidt.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Interaction-Based Feature Selection Algorithm Outperforms Polygenic Risk Score in Predicting Parkinson’s Disease Status
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8564367/
https://www.ncbi.nlm.nih.gov/pubmed/34745218
http://dx.doi.org/10.3389/fgene.2021.744557