Machine learning based risk prediction for Parkinson's disease with nationwide health screening data

Although many studies have been conducted on machine learning (ML) models for Parkinson’s disease (PD) prediction using neuroimaging and movement analyses, studies with large population-based datasets are limited. We aimed to propose PD prediction models using ML algorithms based on the National Hea...

Bibliographic Details
Main Authors: Park, You Hyun; Suh, Jee Hyun; Kim, Yong Wook; Kang, Dae Ryong; Shin, Jaeyong; Yang, Seung Nam; Yoon, Seo Yeon
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9663430/
https://www.ncbi.nlm.nih.gov/pubmed/36376523
http://dx.doi.org/10.1038/s41598-022-24105-9
author Park, You Hyun
Suh, Jee Hyun
Kim, Yong Wook
Kang, Dae Ryong
Shin, Jaeyong
Yang, Seung Nam
Yoon, Seo Yeon
collection PubMed
description Although many studies have been conducted on machine learning (ML) models for Parkinson’s disease (PD) prediction using neuroimaging and movement analyses, studies with large population-based datasets are limited. We aimed to propose PD prediction models using ML algorithms based on the National Health Insurance Service-Health Screening datasets. We selected individuals who participated in national health-screening programs > 5 times between 2002 and 2015. PD was defined based on the ICD-code (G20), and a matched cohort of individuals without PD was selected using a 1:1 random sampling method. Various ML algorithms were applied for PD prediction, and the performance of the prediction models was compared. Neural networks, gradient boosting machines, and random forest algorithms exhibited the best average prediction accuracy (average area under the receiver operating characteristic curve (AUC): 0.779, 0.766, and 0.731, respectively) among the algorithms validated in this study. The overall model performance metrics were higher in men than in women (AUC: 0.742 and 0.729, respectively). The most important factor for predicting PD occurrence was body mass index, followed by total cholesterol, glucose, hemoglobin, and blood pressure levels. Smoking and alcohol consumption (in men) and socioeconomic status, physical activity, and diabetes mellitus (in women) were highly correlated with the occurrence of PD. The proposed health-screening dataset-based PD prediction model using ML algorithms is readily applicable, produces validated results, and could be a useful option for PD prediction models.
format Online
Article
Text
id pubmed-9663430
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9663430 2022-11-15 Sci Rep Article
Nature Publishing Group UK 2022-11-14 /pmc/articles/PMC9663430/ /pubmed/36376523 http://dx.doi.org/10.1038/s41598-022-24105-9 Text en
© The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Machine learning based risk prediction for Parkinson's disease with nationwide health screening data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9663430/
https://www.ncbi.nlm.nih.gov/pubmed/36376523
http://dx.doi.org/10.1038/s41598-022-24105-9