
Machine learning prediction of incidence of Alzheimer’s disease using large-scale administrative health data


Bibliographic Details
Main Authors: Park, Ji Hwan, Cho, Han Eol, Kim, Jong Hun, Wall, Melanie M., Stern, Yaakov, Lim, Hyunsun, Yoo, Shinjae, Kim, Hyoung Seop, Cha, Jiook
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7099065/
https://www.ncbi.nlm.nih.gov/pubmed/32258428
http://dx.doi.org/10.1038/s41746-020-0256-0
_version_ 1783511286431088640
author Park, Ji Hwan
Cho, Han Eol
Kim, Jong Hun
Wall, Melanie M.
Stern, Yaakov
Lim, Hyunsun
Yoo, Shinjae
Kim, Hyoung Seop
Cha, Jiook
collection PubMed
description A nationwide population-based cohort provides a new opportunity to build an automated risk prediction model based on individuals’ history of health and healthcare beyond existing risk prediction models. We tested the possibility of machine learning models to predict future incidence of Alzheimer’s disease (AD) using large-scale administrative health data. From the Korean National Health Insurance Service database between 2002 and 2010, we obtained de-identified health data in elders above 65 years (N = 40,736) containing 4,894 unique clinical features including ICD-10 codes, medication codes, laboratory values, history of personal and family illness and socio-demographics. To define incident AD we considered two operational definitions: “definite AD” with diagnostic codes and dementia medication (n = 614) and “probable AD” with only diagnosis (n = 2026). We trained and validated random forest, support vector machine and logistic regression to predict incident AD in 1, 2, 3, and 4 subsequent years. For predicting future incidence of AD in balanced samples (bootstrapping), the machine learning models showed reasonable performance in 1-year prediction with AUC of 0.775 and 0.759, based on “definite AD” and “probable AD” outcomes, respectively; in 2-year, 0.730 and 0.693; in 3-year, 0.677 and 0.644; in 4-year, 0.725 and 0.683. The results were similar when the entire (unbalanced) samples were used. Important clinical features selected in logistic regression included hemoglobin level, age and urine protein level. This study may shed light on the utility of the data-driven machine learning model based on large-scale administrative health data in AD risk prediction, which may enable better selection of individuals at risk for AD in clinical trials or early detection in clinical settings.
format Online
Article
Text
id pubmed-7099065
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7099065 2020-04-06 NPJ Digit Med Article Nature Publishing Group UK 2020-03-26 /pmc/articles/PMC7099065/ /pubmed/32258428 http://dx.doi.org/10.1038/s41746-020-0256-0 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Machine learning prediction of incidence of Alzheimer’s disease using large-scale administrative health data
topic Article