The Impact of Time Horizon on Classification Accuracy: Application of Machine Learning to Prediction of Incident Coronary Heart Disease

BACKGROUND: Many machine learning approaches are limited to classification of outcomes rather than longitudinal prediction. One strategy to use machine learning in clinical risk prediction is to classify outcomes over a given time horizon. However, it is not well known how to identify the optimal ti...

Bibliographic Details
Main Authors:	Simon, Steven; Mandair, Divneet; Albakri, Abdel; Fohner, Alison; Simon, Noah; Lange, Leslie; Biggs, Mary; Mukamal, Kenneth; Psaty, Bruce; Rosenberg, Michael
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2022
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9669890/ https://www.ncbi.nlm.nih.gov/pubmed/36322114 http://dx.doi.org/10.2196/38040

author	Simon, Steven; Mandair, Divneet; Albakri, Abdel; Fohner, Alison; Simon, Noah; Lange, Leslie; Biggs, Mary; Mukamal, Kenneth; Psaty, Bruce; Rosenberg, Michael
collection	PubMed
description	BACKGROUND: Many machine learning approaches are limited to classification of outcomes rather than longitudinal prediction. One strategy to use machine learning in clinical risk prediction is to classify outcomes over a given time horizon. However, it is not well known how to identify the optimal time horizon for risk prediction. OBJECTIVE: In this study, we aimed to identify an optimal time horizon for classification of incident myocardial infarction (MI) using machine learning approaches looped over outcomes with increasing time horizons. Additionally, we sought to compare the performance of these models with the traditional Framingham Heart Study (FHS) coronary heart disease gender-specific Cox proportional hazards regression model. METHODS: We analyzed data from a single clinic visit of 5201 participants of a cardiovascular health study. We examined 61 variables collected from this baseline exam, including demographic and biologic data, medical history, medications, serum biomarkers, and electrocardiographic and echocardiographic data. We compared several machine learning methods (eg, random forest, L1 regression, gradient boosted decision tree, support vector machine, and k-nearest neighbor) trained to predict incident MI that occurred within time horizons ranging from 500-10,000 days of follow-up. Models were compared on a 20% held-out testing set using area under the receiver operating characteristic curve (AUROC). Variable importance was assessed for random forest and L1 regression models across time points. We compared results with the FHS coronary heart disease gender-specific Cox proportional hazards regression functions. RESULTS: There were 4190 participants included in the analysis, with 2522 (60.2%) female participants and an average age of 72.6 years. Over 10,000 days of follow-up, there were 813 incident MI events. The machine learning models were most predictive over moderate follow-up time horizons (ie, 1500-2500 days). Overall, the L1 (Lasso) logistic regression demonstrated the strongest classification accuracy across all time horizons. This model was most predictive at 1500 days of follow-up, with an AUROC of 0.71. The most influential variables differed by follow-up time and model, with gender being the most important feature for the L1 regression and weight for the random forest model across all time frames. Compared with the Framingham Cox function, the L1 and random forest models performed better across all time frames beyond 1500 days. CONCLUSIONS: In a population free of coronary heart disease, machine learning techniques can be used to predict incident MI at varying time horizons with reasonable accuracy, with the strongest prediction accuracy in moderate follow-up periods. Validation across additional populations is needed to confirm the validity of this approach in risk prediction.
format	Online Article Text
id	pubmed-9669890
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-9669890 2022-11-18 JMIR Cardio Original Paper JMIR Publications 2022-11-02 /pmc/articles/PMC9669890/ /pubmed/36322114 http://dx.doi.org/10.2196/38040 Text en ©Steven Simon, Divneet Mandair, Abdel Albakri, Alison Fohner, Noah Simon, Leslie Lange, Mary Biggs, Kenneth Mukamal, Bruce Psaty, Michael Rosenberg. Originally published in JMIR Cardio (https://cardio.jmir.org), 02.11.2022. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Cardio, is properly cited. The complete bibliographic information, a link to the original publication on https://cardio.jmir.org, as well as this copyright and license information must be included.
title	The Impact of Time Horizon on Classification Accuracy: Application of Machine Learning to Prediction of Incident Coronary Heart Disease
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9669890/ https://www.ncbi.nlm.nih.gov/pubmed/36322114 http://dx.doi.org/10.2196/38040
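
The horizon-based evaluation summarized in the METHODS part of the description can be illustrated with a minimal sketch. It is not the authors' code. It only shows how a binary outcome could be derived at each time horizon and how AUROC could be computed per horizon. The `Participant` type, the toy data, and the function names are hypothetical. Model training is omitted: a fixed risk score stands in for the classifier that the study retrains at each horizon. Dropping participants whose follow-up ends before the horizon without an MI is an assumption, since the abstract does not say how such cases were handled.

```python
from typing import Dict, List, NamedTuple, Optional


class Participant(NamedTuple):
    """Hypothetical record; field names are illustrative only."""
    risk_score: float   # output of some classifier (stand-in, not trained here)
    followup_days: int  # days until MI or until end of follow-up
    had_mi: bool        # True if follow-up ended with an incident MI


def label_at_horizon(p: Participant, horizon_days: int) -> Optional[int]:
    """Return 1 if MI occurred within the horizon, 0 if the participant is
    known to be MI-free through the horizon, None if status is unknown."""
    if p.had_mi and p.followup_days <= horizon_days:
        return 1
    if p.followup_days >= horizon_days:
        return 0  # includes participants whose MI occurred after the horizon
    return None  # assumption: censored before the horizon, so excluded


def auroc(labels: List[int], scores: List[float]) -> float:
    """AUROC as the probability that a random positive outranks a random
    negative (ties count one half). Quadratic time, fine for a sketch."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_horizon(people: List[Participant],
                     horizons: List[int]) -> Dict[int, float]:
    """Relabel the outcome at each horizon and score the same risk values."""
    results = {}
    for h in horizons:
        labeled = [(label_at_horizon(p, h), p.risk_score) for p in people]
        kept = [(y, s) for y, s in labeled if y is not None]
        results[h] = auroc([y for y, _ in kept], [s for _, s in kept])
    return results


# Toy data, invented for illustration; not taken from the study.
participants = [
    Participant(0.90, 400, True),
    Participant(0.35, 1200, True),
    Participant(0.30, 3000, True),
    Participant(0.70, 600, False),
    Participant(0.20, 5000, False),
    Participant(0.40, 5000, False),
    Participant(0.10, 2000, False),
]
results = auroc_by_horizon(participants, [500, 1500, 2500])
for horizon, value in results.items():
    print(f"{horizon} days: AUROC = {value:.3f}")
```

In a fuller version, the loop body would also fit a model on a training split for each horizon and score only the held-out 20%, as the abstract describes.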
