Post-Analysis of Predictive Modeling with an Epidemiological Example


Bibliographic Details
Main authors:	Brester, Christina; Voutilainen, Ari; Tuomainen, Tomi-Pekka; Kauhanen, Jussi; Kolehmainen, Mikko
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8304882/ https://www.ncbi.nlm.nih.gov/pubmed/34202622 http://dx.doi.org/10.3390/healthcare9070792

author	Brester, Christina; Voutilainen, Ari; Tuomainen, Tomi-Pekka; Kauhanen, Jussi; Kolehmainen, Mikko
collection	PubMed
description	Post-analysis of predictive models fosters their application in practice, as domain experts want to understand the logic behind them. In epidemiology, methods explaining sophisticated models facilitate the usage of up-to-date tools, especially in the high-dimensional predictor space. Investigating how model performance varies for subjects with different conditions is one of the important parts of post-analysis. This paper presents a model-independent approach for post-analysis, aiming to reveal those subjects’ conditions that lead to low or high model performance, compared to the average level on the whole sample. Conditions of interest are presented in the form of rules generated by a multi-objective evolutionary algorithm (MOGA). In this study, Lasso logistic regression (LLR) was trained to predict cardiovascular death by 2016 using the data from the 1984–1989 examination within the Kuopio Ischemic Heart Disease Risk Factor Study (KIHD), which contained 2682 subjects and 950 preselected predictors. After 50 independent runs of five-fold cross-validation, the model performance collected for each subject was used to generate rules describing “easy” and “difficult” cases. LLR with 61 selected predictors, on average, achieved 72.53% accuracy on the whole sample. However, during post-analysis, three categories of subjects were discovered: “Easy” cases with an LLR accuracy of 95.84%, “difficult” cases with an LLR accuracy of 48.11%, and the remaining cases with an LLR accuracy of 71.00%. Moreover, the rule analysis showed that medication was one of the main confusing factors that led to lower model performance. The proposed approach provides insightful information about subjects’ conditions that complicate predictive modeling.
format	Online Article Text
id	pubmed-8304882
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8304882 2021-07-25 Post-Analysis of Predictive Modeling with an Epidemiological Example Brester, Christina; Voutilainen, Ari; Tuomainen, Tomi-Pekka; Kauhanen, Jussi; Kolehmainen, Mikko Healthcare (Basel) Article MDPI 2021-06-24 /pmc/articles/PMC8304882/ /pubmed/34202622 http://dx.doi.org/10.3390/healthcare9070792 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Post-Analysis of Predictive Modeling with an Epidemiological Example
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8304882/ https://www.ncbi.nlm.nih.gov/pubmed/34202622 http://dx.doi.org/10.3390/healthcare9070792
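
The description field above mentions that model performance was collected for each subject over 50 independent runs of five-fold cross-validation. The sketch below illustrates only that bookkeeping step and is not the authors' code. The function names, the toy data, and the one-feature midpoint classifier are all hypothetical stand-ins; in particular, the midpoint classifier replaces Lasso logistic regression purely to keep the example dependency-free, and the MOGA rule generation is not shown.

```python
import random


def repeated_kfold_subject_accuracy(X, y, fit, predict, n_runs=50, n_folds=5, seed=0):
    """For each subject, return the fraction of runs in which its
    held-out prediction matched its true label."""
    rng = random.Random(seed)
    n = len(y)
    correct = [0] * n
    for _ in range(n_runs):
        order = list(range(n))
        rng.shuffle(order)
        # Split the shuffled indices into n_folds disjoint test folds.
        folds = [order[k::n_folds] for k in range(n_folds)]
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i in test_idx:
                correct[i] += int(predict(model, X[i]) == y[i])
    # Each subject is tested exactly once per run.
    return [c / n_runs for c in correct]


# Hypothetical stand-in model (NOT Lasso logistic regression):
# threshold at the midpoint between the two class means of feature 0.
def fit_midpoint(X_train, y_train):
    pos = [x[0] for x, label in zip(X_train, y_train) if label == 1]
    neg = [x[0] for x, label in zip(X_train, y_train) if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_midpoint(threshold, x):
    return int(x[0] > threshold)


# Toy data: 20 well-separated subjects plus 2 whose labels contradict
# their feature value (the last two entries).
X = [[i / 10] for i in range(10)] + [[2 + i / 10] for i in range(10)] + [[2.5], [0.5]]
y = [0] * 10 + [1] * 10 + [0, 1]

accuracy = repeated_kfold_subject_accuracy(
    X, y, fit_midpoint, predict_midpoint, n_runs=10, n_folds=5
)
overall = sum(accuracy) / len(accuracy)
print(f"overall accuracy: {overall:.4f}")
print("per-subject accuracy of the two contradictory subjects:", accuracy[-2:])
```

In this toy setting the two contradictory subjects are misclassified in every run while the other twenty are always classified correctly, so the per-subject scores separate "easy" from "difficult" cases even though the overall accuracy is a single averaged number. Because `fit` and `predict` are passed in as arguments, any classifier can be substituted, which mirrors the model-independent framing in the description.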
