
Development and Application of a Genetic Algorithm for Variable Optimization and Predictive Modeling of Five-Year Mortality Using Questionnaire Data

The problem of selecting important variables for predictive modeling of a specific outcome of interest using questionnaire data has rarely been addressed in clinical settings. In this study, we implemented a genetic algorithm (GA) technique to select optimal variables from questionnaire data for pre...
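The abstract describes using a genetic algorithm to pick a subset of questionnaire variables for predicting a binary outcome. The following is a minimal sketch of that general idea, not the authors' implementation. The toy data generator, the fitness function (AUC of a simple count of "yes" answers, minus a small per-variable penalty), and every parameter value and function name are assumptions made for illustration; the study itself used survey data and machine learning models such as gradient boosting.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_toy_data(n_subjects=200, n_vars=12, n_informative=3, seed=1):
    """Hypothetical yes/no questionnaire: the first few questions track the outcome."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_subjects):
        died = 1 if rng.random() < 0.3 else 0
        row = []
        for j in range(n_vars):
            if j < n_informative:
                # Informative question: agrees with the outcome 80% of the time.
                row.append(died if rng.random() < 0.8 else 1 - died)
            else:
                # Noise question: a coin flip unrelated to the outcome.
                row.append(rng.randint(0, 1))
        X.append(row)
        y.append(died)
    return X, y


def run_ga(X, y, pop_size=20, generations=15, mutation_rate=0.05, seed=0):
    """Evolve 0/1 masks over the variables; return the best mask and its fitness."""
    rng = random.Random(seed)
    n_vars = len(X[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            if not any(mask):
                cache[key] = 0.0
            else:
                # Assumed stand-in for a model: count "yes" answers on selected questions.
                scores = [sum(x for x, m in zip(row, mask) if m) for row in X]
                # Small penalty per selected variable favors compact subsets.
                cache[key] = auc(scores, y) - 0.005 * sum(mask)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: carry the two fittest masks over unchanged.
        children = sorted(pop, key=fitness, reverse=True)[:2]
        while len(children) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n_vars)
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)


X, y = make_toy_data()
best_mask, best_fitness = run_ga(X, y)
selected = [j for j, m in enumerate(best_mask) if m]
print("selected variable indices:", selected)
print("penalized AUC of best subset:", round(best_fitness, 4))
```

In a realistic setting the count-based score would be replaced by a cross-validated model, and the selected subset would be evaluated on held-out data so that the reported AUC is not inflated by the search itself.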


Bibliographic Details
Main authors: Adams, Lucas J., Bello, Ghalib, Dumancas, Gerard G.
Format: Online Article Text
Language: English
Published: Libertas Academica 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4639510/
https://www.ncbi.nlm.nih.gov/pubmed/26604716
http://dx.doi.org/10.4137/BBI.S29469
_version_ 1782399926387343360
author Adams, Lucas J.
Bello, Ghalib
Dumancas, Gerard G.
collection PubMed
description The problem of selecting important variables for predictive modeling of a specific outcome of interest using questionnaire data has rarely been addressed in clinical settings. In this study, we implemented a genetic algorithm (GA) technique to select optimal variables from questionnaire data for predicting five-year mortality. We examined 123 questions (variables) answered by 5,444 individuals in the National Health and Nutrition Examination Survey. The GA iterations selected the top 24 variables, including questions related to stroke, emphysema, and general health problems requiring the use of special equipment, for use in predictive modeling by various parametric and nonparametric machine learning techniques. Using these top 24 variables, gradient boosting yielded the nominally highest performance (area under the curve [AUC] = 0.7654), although other techniques had lower but not significantly different AUCs. This study shows how a GA, in conjunction with various machine learning techniques, can be used to examine questionnaire data to predict a binary outcome.
format Online
Article
Text
id pubmed-4639510
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-46395102015-11-24 Development and Application of a Genetic Algorithm for Variable Optimization and Predictive Modeling of Five-Year Mortality Using Questionnaire Data Adams, Lucas J. Bello, Ghalib Dumancas, Gerard G. Bioinform Biol Insights Original Research The problem of selecting important variables for predictive modeling of a specific outcome of interest using questionnaire data has rarely been addressed in clinical settings. In this study, we implemented a genetic algorithm (GA) technique to select optimal variables from questionnaire data for predicting five-year mortality. We examined 123 questions (variables) answered by 5,444 individuals in the National Health and Nutrition Examination Survey. The GA iterations selected the top 24 variables, including questions related to stroke, emphysema, and general health problems requiring the use of special equipment, for use in predictive modeling by various parametric and nonparametric machine learning techniques. Using these top 24 variables, gradient boosting yielded the nominally highest performance (area under the curve [AUC] = 0.7654), although other techniques had lower but not significantly different AUCs. This study shows how a GA, in conjunction with various machine learning techniques, can be used to examine questionnaire data to predict a binary outcome. Libertas Academica 2015-11-08 /pmc/articles/PMC4639510/ /pubmed/26604716 http://dx.doi.org/10.4137/BBI.S29469 Text en © 2015 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License.
title Development and Application of a Genetic Algorithm for Variable Optimization and Predictive Modeling of Five-Year Mortality Using Questionnaire Data
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4639510/
https://www.ncbi.nlm.nih.gov/pubmed/26604716
http://dx.doi.org/10.4137/BBI.S29469