Evolutionary methods for variable selection in the epidemiological modeling of cardiovascular diseases

BACKGROUND: The redundancy of information is becoming a critical issue for epidemiologists. High-dimensional datasets require new effective variable selection methods to be developed. This study implements an advanced evolutionary variable selection method which is applied for cardiovascular predict...

Bibliographic Details
Main authors: Brester, Christina, Kauhanen, Jussi, Tuomainen, Tomi-Pekka, Voutilainen, Sari, Rönkkö, Mauno, Ronkainen, Kimmo, Semenkin, Eugene, Kolehmainen, Mikko
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6092817/
https://www.ncbi.nlm.nih.gov/pubmed/30127856
http://dx.doi.org/10.1186/s13040-018-0180-x
_version_ 1783347597251969024
author Brester, Christina
Kauhanen, Jussi
Tuomainen, Tomi-Pekka
Voutilainen, Sari
Rönkkö, Mauno
Ronkainen, Kimmo
Semenkin, Eugene
Kolehmainen, Mikko
author_sort Brester, Christina
collection PubMed
description BACKGROUND: Redundant information is becoming a critical issue for epidemiologists, and high-dimensional datasets call for new, effective variable selection methods. This study implements an advanced evolutionary variable selection method and applies it to cardiovascular predictive modeling. Data from the epidemiological follow-up study KIHD (Kuopio Ischemic Heart Disease Risk Factor Study) were used to compare the designed method, which is based on an evolutionary search, with conventional stepwise selection. The sample contains 433 predictor variables and a response variable indicating incidents of cardiovascular disease for 1465 study subjects. RESULTS: The effectiveness of the variable selection methods was investigated in combination with two models: Generalized Linear Logistic Regression and Support Vector Machine. The number of variables was reduced from 433 to 38 while preserving the predictive ability of the models. Performance was evaluated with the F-score metric: the best F-score was 65.6% before variable selection and 67.4% after it. All results were averaged over the 5 folds of a cross-validation procedure. CONCLUSIONS: The presented evolutionary variable selection method yields a reduced set of variables that are relevant to predicting cardiovascular diseases. A reference list of the most meaningful variables is introduced as a basis for new epidemiological studies. In general, multicollinearity among the variables means that different combinations of predictors can attain the same model performance. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13040-018-0180-x) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6092817
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6092817 2018-08-20 Evolutionary methods for variable selection in the epidemiological modeling of cardiovascular diseases Brester, Christina Kauhanen, Jussi Tuomainen, Tomi-Pekka Voutilainen, Sari Rönkkö, Mauno Ronkainen, Kimmo Semenkin, Eugene Kolehmainen, Mikko BioData Min Research BioMed Central 2018-08-14 /pmc/articles/PMC6092817/ /pubmed/30127856 http://dx.doi.org/10.1186/s13040-018-0180-x Text en
© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Evolutionary methods for variable selection in the epidemiological modeling of cardiovascular diseases
topic Research
work_keys_str_mv AT bresterchristina evolutionarymethodsforvariableselectionintheepidemiologicalmodelingofcardiovasculardiseases
AT kauhanenjussi evolutionarymethodsforvariableselectionintheepidemiologicalmodelingofcardiovasculardiseases
AT tuomainentomipekka evolutionarymethodsforvariableselectionintheepidemiologicalmodelingofcardiovasculardiseases
AT voutilainensari evolutionarymethodsforvariableselectionintheepidemiologicalmodelingofcardiovasculardiseases
AT ronkkomauno evolutionarymethodsforvariableselectionintheepidemiologicalmodelingofcardiovasculardiseases
AT ronkainenkimmo evolutionarymethodsforvariableselectionintheepidemiologicalmodelingofcardiovasculardiseases
AT semenkineugene evolutionarymethodsforvariableselectionintheepidemiologicalmodelingofcardiovasculardiseases
AT kolehmainenmikko evolutionarymethodsforvariableselectionintheepidemiologicalmodelingofcardiovasculardiseases
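
The abstract in the description field reports an evolutionary search over variable subsets, scored by an F-score averaged over 5 cross-validation folds. The record does not give the algorithm's details, so the following is only a minimal sketch of the general idea and not the authors' method: a plain single-objective genetic algorithm, a nearest-centroid classifier standing in for the logistic regression and SVM models, and synthetic data in place of KIHD. All function names (`f_score`, `centroid_predict`, `cv_f_score`, `evolve`, `make_data`) and all parameter values are hypothetical.

```python
import random


def f_score(y_true, y_pred):
    """F1-score for the positive class (label 1); 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def centroid_predict(X_train, y_train, X_test, mask):
    """Nearest-centroid classifier using only the variables switched on in mask.

    Assumption: a stand-in model chosen for brevity, not one used in the paper.
    """
    cols = [j for j, keep in enumerate(mask) if keep]
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def dist(x, label):
        return sum((x[j] - c) ** 2 for j, c in zip(cols, centroids[label]))

    return [1 if dist(x, 1) < dist(x, 0) else 0 for x in X_test]


def cv_f_score(X, y, mask, k=5):
    """Mean F-score over k folds; fold i holds out rows i, i+k, i+2k, ..."""
    if not any(mask):
        return 0.0  # an empty variable subset cannot predict anything
    scores = []
    for i in range(k):
        test_idx = set(range(i, len(X), k))
        X_tr = [x for n, x in enumerate(X) if n not in test_idx]
        y_tr = [t for n, t in enumerate(y) if n not in test_idx]
        X_te = [x for n, x in enumerate(X) if n in test_idx]
        y_te = [t for n, t in enumerate(y) if n in test_idx]
        scores.append(f_score(y_te, centroid_predict(X_tr, y_tr, X_te, mask)))
    return sum(scores) / k


def evolve(X, y, generations=20, pop_size=20, seed=0):
    """Single-objective GA over boolean masks (hypothetical settings)."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask):
        return cv_f_score(X, y, mask)

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best]  # elitism: the best mask always survives
        while len(children) < pop_size:
            a = max(rng.sample(pop, 2), key=fitness)  # binary tournament
            b = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, n)                 # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability 1/n per variable.
            child = [(not g) if rng.random() < 1.0 / n else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)


def make_data(n_rows=100, n_vars=12, n_informative=3, seed=1):
    """Synthetic data: the first n_informative variables shift with the label."""
    rng = random.Random(seed)
    y = [n % 2 for n in range(n_rows)]
    X = [[rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
          for j in range(n_vars)] for label in y]
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    mask, score = evolve(X, y)
    print("selected variables:", [j for j, keep in enumerate(mask) if keep])
    print("mean 5-fold F-score:", round(score, 3))
```

A multi-objective variant that also minimizes the number of selected variables would change only the fitness and selection steps; whether the paper's method works that way is not stated in this record.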