Evolutionary methods for variable selection in the epidemiological modeling of cardiovascular diseases
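Main authors: | Brester, Christina; Kauhanen, Jussi; Tuomainen, Tomi-Pekka; Voutilainen, Sari; Rönkkö, Mauno; Ronkainen, Kimmo; Semenkin, Eugene; Kolehmainen, Mikko |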
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6092817/ https://www.ncbi.nlm.nih.gov/pubmed/30127856 http://dx.doi.org/10.1186/s13040-018-0180-x |
author | Brester, Christina; Kauhanen, Jussi; Tuomainen, Tomi-Pekka; Voutilainen, Sari; Rönkkö, Mauno; Ronkainen, Kimmo; Semenkin, Eugene; Kolehmainen, Mikko |
---|---|
author_sort | Brester, Christina |
collection | PubMed |
description | BACKGROUND: Redundancy of information is becoming a critical issue for epidemiologists. High-dimensional datasets call for the development of new, effective variable selection methods. This study implements an advanced evolutionary variable selection method and applies it to cardiovascular predictive modeling. Data from the epidemiological follow-up study KIHD (Kuopio Ischemic Heart Disease Risk Factor Study) were used to compare the proposed evolutionary variable selection method with conventional stepwise selection. The sample contains 433 predictor variables in total and a response variable indicating incident cardiovascular disease for 1465 study subjects. RESULTS: The effectiveness of the variable selection methods was investigated in combination with two models: Generalized Linear Logistic Regression and Support Vector Machine. We reduced the number of variables from 433 to 38 while preserving the predictive ability of the models. Model performance was evaluated with the F-score metric. The best F-scores obtained were 65.6% before and 67.4% after variable selection. All results were averaged over the 5 folds of a cross-validation procedure. CONCLUSIONS: The presented evolutionary variable selection method makes it possible to choose a reduced set of variables relevant to predicting cardiovascular diseases. A reference list of the most meaningful variables is introduced as a basis for new epidemiological studies. In general, because of multicollinearity among the variables, different combinations of predictors can be used to attain the same model performance. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13040-018-0180-x) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6092817 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6092817; 2018-08-20; BioData Min; Research; BioMed Central, 2018-08-14; © The Author(s). 2018. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Evolutionary methods for variable selection in the epidemiological modeling of cardiovascular diseases |
topic | Research |
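The abstract summarizes the approach without giving the search operators, so the following is only a hedged sketch and not the authors' method. It assumes a plain single-objective genetic algorithm over binary inclusion masks (one bit per predictor, as would apply to the 433 KIHD variables) and an F-score averaged over cross-validation folds. All names (`f_score`, `cross_validated_f_score`, `evolve_mask`, `toy_fitness`) and parameter defaults (population size 30, 40 generations, mutation rate 1/n) are hypothetical. In a real analysis the fitness callback would fit a model such as logistic regression or an SVM on the selected columns and return its cross-validated F-score; the toy fitness here only stands in for that step.

```python
import random
from statistics import mean


def f_score(y_true, y_pred):
    """F1 score for binary labels, where 1 marks a cardiovascular event."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_validated_f_score(fold_results):
    """Average the F-score over CV folds; fold_results holds (y_true, y_pred) pairs."""
    return mean(f_score(t, p) for t, p in fold_results)


def evolve_mask(n_vars, fitness, pop_size=30, generations=40, p_mut=None, seed=0):
    """Hypothetical single-objective GA over binary masks (True = keep the variable).

    `fitness(mask)` returns a float to maximize, e.g. a cross-validated F-score.
    """
    rng = random.Random(seed)
    if p_mut is None:
        p_mut = 1.0 / n_vars  # on average one bit flip per child
    pop = [[rng.random() < 0.5 for _ in range(n_vars)] for _ in range(pop_size)]
    scores = [fitness(m) for m in pop]

    def tournament():
        # Binary tournament: the fitter of two random individuals becomes a parent.
        a, b = rng.sample(range(pop_size), 2)
        return pop[a] if scores[a] >= scores[b] else pop[b]

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover: each gene is copied from a randomly chosen parent.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            # Bit-flip mutation toggles each gene with probability p_mut.
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        child_scores = [fitness(m) for m in children]
        # Elitist (mu + lambda) replacement: keep the best pop_size of parents and children.
        ranked = sorted(zip(scores + child_scores, pop + children),
                        key=lambda pair: pair[0], reverse=True)[:pop_size]
        scores = [s for s, _ in ranked]
        pop = [m for _, m in ranked]

    best = max(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]


# Hypothetical toy fitness: variables 0-2 are "relevant"; every other
# selected variable costs a small penalty, which favors compact subsets.
def toy_fitness(mask):
    return sum(mask[:3]) - 0.1 * sum(mask[3:])


mask, score = evolve_mask(12, toy_fitness, seed=1)
print("selected:", [i for i, keep in enumerate(mask) if keep], "fitness:", score)
print("mean F-score over folds:",
      cross_validated_f_score([([1, 0, 1, 1], [1, 0, 0, 1]), ([1, 1], [1, 1])]))
```

Elitist replacement guarantees that the best fitness never decreases from one generation to the next. A size penalty (or a second objective counting selected variables) is one way to push such a search toward compact subsets like the reduced 38-variable set reported in the abstract.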