
An ensemble-based feature selection framework to select risk factors of childhood obesity for policy decision making

Bibliographic Details
Main authors: Shi, Xi, Nikolic, Gorana, Epelde, Gorka, Arrúe, Mónica, Bidaurrazaga Van-Dierdonck, Joseba, Bilbao, Roberto, De Moor, Bart
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8293582/
https://www.ncbi.nlm.nih.gov/pubmed/34289843
http://dx.doi.org/10.1186/s12911-021-01580-0
author Shi, Xi
Nikolic, Gorana
Epelde, Gorka
Arrúe, Mónica
Bidaurrazaga Van-Dierdonck, Joseba
Bilbao, Roberto
De Moor, Bart
collection PubMed
description BACKGROUND: The increasing prevalence of childhood obesity makes it essential to study its risk factors using a sample that is representative of the population and covers a wider range of health topics, so as to support better preventive policies and interventions. The aim of this study was to develop an ensemble feature selection framework for large-scale data that identifies risk factors of childhood obesity with good interpretability and clinical relevance. METHODS: We analyzed data collected from 426,813 children under 18 between 2000 and 2019. Overweight was defined as a BMI above the 90th percentile for children of the same age and gender. We proposed an ensemble feature selection framework, the Bagging-based Feature Selection framework integrating MapReduce (BFSMR), to identify risk factors. The framework comprises 5 models drawn from filter, wrapper, and embedded feature selection methods: a filter with mutual information, SVM-RFE, Lasso, Ridge, and Random Forest. Each model selected 10 variables based on variable importance. Considering accuracy, F-score, and model characteristics, the models were grouped into 3 levels with different weights: Lasso/Ridge, Filter/SVM-RFE, and Random Forest. A voting strategy was then applied to aggregate the selected features, taking both feature weights and model weights into account. We compared our voting strategy with two alternative strategies for selecting top-ranked features in terms of 6 dimensions of interpretability. RESULTS: Our method performed best at selecting features with good interpretability and clinical relevance. The top 10 features selected by BFSMR are age, sex, birth year, breastfeeding type, smoking habit and diet-related knowledge of both children and mothers, exercise, and mother's systolic blood pressure.
CONCLUSION: Our framework provides a way to identify a diverse and interpretable feature set from large-scale data without model bias, which can help identify risk factors of childhood obesity, and potentially of other diseases, for future interventions or policies. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-021-01580-0.
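The abstract says each of the 5 models contributes a top-10 feature list and that the lists are merged by voting that weighs both features and models, but it does not give the weighting formulas. The Python sketch below shows one plausible form of such a weighted vote. It assumes a linear rank-based feature weight and hypothetical level weights of 3/2/1 (Lasso/Ridge, Filter/SVM-RFE, Random Forest); these values, the model keys, and the function names `rank_weight` and `weighted_vote` are illustrative assumptions, not the authors' BFSMR implementation. The bagging and MapReduce parts of the framework are not shown.

```python
from collections import defaultdict

# HYPOTHETICAL level weights: the abstract names three levels
# (Lasso/Ridge, Filter/SVM-RFE, Random Forest) but does not give their values.
MODEL_WEIGHTS = {
    "lasso": 3.0,
    "ridge": 3.0,
    "filter_mi": 2.0,
    "svm_rfe": 2.0,
    "random_forest": 1.0,
}


def rank_weight(rank, k):
    """ASSUMED linear feature weight: rank 0 (most important) of k gets 1.0, the last gets 1/k."""
    return (k - rank) / k


def weighted_vote(rankings, model_weights, top_n=10):
    """Combine per-model ranked feature lists into one ranking by weighted voting.

    rankings maps a model name to its selected features, most important first.
    A feature's score is the sum over models of (model weight * feature rank weight),
    so features chosen near the top by many high-weight models score highest.
    """
    scores = defaultdict(float)
    for model, features in rankings.items():
        k = len(features)
        for rank, feature in enumerate(features):
            scores[feature] += model_weights[model] * rank_weight(rank, k)
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_n]


if __name__ == "__main__":
    # Toy rankings (3 features per model instead of 10), invented for illustration only.
    toy_rankings = {
        "lasso": ["age", "sex", "exercise"],
        "ridge": ["age", "exercise", "birth_year"],
        "filter_mi": ["sex", "age", "birth_year"],
        "svm_rfe": ["exercise", "age", "sex"],
        "random_forest": ["birth_year", "sex", "age"],
    }
    for feature, score in weighted_vote(toy_rankings, MODEL_WEIGHTS, top_n=3):
        print(f"{feature}: {score:.2f}")
```

With these toy lists, age ranks first because it sits at the top of both highest-weight models' lists and appears in every list. Swapping in a different rank-weighting function or different level weights can reorder the result, which is why the actual weights used in the paper matter.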
format Online
Article
Text
id pubmed-8293582
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8293582 2021-07-21 BMC Med Inform Decis Mak Research BioMed Central 2021-07-21
© The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title An ensemble-based feature selection framework to select risk factors of childhood obesity for policy decision making
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8293582/
https://www.ncbi.nlm.nih.gov/pubmed/34289843
http://dx.doi.org/10.1186/s12911-021-01580-0