Waist circumference prediction for epidemiological research using gradient boosted trees

Bibliographic Details
Main authors: Zhou, Weihong; Eckler, Spencer; Barszczyk, Andrew; Waese-Perlman, Alex; Wang, Yingjie; Gu, Xiaoping; Feng, Zhong-Ping; Peng, Yuzhu; Lee, Kang
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects: Technical Advance
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7944598/
https://www.ncbi.nlm.nih.gov/pubmed/33750311
http://dx.doi.org/10.1186/s12874-021-01242-9
Collection: PubMed
Abstract

BACKGROUND: Waist circumference is becoming recognized as a useful predictor of health risks in clinical research. However, clinical datasets tend to lack this measurement and self-reported values tend to be inaccurate. Predicting waist circumference from standard physical features could be a viable method for generating this information when it is missing or mitigating the impact of inaccurate self-reports. This study determined the degree to which the XGBoost advanced machine learning algorithm could build models that predict waist circumference from height, weight, calculated Body Mass Index, age, race/ethnicity and sex, whether they perform better than current models based on linear regression, and the relative importance of each feature in this prediction.

METHODS: We trained tree-based models (via XGBoost gradient boosting) and linear models (via regression) to predict waist circumference from height, weight, Body Mass Index, age, race/ethnicity and sex (n = 60,740 participants). We created 10 iterations of each model, each using 90% of the dataset for training and the remaining 10% for testing performance (this group was different for each iteration). We calculated model performance and feature importance as an average across 10 iterations. We then externally validated the ensembled version of the top model.

RESULTS: The XGBoost model predicted waist circumference with a mean bias ± standard deviation of 0.0 ± 0.04 cm and a root mean squared error of 4.7 ± 0.05 cm, with performance varying slightly by sex and race/ethnicity. The XGBoost model showed varying degrees of improvement over linear regression models. The top 3 predictors were Body Mass Index, weight and race (Asian). External validation found that on average this model overestimated waist circumference by 4.65 cm in the United Kingdom population (mainly due to overprediction in females) and underestimated waist circumference by 1.7 cm in the Chinese population. The respective root mean squared errors were 7.7 cm and 7.1 cm.

CONCLUSIONS: XGBoost-based models accurately predict waist circumference from standard physical features. Waist circumference prediction using this approach would be valuable for epidemiological research and beyond.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-021-01242-9.
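The evaluation protocol described in METHODS (10 iterations, each training on 90% of participants and testing on a different 10%, with performance averaged across iterations) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: it reads the protocol as 10-fold cross-validation with disjoint test folds; it defines mean bias as predicted minus observed, so a positive value means overestimation; and, to stay dependency-free, it substitutes a one-feature ordinary least-squares line on BMI (a member of the linear-regression baseline family) in place of XGBoost. The helper names `bmi`, `fit_line`, `bias_and_rmse`, `make_folds` and `cross_validate` are hypothetical, and the demo data are synthetic, not study data.

```python
import math
import random
from statistics import mean, stdev


def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x.

    Hypothetical stand-in for the paper's XGBoost model; returns a predict function.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def bias_and_rmse(predicted, observed):
    """Mean bias (assumed sign: predicted minus observed) and root mean squared error."""
    errors = [p - o for p, o in zip(predicted, observed)]
    bias = mean(errors)
    rmse = math.sqrt(mean(e * e for e in errors))
    return bias, rmse


def make_folds(n, k=10, seed=0):
    """Shuffle row indices and split them into k disjoint test folds (~10% each for k=10)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(xs, ys, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and repeat so each fold is tested once."""
    per_fold = []
    for test in make_folds(len(xs), k, seed):
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [model(xs[i]) for i in test]
        per_fold.append(bias_and_rmse(preds, [ys[i] for i in test]))
    biases = [b for b, _ in per_fold]
    rmses = [r for _, r in per_fold]
    # Summarize as mean ± SD across folds, the same form as "4.7 ± 0.05 cm" in RESULTS.
    return (mean(biases), stdev(biases)), (mean(rmses), stdev(rmses)), per_fold


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic participants for illustration only; the waist/BMI relation is made up.
    heights = [rng.uniform(1.50, 1.95) for _ in range(500)]
    weights = [rng.uniform(45, 120) for _ in range(500)]
    bmis = [bmi(w, h) for w, h in zip(weights, heights)]
    waists = [40 + 2.0 * b + rng.gauss(0, 5) for b in bmis]
    (bias_m, bias_sd), (rmse_m, rmse_sd), _ = cross_validate(bmis, waists)
    print(f"bias {bias_m:.2f} ± {bias_sd:.2f} cm, RMSE {rmse_m:.2f} ± {rmse_sd:.2f} cm")
```

To move closer to the tree-based setup, `fit_line` could be swapped for a function that trains a gradient boosted regressor (for example `XGBRegressor` from the `xgboost` package) on all six features and returns its `predict` method; the fold and metric logic would stay unchanged.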
Record ID: pubmed-7944598
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Med Res Methodol (Technical Advance), published by BioMed Central 2021-03-09
Rights: © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.