Gradient boosted trees with individual explanations: An alternative to logistic regression for viability prediction in the first trimester of pregnancy

BACKGROUND: Clinical models to predict first trimester viability are traditionally based on multivariable logistic regression (LR), which is not directly interpretable for non-statistical experts such as physicians. Furthermore, LR requires complete datasets and pre-established variable specifications...

Bibliographic Details
Main authors: Vaulet, Thibaut; Al-Memar, Maya; Fourie, Hanine; Bobdiwala, Shabnam; Saso, Srdjan; Pipi, Maria; Stalder, Catriona; Bennett, Phillip; Timmerman, Dirk; Bourne, Tom; De Moor, Bart
Format: Online Article Text
Language: English
Published: Elsevier Scientific Publishers 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8674730/
https://www.ncbi.nlm.nih.gov/pubmed/34808532
http://dx.doi.org/10.1016/j.cmpb.2021.106520
author Vaulet, Thibaut
Al-Memar, Maya
Fourie, Hanine
Bobdiwala, Shabnam
Saso, Srdjan
Pipi, Maria
Stalder, Catriona
Bennett, Phillip
Timmerman, Dirk
Bourne, Tom
De Moor, Bart
author_sort Vaulet, Thibaut
collection PubMed
description BACKGROUND: Clinical models to predict first trimester viability are traditionally based on multivariable logistic regression (LR), which is not directly interpretable for non-statistical experts such as physicians. Furthermore, LR requires complete datasets and pre-established variable specifications. In this study, we leveraged the internal non-linearity, feature selection and missing-value handling mechanisms of machine learning algorithms, along with a post-hoc interpretability strategy, as potential advantages over LR for clinical modeling.
METHODS: The dataset included 1154 patients with 2377 individual scans and was obtained from a prospective observational cohort study conducted at a hospital in London, UK, from March 2014 to May 2019. The data were split into a training set (70%) and a test set (30%). Parsimonious and complete multivariable models were developed from two algorithms to predict first trimester viability at 11–14 weeks gestational age (GA): LR and light gradient boosted machine (LGBM). Missing values were handled by multiple imputation where appropriate. The SHapley Additive exPlanations (SHAP) framework was applied to derive individual explanations of the models.
RESULTS: The parsimonious LGBM model had discriminative and calibration performance similar to that of the parsimonious LR model (AUC 0.885 vs 0.860; calibration slope: 1.19 vs 1.18). The complete models did not outperform the parsimonious models. LGBM was robust to the presence of missing values and, unlike LR, did not require multiple imputation. Decision path plots and feature importance analysis revealed different algorithm behaviors despite similar predictive performance. The main driving variable of the LR model was the pre-specified interaction between fetal heart presence and mean sac diameter. The crown-rump length variable and a proxy variable reflecting the difference between expected and observed GA were the two most important variables of LGBM. Finally, while variable interactions must be specified upfront with LR, several interactions learned automatically by the LGBM algorithm were ranked by the SHAP framework among the most important features.
CONCLUSIONS: Gradient boosted algorithms performed similarly to carefully crafted LR models in terms of discrimination and calibration for first trimester viability prediction. By handling multi-collinearity, missing values, feature selection and variable interactions internally, the gradient boosted trees algorithm, combined with SHAP, offers a serious alternative to traditional LR models.
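The abstract compares models by AUC (discrimination) and calibration slope. As a reading aid, the following is a minimal sketch of how these two metrics are commonly computed; it is not the authors' code. The function names `auc` and `calibration_slope` and the toy data are assumptions made for illustration, and the slope is estimated by fitting a two-parameter logistic recalibration model with Newton's method.

```python
import math


def auc(y_true, p_pred):
    """Area under the ROC curve as the probability that a randomly chosen
    positive case receives a higher prediction than a randomly chosen
    negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def calibration_slope(y_true, p_pred, n_iter=25):
    """Fit y ~ sigmoid(a + b * logit(p)) by Newton's method and return (a, b).
    b is the calibration slope; b = 1 is ideal, b < 1 suggests predictions
    that are too extreme, b > 1 predictions that are too moderate.
    Assumes every prediction lies strictly between 0 and 1."""
    x = [math.log(p / (1.0 - p)) for p in p_pred]
    a, b = 0.0, 1.0  # start from "perfectly calibrated"
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # gradient w.r.t. intercept
            g1 += (yi - mu) * xi     # gradient w.r.t. slope
            h00 += w                 # entries of X^T W X
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Hypothetical toy data: predictions of 0.1 and 0.9, but the observed
# event rates in those two groups are 1/4 and 3/4.
y = [1, 0, 0, 0, 1, 1, 1, 0]
p = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
print("AUC:", auc(y, p))
print("intercept, slope:", calibration_slope(y, p))
```

In this toy data the predictions are twice as extreme on the logit scale as the observed rates, so the fitted slope comes out at about 0.5. The slopes reported in the abstract (1.19 and 1.18) lie on the other side of 1, which corresponds to predictions that are somewhat too moderate.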
format Online
Article
Text
id pubmed-8674730
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier Scientific Publishers
record_format MEDLINE/PubMed
spelling pubmed-8674730 2022-01-01 Comput Methods Programs Biomed Article Elsevier Scientific Publishers 2022-01 /pmc/articles/PMC8674730/ /pubmed/34808532 http://dx.doi.org/10.1016/j.cmpb.2021.106520 Text en © 2021 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Gradient boosted trees with individual explanations: An alternative to logistic regression for viability prediction in the first trimester of pregnancy
topic Article