
Comparison of Machine Learning Methods and Conventional Logistic Regressions for Predicting Gestational Diabetes Using Routine Clinical Data: A Retrospective Cohort Study

BACKGROUND: Gestational diabetes mellitus (GDM) contributes to adverse pregnancy and birth outcomes. In recent decades, extensive research has been devoted to the early prediction of GDM by various methods. Machine learning methods are flexible prediction algorithms with potential advantages over co...


Bibliographic Details
Main authors: Ye, Yunzhen; Xiong, Yu; Zhou, Qiongjie; Wu, Jiangnan; Li, Xiaotian; Xiao, Xirong
Format: Online Article Text
Language: English
Published: Hindawi 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7306091/
https://www.ncbi.nlm.nih.gov/pubmed/32626780
http://dx.doi.org/10.1155/2020/4168340
author Ye, Yunzhen
Xiong, Yu
Zhou, Qiongjie
Wu, Jiangnan
Li, Xiaotian
Xiao, Xirong
collection PubMed
description BACKGROUND: Gestational diabetes mellitus (GDM) contributes to adverse pregnancy and birth outcomes. In recent decades, extensive research has been devoted to the early prediction of GDM by various methods. Machine learning methods are flexible prediction algorithms with potential advantages over conventional regression.
OBJECTIVE: The purpose of this study was to use machine learning methods to predict GDM and compare their performance with that of logistic regressions.
METHODS: We performed a retrospective, observational study including women who attended their routine first hospital visits during early pregnancy and had Down's syndrome screening at 16-20 gestational weeks in a tertiary maternity hospital in China from January 1, 2013, to December 31, 2017. A total of 22,242 singleton pregnancies were included, and 3182 (14.31%) women developed GDM. Candidate predictors included maternal demographic characteristics and medical history (maternal factors) and laboratory values at early pregnancy. The models were derived from the first 70% of the data and then validated with the next 30%. The same variables were used to train the machine learning models and the traditional logistic regression models. Eight common machine learning methods (GBDT, AdaBoost, LGB, Logistic, Vote, XGB, Decision Tree, and Random Forest) and two common regressions (stepwise logistic regression and logistic regression with restricted cubic splines, RCS) were implemented to predict the occurrence of GDM. Models were compared on discrimination and calibration metrics.
RESULTS: In the validation dataset, the machine learning and logistic regression models performed moderately (AUC 0.59-0.74). Overall, the GBDT model performed best (AUC 0.74, 95% CI 0.71-0.76) among the machine learning methods, with negligible differences between them. Fasting blood glucose, HbA1c, triglycerides, and BMI strongly contributed to GDM. A cutoff point for the predictive value at 0.3 in the GBDT model had a negative predictive value of 74.1% (95% CI 69.5%-78.2%) and a sensitivity of 90% (95% CI 88.0%-91.7%), and the cutoff point at 0.7 had a positive predictive value of 93.2% (95% CI 88.2%-96.1%) and a specificity of 99% (95% CI 98.2%-99.4%).
CONCLUSION: In this study, we found that several machine learning methods did not outperform logistic regression in predicting GDM. We developed a model with cutoff points for risk stratification of GDM.
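The abstract reports discrimination (AUC) and, for two probability cutoffs (0.3 and 0.7), sensitivity, specificity, and positive and negative predictive values. The sketch below illustrates how such figures can be computed from predicted probabilities. It is not the authors' code: the function names `cutoff_metrics` and `auc` are hypothetical, and the ten labels and probabilities are invented toy values, not study data.

```python
from typing import Dict, Sequence


def cutoff_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                   cutoff: float) -> Dict[str, float]:
    """Classify as positive when probability >= cutoff, then summarise the 2x2 table."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= cutoff
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN rather than raising when a row or column of the table is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true cases flagged
        "specificity": ratio(tn, tn + fp),  # share of non-cases not flagged
        "ppv": ratio(tp, tp + fp),          # share of flagged women who are cases
        "npv": ratio(tn, tn + fn),          # share of unflagged women who are non-cases
    }


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance a random case outranks a random non-case (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data (1 = developed GDM); purely illustrative.
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
toy_probs = [0.05, 0.10, 0.20, 0.35, 0.40, 0.25, 0.80, 0.75, 0.60, 0.28]

low = cutoff_metrics(toy_labels, toy_probs, cutoff=0.3)   # rule-out threshold
high = cutoff_metrics(toy_labels, toy_probs, cutoff=0.7)  # rule-in threshold
print("AUC:", auc(toy_labels, toy_probs))
print("cutoff 0.3:", low)
print("cutoff 0.7:", high)
```

A low cutoff trades specificity for sensitivity and a high cutoff does the reverse, which is why the abstract pairs sensitivity and negative predictive value with 0.3, and specificity and positive predictive value with 0.7.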
format Online
Article
Text
id pubmed-7306091
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-7306091 2020-07-02 J Diabetes Res Research Article Hindawi 2020-06-12 Text en Copyright © 2020 Yunzhen Ye et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Comparison of Machine Learning Methods and Conventional Logistic Regressions for Predicting Gestational Diabetes Using Routine Clinical Data: A Retrospective Cohort Study
topic Research Article