
Development of machine learning models to predict gestational diabetes risk in the first half of pregnancy

BACKGROUND: Early prediction of Gestational Diabetes Mellitus (GDM) risk is of particular importance as it may enable more efficacious interventions and reduce cumulative injury to mother and fetus. The aim of this study is to develop machine learning (ML) models for the early prediction of GDM usi...


Bibliographic Details
Main authors: Cubillos, Gabriel; Monckeberg, Max; Plaza, Alejandra; Morgan, Maria; Estevez, Pablo A.; Choolani, Mahesh; Kemp, Matthew W.; Illanes, Sebastian E.; Perez, Claudio A.
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10288662/
https://www.ncbi.nlm.nih.gov/pubmed/37353749
http://dx.doi.org/10.1186/s12884-023-05766-4
_version_ 1785062116013637632
author Cubillos, Gabriel
Monckeberg, Max
Plaza, Alejandra
Morgan, Maria
Estevez, Pablo A.
Choolani, Mahesh
Kemp, Matthew W.
Illanes, Sebastian E.
Perez, Claudio A.
collection PubMed
description BACKGROUND: Early prediction of Gestational Diabetes Mellitus (GDM) risk is of particular importance as it may enable more efficacious interventions and reduce cumulative injury to mother and fetus. The aim of this study is to develop machine learning (ML) models for the early prediction of GDM using widely available variables, facilitating early intervention and making it possible to apply the prediction models in places where there is no access to more complex examinations. METHODS: The dataset used in this study includes records from 1,611 pregnancies. Twelve different ML models and their hyperparameters were optimized to achieve early, high-performance prediction of GDM. A data augmentation method was used in training to improve prediction results. Three methods were used to select the most relevant variables for GDM prediction. After training, the models ranked highest by Area under the Receiver Operating Characteristic Curve (AUCROC) were assessed on the validation set. Models with the best results were assessed on the test set as a measure of generalization performance. RESULTS: Our method allows many possible models to be identified for various levels of sensitivity and specificity. Four models achieved a high sensitivity of 0.82, a specificity in the range 0.72–0.74, an accuracy between 0.73 and 0.75, and an AUCROC of 0.81. These models required between 7 and 12 input variables. Another possible choice could be a model with a sensitivity of 0.89 that requires just 5 variables and reaches an accuracy of 0.65, a specificity of 0.62, and an AUCROC of 0.82. CONCLUSIONS: The principal findings of our study are: early prediction of GDM in the early stages of pregnancy using routine examinations; the development and optimization of twelve different ML models and their hyperparameters to achieve the highest prediction performance; and a novel data augmentation method that allows various models to reach excellent GDM prediction results.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12884-023-05766-4.
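The sensitivity, specificity, accuracy, and AUCROC figures quoted in the abstract are standard binary-classification metrics. As a rough illustration only, the Python sketch below computes them from a small set of made-up labels and scores. The data, the 0.5 decision threshold, and the function names (`screening_metrics`, `auc_roc`) are assumptions for this example and are not taken from the study or its code.

```python
from typing import Dict, Sequence


def screening_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy from binary labels (1 = GDM, 0 = no GDM)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the rank-based, Mann-Whitney form of the AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT from the study: 3 positive and 5 negative cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.8, 0.4, 0.2, 0.1, 0.05]

# Assumed decision threshold of 0.5; lowering it trades specificity for sensitivity.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = screening_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
prevalence = sum(y_true) / len(y_true)

print(metrics)
print("AUCROC:", auc)
```

Accuracy is the prevalence-weighted average of sensitivity and specificity, so when negative cases dominate, a model tuned for high sensitivity can show a noticeably lower accuracy, which is the kind of trade-off the abstract describes between its reported models.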
format Online
Article
Text
id pubmed-10288662
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10288662 2023-06-24 Development of machine learning models to predict gestational diabetes risk in the first half of pregnancy. Cubillos, Gabriel; Monckeberg, Max; Plaza, Alejandra; Morgan, Maria; Estevez, Pablo A.; Choolani, Mahesh; Kemp, Matthew W.; Illanes, Sebastian E.; Perez, Claudio A. BMC Pregnancy Childbirth. Research. BioMed Central 2023-06-23 /pmc/articles/PMC10288662/ /pubmed/37353749 http://dx.doi.org/10.1186/s12884-023-05766-4 Text en
© The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Development of machine learning models to predict gestational diabetes risk in the first half of pregnancy
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10288662/
https://www.ncbi.nlm.nih.gov/pubmed/37353749
http://dx.doi.org/10.1186/s12884-023-05766-4