Predicting Current Glycated Hemoglobin Levels in Adults From Electronic Health Records: Validation of Multiple Logistic Regression Algorithm
BACKGROUND: Electronic health record (EHR) systems generate large datasets that can significantly enrich the development of medical predictive models. Several attempts have been made to investigate the effect of glycated hemoglobin (HbA(1c)) elevation on the prediction of diabetes onset. However, th...
Main authors: | Alhassan, Zakhriya; Budgen, David; Alshammari, Riyad; Al Moubayed, Noura |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7367516/ https://www.ncbi.nlm.nih.gov/pubmed/32618575 http://dx.doi.org/10.2196/18963 |
_version_ | 1783560436315062272 |
---|---|
author | Alhassan, Zakhriya; Budgen, David; Alshammari, Riyad; Al Moubayed, Noura |
author_sort | Alhassan, Zakhriya |
collection | PubMed |
description | BACKGROUND: Electronic health record (EHR) systems generate large datasets that can significantly enrich the development of medical predictive models. Several attempts have been made to investigate the effect of glycated hemoglobin (HbA(1c)) elevation on the prediction of diabetes onset. However, there is still a need for validation of these models using EHR data collected from different populations. OBJECTIVE: The aim of this study is to perform a replication study to validate, evaluate, and identify the strengths and weaknesses of replicating a predictive model that employed multiple logistic regression with EHR data to forecast the levels of HbA(1c). The original study used data from a population in the United States and this differentiated replication used a population in Saudi Arabia. METHODS: A total of 3 models were developed and compared with the model created in the original study. The models were trained and tested using a larger dataset from Saudi Arabia with 36,378 records. The 10-fold cross-validation approach was used for measuring the performance of the models. RESULTS: Applying the method employed in the original study achieved an accuracy of 74% to 75% when using the dataset collected from Saudi Arabia, compared with 77% obtained from using the population from the United States. The results also show a different ranking of importance for the predictors between the original study and the replication. The order of importance for the predictors with our population, from most to least important, is age, random blood sugar, estimated glomerular filtration rate, total cholesterol, non–high-density lipoprotein, and body mass index. CONCLUSIONS: This replication study shows that direct use of the models (calculators) created using multiple logistic regression to predict the level of HbA(1c) may not be appropriate for all populations. This study reveals that the weighting of the predictors needs to be calibrated to the population used. However, the study does confirm that replicating the original study using a different population can help with predicting the levels of HbA(1c) by using the predictors that are routinely collected and stored in hospital EHR systems. |
format | Online Article Text |
id | pubmed-7367516 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-7367516 2020-08-07 JMIR Med Inform Original Paper JMIR Publications 2020-07-03 /pmc/articles/PMC7367516/ /pubmed/32618575 http://dx.doi.org/10.2196/18963 Text en ©Zakhriya Alhassan, David Budgen, Riyad Alshammari, Noura Al Moubayed. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 03.07.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included. |
title | Predicting Current Glycated Hemoglobin Levels in Adults From Electronic Health Records: Validation of Multiple Logistic Regression Algorithm |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7367516/ https://www.ncbi.nlm.nih.gov/pubmed/32618575 http://dx.doi.org/10.2196/18963 |
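The abstract in this record names two techniques: multiple logistic regression and 10-fold cross-validation. The following is a minimal sketch of how the two fit together, not the authors' implementation. It uses only the Python standard library, and everything in it is an assumption made for illustration: the two unnamed numeric predictors, the synthetic labels produced by `make_synthetic`, the gradient-descent settings, and all function names. It does not use or reproduce the study's data or its reported accuracies.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(w, xi):
    # w[0] is the intercept; w[1:] are the predictor weights.
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit weights by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w, xi):
    # Classify as 1 when the predicted probability is at least 0.5.
    return 1 if sigmoid(linear_score(w, xi)) >= 0.5 else 0


def k_fold_accuracy(X, y, k=10, seed=0):
    """Mean held-out accuracy over k folds of a shuffled index."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(w, X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / k


def make_synthetic(n=300, seed=1):
    # Hypothetical data: two standardized predictors and a binary label
    # drawn from a known logistic model. Not related to the study's records.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([a, b])
        y.append(1 if rng.random() < sigmoid(1.5 * a + 1.0 * b) else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    print(f"mean 10-fold accuracy: {k_fold_accuracy(X, y):.3f}")
```

Each fold is held out once while the model is refit on the other nine, so every record contributes to exactly one test score. The fitted weights in `w[1:]` depend on the training population, which is the sense in which the abstract says predictor weighting needs to be calibrated to the population used.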