Use of artificial intelligence for public health surveillance: a case study to develop a machine Learning-algorithm to estimate the incidence of diabetes mellitus in France

BACKGROUND: The use of machine learning techniques is increasing in healthcare which allows to estimate and predict health outcomes from large administrative data sets more efficiently. The main objective of this study was to develop a generic machine learning (ML) algorithm to estimate the incidenc...

Bibliographic Details
Main Authors:	Haneef, Romana; Kab, Sofiane; Hrzic, Rok; Fuentes, Sonsoles; Fosse-Edorh, Sandrine; Cosson, Emmanuel; Gallay, Anne
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Methodology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8456679/ https://www.ncbi.nlm.nih.gov/pubmed/34551816 http://dx.doi.org/10.1186/s13690-021-00687-0
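
The abstract for this record reports the final model's performance as sensitivity, specificity and accuracy with a 95% confidence interval. As a rough illustration of how such figures are derived from a test-set confusion matrix, here is a minimal Python sketch. The counts are hypothetical, chosen only so the ratios resemble the reported ones; they are not the study's data. The normal-approximation (Wald) interval is an assumption, since the record does not say how the authors computed their interval.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Test-set counts for a binary classifier (diabetes vs. no diabetes)."""
    tp: int  # true positives: cases correctly flagged
    fn: int  # false negatives: cases missed
    tn: int  # true negatives: non-cases correctly cleared
    fp: int  # false positives: non-cases wrongly flagged

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.tn + self.fp


def sensitivity(c: ConfusionCounts) -> float:
    """Share of true cases that the model detects."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of non-cases that the model correctly clears."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all test participants classified correctly."""
    return (c.tp + c.tn) / c.total


def wald_interval(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion.

    Assumption: the source does not state which interval method was used.
    """
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts for illustration only (not taken from the study).
example = ConfusionCounts(tp=62, fn=38, tn=670, fp=330)

acc = accuracy(example)
low, high = wald_interval(acc, example.total)
print(f"sensitivity = {sensitivity(example):.2f}")
print(f"specificity = {specificity(example):.2f}")
print(f"accuracy    = {acc:.2f} [95% CI: {low:.2f}-{high:.2f}]")
```

Because non-cases usually far outnumber incident cases in a surveillance cohort, accuracy tends to track specificity closely, which is consistent with the two being reported at the same value in the abstract.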

_version_	1784570917413715968
author	Haneef, Romana; Kab, Sofiane; Hrzic, Rok; Fuentes, Sonsoles; Fosse-Edorh, Sandrine; Cosson, Emmanuel; Gallay, Anne
author_sort	Haneef, Romana
collection	PubMed
description	BACKGROUND: Machine learning techniques are increasingly used in healthcare, allowing health outcomes to be estimated and predicted from large administrative data sets more efficiently. The main objective of this study was to develop a generic machine learning (ML) algorithm to estimate the incidence of diabetes based on the number of reimbursements over the last 2 years. METHODS: We selected a final data set from a population-based epidemiological cohort (CONSTANCES) linked with the French National Health Database (SNDS). To develop this algorithm, we adopted a supervised ML approach. The following steps were performed: (i) selection of the final data set, (ii) target definition, (iii) coding of variables for a given window of time, (iv) splitting of the final data into training and test data sets, (v) variable selection, (vi) model training, (vii) validation of the model with the test data set and (viii) selection of the model. We used the area under the receiver operating characteristic curve (AUC) to select the best algorithm. RESULTS: The final data set used to develop the algorithm included 44,659 participants from CONSTANCES. Out of 3468 variables coded from SNDS linked to the CONSTANCES cohort, 23 variables were selected to train different algorithms. The final algorithm to estimate the incidence of diabetes was a Linear Discriminant Analysis model based on the number of reimbursements of selected variables related to biological tests, drugs, medical acts and hospitalization without a procedure over the last 2 years. This algorithm has a sensitivity of 62%, a specificity of 67% and an accuracy of 67% [95% CI: 0.66–0.68]. CONCLUSIONS: Supervised ML is an innovative tool for the development of new methods to exploit large health administrative databases. In the context of the InfAct project, we have developed and applied for the first time a generic ML algorithm to estimate the incidence of diabetes for public health surveillance. The ML algorithm we have developed has moderate performance. The next step is to apply this algorithm to SNDS to estimate the incidence of type 2 diabetes cases. More research is needed to apply various machine learning techniques (MLTs) to estimate the incidence of various health conditions. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13690-021-00687-0.
format	Online Article Text
id	pubmed-8456679
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8456679 2021-09-22 Arch Public Health BioMed Central 2021-09-22 /pmc/articles/PMC8456679/ /pubmed/34551816 http://dx.doi.org/10.1186/s13690-021-00687-0 Text en © The Author(s) 2021 Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Use of artificial intelligence for public health surveillance: a case study to develop a machine Learning-algorithm to estimate the incidence of diabetes mellitus in France
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8456679/ https://www.ncbi.nlm.nih.gov/pubmed/34551816 http://dx.doi.org/10.1186/s13690-021-00687-0