
Data collaboration analysis in predicting diabetes from a small amount of health checkup data

Recent studies showed that machine learning models such as gradient-boosting decision tree (GBDT) can predict diabetes with high accuracy from big data. In this study, we asked whether highly accurate prediction of diabetes is possible even from small data by expanding the amount of data through dat...


Bibliographic Details
Main Authors:	Uchitachimoto, Go; Sukegawa, Noriyoshi; Kojima, Masayuki; Kagawa, Rina; Oyama, Takashi; Okada, Yukihiko; Imakura, Akira; Sakurai, Tetsuya
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10361975/ https://www.ncbi.nlm.nih.gov/pubmed/37479701 http://dx.doi.org/10.1038/s41598-023-38932-x

author	Uchitachimoto, Go; Sukegawa, Noriyoshi; Kojima, Masayuki; Kagawa, Rina; Oyama, Takashi; Okada, Yukihiko; Imakura, Akira; Sakurai, Tetsuya
author_sort	Uchitachimoto, Go
collection	PubMed
description	Recent studies showed that machine learning models such as gradient-boosting decision tree (GBDT) can predict diabetes with high accuracy from big data. In this study, we asked whether highly accurate prediction of diabetes is possible even from small data by expanding the amount of data through data collaboration (DC) analysis, a modern framework for integrating and analyzing data accumulated at multiple institutions while ensuring confidentiality. To this end, we focused on data from two institutions: health checkup data of 1502 citizens accumulated in Tsukuba City and health history data of 1399 patients collected at the University of Tsukuba Hospital. When using only the health checkup data, the ROC-AUC and Recall for logistic regression (LR) were 0.858 ± 0.014 and 0.970 ± 0.019, respectively, while those for GBDT were 0.856 ± 0.014 and 0.983 ± 0.016, respectively. When using also the health history data through DC analysis, these values for LR improved to 0.875 ± 0.013 and 0.993 ± 0.009, respectively, while those for GBDT deteriorated because of the low compatibility with a method used for confidential data sharing (although DC analysis brought improvements). Even in a situation where health checkup data of only 324 citizens are available, the ROC-AUC and Recall for LR were 0.767 ± 0.025 and 0.867 ± 0.04, respectively, thanks to DC analysis, indicating an 11% and 12% improvement. Thus, we concluded that the answer to the above question was “Yes” for LR but “No” for GBDT for the data set tested in this study.
format	Online Article Text
id	pubmed-10361975
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-10361975 2023-07-23 Data collaboration analysis in predicting diabetes from a small amount of health checkup data Sci Rep Article
Nature Publishing Group UK 2023-07-21 /pmc/articles/PMC10361975/ /pubmed/37479701 http://dx.doi.org/10.1038/s41598-023-38932-x Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title	Data collaboration analysis in predicting diabetes from a small amount of health checkup data
