Improving an Electronic Health Record–Based Clinical Prediction Model Under Label Deficiency: Network-Based Generative Adversarial Semisupervised Approach


Bibliographic Details
Main Authors:	Li, Runze; Tian, Yu; Shen, Zhuyi; Li, Jin; Li, Jun; Ding, Kefeng; Li, Jingsong
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2023
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10337516/ https://www.ncbi.nlm.nih.gov/pubmed/37310778 http://dx.doi.org/10.2196/47862

author	Li, Runze; Tian, Yu; Shen, Zhuyi; Li, Jin; Li, Jun; Ding, Kefeng; Li, Jingsong
author_sort	Li, Runze
collection	PubMed
description	BACKGROUND: Observational biomedical studies facilitate a new strategy for large-scale electronic health record (EHR) utilization to support precision medicine. However, data label inaccessibility is an increasingly important issue in clinical prediction, despite the use of synthetic and semisupervised learning from data. Little research has aimed to uncover the underlying graphical structure of EHRs. OBJECTIVE: A network-based generative adversarial semisupervised method is proposed. The objective is to train clinical prediction models on label-deficient EHRs to achieve comparable learning performance to supervised methods. METHODS: Three public data sets and one colorectal cancer data set gathered from the Second Affiliated Hospital of Zhejiang University were selected as benchmarks. The proposed models were trained on 5% to 25% labeled data and evaluated on classification metrics against conventional semisupervised and supervised methods. The data quality, model security, and memory scalability were also evaluated. RESULTS: The proposed method for semisupervised classification outperforms related semisupervised methods under the same setup, with the average area under the receiver operating characteristics curve (AUC) reaching 0.945, 0.673, 0.611, and 0.588 for the four data sets, respectively, followed by graph-based semisupervised learning (0.450, 0.454, 0.425, and 0.5676, respectively) and label propagation (0.475, 0.344, 0.440, and 0.477, respectively). The average classification AUCs with 10% labeled data were 0.929, 0.719, 0.652, and 0.650, respectively, comparable to that of the supervised learning methods logistic regression (0.601, 0.670, 0.731, and 0.710, respectively), support vector machines (0.733, 0.720, 0.720, and 0.721, respectively), and random forests (0.982, 0.750, 0.758, and 0.740, respectively). The concerns regarding the secondary use of data and data security are alleviated by realistic data synthesis and robust privacy preservation. CONCLUSIONS: Training clinical prediction models on label-deficient EHRs is indispensable in data-driven research. The proposed method has great potential to exploit the intrinsic structure of EHRs and achieve comparable learning performance to supervised methods.
format	Online Article Text
id	pubmed-10337516
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-10337516 2023-07-13 Improving an Electronic Health Record–Based Clinical Prediction Model Under Label Deficiency: Network-Based Generative Adversarial Semisupervised Approach Li, Runze; Tian, Yu; Shen, Zhuyi; Li, Jin; Li, Jun; Ding, Kefeng; Li, Jingsong JMIR Med Inform Original Paper JMIR Publications 2023-06-13 /pmc/articles/PMC10337516/ /pubmed/37310778 http://dx.doi.org/10.2196/47862 Text en ©Runze Li, Yu Tian, Zhuyi Shen, Jin Li, Jun Li, Kefeng Ding, Jingsong Li. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 13.06.2023. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title	Improving an Electronic Health Record–Based Clinical Prediction Model Under Label Deficiency: Network-Based Generative Adversarial Semisupervised Approach
topic	Original Paper
