Improving an Electronic Health Record–Based Clinical Prediction Model Under Label Deficiency: Network-Based Generative Adversarial Semisupervised Approach

BACKGROUND: Observational biomedical studies facilitate a new strategy for large-scale electronic health record (EHR) utilization to support precision medicine. However, data label inaccessibility is an increasingly important issue in clinical prediction, despite the use of synthetic and semisupervi...

Bibliographic Details
Main authors: Li, Runze; Tian, Yu; Shen, Zhuyi; Li, Jin; Li, Jun; Ding, Kefeng; Li, Jingsong
Format: Online Article Text
Language: English
Published: JMIR Publications 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10337516/
https://www.ncbi.nlm.nih.gov/pubmed/37310778
http://dx.doi.org/10.2196/47862
author Li, Runze
Tian, Yu
Shen, Zhuyi
Li, Jin
Li, Jun
Ding, Kefeng
Li, Jingsong
author_sort Li, Runze
collection PubMed
description BACKGROUND: Observational biomedical studies facilitate a new strategy for large-scale electronic health record (EHR) utilization to support precision medicine. However, data label inaccessibility is an increasingly important issue in clinical prediction, despite the use of synthetic and semisupervised learning from data. Little research has aimed to uncover the underlying graphical structure of EHRs.
OBJECTIVE: A network-based generative adversarial semisupervised method is proposed. The objective is to train clinical prediction models on label-deficient EHRs to achieve learning performance comparable to that of supervised methods.
METHODS: Three public data sets and one colorectal cancer data set gathered from the Second Affiliated Hospital of Zhejiang University were selected as benchmarks. The proposed models were trained on 5% to 25% labeled data and evaluated on classification metrics against conventional semisupervised and supervised methods. The data quality, model security, and memory scalability were also evaluated.
RESULTS: The proposed method for semisupervised classification outperforms related semisupervised methods under the same setup, with the average area under the receiver operating characteristic curve (AUC) reaching 0.945, 0.673, 0.611, and 0.588 for the four data sets, respectively, followed by graph-based semisupervised learning (0.450, 0.454, 0.425, and 0.5676, respectively) and label propagation (0.475, 0.344, 0.440, and 0.477, respectively). The average classification AUCs with 10% labeled data were 0.929, 0.719, 0.652, and 0.650, respectively, comparable to those of the supervised learning methods: logistic regression (0.601, 0.670, 0.731, and 0.710, respectively), support vector machines (0.733, 0.720, 0.720, and 0.721, respectively), and random forests (0.982, 0.750, 0.758, and 0.740, respectively). The concerns regarding the secondary use of data and data security are alleviated by realistic data synthesis and robust privacy preservation.
CONCLUSIONS: Training clinical prediction models on label-deficient EHRs is indispensable in data-driven research. The proposed method has great potential to exploit the intrinsic structure of EHRs and achieve learning performance comparable to that of supervised methods.
format Online
Article
Text
id pubmed-10337516
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-10337516 2023-07-13
JMIR Med Inform
Original Paper
JMIR Publications 2023-06-13
/pmc/articles/PMC10337516/ /pubmed/37310778 http://dx.doi.org/10.2196/47862
Text en
©Runze Li, Yu Tian, Zhuyi Shen, Jin Li, Jun Li, Kefeng Ding, Jingsong Li. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 13.06.2023.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title Improving an Electronic Health Record–Based Clinical Prediction Model Under Label Deficiency: Network-Based Generative Adversarial Semisupervised Approach
topic Original Paper