
Study on the semi-supervised learning-based patient similarity from heterogeneous electronic medical records


Bibliographic Details
Main authors:	Wang, Ni; Huang, Yanqun; Liu, Honglei; Zhang, Zhiqiang; Wei, Lan; Fei, Xiaolu; Chen, Hui
Format:	Online Article Text
Language:	English
Published:	BioMed Central, 2021
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8323210/ https://www.ncbi.nlm.nih.gov/pubmed/34330261 http://dx.doi.org/10.1186/s12911-021-01432-x

author	Wang, Ni; Huang, Yanqun; Liu, Honglei; Zhang, Zhiqiang; Wei, Lan; Fei, Xiaolu; Chen, Hui
collection	PubMed
description	BACKGROUND: A new learning-based patient similarity measurement was proposed to measure the similarity between patients from heterogeneous electronic medical record (EMR) data.
METHODS: We first calculated feature-level similarities according to the features' attributes. A domain expert provided patient similarity scores for 30 randomly selected patients. These similarity scores and the feature-level similarities of the 30 patients formed the labeled sample set, which a semi-supervised learning algorithm used to learn patient-level similarities for all patients. We then used the k-nearest neighbor (kNN) classifier to predict four liver conditions. Predictive performance was compared in four different situations. We also compared the performance of personalized kNN models with that of other machine learning models. We assessed predictive performance by the area under the receiver operating characteristic curve (AUC), F1-score, and cross-entropy (CE) loss.
RESULTS: As the size of the random training samples increased, the kNN models that used the learned patient similarity to select near neighbors consistently outperformed those that used the Euclidean distance (all P values < 0.001). The kNN models using the learned patient similarity to identify the top k nearest neighbors from the random training samples also achieved better peak performance than those using the Euclidean distance (AUC: 0.95 vs. 0.89, F1-score: 0.84 vs. 0.67, and CE loss: 1.22 vs. 1.82). As the size of the similar training samples (the samples judged most similar by the learned patient similarity) increased, the performance of kNN models using the simple Euclidean distance to select the near neighbors degraded gradually. When the roles of the Euclidean distance and the learned patient similarity in selecting the near neighbors and the similar training samples were exchanged, the performance of the kNN models gradually increased. These two kinds of kNN models had the same peak performance: AUC 0.95, F1-score 0.84, and CE loss 1.22. Among the four reference models, the highest AUC and F1-score were 0.94 and 0.80, respectively, both lower than those of the simple and similarity-based kNN models.
CONCLUSIONS: This learning-based method opened an opportunity for similarity measurement based on heterogeneous EMR data and supported the secondary use of EMR data.
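The METHODS in the description above turn feature-level similarities into a patient-level similarity and then use it to choose the k nearest neighbors. The following plain-Python sketch illustrates that pipeline under explicit assumptions: the per-feature rules (range-normalized difference for numeric features, exact match for categorical ones), the weighted-mean combination, the toy data, and all function names are hypothetical illustrations. The fixed `weights` stand in for the patient-level similarity that the authors learn with a semi-supervised algorithm from 30 expert-scored patients; this sketch does not reproduce that learning step. The Euclidean baseline uses only the numeric columns.

```python
import math
from typing import Callable, Sequence, Tuple, Union

Value = Union[int, float, str]
Patient = Tuple[Value, ...]


def feature_similarity(a: Value, b: Value, kind: str, value_range: float = 1.0) -> float:
    """Similarity in [0, 1] for a single feature, chosen by the feature's type (assumed rules)."""
    if kind == "numeric":
        # Range-normalized absolute difference (Gower-style), floored at 0.
        return max(0.0, 1.0 - abs(float(a) - float(b)) / value_range)
    if kind == "categorical":
        # 1 for an exact category match, 0 otherwise.
        return 1.0 if a == b else 0.0
    raise ValueError(f"unknown feature kind: {kind!r}")


def make_patient_similarity(
    kinds: Sequence[str], ranges: Sequence[float], weights: Sequence[float]
) -> Callable[[Patient, Patient], float]:
    """Hypothetical patient-level similarity: weighted mean of feature-level similarities."""
    total = sum(weights)

    def similarity(p: Patient, q: Patient) -> float:
        return sum(
            w * feature_similarity(a, b, k, r)
            for a, b, k, r, w in zip(p, q, kinds, ranges, weights)
        ) / total

    return similarity


def euclidean_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Negative Euclidean distance, so that larger values mean 'more similar'."""
    return -math.dist(p, q)


def knn_predict_proba(
    query: Sequence[Value],
    train: Sequence[Sequence[Value]],
    labels: Sequence[int],
    k: int,
    similarity: Callable,
) -> float:
    """Fraction of positive labels among the k training patients most similar to `query`."""
    ranked = sorted(range(len(train)), key=lambda i: similarity(query, train[i]), reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k


# Hypothetical toy data: (age, lab_value, sex); label 1 = condition present.
kinds = ["numeric", "numeric", "categorical"]
ranges = [60.0, 200.0, 1.0]   # assumed value ranges; ignored for the categorical feature
weights = [0.2, 0.5, 0.3]     # hypothetical stand-in for the learned similarity
train = [(45, 40.0, "M"), (50, 180.0, "F"), (30, 35.0, "F"), (62, 150.0, "M")]
labels = [0, 1, 0, 1]
query = (55, 160.0, "M")

learned = make_patient_similarity(kinds, ranges, weights)
p_learned = knn_predict_proba(query, train, labels, k=2, similarity=learned)
# Euclidean baseline on the numeric columns only (age, lab_value), without scaling.
p_euclid = knn_predict_proba(
    query[:2], [t[:2] for t in train], labels, k=2, similarity=euclidean_similarity
)
print(p_learned, p_euclid)  # 0.5 1.0
```

In this toy data the two similarity functions disagree on the second-nearest neighbor: the weighted similarity penalizes the sex mismatch of training patient 1, while the unscaled Euclidean distance is dominated by the lab value. The k = 2 predictions therefore differ. This shows the mechanism only and says nothing about which measure performs better.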
format	Online Article Text
id	pubmed-8323210
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8323210 (2021-07-30). BMC Med Inform Decis Mak, Research. BioMed Central, 2021-07-30. /pmc/articles/PMC8323210/ /pubmed/34330261 http://dx.doi.org/10.1186/s12911-021-01432-x. © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Study on the semi-supervised learning-based patient similarity from heterogeneous electronic medical records
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8323210/ https://www.ncbi.nlm.nih.gov/pubmed/34330261 http://dx.doi.org/10.1186/s12911-021-01432-x
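The description reports AUC, F1-score, and CE loss without stating the classification threshold or how results are aggregated over the four liver conditions. As a hedged, dependency-free sketch, the code below computes the standard binary definitions for a single condition, assuming a 0.5 threshold for F1 and mean binary cross-entropy. The function names and toy values are hypothetical, and this is not the authors' evaluation code.

```python
import math
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """F1 of the predictions obtained by thresholding scores (0.5 is an assumed threshold)."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(pred, y_true) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(pred, y_true) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(pred, y_true) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def binary_cross_entropy(y_true: Sequence[int], probs: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; probabilities are clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical labels and kNN neighbor-vote fractions for one condition.
y_true = [1, 0, 1, 0]
probs = [0.8, 0.3, 0.6, 0.6]
print(roc_auc(y_true, probs), f1_score(y_true, probs))  # 0.875 0.8
print(binary_cross_entropy(y_true, probs))
```

Clipping matters for kNN outputs in particular: a neighbor-vote fraction can be exactly 0 or 1, and without clipping a single confidently wrong prediction would make the cross-entropy infinite.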
