
Unsupervised anomaly detection of implausible electronic health records: a real-world evaluation in cancer registries

BACKGROUND: Cancer registries collect patient-specific information about cancer diseases. The collected information is verified and made available to clinical researchers, physicians, and patients. When processing information, cancer registries verify that the patient-specific records they collect a...


Bibliographic Details
Main Authors:	Röchner, Philipp, Rothlauf, Franz
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10207857/ https://www.ncbi.nlm.nih.gov/pubmed/37226114 http://dx.doi.org/10.1186/s12874-023-01946-0

_version_	1785046546197250048
author	Röchner, Philipp Rothlauf, Franz
collection	PubMed
description	BACKGROUND: Cancer registries collect patient-specific information about cancer diseases. The collected information is verified and made available to clinical researchers, physicians, and patients. When processing information, cancer registries verify that the patient-specific records they collect are plausible. This means that the collected information about a particular patient makes medical sense. METHODS: Unsupervised machine learning approaches can detect implausible electronic health records without human guidance. Therefore, this article investigates two unsupervised anomaly detection approaches, a pattern-based approach (FindFPOF) and a compression-based approach (autoencoder), to identify implausible electronic health records in cancer registries. Unlike most existing work that analyzes synthetic anomalies, we compare the performance of both approaches and a baseline (random selection of records) on a real-world dataset. The dataset contains 21,104 electronic health records of patients with breast, colorectal, and prostate tumors. Each record consists of 16 categorical variables describing the disease, the patient, and the diagnostic procedure. The samples identified by FindFPOF, the autoencoder, and a random selection (a total of 785 different records) are evaluated in a real-world scenario by medical domain experts. RESULTS: Both anomaly detection methods are good at detecting implausible electronic health records. First, domain experts identified [Formula: see text] of 300 randomly selected records as implausible. With FindFPOF and the autoencoder, [Formula: see text] of the proposed 300 records in each sample were implausible. This corresponds to a precision of [Formula: see text] for FindFPOF and the autoencoder. Second, for 300 randomly selected records that were labeled by domain experts, the sensitivity of the autoencoder was [Formula: see text] and the sensitivity of FindFPOF was [Formula: see text]. Both anomaly detection methods had a specificity of [Formula: see text]. Third, FindFPOF and the autoencoder suggested samples with a different distribution of values than the overall dataset. For example, both anomaly detection methods suggested a higher proportion of colorectal records, the tumor localization with the highest percentage of implausible records in a randomly selected sample. CONCLUSIONS: Unsupervised anomaly detection can significantly reduce the manual effort of domain experts to find implausible electronic health records in cancer registries. In our experiments, the manual effort was reduced by a factor of approximately 3.5 compared to evaluating a random sample.
format	Online Article Text
id	pubmed-10207857
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-10207857 2023-05-25 Unsupervised anomaly detection of implausible electronic health records: a real-world evaluation in cancer registries Röchner, Philipp Rothlauf, Franz BMC Med Res Methodol Research Article BioMed Central 2023-05-24 /pmc/articles/PMC10207857/ /pubmed/37226114 http://dx.doi.org/10.1186/s12874-023-01946-0 Text en © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Unsupervised anomaly detection of implausible electronic health records: a real-world evaluation in cancer registries
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10207857/ https://www.ncbi.nlm.nih.gov/pubmed/37226114 http://dx.doi.org/10.1186/s12874-023-01946-0
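
The abstract reports precision, sensitivity, and specificity for both detectors and an effort reduction relative to random sampling, but the numeric values are elided in this record ("[Formula: see text]"). The sketch below only illustrates how such figures are conventionally computed from expert labels. The class name `Confusion`, the function names, and every count are hypothetical and are not taken from the article. Reading the effort reduction as the ratio of detector precision to the implausibility rate in a random sample is an assumption about the intended meaning, not a statement of the authors' definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts for one detector on an expert-labeled sample (hypothetical)."""
    tp: int  # flagged by the detector, judged implausible by experts
    fp: int  # flagged by the detector, judged plausible by experts
    fn: int  # not flagged, judged implausible by experts
    tn: int  # not flagged, judged plausible by experts


def precision(c: Confusion) -> float:
    """Share of flagged records that experts judged implausible."""
    return c.tp / (c.tp + c.fp)


def sensitivity(c: Confusion) -> float:
    """Share of implausible records that the detector flagged."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """Share of plausible records that the detector left unflagged."""
    return c.tn / (c.tn + c.fp)


def effort_reduction(hits_detector: int, n_detector: int,
                     hits_random: int, n_random: int) -> float:
    """Assumed reading: detector hit rate divided by random-sample hit rate.

    An expert reviews about 1 / hit_rate records per implausible record
    found, so this ratio is how many times fewer records need review.
    """
    return (hits_detector / n_detector) / (hits_random / n_random)


if __name__ == "__main__":
    # Made-up counts for illustration only; not the article's results.
    example = Confusion(tp=8, fp=12, fn=2, tn=278)
    print(f"precision   = {precision(example):.3f}")
    print(f"sensitivity = {sensitivity(example):.3f}")
    print(f"specificity = {specificity(example):.3f}")
    print(f"reduction   = {effort_reduction(30, 100, 10, 100):.2f}")
```

With these invented counts, a detector that yields 30 implausible records per 100 reviewed, against 10 per 100 in a random sample, would cut the review effort per finding by a factor of about 3.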

