Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records

BACKGROUND: Data mining of electronic health records (EHRs) has a huge potential for improving clinical decision support and to help healthcare deliver precision medicine. Unfortunately, the rule-based and machine learning-based approaches used for natural language processing (NLP) in healthcare tod...

Bibliographic Details
Main Authors:	Berge, Geir Thore; Granmo, Ole-Christoffer; Tveit, Tor Oddbjørn; Ruthjersen, Anna Linda; Sharma, Jivitesh
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10507898/ https://www.ncbi.nlm.nih.gov/pubmed/37723446 http://dx.doi.org/10.1186/s12911-023-02271-8

_version_	1785107411683508224
author	Berge, Geir Thore; Granmo, Ole-Christoffer; Tveit, Tor Oddbjørn; Ruthjersen, Anna Linda; Sharma, Jivitesh
author_facet	Berge, Geir Thore; Granmo, Ole-Christoffer; Tveit, Tor Oddbjørn; Ruthjersen, Anna Linda; Sharma, Jivitesh
author_sort	Berge, Geir Thore
collection	PubMed
description	BACKGROUND: Data mining of electronic health records (EHRs) has a huge potential for improving clinical decision support and to help healthcare deliver precision medicine. Unfortunately, the rule-based and machine learning-based approaches used for natural language processing (NLP) in healthcare today all struggle with various shortcomings related to performance, efficiency, or transparency. METHODS: In this paper, we address these issues by presenting a novel method for NLP that implements unsupervised learning of word embeddings, semi-supervised learning for simplified and accelerated clinical vocabulary and concept building, and deterministic rules for fine-grained control of information extraction. The clinical language is automatically learnt, and vocabulary, concepts, and rules supporting a variety of NLP downstream tasks can further be built with only minimal manual feature engineering and tagging required from clinical experts. Together, these steps create an open processing pipeline that gradually refines the data in a transparent way, which greatly improves the interpretable nature of our method. Data transformations are thus made transparent and predictions interpretable, which is imperative for healthcare. The combined method also has other advantages, like potentially being language independent, demanding few domain resources for maintenance, and able to cover misspellings, abbreviations, and acronyms. To test and evaluate the combined method, we have developed a clinical decision support system (CDSS) named Information System for Clinical Concept Searching (ICCS) that implements the method for clinical concept tagging, extraction, and classification. RESULTS: In empirical studies the method shows high performance (recall 92.6%, precision 88.8%, F-measure 90.7%), and has demonstrated its value to clinical practice. 
Here we employ a real-life EHR-derived dataset to evaluate the method’s performance on the task of classification (i.e., detecting patient allergies) against a range of common supervised learning algorithms. The combined method achieves state-of-the-art performance compared to the alternative methods we evaluate. We also perform a qualitative analysis of common word embedding methods on the task of word similarity to examine their potential for supporting automatic feature engineering for clinical NLP tasks. CONCLUSIONS: Based on the promising results, we suggest more research should be aimed at exploiting the inherent synergies between unsupervised, supervised, and rule-based paradigms for clinical NLP. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-023-02271-8.
format	Online Article Text
id	pubmed-10507898
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-10507898 2023-09-20 Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records Berge, Geir Thore; Granmo, Ole-Christoffer; Tveit, Tor Oddbjørn; Ruthjersen, Anna Linda; Sharma, Jivitesh BMC Med Inform Decis Mak Research Article BioMed Central 2023-09-18 /pmc/articles/PMC10507898/ /pubmed/37723446 http://dx.doi.org/10.1186/s12911-023-02271-8 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research Article Berge, Geir Thore Granmo, Ole-Christoffer Tveit, Tor Oddbjørn Ruthjersen, Anna Linda Sharma, Jivitesh Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records
title	Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10507898/ https://www.ncbi.nlm.nih.gov/pubmed/37723446 http://dx.doi.org/10.1186/s12911-023-02271-8
work_keys_str_mv	AT bergegeirthore combiningunsupervisedsupervisedandrulebasedlearningthecaseofdetectingpatientallergiesinelectronichealthrecords AT granmoolechristoffer combiningunsupervisedsupervisedandrulebasedlearningthecaseofdetectingpatientallergiesinelectronichealthrecords AT tveittoroddbjørn combiningunsupervisedsupervisedandrulebasedlearningthecaseofdetectingpatientallergiesinelectronichealthrecords AT ruthjersenannalinda combiningunsupervisedsupervisedandrulebasedlearningthecaseofdetectingpatientallergiesinelectronichealthrecords AT sharmajivitesh combiningunsupervisedsupervisedandrulebasedlearningthecaseofdetectingpatientallergiesinelectronichealthrecords
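
The description field above reports recall 92.6%, precision 88.8%, and F-measure 90.7%. The short Python sketch below checks that the three figures are mutually consistent. It assumes the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` and its `beta` parameter are illustrative and do not come from the article or from ICCS.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Figures quoted in the description field of this record.
reported_precision = 0.888
reported_recall = 0.926

# Assumption: the abstract's "F-measure" is the balanced F1 score (beta = 1).
f1 = f_measure(reported_precision, reported_recall)
print(f"F-measure: {f1:.1%}")
```

Under that assumption the computed value rounds to the 90.7% given in the record, so the three reported metrics agree with one another.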
