
Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records

Bibliographic details
Main authors: Berge, Geir Thore, Granmo, Ole-Christoffer, Tveit, Tor Oddbjørn, Ruthjersen, Anna Linda, Sharma, Jivitesh
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10507898/
https://www.ncbi.nlm.nih.gov/pubmed/37723446
http://dx.doi.org/10.1186/s12911-023-02271-8
collection PubMed
description BACKGROUND: Data mining of electronic health records (EHRs) has huge potential for improving clinical decision support and for helping healthcare deliver precision medicine. Unfortunately, the rule-based and machine learning-based approaches used for natural language processing (NLP) in healthcare today all struggle with shortcomings related to performance, efficiency, or transparency.
METHODS: In this paper, we address these issues by presenting a novel method for NLP that implements unsupervised learning of word embeddings, semi-supervised learning for simplified and accelerated clinical vocabulary and concept building, and deterministic rules for fine-grained control of information extraction. The clinical language is learnt automatically, and vocabulary, concepts, and rules supporting a variety of downstream NLP tasks can then be built with minimal manual feature engineering and tagging by clinical experts. Together, these steps create an open processing pipeline that gradually refines the data in a transparent way, which greatly improves the interpretability of our method. Data transformations are thus made transparent and predictions interpretable, which is imperative in healthcare. The combined method also has other advantages, such as potentially being language-independent, requiring few domain resources for maintenance, and being able to cover misspellings, abbreviations, and acronyms. To test and evaluate the combined method, we have developed a clinical decision support system (CDSS) named Information System for Clinical Concept Searching (ICCS) that implements the method for clinical concept tagging, extraction, and classification.
RESULTS: In empirical studies, the method shows high performance (recall 92.6%, precision 88.8%, F-measure 90.7%) and has demonstrated its value to clinical practice. Here we employ a real-life EHR-derived dataset to evaluate the method's performance on a classification task (i.e., detecting patient allergies) against a range of common supervised learning algorithms. The combined method achieves state-of-the-art performance compared to the alternative methods we evaluate. We also perform a qualitative analysis of common word embedding methods on the task of word similarity to examine their potential for supporting automatic feature engineering for clinical NLP tasks.
CONCLUSIONS: Based on the promising results, we suggest that more research be aimed at exploiting the inherent synergies between the unsupervised, supervised, and rule-based paradigms for clinical NLP.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-023-02271-8.
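As a quick consistency check on the reported figures: the balanced F-measure (F1) is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below assumes that the reported F-measure is this balanced F1 computed from the same precision and recall values; the helper name `f_measure` is hypothetical and is not part of ICCS or the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score, the weighted harmonic mean of precision and recall.

    beta = 1.0 gives the balanced F1 score; beta > 1 weights recall more heavily.
    Hypothetical helper for illustration only, not taken from the paper or ICCS.
    """
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        # Precision and recall are both zero: define F as 0 to avoid division by zero.
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Precision and recall as reported in the abstract, expressed as fractions.
precision, recall = 0.888, 0.926
print(f"F-measure: {f_measure(precision, recall):.1%}")  # F-measure: 90.7%
```

With P = 0.888 and R = 0.926 this gives about 0.9066, which rounds to the reported 90.7%, so the three figures are mutually consistent under the F1 assumption.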
id pubmed-10507898
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
BMC Med Inform Decis Mak, Research Article. BioMed Central, published 2023-09-18.
© The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.