Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records
BACKGROUND: Data mining of electronic health records (EHRs) has a huge potential for improving clinical decision support and to help healthcare deliver precision medicine. Unfortunately, the rule-based and machine learning-based approaches used for natural language processing (NLP) in healthcare tod...
Main authors: | Berge, Geir Thore; Granmo, Ole-Christoffer; Tveit, Tor Oddbjørn; Ruthjersen, Anna Linda; Sharma, Jivitesh |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10507898/ https://www.ncbi.nlm.nih.gov/pubmed/37723446 http://dx.doi.org/10.1186/s12911-023-02271-8 |
_version_ | 1785107411683508224 |
---|---|
author | Berge, Geir Thore Granmo, Ole-Christoffer Tveit, Tor Oddbjørn Ruthjersen, Anna Linda Sharma, Jivitesh |
author_sort | Berge, Geir Thore |
collection | PubMed |
description | BACKGROUND: Data mining of electronic health records (EHRs) has a huge potential for improving clinical decision support and to help healthcare deliver precision medicine. Unfortunately, the rule-based and machine learning-based approaches used for natural language processing (NLP) in healthcare today all struggle with various shortcomings related to performance, efficiency, or transparency. METHODS: In this paper, we address these issues by presenting a novel method for NLP that implements unsupervised learning of word embeddings, semi-supervised learning for simplified and accelerated clinical vocabulary and concept building, and deterministic rules for fine-grained control of information extraction. The clinical language is automatically learnt, and vocabulary, concepts, and rules supporting a variety of NLP downstream tasks can further be built with only minimal manual feature engineering and tagging required from clinical experts. Together, these steps create an open processing pipeline that gradually refines the data in a transparent way, which greatly improves the interpretable nature of our method. Data transformations are thus made transparent and predictions interpretable, which is imperative for healthcare. The combined method also has other advantages, like potentially being language independent, demanding few domain resources for maintenance, and able to cover misspellings, abbreviations, and acronyms. To test and evaluate the combined method, we have developed a clinical decision support system (CDSS) named Information System for Clinical Concept Searching (ICCS) that implements the method for clinical concept tagging, extraction, and classification. RESULTS: In empirical studies the method shows high performance (recall 92.6%, precision 88.8%, F-measure 90.7%), and has demonstrated its value to clinical practice. 
Here we employ a real-life EHR-derived dataset to evaluate the method’s performance on the task of classification (i.e., detecting patient allergies) against a range of common supervised learning algorithms. The combined method achieves state-of-the-art performance compared to the alternative methods we evaluate. We also perform a qualitative analysis of common word embedding methods on the task of word similarity to examine their potential for supporting automatic feature engineering for clinical NLP tasks. CONCLUSIONS: Based on the promising results, we suggest more research should be aimed at exploiting the inherent synergies between unsupervised, supervised, and rule-based paradigms for clinical NLP. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-023-02271-8. |
format | Online Article Text |
id | pubmed-10507898 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-105078982023-09-20 Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records Berge, Geir Thore Granmo, Ole-Christoffer Tveit, Tor Oddbjørn Ruthjersen, Anna Linda Sharma, Jivitesh BMC Med Inform Decis Mak Research Article BACKGROUND: Data mining of electronic health records (EHRs) has a huge potential for improving clinical decision support and to help healthcare deliver precision medicine. Unfortunately, the rule-based and machine learning-based approaches used for natural language processing (NLP) in healthcare today all struggle with various shortcomings related to performance, efficiency, or transparency. METHODS: In this paper, we address these issues by presenting a novel method for NLP that implements unsupervised learning of word embeddings, semi-supervised learning for simplified and accelerated clinical vocabulary and concept building, and deterministic rules for fine-grained control of information extraction. The clinical language is automatically learnt, and vocabulary, concepts, and rules supporting a variety of NLP downstream tasks can further be built with only minimal manual feature engineering and tagging required from clinical experts. Together, these steps create an open processing pipeline that gradually refines the data in a transparent way, which greatly improves the interpretable nature of our method. Data transformations are thus made transparent and predictions interpretable, which is imperative for healthcare. The combined method also has other advantages, like potentially being language independent, demanding few domain resources for maintenance, and able to cover misspellings, abbreviations, and acronyms. 
To test and evaluate the combined method, we have developed a clinical decision support system (CDSS) named Information System for Clinical Concept Searching (ICCS) that implements the method for clinical concept tagging, extraction, and classification. RESULTS: In empirical studies the method shows high performance (recall 92.6%, precision 88.8%, F-measure 90.7%), and has demonstrated its value to clinical practice. Here we employ a real-life EHR-derived dataset to evaluate the method’s performance on the task of classification (i.e., detecting patient allergies) against a range of common supervised learning algorithms. The combined method achieves state-of-the-art performance compared to the alternative methods we evaluate. We also perform a qualitative analysis of common word embedding methods on the task of word similarity to examine their potential for supporting automatic feature engineering for clinical NLP tasks. CONCLUSIONS: Based on the promising results, we suggest more research should be aimed at exploiting the inherent synergies between unsupervised, supervised, and rule-based paradigms for clinical NLP. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-023-02271-8. BioMed Central 2023-09-18 /pmc/articles/PMC10507898/ /pubmed/37723446 http://dx.doi.org/10.1186/s12911-023-02271-8 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.
If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Article Berge, Geir Thore Granmo, Ole-Christoffer Tveit, Tor Oddbjørn Ruthjersen, Anna Linda Sharma, Jivitesh Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records |
title | Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records |
title_sort | combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10507898/ https://www.ncbi.nlm.nih.gov/pubmed/37723446 http://dx.doi.org/10.1186/s12911-023-02271-8 |
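
The abstract in this record reports recall 92.6%, precision 88.8%, and F-measure 90.7%. As a quick consistency check, the sketch below recomputes the F-measure from the other two figures. It assumes the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the record does not state explicitly; the function name `f_measure` and its `beta` parameter are illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the harmonic mean (F1).

    Assumption: the abstract's "F-measure" is the balanced F1 score.
    """
    if precision <= 0.0 and recall <= 0.0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures as reported in the abstract.
precision = 0.888
recall = 0.926

f1 = f_measure(precision, recall)
print(f"F1 = {f1:.1%}")
```

Under that assumption, the recomputed value rounds to the 90.7% stated in the abstract, so the three reported figures are mutually consistent.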