Care episode retrieval: distributional semantic models for information retrieval in the clinical domain
Patients' health-related information is stored in electronic health records (EHRs) by health service providers. These records include sequential documentation of care episodes in the form of clinical notes. EHRs are used throughout the health care sector by professionals, administrators and pat...
Main authors: | Moen, Hans; Ginter, Filip; Marsi, Erwin; Peltonen, Laura-Maria; Salakoski, Tapio; Salanterä, Sanna |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4474584/ https://www.ncbi.nlm.nih.gov/pubmed/26099735 http://dx.doi.org/10.1186/1472-6947-15-S2-S2 |
_version_ | 1782377297025695744 |
---|---|
author | Moen, Hans Ginter, Filip Marsi, Erwin Peltonen, Laura-Maria Salakoski, Tapio Salanterä, Sanna |
author_sort | Moen, Hans |
collection | PubMed |
description | Patients' health-related information is stored in electronic health records (EHRs) by health service providers. These records include sequential documentation of care episodes in the form of clinical notes. EHRs are used throughout the health care sector by professionals, administrators and patients, primarily for clinical purposes, but also for secondary purposes such as decision support and research. The vast amounts of information in EHR systems complicate information management and increase the risk of information overload. Therefore, clinicians and researchers need new tools to manage the information stored in the EHRs. A common use case is, given a - possibly unfinished - care episode, to retrieve the most similar care episodes among the records. This paper presents several methods for information retrieval, focusing on care episode retrieval, based on textual similarity, where similarity is measured through domain-specific modelling of the distributional semantics of words. Models include variants of random indexing and the semantic neural network model word2vec. Two novel methods are introduced that utilize the ICD-10 codes attached to care episodes to better induce domain-specificity in the semantic model. We report on experimental evaluation of care episode retrieval that circumvents the lack of human judgements regarding episode relevance. Results suggest that several of the methods proposed outperform a state-of-the-art search engine (Lucene) on the retrieval task. |
format | Online Article Text |
id | pubmed-4474584 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4474584 2015-06-25 Care episode retrieval: distributional semantic models for information retrieval in the clinical domain Moen, Hans Ginter, Filip Marsi, Erwin Peltonen, Laura-Maria Salakoski, Tapio Salanterä, Sanna BMC Med Inform Decis Mak Proceedings BioMed Central 2015-06-15 /pmc/articles/PMC4474584/ /pubmed/26099735 http://dx.doi.org/10.1186/1472-6947-15-S2-S2 Text en Copyright © 2015 Moen et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Care episode retrieval: distributional semantic models for information retrieval in the clinical domain |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4474584/ https://www.ncbi.nlm.nih.gov/pubmed/26099735 http://dx.doi.org/10.1186/1472-6947-15-S2-S2 |