ContextD: an algorithm to identify contextual properties of medical terms in a Dutch clinical corpus

BACKGROUND: In order to extract meaningful information from electronic medical records, such as signs and symptoms, diagnoses, and treatments, it is important to take into account the contextual properties of the identified information: negation, temporality, and experiencer. Most work on automatic...

Bibliographic Details
Main Authors: Afzal, Zubair, Pons, Ewoud, Kang, Ning, Sturkenboom, Miriam CJM, Schuemie, Martijn J, Kors, Jan A
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4264258/
https://www.ncbi.nlm.nih.gov/pubmed/25432799
http://dx.doi.org/10.1186/s12859-014-0373-3
author Afzal, Zubair
Pons, Ewoud
Kang, Ning
Sturkenboom, Miriam CJM
Schuemie, Martijn J
Kors, Jan A
collection PubMed
description BACKGROUND: In order to extract meaningful information from electronic medical records, such as signs and symptoms, diagnoses, and treatments, it is important to take into account the contextual properties of the identified information: negation, temporality, and experiencer. Most work on automatic identification of these contextual properties has been done on English clinical text. This study presents ContextD, an adaptation of the English ConText algorithm to the Dutch language, and a Dutch clinical corpus. We created a Dutch clinical corpus containing four types of anonymized clinical documents: entries from general practitioners, specialists’ letters, radiology reports, and discharge letters. Using a Dutch list of medical terms extracted from the Unified Medical Language System, we identified medical terms in the corpus with exact matching. The identified terms were annotated for negation, temporality, and experiencer properties. To adapt the ConText algorithm, we translated English trigger terms to Dutch and added several general and document-specific enhancements, such as negation rules for general practitioners’ entries and a regular-expression-based temporality module. RESULTS: The ContextD algorithm utilized 41 unique triggers to identify the contextual properties in the clinical corpus. For the negation property, the algorithm obtained an F-score from 87% to 93% for the different document types. For the experiencer property, the F-score was 99% to 100%. For the historical and hypothetical values of the temporality property, F-scores ranged from 26% to 54% and from 13% to 44%, respectively. CONCLUSIONS: ContextD showed good performance in identifying negation and experiencer property values across all Dutch clinical document types. Accurate identification of the temporality property proved to be difficult and requires further work. The anonymized and annotated Dutch clinical corpus can serve as a useful resource for further algorithm development. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0373-3) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4264258
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4264258 2014-12-13 ContextD: an algorithm to identify contextual properties of medical terms in a Dutch clinical corpus Afzal, Zubair Pons, Ewoud Kang, Ning Sturkenboom, Miriam CJM Schuemie, Martijn J Kors, Jan A BMC Bioinformatics Research Article BACKGROUND: In order to extract meaningful information from electronic medical records, such as signs and symptoms, diagnoses, and treatments, it is important to take into account the contextual properties of the identified information: negation, temporality, and experiencer. Most work on automatic identification of these contextual properties has been done on English clinical text. This study presents ContextD, an adaptation of the English ConText algorithm to the Dutch language, and a Dutch clinical corpus. We created a Dutch clinical corpus containing four types of anonymized clinical documents: entries from general practitioners, specialists’ letters, radiology reports, and discharge letters. Using a Dutch list of medical terms extracted from the Unified Medical Language System, we identified medical terms in the corpus with exact matching. The identified terms were annotated for negation, temporality, and experiencer properties. To adapt the ConText algorithm, we translated English trigger terms to Dutch and added several general and document-specific enhancements, such as negation rules for general practitioners’ entries and a regular-expression-based temporality module. RESULTS: The ContextD algorithm utilized 41 unique triggers to identify the contextual properties in the clinical corpus. For the negation property, the algorithm obtained an F-score from 87% to 93% for the different document types. For the experiencer property, the F-score was 99% to 100%. For the historical and hypothetical values of the temporality property, F-scores ranged from 26% to 54% and from 13% to 44%, respectively. CONCLUSIONS: ContextD showed good performance in identifying negation and experiencer property values across all Dutch clinical document types. Accurate identification of the temporality property proved to be difficult and requires further work. The anonymized and annotated Dutch clinical corpus can serve as a useful resource for further algorithm development. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0373-3) contains supplementary material, which is available to authorized users. BioMed Central 2014-11-29 /pmc/articles/PMC4264258/ /pubmed/25432799 http://dx.doi.org/10.1186/s12859-014-0373-3 Text en © Afzal et al.; licensee BioMed Central Ltd. 2014 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title ContextD: an algorithm to identify contextual properties of medical terms in a Dutch clinical corpus
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4264258/
https://www.ncbi.nlm.nih.gov/pubmed/25432799
http://dx.doi.org/10.1186/s12859-014-0373-3