Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES

BACKGROUND: Information in Electronic Health Records is largely stored as unstructured free text. Natural language processing (NLP), or Medical Language Processing (MLP) in medicine, aims at extracting structured information from free text, and is less expensive and time-consuming than manual extrac...

Bibliographic Details
Main Authors: Kersloot, Martijn G., Lau, Francis, Abu-Hanna, Ameen, Arts, Derk L., Cornet, Ronald
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6749652/
https://www.ncbi.nlm.nih.gov/pubmed/31533810
http://dx.doi.org/10.1186/s13326-019-0207-3
_version_ 1783452321428013056
author Kersloot, Martijn G.
Lau, Francis
Abu-Hanna, Ameen
Arts, Derk L.
Cornet, Ronald
collection PubMed
description BACKGROUND: Information in Electronic Health Records is largely stored as unstructured free text. Natural language processing (NLP), or Medical Language Processing (MLP) in medicine, aims at extracting structured information from free text, and is less expensive and time-consuming than manual extraction. However, most algorithms in MLP are institution-specific or address only one clinical need, and thus cannot be broadly applied. In addition, most MLP systems do not detect concepts in misspelled text and cannot detect attribute relationships between concepts. The objective of this study was to develop and evaluate an MLP application that includes generic algorithms for the detection of (misspelled) concepts and of attribute relationships between them.
METHODS: An implementation of the MLP system cTAKES, called DIRECT, was developed with generic SNOMED CT concept filter, concept relationship detection, and attribute relationship detection algorithms and a custom dictionary. Four implementations of cTAKES were evaluated by comparing 98 manually annotated oncology charts with the output of DIRECT. The F1-score was determined for named-entity recognition and attribute relationship detection for the concepts ‘lung cancer’, ‘non-small cell lung cancer’, and ‘recurrence’. The performance of the four implementations was compared with a two-tailed permutation test.
RESULTS: DIRECT detected lung cancer and non-small cell lung cancer concepts with F1-scores between 0.828 and 0.947 and between 0.862 and 0.933, respectively. The concept recurrence was detected with a significantly higher F1-score of 0.921, compared to the other implementations, and the relationship between recurrence and lung cancer with an F1-score of 0.857. The precisions of the detection of lung cancer, non-small cell lung cancer, and recurrence concepts were 1.000, 0.966, and 0.879, compared to precisions of 0.943, 0.967, and 0.000 in the original implementation, respectively.
CONCLUSION: DIRECT can detect oncology concepts and attribute relationships with high precision and can detect recurrence with a significant increase in F1-score, compared to the original implementation of cTAKES, due to the usage of a custom dictionary and a generic concept relationship detection algorithm. These concepts and relationships can be used to encode clinical narratives, and can thus substantially reduce manual chart abstraction efforts, saving time for clinicians and researchers.
format Online
Article
Text
id pubmed-6749652
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6749652 2019-09-23 Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES Kersloot, Martijn G. Lau, Francis Abu-Hanna, Ameen Arts, Derk L. Cornet, Ronald J Biomed Semantics Research BioMed Central 2019-09-18 /pmc/articles/PMC6749652/ /pubmed/31533810 http://dx.doi.org/10.1186/s13326-019-0207-3 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6749652/
https://www.ncbi.nlm.nih.gov/pubmed/31533810
http://dx.doi.org/10.1186/s13326-019-0207-3
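
Note on the evaluation measures named in the description: the abstract reports F1-scores and a two-tailed permutation test but does not say how either was computed. The Python sketch below is a generic illustration only, not the authors' procedure. The function names, the per-chart (tp, fp, fn) representation, the pooling of counts across charts, and the example counts are all hypothetical assumptions made for this illustration.

```python
import random


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, computed from raw counts."""
    if tp == 0:
        return 0.0  # no true positives: define F1 as 0 and avoid dividing by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pooled_f1(counts):
    """F1 over counts summed across charts (an assumed way of aggregating)."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f1_score(tp, fp, fn)


def permutation_test(counts_a, counts_b, n_iter=2000, seed=0):
    """Two-tailed paired permutation test on the difference in pooled F1.

    counts_a and counts_b hold one (tp, fp, fn) tuple per chart for two
    systems, aligned by chart. Each iteration swaps the two systems'
    results chart by chart with probability 0.5 and recomputes the
    absolute F1 difference.
    """
    rng = random.Random(seed)
    observed = abs(pooled_f1(counts_a) - pooled_f1(counts_b))
    extreme = 0
    for _ in range(n_iter):
        perm_a, perm_b = [], []
        for a, b in zip(counts_a, counts_b):
            if rng.random() < 0.5:
                a, b = b, a
            perm_a.append(a)
            perm_b.append(b)
        if abs(pooled_f1(perm_a) - pooled_f1(perm_b)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_iter + 1)


# Hypothetical per-chart counts, NOT data from the study.
system_a = [(1, 0, 0), (1, 0, 0), (0, 0, 1), (1, 1, 0)]
system_b = [(0, 0, 1), (1, 0, 0), (0, 0, 1), (0, 1, 1)]
print("F1 of system A:", pooled_f1(system_a))
print("p-value:", permutation_test(system_a, system_b))
```

Swapping results within each chart keeps the pairing between systems intact, which is one common design for comparing two annotators or systems on the same documents. Other schemes, such as averaging per-chart scores instead of pooling counts, would be equally plausible here.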