Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES
BACKGROUND: Information in Electronic Health Records is largely stored as unstructured free text. Natural language processing (NLP), or Medical Language Processing (MLP) in medicine, aims at extracting structured information from free text, and is less expensive and time-consuming than manual extrac...
Main Authors: | Kersloot, Martijn G., Lau, Francis, Abu-Hanna, Ameen, Arts, Derk L., Cornet, Ronald |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6749652/ https://www.ncbi.nlm.nih.gov/pubmed/31533810 http://dx.doi.org/10.1186/s13326-019-0207-3 |
_version_ | 1783452321428013056 |
---|---|
author | Kersloot, Martijn G. Lau, Francis Abu-Hanna, Ameen Arts, Derk L. Cornet, Ronald |
author_facet | Kersloot, Martijn G. Lau, Francis Abu-Hanna, Ameen Arts, Derk L. Cornet, Ronald |
author_sort | Kersloot, Martijn G. |
collection | PubMed |
description | BACKGROUND: Information in Electronic Health Records is largely stored as unstructured free text. Natural language processing (NLP), or Medical Language Processing (MLP) in medicine, aims at extracting structured information from free text, and is less expensive and time-consuming than manual extraction. However, most algorithms in MLP are institution-specific or address only one clinical need, and thus cannot be broadly applied. In addition, most MLP systems do not detect concepts in misspelled text and cannot detect attribute relationships between concepts. The objective of this study was to develop and evaluate an MLP application that includes generic algorithms for the detection of (misspelled) concepts and of attribute relationships between them. METHODS: An implementation of the MLP system cTAKES, called DIRECT, was developed with generic SNOMED CT concept filter, concept relationship detection, and attribute relationship detection algorithms and a custom dictionary. Four implementations of cTAKES were evaluated by comparing 98 manually annotated oncology charts with the output of DIRECT. The F(1)-score was determined for named-entity recognition and attribute relationship detection for the concepts ‘lung cancer’, ‘non-small cell lung cancer’, and ‘recurrence’. The performance of the four implementations was compared with a two-tailed permutation test. RESULTS: DIRECT detected lung cancer and non-small cell lung cancer concepts with F(1)-scores between 0.828 and 0.947 and between 0.862 and 0.933, respectively. The concept recurrence was detected with a significantly higher F(1)-score of 0.921, compared to the other implementations, and the relationship between recurrence and lung cancer with an F(1)-score of 0.857. The precision of the detection of lung cancer, non-small cell lung cancer, and recurrence concepts were 1.000, 0.966, and 0.879, compared to precisions of 0.943, 0.967, and 0.000 in the original implementation, respectively. 
CONCLUSION: DIRECT can detect oncology concepts and attribute relationships with high precision and can detect recurrence with significant increase in F(1)-score, compared to the original implementation of cTAKES, due to the usage of a custom dictionary and a generic concept relationship detection algorithm. These concepts and relationships can be used to encode clinical narratives, and can thus substantially reduce manual chart abstraction efforts, saving time for clinicians and researchers. |
format | Online Article Text |
id | pubmed-6749652 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-67496522019-09-23 Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES Kersloot, Martijn G. Lau, Francis Abu-Hanna, Ameen Arts, Derk L. Cornet, Ronald J Biomed Semantics Research BACKGROUND: Information in Electronic Health Records is largely stored as unstructured free text. Natural language processing (NLP), or Medical Language Processing (MLP) in medicine, aims at extracting structured information from free text, and is less expensive and time-consuming than manual extraction. However, most algorithms in MLP are institution-specific or address only one clinical need, and thus cannot be broadly applied. In addition, most MLP systems do not detect concepts in misspelled text and cannot detect attribute relationships between concepts. The objective of this study was to develop and evaluate an MLP application that includes generic algorithms for the detection of (misspelled) concepts and of attribute relationships between them. METHODS: An implementation of the MLP system cTAKES, called DIRECT, was developed with generic SNOMED CT concept filter, concept relationship detection, and attribute relationship detection algorithms and a custom dictionary. Four implementations of cTAKES were evaluated by comparing 98 manually annotated oncology charts with the output of DIRECT. The F(1)-score was determined for named-entity recognition and attribute relationship detection for the concepts ‘lung cancer’, ‘non-small cell lung cancer’, and ‘recurrence’. The performance of the four implementations was compared with a two-tailed permutation test. RESULTS: DIRECT detected lung cancer and non-small cell lung cancer concepts with F(1)-scores between 0.828 and 0.947 and between 0.862 and 0.933, respectively. 
The concept recurrence was detected with a significantly higher F(1)-score of 0.921, compared to the other implementations, and the relationship between recurrence and lung cancer with an F(1)-score of 0.857. The precision of the detection of lung cancer, non-small cell lung cancer, and recurrence concepts were 1.000, 0.966, and 0.879, compared to precisions of 0.943, 0.967, and 0.000 in the original implementation, respectively. CONCLUSION: DIRECT can detect oncology concepts and attribute relationships with high precision and can detect recurrence with significant increase in F(1)-score, compared to the original implementation of cTAKES, due to the usage of a custom dictionary and a generic concept relationship detection algorithm. These concepts and relationships can be used to encode clinical narratives, and can thus substantially reduce manual chart abstraction efforts, saving time for clinicians and researchers. BioMed Central 2019-09-18 /pmc/articles/PMC6749652/ /pubmed/31533810 http://dx.doi.org/10.1186/s13326-019-0207-3 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Kersloot, Martijn G. Lau, Francis Abu-Hanna, Ameen Arts, Derk L. Cornet, Ronald Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES |
title | Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES |
title_full | Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES |
title_fullStr | Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES |
title_full_unstemmed | Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES |
title_short | Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES |
title_sort | automated snomed ct concept and attribute relationship detection through a web-based implementation of ctakes |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6749652/ https://www.ncbi.nlm.nih.gov/pubmed/31533810 http://dx.doi.org/10.1186/s13326-019-0207-3 |
work_keys_str_mv | AT kerslootmartijng automatedsnomedctconceptandattributerelationshipdetectionthroughawebbasedimplementationofctakes AT laufrancis automatedsnomedctconceptandattributerelationshipdetectionthroughawebbasedimplementationofctakes AT abuhannaameen automatedsnomedctconceptandattributerelationshipdetectionthroughawebbasedimplementationofctakes AT artsderkl automatedsnomedctconceptandattributerelationshipdetectionthroughawebbasedimplementationofctakes AT cornetronald automatedsnomedctconceptandattributerelationshipdetectionthroughawebbasedimplementationofctakes |
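
Note on the evaluation metrics: the abstract reports F(1)-scores and compares implementations with a two-tailed permutation test, but this record does not describe how either was computed. The Python sketch below is only an illustration under stated assumptions, not the authors' procedure. It assumes that each chart yields true-positive, false-positive, and false-negative counts for a concept, that F1 is pooled over charts, and that the permutation test randomly swaps the two systems' outputs chart by chart. The names `f1_score`, `pooled_f1`, and `permutation_test` are hypothetical.

```python
import random
from typing import List, Tuple

# (true positives, false positives, false negatives) for one chart.
# This per-chart representation is an assumption made for illustration.
Counts = Tuple[int, int, int]


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def pooled_f1(per_chart: List[Counts]) -> float:
    """Sum the counts over all charts, then compute a single F1."""
    tp = sum(c[0] for c in per_chart)
    fp = sum(c[1] for c in per_chart)
    fn = sum(c[2] for c in per_chart)
    return f1_score(tp, fp, fn)


def permutation_test(a: List[Counts], b: List[Counts],
                     rounds: int = 10_000, seed: int = 0) -> float:
    """Two-tailed paired permutation test on the difference in pooled F1.

    In each round, the outputs of systems `a` and `b` are swapped for a
    random subset of charts. The p-value is the share of rounds whose
    absolute F1 difference is at least as large as the observed one.
    """
    if len(a) != len(b):
        raise ValueError("both systems must be scored on the same charts")
    rng = random.Random(seed)
    observed = abs(pooled_f1(a) - pooled_f1(b))
    hits = 0
    for _ in range(rounds):
        perm_a, perm_b = [], []
        for ca, cb in zip(a, b):
            if rng.random() < 0.5:
                ca, cb = cb, ca  # swap this chart's outputs between systems
            perm_a.append(ca)
            perm_b.append(cb)
        if abs(pooled_f1(perm_a) - pooled_f1(perm_b)) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly above zero.
    return (hits + 1) / (rounds + 1)
```

For example, `f1_score(9, 1, 1)` corresponds to a precision and recall of 0.9 each. Pooling counts before computing F1 (micro-averaging) is one design choice among several; averaging per-chart F1 values would be an equally plausible reading of the abstract.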