Improved characterisation of clinical text through ontology-based vocabulary expansion

Bibliographic Details
Main authors: Slater, Luke T.; Bradlow, William; Ball, Simon; Hoehndorf, Robert; Gkoutos, Georgios V
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8042947/
https://www.ncbi.nlm.nih.gov/pubmed/33845909
http://dx.doi.org/10.1186/s13326-021-00241-5
_version_ 1783678221488750592
author Slater, Luke T.
Bradlow, William
Ball, Simon
Hoehndorf, Robert
Gkoutos, Georgios V
author_facet Slater, Luke T.
Bradlow, William
Ball, Simon
Hoehndorf, Robert
Gkoutos, Georgios V
author_sort Slater, Luke T.
collection PubMed
description BACKGROUND: Biomedical ontologies contain a wealth of metadata that constitutes a fundamental infrastructural resource for text mining. For several reasons, redundancies exist in the ontology ecosystem, so that the same entities are described by several concepts in the same or similar contexts across several ontologies. While these concepts describe the same entities, they contain different sets of complementary metadata. Linking these definitions to make use of their combined metadata could lead to improved performance in ontology-based information retrieval, extraction, and analysis tasks. RESULTS: We develop and present an algorithm that expands the set of labels associated with an ontology class using a combination of strict lexical matching and cross-ontology reasoner-enabled equivalency queries. Across all disease terms in the Disease Ontology, the approach found 51,362 additional labels, more than tripling the number defined by the ontology itself. Manual validation by a clinical expert on a random sample of expanded synonyms over the Human Phenotype Ontology yielded a precision of 0.912. Furthermore, we found that annotating patient visits in MIMIC-III with an extended set of Disease Ontology labels made the semantic similarity scores derived from those labels a significantly better predictor of matching first diagnosis, with a mean average precision of 0.88 for the unexpanded set of annotations and 0.913 for the expanded set. CONCLUSIONS: Inter-ontology synonym expansion can lead to a vast increase in the scale of vocabulary available for text mining applications. While the accuracy of the extended vocabulary is not perfect, it nevertheless led to a significantly improved ontology-based characterisation of patients from text in one setting. Furthermore, where run-on error is not acceptable, the technique can be used to provide candidate synonyms which can be checked by a domain expert.
format Online
Article
Text
id pubmed-8042947
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8042947 2021-04-14 Improved characterisation of clinical text through ontology-based vocabulary expansion Slater, Luke T. Bradlow, William Ball, Simon Hoehndorf, Robert Gkoutos, Georgios V J Biomed Semantics Research BACKGROUND: Biomedical ontologies contain a wealth of metadata that constitutes a fundamental infrastructural resource for text mining. For several reasons, redundancies exist in the ontology ecosystem, which lead to the same entities being described by several concepts in the same or similar contexts across several ontologies. While these concepts describe the same entities, they contain different sets of complementary metadata. Linking these definitions to make use of their combined metadata could lead to improved performance in ontology-based information retrieval, extraction, and analysis tasks. RESULTS: We develop and present an algorithm that expands the set of labels associated with an ontology class using a combination of strict lexical matching and cross-ontology reasoner-enabled equivalency queries. Across all disease terms in the Disease Ontology, the approach found 51,362 additional labels, more than tripling the number defined by the ontology itself. Manual validation by a clinical expert on a random sampling of expanded synonyms over the Human Phenotype Ontology yielded a precision of 0.912. Furthermore, we found that annotating patient visits in MIMIC-III with an extended set of Disease Ontology labels led to semantic similarity score derived from those labels being a significantly better predictor of matching first diagnosis, with a mean average precision of 0.88 for the unexpanded set of annotations, and 0.913 for the expanded set. CONCLUSIONS: Inter-ontology synonym expansion can lead to a vast increase in the scale of vocabulary available for text mining applications. While the accuracy of the extended vocabulary is not perfect, it nevertheless led to a significantly improved ontology-based characterisation of patients from text in one setting. Furthermore, where run-on error is not acceptable, the technique can be used to provide candidate synonyms which can be checked by a domain expert. BioMed Central 2021-04-12 /pmc/articles/PMC8042947/ /pubmed/33845909 http://dx.doi.org/10.1186/s13326-021-00241-5 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Slater, Luke T.
Bradlow, William
Ball, Simon
Hoehndorf, Robert
Gkoutos, Georgios V
Improved characterisation of clinical text through ontology-based vocabulary expansion
title Improved characterisation of clinical text through ontology-based vocabulary expansion
title_full Improved characterisation of clinical text through ontology-based vocabulary expansion
title_fullStr Improved characterisation of clinical text through ontology-based vocabulary expansion
title_full_unstemmed Improved characterisation of clinical text through ontology-based vocabulary expansion
title_short Improved characterisation of clinical text through ontology-based vocabulary expansion
title_sort improved characterisation of clinical text through ontology-based vocabulary expansion
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8042947/
https://www.ncbi.nlm.nih.gov/pubmed/33845909
http://dx.doi.org/10.1186/s13326-021-00241-5
work_keys_str_mv AT slaterluket improvedcharacterisationofclinicaltextthroughontologybasedvocabularyexpansion
AT bradlowwilliam improvedcharacterisationofclinicaltextthroughontologybasedvocabularyexpansion
AT ballsimon improvedcharacterisationofclinicaltextthroughontologybasedvocabularyexpansion
AT hoehndorfrobert improvedcharacterisationofclinicaltextthroughontologybasedvocabularyexpansion
AT gkoutosgeorgiosv improvedcharacterisationofclinicaltextthroughontologybasedvocabularyexpansion