Combining lexical and context features for automatic ontology extension

BACKGROUND: Ontologies are widely used across biology and biomedicine for the annotation of databases. Ontology development is often a manual, time-consuming, and expensive process. Automatic or semi-automatic identification of classes that can be added to an ontology can make ontology development m...

Bibliographic Details
Main Authors: Althubaiti, Sara, Kafkas, Şenay, Abdelhakim, Marwa, Hoehndorf, Robert
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6958746/
https://www.ncbi.nlm.nih.gov/pubmed/31931870
http://dx.doi.org/10.1186/s13326-019-0218-0
_version_ 1783487479891886080
author Althubaiti, Sara
Kafkas, Şenay
Abdelhakim, Marwa
Hoehndorf, Robert
author_facet Althubaiti, Sara
Kafkas, Şenay
Abdelhakim, Marwa
Hoehndorf, Robert
author_sort Althubaiti, Sara
collection PubMed
description BACKGROUND: Ontologies are widely used across biology and biomedicine for the annotation of databases. Ontology development is often a manual, time-consuming, and expensive process. Automatic or semi-automatic identification of classes that can be added to an ontology can make ontology development more efficient. RESULTS: We developed a method that uses machine learning and word embeddings to identify words and phrases that are used to refer to an ontology class in biomedical Europe PMC full-text articles. Once labels and synonyms of a class are known, we use machine learning to identify the super-classes of a class. For this purpose, we identify lexical term variants, use word embeddings to capture context information, and rely on automated reasoning over ontologies to generate features, and we use an artificial neural network as classifier. We demonstrate the utility of our approach in identifying terms that refer to diseases in the Human Disease Ontology and to distinguish between different types of diseases. CONCLUSIONS: Our method is capable of discovering labels that refer to a class in an ontology but are not present in an ontology, and it can identify whether a class should be a subclass of some high-level ontology classes. Our approach can therefore be used for the semi-automatic extension and quality control of ontologies. The algorithm, corpora and evaluation datasets are available at https://github.com/bio-ontology-research-group/ontology-extension.
format Online
Article
Text
id pubmed-6958746
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6958746 2020-01-17 Combining lexical and context features for automatic ontology extension Althubaiti, Sara Kafkas, Şenay Abdelhakim, Marwa Hoehndorf, Robert J Biomed Semantics Research BACKGROUND: Ontologies are widely used across biology and biomedicine for the annotation of databases. Ontology development is often a manual, time-consuming, and expensive process. Automatic or semi-automatic identification of classes that can be added to an ontology can make ontology development more efficient. RESULTS: We developed a method that uses machine learning and word embeddings to identify words and phrases that are used to refer to an ontology class in biomedical Europe PMC full-text articles. Once labels and synonyms of a class are known, we use machine learning to identify the super-classes of a class. For this purpose, we identify lexical term variants, use word embeddings to capture context information, and rely on automated reasoning over ontologies to generate features, and we use an artificial neural network as classifier. We demonstrate the utility of our approach in identifying terms that refer to diseases in the Human Disease Ontology and to distinguish between different types of diseases. CONCLUSIONS: Our method is capable of discovering labels that refer to a class in an ontology but are not present in an ontology, and it can identify whether a class should be a subclass of some high-level ontology classes. Our approach can therefore be used for the semi-automatic extension and quality control of ontologies. The algorithm, corpora and evaluation datasets are available at https://github.com/bio-ontology-research-group/ontology-extension.
BioMed Central 2020-01-13 /pmc/articles/PMC6958746/ /pubmed/31931870 http://dx.doi.org/10.1186/s13326-019-0218-0 Text en © The Author(s) 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Althubaiti, Sara
Kafkas, Şenay
Abdelhakim, Marwa
Hoehndorf, Robert
Combining lexical and context features for automatic ontology extension
title Combining lexical and context features for automatic ontology extension
title_full Combining lexical and context features for automatic ontology extension
title_fullStr Combining lexical and context features for automatic ontology extension
title_full_unstemmed Combining lexical and context features for automatic ontology extension
title_short Combining lexical and context features for automatic ontology extension
title_sort combining lexical and context features for automatic ontology extension
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6958746/
https://www.ncbi.nlm.nih.gov/pubmed/31931870
http://dx.doi.org/10.1186/s13326-019-0218-0
work_keys_str_mv AT althubaitisara combininglexicalandcontextfeaturesforautomaticontologyextension
AT kafkassenay combininglexicalandcontextfeaturesforautomaticontologyextension
AT abdelhakimmarwa combininglexicalandcontextfeaturesforautomaticontologyextension
AT hoehndorfrobert combininglexicalandcontextfeaturesforautomaticontologyextension