A Gated Recurrent Unit based architecture for recognizing ontology concepts from biological literature

BACKGROUND: Annotating scientific literature with ontology concepts is a critical task in biology and several other domains for knowledge discovery. Ontology-based annotations can power large-scale comparative analyses in a wide range of applications ranging from evolutionary phenotypes to rare huma...

Bibliographic Details
Main authors: Devkota, Pratik; Mohanty, Somya D.; Manda, Prashanti
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9516808/
https://www.ncbi.nlm.nih.gov/pubmed/36171616
http://dx.doi.org/10.1186/s13040-022-00310-0
_version_ 1784798784504463360
author Devkota, Pratik
Mohanty, Somya D.
Manda, Prashanti
author_facet Devkota, Pratik
Mohanty, Somya D.
Manda, Prashanti
author_sort Devkota, Pratik
collection PubMed
description BACKGROUND: Annotating scientific literature with ontology concepts is a critical task for knowledge discovery in biology and several other domains. Ontology-based annotations can power large-scale comparative analyses in a wide range of applications, from evolutionary phenotypes to rare human diseases to the study of protein functions. Computational methods for tagging scientific text with ontology terms have included lexical/syntactic methods, traditional machine learning, and, most recently, deep learning. RESULTS: Here, we present state-of-the-art deep learning architectures based on Gated Recurrent Units for annotating text with ontology concepts. We use the Colorado Richly Annotated Full Text Corpus (CRAFT) as a gold standard for training and testing. We explore a number of additional information sources, including NCBI’s BioThesaurus and the Unified Medical Language System (UMLS), to augment the information from CRAFT and increase prediction accuracy. Our best model achieves 0.84 for both F1 and semantic similarity. CONCLUSION: The results shown here underscore the impact of using deep learning architectures for automatically recognizing ontology concepts in the literature. Augmenting the models with biological information beyond that present in the gold standard corpus yields a distinct improvement in prediction accuracy.
format Online
Article
Text
id pubmed-9516808
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9516808 2022-09-29 A Gated Recurrent Unit based architecture for recognizing ontology concepts from biological literature Devkota, Pratik Mohanty, Somya D. Manda, Prashanti BioData Min Research
BioMed Central 2022-09-28 /pmc/articles/PMC9516808/ /pubmed/36171616 http://dx.doi.org/10.1186/s13040-022-00310-0 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A Gated Recurrent Unit based architecture for recognizing ontology concepts from biological literature
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9516808/
https://www.ncbi.nlm.nih.gov/pubmed/36171616
http://dx.doi.org/10.1186/s13040-022-00310-0