A Gated Recurrent Unit based architecture for recognizing ontology concepts from biological literature
Main authors: | Devkota, Pratik; Mohanty, Somya D.; Manda, Prashanti |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9516808/ https://www.ncbi.nlm.nih.gov/pubmed/36171616 http://dx.doi.org/10.1186/s13040-022-00310-0 |
author | Devkota, Pratik; Mohanty, Somya D.; Manda, Prashanti |
---|---|
collection | PubMed |
description | BACKGROUND: Annotating scientific literature with ontology concepts is a critical task for knowledge discovery in biology and several other domains. Ontology-based annotations can power large-scale comparative analyses in applications ranging from evolutionary phenotypes to rare human diseases to the study of protein functions. Computational methods for tagging scientific text with ontology terms have included lexical/syntactic methods, traditional machine learning, and, most recently, deep learning. RESULTS: Here, we present state-of-the-art deep learning architectures based on Gated Recurrent Units (GRUs) for annotating text with ontology concepts. We use the Colorado Richly Annotated Full Text Corpus (CRAFT) as a gold standard for training and testing. We explore a number of additional information sources, including NCBI’s BioThesaurus and the Unified Medical Language System (UMLS), to augment the information in CRAFT and increase prediction accuracy. Our best model achieves an F1 score and a semantic similarity of 0.84. CONCLUSION: The results shown here underscore the impact of using deep learning architectures to automatically recognize ontology concepts in literature. Augmenting the models with biological information beyond that present in the gold standard corpus distinctly improves prediction accuracy. |
format | Online Article Text |
id | pubmed-9516808 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
source | BioData Min. BioMed Central, 2022-09-28. http://dx.doi.org/10.1186/s13040-022-00310-0 |
rights | © The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | A Gated Recurrent Unit based architecture for recognizing ontology concepts from biological literature |
topic | Research |