Gene ontology concept recognition using named concept: understanding the various presentations of the gene functions in biomedical literature
Objective: A major challenge in precision medicine is the development of patient-specific genetic biomarkers or drug targets. The firsthand information of the genes associated with the pathologic pathways of interest is buried in the ocean of biomedical literature. Gene ontology concept recognition...
Main Authors: | Yang, Chia-Jung; Chiang, Jung-Hsien |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2018 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6204799/ https://www.ncbi.nlm.nih.gov/pubmed/30376045 http://dx.doi.org/10.1093/database/bay115 |
_version_ | 1783366093856833536 |
---|---|
author | Yang, Chia-Jung Chiang, Jung-Hsien |
author_facet | Yang, Chia-Jung Chiang, Jung-Hsien |
author_sort | Yang, Chia-Jung |
collection | PubMed |
description | Objective: A major challenge in precision medicine is the development of patient-specific genetic biomarkers or drug targets. The firsthand information of the genes associated with the pathologic pathways of interest is buried in the ocean of biomedical literature. Gene ontology concept recognition (GOCR) is a biomedical natural language processing task used to extract and normalize the mentions of gene ontology (GO), the controlled vocabulary for gene functions across many species, from biomedical text. Previous GOCR systems, whether rule-based or machine-learning based, treated GO concepts as separate terms and had no efficient way of sharing common synonyms among the concepts. Materials and Methods: We used the CRAFT corpus in this study. Targeting the compositional structure of the GO, we introduced the named concept, a basic conceptual unit that has a conserved name and is reused in more complex concepts. Using the named concepts, we separated the GOCR task into dictionary-matching and machine-learning steps. By harvesting the surface names used in the training data, we greatly expanded the set of synonyms for GO concepts through the named concepts they share, which enhanced the ability to recognize more GO concepts in text. The source code is available at https://github.com/jeroyang/ncgocr . Results: The named concept gene ontology concept recognizer (NCGOCR) achieved 0.804 precision and 0.715 recall by correctly recognizing non-standard mentions of GO concepts. Discussion: The lack of consensus on GO naming causes diversity in the GO mentions in biomedical manuscripts. The high performance is attributable to the stability of the composing GO concepts and the lack of variance in the spelling of named concepts. Conclusion: NCGOCR reduced the arduous work of GO annotation and improved the process of searching for biomarkers or drug targets, leading to improved biomarker development and greater success in precision medicine. |
format | Online Article Text |
id | pubmed-6204799 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6204799 2018-11-02 Gene ontology concept recognition using named concept: understanding the various presentations of the gene functions in biomedical literature Yang, Chia-Jung Chiang, Jung-Hsien Database (Oxford) Original Article Oxford University Press 2018-10-29 /pmc/articles/PMC6204799/ /pubmed/30376045 http://dx.doi.org/10.1093/database/bay115 Text en © The Author(s) 2018. Published by Oxford University Press. http://academic.oup.com/journals/pages/about_us/legal/notices This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices) For permissions, please e-mail: journals.permissions@oup.com |
spellingShingle | Original Article Yang, Chia-Jung Chiang, Jung-Hsien Gene ontology concept recognition using named concept: understanding the various presentations of the gene functions in biomedical literature |
title | Gene ontology concept recognition using named concept: understanding the various presentations of the gene functions in biomedical literature |
title_full | Gene ontology concept recognition using named concept: understanding the various presentations of the gene functions in biomedical literature |
title_fullStr | Gene ontology concept recognition using named concept: understanding the various presentations of the gene functions in biomedical literature |
title_full_unstemmed | Gene ontology concept recognition using named concept: understanding the various presentations of the gene functions in biomedical literature |
title_short | Gene ontology concept recognition using named concept: understanding the various presentations of the gene functions in biomedical literature |
title_sort | gene ontology concept recognition using named concept: understanding the various presentations of the gene functions in biomedical literature |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6204799/ https://www.ncbi.nlm.nih.gov/pubmed/30376045 http://dx.doi.org/10.1093/database/bay115 |
work_keys_str_mv | AT yangchiajung geneontologyconceptrecognitionusingnamedconceptunderstandingthevariouspresentationsofthegenefunctionsinbiomedicalliterature AT chiangjunghsien geneontologyconceptrecognitionusingnamedconceptunderstandingthevariouspresentationsofthegenefunctionsinbiomedicalliterature |
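
The synonym-sharing idea in the description field above can be illustrated with a toy sketch. This is not the NCGOCR implementation (see the repository linked in the record for that); it only shows how surface names attached to shared named concepts could multiply the phrases recognized for each complex concept in a dictionary-matching step. The concept identifiers, the surface names, and the function names `expand_synonyms` and `match` are all hypothetical, and naive substring search stands in for a real matcher. The machine-learning step mentioned in the abstract is not covered.

```python
from itertools import product

# Hypothetical toy data: each complex concept is composed of named concepts.
# The identifiers are placeholders, not real GO terms.
COMPOSITION = {
    "GO:TOY001": ("cell", "proliferation"),
    "GO:TOY002": ("cell", "migration"),
}

# Hypothetical surface names per named concept, as if harvested from training text.
SURFACE_NAMES = {
    "cell": {"cell", "cellular"},
    "proliferation": {"proliferation", "growth"},
    "migration": {"migration", "motility"},
}


def expand_synonyms(composition, surface_names):
    """Build a phrase -> concept id lexicon by combining surface names."""
    lexicon = {}
    for concept_id, parts in composition.items():
        # Every combination of surface names becomes a candidate phrase.
        for combo in product(*(sorted(surface_names[p]) for p in parts)):
            lexicon[" ".join(combo)] = concept_id
    return lexicon


def match(text, lexicon):
    """Return sorted (phrase, concept id) pairs whose phrase occurs in the text."""
    lowered = text.lower()
    return sorted((p, c) for p, c in lexicon.items() if p in lowered)


lexicon = expand_synonyms(COMPOSITION, SURFACE_NAMES)
hits = match("Loss of the gene reduced cellular growth and cell motility.", lexicon)
for phrase, concept_id in hits:
    print(phrase, "->", concept_id)
```

Because "cell" is shared by both toy concepts, adding the single surface name "cellular" yields new phrases for both of them at once, which is the kind of synonym sharing the abstract describes.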