Automatic gene annotation using GO terms from cellular component domain

BACKGROUND: The Gene Ontology (GO) is a resource that supplies information about gene product function using ontologies to represent biological knowledge. These ontologies cover three domains: Cellular Component (CC), Molecular Function (MF), and Biological Process (BP). GO annotation is a process w...

Bibliographic Details
Main authors: Ding, Ruoyao; Qu, Yingying; Wu, Cathy H.; Vijay-Shanker, K.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6284271/
https://www.ncbi.nlm.nih.gov/pubmed/30526566
http://dx.doi.org/10.1186/s12911-018-0694-7
_version_ 1783379304927723520
author Ding, Ruoyao
Qu, Yingying
Wu, Cathy H.
Vijay-Shanker, K.
collection PubMed
description BACKGROUND: The Gene Ontology (GO) is a resource that supplies information about gene product function using ontologies to represent biological knowledge. These ontologies cover three domains: Cellular Component (CC), Molecular Function (MF), and Biological Process (BP). GO annotation is the process of assigning functional information, expressed as GO terms, to the relevant genes in the literature. It is a common task among the Model Organism Database (MOD) groups. Manual GO annotation relies on human curators who read the biomedical literature and assign GO terms to genes. This process is time-consuming and labor-intensive. As a result, many MODs can afford to curate only a fraction of the relevant articles. METHODS: GO terms from the CC domain can essentially be divided into two sub-hierarchies: subcellular location terms and protein complex terms. We cast the task of gene annotation using GO terms from the CC domain as relation extraction between genes and other entities: (1) extract cases where a protein is found to be in a subcellular location, and (2) extract cases where a protein is a subunit of a protein complex. For each relation extraction task, we use an approach based on triggers and syntactic dependencies to extract the desired relations among entities. RESULTS: We tested our approach on the BC4GO test set, a publicly available corpus for GO annotation. Our approach obtains an F1-score of 71%, a precision of 91%, and a recall of 58% for predicting GO terms from the CC domain for given genes. CONCLUSIONS: We have described a novel approach that treats gene annotation with GO terms from the CC domain as two relation extraction subtasks. Evaluation results show that our approach achieves an F1-score of 71% for predicting GO terms for given genes. Our approach can therefore be used to accelerate the process of GO annotation for bio-annotators.
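The three scores reported in the abstract can be checked against one another, because F1 is the harmonic mean of precision and recall. The following is a minimal sketch in Python, assuming the standard balanced F1 definition (the record does not say which variant the authors used). The function name `f1_score` is illustrative and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F1: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for CC-domain GO terms.
reported_precision = 0.91
reported_recall = 0.58

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # prints F1 = 0.708
```

Rounded to two decimal places this gives 0.71, which agrees with the 71% F1-score quoted in the abstract.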
format Online
Article
Text
id pubmed-6284271
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6284271 2018-12-14 Automatic gene annotation using GO terms from cellular component domain Ding, Ruoyao Qu, Yingying Wu, Cathy H. Vijay-Shanker, K. BMC Med Inform Decis Mak Research BioMed Central 2018-12-07 /pmc/articles/PMC6284271/ /pubmed/30526566 http://dx.doi.org/10.1186/s12911-018-0694-7 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Automatic gene annotation using GO terms from cellular component domain
topic Research