Integrating information retrieval with distant supervision for Gene Ontology annotation
This article describes our participation in the Gene Ontology Curation task (GO task) of BioCreative IV, where we took part in both subtasks: (A) identification of GO evidence sentences (GOESs) for relevant genes in full-text articles and (B) prediction of GO terms for relevant genes in full-text ar...
Main authors: | Zhu, Dongqing; Li, Dingcheng; Carterette, Ben; Liu, Hongfang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4150992/ https://www.ncbi.nlm.nih.gov/pubmed/25183856 http://dx.doi.org/10.1093/database/bau087 |
_version_ | 1782332980712177664 |
---|---|
author | Zhu, Dongqing Li, Dingcheng Carterette, Ben Liu, Hongfang |
author_facet | Zhu, Dongqing Li, Dingcheng Carterette, Ben Liu, Hongfang |
author_sort | Zhu, Dongqing |
collection | PubMed |
description | This article describes our participation in the Gene Ontology Curation task (GO task) of BioCreative IV, where we took part in both subtasks: (A) identification of GO evidence sentences (GOESs) for relevant genes in full-text articles and (B) prediction of GO terms for relevant genes in full-text articles. For subtask A, we trained a logistic regression model to detect GOESs based on annotations in the training data, supplemented with additional noisy negatives from an external resource. Then, a greedy approach was applied to associate genes with sentences. For subtask B, we designed two types of systems: (i) search-based systems, which predict GO terms based on existing annotations for GOESs at different textual granularities (i.e., full-text articles, abstracts, and sentences) using state-of-the-art information retrieval techniques (i.e., a novel application of the idea of distant supervision) and (ii) a similarity-based system, which assigns GO terms based on the distance between words in sentences and GO terms/synonyms. Our best-performing system for subtask A achieves an F1 score of 0.27 based on exact match and 0.387 allowing relaxed overlap match. Our best-performing system for subtask B, a search-based system, achieves an F1 score of 0.075 based on exact match and 0.301 considering hierarchical matches. Our search-based systems for subtask B significantly outperformed the similarity-based system. Database URL: https://github.com/noname2020/Bioc |
format | Online Article Text |
id | pubmed-4150992 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4150992 2014-09-03 Integrating information retrieval with distant supervision for Gene Ontology annotation Zhu, Dongqing Li, Dingcheng Carterette, Ben Liu, Hongfang Database (Oxford) Original Article This article describes our participation in the Gene Ontology Curation task (GO task) of BioCreative IV, where we took part in both subtasks: (A) identification of GO evidence sentences (GOESs) for relevant genes in full-text articles and (B) prediction of GO terms for relevant genes in full-text articles. For subtask A, we trained a logistic regression model to detect GOESs based on annotations in the training data, supplemented with additional noisy negatives from an external resource. Then, a greedy approach was applied to associate genes with sentences. For subtask B, we designed two types of systems: (i) search-based systems, which predict GO terms based on existing annotations for GOESs at different textual granularities (i.e., full-text articles, abstracts, and sentences) using state-of-the-art information retrieval techniques (i.e., a novel application of the idea of distant supervision) and (ii) a similarity-based system, which assigns GO terms based on the distance between words in sentences and GO terms/synonyms. Our best-performing system for subtask A achieves an F1 score of 0.27 based on exact match and 0.387 allowing relaxed overlap match. Our best-performing system for subtask B, a search-based system, achieves an F1 score of 0.075 based on exact match and 0.301 considering hierarchical matches. Our search-based systems for subtask B significantly outperformed the similarity-based system. Database URL: https://github.com/noname2020/Bioc Oxford University Press 2014-09-01 /pmc/articles/PMC4150992/ /pubmed/25183856 http://dx.doi.org/10.1093/database/bau087 Text en © The Author(s) 2014. Published by Oxford University Press. 
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Article Zhu, Dongqing Li, Dingcheng Carterette, Ben Liu, Hongfang Integrating information retrieval with distant supervision for Gene Ontology annotation |
title | Integrating information retrieval with distant supervision for Gene Ontology annotation |
title_full | Integrating information retrieval with distant supervision for Gene Ontology annotation |
title_fullStr | Integrating information retrieval with distant supervision for Gene Ontology annotation |
title_full_unstemmed | Integrating information retrieval with distant supervision for Gene Ontology annotation |
title_short | Integrating information retrieval with distant supervision for Gene Ontology annotation |
title_sort | integrating information retrieval with distant supervision for gene ontology annotation |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4150992/ https://www.ncbi.nlm.nih.gov/pubmed/25183856 http://dx.doi.org/10.1093/database/bau087 |
work_keys_str_mv | AT zhudongqing integratinginformationretrievalwithdistantsupervisionforgeneontologyannotation AT lidingcheng integratinginformationretrievalwithdistantsupervisionforgeneontologyannotation AT carteretteben integratinginformationretrievalwithdistantsupervisionforgeneontologyannotation AT liuhongfang integratinginformationretrievalwithdistantsupervisionforgeneontologyannotation |
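
The search-based idea summarized in the description (predict GO terms for a new sentence from the existing annotations of the most similar, already annotated texts) can be illustrated with a minimal sketch. This is not the authors' system: the record does not say which retrieval model they used, so plain TF-IDF with cosine similarity is swapped in as an assumption, and the function names (`build_index`, `predict_go_terms`), the parameters `k` and `top_n`, and the four annotated sentences are all hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(annotated):
    """Build TF-IDF vectors for a list of (text, go_terms) pairs."""
    docs = [Counter(tokenize(text)) for text, _ in annotated]
    df = Counter(term for doc in docs for term in doc)  # document frequency
    n = len(docs)
    idf = {t: math.log((n + 1) / (df[t] + 1)) + 1.0 for t in df}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def predict_go_terms(query, annotated, vectors, idf, k=2, top_n=1):
    """Retrieve the k most similar annotated texts and let them vote on GO terms."""
    counts = Counter(tokenize(query))
    # Words never seen in the annotated collection get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in counts.items()}
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)[:k]
    votes = defaultdict(float)
    for score, i in scored:
        for go in annotated[i][1]:
            votes[go] += score  # similarity-weighted vote
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [go for go, s in ranked[:top_n] if s > 0]


# Toy annotated collection (invented sentences, for illustration only).
ANNOTATED = [
    ("the protein induces apoptosis in neuronal cells", ["GO:0006915"]),
    ("loss of the gene blocks apoptosis and cell death", ["GO:0006915"]),
    ("the enzyme shows kinase activity toward the substrate", ["GO:0016301"]),
    ("the factor regulates transcription of target genes", ["GO:0006355"]),
]
VECTORS, IDF = build_index(ANNOTATED)

if __name__ == "__main__":
    sentence = "mutant worms show reduced apoptosis in germ cells"
    print(predict_go_terms(sentence, ANNOTATED, VECTORS, IDF))
```

The annotated collection plays the role of the distantly supervised resource: its labels come from existing annotations rather than from sentence-level manual labeling for the query at hand. Indexing full articles, abstracts, or single sentences would correspond to the different textual granularities mentioned in the description.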