
Overview of the gene ontology task at BioCreative IV

Bibliographic Details
Main Authors: Mao, Yuqing, Van Auken, Kimberly, Li, Donghui, Arighi, Cecilia N., McQuilton, Peter, Hayman, G. Thomas, Tweedie, Susan, Schaeffer, Mary L., Laulederkind, Stanley J. F., Wang, Shur-Jen, Gobeill, Julien, Ruch, Patrick, Luu, Anh Tuan, Kim, Jung-jae, Chiang, Jung-Hsien, Chen, Yu-De, Yang, Chia-Jung, Liu, Hongfang, Zhu, Dongqing, Li, Yanpeng, Yu, Hong, Emadzadeh, Ehsan, Gonzalez, Graciela, Chen, Jian-Ming, Dai, Hong-Jie, Lu, Zhiyong
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4142793/
https://www.ncbi.nlm.nih.gov/pubmed/25157073
http://dx.doi.org/10.1093/database/bau086
collection PubMed
description Gene Ontology (GO) annotation is a common task among model organism databases (MODs) for capturing gene function data from journal articles. It is a time-consuming and labor-intensive task, and is thus often considered one of the bottlenecks in literature curation. There is a growing need for semiautomated or fully automated GO curation techniques that will help database curators rapidly and accurately identify gene function information in full-length articles. Despite multiple attempts in the past, few studies have proven useful for assisting real-world GO curation. The shortage of sentence-level training data, and of opportunities for interaction between text-mining developers and GO curators, has limited advances in algorithm development and their use in practical settings. To this end, we organized a text-mining challenge task for literature-based GO annotation in BioCreative IV. More specifically, we developed two subtasks: (i) to automatically locate text passages that contain GO-relevant information (a text retrieval task) and (ii) to automatically identify relevant GO terms for the genes in a given article (a concept-recognition task). With support from five MODs, we provided teams with >4000 unique text passages that served as the basis for each GO annotation in our task data. Such evidence text has long been recognized as critical for text-mining algorithm development but had never been made available because of the high cost of curation. In total, seven teams participated in the challenge task. From the team results, we conclude that the state of the art in automatically mining GO terms from literature has improved over the past decade, while much progress is still needed for computer-assisted GO curation. Future work should focus on addressing the remaining technical challenges to improve automatic GO concept recognition and on incorporating the practical benefits of text-mining tools into real-world GO annotation. Database URL: http://www.biocreative.org/tasks/biocreative-iv/track-4-GO/.
id pubmed-4142793
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Database (Oxford), Original Article, published 2014-08-25
rights Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
title Overview of the gene ontology task at BioCreative IV
topic Original Article