Overview of the gene ontology task at BioCreative IV
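Main authors: | Mao, Yuqing; Van Auken, Kimberly; Li, Donghui; Arighi, Cecilia N.; McQuilton, Peter; Hayman, G. Thomas; Tweedie, Susan; Schaeffer, Mary L.; Laulederkind, Stanley J. F.; Wang, Shur-Jen; Gobeill, Julien; Ruch, Patrick; Luu, Anh Tuan; Kim, Jung-jae; Chiang, Jung-Hsien; Chen, Yu-De; Yang, Chia-Jung; Liu, Hongfang; Zhu, Dongqing; Li, Yanpeng; Yu, Hong; Emadzadeh, Ehsan; Gonzalez, Graciela; Chen, Jian-Ming; Dai, Hong-Jie; Lu, Zhiyong |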
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4142793/ https://www.ncbi.nlm.nih.gov/pubmed/25157073 http://dx.doi.org/10.1093/database/bau086 |
_version_ | 1782331815956054016 |
---|---|
author | Mao, Yuqing; Van Auken, Kimberly; Li, Donghui; Arighi, Cecilia N.; McQuilton, Peter; Hayman, G. Thomas; Tweedie, Susan; Schaeffer, Mary L.; Laulederkind, Stanley J. F.; Wang, Shur-Jen; Gobeill, Julien; Ruch, Patrick; Luu, Anh Tuan; Kim, Jung-jae; Chiang, Jung-Hsien; Chen, Yu-De; Yang, Chia-Jung; Liu, Hongfang; Zhu, Dongqing; Li, Yanpeng; Yu, Hong; Emadzadeh, Ehsan; Gonzalez, Graciela; Chen, Jian-Ming; Dai, Hong-Jie; Lu, Zhiyong |
author_sort | Mao, Yuqing |
collection | PubMed |
description | Gene Ontology (GO) annotation is a common task among model organism databases (MODs) for capturing gene function data from journal articles. It is a time-consuming and labor-intensive task and is thus often considered one of the bottlenecks in literature curation. There is a growing need for semiautomated or fully automated GO curation techniques that will help database curators rapidly and accurately identify gene function information in full-length articles. Despite multiple attempts in the past, few studies have proven useful in assisting real-world GO curation. The shortage of sentence-level training data and of opportunities for interaction between text-mining developers and GO curators has limited advances in algorithm development and their use in practical settings. To this end, we organized a text-mining challenge task for literature-based GO annotation in BioCreative IV. Specifically, we developed two subtasks: (i) to automatically locate text passages that contain GO-relevant information (a text retrieval task) and (ii) to automatically identify relevant GO terms for the genes in a given article (a concept-recognition task). With support from five MODs, we provided teams with >4000 unique text passages that served as the basis for each GO annotation in our task data. Such evidence text has long been recognized as critical for text-mining algorithm development but was never made available because of the high cost of curation. In total, seven teams participated in the challenge task. From the team results, we conclude that the state of the art in automatically mining GO terms from literature has improved over the past decade, while much progress is still needed for computer-assisted GO curation. Future work should focus on addressing the remaining technical challenges to improve automatic GO concept recognition and on incorporating the practical benefits of text-mining tools into real-world GO annotation. Database URL: http://www.biocreative.org/tasks/biocreative-iv/track-4-GO/. |
format | Online Article Text |
id | pubmed-4142793 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-41427932014-08-26 Overview of the gene ontology task at BioCreative IV Mao, Yuqing Van Auken, Kimberly Li, Donghui Arighi, Cecilia N. McQuilton, Peter Hayman, G. Thomas Tweedie, Susan Schaeffer, Mary L. Laulederkind, Stanley J. F. Wang, Shur-Jen Gobeill, Julien Ruch, Patrick Luu, Anh Tuan Kim, Jung-jae Chiang, Jung-Hsien Chen, Yu-De Yang, Chia-Jung Liu, Hongfang Zhu, Dongqing Li, Yanpeng Yu, Hong Emadzadeh, Ehsan Gonzalez, Graciela Chen, Jian-Ming Dai, Hong-Jie Lu, Zhiyong Database (Oxford) Original Article Gene Ontology (GO) annotation is a common task among model organism databases (MODs) for capturing gene function data from journal articles. It is a time-consuming and labor-intensive task, and is thus often considered as one of the bottlenecks in literature curation. There is a growing need for semiautomated or fully automated GO curation techniques that will help database curators to rapidly and accurately identify gene function information in full-length articles. Despite multiple attempts in the past, few studies have proven to be useful with regard to assisting real-world GO curation. The shortage of sentence-level training data and opportunities for interaction between text-mining developers and GO curators has limited the advances in algorithm development and corresponding use in practical circumstances. To this end, we organized a text-mining challenge task for literature-based GO annotation in BioCreative IV. More specifically, we developed two subtasks: (i) to automatically locate text passages that contain GO-relevant information (a text retrieval task) and (ii) to automatically identify relevant GO terms for the genes in a given article (a concept-recognition task). With the support from five MODs, we provided teams with >4000 unique text passages that served as the basis for each GO annotation in our task data. 
Such evidence text information has long been recognized as critical for text-mining algorithm development but was never made available because of the high cost of curation. In total, seven teams participated in the challenge task. From the team results, we conclude that the state of the art in automatically mining GO terms from literature has improved over the past decade while much progress is still needed for computer-assisted GO curation. Future work should focus on addressing remaining technical challenges for improved performance of automatic GO concept recognition and incorporating practical benefits of text-mining tools into real-world GO annotation. Database URL: http://www.biocreative.org/tasks/biocreative-iv/track-4-GO/. Oxford University Press 2014-08-25 /pmc/articles/PMC4142793/ /pubmed/25157073 http://dx.doi.org/10.1093/database/bau086 Text en Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US. |
title | Overview of the gene ontology task at BioCreative IV |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4142793/ https://www.ncbi.nlm.nih.gov/pubmed/25157073 http://dx.doi.org/10.1093/database/bau086 |