
Overview of BioCreative II gene normalization



Bibliographic Details
Main authors: Morgan, Alexander A, Lu, Zhiyong, Wang, Xinglong, Cohen, Aaron M, Fluck, Juliane, Ruch, Patrick, Divoli, Anna, Fundel, Katrin, Leaman, Robert, Hakenberg, Jörg, Sun, Chengjie, Liu, Heng-hui, Torres, Rafael, Krauthammer, Michael, Lau, William W, Liu, Hongfang, Hsu, Chun-Nan, Schuemie, Martijn, Cohen, K Bretonnel, Hirschman, Lynette
Format: Text
Language: English
Published: BioMed Central 2008
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559987/
https://www.ncbi.nlm.nih.gov/pubmed/18834494
http://dx.doi.org/10.1186/gb-2008-9-s2-s3
collection PubMed
description BACKGROUND: The goal of the gene normalization task is to link genes or gene products mentioned in the literature to biological databases. This is a key step in an accurate search of the biological literature. It is a challenging task, even for the human expert; genes are often described rather than referred to by gene symbol and, confusingly, one gene name may refer to different genes (often from different organisms). For BioCreative II, the task was to list the Entrez Gene identifiers for human genes or gene products mentioned in PubMed/MEDLINE abstracts. We selected abstracts associated with articles previously curated for human genes. We provided 281 expert-annotated abstracts containing 684 gene identifiers for training, and a blind test set of 262 documents containing 785 identifiers, with a gold standard created by expert annotators. Inter-annotator agreement was measured at over 90%. RESULTS: Twenty groups submitted one to three runs each, for a total of 54 runs. Three systems achieved F-measures (balanced precision and recall) between 0.80 and 0.81. Combining the system outputs using simple voting schemes and classifiers obtained improved results; the best composite system achieved an F-measure of 0.92 with 10-fold cross-validation. A 'maximum recall' system based on the pooled responses of all participants gave a recall of 0.97 (with precision 0.23), identifying 763 out of 785 identifiers. CONCLUSION: Major advances for the BioCreative II gene normalization task include broader participation (20 versus 8 teams) and a pooled system performance comparable to human experts, at over 90% agreement. These results show promise as tools to link the literature with biological databases.
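The abstract reports results as precision, recall and the balanced F-measure, and mentions two ways of combining runs: a "maximum recall" pool of all participants' responses and simple voting. The following is a minimal sketch of those quantities, not the BioCreative II evaluation script. It assumes each run is a set of (document ID, Entrez Gene ID) pairs scored against the gold standard; the names `prf`, `pooled` and `majority_vote`, the vote threshold and the toy identifiers are all hypothetical. The classifier-based combinations mentioned in the abstract are not shown.

```python
from collections import Counter

# Hypothetical representation: one (document_id, entrez_gene_id) pair per normalized gene.
Pair = tuple[str, str]


def prf(predicted: set[Pair], gold: set[Pair]) -> tuple[float, float, float]:
    """Return precision, recall and the balanced F-measure F1 = 2PR / (P + R)."""
    tp = len(predicted & gold)  # true positives: predicted pairs also in the gold standard
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


def pooled(runs: list[set[Pair]]) -> set[Pair]:
    """'Maximum recall' pooling: the union of every run's pairs."""
    union: set[Pair] = set()
    for run in runs:
        union |= run
    return union


def majority_vote(runs: list[set[Pair]], min_votes: int) -> set[Pair]:
    """Simple voting: keep a pair proposed by at least `min_votes` runs."""
    votes = Counter(pair for run in runs for pair in run)
    return {pair for pair, n in votes.items() if n >= min_votes}


if __name__ == "__main__":
    # Toy data for illustration only, not BioCreative II data.
    gold = {("d1", "7157"), ("d1", "672"), ("d2", "1956")}
    runs = [
        {("d1", "7157"), ("d1", "672"), ("d2", "2064")},
        {("d1", "7157"), ("d2", "1956")},
        {("d1", "7157"), ("d1", "672"), ("d2", "1956"), ("d1", "999")},
    ]
    print("pooled:", prf(pooled(runs), gold))  # recall 1.0, precision 3/5
    print("vote>=2:", prf(majority_vote(runs, 2), gold))  # (1.0, 1.0, 1.0) on this toy set
    # Arithmetic check on the abstract's figures: 763 of 785 identifiers found.
    print(round(763 / 785, 2))  # 0.97
```

On this toy set, pooling reaches full recall at the cost of precision, while a two-vote threshold discards pairs proposed by only one run; how much voting helps in practice depends on the runs being combined.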
id pubmed-2559987
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source Genome Biol (Research), BioMed Central 2008; published 2008-09-01
rights Copyright © 2008 Morgan et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research