Collective Instance-Level Gene Normalization on the IGN Corpus

A high proportion of life science research is gene-oriented: scientists aim to investigate the roles that genes play in biological processes and their involvement in biological mechanisms. As a result, gene names and their related information turn out to be one of the main objects of in...

Bibliographic Details
Main authors: Dai, Hong-Jie, Wu, Johnny Chi-Yang, Tsai, Richard Tzong-Han
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3839972/
https://www.ncbi.nlm.nih.gov/pubmed/24282506
http://dx.doi.org/10.1371/journal.pone.0079517
_version_ 1782478461219110912
author Dai, Hong-Jie
Wu, Johnny Chi-Yang
Tsai, Richard Tzong-Han
author_facet Dai, Hong-Jie
Wu, Johnny Chi-Yang
Tsai, Richard Tzong-Han
author_sort Dai, Hong-Jie
collection PubMed
description A high proportion of life science research is gene-oriented: scientists aim to investigate the roles that genes play in biological processes and their involvement in biological mechanisms. As a result, gene names and their related information are among the main objects of interest in the biomedical literature. While gene mention recognition has made significant progress, its results are still insufficient for direct use because gene names are ambiguous. Gene normalization (GN) goes beyond the recognition task by linking a gene mention to a database ID. Unlike most previous work, we approach GN at the instance level and evaluate its overall performance on the recognition and normalization steps in abstracts and full texts. We release the first instance-level gene normalization (IGN) corpus in the BioC format, which includes annotations for the boundaries of all gene mentions and the corresponding IDs for human gene mentions. Species information, along with existing co-reference chains and full name/abbreviation pairs, is also provided for each gene mention. Using the released corpus, we have designed a collective instance-level GN approach that uses not only the contextual information of each individual instance, but also the relations among instances and the inherent characteristics of full-text sections. Our experimental results show that our collective approach can achieve an F-score of 0.743. The proposed approach, which exploits section characteristics in full-text articles, can improve the F-scores of information-lacking sections by up to 1.8%. In addition, the proposed refinement process improved the F-score of gene mention recognition by 0.125 and that of GN by 0.03. Although the current experimental results are limited to the human species, we intend to continue updating the annotations of the IGN corpus and to observe how the proposed approach can be extended to other species.
format Online
Article
Text
id pubmed-3839972
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3839972 2013-11-26 Collective Instance-Level Gene Normalization on the IGN Corpus Dai, Hong-Jie Wu, Johnny Chi-Yang Tsai, Richard Tzong-Han PLoS One Research Article Public Library of Science 2013-11-25 /pmc/articles/PMC3839972/ /pubmed/24282506 http://dx.doi.org/10.1371/journal.pone.0079517 Text en © 2013 Dai et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Dai, Hong-Jie
Wu, Johnny Chi-Yang
Tsai, Richard Tzong-Han
Collective Instance-Level Gene Normalization on the IGN Corpus
title Collective Instance-Level Gene Normalization on the IGN Corpus
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3839972/
https://www.ncbi.nlm.nih.gov/pubmed/24282506
http://dx.doi.org/10.1371/journal.pone.0079517