A Multistage Gene Normalization System Integrating Multiple Effective Methods

Gene/protein recognition and normalization is an important preliminary step for many biological text mining tasks. In this paper, we present a multistage gene normalization system which consists of four major subtasks: pre-processing, dictionary matching, ambiguity resolution and filtering. For the...

Bibliographic Details
Main Authors: Li, Lishuang, Liu, Shanshan, Li, Lihua, Fan, Wenting, Huang, Degen, Zhou, Huiwei
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3861319/
https://www.ncbi.nlm.nih.gov/pubmed/24349160
http://dx.doi.org/10.1371/journal.pone.0081956
_version_ 1782295614256578560
author Li, Lishuang
Liu, Shanshan
Li, Lihua
Fan, Wenting
Huang, Degen
Zhou, Huiwei
collection PubMed
description Gene/protein recognition and normalization is an important preliminary step for many biological text mining tasks. In this paper, we present a multistage gene normalization system which consists of four major subtasks: pre-processing, dictionary matching, ambiguity resolution and filtering. For the first subtask, we apply the gene mention tagger developed in our earlier work, which achieves an F-score of 88.42% on the BioCreative II GM testing set. In the stage of dictionary matching, the exact matching and approximate matching between gene names and the EntrezGene lexicon have been combined. For the ambiguity resolution subtask, we propose a semantic similarity disambiguation method based on Munkres' Assignment Algorithm. At the last step, a filter based on Wikipedia has been built to remove the false positives. Experimental results show that the presented system can achieve an F-score of 90.1%, outperforming most of the state-of-the-art systems.
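The ambiguity resolution step in the description can be illustrated with a small sketch of the underlying assignment problem: given a similarity score for every (gene mention, candidate identifier) pair, choose a one-to-one pairing that maximizes the total score. The record does not say how the authors build their similarity matrix, so the scores and the function name `best_assignment` below are hypothetical. The sketch also uses a brute-force search over permutations rather than Munkres' algorithm itself; on small inputs it reaches the same optimum, only less efficiently.

```python
from itertools import permutations


def best_assignment(similarity):
    """Return (total, pairs) for the one-to-one assignment with the highest
    summed similarity.

    similarity[r][c] is the score of pairing row r (a mention) with
    column c (a candidate identifier). Brute force stand-in for Munkres'
    algorithm; requires at least as many columns as rows.
    """
    n_rows = len(similarity)
    n_cols = len(similarity[0])
    if n_rows > n_cols:
        raise ValueError("need at least as many candidates as mentions")

    best_total, best_cols = float("-inf"), None
    # Try every way of giving each row its own distinct column.
    for cols in permutations(range(n_cols), n_rows):
        total = sum(similarity[r][c] for r, c in enumerate(cols))
        if total > best_total:
            best_total, best_cols = total, cols
    return best_total, list(enumerate(best_cols))


# Hypothetical scores: 2 mentions (rows) x 3 candidate identifiers (columns).
similarity = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.30],
]
total, pairs = best_assignment(similarity)
print(pairs)  # list of (mention index, candidate index) tuples
```

In this toy matrix, greedily giving the first mention its top-scoring candidate would leave the second mention with a poor match; solving the assignment jointly pairs the first mention with its second-best candidate and obtains a higher total.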
format Online
Article
Text
id pubmed-3861319
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3861319 2013-12-17 A Multistage Gene Normalization System Integrating Multiple Effective Methods. Li, Lishuang; Liu, Shanshan; Li, Lihua; Fan, Wenting; Huang, Degen; Zhou, Huiwei. PLoS One. Research Article. Public Library of Science 2013-12-12 /pmc/articles/PMC3861319/ /pubmed/24349160 http://dx.doi.org/10.1371/journal.pone.0081956 Text en © 2013 Li et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title A Multistage Gene Normalization System Integrating Multiple Effective Methods
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3861319/
https://www.ncbi.nlm.nih.gov/pubmed/24349160
http://dx.doi.org/10.1371/journal.pone.0081956