A Multistage Gene Normalization System Integrating Multiple Effective Methods
Gene/protein recognition and normalization is an important preliminary step for many biological text mining tasks. In this paper, we present a multistage gene normalization system which consists of four major subtasks: pre-processing, dictionary matching, ambiguity resolution and filtering. For the...
Main Authors: | Li, Lishuang; Liu, Shanshan; Li, Lihua; Fan, Wenting; Huang, Degen; Zhou, Huiwei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2013 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3861319/ https://www.ncbi.nlm.nih.gov/pubmed/24349160 http://dx.doi.org/10.1371/journal.pone.0081956 |
_version_ | 1782295614256578560 |
---|---|
author | Li, Lishuang; Liu, Shanshan; Li, Lihua; Fan, Wenting; Huang, Degen; Zhou, Huiwei |
author_facet | Li, Lishuang; Liu, Shanshan; Li, Lihua; Fan, Wenting; Huang, Degen; Zhou, Huiwei |
author_sort | Li, Lishuang |
collection | PubMed |
description | Gene/protein recognition and normalization is an important preliminary step for many biological text mining tasks. In this paper, we present a multistage gene normalization system which consists of four major subtasks: pre-processing, dictionary matching, ambiguity resolution and filtering. For the first subtask, we apply the gene mention tagger developed in our earlier work, which achieves an F-score of 88.42% on the BioCreative II GM testing set. In the stage of dictionary matching, the exact matching and approximate matching between gene names and the EntrezGene lexicon have been combined. For the ambiguity resolution subtask, we propose a semantic similarity disambiguation method based on Munkres' Assignment Algorithm. At the last step, a filter based on Wikipedia has been built to remove the false positives. Experimental results show that the presented system can achieve an F-score of 90.1%, outperforming most of the state-of-the-art systems. |
format | Online Article Text |
id | pubmed-3861319 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-38613192013-12-17 A Multistage Gene Normalization System Integrating Multiple Effective Methods Li, Lishuang Liu, Shanshan Li, Lihua Fan, Wenting Huang, Degen Zhou, Huiwei PLoS One Research Article Gene/protein recognition and normalization is an important preliminary step for many biological text mining tasks. In this paper, we present a multistage gene normalization system which consists of four major subtasks: pre-processing, dictionary matching, ambiguity resolution and filtering. For the first subtask, we apply the gene mention tagger developed in our earlier work, which achieves an F-score of 88.42% on the BioCreative II GM testing set. In the stage of dictionary matching, the exact matching and approximate matching between gene names and the EntrezGene lexicon have been combined. For the ambiguity resolution subtask, we propose a semantic similarity disambiguation method based on Munkres' Assignment Algorithm. At the last step, a filter based on Wikipedia has been built to remove the false positives. Experimental results show that the presented system can achieve an F-score of 90.1%, outperforming most of the state-of-the-art systems. Public Library of Science 2013-12-12 /pmc/articles/PMC3861319/ /pubmed/24349160 http://dx.doi.org/10.1371/journal.pone.0081956 Text en © 2013 Li et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Li, Lishuang Liu, Shanshan Li, Lihua Fan, Wenting Huang, Degen Zhou, Huiwei A Multistage Gene Normalization System Integrating Multiple Effective Methods |
title | A Multistage Gene Normalization System Integrating Multiple Effective Methods |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3861319/ https://www.ncbi.nlm.nih.gov/pubmed/24349160 http://dx.doi.org/10.1371/journal.pone.0081956 |
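The description field above says that ambiguity resolution is based on Munkres' Assignment Algorithm, but this record does not explain how the similarity scores are computed or how the assignment is set up. The sketch below only illustrates the underlying assignment problem, under stated assumptions: rows are gene mentions, columns are candidate gene identifiers, each mention must receive a distinct candidate, and the scores are invented for illustration. For brevity it finds the optimum by exhaustive search rather than by Munkres' algorithm itself; Munkres reaches the same optimum in polynomial time. The names `best_assignment`, `mentions`, `candidates`, and `similarity` are hypothetical and do not come from the paper.

```python
from itertools import permutations


def best_assignment(similarity):
    """Pick one distinct column per row so that the summed similarity is maximal.

    Exhaustive search over column choices (assumes rows <= columns).
    This is a stand-in for Munkres' algorithm, which solves the same
    problem efficiently; it is only practical for tiny matrices.
    """
    n_rows = len(similarity)
    n_cols = len(similarity[0])
    best_total, best_cols = float("-inf"), None
    for cols in permutations(range(n_cols), n_rows):
        total = sum(similarity[row][col] for row, col in enumerate(cols))
        if total > best_total:
            best_total, best_cols = total, cols
    return best_total, list(enumerate(best_cols))


# Hypothetical data: two mentions, three candidate identifiers.
mentions = ["mention_A", "mention_B"]
candidates = ["gene_1", "gene_2", "gene_3"]
similarity = [
    [0.9, 0.8, 0.1],   # scores for mention_A (invented)
    [0.85, 0.2, 0.3],  # scores for mention_B (invented)
]

total, pairs = best_assignment(similarity)
resolved = {mentions[row]: candidates[col] for row, col in pairs}
print(resolved, round(total, 2))
```

In this toy matrix a greedy choice would give mention_A its top-scoring candidate (gene_1) and leave mention_B with a weak match, whereas the joint assignment pairs mention_A with gene_2 and mention_B with gene_1 for a higher combined score. That is the kind of trade-off an assignment formulation is meant to capture.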