Combined SVM-CRFs for Biological Named Entity Recognition with Maximal Bidirectional Squeezing
Biological named entity recognition, the identification of biological terms in text, is essential for biomedical information extraction. Machine learning-based approaches have been widely applied in this area. However, the recognition performance of current approaches could still be improved. Our no...
Main Authors: | Zhu, Fei; Shen, Bairong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2012 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3383748/ https://www.ncbi.nlm.nih.gov/pubmed/22745720 http://dx.doi.org/10.1371/journal.pone.0039230 |
_version_ | 1782236649394012160 |
---|---|
author | Zhu, Fei Shen, Bairong |
author_facet | Zhu, Fei Shen, Bairong |
author_sort | Zhu, Fei |
collection | PubMed |
description | Biological named entity recognition, the identification of biological terms in text, is essential for biomedical information extraction. Machine learning-based approaches have been widely applied in this area. However, the recognition performance of current approaches could still be improved. Our novel approach is to combine support vector machines (SVMs) and conditional random fields (CRFs), which can complement and facilitate each other. During the hybrid process, we use SVM to separate biological terms from non-biological terms, before we use CRFs to determine the types of biological terms, which makes full use of the power of SVM as a binary-class classifier and the data-labeling capacity of CRFs. We then merge the results of SVM and CRFs. To remove any inconsistencies that might result from the merging, we develop a useful algorithm and apply two rules. To ensure biological terms with a maximum length are identified, we propose a maximal bidirectional squeezing approach that finds the longest term. We also add a positive gain to rare events to reinforce their probability and avoid bias. Our approach will also gradually extend the context so more contextual information can be included. We examined the performance of four approaches with the GENIA corpus and JNLPBA04 data. The combination of SVM and CRFs improved performance. The macro-precision, macro-recall, and macro-F1 of the SVM-CRFs hybrid approach surpassed those of conventional SVM and CRFs. After applying the new algorithms, the macro-F1 reached 91.67% with the GENIA corpus and 84.04% with the JNLPBA04 data. |
format | Online Article Text |
id | pubmed-3383748 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-3383748 2012-06-28 Combined SVM-CRFs for Biological Named Entity Recognition with Maximal Bidirectional Squeezing Zhu, Fei Shen, Bairong PLoS One Research Article Biological named entity recognition, the identification of biological terms in text, is essential for biomedical information extraction. Machine learning-based approaches have been widely applied in this area. However, the recognition performance of current approaches could still be improved. Our novel approach is to combine support vector machines (SVMs) and conditional random fields (CRFs), which can complement and facilitate each other. During the hybrid process, we use SVM to separate biological terms from non-biological terms, before we use CRFs to determine the types of biological terms, which makes full use of the power of SVM as a binary-class classifier and the data-labeling capacity of CRFs. We then merge the results of SVM and CRFs. To remove any inconsistencies that might result from the merging, we develop a useful algorithm and apply two rules. To ensure biological terms with a maximum length are identified, we propose a maximal bidirectional squeezing approach that finds the longest term. We also add a positive gain to rare events to reinforce their probability and avoid bias. Our approach will also gradually extend the context so more contextual information can be included. We examined the performance of four approaches with the GENIA corpus and JNLPBA04 data. The combination of SVM and CRFs improved performance. The macro-precision, macro-recall, and macro-F1 of the SVM-CRFs hybrid approach surpassed those of conventional SVM and CRFs. After applying the new algorithms, the macro-F1 reached 91.67% with the GENIA corpus and 84.04% with the JNLPBA04 data. Public Library of Science 2012-06-26 /pmc/articles/PMC3383748/ /pubmed/22745720 http://dx.doi.org/10.1371/journal.pone.0039230 Text en Zhu, Shen. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Zhu, Fei Shen, Bairong Combined SVM-CRFs for Biological Named Entity Recognition with Maximal Bidirectional Squeezing |
title | Combined SVM-CRFs for Biological Named Entity Recognition with Maximal Bidirectional Squeezing |
title_full | Combined SVM-CRFs for Biological Named Entity Recognition with Maximal Bidirectional Squeezing |
title_fullStr | Combined SVM-CRFs for Biological Named Entity Recognition with Maximal Bidirectional Squeezing |
title_full_unstemmed | Combined SVM-CRFs for Biological Named Entity Recognition with Maximal Bidirectional Squeezing |
title_short | Combined SVM-CRFs for Biological Named Entity Recognition with Maximal Bidirectional Squeezing |
title_sort | combined svm-crfs for biological named entity recognition with maximal bidirectional squeezing |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3383748/ https://www.ncbi.nlm.nih.gov/pubmed/22745720 http://dx.doi.org/10.1371/journal.pone.0039230 |
work_keys_str_mv | AT zhufei combinedsvmcrfsforbiologicalnamedentityrecognitionwithmaximalbidirectionalsqueezing AT shenbairong combinedsvmcrfsforbiologicalnamedentityrecognitionwithmaximalbidirectionalsqueezing |
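
The description above reports macro-precision, macro-recall, and macro-F1 but the record does not say how they were computed. The following is a minimal sketch of one common definition, not the authors' evaluation code. It assumes token-level labels, an "O" label for non-entity tokens that is excluded from averaging, and a macro-F1 taken as the unweighted mean of per-class F1 scores. The function name `macro_scores` and the label strings are hypothetical.

```python
from collections import Counter


def macro_scores(gold, pred, ignore=("O",)):
    """Return (macro_precision, macro_recall, macro_f1) over entity classes.

    Assumption: gold and pred are equal-length sequences of token labels,
    and labels listed in `ignore` (non-entity tokens) are not averaged.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g not in ignore:
                tp[g] += 1          # correct entity label
        else:
            if p not in ignore:
                fp[p] += 1          # predicted class p where it was wrong
            if g not in ignore:
                fn[g] += 1          # missed an instance of class g

    classes = sorted(set(tp) | set(fp) | set(fn))
    if not classes:
        return 0.0, 0.0, 0.0

    precisions, recalls, f1s = [], [], []
    for c in classes:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(classes)
    # Macro averaging: every class counts equally, however rare it is.
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

Calling `macro_scores(["protein", "protein", "DNA", "O", "DNA"], ["protein", "DNA", "DNA", "O", "O"])` averages over the two classes "protein" and "DNA". Because each class carries equal weight, rare entity types influence the score as much as frequent ones, which is one reason macro-averaged metrics are often reported for imbalanced entity sets.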