
Gene/protein name recognition based on support vector machine using dictionary as features

BACKGROUND: Automated information extraction from biomedical literature is important because a vast amount of biomedical literature has been published. Recognition of the biomedical named entities is the first step in information extraction. We developed an automated recognition system based on the...


Bibliographic Details
Main Authors: Mitsumori, Tomohiro; Fation, Sevrani; Murata, Masaki; Doi, Kouichi; Doi, Hirohumi
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869022/
https://www.ncbi.nlm.nih.gov/pubmed/15960842
http://dx.doi.org/10.1186/1471-2105-6-S1-S8
collection PubMed
description BACKGROUND: Automated information extraction from biomedical literature is important because a vast amount of biomedical literature has been published. Recognition of the biomedical named entities is the first step in information extraction. We developed an automated recognition system based on the SVM algorithm and evaluated it in Task 1.A of BioCreAtIvE, a competition for automated gene/protein name recognition. RESULTS: In the work presented here, our recognition system uses the feature set of the word, the part-of-speech (POS), the orthography, the prefix, the suffix, and the preceding class. We call these features "internal resource features", i.e., features that can be found in the training data. Additionally, we consider the features of matching against dictionaries to be external resource features. We investigated and evaluated the effect of these features as well as the effect of tuning the parameters of the SVM algorithm. We found that the dictionary matching features contributed slightly to the improvement in the performance of the f-score. We attribute this to the possibility that the dictionary matching features might overlap with other features in the current multiple feature setting. CONCLUSION: During SVM learning, each feature alone had a marginally positive effect on system performance. This supports the fact that the SVM algorithm is robust on the high dimensionality of the feature vector space and means that feature selection is not required.
format Text
id pubmed-1869022
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1869022 2007-05-18
Gene/protein name recognition based on support vector machine using dictionary as features
Mitsumori, Tomohiro; Fation, Sevrani; Murata, Masaki; Doi, Kouichi; Doi, Hirohumi
BMC Bioinformatics
Report
BioMed Central 2005-05-24
/pmc/articles/PMC1869022/ /pubmed/15960842 http://dx.doi.org/10.1186/1471-2105-6-S1-S8
Text en
Copyright © 2005 Mitsumori et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Report
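
The description above lists the per-token features used by the recognition system: the word, the part-of-speech, the orthography, the prefix, the suffix, the preceding class, and dictionary matching. The following is a minimal sketch of how such a feature set might be assembled for one token. It is not the authors' implementation. The function names, the orthography labels, the affix length of 3, and the exact-match dictionary lookup are all assumptions made for illustration, since the record does not specify them. POS tags are taken as input rather than computed, and the SVM training step is not shown.

```python
import re


def orthography(token):
    """Return a coarse orthographic class for a token.

    The label set here is hypothetical; the record does not say
    which orthographic classes the original system used.
    """
    if token.isdigit():
        return "DIGITS"
    if re.fullmatch(r"[A-Z]+", token):
        return "ALLCAPS"
    if re.search(r"[A-Za-z]", token) and re.search(r"\d", token):
        return "ALNUM"
    if token[:1].isupper():
        return "INITCAP"
    if token.islower():
        return "LOWER"
    return "OTHER"


def token_features(tokens, pos_tags, i, prev_class, dictionary, affix_len=3):
    """Build a feature dict for tokens[i].

    Internal resource features: word, POS, orthography, prefix,
    suffix, and the class assigned to the preceding token.
    External resource feature: whether the token matches a dictionary
    entry (assumed here to be a lowercase exact match).
    """
    tok = tokens[i]
    return {
        "word": tok.lower(),
        "pos": pos_tags[i],
        "orth": orthography(tok),
        "prefix": tok[:affix_len].lower(),
        "suffix": tok[-affix_len:].lower(),
        "prev_class": prev_class,
        "in_dict": tok.lower() in dictionary,
    }


if __name__ == "__main__":
    tokens = ["The", "p53", "protein"]
    pos_tags = ["DT", "NN", "NN"]      # supplied by an external tagger
    gene_dict = {"p53", "brca1"}       # toy dictionary for illustration
    print(token_features(tokens, pos_tags, 1, "O", gene_dict))
```

Feature dicts of this kind would typically be one-hot encoded into a sparse vector before being passed to an SVM; the dictionary flag then becomes one more dimension alongside the internal features, which is consistent with the overlap the description mentions.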