Gene/protein name recognition based on support vector machine using dictionary as features
Main authors: | Mitsumori, Tomohiro; Fation, Sevrani; Murata, Masaki; Doi, Kouichi; Doi, Hirohumi |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2005 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869022/ https://www.ncbi.nlm.nih.gov/pubmed/15960842 http://dx.doi.org/10.1186/1471-2105-6-S1-S8 |
Field | Value |
---|---|
author | Mitsumori, Tomohiro; Fation, Sevrani; Murata, Masaki; Doi, Kouichi; Doi, Hirohumi |
collection | PubMed |
description | BACKGROUND: Automated information extraction from the biomedical literature is important because a vast amount of it has been published. Recognizing biomedical named entities is the first step in information extraction. We developed an automated recognition system based on the support vector machine (SVM) algorithm and evaluated it in Task 1.A of BioCreAtIvE, a competition for automated gene/protein name recognition. RESULTS: Our recognition system uses a feature set consisting of the word, its part of speech (POS), its orthography, its prefix and suffix, and the preceding class. We call these "internal resource features", i.e., features that can be derived from the training data. In addition, we treat features obtained by matching against dictionaries as external resource features. We evaluated the effect of each of these features, as well as the effect of tuning the parameters of the SVM algorithm. The dictionary matching features improved the F-score only slightly. We attribute this to the possibility that, in the current multiple-feature setting, the dictionary matching features overlap with other features. CONCLUSION: During SVM learning, each feature alone had a marginally positive effect on system performance. This supports the view that the SVM algorithm is robust to the high dimensionality of the feature vector space and suggests that feature selection is not required. |
format | Text |
id | pubmed-1869022 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2005 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1869022 2007-05-18 Gene/protein name recognition based on support vector machine using dictionary as features Mitsumori, Tomohiro; Fation, Sevrani; Murata, Masaki; Doi, Kouichi; Doi, Hirohumi BMC Bioinformatics Report BioMed Central 2005-05-24 /pmc/articles/PMC1869022/ /pubmed/15960842 http://dx.doi.org/10.1186/1471-2105-6-S1-S8 Text en Copyright © 2005 Mitsumori et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Gene/protein name recognition based on support vector machine using dictionary as features |
topic | Report |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869022/ https://www.ncbi.nlm.nih.gov/pubmed/15960842 http://dx.doi.org/10.1186/1471-2105-6-S1-S8 |
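
The abstract in the description field lists the feature types but not how they are encoded. The Python sketch below shows one plausible way to turn a tokenized sentence into the binary feature strings that an SVM-based tagger could consume. It is not the authors' implementation. The orthographic categories, the affix length of 3, the lowercased word feature, the greedy longest-match dictionary lookup with B/I/O tags, and the helper names (`orthography`, `dictionary_tags`, `token_features`, `vectorize`) are all assumptions made for illustration. POS tags are taken as given rather than produced by a tagger, and the "preceding class" is assumed to be the label already predicted for the previous token during left-to-right decoding.

```python
import re
from typing import Dict, List, Sequence, Set


def orthography(word: str) -> str:
    """Map a token to a coarse orthographic class (hypothetical category set)."""
    if re.fullmatch(r"[A-Z]+", word):
        return "ALLCAPS"
    if re.fullmatch(r"[A-Z][a-z]+", word):
        return "INITCAP"
    if re.fullmatch(r"[a-z]+", word):
        return "LOWER"
    if re.fullmatch(r"[0-9]+", word):
        return "DIGITS"
    if re.search(r"[A-Za-z]", word) and re.search(r"[0-9]", word):
        return "ALPHANUM"
    return "OTHER"


def dictionary_tags(tokens: Sequence[str], dictionary: Set[str], max_len: int = 5) -> List[str]:
    """Tag tokens B/I/O by greedy longest match against a lowercased name dictionary (assumed scheme)."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate span first and shrink until a dictionary hit.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]).lower() in dictionary:
                match_len = n
                break
        if match_len:
            tags[i] = "B"
            for j in range(i + 1, i + match_len):
                tags[j] = "I"
            i += match_len
        else:
            i += 1
    return tags


def token_features(tokens: Sequence[str], pos_tags: Sequence[str], i: int,
                   prev_class: str, dict_tags: Sequence[str], affix_len: int = 3) -> List[str]:
    """Collect the feature strings for token i."""
    word = tokens[i]
    feats = [
        f"w={word.lower()}",          # word (internal)
        f"pos={pos_tags[i]}",         # part of speech (internal)
        f"orth={orthography(word)}",  # orthography (internal)
    ]
    # Prefixes and suffixes up to affix_len characters (internal).
    for k in range(1, affix_len + 1):
        if len(word) >= k:
            feats.append(f"pre{k}={word[:k]}")
            feats.append(f"suf{k}={word[-k:]}")
    feats.append(f"prev={prev_class}")    # preceding class (internal)
    feats.append(f"dict={dict_tags[i]}")  # dictionary match (external)
    return feats


def vectorize(feats: Sequence[str], index: Dict[str, int]) -> List[int]:
    """Return sorted indices of active binary features, growing the index for unseen ones."""
    return sorted({index.setdefault(f, len(index)) for f in feats})
```

For example, with the hypothetical dictionary `{"p53", "p53 protein"}`, the sentence "The p53 protein binds DNA" gets dictionary tags O, B, I, O, O, and the token "p53" receives features such as `orth=ALPHANUM`, `suf2=53`, and `dict=B`. Each distinct feature string becomes one dimension of a sparse binary vector, which is why the feature space grows large; the abstract's conclusion is that the SVM copes with this dimensionality without a separate feature-selection step.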