Improved Feature-Selection Method Considering the Imbalance Problem in Text Categorization
Filtering feature-selection algorithms are an important approach to dimensionality reduction in text categorization. Most filtering feature-selection algorithms evaluate the significance of a feature for a category assuming a balanced dataset and do not consider the imbalanc...
Main authors: | Yang, Jieming; Qu, Zhaoyang; Liu, Zhiying |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi Publishing Corporation, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4058251/ https://www.ncbi.nlm.nih.gov/pubmed/24971386 http://dx.doi.org/10.1155/2014/625342 |
author | Yang, Jieming; Qu, Zhaoyang; Liu, Zhiying |
---|---|
collection | PubMed |
description | Filtering feature-selection algorithms are an important approach to dimensionality reduction in text categorization. Most filtering feature-selection algorithms evaluate the significance of a feature for a category assuming a balanced dataset and do not consider the dataset's imbalance factor. In this paper, a new scheme is proposed that weakens the adverse effect caused by the imbalance factor in the corpus. We evaluated improved versions of nine well-known feature-selection methods (Information Gain, Chi statistic, Document Frequency, Orthogonal Centroid Feature Selection, DIA association factor, Comprehensive Measurement Feature Selection, Deviation from Poisson Feature Selection, improved Gini index, and Mutual Information) using naïve Bayes and support vector machines on three benchmark document collections (20-Newsgroups, Reuters-21578, and WebKB). The experimental results show that the improved scheme can significantly enhance the performance of the feature-selection methods. |
format | Online Article Text |
id | pubmed-4058251 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Hindawi Publishing Corporation |
record_format | MEDLINE/PubMed |
journal | ScientificWorldJournal |
dates | 2014-05-26; 2014-06-26 |
rights | Copyright © 2014 Jieming Yang et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Improved Feature-Selection Method Considering the Imbalance Problem in Text Categorization |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4058251/ https://www.ncbi.nlm.nih.gov/pubmed/24971386 http://dx.doi.org/10.1155/2014/625342 |
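This record does not state how the improved scheme works, so it is not reproduced here. To illustrate what it means for a filtering method to score "the significance of a feature for a category", the minimal Python sketch below computes three of the nine baseline criteria named in the abstract (Document Frequency, the chi statistic, and Information Gain) in their conventional, unadjusted form. The function names, the A/B/C/D contingency notation, and the imbalanced toy corpus are illustrative assumptions, not the authors' code.

```python
import math
from collections import Counter
from typing import AbstractSet, List, Sequence, Tuple

# Hypothetical toy corpus: each document is its set of terms, and the
# category sizes are deliberately imbalanced (8 "sports" vs 2 "finance").
TOY_DOCS: List[AbstractSet[str]] = [
    {"ball", "team"}, {"ball", "goal"}, {"team", "goal"}, {"ball"},
    {"team"}, {"goal", "ball"}, {"match"}, {"ball", "match"},
    {"stock", "ball"}, {"stock", "bank"},
]
TOY_LABELS: List[str] = ["sports"] * 8 + ["finance"] * 2


def document_frequency(docs: Sequence[AbstractSet[str]], term: str) -> int:
    """Number of documents containing `term` (the DF criterion)."""
    return sum(1 for words in docs if term in words)


def contingency(docs: Sequence[AbstractSet[str]], labels: Sequence[str],
                term: str, category: str) -> Tuple[int, int, int, int]:
    """Document counts (A, B, C, D) for one term/category pair.

    A: in `category`, contains `term`    B: outside `category`, contains `term`
    C: in `category`, lacks `term`       D: outside `category`, lacks `term`
    """
    a = b = c = d = 0
    for words, label in zip(docs, labels):
        has_term, in_cat = term in words, label == category
        if has_term and in_cat:
            a += 1
        elif has_term:
            b += 1
        elif in_cat:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic of the 2x2 table; 0.0 if any margin is empty."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def information_gain(docs: Sequence[AbstractSet[str]], labels: Sequence[str],
                     term: str) -> float:
    """IG(t) = H(C) - H(C | term present/absent), using natural logarithms."""
    n = len(docs)
    with_t = Counter(lab for words, lab in zip(docs, labels) if term in words)
    without_t = Counter(lab for words, lab in zip(docs, labels) if term not in words)
    n_t = sum(with_t.values())

    def sum_p_log_p(counts: Counter, total: int) -> float:
        # Sum over categories of P(c|.) * log P(c|.); empty classes contribute 0.
        if total == 0:
            return 0.0
        return sum((k / total) * math.log(k / total) for k in counts.values() if k > 0)

    h_c = -sum_p_log_p(Counter(labels), n)
    return (h_c + (n_t / n) * sum_p_log_p(with_t, n_t)
            + ((n - n_t) / n) * sum_p_log_p(without_t, n - n_t))


def top_terms_by_chi_square(docs: Sequence[AbstractSet[str]], labels: Sequence[str],
                            category: str, k: int) -> List[str]:
    """Rank the vocabulary by chi-square for `category` (ties broken by term)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (-chi_square(*contingency(docs, labels, t, category)), t))
    return ranked[:k]


if __name__ == "__main__":
    print(top_terms_by_chi_square(TOY_DOCS, TOY_LABELS, "finance", 2))
```

Running the module prints `['stock', 'bank']`, the two terms that occur only in the two-document minority category. The paper's scheme modifies scores of this kind to account for category imbalance, but its formula is not given in this record, so the sketch stops at the baseline scores.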