A Novel Feature Selection Technique for Text Classification Using Naïve Bayes

With the proliferation of unstructured data, text classification or text categorization has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. There are many classification algorithms available. Naïve Bayes remains one of the ol...

Bibliographic Details
Main authors: Dey Sarkar, Subhajit; Goswami, Saptarsi; Agarwal, Aman; Aktar, Javed
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4897287/
https://www.ncbi.nlm.nih.gov/pubmed/27433512
http://dx.doi.org/10.1155/2014/717092
author Dey Sarkar, Subhajit
Goswami, Saptarsi
Agarwal, Aman
Aktar, Javed
collection PubMed
description With the proliferation of unstructured data, text classification or text categorization has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. There are many classification algorithms available. Naïve Bayes remains one of the oldest and most popular classifiers. On one hand, implementation of naïve Bayes is simple and, on the other hand, this also requires fewer amounts of training data. From the literature review, it is found that naïve Bayes performs poorly compared to other classifiers in text classification. As a result, this makes the naïve Bayes classifier unusable in spite of the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method based on firstly a univariate feature selection and then feature clustering, where we use the univariate feature selection method to reduce the search space and then apply clustering to select relatively independent feature sets. We demonstrate the effectiveness of our method by a thorough evaluation and comparison over 13 datasets. The performance improvement thus achieved makes naïve Bayes comparable or superior to other classifiers. The proposed algorithm is shown to outperform other traditional methods like greedy search based wrapper or CFS.
format Online
Article
Text
id pubmed-4897287
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-4897287 2016-07-18 Int Sch Res Notices Research Article Hindawi Publishing Corporation 2014-10-28 Text en Copyright © 2014 Subhajit Dey Sarkar et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A Novel Feature Selection Technique for Text Classification Using Naïve Bayes
topic Research Article