An Ant Colony Optimization Based Feature Selection for Web Page Classification

The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features suc...

Bibliographic Details
Main Authors: Saraç, Esra; Özel, Selma Ayşe
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4127204/
https://www.ncbi.nlm.nih.gov/pubmed/25136678
http://dx.doi.org/10.1155/2014/649260
author Saraç, Esra
Özel, Selma Ayşe
collection PubMed
description The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods.
format Online
Article
Text
id pubmed-4127204
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-4127204
2014-08-18
An Ant Colony Optimization Based Feature Selection for Web Page Classification
Saraç, Esra
Özel, Selma Ayşe
ScientificWorldJournal
Research Article
The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods.
Hindawi Publishing Corporation 2014
2014-07-17
/pmc/articles/PMC4127204/
/pubmed/25136678
http://dx.doi.org/10.1155/2014/649260
Text
en
Copyright © 2014 E. Saraç and S. A. Özel.
https://creativecommons.org/licenses/by/3.0/
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An Ant Colony Optimization Based Feature Selection for Web Page Classification
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4127204/
https://www.ncbi.nlm.nih.gov/pubmed/25136678
http://dx.doi.org/10.1155/2014/649260
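
Illustrative note: the description above says that an ant colony optimization (ACO) algorithm selects features before a classifier is applied, but the record gives no algorithmic detail. The following is only a minimal sketch of the general idea, not the authors' method. Everything in it is an assumption made for illustration: the function names, the parameters (`n_ants`, `n_iters`, `rho`), the pheromone update rule, the toy data, and the use of leave-one-out 1-nearest-neighbor accuracy as the fitness score. It also omits the heuristic desirability term that ACO variants commonly combine with pheromone.

```python
import random


def make_toy_data(n_samples=20, n_noise=5, seed=1):
    """Hypothetical toy data: feature 0 separates the two classes, the rest is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        informative = 5.0 * label + rng.random()  # class 0 in [0, 1), class 1 in [5, 6)
        X.append([informative] + [rng.random() for _ in range(n_noise)])
        y.append(label)
    return X, y


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier using only `features`."""
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((xi[f] - xj[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def aco_select(X, y, k, n_ants=10, n_iters=20, rho=0.2, seed=0):
    """Pheromone-guided search for a k-feature subset (assumed update rule, see lead-in)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    pheromone = [1.0] * n_features
    best_subset, best_score = None, -1.0
    for _ in range(n_iters):
        for _ in range(n_ants):
            # Each ant draws k distinct features, weighted by current pheromone.
            candidates = list(range(n_features))
            subset = []
            for _ in range(k):
                weights = [pheromone[f] for f in candidates]
                pick = rng.choices(candidates, weights=weights)[0]
                subset.append(pick)
                candidates.remove(pick)
            subset.sort()
            score = loo_1nn_accuracy(X, y, subset)
            if score > best_score:
                best_subset, best_score = subset, score
        # Evaporate all trails, then reinforce the best subset found so far.
        pheromone = [(1.0 - rho) * p for p in pheromone]
        for f in best_subset:
            pheromone[f] += best_score
    return best_subset, best_score


if __name__ == "__main__":
    X, y = make_toy_data()
    subset, score = aco_select(X, y, k=2)
    print("selected features:", subset, "LOO 1-NN accuracy:", score)
```

In a setting like the one the description outlines, the fitness function would presumably be replaced by the accuracy of the chosen classifier (for example C4.5, naive Bayes, or k nearest neighbor) on held-out data, and the selected subset could be compared against subsets ranked by information gain or chi square.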