
Heterogeneous Ensemble Combination Search Using Genetic Algorithm for Class Imbalanced Data Classification

Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent ba...


Bibliographic Details
Main Authors: Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4713117/
https://www.ncbi.nlm.nih.gov/pubmed/26764911
http://dx.doi.org/10.1371/journal.pone.0146116
author Haque, Mohammad Nazmul
Noman, Nasimul
Berretta, Regina
Moscato, Pablo
collection PubMed
description Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble depends on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, uses 10-fold cross-validation on the training data to evaluate the quality of each candidate ensemble. To combine the base classifiers' decisions into the ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β)-k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI Machine Learning Repository, one Alzheimer's disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases.
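The search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' GA-EoC implementation. It assumes each candidate ensemble is encoded as a bit mask over the classifier pool, that cross-validated predictions of every base classifier on the training data are already available as plain label lists, and that fitness is plain accuracy of the majority vote. The function names (`majority_vote`, `fitness`, `ga_search`), the genetic operators (binary tournament selection, one-point crossover, bit-flip mutation, elitism) and all parameter values are hypothetical choices made for illustration; the record does not state which ones the paper uses.

```python
import random


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample."""
    combined = []
    for votes in zip(*predictions):
        # Assumption: ties are broken in favour of the smallest label.
        combined.append(max(sorted(set(votes)), key=votes.count))
    return combined


def fitness(mask, pool_predictions, labels):
    """Accuracy of the majority vote of the classifiers selected by mask."""
    chosen = [p for bit, p in zip(mask, pool_predictions) if bit]
    if not chosen:
        return 0.0  # an empty ensemble cannot classify anything
    voted = majority_vote(chosen)
    return sum(v == y for v, y in zip(voted, labels)) / len(labels)


def ga_search(pool_predictions, labels, pop_size=20, generations=30,
              mutation_rate=0.1, seed=0):
    """Search bit masks over the pool (needs at least two classifiers)."""
    rng = random.Random(seed)
    n = len(pool_predictions)

    def score(mask):
        return fitness(mask, pool_predictions, labels)

    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        next_gen = [best[:]]  # elitism: carry the best mask forward
        while len(next_gen) < pop_size:
            # Binary tournament selection of two parents.
            a = max(rng.sample(population, 2), key=score)
            b = max(rng.sample(population, 2), key=score)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=score)
    return best, score(best)


if __name__ == "__main__":
    # Toy data: four hypothetical base classifiers, eight training samples.
    labels = [0, 1, 0, 1, 1, 0, 1, 0]
    pool = [
        [0, 1, 0, 1, 0, 0, 1, 0],
        [0, 1, 1, 1, 1, 0, 1, 0],
        [1, 1, 0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 1, 0, 1],
    ]
    mask, acc = ga_search(pool, labels)
    print("selected classifiers:", mask, "accuracy:", acc)
```

In a fuller pipeline the prediction lists would come from 10-fold cross-validation on a class-balanced (randomly sub-sampled) training set, as the abstract describes; here they are hard-coded toy values.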
format Online
Article
Text
id pubmed-4713117
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4713117 2016-01-26 Heterogeneous Ensemble Combination Search Using Genetic Algorithm for Class Imbalanced Data Classification Haque, Mohammad Nazmul Noman, Nasimul Berretta, Regina Moscato, Pablo PLoS One Research Article
Public Library of Science 2016-01-14 /pmc/articles/PMC4713117/ /pubmed/26764911 http://dx.doi.org/10.1371/journal.pone.0146116 Text en © 2016 Haque et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
title Heterogeneous Ensemble Combination Search Using Genetic Algorithm for Class Imbalanced Data Classification
topic Research Article