Modified Bat Algorithm for Feature Selection with the Wisconsin Diagnosis Breast Cancer (WDBC) Dataset

Early diagnosis of breast cancer is essential to save patients' lives. Medical datasets usually include a large variety of data that can lead to confusion during diagnosis. The Knowledge Discovery in Databases (KDD) process helps to improve efficiency. It requires elimination of inappropriate and...

Bibliographic Details
Main Authors: Jeyasingh, Suganthi, Veluchamy, Malathi
Format: Online Article Text
Language: English
Published: West Asia Organization for Cancer Prevention 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5555532/
https://www.ncbi.nlm.nih.gov/pubmed/28610411
http://dx.doi.org/10.22034/APJCP.2017.18.5.1257
_version_ 1783256934363693056
author Jeyasingh, Suganthi
Veluchamy, Malathi
collection PubMed
description Early diagnosis of breast cancer is essential to save patients' lives. Medical datasets usually include a large variety of data that can lead to confusion during diagnosis. The Knowledge Discovery in Databases (KDD) process helps to improve efficiency. It requires elimination of inappropriate and repeated data from the dataset before final diagnosis. This can be done using any of the feature selection algorithms available in data mining. Feature selection is considered a vital step in increasing classification accuracy. This paper proposes a Modified Bat Algorithm (MBA) for feature selection to eliminate irrelevant features from an original dataset. The Bat algorithm was modified using simple random sampling to select random instances from the dataset. Ranking was performed with the global best features to identify the predominant features in the dataset. The selected features are used to train a Random Forest (RF) classification algorithm. The MBA feature selection algorithm enhanced the classification accuracy of RF in identifying the occurrence of breast cancer. The Wisconsin Diagnosis Breast Cancer Dataset (WDBC) was used to evaluate the performance of the proposed MBA feature selection algorithm. The proposed algorithm achieved better performance in terms of Kappa statistic, Matthews Correlation Coefficient, Precision, F-measure, Recall, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE) and Root Relative Squared Error (RRSE).
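For readers unfamiliar with the evaluation measures listed in the description, the following is a minimal sketch of how several of them can be computed from a binary confusion matrix. It is not the authors' code. The function name `binary_metrics` and the counts in the usage line are hypothetical, and MAE and RMSE are computed here on hard 0/1 predictions, which is an assumption: tools that score predicted class probabilities will report different values. RAE and RRSE are omitted because they depend on a baseline predictor that the record does not specify.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Evaluation measures for a binary classifier, from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives. Assumes every denominator below is non-zero.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance)

    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    # Assumption: hard 0/1 predictions, so every error has magnitude 1.
    # MAE then equals the error rate and RMSE is its square root.
    mae = (fp + fn) / n
    rmse = math.sqrt(mae)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
        "mcc": mcc,
        "mae": mae,
        "rmse": rmse,
    }


# Hypothetical counts for illustration only; these are not results from the paper.
print(binary_metrics(tp=50, fp=5, fn=10, tn=35))
```

Kappa and the Matthews correlation coefficient both correct for class imbalance, which matters for WDBC because its benign and malignant classes are not equally sized.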
format Online
Article
Text
id pubmed-5555532
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher West Asia Organization for Cancer Prevention
record_format MEDLINE/PubMed
spelling pubmed-5555532 2017-08-28 Modified Bat Algorithm for Feature Selection with the Wisconsin Diagnosis Breast Cancer (WDBC) Dataset Jeyasingh, Suganthi Veluchamy, Malathi Asian Pac J Cancer Prev Research Article
West Asia Organization for Cancer Prevention 2017 /pmc/articles/PMC5555532/ /pubmed/28610411 http://dx.doi.org/10.22034/APJCP.2017.18.5.1257 Text en Copyright: © Asian Pacific Journal of Cancer Prevention http://creativecommons.org/licenses/BY-SA/4.0 This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
title Modified Bat Algorithm for Feature Selection with the Wisconsin Diagnosis Breast Cancer (WDBC) Dataset
topic Research Article