Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis

Breast cancer is one of the leading causes of death among women, more so than all other cancers. The accurate diagnosis of breast cancer is very difficult due to the complexity of the disease, changing treatment procedures and different patient population samples. Diagnostic techniques with better p...

Bibliographic Details
Main authors: Ibrahim, Sara, Nazir, Saima, Velastin, Sergio A.
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8625715/
https://www.ncbi.nlm.nih.gov/pubmed/34821856
http://dx.doi.org/10.3390/jimaging7110225
_version_ 1784606489383534592
author Ibrahim, Sara
Nazir, Saima
Velastin, Sergio A.
author_facet Ibrahim, Sara
Nazir, Saima
Velastin, Sergio A.
author_sort Ibrahim, Sara
collection PubMed
description Breast cancer is one of the leading causes of death among women, more so than all other cancers. The accurate diagnosis of breast cancer is very difficult due to the complexity of the disease, changing treatment procedures and different patient population samples. Diagnostic techniques with better performance are very important for personalized care and treatment and to reduce and control the recurrence of cancer. The main objective of this research was to select significant features using correlation analysis and the variance of input features before passing them to a classification method. We used an ensemble method to improve the classification of breast cancer. The proposed approach was evaluated using the public WBCD dataset (Wisconsin Breast Cancer Dataset). Correlation analysis and principal component analysis were used for dimensionality reduction. Performance was evaluated for well-known machine learning classifiers, and the best seven classifiers were chosen for the next step. Hyper-parameter tuning was performed to improve the performance of the classifiers. The best performing classification algorithms were combined with two different voting techniques. Hard voting predicts the class that gets the majority vote, whereas soft voting predicts the class with the highest average predicted probability across the classifiers. The proposed approach performed better than state-of-the-art work, achieving an accuracy of 98.24%, high precision (99.29%) and a recall value of 95.89%.
format Online
Article
Text
id pubmed-8625715
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8625715 2021-11-27 Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis Ibrahim, Sara Nazir, Saima Velastin, Sergio A. J Imaging Article MDPI 2021-10-26 /pmc/articles/PMC8625715/ /pubmed/34821856 http://dx.doi.org/10.3390/jimaging7110225 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Ibrahim, Sara
Nazir, Saima
Velastin, Sergio A.
Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis
title Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis
title_full Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis
title_fullStr Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis
title_full_unstemmed Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis
title_short Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis
title_sort feature selection using correlation analysis and principal component analysis for accurate breast cancer diagnosis
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8625715/
https://www.ncbi.nlm.nih.gov/pubmed/34821856
http://dx.doi.org/10.3390/jimaging7110225
work_keys_str_mv AT ibrahimsara featureselectionusingcorrelationanalysisandprincipalcomponentanalysisforaccuratebreastcancerdiagnosis
AT nazirsaima featureselectionusingcorrelationanalysisandprincipalcomponentanalysisforaccuratebreastcancerdiagnosis
AT velastinsergioa featureselectionusingcorrelationanalysisandprincipalcomponentanalysisforaccuratebreastcancerdiagnosis
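
The abstract in this record contrasts hard voting with soft voting. The following is a minimal sketch of that difference in plain Python, not the authors' implementation. The function names, the three classifiers and their probability outputs are hypothetical values chosen for illustration, and class 0 / class 1 are assumed to stand for benign / malignant.

```python
from collections import Counter


def hard_vote(label_votes):
    """Return the label chosen by the most classifiers (ties go to the first seen)."""
    return Counter(label_votes).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average the per-class probabilities across classifiers.

    Returns the index of the class with the highest average and the averages.
    """
    n_classifiers = len(probabilities)
    n_classes = len(probabilities[0])
    averaged = [
        sum(p[c] for p in probabilities) / n_classifiers
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=averaged.__getitem__)
    return best, averaged


# Hypothetical outputs of three classifiers for one sample,
# each given as [P(class 0), P(class 1)].
probs = [[0.55, 0.45], [0.60, 0.40], [0.10, 0.90]]

# Each classifier's own label is its most probable class.
labels = [max(range(len(p)), key=p.__getitem__) for p in probs]

hard = hard_vote(labels)       # two of three classifiers pick class 0
soft, avg = soft_vote(probs)   # the averaged probabilities favour class 1

print("hard voting:", hard)
print("soft voting:", soft, [round(a, 4) for a in avg])
```

In this made-up case the two schemes disagree: two weakly confident classifiers outvote one highly confident classifier under hard voting, while soft voting lets the confident classifier shift the averaged probability toward class 1.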