
Decision Support for Breast Cancer Detection: Classification Improvement Through Feature Selection

Several statistical-based approaches have been developed to support medical personnel in early breast cancer detection. This article presents a method for feature selection aimed at classifying cases into categories based on patients’ breast tissue measures and protein microarray. The effectiveness...


Bibliographic Details
Main Authors: Fogliatto, Flavio S., Anzanello, Michel J., Soares, Felipe, Brust-Renck, Priscila G.
Format: Online Article Text
Language: English
Published: SAGE Publications 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6755645/
https://www.ncbi.nlm.nih.gov/pubmed/31538497
http://dx.doi.org/10.1177/1073274819876598
_version_ 1783453278186504192
author Fogliatto, Flavio S.
Anzanello, Michel J.
Soares, Felipe
Brust-Renck, Priscila G.
author_facet Fogliatto, Flavio S.
Anzanello, Michel J.
Soares, Felipe
Brust-Renck, Priscila G.
author_sort Fogliatto, Flavio S.
collection PubMed
description Several statistical approaches have been developed to support medical personnel in early breast cancer detection. This article presents a feature selection method aimed at classifying cases into categories based on patients’ breast tissue measures and protein microarray data. The effectiveness of this feature selection strategy was evaluated on the commonly used Wisconsin Breast Cancer Database (WBCD), which has many patients and few features, and on a new protein microarray data set, which has many features and few patients. Features were ranked according to a feature importance index that combines parameters emerging from the unsupervised method of principal component analysis and the supervised method of Bhattacharyya distance. Observations in a training set were iteratively categorized into malignant and benign cases through 3 classification techniques: k-nearest neighbor, linear discriminant analysis, and probabilistic neural network. After each classification, the feature with the smallest importance index was removed, and a new categorization was carried out until only one feature was left. The subset yielding maximum accuracy was used to classify observations in the testing set. Our method yielded, on average, 99.17% accurate classifications in the testing set while retaining an average of 4.61 of the 9 features in the WBCD, which is comparable to the best results reported in the literature on that data set, with the advantage of relying on simple and widely available multivariate techniques. When applied to the microarray data, the method yielded an average accuracy of 98.30% while retaining an average of 2.17% of the original features. Our results can aid health-care professionals during early diagnosis of breast cancer.
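The elimination loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: this record does not give the formula of the feature importance index, so the sketch ranks features by the univariate Gaussian Bhattacharyya distance alone and leaves out the principal component analysis term. It also uses only k-nearest neighbor with leave-one-out accuracy on the training data. All function names, the toy data, and the tie-breaking rule are assumptions made for illustration.

```python
import math
from statistics import mean, pvariance


def bhattacharyya(a, b):
    """Bhattacharyya distance between two samples, each modeled as a 1-D Gaussian."""
    va = max(pvariance(a), 1e-12)  # guard against zero variance
    vb = max(pvariance(b), 1e-12)
    mean_term = 0.25 * (mean(a) - mean(b)) ** 2 / (va + vb)
    var_term = 0.5 * math.log((va + vb) / (2.0 * math.sqrt(va * vb)))
    return mean_term + var_term


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(sorted(set(votes)), key=votes.count)


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out k-NN accuracy using only the given feature columns."""
    rows = [[row[j] for j in features] for row in X]
    hits = 0
    for i, (x, label) in enumerate(zip(rows, y)):
        rest_X = rows[:i] + rows[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, x, k) == label
    return hits / len(rows)


def select_features(X, y, k=3):
    """Backward elimination: drop the lowest-ranked feature until one is left."""
    classes = sorted(set(y))  # assumption: exactly two classes
    index = {}
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == classes[0]]
        b = [row[j] for row, label in zip(X, y) if label == classes[1]]
        index[j] = bhattacharyya(a, b)
    remaining = sorted(index, key=index.get)  # least important first
    best_subset, best_acc = list(remaining), -1.0
    while remaining:
        acc = loo_accuracy(X, y, remaining, k)
        if acc >= best_acc:  # assumption: ties favor the smaller subset
            best_subset, best_acc = list(remaining), acc
        remaining = remaining[1:]  # remove the feature with the smallest index
    return sorted(best_subset), best_acc


# Toy data (hypothetical): column 0 separates the classes, column 1 is noise.
X_demo = [[0.0, 5.0], [0.1, -5.0], [0.2, 5.0], [0.3, -5.0],
          [1.0, 5.0], [1.1, -5.0], [1.2, 5.0], [1.3, -5.0]]
y_demo = [0, 0, 0, 0, 1, 1, 1, 1]
print(select_features(X_demo, y_demo, k=3))
```

On the toy data the noise column misleads the nearest-neighbor vote, so removing it raises the leave-one-out accuracy. A fuller version would classify a held-out testing set with the selected subset, as the abstract describes.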
format Online
Article
Text
id pubmed-6755645
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-6755645 2019-09-27 Decision Support for Breast Cancer Detection: Classification Improvement Through Feature Selection Fogliatto, Flavio S. Anzanello, Michel J. Soares, Felipe Brust-Renck, Priscila G. Cancer Control ManCAD100-Research Article SAGE Publications 2019-09-20 /pmc/articles/PMC6755645/ /pubmed/31538497 http://dx.doi.org/10.1177/1073274819876598 Text en © The Author(s) 2019 http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
spellingShingle ManCAD100-Research Article
Fogliatto, Flavio S.
Anzanello, Michel J.
Soares, Felipe
Brust-Renck, Priscila G.
Decision Support for Breast Cancer Detection: Classification Improvement Through Feature Selection
title Decision Support for Breast Cancer Detection: Classification Improvement Through Feature Selection
title_full Decision Support for Breast Cancer Detection: Classification Improvement Through Feature Selection
title_fullStr Decision Support for Breast Cancer Detection: Classification Improvement Through Feature Selection
title_full_unstemmed Decision Support for Breast Cancer Detection: Classification Improvement Through Feature Selection
title_short Decision Support for Breast Cancer Detection: Classification Improvement Through Feature Selection
title_sort decision support for breast cancer detection: classification improvement through feature selection
topic ManCAD100-Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6755645/
https://www.ncbi.nlm.nih.gov/pubmed/31538497
http://dx.doi.org/10.1177/1073274819876598
work_keys_str_mv AT fogliattoflavios decisionsupportforbreastcancerdetectionclassificationimprovementthroughfeatureselection
AT anzanellomichelj decisionsupportforbreastcancerdetectionclassificationimprovementthroughfeatureselection
AT soaresfelipe decisionsupportforbreastcancerdetectionclassificationimprovementthroughfeatureselection
AT brustrenckpriscilag decisionsupportforbreastcancerdetectionclassificationimprovementthroughfeatureselection