Decision Support for Breast Cancer Detection: Classification Improvement Through Feature Selection
Several statistically based approaches have been developed to support medical personnel in early breast cancer detection. This article presents a feature selection method for classifying cases into categories based on patients’ breast tissue measures and protein microarray data. The effectiveness...
Main authors: | Fogliatto, Flavio S.; Anzanello, Michel J.; Soares, Felipe; Brust-Renck, Priscila G. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | SAGE Publications, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6755645/ https://www.ncbi.nlm.nih.gov/pubmed/31538497 http://dx.doi.org/10.1177/1073274819876598 |
Field | Value |
---|---|
author | Fogliatto, Flavio S. Anzanello, Michel J. Soares, Felipe Brust-Renck, Priscila G. |
collection | PubMed |
description | Several statistically based approaches have been developed to support medical personnel in early breast cancer detection. This article presents a feature selection method for classifying cases into categories based on patients’ breast tissue measures and protein microarray data. The effectiveness of this feature selection strategy was evaluated on the widely used Wisconsin Breast Cancer Database (WBCD), which has many patients and fewer features, and on a new protein microarray data set, which has many features and fewer patients. Features were ranked by a feature importance index that combines parameters from the unsupervised method of principal component analysis and the supervised method of Bhattacharyya distance. Observations in a training set were iteratively classified as malignant or benign using 3 classification techniques: k-Nearest Neighbor, linear discriminant analysis, and probabilistic neural network. After each classification, the feature with the smallest importance index was removed and a new classification was carried out, until only one feature remained. The subset yielding maximum accuracy was then used to classify observations in the testing set. On the WBCD, our method yielded an average of 99.17% accurate classifications in the testing set while retaining, on average, 4.61 of the 9 features; this is comparable to the best results reported in the literature on that data set, with the advantage of relying on simple and widely available multivariate techniques. On the microarray data, the method yielded an average accuracy of 98.30% while retaining, on average, 2.17% of the original features. Our results can aid health-care professionals in the early diagnosis of breast cancer. |
format | Online Article Text |
id | pubmed-6755645 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | SAGE Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-6755645 2019-09-27 Cancer Control ManCAD100-Research Article SAGE Publications 2019-09-20 /pmc/articles/PMC6755645/ /pubmed/31538497 http://dx.doi.org/10.1177/1073274819876598 Text en © The Author(s) 2019 http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage). |
title | Decision Support for Breast Cancer Detection: Classification Improvement Through Feature Selection |
topic | ManCAD100-Research Article |
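
The procedure summarized in the abstract (rank features by an index combining principal component analysis with the Bhattacharyya distance, then repeatedly drop the lowest-ranked feature and keep the most accurate subset) can be sketched in Python with NumPy. This is an illustrative sketch, not the authors' implementation. The following are all assumptions: the helper names (`importance_index`, `backward_elimination`, `knn_predict`), the way the index combines its two parts (a PCA weight multiplied by a per-feature Bhattacharyya distance computed under a univariate Gaussian assumption), ranking the features only once, using leave-one-out k-NN accuracy with k = 3 as the selection criterion, the tie-breaking rule, and the synthetic data. Only k-Nearest Neighbor is shown; linear discriminant analysis and the probabilistic neural network are omitted.

```python
import numpy as np


def bhattacharyya_1d(a, b, eps=1e-12):
    """Bhattacharyya distance between two classes on one feature, assuming
    each class is univariate Gaussian:
    D = 1/4 (m1 - m2)^2 / (v1 + v2) + 1/2 ln((v1 + v2) / (2 sqrt(v1 v2)))."""
    m1, m2 = a.mean(), b.mean()
    v1, v2 = a.var() + eps, b.var() + eps
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2))))


def importance_index(X, y):
    """Hypothetical index: PCA weight times Bhattacharyya distance.

    PCA weight of feature j = sum over components k of
    (explained-variance ratio of k) * |loading of j on k|.
    This combination is an assumption, not the paper's published formula.
    """
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # standardize features
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    ratios = eigvals / eigvals.sum()          # explained-variance ratios
    pca_weight = np.abs(eigvecs) @ ratios     # one weight per feature
    bd = np.array([bhattacharyya_1d(X[y == 0, j], X[y == 1, j])
                   for j in range(X.shape[1])])
    return pca_weight * bd


def knn_predict(X_train, y_train, X_query, k=3, exclude_self=False):
    """Majority-vote k-NN (Euclidean distance) for binary labels 0/1."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    if exclude_self:                          # leave-one-out on the training set
        np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    return (y_train[nn].mean(axis=1) > 0.5).astype(int)


def backward_elimination(X, y, k=3):
    """Drop the least important feature one at a time and keep the subset
    with the highest leave-one-out k-NN accuracy on the training data."""
    order = [int(j) for j in np.argsort(importance_index(X, y))[::-1]]
    best_acc, best_subset = -1.0, None
    while order:
        pred = knn_predict(X[:, order], y, X[:, order], k, exclude_self=True)
        acc = float((pred == y).mean())
        if acc >= best_acc:                   # ties favour the smaller subset
            best_acc, best_subset = acc, sorted(order)
        order.pop()                           # remove the lowest-ranked feature
    return best_subset, best_acc


# Usage on synthetic data: features 0 and 1 separate the classes, 2-4 are noise.
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n // 2)
X = np.hstack([rng.normal(loc=3.0 * y[:, None], size=(n, 2)),
               rng.normal(size=(n, 3))])
perm = rng.permutation(n)
train, test = perm[:150], perm[150:]

# Select the subset on the training set, then classify the held-out test set.
subset, train_acc = backward_elimination(X[train], y[train])
test_pred = knn_predict(X[train][:, subset], y[train], X[test][:, subset])
test_acc = float((test_pred == y[test]).mean())
print("selected features:", subset)
print(f"LOO train accuracy {train_acc:.3f}, test accuracy {test_acc:.3f}")
```

Because features are removed in index order, every candidate subset is a prefix of the ranking; the sketch therefore evaluates at most as many subsets as there are features, which mirrors the elimination loop described in the abstract.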