
Evaluation of Feature Selection Techniques for Breast Cancer Risk Prediction

Bibliographic Details
Main Authors: López, Nahúm Cueto; García-Ordás, María Teresa; Vitelli-Storelli, Facundo; Fernández-Navarro, Pablo; Palazuelos, Camilo; Alaiz-Rodríguez, Rocío
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8535206/
https://www.ncbi.nlm.nih.gov/pubmed/34682416
http://dx.doi.org/10.3390/ijerph182010670
author López, Nahúm Cueto
García-Ordás, María Teresa
Vitelli-Storelli, Facundo
Fernández-Navarro, Pablo
Palazuelos, Camilo
Alaiz-Rodríguez, Rocío
collection PubMed
description This study evaluates several feature ranking techniques together with several machine learning classifiers to identify factors relevant to the probability of contracting breast cancer and to improve the performance of breast cancer risk prediction models in a healthy population. The dataset, with 919 cases and 946 controls, comes from the MCC-Spain study and includes only environmental and genetic features. Breast cancer is a major public health problem. Our aim is to analyze which factors in the cancer risk prediction model are the most important for breast cancer prediction. Likewise, quantifying the stability of feature selection methods becomes essential before trying to gain insight into the data. This paper assesses several feature selection algorithms in terms of performance for a set of predictive models. Furthermore, their robustness is quantified to analyze both the similarity between the feature selection rankings and their own stability. The ranking provided by the SVM-RFE approach leads to the best performance in terms of the area under the ROC curve (AUC) metric. The top 47 ranked features obtained with this approach, fed to the Logistic Regression classifier, achieve an AUC of 0.616. This is an improvement of 5.8% over the full feature set. Furthermore, the SVM-RFE ranking technique turned out to be highly stable (as did Random Forest), whereas Relief and the wrapper approaches are quite unstable. This study demonstrates that stability and model performance should be studied together: Random Forest and SVM-RFE turned out to be the most stable algorithms, but in terms of model performance SVM-RFE outperforms Random Forest.
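The abstract describes quantifying how stable a feature ranking is, but this record does not say which stability measure the authors use. As a rough illustration only, the sketch below assumes one common choice: the mean pairwise Jaccard index of the top-k feature sets produced by the same ranking method on different resamples of the data. The function names and the feature names in the example are hypothetical and are not taken from the MCC-Spain dataset.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two collections of feature names."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)


def top_k_stability(rankings, k):
    """Mean pairwise Jaccard index of the top-k features across rankings.

    `rankings` is a list of feature-name lists, each ordered from most to
    least relevant (for example, one ranking per resampled training set).
    A value of 1.0 means every ranking selects the same top-k features.
    """
    if len(rankings) < 2:
        raise ValueError("need at least two rankings to compare")
    top_sets = [ranking[:k] for ranking in rankings]
    pairs = list(combinations(top_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical rankings from three resamples; feature names are made up.
stable = [
    ["age", "bmi", "alcohol", "snp1"],
    ["bmi", "age", "alcohol", "snp1"],
    ["age", "alcohol", "bmi", "snp1"],
]
unstable = [
    ["age", "bmi", "alcohol", "snp1"],
    ["snp1", "alcohol", "bmi", "age"],
    ["alcohol", "snp1", "age", "bmi"],
]

print(top_k_stability(stable, k=3))    # same top-3 set in every ranking
print(top_k_stability(unstable, k=3))  # top-3 sets only partly overlap
```

A set-based measure like this ignores the order of features inside the top-k, so it answers "are the same features selected?" rather than "are they ranked identically?". Comparing the rankings of two different methods, as the abstract also mentions, could reuse `jaccard` on their respective top-k sets.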
format Online
Article
Text
id pubmed-8535206
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8535206 2021-10-23 Evaluation of Feature Selection Techniques for Breast Cancer Risk Prediction. López, Nahúm Cueto; García-Ordás, María Teresa; Vitelli-Storelli, Facundo; Fernández-Navarro, Pablo; Palazuelos, Camilo; Alaiz-Rodríguez, Rocío. Int J Environ Res Public Health. Article.
MDPI 2021-10-12 /pmc/articles/PMC8535206/ /pubmed/34682416 http://dx.doi.org/10.3390/ijerph182010670 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Evaluation of Feature Selection Techniques for Breast Cancer Risk Prediction
topic Article