
Hyperparameter Tuning and Pipeline Optimization via Grid Search Method and Tree-Based AutoML in Breast Cancer Prediction

Automated machine learning (AutoML) has been recognized as a powerful tool to build a system that automates the design and optimizes the model selection machine learning (ML) pipelines. In this study, we present a tree-based pipeline optimization tool (TPOT) as a method for determining ML models wit...


Bibliographic Details
Main Authors: Radzi, Siti Fairuz Mat; Karim, Muhammad Khalis Abdul; Saripan, M Iqbal; Rahman, Mohd Amiruddin Abd; Isa, Iza Nurzawani Che; Ibahim, Mohammad Johari
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8540332/
https://www.ncbi.nlm.nih.gov/pubmed/34683118
http://dx.doi.org/10.3390/jpm11100978
author Radzi, Siti Fairuz Mat
Karim, Muhammad Khalis Abdul
Saripan, M Iqbal
Rahman, Mohd Amiruddin Abd
Isa, Iza Nurzawani Che
Ibahim, Mohammad Johari
collection PubMed
description Automated machine learning (AutoML) has been recognized as a powerful tool for building systems that automate the design of machine learning (ML) pipelines and optimize model selection. In this study, we present the tree-based pipeline optimization tool (TPOT) as a method for determining ML models with significant performance and less complex breast cancer diagnostic pipelines. Pre-processors and ML models are defined as expression trees, and optimal pipelines are found by genetic programming (GP), a stochastic search system. Radiomics features are presented as a guide for the ML pipeline selection from the breast cancer data set based on TPOT. Breast cancer data were used in a comparative analysis of the TPOT-generated ML pipelines with the selected ML classifiers, optimized by a grid search approach. The pipeline combining principal component analysis (PCA) with random forest (RF) classification proved to be the most reliable, with the lowest complexity. The TPOT model selection technique exceeded the performance of grid search (GS) optimization. The RF classifier showed an outstanding outcome amongst the models in combination with only two pre-processors, with a precision of 0.83. The grid search optimized for support vector machine (SVM) classifiers generated a difference of 12% in comparison, while the other two classifiers, naïve Bayes (NB) and artificial neural network multilayer perceptron (ANN-MLP), generated a difference of almost 39%. The method's performance was evaluated based on sensitivity, specificity, accuracy, precision, and receiver operating characteristic (ROC) curve analysis.
format Online
Article
Text
id pubmed-8540332
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8540332 2021-10-24 Hyperparameter Tuning and Pipeline Optimization via Grid Search Method and Tree-Based AutoML in Breast Cancer Prediction Radzi, Siti Fairuz Mat Karim, Muhammad Khalis Abdul Saripan, M Iqbal Rahman, Mohd Amiruddin Abd Isa, Iza Nurzawani Che Ibahim, Mohammad Johari J Pers Med Article
MDPI 2021-09-29 /pmc/articles/PMC8540332/ /pubmed/34683118 http://dx.doi.org/10.3390/jpm11100978 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Hyperparameter Tuning and Pipeline Optimization via Grid Search Method and Tree-Based AutoML in Breast Cancer Prediction
topic Article