
An Effective Ensemble Machine Learning Approach to Classify Breast Cancer Based on Feature Selection and Lesion Segmentation Using Preprocessed Mammograms

SIMPLE SUMMARY: The screening of breast cancer in its earlier stages can play a crucial role in minimizing mortality rate by enabling clinicians to administer timely treatments and preventing the cancer from reaching the critical stage. With this view, the objective of this research is to develop an...

Bibliographic Details
Main authors:	Rafid, A. K. M. Rakibul Haque; Azam, Sami; Montaha, Sidratul; Karim, Asif; Fahim, Kayes Uddin; Hasan, Md. Zahid
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9687739/ https://www.ncbi.nlm.nih.gov/pubmed/36421368 http://dx.doi.org/10.3390/biology11111654

_version_	1784836086815522816
author	Rafid, A. K. M. Rakibul Haque; Azam, Sami; Montaha, Sidratul; Karim, Asif; Fahim, Kayes Uddin; Hasan, Md. Zahid
author_facet	Rafid, A. K. M. Rakibul Haque; Azam, Sami; Montaha, Sidratul; Karim, Asif; Fahim, Kayes Uddin; Hasan, Md. Zahid
author_sort	Rafid, A. K. M. Rakibul Haque
collection	PubMed
description	SIMPLE SUMMARY: Screening for breast cancer in its early stages can play a crucial role in reducing mortality by enabling clinicians to administer timely treatment and prevent the cancer from reaching a critical stage. With this in view, the objective of this research is to develop an efficient automated approach for analyzing and classifying mammograms into four classes. First, artefacts present in the mammograms are removed and the mammograms are enhanced using image-processing techniques. The mammography dataset is then enlarged by applying seven data augmentation methods. Afterward, the region of interest (ROI) is extracted from each mammogram using a region-growing algorithm with a dynamic intensity threshold calculated per mammogram. From each ROI, a total of 16 geometrical features are extracted. These features are evaluated with eleven state-of-the-art machine learning (ML) algorithms and, based on their test accuracies, three ensemble models are developed. Among the ensemble models, the highest test accuracy of 96.03% is achieved by stacking the Random Forest and XGB classifiers (RF-XGB). The performance of RF-XGB is then improved to 98.05% accuracy by applying various feature selection methods. Finally, the performance consistency of the best model is evaluated with a K-fold cross-validation experiment. The proposed approach to classifying mammograms may assist specialists in the precise and effective diagnosis of breast cancer.
ABSTRACT: Background: After skin cancer, breast cancer is the second most frequent malignancy among women and is initiated by unregulated cell division in breast tissue. Although early mammogram screening and treatment decrease mortality, differentiating cancer cells from surrounding tissue is often error-prone, resulting in misdiagnosis. Method: A mammography dataset is used to categorize breast cancer into four classes with low computational complexity, using a feature extraction-based approach with machine learning (ML) algorithms. After artefact removal and preprocessing of the mammograms, the dataset is augmented with seven augmentation techniques. The region of interest (ROI) is extracted using several algorithms, including a dynamic thresholding method. Sixteen geometrical features are extracted from the ROI, and eleven ML algorithms are evaluated on these features. Three ensemble models are generated from these ML models using the stacking method: the first ensemble stacks the ML models with an accuracy above 90%, and the accuracy thresholds for the other two ensembles are >95% and >96%. Five feature selection methods with fourteen configurations are applied to improve performance. Results: The Random Forest Importance algorithm, with a threshold of 0.045, selects 10 features that yield the highest performance: 98.05% test accuracy with the ensemble that stacks the Random Forest and XGB classifiers, each of which has an accuracy above 96%. Furthermore, with K-fold cross-validation, consistent performance is observed across all K values from 3 to 30. Overall, the proposed strategy combining image processing, feature extraction and ML achieves high accuracy in classifying breast cancer.
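The ROI extraction step described in the abstract (region growing with an intensity threshold computed per mammogram) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not state how the dynamic threshold or the seed point are chosen, so the rule `mean + k * std`, the value `k=0.5`, the brightest-pixel seed, 4-connectivity, and all function names below are assumptions made purely for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def dynamic_threshold(image, k=0.5):
    """Per-image threshold. ASSUMPTION: mean + k * std of all pixel intensities."""
    pixels = [p for row in image for p in row]
    return mean(pixels) + k * pstdev(pixels)


def brightest_pixel(image):
    """ASSUMPTION: use the brightest pixel as the seed for region growing."""
    coords = ((r, c) for r in range(len(image)) for c in range(len(image[0])))
    return max(coords, key=lambda rc: image[rc[0]][rc[1]])


def region_grow(image, seed, threshold):
    """Collect pixels 4-connected to `seed` whose intensity is >= threshold."""
    rows, cols = len(image), len(image[0])
    if image[seed[0]][seed[1]] < threshold:
        return set()
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region and image[nr][nc] >= threshold:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# Toy 5x5 "mammogram": one bright 2x2 blob plus an isolated bright pixel.
image = [
    [10, 10, 10, 10, 10],
    [10, 200, 210, 10, 10],
    [10, 205, 220, 10, 10],
    [10, 10, 10, 10, 190],
    [10, 10, 10, 10, 10],
]

threshold = dynamic_threshold(image)
seed = brightest_pixel(image)
roi = region_grow(image, seed, threshold)
area = len(roi)  # one simple geometrical feature of the segmented region
```

Because the threshold is recomputed from each image's own intensity statistics, the same code adapts to brighter or darker mammograms without a hand-tuned constant. The isolated bright pixel is excluded from the ROI because it is not connected to the seed.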
format	Online Article Text
id	pubmed-9687739
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9687739 2022-11-25 Biology (Basel) Article MDPI 2022-11-11 /pmc/articles/PMC9687739/ /pubmed/36421368 http://dx.doi.org/10.3390/biology11111654 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	An Effective Ensemble Machine Learning Approach to Classify Breast Cancer Based on Feature Selection and Lesion Segmentation Using Preprocessed Mammograms
title_sort	effective ensemble machine learning approach to classify breast cancer based on feature selection and lesion segmentation using preprocessed mammograms
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9687739/ https://www.ncbi.nlm.nih.gov/pubmed/36421368 http://dx.doi.org/10.3390/biology11111654
