Computer Aided Breast Cancer Detection Using Ensembling of Texture and Statistical Image Features
Breast cancer, like most forms of cancer, is a fatal disease that claims more than half a million lives every year. In 2020, breast cancer overtook lung cancer as the most commonly diagnosed form of cancer. Though extremely deadly, the survival rate and longevity increase substantially with early de...
Main authors: | Roy, Soumya Deep; Das, Soham; Kar, Devroop; Schwenker, Friedhelm; Sarkar, Ram |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8197148/ https://www.ncbi.nlm.nih.gov/pubmed/34071029 http://dx.doi.org/10.3390/s21113628 |
_version_ | 1783706853075582976 |
---|---|
author | Roy, Soumya Deep Das, Soham Kar, Devroop Schwenker, Friedhelm Sarkar, Ram |
author_facet | Roy, Soumya Deep Das, Soham Kar, Devroop Schwenker, Friedhelm Sarkar, Ram |
author_sort | Roy, Soumya Deep |
collection | PubMed |
description | Breast cancer, like most forms of cancer, is a fatal disease that claims more than half a million lives every year. In 2020, breast cancer overtook lung cancer as the most commonly diagnosed form of cancer. Though extremely deadly, the survival rate and longevity increase substantially with early detection and diagnosis. The treatment protocol also varies with the stage of breast cancer. Diagnosis is typically done using histopathological slides from which it is possible to determine whether the tissue is in the Ductal Carcinoma In Situ (DCIS) stage, in which the cancerous cells have not spread into the encompassing breast tissue, or in the Invasive Ductal Carcinoma (IDC) stage, wherein the cells have penetrated into the neighboring tissues. IDC detection is extremely time-consuming and challenging for physicians. Hence, this can be modeled as an image classification task where pattern recognition and machine learning can be used to aid doctors and medical practitioners in making such crucial decisions. In the present paper, we use an IDC Breast Cancer dataset that contains 277,524 images (with 78,786 IDC positive images and 198,738 IDC negative images) to classify the images into IDC(+) and IDC(-). To that end, we use feature extractors, including textural features, such as SIFT, SURF and ORB, and statistical features, such as Haralick texture features. These features are then combined to yield a dataset of 782 features. These features are ensembled by stacking using various Machine Learning classifiers, such as Random Forest, Extra Trees, XGBoost, AdaBoost, CatBoost and Multi Layer Perceptron followed by feature selection using Pearson Correlation Coefficient to yield a dataset with four features that are then used for classification. From our experimental results, we found that CatBoost yielded the highest accuracy (92.55%), which is at par with other state-of-the-art results—most of which employ Deep Learning architectures. 
The source code is available in the GitHub repository. |
format | Online Article Text |
id | pubmed-8197148 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-81971482021-06-13 Computer Aided Breast Cancer Detection Using Ensembling of Texture and Statistical Image Features Roy, Soumya Deep Das, Soham Kar, Devroop Schwenker, Friedhelm Sarkar, Ram Sensors (Basel) Article Breast cancer, like most forms of cancer, is a fatal disease that claims more than half a million lives every year. In 2020, breast cancer overtook lung cancer as the most commonly diagnosed form of cancer. Though extremely deadly, the survival rate and longevity increase substantially with early detection and diagnosis. The treatment protocol also varies with the stage of breast cancer. Diagnosis is typically done using histopathological slides from which it is possible to determine whether the tissue is in the Ductal Carcinoma In Situ (DCIS) stage, in which the cancerous cells have not spread into the encompassing breast tissue, or in the Invasive Ductal Carcinoma (IDC) stage, wherein the cells have penetrated into the neighboring tissues. IDC detection is extremely time-consuming and challenging for physicians. Hence, this can be modeled as an image classification task where pattern recognition and machine learning can be used to aid doctors and medical practitioners in making such crucial decisions. In the present paper, we use an IDC Breast Cancer dataset that contains 277,524 images (with 78,786 IDC positive images and 198,738 IDC negative images) to classify the images into IDC(+) and IDC(-). To that end, we use feature extractors, including textural features, such as SIFT, SURF and ORB, and statistical features, such as Haralick texture features. These features are then combined to yield a dataset of 782 features. These features are ensembled by stacking using various Machine Learning classifiers, such as Random Forest, Extra Trees, XGBoost, AdaBoost, CatBoost and Multi Layer Perceptron followed by feature selection using Pearson Correlation Coefficient to yield a dataset with four features that are then used for classification. 
From our experimental results, we found that CatBoost yielded the highest accuracy (92.55%), which is at par with other state-of-the-art results—most of which employ Deep Learning architectures. The source code is available in the GitHub repository. MDPI 2021-05-23 /pmc/articles/PMC8197148/ /pubmed/34071029 http://dx.doi.org/10.3390/s21113628 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Roy, Soumya Deep Das, Soham Kar, Devroop Schwenker, Friedhelm Sarkar, Ram Computer Aided Breast Cancer Detection Using Ensembling of Texture and Statistical Image Features |
title | Computer Aided Breast Cancer Detection Using Ensembling of Texture and Statistical Image Features |
title_full | Computer Aided Breast Cancer Detection Using Ensembling of Texture and Statistical Image Features |
title_fullStr | Computer Aided Breast Cancer Detection Using Ensembling of Texture and Statistical Image Features |
title_full_unstemmed | Computer Aided Breast Cancer Detection Using Ensembling of Texture and Statistical Image Features |
title_short | Computer Aided Breast Cancer Detection Using Ensembling of Texture and Statistical Image Features |
title_sort | computer aided breast cancer detection using ensembling of texture and statistical image features |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8197148/ https://www.ncbi.nlm.nih.gov/pubmed/34071029 http://dx.doi.org/10.3390/s21113628 |
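The description above mentions feature selection using the Pearson Correlation Coefficient to reduce the stacked outputs to four features. The following is a minimal sketch of that idea, not the authors' implementation: it assumes that selection means ranking each feature column by the absolute value of its Pearson correlation with the class label and keeping the top k, which the abstract does not specify. The function names (`pearson`, `select_top_k`, `make_toy_data`) and the synthetic data are hypothetical and exist only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_k(columns, labels, k):
    """Keep the k columns with the largest |Pearson r| against the labels.

    ASSUMPTION: ranking by absolute correlation with the label is one
    common reading of "Pearson-based selection"; the source record does
    not state the exact criterion used in the paper.
    """
    scores = [abs(pearson(col, labels)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


def make_toy_data(n_rows=200, n_noise=6, seed=0):
    """Synthetic stand-in for stacked classifier outputs (hypothetical data).

    Columns 0-3 track the binary label with small Gaussian noise;
    the remaining columns are pure noise.
    """
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_rows)]
    informative = [[y + rng.gauss(0, 0.3) for y in labels] for _ in range(4)]
    noise = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_noise)]
    return informative + noise, labels


columns, labels = make_toy_data()
selected, scores = select_top_k(columns, labels, k=4)
print("selected column indices:", selected)
```

On this toy data the four label-tracking columns score far higher than the noise columns, so they are the ones retained. With real stacked predictions the scores would typically be closer together, and a redundancy check between the selected columns might also be worth adding.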