
A Comparison of Techniques for Class Imbalance in Deep Learning Classification of Breast Cancer

Tools based on deep learning models have been created in recent years to aid radiologists in the diagnosis of breast cancer from mammograms. However, the datasets used to train these models may suffer from class imbalance, i.e., there are often fewer malignant samples than benign or healthy cases, w...


Bibliographic Details
Main authors:	Walsh, Ricky; Tardy, Mickael
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9818528/ https://www.ncbi.nlm.nih.gov/pubmed/36611358 http://dx.doi.org/10.3390/diagnostics13010067

collection	PubMed
description	Tools based on deep learning models have been created in recent years to aid radiologists in the diagnosis of breast cancer from mammograms. However, the datasets used to train these models may suffer from class imbalance, i.e., there are often fewer malignant samples than benign or healthy cases, which can bias the model towards the healthy class. In this study, we systematically evaluate several popular techniques to deal with this class imbalance, namely, class weighting, over-sampling, and under-sampling, as well as a synthetic lesion generation approach to increase the number of malignant samples. These techniques are applied when training on three diverse Full-Field Digital Mammography datasets, and tested on in-distribution and out-of-distribution samples. The experiments show that a greater imbalance is associated with a greater bias towards the majority class, which can be counteracted by any of the standard class imbalance techniques. On the other hand, these methods provide no benefit to model performance with respect to Area Under the Curve of the Receiver Operating Characteristic (AUC-ROC), and indeed under-sampling leads to a reduction of 0.066 in AUC in the case of a 19:1 benign to malignant imbalance. Our synthetic lesion methodology leads to better performance in most cases, with increases of up to 0.07 in AUC on out-of-distribution test sets over the next best experiment.
id	pubmed-9818528
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Diagnostics (Basel)
published	MDPI 2022-12-26
license	© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
