A real use case of semi-supervised learning for mammogram classification in a local clinic of Costa Rica


Bibliographic Details
Main authors: Calderon-Ramirez, Saul, Murillo-Hernandez, Diego, Rojas-Salazar, Kevin, Elizondo, David, Yang, Shengxiang, Moemeni, Armaghan, Molina-Cabello, Miguel
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8892413/
https://www.ncbi.nlm.nih.gov/pubmed/35239108
http://dx.doi.org/10.1007/s11517-021-02497-6
author Calderon-Ramirez, Saul
Murillo-Hernandez, Diego
Rojas-Salazar, Kevin
Elizondo, David
Yang, Shengxiang
Moemeni, Armaghan
Molina-Cabello, Miguel
author_sort Calderon-Ramirez, Saul
collection PubMed
description Deep learning-based computer-aided diagnosis systems for classifying mammogram images can improve the accuracy and reliability of diagnosis and reduce its cost. However, training a deep learning model requires a considerable number of labelled images, which are expensive to obtain because labelling takes time and effort from clinical practitioners. To address this, a number of publicly available datasets have been built with data from different hospitals and clinics, and these can be used to pre-train the model. However, using models trained on these datasets for later transfer learning and fine-tuning with images sampled from a different hospital or clinic might result in lower performance. This is due to the distribution mismatch between the datasets, which cover different patient populations and image acquisition protocols. In this work, a real-world scenario is evaluated in which a novel target dataset sampled from a private Costa Rican clinic is used, with few labels and heavily imbalanced data. The use of two popular, publicly available datasets (INbreast and CBIS-DDSM) as source data to train and test the models on the novel target dataset is evaluated. A common approach to further improve the model’s performance in such a small labelled target dataset setting is data augmentation. However, cheaper unlabelled data is often available from the target clinic. Therefore, semi-supervised deep learning, which leverages both labelled and unlabelled data, can be used in such conditions. In this work, we evaluate the semi-supervised deep learning approach known as MixMatch, which takes advantage of unlabelled data from the target dataset, for whole mammogram image classification. We compare semi-supervised learning on its own and combined with transfer learning (from a source mammogram dataset) and data augmentation, and also against regular supervised learning with transfer learning and data augmentation from source datasets. The results show that semi-supervised deep learning combined with transfer learning and data augmentation can provide a meaningful advantage when labelled observations are scarce. We also found a strong influence of the source dataset, which suggests that a more data-centric approach is needed to tackle the challenge of scarcely labelled data. Because mammogram datasets are often very imbalanced, we used several metrics suited to very imbalanced test datasets (such as the G-mean and the F2-score) to assess the performance gain of semi-supervised learning. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s11517-021-02497-6.
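The description names the G-mean and the F2-score as metrics for very imbalanced test data but does not define them. The following is a minimal Python sketch under the commonly used definitions: G-mean as the geometric mean of sensitivity and specificity, and F-beta with beta = 2. The record does not confirm that the article uses exactly these formulations, and the function names, the toy labels, and the convention that 1 marks the positive class are illustrative assumptions, not taken from the article.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f_beta(y_true, y_pred, beta=2.0, positive=1):
    """F-beta score; beta=2 weights recall more heavily than precision."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical imbalanced test set: 2 positive cases out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_model = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]   # finds one of the two positives
y_trivial = [0] * 10                       # always predicts the majority class

for name, y_pred in [("model", y_model), ("trivial", y_trivial)]:
    print(name, accuracy(y_true, y_pred),
          round(g_mean(y_true, y_pred), 3), f_beta(y_true, y_pred))
```

On this toy data both predictors reach an accuracy of 0.8, yet the always-negative predictor scores 0 on both G-mean and F2, which illustrates why accuracy alone can be misleading on imbalanced data.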
format Online
Article
Text
id pubmed-8892413
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer Berlin Heidelberg
record_format MEDLINE/PubMed
spelling pubmed-8892413 2022-03-04 Med Biol Eng Comput Original Article Springer Berlin Heidelberg 2022-03-03 2022 /pmc/articles/PMC8892413/ /pubmed/35239108 http://dx.doi.org/10.1007/s11517-021-02497-6 Text en © International Federation for Medical and Biological Engineering 2022 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title A real use case of semi-supervised learning for mammogram classification in a local clinic of Costa Rica
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8892413/
https://www.ncbi.nlm.nih.gov/pubmed/35239108
http://dx.doi.org/10.1007/s11517-021-02497-6