
How intra-source imbalanced datasets impact the performance of deep learning for COVID-19 diagnosis using chest X-ray images

Over the past decade, the use of deep learning has been widely increasing in the medical image diagnosis field. Deep learning-based methods’ (DLMs) performance strongly relies on training data. Therefore, researchers often focus on collecting as much data as possible from different medical facilitie...


Bibliographic Details
Main authors: Zhang, Zhang; Zhang, Xiaoyong; Ichiji, Kei; Bukovský, Ivo; Homma, Noriyasu
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10624834/
https://www.ncbi.nlm.nih.gov/pubmed/37923762
http://dx.doi.org/10.1038/s41598-023-45368-w
_version_ 1785130995528237056
author Zhang, Zhang
Zhang, Xiaoyong
Ichiji, Kei
Bukovský, Ivo
Homma, Noriyasu
collection PubMed
description Over the past decade, the use of deep learning has been widely increasing in the medical image diagnosis field. Deep learning-based methods’ (DLMs) performance strongly relies on training data. Therefore, researchers often focus on collecting as much data as possible from different medical facilities or developing approaches to avoid the impact of inter-category imbalance (ICI), which means a difference in data quantity among categories. However, due to the ICI within each medical facility, medical data are often isolated and acquired in different settings among medical facilities, known as the issue of intra-source imbalance (ISI) characteristic. This imbalance also impacts the performance of DLMs but receives negligible attention. In this study, we study the impact of the ISI on DLMs by comparison of the version of a deep learning model that was trained separately by an intra-source imbalanced chest X-ray (CXR) dataset and an intra-source balanced CXR dataset for COVID-19 diagnosis. The finding is that using the intra-source imbalanced dataset causes a serious training bias, although the dataset has a good inter-category balance. In contrast, the deep learning model performed a reliable diagnosis when trained on the intra-source balanced dataset. Therefore, our study reports clear evidence that the intra-source balance is vital for training data to minimize the risk of poor performance of DLMs.
format Online
Article
Text
id pubmed-10624834
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10624834 2023-11-05 Sci Rep Article
Nature Publishing Group UK 2023-11-03 /pmc/articles/PMC10624834/ /pubmed/37923762 http://dx.doi.org/10.1038/s41598-023-45368-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title How intra-source imbalanced datasets impact the performance of deep learning for COVID-19 diagnosis using chest X-ray images
topic Article