Challenges of deep learning methods for COVID-19 detection using public datasets

Bibliographic Details
Main Authors: Hasan, Md. Kamrul, Alam, Md. Ashraful, Dahal, Lavsen, Roy, Shidhartho, Wahid, Sifat Redwan, Elahi, Md. Toufick E., Martí, Robert, Khanal, Bishesh
Format: Online Article Text
Language: English
Published: The Authors. Published by Elsevier Ltd. 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9005223/
https://www.ncbi.nlm.nih.gov/pubmed/35434261
http://dx.doi.org/10.1016/j.imu.2022.100945
author Hasan, Md. Kamrul
Alam, Md. Ashraful
Dahal, Lavsen
Roy, Shidhartho
Wahid, Sifat Redwan
Elahi, Md. Toufick E.
Martí, Robert
Khanal, Bishesh
collection PubMed
description Since the onset of the COVID-19 pandemic, several studies have proposed Deep Learning (DL)-based automated COVID-19 detection, reporting high cross-validation accuracy when distinguishing COVID-19 patients from normal cases or from other common pneumonia. Although the reported accuracies are very high in most cases, these results were obtained without an independent test set from a separate data source. When independent test sets are not used, DL models are likely to overfit the training data distribution or to learn dataset-specific artifacts rather than the actual disease characteristics and underlying pathology. This study assesses the promise of such DL methods and datasets by examining the composition of the available public image datasets and by designing different experimental setups that expose the key challenges. A convolutional neural network called CVR-Net (COVID-19 Recognition Network) is proposed for conducting comprehensive experiments to validate our hypothesis. The end-to-end CVR-Net is a multi-scale, multi-encoder ensemble model that aggregates the outputs from two different encoders and their different scales to produce the final prediction probability. Three classification tasks (2-, 3-, and 4-class) are designed in which the train–test datasets come from single, multiple, and independent sources. The binary classification accuracy is 99.8% for a single train–test data source and falls to 98.4% and 88.7% when multiple and independent train–test data sources are used, respectively. Similar outcomes are observed in the multi-class tasks for single, multiple, and independent data sources, highlighting the challenges of developing DL models with the existing public datasets without an independent test set from a separate dataset. These results indicate the need for better-designed datasets for developing DL tools applicable in actual clinical settings. Such a dataset should have an independent test set; for a single machine or hospital source, have a more balanced set of images across all prediction classes; and include balanced data from several hospitals and demographics. Our source code and model are publicly available to the research community for further improvement.
format Online
Article
Text
id pubmed-9005223
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher The Authors. Published by Elsevier Ltd.
record_format MEDLINE/PubMed
spelling pubmed-9005223 2022-04-13 Inform Med Unlocked Article The Authors. Published by Elsevier Ltd. 2022 2022-04-12 /pmc/articles/PMC9005223/ /pubmed/35434261 http://dx.doi.org/10.1016/j.imu.2022.100945 Text en © 2022 The Authors Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title Challenges of deep learning methods for COVID-19 detection using public datasets
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9005223/
https://www.ncbi.nlm.nih.gov/pubmed/35434261
http://dx.doi.org/10.1016/j.imu.2022.100945