Cargando…
HDSNE a new unsupervised multiple image database fusion learning algorithm with flexible and crispy production of one database: a proof case study of lung infection diagnose In chest X-ray images
Continuous release of image databases with fully or partially identical inner categories dramatically deteriorates the production of autonomous Computer-Aided Diagnostics (CAD) systems for true comprehensive medical diagnostics. The first challenge is the frequent massive bulk release of medical ima...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2023
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10506286/ https://www.ncbi.nlm.nih.gov/pubmed/37718458 http://dx.doi.org/10.1186/s12880-023-01078-3 |
_version_ | 1785107091001704448 |
---|---|
author | Ahmed, Muhammad Atta Othman Abbas, Ibrahim A. AbdelSatar, Yasser |
author_facet | Ahmed, Muhammad Atta Othman Abbas, Ibrahim A. AbdelSatar, Yasser |
author_sort | Ahmed, Muhammad Atta Othman |
collection | PubMed |
description | Continuous release of image databases with fully or partially identical inner categories dramatically deteriorates the production of autonomous Computer-Aided Diagnostics (CAD) systems for true comprehensive medical diagnostics. The first challenge is the frequent massive bulk release of medical image databases, which often suffer from two common drawbacks: image duplication and corruption. The many subsequent releases of the same data with the same classes or categories come with no clear evidence of success in the concatenation of those identical classes among image databases. This issue stands as a stumbling block in the path of hypothesis-based experiments for the production of a single learning model that can successfully classify all of them correctly. Removing redundant data, enhancing performance, and optimizing energy resources are among the most challenging aspects. In this article, we propose a global data aggregation scale model that incorporates six image databases selected from specific global resources. The proposed valid learner is based on training all the unique patterns within any given data release, thereby creating a unique dataset hypothetically. The Hash MD5 algorithm (MD5) generates a unique hash value for each image, making it suitable for duplication removal. The T-Distributed Stochastic Neighbor Embedding (t-SNE), with a tunable perplexity parameter, can represent data dimensions. Both the Hash MD5 and t-SNE algorithms are applied recursively, producing a balanced and uniform database containing equal samples per category: normal, pneumonia, and Coronavirus Disease of 2019 (COVID-19). We evaluated the performance of all proposed data and the new automated version using the Inception V3 pre-trained model with various evaluation metrics. The performance outcome of the proposed scale model showed more respectable results than traditional data aggregation, achieving a high accuracy of 98.48%, along with high precision, recall, and F1-score. The results have been proved through a statistical t-test, yielding t-values and p-values. It’s important to emphasize that all t-values are undeniably significant, and the p-values provide irrefutable evidence against the null hypothesis. Furthermore, it’s noteworthy that the Final dataset outperformed all other datasets across all metric values when diagnosing various lung infections with the same factors. |
format | Online Article Text |
id | pubmed-10506286 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-105062862023-09-19 HDSNE a new unsupervised multiple image database fusion learning algorithm with flexible and crispy production of one database: a proof case study of lung infection diagnose In chest X-ray images Ahmed, Muhammad Atta Othman Abbas, Ibrahim A. AbdelSatar, Yasser BMC Med Imaging Research Continuous release of image databases with fully or partially identical inner categories dramatically deteriorates the production of autonomous Computer-Aided Diagnostics (CAD) systems for true comprehensive medical diagnostics. The first challenge is the frequent massive bulk release of medical image databases, which often suffer from two common drawbacks: image duplication and corruption. The many subsequent releases of the same data with the same classes or categories come with no clear evidence of success in the concatenation of those identical classes among image databases. This issue stands as a stumbling block in the path of hypothesis-based experiments for the production of a single learning model that can successfully classify all of them correctly. Removing redundant data, enhancing performance, and optimizing energy resources are among the most challenging aspects. In this article, we propose a global data aggregation scale model that incorporates six image databases selected from specific global resources. The proposed valid learner is based on training all the unique patterns within any given data release, thereby creating a unique dataset hypothetically. The Hash MD5 algorithm (MD5) generates a unique hash value for each image, making it suitable for duplication removal. The T-Distributed Stochastic Neighbor Embedding (t-SNE), with a tunable perplexity parameter, can represent data dimensions. Both the Hash MD5 and t-SNE algorithms are applied recursively, producing a balanced and uniform database containing equal samples per category: normal, pneumonia, and Coronavirus Disease of 2019 (COVID-19). We evaluated the performance of all proposed data and the new automated version using the Inception V3 pre-trained model with various evaluation metrics. The performance outcome of the proposed scale model showed more respectable results than traditional data aggregation, achieving a high accuracy of 98.48%, along with high precision, recall, and F1-score. The results have been proved through a statistical t-test, yielding t-values and p-values. It’s important to emphasize that all t-values are undeniably significant, and the p-values provide irrefutable evidence against the null hypothesis. Furthermore, it’s noteworthy that the Final dataset outperformed all other datasets across all metric values when diagnosing various lung infections with the same factors. BioMed Central 2023-09-18 /pmc/articles/PMC10506286/ /pubmed/37718458 http://dx.doi.org/10.1186/s12880-023-01078-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Ahmed, Muhammad Atta Othman Abbas, Ibrahim A. AbdelSatar, Yasser HDSNE a new unsupervised multiple image database fusion learning algorithm with flexible and crispy production of one database: a proof case study of lung infection diagnose In chest X-ray images |
title | HDSNE a new unsupervised multiple image database fusion learning algorithm with flexible and crispy production of one database: a proof case study of lung infection diagnose In chest X-ray images |
title_full | HDSNE a new unsupervised multiple image database fusion learning algorithm with flexible and crispy production of one database: a proof case study of lung infection diagnose In chest X-ray images |
title_fullStr | HDSNE a new unsupervised multiple image database fusion learning algorithm with flexible and crispy production of one database: a proof case study of lung infection diagnose In chest X-ray images |
title_full_unstemmed | HDSNE a new unsupervised multiple image database fusion learning algorithm with flexible and crispy production of one database: a proof case study of lung infection diagnose In chest X-ray images |
title_short | HDSNE a new unsupervised multiple image database fusion learning algorithm with flexible and crispy production of one database: a proof case study of lung infection diagnose In chest X-ray images |
title_sort | hdsne a new unsupervised multiple image database fusion learning algorithm with flexible and crispy production of one database: a proof case study of lung infection diagnose in chest x-ray images |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10506286/ https://www.ncbi.nlm.nih.gov/pubmed/37718458 http://dx.doi.org/10.1186/s12880-023-01078-3 |
work_keys_str_mv | AT ahmedmuhammadattaothman hdsneanewunsupervisedmultipleimagedatabasefusionlearningalgorithmwithflexibleandcrispyproductionofonedatabaseaproofcasestudyoflunginfectiondiagnoseinchestxrayimages AT abbasibrahima hdsneanewunsupervisedmultipleimagedatabasefusionlearningalgorithmwithflexibleandcrispyproductionofonedatabaseaproofcasestudyoflunginfectiondiagnoseinchestxrayimages AT abdelsataryasser hdsneanewunsupervisedmultipleimagedatabasefusionlearningalgorithmwithflexibleandcrispyproductionofonedatabaseaproofcasestudyoflunginfectiondiagnoseinchestxrayimages |