Pathological changes or technical artefacts? The problem of the heterogenous databases in COVID-19 CXR image analysis
BACKGROUND: When the COVID-19 pandemic commenced in 2020, scientists assisted medical specialists with diagnostic algorithm development. One scientific research area related to COVID-19 diagnosis was medical imaging and its potential to support molecular tests. Unfortunately, several systems reporte...
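Main authors: | Socha, Marek; Prażuch, Wojciech; Suwalska, Aleksandra; Foszner, Paweł; Tobiasz, Joanna; Jaroszewicz, Jerzy; Gruszczynska, Katarzyna; Sliwinska, Magdalena; Nowak, Mateusz; Gizycka, Barbara; Zapolska, Gabriela; Popiela, Tadeusz; Przybylski, Grzegorz; Fiedor, Piotr; Pawlowska, Malgorzata; Flisiak, Robert; Simon, Krzysztof; Walecki, Jerzy; Cieszanowski, Andrzej; Szurowska, Edyta; Marczyk, Michal; Polanska, Joanna |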
---|---|
Format: | Online Article Text |
Language: | English |
Published: | The Author(s). Published by Elsevier B.V. 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10278898/ https://www.ncbi.nlm.nih.gov/pubmed/37356354 http://dx.doi.org/10.1016/j.cmpb.2023.107684 |
_version_ | 1785060563176390656 |
---|---|
author | Socha, Marek Prażuch, Wojciech Suwalska, Aleksandra Foszner, Paweł Tobiasz, Joanna Jaroszewicz, Jerzy Gruszczynska, Katarzyna Sliwinska, Magdalena Nowak, Mateusz Gizycka, Barbara Zapolska, Gabriela Popiela, Tadeusz Przybylski, Grzegorz Fiedor, Piotr Pawlowska, Malgorzata Flisiak, Robert Simon, Krzysztof Walecki, Jerzy Cieszanowski, Andrzej Szurowska, Edyta Marczyk, Michal Polanska, Joanna |
author_sort | Socha, Marek |
collection | PubMed |
description | BACKGROUND: When the COVID-19 pandemic commenced in 2020, scientists assisted medical specialists with diagnostic algorithm development. One scientific research area related to COVID-19 diagnosis was medical imaging and its potential to support molecular tests. Unfortunately, several systems reported high accuracy in development but did not fare well in clinical application. The reason was poor generalization, a long-standing issue in AI development. Researchers found many causes of this issue and decided to refer to them as confounders, meaning a set of artefacts and methodological errors associated with the method. We aim to contribute to this line of research by highlighting an undiscussed confounder related to image resolution. METHODS: 20 216 chest X-ray images (CXRs) from centres worldwide were analyzed. The CXRs were bijectively projected into the 2D domain by performing Uniform Manifold Approximation and Projection (UMAP) embedding on the radiomic features (rUMAP) or CNN-based neural features (nUMAP) from the penultimate layer of the pre-trained classification neural network. An additional 44 339 thorax CXRs were used for validation. A comprehensive analysis of the multimodality of the density distribution in the rUMAP/nUMAP domains and its relation to the original image properties was used to identify the main confounders. RESULTS: nUMAP revealed a hidden bias of neural networks towards image resolution, which the regular up-sampling procedure cannot compensate for. The issue appears regardless of the network architecture and is not observed in a high-resolution dataset. The impact of the resolution heterogeneity can be partially diminished by applying advanced deep-learning-based super-resolution networks. CONCLUSIONS: rUMAP and nUMAP are great tools for image homogeneity analysis and bias discovery, as demonstrated by applying them to COVID-19 image data. Nonetheless, nUMAP could be applied to any type of data for which a deep neural network could be constructed. Advanced image super-resolution solutions are needed to reduce the impact of resolution diversity on the classification network decision. |
format | Online Article Text |
id | pubmed-10278898 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | The Author(s). Published by Elsevier B.V. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10278898 2023-06-21 Pathological changes or technical artefacts? The problem of the heterogenous databases in COVID-19 CXR image analysis Socha, Marek Prażuch, Wojciech Suwalska, Aleksandra Foszner, Paweł Tobiasz, Joanna Jaroszewicz, Jerzy Gruszczynska, Katarzyna Sliwinska, Magdalena Nowak, Mateusz Gizycka, Barbara Zapolska, Gabriela Popiela, Tadeusz Przybylski, Grzegorz Fiedor, Piotr Pawlowska, Malgorzata Flisiak, Robert Simon, Krzysztof Walecki, Jerzy Cieszanowski, Andrzej Szurowska, Edyta Marczyk, Michal Polanska, Joanna Comput Methods Programs Biomed Article The Author(s). Published by Elsevier B.V. 2023-10 2023-06-19 /pmc/articles/PMC10278898/ /pubmed/37356354 http://dx.doi.org/10.1016/j.cmpb.2023.107684 Text en © 2023 The Author(s) Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
title | Pathological changes or technical artefacts? The problem of the heterogenous databases in COVID-19 CXR image analysis |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10278898/ https://www.ncbi.nlm.nih.gov/pubmed/37356354 http://dx.doi.org/10.1016/j.cmpb.2023.107684 |