Pathological changes or technical artefacts? The problem of the heterogenous databases in COVID-19 CXR image analysis

BACKGROUND: When the COVID-19 pandemic commenced in 2020, scientists assisted medical specialists with diagnostic algorithm development. One scientific research area related to COVID-19 diagnosis was medical imaging and its potential to support molecular tests. Unfortunately, several systems reporte...

Bibliographic Details
Main Authors: Socha, Marek; Prażuch, Wojciech; Suwalska, Aleksandra; Foszner, Paweł; Tobiasz, Joanna; Jaroszewicz, Jerzy; Gruszczynska, Katarzyna; Sliwinska, Magdalena; Nowak, Mateusz; Gizycka, Barbara; Zapolska, Gabriela; Popiela, Tadeusz; Przybylski, Grzegorz; Fiedor, Piotr; Pawlowska, Malgorzata; Flisiak, Robert; Simon, Krzysztof; Walecki, Jerzy; Cieszanowski, Andrzej; Szurowska, Edyta; Marczyk, Michal; Polanska, Joanna
Format: Online Article Text
Language: English
Published: The Author(s). Published by Elsevier B.V. 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10278898/
https://www.ncbi.nlm.nih.gov/pubmed/37356354
http://dx.doi.org/10.1016/j.cmpb.2023.107684
_version_ 1785060563176390656
author Socha, Marek
Prażuch, Wojciech
Suwalska, Aleksandra
Foszner, Paweł
Tobiasz, Joanna
Jaroszewicz, Jerzy
Gruszczynska, Katarzyna
Sliwinska, Magdalena
Nowak, Mateusz
Gizycka, Barbara
Zapolska, Gabriela
Popiela, Tadeusz
Przybylski, Grzegorz
Fiedor, Piotr
Pawlowska, Malgorzata
Flisiak, Robert
Simon, Krzysztof
Walecki, Jerzy
Cieszanowski, Andrzej
Szurowska, Edyta
Marczyk, Michal
Polanska, Joanna
author_sort Socha, Marek
collection PubMed
description BACKGROUND: When the COVID-19 pandemic commenced in 2020, scientists assisted medical specialists with diagnostic algorithm development. One scientific research area related to COVID-19 diagnosis was medical imaging and its potential to support molecular tests. Unfortunately, several systems reported high accuracy in development but did not fare well in clinical application. The reason was poor generalization, a long-standing issue in AI development. Researchers found many causes of this issue and decided to refer to them as confounders, meaning a set of artefacts and methodological errors associated with the method. We aim to contribute to this steed by highlighting an undiscussed confounder related to image resolution.
METHODS: 20 216 chest X-ray images (CXR) from worldwide centres were analyzed. The CXRs were bijectively projected into the 2D domain by performing Uniform Manifold Approximation and Projection (UMAP) embedding on the radiomic features (rUMAP) or CNN-based neural features (nUMAP) from the pre-last layer of the pre-trained classification neural network. Additional 44 339 thorax CXRs were used for validation. The comprehensive analysis of the multimodality of the density distribution in rUMAP/nUMAP domains and its relation to the original image properties was used to identify the main confounders.
RESULTS: nUMAP revealed a hidden bias of neural networks towards the image resolution, which the regular up-sampling procedure cannot compensate for. The issue appears regardless of the network architecture and is not observed in a high-resolution dataset. The impact of the resolution heterogeneity can be partially diminished by applying advanced deep-learning-based super-resolution networks.
CONCLUSIONS: rUMAP and nUMAP are great tools for image homogeneity analysis and bias discovery, as demonstrated by applying them to COVID-19 image data. Nonetheless, nUMAP could be applied to any type of data for which a deep neural network could be constructed. Advanced image super-resolution solutions are needed to reduce the impact of the resolution diversity on the classification network decision.
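The abstract describes relating the structure of a 2D embedding to original image properties such as resolution. The sketch below illustrates one simple way such a check could look; it is not the authors' method. It assumes 2D coordinates are already available (here they are synthetic Gaussian points, not real rUMAP/nUMAP output) and measures how often a point's nearest neighbours share its resolution group. The function names `knn_label_purity` and `make_embedding`, the group labels, and all parameter values are hypothetical and chosen for illustration only.

```python
import math
import random


def knn_label_purity(points, labels, k=5):
    """Mean fraction of each point's k nearest neighbours sharing its label."""
    total = 0.0
    for i, p in enumerate(points):
        # Brute-force distances from point i to every other point.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        same = sum(1 for _, j in dists[:k] if labels[j] == labels[i])
        total += same / k
    return total / len(points)


def make_embedding(n_per_group, offset, seed=0):
    """Synthetic 2D 'embedding' (an assumption, not real UMAP output):
    two resolution groups whose centres are `offset` apart on the x axis."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, cx in (("low_res", 0.0), ("high_res", offset)):
        for _ in range(n_per_group):
            points.append((rng.gauss(cx, 1.0), rng.gauss(0.0, 1.0)))
            labels.append(label)
    return points, labels


# Groups far apart: the embedding separates images by resolution.
separated = knn_label_purity(*make_embedding(100, offset=10.0))
# Groups overlapping: resolution does not structure the embedding.
mixed = knn_label_purity(*make_embedding(100, offset=0.0))
print(f"purity, separated groups: {separated:.2f}")
print(f"purity, mixed groups:     {mixed:.2f}")
```

A purity close to 1 means neighbourhoods in the embedding are dominated by a single resolution group, which would flag resolution as a candidate confounder; a purity near the chance level for two equal groups (about 0.5) suggests no such separation. In practice the coordinates would come from an actual embedding of radiomic or neural features rather than from synthetic data.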
format Online
Article
Text
id pubmed-10278898
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher The Author(s). Published by Elsevier B.V.
record_format MEDLINE/PubMed
spelling pubmed-102788982023-06-21 Pathological changes or technical artefacts? The problem of the heterogenous databases in COVID-19 CXR image analysis Socha, Marek Prażuch, Wojciech Suwalska, Aleksandra Foszner, Paweł Tobiasz, Joanna Jaroszewicz, Jerzy Gruszczynska, Katarzyna Sliwinska, Magdalena Nowak, Mateusz Gizycka, Barbara Zapolska, Gabriela Popiela, Tadeusz Przybylski, Grzegorz Fiedor, Piotr Pawlowska, Malgorzata Flisiak, Robert Simon, Krzysztof Walecki, Jerzy Cieszanowski, Andrzej Szurowska, Edyta Marczyk, Michal Polanska, Joanna Comput Methods Programs Biomed Article BACKGROUND: When the COVID-19 pandemic commenced in 2020, scientists assisted medical specialists with diagnostic algorithm development. One scientific research area related to COVID-19 diagnosis was medical imaging and its potential to support molecular tests. Unfortunately, several systems reported high accuracy in development but did not fare well in clinical application. The reason was poor generalization, a long-standing issue in AI development. Researchers found many causes of this issue and decided to refer to them as confounders, meaning a set of artefacts and methodological errors associated with the method. We aim to contribute to this steed by highlighting an undiscussed confounder related to image resolution. METHODS: 20 216 chest X-ray images (CXR) from worldwide centres were analyzed. The CXRs were bijectively projected into the 2D domain by performing Uniform Manifold Approximation and Projection (UMAP) embedding on the radiomic features (rUMAP) or CNN-based neural features (nUMAP) from the pre-last layer of the pre-trained classification neural network. Additional 44 339 thorax CXRs were used for validation. The comprehensive analysis of the multimodality of the density distribution in rUMAP/nUMAP domains and its relation to the original image properties was used to identify the main confounders. 
RESULTS: nUMAP revealed a hidden bias of neural networks towards the image resolution, which the regular up-sampling procedure cannot compensate for. The issue appears regardless of the network architecture and is not observed in a high-resolution dataset. The impact of the resolution heterogeneity can be partially diminished by applying advanced deep-learning-based super-resolution networks. CONCLUSIONS: rUMAP and nUMAP are great tools for image homogeneity analysis and bias discovery, as demonstrated by applying them to COVID-19 image data. Nonetheless, nUMAP could be applied to any type of data for which a deep neural network could be constructed. Advanced image super-resolution solutions are needed to reduce the impact of the resolution diversity on the classification network decision. The Author(s). Published by Elsevier B.V. 2023-10 2023-06-19 /pmc/articles/PMC10278898/ /pubmed/37356354 http://dx.doi.org/10.1016/j.cmpb.2023.107684 Text en © 2023 The Author(s) Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title Pathological changes or technical artefacts? The problem of the heterogenous databases in COVID-19 CXR image analysis
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10278898/
https://www.ncbi.nlm.nih.gov/pubmed/37356354
http://dx.doi.org/10.1016/j.cmpb.2023.107684