Image-encoded biological and non-biological variables may be used as shortcuts in deep learning models trained on multisite neuroimaging data

OBJECTIVE: This work investigates if deep learning (DL) models can classify originating site locations directly from magnetic resonance imaging (MRI) scans with and without correction for intensity differences. MATERIAL AND METHODS: A large database of 1880 T1-weighted MRI scans collected across 41...

Bibliographic Details
Main Authors:	Souza, Raissa; Wilms, Matthias; Camacho, Milton; Pike, G Bruce; Camicioli, Richard; Monchi, Oury; Forkert, Nils D
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2023
Subjects:	Research and Applications
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654841/ https://www.ncbi.nlm.nih.gov/pubmed/37669158 http://dx.doi.org/10.1093/jamia/ocad171

author	Souza, Raissa; Wilms, Matthias; Camacho, Milton; Pike, G Bruce; Camicioli, Richard; Monchi, Oury; Forkert, Nils D
author_sort	Souza, Raissa
collection	PubMed
description	OBJECTIVE: This work investigates if deep learning (DL) models can classify originating site locations directly from magnetic resonance imaging (MRI) scans with and without correction for intensity differences. MATERIAL AND METHODS: A large database of 1880 T1-weighted MRI scans collected across 41 sites originally for Parkinson’s disease (PD) classification was used to classify sites in this study. Forty-six percent of the datasets are from PD patients, while 54% are from healthy participants. After preprocessing the T1-weighted scans, 2 additional data types were generated: intensity-harmonized T1-weighted scans and log-Jacobian deformation maps resulting from nonlinear atlas registration. Corresponding DL models were trained to classify sites for each data type. Additionally, logistic regression models were used to investigate the contribution of biological (age, sex, disease status) and non-biological (scanner type) variables to the models’ decision. RESULTS: A comparison of the 3 different types of data revealed that DL models trained using T1-weighted and intensity-harmonized T1-weighted scans can classify sites with an accuracy of 85%, while the model using log-Jacobian deformation maps achieved a site classification accuracy of 54%. Disease status and scanner type were found to be significant confounders. DISCUSSION: Our results demonstrate that MRI scans encode relevant site-specific information that models could use as shortcuts that cannot be removed using simple intensity harmonization methods. CONCLUSION: The ability of DL models to exploit site-specific biases as shortcuts raises concerns about their reliability, generalization, and deployability in clinical settings.
format	Online Article Text
id	pubmed-10654841
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-10654841 2023-09-05 J Am Med Inform Assoc Research and Applications Oxford University Press 2023-09-05 /pmc/articles/PMC10654841/ /pubmed/37669158 http://dx.doi.org/10.1093/jamia/ocad171 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Image-encoded biological and non-biological variables may be used as shortcuts in deep learning models trained on multisite neuroimaging data
topic	Research and Applications
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654841/ https://www.ncbi.nlm.nih.gov/pubmed/37669158 http://dx.doi.org/10.1093/jamia/ocad171

Similar Items