Just another “Clever Hans”? Neural networks and FDG PET-CT to predict the outcome of patients with breast cancer

Bibliographic Details
Main authors: Weber, Manuel, Kersting, David, Umutlu, Lale, Schäfers, Michael, Rischpler, Christoph, Fendler, Wolfgang P., Buvat, Irène, Herrmann, Ken, Seifert, Robert
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8426242/
https://www.ncbi.nlm.nih.gov/pubmed/33674891
http://dx.doi.org/10.1007/s00259-021-05270-x
Description
Summary: BACKGROUND: Manual quantification of the metabolic tumor volume (MTV) from whole-body 18F-FDG PET/CT is time-consuming and is therefore not usually performed in clinical routine. It has been shown that neural networks might assist nuclear medicine physicians in such quantification tasks. However, little is known about whether such neural networks have to be designed for a specific type of cancer or can be applied to various cancers. The aim of this study was therefore to evaluate the accuracy of a neural network in a cancer type that was not used for its training.
METHODS: Fifty consecutive breast cancer patients who underwent 18F-FDG PET/CT were included in this retrospective analysis. The PET-Assisted Reporting System (PARS) prototype, which uses a neural network trained on lymphoma and lung cancer 18F-FDG PET/CT data, was tasked with detecting pathological foci and determining their anatomical location. Consensus reads by two nuclear medicine physicians, together with follow-up data, served as the diagnostic reference standard; 1072 18F-FDG-avid foci were manually segmented. The accuracy of the neural network was evaluated with regard to lesion detection, anatomical position determination, and total tumor volume quantification.
RESULTS: When only PERCIST-measurable foci were considered, the neural network showed high per-patient sensitivity and specificity in detecting suspicious 18F-FDG foci (sensitivity 92%, CI = 79–97%; specificity 98%, CI = 94–99%). When all FDG-avid foci were considered, sensitivity dropped to 39% (CI = 30–50%). Localization accuracy was high for body part (98%; CI = 95–99%), region (88%; CI = 84–90%), and subregion (79%; CI = 74–84%). AI-derived and manually segmented MTV were highly correlated (R² = 0.91; p < 0.001). AI-derived whole-body MTV (HR = 1.275; CI = 1.208–1.713; p < 0.001) was a significant prognosticator for overall survival. AI-derived lymph node MTV (HR = 1.190; CI = 1.022–1.384; p = 0.025) and liver MTV (HR = 1.149; CI = 1.001–1.318; p = 0.048) were predictive of overall survival in a multivariate analysis.
CONCLUSION: Although trained on lymphoma and lung cancer, PARS showed good accuracy in the detection of PERCIST-measurable lesions. The neural network therefore does not appear to be prone to the Clever Hans effect. However, the network had poor accuracy when all manually segmented lesions were used as the reference standard. Both whole-body and organ-wise MTV were significant prognosticators of overall survival in advanced breast cancer.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00259-021-05270-x.
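The summary reports sensitivity and specificity together with confidence intervals. As a rough illustration of how such figures can be derived from detection counts, the sketch below computes both proportions and a 95% Wilson score interval. The summary does not state which interval method the authors used, so the Wilson interval is an assumption here, and the counts (`tp`, `fn`, `tn`, `fp`) are hypothetical values chosen for illustration, not data from the study.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of reference-standard positives that were detected."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of reference-standard negatives that were correctly rejected."""
    return tn / (tn + fp)


# Hypothetical counts, NOT taken from the study.
tp, fn, tn, fp = 45, 5, 90, 10

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
sens_lo, sens_hi = wilson_interval(tp, tp + fn)
spec_lo, spec_hi = wilson_interval(tn, tn + fp)

print(f"sensitivity = {sens:.2f} (95% CI {sens_lo:.2f} to {sens_hi:.2f})")
print(f"specificity = {spec:.2f} (95% CI {spec_lo:.2f} to {spec_hi:.2f})")
```

The Wilson interval is used in this sketch because it behaves reasonably for proportions close to 0 or 1 and for small samples; other interval methods would give somewhat different bounds.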