More slices, less truth: effects of different test-set design strategies for magnetic resonance image classification
Main authors: | , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Croatian Medical Schools 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9468729/ https://www.ncbi.nlm.nih.gov/pubmed/36046934 http://dx.doi.org/10.3325/cmj.2022.63.370 |
Summary: | AIM: To assess the effects of different test-set design strategies for magnetic resonance (MR) image classification using deep learning. METHODS: Error rates in 10 experimental settings were assessed. The use of pretrained models and data augmentation was examined as possible contributing factors. RESULTS: Error rates in experimental settings using MR images of different patients for training and test sets were more than ten times higher than those in experimental settings using MR images of the same patients (four disease groups with whole-chest images, 46.80% vs 2.06%; four disease groups without whole-chest images, 49.09% vs 1.29%; sex classification with whole-chest images, 16.02% vs 0.96%; and sex classification without whole-chest images, 23.56% vs 0.30%). Error rates were higher when data augmentation was applied to settings that used MR images of different patients for training and test sets. CONCLUSION: When deep learning is applied to MR image classification, training and test sets should consist of MR images of different patients. Models built on training and test sets consisting of images of the same patients yield optimistic error rates and lead to wrong conclusions. MR images of neighboring slices are so similar that they cause a data leakage effect. |
---|
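The summary's conclusion, that training and test sets should contain images of different patients, can be illustrated with a patient-level split. The following is a minimal sketch and not the authors' code: the function name `split_by_patient`, the `patient_N` identifiers, and the toy data of 5 patients with 4 slices each are all hypothetical. It assigns whole patients, rather than individual slices, to the test set, so neighboring slices of one patient never end up on both sides of the split.

```python
import random
from collections import defaultdict


def split_by_patient(slice_patient_ids, test_fraction=0.2, seed=0):
    """Split slice indices so that no patient appears in both sets.

    slice_patient_ids[i] is the (hypothetical) patient ID of slice i.
    Returns (train_idx, test_idx) as lists of slice indices.
    """
    # Group slice indices by the patient they belong to.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(slice_patient_ids):
        by_patient[pid].append(idx)

    # Shuffle patients, not slices; sorting first keeps the result reproducible.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    # Hold out at least one whole patient for testing.
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train_idx = [i for p in patients if p not in test_patients for i in by_patient[p]]
    test_idx = [i for p in sorted(test_patients) for i in by_patient[p]]
    return train_idx, test_idx


# Hypothetical toy data: 5 patients, 4 neighboring slices each.
patient_ids = [f"patient_{p}" for p in range(5) for _ in range(4)]
train_idx, test_idx = split_by_patient(patient_ids, test_fraction=0.2)

train_patients = {patient_ids[i] for i in train_idx}
test_patients = {patient_ids[i] for i in test_idx}
print(train_patients & test_patients)  # set(): no patient is shared
```

A plain random split over slices would, by contrast, almost always place slices of the same patient in both sets, which is the leakage scenario the summary warns about.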