Cargando…
Predictive performance of radiomic models based on features extracted from pretrained deep networks
OBJECTIVES: In radiomics, generic texture and morphological features are often used for modeling. Recently, features extracted from pretrained deep networks have been used as an alternative. However, extracting deep features involves several decisions, and it is unclear how these affect the resultin...
Autor principal: | |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Springer Vienna
2022
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9733744/ https://www.ncbi.nlm.nih.gov/pubmed/36484873 http://dx.doi.org/10.1186/s13244-022-01328-y |
Sumario: | OBJECTIVES: In radiomics, generic texture and morphological features are often used for modeling. Recently, features extracted from pretrained deep networks have been used as an alternative. However, extracting deep features involves several decisions, and it is unclear how these affect the resulting models. Therefore, in this study, we considered the influence of such choices on the predictive performance. METHODS: On ten publicly available radiomic datasets, models were trained using feature sets that differed in terms of the utilized network architecture, the layer of feature extraction, the used set of slices, the use of segmentation, and the aggregation method. The influence of these choices on the predictive performance was measured using a linear mixed model. In addition, models with generic features were trained and compared in terms of predictive performance and correlation. RESULTS: No single choice consistently led to the best-performing models. In the mixed model, the choice of architecture (AUC + 0.016; p < 0.001), the level of feature extraction (AUC + 0.016; p < 0.001), and using all slices (AUC + 0.023; p < 0.001) were highly significant; using the segmentation had a lower influence (AUC + 0.011; p = 0.023), while the aggregation method was insignificant (p = 0.774). Models based on deep features were not significantly better than those based on generic features (p > 0.05 on all datasets). Deep feature sets correlated moderately with each other (r = 0.4), in contrast to generic feature sets (r = 0.89). CONCLUSIONS: Different choices have a significant effect on the predictive performance of the resulting models; however, for the highest performance, these choices should be optimized during cross-validation. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13244-022-01328-y. |
---|