Performance variability of radiomics machine learning models for the detection of clinically significant prostate cancer in heterogeneous MRI datasets

Bibliographic Details
Main authors: Gresser, Eva, Schachtner, Balthasar, Stüber, Anna Theresa, Solyanik, Olga, Schreier, Andrea, Huber, Thomas, Froelich, Matthias Frank, Magistro, Giuseppe, Kretschmer, Alexander, Stief, Christian, Ricke, Jens, Ingrisch, Michael, Nörenberg, Dominik
Format: Online Article Text
Language: English
Published: AME Publishing Company 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9622454/
https://www.ncbi.nlm.nih.gov/pubmed/36330197
http://dx.doi.org/10.21037/qims-22-265
Description
Summary: BACKGROUND: Radiomics promises to enhance the discriminative performance for clinically significant prostate cancer (csPCa), but still lacks validation in real-life scenarios. This study investigates the classification performance and robustness of machine learning radiomics models in heterogeneous MRI datasets to characterize suspicious prostate lesions for non-invasive prediction of prostate cancer (PCa) aggressiveness compared to conventional imaging biomarkers. METHODS: A total of 142 patients with clinical suspicion of PCa underwent 1.5T or 3T biparametric MRI (7 scanner types, 14 institutions) and exhibited suspicious lesions [Prostate Imaging Reporting and Data System (PI-RADS) score ≥3] in peripheral or transitional zones. Whole-gland and index-lesion segmentations were performed semi-automatically. A total of 1,482 quantitative morphologic, shape, texture, and intensity-based radiomics features were extracted from T2-weighted and apparent diffusion coefficient (ADC) images and assessed using random forest and logistic regression models. Five-fold cross-validation performance in terms of area under the ROC curve was compared to mean ADC (mADC), PI-RADS and prostate-specific antigen density (PSAD). Bias mitigation techniques targeting the high-dimensional feature space and inherent class imbalance were applied, and the robustness of results was systematically evaluated. RESULTS: Trained models showed mean areas under the curve (AUCs) ranging from 0.78 to 0.83 in csPCa classification. Despite the use of mitigation techniques, the results showed high performance variability. Trained models achieved on average numerically higher classification performance compared to the clinical parameters PI-RADS (AUC = 0.78), mADC (AUC = 0.71) and PSAD (AUC = 0.63). CONCLUSIONS: The classification performance of radiomics models for csPCa was numerically but not significantly higher than that of PI-RADS scoring. Overall, clinical applicability in heterogeneous MRI datasets is limited because of high variability of results. Performance variability, robustness and reproducibility of radiomics-based measures should be addressed more transparently in future research to enable broad clinical application.
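
Illustrative note (not part of the catalogued article): the summary's point about performance variability can be made concrete with a minimal sketch of how much an AUC estimate may differ between the five test folds of a cohort of this size. Everything here is an assumption made for illustration: the scores are synthetic Gaussian draws, the count of 50 positive cases is hypothetical (the summary does not report the class balance), and the function names are invented for this example. The score is a fixed marker, so no model is trained; this is not the authors' pipeline.

```python
import random
import statistics


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # A positive scored above a negative counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Split case indices into k folds with roughly equal class proportions."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def per_fold_aucs(labels, scores, k=5, seed=0):
    """AUC of a fixed score, evaluated separately on each of k folds."""
    rng = random.Random(seed)
    return [
        auc([labels[i] for i in fold], [scores[i] for i in fold])
        for fold in stratified_folds(labels, k, rng)
    ]


def make_synthetic_cohort(n=142, n_pos=50, shift=1.0, seed=0):
    """Synthetic stand-in data: n_pos and shift are hypothetical values."""
    rng = random.Random(seed)
    labels = [1] * n_pos + [0] * (n - n_pos)
    scores = [rng.gauss(shift if y else 0.0, 1.0) for y in labels]
    return labels, scores


if __name__ == "__main__":
    labels, scores = make_synthetic_cohort()
    fold_aucs = per_fold_aucs(labels, scores)
    print("per-fold AUC:", [round(a, 2) for a in fold_aucs])
    print("mean:", round(statistics.mean(fold_aucs), 2),
          "SD:", round(statistics.stdev(fold_aucs), 2))
```

With roughly 28 cases per fold, each per-fold AUC rests on a small number of positive/negative pairs, so changing `seed` can shift the individual values noticeably even though the underlying score distribution is unchanged.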