Explainable machine learning for knee osteoarthritis diagnosis based on a novel fuzzy feature selection methodology

Bibliographic Details
Main authors: Kokkotis, Christos; Ntakolia, Charis; Moustakidis, Serafeim; Giakas, Giannis; Tsaopoulos, Dimitrios
Format: Online Article Text
Language: English
Published: Springer International Publishing, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8802106/
https://www.ncbi.nlm.nih.gov/pubmed/35099771
http://dx.doi.org/10.1007/s13246-022-01106-6
Description
Summary: Knee osteoarthritis (KOA) is a degenerative joint disease of the knee that results from the progressive loss of cartilage. Because KOA is multifactorial and its pathophysiology is poorly understood, there is a need for reliable tools that reduce diagnostic errors made by clinicians. Public databases have facilitated the advent of advanced analytics in KOA research; however, the heterogeneity of the available data and their high feature dimensionality make this diagnostic task difficult. The objective of the present study is to provide a robust feature selection (FS) methodology that could (i) handle the multidimensional nature of the available datasets and (ii) overcome the shortcomings of existing FS techniques in identifying the important risk factors that contribute to KOA diagnosis. To this end, we used multidimensional data obtained from the Osteoarthritis Initiative database for individuals with or without KOA. The proposed fuzzy ensemble FS methodology uses fuzzy logic to aggregate the results of several FS algorithms (filter, wrapper and embedded). Its effectiveness was evaluated in an extensive experimental setup involving multiple competing FS algorithms and several well-known machine learning (ML) models. The best-performing model (a Random Forest classifier) achieved a classification accuracy of 73.55% on a group of twenty-one selected risk factors. Finally, an explainability analysis was performed to quantify the impact of the selected features on the model's output, thus improving our understanding of the rationale behind the best model's decisions.
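The summary does not say exactly how the fuzzy aggregation combines the filter, wrapper and embedded results. As a rough illustration only, the sketch below uses a swapped-in technique, Yager's quantifier-guided ordered weighted averaging (OWA) with the fuzzy linguistic quantifier "most", which rewards features that most FS methods agree are important. It is not the authors' method. The feature names, raw scores, min-max normalization, quantifier parameters (a=0.3, b=0.8) and 0.5 selection threshold are all hypothetical assumptions, and the toy data are not from the Osteoarthritis Initiative.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Map one FS method's raw scores (higher = better) to [0, 1] membership
    degrees in the fuzzy set 'important'. Assumption: min-max scaling."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # constant scores carry no ranking information
        return {f: 0.5 for f in scores}
    return {f: (v - lo) / (hi - lo) for f, v in scores.items()}


def quantifier_most(r: float, a: float = 0.3, b: float = 0.8) -> float:
    """Piecewise-linear fuzzy linguistic quantifier 'most' (hypothetical a, b)."""
    if r <= a:
        return 0.0
    if r >= b:
        return 1.0
    return (r - a) / (b - a)


def owa_weights(n: int) -> List[float]:
    """Quantifier-guided OWA weights: w_i = Q(i/n) - Q((i-1)/n)."""
    return [quantifier_most(i / n) - quantifier_most((i - 1) / n)
            for i in range(1, n + 1)]


def fuzzy_ensemble(method_scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Aggregate per-method memberships into one score per feature via OWA."""
    normalized = [min_max_normalize(s) for s in method_scores.values()]
    features = set(normalized[0])
    if any(set(m) != features for m in normalized):
        raise ValueError("every FS method must score the same features")
    weights = owa_weights(len(normalized))
    aggregated = {}
    for f in normalized[0]:
        # OWA sorts memberships in descending order before weighting them.
        memberships = sorted((m[f] for m in normalized), reverse=True)
        aggregated[f] = sum(w * mu for w, mu in zip(weights, memberships))
    return aggregated


def select_features(aggregated: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return features ranked by aggregated score, keeping those >= threshold."""
    ranked = sorted(aggregated, key=aggregated.get, reverse=True)
    return [f for f in ranked if aggregated[f] >= threshold]


# Hypothetical scores from a filter, a wrapper (inverted rank) and an embedded method.
scores = {
    "filter": {"age": 0.30, "bmi": 0.20, "pain": 0.50, "noise": 0.00},
    "wrapper": {"age": 3, "bmi": 1, "pain": 4, "noise": 2},
    "embedded": {"age": 0.25, "bmi": 0.35, "pain": 0.30, "noise": 0.10},
}
agg = fuzzy_ensemble(scores)
print(select_features(agg))  # ['pain', 'age']
```

In this toy run, `bmi` tops the embedded ranking but scores poorly under the other two methods, so the "most" quantifier keeps its aggregated score (about 0.33) below the threshold, while `pain` and `age`, which all three methods rate fairly highly, are selected.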