Sampling inequalities affect generalization of neuroimaging-based diagnostic classifiers in psychiatry


Bibliographic Details
Main Authors: Chen, Zhiyi, Hu, Bowen, Liu, Xuerong, Becker, Benjamin, Eickhoff, Simon B., Miao, Kuan, Gu, Xingmei, Tang, Yancheng, Dai, Xin, Li, Chao, Leonov, Artemiy, Xiao, Zhibing, Feng, Zhengzhi, Chen, Ji, Chuan-Peng, Hu
Format: Online Article Text
Language: English
Published: BioMed Central, 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10318841/
https://www.ncbi.nlm.nih.gov/pubmed/37400814
http://dx.doi.org/10.1186/s12916-023-02941-4
Description
Summary:
BACKGROUND: The development of machine learning models to aid in the diagnosis of mental disorders is recognized as a significant breakthrough in psychiatry. However, the clinical application of such models remains a challenge, with poor generalizability being a major limitation.

METHODS: Here, we conducted a pre-registered meta-research assessment of neuroimaging-based models in the psychiatric literature, quantitatively examining global and regional sampling issues over recent decades from a perspective that has been relatively underexplored. A total of 476 studies (n = 118,137) were included in the assessment. Based on these findings, we built a comprehensive 5-star rating system to quantitatively evaluate the quality of existing machine learning models for psychiatric diagnosis.

RESULTS: A global sampling inequality in these models was revealed quantitatively (sampling Gini coefficient G = 0.81, p < .01) and varied across countries (regions) (e.g., China, G = 0.47; the USA, G = 0.58; Germany, G = 0.78; the UK, G = 0.87). Furthermore, the severity of this sampling inequality was significantly predicted by national economic level (β = −2.75, p < .001, adjusted R² = 0.40; r = −.84, 95% CI: −.41 to −.97), and it plausibly predicted model performance, with higher sampling inequality associated with higher reported classification accuracy. Further analyses showed that lack of independent testing (84.24% of models, 95% CI: 81.0–87.5%), improper cross-validation (51.68% of models, 95% CI: 47.2–56.2%), and poor technical transparency (87.8% of models, 95% CI: 84.9–90.8%) and availability (80.88% of models, 95% CI: 77.3–84.4%) remain prevalent in current diagnostic classifiers despite improvements over time. Consistent with these observations, model performance was found to decrease in studies with independent cross-country sampling validation (all p < .001, BF₁₀ > 15). In light of this, we proposed a purpose-built quantitative assessment checklist, which demonstrated that the overall ratings of these models increased with publication year but were negatively associated with model performance.

CONCLUSIONS: Together, improving economic equality in sampling, and hence the quality of machine learning models, may be a crucial facet of translating neuroimaging-based diagnostic classifiers into clinical practice.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12916-023-02941-4.
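
The headline metric here, the sampling Gini coefficient G, summarizes how unevenly participant sample sizes are distributed across contributing countries or sites: G = 0 means every source contributes equally, and values near 1 mean a few sources dominate. The paper's exact aggregation procedure is described in its Methods; the sketch below is only a minimal illustration of the standard Gini computation applied to hypothetical per-country sample counts (the numbers are made up, not the study's data).

```python
import numpy as np

def gini(sample_sizes):
    """Gini coefficient of a set of non-negative sample sizes.

    Uses the closed form on ascending-sorted data:
        G = (2 * sum(i * x_i)) / (n * sum(x)) - (n + 1) / n,  i = 1..n
    """
    x = np.sort(np.asarray(sample_sizes, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return (2.0 * np.sum(ranks * x)) / (n * total) - (n + 1.0) / n

# Hypothetical per-country sample counts (illustrative only):
# a handful of countries contribute most of the participants.
counts = [52000, 31000, 9000, 4000, 1500, 600, 300, 120, 80, 40]
print(f"sampling Gini G = {gini(counts):.2f}")  # high G: strong inequality
```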
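Similarly, the reported link between sampling inequality and national economic level (β = −2.75, adjusted R² = 0.40) is a regression estimate. A minimal sketch of one plausible version of that analysis, assuming G is regressed on a log-scaled economic indicator and using invented data points purely for illustration:

```python
import numpy as np

# Hypothetical (log GDP per capita, sampling Gini) pairs (illustrative only).
log_gdp = np.array([9.2, 9.8, 10.3, 10.6, 10.9, 11.1])
g = np.array([0.88, 0.80, 0.66, 0.58, 0.49, 0.45])

# Ordinary least squares: G = intercept + slope * log_gdp
slope, intercept = np.polyfit(log_gdp, g, 1)
r = np.corrcoef(log_gdp, g)[0, 1]
print(f"slope (beta) = {slope:.2f}, r = {r:.2f}")  # negative: higher income, less inequality
```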