Systematic misestimation of machine learning performance in neuroimaging studies of depression

Bibliographic Details
Main authors: Flint, Claas, Cearns, Micah, Opel, Nils, Redlich, Ronny, Mehler, David M. A., Emden, Daniel, Winter, Nils R., Leenings, Ramona, Eickhoff, Simon B., Kircher, Tilo, Krug, Axel, Nenadic, Igor, Arolt, Volker, Clark, Scott, Baune, Bernhard T., Jiang, Xiaoyi, Dannlowski, Udo, Hahn, Tim
Format: Online Article Text
Language: English
Published: Springer International Publishing 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8209109/
https://www.ncbi.nlm.nih.gov/pubmed/33958703
http://dx.doi.org/10.1038/s41386-021-01020-7
_version_ 1783709061079891968
author Flint, Claas
Cearns, Micah
Opel, Nils
Redlich, Ronny
Mehler, David M. A.
Emden, Daniel
Winter, Nils R.
Leenings, Ramona
Eickhoff, Simon B.
Kircher, Tilo
Krug, Axel
Nenadic, Igor
Arolt, Volker
Clark, Scott
Baune, Bernhard T.
Jiang, Xiaoyi
Dannlowski, Udo
Hahn, Tim
collection PubMed
description We currently observe a disconcerting phenomenon in machine learning studies in psychiatry: While we would expect larger samples to yield better results due to the availability of more data, larger machine learning studies consistently show much weaker performance than the numerous small-scale studies. Here, we systematically investigated this effect focusing on one of the most heavily studied questions in the field, namely the classification of patients suffering from Major Depressive Disorder (MDD) and healthy controls based on neuroimaging data. Drawing upon structural MRI data from a balanced sample of N = 1868 MDD patients and healthy controls from our recent international Predictive Analytics Competition (PAC), we first trained and tested a classification model on the full dataset which yielded an accuracy of 61%. Next, we mimicked the process by which researchers would draw samples of various sizes (N = 4 to N = 150) from the population and showed a strong risk of misestimation. Specifically, for small sample sizes (N = 20), we observe accuracies of up to 95%. For medium sample sizes (N = 100) accuracies up to 75% were found. Importantly, further investigation showed that sufficiently large test sets effectively protect against performance misestimation whereas larger datasets per se do not. While these results question the validity of a substantial part of the current literature, we outline the relatively low-cost remedy of larger test sets, which is readily available in most cases.
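The role of test-set size described above can be illustrated with a minimal sketch. This is not the authors' resampling procedure: it assumes a fixed classifier whose true accuracy equals the 61% reported for the full dataset, and it treats every test prediction as an independent Bernoulli trial. The function names, the test-set sizes, and the `n_draws` and `seed` parameters are hypothetical choices made for illustration.

```python
import math
import random


def binomial_std(true_acc, n_test):
    """Standard deviation of observed accuracy on a test set of size n_test."""
    return math.sqrt(true_acc * (1.0 - true_acc) / n_test)


def observed_accuracy_range(true_acc, n_test, n_draws=2000, seed=0):
    """Simulate n_draws independent test sets and return (min, max) observed accuracy.

    Assumption: each prediction is correct with probability true_acc,
    independently of all others.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_draws):
        correct = sum(rng.random() < true_acc for _ in range(n_test))
        accuracies.append(correct / n_test)
    return min(accuracies), max(accuracies)


TRUE_ACC = 0.61  # accuracy reported for the full dataset in the abstract

# Hypothetical test-set sizes, chosen only to show the trend.
for n_test in (4, 20, 100, 150):
    low, high = observed_accuracy_range(TRUE_ACC, n_test)
    print(
        f"n_test={n_test:4d}  std={binomial_std(TRUE_ACC, n_test):.3f}  "
        f"observed range=[{low:.2f}, {high:.2f}]"
    )
```

Under these assumptions the spread of observed accuracies shrinks roughly with the square root of the test-set size, which is consistent with the abstract's point that larger test sets guard against misestimation.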
format Online
Article
Text
id pubmed-8209109
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-8209109 2021-07-01 Neuropsychopharmacology Article
Springer International Publishing 2021-05-06 2021-07 /pmc/articles/PMC8209109/ /pubmed/33958703 http://dx.doi.org/10.1038/s41386-021-01020-7 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Systematic misestimation of machine learning performance in neuroimaging studies of depression
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8209109/
https://www.ncbi.nlm.nih.gov/pubmed/33958703
http://dx.doi.org/10.1038/s41386-021-01020-7