
Meaningless comparisons lead to false optimism in medical machine learning

A new trend in medicine is the use of algorithms to analyze big datasets, e.g. using everything your phone measures about you for diagnostics or monitoring. However, these algorithms are commonly compared against weak baselines, which may contribute to excessive optimism. To assess how well an algorithm works, scientists typically ask how well its output correlates with medically assigned scores. Here we perform a meta-analysis to quantify how the literature evaluates their algorithms for monitoring mental wellbeing. We find that the bulk of the literature (∼77%) uses meaningless comparisons that ignore patient baseline state. For example, having an algorithm that uses phone data to diagnose mood disorders would be useful. However, it is possible to explain over 80% of the variance of some mood measures in the population by simply guessing that each patient has their own average mood—the patient-specific baseline. Thus, an algorithm that just predicts that our mood is like it usually is can explain the majority of variance, but is, obviously, entirely useless. Comparing to the wrong (population) baseline has a massive effect on the perceived quality of algorithms and produces baseless optimism in the field. To solve this problem we propose “user lift” that reduces these systematic errors in the evaluation of personalized medical monitoring.
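The baseline argument in the abstract is easy to reproduce with a toy simulation. The sketch below (Python/NumPy, not code from the paper; the synthetic data, variable names, and the exact lift formula are illustrative assumptions) scores a deliberately useless predictor that only outputs each user's usual mood: against the population-mean baseline it appears to explain most of the variance, while against the patient-specific baseline it explains essentially nothing.

```python
# Minimal sketch, assuming a simple panel of daily mood scores; not the authors' code.
# It contrasts a population-mean baseline with a patient-specific baseline and
# reports a user-lift-style score (improvement over the per-user baseline).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic panel: each of 50 users has a stable average mood plus daily noise.
n_users, n_days = 50, 60
user_means = rng.normal(5.0, 2.0, size=n_users)                  # stable per-user level
mood = user_means[:, None] + rng.normal(0.0, 0.7, size=(n_users, n_days))

# A deliberately useless "model": always predict each user's usual mood, nothing else.
pred = np.repeat(user_means[:, None], n_days, axis=1)


def variance_explained(y_true, y_pred, y_baseline):
    """R^2-style score: variance explained relative to a chosen baseline prediction."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_base = np.sum((y_true - y_baseline) ** 2)
    return 1.0 - ss_res / ss_base


y = mood.ravel()
yhat = pred.ravel()

# Population baseline: guess the grand mean for every observation (the weak comparison).
pop_baseline = np.full_like(y, y.mean())
# Patient-specific baseline: guess each user's own observed average mood.
user_baseline = np.repeat(mood.mean(axis=1, keepdims=True), n_days, axis=1).ravel()

print("R^2 vs population baseline:", variance_explained(y, yhat, pop_baseline))
print("R^2 vs per-user baseline:  ", variance_explained(y, yhat, user_baseline))
# A hedged "user lift": mean-squared-error improvement over the per-user baseline.
user_lift = np.mean((y - user_baseline) ** 2) - np.mean((y - yhat) ** 2)
print("user lift (MSE improvement over per-user baseline):", user_lift)
```

With these illustrative numbers the population-baseline score comes out near 0.9, because users differ from one another, while the per-user comparison sits near zero; that gap between the two evaluations is what the proposed user-lift measure is meant to expose.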


Bibliographic Details
Main Authors: DeMasi, Orianna; Kording, Konrad; Recht, Benjamin
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5614525/
https://www.ncbi.nlm.nih.gov/pubmed/28949964
http://dx.doi.org/10.1371/journal.pone.0184604
author DeMasi, Orianna
Kording, Konrad
Recht, Benjamin
author_facet DeMasi, Orianna
Kording, Konrad
Recht, Benjamin
author_sort DeMasi, Orianna
collection PubMed
description A new trend in medicine is the use of algorithms to analyze big datasets, e.g. using everything your phone measures about you for diagnostics or monitoring. However, these algorithms are commonly compared against weak baselines, which may contribute to excessive optimism. To assess how well an algorithm works, scientists typically ask how well its output correlates with medically assigned scores. Here we perform a meta-analysis to quantify how the literature evaluates their algorithms for monitoring mental wellbeing. We find that the bulk of the literature (∼77%) uses meaningless comparisons that ignore patient baseline state. For example, having an algorithm that uses phone data to diagnose mood disorders would be useful. However, it is possible to explain over 80% of the variance of some mood measures in the population by simply guessing that each patient has their own average mood—the patient-specific baseline. Thus, an algorithm that just predicts that our mood is like it usually is can explain the majority of variance, but is, obviously, entirely useless. Comparing to the wrong (population) baseline has a massive effect on the perceived quality of algorithms and produces baseless optimism in the field. To solve this problem we propose “user lift” that reduces these systematic errors in the evaluation of personalized medical monitoring.
format Online
Article
Text
id pubmed-5614525
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5614525 2017-10-09 Meaningless comparisons lead to false optimism in medical machine learning DeMasi, Orianna Kording, Konrad Recht, Benjamin PLoS One Research Article A new trend in medicine is the use of algorithms to analyze big datasets, e.g. using everything your phone measures about you for diagnostics or monitoring. However, these algorithms are commonly compared against weak baselines, which may contribute to excessive optimism. To assess how well an algorithm works, scientists typically ask how well its output correlates with medically assigned scores. Here we perform a meta-analysis to quantify how the literature evaluates their algorithms for monitoring mental wellbeing. We find that the bulk of the literature (∼77%) uses meaningless comparisons that ignore patient baseline state. For example, having an algorithm that uses phone data to diagnose mood disorders would be useful. However, it is possible to explain over 80% of the variance of some mood measures in the population by simply guessing that each patient has their own average mood—the patient-specific baseline. Thus, an algorithm that just predicts that our mood is like it usually is can explain the majority of variance, but is, obviously, entirely useless. Comparing to the wrong (population) baseline has a massive effect on the perceived quality of algorithms and produces baseless optimism in the field. To solve this problem we propose “user lift” that reduces these systematic errors in the evaluation of personalized medical monitoring. Public Library of Science 2017-09-26 /pmc/articles/PMC5614525/ /pubmed/28949964 http://dx.doi.org/10.1371/journal.pone.0184604 Text en © 2017 DeMasi et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
DeMasi, Orianna
Kording, Konrad
Recht, Benjamin
Meaningless comparisons lead to false optimism in medical machine learning
title Meaningless comparisons lead to false optimism in medical machine learning
title_full Meaningless comparisons lead to false optimism in medical machine learning
title_fullStr Meaningless comparisons lead to false optimism in medical machine learning
title_full_unstemmed Meaningless comparisons lead to false optimism in medical machine learning
title_short Meaningless comparisons lead to false optimism in medical machine learning
title_sort meaningless comparisons lead to false optimism in medical machine learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5614525/
https://www.ncbi.nlm.nih.gov/pubmed/28949964
http://dx.doi.org/10.1371/journal.pone.0184604