
Meaningless comparisons lead to false optimism in medical machine learning

A new trend in medicine is the use of algorithms to analyze big datasets, e.g. using everything your phone measures about you for diagnostics or monitoring. However, these algorithms are commonly compared against weak baselines, which may contribute to excessive optimism. To assess how well an algorithm works, scientists typically ask how well its output correlates with medically assigned scores. Here we perform a meta-analysis to quantify how the literature evaluates their algorithms for monitoring mental wellbeing. We find that the bulk of the literature (∼77%) uses meaningless comparisons that ignore patient baseline state. For example, having an algorithm that uses phone data to diagnose mood disorders would be useful. However, it is possible to explain over 80% of the variance of some mood measures in the population by simply guessing that each patient has their own average mood—the patient-specific baseline. Thus, an algorithm that just predicts that our mood is like it usually is can explain the majority of variance, but is, obviously, entirely useless. Comparing to the wrong (population) baseline has a massive effect on the perceived quality of algorithms and produces baseless optimism in the field. To solve this problem we propose “user lift” that reduces these systematic errors in the evaluation of personalized medical monitoring.
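The baseline argument in the abstract is easy to reproduce with a toy simulation. The sketch below (Python/NumPy, not code from the paper; the synthetic data, variable names, and the exact lift formula are illustrative assumptions) scores a deliberately useless predictor that only outputs each user's usual mood: against the population-mean baseline it appears to explain most of the variance, while against the patient-specific baseline it explains essentially nothing.

```python
# Minimal sketch, assuming a simple panel of daily mood scores; not the authors' code.
# It contrasts a population-mean baseline with a patient-specific baseline and
# reports a user-lift-style score (improvement over the per-user baseline).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic panel: each of 50 users has a stable average mood plus daily noise.
n_users, n_days = 50, 60
user_means = rng.normal(5.0, 2.0, size=n_users)                  # stable per-user level
mood = user_means[:, None] + rng.normal(0.0, 0.7, size=(n_users, n_days))

# A deliberately useless "model": always predict each user's usual mood, nothing else.
pred = np.repeat(user_means[:, None], n_days, axis=1)


def variance_explained(y_true, y_pred, y_baseline):
    """R^2-style score: variance explained relative to a chosen baseline prediction."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_base = np.sum((y_true - y_baseline) ** 2)
    return 1.0 - ss_res / ss_base


y = mood.ravel()
yhat = pred.ravel()

# Population baseline: guess the grand mean for every observation (the weak comparison).
pop_baseline = np.full_like(y, y.mean())
# Patient-specific baseline: guess each user's own observed average mood.
user_baseline = np.repeat(mood.mean(axis=1, keepdims=True), n_days, axis=1).ravel()

print("R^2 vs population baseline:", variance_explained(y, yhat, pop_baseline))
print("R^2 vs per-user baseline:  ", variance_explained(y, yhat, user_baseline))
# A hedged "user lift": mean-squared-error improvement over the per-user baseline.
user_lift = np.mean((y - user_baseline) ** 2) - np.mean((y - yhat) ** 2)
print("user lift (MSE improvement over per-user baseline):", user_lift)
```

With these illustrative numbers the population-baseline score comes out near 0.9, because users differ from one another, while the per-user comparison sits near zero; that gap between the two evaluations is what the proposed user-lift measure is meant to expose.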


Bibliographic Details
Main Authors: DeMasi, Orianna; Kording, Konrad; Recht, Benjamin
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5614525/
https://www.ncbi.nlm.nih.gov/pubmed/28949964
http://dx.doi.org/10.1371/journal.pone.0184604
author DeMasi, Orianna
Kording, Konrad
Recht, Benjamin
author_facet DeMasi, Orianna
Kording, Konrad
Recht, Benjamin
author_sort DeMasi, Orianna
collection PubMed
description A new trend in medicine is the use of algorithms to analyze big datasets, e.g. using everything your phone measures about you for diagnostics or monitoring. However, these algorithms are commonly compared against weak baselines, which may contribute to excessive optimism. To assess how well an algorithm works, scientists typically ask how well its output correlates with medically assigned scores. Here we perform a meta-analysis to quantify how the literature evaluates their algorithms for monitoring mental wellbeing. We find that the bulk of the literature (∼77%) uses meaningless comparisons that ignore patient baseline state. For example, having an algorithm that uses phone data to diagnose mood disorders would be useful. However, it is possible to explain over 80% of the variance of some mood measures in the population by simply guessing that each patient has their own average mood—the patient-specific baseline. Thus, an algorithm that just predicts that our mood is like it usually is can explain the majority of variance, but is, obviously, entirely useless. Comparing to the wrong (population) baseline has a massive effect on the perceived quality of algorithms and produces baseless optimism in the field. To solve this problem we propose “user lift” that reduces these systematic errors in the evaluation of personalized medical monitoring.
format Online
Article
Text
id pubmed-5614525
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5614525 2017-10-09 Meaningless comparisons lead to false optimism in medical machine learning DeMasi, Orianna Kording, Konrad Recht, Benjamin PLoS One Research Article A new trend in medicine is the use of algorithms to analyze big datasets, e.g. using everything your phone measures about you for diagnostics or monitoring. However, these algorithms are commonly compared against weak baselines, which may contribute to excessive optimism. To assess how well an algorithm works, scientists typically ask how well its output correlates with medically assigned scores. Here we perform a meta-analysis to quantify how the literature evaluates their algorithms for monitoring mental wellbeing. We find that the bulk of the literature (∼77%) uses meaningless comparisons that ignore patient baseline state. For example, having an algorithm that uses phone data to diagnose mood disorders would be useful. However, it is possible to explain over 80% of the variance of some mood measures in the population by simply guessing that each patient has their own average mood—the patient-specific baseline. Thus, an algorithm that just predicts that our mood is like it usually is can explain the majority of variance, but is, obviously, entirely useless. Comparing to the wrong (population) baseline has a massive effect on the perceived quality of algorithms and produces baseless optimism in the field. To solve this problem we propose “user lift” that reduces these systematic errors in the evaluation of personalized medical monitoring. Public Library of Science 2017-09-26 /pmc/articles/PMC5614525/ /pubmed/28949964 http://dx.doi.org/10.1371/journal.pone.0184604 Text en © 2017 DeMasi et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
DeMasi, Orianna
Kording, Konrad
Recht, Benjamin
Meaningless comparisons lead to false optimism in medical machine learning
title Meaningless comparisons lead to false optimism in medical machine learning
title_full Meaningless comparisons lead to false optimism in medical machine learning
title_fullStr Meaningless comparisons lead to false optimism in medical machine learning
title_full_unstemmed Meaningless comparisons lead to false optimism in medical machine learning
title_short Meaningless comparisons lead to false optimism in medical machine learning
title_sort meaningless comparisons lead to false optimism in medical machine learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5614525/
https://www.ncbi.nlm.nih.gov/pubmed/28949964
http://dx.doi.org/10.1371/journal.pone.0184604