The need to approximate the use-case in clinical machine learning

The availability of smartphone and wearable sensor technology is leading to a rapid accumulation of human subject data, and machine learning is emerging as a technique to map those data into clinical predictions. As machine learning algorithms are increasingly used to support clinical decision makin...

Bibliographic Details
Main Authors:	Saeb, Sohrab; Lonini, Luca; Jayaraman, Arun; Mohr, David C.; Kording, Konrad P.
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2017
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5441397/ https://www.ncbi.nlm.nih.gov/pubmed/28327985 http://dx.doi.org/10.1093/gigascience/gix019

author	Saeb, Sohrab Lonini, Luca Jayaraman, Arun Mohr, David C. Kording, Konrad P.
collection	PubMed
description	The availability of smartphone and wearable sensor technology is leading to a rapid accumulation of human subject data, and machine learning is emerging as a technique to map those data into clinical predictions. As machine learning algorithms are increasingly used to support clinical decision making, it is vital to reliably quantify their prediction accuracy. Cross-validation (CV) is the standard approach where the accuracy of such algorithms is evaluated on part of the data the algorithm has not seen during training. However, for this procedure to be meaningful, the relationship between the training and the validation set should mimic the relationship between the training set and the dataset expected for the clinical use. Here we compared two popular CV methods: record-wise and subject-wise. While the subject-wise method mirrors the clinically relevant use-case scenario of diagnosis in newly recruited subjects, the record-wise strategy has no such interpretation. Using both a publicly available dataset and a simulation, we found that record-wise CV often massively overestimates the prediction accuracy of the algorithms. We also conducted a systematic review of the relevant literature, and found that this overly optimistic method was used by almost half of the retrieved studies that used accelerometers, wearable sensors, or smartphones to predict clinical outcomes. As we move towards an era of machine learning-based diagnosis and treatment, using proper methods to evaluate their accuracy is crucial, as inaccurate results can mislead both clinicians and data scientists.
format	Online Article Text
id	pubmed-5441397
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-5441397 2017-06-19 The need to approximate the use-case in clinical machine learning Saeb, Sohrab Lonini, Luca Jayaraman, Arun Mohr, David C. Kording, Konrad P. Gigascience Research Oxford University Press 2017-03-15 /pmc/articles/PMC5441397/ /pubmed/28327985 http://dx.doi.org/10.1093/gigascience/gix019 Text en © The Author 2017. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
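
The distinction the description draws between record-wise and subject-wise cross-validation can be illustrated with a minimal sketch. This is not the authors' code: the helper names (`record_wise_folds`, `subject_wise_folds`, `shared_subjects`) and the toy data (10 subjects with 20 records each) are hypothetical and chosen only for illustration. The sketch shows how the two strategies differ in whether one subject's records can land on both sides of a train/test split.

```python
import random

def record_wise_folds(subject_ids, k, seed=0):
    """Shuffle individual records into k folds, ignoring which subject produced them."""
    idx = list(range(len(subject_ids)))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def subject_wise_folds(subject_ids, k, seed=0):
    """Shuffle subjects into k folds; every record of a subject stays in one fold."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    fold_of = {s: i % k for i, s in enumerate(subjects)}
    folds = [[] for _ in range(k)]
    for i, s in enumerate(subject_ids):
        folds[fold_of[s]].append(i)
    return folds

def shared_subjects(folds, subject_ids, test_fold):
    """Return the subjects present in both the test fold and the training folds."""
    test = {subject_ids[i] for i in folds[test_fold]}
    train = {subject_ids[i]
             for f, fold in enumerate(folds) if f != test_fold
             for i in fold}
    return test & train

# Hypothetical toy data: subject id for each record, 10 subjects x 20 records.
subject_ids = [s for s in range(10) for _ in range(20)]

rec = record_wise_folds(subject_ids, k=5)
sub = subject_wise_folds(subject_ids, k=5)

# Record-wise: test-fold subjects generally also appear in the training folds.
print("record-wise shared subjects:", len(shared_subjects(rec, subject_ids, 0)))
# Subject-wise: no subject is shared between test and training folds.
print("subject-wise shared subjects:", len(shared_subjects(sub, subject_ids, 0)))
```

In the subject-wise split the test fold contains only unseen subjects, which matches the use case of diagnosing newly recruited subjects. In the record-wise split the model can be tested on records from subjects it was trained on, which is the setting the description reports as overestimating accuracy.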

Similar Items