
Paired evaluation of machine-learning models characterizes effects of confounders and outliers

The true accuracy of a machine-learning model is a population-level statistic that cannot be observed directly. In practice, predictor performance is estimated against one or more test datasets, and the accuracy of this estimate strongly depends on how well the test sets represent all possible unsee...


Bibliographic Details
Main authors:	Nariya, Maulik K., Mills, Caitlin E., Sorger, Peter K., Sokolov, Artem
Format:	Online Article Text
Language:	English
Published:	Elsevier 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10435952/ https://www.ncbi.nlm.nih.gov/pubmed/37602225 http://dx.doi.org/10.1016/j.patter.2023.100791

_version_	1785092219947974656
author	Nariya, Maulik K. Mills, Caitlin E. Sorger, Peter K. Sokolov, Artem
author_facet	Nariya, Maulik K. Mills, Caitlin E. Sorger, Peter K. Sokolov, Artem
author_sort	Nariya, Maulik K.
collection	PubMed
description	The true accuracy of a machine-learning model is a population-level statistic that cannot be observed directly. In practice, predictor performance is estimated against one or more test datasets, and the accuracy of this estimate strongly depends on how well the test sets represent all possible unseen datasets. Here we describe paired evaluation as a simple, robust approach for evaluating performance of machine-learning models in small-sample biological and clinical studies. We use the method to evaluate predictors of drug response in breast cancer cell lines and of disease severity in patients with Alzheimer’s disease, demonstrating that the choice of test data can cause estimates of performance to vary by as much as 20%. We show that paired evaluation makes it possible to identify outliers, improve the accuracy of performance estimates in the presence of known confounders, and assign statistical significance when comparing machine-learning models.
format	Online Article Text
id	pubmed-10435952
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-10435952 2023-08-19 Paired evaluation of machine-learning models characterizes effects of confounders and outliers Nariya, Maulik K. Mills, Caitlin E. Sorger, Peter K. Sokolov, Artem Patterns (N Y) Article The true accuracy of a machine-learning model is a population-level statistic that cannot be observed directly. In practice, predictor performance is estimated against one or more test datasets, and the accuracy of this estimate strongly depends on how well the test sets represent all possible unseen datasets. Here we describe paired evaluation as a simple, robust approach for evaluating performance of machine-learning models in small-sample biological and clinical studies. We use the method to evaluate predictors of drug response in breast cancer cell lines and of disease severity in patients with Alzheimer’s disease, demonstrating that the choice of test data can cause estimates of performance to vary by as much as 20%. We show that paired evaluation makes it possible to identify outliers, improve the accuracy of performance estimates in the presence of known confounders, and assign statistical significance when comparing machine-learning models. Elsevier 2023-07-07 /pmc/articles/PMC10435952/ /pubmed/37602225 http://dx.doi.org/10.1016/j.patter.2023.100791 Text en © 2023 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Nariya, Maulik K. Mills, Caitlin E. Sorger, Peter K. Sokolov, Artem Paired evaluation of machine-learning models characterizes effects of confounders and outliers
title	Paired evaluation of machine-learning models characterizes effects of confounders and outliers
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10435952/ https://www.ncbi.nlm.nih.gov/pubmed/37602225 http://dx.doi.org/10.1016/j.patter.2023.100791
