Methods for correcting inference based on outcomes predicted by machine learning

Many modern problems in medicine and public health leverage machine-learning methods to predict outcomes based on observable covariates. In a wide array of settings, predicted outcomes are used in subsequent statistical analysis, often without accounting for the distinction between observed and pred...

Bibliographic Details
Main Authors:	Wang, Siruo; McCormick, Tyler H.; Leek, Jeffrey T.
Format:	Online Article Text
Language:	English
Published:	National Academy of Sciences 2020
Subjects:	Physical Sciences
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7720220/ https://www.ncbi.nlm.nih.gov/pubmed/33208538 http://dx.doi.org/10.1073/pnas.2001238117

author	Wang, Siruo; McCormick, Tyler H.; Leek, Jeffrey T.
collection	PubMed
description	Many modern problems in medicine and public health leverage machine-learning methods to predict outcomes based on observable covariates. In a wide array of settings, predicted outcomes are used in subsequent statistical analysis, often without accounting for the distinction between observed and predicted outcomes. We call inference with predicted outcomes postprediction inference. In this paper, we develop methods for correcting statistical inference using outcomes predicted with arbitrarily complicated machine-learning models including random forests and deep neural nets. Rather than trying to derive the correction from first principles for each machine-learning algorithm, we observe that there is typically a low-dimensional and easily modeled representation of the relationship between the observed and predicted outcomes. We build an approach for postprediction inference that naturally fits into the standard machine-learning framework where the data are divided into training, testing, and validation sets. We train the prediction model in the training set, estimate the relationship between the observed and predicted outcomes in the testing set, and use that relationship to correct subsequent inference in the validation set. We show our postprediction inference (postpi) approach can correct bias and improve variance estimation and subsequent statistical inference with predicted outcomes. To show the broad range of applicability of our approach, we show postpi can improve inference in two distinct fields: modeling predicted phenotypes in repurposed gene expression data and modeling predicted causes of death in verbal autopsy data. Our method is available through an open-source R package: https://github.com/leekgroup/postpi.
format	Online Article Text
id	pubmed-7720220
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	National Academy of Sciences
record_format	MEDLINE/PubMed
spelling	pubmed-7720220 2020-12-18 Methods for correcting inference based on outcomes predicted by machine learning. Wang, Siruo; McCormick, Tyler H.; Leek, Jeffrey T. Proc Natl Acad Sci U S A. Physical Sciences. National Academy of Sciences 2020-12-01 2020-11-18 /pmc/articles/PMC7720220/ /pubmed/33208538 http://dx.doi.org/10.1073/pnas.2001238117 Text en Copyright © 2020 the Author(s). Published by PNAS. This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY) (http://creativecommons.org/licenses/by/4.0/).
title	Methods for correcting inference based on outcomes predicted by machine learning
topic	Physical Sciences
