Risk estimation using probability machines

BACKGROUND: Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effec...

Bibliographic Details
Main Authors:	Dasgupta, Abhijit; Szymczak, Silke; Moore, Jason H; Bailey-Wilson, Joan E; Malley, James D
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Methodology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015350/ https://www.ncbi.nlm.nih.gov/pubmed/24581306 http://dx.doi.org/10.1186/1756-0381-7-2

author	Dasgupta, Abhijit; Szymczak, Silke; Moore, Jason H; Bailey-Wilson, Joan E; Malley, James D
collection	PubMed
description	BACKGROUND: Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios. RESULTS: We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented. CONCLUSIONS: The models we propose make no assumptions about the data structure, and capture the patterns in the data by just specifying the predictors involved and not any particular model structure. So they do not run the same risks of model mis-specification and the resultant estimation biases as a logistic model. This methodology, which we call a “risk machine”, will share properties from the statistical machine that it is derived from.
format	Online Article Text
id	pubmed-4015350
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4015350, 2014-05-23. BioData Min. BioMed Central, 2014-03-01. Text en. Copyright © 2014 Dasgupta et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
title	Risk estimation using probability machines
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015350/ https://www.ncbi.nlm.nih.gov/pubmed/24581306 http://dx.doi.org/10.1186/1756-0381-7-2