
Studentized bootstrap model-averaged tail area intervals

In many scientific studies, the underlying data-generating process is unknown and multiple statistical models are considered to describe it. For example, in a factorial experiment we might consider models involving just main effects, as well as those that include interactions. Model-averaging is a c...


Bibliographic Details
Main authors:	Zeng, Jiaxu, Fletcher, David, Dillingham, Peter W., Cornwall, Christopher E.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2019
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6422275/ https://www.ncbi.nlm.nih.gov/pubmed/30883568 http://dx.doi.org/10.1371/journal.pone.0213715

_version_	1783404366081818624
author	Zeng, Jiaxu Fletcher, David Dillingham, Peter W. Cornwall, Christopher E.
author_facet	Zeng, Jiaxu Fletcher, David Dillingham, Peter W. Cornwall, Christopher E.
author_sort	Zeng, Jiaxu
collection	PubMed
description	In many scientific studies, the underlying data-generating process is unknown and multiple statistical models are considered to describe it. For example, in a factorial experiment we might consider models involving just main effects, as well as those that include interactions. Model-averaging is a commonly used statistical technique to allow for model uncertainty in parameter estimation. In the frequentist setting, the model-averaged estimate of a parameter is a weighted mean of the estimates from the individual models, with the weights typically being based on an information criterion, cross-validation, or bootstrapping. One approach to building a model-averaged confidence interval is to use a Wald interval, based on the model-averaged estimate and its standard error. This has been the default method in many application areas, particularly those in the life sciences. The MA-Wald interval, however, assumes that the studentized model-averaged estimate has a normal distribution, which can be far from true in practice due to the random, data-driven model weights. Recently, the model-averaged tail area Wald interval (MATA-Wald) has been proposed as an alternative to the MA-Wald interval; it only assumes that the studentized estimate from each model has an N(0, 1) or t-distribution when that model is true. This alternative to the MA-Wald interval has been shown to have better coverage in simulation studies. However, when the response variable is skewed, even these relaxed assumptions may not be valid, and use of these intervals might therefore result in poor coverage. We propose a new interval (MATA-SBoot) which uses a parametric bootstrap approach to estimate the distribution of the studentized estimate for each model, when that model is true. This method only requires that the studentized estimate from each model is approximately pivotal, an assumption that will often be true in practice, even for skewed data. We illustrate use of this new interval in the analysis of a three-factor marine global change experiment in which the response variable is assumed to have a lognormal distribution. We also perform a simulation study, based on the example, to compare the lower and upper error rates of this interval with those for existing methods. The results suggest that the MATA-SBoot interval can provide better error rates than existing intervals when we have skewed data, particularly for the upper error rate when the sample size is small.
format	Online Article Text
id	pubmed-6422275
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-64222752019-04-02 Studentized bootstrap model-averaged tail area intervals Zeng, Jiaxu Fletcher, David Dillingham, Peter W. Cornwall, Christopher E. PLoS One Research Article Public Library of Science 2019-03-18 /pmc/articles/PMC6422275/ /pubmed/30883568 http://dx.doi.org/10.1371/journal.pone.0213715 Text en © 2019 Zeng et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Zeng, Jiaxu Fletcher, David Dillingham, Peter W. Cornwall, Christopher E. Studentized bootstrap model-averaged tail area intervals
title	Studentized bootstrap model-averaged tail area intervals
title_full	Studentized bootstrap model-averaged tail area intervals
title_fullStr	Studentized bootstrap model-averaged tail area intervals
title_full_unstemmed	Studentized bootstrap model-averaged tail area intervals
title_short	Studentized bootstrap model-averaged tail area intervals
title_sort	studentized bootstrap model-averaged tail area intervals
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6422275/ https://www.ncbi.nlm.nih.gov/pubmed/30883568 http://dx.doi.org/10.1371/journal.pone.0213715
work_keys_str_mv	AT zengjiaxu studentizedbootstrapmodelaveragedtailareaintervals AT fletcherdavid studentizedbootstrapmodelaveragedtailareaintervals AT dillinghampeterw studentizedbootstrapmodelaveragedtailareaintervals AT cornwallchristophere studentizedbootstrapmodelaveragedtailareaintervals
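
Illustrative note (not part of the catalog record): the description above mentions a model-averaged estimate formed as a weighted mean of per-model estimates, and a MATA-Wald interval that assumes each model's studentized estimate is N(0, 1). The following is a minimal Python sketch of those two ideas under stated assumptions, not the authors' implementation. The weights are taken to be Akaike weights, and the interval limits are found by solving the weighted tail-area equations as they are commonly stated for the MATA-Wald construction. All function names and the numbers in the example are hypothetical.

```python
import math


def akaike_weights(aic):
    """Assumed weighting scheme: w_i proportional to exp(-0.5 * (AIC_i - min AIC))."""
    best = min(aic)
    raw = [math.exp(-0.5 * (a - best)) for a in aic]
    total = sum(raw)
    return [r / total for r in raw]


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def model_averaged_estimate(est, w):
    """Weighted mean of the per-model estimates."""
    return sum(wi * ei for wi, ei in zip(w, est))


def weighted_tail_area(theta, est, se, w):
    """sum_i w_i * (1 - Phi((est_i - theta) / se_i)); increasing in theta."""
    return sum(
        wi * (1.0 - norm_cdf((ei - theta) / si))
        for wi, ei, si in zip(w, est, se)
    )


def solve_limit(target, est, se, w, iterations=200):
    """Bisection for the theta at which the weighted tail area equals target."""
    lo = min(e - 10.0 * s for e, s in zip(est, se))
    hi = max(e + 10.0 * s for e, s in zip(est, se))
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if weighted_tail_area(mid, est, se, w) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mata_wald_interval(est, se, w, alpha=0.05):
    """Lower limit: tail area = alpha/2. Upper limit: tail area = 1 - alpha/2."""
    lower = solve_limit(alpha / 2.0, est, se, w)
    upper = solve_limit(1.0 - alpha / 2.0, est, se, w)
    return lower, upper


# Hypothetical inputs: two candidate models for the same parameter.
estimates = [1.0, 1.4]
std_errors = [0.2, 0.3]
weights = akaike_weights([100.0, 102.0])

ma_estimate = model_averaged_estimate(estimates, weights)
lower, upper = mata_wald_interval(estimates, std_errors, weights)
print(ma_estimate, lower, upper)
```

With a single model of weight 1, the sketch reduces to the ordinary Wald interval (estimate plus or minus about 1.96 standard errors at alpha = 0.05). Per the description, the MATA-SBoot interval would replace the normal reference distribution used in `norm_cdf` with a distribution of each model's studentized estimate obtained by parametric bootstrap; that step is not shown here.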
