Studentized bootstrap model-averaged tail area intervals

Bibliographic Details
Main authors: Zeng, Jiaxu; Fletcher, David; Dillingham, Peter W.; Cornwall, Christopher E.
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6422275/
https://www.ncbi.nlm.nih.gov/pubmed/30883568
http://dx.doi.org/10.1371/journal.pone.0213715
author Zeng, Jiaxu
Fletcher, David
Dillingham, Peter W.
Cornwall, Christopher E.
collection PubMed
description In many scientific studies, the underlying data-generating process is unknown and multiple statistical models are considered to describe it. For example, in a factorial experiment we might consider models involving just main effects, as well as those that include interactions. Model averaging is a commonly used statistical technique to allow for model uncertainty in parameter estimation. In the frequentist setting, the model-averaged estimate of a parameter is a weighted mean of the estimates from the individual models, with the weights typically based on an information criterion, cross-validation, or bootstrapping. One approach to building a model-averaged confidence interval is the model-averaged Wald (MA-Wald) interval, based on the model-averaged estimate and its standard error. This has been the default method in many application areas, particularly in the life sciences. The MA-Wald interval, however, assumes that the studentized model-averaged estimate has a normal distribution, which can be far from true in practice because the model weights are random and data-driven. Recently, the model-averaged tail area Wald (MATA-Wald) interval has been proposed as an alternative; it assumes only that the studentized estimate from each model has an N(0, 1) or t-distribution when that model is true. In simulation studies, the MATA-Wald interval has been shown to have better coverage than the MA-Wald interval. However, when the response variable is skewed, even these relaxed assumptions may not hold, and these intervals might therefore have poor coverage. We propose a new interval (MATA-SBoot) that uses a parametric bootstrap to estimate the distribution of the studentized estimate for each model, when that model is true. This method requires only that the studentized estimate from each model be approximately pivotal, an assumption that will often hold in practice, even for skewed data.
We illustrate use of this new interval in the analysis of a three-factor marine global change experiment in which the response variable is assumed to have a lognormal distribution. We also perform a simulation study, based on the example, to compare the lower and upper error rates of this interval with those of existing methods. The results suggest that the MATA-SBoot interval can provide better error rates than existing intervals for skewed data, particularly for the upper error rate when the sample size is small.
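The three intervals named in the description can be sketched in Python. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation. All function names are hypothetical. Akaike weights are only one possible information-criterion choice. The MA-Wald standard error uses the Buckland et al. (1997) form, which the description does not specify. MATA-Wald here uses N(0, 1) rather than a t-distribution. The tail-area equations follow the usual MATA construction, and the paper's exact formulation may differ. For the bootstrap variant, `simulate` and `fit` are user-supplied callables: `simulate` draws a data set from a fitted model, and `fit` refits that model.

```python
"""Sketch of model-averaged confidence intervals: MA-Wald, MATA-Wald, and a
MATA variant that uses studentized parametric-bootstrap CDFs.
All function names are hypothetical; this is not the authors' code."""
import bisect
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def aic_weights(aics):
    """Akaike weights (one information-criterion choice): w_k proportional to exp(-AIC_k / 2)."""
    best = min(aics)
    raw = [math.exp(-(a - best) / 2.0) for a in aics]
    total = sum(raw)
    return [r / total for r in raw]


def ma_wald(estimates, ses, weights, alpha=0.05):
    """MA-Wald: averaged estimate +/- z * averaged SE.
    Assumed SE form (Buckland et al. 1997): sum_k w_k * sqrt(se_k^2 + (est_k - avg)^2)."""
    avg = sum(w * t for w, t in zip(weights, estimates))
    se_avg = sum(w * math.sqrt(s * s + (t - avg) ** 2)
                 for w, t, s in zip(weights, estimates, ses))
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return avg - z * se_avg, avg + z * se_avg


def _root_increasing(f, lo, hi, iters=200):
    """Bisection for a nondecreasing f; widens [lo, hi] until f(lo) < 0 <= f(hi)."""
    width = hi - lo
    while f(lo) >= 0:
        lo -= width
        width *= 2
    while f(hi) < 0:
        hi += width
        width *= 2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mata_interval(estimates, ses, weights, cdfs, alpha=0.05):
    """Tail-area interval. cdfs[k] is the CDF F_k of T_k = (est_k - theta) / se_k
    when model k is true. The limits solve
      sum_k w_k * (1 - F_k((est_k - lower) / se_k)) = alpha / 2
      sum_k w_k * F_k((est_k - upper) / se_k)       = alpha / 2
    """
    terms = list(zip(weights, estimates, ses, cdfs))

    def upper_tail(theta):  # nondecreasing in theta
        return sum(w * (1.0 - F((t - theta) / s)) for w, t, s, F in terms)

    def lower_tail(theta):  # nonincreasing in theta
        return sum(w * F((t - theta) / s) for w, t, s, F in terms)

    pad = 10.0 * max(ses)
    lo, hi = min(estimates) - pad, max(estimates) + pad
    lower = _root_increasing(lambda th: upper_tail(th) - alpha / 2, lo, hi)
    upper = _root_increasing(lambda th: alpha / 2 - lower_tail(th), lo, hi)
    return lower, upper


def mata_wald(estimates, ses, weights, alpha=0.05):
    """MATA-Wald with F_k = N(0, 1) for every model (a t CDF would need e.g. SciPy)."""
    return mata_interval(estimates, ses, weights,
                         [STD_NORMAL.cdf] * len(estimates), alpha)


def ecdf(sample):
    """Empirical CDF: x -> (number of sample values <= x) / n."""
    xs = sorted(sample)
    return lambda x: bisect.bisect_right(xs, x) / len(xs)


def studentized_bootstrap_cdf(est, simulate, fit, n_boot=2000, rng=None):
    """Parametric bootstrap CDF of T* = (est* - est) / se* under one fitted model.
    simulate(rng) draws a data set from the fitted model; fit(data) refits the
    same model and returns (estimate, standard_error). Both are user-supplied."""
    rng = rng if rng is not None else random.Random()
    stats = []
    for _ in range(n_boot):
        est_b, se_b = fit(simulate(rng))
        stats.append((est_b - est) / se_b)
    return ecdf(stats)
```

To get a MATA-SBoot-style interval, build one bootstrap CDF per candidate model and pass the list to `mata_interval` in place of the normal CDFs. With a single model, `mata_wald` reduces to the ordinary Wald interval, and the bootstrap version reduces to the usual bootstrap-t interval. Bisection is used because each tail sum is monotone in the parameter, which also keeps the step-function bootstrap CDFs easy to invert.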
format Online
Article
Text
id pubmed-6422275
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6422275 2019-04-02 Studentized bootstrap model-averaged tail area intervals Zeng, Jiaxu Fletcher, David Dillingham, Peter W. Cornwall, Christopher E. PLoS One Research Article Public Library of Science 2019-03-18 /pmc/articles/PMC6422275/ /pubmed/30883568 http://dx.doi.org/10.1371/journal.pone.0213715 Text en © 2019 Zeng et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Studentized bootstrap model-averaged tail area intervals
topic Research Article