A simple function for full‐subsets multiple regression in ecology with R

Full‐subsets information theoretic approaches are becoming an increasingly popular tool for exploring predictive power and variable importance where a wide range of candidate predictors are being considered. Here, we describe a simple function in the statistical programming language R that can be us...


Bibliographic Details
Main authors: Fisher, Rebecca, Wilson, Shaun K., Sin, Tsai M., Lee, Ai C., Langlois, Tim J.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6024142/
https://www.ncbi.nlm.nih.gov/pubmed/29988441
http://dx.doi.org/10.1002/ece3.4134
author Fisher, Rebecca
Wilson, Shaun K.
Sin, Tsai M.
Lee, Ai C.
Langlois, Tim J.
collection PubMed
description Full‐subsets information‐theoretic approaches are becoming an increasingly popular tool for exploring predictive power and variable importance where a wide range of candidate predictors is being considered. Here, we describe a simple function in the statistical programming language R that can be used to construct, fit, and compare a complete model set of possible ecological or environmental predictors, given a response variable of interest and a starting generalized additive (mixed) model fit. Main advantages include not requiring a complete model to be fit as the starting point for candidate model set construction (meaning that a greater number of predictors can potentially be explored than might be available through functions such as dredge); model sets that include interactions between factors and continuous nonlinear predictors; and automatic removal of models with correlated predictors (based on a user‐defined criterion for exclusion). The function takes continuous predictors, which are fitted using smoothers via either gam, gamm (mgcv) or gamm4, as well as factor variables, which are included on their own, as two‐level interaction terms within the gam smooth (via the “by” argument), or as interactions with other factors. The function allows any model to be constructed and used as a null model, and takes a range of arguments that allow control over the model set being constructed, including specifying cyclic and linear continuous predictors, specification of the smoothing algorithm used, and the maximum complexity allowed for smooth terms. The use of the function is demonstrated via case studies that highlight how appropriate model sets can be easily constructed and the broader utility of the approach for exploratory ecology.
format Online
Article
Text
id pubmed-6024142
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-6024142 2018-07-09 Ecol Evol Original Research John Wiley and Sons Inc. 2018-05-20 /pmc/articles/PMC6024142/ /pubmed/29988441 http://dx.doi.org/10.1002/ece3.4134 Text en © 2018 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title A simple function for full‐subsets multiple regression in ecology with R
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6024142/
https://www.ncbi.nlm.nih.gov/pubmed/29988441
http://dx.doi.org/10.1002/ece3.4134
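
The abstract in this record describes building a complete candidate model set and automatically dropping models whose predictors are correlated beyond a user-defined criterion. The following is a minimal, language-agnostic sketch of that one idea, written in Python rather than R so that it stays self-contained. It is not the authors' function: the name `full_subsets`, its arguments, the default cutoff of 0.5, and the example predictors and correlation values are all hypothetical, and no smoothers, factor interactions, or model fitting are attempted here.

```python
from itertools import combinations


def full_subsets(predictors, pairwise_corr, max_size=3, cutoff=0.5):
    """Enumerate candidate predictor sets up to max_size predictors.

    Any set containing a pair of predictors whose absolute correlation
    exceeds `cutoff` is skipped. All names and defaults are hypothetical.
    """
    def correlated(a, b):
        # Look the pair up in either order; unknown pairs count as uncorrelated.
        r = pairwise_corr.get((a, b), pairwise_corr.get((b, a), 0.0))
        return abs(r) > cutoff

    model_set = [()]  # the empty tuple stands in for the null model
    for size in range(1, max_size + 1):
        for combo in combinations(predictors, size):
            if any(correlated(a, b) for a, b in combinations(combo, 2)):
                continue  # exclude models with correlated predictors
            model_set.append(combo)
    return model_set


# Made-up example: depth and temperature are strongly correlated.
predictors = ["depth", "temperature", "coral_cover"]
pairwise_corr = {
    ("depth", "temperature"): -0.8,
    ("depth", "coral_cover"): 0.1,
    ("temperature", "coral_cover"): 0.05,
}
models = full_subsets(predictors, pairwise_corr, max_size=2, cutoff=0.5)
for m in models:
    print(m if m else "(null model)")
```

In this made-up example the pair depth and temperature is excluded, so the candidate set holds the null model, three single-predictor models, and two two-predictor models. In practice each retained set would then be fitted and the fits compared using an information criterion, which this sketch leaves out.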