A critical issue in model-based inference for studying trait-based community assembly and a solution
Main authors: | ter Braak, Cajo J.F.; Peres-Neto, Pedro; Dray, Stéphane |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc., 2017 |
Subjects: | Ecology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5237366/ https://www.ncbi.nlm.nih.gov/pubmed/28097076 http://dx.doi.org/10.7717/peerj.2885 |
author | ter Braak, Cajo J.F.; Peres-Neto, Pedro; Dray, Stéphane |
collection | PubMed |
description | Statistical testing of trait-environment association from data is a challenge because there is no common unit of observation: the trait is observed on species, the environment on sites, and the mediating abundance on species-site combinations. A number of correlation-based methods, such as the community weighted trait means method (CWM), the fourth-corner correlation method and the multivariate method RLQ, have been proposed to estimate such trait-environment associations. In these methods, valid statistical testing proceeds by performing two separate resampling tests, one site-based and the other species-based, and by assessing significance with the larger of the two p-values (the p_max test). Recently, regression-based methods using generalized linear models (GLM) have been proposed as a promising alternative, with statistical inference via site-based resampling. We investigated the performance of this new approach along with approaches that mimic the p_max test using GLM instead of the fourth-corner correlation. In simulations using models with additional random variation in the species response to the environment, the site-based resampling tests using GLM are shown to have severely inflated type I error, of up to 90%, when the nominal level is set at 5%. In addition, predictive modelling of such data using site-based cross-validation very often identified trait-environment interactions that had no predictive value. The problem that we identify is not an "omitted variable bias" problem, as it occurs even when the additional random variation is independent of the observed trait and environment data. Instead, it is a problem of ignoring a random effect. In the same simulations, the GLM-based p_max test controlled the type I error in all models proposed so far in this context, but still gave slightly inflated error in more complex models that included both missing (but important) traits and missing (but important) environmental variables. For screening the importance of single trait-environment combinations, the fourth-corner test is shown to give almost the same results as the GLM-based tests in far less computing time. |
format | Online Article Text |
id | pubmed-5237366 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
published online | 2017-01-12 |
rights | ©2017 Ter Braak et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
title | A critical issue in model-based inference for studying trait-based community assembly and a solution |
topic | Ecology |
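The p_max procedure summarized in the abstract (a site-based and a species-based resampling test, with significance judged by the larger of the two p-values) can be illustrated with a small sketch. This is not the authors' code. It assumes a single trait and a single environmental variable, uses one common abundance-weighted form of the fourth-corner correlation (trait and environment standardized with species and site totals as weights), and uses unrestricted random permutations: environment values are shuffled across sites for the site-based test and trait values across species for the species-based test. The function names `fourth_corner` and `pmax_test` are hypothetical.

```python
import numpy as np


def _wstd(x, w):
    """Standardize x to weighted mean 0 and weighted variance 1 (w sums to 1)."""
    m = np.sum(w * x)
    sd = np.sqrt(np.sum(w * (x - m) ** 2))
    return (x - m) / sd


def fourth_corner(Y, e, t):
    """Abundance-weighted fourth-corner correlation (assumed form).

    Y: (n_sites, n_species) non-negative abundances
    e: (n_sites,) environmental variable
    t: (n_species,) trait
    """
    P = Y / Y.sum()            # relative abundances p_ij
    w_site = P.sum(axis=1)     # site weights (row totals)
    w_spec = P.sum(axis=0)     # species weights (column totals)
    # sum_ij p_ij * e*_i * t*_j, bounded in [-1, 1] by Cauchy-Schwarz
    return float(_wstd(e, w_site) @ P @ _wstd(t, w_spec))


def pmax_test(Y, e, t, n_perm=999, seed=None):
    """Two-sided site-based and species-based permutation tests plus p_max."""
    rng = np.random.default_rng(seed)
    r_obs = fourth_corner(Y, e, t)
    # Site-based test: shuffle environment values across sites.
    hits_site = sum(
        abs(fourth_corner(Y, rng.permutation(e), t)) >= abs(r_obs)
        for _ in range(n_perm)
    )
    # Species-based test: shuffle trait values across species.
    hits_spec = sum(
        abs(fourth_corner(Y, e, rng.permutation(t))) >= abs(r_obs)
        for _ in range(n_perm)
    )
    p_site = (hits_site + 1) / (n_perm + 1)
    p_spec = (hits_spec + 1) / (n_perm + 1)
    # p_max: the association must survive both permutation schemes.
    return r_obs, p_site, p_spec, max(p_site, p_spec)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    e = rng.normal(size=30)    # 30 hypothetical sites
    t = rng.normal(size=20)    # 20 hypothetical species
    # Simulated abundances with a built-in trait-by-environment interaction.
    Y = rng.poisson(np.exp(0.5 + 0.8 * np.outer(e, t)))
    print(pmax_test(Y, e, t, n_perm=199, seed=2))
```

Taking the maximum of the two p-values means an association is declared significant only when it is unlikely both under shuffling of sites and under shuffling of species; relying on the site-based test alone is the practice the abstract reports as giving inflated type I error.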