Should ecologists prefer model‐ over distance‐based multivariate methods?
Ecological data sets often record the abundance of species, together with a set of explanatory variables. Multivariate statistical methods are optimal to analyze such data and are thus frequently used in ecology for exploration, visualization, and inference. Most approaches are based on pairwise dis...
Main authors: | Jupke, Jonathan F., Schäfer, Ralf B. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley and Sons Inc. 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7069295/ https://www.ncbi.nlm.nih.gov/pubmed/32184990 http://dx.doi.org/10.1002/ece3.6059 |
_version_ | 1783505752702320640 |
---|---|
author | Jupke, Jonathan F. Schäfer, Ralf B. |
author_facet | Jupke, Jonathan F. Schäfer, Ralf B. |
author_sort | Jupke, Jonathan F. |
collection | PubMed |
description | Ecological data sets often record the abundance of species, together with a set of explanatory variables. Multivariate statistical methods are optimal to analyze such data and are thus frequently used in ecology for exploration, visualization, and inference. Most approaches are based on pairwise distance matrices instead of the sites‐by‐species matrix, which stands in stark contrast to univariate statistics, where data models, assuming specific distributions, are the norm. However, through advances in statistical theory and computational power, models for multivariate data have gained traction. Systematic simulation‐based performance evaluations of these methods are important as guides for practitioners but still lacking. Here, we compare two model‐based methods, multivariate generalized linear models (MvGLMs) and constrained quadratic ordination (CQO), with two distance‐based methods, distance‐based redundancy analysis (dbRDA) and canonical correspondence analysis (CCA). We studied the performance of the methods to discriminate between causal variables and noise variables for 190 simulated data sets covering different sample sizes and data distributions. MvGLM and dbRDA differentiated accurately between causal and noise variables. The former had the lowest false‐positive rate (0.008), while the latter had the lowest false‐negative rate (0.027). CQO and CCA had the highest false‐negative rate (0.291) and false‐positive rate (0.256), respectively, where these error rates were typically high for data sets with linear responses. Our study shows that both model‐ and distance‐based methods have their place in the ecologist's statistical toolbox. MvGLM and dbRDA are reliable for analyzing species–environment relations, whereas both CQO and CCA exhibited considerable flaws, especially with linear environmental gradients. |
format | Online Article Text |
id | pubmed-7069295 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | John Wiley and Sons Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7069295 2020-03-17 Should ecologists prefer model‐ over distance‐based multivariate methods? Jupke, Jonathan F. Schäfer, Ralf B. Ecol Evol Original Research John Wiley and Sons Inc. 2020-02-14 /pmc/articles/PMC7069295/ /pubmed/32184990 http://dx.doi.org/10.1002/ece3.6059 Text en © 2020 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Research Jupke, Jonathan F. Schäfer, Ralf B. Should ecologists prefer model‐ over distance‐based multivariate methods? |
title | Should ecologists prefer model‐ over distance‐based multivariate methods? |
title_sort | should ecologists prefer model‐ over distance‐based multivariate methods? |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7069295/ https://www.ncbi.nlm.nih.gov/pubmed/32184990 http://dx.doi.org/10.1002/ece3.6059 |
work_keys_str_mv | AT jupkejonathanf shouldecologistsprefermodeloverdistancebasedmultivariatemethods AT schaferralfb shouldecologistsprefermodeloverdistancebasedmultivariatemethods |
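
The description above reports false-positive and false-negative rates for how well each method separates causal from noise variables. As a rough illustration of how such rates can be tallied, the following Python sketch assumes that each explanatory variable receives a p-value and is flagged as relevant when that p-value falls below a threshold `alpha`. The threshold of 0.05, the `VariableTest` record, the `error_rates` helper, and all numbers are hypothetical choices made for this example; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class VariableTest:
    name: str
    is_causal: bool  # ground truth, known from the simulation design
    p_value: float   # p-value reported by the method under evaluation


def error_rates(tests, alpha=0.05):
    """Return (false_positive_rate, false_negative_rate) for a list of tests.

    A noise variable flagged as significant counts as a false positive;
    a causal variable not flagged counts as a false negative.
    """
    noise = [t for t in tests if not t.is_causal]
    causal = [t for t in tests if t.is_causal]
    false_pos = sum(t.p_value < alpha for t in noise)
    false_neg = sum(t.p_value >= alpha for t in causal)
    fpr = false_pos / len(noise) if noise else float("nan")
    fnr = false_neg / len(causal) if causal else float("nan")
    return fpr, fnr


# Hypothetical illustration values, not results from the study.
tests = [
    VariableTest("env1", True, 0.001),
    VariableTest("env2", True, 0.200),
    VariableTest("noise1", False, 0.030),
    VariableTest("noise2", False, 0.700),
    VariableTest("noise3", False, 0.450),
    VariableTest("noise4", False, 0.910),
]

fpr, fnr = error_rates(tests)
print(f"FPR = {fpr:.3f}, FNR = {fnr:.3f}")  # FPR = 0.250, FNR = 0.500
```

In this toy input, one of four noise variables is wrongly flagged and one of two causal variables is missed, which gives the printed rates. A real evaluation would pool such counts over many simulated data sets.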