Should ecologists prefer model‐ over distance‐based multivariate methods?

Ecological data sets often record the abundance of species, together with a set of explanatory variables. Multivariate statistical methods are optimal to analyze such data and are thus frequently used in ecology for exploration, visualization, and inference. Most approaches are based on pairwise dis...

Bibliographic Details
Main Authors: Jupke, Jonathan F., Schäfer, Ralf B.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7069295/
https://www.ncbi.nlm.nih.gov/pubmed/32184990
http://dx.doi.org/10.1002/ece3.6059
_version_ 1783505752702320640
author Jupke, Jonathan F.
Schäfer, Ralf B.
author_facet Jupke, Jonathan F.
Schäfer, Ralf B.
author_sort Jupke, Jonathan F.
collection PubMed
description Ecological data sets often record the abundance of species, together with a set of explanatory variables. Multivariate statistical methods are optimal to analyze such data and are thus frequently used in ecology for exploration, visualization, and inference. Most approaches are based on pairwise distance matrices instead of the sites‐by‐species matrix, which stands in stark contrast to univariate statistics, where data models, assuming specific distributions, are the norm. However, through advances in statistical theory and computational power, models for multivariate data have gained traction. Systematic simulation‐based performance evaluations of these methods are important as guides for practitioners but still lacking. Here, we compare two model‐based methods, multivariate generalized linear models (MvGLMs) and constrained quadratic ordination (CQO), with two distance‐based methods, distance‐based redundancy analysis (dbRDA) and canonical correspondence analysis (CCA). We studied the performance of the methods to discriminate between causal variables and noise variables for 190 simulated data sets covering different sample sizes and data distributions. MvGLM and dbRDA differentiated accurately between causal and noise variables. The former had the lowest false‐positive rate (0.008), while the latter had the lowest false‐negative rate (0.027). CQO and CCA had the highest false‐negative rate (0.291) and false‐positive rate (0.256), respectively, where these error rates were typically high for data sets with linear responses. Our study shows that both model‐ and distance‐based methods have their place in the ecologist's statistical toolbox. MvGLM and dbRDA are reliable for analyzing species–environment relations, whereas both CQO and CCA exhibited considerable flaws, especially with linear environmental gradients.
format Online
Article
Text
id pubmed-7069295
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-7069295 2020-03-17 Should ecologists prefer model‐ over distance‐based multivariate methods? Jupke, Jonathan F. Schäfer, Ralf B. Ecol Evol Original Research Ecological data sets often record the abundance of species, together with a set of explanatory variables. Multivariate statistical methods are optimal to analyze such data and are thus frequently used in ecology for exploration, visualization, and inference. Most approaches are based on pairwise distance matrices instead of the sites‐by‐species matrix, which stands in stark contrast to univariate statistics, where data models, assuming specific distributions, are the norm. However, through advances in statistical theory and computational power, models for multivariate data have gained traction. Systematic simulation‐based performance evaluations of these methods are important as guides for practitioners but still lacking. Here, we compare two model‐based methods, multivariate generalized linear models (MvGLMs) and constrained quadratic ordination (CQO), with two distance‐based methods, distance‐based redundancy analysis (dbRDA) and canonical correspondence analysis (CCA). We studied the performance of the methods to discriminate between causal variables and noise variables for 190 simulated data sets covering different sample sizes and data distributions. MvGLM and dbRDA differentiated accurately between causal and noise variables. The former had the lowest false‐positive rate (0.008), while the latter had the lowest false‐negative rate (0.027). CQO and CCA had the highest false‐negative rate (0.291) and false‐positive rate (0.256), respectively, where these error rates were typically high for data sets with linear responses. Our study shows that both model‐ and distance‐based methods have their place in the ecologist's statistical toolbox. MvGLM and dbRDA are reliable for analyzing species–environment relations, whereas both CQO and CCA exhibited considerable flaws, especially with linear environmental gradients. John Wiley and Sons Inc. 2020-02-14 /pmc/articles/PMC7069295/ /pubmed/32184990 http://dx.doi.org/10.1002/ece3.6059 Text en © 2020 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Research
Jupke, Jonathan F.
Schäfer, Ralf B.
Should ecologists prefer model‐ over distance‐based multivariate methods?
title Should ecologists prefer model‐ over distance‐based multivariate methods?
title_full Should ecologists prefer model‐ over distance‐based multivariate methods?
title_fullStr Should ecologists prefer model‐ over distance‐based multivariate methods?
title_full_unstemmed Should ecologists prefer model‐ over distance‐based multivariate methods?
title_short Should ecologists prefer model‐ over distance‐based multivariate methods?
title_sort should ecologists prefer model‐ over distance‐based multivariate methods?
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7069295/
https://www.ncbi.nlm.nih.gov/pubmed/32184990
http://dx.doi.org/10.1002/ece3.6059
work_keys_str_mv AT jupkejonathanf shouldecologistsprefermodeloverdistancebasedmultivariatemethods
AT schaferralfb shouldecologistsprefermodeloverdistancebasedmultivariatemethods