Why sampling ratio matters: Logistic regression and studies of habitat use

Logistic regression (LR) models are among the most frequently used statistical tools in ecology. With LR one can infer if a species’ habitat use is related to environmental factors and estimate the probability of species occurrence based on the values of these factors. However, studies often use ina...

Bibliographic Details
Main authors: Nad’o, Ladislav; Kaňuch, Peter
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6056037/
https://www.ncbi.nlm.nih.gov/pubmed/30036369
http://dx.doi.org/10.1371/journal.pone.0200742
_version_ 1783341282403287040
author Nad’o, Ladislav
Kaňuch, Peter
collection PubMed
description Logistic regression (LR) models are among the most frequently used statistical tools in ecology. With LR one can infer if a species’ habitat use is related to environmental factors and estimate the probability of species occurrence based on the values of these factors. However, studies often use inadequate sampling with regard to the arbitrarily chosen ratio between occupied and unoccupied (or available) locations, and this has a profound effect on the inference and predictive power of LR models. To demonstrate the effect of various sampling strategies/efforts on the quality of LR models, we used a unique census dataset containing all the used roosting cavities of the tree-dwelling bat Nyctalus leisleri and all cavities where the species was absent. We compared models constructed from randomly selected data subsets with varying ratios of occupied and unoccupied cavities (1:1, 1:5, 1:10) with a full dataset model (ratio 1:31). These comparisons revealed that the power of LR models was low when the sampling did not reflect the population ratio of occupied and unoccupied cavities. The use of weights improved the subsampled models. Thus, this study warns against inadequate data sampling and highly encourages a randomized sampling procedure to estimate the true ratio of occupied:unoccupied locations, which can then be used to optimize a manageable sampling effort and apply weights to improve the LR model. Such an approach may provide robust and reliable models suitable for both inference and prediction.
format Online
Article
Text
id pubmed-6056037
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6056037 2018-08-06 Why sampling ratio matters: Logistic regression and studies of habitat use Nad’o, Ladislav Kaňuch, Peter PLoS One Research Article Logistic regression (LR) models are among the most frequently used statistical tools in ecology. With LR one can infer if a species’ habitat use is related to environmental factors and estimate the probability of species occurrence based on the values of these factors. However, studies often use inadequate sampling with regard to the arbitrarily chosen ratio between occupied and unoccupied (or available) locations, and this has a profound effect on the inference and predictive power of LR models. To demonstrate the effect of various sampling strategies/efforts on the quality of LR models, we used a unique census dataset containing all the used roosting cavities of the tree-dwelling bat Nyctalus leisleri and all cavities where the species was absent. We compared models constructed from randomly selected data subsets with varying ratios of occupied and unoccupied cavities (1:1, 1:5, 1:10) with a full dataset model (ratio 1:31). These comparisons revealed that the power of LR models was low when the sampling did not reflect the population ratio of occupied and unoccupied cavities. The use of weights improved the subsampled models. Thus, this study warns against inadequate data sampling and highly encourages a randomized sampling procedure to estimate the true ratio of occupied:unoccupied locations, which can then be used to optimize a manageable sampling effort and apply weights to improve the LR model. Such an approach may provide robust and reliable models suitable for both inference and prediction.
Public Library of Science 2018-07-23 /pmc/articles/PMC6056037/ /pubmed/30036369 http://dx.doi.org/10.1371/journal.pone.0200742 Text en © 2018 Nad’o, Kaňuch http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Why sampling ratio matters: Logistic regression and studies of habitat use
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6056037/
https://www.ncbi.nlm.nih.gov/pubmed/30036369
http://dx.doi.org/10.1371/journal.pone.0200742