
RandomForestsGLS: An R package for Random Forests for dependent data


Bibliographic Details
Main Authors: Saha, Arkajyoti, Basu, Sumanta, Datta, Abhirup
Format: Online Article Text
Language: English
Published: 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10112657/
https://www.ncbi.nlm.nih.gov/pubmed/37077317
http://dx.doi.org/10.21105/joss.03780
Description
Summary: With modern advances in geographical information systems, remote sensing technologies, and low-cost sensors, we increasingly encounter datasets in which spatial or serial dependence must be accounted for. Dependent observations (y(1), y(2), ..., y(n)) with covariates (x(1), ..., x(n)) can be modeled non-parametrically as y(i) = m(x(i)) + ϵ(i), where m(x(i)) is the mean component and ϵ(i) accounts for the dependence in the data. We assume that the dependence is captured through the covariance function of the correlated stochastic process ϵ(i) (second-order dependence). The correlation is typically a function of the "spatial distance" or "time lag" between two observations. Unlike linear regression, non-linear Machine Learning (ML) methods for estimating the regression function m can capture complex interactions among the variables. However, they often fail to account for the dependence structure, resulting in sub-optimal estimation. On the other hand, specialized software for spatial/temporal data properly models the data correlation but lacks flexibility in modeling the mean function m because it focuses only on linear models. RandomForestsGLS bridges this gap through a novel rendition of Random Forests (RF), namely RF-GLS, which explicitly models the spatial/serial data correlation in the RF fitting procedure to substantially improve the estimation of the mean function. Additionally, RandomForestsGLS leverages kriging to perform predictions at new locations for geo-spatial data.
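The model in the summary, y(i) = m(x(i)) + error(i) with correlated errors, can be made concrete with a small simulation. The sketch below is illustrative only and is not the RandomForestsGLS API (the package itself is written for R). It assumes a serial (time series) setting with AR(1) errors, so that the correlation between two errors depends only on their time lag, and it compares an ordinary least squares loss with a generalized least squares (GLS) style loss that weights residuals by the inverse correlation matrix. All function names (`ar1_correlation`, `simulate`, `ols_loss`, `gls_loss`) and the choices of rho and of sin as the mean function are hypothetical.

```python
import math
import random


def ar1_correlation(lag, rho):
    """Correlation between two errors separated by `lag` time steps (assumed AR(1))."""
    return rho ** abs(lag)


def simulate(n, rho, mean_fn, seed=0):
    """Draw x(i) uniformly on [0, 1] and y(i) = m(x(i)) + eps(i) with AR(1) errors."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    eps = [rng.gauss(0.0, 1.0)]  # stationary start with unit variance
    for _ in range(1, n):
        innovation = math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        eps.append(rho * eps[-1] + innovation)
    ys = [mean_fn(x) + e for x, e in zip(xs, eps)]
    return xs, ys


def ols_loss(residuals):
    """Sum of squared residuals, which ignores any dependence."""
    return sum(r * r for r in residuals)


def gls_loss(residuals, rho):
    """Quadratic form r^T Sigma^{-1} r for Sigma[i][j] = rho^|i-j|.

    Computed by whitening: the first residual is kept as is, and each later
    residual is decorrelated from its predecessor and rescaled.
    """
    total = residuals[0] ** 2
    scale = 1.0 - rho ** 2
    for prev, cur in zip(residuals, residuals[1:]):
        total += (cur - rho * prev) ** 2 / scale
    return total


if __name__ == "__main__":
    rho = 0.8  # assumed serial correlation
    xs, ys = simulate(200, rho, math.sin)
    residuals = [y - math.sin(x) for x, y in zip(xs, ys)]
    print("OLS loss:", ols_loss(residuals))
    print("GLS loss:", gls_loss(residuals, rho))
```

With rho set to 0 the two losses coincide, which reflects the idea that the dependence-aware criterion reduces to the usual one when observations are uncorrelated.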