
RandomForestsGLS: An R package for Random Forests for dependent data

With the modern advances in geographical information systems, remote sensing technologies, and low-cost sensors, we are increasingly encountering datasets where we need to account for spatial or serial dependence. Dependent observations (y(1), y(2), …, y(n)) with covariates (x(1), ..., x(n)) can be...


Bibliographic Details
Main authors: Saha, Arkajyoti; Basu, Sumanta; Datta, Abhirup
Format: Online Article Text
Language: English
Published: 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10112657/
https://www.ncbi.nlm.nih.gov/pubmed/37077317
http://dx.doi.org/10.21105/joss.03780
author Saha, Arkajyoti
Basu, Sumanta
Datta, Abhirup
collection PubMed
description With the modern advances in geographical information systems, remote sensing technologies, and low-cost sensors, we are increasingly encountering datasets where we need to account for spatial or serial dependence. Dependent observations (y(1), y(2), …, y(n)) with covariates (x(1), …, x(n)) can be modeled non-parametrically as y(i) = m(x(i)) + ϵ(i), where m(x(i)) is the mean component and ϵ(i) accounts for the dependence in the data. We assume that the dependence is captured through a covariance function of the correlated stochastic process ϵ(i) (second-order dependence). The correlation is typically a function of the “spatial distance” or “time-lag” between two observations. Unlike linear regression, non-linear Machine Learning (ML) methods for estimating the regression function m can capture complex interactions among the variables. However, they often fail to account for the dependence structure, resulting in sub-optimal estimation. On the other hand, specialized software for spatial/temporal data properly models data correlation but lacks flexibility in modeling the mean function m because it focuses only on linear models. RandomForestsGLS bridges the gap through a novel rendition of Random Forests (RF), namely RF-GLS, which explicitly models the spatial/serial data correlation in the RF fitting procedure to substantially improve the estimation of the mean function. Additionally, RandomForestsGLS leverages kriging to perform predictions at new locations for geo-spatial data.
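The model in the description, y(i) = m(x(i)) + ϵ(i) with correlated errors, can be illustrated with a small simulation. The sketch below is not the RandomForestsGLS package or its API; it is a Python illustration under assumptions that do not come from the record: an exponential covariance with a nugget term, a sine-shaped mean function, one-dimensional locations on a grid, and the hypothetical helper names `exponential_covariance`, `simulate`, `whiten`, and `gls_loss`. It shows the generic GLS idea of weighting residuals by the inverse covariance, which is equivalent to decorrelating them with a Cholesky factor before taking a sum of squares.

```python
import numpy as np


def exponential_covariance(coords, sigma_sq=1.0, phi=5.0, tau_sq=0.1):
    """Assumed covariance of eps: sigma_sq * exp(-phi * |s_i - s_j|) plus a nugget tau_sq."""
    dist = np.abs(coords[:, None] - coords[None, :])
    return sigma_sq * np.exp(-phi * dist) + tau_sq * np.eye(len(coords))


def simulate(n=100, seed=0):
    """Draw y_i = m(x_i) + eps_i, where eps is a correlated Gaussian vector."""
    rng = np.random.default_rng(seed)
    coords = np.linspace(0.0, 1.0, n)      # hypothetical 1-D locations
    x = rng.uniform(0.0, 1.0, n)           # covariate
    mean = 10.0 * np.sin(np.pi * x)        # hypothetical mean function m
    cov = exponential_covariance(coords)
    eps = np.linalg.cholesky(cov) @ rng.standard_normal(n)
    return x, mean + eps, mean, cov


def whiten(residuals, cov):
    """Decorrelate residuals by solving L z = r, where cov = L L^T."""
    return np.linalg.solve(np.linalg.cholesky(cov), residuals)


def gls_loss(residuals, cov):
    """Quadratic form r^T cov^{-1} r, the GLS analogue of a sum of squares."""
    return float(residuals @ np.linalg.solve(cov, residuals))


x, y, mean, cov = simulate()
residuals = y - mean
z = whiten(residuals, cov)
# The two numbers agree: the GLS loss is the ordinary squared norm of the whitened residuals.
print(gls_loss(residuals, cov), float(z @ z))
```

An ordinary sum of squares treats the entries of the residual vector as independent; the quadratic form above instead accounts for their correlation, which is the general motivation for a GLS-style criterion when errors are dependent.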
format Online
Article
Text
id pubmed-10112657
institution National Center for Biotechnology Information
language English
publishDate 2022
record_format MEDLINE/PubMed
spelling pubmed-101126572023-04-18 RandomForestsGLS: An R package for Random Forests for dependent data Saha, Arkajyoti Basu, Sumanta Datta, Abhirup J Open Source Softw Article With the modern advances in geographical information systems, remote sensing technologies, and low-cost sensors, we are increasingly encountering datasets where we need to account for spatial or serial dependence. Dependent observations (y(1), y(2), …, y(n)) with covariates (x(1), …, x(n)) can be modeled non-parametrically as y(i) = m(x(i)) + ϵ(i), where m(x(i)) is the mean component and ϵ(i) accounts for the dependence in the data. We assume that the dependence is captured through a covariance function of the correlated stochastic process ϵ(i) (second-order dependence). The correlation is typically a function of the “spatial distance” or “time-lag” between two observations. Unlike linear regression, non-linear Machine Learning (ML) methods for estimating the regression function m can capture complex interactions among the variables. However, they often fail to account for the dependence structure, resulting in sub-optimal estimation. On the other hand, specialized software for spatial/temporal data properly models data correlation but lacks flexibility in modeling the mean function m because it focuses only on linear models. RandomForestsGLS bridges the gap through a novel rendition of Random Forests (RF), namely RF-GLS, which explicitly models the spatial/serial data correlation in the RF fitting procedure to substantially improve the estimation of the mean function. Additionally, RandomForestsGLS leverages kriging to perform predictions at new locations for geo-spatial data. 2022 2022-02-25 /pmc/articles/PMC10112657/ /pubmed/37077317 http://dx.doi.org/10.21105/joss.03780 Text en https://creativecommons.org/licenses/by/4.0/ License: Authors of papers retain copyright and release the work under a Creative Commons Attribution 4.0 International License (CC BY 4.0, https://creativecommons.org/licenses/by/4.0/).
title RandomForestsGLS: An R package for Random Forests for dependent data
topic Article