RandomForestsGLS: An R package for Random Forests for dependent data
Main authors: | Saha, Arkajyoti; Basu, Sumanta; Datta, Abhirup |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10112657/ https://www.ncbi.nlm.nih.gov/pubmed/37077317 http://dx.doi.org/10.21105/joss.03780 |
author | Saha, Arkajyoti; Basu, Sumanta; Datta, Abhirup |
collection | PubMed |
description | With modern advances in geographical information systems, remote sensing technologies, and low-cost sensors, we increasingly encounter datasets in which spatial or serial dependence must be accounted for. Dependent observations (y(1), y(2), …, y(n)) with covariates (x(1), x(2), …, x(n)) can be modeled non-parametrically as y(i) = m(x(i)) + ϵ(i), where m(x(i)) is the mean component and ϵ(i) accounts for the dependence in the data. We assume that this dependence is captured by the covariance function of the correlated stochastic process ϵ(i) (second-order dependence). The correlation is typically a function of the “spatial distance” or “time lag” between two observations. Unlike linear regression, non-linear machine learning (ML) methods for estimating the regression function m can capture complex interactions among the variables. However, they often ignore the dependence structure, which results in sub-optimal estimation. Conversely, specialized software for spatial and temporal data models the data correlation properly but offers little flexibility for the mean function m, because it considers only linear models. RandomForestsGLS bridges this gap with a novel rendition of Random Forests (RF), namely RF-GLS, which explicitly models the spatial or serial correlation in the RF fitting procedure to substantially improve estimation of the mean function. Additionally, RandomForestsGLS uses kriging to predict at new locations for geospatial data. |
id | pubmed-10112657 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
record | pubmed-10112657 (2023-04-18) |
journal | J Open Source Softw |
published | 2022-02-25 |
license | Authors of papers retain copyright and release the work under a Creative Commons Attribution 4.0 International License (CC BY 4.0, https://creativecommons.org/licenses/by/4.0/). |
topic | Article |
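The description says RF-GLS explicitly models serial or spatial correlation inside the RF fitting procedure. One way to picture this is to replace the ordinary least-squares split loss of CART with a generalized least-squares (GLS) loss. The sketch below is a minimal, hypothetical illustration, not the package's code: it assumes AR(1) errors with a known coefficient `rho`, a single covariate, and a single two-leaf split of the whole sample. It whitens the response and the leaf-membership indicators with the Prais-Winsten transform and then fits the leaf values by least squares on the whitened data, which is equivalent to GLS under that assumed covariance. The names `ar1_whiten`, `gls_split_loss` and `best_split` are illustrative only.

```python
import math


def ar1_whiten(v, rho):
    """Prais-Winsten transform for AR(1) errors (assumed model, |rho| < 1).

    If e(t) = rho * e(t-1) + u(t) with white noise u(t), the transformed
    series has uncorrelated entries with a common variance.
    """
    out = [math.sqrt(1.0 - rho * rho) * v[0]]
    for t in range(1, len(v)):
        out.append(v[t] - rho * v[t - 1])
    return out


def gls_split_loss(y, goes_left, rho):
    """GLS loss and leaf values of a two-leaf constant fit under AR(1) errors."""
    if all(goes_left) or not any(goes_left):
        raise ValueError("both leaves must be non-empty")
    z_left = [1.0 if g else 0.0 for g in goes_left]
    z_right = [1.0 - z for z in z_left]
    # Whiten the response and both columns of the leaf-membership design matrix.
    ys = ar1_whiten(y, rho)
    a = ar1_whiten(z_left, rho)
    b = ar1_whiten(z_right, rho)
    # Solve the 2x2 normal equations (Z*'Z*) beta = Z*'y* by Cramer's rule.
    saa = sum(p * p for p in a)
    sbb = sum(q * q for q in b)
    sab = sum(p * q for p, q in zip(a, b))
    say = sum(p * r for p, r in zip(a, ys))
    sby = sum(q * r for q, r in zip(b, ys))
    det = saa * sbb - sab * sab
    beta_left = (sbb * say - sab * sby) / det
    beta_right = (saa * sby - sab * say) / det
    # Residual sum of squares on the whitened scale; this equals the GLS
    # quadratic form (y - Z beta)' Sigma^{-1} (y - Z beta) up to the noise variance.
    resid = [r - beta_left * p - beta_right * q for r, p, q in zip(ys, a, b)]
    return sum(e * e for e in resid), (beta_left, beta_right)


def best_split(x, y, rho):
    """Scan thresholds on one covariate and return (threshold, loss)."""
    best = None
    for c in sorted(set(x))[:-1]:
        loss, _ = gls_split_loss(y, [xi <= c for xi in x], rho)
        if best is None or loss < best[1]:
            best = (c, loss)
    return best


# Toy series listed in time order; x is the covariate, y the response.
x = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]
y = [1.0, 3.1, 1.2, 2.9, 0.9, 3.0]
print(best_split(x, y, rho=0.0))
print(best_split(x, y, rho=0.6))
```

With `rho=0.0` the whitening is the identity, so the leaf values reduce to the leaf means and the loss to the usual CART within-leaf sum of squares; a nonzero `rho` reweights the same comparison to account for serial correlation. The whitening follows the time order of the observations, while candidate thresholds are taken on the covariate.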
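The description also states that RandomForestsGLS uses kriging to predict at new locations. A common way to combine the two pieces, shown here only as a hedged sketch, is to add the forest's mean estimate at the new site to a simple-kriging prediction of the spatial error: ŷ(s0) = m̂(x0) + k0ᵀ K⁻¹ (y − m̂). Assumptions not taken from the record: an exponential covariance with known partial sill `sigma2`, decay `phi` and nugget `tau2`; a dense linear solve rather than any approximation the package may use; and the hypothetical helpers `exp_cov`, `solve` and `krige_residual`. The mean estimates `m_obs` and `m_new` are made-up placeholders standing in for fitted forest output.

```python
import math


def exp_cov(d, sigma2, phi):
    """Exponential covariance sigma2 * exp(-phi * d) (assumed model)."""
    return sigma2 * math.exp(-phi * d)


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w


def krige_residual(s_obs, resid, s_new, sigma2, phi, tau2):
    """Simple-kriging predictor k0' K^{-1} r of a zero-mean spatial error."""
    n = len(s_obs)
    # K: covariance among observed sites plus a nugget tau2 on the diagonal.
    K = [[exp_cov(math.dist(s_obs[i], s_obs[j]), sigma2, phi)
          + (tau2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # k0: covariance between the new site and each observed site.
    k0 = [exp_cov(math.dist(s_new, s), sigma2, phi) for s in s_obs]
    w = solve(K, resid)  # w = K^{-1} r
    return sum(c * v for c, v in zip(k0, w))


# Hypothetical inputs: site coordinates, observed responses, and mean
# estimates that would come from a fitted forest (made up for illustration).
s_obs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
y_obs = [2.4, 2.3, 1.9]
m_obs = [2.0, 2.5, 1.5]
m_new = 2.1
resid = [yv - mv for yv, mv in zip(y_obs, m_obs)]
pred = m_new + krige_residual(s_obs, resid, (0.2, 0.2), sigma2=1.0, phi=2.0, tau2=0.1)
print(pred)
```

With `tau2=0.0` the kriging term reproduces the residual exactly at an observed site, and far from all observed sites it decays toward zero, so the prediction falls back to the mean estimate alone.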