Low-cost scalable discretization, prediction, and feature selection for complex systems

Bibliographic Details
Main authors: Gerber, S., Pospisil, L., Navandar, M., Horenko, I.
Format: Online Article Text
Language: English
Published: American Association for the Advancement of Science 2020
Subjects: Research Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6989146/
https://www.ncbi.nlm.nih.gov/pubmed/32064328
http://dx.doi.org/10.1126/sciadv.aaw0961
Collection: PubMed
Abstract: Finding reliable discrete approximations of complex systems is a key prerequisite when applying many of the most popular modeling tools. Common discretization approaches (e.g., the very popular K-means clustering) are crucially limited in terms of quality, parallelizability, and cost. We introduce a low-cost improved quality scalable probabilistic approximation (SPA) algorithm, allowing for simultaneous data-driven optimal discretization, feature selection, and prediction. We prove its optimality, parallel efficiency, and a linear scalability of iteration cost. Cross-validated applications of SPA to a range of large realistic data classification and prediction problems reveal marked cost and performance improvements. For example, SPA allows the data-driven next-day predictions of resimulated surface temperatures for Europe with the mean prediction error of 0.75°C on a common PC (being around 40% better in terms of errors and five to six orders of magnitude cheaper than with common computational instruments used by the weather services).
Institution: National Center for Biotechnology Information
Journal: Sci Adv (Research Articles)
Publication date: 2020-01-29
Rights: Copyright © 2020 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC), http://creativecommons.org/licenses/by-nc/4.0/. This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license (http://creativecommons.org/licenses/by-nc/4.0/), which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.