Low-cost scalable discretization, prediction, and feature selection for complex systems
Main authors: | Gerber, S.; Pospisil, L.; Navandar, M.; Horenko, I. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Association for the Advancement of Science, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6989146/ https://www.ncbi.nlm.nih.gov/pubmed/32064328 http://dx.doi.org/10.1126/sciadv.aaw0961 |
author | Gerber, S. Pospisil, L. Navandar, M. Horenko, I. |
collection | PubMed |
description | Finding reliable discrete approximations of complex systems is a key prerequisite when applying many of the most popular modeling tools. Common discretization approaches (e.g., the very popular K-means clustering) are crucially limited in terms of quality, parallelizability, and cost. We introduce a low-cost, improved-quality scalable probabilistic approximation (SPA) algorithm that allows simultaneous data-driven optimal discretization, feature selection, and prediction. We prove its optimality, its parallel efficiency, and the linear scalability of its iteration cost. Cross-validated applications of SPA to a range of large, realistic data classification and prediction problems reveal marked cost and performance improvements. For example, SPA enables data-driven next-day predictions of resimulated surface temperatures for Europe with a mean prediction error of 0.75°C on a common PC, around 40% lower in error and five to six orders of magnitude cheaper than the common computational instruments used by weather services. |
format | Online Article Text |
id | pubmed-6989146 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | American Association for the Advancement of Science |
record_format | MEDLINE/PubMed |
journal | Sci Adv, Research Articles; published 2020-01-29 |
rights | Copyright © 2020 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC), http://creativecommons.org/licenses/by-nc/4.0/. This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license (http://creativecommons.org/licenses/by-nc/4.0/), which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited. |
title | Low-cost scalable discretization, prediction, and feature selection for complex systems |
topic | Research Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6989146/ https://www.ncbi.nlm.nih.gov/pubmed/32064328 http://dx.doi.org/10.1126/sciadv.aaw0961 |
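The abstract names SPA but does not spell out its objective. As a rough illustration only, the NumPy sketch below assumes the basic form of such a probabilistic approximation: each data column x_t is approximated by a convex combination S γ_t of K learned points, i.e. minimize ‖X − SΓ‖_F² with every column of Γ nonnegative and summing to one. It alternates an exact least-squares S-step with projected-gradient Γ-steps. The helper names `spa_fit` and `project_columns_to_simplex`, the iteration counts, and the solver choices are hypothetical; this is not the authors' implementation, and any regularization, feature-selection, or prediction terms from the paper are omitted.

```python
import numpy as np


def project_columns_to_simplex(G):
    """Project each column of G onto the probability simplex {g >= 0, sum(g) = 1}
    using the standard sort-based Euclidean projection."""
    K, T = G.shape
    U = -np.sort(-G, axis=0)                  # columns sorted in descending order
    css = np.cumsum(U, axis=0) - 1.0          # cumulative sums minus the target sum
    idx = np.arange(1, K + 1).reshape(-1, 1)  # 1, 2, ..., K as a column
    active = U - css / idx > 0                # holds for a leading block of entries
    rho = active.sum(axis=0) - 1              # last active index per column
    theta = css[rho, np.arange(T)] / (rho + 1)
    return np.maximum(G - theta, 0.0)         # shift each column, clip at zero


def spa_fit(X, K, n_iter=50, inner=20, seed=0):
    """Hypothetical helper: alternating minimisation of ||X - S @ G||_F^2
    with every column of G constrained to the probability simplex."""
    rng = np.random.default_rng(seed)
    _, T = X.shape
    G = project_columns_to_simplex(rng.random((K, T)))
    losses = []
    for _ in range(n_iter):
        # S-step: exact unconstrained least squares, solving G.T @ S.T ~ X.T
        S = np.linalg.lstsq(G.T, X.T, rcond=None)[0].T
        # G-step: projected gradient; step 1/L with L = 2 * sigma_max(S)^2
        # (the Lipschitz constant of the gradient) guarantees descent
        L = max(2.0 * np.linalg.norm(S, 2) ** 2, 1e-12)
        for _ in range(inner):
            grad = -2.0 * S.T @ (X - S @ G)
            G = project_columns_to_simplex(G - grad / L)
        losses.append(float(np.linalg.norm(X - S @ G) ** 2))
    return S, G, losses


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.random((3, 200))                  # 3 features, 200 data points
    S, G, losses = spa_fit(X, K=4)
    labels = G.argmax(axis=0)                 # hard discretization, if one is needed
    print(f"final loss {losses[-1]:.4f}, box sizes {np.bincount(labels, minlength=4)}")
```

Because the S-step is an exact minimizer and the Γ-step uses the safe step size 1/L, the recorded loss is non-increasing, which makes a convenient sanity check. A hard assignment of points to the K learned points can be read off as `G.argmax(axis=0)` when a crisp discretization is needed.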