Cheap robust learning of data anomalies with analytically solvable entropic outlier sparsification
Entropic outlier sparsification (EOS) is proposed as a cheap and robust computational strategy for learning in the presence of data anomalies and outliers. EOS dwells on the derived analytic solution of the (weighted) expected loss minimization problem subject to Shannon entropy regularization. An identified closed-form solution is proven to impose additional costs that depend linearly on statistics size and are independent of data dimension. Obtained analytic results also explain why the mixtures of spherically symmetric Gaussians—used heuristically in many popular data analysis algorithms—represent an optimal and least-biased choice for the nonparametric probability distributions when working with squared Euclidean distances. The performance of EOS is compared to a range of commonly used tools on synthetic problems and on partially mislabeled supervised classification problems from biomedicine. Applying EOS for coinference of data anomalies during learning is shown to allow reaching an accuracy of [Formula: see text] when predicting patient mortality after heart failure, statistically significantly outperforming predictive performance of common learning tools for the same data.
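The abstract's central object — the weighted expected loss minimization problem with Shannon entropy regularization — admits a closed-form solution of Gibbs/softmax type, which is what makes the extra cost linear in statistics size and independent of data dimension. The following is a minimal illustrative sketch of that closed form, assuming the standard formulation min_w Σᵢ wᵢℓᵢ + ε Σᵢ wᵢ log wᵢ subject to Σᵢ wᵢ = 1, wᵢ ≥ 0; the function name `eos_weights` and the exact sign convention are assumptions for illustration, not the paper's verbatim formulation.

```python
import numpy as np

def eos_weights(losses, eps):
    """Closed-form entropic instance weights.

    Solves  min_w  sum_i w_i * l_i + eps * sum_i w_i * log(w_i)
    subject to sum_i w_i = 1, w_i >= 0.  The Lagrangian stationarity
    condition  l_i + eps * (log w_i + 1) + mu = 0  gives the Gibbs
    distribution  w_i ∝ exp(-l_i / eps).

    Cost is O(n) in the number of samples and does not touch the data
    dimension: only the per-sample scalar losses enter.
    """
    l = np.asarray(losses, dtype=float)
    # Subtract the minimum before exponentiating for numerical stability;
    # the shift cancels in the normalization.
    z = np.exp(-(l - l.min()) / eps)
    return z / z.sum()
```

Samples with large loss (outlier candidates) receive exponentially small weight; as the regularization parameter `eps` grows, the weights flatten toward uniform, and as it shrinks, the mass concentrates on the lowest-loss samples — the "sparsification" of anomalies.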
Main author: | Horenko, Illia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | National Academy of Sciences, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8917346/ https://www.ncbi.nlm.nih.gov/pubmed/35197293 http://dx.doi.org/10.1073/pnas.2119659119 |
_version_ | 1784668530153619456 |
---|---|
author | Horenko, Illia |
author_facet | Horenko, Illia |
author_sort | Horenko, Illia |
collection | PubMed |
description | Entropic outlier sparsification (EOS) is proposed as a cheap and robust computational strategy for learning in the presence of data anomalies and outliers. EOS dwells on the derived analytic solution of the (weighted) expected loss minimization problem subject to Shannon entropy regularization. An identified closed-form solution is proven to impose additional costs that depend linearly on statistics size and are independent of data dimension. Obtained analytic results also explain why the mixtures of spherically symmetric Gaussians—used heuristically in many popular data analysis algorithms—represent an optimal and least-biased choice for the nonparametric probability distributions when working with squared Euclidean distances. The performance of EOS is compared to a range of commonly used tools on synthetic problems and on partially mislabeled supervised classification problems from biomedicine. Applying EOS for coinference of data anomalies during learning is shown to allow reaching an accuracy of [Formula: see text] when predicting patient mortality after heart failure, statistically significantly outperforming predictive performance of common learning tools for the same data. |
format | Online Article Text |
id | pubmed-8917346 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | National Academy of Sciences |
record_format | MEDLINE/PubMed |
spelling | pubmed-89173462022-03-13 Cheap robust learning of data anomalies with analytically solvable entropic outlier sparsification Horenko, Illia Proc Natl Acad Sci U S A Physical Sciences Entropic outlier sparsification (EOS) is proposed as a cheap and robust computational strategy for learning in the presence of data anomalies and outliers. EOS dwells on the derived analytic solution of the (weighted) expected loss minimization problem subject to Shannon entropy regularization. An identified closed-form solution is proven to impose additional costs that depend linearly on statistics size and are independent of data dimension. Obtained analytic results also explain why the mixtures of spherically symmetric Gaussians—used heuristically in many popular data analysis algorithms—represent an optimal and least-biased choice for the nonparametric probability distributions when working with squared Euclidean distances. The performance of EOS is compared to a range of commonly used tools on synthetic problems and on partially mislabeled supervised classification problems from biomedicine. Applying EOS for coinference of data anomalies during learning is shown to allow reaching an accuracy of [Formula: see text] when predicting patient mortality after heart failure, statistically significantly outperforming predictive performance of common learning tools for the same data. National Academy of Sciences 2022-02-23 2022-03-01 /pmc/articles/PMC8917346/ /pubmed/35197293 http://dx.doi.org/10.1073/pnas.2119659119 Text en Copyright © 2022 the Author(s). Published by PNAS. https://creativecommons.org/licenses/by-nc-nd/4.0/This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/) . |
spellingShingle | Physical Sciences Horenko, Illia Cheap robust learning of data anomalies with analytically solvable entropic outlier sparsification |
title | Cheap robust learning of data anomalies with analytically solvable entropic outlier sparsification |
title_full | Cheap robust learning of data anomalies with analytically solvable entropic outlier sparsification |
title_fullStr | Cheap robust learning of data anomalies with analytically solvable entropic outlier sparsification |
title_full_unstemmed | Cheap robust learning of data anomalies with analytically solvable entropic outlier sparsification |
title_short | Cheap robust learning of data anomalies with analytically solvable entropic outlier sparsification |
title_sort | cheap robust learning of data anomalies with analytically solvable entropic outlier sparsification |
topic | Physical Sciences |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8917346/ https://www.ncbi.nlm.nih.gov/pubmed/35197293 http://dx.doi.org/10.1073/pnas.2119659119 |
work_keys_str_mv | AT horenkoillia cheaprobustlearningofdataanomalieswithanalyticallysolvableentropicoutliersparsification |