
Cheap robust learning of data anomalies with analytically solvable entropic outlier sparsification


Bibliographic Details
Main Author: Horenko, Illia
Format: Online Article Text
Language: English
Published: National Academy of Sciences, 2022
Subjects: Physical Sciences
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8917346/
https://www.ncbi.nlm.nih.gov/pubmed/35197293
http://dx.doi.org/10.1073/pnas.2119659119
collection PubMed
description Entropic outlier sparsification (EOS) is proposed as a cheap and robust computational strategy for learning in the presence of data anomalies and outliers. EOS builds on a derived analytic solution of the (weighted) expected loss minimization problem subject to Shannon entropy regularization. The identified closed-form solution is proven to impose additional costs that depend linearly on statistics size and are independent of data dimension. The analytic results also explain why mixtures of spherically symmetric Gaussians—used heuristically in many popular data analysis algorithms—represent an optimal and least-biased choice for the nonparametric probability distributions when working with squared Euclidean distances. The performance of EOS is compared to a range of commonly used tools on synthetic problems and on partially mislabeled supervised classification problems from biomedicine. Applying EOS for coinference of data anomalies during learning is shown to reach an accuracy of [Formula: see text] when predicting patient mortality after heart failure, statistically significantly outperforming the predictive performance of common learning tools on the same data.
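The closed-form solution referenced in the description can be illustrated with a short sketch. Minimizing the entropy-regularized weighted loss, sum_i w_i * l_i + eps * sum_i w_i * log(w_i), over the probability simplex yields softmax weights w_i proportional to exp(-l_i / eps) by a standard Lagrangian argument. The minimal Python sketch below is an illustration of that closed form under these assumptions, not the paper's code; the function name and the eps value are hypothetical. Its cost is linear in the number of samples and independent of data dimension, matching the complexity claim in the description.

```python
import math

def entropic_weights(losses, eps):
    # Closed-form minimizer of sum_i w_i*l_i + eps*sum_i w_i*log(w_i)
    # over the probability simplex: w_i is proportional to exp(-l_i/eps).
    z = [-l / eps for l in losses]
    m = max(z)                        # shift for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# Samples with large loss (likely anomalies) are exponentially down-weighted,
# sparsifying their influence on the fit.
losses = [0.1, 0.2, 0.15, 5.0]       # last sample behaves like an outlier
w = entropic_weights(losses, eps=0.5)
```

Smaller eps sharpens the weights toward the lowest-loss samples; larger eps flattens them toward the uniform distribution, recovering ordinary unweighted loss minimization in the limit.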
id pubmed-8917346
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8917346 2022-03-13 Cheap robust learning of data anomalies with analytically solvable entropic outlier sparsification. Horenko, Illia. Proc Natl Acad Sci U S A (Physical Sciences). National Academy of Sciences; published online 2022-02-23, in print 2022-03-01. /pmc/articles/PMC8917346/ /pubmed/35197293 http://dx.doi.org/10.1073/pnas.2119659119 Text en Copyright © 2022 the Author(s). Published by PNAS. This open access article is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).
topic Physical Sciences