Semblance: An empirical similarity kernel on probability spaces

In data science, determining proximity between observations is critical to many downstream analyses such as clustering, classification, and prediction. However, when the data’s underlying probability distribution is unclear, the function used to compute similarity between data points is often arbitr...

Bibliographic Details
Main authors: Agarwal, Divyansh; Zhang, Nancy R.
Format: Online Article Text
Language: English
Published: American Association for the Advancement of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6892634/
https://www.ncbi.nlm.nih.gov/pubmed/31840051
http://dx.doi.org/10.1126/sciadv.aau9630
_version_ 1783476061331259392
author Agarwal, Divyansh
Zhang, Nancy R.
collection PubMed
description In data science, determining proximity between observations is critical to many downstream analyses such as clustering, classification, and prediction. However, when the data’s underlying probability distribution is unclear, the function used to compute similarity between data points is often arbitrarily chosen. Here, we present a novel definition of proximity, Semblance, that uses the empirical distribution of a feature to inform the pair-wise similarity between observations. The advantage of Semblance lies in its distribution-free formulation and its ability to place greater emphasis on proximity between observation pairs that fall at the outskirts of the data distribution, as opposed to those toward the center. Semblance is a valid Mercer kernel, allowing its principled use in kernel-based learning algorithms, and for any data modality. We demonstrate its consistently improved performance against conventional methods through simulations and real case studies from diverse applications in single-cell transcriptomics, image reconstruction, and financial forecasting.
format Online
Article
Text
id pubmed-6892634
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher American Association for the Advancement of Science
record_format MEDLINE/PubMed
spelling pubmed-6892634 2019-12-13 Semblance: An empirical similarity kernel on probability spaces. Agarwal, Divyansh; Zhang, Nancy R. Sci Adv. Research Articles. American Association for the Advancement of Science 2019-12-04 /pmc/articles/PMC6892634/ /pubmed/31840051 http://dx.doi.org/10.1126/sciadv.aau9630 Text en Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).
http://creativecommons.org/licenses/by-nc/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license (http://creativecommons.org/licenses/by-nc/4.0/), which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.
title Semblance: An empirical similarity kernel on probability spaces
topic Research Articles
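
The abstract in this record says that Semblance uses the empirical distribution of a feature to inform pairwise similarity, but the record does not give the formula. The sketch below is therefore not the authors' definition. It only illustrates the general idea of a distribution-free, rank-based similarity, under the assumption that two observations are treated as similar when little empirical probability mass separates them. The function names `empirical_cdf_similarity` and `mean_feature_similarity` are hypothetical.

```python
import numpy as np


def empirical_cdf_similarity(x):
    """Pairwise similarity for a single feature, based on its empirical CDF.

    Illustrative assumption (not taken from the paper): similarity is
    1 minus the absolute difference of empirical CDF values.
    """
    x = np.asarray(x, dtype=float)
    sorted_x = np.sort(x)
    # Empirical CDF at each observation: fraction of values <= x_i.
    cdf = np.searchsorted(sorted_x, x, side="right") / x.size
    # Small mass between two observations gives similarity close to 1.
    return 1.0 - np.abs(cdf[:, None] - cdf[None, :])


def mean_feature_similarity(data):
    """Average the per-feature similarity over the columns of `data`.

    `data` has shape (n_observations, n_features).
    """
    data = np.asarray(data, dtype=float)
    return np.mean([empirical_cdf_similarity(col) for col in data.T], axis=0)


# Toy feature with one far-away observation.
feature = np.array([0.0, 1.0, 1.1, 10.0])
similarity = empirical_cdf_similarity(feature)
```

In this toy example the pair (1.1, 10.0) receives the same similarity as the pair (1.0, 1.1), because no other observation lies between either pair. The raw scale of the feature does not enter the result, which is the sense in which such a measure is distribution-free. Whether this simplified form shares the Mercer-kernel property claimed for Semblance is not checked here.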