Semblance: An empirical similarity kernel on probability spaces
In data science, determining proximity between observations is critical to many downstream analyses such as clustering, classification, and prediction. However, when the data’s underlying probability distribution is unclear, the function used to compute similarity between data points is often arbitr...
Main Authors: | Agarwal, Divyansh; Zhang, Nancy R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
American Association for the Advancement of Science
2019
|
Subjects: | Research Articles |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6892634/ https://www.ncbi.nlm.nih.gov/pubmed/31840051 http://dx.doi.org/10.1126/sciadv.aau9630 |
_version_ | 1783476061331259392 |
---|---|
author | Agarwal, Divyansh Zhang, Nancy R. |
author_facet | Agarwal, Divyansh Zhang, Nancy R. |
author_sort | Agarwal, Divyansh |
collection | PubMed |
description | In data science, determining proximity between observations is critical to many downstream analyses such as clustering, classification, and prediction. However, when the data’s underlying probability distribution is unclear, the function used to compute similarity between data points is often arbitrarily chosen. Here, we present a novel definition of proximity, Semblance, that uses the empirical distribution of a feature to inform the pair-wise similarity between observations. The advantage of Semblance lies in its distribution-free formulation and its ability to place greater emphasis on proximity between observation pairs that fall at the outskirts of the data distribution, as opposed to those toward the center. Semblance is a valid Mercer kernel, allowing its principled use in kernel-based learning algorithms, and for any data modality. We demonstrate its consistently improved performance against conventional methods through simulations and real case studies from diverse applications in single-cell transcriptomics, image reconstruction, and financial forecasting. |
format | Online Article Text |
id | pubmed-6892634 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | American Association for the Advancement of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-68926342019-12-13 Semblance: An empirical similarity kernel on probability spaces Agarwal, Divyansh Zhang, Nancy R. Sci Adv Research Articles In data science, determining proximity between observations is critical to many downstream analyses such as clustering, classification, and prediction. However, when the data’s underlying probability distribution is unclear, the function used to compute similarity between data points is often arbitrarily chosen. Here, we present a novel definition of proximity, Semblance, that uses the empirical distribution of a feature to inform the pair-wise similarity between observations. The advantage of Semblance lies in its distribution-free formulation and its ability to place greater emphasis on proximity between observation pairs that fall at the outskirts of the data distribution, as opposed to those toward the center. Semblance is a valid Mercer kernel, allowing its principled use in kernel-based learning algorithms, and for any data modality. We demonstrate its consistently improved performance against conventional methods through simulations and real case studies from diverse applications in single-cell transcriptomics, image reconstruction, and financial forecasting. American Association for the Advancement of Science 2019-12-04 /pmc/articles/PMC6892634/ /pubmed/31840051 http://dx.doi.org/10.1126/sciadv.aau9630 Text en Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC). 
http://creativecommons.org/licenses/by-nc/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license (http://creativecommons.org/licenses/by-nc/4.0/) , which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited. |
spellingShingle | Research Articles Agarwal, Divyansh Zhang, Nancy R. Semblance: An empirical similarity kernel on probability spaces |
title | Semblance: An empirical similarity kernel on probability spaces |
title_full | Semblance: An empirical similarity kernel on probability spaces |
title_fullStr | Semblance: An empirical similarity kernel on probability spaces |
title_full_unstemmed | Semblance: An empirical similarity kernel on probability spaces |
title_short | Semblance: An empirical similarity kernel on probability spaces |
title_sort | semblance: an empirical similarity kernel on probability spaces |
topic | Research Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6892634/ https://www.ncbi.nlm.nih.gov/pubmed/31840051 http://dx.doi.org/10.1126/sciadv.aau9630 |
work_keys_str_mv | AT agarwaldivyansh semblanceanempiricalsimilaritykernelonprobabilityspaces AT zhangnancyr semblanceanempiricalsimilaritykernelonprobabilityspaces |
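
Illustrative note: the description above says that the similarity is informed by the empirical distribution of a feature and that the result is a valid Mercer kernel, but this record does not give the formula. The sketch below is therefore not the authors' definition. It shows one possible construction under an assumption made here: the similarity of two values is the fraction of observations at or below the smaller value plus the fraction at or above the larger one. The name `empirical_kernel` and the sample data are hypothetical.

```python
import numpy as np

def empirical_kernel(values):
    """Hypothetical similarity for one feature, built from its empirical distribution.

    Assumption for illustration only (not taken from the record):
    k(a, b) = P(X <= min(a, b)) + P(X >= max(a, b)), with P the empirical distribution.
    """
    x = np.asarray(values, dtype=float)
    # F[i]: fraction of observations <= x[i]; S[i]: fraction of observations >= x[i]
    F = (x[None, :] <= x[:, None]).mean(axis=1)
    S = (x[None, :] >= x[:, None]).mean(axis=1)
    # F and S are monotone in x, so P(X <= min(a, b)) = min(F(a), F(b)), and likewise for S.
    # Each min(., .) term is a positive semidefinite kernel, and so is their sum.
    return np.minimum.outer(F, F) + np.minimum.outer(S, S)

# Sample feature: dense near zero, sparse in the tails (made-up numbers).
sample = [-3.0, -2.0, -0.2, -0.1, 0.0, 0.1, 0.2, 2.0, 3.0]
K = empirical_kernel(sample)

tail_pair = K[7, 8]    # values 2.0 and 3.0: raw gap 1.0, no observations in between
center_pair = K[2, 6]  # values -0.2 and 0.2: raw gap 0.4, three observations in between
smallest_eigenvalue = np.linalg.eigvalsh(K).min()

print(tail_pair, center_pair, smallest_eigenvalue)
```

In this toy sample the tail pair scores higher than the central pair even though its raw gap is larger, which matches the qualitative behavior the description attributes to Semblance. The eigenvalue check is a numerical sanity test of positive semidefiniteness for this construction only.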