Geometric anomaly detection in data

The quest for low-dimensional models which approximate high-dimensional data is pervasive across the physical, natural, and social sciences. The dominant paradigm underlying most standard modeling techniques assumes that the data are concentrated near a single unknown manifold of relatively small in...

Bibliographic Details
Main authors: Stolz, Bernadette J., Tanner, Jared, Harrington, Heather A., Nanda, Vidit
Format: Online Article Text
Language: English
Published: National Academy of Sciences 2020
Subjects: Physical Sciences
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7443892/
https://www.ncbi.nlm.nih.gov/pubmed/32747569
http://dx.doi.org/10.1073/pnas.2001741117
_version_ 1783573709744766976
author Stolz, Bernadette J.
Tanner, Jared
Harrington, Heather A.
Nanda, Vidit
author_facet Stolz, Bernadette J.
Tanner, Jared
Harrington, Heather A.
Nanda, Vidit
author_sort Stolz, Bernadette J.
collection PubMed
description The quest for low-dimensional models which approximate high-dimensional data is pervasive across the physical, natural, and social sciences. The dominant paradigm underlying most standard modeling techniques assumes that the data are concentrated near a single unknown manifold of relatively small intrinsic dimension. Here, we present a systematic framework for detecting interfaces and related anomalies in data which may fail to satisfy the manifold hypothesis. By computing the local topology of small regions around each data point, we are able to partition a given dataset into disjoint classes, each of which can be individually approximated by a single manifold. Since these manifolds may have different intrinsic dimensions, local topology discovers singular regions in data even when none of the points have been sampled precisely from the singularities. We showcase this method by identifying the intersection of two surfaces in the 24-dimensional space of cyclo-octane conformations and by locating all of the self-intersections of a Henneberg minimal surface immersed in 3-dimensional space. Due to the local nature of the topological computations, the algorithmic burden of performing such data stratification is readily distributable across several processors.
format Online
Article
Text
id pubmed-7443892
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher National Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-7443892 2020-09-01 Geometric anomaly detection in data Stolz, Bernadette J. Tanner, Jared Harrington, Heather A. Nanda, Vidit Proc Natl Acad Sci U S A Physical Sciences The quest for low-dimensional models which approximate high-dimensional data is pervasive across the physical, natural, and social sciences. The dominant paradigm underlying most standard modeling techniques assumes that the data are concentrated near a single unknown manifold of relatively small intrinsic dimension. Here, we present a systematic framework for detecting interfaces and related anomalies in data which may fail to satisfy the manifold hypothesis. By computing the local topology of small regions around each data point, we are able to partition a given dataset into disjoint classes, each of which can be individually approximated by a single manifold. Since these manifolds may have different intrinsic dimensions, local topology discovers singular regions in data even when none of the points have been sampled precisely from the singularities. We showcase this method by identifying the intersection of two surfaces in the 24-dimensional space of cyclo-octane conformations and by locating all of the self-intersections of a Henneberg minimal surface immersed in 3-dimensional space. Due to the local nature of the topological computations, the algorithmic burden of performing such data stratification is readily distributable across several processors. National Academy of Sciences 2020-08-18 2020-08-03 /pmc/articles/PMC7443892/ /pubmed/32747569 http://dx.doi.org/10.1073/pnas.2001741117 Text en Copyright © 2020 the Author(s). Published by PNAS. https://creativecommons.org/licenses/by-nc-nd/4.0/ This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Physical Sciences
Stolz, Bernadette J.
Tanner, Jared
Harrington, Heather A.
Nanda, Vidit
Geometric anomaly detection in data
title Geometric anomaly detection in data
title_full Geometric anomaly detection in data
title_fullStr Geometric anomaly detection in data
title_full_unstemmed Geometric anomaly detection in data
title_short Geometric anomaly detection in data
title_sort geometric anomaly detection in data
topic Physical Sciences
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7443892/
https://www.ncbi.nlm.nih.gov/pubmed/32747569
http://dx.doi.org/10.1073/pnas.2001741117
work_keys_str_mv AT stolzbernadettej geometricanomalydetectionindata
AT tannerjared geometricanomalydetectionindata
AT harringtonheathera geometricanomalydetectionindata
AT nandavidit geometricanomalydetectionindata
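
Illustrative sketch (not part of the catalog record). The abstract above describes classifying each data point by the structure of a small region around it, so that singular regions such as intersections can be separated from the parts of the data that look like a single manifold. The Python sketch below is a rough illustration of that idea only. It is not the authors' method: it swaps the local topology computation for a much simpler local PCA dimension estimate. The function name `local_dimensions`, the parameters `radius` and `tol`, and the synthetic two-plane dataset are all assumptions made for this example.

```python
import numpy as np


def local_dimensions(points, radius, tol=0.05):
    """Estimate a local dimension for each point from its radius-neighborhood.

    Assumption: local PCA is used here as a simple stand-in for the local
    topology computation mentioned in the abstract.
    """
    dims = np.zeros(len(points), dtype=int)
    for i, p in enumerate(points):
        # All sample points within `radius` of p (including p itself).
        nbrs = points[np.linalg.norm(points - p, axis=1) <= radius]
        if len(nbrs) < 3:
            continue  # too few neighbors to estimate anything; leave 0
        centered = nbrs - nbrs.mean(axis=0)
        # Singular values measure the spread along each principal direction.
        s = np.linalg.svd(centered, compute_uv=False)
        # Count directions whose spread is non-negligible relative to the largest.
        dims[i] = int(np.sum(s > tol * s[0]))
    return dims


# Hypothetical test data: two planes in 3-dimensional space (z = 0 and x = 0)
# that intersect along the y-axis.
rng = np.random.default_rng(0)
n = 600
uv = rng.uniform(-1.0, 1.0, size=(2 * n, 2))
plane_a = np.column_stack([uv[:n, 0], uv[:n, 1], np.zeros(n)])  # z = 0
plane_b = np.column_stack([np.zeros(n), uv[n:, 0], uv[n:, 1]])  # x = 0
points = np.vstack([plane_a, plane_b])

radius = 0.3
dims = local_dimensions(points, radius)

# Points whose neighborhood spreads in all three directions are flagged.
singular = dims == 3
# Distance of every point from the intersection line (the y-axis).
dist_to_line = np.hypot(points[:, 0], points[:, 2])

print("flagged points:", int(singular.sum()))
print("largest distance of a flagged point from the intersection:",
      float(dist_to_line[singular].max()))
```

In this toy setting, a point farther than `radius` from the shared line only has neighbors from its own plane and is assigned dimension at most 2, while points close to the line can see both planes and are assigned dimension 3. The per-point loop is independent across points, which is in the spirit of the abstract's remark that such local computations can be distributed across processors.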