
Scikit-Dimension: A Python Package for Intrinsic Dimension Estimation

Dealing with uncertainty in applications of machine learning to real-life data critically depends on the knowledge of intrinsic dimensionality (ID). A number of methods have been suggested for the purpose of estimating ID, but no standard package to easily apply them one by one or all at once has be...


Bibliographic Details
Main authors: Bac, Jonathan; Mirkes, Evgeny M.; Gorban, Alexander N.; Tyukin, Ivan; Zinovyev, Andrei
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8534554/
https://www.ncbi.nlm.nih.gov/pubmed/34682092
http://dx.doi.org/10.3390/e23101368
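
The abstract describes ID estimators exposed through a scikit-learn-style interface. As a rough illustration of that idea only, the following pure-Python sketch fits a global estimator and reads the result from a `dimension_` attribute. The class name `TwoNNSketch` and the toy datasets are hypothetical, and the estimator is a generic maximum-likelihood form of the two-nearest-neighbour ratio method, not code taken from the scikit-dimension package.

```python
import math
import random


class TwoNNSketch:
    """Toy global ID estimator with a scikit-learn-style fit() (hypothetical name)."""

    def fit(self, X):
        log_ratios = []
        for i, p in enumerate(X):
            # Track the two smallest squared distances from p to any other point.
            r1 = r2 = math.inf
            for j, q in enumerate(X):
                if i == j:
                    continue
                d = sum((a - b) ** 2 for a, b in zip(p, q))
                if d < r1:
                    r1, r2 = d, r1
                elif d < r2:
                    r2 = d
            if r1 > 0:  # skip exact duplicate points
                # log of the distance ratio = half the log of the squared-distance ratio
                log_ratios.append(0.5 * math.log(r2 / r1))
        # Maximum-likelihood estimate: N / sum(log(r2 / r1))
        self.dimension_ = len(log_ratios) / sum(log_ratios)
        return self


rng = random.Random(0)
# A 1-D line and a 2-D plane, both embedded in 3-D space (toy data).
line = [(t, 2 * t, -t) for t in (rng.random() for _ in range(500))]
plane = [(u, v, u + v) for u, v in ((rng.random(), rng.random()) for _ in range(500))]

line_id = TwoNNSketch().fit(line).dimension_
plane_id = TwoNNSketch().fit(plane).dimension_
print(f"line: {line_id:.2f}, plane: {plane_id:.2f}")
```

Both datasets have three coordinates, yet the estimates should land near 1 and 2 respectively, which is the distinction between ambient and intrinsic dimension that the abstract refers to.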
_version_ 1784587580853977088
author Bac, Jonathan
Mirkes, Evgeny M.
Gorban, Alexander N.
Tyukin, Ivan
Zinovyev, Andrei
author_facet Bac, Jonathan
Mirkes, Evgeny M.
Gorban, Alexander N.
Tyukin, Ivan
Zinovyev, Andrei
author_sort Bac, Jonathan
collection PubMed
description Dealing with uncertainty in applications of machine learning to real-life data critically depends on the knowledge of intrinsic dimensionality (ID). A number of methods have been suggested for the purpose of estimating ID, but no standard package to easily apply them one by one or all at once has been implemented in Python. This technical note introduces scikit-dimension, an open-source Python package for intrinsic dimension estimation. The scikit-dimension package provides a uniform implementation of most of the known ID estimators based on the scikit-learn application programming interface to evaluate the global and local intrinsic dimension, as well as generators of synthetic toy and benchmark datasets widespread in the literature. The package is developed with tools assessing the code quality, coverage, unit testing and continuous integration. We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation for real-life and synthetic data.
format Online
Article
Text
id pubmed-8534554
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8534554 2021-10-23 Scikit-Dimension: A Python Package for Intrinsic Dimension Estimation Bac, Jonathan Mirkes, Evgeny M. Gorban, Alexander N. Tyukin, Ivan Zinovyev, Andrei Entropy (Basel) Technical Note Dealing with uncertainty in applications of machine learning to real-life data critically depends on the knowledge of intrinsic dimensionality (ID). A number of methods have been suggested for the purpose of estimating ID, but no standard package to easily apply them one by one or all at once has been implemented in Python. This technical note introduces scikit-dimension, an open-source Python package for intrinsic dimension estimation. The scikit-dimension package provides a uniform implementation of most of the known ID estimators based on the scikit-learn application programming interface to evaluate the global and local intrinsic dimension, as well as generators of synthetic toy and benchmark datasets widespread in the literature. The package is developed with tools assessing the code quality, coverage, unit testing and continuous integration. We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation for real-life and synthetic data. MDPI 2021-10-19 /pmc/articles/PMC8534554/ /pubmed/34682092 http://dx.doi.org/10.3390/e23101368 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Technical Note
Bac, Jonathan
Mirkes, Evgeny M.
Gorban, Alexander N.
Tyukin, Ivan
Zinovyev, Andrei
Scikit-Dimension: A Python Package for Intrinsic Dimension Estimation
title Scikit-Dimension: A Python Package for Intrinsic Dimension Estimation
title_full Scikit-Dimension: A Python Package for Intrinsic Dimension Estimation
title_fullStr Scikit-Dimension: A Python Package for Intrinsic Dimension Estimation
title_full_unstemmed Scikit-Dimension: A Python Package for Intrinsic Dimension Estimation
title_short Scikit-Dimension: A Python Package for Intrinsic Dimension Estimation
title_sort scikit-dimension: a python package for intrinsic dimension estimation
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8534554/
https://www.ncbi.nlm.nih.gov/pubmed/34682092
http://dx.doi.org/10.3390/e23101368
work_keys_str_mv AT bacjonathan scikitdimensionapythonpackageforintrinsicdimensionestimation
AT mirkesevgenym scikitdimensionapythonpackageforintrinsicdimensionestimation
AT gorbanalexandern scikitdimensionapythonpackageforintrinsicdimensionestimation
AT tyukinivan scikitdimensionapythonpackageforintrinsicdimensionestimation
AT zinovyevandrei scikitdimensionapythonpackageforintrinsicdimensionestimation