
Visualization of very large high-dimensional data sets as minimum spanning trees

The chemical sciences are producing an unprecedented amount of large, high-dimensional data sets containing chemical structures and associated properties. However, there are currently no algorithms to visualize such data while preserving both global and local features with a sufficient level of deta...


Bibliographic Details
Main authors:	Probst, Daniel; Reymond, Jean-Louis
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2020
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7015965/ https://www.ncbi.nlm.nih.gov/pubmed/33431043 http://dx.doi.org/10.1186/s13321-020-0416-x

_version_	1783496891769552896
author	Probst, Daniel; Reymond, Jean-Louis
collection	PubMed
description	The chemical sciences are producing an unprecedented amount of large, high-dimensional data sets containing chemical structures and associated properties. However, there are currently no algorithms to visualize such data while preserving both global and local features with a sufficient level of detail to allow for human inspection and interpretation. Here, we propose a solution to this problem with a new data visualization method, TMAP, capable of representing data sets of up to millions of data points and arbitrary high dimensionality as a two-dimensional tree (http://tmap.gdb.tools). Visualizations based on TMAP are better suited than t-SNE or UMAP for the exploration and interpretation of large data sets due to their tree-like nature, increased local and global neighborhood and structure preservation, and the transparency of the methods the algorithm is based on. We apply TMAP to the most used chemistry data sets including databases of molecules such as ChEMBL, FDB17, the Natural Products Atlas, DSSTox, as well as to the MoleculeNet benchmark collection of data sets. We also show its broad applicability with further examples from biology, particle physics, and literature. [Image: see text]
format	Online Article Text
id	pubmed-7015965
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-7015965 2020-02-20 Visualization of very large high-dimensional data sets as minimum spanning trees Probst, Daniel; Reymond, Jean-Louis J Cheminform Research Article The chemical sciences are producing an unprecedented amount of large, high-dimensional data sets containing chemical structures and associated properties. However, there are currently no algorithms to visualize such data while preserving both global and local features with a sufficient level of detail to allow for human inspection and interpretation. Here, we propose a solution to this problem with a new data visualization method, TMAP, capable of representing data sets of up to millions of data points and arbitrary high dimensionality as a two-dimensional tree (http://tmap.gdb.tools). Visualizations based on TMAP are better suited than t-SNE or UMAP for the exploration and interpretation of large data sets due to their tree-like nature, increased local and global neighborhood and structure preservation, and the transparency of the methods the algorithm is based on. We apply TMAP to the most used chemistry data sets including databases of molecules such as ChEMBL, FDB17, the Natural Products Atlas, DSSTox, as well as to the MoleculeNet benchmark collection of data sets. We also show its broad applicability with further examples from biology, particle physics, and literature. [Image: see text] Springer International Publishing 2020-02-12 /pmc/articles/PMC7015965/ /pubmed/33431043 http://dx.doi.org/10.1186/s13321-020-0416-x Text en © The Author(s) 2020. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research Article Probst, Daniel Reymond, Jean-Louis Visualization of very large high-dimensional data sets as minimum spanning trees
title	Visualization of very large high-dimensional data sets as minimum spanning trees
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7015965/ https://www.ncbi.nlm.nih.gov/pubmed/33431043 http://dx.doi.org/10.1186/s13321-020-0416-x
