Cargando…

Minimum Spanning vs. Principal Trees for Structured Approximations of Multi-Dimensional Datasets

Construction of graph-based approximations for multi-dimensional data point clouds is widely used in a variety of areas. Notable examples of applications of such approximators are cellular trajectory inference in single-cell data analysis, analysis of clinical trajectories from synchronic datasets,...

Descripción completa

Detalles Bibliográficos
Autores principales:	Chervov, Alexander, Bac, Jonathan, Zinovyev, Andrei
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	MDPI 2020
Materias:	Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7711596/ https://www.ncbi.nlm.nih.gov/pubmed/33287042 http://dx.doi.org/10.3390/e22111274

_version_	1783618181150015488
author	Chervov, Alexander Bac, Jonathan Zinovyev, Andrei
author_facet	Chervov, Alexander Bac, Jonathan Zinovyev, Andrei
author_sort	Chervov, Alexander
collection	PubMed
description	Construction of graph-based approximations for multi-dimensional data point clouds is widely used in a variety of areas. Notable examples of applications of such approximators are cellular trajectory inference in single-cell data analysis, analysis of clinical trajectories from synchronic datasets, and skeletonization of images. Several methods have been proposed to construct such approximating graphs, with some based on computation of minimum spanning trees and some based on principal graphs generalizing principal curves. In this article we propose a methodology to compare and benchmark these two graph-based data approximation approaches, as well as to define their hyperparameters. The main idea is to avoid comparing graphs directly, but at first to induce clustering of the data point cloud from the graph approximation and, secondly, to use well-established methods to compare and score the data cloud partitioning induced by the graphs. In particular, mutual information-based approaches prove to be useful in this context. The induced clustering is based on decomposing a graph into non-branching segments, and then clustering the data point cloud by the nearest segment. Such a method allows efficient comparison of graph-based data approximations of arbitrary topology and complexity. The method is implemented in Python using the standard scikit-learn library which provides high speed and efficiency. As a demonstration of the methodology we analyse and compare graph-based data approximation methods using synthetic as well as real-life single cell datasets.
format	Online Article Text
id	pubmed-7711596
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-77115962021-02-24 Minimum Spanning vs. Principal Trees for Structured Approximations of Multi-Dimensional Datasets Chervov, Alexander Bac, Jonathan Zinovyev, Andrei Entropy (Basel) Article Construction of graph-based approximations for multi-dimensional data point clouds is widely used in a variety of areas. Notable examples of applications of such approximators are cellular trajectory inference in single-cell data analysis, analysis of clinical trajectories from synchronic datasets, and skeletonization of images. Several methods have been proposed to construct such approximating graphs, with some based on computation of minimum spanning trees and some based on principal graphs generalizing principal curves. In this article we propose a methodology to compare and benchmark these two graph-based data approximation approaches, as well as to define their hyperparameters. The main idea is to avoid comparing graphs directly, but at first to induce clustering of the data point cloud from the graph approximation and, secondly, to use well-established methods to compare and score the data cloud partitioning induced by the graphs. In particular, mutual information-based approaches prove to be useful in this context. The induced clustering is based on decomposing a graph into non-branching segments, and then clustering the data point cloud by the nearest segment. Such a method allows efficient comparison of graph-based data approximations of arbitrary topology and complexity. The method is implemented in Python using the standard scikit-learn library which provides high speed and efficiency. As a demonstration of the methodology we analyse and compare graph-based data approximation methods using synthetic as well as real-life single cell datasets. MDPI 2020-11-11 /pmc/articles/PMC7711596/ /pubmed/33287042 http://dx.doi.org/10.3390/e22111274 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Chervov, Alexander Bac, Jonathan Zinovyev, Andrei Minimum Spanning vs. Principal Trees for Structured Approximations of Multi-Dimensional Datasets
title	Minimum Spanning vs. Principal Trees for Structured Approximations of Multi-Dimensional Datasets
title_full	Minimum Spanning vs. Principal Trees for Structured Approximations of Multi-Dimensional Datasets
title_fullStr	Minimum Spanning vs. Principal Trees for Structured Approximations of Multi-Dimensional Datasets
title_full_unstemmed	Minimum Spanning vs. Principal Trees for Structured Approximations of Multi-Dimensional Datasets
title_short	Minimum Spanning vs. Principal Trees for Structured Approximations of Multi-Dimensional Datasets
title_sort	minimum spanning vs. principal trees for structured approximations of multi-dimensional datasets
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7711596/ https://www.ncbi.nlm.nih.gov/pubmed/33287042 http://dx.doi.org/10.3390/e22111274
work_keys_str_mv	AT chervovalexander minimumspanningvsprincipaltreesforstructuredapproximationsofmultidimensionaldatasets AT bacjonathan minimumspanningvsprincipaltreesforstructuredapproximationsofmultidimensionaldatasets AT zinovyevandrei minimumspanningvsprincipaltreesforstructuredapproximationsofmultidimensionaldatasets

Minimum Spanning vs. Principal Trees for Structured Approximations of Multi-Dimensional Datasets

Ejemplares similares