
Robust and efficient single-cell Hi-C clustering with approximate k-nearest neighbor graphs


Bibliographic Details
Main authors:	Wolff, Joachim; Backofen, Rolf; Grüning, Björn
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2021
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9502147/ https://www.ncbi.nlm.nih.gov/pubmed/34021764 http://dx.doi.org/10.1093/bioinformatics/btab394

author	Wolff, Joachim; Backofen, Rolf; Grüning, Björn
collection	PubMed
description	MOTIVATION: Hi-C technology provides insights into the 3D organization of chromatin, and the single-cell Hi-C method lets researchers study the chromatin state at the level of individual cells. Single-cell Hi-C interaction matrices are high dimensional and very sparse. To cluster thousands of single-cell Hi-C interaction matrices, each matrix is flattened and all of them are compiled into one matrix. Depending on the resolution, this matrix can have a few million or even billions of features, so computations can be memory intensive. We present a single-cell Hi-C clustering approach that uses an approximate nearest neighbors method based on locality-sensitive hashing to reduce both the dimensions and the computational resources required. RESULTS: The presented method processes a 10 kb single-cell Hi-C dataset with 2600 cells using 40 GB of memory, whereas competing approaches cannot be computed even with 1 TB of memory. The method differentiates cells by their chromatin folding properties better than competing algorithms and therefore yields a higher-quality clustering of single-cell Hi-C data. AVAILABILITY AND IMPLEMENTATION: The presented clustering algorithm is part of scHiCExplorer and is available on GitHub at https://github.com/joachimwolff/scHiCExplorer and as a conda package via the bioconda channel. The approximate nearest neighbors implementation is available at https://github.com/joachimwolff/sparse-neighbors-search and as a conda package via the bioconda channel. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-9502147
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-9502147 2022-09-26 Robust and efficient single-cell Hi-C clustering with approximate k-nearest neighbor graphs Wolff, Joachim; Backofen, Rolf; Grüning, Björn Bioinformatics Original Papers Oxford University Press 2021-05-22 /pmc/articles/PMC9502147/ /pubmed/34021764 http://dx.doi.org/10.1093/bioinformatics/btab394 Text en © The Author(s) 2021.
Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Robust and efficient single-cell Hi-C clustering with approximate k-nearest neighbor graphs
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9502147/ https://www.ncbi.nlm.nih.gov/pubmed/34021764 http://dx.doi.org/10.1093/bioinformatics/btab394
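The description in this record outlines two steps: flattening each cell's sparse interaction matrix into one feature vector, and finding approximate nearest neighbors with locality-sensitive hashing. The following is a minimal sketch of that idea, not the scHiCExplorer or sparse-neighbors-search implementation. MinHash with banding is used here as one common locality-sensitive hashing scheme, which is an assumption; the function names, parameters, and toy data are hypothetical.

```python
import random
from collections import defaultdict

PRIME = 2**61 - 1  # Mersenne prime; feature indices must stay below it


def flatten_cell(contacts, n_bins):
    """Flatten one cell's nonzero (i, j) bin contacts into a set of feature indices."""
    return {i * n_bins + j for i, j in contacts}


def minhash_signature(features, hash_params):
    """MinHash signature: per hash function, the minimum of (a*x + b) mod PRIME."""
    return tuple(min((a * x + b) % PRIME for x in features) for a, b in hash_params)


def approximate_knn(cells, k, n_hashes=64, band_size=2, seed=0):
    """Return {cell index: up to k approximate neighbors}; assumes non-empty feature sets."""
    rng = random.Random(seed)
    hash_params = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(n_hashes)]
    signatures = [minhash_signature(cell, hash_params) for cell in cells]

    # LSH banding: cells whose signatures agree on a whole band share a bucket.
    buckets = defaultdict(list)
    for idx, sig in enumerate(signatures):
        for start in range(0, n_hashes, band_size):
            buckets[(start, sig[start:start + band_size])].append(idx)

    # Only cells that share at least one bucket are ever compared.
    candidates = defaultdict(set)
    for members in buckets.values():
        for idx in members:
            candidates[idx].update(m for m in members if m != idx)

    def similarity(a, b):
        # The fraction of matching signature positions estimates Jaccard similarity.
        return sum(x == y for x, y in zip(signatures[a], signatures[b])) / n_hashes

    return {
        idx: sorted(candidates[idx], key=lambda other: (-similarity(idx, other), other))[:k]
        for idx in range(len(cells))
    }


if __name__ == "__main__":
    n_bins = 100  # hypothetical number of genomic bins
    cells = [
        flatten_cell([(0, 1), (2, 5), (7, 9)], n_bins),
        flatten_cell([(0, 1), (2, 5), (7, 9)], n_bins),
        flatten_cell([(40, 41), (50, 60)], n_bins),
        flatten_cell([(40, 41), (50, 60)], n_bins),
    ]
    print(approximate_knn(cells, k=1))
```

Because only short signatures and bucket members are compared, the full cells-by-features matrix never has to be held densely in memory, which is the kind of saving the description refers to. The resulting neighbor lists form a k-nearest neighbor graph that a graph-based clustering step could consume; that step is not shown here.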
