Spectrum: fast density-aware spectral clustering for single and multi-omic data

MOTIVATION: Clustering patient omic data is integral to developing precision medicine because it allows the identification of disease subtypes. A current major challenge is the integration of multi-omic data to identify a shared structure and reduce noise. Cluster analysis is also increasingly applied...

Bibliographic Details
Main authors:	John, Christopher R; Watson, David; Barnes, Michael R; Pitzalis, Costantino; Lewis, Myles J
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703791/ https://www.ncbi.nlm.nih.gov/pubmed/31501851 http://dx.doi.org/10.1093/bioinformatics/btz704

collection	PubMed
description	MOTIVATION: Clustering patient omic data is integral to developing precision medicine because it allows the identification of disease subtypes. A current major challenge is the integration of multi-omic data to identify a shared structure and reduce noise. Cluster analysis is also increasingly applied to single-omic data, for example, in single-cell RNA-seq analysis for clustering the transcriptomes of individual cells. This technology has clinical implications. Our motivation was therefore to develop a flexible and effective spectral clustering tool for both single and multi-omic data. RESULTS: We present Spectrum, a new spectral clustering method for complex omic data. Spectrum uses a self-tuning density-aware kernel we developed that enhances the similarity between points that share common nearest neighbours. It uses a tensor product graph data integration and diffusion procedure to reduce noise and reveal underlying structures. Spectrum contains a new method for finding the optimal number of clusters (K) involving eigenvector distribution analysis. Spectrum can automatically find K for both Gaussian and non-Gaussian structures. We demonstrate across 21 real expression datasets that Spectrum gives improved runtimes and better clustering results relative to other methods. AVAILABILITY AND IMPLEMENTATION: Spectrum is available as an R software package from CRAN https://cran.r-project.org/web/packages/Spectrum/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
id	pubmed-7703791
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-7703791 2020-12-07 Bioinformatics Oxford University Press 2020-02-15 2019-09-10 /pmc/articles/PMC7703791/ /pubmed/31501851 http://dx.doi.org/10.1093/bioinformatics/btz704 Text en © The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
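
The abstract describes spectral clustering with a self-tuning kernel and an eigenvector-based choice of the number of clusters K. The Python sketch below illustrates that general family of methods only. It is not Spectrum's density-aware kernel, its tensor product graph diffusion, or its eigenvector distribution analysis, none of which are specified in this record. As stand-ins it uses the local-scaling kernel of Zelnik-Manor and Perona and the plain eigengap heuristic. All function names, the neighbour count `k`, and the toy data are assumptions made for illustration.

```python
import numpy as np


def self_tuning_affinity(X, k=7):
    """Gaussian affinity with a per-point scale: the distance to the k-th neighbour."""
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Column 0 of each sorted row is the point itself (distance 0).
    sigma = np.sort(d, axis=1)[:, k]
    A = np.exp(-(d ** 2) / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(A, 0.0)
    return A


def spectral_embedding(A, max_k=10):
    """Pick K at the largest eigengap and return the row-normalised embedding."""
    s = 1.0 / np.sqrt(A.sum(axis=1))
    M = s[:, None] * A * s[None, :]          # D^-1/2 A D^-1/2
    vals, vecs = np.linalg.eigh(M)           # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]   # reorder to descending
    gaps = vals[:max_k] - vals[1:max_k + 1]
    k_hat = int(np.argmax(gaps)) + 1
    Y = vecs[:, :k_hat]
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return k_hat, Y


def kmeans(Y, k, iters=50):
    """Small Lloyd's k-means with farthest-first initialisation."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(iters):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d2, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels


# Toy data (hypothetical): three well-separated 2-D Gaussian blobs of 40 points each.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [50.0, 0.0], [0.0, 50.0]])
X = np.vstack([c + rng.normal(size=(40, 2)) for c in blob_centers])

A = self_tuning_affinity(X, k=20)
k_hat, Y = spectral_embedding(A)
labels = kmeans(Y, k_hat)
print(k_hat, np.bincount(labels))
```

The per-point scale lets the kernel width adapt to local density, which is the property the abstract's "self-tuning" wording refers to in general terms. The package itself is an R library on CRAN, so this sketch is only a conceptual aid and not a substitute for it.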
