Clustering-independent analysis of genomic data using spectral simplicial theory

The prevailing paradigm for the analysis of biological data involves comparing groups of replicates from different conditions (e.g. control and treatment) to statistically infer features that discriminate them (e.g. differentially expressed genes). However, many situations in modern genomics such as...

Bibliographic Details
Main authors: Govek, Kiya W., Yamajala, Venkata S., Camara, Pablo G.
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6897424/
https://www.ncbi.nlm.nih.gov/pubmed/31756191
http://dx.doi.org/10.1371/journal.pcbi.1007509
author Govek, Kiya W.
Yamajala, Venkata S.
Camara, Pablo G.
collection PubMed
description The prevailing paradigm for the analysis of biological data involves comparing groups of replicates from different conditions (e.g. control and treatment) to statistically infer features that discriminate them (e.g. differentially expressed genes). However, many situations in modern genomics such as single-cell omics experiments do not fit well into this paradigm because they lack true replicates. In such instances, spectral techniques could be used to rank features according to their degree of consistency with an underlying metric structure without the need to cluster samples. Here, we extend spectral methods for feature selection to abstract simplicial complexes and present a general framework for clustering-independent analysis. Combinatorial Laplacian scores take into account the topology spanned by the data and reduce to the ordinary Laplacian score when restricted to graphs. We demonstrate the utility of this framework with several applications to the analysis of gene expression and multi-modal genomic data. Specifically, we perform differential expression analysis in situations where samples cannot be grouped into distinct classes, and we disaggregate differentially expressed genes according to the topology of the expression space (e.g. alternative paths of differentiation). We also apply this formalism to identify genes with spatial patterns of expression using fluorescence in-situ hybridization data and to establish associations between genetic alterations and global expression patterns in large cross-sectional studies. Our results provide a unifying perspective on topological data analysis and manifold learning approaches to the analysis of large-scale biological datasets.
format Online
Article
Text
id pubmed-6897424
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6897424 2019-12-13 Clustering-independent analysis of genomic data using spectral simplicial theory Govek, Kiya W. Yamajala, Venkata S. Camara, Pablo G. PLoS Comput Biol Research Article
Public Library of Science 2019-11-22 /pmc/articles/PMC6897424/ /pubmed/31756191 http://dx.doi.org/10.1371/journal.pcbi.1007509 Text en © 2019 Govek et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Clustering-independent analysis of genomic data using spectral simplicial theory
topic Research Article