Clustering-independent analysis of genomic data using spectral simplicial theory
The prevailing paradigm for the analysis of biological data involves comparing groups of replicates from different conditions (e.g. control and treatment) to statistically infer features that discriminate them (e.g. differentially expressed genes). However, many situations in modern genomics such as...
Main authors: | Govek, Kiya W.; Yamajala, Venkata S.; Camara, Pablo G. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6897424/ https://www.ncbi.nlm.nih.gov/pubmed/31756191 http://dx.doi.org/10.1371/journal.pcbi.1007509 |
_version_ | 1783476966683312128 |
---|---|
author | Govek, Kiya W. Yamajala, Venkata S. Camara, Pablo G. |
author_facet | Govek, Kiya W. Yamajala, Venkata S. Camara, Pablo G. |
author_sort | Govek, Kiya W. |
collection | PubMed |
description | The prevailing paradigm for the analysis of biological data involves comparing groups of replicates from different conditions (e.g. control and treatment) to statistically infer features that discriminate them (e.g. differentially expressed genes). However, many situations in modern genomics such as single-cell omics experiments do not fit well into this paradigm because they lack true replicates. In such instances, spectral techniques could be used to rank features according to their degree of consistency with an underlying metric structure without the need to cluster samples. Here, we extend spectral methods for feature selection to abstract simplicial complexes and present a general framework for clustering-independent analysis. Combinatorial Laplacian scores take into account the topology spanned by the data and reduce to the ordinary Laplacian score when restricted to graphs. We demonstrate the utility of this framework with several applications to the analysis of gene expression and multi-modal genomic data. Specifically, we perform differential expression analysis in situations where samples cannot be grouped into distinct classes, and we disaggregate differentially expressed genes according to the topology of the expression space (e.g. alternative paths of differentiation). We also apply this formalism to identify genes with spatial patterns of expression using fluorescence in-situ hybridization data and to establish associations between genetic alterations and global expression patterns in large cross-sectional studies. Our results provide a unifying perspective on topological data analysis and manifold learning approaches to the analysis of large-scale biological datasets. |
format | Online Article Text |
id | pubmed-6897424 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-6897424 2019-12-13 Clustering-independent analysis of genomic data using spectral simplicial theory Govek, Kiya W. Yamajala, Venkata S. Camara, Pablo G. PLoS Comput Biol Research Article Public Library of Science 2019-11-22 /pmc/articles/PMC6897424/ /pubmed/31756191 http://dx.doi.org/10.1371/journal.pcbi.1007509 Text en © 2019 Govek et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Govek, Kiya W. Yamajala, Venkata S. Camara, Pablo G. Clustering-independent analysis of genomic data using spectral simplicial theory |
title | Clustering-independent analysis of genomic data using spectral simplicial theory |
title_full | Clustering-independent analysis of genomic data using spectral simplicial theory |
title_fullStr | Clustering-independent analysis of genomic data using spectral simplicial theory |
title_full_unstemmed | Clustering-independent analysis of genomic data using spectral simplicial theory |
title_short | Clustering-independent analysis of genomic data using spectral simplicial theory |
title_sort | clustering-independent analysis of genomic data using spectral simplicial theory |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6897424/ https://www.ncbi.nlm.nih.gov/pubmed/31756191 http://dx.doi.org/10.1371/journal.pcbi.1007509 |
work_keys_str_mv | AT govekkiyaw clusteringindependentanalysisofgenomicdatausingspectralsimplicialtheory AT yamajalavenkatas clusteringindependentanalysisofgenomicdatausingspectralsimplicialtheory AT camarapablog clusteringindependentanalysisofgenomicdatausingspectralsimplicialtheory |
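
The description above says that combinatorial Laplacian scores reduce to the ordinary Laplacian score when restricted to graphs. As a rough illustration of that graph case only, the sketch below computes the ordinary Laplacian score in its standard formulation (the ratio of f^T L f to f^T D f after removing the degree-weighted mean, with L = D - W). It is not the authors' implementation and does not cover simplicial complexes; the function name `laplacian_score`, the toy path graph, and the two example "genes" are hypothetical and chosen purely for illustration.

```python
def laplacian_score(feature, weights):
    """Ordinary Laplacian score of one feature on a weighted graph.

    feature: list of n values, one per sample (graph node).
    weights: n x n symmetric list of lists of non-negative edge weights.
    Lower scores mean the feature varies more smoothly along the graph.
    """
    n = len(feature)
    degree = [sum(row) for row in weights]
    total_degree = sum(degree)
    # Subtract the degree-weighted mean so constant offsets do not matter.
    mean = sum(d * f for d, f in zip(degree, feature)) / total_degree
    centered = [f - mean for f in feature]
    # f^T L f equals the weighted sum of squared differences over edges.
    smoothness = sum(
        weights[i][j] * (centered[i] - centered[j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    # f^T D f is the degree-weighted variance of the feature.
    variance = sum(d * c * c for d, c in zip(degree, centered))
    if variance == 0:
        raise ValueError("feature is constant on the graph; score is undefined")
    return smoothness / variance


# Hypothetical toy data: four samples joined in a path 0-1-2-3.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth_gene = [0.0, 0.0, 1.0, 1.0]       # changes once along the path
alternating_gene = [0.0, 1.0, 0.0, 1.0]  # flips at every edge

scores = {
    "smooth_gene": laplacian_score(smooth_gene, path_graph),
    "alternating_gene": laplacian_score(alternating_gene, path_graph),
}
# Rank features from most to least consistent with the graph structure.
ranking = sorted(scores, key=scores.get)
```

In this toy example the feature that changes once along the path receives the lower score, so it ranks first. No clustering of the samples is involved, only the graph built on them, which is the sense in which such rankings are clustering-independent.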