
Multiscale Methods for Signal Selection in Single-Cell Data

Analysis of single-cell transcriptomics often relies on clustering cells and then performing differential gene expression (DGE) to identify genes that vary between these clusters. These discrete analyses successfully determine cell types and markers; however, continuous variation within and between...


Bibliographic Details
Main Authors: Hoekzema, Renee S., Marsh, Lewis, Sumray, Otto, Carroll, Thomas M., Lu, Xin, Byrne, Helen M., Harrington, Heather A.
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9407339/
https://www.ncbi.nlm.nih.gov/pubmed/36010781
http://dx.doi.org/10.3390/e24081116
author Hoekzema, Renee S.
Marsh, Lewis
Sumray, Otto
Carroll, Thomas M.
Lu, Xin
Byrne, Helen M.
Harrington, Heather A.
collection PubMed
description Analysis of single-cell transcriptomics often relies on clustering cells and then performing differential gene expression (DGE) to identify genes that vary between these clusters. These discrete analyses successfully determine cell types and markers; however, continuous variation within and between cell types may not be detected. We propose three topologically motivated mathematical methods for unsupervised feature selection that consider discrete and continuous transcriptional patterns on an equal footing across multiple scales simultaneously. Eigenscores ([Formula: see text]) rank signals or genes based on their correspondence to low-frequency intrinsic patterning in the data using the spectral decomposition of the graph Laplacian. The multiscale Laplacian score (MLS) is an unsupervised method for locating relevant scales in data and selecting the genes that are coherently expressed at these respective scales. The persistent Rayleigh quotient (PRQ) takes data equipped with a filtration, allowing the separation of genes with different roles in a bifurcation process (e.g., pseudo-time). We demonstrate the utility of these techniques by applying them to published single-cell transcriptomics data sets. The methods validate previously identified genes and detect additional biologically meaningful genes with coherent expression patterns. By studying the interaction between gene signals and the geometry of the underlying space, the three methods give multidimensional rankings of the genes and visualisation of relationships between them.
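The description above names Laplacian scores and Rayleigh quotients without giving formulas. The following is a minimal sketch of the standard single-scale Laplacian score of a signal on a graph, not the authors' multiscale implementation. The toy path graph, the two synthetic "gene" signals, and all function names are assumptions made for illustration. A lower score means the signal varies more smoothly over the graph.

```python
import numpy as np


def graph_laplacian(adjacency):
    """Return the combinatorial Laplacian L = D - A and the degree vector."""
    degrees = adjacency.sum(axis=1)
    return np.diag(degrees) - adjacency, degrees


def laplacian_score(signal, adjacency):
    """Degree-weighted Rayleigh quotient of a centred signal (lower = smoother).

    Undefined (division by zero) for a constant signal.
    """
    laplacian, degrees = graph_laplacian(adjacency)
    # Remove the degree-weighted mean so the score ignores constant offsets.
    centred = signal - (signal @ degrees) / degrees.sum()
    return (centred @ laplacian @ centred) / (centred @ (degrees * centred))


# Hypothetical toy data: 50 "cells" joined in a path graph.
n = 50
A = np.zeros((n, n))
idx = np.arange(n - 1)
A[idx, idx + 1] = 1.0
A[idx + 1, idx] = 1.0

rng = np.random.default_rng(0)
smooth_gene = np.linspace(0.0, 1.0, n)  # varies gradually along the path
noisy_gene = rng.normal(size=n)         # unrelated to the graph structure

smooth_score = laplacian_score(smooth_gene, A)
noisy_score = laplacian_score(noisy_gene, A)
print(smooth_score < noisy_score)
```

In this toy setting the gradually varying signal receives the lower score, which is the sense in which such scores favour genes expressed coherently with the geometry of the data.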
format Online
Article
Text
id pubmed-9407339
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9407339 2022-08-26 Multiscale Methods for Signal Selection in Single-Cell Data Hoekzema, Renee S. Marsh, Lewis Sumray, Otto Carroll, Thomas M. Lu, Xin Byrne, Helen M. Harrington, Heather A. Entropy (Basel) Article MDPI 2022-08-13 /pmc/articles/PMC9407339/ /pubmed/36010781 http://dx.doi.org/10.3390/e24081116 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Multiscale Methods for Signal Selection in Single-Cell Data
topic Article