An information-theoretic approach to single cell sequencing analysis

BACKGROUND: Single-cell sequencing (sc-Seq) experiments are producing increasingly large data sets. However, large data sets do not necessarily contain large amounts of information. RESULTS: Here, we formally quantify the information obtained from a sc-Seq experiment and show that it corresponds to...

Bibliographic Details
Main authors: Casey, Michael J.; Fliege, Jörg; Sánchez-García, Rubén J.; MacArthur, Ben D.
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10422744/
https://www.ncbi.nlm.nih.gov/pubmed/37573291
http://dx.doi.org/10.1186/s12859-023-05424-8
author Casey, Michael J.
Fliege, Jörg
Sánchez-García, Rubén J.
MacArthur, Ben D.
collection PubMed
description BACKGROUND: Single-cell sequencing (sc-Seq) experiments are producing increasingly large data sets. However, large data sets do not necessarily contain large amounts of information. RESULTS: Here, we formally quantify the information obtained from a sc-Seq experiment and show that it corresponds to an intuitive notion of gene expression heterogeneity. We demonstrate a natural relation between our notion of heterogeneity and that of cell type, decomposing heterogeneity into that component attributable to differential expression between cell types (inter-cluster heterogeneity) and that remaining (intra-cluster heterogeneity). We test our definition of heterogeneity as the objective function of a clustering algorithm, and show that it is a useful descriptor for gene expression patterns associated with different cell types. CONCLUSIONS: Thus, our definition of gene heterogeneity leads to a biologically meaningful notion of cell type, as groups of cells that are statistically equivalent with respect to their patterns of gene expression. Our measure of heterogeneity, and its decomposition into inter- and intra-cluster, is non-parametric, intrinsic, unbiased, and requires no additional assumptions about expression patterns. Based on this theory, we develop an efficient method for the automatic unsupervised clustering of cells from sc-Seq data, and provide an R package implementation. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05424-8.
format Online
Article
Text
id pubmed-10422744
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10422744 2023-08-13 An information-theoretic approach to single cell sequencing analysis Casey, Michael J. Fliege, Jörg Sánchez-García, Rubén J. MacArthur, Ben D. BMC Bioinformatics Research Article
BioMed Central 2023-08-12 /pmc/articles/PMC10422744/ /pubmed/37573291 http://dx.doi.org/10.1186/s12859-023-05424-8 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title An information-theoretic approach to single cell sequencing analysis
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10422744/
https://www.ncbi.nlm.nih.gov/pubmed/37573291
http://dx.doi.org/10.1186/s12859-023-05424-8