An information-theoretic approach to single cell sequencing analysis
Main authors: | Casey, Michael J.; Fliege, Jörg; Sánchez-García, Rubén J.; MacArthur, Ben D. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10422744/ https://www.ncbi.nlm.nih.gov/pubmed/37573291 http://dx.doi.org/10.1186/s12859-023-05424-8 |
_version_ | 1785089286700269568 |
---|---|
author | Casey, Michael J.; Fliege, Jörg; Sánchez-García, Rubén J.; MacArthur, Ben D. |
collection | PubMed |
description | BACKGROUND: Single-cell sequencing (sc-Seq) experiments are producing increasingly large data sets. However, large data sets do not necessarily contain large amounts of information. RESULTS: Here, we formally quantify the information obtained from a sc-Seq experiment and show that it corresponds to an intuitive notion of gene expression heterogeneity. We demonstrate a natural relation between our notion of heterogeneity and that of cell type, decomposing heterogeneity into that component attributable to differential expression between cell types (inter-cluster heterogeneity) and that remaining (intra-cluster heterogeneity). We test our definition of heterogeneity as the objective function of a clustering algorithm, and show that it is a useful descriptor for gene expression patterns associated with different cell types. CONCLUSIONS: Thus, our definition of gene heterogeneity leads to a biologically meaningful notion of cell type, as groups of cells that are statistically equivalent with respect to their patterns of gene expression. Our measure of heterogeneity, and its decomposition into inter- and intra-cluster, is non-parametric, intrinsic, unbiased, and requires no additional assumptions about expression patterns. Based on this theory, we develop an efficient method for the automatic unsupervised clustering of cells from sc-Seq data, and provide an R package implementation. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05424-8. |
format | Online Article Text |
id | pubmed-10422744 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10422744 2023-08-13 An information-theoretic approach to single cell sequencing analysis Casey, Michael J. Fliege, Jörg Sánchez-García, Rubén J. MacArthur, Ben D. BMC Bioinformatics Research Article
BioMed Central 2023-08-12 /pmc/articles/PMC10422744/ /pubmed/37573291 http://dx.doi.org/10.1186/s12859-023-05424-8 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | An information-theoretic approach to single cell sequencing analysis |
topic | Research Article |
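
The abstract describes splitting a gene's expression heterogeneity into a component due to differential expression between cell types (inter-cluster) and a remainder (intra-cluster). As a hedged illustration only, the sketch below assumes heterogeneity is measured by a Theil-type information quantity, ln N minus the Shannon entropy of the gene's transcript shares across the N cells. That choice of measure, and the function name `theil_decomposition`, are assumptions made for illustration; they are not taken from the paper and are not the API of the authors' R package.

```python
import math
from collections import defaultdict


def _plogp_ratio(p, q):
    """Return p * ln(p / q), using the convention 0 * ln(0 / q) = 0."""
    return 0.0 if p == 0 else p * math.log(p / q)


def theil_decomposition(counts, labels):
    """Hypothetical helper: split one gene's Theil-style information into
    between-cluster and within-cluster parts (an assumed measure, for
    illustration only).

    counts: non-negative expression counts, one per cell.
    labels: cluster label for each cell (same length as counts).
    Returns (total, between, within), where total == between + within
    up to floating-point rounding.
    """
    if len(counts) != len(labels):
        raise ValueError("counts and labels must have the same length")
    n = len(counts)
    total_count = sum(counts)
    if total_count <= 0:
        raise ValueError("gene must have a positive total count")
    p = [c / total_count for c in counts]  # share of transcripts in each cell

    # Total information: sum_i p_i ln(p_i / (1/N)) = ln N - H(p).
    total = sum(_plogp_ratio(pi, 1.0 / n) for pi in p)

    # Collect the transcript shares of the cells in each cluster.
    groups = defaultdict(list)
    for pi, lab in zip(p, labels):
        groups[lab].append(pi)

    between = 0.0
    within = 0.0
    for shares in groups.values():
        n_g = len(shares)
        s_g = sum(shares)  # share of all transcripts falling in cluster g
        if s_g == 0:
            continue  # a cluster with no transcripts contributes nothing
        # Between-cluster term: s_g * ln(s_g / (N_g / N)).
        between += _plogp_ratio(s_g, n_g / n)
        # Within-cluster term: s_g * T_g, where T_g is the same measure
        # computed on the renormalised shares q_i = p_i / s_g inside g.
        within += s_g * sum(_plogp_ratio(pi / s_g, 1.0 / n_g) for pi in shares)
    return total, between, within


if __name__ == "__main__":
    counts = [10, 12, 0, 1, 9, 11, 0, 2]
    labels = ["A", "A", "B", "B", "A", "A", "B", "B"]
    total, between, within = theil_decomposition(counts, labels)
    print(f"total={total:.4f} between={between:.4f} within={within:.4f}")
```

Under this assumed measure the split is exact: the between-cluster term is a relative entropy of the cluster transcript shares against the cluster sizes, and each within-cluster term is a share-weighted relative entropy inside one cluster, so both parts are non-negative and add up to the total. A labelling that places high- and low-expressing cells in separate clusters moves most of the total into the between-cluster part.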