MicroCellClust: mining rare and highly specific subpopulations from single-cell expression data
MOTIVATION: Identifying rare subpopulations of cells is a critical step in order to extract knowledge from single-cell expression data, especially when the available data is limited and rare subpopulations only contain a few cells. In this paper, we present a data mining method to identify small sub...
Main authors: | Gerniers, Alexander; Bricard, Orian; Dupont, Pierre |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8504615/ https://www.ncbi.nlm.nih.gov/pubmed/33830183 http://dx.doi.org/10.1093/bioinformatics/btab239 |
_version_ | 1784581355685806080 |
---|---|
author | Gerniers, Alexander Bricard, Orian Dupont, Pierre |
author_facet | Gerniers, Alexander Bricard, Orian Dupont, Pierre |
author_sort | Gerniers, Alexander |
collection | PubMed |
description | MOTIVATION: Identifying rare subpopulations of cells is a critical step in order to extract knowledge from single-cell expression data, especially when the available data is limited and rare subpopulations only contain a few cells. In this paper, we present a data mining method to identify small subpopulations of cells that present highly specific expression profiles. This objective is formalized as a constrained optimization problem that jointly identifies a small group of cells and a corresponding subset of specific genes. The proposed method extends the max-sum submatrix problem to yield genes that are, for instance, highly expressed inside a small number of cells, but have a low expression in the remaining ones. RESULTS: We show through controlled experiments on scRNA-seq data that the MicroCellClust method achieves a high F1 score to identify rare subpopulations of artificially planted human T cells. The effectiveness of MicroCellClust is confirmed as it reveals a subpopulation of CD4 T cells with a specific phenotype from breast cancer samples, and a subpopulation linked to a specific stage in the cell cycle from breast cancer samples as well. Finally, three rare subpopulations in mouse embryonic stem cells are also identified with MicroCellClust. These results illustrate the proposed method outperforms typical alternatives at identifying small subsets of cells with highly specific expression profiles. AVAILABILITY AND IMPLEMENTATION: The R and Scala implementation of MicroCellClust is freely available on GitHub, at https://github.com/agerniers/MicroCellClust/. The data underlying this article are available on Zenodo, at https://dx.doi.org/10.5281/zenodo.4580332. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-8504615 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8504615 2021-10-13 MicroCellClust: mining rare and highly specific subpopulations from single-cell expression data Gerniers, Alexander Bricard, Orian Dupont, Pierre Bioinformatics Original Papers MOTIVATION: Identifying rare subpopulations of cells is a critical step in order to extract knowledge from single-cell expression data, especially when the available data is limited and rare subpopulations only contain a few cells. In this paper, we present a data mining method to identify small subpopulations of cells that present highly specific expression profiles. This objective is formalized as a constrained optimization problem that jointly identifies a small group of cells and a corresponding subset of specific genes. The proposed method extends the max-sum submatrix problem to yield genes that are, for instance, highly expressed inside a small number of cells, but have a low expression in the remaining ones. RESULTS: We show through controlled experiments on scRNA-seq data that the MicroCellClust method achieves a high F1 score to identify rare subpopulations of artificially planted human T cells. The effectiveness of MicroCellClust is confirmed as it reveals a subpopulation of CD4 T cells with a specific phenotype from breast cancer samples, and a subpopulation linked to a specific stage in the cell cycle from breast cancer samples as well. Finally, three rare subpopulations in mouse embryonic stem cells are also identified with MicroCellClust. These results illustrate the proposed method outperforms typical alternatives at identifying small subsets of cells with highly specific expression profiles. AVAILABILITY AND IMPLEMENTATION: The R and Scala implementation of MicroCellClust is freely available on GitHub, at https://github.com/agerniers/MicroCellClust/. The data underlying this article are available on Zenodo, at https://dx.doi.org/10.5281/zenodo.4580332. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Oxford University Press 2021-04-08 /pmc/articles/PMC8504615/ /pubmed/33830183 http://dx.doi.org/10.1093/bioinformatics/btab239 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers Gerniers, Alexander Bricard, Orian Dupont, Pierre MicroCellClust: mining rare and highly specific subpopulations from single-cell expression data |
title | MicroCellClust: mining rare and highly specific subpopulations from single-cell expression data |
title_full | MicroCellClust: mining rare and highly specific subpopulations from single-cell expression data |
title_fullStr | MicroCellClust: mining rare and highly specific subpopulations from single-cell expression data |
title_full_unstemmed | MicroCellClust: mining rare and highly specific subpopulations from single-cell expression data |
title_short | MicroCellClust: mining rare and highly specific subpopulations from single-cell expression data |
title_sort | microcellclust: mining rare and highly specific subpopulations from single-cell expression data |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8504615/ https://www.ncbi.nlm.nih.gov/pubmed/33830183 http://dx.doi.org/10.1093/bioinformatics/btab239 |
work_keys_str_mv | AT gerniersalexander microcellclustminingrareandhighlyspecificsubpopulationsfromsinglecellexpressiondata AT bricardorian microcellclustminingrareandhighlyspecificsubpopulationsfromsinglecellexpressiondata AT dupontpierre microcellclustminingrareandhighlyspecificsubpopulationsfromsinglecellexpressiondata |