
ILoReg: a tool for high-resolution cell population identification from single-cell RNA-seq data

MOTIVATION: Single-cell RNA-seq allows researchers to identify cell populations based on unsupervised clustering of the transcriptome. However, subpopulations can have only subtle transcriptomic differences and the high dimensionality of the data makes their identification challenging. RESULTS: We i...


Bibliographic Details
Main Authors: Smolander, Johannes, Junttila, Sini, Venäläinen, Mikko S, Elo, Laura L
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8150131/
https://www.ncbi.nlm.nih.gov/pubmed/33151294
http://dx.doi.org/10.1093/bioinformatics/btaa919
author Smolander, Johannes
Junttila, Sini
Venäläinen, Mikko S
Elo, Laura L
collection PubMed
description MOTIVATION: Single-cell RNA-seq allows researchers to identify cell populations based on unsupervised clustering of the transcriptome. However, subpopulations can have only subtle transcriptomic differences and the high dimensionality of the data makes their identification challenging. RESULTS: We introduce ILoReg, an R package implementing a new cell population identification method that improves identification of cell populations with subtle differences through a probabilistic feature extraction step that is applied before clustering and visualization. The feature extraction is performed using a novel machine learning algorithm, called iterative clustering projection (ICP), that uses logistic regression and clustering similarity comparison to iteratively cluster data. Remarkably, ICP also manages to integrate feature selection with the clustering through L1-regularization, enabling the identification of genes that are differentially expressed between cell populations. By combining solutions of multiple ICP runs into a single consensus solution, ILoReg creates a representation that enables investigating cell populations with a high resolution. In particular, we show that the visualization of ILoReg allows segregation of immune and pancreatic cell populations in a more pronounced manner compared with current state-of-the-art methods. AVAILABILITY AND IMPLEMENTATION: ILoReg is available as an R package at https://bioconductor.org/packages/ILoReg. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
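The description above says that ICP relies on a "clustering similarity comparison" but does not name the measure. As an illustration only, the sketch below scores the agreement between two clusterings of the same cells with the adjusted Rand index (ARI). The choice of ARI and the function name `adjusted_rand_index` are assumptions made for this example and are not taken from this record or from the ILoReg package.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same items.

    Hypothetical helper for illustration; not part of ILoReg.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)

    # Contingency counts: items shared by cluster i of A and cluster j of B.
    pair_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    # Number of item pairs placed together in both, in A, and in B.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())
    total = comb(n, 2)

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = sum_a * sum_b / total if total else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both clusterings use a single cluster.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


if __name__ == "__main__":
    previous = [0, 0, 1, 1, 2, 2]
    current = [1, 1, 0, 0, 2, 2]  # same partition, different label names
    print(adjusted_rand_index(previous, current))
```

A score near 1 means the two partitions agree up to a relabeling of clusters, while a score near 0 means they agree no more than expected by chance. An iterative procedure could compare the clustering from one iteration with that of the next in this way, although whether ICP does exactly this is not stated in the record.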
format Online
Article
Text
id pubmed-8150131
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8150131 2021-05-28 ILoReg: a tool for high-resolution cell population identification from single-cell RNA-seq data. Smolander, Johannes; Junttila, Sini; Venäläinen, Mikko S; Elo, Laura L. Bioinformatics, Original Papers. Oxford University Press, 2020-12-13. /pmc/articles/PMC8150131/ /pubmed/33151294 http://dx.doi.org/10.1093/bioinformatics/btaa919 Text en © The Author(s) 2020. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title ILoReg: a tool for high-resolution cell population identification from single-cell RNA-seq data
topic Original Papers