An active learning approach for clustering single-cell RNA-seq data

Single-cell RNA sequencing (scRNA-seq) data have been widely used to profile cellular heterogeneity at high resolution. Clustering is a crucial step of scRNA-seq data analysis because it offers a way to identify previously undiscovered cell types. Most methods for clu...

Bibliographic Details
Main authors: Lin, Xiang, Liu, Haoran, Wei, Zhi, Roy, Senjuti Basu, Gao, Nan
Format: Online Article Text
Language: English
Published: 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8742847/
https://www.ncbi.nlm.nih.gov/pubmed/34244616
http://dx.doi.org/10.1038/s41374-021-00639-w
author Lin, Xiang
Liu, Haoran
Wei, Zhi
Roy, Senjuti Basu
Gao, Nan
collection PubMed
description Single-cell RNA sequencing (scRNA-seq) data have been widely used to profile cellular heterogeneity at high resolution. Clustering is a crucial step of scRNA-seq data analysis because it offers a way to identify previously undiscovered cell types. Most methods for clustering scRNA-seq data use an unsupervised learning strategy. Because the clustering step is separated from the cell annotation and labeling step, it is not uncommon for the result to be an unfamiliar clustering with poor biological interpretability, which biologists generally do not want. To solve this problem, we proposed an active learning (AL) framework for clustering scRNA-seq data. The AL model employs a learning algorithm that can actively query the biologist for labels, so manual labeling is expected to be needed for only a subset of cells. To develop an optimal active learning approach, we explored several key parameters of the AL model in experiments with four real scRNA-seq datasets. We demonstrate that the proposed AL model outperformed state-of-the-art unsupervised clustering methods with fewer than 1000 labeled cells. Therefore, we conclude that the AL model is a promising tool for clustering scRNA-seq data that achieves superior performance effectively and efficiently.
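The query loop described in the abstract can be illustrated with a minimal pool-based active learning sketch. This is not the authors' implementation: the record does not say which classifier or query strategy the paper uses, so the nearest-centroid model, the margin-based uncertainty rule, the synthetic two-dimensional "cells", and every function name below are assumptions chosen only to keep the example self-contained. The `oracle` callable stands in for the biologist who supplies a label on request.

```python
import math
import random


def make_cells(n_per_type=50, seed=0):
    """Synthetic 2-D 'cells' from three Gaussian blobs (a stand-in for expression profiles)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
    cells, types = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_type):
            cells.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            types.append(label)
    return cells, types


def centroids(cells, labeled):
    """Mean position of the labeled cells of each type; `labeled` maps cell index -> label."""
    sums = {}
    for i, lab in labeled.items():
        sx, sy, n = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + cells[i][0], sy + cells[i][1], n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}


def predict(cell, cents):
    """Return (nearest label, margin); a small margin means the model is unsure."""
    dists = sorted((math.dist(cell, c), lab) for lab, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin


def active_learning(cells, oracle, n_seed=3, budget=20, seed=0):
    """Label a few random cells, then repeatedly query the least certain unlabeled cell."""
    rng = random.Random(seed)
    labeled = {i: oracle(i) for i in rng.sample(range(len(cells)), n_seed)}
    for _ in range(budget):
        cents = centroids(cells, labeled)
        pool = [i for i in range(len(cells)) if i not in labeled]
        rng.shuffle(pool)  # break ties between equally uncertain cells at random
        query = min(pool, key=lambda i: predict(cells[i], cents)[1])
        labeled[query] = oracle(query)  # ask the "biologist" for this one label
    cents = centroids(cells, labeled)
    return [predict(c, cents)[0] for c in cells], labeled


if __name__ == "__main__":
    cells, types = make_cells()
    pred, labeled = active_learning(cells, oracle=lambda i: types[i])
    accuracy = sum(p == t for p, t in zip(pred, types)) / len(types)
    print(f"labeled {len(labeled)} of {len(cells)} cells, accuracy {accuracy:.2f}")
```

Only the queried cells receive manual labels; every other cell is assigned the type of its nearest labeled centroid. Margin-based sampling is one common query strategy among several, and it can keep sampling a known boundary while missing a cell type that has not yet been labeled.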
format Online
Article
Text
id pubmed-8742847
institution National Center for Biotechnology Information
language English
publishDate 2022
record_format MEDLINE/PubMed
spelling pubmed-8742847 2022-02-23 An active learning approach for clustering single-cell RNA-seq data Lin, Xiang Liu, Haoran Wei, Zhi Roy, Senjuti Basu Gao, Nan Lab Invest Article 2022-03 2021-07-09 /pmc/articles/PMC8742847/ /pubmed/34244616 http://dx.doi.org/10.1038/s41374-021-00639-w Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms
title An active learning approach for clustering single-cell RNA-seq data
topic Article