A robust semi-supervised NMF model for single cell RNA-seq data

BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) technology is a powerful tool for studying organisms from a single-cell perspective and exploring the heterogeneity between cells. Clustering is a fundamental step in scRNA-seq data analysis; it is key to understanding cell function and constitutes the...

Bibliographic Details
Main Authors: Wu, Peng; An, Mo; Zou, Hai-Ren; Zhong, Cai-Ying; Wang, Wei; Wu, Chang-Peng
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7571410/
https://www.ncbi.nlm.nih.gov/pubmed/33088619
http://dx.doi.org/10.7717/peerj.10091
author Wu, Peng
An, Mo
Zou, Hai-Ren
Zhong, Cai-Ying
Wang, Wei
Wu, Chang-Peng
collection PubMed
description BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) technology is a powerful tool for studying organisms from a single-cell perspective and exploring the heterogeneity between cells. Clustering is a fundamental step in scRNA-seq data analysis; it is key to understanding cell function and forms the basis of other, more advanced analyses. Nonnegative Matrix Factorization (NMF) has been widely used in clustering analysis of transcriptome data and has achieved good performance. However, the existing NMF model is unsupervised and ignores known gene functions during clustering. Extensive knowledge of cell marker genes (genes expressed only in specific cells) in human and model organisms has accumulated in resources such as the Molecular Signatures Database (MSigDB), and this knowledge can be used as prior information in the clustering analysis of scRNA-seq data. Because cells of the same kind are likely to have similar biological functions and specific gene expression patterns, the marker genes of cells can be utilized as prior knowledge in the clustering analysis. METHODS: We propose a robust and semi-supervised NMF (rssNMF) model, which introduces a new variable to absorb noise in the data and incorporates marker genes as prior information into a graph regularization term. We use rssNMF to solve the clustering problem of scRNA-seq data. RESULTS: Twelve scRNA-seq datasets with true labels are used to test the model performance, and the results show that our model outperforms the original NMF and other common methods such as KMeans and Hierarchical Clustering. Biological significance analysis shows that rssNMF can identify key subclasses and latent biological processes. To our knowledge, this is the first method that incorporates prior knowledge into the clustering analysis of scRNA-seq data.
format Online
Article
Text
id pubmed-7571410
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7571410 2020-10-20 A robust semi-supervised NMF model for single cell RNA-seq data Wu, Peng; An, Mo; Zou, Hai-Ren; Zhong, Cai-Ying; Wang, Wei; Wu, Chang-Peng PeerJ Bioinformatics
PeerJ Inc. 2020-10-16 /pmc/articles/PMC7571410/ /pubmed/33088619 http://dx.doi.org/10.7717/peerj.10091 Text en ©2020 Wu et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title A robust semi-supervised NMF model for single cell RNA-seq data
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7571410/
https://www.ncbi.nlm.nih.gov/pubmed/33088619
http://dx.doi.org/10.7717/peerj.10091
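
The METHODS summary in the description field names two ingredients: an extra variable that absorbs noise, and a graph regularization term built from marker genes. This record does not give the objective function or the update rules, so the following is only a minimal sketch of one generic way to combine those ingredients, not the authors' rssNMF. It assumes the objective ||X - WH - S||_F^2 + lam * tr(H L H^T) + beta * ||S||_1 with L = D - A. The function names, the cosine-similarity graph over marker-set scores, the clipping step, and the parameters `lam` and `beta` are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(R, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(R) * np.maximum(np.abs(R) - t, 0.0)


def marker_graph(X, marker_sets):
    """Cell-cell affinity from marker-gene scores (hypothetical construction).

    X is a nonnegative genes x cells matrix; marker_sets is a list of
    gene-index lists, one list per assumed cell type.
    """
    # Mean expression of each marker set in each cell: sets x cells.
    scores = np.vstack([X[idx].mean(axis=0) for idx in marker_sets])
    unit = scores / (np.linalg.norm(scores, axis=0, keepdims=True) + 1e-12)
    A = unit.T @ unit  # cosine similarity, nonnegative because scores >= 0
    np.fill_diagonal(A, 0.0)
    return A


def robust_graph_nmf(X, A, k, lam=1.0, beta=1.0, n_iter=200, seed=0, eps=1e-10):
    """Sketch of a noise-absorbing, graph-regularized NMF (assumed objective)."""
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, k))
    H = rng.random((k, n_cells))
    S = np.zeros_like(X, dtype=float)
    D = np.diag(A.sum(axis=1))  # degree matrix of the cell graph
    for _ in range(n_iter):
        # Noise variable: sparse residual absorbed by soft-thresholding.
        S = soft_threshold(X - W @ H, beta / 2.0)
        # Simplification: clip so the multiplicative updates stay nonnegative.
        Xc = np.maximum(X - S, 0.0)
        W *= (Xc @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ Xc + lam * H @ A) / (W.T @ W @ H + lam * H @ D + eps)
    return W, H, S


# Toy usage on synthetic data with two blocks of cells.
rng = np.random.default_rng(1)
X = rng.random((20, 30))
X[:10, :15] += 3.0   # genes 0-9 are high in cells 0-14
X[10:, 15:] += 3.0   # genes 10-19 are high in cells 15-29
A = marker_graph(X, [list(range(10)), list(range(10, 20))])
W, H, S = robust_graph_nmf(X, A, k=2)
labels = H.argmax(axis=0)  # cluster of each cell = dominant factor
```

Reading the cluster of each cell off the largest entry in its column of H is a common convention for NMF-based clustering; the record does not say how the paper assigns labels, so that step is also an assumption.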