A robust semi-supervised NMF model for single cell RNA-seq data
Main authors: | Wu, Peng; An, Mo; Zou, Hai-Ren; Zhong, Cai-Ying; Wang, Wei; Wu, Chang-Peng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc., 2020 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7571410/ https://www.ncbi.nlm.nih.gov/pubmed/33088619 http://dx.doi.org/10.7717/peerj.10091 |
_version_ | 1783597166700265472 |
---|---|
author | Wu, Peng; An, Mo; Zou, Hai-Ren; Zhong, Cai-Ying; Wang, Wei; Wu, Chang-Peng |
collection | PubMed |
description | BACKGROUND: Single-cell RNA sequencing (scRNA-seq) is a powerful tool for studying organisms at single-cell resolution and for exploring the heterogeneity between cells. Clustering is a fundamental step in scRNA-seq data analysis: it is key to understanding cell function and forms the basis of other advanced analyses. Nonnegative Matrix Factorization (NMF) has been widely used in clustering analysis of transcriptome data and has achieved good performance. However, existing NMF models are unsupervised and ignore known gene functions during clustering. A large body of knowledge about cell marker genes (genes expressed only in specific cell types) in humans and model organisms has accumulated in resources such as the Molecular Signatures Database (MSigDB), and it can be used as prior information in the clustering analysis of scRNA-seq data. Because cells of the same type are likely to share biological functions and specific gene expression patterns, marker genes can serve as prior knowledge for clustering. METHODS: We propose a robust semi-supervised NMF (rssNMF) model, which introduces a new variable to absorb noise in the data and incorporates marker genes as prior information through a graph regularization term. We use rssNMF to cluster scRNA-seq data. RESULTS: Twelve scRNA-seq datasets with true labels were used to test the model, and the results show that our model outperforms the original NMF and other common methods such as k-means and hierarchical clustering. Biological significance analysis shows that rssNMF can identify key subclasses and latent biological processes. To our knowledge, this is the first method that incorporates prior knowledge into the clustering analysis of scRNA-seq data. |
format | Online Article Text |
id | pubmed-7571410 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7571410 2020-10-20. PeerJ Inc., 2020-10-16. /pmc/articles/PMC7571410/ /pubmed/33088619 http://dx.doi.org/10.7717/peerj.10091 Text en ©2020 Wu et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
title | A robust semi-supervised NMF model for single cell RNA-seq data |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7571410/ https://www.ncbi.nlm.nih.gov/pubmed/33088619 http://dx.doi.org/10.7717/peerj.10091 |
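
The abstract describes rssNMF only at a high level. It combines an NMF factorization, an extra variable that absorbs noise, and a graph regularization term built from marker genes. The sketch below shows one way these three parts can fit together. It is not the authors' implementation, and every modelling choice in it is our own assumption rather than something stated in this record:

- a genes x cells matrix X is modelled as X ≈ WH + S;
- the noise matrix S carries an L1 penalty;
- a GNMF-style term tr(H L H^T) is placed on a cell-cell graph;
- cells are linked when their strongest marker set agrees.

The names `marker_affinity` and `rssnmf_sketch`, and the parameters `lam` and `mu`, are hypothetical.

```python
import numpy as np


def marker_affinity(X, markers):
    """Hypothetical prior graph from marker genes (an assumption, not the paper's rule).

    X: non-negative genes x cells matrix. markers: one list of gene indices per
    known cell type. Two cells are linked (weight 1) when the marker set with the
    highest mean expression is the same for both; the diagonal is zero.
    """
    scores = np.stack([X[idx, :].mean(axis=0) for idx in markers])  # types x cells
    best = scores.argmax(axis=0)
    A = (best[:, None] == best[None, :]).astype(float)
    np.fill_diagonal(A, 0.0)
    return A


def rssnmf_sketch(X, k, A, lam=0.1, mu=0.5, n_iter=300, seed=0):
    """Block-coordinate updates for an assumed objective

        0.5*||X - W H - S||_F^2 + (lam/2)*tr(H L H^T) + mu*||S||_1,  L = D - A,

    with W (genes x k) and H (k x cells) non-negative and S a sparse noise term.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + 1e-3
    H = rng.random((k, n)) + 1e-3
    S = np.zeros_like(X)
    d = A.sum(axis=1)  # node degrees, i.e. the diagonal of D
    eps = 1e-10
    for _ in range(n_iter):
        # Noise-corrected data. For non-negative X this is already >= 0, because
        # S only removes the part of a residual beyond mu; the clip guards round-off.
        Y = np.maximum(X - S, 0.0)
        # Multiplicative (Lee-Seung / GNMF-style) updates keep W and H non-negative.
        W *= (Y @ H.T) / (W @ (H @ H.T) + eps)
        H *= (W.T @ Y + lam * (H @ A)) / (W.T @ W @ H + lam * H * d + eps)
        # Exact S-step: soft-thresholding is the proximal map of mu*||S||_1.
        R = X - W @ H
        S = np.sign(R) * np.maximum(np.abs(R) - mu, 0.0)
    return W, H, S


# Synthetic toy data for illustration only: 2 cell types, 40 genes, 60 cells.
rng = np.random.default_rng(1)
truth = np.repeat([0, 1], 30)
X = rng.random((40, 60))            # background expression in [0, 1)
X[0:10, truth == 0] += 5.0          # genes 0-9: markers of type 0
X[10:20, truth == 1] += 5.0         # genes 10-19: markers of type 1
outliers = [(25, 3), (30, 40), (35, 12)]
for g, c in outliers:
    X[g, c] += 30.0                 # a few gross technical outliers

markers = [list(range(0, 10)), list(range(10, 20))]
A = marker_affinity(X, markers)
W, H, S = rssnmf_sketch(X, k=2, A=A)
labels = H.argmax(axis=0)           # cluster = dominant factor for each cell
```

Soft-thresholding makes S absorb only the part of each residual that exceeds `mu`. This pushes a few gross outliers into S instead of letting them distort W and H. The graph term pulls together the coefficient vectors of cells that the marker prior says belong to the same type. The toy matrix is synthetic and is not one of the twelve datasets used in the paper.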