Dimensionality reduction for single cell RNA sequencing data using constrained robust non-negative matrix factorization

Single cell RNA-sequencing (scRNA-seq) technology, a powerful tool for analyzing the entire transcriptome at single cell level, is receiving increasing research attention. The presence of dropouts is an important characteristic of scRNA-seq data that may affect the performance of downstream analyses...

Bibliographic Details
Main Authors: Zhang, Shuqin, Yang, Liu, Yang, Jinwen, Lin, Zhixiang, Ng, Michael K
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671375/
https://www.ncbi.nlm.nih.gov/pubmed/33575614
http://dx.doi.org/10.1093/nargab/lqaa064
_version_ 1783610917715443712
author Zhang, Shuqin
Yang, Liu
Yang, Jinwen
Lin, Zhixiang
Ng, Michael K
author_facet Zhang, Shuqin
Yang, Liu
Yang, Jinwen
Lin, Zhixiang
Ng, Michael K
author_sort Zhang, Shuqin
collection PubMed
description Single cell RNA-sequencing (scRNA-seq) technology, a powerful tool for analyzing the entire transcriptome at single cell level, is receiving increasing research attention. The presence of dropouts is an important characteristic of scRNA-seq data that may affect the performance of downstream analyses, such as dimensionality reduction and clustering. Cells sequenced to lower depths tend to have more dropouts than those sequenced to greater depths. In this study, we aimed to develop a dimensionality reduction method to address both dropouts and the non-negativity constraints in scRNA-seq data. The developed method simultaneously performs dimensionality reduction and dropout imputation under the non-negative matrix factorization (NMF) framework. The dropouts were modeled as a non-negative sparse matrix. Summation of the observed data matrix and dropout matrix was approximated by NMF. To ensure the sparsity pattern was maintained, a weighted ℓ₁ penalty that took into account the dependency of dropouts on the sequencing depth in each cell was imposed. An efficient algorithm was developed to solve the proposed optimization problem. Experiments using both synthetic data and real data showed that dimensionality reduction via the proposed method afforded more robust clustering results compared with those obtained from the existing methods, and that dropout imputation improved the differential expression analysis.
format Online
Article
Text
id pubmed-7671375
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-76713752021-02-10 Dimensionality reduction for single cell RNA sequencing data using constrained robust non-negative matrix factorization Zhang, Shuqin Yang, Liu Yang, Jinwen Lin, Zhixiang Ng, Michael K NAR Genom Bioinform Standard Article Single cell RNA-sequencing (scRNA-seq) technology, a powerful tool for analyzing the entire transcriptome at single cell level, is receiving increasing research attention. The presence of dropouts is an important characteristic of scRNA-seq data that may affect the performance of downstream analyses, such as dimensionality reduction and clustering. Cells sequenced to lower depths tend to have more dropouts than those sequenced to greater depths. In this study, we aimed to develop a dimensionality reduction method to address both dropouts and the non-negativity constraints in scRNA-seq data. The developed method simultaneously performs dimensionality reduction and dropout imputation under the non-negative matrix factorization (NMF) framework. The dropouts were modeled as a non-negative sparse matrix. Summation of the observed data matrix and dropout matrix was approximated by NMF. To ensure the sparsity pattern was maintained, a weighted ℓ₁ penalty that took into account the dependency of dropouts on the sequencing depth in each cell was imposed. An efficient algorithm was developed to solve the proposed optimization problem. Experiments using both synthetic data and real data showed that dimensionality reduction via the proposed method afforded more robust clustering results compared with those obtained from the existing methods, and that dropout imputation improved the differential expression analysis. Oxford University Press 2020-08-28 /pmc/articles/PMC7671375/ /pubmed/33575614 http://dx.doi.org/10.1093/nargab/lqaa064 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Standard Article
Zhang, Shuqin
Yang, Liu
Yang, Jinwen
Lin, Zhixiang
Ng, Michael K
Dimensionality reduction for single cell RNA sequencing data using constrained robust non-negative matrix factorization
title Dimensionality reduction for single cell RNA sequencing data using constrained robust non-negative matrix factorization
title_full Dimensionality reduction for single cell RNA sequencing data using constrained robust non-negative matrix factorization
title_fullStr Dimensionality reduction for single cell RNA sequencing data using constrained robust non-negative matrix factorization
title_full_unstemmed Dimensionality reduction for single cell RNA sequencing data using constrained robust non-negative matrix factorization
title_short Dimensionality reduction for single cell RNA sequencing data using constrained robust non-negative matrix factorization
title_sort dimensionality reduction for single cell rna sequencing data using constrained robust non-negative matrix factorization
topic Standard Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671375/
https://www.ncbi.nlm.nih.gov/pubmed/33575614
http://dx.doi.org/10.1093/nargab/lqaa064
work_keys_str_mv AT zhangshuqin dimensionalityreductionforsinglecellrnasequencingdatausingconstrainedrobustnonnegativematrixfactorization
AT yangliu dimensionalityreductionforsinglecellrnasequencingdatausingconstrainedrobustnonnegativematrixfactorization
AT yangjinwen dimensionalityreductionforsinglecellrnasequencingdatausingconstrainedrobustnonnegativematrixfactorization
AT linzhixiang dimensionalityreductionforsinglecellrnasequencingdatausingconstrainedrobustnonnegativematrixfactorization
AT ngmichaelk dimensionalityreductionforsinglecellrnasequencingdatausingconstrainedrobustnonnegativematrixfactorization
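
The description field of this record outlines the approach in words: a non-negative sparse dropout matrix is added to the observed data, the sum is approximated by NMF, and a depth-dependent weighted ℓ₁ penalty is placed on the dropout matrix. The following is only a minimal illustrative sketch of that idea, not the authors' algorithm. The function name `nmf_with_dropouts`, the objective ||X + S - WH||_F^2 + lam * sum_j w_j * ||S[:, j]||_1, the depth-proportional weights, the multiplicative updates, and the restriction of S to zero entries of X are all assumptions made for illustration.

```python
import numpy as np

def nmf_with_dropouts(X, rank, lam=1.0, n_iter=200, seed=0, eps=1e-10):
    """Illustrative sketch only (hypothetical, not the published method).

    X is a non-negative genes-by-cells matrix. Returns W, H and a
    non-negative sparse matrix S such that X + S is approximated by W @ H.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_cells)) + eps
    S = np.zeros((n_genes, n_cells))

    # Assumption: dropouts can only occur where the observed count is zero.
    zero_mask = (X == 0)

    # Assumption: per-cell penalty weights grow with sequencing depth, so
    # shallow cells (expected to have more dropouts) are penalized less.
    depth = X.sum(axis=0)
    weights = depth / (depth.mean() + eps)

    for _ in range(n_iter):
        Y = X + S  # current "completed" data matrix
        # Standard multiplicative NMF updates for the Frobenius loss.
        H *= (W.T @ Y) / (W.T @ W @ H + eps)
        W *= (Y @ H.T) / (W @ (H @ H.T) + eps)
        # Entry-wise closed-form minimizer of the penalized loss over S >= 0.
        S = np.maximum(W @ H - X - 0.5 * lam * weights[None, :], 0.0)
        S *= zero_mask
    return W, H, S

# Small synthetic usage example with artificially zeroed entries.
rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ rng.random((3, 15))
X[rng.random(X.shape) < 0.3] = 0.0
W, H, S = nmf_with_dropouts(X, rank=3, lam=0.1, n_iter=50)
```

In this sketch the columns of `H` would serve as the low-dimensional cell representation and `X + S` as the imputed matrix; the actual weighting scheme and solver in the article may differ.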