Scalable preprocessing for sparse scRNA-seq data exploiting prior knowledge

MOTIVATION: Single cell RNA-seq (scRNA-seq) data contains a wealth of information which has to be inferred computationally from the observed sequencing reads. As the ability to sequence more cells improves rapidly, existing computational tools suffer from three problems. (i) The decreased reads-per-...

Bibliographic Details
Main authors:	Mukherjee, Sumit; Zhang, Yue; Fan, Joshua; Seelig, Georg; Kannan, Sreeram
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2018
Subjects:	ISMB 2018–Intelligent Systems for Molecular Biology Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022691/ https://www.ncbi.nlm.nih.gov/pubmed/29949988 http://dx.doi.org/10.1093/bioinformatics/bty293

collection	PubMed
description	MOTIVATION: Single cell RNA-seq (scRNA-seq) data contains a wealth of information which has to be inferred computationally from the observed sequencing reads. As the ability to sequence more cells improves rapidly, existing computational tools suffer from three problems. (i) The decreased reads-per-cell implies a highly sparse sample of the true cellular transcriptome. (ii) Many tools simply cannot handle the size of the resulting datasets. (iii) Prior biological knowledge such as bulk RNA-seq information of certain cell types or qualitative marker information is not taken into account. Here we present UNCURL, a preprocessing framework based on non-negative matrix factorization for scRNA-seq data, that is able to handle varying sampling distributions, scales to very large cell numbers and can incorporate prior knowledge. RESULTS: We find that preprocessing using UNCURL consistently improves performance of commonly used scRNA-seq tools for clustering, visualization and lineage estimation, both in the absence and presence of prior knowledge. Finally we demonstrate that UNCURL is extremely scalable and parallelizable, and runs faster than other methods on a scRNA-seq dataset containing 1.3 million cells. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/yjzhang/uncurl_python. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-6022691
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-6022691 2018-07-05 Bioinformatics Oxford University Press 2018-07-01 2018-06-27 Text en © The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
