
MISC: missing imputation for single-cell RNA sequencing data

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) technology provides an effective way to study cell heterogeneity. However, due to low capture efficiency and stochastic gene expression, scRNA-seq data often contains a high percentage of missing values. It has been shown that the missing rate...


Bibliographic Details
Main Authors:	Yang, Mary Qu, Weissman, Sherman M., Yang, William, Zhang, Jialing, Canaann, Allon, Guan, Renchu
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6293493/ https://www.ncbi.nlm.nih.gov/pubmed/30547798 http://dx.doi.org/10.1186/s12918-018-0638-y

_version_	1783380544087654400
author	Yang, Mary Qu Weissman, Sherman M. Yang, William Zhang, Jialing Canaann, Allon Guan, Renchu
author_sort	Yang, Mary Qu
collection	PubMed
description	BACKGROUND: Single-cell RNA sequencing (scRNA-seq) technology provides an effective way to study cell heterogeneity. However, due to low capture efficiency and stochastic gene expression, scRNA-seq data often contains a high percentage of missing values. It has been shown that the missing rate can reach approximately 30% even after noise reduction. To accurately recover missing values in scRNA-seq data, we need to know where the missing data is, how much data is missing, and what the values of these data are. METHODS: To solve these three problems, we propose a novel model with a hybrid machine learning method, namely, missing imputation for single-cell RNA-seq (MISC). To solve the first problem, we transformed it into a binary classification problem on the RNA-seq expression matrix. Then, for the second problem, we searched for the intersection of the classification results, the zero-inflated model results, and the false negative model results. Finally, we used a regression model to recover the data in the missing elements. RESULTS: We compared the raw data without imputation, the mean-smooth neighbor cell trajectory, and MISC on chronic myeloid leukemia (CML) data, the primary somatosensory cortex, and the hippocampal CA1 region of mouse brain cells. On the CML data, MISC discovered a trajectory branch from CP-CML to BC-CML, which provides direct evidence of evolution from CP to BC stem cells. On the mouse brain data, MISC clearly divides the pyramidal CA1 cells into different branches, which is direct evidence of subpopulations within pyramidal CA1. Meanwhile, with MISC, the oligodendrocyte cells became an independent group with an apparent boundary. CONCLUSIONS: Our results showed that the MISC model improved cell type classification and could be instrumental in studying cellular heterogeneity. Overall, MISC is a robust missing data imputation model for single-cell RNA-seq data.
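The three steps named in the METHODS part of the abstract (locate candidate missing entries, intersect several detectors, then fill by regression) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the classifier, the zero-inflated model, the false negative model, or the regression model, so the function names `intersect_masks` and `impute_by_regression`, the use of ordinary least squares, and the cells-by-genes layout are all assumptions made for illustration. The boolean masks are taken as given inputs from whatever detectors are used.

```python
import numpy as np


def intersect_masks(*masks):
    """Keep only the entries that every detector flags as missing.

    Each mask is a boolean array with the same shape as the expression
    matrix. How the masks are produced is outside this sketch.
    """
    return np.logical_and.reduce([np.asarray(m, dtype=bool) for m in masks])


def impute_by_regression(X, missing):
    """Fill flagged entries of a cells x genes matrix, one gene at a time.

    Assumption: ordinary least squares stands in for the unspecified
    regression model. For each gene with flagged entries, the fit uses the
    cells where that gene is not flagged, with all other genes plus an
    intercept as predictors. Predictor values are used as they are, even if
    some of them are themselves flagged.
    """
    X = np.asarray(X, dtype=float)
    missing = np.asarray(missing, dtype=bool)
    filled = X.copy()
    for g in range(X.shape[1]):
        rows = missing[:, g]
        if not rows.any() or rows.all():
            continue  # nothing to fill, or no observed cells to learn from
        others = np.delete(X, g, axis=1)
        A = np.column_stack([others, np.ones(X.shape[0])])
        coef, *_ = np.linalg.lstsq(A[~rows], X[~rows, g], rcond=None)
        # Expression values cannot be negative, so clip predictions at zero.
        filled[rows, g] = np.clip(A[rows] @ coef, 0.0, None)
    return filled
```

Taking the intersection is the conservative choice: an entry is imputed only when all detectors agree it is missing, so zeros that any one detector considers genuine are left untouched.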
format	Online Article Text
id	pubmed-6293493
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6293493 2018-12-17 MISC: missing imputation for single-cell RNA sequencing data Yang, Mary Qu Weissman, Sherman M. Yang, William Zhang, Jialing Canaann, Allon Guan, Renchu BMC Syst Biol Research BioMed Central 2018-12-14 /pmc/articles/PMC6293493/ /pubmed/30547798 http://dx.doi.org/10.1186/s12918-018-0638-y Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	MISC: missing imputation for single-cell RNA sequencing data
topic	Research
