Network-Based Single-Cell RNA-Seq Data Imputation Enhances Cell Type Identification

Single-cell RNA sequencing is a powerful technology for obtaining transcriptomes at single-cell resolutions. However, it suffers from dropout events (i.e., excess zero counts) since only a small fraction of transcripts get sequenced in each cell during the sequencing process. This inherent sparsity...

Bibliographic Details
Main authors: Zand, Maryam; Ruan, Jianhua
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7230610/
https://www.ncbi.nlm.nih.gov/pubmed/32244427
http://dx.doi.org/10.3390/genes11040377
_version_ 1783534994794217472
author Zand, Maryam
Ruan, Jianhua
author_facet Zand, Maryam
Ruan, Jianhua
author_sort Zand, Maryam
collection PubMed
description Single-cell RNA sequencing is a powerful technology for obtaining transcriptomes at single-cell resolutions. However, it suffers from dropout events (i.e., excess zero counts) since only a small fraction of transcripts get sequenced in each cell during the sequencing process. This inherent sparsity of expression profiles hinders further characterizations at cell/gene-level such as cell type identification and downstream analysis. To alleviate this dropout issue we introduce a network-based method, netImpute, by leveraging the hidden information in gene co-expression networks to recover real signals. netImpute employs Random Walk with Restart (RWR) to adjust the gene expression level in a given cell by borrowing information from its neighbors in a gene co-expression network. Performance evaluation and comparison with existing tools on simulated data and seven real datasets show that netImpute substantially enhances clustering accuracy and data visualization clarity, thanks to its effective treatment of dropouts. While the idea of netImpute is general and can be applied with other types of networks such as cell co-expression network or protein–protein interaction (PPI) network, evaluation results show that gene co-expression network is consistently more beneficial, presumably because PPI network usually lacks cell type context, while cell co-expression network can cause information loss for rare cell types. Evaluation results on several biological datasets show that netImpute can more effectively recover missing transcripts in scRNA-seq data and enhance the identification and visualization of heterogeneous cell types than existing methods.
format Online
Article
Text
id pubmed-7230610
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-72306102020-05-22 Network-Based Single-Cell RNA-Seq Data Imputation Enhances Cell Type Identification Zand, Maryam Ruan, Jianhua Genes (Basel) Article Single-cell RNA sequencing is a powerful technology for obtaining transcriptomes at single-cell resolutions. However, it suffers from dropout events (i.e., excess zero counts) since only a small fraction of transcripts get sequenced in each cell during the sequencing process. This inherent sparsity of expression profiles hinders further characterizations at cell/gene-level such as cell type identification and downstream analysis. To alleviate this dropout issue we introduce a network-based method, netImpute, by leveraging the hidden information in gene co-expression networks to recover real signals. netImpute employs Random Walk with Restart (RWR) to adjust the gene expression level in a given cell by borrowing information from its neighbors in a gene co-expression network. Performance evaluation and comparison with existing tools on simulated data and seven real datasets show that netImpute substantially enhances clustering accuracy and data visualization clarity, thanks to its effective treatment of dropouts. While the idea of netImpute is general and can be applied with other types of networks such as cell co-expression network or protein–protein interaction (PPI) network, evaluation results show that gene co-expression network is consistently more beneficial, presumably because PPI network usually lacks cell type context, while cell co-expression network can cause information loss for rare cell types. Evaluation results on several biological datasets show that netImpute can more effectively recover missing transcripts in scRNA-seq data and enhance the identification and visualization of heterogeneous cell types than existing methods. MDPI 2020-03-31 /pmc/articles/PMC7230610/ /pubmed/32244427 http://dx.doi.org/10.3390/genes11040377 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. 
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Zand, Maryam
Ruan, Jianhua
Network-Based Single-Cell RNA-Seq Data Imputation Enhances Cell Type Identification
title Network-Based Single-Cell RNA-Seq Data Imputation Enhances Cell Type Identification
title_sort network-based single-cell rna-seq data imputation enhances cell type identification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7230610/
https://www.ncbi.nlm.nih.gov/pubmed/32244427
http://dx.doi.org/10.3390/genes11040377
work_keys_str_mv AT zandmaryam networkbasedsinglecellrnaseqdataimputationenhancescelltypeidentification
AT ruanjianhua networkbasedsinglecellrnaseqdataimputationenhancescelltypeidentification
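
The description above says that netImpute uses Random Walk with Restart (RWR) to adjust a gene's expression level in a cell by borrowing information from its neighbors in a gene co-expression network. The following is a minimal sketch of that general idea, not the authors' implementation. The function names, the Pearson-correlation cutoff (`threshold`), the restart probability (`restart`), and the convergence settings are all assumptions chosen for illustration; the record does not state how netImpute builds its network or sets its parameters.

```python
import numpy as np


def coexpression_transition(X, threshold=0.5):
    """Build a column-normalized gene-gene transition matrix.

    X is a genes x cells expression matrix. Two genes are linked when their
    Pearson correlation across cells is at least `threshold` (an assumed
    cutoff, not a value taken from the paper).
    """
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(X)
    corr = np.nan_to_num(corr)       # constant genes yield NaN correlations
    np.fill_diagonal(corr, 0.0)      # no self-edges for connected genes
    adj = np.where(corr >= threshold, corr, 0.0)

    # Genes with no neighbors get a self-loop so their values are preserved.
    idx = np.flatnonzero(adj.sum(axis=0) == 0)
    adj[idx, idx] = 1.0

    return adj / adj.sum(axis=0, keepdims=True)


def rwr_impute(X, restart=0.5, threshold=0.5, tol=1e-8, max_iter=200):
    """Smooth each cell's expression profile over the gene network with RWR.

    Iterates F <- (1 - restart) * W @ F + restart * X until the largest
    change between iterations drops below `tol`.
    """
    X = np.asarray(X, dtype=float)
    W = coexpression_transition(X, threshold)
    F = X.copy()
    for _ in range(max_iter):
        F_next = (1.0 - restart) * (W @ F) + restart * X
        if np.abs(F_next - F).max() < tol:
            return F_next
        F = F_next
    return F


if __name__ == "__main__":
    # Toy data: genes 0 and 1 are co-expressed; gene 0 has a zero in cell 2
    # that looks like a dropout. Gene 2 is unrelated to both.
    X_demo = np.array([
        [5, 4, 0, 6, 0, 0],
        [5, 4, 5, 6, 0, 0],
        [0, 0, 0, 0, 3, 4],
    ], dtype=float)
    print(np.round(rwr_impute(X_demo), 3))
```

In this sketch a larger `restart` keeps the output closer to the observed counts, while a smaller value lets more signal flow in from neighboring genes. Swapping the correlation-based matrix for a cell-cell or PPI adjacency matrix would correspond to the alternative networks mentioned in the description.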