scSensitiveGeneDefine: sensitive gene detection in single-cell RNA sequencing data by Shannon entropy

Bibliographic Details
Main authors: Chen, Zechuan; Yang, Zeruo; Yuan, Xiaojun; Zhang, Xiaoming; Hao, Pei
Format: Online, Article, Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8063398/
https://www.ncbi.nlm.nih.gov/pubmed/33888056
http://dx.doi.org/10.1186/s12859-021-04136-1
author Chen, Zechuan
Yang, Zeruo
Yuan, Xiaojun
Zhang, Xiaoming
Hao, Pei
collection PubMed
description BACKGROUND: Single-cell RNA sequencing (scRNA-seq) is the most widely used technique to obtain gene expression profiles from complex tissues. Cell subsets and developmental states are often identified via differential gene expression patterns. Most single-cell tools use highly variable genes to annotate cell subsets and states. However, we have discovered that a group of genes that respond sensitively to environmental stimuli and have high coefficients of variation (CV) may exert an overwhelming influence on cell type annotation. RESULT: In this research, we developed a method based on CV rank and Shannon entropy to identify these noise genes, which we term “sensitive genes”. To validate the reliability of our method, we applied our tools to 11 single-cell data sets from different human tissues. The results showed that most of the sensitive genes were enriched in pathways related to the cellular stress response. Furthermore, after removing the sensitive genes detected by our tools, the unsupervised clustering results were closer to the ground-truth cell labels. CONCLUSION: Our study revealed the prevalence of stochastic gene expression patterns in most cell types, compared cell marker genes, housekeeping genes (HK genes), and sensitive genes, demonstrated functional similarities of sensitive genes across various scRNA-seq data sets, and brought unsupervised clustering results closer to the ground-truth labels. We hope our method will provide new insights into reducing data noise in scRNA-seq analysis and contribute to the development of better unsupervised clustering algorithms for scRNA-seq data in the future. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04136-1.
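The abstract names two quantities, a per-gene coefficient of variation ranked across genes and Shannon entropy, without giving formulas. The sketch below computes both in plain Python under illustrative assumptions that are ours, not the authors': CV is taken as the population standard deviation divided by the mean of a gene's values across cells, and entropy is computed after normalizing those values into a probability distribution. The function names (`coefficient_of_variation`, `cv_ranks`, `shannon_entropy`) and the toy gene names are hypothetical. How scSensitiveGeneDefine combines the two into a "sensitive gene" call is not described in this record, so the sketch stops at computing the ingredients.

```python
import math
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV of one gene across cells: population std / mean (0.0 if the mean is 0).

    Assumption for illustration; the paper's exact CV definition is not given here.
    """
    mu = mean(values)
    if mu == 0:
        return 0.0
    return pstdev(values) / mu


def cv_ranks(expr):
    """Rank genes by CV, highest first, so rank 1 is the most variable gene.

    `expr` (hypothetical layout) maps a gene name to its expression values,
    one value per cell. Ties keep the input order because Python's sort is stable.
    """
    cvs = {gene: coefficient_of_variation(v) for gene, v in expr.items()}
    ordered = sorted(cvs, key=cvs.get, reverse=True)
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def shannon_entropy(values):
    """Shannon entropy in bits of a gene's non-negative values across cells,
    after normalizing them to sum to 1. Returns 0.0 for an all-zero gene."""
    total = sum(values)
    if total == 0:
        return 0.0
    h = 0.0
    for x in values:
        if x > 0:  # terms with p = 0 contribute 0 by convention
            p = x / total
            h -= p * math.log2(p)
    return h


if __name__ == "__main__":
    # Hypothetical toy matrix: 3 genes x 4 cells.
    expr = {
        "GENE_A": [5, 5, 5, 5],   # flat across cells
        "GENE_B": [0, 0, 0, 20],  # spike in a single cell
        "GENE_C": [1, 2, 3, 4],   # gradual trend
    }
    ranks = cv_ranks(expr)
    for gene, values in expr.items():
        print(gene, ranks[gene],
              round(coefficient_of_variation(values), 3),
              round(shannon_entropy(values), 3))
```

On this toy input the flat gene has CV 0 and the maximal entropy for four cells (2 bits), while the single-cell spike has the highest CV and zero entropy; this only illustrates how the two measures behave, not which genes the published tool would flag.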
format Online
Article
Text
id pubmed-8063398
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
journal BMC Bioinformatics (Research), BioMed Central, 2021-04-22
rights © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research