scSensitiveGeneDefine: sensitive gene detection in single-cell RNA sequencing data by Shannon entropy
Main authors: | Chen, Zechuan; Yang, Zeruo; Yuan, Xiaojun; Zhang, Xiaoming; Hao, Pei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2021 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8063398/ https://www.ncbi.nlm.nih.gov/pubmed/33888056 http://dx.doi.org/10.1186/s12859-021-04136-1 |
author | Chen, Zechuan; Yang, Zeruo; Yuan, Xiaojun; Zhang, Xiaoming; Hao, Pei |
---|---|
collection | PubMed |
description | BACKGROUND: Single-cell RNA sequencing (scRNA-seq) is the most widely used technique to obtain gene expression profiles from complex tissues. Cell subsets and developmental states are often identified via differential gene expression patterns. Most single-cell tools use highly variable genes to annotate cell subsets and states. However, we have discovered that a group of genes that respond sensitively to environmental stimuli, with high coefficients of variation (CV), might exert an overwhelming influence on cell type annotation. RESULT: In this research, we developed a method, based on the CV rank and Shannon entropy, to identify these noise genes, which we termed “sensitive genes”. To validate the reliability of our method, we applied our tools to 11 single-cell data sets from different human tissues. The results showed that most of the sensitive genes were enriched in pathways related to cellular stress response. Furthermore, after removing the sensitive genes detected by our tools, the unsupervised clustering results were closer to the ground-truth cell labels. CONCLUSION: Our study revealed the prevalence of stochastic gene expression patterns in most types of cells, compared the differences among cell marker genes, housekeeping genes (HK genes), and sensitive genes, demonstrated the functional similarities of sensitive genes across various scRNA-seq data sets, and improved the results of unsupervised clustering towards the ground-truth labels. We hope our method will provide new insights into the reduction of data noise in scRNA-seq data analysis and contribute to the development of better unsupervised clustering algorithms for scRNA-seq in the future. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04136-1. |
format | Online Article Text |
id | pubmed-8063398 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics (Research), published online 2021-04-22 |
rights | © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | scSensitiveGeneDefine: sensitive gene detection in single-cell RNA sequencing data by Shannon entropy |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8063398/ https://www.ncbi.nlm.nih.gov/pubmed/33888056 http://dx.doi.org/10.1186/s12859-021-04136-1 |
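The abstract names the two quantities behind the method, the coefficient of variation (CV) rank and Shannon entropy, but does not say how they are combined. The following is a minimal, hypothetical Python sketch that uses only the standard library and is not the scSensitiveGeneDefine implementation. It computes both quantities for each gene across cells, ranks genes by CV, and flags high-CV genes whose expression is concentrated in few cells (low entropy). The function names and both thresholds (`cv_top_fraction`, `max_entropy_ratio`) are illustrative assumptions, not the published criteria.

```python
import math
from typing import Dict, List, Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Population standard deviation divided by the mean (0.0 if the mean is 0)."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    var = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(var) / mean


def shannon_entropy(values: Sequence[float]) -> float:
    """Shannon entropy (bits) of one gene's expression distribution across cells.

    Expression is normalized to proportions p_i = x_i / sum(x); cells with zero
    expression contribute nothing (0 * log 0 is taken as 0).
    """
    total = sum(values)
    if total == 0:
        return 0.0
    h = 0.0
    for v in values:
        if v > 0:
            p = v / total
            h -= p * math.log2(p)
    return h


def flag_sensitive_genes(
    expr: Dict[str, Sequence[float]],
    cv_top_fraction: float = 0.1,    # hypothetical: consider the top 10% of genes by CV
    max_entropy_ratio: float = 0.5,  # hypothetical: entropy at most half of log2(n_cells)
) -> List[str]:
    """Flag genes that rank high by CV and have low entropy across cells.

    Both thresholds are illustrative assumptions, not the published criteria.
    """
    cvs = {gene: coefficient_of_variation(v) for gene, v in expr.items()}
    ranked = sorted(cvs, key=cvs.get, reverse=True)  # CV rank, highest first
    n_top = max(1, int(len(ranked) * cv_top_fraction))
    flagged = []
    for gene in ranked[:n_top]:
        values = expr[gene]
        max_h = math.log2(len(values))  # entropy of perfectly uniform expression
        if shannon_entropy(values) <= max_entropy_ratio * max_h:
            flagged.append(gene)
    return flagged


if __name__ == "__main__":
    # Toy counts for 8 cells; gene names are placeholders.
    expr = {
        "GENE_A": [5, 5, 5, 5, 5, 5, 5, 5],   # uniform: CV 0, entropy 3 bits
        "GENE_B": [0, 0, 0, 0, 0, 0, 0, 40],  # one-cell burst: high CV, entropy 0
        "GENE_C": [1, 2, 1, 2, 1, 2, 1, 2],   # mild variation, high entropy
    }
    print(flag_sensitive_genes(expr, cv_top_fraction=0.34))  # ['GENE_B']
```

Ranking by CV alone would also pick up marker genes that are highly variable between cell types. The entropy check is one plausible way, assumed here, to separate sporadic single-cell bursts from broad, stable expression.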