
Intrinsic entropy model for feature selection of scRNA-seq data

Recent advances in single-cell RNA sequencing (scRNA-seq) technologies have led to extensive study of cellular heterogeneity and cell-to-cell variation. However, the high frequency of dropout events and noise in scRNA-seq data confounds the accuracy of the downstream analysis, i.e. clustering analys...


Bibliographic Details
Main authors: Li, Lin; Tang, Hui; Xia, Rui; Dai, Hao; Liu, Rui; Chen, Luonan
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9175189/
https://www.ncbi.nlm.nih.gov/pubmed/35102420
http://dx.doi.org/10.1093/jmcb/mjac008
author Li, Lin
Tang, Hui
Xia, Rui
Dai, Hao
Liu, Rui
Chen, Luonan
collection PubMed
description Recent advances in single-cell RNA sequencing (scRNA-seq) technologies have led to extensive study of cellular heterogeneity and cell-to-cell variation. However, the high frequency of dropout events and noise in scRNA-seq data confounds the accuracy of downstream analysis, i.e. clustering analysis, whose accuracy depends heavily on the selected feature genes. Here, by deriving an entropy decomposition formula, we propose a feature selection method, the intrinsic entropy (IE) model, to identify informative genes for accurate clustering analysis. Specifically, by eliminating the ‘noisy’ fluctuation, or extrinsic entropy (EE), we extract the IE of each gene from its total entropy (TE), i.e. TE = IE + EE. We show that the IE of each gene reflects the regulatory fluctuation of this gene in a cellular process, and thus high-IE genes provide rich information for cell type or state analysis. To validate the performance of the high-IE genes, we conduct computational analysis on both simulated datasets and real single-cell datasets and compare the results with other representative methods. The results show that our IE model is not only broadly applicable and robust across different clustering and classification methods, but also sensitive to novel cell types. Our results also demonstrate that the intrinsic entropy/fluctuation of a gene serves as information rather than noise, in contrast to its total entropy/fluctuation.
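The description above refers to the total entropy (TE) of a gene but this record does not say how TE, IE, or EE are estimated. The following is only a minimal sketch of one common way to compute a per-gene entropy, assuming equal-width histogram binning and Shannon entropy in bits; it is not the authors' IE model. The function names, the bin count, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values, n_bins=4):
    """Histogram-based Shannon entropy (bits) of one gene across cells.

    Assumption: equal-width bins between the observed min and max.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant gene shows no variation
    width = (hi - lo) / n_bins
    # Map each value to a bin index; the maximum falls into the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bins).values())


def rank_genes_by_total_entropy(expression):
    """Return (gene names sorted by decreasing entropy, score per gene)."""
    scores = {gene: shannon_entropy(vals) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical toy data: three genes measured in eight cells.
toy_expression = {
    "gene_a": [0, 0, 0, 0, 5, 5, 5, 5],  # two clear states
    "gene_b": [2, 2, 2, 2, 2, 2, 2, 2],  # constant
    "gene_c": [0, 1, 2, 3, 4, 5, 6, 7],  # spread evenly over all bins
}
ranking, scores = rank_genes_by_total_entropy(toy_expression)
print(ranking)
print(scores)
```

In this toy example the evenly spread gene scores higher than the two-state gene. A score of this kind cannot by itself tell regulatory variation from noise, which is the distinction the description attributes to separating IE from EE.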
format Online
Article
Text
id pubmed-9175189
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9175189 2022-06-09 Intrinsic entropy model for feature selection of scRNA-seq data. Li, Lin; Tang, Hui; Xia, Rui; Dai, Hao; Liu, Rui; Chen, Luonan. J Mol Cell Biol. Article. Oxford University Press 2022-01-31 /pmc/articles/PMC9175189/ /pubmed/35102420 http://dx.doi.org/10.1093/jmcb/mjac008 Text en © The Author(s) (2022). Published by Oxford University Press on behalf of Journal of Molecular Cell Biology, CEMCS, CAS.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Intrinsic entropy model for feature selection of scRNA-seq data
topic Article