
Transfer learning for clustering single-cell RNA-seq data crossing-species and batch, case on uterine fibroids

The high dimensionality and sparsity of the gene expression matrix in single-cell RNA-sequencing (scRNA-seq) data, coupled with significant noise generated by shallow sequencing, pose a great challenge for cell clustering methods. While numerous computational methods have been proposed, t...

Bibliographic Details
Main Authors: Wang, Yu Mei; Sun, Yuzhi; Wang, Beiying; Wu, Zhiping; He, Xiao Ying; Zhao, Yuansong
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10664408/
https://www.ncbi.nlm.nih.gov/pubmed/37991248
http://dx.doi.org/10.1093/bib/bbad426
author Wang, Yu Mei
Sun, Yuzhi
Wang, Beiying
Wu, Zhiping
He, Xiao Ying
Zhao, Yuansong
collection PubMed
description The high dimensionality and sparsity of the gene expression matrix in single-cell RNA-sequencing (scRNA-seq) data, coupled with significant noise generated by shallow sequencing, pose a great challenge for cell clustering methods. While numerous computational methods have been proposed, the majority of existing approaches center on processing the target dataset itself, disregarding the wealth of knowledge present in scRNA-seq data from other species and batches. In light of this, our paper proposes a novel method named graph-based deep embedding clustering (GDEC) that leverages transfer learning across species and batches. GDEC integrates graph convolutional networks to address the challenges posed by sparse gene expression matrices. Additionally, the incorporation of deep embedding clustering (DEC) in GDEC enables the partitioning of cell clusters within a lower-dimensional space, thereby mitigating the adverse effects of noise on clustering outcomes. GDEC constructs a model based on existing scRNA-seq datasets and then applies transfer learning techniques to fine-tune the model using a limited amount of prior knowledge gleaned from the target dataset. This enables GDEC to cluster scRNA-seq data across different species and batches. Through cross-species and cross-batch clustering experiments, we conducted a comparative analysis between GDEC and conventional packages. Furthermore, we applied GDEC to the scRNA-seq data of uterine fibroids. Compared with results obtained from the Seurat package, GDEC unveiled a novel cell type (epithelial cells) and identified a notable number of new pathways among various cell types, thus underscoring the enhanced analytical capabilities of GDEC. Availability and implementation: https://github.com/YuzhiSun/GDEC/tree/main
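The description names two building blocks, graph convolutional networks and DEC, without giving formulas. As a rough illustration only, the sketch below shows the textbook forms of both: a symmetrically normalized graph-convolution step, followed by DEC's Student's t soft assignment, sharpened target distribution, and KL clustering loss. It is not taken from the GDEC repository; the function names, the toy four-cell graph, the random weights, and the choice of initial centers are all assumptions made for the example.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetrically normalize A + I (self-loops added), as in a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, features, weights):
    """One graph-convolution step: ReLU(A_norm @ X @ W)."""
    return np.maximum(a_norm @ features @ weights, 0.0)

def soft_assign(z, centers, alpha=1.0):
    """DEC soft assignment q_ij of embedding i to center j (Student's t kernel)."""
    dist_sq = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """DEC auxiliary target p_ij: square q, divide by cluster frequency, renormalize rows."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def kl_divergence(p, q):
    """Clustering loss KL(P || Q) summed over all cells and clusters."""
    return float(np.sum(p * np.log(p / q)))

# Toy data (hypothetical): 4 cells on a chain-shaped neighbor graph, 6 genes.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
expr = rng.poisson(1.0, size=(4, 6)).astype(float)  # sparse-ish count matrix
weights = rng.normal(size=(6, 2))                   # untrained layer weights

a_norm = normalized_adjacency(adj)
z = gcn_layer(a_norm, np.log1p(expr), weights)      # 2-D cell embeddings
centers = z[[0, 3]].copy()                          # assumed initial cluster centers
q = soft_assign(z, centers)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

In a trained model the layer weights and cluster centers would be updated by gradient descent on this loss; the fine-tuning on a target dataset that the description mentions is not shown here.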
format Online
Article
Text
id pubmed-10664408
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10664408 2023-11-22 Brief Bioinform Problem Solving Protocol Oxford University Press 2023-11-22 /pmc/articles/PMC10664408/ /pubmed/37991248 http://dx.doi.org/10.1093/bib/bbad426 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Transfer learning for clustering single-cell RNA-seq data crossing-species and batch, case on uterine fibroids
topic Problem Solving Protocol
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10664408/
https://www.ncbi.nlm.nih.gov/pubmed/37991248
http://dx.doi.org/10.1093/bib/bbad426