HArmonized single-cell RNA-seq Cell type Assisted Deconvolution (HASCAD)

Bibliographic Details
Main authors: Chiu, Yen-Jung; Ni, Chung-En; Huang, Yen-Hua
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10619225/
https://www.ncbi.nlm.nih.gov/pubmed/37907883
http://dx.doi.org/10.1186/s12920-023-01674-w

Abstract

BACKGROUND: Cell composition deconvolution (CCD) is a bioinformatic task that estimates cell fractions from bulk gene expression profiles, such as RNA-seq. Many CCD models perform linear regression using reference gene expression signatures of distinct cell types. Such reference signatures can be generated from cell-specific gene expression profiles, such as scRNA-seq. However, the batch effects and dropout events frequently observed across scRNA-seq datasets have limited the performance of CCD methods.

METHODS: We developed a deep neural network (DNN) model, HASCAD, to predict the fractions of up to 15 immune cell types. HASCAD was trained on bulk RNA-seq simulated from three scRNA-seq datasets that had been normalized using a Harmony-Symphony based strategy. Mean squared error and the Pearson correlation coefficient were used to compare the performance of HASCAD with that of other widely used CCD methods. Two types of datasets were used for the benchmarks: a set of simulated bulk RNA-seq samples and three human PBMC RNA-seq datasets.

RESULTS: HASCAD is useful for investigating how immune cell heterogeneity affects the therapeutic effects of immune checkpoint inhibitors, since its target cell types include ones known to play a role in anti-tumor immunity, such as three subtypes of CD8 T cells and three subtypes of CD4 T cells. We found that removing batch effects from the reference scRNA-seq datasets could benefit CCD. Our benchmarks showed that HASCAD is better suited to analyzing bulk RNA-seq data than two widely used CCD methods, CIBERSORTx and quanTIseq. We applied HASCAD to the liver cancer samples of TCGA-LIHC and found significant associations between the predicted abundance of Treg and effector CD8 T cells and patients’ overall survival.

CONCLUSION: HASCAD could predict the cell composition of PBMC bulk RNA-seq samples and classify the cell type of pure bulk RNA-seq samples. The HASCAD model is available at https://github.com/holiday01/HASCAD.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12920-023-01674-w.
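The abstract describes conventional CCD as linear regression of a bulk profile on reference cell-type signatures. The Python sketch below illustrates only that general idea. It is not HASCAD, which is a DNN, and it is not the algorithm of CIBERSORTx or quanTIseq. The function name `deconvolve_least_squares` and the toy signature matrix are hypothetical. Clipping negative coefficients and then renormalizing is a simplification of a properly constrained fit, such as non-negative least squares.

```python
import numpy as np


def deconvolve_least_squares(bulk: np.ndarray, signature: np.ndarray) -> np.ndarray:
    """Hypothetical helper: estimate cell fractions for one bulk sample.

    bulk:      (n_genes,) expression vector of the bulk sample
    signature: (n_genes, n_cell_types) reference expression of each cell type
    Returns a (n_cell_types,) vector of non-negative fractions summing to 1.
    """
    # Ordinary least squares: find w minimizing ||signature @ w - bulk||^2.
    coef, *_ = np.linalg.lstsq(signature, bulk, rcond=None)
    # Simplification: clip negative coefficients to zero rather than solving
    # a constrained problem.
    coef = np.clip(coef, 0.0, None)
    total = coef.sum()
    if total == 0:
        raise ValueError("all estimated coefficients are non-positive")
    # Rescale so the estimates can be read as fractions.
    return coef / total


# Toy example: 4 genes and 2 hypothetical cell types (one column per type).
signature = np.array([
    [10.0, 0.0],
    [0.0, 8.0],
    [5.0, 5.0],
    [1.0, 3.0],
])
true_fractions = np.array([0.3, 0.7])
bulk = signature @ true_fractions  # noiseless mixture
estimated = deconvolve_least_squares(bulk, signature)
print(np.round(estimated, 3))
```

In this noiseless toy case the fit recovers the mixing fractions (0.3, 0.7). With real data, measurement noise, correlated signatures, and batch differences between the reference and the bulk samples make the fit far less exact. Those are the limitations the abstract attributes to regression-based CCD.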
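HASCAD was trained on bulk RNA-seq simulated from scRNA-seq datasets, but the abstract does not give the simulation procedure. The following is therefore only a common pseudo-bulk approach, under stated assumptions:

- mixing proportions are drawn from a flat Dirichlet distribution;
- the number of cells per type is drawn from a multinomial;
- the expression of the sampled cells is summed per gene.

The function `simulate_pseudobulk`, its parameters, and the toy data are hypothetical. The Harmony-Symphony normalization step is not reproduced here.

```python
import numpy as np


def simulate_pseudobulk(counts, labels, cell_types, n_samples,
                        cells_per_sample=500, seed=0):
    """Hypothetical helper: simulate bulk profiles by summing sampled cells.

    counts:     (n_cells, n_genes) single-cell expression matrix
    labels:     (n_cells,) cell-type label of each cell
    cell_types: ordered list of cell types to mix
    Returns (bulk, fractions): bulk is (n_samples, n_genes); fractions is
    (n_samples, n_cell_types), the realized proportion of each cell type.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)
    # Row indices of the cells belonging to each cell type.
    pools = {ct: np.flatnonzero(labels == ct) for ct in cell_types}
    for ct, idx in pools.items():
        if idx.size == 0:
            raise ValueError(f"no cells labelled {ct!r}")

    # Target mixing proportions, one row per simulated sample.
    targets = rng.dirichlet(np.ones(len(cell_types)), size=n_samples)
    bulk = np.zeros((n_samples, counts.shape[1]))
    fractions = np.zeros((n_samples, len(cell_types)))
    for i, p in enumerate(targets):
        # Number of cells of each type drawn for this sample.
        n_per_type = rng.multinomial(cells_per_sample, p)
        fractions[i] = n_per_type / cells_per_sample
        for ct, n in zip(cell_types, n_per_type):
            if n > 0:
                # Sample cells of this type with replacement; add their expression.
                picked = rng.choice(pools[ct], size=n, replace=True)
                bulk[i] += counts[picked].sum(axis=0)
    return bulk, fractions


# Toy data: 2 genes; "A" cells express only gene 0, "B" cells only gene 1.
counts = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
labels = ["A", "A", "B", "B", "B"]
bulk, fractions = simulate_pseudobulk(counts, labels, ["A", "B"], n_samples=3)
print(bulk)
print(fractions)
```

The sketch returns the realized fractions (cell counts divided by `cells_per_sample`) rather than the Dirichlet targets. This way the training labels match exactly the cells that were summed into each profile.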
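The benchmarks compared methods by mean squared error (MSE) and Pearson correlation between predicted and true cell fractions. Below is a minimal NumPy sketch of both metrics. The abstract does not say whether the paper computes them per cell type, per sample, or over all entries pooled; this sketch simply pools all entries. The function names are hypothetical.

```python
import numpy as np


def mean_squared_error(pred, true):
    """MSE over all entries of two equally shaped fraction arrays."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    if pred.shape != true.shape:
        raise ValueError("pred and true must have the same shape")
    return float(np.mean((pred - true) ** 2))


def pearson_r(pred, true):
    """Pearson correlation after flattening both arrays into vectors."""
    x = np.ravel(np.asarray(pred, dtype=float))
    y = np.ravel(np.asarray(true, dtype=float))
    return float(np.corrcoef(x, y)[0, 1])


true_fractions = [0.1, 0.2, 0.3, 0.4]
predicted = [0.2, 0.2, 0.3, 0.3]
print(mean_squared_error(predicted, true_fractions))  # about 0.005
print(pearson_r(predicted, true_fractions))  # about 0.894
```

The two metrics capture different errors. A method can reach a high Pearson correlation while being systematically offset, which MSE would penalize. This is one reason to report both.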

Journal: BMC Med Genomics (Research). Published online 2023-10-31.
© The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.