CDSeqR: fast complete deconvolution for gene expression data from bulk tissues

BACKGROUND: Biological tissues consist of heterogeneous populations of cells. Because gene expression patterns from bulk tissue samples reflect the contributions from all cells in the tissue, understanding the contribution of individual cell types to the overall gene expression in the tissue is funda...

Bibliographic Details
Main authors: Kang, Kai; Huang, Caizhi; Li, Yuanyuan; Umbach, David M.; Li, Leping
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8142515/
https://www.ncbi.nlm.nih.gov/pubmed/34030626
http://dx.doi.org/10.1186/s12859-021-04186-5
author Kang, Kai
Huang, Caizhi
Li, Yuanyuan
Umbach, David M.
Li, Leping
author_sort Kang, Kai
collection PubMed
description BACKGROUND: Biological tissues consist of heterogeneous populations of cells. Because gene expression patterns from bulk tissue samples reflect the contributions from all cells in the tissue, understanding the contribution of individual cell types to the overall gene expression in the tissue is fundamentally important. We recently developed a computational method, CDSeq, that can simultaneously estimate both sample-specific cell-type proportions and cell-type-specific gene expression profiles using only bulk RNA-seq counts from multiple samples. Here we present an R implementation of CDSeq (CDSeqR) with significant performance improvement over the original MATLAB implementation and a new function to aid cell-type annotation. The R package should be of interest to the broader R community. RESULTS: We developed a novel strategy to substantially improve computational efficiency in both speed and memory usage. In addition, we designed and implemented a new function for annotating the CDSeq-estimated cell types using single-cell RNA sequencing (scRNA-seq) data. This function allows users to readily interpret and visualize the CDSeq-estimated cell types. It also allows users to annotate CDSeq-estimated cell types using marker genes. We carried out additional validations of the CDSeqR software using synthetic, real cell mixtures, and real bulk RNA-seq data from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project. CONCLUSIONS: Existing bulk RNA-seq repositories, such as TCGA and GTEx, provide enormous resources for better understanding changes in transcriptomics and human diseases. They are also potentially useful for studying cell–cell interactions in the tissue microenvironment. Bulk-level analyses neglect tissue heterogeneity, however, and hinder investigation of cell-type-specific expression. The CDSeqR package may aid in silico dissection of bulk expression data, enabling researchers to recover cell-type-specific information. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04186-5.
format Online
Article
Text
id pubmed-8142515
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8142515 2021-05-25 CDSeqR: fast complete deconvolution for gene expression data from bulk tissues Kang, Kai; Huang, Caizhi; Li, Yuanyuan; Umbach, David M.; Li, Leping BMC Bioinformatics Software BioMed Central 2021-05-24 /pmc/articles/PMC8142515/ /pubmed/34030626 http://dx.doi.org/10.1186/s12859-021-04186-5 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title CDSeqR: fast complete deconvolution for gene expression data from bulk tissues
topic Software