CDSeqR: fast complete deconvolution for gene expression data from bulk tissues
Main authors: | Kang, Kai; Huang, Caizhi; Li, Yuanyuan; Umbach, David M.; Li, Leping |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8142515/ https://www.ncbi.nlm.nih.gov/pubmed/34030626 http://dx.doi.org/10.1186/s12859-021-04186-5 |
_version_ | 1783696568824627200 |
---|---|
author | Kang, Kai; Huang, Caizhi; Li, Yuanyuan; Umbach, David M.; Li, Leping |
collection | PubMed |
description | BACKGROUND: Biological tissues consist of heterogeneous populations of cells. Because gene expression patterns from bulk tissue samples reflect the contributions from all cells in the tissue, understanding the contribution of individual cell types to the overall gene expression in the tissue is fundamentally important. We recently developed a computational method, CDSeq, that can simultaneously estimate both sample-specific cell-type proportions and cell-type-specific gene expression profiles using only bulk RNA-Seq counts from multiple samples. Here we present an R implementation of CDSeq (CDSeqR) with significantly improved performance over the original MATLAB implementation and a new function to aid cell-type annotation. The R package should be of interest to the broader R community. RESULTS: We developed a novel strategy that substantially improves computational efficiency in both speed and memory usage. We also designed and implemented a new function for annotating CDSeq-estimated cell types using single-cell RNA sequencing (scRNA-seq) data. This function lets users readily interpret and visualize the CDSeq-estimated cell types, and it also allows them to annotate those cell types using marker genes. We carried out additional validations of the CDSeqR software using synthetic and real cell mixtures, and real bulk RNA-seq data from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project. CONCLUSIONS: Existing bulk RNA-seq repositories, such as TCGA and GTEx, provide enormous resources for better understanding transcriptomic changes and human diseases. They are also potentially useful for studying cell–cell interactions in the tissue microenvironment. Bulk-level analyses, however, neglect tissue heterogeneity and hinder investigation of cell-type-specific expression. The CDSeqR package may aid in silico dissection of bulk expression data, enabling researchers to recover cell-type-specific information. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04186-5. |
format | Online Article Text |
id | pubmed-8142515 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8142515 2021-05-25 CDSeqR: fast complete deconvolution for gene expression data from bulk tissues Kang, Kai; Huang, Caizhi; Li, Yuanyuan; Umbach, David M.; Li, Leping BMC Bioinformatics Software BioMed Central 2021-05-24 /pmc/articles/PMC8142515/ /pubmed/34030626 http://dx.doi.org/10.1186/s12859-021-04186-5 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | CDSeqR: fast complete deconvolution for gene expression data from bulk tissues |
topic | Software |
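The description field summarizes CDSeq as complete deconvolution. From a genes × samples matrix of bulk RNA-seq counts alone, it estimates both cell-type-specific expression profiles and sample-specific cell-type proportions. The Python sketch below illustrates that idea under stated assumptions. It is not CDSeq's algorithm: CDSeq uses its own probabilistic model, which is not reproduced here. Instead, the sketch swaps in plain non-negative matrix factorization (NMF) with Lee–Seung multiplicative updates under the usual linear mixing assumption, bulk ≈ profiles × weights. The names `complete_deconvolve`, `matmul`, and `sq_error` are hypothetical and are not part of the CDSeqR API.

```python
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def sq_error(bulk: Matrix, approx: Matrix) -> float:
    """Squared Frobenius distance between the bulk matrix and a reconstruction."""
    return sum((x - y) ** 2 for rb, ra in zip(bulk, approx) for x, y in zip(rb, ra))


def complete_deconvolve(bulk: Matrix, n_types: int, n_iter: int = 500,
                        seed: int = 0, eps: float = 1e-12):
    """Illustrative NMF-based complete deconvolution (NOT the CDSeq algorithm).

    Assumption: linear mixing, bulk (genes x samples) ~= G (genes x types) @ P (types x samples).
    Returns (profiles, weights, proportions):
      profiles    genes x types, each column sums to 1
      weights     types x samples, with bulk ~= profiles @ weights
      proportions types x samples, each column sums to 1
    """
    rng = random.Random(seed)
    n_genes, n_samples = len(bulk), len(bulk[0])
    # Strictly positive random start so multiplicative updates can move every entry.
    g = [[rng.random() + 0.1 for _ in range(n_types)] for _ in range(n_genes)]
    p = [[rng.random() + 0.1 for _ in range(n_samples)] for _ in range(n_types)]
    for _ in range(n_iter):
        # Lee-Seung update for P: P <- P * (G^T B) / (G^T G P)
        gt = transpose(g)
        num, den = matmul(gt, bulk), matmul(matmul(gt, g), p)
        p = [[p[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_samples)]
             for k in range(n_types)]
        # Lee-Seung update for G: G <- G * (B P^T) / (G P P^T)
        pt = transpose(p)
        num, den = matmul(bulk, pt), matmul(g, matmul(p, pt))
        g = [[g[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_types)]
             for i in range(n_genes)]
    # Rescale G <- G D^-1 and P <- D P (D = column sums of G), leaving G @ P unchanged.
    col = [sum(g[i][k] for i in range(n_genes)) + eps for k in range(n_types)]
    profiles = [[g[i][k] / col[k] for k in range(n_types)] for i in range(n_genes)]
    weights = [[p[k][j] * col[k] for j in range(n_samples)] for k in range(n_types)]
    # Proportions: each cell type's share of a sample's modeled total.
    totals = [sum(weights[k][j] for k in range(n_types)) + eps for j in range(n_samples)]
    proportions = [[weights[k][j] / totals[j] for j in range(n_samples)]
                   for k in range(n_types)]
    return profiles, weights, proportions


if __name__ == "__main__":
    # Hypothetical toy data: 5 genes, 2 cell types, 4 samples; bulk = 1000 * (profiles @ proportions).
    true_profiles = [[0.5, 0.05], [0.3, 0.05], [0.1, 0.1], [0.05, 0.3], [0.05, 0.5]]
    true_props = [[0.8, 0.6, 0.3, 0.1], [0.2, 0.4, 0.7, 0.9]]
    bulk = [[1000 * v for v in row] for row in matmul(true_profiles, true_props)]
    profiles, weights, props = complete_deconvolve(bulk, n_types=2)
    print("reconstruction error:", sq_error(bulk, matmul(profiles, weights)))
    for k, row in enumerate(props):
        print(f"estimated type {k}:", [round(v, 2) for v in row])
```

Normalizing each profile column to sum to 1 treats a cell-type profile as a distribution over genes. Each rescaled weight is then that cell type's modeled share of a sample's counts, and dividing by the per-sample total gives proportions. NMF solutions are not unique (at minimum, cell types can come out in any order), so estimated types still need labeling. That is the role of the scRNA-seq-based annotation step the abstract describes.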