Semi-CAM: A semi-supervised deconvolution method for bulk transcriptomic data with partial marker gene information
Main authors: | Dong, Li; Kollipara, Avinash; Darville, Toni; Zou, Fei; Zheng, Xiaojing |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7096458/ https://www.ncbi.nlm.nih.gov/pubmed/32214192 http://dx.doi.org/10.1038/s41598-020-62330-2 |
author | Dong, Li Kollipara, Avinash Darville, Toni Zou, Fei Zheng, Xiaojing |
collection | PubMed |
description | Deconvolution of bulk transcriptomic data from mixed cell populations is vital for identifying the cellular mechanisms of complex diseases. Existing deconvolution approaches fall into two major groups: supervised and unsupervised methods. Supervised deconvolution methods use cell type-specific prior information, including cell proportions, reference cell type-specific gene signatures, or marker genes for each cell type, which may not be available in practice. Unsupervised methods, such as non-negative matrix factorization (NMF) and Convex Analysis of Mixtures (CAM), in contrast, completely disregard prior information and thus are not efficient for data with partial cell type-specific information. In this paper, we propose a semi-supervised deconvolution method, semi-CAM, that extends CAM by utilizing marker information from a subset of cell types. Analyses of simulated data and two benchmark datasets demonstrate that semi-CAM outperforms CAM, yielding more accurate cell proportion estimates when markers from some or all cell types are available. In addition, when markers from all cell types are available, semi-CAM achieves better or similar accuracy compared with the signature gene-based supervised method CIBERSORT and the marker-based supervised methods semi-NMF and DSA. Furthermore, analysis of human chlamydia-infection data, with bulk expression profiles from six cell types and prior marker information for only three of them, suggests that semi-CAM achieves more accurate cell proportion estimates than CAM. |
format | Online Article Text |
id | pubmed-7096458 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-7096458 2020-03-30 Sci Rep Article. Nature Publishing Group UK, 2020-03-25. © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Semi-CAM: A semi-supervised deconvolution method for bulk transcriptomic data with partial marker gene information |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7096458/ https://www.ncbi.nlm.nih.gov/pubmed/32214192 http://dx.doi.org/10.1038/s41598-020-62330-2 |
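As a minimal illustration of the supervised setting contrasted in the abstract, the sketch below assumes the common linear mixing model X ≈ S P. Here X is a genes × samples bulk expression matrix, S is a known genes × cell-types signature matrix, and each column of P holds one sample's cell proportions, which sum to one. The model is stated as an assumption and is not taken from this record. The sketch estimates P by ordinary least squares, then clips negative values and renormalizes. It is a generic approach, not semi-CAM, CAM, CIBERSORT, or DSA. The function name `estimate_proportions` and the synthetic data are hypothetical.

```python
import numpy as np

def estimate_proportions(bulk, signatures):
    """Hypothetical helper: estimate per-sample cell-type proportions.

    bulk:       genes x samples matrix (X)
    signatures: genes x cell types matrix (S), assumed known
    Assumes X = S @ P with each column of P summing to one.
    """
    # Unconstrained least-squares solve of S @ P = X for P.
    coef, *_ = np.linalg.lstsq(signatures, bulk, rcond=None)
    # Proportions cannot be negative, so clip any negative estimates.
    coef = np.clip(coef, 0.0, None)
    # Rescale each sample (column) so its proportions sum to one.
    return coef / coef.sum(axis=0, keepdims=True)

# Synthetic, noise-free example (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n_genes, n_types, n_samples = 50, 3, 4
S = rng.uniform(0.0, 10.0, size=(n_genes, n_types))          # signatures
P_true = rng.dirichlet(np.ones(n_types), size=n_samples).T   # types x samples
X = S @ P_true                                                # bulk mixtures

P_hat = estimate_proportions(X, S)
print(np.round(P_hat, 3))
print(np.allclose(P_hat, P_true))
```

On real, noisy data, a constrained solver is usually more stable than clipping after the fact. One option is non-negative least squares applied per sample, for example `scipy.optimize.nnls`. Unsupervised methods such as NMF and CAM, by contrast, estimate both S and P from X alone.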