
Semi-CAM: A semi-supervised deconvolution method for bulk transcriptomic data with partial marker gene information

Deconvolution of bulk transcriptomics data from mixed cell populations is vital to identify the cellular mechanism of complex diseases. Existing deconvolution approaches can be divided into two major groups: supervised and unsupervised methods. Supervised deconvolution methods use cell type-specific...

Bibliographic Details
Main authors: Dong, Li; Kollipara, Avinash; Darville, Toni; Zou, Fei; Zheng, Xiaojing
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7096458/
https://www.ncbi.nlm.nih.gov/pubmed/32214192
http://dx.doi.org/10.1038/s41598-020-62330-2
author Dong, Li
Kollipara, Avinash
Darville, Toni
Zou, Fei
Zheng, Xiaojing
collection PubMed
description Deconvolution of bulk transcriptomics data from mixed cell populations is vital to identify the cellular mechanism of complex diseases. Existing deconvolution approaches can be divided into two major groups: supervised and unsupervised methods. Supervised deconvolution methods use cell type-specific prior information including cell proportions, reference cell type-specific gene signatures, or marker genes for each cell type, which may not be available in practice. Unsupervised methods, such as non-negative matrix factorization (NMF) and Convex Analysis of Mixtures (CAM), in contrast, completely disregard prior information and thus are not efficient for data with partial cell type-specific information. In this paper, we propose a semi-supervised deconvolution method, semi-CAM, that extends CAM by using marker information from a subset of cell types. Analyses of simulated data and two benchmark datasets demonstrate that semi-CAM outperforms CAM, yielding more accurate cell proportion estimates when markers from some or all cell types are available. In addition, when markers from all cell types are available, semi-CAM achieves better or similar accuracy compared to CIBERSORT, a supervised method that uses signature genes, and to the marker-based supervised methods semi-NMF and DSA. Furthermore, analysis of human chlamydia-infection data with bulk expression profiles from six cell types and prior marker information for only three cell types suggests that semi-CAM achieves more accurate cell proportion estimates than CAM.
format Online
Article
Text
id pubmed-7096458
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7096458 2020-03-30
Sci Rep
Article
Nature Publishing Group UK 2020-03-25
/pmc/articles/PMC7096458/
/pubmed/32214192
http://dx.doi.org/10.1038/s41598-020-62330-2
Text
en
© The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Semi-CAM: A semi-supervised deconvolution method for bulk transcriptomic data with partial marker gene information
topic Article
work_keys_str_mv AT dongli semicamasemisuperviseddeconvolutionmethodforbulktranscriptomicdatawithpartialmarkergeneinformation
AT kolliparaavinash semicamasemisuperviseddeconvolutionmethodforbulktranscriptomicdatawithpartialmarkergeneinformation
AT darvilletoni semicamasemisuperviseddeconvolutionmethodforbulktranscriptomicdatawithpartialmarkergeneinformation
AT zoufei semicamasemisuperviseddeconvolutionmethodforbulktranscriptomicdatawithpartialmarkergeneinformation
AT zhengxiaojing semicamasemisuperviseddeconvolutionmethodforbulktranscriptomicdatawithpartialmarkergeneinformation
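
The description above contrasts supervised deconvolution, which relies on reference signatures or markers, with unsupervised and semi-supervised approaches. As a rough illustration of the underlying linear mixing model only, the sketch below simulates bulk expression as signatures times proportions and recovers the proportions by ordinary least squares when the signatures are fully known. It is not semi-CAM, CAM, CIBERSORT, semi-NMF, or DSA; all data are simulated, and every name in it (`S`, `A_true`, `estimate_proportions`) is a hypothetical choice made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_types, n_samples = 200, 3, 5

# Hypothetical cell type-specific signatures (genes x cell types), non-negative.
S = rng.gamma(shape=2.0, scale=50.0, size=(n_genes, n_types))

# Simulated true proportions (cell types x samples); each column sums to 1.
A_true = rng.dirichlet(alpha=np.ones(n_types), size=n_samples).T

# Linear mixing model: bulk expression = signatures @ proportions + noise.
X = S @ A_true + rng.normal(scale=1.0, size=(n_genes, n_samples))


def estimate_proportions(X, S):
    """Least-squares estimate of proportions given known signatures.

    Assumption for illustration: negative coefficients are clipped to zero
    and each sample's column is rescaled to sum to 1.
    """
    coef, *_ = np.linalg.lstsq(S, X, rcond=None)
    coef = np.clip(coef, 0.0, None)
    return coef / coef.sum(axis=0, keepdims=True)


A_hat = estimate_proportions(X, S)
max_abs_error = float(np.abs(A_hat - A_true).max())
print(f"largest absolute error in estimated proportions: {max_abs_error:.4f}")
```

This sketch needs the full signature matrix `S`, which is the prior information the description says may not be available in practice; that gap is what motivates methods that work with markers for only some cell types.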