Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues

Tissue heterogeneity is both a major confounding factor and an underexploited information source. While a handful of reports have demonstrated the potential of supervised computational methods to deconvolute tissue heterogeneity, these approaches require a priori information on the marker genes or c...

Bibliographic Details
Main authors: Wang, Niya, Hoffman, Eric P., Chen, Lulu, Chen, Li, Zhang, Zhen, Liu, Chunyu, Yu, Guoqiang, Herrington, David M., Clarke, Robert, Wang, Yue
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4703969/
https://www.ncbi.nlm.nih.gov/pubmed/26739359
http://dx.doi.org/10.1038/srep18909
_version_ 1782408816592158720
author Wang, Niya
Hoffman, Eric P.
Chen, Lulu
Chen, Li
Zhang, Zhen
Liu, Chunyu
Yu, Guoqiang
Herrington, David M.
Clarke, Robert
Wang, Yue
author_facet Wang, Niya
Hoffman, Eric P.
Chen, Lulu
Chen, Li
Zhang, Zhen
Liu, Chunyu
Yu, Guoqiang
Herrington, David M.
Clarke, Robert
Wang, Yue
author_sort Wang, Niya
collection PubMed
description Tissue heterogeneity is both a major confounding factor and an underexploited information source. While a handful of reports have demonstrated the potential of supervised computational methods to deconvolute tissue heterogeneity, these approaches require a priori information on the marker genes or composition of known subpopulations. To address the critical problem of the absence of validated marker genes for many (including novel) subpopulations, we describe convex analysis of mixtures (CAM), a fully unsupervised in silico method, for identifying subpopulation marker genes directly from the original mixed gene expressions in scatter space that can improve molecular analyses in many biological contexts. Validated with predesigned mixtures, CAM on the gene expression data from peripheral leukocytes, brain tissue, and yeast cell cycle, revealed novel marker genes that were otherwise undetectable using existing methods. Importantly, CAM requires no a priori information on the number, identity, or composition of the subpopulations present in mixed samples, and does not require the presence of pure subpopulations in sample space. This advantage is significant in that CAM can achieve all of its goals using only a small number of heterogeneous samples, and is more powerful to distinguish between phenotypically similar subpopulations.
format Online
Article
Text
id pubmed-4703969
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-47039692016-01-19 Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues Wang, Niya Hoffman, Eric P. Chen, Lulu Chen, Li Zhang, Zhen Liu, Chunyu Yu, Guoqiang Herrington, David M. Clarke, Robert Wang, Yue Sci Rep Article Tissue heterogeneity is both a major confounding factor and an underexploited information source. While a handful of reports have demonstrated the potential of supervised computational methods to deconvolute tissue heterogeneity, these approaches require a priori information on the marker genes or composition of known subpopulations. To address the critical problem of the absence of validated marker genes for many (including novel) subpopulations, we describe convex analysis of mixtures (CAM), a fully unsupervised in silico method, for identifying subpopulation marker genes directly from the original mixed gene expressions in scatter space that can improve molecular analyses in many biological contexts. Validated with predesigned mixtures, CAM on the gene expression data from peripheral leukocytes, brain tissue, and yeast cell cycle, revealed novel marker genes that were otherwise undetectable using existing methods. Importantly, CAM requires no a priori information on the number, identity, or composition of the subpopulations present in mixed samples, and does not require the presence of pure subpopulations in sample space. This advantage is significant in that CAM can achieve all of its goals using only a small number of heterogeneous samples, and is more powerful to distinguish between phenotypically similar subpopulations. Nature Publishing Group 2016-01-07 /pmc/articles/PMC4703969/ /pubmed/26739359 http://dx.doi.org/10.1038/srep18909 Text en Copyright © 2016, Macmillan Publishers Limited http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. 
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
spellingShingle Article
Wang, Niya
Hoffman, Eric P.
Chen, Lulu
Chen, Li
Zhang, Zhen
Liu, Chunyu
Yu, Guoqiang
Herrington, David M.
Clarke, Robert
Wang, Yue
Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues
title Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues
title_full Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues
title_fullStr Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues
title_full_unstemmed Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues
title_short Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues
title_sort mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4703969/
https://www.ncbi.nlm.nih.gov/pubmed/26739359
http://dx.doi.org/10.1038/srep18909