Visualizing the structure of RNA-seq expression data using grade of membership models

Bibliographic Details
Main authors: Dey, Kushal K.; Hsiao, Chiaowen Joyce; Stephens, Matthew
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5363805/
https://www.ncbi.nlm.nih.gov/pubmed/28333934
http://dx.doi.org/10.1371/journal.pgen.1006599
Collection: PubMed
Description: Grade of membership models, also known as “admixture models”, “topic models” or “Latent Dirichlet Allocation”, are a generalization of cluster models that allow each sample to have membership in multiple clusters. These models are widely used in population genetics to model admixed individuals who have ancestry from multiple “populations”, and in natural language processing to model documents having words from multiple “topics”. Here we illustrate the potential for these models to cluster samples of RNA-seq gene expression data, measured on either bulk samples or single cells. We also provide methods to help interpret the clusters, by identifying genes that are distinctively expressed in each cluster. By applying these methods to several example RNA-seq applications we demonstrate their utility in identifying and summarizing structure and heterogeneity. Applied to data from the GTEx project on 53 human tissues, the approach highlights similarities among biologically-related tissues and identifies distinctively-expressed genes that recapitulate known biology. Applied to single-cell expression data from mouse preimplantation embryos, the approach highlights both discrete and continuous variation through early embryonic development stages, and highlights genes involved in a variety of relevant processes, from germ cell development, through compaction and morula formation, to the formation of inner cell mass and trophoblast at the blastocyst stage. The methods are implemented in the Bioconductor package CountClust.
Record ID: pubmed-5363805 (2017-04-06)
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS Genet (Research Article)
Publication date: 2017-03-23
License: © 2017 Dey et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
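
Illustrative note on the description: the abstract says that a grade of membership model lets each sample belong to several clusters at once. One common way to read this, assumed here rather than taken from the record, is that a sample's expected gene proportions are a weighted average of cluster-level gene proportions, with the weights being the sample's membership proportions. The sketch below is plain Python and does not use the CountClust package; the function name, the two clusters, and all numbers are hypothetical.

```python
# Illustrative sketch of the grade of membership idea (not the CountClust API).
# Assumption: a sample's gene proportions are a membership-weighted mix of
# cluster-level gene proportions. All names and numbers are hypothetical.

def expected_gene_proportions(memberships, cluster_profiles):
    """Mix cluster-level gene proportions according to one sample's memberships."""
    if abs(sum(memberships) - 1.0) > 1e-9:
        raise ValueError("memberships must sum to 1")
    if len(memberships) != len(cluster_profiles):
        raise ValueError("need one membership proportion per cluster")
    n_genes = len(cluster_profiles[0])
    return [
        sum(q * profile[g] for q, profile in zip(memberships, cluster_profiles))
        for g in range(n_genes)
    ]

# Two hypothetical clusters over three genes; each row sums to 1.
CLUSTER_PROFILES = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
]

# A sample fully in cluster 1 versus a sample split evenly between clusters.
pure_sample = expected_gene_proportions([1.0, 0.0], CLUSTER_PROFILES)
mixed_sample = expected_gene_proportions([0.5, 0.5], CLUSTER_PROFILES)
print("pure:", pure_sample)
print("mixed:", mixed_sample)
```

A standard cluster model corresponds to the special case where every sample's memberships put all weight on a single cluster, as in `pure_sample`; the mixed sample sits between the two cluster profiles.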