Outcome-Driven Cluster Analysis with Application to Microarray Data

One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patien...

Bibliographic Details
Main Authors:	Hsu, Jessie J., Finkelstein, Dianne M., Schoenfeld, David A.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2015
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4643008/ https://www.ncbi.nlm.nih.gov/pubmed/26562156 http://dx.doi.org/10.1371/journal.pone.0141874

_version_	1782400450050392064
author	Hsu, Jessie J. Finkelstein, Dianne M. Schoenfeld, David A.
author_facet	Hsu, Jessie J. Finkelstein, Dianne M. Schoenfeld, David A.
author_sort	Hsu, Jessie J.
collection	PubMed
description	One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that are informed by the recovery outcome.
format	Online Article Text
id	pubmed-4643008
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-4643008 2015-11-18 Outcome-Driven Cluster Analysis with Application to Microarray Data Hsu, Jessie J. Finkelstein, Dianne M. Schoenfeld, David A. PLoS One Research Article Public Library of Science 2015-11-12 /pmc/articles/PMC4643008/ /pubmed/26562156 http://dx.doi.org/10.1371/journal.pone.0141874 Text en © 2015 Hsu et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle	Research Article Hsu, Jessie J. Finkelstein, Dianne M. Schoenfeld, David A. Outcome-Driven Cluster Analysis with Application to Microarray Data
title	Outcome-Driven Cluster Analysis with Application to Microarray Data
title_sort	outcome-driven cluster analysis with application to microarray data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4643008/ https://www.ncbi.nlm.nih.gov/pubmed/26562156 http://dx.doi.org/10.1371/journal.pone.0141874
work_keys_str_mv	AT hsujessiej outcomedrivenclusteranalysiswithapplicationtomicroarraydata AT finkelsteindiannem outcomedrivenclusteranalysiswithapplicationtomicroarraydata AT schoenfelddavida outcomedrivenclusteranalysiswithapplicationtomicroarraydata
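
The model summarized in the description field can be illustrated with a small simulation. The sketch below is not the authors' code. It assumes Gaussian random effects and errors, adds a small Gaussian noise term to the outcome (the abstract only says the outcome is a linear combination of the cluster random effects), and uses hypothetical names throughout (`simulate`, `pearson`, `cluster_of_gene`, `betas`, and the `sigma_*` parameters). It only generates data from such a model and does not implement the Markov chain Monte Carlo fitting step.

```python
import math
import random


def simulate(n_obs, cluster_of_gene, betas,
             sigma_u=1.0, sigma_e=0.5, sigma_y=0.1, seed=0):
    """Draw data from a toy cluster random effects model (illustrative only).

    cluster_of_gene[j] is the cluster index of gene j.
    betas[k] is the assumed weight of cluster k in the outcome.
    All distributions and sigma_* values are assumptions for this sketch.
    """
    rng = random.Random(seed)
    n_clusters = len(betas)
    genes, outcome = [], []
    for _ in range(n_obs):
        # One random effect per cluster, specific to this observation.
        u = [rng.gauss(0.0, sigma_u) for _ in range(n_clusters)]
        # Each gene equals its cluster's random effect plus independent error.
        genes.append([u[k] + rng.gauss(0.0, sigma_e) for k in cluster_of_gene])
        # Outcome: linear combination of the random effects, plus assumed noise.
        y = sum(b * uk for b, uk in zip(betas, u)) + rng.gauss(0.0, sigma_y)
        outcome.append(y)
    return genes, outcome


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Four genes: the first two in cluster 0, the last two in cluster 1.
    genes, outcome = simulate(500, [0, 0, 1, 1], betas=[1.0, -0.5], seed=1)
    cols = list(zip(*genes))
    print("within-cluster correlation:", pearson(cols[0], cols[1]))
    print("between-cluster correlation:", pearson(cols[0], cols[2]))
```

Under these assumptions, genes that share a cluster inherit the same per-observation random effect, so they are correlated with each other and relate to the outcome through the same weight, while genes in different clusters are uncorrelated.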

Similar Items