
Simultaneous clustering of gene expression data with clinical chemistry and pathological evaluations reveals phenotypic prototypes


Bibliographic Details
Main Authors:	Bushel, Pierre R, Wolfinger, Russell D, Gibson, Greg
Format:	Text
Language:	English
Published:	BioMed Central 2007
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1839893/ https://www.ncbi.nlm.nih.gov/pubmed/17408499 http://dx.doi.org/10.1186/1752-0509-1-15

author	Bushel, Pierre R Wolfinger, Russell D Gibson, Greg
collection	PubMed
description	BACKGROUND: Commonly employed clustering methods for analysis of gene expression data do not directly incorporate phenotypic data about the samples. Furthermore, clustering of samples with known phenotypes is typically performed in an informal fashion. The inability of clustering algorithms to incorporate biological data in the grouping process can limit proper interpretation of the data and its underlying biology. RESULTS: We present a more formal approach, the modk-prototypes algorithm, for clustering biological samples by simultaneously considering microarray gene expression data and classes of known phenotypic variables such as clinical chemistry evaluations and histopathologic observations. The strategy constructs an objective function that measures the dissimilarity of samples as the sum of squared Euclidean distances for the numeric microarray and clinical chemistry data plus simple matching for the categorical histopathology values. Separate weighting terms for the microarray, clinical chemistry and histopathology measurements control the influence of each data domain on the clustering of the samples. The dynamic validity index for numeric data was modified with a category utility measure to determine the number of clusters in the data sets. A cluster's prototype, formed from the mean of each numeric feature and the mode of each categorical feature over all the samples in the group, is representative of the phenotype of the cluster members. The approach is shown to work well on a simulated mixed data set and on two real data sets containing numeric and categorical data types: one from a heart disease study and another from acetaminophen (an analgesic) exposure in rat liver, which causes centrilobular necrosis. 
CONCLUSION: The modk-prototypes algorithm partitioned the simulated data into clusters whose samples belonged to their respective class groups, and partitioned the heart disease samples into two groups (sick and buff, denoting samples with pain types representative of angina and non-angina, respectively) with an accuracy of 79%. This is on par with, or better than, the accuracy with which several well-known and successful clustering algorithms assigned the heart disease samples. Following modk-prototypes clustering of the acetaminophen-exposed samples, informative genes were identified from the cluster prototypes that are descriptive of, and phenotypically anchored to, levels of necrosis in the centrilobular region of the rat liver. The biological processes of cell growth and/or maintenance, amine metabolism, and stress response were shown to discriminate between no and moderate levels of acetaminophen-induced centrilobular necrosis. Using well-known, traditional measurements directly in the clustering helps ensure that the resulting clusters are meaningfully interpretable.
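The weighted dissimilarity and mean/mode prototypes summarized in the description can be illustrated with a generic k-prototypes-style loop. This is a minimal sketch under stated assumptions, not the authors' modk-prototypes implementation. The function names, the (expr, chem, hist) sample layout, the weight arguments w_expr, w_chem and w_hist, the random or user-supplied initialization, and the max_iter cap are all hypothetical. The sketch also omits the modified dynamic validity index and category utility measure used to choose the number of clusters, so k is supplied directly.

```python
# Minimal k-prototypes-style sketch for mixed numeric/categorical samples.
# ASSUMPTIONS (hypothetical, not the authors' code): the function names, the
# (expr, chem, hist) sample layout, the weight arguments, the initialization,
# and the max_iter cap are illustrative. Choosing k is not implemented here.
import random
from collections import Counter


def dissimilarity(x, p, w_expr=1.0, w_chem=1.0, w_hist=1.0):
    """Weighted dissimilarity between sample x and prototype p.

    Each argument is a tuple (expr, chem, hist): a list of expression values,
    a list of clinical chemistry values, and a list of histopathology labels.
    """
    expr = sum((a - b) ** 2 for a, b in zip(x[0], p[0]))  # squared Euclidean
    chem = sum((a - b) ** 2 for a, b in zip(x[1], p[1]))  # squared Euclidean
    hist = sum(a != b for a, b in zip(x[2], p[2]))        # simple matching
    return w_expr * expr + w_chem * chem + w_hist * hist


def prototype(members):
    """Mean of every numeric feature and mode of every categorical feature."""
    n = len(members)
    expr = [sum(col) / n for col in zip(*(m[0] for m in members))]
    chem = [sum(col) / n for col in zip(*(m[1] for m in members))]
    hist = [Counter(col).most_common(1)[0][0] for col in zip(*(m[2] for m in members))]
    return (expr, chem, hist)


def total_cost(samples, labels, protos, weights=(1.0, 1.0, 1.0)):
    """Objective: summed dissimilarity of each sample to its own prototype."""
    return sum(dissimilarity(s, protos[j], *weights) for s, j in zip(samples, labels))


def k_prototypes(samples, k, weights=(1.0, 1.0, 1.0), init=None, max_iter=100, seed=0):
    """Alternate assignment and prototype updates until labels stop changing."""
    if init is None:
        init = random.Random(seed).sample(range(len(samples)), k)
    protos = [prototype([samples[i]]) for i in init]
    labels = None
    for _ in range(max_iter):
        # Assign every sample to its nearest prototype under the weighted measure.
        new_labels = [
            min(range(k), key=lambda j: dissimilarity(s, protos[j], *weights))
            for s in samples
        ]
        if new_labels == labels:
            break  # converged: no sample changed cluster
        labels = new_labels
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:  # keep the previous prototype if a cluster empties
                protos[j] = prototype(members)
    return labels, protos


# Toy usage with made-up values (not data from the study).
samples = [
    ([0.0, 0.1], [1.0], ["none"]),
    ([0.2, 0.0], [1.1], ["none"]),
    ([0.1, 0.2], [0.9], ["none"]),
    ([5.0, 5.1], [8.0], ["moderate"]),
    ([5.2, 4.9], [8.2], ["moderate"]),
    ([4.9, 5.0], [7.9], ["moderate"]),
]
labels, protos = k_prototypes(samples, k=2, init=[0, 3])
# labels -> [0, 0, 0, 1, 1, 1]
```

Raising one weight (for example w_hist) makes disagreements in that data domain more costly, which mirrors the role the description gives to the separate weighting terms.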
format	Text
id	pubmed-1839893
institution	National Center for Biotechnology Information
language	English
publishDate	2007
publisher	BioMed Central
record_format	MEDLINE/PubMed
journal	BMC Syst Biol, BioMed Central, published 2007-02-23 (record pubmed-1839893, 2007-04-02)
rights	Copyright © 2007 Bushel et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Simultaneous clustering of gene expression data with clinical chemistry and pathological evaluations reveals phenotypic prototypes
topic	Research Article


Similar Items