Partial mixture model for tight clustering of gene expression time-course

Bibliographic Details
Main authors:	Yuan, Yinyin; Li, Chang-Tsun; Wilson, Roland
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2492882/ https://www.ncbi.nlm.nih.gov/pubmed/18564420 http://dx.doi.org/10.1186/1471-2105-9-287

collection	PubMed
description	BACKGROUND: Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies. Scattered genes with relatively loose correlations should be excluded from the clusters. However, little work in the literature is dedicated to this area of research. Maximum likelihood techniques, on the other hand, have been used extensively for model parameter estimation, whereas the minimum distance estimator has been largely ignored. RESULTS: In this paper we show the inherent robustness of the minimum distance estimator, which makes it a powerful tool for parameter estimation in model-based time-course clustering. To apply minimum distance estimation, we formulate a partial mixture model that naturally incorporates replicate information and allows for scattered genes. We provide experimental results of simulated data fitting, in which the minimum distance estimator outperforms the maximum likelihood estimator. Both biological and statistical validations are conducted on a simulated dataset and two real gene expression datasets. Our proposed partial regression clustering algorithm scores highest in a Gene Ontology-driven evaluation against four other popular clustering algorithms. CONCLUSION: For the first time, the partial mixture model has been successfully extended to time-course data analysis. The robustness of our partial regression clustering algorithm demonstrates the suitability of combining the partial mixture model with the minimum distance estimator in this field. We show that tight clustering not only yields a more profound understanding of the dataset under study, in good accordance with established biological knowledge, but also suggests interesting new hypotheses when the clustering results are interpreted. In particular, we provide biological evidence that, contrary to prevailing opinion, scattered genes can be relevant and are interesting subjects for study.
id	pubmed-2492882
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-2492882 2008-08-01 Partial mixture model for tight clustering of gene expression time-course. Yuan, Yinyin; Li, Chang-Tsun; Wilson, Roland. BMC Bioinformatics, Methodology Article. BioMed Central, 2008-06-18. Copyright © 2008 Yuan et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Partial mixture model for tight clustering of gene expression time-course
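The abstract does not state how the minimum distance estimator is computed. As a minimal sketch, assuming one common choice, the integrated squared error (L2E) criterion applied to a single weighted Gaussian component w·N(μ, σ²), the NumPy code below shows why such a "partial" fit can ignore scattered points: the weight w may fall below 1, so points outside the component are simply left unexplained. This is a one-dimensional toy, not the authors' partial regression clustering algorithm; the helper names, the simulated data and the grid search are illustrative assumptions.

```python
import numpy as np


def gaussian_pdf(x, mu, sigma):
    """Normal density N(mu, sigma^2), evaluated elementwise (broadcasts)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def l2e_partial_gaussian(x, mu_grid, sigma_grid):
    """Hypothetical helper (not from the paper): fit w * N(mu, sigma^2) by
    minimizing the L2E criterion
        L2E(w, mu, sigma) = w^2 / (2 sqrt(pi) sigma) - (2 w / n) * sum_i phi(x_i; mu, sigma)
    with a brute-force grid over (mu, sigma) and the closed-form optimal w.
    Returns (mu, sigma, w, criterion)."""
    x = np.asarray(x, dtype=float)
    best = (np.nan, np.nan, np.nan, np.inf)
    for sigma in sigma_grid:
        # b[j] = (1/n) * sum_i phi(x_i; mu_j, sigma), one entry per candidate mu
        b = gaussian_pdf(x[None, :], mu_grid[:, None], sigma).mean(axis=1)
        a = 1.0 / (2.0 * np.sqrt(np.pi) * sigma)  # integral of phi(x; mu, sigma)^2 dx
        # For fixed (mu, sigma), a*w^2 - 2*b*w is minimized at w = b / a,
        # which gives the profiled criterion -b^2 / a.
        crit = -(b ** 2) / a
        j = int(np.argmin(crit))
        if crit[j] < best[3]:
            best = (mu_grid[j], sigma, b[j] / a, crit[j])
    return best


rng = np.random.default_rng(0)
# Simulated (hypothetical) data: 200 points in one tight cluster plus 50 scattered points.
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.uniform(-10.0, 10.0, 50)])

mu_hat, sigma_hat, w_hat, _ = l2e_partial_gaussian(
    x, mu_grid=np.linspace(-5.0, 5.0, 201), sigma_grid=np.linspace(0.2, 5.0, 241)
)
# Maximum likelihood for a single full Gaussian: sample mean and standard deviation.
mle_mu, mle_sigma = x.mean(), x.std()

print(f"L2E: mu={mu_hat:.2f} sigma={sigma_hat:.2f} w={w_hat:.2f}")
print(f"MLE: mu={mle_mu:.2f} sigma={mle_sigma:.2f}")
```

On data like this, the partial L2E fit should stay close to the tight cluster (σ near 1, w somewhat above 0.8), while the single-Gaussian maximum likelihood fit inflates σ to absorb the scattered points; exact numbers depend on the random seed. The grid search is used only for transparency, and a numerical optimizer could replace it.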
