
Bayesian hierarchical clustering for microarray time series data with replicates and outlier measurements

BACKGROUND: Post-genomic molecular biology has resulted in an explosion of data, providing measurements for large numbers of genes, proteins and metabolites. Time series experiments have become increasingly common, necessitating the development of novel analysis tools that capture the resulting data...


Bibliographic Details
Main authors:	Cooke, Emma J; Savage, Richard S; Kirk, Paul DW; Darkins, Robert; Wild, David L
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3228548/ https://www.ncbi.nlm.nih.gov/pubmed/21995452 http://dx.doi.org/10.1186/1471-2105-12-399

_version_	1782217831191937024
author	Cooke, Emma J Savage, Richard S Kirk, Paul DW Darkins, Robert Wild, David L
author_sort	Cooke, Emma J
collection	PubMed
description	BACKGROUND: Post-genomic molecular biology has resulted in an explosion of data, providing measurements for large numbers of genes, proteins and metabolites. Time series experiments have become increasingly common, necessitating the development of novel analysis tools that capture the resulting data structure. Outlier measurements at one or more time points present a significant challenge, while potentially valuable replicate information is often ignored by existing techniques. RESULTS: We present a generative model-based Bayesian hierarchical clustering algorithm for microarray time series that employs Gaussian process regression to capture the structure of the data. By using a mixture model likelihood, our method permits a small proportion of the data to be modelled as outlier measurements, and adopts an empirical Bayes approach which uses replicate observations to inform a prior distribution of the noise variance. The method automatically learns the optimum number of clusters and can incorporate non-uniformly sampled time points. Using a wide variety of experimental data sets, we show that our algorithm consistently yields higher quality and more biologically meaningful clusters than current state-of-the-art methodologies. We highlight the importance of modelling outlier values by demonstrating that noisy genes can be grouped with other genes of similar biological function. We demonstrate the importance of including replicate information, which we find enables the discrimination of additional distinct expression profiles. CONCLUSIONS: By incorporating outlier measurements and replicate values, this clustering algorithm for time series microarray data provides a step towards a better treatment of the noise inherent in measurements from high-throughput genomic technologies. 
Timeseries BHC is available as part of the R package 'BHC' (version 1.5), which is available for download from Bioconductor (version 2.9 and above) via http://www.bioconductor.org/packages/release/bioc/html/BHC.html?pagewanted=all.
format	Online Article Text
id	pubmed-3228548
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3228548 2011-12-07 Bayesian hierarchical clustering for microarray time series data with replicates and outlier measurements Cooke, Emma J Savage, Richard S Kirk, Paul DW Darkins, Robert Wild, David L BMC Bioinformatics Methodology Article BioMed Central 2011-10-13 /pmc/articles/PMC3228548/ /pubmed/21995452 http://dx.doi.org/10.1186/1471-2105-12-399 Text en Copyright ©2011 Cooke et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Bayesian hierarchical clustering for microarray time series data with replicates and outlier measurements
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3228548/ https://www.ncbi.nlm.nih.gov/pubmed/21995452 http://dx.doi.org/10.1186/1471-2105-12-399
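
The abstract states that a mixture model likelihood lets a small proportion of the data be modelled as outlier measurements. The following is a minimal sketch of that general idea only. It is not the authors' implementation and does not use the API of the 'BHC' package. The uniform outlier component, the function names, and all numeric values are assumptions chosen for illustration.

```python
import math


def gaussian_logpdf(y, mean, sigma):
    """Log density of a normal distribution N(mean, sigma^2) at y."""
    z = (y - mean) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def mixture_loglik(ys, means, sigma, outlier_frac, y_range):
    """Sum of per-point log-likelihoods under a Gaussian/uniform mixture.

    With probability (1 - outlier_frac) a point is Gaussian noise around
    its mean; with probability outlier_frac it is drawn uniformly over an
    interval of width y_range (an assumed form for the outlier component).
    """
    total = 0.0
    for y, m in zip(ys, means):
        inlier = (1.0 - outlier_frac) * math.exp(gaussian_logpdf(y, m, sigma))
        outlier = outlier_frac / y_range
        total += math.log(inlier + outlier)
    return total


# Hypothetical data: a flat expression profile, with and without one
# aberrant measurement at the third time point.
means = [0.0, 0.0, 0.0, 0.0, 0.0]
clean = [0.1, -0.2, 0.0, 0.15, -0.1]
noisy = [0.1, -0.2, 4.0, 0.15, -0.1]

sigma, y_range = 0.2, 10.0

# How much log-likelihood is lost because of the single aberrant point,
# first under a pure Gaussian likelihood, then under the mixture.
drops = {}
for frac in (0.0, 0.05):
    drops[frac] = (mixture_loglik(clean, means, sigma, frac, y_range)
                   - mixture_loglik(noisy, means, sigma, frac, y_range))
    print(f"outlier_frac={frac}: log-likelihood drop = {drops[frac]:.2f}")
```

Under the pure Gaussian likelihood (`outlier_frac=0.0`) the single aberrant point dominates the score, whereas the mixture caps its penalty. This is one way a noisy gene could remain grouped with genes of otherwise similar profile.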
