Interpolation based consensus clustering for gene expression time series

BACKGROUND: Unsupervised analyses such as clustering are the essential tools required to interpret time-series expression data from microarrays. Several clustering algorithms have been developed to analyze gene expression data. Early methods such as k-means, hierarchical clustering, and self-organiz...

Bibliographic Details
Main authors:	Chiu, Tai-Yu, Hsu, Ting-Chieh, Yen, Chia-Cheng, Wang, Jia-Shung
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4407314/ https://www.ncbi.nlm.nih.gov/pubmed/25888019 http://dx.doi.org/10.1186/s12859-015-0541-0

_version_	1782367886948433920
author	Chiu, Tai-Yu Hsu, Ting-Chieh Yen, Chia-Cheng Wang, Jia-Shung
author_facet	Chiu, Tai-Yu Hsu, Ting-Chieh Yen, Chia-Cheng Wang, Jia-Shung
author_sort	Chiu, Tai-Yu
collection	PubMed
description	BACKGROUND: Unsupervised analyses such as clustering are the essential tools required to interpret time-series expression data from microarrays. Several clustering algorithms have been developed to analyze gene expression data. Early methods such as k-means, hierarchical clustering, and self-organizing maps are popular for their simplicity. However, because of noise and uncertainty of measurement, these common algorithms have low accuracy. Moreover, because gene expression is a temporal process, the relationship between successive time points should be considered in the analyses. In addition, biological processes are generally continuous; therefore, the datasets collected from time series experiments are often found to have an insufficient number of data points and, as a result, compensation for missing data can also be an issue. RESULTS: An affinity propagation-based clustering algorithm for time-series gene expression data is proposed. The algorithm explores the relationship between genes using a sliding-window mechanism to extract a large number of features. In addition, the time-course datasets are resampled with spline interpolation to predict the unobserved values. Finally, a consensus process is applied to enhance the robustness of the method. Some real gene expression datasets were analyzed to demonstrate the accuracy and efficiency of the algorithm. CONCLUSION: The proposed algorithm has benefitted from the use of cubic B-splines interpolation, sliding-window, affinity propagation, gene relativity graph, and a consensus process, and, as a result, provides both appropriate and effective clustering of time-series gene expression data. The proposed method was tested with gene expression data from the Yeast galactose dataset, the Yeast cell-cycle dataset (Y5), and the Yeast sporulation dataset, and the results illustrated the relationships between the expressed genes, which may give some insights into the biological processes involved.
format	Online Article Text
id	pubmed-4407314
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4407314 2015-04-24 Interpolation based consensus clustering for gene expression time series Chiu, Tai-Yu Hsu, Ting-Chieh Yen, Chia-Cheng Wang, Jia-Shung BMC Bioinformatics Research Article BioMed Central 2015-04-16 /pmc/articles/PMC4407314/ /pubmed/25888019 http://dx.doi.org/10.1186/s12859-015-0541-0 Text en © Chiu et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Article Chiu, Tai-Yu Hsu, Ting-Chieh Yen, Chia-Cheng Wang, Jia-Shung Interpolation based consensus clustering for gene expression time series
title	Interpolation based consensus clustering for gene expression time series
title_sort	interpolation based consensus clustering for gene expression time series
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4407314/ https://www.ncbi.nlm.nih.gov/pubmed/25888019 http://dx.doi.org/10.1186/s12859-015-0541-0
work_keys_str_mv	AT chiutaiyu interpolationbasedconsensusclusteringforgeneexpressiontimeseries AT hsutingchieh interpolationbasedconsensusclusteringforgeneexpressiontimeseries AT yenchiacheng interpolationbasedconsensusclusteringforgeneexpressiontimeseries AT wangjiashung interpolationbasedconsensusclusteringforgeneexpressiontimeseries
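
The abstract in this record mentions a sliding-window mechanism for extracting features and a consensus process for robustness, without giving details. The sketch below illustrates the two ideas generically and is not the authors' implementation: the function names `sliding_windows` and `co_association`, the window width, and all sample values are assumptions made for illustration. Co-association counting is one common way to combine several clustering runs, and the article may use a different consensus rule.

```python
from itertools import combinations


def sliding_windows(series, width, step=1):
    """Return every contiguous window of `width` points, advancing by `step`."""
    if width < 1 or width > len(series):
        raise ValueError("width must be between 1 and len(series)")
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]


def co_association(labelings):
    """Fraction of clustering runs in which each pair of items shares a cluster."""
    n_items = len(labelings[0])
    counts = {pair: 0 for pair in combinations(range(n_items), 2)}
    for labels in labelings:
        for i, j in counts:
            if labels[i] == labels[j]:
                counts[(i, j)] += 1
    return {pair: c / len(labelings) for pair, c in counts.items()}


# Hypothetical expression profile for one gene at six time points.
profile = [0.1, 0.4, 0.9, 0.7, 0.2, 0.0]
windows = sliding_windows(profile, width=3)

# Hypothetical cluster labels for four genes from three clustering runs.
# Label values are arbitrary per run; only co-membership matters.
runs = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
consensus = co_association(runs)
```

Each window can serve as a local feature vector for comparing genes, and pairs with a high co-association fraction are candidates for the same final cluster.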
