Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm

BACKGROUND: Nowadays, it is possible to collect expression levels of a set of genes from a set of biological samples during a series of time points. Such data have three dimensions: gene-sample-time (GST). Thus they are called 3D microarray gene expression data. To take advantage of the 3D data coll...

Bibliographic Details
Main authors: Tchagang, Alain B, Phan, Sieu, Famili, Fazel, Shearer, Heather, Fobert, Pierre, Huang, Yi, Zou, Jitao, Huang, Daiqing, Cutler, Adrian, Liu, Ziying, Pan, Youlian
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3376030/
https://www.ncbi.nlm.nih.gov/pubmed/22475802
http://dx.doi.org/10.1186/1471-2105-13-54
_version_ 1782235793120559104
author Tchagang, Alain B
Phan, Sieu
Famili, Fazel
Shearer, Heather
Fobert, Pierre
Huang, Yi
Zou, Jitao
Huang, Daiqing
Cutler, Adrian
Liu, Ziying
Pan, Youlian
author_sort Tchagang, Alain B
collection PubMed
description BACKGROUND: It is now possible to collect expression levels of a set of genes from a set of biological samples over a series of time points. Such data have three dimensions, gene-sample-time (GST), and are therefore called 3D microarray gene expression data. To take advantage of the 3D data collected, and to fully understand the biological knowledge hidden in the GST data, novel subspace clustering algorithms have to be developed to effectively address the biological problem in the corresponding space. RESULTS: We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster), for 3D short time-series data mining. OPTricluster is able to identify 3D clusters with coherent evolution from a given 3D dataset using a combinatorial approach on the sample dimension, and the order preserving (OP) concept on the time dimension. The fusion of the two methodologies allows one to study similarities and differences between samples in terms of their temporal expression profile. OPTricluster has been successfully applied to four case studies: immune response in mice infected by malaria (Plasmodium chabaudi), systemic acquired resistance in Arabidopsis thaliana, similarities and differences between inner and outer cotyledon in Brassica napus during seed development, and Brassica napus whole seed development. These studies showed that OPTricluster is robust to noise and is able to detect the similarities and differences between biological samples. CONCLUSIONS: Our analysis showed that OPTricluster generally outperforms other well-known clustering algorithms such as TRICLUSTER, gTRICLUSTER and K-means; it is robust to noise and can effectively mine the biological knowledge hidden in the 3D short time-series gene expression data.
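The two ideas named in the abstract, an order preserving pattern on the time dimension and a combinatorial search over the sample dimension, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that a gene's pattern in one sample is simply the ordering of its time points by expression level, and that a tricluster is a set of genes sharing one pattern across every sample in a chosen subset. It ignores ties and noise handling. The names `rank_pattern`, `op_triclusters`, `min_genes`, and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def rank_pattern(profile):
    """Order-preserving pattern: time indices sorted by increasing expression."""
    return tuple(sorted(range(len(profile)), key=lambda t: profile[t]))


def op_triclusters(data, min_genes=2):
    """Group genes by shared temporal ordering over every subset of samples.

    data[g][s] is the list of expression values of gene g in sample s over time.
    Returns a dict mapping (sample_subset, pattern) to a list of gene indices.
    """
    n_genes = len(data)
    n_samples = len(data[0])
    # Pattern of every gene in every sample (time dimension).
    patterns = [[rank_pattern(data[g][s]) for s in range(n_samples)]
                for g in range(n_genes)]
    clusters = {}
    # Enumerate all non-empty sample subsets (sample dimension).
    for size in range(1, n_samples + 1):
        for subset in combinations(range(n_samples), size):
            groups = defaultdict(list)
            for g in range(n_genes):
                first = patterns[g][subset[0]]
                # Keep the gene only if its pattern is identical in all chosen samples.
                if all(patterns[g][s] == first for s in subset):
                    groups[first].append(g)
            for pattern, genes in groups.items():
                if len(genes) >= min_genes:
                    clusters[(subset, pattern)] = genes
    return clusters


# Toy data (hypothetical): 3 genes, 2 samples, 3 time points.
demo = [
    [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]],  # gene 0: rising in both samples
    [[0.5, 0.7, 0.9], [3.0, 2.0, 1.0]],     # gene 1: rising, then falling
    [[2.0, 4.0, 8.0], [1.0, 5.0, 9.0]],     # gene 2: rising in both samples
]
demo_clusters = op_triclusters(demo)
```

On the toy data, all three genes share the rising pattern in sample 0, but only genes 0 and 2 keep it across both samples, so the subset containing both samples separates the conserved genes from gene 1. Enumerating every sample subset is exponential in the number of samples, which is tolerable only when the sample dimension is small.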
format Online
Article
Text
id pubmed-3376030
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3376030 2012-06-18 BMC Bioinformatics Methodology Article BioMed Central 2012-04-04 /pmc/articles/PMC3376030/ /pubmed/22475802 http://dx.doi.org/10.1186/1471-2105-13-54 Text en Copyright ©2012 Tchagang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3376030/
https://www.ncbi.nlm.nih.gov/pubmed/22475802
http://dx.doi.org/10.1186/1471-2105-13-54