UNCLES: method for the identification of genes differentially consistently co-expressed in a specific subset of datasets

BACKGROUND: Collective analysis of the increasingly emerging gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify the subsets of genes which are consistently co-ex...

Bibliographic Details
Main Authors:	Abu-Jamous, Basel; Fa, Rui; Roberts, David J.; Nandi, Asoke K.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4453228/ https://www.ncbi.nlm.nih.gov/pubmed/26040489 http://dx.doi.org/10.1186/s12859-015-0614-0

_version_	1782374430918311936
author	Abu-Jamous, Basel; Fa, Rui; Roberts, David J.; Nandi, Asoke K.
author_facet	Abu-Jamous, Basel; Fa, Rui; Roberts, David J.; Nandi, Asoke K.
author_sort	Abu-Jamous, Basel
collection	PubMed
description	BACKGROUND: Collective analysis of the increasingly emerging gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify the subsets of genes which are consistently co-expressed in all of the provided datasets in a tuneable manner. However, results validation and parameter setting are issues that complicate the design of such methods. Moreover, although it is common practice to test methods by application to synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations which may not always be sufficiently representative of real datasets. RESULTS: Here, we propose an unsupervised method for the unification of clustering results from multiple datasets using external specifications (UNCLES). This method can identify the subsets of genes consistently co-expressed in a subset of datasets while being poorly co-expressed in another subset of datasets, and can identify the subsets of genes consistently co-expressed in all given datasets. We also propose the M-N scatter plots validation technique and adopt it to set the parameters of UNCLES, such as the number of clusters, automatically. Additionally, we propose an approach for the synthesis of gene expression datasets using real data profiles in a way which combines the ground-truth knowledge of synthetic data and the realistic expression values of real data, and therefore overcomes the problem of faithfulness of synthetic expression data modelling. By application to those datasets, we validate UNCLES while comparing it with other conventional clustering methods and, of particular relevance, biclustering methods. We further validate UNCLES by application to a set of 14 real genome-wide yeast datasets, as it produces focused clusters that conform well to known biological facts. Furthermore, in-silico-based hypotheses regarding the function of a few previously unknown genes in those focused clusters are drawn. CONCLUSIONS: The UNCLES method, the M-N scatter plots technique, and the expression data synthesis approach will have wide application for the comprehensive analysis of genomic and other sources of multiple complex biological datasets. Moreover, the derived in-silico-based biological hypotheses represent subjects for future functional studies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0614-0) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4453228
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4453228 2015-06-04 UNCLES: method for the identification of genes differentially consistently co-expressed in a specific subset of datasets Abu-Jamous, Basel; Fa, Rui; Roberts, David J.; Nandi, Asoke K. BMC Bioinformatics Methodology Article BioMed Central 2015-06-04 /pmc/articles/PMC4453228/ /pubmed/26040489 http://dx.doi.org/10.1186/s12859-015-0614-0 Text en © Abu-Jamous et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	UNCLES: method for the identification of genes differentially consistently co-expressed in a specific subset of datasets
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4453228/ https://www.ncbi.nlm.nih.gov/pubmed/26040489 http://dx.doi.org/10.1186/s12859-015-0614-0
