
A framework for generalized subspace pattern mining in high-dimensional datasets

Bibliographic Details
Main Author:	Curry, Edward WJ
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4247685/ https://www.ncbi.nlm.nih.gov/pubmed/25413436 http://dx.doi.org/10.1186/s12859-014-0355-5

author	Curry, Edward WJ
collection	PubMed
description	BACKGROUND: A generalized notion of biclustering involves the identification of patterns across subspaces within a data matrix. This approach is particularly well-suited to analysis of heterogeneous molecular biology datasets, such as those collected from populations of cancer patients. Different definitions of biclusters will offer different opportunities to discover information from datasets, making it pertinent to tailor the desired patterns to the intended application. This paper introduces ‘GABi’, a customizable framework for subspace pattern mining suited to large heterogeneous datasets. Most existing biclustering algorithms discover biclusters of only a few distinct structures. However, by enabling definition of arbitrary bicluster models, the GABi framework enables the application of biclustering to tasks for which no existing algorithm could be used. RESULTS: First, a series of artificial datasets were constructed to represent three clearly distinct scenarios for applying biclustering. With a bicluster model created for each distinct scenario, GABi is shown to recover the correct solutions more effectively than a panel of alternative approaches, where the bicluster model may not reflect the structure of the desired solution. Secondly, the GABi framework is used to integrate clinical outcome data with an ovarian cancer DNA methylation dataset, leading to the discovery that widespread dysregulation of DNA methylation associates with poor patient prognosis, a result that has not previously been reported. This illustrates a further benefit of the flexible bicluster definition of GABi, which is that it enables incorporation of multiple sources of data, with each data source treated in a specific manner, leading to a means of intelligent integrated subspace pattern mining across multiple datasets. CONCLUSIONS: The GABi framework enables discovery of biologically relevant patterns of any specified structure from large collections of genomic data. 
An R implementation of the GABi framework is available through CRAN (http://cran.r-project.org/web/packages/GABi/index.html). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0355-5) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4247685
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
journal	BMC Bioinformatics
published	2014-11-21
rights	© Curry; licensee BioMed Central Ltd. 2014. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	A framework for generalized subspace pattern mining in high-dimensional datasets
topic	Methodology Article
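
The abstract's central idea is that a bicluster is defined by a user-chosen model that scores any subset of rows and columns, so one search procedure can look for differently shaped patterns. The following is a minimal Python sketch of that idea under stated assumptions. It is not the GABi R package's API or its search strategy, since the abstract does not describe the search. All function names are hypothetical. The exhaustive search is only practical for toy matrices. The two example models are the Cheng and Church mean squared residue (zero for additive, row-shift patterns) and the plain entry variance (zero only for constant blocks).

```python
from itertools import combinations

def submatrix(data, rows, cols):
    """Extract the rows x cols subspace of a list-of-lists matrix."""
    return [[data[i][j] for j in cols] for i in rows]

def mean_squared_residue(data, rows, cols):
    """Cheng & Church mean squared residue: 0 for a perfectly additive pattern."""
    sub = submatrix(data, rows, cols)
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    return sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_r) for j in range(n_c)
    ) / (n_r * n_c)

def constant_value(data, rows, cols):
    """Variance of the entries: 0 only when the submatrix is constant."""
    vals = [v for r in submatrix(data, rows, cols) for v in r]
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

def best_bicluster(data, model, n_rows, n_cols):
    """Hypothetical exhaustive search: return (score, rows, cols) minimizing `model`.

    The bicluster model is a plug-in scoring function; lower scores are better.
    """
    best = None
    for rows in combinations(range(len(data)), n_rows):
        for cols in combinations(range(len(data[0])), n_cols):
            score = model(data, rows, cols)
            if best is None or score < best[0]:
                best = (score, rows, cols)
    return best

# Toy matrix: rows 0-1 x cols 0-2 follow an additive (row-shift) pattern,
# while rows 2-3 x cols 1 and 3 form a constant block of 7s.
data = [
    [1, 2, 3, 8],
    [4, 5, 6, 0],
    [7, 7, 9, 7],
    [2, 7, 4, 7],
]

print(best_bicluster(data, mean_squared_residue, 2, 3))  # (0.0, (0, 1), (0, 1, 2))
print(best_bicluster(data, constant_value, 2, 2))        # (0.0, (2, 3), (1, 3))
```

Swapping the model function changes which subspace is recovered from the same matrix, which is the kind of flexibility the abstract attributes to user-defined bicluster models.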

Similar items