Partition decoupling for multi-gene analysis of gene expression profiling data
BACKGROUND: Multi-gene interactions likely play an important role in the development of complex phenotypes, and relationships between interacting genes pose a challenging statistical problem in microarray analysis, since the genes involved in these interactions may not exhibit marginal differential...
Main Authors: | Braun, Rosemary; Leibon, Gregory; Pauls, Scott; Rockmore, Daniel |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | Methodology Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3276603/ https://www.ncbi.nlm.nih.gov/pubmed/22208906 http://dx.doi.org/10.1186/1471-2105-12-497 |
_version_ | 1782223391259885568 |
---|---|
author | Braun, Rosemary Leibon, Gregory Pauls, Scott Rockmore, Daniel |
author_facet | Braun, Rosemary Leibon, Gregory Pauls, Scott Rockmore, Daniel |
author_sort | Braun, Rosemary |
collection | PubMed |
description | BACKGROUND: Multi-gene interactions likely play an important role in the development of complex phenotypes, and relationships between interacting genes pose a challenging statistical problem in microarray analysis, since the genes involved in these interactions may not exhibit marginal differential expression. As a result, it is necessary to develop tools that can identify sets of interacting genes that discriminate phenotypes without requiring that the classification boundary between phenotypes be convex. RESULTS: We describe an extension and application of a new unsupervised statistical learning technique, known as the Partition Decoupling Method (PDM), to gene expression microarray data. This method may be used to classify samples based on multi-gene expression patterns and to identify pathways associated with phenotype, without relying upon the differential expression of individual genes. The PDM uses iterated spectral clustering and scrubbing steps, revealing at each iteration progressively finer structure in the geometry of the data. Because spectral clustering has the ability to discern clusters that are not linearly separable, it is able to articulate relationships between samples that would be missed by distance- and tree-based classifiers. After projecting the data onto the cluster centroids and computing the residuals ("scrubbing"), one can repeat the spectral clustering, revealing clusters that were not discernible in the first layer. These iterations, each of which provides a partition of the data that is decoupled from the others, are carried forward until the structure in the residuals is indistinguishable from noise, preventing over-fitting. We describe the PDM in detail and apply it to three publicly available cancer gene expression data sets. 
By applying the PDM on a pathway-by-pathway basis and identifying those pathways that permit unsupervised clustering of samples that match known sample characteristics, we show how the PDM may be used to find sets of mechanistically-related genes that may play a role in disease. An R package to carry out the PDM is available for download. CONCLUSIONS: We show that the PDM is a useful tool for the analysis of gene expression data from complex diseases, where phenotypes are not linearly separable and multi-gene effects are likely to play a role. Our results demonstrate that the PDM is able to distinguish cell types and treatments with higher accuracy than is obtained through other approaches, and that the Pathway-PDM application is a valuable technique for identifying disease-associated pathways. |
format | Online Article Text |
id | pubmed-3276603 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3276603 2012-02-10 Partition decoupling for multi-gene analysis of gene expression profiling data Braun, Rosemary Leibon, Gregory Pauls, Scott Rockmore, Daniel BMC Bioinformatics Methodology Article BACKGROUND: Multi-gene interactions likely play an important role in the development of complex phenotypes, and relationships between interacting genes pose a challenging statistical problem in microarray analysis, since the genes involved in these interactions may not exhibit marginal differential expression. As a result, it is necessary to develop tools that can identify sets of interacting genes that discriminate phenotypes without requiring that the classification boundary between phenotypes be convex. RESULTS: We describe an extension and application of a new unsupervised statistical learning technique, known as the Partition Decoupling Method (PDM), to gene expression microarray data. This method may be used to classify samples based on multi-gene expression patterns and to identify pathways associated with phenotype, without relying upon the differential expression of individual genes. The PDM uses iterated spectral clustering and scrubbing steps, revealing at each iteration progressively finer structure in the geometry of the data. Because spectral clustering has the ability to discern clusters that are not linearly separable, it is able to articulate relationships between samples that would be missed by distance- and tree-based classifiers. After projecting the data onto the cluster centroids and computing the residuals ("scrubbing"), one can repeat the spectral clustering, revealing clusters that were not discernible in the first layer. These iterations, each of which provides a partition of the data that is decoupled from the others, are carried forward until the structure in the residuals is indistinguishable from noise, preventing over-fitting. 
We describe the PDM in detail and apply it to three publicly available cancer gene expression data sets. By applying the PDM on a pathway-by-pathway basis and identifying those pathways that permit unsupervised clustering of samples that match known sample characteristics, we show how the PDM may be used to find sets of mechanistically-related genes that may play a role in disease. An R package to carry out the PDM is available for download. CONCLUSIONS: We show that the PDM is a useful tool for the analysis of gene expression data from complex diseases, where phenotypes are not linearly separable and multi-gene effects are likely to play a role. Our results demonstrate that the PDM is able to distinguish cell types and treatments with higher accuracy than is obtained through other approaches, and that the Pathway-PDM application is a valuable technique for identifying disease-associated pathways. BioMed Central 2011-12-30 /pmc/articles/PMC3276603/ /pubmed/22208906 http://dx.doi.org/10.1186/1471-2105-12-497 Text en Copyright ©2011 Braun et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Braun, Rosemary Leibon, Gregory Pauls, Scott Rockmore, Daniel Partition decoupling for multi-gene analysis of gene expression profiling data |
title | Partition decoupling for multi-gene analysis of gene expression profiling data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3276603/ https://www.ncbi.nlm.nih.gov/pubmed/22208906 http://dx.doi.org/10.1186/1471-2105-12-497 |
work_keys_str_mv | AT braunrosemary partitiondecouplingformultigeneanalysisofgeneexpressionprofilingdata AT leibongregory partitiondecouplingformultigeneanalysisofgeneexpressionprofilingdata AT paulsscott partitiondecouplingformultigeneanalysisofgeneexpressionprofilingdata AT rockmoredaniel partitiondecouplingformultigeneanalysisofgeneexpressionprofilingdata |
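
The abstract's "scrubbing" step (projecting the data onto the cluster centroids and keeping the residuals) can be illustrated with a small sketch. This is not the authors' R package. It assumes that scrubbing means a least-squares projection of each sample onto the span of the cluster centroids, and that cluster labels have already been obtained (for example, from spectral clustering). The function names `centroids`, `solve`, and `scrub` are hypothetical and chosen only for this illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def centroids(samples, labels):
    """Mean expression vector of each cluster, ordered by label."""
    groups = {}
    for x, lab in zip(samples, labels):
        groups.setdefault(lab, []).append(x)
    return [[sum(col) / len(members) for col in zip(*members)]
            for _, members in sorted(groups.items())]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("centroids are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def scrub(samples, labels):
    """Assumed scrubbing step: subtract from each sample its least-squares
    projection onto the span of the cluster centroids."""
    cents = centroids(samples, labels)
    gram = [[dot(ci, cj) for cj in cents] for ci in cents]
    residuals = []
    for x in samples:
        # Normal equations: gram @ coef = [<c, x> for each centroid c]
        coef = solve(gram, [dot(c, x) for c in cents])
        proj = [sum(w * c[g] for w, c in zip(coef, cents))
                for g in range(len(x))]
        residuals.append([xi - pi for xi, pi in zip(x, proj)])
    return residuals


# Toy data: four samples, three "genes", two clusters (labels assumed given).
samples = [[1.0, 0.0, 0.1], [1.0, 0.0, -0.1],
           [0.0, 1.0, 0.2], [0.0, 1.0, -0.2]]
labels = [0, 0, 1, 1]
residuals = scrub(samples, labels)
```

In this toy case the residuals retain only the third coordinate, which the first partition did not explain. Under the scheme described in the abstract, the clustering would then be repeated on such residuals to look for a further layer of structure.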