Cargando…

A Seriation Approach for Visualization-Driven Discovery of Co-Expression Patterns in Serial Analysis of Gene Expression (SAGE) Data

BACKGROUND: Serial Analysis of Gene Expression (SAGE) is a DNA sequencing-based method for large-scale gene expression profiling that provides an alternative to microarray analysis. Most analyses of SAGE data aimed at identifying co-expressed genes have been accomplished using various versions of cl...

Descripción completa

Detalles Bibliográficos
Autores principales: Morozova, Olena, Morozov, Vyacheslav, Hoffman, Brad G., Helgason, Cheryl D., Marra, Marco A.
Formato: Texto
Lenguaje:English
Publicado: Public Library of Science 2008
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2527533/
https://www.ncbi.nlm.nih.gov/pubmed/18787709
http://dx.doi.org/10.1371/journal.pone.0003205
Descripción
Sumario:BACKGROUND: Serial Analysis of Gene Expression (SAGE) is a DNA sequencing-based method for large-scale gene expression profiling that provides an alternative to microarray analysis. Most analyses of SAGE data aimed at identifying co-expressed genes have been accomplished using various versions of clustering approaches that often result in a number of false positives. PRINCIPAL FINDINGS: Here we explore the use of seriation, a statistical approach for ordering sets of objects based on their similarity, for large-scale expression pattern discovery in SAGE data. For this specific task we implement a seriation heuristic we term ‘progressive construction of contigs’ that constructs local chains of related elements by sequentially rearranging margins of the correlation matrix. We apply the heuristic to the analysis of simulated and experimental SAGE data and compare our results to those obtained with a clustering algorithm developed specifically for SAGE data. We show using simulations that the performance of seriation compares favorably to that of the clustering algorithm on noisy SAGE data. CONCLUSIONS: We explore the use of a seriation approach for visualization-based pattern discovery in SAGE data. Using both simulations and experimental data, we demonstrate that seriation is able to identify groups of co-expressed genes more accurately than a clustering algorithm developed specifically for SAGE data. Our results suggest that seriation is a useful method for the analysis of gene expression data whose applicability should be further pursued.