Cargando…

Integration of gene expression data with prior knowledge for network analysis and validation

BACKGROUND: Reconstruction of protein-protein interaction or metabolic networks based on expression data often involves in silico predictions, while on the other hand, there are unspecific networks of in vivo interactions derived from knowledge bases. We analyze networks designed to come as close as...

Descripción completa

Detalles Bibliográficos
Autores principales: Ante, Michael, Wingender, Edgar, Fuchs, Mathias
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2011
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3298547/
https://www.ncbi.nlm.nih.gov/pubmed/22123172
http://dx.doi.org/10.1186/1756-0500-4-520
Descripción
Sumario:BACKGROUND: Reconstruction of protein-protein interaction or metabolic networks based on expression data often involves in silico predictions, while on the other hand, there are unspecific networks of in vivo interactions derived from knowledge bases. We analyze networks designed to come as close as possible to data measured in vivo, both with respect to the set of nodes which were taken to be expressed in experiment as well as with respect to the interactions between them which were taken from manually curated databases RESULTS: A signaling network derived from the TRANSPATH database and a metabolic network derived from KEGG LIGAND are each filtered onto expression data from breast cancer (SAGE) considering different levels of restrictiveness in edge and vertex selection. We perform several validation steps, in particular we define pathway over-representation tests based on refined null models to recover functional modules. The prominent role of the spindle checkpoint-related pathways in breast cancer is exhibited. High-ranking key nodes cluster in functional groups retrieved from literature. Results are consistent between several functional and topological analyses and between signaling and metabolic aspects. CONCLUSIONS: This construction involved as a crucial step the passage to a mammalian protein identifier format as well as to a reaction-based semantics of metabolism. This yielded good connectivity but also led to the need to perform benchmark tests to exclude loss of essential information. Such validation, albeit tedious due to limitations of existing methods, turned out to be informative, and in particular provided biological insights as well as information on the degrees of coherence of the networks despite fragmentation of experimental data. Key node analysis exploited the networks for potentially interesting proteins in view of drug target prediction.