Extending pathways based on gene lists using InterPro domain signatures

BACKGROUND: High-throughput technologies like functional screens and gene expression analysis produce extended lists of candidate genes. Gene-Set Enrichment Analysis is a commonly used and well-established technique to test for the statistically significant over-representation of particular pathways...

Bibliographic Details
Main authors: Hahne, Florian, Mehrle, Alexander, Arlt, Dorit, Poustka, Annemarie, Wiemann, Stefan, Beissbarth, Tim
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2245903/
https://www.ncbi.nlm.nih.gov/pubmed/18177498
http://dx.doi.org/10.1186/1471-2105-9-3
author Hahne, Florian
Mehrle, Alexander
Arlt, Dorit
Poustka, Annemarie
Wiemann, Stefan
Beissbarth, Tim
collection PubMed
description BACKGROUND: High-throughput technologies like functional screens and gene expression analysis produce extended lists of candidate genes. Gene-Set Enrichment Analysis is a commonly used and well-established technique to test for the statistically significant over-representation of particular pathways. A shortcoming of this method, however, is that most genes investigated in such experiments have very sparse functional or pathway annotation and therefore cannot be the target of such an analysis. The approach presented here aims to assign lists of genes with limited annotation to previously described functional gene collections or pathways. It works by comparing the InterPro domain signatures of the candidate gene lists with the domain signatures of gene sets derived from known classifications, e.g. KEGG pathways. RESULTS: To validate our approach, we designed a simulation study. Based on all pathways available in the KEGG database, we created test gene lists by randomly selecting pathway genes, removing these genes from the known pathways, and adding variable amounts of noise in the form of genes not annotated to the pathway. We show that we can recover pathway memberships from the simulated gene lists with high accuracy. We further demonstrate the applicability of our approach on a biological example. CONCLUSION: Results based on simulation and data analysis show that domain-based pathway enrichment analysis is a very sensitive method to test for enrichment of pathways in sparsely annotated lists of genes. An R-based software package, domainsignatures, for routinely performing this analysis on the results of high-throughput screening is available via Bioconductor.
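The comparison of domain signatures described in the abstract can be illustrated with a minimal sketch. It is not the algorithm of the domainsignatures package: the cosine similarity, the resampling test, all function names, and the toy gene and domain labels are assumptions chosen for illustration only.

```python
import math
import random
from collections import Counter

# Hypothetical toy annotation: gene -> set of domain labels.
# Real data would map genes to InterPro accessions.
GENE_DOMAINS = {
    "g1": {"D_kinase", "D_sh2"},
    "g2": {"D_kinase"},
    "g3": {"D_kinase", "D_sh3"},
    "g4": {"D_zinc"},
    "g5": {"D_zinc", "D_homeobox"},
    "g6": {"D_homeobox"},
    "g7": {"D_kinase", "D_sh2"},
    "g8": {"D_zinc"},
}


def domain_signature(genes, gene_domains):
    """Relative frequency of each domain over all genes in the list."""
    counts = Counter(d for g in genes for d in gene_domains.get(g, ()))
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()} if total else {}


def cosine_similarity(sig_a, sig_b):
    """Cosine of the angle between two sparse signature vectors."""
    dot = sum(v * sig_b.get(d, 0.0) for d, v in sig_a.items())
    norm_a = math.sqrt(sum(v * v for v in sig_a.values()))
    norm_b = math.sqrt(sum(v * v for v in sig_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pathway_p_value(candidates, pathway_genes, gene_domains,
                    n_resamples=1000, seed=0):
    """Assumed resampling test: how often does a random gene list of the
    same size resemble the pathway at least as much as the candidates?"""
    rng = random.Random(seed)
    universe = sorted(gene_domains)
    pathway_sig = domain_signature(pathway_genes, gene_domains)
    observed = cosine_similarity(
        domain_signature(candidates, gene_domains), pathway_sig)
    hits = 0
    for _ in range(n_resamples):
        sample = rng.sample(universe, len(candidates))
        sim = cosine_similarity(
            domain_signature(sample, gene_domains), pathway_sig)
        if sim >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    pathway = ["g1", "g2", "g3"]   # genes annotated to a pathway
    candidates = ["g7"]            # unannotated candidate list
    similarity, p = pathway_p_value(candidates, pathway, GENE_DOMAINS)
    print(f"similarity={similarity:.3f}, empirical p={p:.3f}")
```

In this sketch a candidate list needs no pathway annotation at all, only domain annotation, which is the property the abstract highlights for sparsely annotated genes.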
format Text
id pubmed-2245903
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2245903 2008-02-16 Extending pathways based on gene lists using InterPro domain signatures Hahne, Florian Mehrle, Alexander Arlt, Dorit Poustka, Annemarie Wiemann, Stefan Beissbarth, Tim BMC Bioinformatics Research Article
BioMed Central 2008-01-04 /pmc/articles/PMC2245903/ /pubmed/18177498 http://dx.doi.org/10.1186/1471-2105-9-3 Text en Copyright © 2008 Hahne et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Extending pathways based on gene lists using InterPro domain signatures
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2245903/
https://www.ncbi.nlm.nih.gov/pubmed/18177498
http://dx.doi.org/10.1186/1471-2105-9-3