Predicting pathway membership via domain signatures

Bibliographic Details
Main authors: Fröhlich, Holger; Fellmann, Mark; Sültmann, Holger; Poustka, Annemarie; Beißbarth, Tim
Format: Text
Language: English
Published: Oxford University Press 2008
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2553439/
https://www.ncbi.nlm.nih.gov/pubmed/18676972
http://dx.doi.org/10.1093/bioinformatics/btn403
Description
Summary:
Motivation: Functional characterization of genes is of great importance for the understanding of complex cellular processes. Valuable information for this purpose can be obtained from pathway databases, such as KEGG. However, only a small fraction of genes has been annotated with pathway information so far. In contrast, information on contained protein domains can be obtained for a significantly higher number of genes, e.g. from the InterPro database.
Results: We present a classification model which, for a specific gene of interest, can predict the mapping to a KEGG pathway based on its domain signature. The classifier makes explicit use of the hierarchical organization of pathways in the KEGG database. Furthermore, we take into account that a specific gene can be mapped to different pathways at the same time. The classification method produces a scoring of all possible mapping positions of the gene in the KEGG hierarchy. Evaluations of our model, which is a combination of an SVM and a ranking perceptron approach, show a high prediction performance. Moreover, for signaling pathways we reveal that it is even possible to accurately forecast the membership to individual pathway components.
Availability: The R package gene2pathway is a supplement to this article.
Contact: h.froehlich@dkfz-heidelberg.de
Supplementary Information: Supplementary data are available at Bioinformatics online.
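
The summary describes three ingredients: a domain signature per gene, pathway labels that live in a hierarchy (with a gene possibly mapped to several pathways), and a score for every position in that hierarchy. The following is a minimal sketch of how such inputs and outputs might be represented. It is not the authors' implementation and does not use the gene2pathway package; the toy hierarchy, the domain identifiers, and all function names are hypothetical, and the SVM and ranking perceptron that would produce the scores are not shown.

```python
# Illustrative sketch only: hierarchy, domain IDs, and helper names are hypothetical.

# Toy pathway hierarchy stored as child -> parent (None marks a top-level node).
PARENT = {
    "Category A": None,
    "Subcategory A1": "Category A",
    "Pathway A1a": "Subcategory A1",
    "Category B": None,
    "Pathway B1": "Category B",
}


def encode_signature(domains, vocabulary):
    """Return a binary vector: 1 if the gene contains the domain, else 0."""
    present = set(domains)
    return [1 if d in present else 0 for d in vocabulary]


def expand_labels(pathways, parent=PARENT):
    """Add every ancestor of each annotated pathway to the label set.

    A gene annotated with a leaf pathway is treated as a member of all
    enclosing categories, which yields a multi-label target over the hierarchy.
    """
    labels = set()
    for node in pathways:
        while node is not None:
            labels.add(node)
            node = parent[node]
    return labels


def rank_nodes(scores):
    """Order hierarchy nodes from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


vocabulary = ["DOM_A", "DOM_B", "DOM_C"]  # placeholder domain identifiers
x = encode_signature(["DOM_C", "DOM_A"], vocabulary)
y = expand_labels(["Pathway A1a", "Pathway B1"])
```

Here `x` would serve as the feature vector for one gene and `y` as its set of target positions in the hierarchy; a trained model would then supply the per-node scores passed to `rank_nodes`.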