
Motif kernel generated by genetic programming improves remote homology and fold detection

BACKGROUND: Protein remote homology detection is a central problem in computational biology. Most recent methods train support vector machines to discriminate between related and unrelated sequences and these studies have introduced several types of kernels. One successful approach is to base a kern...


Bibliographic Details
Main authors: Håndstad, Tony; Hestnes, Arne JH; Sætrom, Pål
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1794419/
https://www.ncbi.nlm.nih.gov/pubmed/17254344
http://dx.doi.org/10.1186/1471-2105-8-23
_version_ 1782132170626695168
author Håndstad, Tony
Hestnes, Arne JH
Sætrom, Pål
author_facet Håndstad, Tony
Hestnes, Arne JH
Sætrom, Pål
author_sort Håndstad, Tony
collection PubMed
description BACKGROUND: Protein remote homology detection is a central problem in computational biology. Most recent methods train support vector machines to discriminate between related and unrelated sequences and these studies have introduced several types of kernels. One successful approach is to base a kernel on shared occurrences of discrete sequence motifs. Still, many protein sequences fail to be classified correctly for a lack of a suitable set of motifs for these sequences. RESULTS: We introduce the GPkernel, which is a motif kernel based on discrete sequence motifs where the motifs are evolved using genetic programming. All proteins can be grouped according to evolutionary relations and structure, and the method uses this inherent structure to create groups of motifs that discriminate between different families of evolutionary origin. When tested on two SCOP benchmarks, the superfamily and fold recognition problems, the GPkernel gives significantly better results compared to related methods of remote homology detection. CONCLUSION: The GPkernel gives particularly good results on the more difficult fold recognition problem compared to the other methods. This is mainly because the method creates motif sets that describe similarities among subgroups of both the related and unrelated proteins. This rich set of motifs gives a better description of the similarities and differences between different folds than do previous motif-based methods.
format Text
id pubmed-1794419
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1794419 2007-02-08 Motif kernel generated by genetic programming improves remote homology and fold detection Håndstad, Tony Hestnes, Arne JH Sætrom, Pål BMC Bioinformatics Research Article BACKGROUND: Protein remote homology detection is a central problem in computational biology. Most recent methods train support vector machines to discriminate between related and unrelated sequences and these studies have introduced several types of kernels. One successful approach is to base a kernel on shared occurrences of discrete sequence motifs. Still, many protein sequences fail to be classified correctly for a lack of a suitable set of motifs for these sequences. RESULTS: We introduce the GPkernel, which is a motif kernel based on discrete sequence motifs where the motifs are evolved using genetic programming. All proteins can be grouped according to evolutionary relations and structure, and the method uses this inherent structure to create groups of motifs that discriminate between different families of evolutionary origin. When tested on two SCOP benchmarks, the superfamily and fold recognition problems, the GPkernel gives significantly better results compared to related methods of remote homology detection. CONCLUSION: The GPkernel gives particularly good results on the more difficult fold recognition problem compared to the other methods. This is mainly because the method creates motif sets that describe similarities among subgroups of both the related and unrelated proteins. This rich set of motifs gives a better description of the similarities and differences between different folds than do previous motif-based methods. BioMed Central 2007-01-25 /pmc/articles/PMC1794419/ /pubmed/17254344 http://dx.doi.org/10.1186/1471-2105-8-23 Text en Copyright © 2007 Håndstad et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Håndstad, Tony
Hestnes, Arne JH
Sætrom, Pål
Motif kernel generated by genetic programming improves remote homology and fold detection
title Motif kernel generated by genetic programming improves remote homology and fold detection
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1794419/
https://www.ncbi.nlm.nih.gov/pubmed/17254344
http://dx.doi.org/10.1186/1471-2105-8-23