A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models

Bibliographic Details
Main authors: Bernardes, Juliana S; Carbone, Alessandra; Zaverucha, Gerson
Format: Text
Language: English
Published: BioMed Central 2011
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3078102/
https://www.ncbi.nlm.nih.gov/pubmed/21429187
http://dx.doi.org/10.1186/1471-2105-12-83
author Bernardes, Juliana S
Carbone, Alessandra
Zaverucha, Gerson
collection PubMed
description BACKGROUND: Remote homology detection is a hard computational problem. Most approaches have trained computational models using either full protein sequences or multiple sequence alignments (MSA) that include all positions. However, when we deal with proteins in the "twilight zone", we observe that only some segments of the sequences (motifs) are conserved. We introduce a novel logical representation that allows us to represent physico-chemical properties of sequences, conserved amino acid positions and conserved physico-chemical positions in the MSA. From this representation, Inductive Logic Programming (ILP) finds the most frequent patterns (motifs), which are then used to train propositional models such as decision trees and support vector machines (SVM). RESULTS: We use the SCOP database for our experiments, evaluating protein recognition within the same superfamily. Our results show that our methodology, when using SVM, performs significantly better than some state-of-the-art methods and comparably to others. However, our method also provides a comprehensible set of logical rules that can help to understand what determines protein function. CONCLUSIONS: The strategy of selecting only the most frequent patterns is effective for remote homology detection. This is possible through a suitable first-order logical representation of homologous properties and through a set of frequent patterns, found by an ILP system, that summarizes essential features of protein functions.
format Text
id pubmed-3078102
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3078102 (2011-04-16). BMC Bioinformatics, Research Article. BioMed Central, 2011-03-23. /pmc/articles/PMC3078102/ /pubmed/21429187 http://dx.doi.org/10.1186/1471-2105-12-83. Text, en. Copyright ©2011 Bernardes et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3078102/
https://www.ncbi.nlm.nih.gov/pubmed/21429187
http://dx.doi.org/10.1186/1471-2105-12-83
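The abstract describes a two-stage pipeline: an ILP system extracts frequent patterns (motifs), and those patterns then serve as features for propositional learners such as SVMs. The Python sketch below illustrates only that second, propositionalization step, under clearly stated assumptions: the physico-chemical classes, the two patterns in `PATTERNS` and the toy sequences are hypothetical (the paper learns first-order rules from an MSA-based representation, which is not reproduced here), and `train_linear_svm` is a minimal Pegasos-style linear SVM written from scratch, not the SVM implementation used by the authors.

```python
"""Toy sketch: frequent patterns -> binary features -> linear SVM.

All property classes, patterns and sequences below are hypothetical stand-ins,
not the representation or the rules learned in the paper.
"""

# Hypothetical, simplified physico-chemical classes for the 20 standard amino acids.
PROPERTY = {}
for aa in "AVLIMFWC":
    PROPERTY[aa] = "hydrophobic"
for aa in "STNQGPY":
    PROPERTY[aa] = "polar"
for aa in "KRH":
    PROPERTY[aa] = "positive"
for aa in "DE":
    PROPERTY[aa] = "negative"

# A pattern is a tuple of (offset, symbol) constraints; a symbol is either a
# residue letter (conserved amino acid) or a property name (conserved property).
# These two hypothetical patterns stand in for rules an ILP system might return.
PATTERNS = [
    ((0, "positive"), (1, "G"), (2, "hydrophobic")),
    ((0, "negative"), (2, "negative")),
]


def matches(seq, pattern):
    """True if some window of seq satisfies every (offset, symbol) constraint."""
    span = max(off for off, _ in pattern) + 1
    for start in range(len(seq) - span + 1):
        if all(seq[start + off] == sym or PROPERTY.get(seq[start + off]) == sym
               for off, sym in pattern):
            return True
    return False


def propositionalize(seq, patterns=PATTERNS):
    """One binary feature per pattern: 1.0 if the sequence satisfies it, else 0.0."""
    return [1.0 if matches(seq, p) else 0.0 for p in patterns]


def train_linear_svm(X, y, lam=0.01, epochs=500):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    Examples are visited in a fixed order so the result is deterministic; a
    constant 1.0 feature acts as a (regularized) bias term.
    """
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            x = list(xi) + [1.0]
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 regularization
            if margin < 1.0:  # hinge loss is active: step along y * x
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w


def predict(w, features):
    """Sign of the linear score, with the bias feature appended."""
    score = sum(wj * xj for wj, xj in zip(w, list(features) + [1.0]))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    train_seqs = ["AKGLST", "DSEAAA", "RGVTTS", "TTEQDL"]
    labels = [1, -1, 1, -1]  # 1 = positive (same superfamily), -1 = negative
    X = [propositionalize(s) for s in train_seqs]
    w = train_linear_svm(X, labels)
    print(X)
    print([predict(w, x) for x in X])
```

Each sequence becomes a fixed-length binary vector with one entry per pattern, which is what lets a standard propositional learner (a linear SVM here, or equally a decision tree) consume patterns that were found in a first-order setting.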