
Probabilistic grammatical model for helix‐helix contact site classification

BACKGROUND: Hidden Markov Models power many state‐of‐the‐art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not convey directly information on medium‐ and long‐range residue‐residue interactions. This requires an expressive power of...


Bibliographic Details
Main authors:	Dyrka, Witold, Nebel, Jean‐Christophe, Kotulska, Malgorzata
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3892132/ https://www.ncbi.nlm.nih.gov/pubmed/24350601 http://dx.doi.org/10.1186/1748-7188-8-31

_version_	1782299473089658880
author	Dyrka, Witold Nebel, Jean‐Christophe Kotulska, Malgorzata
author_facet	Dyrka, Witold Nebel, Jean‐Christophe Kotulska, Malgorzata
author_sort	Dyrka, Witold
collection	PubMed
description	BACKGROUND: Hidden Markov Models power many state‐of‐the‐art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not convey directly information on medium‐ and long‐range residue‐residue interactions. This requires an expressive power of at least context‐free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited. RESULTS: In this work, we present a probabilistic grammatical framework for problem‐specific protein languages and apply it to classification of transmembrane helix‐helix pairs configurations. The core of the model consists of a probabilistic context‐free grammar, automatically inferred by a genetic algorithm from only a generic set of expert‐based rules and positive training samples. The model was applied to produce sequence based descriptors of four classes of transmembrane helix‐helix contact site configurations. The highest performance of the classifiers reached AUCROC of 0.70. The analysis of grammar parse trees revealed the ability of representing structural features of helix‐helix contact sites. CONCLUSIONS: We demonstrated that our probabilistic context‐free framework for analysis of protein sequences outperforms the state of the art in the task of helix‐helix contact site classification. However, this is achieved without necessarily requiring modeling long range dependencies between interacting residues. A significant feature of our approach is that grammar rules and parse trees are human‐readable. Thus they could provide biologically meaningful information for molecular biologists.
format	Online Article Text
id	pubmed-3892132
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3892132 2014-01-28 Probabilistic grammatical model for helix‐helix contact site classification Dyrka, Witold Nebel, Jean‐Christophe Kotulska, Malgorzata Algorithms Mol Biol Research BioMed Central 2013-12-18 /pmc/articles/PMC3892132/ /pubmed/24350601 http://dx.doi.org/10.1186/1748-7188-8-31 Text en Copyright © 2013 Dyrka et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Dyrka, Witold Nebel, Jean‐Christophe Kotulska, Malgorzata Probabilistic grammatical model for helix‐helix contact site classification
title	Probabilistic grammatical model for helix‐helix contact site classification
title_full	Probabilistic grammatical model for helix‐helix contact site classification
title_fullStr	Probabilistic grammatical model for helix‐helix contact site classification
title_full_unstemmed	Probabilistic grammatical model for helix‐helix contact site classification
title_short	Probabilistic grammatical model for helix‐helix contact site classification
title_sort	probabilistic grammatical model for helix‐helix contact site classification
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3892132/ https://www.ncbi.nlm.nih.gov/pubmed/24350601 http://dx.doi.org/10.1186/1748-7188-8-31
work_keys_str_mv	AT dyrkawitold probabilisticgrammaticalmodelforhelixhelixcontactsiteclassification AT nebeljeanchristophe probabilisticgrammaticalmodelforhelixhelixcontactsiteclassification AT kotulskamalgorzata probabilisticgrammaticalmodelforhelixhelixcontactsiteclassification
