
On the Relevance of Sophisticated Structural Annotations for Disulfide Connectivity Pattern Prediction

Disulfide bridges strongly constrain the native structure of many proteins and predicting their formation is therefore a key sub-problem of protein structure and function inference. Most recently proposed approaches for this prediction problem adopt the following pipeline: first they enrich the prim...


Bibliographic Details
Main authors:	Becker, Julien; Maes, Francis; Wehenkel, Louis
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2013
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3574028/ https://www.ncbi.nlm.nih.gov/pubmed/23533562 http://dx.doi.org/10.1371/journal.pone.0056621

author	Becker, Julien; Maes, Francis; Wehenkel, Louis
collection	PubMed
description	Disulfide bridges strongly constrain the native structure of many proteins, and predicting their formation is therefore a key sub-problem of protein structure and function inference. Most recently proposed approaches for this prediction problem adopt the following pipeline: first, they enrich the primary sequence with structural annotations; second, they apply a binary classifier to each candidate pair of cysteines to predict disulfide bonding probabilities; and finally, they use a maximum weight graph matching algorithm to derive the predicted disulfide connectivity pattern of a protein. In this paper, we adopt this three-step pipeline and propose an extensive study of the relevance of various structural annotations and feature encodings. In particular, we consider five kinds of structural annotations, three of which are novel in the context of disulfide bridge prediction. To be usable by machine learning algorithms, these annotations must be encoded into features. For this purpose, we propose four different feature encodings based on local windows and on different kinds of histograms. The combination of structural annotations with these possible encodings leads to a large number of possible feature functions. In order to identify a minimal subset of relevant feature functions among those, we propose an efficient and interpretable feature function selection scheme, designed to avoid any form of overfitting. We apply this scheme on top of three supervised learning algorithms: k-nearest neighbors, support vector machines and extremely randomized trees. Our results indicate that using only the PSSM (position-specific scoring matrix) together with the CSP (cysteine separation profile) is sufficient to construct a high-performance disulfide pattern predictor and that extremely randomized trees reach a disulfide pattern prediction accuracy of [Image: see text] on the benchmark dataset SPX[Image: see text], which corresponds to [Image: see text] improvement over the state of the art. A web application is available at http://m24.giga.ulg.ac.be:81/x3CysBridges.
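The last step of the pipeline described in the abstract, turning pairwise bonding probabilities into a connectivity pattern, can be illustrated with a small sketch. This is not the authors' implementation: it is a brute-force stand-in for a maximum weight matching algorithm, it assumes every cysteine is bonded (so the count is even), and the function name `best_connectivity`, the cysteine positions and the probabilities are all hypothetical values made up for illustration.

```python
def best_connectivity(cysteines, pair_prob):
    """Return (score, pairs) for the pairing of cysteines that maximizes
    the sum of predicted bonding probabilities.

    Assumptions (not from the paper): all cysteines are bonded, and
    pair_prob maps (i, j) with i < j to the classifier's probability.
    Exhaustive search is only practical for a handful of cysteines.
    """
    if len(cysteines) % 2 != 0:
        raise ValueError("this sketch assumes an even number of cysteines")

    def search(remaining):
        # Base case: nothing left to pair.
        if not remaining:
            return 0.0, []
        first, rest = remaining[0], remaining[1:]
        best_score, best_pairs = float("-inf"), []
        # Try pairing the first cysteine with each other candidate.
        for i, partner in enumerate(rest):
            score, pairs = search(rest[:i] + rest[i + 1:])
            score += pair_prob[(first, partner)]
            if score > best_score:
                best_score, best_pairs = score, [(first, partner)] + pairs
        return best_score, best_pairs

    return search(tuple(sorted(cysteines)))


# Hypothetical sequence positions and classifier outputs.
example_probs = {
    (5, 12): 0.2, (5, 30): 0.9, (5, 44): 0.1,
    (12, 30): 0.3, (12, 44): 0.8, (30, 44): 0.4,
}
score, pairs = best_connectivity([5, 12, 30, 44], example_probs)
print(pairs)  # [(5, 30), (12, 44)]
```

For proteins with many cysteines, a polynomial-time matching routine (for example `max_weight_matching` in NetworkX) would likely be preferable to exhaustive search; the brute-force version is used here only to keep the sketch dependency-free.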
format	Online Article Text
id	pubmed-3574028
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-3574028 2013-03-26 On the Relevance of Sophisticated Structural Annotations for Disulfide Connectivity Pattern Prediction Becker, Julien Maes, Francis Wehenkel, Louis PLoS One Research Article Public Library of Science 2013-02-15 /pmc/articles/PMC3574028/ /pubmed/23533562 http://dx.doi.org/10.1371/journal.pone.0056621 Text en © 2013 Becker et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	On the Relevance of Sophisticated Structural Annotations for Disulfide Connectivity Pattern Prediction
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3574028/ https://www.ncbi.nlm.nih.gov/pubmed/23533562 http://dx.doi.org/10.1371/journal.pone.0056621
