On the Encoding of Proteins for Disordered Regions Prediction

Disordered regions, i.e., regions of proteins that do not adopt a stable three-dimensional structure, have been shown to play various and critical roles in many biological processes. Predicting and understanding their formation is therefore a key sub-problem of protein structure and function inferen...

Bibliographic Details
Main authors: Becker, Julien; Maes, Francis; Wehenkel, Louis
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3864923/
https://www.ncbi.nlm.nih.gov/pubmed/24358161
http://dx.doi.org/10.1371/journal.pone.0082252
collection PubMed
description Disordered regions, i.e., regions of proteins that do not adopt a stable three-dimensional structure, have been shown to play various critical roles in many biological processes. Predicting and understanding their formation is therefore a key sub-problem of protein structure and function inference. A wide range of machine learning approaches have been developed to automatically predict disordered regions of proteins. One key factor in the success of these methods is the way in which protein information is encoded into features. Recently, we proposed a systematic methodology to study the relevance of various feature encodings in the context of disulfide connectivity pattern prediction. In the present paper, we adapt this methodology to the problem of predicting disordered regions and assess it on proteins from the 10th CASP competition, as well as on a very large subset of proteins extracted from PDB. Our results, obtained with ensembles of extremely randomized trees, highlight a novel feature function encoding the proximity of residues according to their accessibility to the solvent, which plays the second most important role in the prediction of disordered regions, just after evolutionary information. Furthermore, even though our approach treats each residue independently, our results are very competitive in terms of accuracy with respect to the state of the art. A web application is available at http://m24.giga.ulg.ac.be:81/x3Disorder.
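The abstract stresses that prediction quality depends on how protein information is encoded into features, and that each residue is treated independently. The sketch below is only a generic illustration of that idea, not the authors' feature functions: it builds one feature vector per residue from a one-hot sliding window over the amino-acid sequence. The names `one_hot`, `window_features`, `encode_protein`, and the `half_width` parameter are hypothetical, and the evolutionary and solvent-accessibility features mentioned in the abstract are not modeled here.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(residue: str) -> List[int]:
    """Return a 20-dimensional indicator vector; unknown residues map to all zeros."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def window_features(sequence: str, position: int, half_width: int = 2) -> List[int]:
    """Concatenate one-hot vectors for the residues around `position`.

    The window covers position - half_width .. position + half_width.
    Positions that fall outside the sequence are padded with zero vectors.
    """
    features: List[int] = []
    for offset in range(-half_width, half_width + 1):
        i = position + offset
        if 0 <= i < len(sequence):
            features.extend(one_hot(sequence[i]))
        else:
            features.extend([0] * len(AMINO_ACIDS))
    return features


def encode_protein(sequence: str, half_width: int = 2) -> List[List[int]]:
    """Build one feature vector per residue (hypothetical helper).

    Each row can then be passed to any per-residue classifier.
    """
    return [window_features(sequence, p, half_width) for p in range(len(sequence))]
```

Each row produced by `encode_protein` could be fed to a per-residue classifier such as an ensemble of extremely randomized trees; the choice of window size and of one-hot encoding here is an assumption made for illustration only.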
format Online
Article
Text
id pubmed-3864923
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3864923 2013-12-19 On the Encoding of Proteins for Disordered Regions Prediction Becker, Julien; Maes, Francis; Wehenkel, Louis PLoS One Research Article
Public Library of Science 2013-12-16 /pmc/articles/PMC3864923/ /pubmed/24358161 http://dx.doi.org/10.1371/journal.pone.0082252 Text en © 2013 Becker et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.