Improved sequence-based prediction of disordered regions with multilayer fusion of multiple information sources

Motivation: Intrinsically disordered proteins play a crucial role in numerous regulatory processes. Their abundance and ubiquity combined with a relatively low quantity of their annotations motivate research toward the development of computational models that predict disordered regions from protein...

Bibliographic Details
Main authors: Mizianty, Marcin J., Stach, Wojciech, Chen, Ke, Kedarisetti, Kanaka Durga, Disfani, Fatemeh Miri, Kurgan, Lukasz
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2935446/
https://www.ncbi.nlm.nih.gov/pubmed/20823312
http://dx.doi.org/10.1093/bioinformatics/btq373
author Mizianty, Marcin J.
Stach, Wojciech
Chen, Ke
Kedarisetti, Kanaka Durga
Disfani, Fatemeh Miri
Kurgan, Lukasz
collection PubMed
description Motivation: Intrinsically disordered proteins play a crucial role in numerous regulatory processes. Their abundance and ubiquity combined with a relatively low quantity of their annotations motivate research toward the development of computational models that predict disordered regions from protein sequences. Although the prediction quality of these methods continues to rise, novel and improved predictors are urgently needed. Results: We propose a novel method, named MFDp (Multilayered Fusion-based Disorder predictor), that aims to improve over the current disorder predictors. MFDp is an ensemble of three Support Vector Machines specialized for the prediction of short, long and generic disordered regions. It combines three complementary disorder predictors with sequence, sequence profiles, predicted secondary structure, solvent accessibility, backbone dihedral torsion angles, residue flexibility and B-factors. Our method utilizes a custom-designed set of features that are based on raw predictions and aggregated raw values, and it recognizes various types of disorder. MFDp is compared at the residue level on two datasets against eight recent disorder predictors and top-performing methods from the most recent CASP8 experiment. In spite of using training chains with ≤25% similarity to the test sequences, our method consistently and significantly outperforms the other methods based on the MCC index. MFDp outperforms modern disorder predictors for the binary disorder assignment and provides competitive real-valued predictions. MFDp's outputs are also shown to outperform the other methods in the identification of proteins with long disordered regions. Availability: http://biomine.ece.ualberta.ca/MFDp.html Supplementary information: Supplementary data are available at Bioinformatics online. Contact: lkurgan@ece.ualberta.ca
format Text
id pubmed-2935446
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2935446 2010-09-08 Bioinformatics ECCB 2010 Conference Proceedings, September 26 to September 29, 2010, Ghent, Belgium Oxford University Press 2010-09-15 2010-09-04 /pmc/articles/PMC2935446/ /pubmed/20823312 http://dx.doi.org/10.1093/bioinformatics/btq373 Text en © The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Improved sequence-based prediction of disordered regions with multilayer fusion of multiple information sources
topic ECCB 2010 Conference Proceedings, September 26 to September 29, 2010, Ghent, Belgium
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2935446/
https://www.ncbi.nlm.nih.gov/pubmed/20823312
http://dx.doi.org/10.1093/bioinformatics/btq373
work_keys_str_mv AT miziantymarcinj improvedsequencebasedpredictionofdisorderedregionswithmultilayerfusionofmultipleinformationsources
AT stachwojciech improvedsequencebasedpredictionofdisorderedregionswithmultilayerfusionofmultipleinformationsources
AT chenke improvedsequencebasedpredictionofdisorderedregionswithmultilayerfusionofmultipleinformationsources
AT kedarisettikanakadurga improvedsequencebasedpredictionofdisorderedregionswithmultilayerfusionofmultipleinformationsources
AT disfanifatemehmiri improvedsequencebasedpredictionofdisorderedregionswithmultilayerfusionofmultipleinformationsources
AT kurganlukasz improvedsequencebasedpredictionofdisorderedregionswithmultilayerfusionofmultipleinformationsources