
DFLpred: High-throughput prediction of disordered flexible linker regions in protein sequences

Motivation: Disordered flexible linkers (DFLs) are disordered regions that serve as flexible linkers/spacers in multi-domain proteins or between structured constituents in domains. They are different from flexible linkers/residues because they are disordered and longer. Availability of experimentall...


Bibliographic Details
Main authors: Meng, Fanchi; Kurgan, Lukasz
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4908364/
https://www.ncbi.nlm.nih.gov/pubmed/27307636
http://dx.doi.org/10.1093/bioinformatics/btw280
author Meng, Fanchi
Kurgan, Lukasz
collection PubMed
description Motivation: Disordered flexible linkers (DFLs) are disordered regions that serve as flexible linkers/spacers in multi-domain proteins or between structured constituents in domains. They differ from flexible linkers/residues because they are disordered and longer. The availability of experimentally annotated DFLs provides an opportunity to build high-throughput computational predictors of these regions from protein sequences. To date, there are no computational methods that directly predict DFLs, and they can be found only indirectly by filtering predicted flexible residues with predictions of disorder. Results: We conceptualized, developed and empirically assessed a first-of-its-kind sequence-based predictor of DFLs, DFLpred. This method outputs the propensity to form DFLs for each residue in the input sequence. DFLpred uses a small set of empirically selected features that quantify propensities to form certain secondary structures, disordered regions and structured regions, which are processed by a fast linear model. Our high-throughput predictor can be used on the whole-proteome scale; it needs <1 h to predict an entire proteome on a single CPU. When assessed on an independent test dataset with low sequence-identity proteins, it secures an area under the receiver operating characteristic curve equal to 0.715 and outperforms existing alternatives that include methods for the prediction of flexible linkers, flexible residues, intrinsically disordered residues and various combinations of these methods. Prediction on the complete human proteome reveals that about 10% of proteins have a high DFL content, with over 30% of their residues predicted as DFL residues. We also estimate that about 6000 DFL regions are long, with ≥30 consecutive residues. Availability and implementation: http://biomine.ece.ualberta.ca/DFLpred/. Contact: lkurgan@vcu.edu Supplementary information: Supplementary data are available at Bioinformatics online.
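The two proteome-level statistics in the description (proteins with over 30% DFL residues, and DFL regions of ≥30 consecutive residues) can be derived from per-residue propensities along the following lines. This is a minimal sketch, not DFLpred's code: the function names are hypothetical, the 0.5 cutoff is a placeholder rather than the published threshold, and the propensity values are made-up toy data.

```python
from typing import List, Tuple


def binarize(propensities: List[float], threshold: float) -> List[bool]:
    """Mark residues whose propensity meets a cutoff (cutoff value is an assumption)."""
    return [p >= threshold for p in propensities]


def dfl_content(flags: List[bool]) -> float:
    """Fraction of residues flagged as DFL; 0.0 for an empty sequence."""
    return sum(flags) / len(flags) if flags else 0.0


def dfl_regions(flags: List[bool]) -> List[Tuple[int, int]]:
    """Return maximal runs of flagged residues as half-open (start, end) index pairs."""
    regions = []
    start = None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i                      # a new run begins here
        elif not flagged and start is not None:
            regions.append((start, i))     # the run ended at the previous residue
            start = None
    if start is not None:                  # run extends to the end of the sequence
        regions.append((start, len(flags)))
    return regions


def long_regions(regions: List[Tuple[int, int]], min_len: int = 30) -> List[Tuple[int, int]]:
    """Keep regions with at least min_len consecutive residues."""
    return [(s, e) for s, e in regions if e - s >= min_len]


# Toy input: 100 residues with one 35-residue high-propensity stretch.
toy_propensities = [0.1] * 10 + [0.9] * 35 + [0.2] * 55
toy_flags = binarize(toy_propensities, threshold=0.5)  # 0.5 is a placeholder cutoff
toy_regions = dfl_regions(toy_flags)

print(dfl_content(toy_flags) > 0.30)   # does this protein exceed 30% DFL content?
print(long_regions(toy_regions))       # regions with >= 30 consecutive residues
```

Applying the same functions to every protein in a proteome and counting the results would give statistics of the kind quoted in the description, provided the cutoff matches the one the predictor was calibrated for.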
format Online
Article
Text
id pubmed-4908364
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4908364 2016-06-17 DFLpred: High-throughput prediction of disordered flexible linker regions in protein sequences Meng, Fanchi Kurgan, Lukasz Bioinformatics ISMB 2016 Proceedings July 8 to July 12, 2016, Orlando, Florida Oxford University Press 2016-06-15 2016-06-11 /pmc/articles/PMC4908364/ /pubmed/27307636 http://dx.doi.org/10.1093/bioinformatics/btw280 Text en © The Author 2016. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title DFLpred: High-throughput prediction of disordered flexible linker regions in protein sequences
topic ISMB 2016 Proceedings July 8 to July 12, 2016, Orlando, Florida
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4908364/
https://www.ncbi.nlm.nih.gov/pubmed/27307636
http://dx.doi.org/10.1093/bioinformatics/btw280