DFLpred: High-throughput prediction of disordered flexible linker regions in protein sequences
Motivation: Disordered flexible linkers (DFLs) are disordered regions that serve as flexible linkers/spacers in multi-domain proteins or between structured constituents in domains. They are different from flexible linkers/residues because they are disordered and longer. Availability of experimentall...
Main Authors: | Meng, Fanchi; Kurgan, Lukasz |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2016 |
Subjects: | Ismb 2016 Proceedings July 8 to July 12, 2016, Orlando, Florida |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4908364/ https://www.ncbi.nlm.nih.gov/pubmed/27307636 http://dx.doi.org/10.1093/bioinformatics/btw280 |
_version_ | 1782437668644192256 |
---|---|
author | Meng, Fanchi Kurgan, Lukasz |
collection | PubMed |
description | Motivation: Disordered flexible linkers (DFLs) are disordered regions that serve as flexible linkers/spacers in multi-domain proteins or between structured constituents in domains. They are different from flexible linkers/residues because they are disordered and longer. Availability of experimentally annotated DFLs provides an opportunity to build high-throughput computational predictors of these regions from protein sequences. To date, there are no computational methods that directly predict DFLs and they can be found only indirectly by filtering predicted flexible residues with predictions of disorder. Results: We conceptualized, developed and empirically assessed a first-of-its-kind sequence-based predictor of DFLs, DFLpred. This method outputs propensity to form DFLs for each residue in the input sequence. DFLpred uses a small set of empirically selected features that quantify propensities to form certain secondary structures, disordered regions and structured regions, which are processed by a fast linear model. Our high-throughput predictor can be used on the whole-proteome scale; it needs <1 h to predict entire proteome on a single CPU. When assessed on an independent test dataset with low sequence-identity proteins, it secures area under the receiver operating characteristic curve equal 0.715 and outperforms existing alternatives that include methods for the prediction of flexible linkers, flexible residues, intrinsically disordered residues and various combinations of these methods. Prediction on the complete human proteome reveals that about 10% of proteins have a large content of over 30% DFL residues. We also estimate that about 6000 DFL regions are long with ≥30 consecutive residues. Availability and implementation: http://biomine.ece.ualberta.ca/DFLpred/. Contact: lkurgan@vcu.edu Supplementary information: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-4908364 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4908364 2016-06-17 DFLpred: High-throughput prediction of disordered flexible linker regions in protein sequences Meng, Fanchi Kurgan, Lukasz Bioinformatics Ismb 2016 Proceedings July 8 to July 12, 2016, Orlando, Florida Oxford University Press 2016-06-15 2016-06-11 /pmc/articles/PMC4908364/ /pubmed/27307636 http://dx.doi.org/10.1093/bioinformatics/btw280 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | DFLpred: High-throughput prediction of disordered flexible linker regions in protein sequences |
topic | Ismb 2016 Proceedings July 8 to July 12, 2016, Orlando, Florida |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4908364/ https://www.ncbi.nlm.nih.gov/pubmed/27307636 http://dx.doi.org/10.1093/bioinformatics/btw280 |
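
The description in this record says that DFLpred outputs a per-residue propensity to form DFLs, and reports two proteome-level summaries: the share of proteins with over 30% DFL residues, and the number of long DFL regions with ≥30 consecutive residues. The following is a minimal sketch of how such summaries could be derived from per-residue scores for one protein. It is not the authors' code: the function name `dfl_summary`, the binarisation `threshold` of 0.5, and the input scores are illustrative assumptions, since the record gives neither the cut-off nor the output format used by DFLpred.

```python
from itertools import groupby


def dfl_summary(propensities, threshold=0.5, min_long=30):
    """Summarise per-residue DFL propensities for one protein.

    threshold is a hypothetical cut-off (not taken from the record);
    min_long=30 mirrors the ">=30 consecutive residues" criterion.
    """
    # Binarise: True where the propensity meets the assumed threshold.
    flags = [p >= threshold for p in propensities]
    content = sum(flags) / len(flags) if flags else 0.0

    # Collect runs of consecutive predicted DFL residues as
    # (start, end) pairs, 0-based and inclusive.
    regions = []
    pos = 0
    for is_dfl, run in groupby(flags):
        length = len(list(run))
        if is_dfl:
            regions.append((pos, pos + length - 1))
        pos += length

    long_regions = [(s, e) for s, e in regions if e - s + 1 >= min_long]
    return {"content": content, "regions": regions, "long_regions": long_regions}


# Made-up scores: 10 low, 30 high, 60 low residues.
example = dfl_summary([0.1] * 10 + [0.9] * 30 + [0.2] * 60)
```

Under these assumptions, a protein would count toward the "large content" group when `content` exceeds 0.30, and each entry in `long_regions` would count as one long DFL region.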