DFLpred: High-throughput prediction of disordered flexible linker regions in protein sequences

Bibliographic Details
Main authors:	Meng, Fanchi; Kurgan, Lukasz
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2016
Subjects:	ISMB 2016 Proceedings, July 8 to July 12, 2016, Orlando, Florida
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4908364/ https://www.ncbi.nlm.nih.gov/pubmed/27307636 http://dx.doi.org/10.1093/bioinformatics/btw280
Collection:	PubMed

Abstract

Motivation: Disordered flexible linkers (DFLs) are disordered regions that serve as flexible linkers/spacers in multi-domain proteins or between structured constituents in domains. They differ from flexible linkers/residues because they are disordered and longer. The availability of experimentally annotated DFLs provides an opportunity to build high-throughput computational predictors of these regions from protein sequences. To date, no computational method predicts DFLs directly; they can be found only indirectly, by filtering predicted flexible residues with predictions of disorder.

Results: We conceptualized, developed and empirically assessed DFLpred, a first-of-its-kind sequence-based predictor of DFLs. For each residue in the input sequence, DFLpred outputs its propensity to form DFLs. It uses a small set of empirically selected features that quantify propensities to form certain secondary structures, disordered regions and structured regions, and processes them with a fast linear model. The predictor is fast enough for whole-proteome use: it needs <1 h to predict an entire proteome on a single CPU. On an independent test dataset of low sequence-identity proteins, it achieves an area under the receiver operating characteristic curve (AUC) of 0.715 and outperforms existing alternatives, including methods for the prediction of flexible linkers, flexible residues and intrinsically disordered residues, as well as various combinations of these methods. Prediction on the complete human proteome reveals that about 10% of proteins have a high DFL content, with over 30% of their residues in DFLs. We also estimate that about 6000 DFL regions are long, spanning ≥30 consecutive residues.

Availability and implementation: http://biomine.ece.ualberta.ca/DFLpred/

Contact: lkurgan@vcu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
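
The abstract describes DFLpred as a small set of sequence-derived features (propensities for certain secondary structures, disordered regions and structured regions) fed to a fast linear model that emits one propensity per residue. The sketch below illustrates only that general pattern, under loudly stated assumptions: the two amino-acid scales (`DISORDER_SCALE`, `HELIX_SCALE`), the window size, the weights and the bias are all hypothetical placeholders, not DFLpred's actual features or fitted parameters, which are defined in the paper.

```python
from typing import Dict, List, Sequence

# HYPOTHETICAL per-residue scales for illustration only; DFLpred's real features differ.
DISORDER_SCALE: Dict[str, float] = {aa: 0.0 for aa in "ACDEFGHIKLMNPQRSTVWY"}
DISORDER_SCALE.update({"P": 1.0, "E": 0.8, "S": 0.7, "K": 0.6, "G": 0.6,
                       "W": -1.0, "F": -0.9, "I": -0.8, "Y": -0.7, "L": -0.6})
HELIX_SCALE: Dict[str, float] = {aa: 0.0 for aa in "ACDEFGHIKLMNPQRSTVWY"}
HELIX_SCALE.update({"A": 1.0, "L": 0.8, "E": 0.7, "M": 0.6, "P": -1.0, "G": -0.8})


def window_average(values: Sequence[float], half_window: int) -> List[float]:
    """Mean of each position's value and its neighbours; windows are truncated at the termini."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def linear_propensity(seq: str, weights=(0.6, -0.4), bias: float = 0.0,
                      half_window: int = 7) -> List[float]:
    """Score each residue as bias + sum_k w_k * feature_k (a linear model).

    The weights, bias and half_window defaults are HYPOTHETICAL, not fitted values.
    Residue letters missing from a scale contribute 0 to that feature.
    """
    seq = seq.upper()
    features = [
        window_average([DISORDER_SCALE.get(aa, 0.0) for aa in seq], half_window),
        window_average([HELIX_SCALE.get(aa, 0.0) for aa in seq], half_window),
    ]
    return [bias + sum(w * f[i] for w, f in zip(weights, features))
            for i in range(len(seq))]
```

For example, `linear_propensity("MSPEPKKSGTAAEELLKRLA")` returns 20 scores, one per residue. In practice the weights and bias would be fitted on annotated training data (for instance by logistic regression). Because each residue needs only a fixed-size window and a weighted sum, scoring time in this sketch grows linearly with sequence length.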
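
The reported metric, the area under the ROC curve (AUC = 0.715 on the independent test set), can be computed from per-residue propensities and binary DFL annotations without tracing the curve. The AUC equals the probability that a randomly chosen DFL residue scores higher than a randomly chosen non-DFL residue, with ties counting one half, which is the normalized Mann-Whitney U statistic. A minimal sketch of that rank-based estimate follows; the function name and input layout are illustrative and not part of DFLpred.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the Mann-Whitney U statistic, using average ranks for tied scores.

    labels: 1 for residues annotated as DFL, 0 otherwise.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative residue")

    # Sort indices by score and assign 1-based ranks, averaging over runs of ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U counts (positive, negative) pairs ordered correctly; divide by all pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

For example, `roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` gives 0.75: three of the four positive/negative pairs are ranked correctly. A value of 0.5 corresponds to random ranking.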
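
The proteome-level findings (proteins with over 30% DFL residues, and DFL regions spanning at least 30 consecutive residues) require turning per-residue propensities into binary predictions and then counting. The following is a minimal sketch that assumes a hypothetical decision threshold of 0.5; DFLpred's own cutoff is set by the authors and may differ. It computes each protein's DFL content and extracts runs of consecutive predicted DFL residues of a minimum length.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def binarize(propensities: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Mark a residue as DFL when its propensity reaches the (HYPOTHETICAL) threshold."""
    return [p >= threshold for p in propensities]


def dfl_content(is_dfl: Sequence[bool]) -> float:
    """Fraction of residues predicted as DFL (0.0 for an empty sequence)."""
    return sum(is_dfl) / len(is_dfl) if is_dfl else 0.0


def long_regions(is_dfl: Sequence[bool], min_len: int = 30) -> List[Tuple[int, int]]:
    """0-based, end-exclusive (start, end) spans of >= min_len consecutive DFL residues."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, flag in enumerate(is_dfl):
        if flag and start is None:
            start = i  # a new run of DFL residues begins
        elif not flag and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    if start is not None and len(is_dfl) - start >= min_len:
        regions.append((start, len(is_dfl)))  # run that reaches the C-terminus
    return regions


def summarize(proteome: Dict[str, Sequence[float]], threshold: float = 0.5,
              content_cutoff: float = 0.30, min_len: int = 30) -> Tuple[int, int]:
    """Return (#proteins with DFL content > content_cutoff, #regions of length >= min_len)."""
    high_content, n_long = 0, 0
    for scores in proteome.values():
        flags = binarize(scores, threshold)
        if dfl_content(flags) > content_cutoff:
            high_content += 1
        n_long += len(long_regions(flags, min_len))
    return high_content, n_long
```

Given a mapping from protein identifiers to per-residue propensities, `summarize` returns the two counts in the form the abstract reports them. Dividing the first count by the number of proteins gives the fraction of high-DFL-content proteins.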

Journal:	Bioinformatics
Dates (as listed in the source record):	2016-06-17; 2016-06-15; 2016-06-11
Record ID:	pubmed-4908364
Institution:	National Center for Biotechnology Information
Record format:	MEDLINE/PubMed
Rights:	© The Author 2016. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com