Large-scale prediction of long disordered regions in proteins using random forests

Bibliographic Details
Main authors:	Han, Pengfei; Zhang, Xiuzhen; Norton, Raymond S; Feng, Zhi-Ping
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2637845/ https://www.ncbi.nlm.nih.gov/pubmed/19128505 http://dx.doi.org/10.1186/1471-2105-10-8

author	Han, Pengfei; Zhang, Xiuzhen; Norton, Raymond S; Feng, Zhi-Ping
collection	PubMed
description	BACKGROUND: Many proteins contain disordered regions that lack fixed three-dimensional (3D) structure under physiological conditions but have important biological functions. Prediction of disordered regions in protein sequences is important for understanding protein function and in high-throughput determination of protein structures. Machine learning techniques, including neural networks and support vector machines, have been widely used in such predictions. Predictors designed for long disordered regions are usually less successful in predicting short disordered regions. Combining prediction of short and long disordered regions will dramatically increase the complexity of the prediction algorithm and make the predictor unsuitable for large-scale applications. Efficient batch prediction of long disordered regions alone is of greater interest in large-scale proteome studies. RESULTS: This paper proposes IUPforest-L, a new algorithm that predicts long disordered regions using the random forest learning model. IUPforest-L is based on the Moreau-Broto auto-correlation function of amino acid indices (AAIs) and other physicochemical features of the primary sequences. In 10-fold cross-validation tests, IUPforest-L achieves an area under the receiver operating characteristic (ROC) curve of 89.5%. Compared with existing disorder predictors, IUPforest-L has high prediction accuracy and is efficient for predicting long disordered regions in large-scale proteomes. CONCLUSION: The random forest model based on the auto-correlation functions of the AAIs within a protein fragment and other physicochemical features could effectively detect long disordered regions in proteins. A new predictor, IUPforest-L, was developed to batch predict long disordered regions in proteins, and the server can be accessed from
format	Text
id	pubmed-2637845
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
journal	BMC Bioinformatics
published	2009-01-07
rights	Copyright © 2009 Han et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Large-scale prediction of long disordered regions in proteins using random forests
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2637845/ https://www.ncbi.nlm.nih.gov/pubmed/19128505 http://dx.doi.org/10.1186/1471-2105-10-8
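The abstract names the Moreau-Broto auto-correlation of amino acid indices (AAIs) as the core sequence feature of IUPforest-L. Below is a minimal sketch, assuming the commonly used normalized form AC(d) = (1 / (N - d)) * sum over i = 1..N-d of P(i) * P(i + d) for a fragment of length N. Kyte-Doolittle hydropathy stands in as the example AAI. This record does not give the specific indices, lag range, window size, or normalization that IUPforest-L uses, and the function names here are hypothetical.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy, used here only as an example AAI (an assumption,
# not necessarily one of the indices selected for IUPforest-L).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(index: dict[str, float]) -> dict[str, float]:
    """Rescale an AAI to zero mean and unit population variance over its residues."""
    vals = list(index.values())
    mu, sigma = mean(vals), pstdev(vals)
    return {aa: (v - mu) / sigma for aa, v in index.items()}


def moreau_broto(seq: str, index: dict[str, float], lag: int) -> float:
    """Normalized Moreau-Broto auto-correlation at a single lag d:
    AC(d) = (1 / (N - d)) * sum_{i=1}^{N-d} P(i) * P(i + d).
    Residues missing from `index` raise KeyError."""
    values = [index[aa] for aa in seq]
    n = len(values)
    if not 1 <= lag < n:
        raise ValueError("lag must satisfy 1 <= lag < len(seq)")
    return sum(values[i] * values[i + lag] for i in range(n - lag)) / (n - lag)


def autocorrelation_features(seq: str, index: dict[str, float], max_lag: int) -> list[float]:
    """Hypothetical helper: feature vector [AC(1), ..., AC(max_lag)] for one fragment."""
    return [moreau_broto(seq, index, d) for d in range(1, max_lag + 1)]


if __name__ == "__main__":
    aai = standardize(KYTE_DOOLITTLE)
    fragment = "MSEEKPKEGVKTENDHINLKVAGQDGSVVQFKIKRHTPLSKL"  # arbitrary example fragment
    print(autocorrelation_features(fragment, aai, max_lag=5))
```

In a pipeline like the one the abstract describes, vectors like this, computed over several AAIs and combined with other physicochemical features, would form one input row per fragment for a random forest classifier. That training step is not shown here.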
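The abstract reports an area under the ROC curve of 89.5%, which corresponds to an AUC of 0.895. The AUC equals the probability that a randomly chosen positive example (here, disordered) receives a higher predictor score than a randomly chosen negative (ordered) one, with ties counted as one half. The brute-force sketch below computes that statistic on toy scores. The record does not say how scores were pooled across the 10 cross-validation folds or whether evaluation was per residue, and `roc_auc` is a hypothetical name.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise-comparison identity.

    labels: 1 = positive (e.g. disordered), 0 = negative (e.g. ordered).
    A tie between a positive and a negative score counts as half a win.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores for illustration only, not data from the paper.
    print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop costs O(P * N) comparisons, which is fine for illustration. At proteome scale, a rank-based O(n log n) formulation or a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.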
