
A Novel Method of Predicting Protein Disordered Regions Based on Sequence Features

With a large number of disordered proteins and their important functions discovered, it is highly desirable to develop effective methods to computationally predict protein disordered regions. In this study, based on Random Forest (RF), Maximum Relevancy Minimum Redundancy (mRMR), and Incremental Featu...


Bibliographic Details
Main authors: Zhao, Tong-Hui, Jiang, Min, Huang, Tao, Li, Bi-Qing, Zhang, Ning, Li, Hai-Peng, Cai, Yu-Dong
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3654632/
https://www.ncbi.nlm.nih.gov/pubmed/23710446
http://dx.doi.org/10.1155/2013/414327
_version_ 1782476057430982656
author Zhao, Tong-Hui
Jiang, Min
Huang, Tao
Li, Bi-Qing
Zhang, Ning
Li, Hai-Peng
Cai, Yu-Dong
author_sort Zhao, Tong-Hui
collection PubMed
description With a large number of disordered proteins and their important functions discovered, it is highly desirable to develop effective methods to computationally predict protein disordered regions. In this study, based on Random Forest (RF), Maximum Relevancy Minimum Redundancy (mRMR), and Incremental Feature Selection (IFS), we developed a new method to predict disordered regions in proteins. The mRMR criterion was used to rank the importance of all candidate features. Finally, the top 128 features were selected from the ranked feature list to build the optimal model, including 92 Position Specific Scoring Matrix (PSSM) conservation score features and 36 secondary structure features. As a result, a Matthews correlation coefficient (MCC) of 0.3895 was achieved on the training set by 10-fold cross-validation. Based on the prediction results for each query sequence, we used a scanning and modification strategy to improve the performance. The accuracy (ACC) and MCC were increased by 4% and almost 0.2%, respectively, compared with three other popular predictors: DISOPRED, DISOclust, and OnD-CRF. The selected features may shed some light on the understanding of the formation mechanism of disordered structures, providing guidelines for experimental validation.
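The abstract names two pieces that are easy to illustrate: the evaluation metrics (ACC and MCC) and Incremental Feature Selection over an mRMR-ranked feature list. The record gives no formulas or code, so the following is only a minimal sketch using the standard definitions of ACC and MCC. The `evaluate` callback is a hypothetical stand-in for whatever scores a feature subset (in the study this would presumably be the 10-fold cross-validated MCC of a Random Forest), and all function names are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (TP, TN, FP, FN) for binary labels, with 1 = disordered and 0 = ordered."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard accuracy: fraction of residues classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Score the top-k ranked features for k = 1..N and keep the best-scoring prefix.

    `evaluate` is an assumed callback returning a score (for example, a
    cross-validated MCC) for a given feature subset. Ties keep the smaller k.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked_features) + 1):
        score = evaluate(ranked_features[:k])
        if score > best_score:
            best_k, best_score = k, score
    return list(ranked_features[:best_k]), best_score
```

Under these assumptions, selecting the "top 128 features" corresponds to the prefix length at which the evaluation score peaks along the ranked list.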
format Online
Article
Text
id pubmed-3654632
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-3654632 2013-05-24 A Novel Method of Predicting Protein Disordered Regions Based on Sequence Features Zhao, Tong-Hui Jiang, Min Huang, Tao Li, Bi-Qing Zhang, Ning Li, Hai-Peng Cai, Yu-Dong Biomed Res Int Research Article Hindawi Publishing Corporation 2013 2013-04-22 /pmc/articles/PMC3654632/ /pubmed/23710446 http://dx.doi.org/10.1155/2013/414327 Text en Copyright © 2013 Tong-Hui Zhao et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A Novel Method of Predicting Protein Disordered Regions Based on Sequence Features
topic Research Article