Prediction of Protein Domain with mRMR Feature Selection and Analysis

The domains are the structural and functional units of proteins. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop effective methods for predicting the protein domains according to the sequence information alone, so as to facilitate the struct...

Bibliographic Details
Main authors: Li, Bi-Qing; Hu, Le-Le; Chen, Lei; Feng, Kai-Yan; Cai, Yu-Dong; Chou, Kuo-Chen
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3376124/
https://www.ncbi.nlm.nih.gov/pubmed/22720092
http://dx.doi.org/10.1371/journal.pone.0039308
author Li, Bi-Qing
Hu, Le-Le
Chen, Lei
Feng, Kai-Yan
Cai, Yu-Dong
Chou, Kuo-Chen
collection PubMed
description The domains are the structural and functional units of proteins. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop effective methods for predicting the protein domains according to the sequence information alone, so as to facilitate the structure prediction of proteins and speed up their functional annotation. However, although many efforts have been made in this regard, prediction of protein domains from the sequence information still remains a challenging and elusive problem. Here, a new method was developed by combining the techniques of RF (random forest), mRMR (maximum relevance minimum redundancy), and IFS (incremental feature selection), as well as by incorporating the features of physicochemical and biochemical properties, sequence conservation, residual disorder, secondary structure, and solvent accessibility. The overall success rate achieved by the new method on an independent dataset was around 73%, which was about 28–40% higher than that of the existing method on the same benchmark dataset. Furthermore, it was revealed by an in-depth analysis that the features of evolution, codon diversity, electrostatic charge, and disorder played more important roles than the others in predicting protein domains, quite consistent with experimental observations. It is anticipated that the new method may become a high-throughput tool in annotating protein domains, or may, at the very least, play a complementary role to the existing domain prediction methods, and that the findings about the key features with high impact on the domain prediction might provide useful insights or clues for further experimental investigations in this area. Finally, it has not escaped our notice that the current approach can also be utilized to study protein signal peptides, B-cell epitopes, and HIV protease cleavage sites, among many other important topics in protein science and biomedicine.
format Online
Article
Text
id pubmed-3376124
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3376124 2012-06-20 Prediction of Protein Domain with mRMR Feature Selection and Analysis Li, Bi-Qing Hu, Le-Le Chen, Lei Feng, Kai-Yan Cai, Yu-Dong Chou, Kuo-Chen PLoS One Research Article Public Library of Science 2012-06-15 /pmc/articles/PMC3376124/ /pubmed/22720092 http://dx.doi.org/10.1371/journal.pone.0039308 Text en Li et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Prediction of Protein Domain with mRMR Feature Selection and Analysis
topic Research Article
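
The description field in this record names a pipeline that ranks features with mRMR and then applies incremental feature selection (IFS), scored by a random forest. The record gives no implementation details, so the following is only a minimal Python sketch of the general idea, not the authors' code. It assumes discrete-valued features, uses the common "relevance minus mean redundancy" form of mRMR, and replaces the random forest with a caller-supplied `score_fn`. The toy data and the `toy_accuracy` majority-vote scorer are hypothetical and exist purely for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(features, labels):
    """Greedy mRMR ordering of feature indices.

    features is a list of columns (one discrete-valued list per feature).
    Each step picks the feature with the largest relevance (MI with the
    labels) minus redundancy (mean MI with the features already picked).
    """
    relevance = [mutual_information(col, labels) for col in features]
    remaining = set(range(len(features)))
    ranked = []
    while remaining:
        def score(j):
            if not ranked:
                return relevance[j]
            redundancy = sum(
                mutual_information(features[j], features[k]) for k in ranked
            ) / len(ranked)
            return relevance[j] - redundancy

        # sorted() makes ties resolve to the lowest feature index.
        best = max(sorted(remaining), key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def incremental_feature_selection(ranked, score_fn):
    """Score the nested subsets ranked[:1], ranked[:2], ... and keep the best.

    score_fn(subset) stands in for a cross-validated classifier score
    (the record's description names a random forest for this role).
    On ties, the smallest subset wins.
    """
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]
        current = score_fn(subset)
        if current > best_score:
            best_subset, best_score = subset, current
    return best_subset, best_score


# Hypothetical toy data: 8 samples, 4 binary features.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = [
    [0, 0, 0, 1, 1, 1, 1, 1],  # f0: informative, one mismatch
    [0, 0, 0, 1, 1, 1, 1, 1],  # f1: exact copy of f0 (fully redundant)
    [0, 0, 1, 0, 1, 1, 0, 1],  # f2: weaker signal, little overlap with f0
    [0, 1, 0, 1, 0, 1, 0, 1],  # f3: unrelated to the labels
]


def toy_accuracy(subset):
    """Hypothetical scorer: accuracy of a majority vote over the subset."""
    hits = 0
    for i, label in enumerate(labels):
        votes = sum(features[j][i] for j in subset)
        hits += int((1 if 2 * votes > len(subset) else 0) == label)
    return hits / len(labels)


ranked = mrmr_rank(features, labels)
selected, accuracy = incremental_feature_selection(ranked, toy_accuracy)
print(ranked)              # [0, 2, 1, 3]
print(selected, accuracy)  # [0] 0.875
```

In this toy run the duplicate feature f1 is pushed behind f2 even though its relevance equals that of f0, which is the redundancy penalty at work. In a real setting `score_fn` would wrap a cross-validated classifier rather than the majority vote used here.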