An Ensemble Method with Hybrid Features to Identify Extracellular Matrix Proteins

The extracellular matrix (ECM) is a dynamic composite of secreted proteins that play important roles in numerous biological processes such as tissue morphogenesis, differentiation and homeostasis. Furthermore, various diseases are caused by the dysfunction of ECM proteins. Therefore, identifying the...

Bibliographic Details
Main authors: Yang, Runtao, Zhang, Chengjin, Gao, Rui, Zhang, Lina
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4334504/
https://www.ncbi.nlm.nih.gov/pubmed/25680094
http://dx.doi.org/10.1371/journal.pone.0117804
author Yang, Runtao
Zhang, Chengjin
Gao, Rui
Zhang, Lina
collection PubMed
description The extracellular matrix (ECM) is a dynamic composite of secreted proteins that play important roles in numerous biological processes such as tissue morphogenesis, differentiation and homeostasis. Furthermore, various diseases are caused by the dysfunction of ECM proteins. Therefore, identifying these important ECM proteins may assist in understanding related biological processes and drug development. In view of the serious imbalance in the training dataset, a Random Forest-based ensemble method with hybrid features is developed in this paper to identify ECM proteins. Hybrid features are employed by incorporating sequence composition, physicochemical properties, evolutionary and structural information. The Information Gain Ratio and Incremental Feature Selection (IGR-IFS) methods are adopted to select the optimal features. Finally, the resulting predictor termed IECMP (Identify ECM Proteins) achieves a balanced accuracy of 86.4% using 10-fold cross-validation on the training dataset, which is much higher than results obtained by other methods (ECMPRED: 71.0%, ECMPP: 77.8%). Moreover, when tested on a common independent dataset, our method also achieves significantly improved performance over ECMPP and ECMPRED. These results indicate that IECMP is an effective method for ECM protein prediction, which has a more balanced prediction capability for positive and negative samples. It is anticipated that the proposed method will provide significant information to fully decipher the molecular mechanisms of ECM-related biological processes and discover candidate drug targets. For public access, we develop a user-friendly web server for ECM protein identification that is freely accessible at http://iecmp.weka.cc.
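The description reports balanced accuracy instead of plain accuracy because the training set is heavily imbalanced. The record does not give the formula, so the sketch below assumes the common definition (the mean of sensitivity and specificity). The function name and the toy labels are hypothetical and are not taken from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels.

    Assumed convention (not from the record): 1 = ECM protein, 0 = non-ECM protein.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)  # fraction of true ECM proteins recovered
    specificity = tn / (tn + fp)  # fraction of non-ECM proteins rejected
    return (sensitivity + specificity) / 2


def accuracy(y_true, y_pred):
    """Plain fraction of correct predictions, for comparison."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy data: 2 positives, 8 negatives (imbalanced).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# A degenerate predictor that labels every sample as negative.
y_all_negative = [0] * len(y_true)

print(accuracy(y_true, y_all_negative))           # high despite finding no positives
print(balanced_accuracy(y_true, y_all_negative))  # penalised for zero sensitivity
```

On this toy data the always-negative predictor gets most samples right and still recovers no positives. Under the assumed definition, balanced accuracy exposes that failure while plain accuracy hides it.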
format Online
Article
Text
id pubmed-4334504
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4334504 2015-02-24 An Ensemble Method with Hybrid Features to Identify Extracellular Matrix Proteins Yang, Runtao Zhang, Chengjin Gao, Rui Zhang, Lina PLoS One Research Article
Public Library of Science 2015-02-13 /pmc/articles/PMC4334504/ /pubmed/25680094 http://dx.doi.org/10.1371/journal.pone.0117804 Text en © 2015 Yang et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title An Ensemble Method with Hybrid Features to Identify Extracellular Matrix Proteins
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4334504/
https://www.ncbi.nlm.nih.gov/pubmed/25680094
http://dx.doi.org/10.1371/journal.pone.0117804