A Model Stacking Framework for Identifying DNA Binding Proteins by Orchestrating Multi-View Features and Classifiers
Nowadays, various machine learning-based approaches using sequence information alone have been proposed for identifying DNA-binding proteins, which are crucial to many cellular processes, such as DNA replication, DNA repair and DNA modification. Among these methods, building a meaningful feature rep...
Main authors: | Liu, Xiu-Juan; Gong, Xiu-Jun; Yu, Hua; Xu, Jia-Hui |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6116045/ https://www.ncbi.nlm.nih.gov/pubmed/30071697 http://dx.doi.org/10.3390/genes9080394 |
_version_ | 1783351521435451392 |
---|---|
author | Liu, Xiu-Juan; Gong, Xiu-Jun; Yu, Hua; Xu, Jia-Hui |
author_sort | Liu, Xiu-Juan |
collection | PubMed |
description | Various machine learning-based approaches that use sequence information alone have been proposed for identifying DNA-binding proteins, which are crucial to many cellular processes such as DNA replication, DNA repair and DNA modification. Among these methods, building a meaningful feature representation of the sequences and choosing an appropriate classifier are the most critical tasks. Revealing the significance and contribution of different feature spaces and classifiers to the final prediction is of the utmost importance, not only for prediction performance but also for providing practical clues for the design of biological experiments. In this study, we propose a model stacking framework that orchestrates multi-view features and classifiers (MSFBinder) to investigate how to integrate and evaluate loosely-coupled models for predicting DNA-binding proteins. The framework integrates multi-view features including Local_DPP, 188D, Position-Specific Scoring Matrix (PSSM)_DWT and autocross-covariance of secondary structures (AC_Struc), which are extracted based on evolutionary information, sequence composition, physicochemical properties and predicted structural information, respectively. These features are fed into various loosely-coupled classifiers such as SVM and random forest. A logistic regression model is then applied to evaluate the contributions of these individual classifiers and to make the final prediction. When evaluated on the training dataset PDB1075, the proposed method achieves an accuracy of 83.53%. On the independent dataset PDB186, it achieves an accuracy of 81.72%, which outperforms many existing methods. These results suggest that the framework can flexibly orchestrate various prediction models with good performance. |
format | Online Article Text |
id | pubmed-6116045 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-6116045 2018-08-31 A Model Stacking Framework for Identifying DNA Binding Proteins by Orchestrating Multi-View Features and Classifiers Liu, Xiu-Juan; Gong, Xiu-Jun; Yu, Hua; Xu, Jia-Hui Genes (Basel) Article MDPI 2018-08-01 /pmc/articles/PMC6116045/ /pubmed/30071697 http://dx.doi.org/10.3390/genes9080394 Text en © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
title | A Model Stacking Framework for Identifying DNA Binding Proteins by Orchestrating Multi-View Features and Classifiers |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6116045/ https://www.ncbi.nlm.nih.gov/pubmed/30071697 http://dx.doi.org/10.3390/genes9080394 |
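
The stacking scheme summarized in the description (several feature views, each fed to SVM and random forest classifiers, with a logistic regression model combining their outputs) can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' implementation: the data are synthetic, the four "views" are arbitrary column slices that only borrow the feature names from the record, and the fold count and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): real inputs would be features
# extracted from protein sequences, which this sketch does not compute.
X, y = make_classification(n_samples=400, n_features=40,
                           n_informative=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Placeholder views: column slices labelled with the feature names from
# the record. The slices are arbitrary and purely illustrative.
views = {
    "Local_DPP": slice(0, 10),
    "188D": slice(10, 20),
    "PSSM_DWT": slice(20, 30),
    "AC_Struc": slice(30, 40),
}

def make_base_models():
    """Return fresh, unfitted base classifiers (hyperparameters assumed)."""
    return {
        "svm": make_pipeline(StandardScaler(),
                             SVC(probability=True, random_state=0)),
        "rf": RandomForestClassifier(n_estimators=100, random_state=0),
    }

meta_train, meta_test, names = [], [], []
for view_name, cols in views.items():
    for clf_name, clf in make_base_models().items():
        # Out-of-fold probabilities keep the meta-learner from seeing
        # predictions made on samples the base model was trained on.
        oof = cross_val_predict(clf, X_train[:, cols], y_train,
                                cv=5, method="predict_proba")[:, 1]
        # Refit on the full training split to score the held-out split.
        clf.fit(X_train[:, cols], y_train)
        meta_train.append(oof)
        meta_test.append(clf.predict_proba(X_test[:, cols])[:, 1])
        names.append(f"{view_name}+{clf_name}")

Z_train = np.column_stack(meta_train)  # one column per (view, classifier)
Z_test = np.column_stack(meta_test)

# Second level: logistic regression over the base-model probabilities.
meta = LogisticRegression()
meta.fit(Z_train, y_train)
test_accuracy = meta.score(Z_test, y_test)

# The coefficients give a rough indication of each base model's weight.
weights = dict(zip(names, meta.coef_[0]))
```

Training the second-level model on out-of-fold predictions is a common way to limit leakage in stacking. The entries of `weights` can then be inspected to see how strongly the logistic regression relies on each view and classifier pair, which loosely mirrors the contribution analysis the description mentions.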