PredDBP-Stack: Prediction of DNA-Binding Proteins from HMM Profiles using a Stacked Ensemble Method
DNA-binding proteins (DBPs) play vital roles in all aspects of genetic activities. However, the identification of DBPs by using wet-lab experimental approaches is often time-consuming and laborious. In this study, we develop a novel computational method, called PredDBP-Stack, to predict DBPs solely...
Main Authors: | Wang, Jun; Zheng, Huiwen; Yang, Yang; Xiao, Wanyue; Liu, Taigang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7174956/ https://www.ncbi.nlm.nih.gov/pubmed/32352006 http://dx.doi.org/10.1155/2020/7297631 |
_version_ | 1783524732272902144 |
---|---|
author | Wang, Jun; Zheng, Huiwen; Yang, Yang; Xiao, Wanyue; Liu, Taigang |
author_facet | Wang, Jun; Zheng, Huiwen; Yang, Yang; Xiao, Wanyue; Liu, Taigang |
author_sort | Wang, Jun |
collection | PubMed |
description | DNA-binding proteins (DBPs) play vital roles in all aspects of genetic activities. However, the identification of DBPs by using wet-lab experimental approaches is often time-consuming and laborious. In this study, we develop a novel computational method, called PredDBP-Stack, to predict DBPs solely based on protein sequences. First, amino acid composition (AAC) and transition probability composition (TPC) extracted from the hidden Markov model (HMM) profile are adopted to represent a protein. Next, we establish a stacked ensemble model to identify DBPs, which involves two stages of learning. In the first stage, the four base classifiers are trained with the features of HMM-based compositions. In the second stage, the prediction probabilities of these base classifiers are used as inputs to the meta-classifier to perform the final prediction of DBPs. Based on the PDB1075 benchmark dataset, we conduct a jackknife cross-validation with the proposed PredDBP-Stack predictor and obtain a balanced sensitivity and specificity of 92.47% and 92.36%, respectively. This outcome outperforms most of the existing classifiers. Furthermore, our method also achieves superior performance and model robustness on the PDB186 independent dataset. This demonstrates that the PredDBP-Stack is an effective classifier for accurately identifying DBPs based on protein sequence information alone. |
format | Online Article Text |
id | pubmed-7174956 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-71749562020-04-29 PredDBP-Stack: Prediction of DNA-Binding Proteins from HMM Profiles using a Stacked Ensemble Method Wang, Jun Zheng, Huiwen Yang, Yang Xiao, Wanyue Liu, Taigang Biomed Res Int Research Article DNA-binding proteins (DBPs) play vital roles in all aspects of genetic activities. However, the identification of DBPs by using wet-lab experimental approaches is often time-consuming and laborious. In this study, we develop a novel computational method, called PredDBP-Stack, to predict DBPs solely based on protein sequences. First, amino acid composition (AAC) and transition probability composition (TPC) extracted from the hidden Markov model (HMM) profile are adopted to represent a protein. Next, we establish a stacked ensemble model to identify DBPs, which involves two stages of learning. In the first stage, the four base classifiers are trained with the features of HMM-based compositions. In the second stage, the prediction probabilities of these base classifiers are used as inputs to the meta-classifier to perform the final prediction of DBPs. Based on the PDB1075 benchmark dataset, we conduct a jackknife cross-validation with the proposed PredDBP-Stack predictor and obtain a balanced sensitivity and specificity of 92.47% and 92.36%, respectively. This outcome outperforms most of the existing classifiers. Furthermore, our method also achieves superior performance and model robustness on the PDB186 independent dataset. This demonstrates that the PredDBP-Stack is an effective classifier for accurately identifying DBPs based on protein sequence information alone. Hindawi 2020-04-13 /pmc/articles/PMC7174956/ /pubmed/32352006 http://dx.doi.org/10.1155/2020/7297631 Text en Copyright © 2020 Jun Wang et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Wang, Jun Zheng, Huiwen Yang, Yang Xiao, Wanyue Liu, Taigang PredDBP-Stack: Prediction of DNA-Binding Proteins from HMM Profiles using a Stacked Ensemble Method |
title | PredDBP-Stack: Prediction of DNA-Binding Proteins from HMM Profiles using a Stacked Ensemble Method |
title_full | PredDBP-Stack: Prediction of DNA-Binding Proteins from HMM Profiles using a Stacked Ensemble Method |
title_fullStr | PredDBP-Stack: Prediction of DNA-Binding Proteins from HMM Profiles using a Stacked Ensemble Method |
title_full_unstemmed | PredDBP-Stack: Prediction of DNA-Binding Proteins from HMM Profiles using a Stacked Ensemble Method |
title_short | PredDBP-Stack: Prediction of DNA-Binding Proteins from HMM Profiles using a Stacked Ensemble Method |
title_sort | preddbp-stack: prediction of dna-binding proteins from hmm profiles using a stacked ensemble method |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7174956/ https://www.ncbi.nlm.nih.gov/pubmed/32352006 http://dx.doi.org/10.1155/2020/7297631 |
work_keys_str_mv | AT wangjun preddbpstackpredictionofdnabindingproteinsfromhmmprofilesusingastackedensemblemethod AT zhenghuiwen preddbpstackpredictionofdnabindingproteinsfromhmmprofilesusingastackedensemblemethod AT yangyang preddbpstackpredictionofdnabindingproteinsfromhmmprofilesusingastackedensemblemethod AT xiaowanyue preddbpstackpredictionofdnabindingproteinsfromhmmprofilesusingastackedensemblemethod AT liutaigang preddbpstackpredictionofdnabindingproteinsfromhmmprofilesusingastackedensemblemethod |
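
The description field says each protein is represented by amino acid composition (AAC) and transition probability composition (TPC) taken from its HMM profile, but this record does not give the formulas. The Python sketch below shows one common formulation, offered as an illustration and not as the authors' implementation. It assumes the profile is an L x 20 matrix of per-position amino acid probabilities, takes AAC as the column-wise mean, and takes TPC as row-normalized products of adjacent positions, which yields 20 + 400 = 420 features. The function names `aac`, `tpc`, and `hmm_features` are hypothetical.

```python
from typing import List

# Assumption: a profile is a list of L rows (one per residue position),
# each holding 20 probabilities (one per amino acid type).
Profile = List[List[float]]


def aac(profile: Profile) -> List[float]:
    """Column-wise mean of the profile: one value per amino acid type."""
    length = len(profile)
    n_cols = len(profile[0])
    return [sum(row[j] for row in profile) / length for j in range(n_cols)]


def tpc(profile: Profile) -> List[float]:
    """Row-normalized transition scores between adjacent positions."""
    n_cols = len(profile[0])
    # raw[m][n] accumulates P[i][m] * P[i+1][n] over consecutive positions.
    raw = [[0.0] * n_cols for _ in range(n_cols)]
    for cur, nxt in zip(profile, profile[1:]):
        for m in range(n_cols):
            for n in range(n_cols):
                raw[m][n] += cur[m] * nxt[n]
    features: List[float] = []
    for m in range(n_cols):
        total = sum(raw[m])
        # Guard against an all-zero row to avoid division by zero.
        features.extend(v / total if total else 0.0 for v in raw[m])
    return features


def hmm_features(profile: Profile) -> List[float]:
    """Concatenate AAC and TPC into a single feature vector."""
    return aac(profile) + tpc(profile)
```

In a stacked ensemble such as the one the description outlines, a vector like this would be the input to the first-stage base classifiers, and their predicted probabilities would then form the input to the meta-classifier.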