PredDBP-Stack: Prediction of DNA-Binding Proteins from HMM Profiles using a Stacked Ensemble Method

Bibliographic Details
Main authors: Wang, Jun; Zheng, Huiwen; Yang, Yang; Xiao, Wanyue; Liu, Taigang
Format: Online, Article, Text
Language: English
Published: Hindawi 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7174956/
https://www.ncbi.nlm.nih.gov/pubmed/32352006
http://dx.doi.org/10.1155/2020/7297631
collection PubMed
description DNA-binding proteins (DBPs) play vital roles in all aspects of genetic activity. However, identifying DBPs with wet-lab experimental approaches is often time-consuming and laborious. In this study, we develop a novel computational method, called PredDBP-Stack, to predict DBPs based solely on protein sequences. First, amino acid composition (AAC) and transition probability composition (TPC) features extracted from the hidden Markov model (HMM) profile are used to represent a protein. Next, we build a stacked ensemble model to identify DBPs, which involves two stages of learning. In the first stage, four base classifiers are trained on the HMM-based composition features. In the second stage, the prediction probabilities of these base classifiers are used as inputs to a meta-classifier, which makes the final prediction. On the PDB1075 benchmark dataset, jackknife cross-validation of PredDBP-Stack yields a balanced sensitivity and specificity of 92.47% and 92.36%, respectively, a result that outperforms most existing classifiers. Furthermore, our method achieves superior performance and robustness on the PDB186 independent dataset. These results demonstrate that PredDBP-Stack is an effective classifier for accurately identifying DBPs from protein sequence information alone.
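The feature and evaluation steps summarized in the description can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes the HMM profile has already been decoded into an L × 20 matrix of per-position emission probabilities (for example from an HHsuite `.hhm` file, where each value is stored as -1000·log2 p and `*` means p = 0), and it uses one common TPC formulation whose normalization may differ from the paper's. All function names (`hhm_to_prob`, `aac`, `tpc`, `sensitivity_specificity`, `jackknife_predict`) are hypothetical, and the stacked ensemble itself (four base classifiers feeding a meta-classifier) is not reproduced.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed layout: L rows (one per residue) x 20 columns (one per amino acid).
Profile = Sequence[Sequence[float]]


def hhm_to_prob(field: str) -> float:
    """Decode one HHsuite .hhm emission field into a probability.

    Assumption: values are stored as -1000 * log2(p), and '*' encodes p = 0.
    """
    return 0.0 if field == "*" else 2.0 ** (-int(field) / 1000.0)


def aac(profile: Profile) -> List[float]:
    """AAC: mean of each amino-acid column over all L positions (20-D)."""
    n = len(profile)
    return [sum(row[j] for row in profile) / n for j in range(len(profile[0]))]


def tpc(profile: Profile) -> List[float]:
    """TPC (400-D) in an assumed common form:

    T[i][j] = sum_k p[k][i] * p[k+1][j] / sum_k p[k][i], for k = 0 .. L-2.
    If every profile row sums to 1, each block T[i][*] also sums to 1.
    """
    cols = len(profile[0])
    pairs = list(zip(profile[:-1], profile[1:]))  # consecutive positions (k, k+1)
    out: List[float] = []
    for i in range(cols):
        denom = sum(cur[i] for cur, _ in pairs)
        for j in range(cols):
            num = sum(cur[i] * nxt[j] for cur, nxt in pairs)
            out.append(num / denom if denom > 0 else 0.0)
    return out


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Sn = TP / (TP + FN), Sp = TN / (TN + FP); label 1 = DBP, 0 = non-DBP."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def jackknife_predict(
    X: List, y: List[int], fit: Callable, predict: Callable
) -> List[int]:
    """Jackknife (leave-one-out): train on every sample except i, then predict i."""
    preds = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(model, X[i]))
    return preds
```

For example, the toy two-column profile `[[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]` gives `aac(...) == [0.5, 0.5]` and a TPC vector of approximately `[0.333, 0.667, 1.0, 0.0]`. With a real 20-column profile, concatenating the two vectors gives 20 + 400 = 420 features per protein. Because `jackknife_predict` accepts any `fit`/`predict` pair, the same loop could wrap a stacked model, at the cost of retraining it once per protein.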
id pubmed-7174956
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source Biomed Res Int (Research Article), Hindawi, 2020-04-13; record pubmed-7174956, 2020-04-29
rights Copyright © 2020 Jun Wang et al. This is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article