
HMMPred: Accurate Prediction of DNA-Binding Proteins Based on HMM Profiles and XGBoost Feature Selection



Bibliographic Details
Main authors: Sang, Xiuzhi; Xiao, Wanyue; Zheng, Huiwen; Yang, Yang; Liu, Taigang
Format: Online, Article, Text
Language: English
Published: Hindawi 2020
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7142336/
https://www.ncbi.nlm.nih.gov/pubmed/32300371
http://dx.doi.org/10.1155/2020/1384749
collection PubMed
description Prediction of DNA-binding proteins (DBPs) has become a popular research topic in protein science because of the crucial roles DBPs play in a wide range of biological activities. Although considerable effort has been devoted to developing powerful computational methods for this problem, it remains a challenging task in bioinformatics. Hidden Markov model (HMM) profiles have been shown to provide important clues for improving DBP prediction performance. In this paper, we propose a method, called HMMPred, which extracts amino acid composition and auto- and cross-covariance transformation features from HMM profiles to train a machine learning model for identifying DBPs. Feature selection is then performed using the extreme gradient boosting (XGBoost) algorithm. Finally, the selected optimal features are fed into a support vector machine (SVM) classifier to predict DBPs. Experimental results on two benchmark datasets show that the proposed method outperforms most existing methods and could serve as an alternative tool for identifying DBPs.
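The feature-extraction step described in the abstract can be sketched as follows. This is a minimal pure-Python illustration of the commonly used auto- and cross-covariance (ACC) transformation applied to an L x 20 HMM profile, not HMMPred's published code. The helper names (`hhm_value_to_prob`, `column_means`, `acc_features`, `hmm_feature_vector`), the default maximum lag, and the 1/(L - g) normalization are assumptions made for illustration. The score conversion assumes the HH-suite `.hhm` convention, in which match-state emissions are stored as -1000 * log2(p) and `*` means a probability of zero.

```python
from typing import List

# Assumed layout: L rows (residues) x 20 columns (standard amino acids).
Profile = List[List[float]]


def hhm_value_to_prob(token: str) -> float:
    """Convert one HH-suite .hhm emission entry to a probability (assumed convention).

    Entries are stored as -1000 * log2(p); '*' stands for p = 0.
    """
    if token == "*":
        return 0.0
    return 2.0 ** (-int(token) / 1000.0)


def column_means(profile: Profile) -> List[float]:
    """Average each profile column over all residues (composition-style features)."""
    length = len(profile)
    n_cols = len(profile[0])
    return [sum(row[j] for row in profile) / length for j in range(n_cols)]


def acc_features(profile: Profile, max_lag: int) -> List[float]:
    """Auto- and cross-covariance of profile columns for lags 1..max_lag.

    For lag g and columns j1, j2 (m = column means):
        ACC(j1, j2, g) = 1/(L - g) * sum_i (P[i][j1] - m[j1]) * (P[i+g][j2] - m[j2])
    j1 == j2 gives auto-covariance terms; j1 != j2 gives cross-covariance terms.
    """
    length = len(profile)
    n_cols = len(profile[0])
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    means = column_means(profile)
    auto: List[float] = []
    cross: List[float] = []
    for g in range(1, max_lag + 1):
        for j1 in range(n_cols):
            for j2 in range(n_cols):
                total = sum(
                    (profile[i][j1] - means[j1]) * (profile[i + g][j2] - means[j2])
                    for i in range(length - g)
                )
                value = total / (length - g)
                if j1 == j2:
                    auto.append(value)
                else:
                    cross.append(value)
    # All auto-covariance terms first, then all cross-covariance terms.
    return auto + cross


def hmm_feature_vector(profile: Profile, max_lag: int = 2) -> List[float]:
    """Concatenate composition (20), auto-covariance (20*G) and cross-covariance (380*G) features."""
    return column_means(profile) + acc_features(profile, max_lag)


if __name__ == "__main__":
    # Toy 5-residue profile with 20 columns, only to show the output size.
    toy_profile = [[((i + j) % 3) / 3 for j in range(20)] for i in range(5)]
    print(len(hmm_feature_vector(toy_profile, max_lag=2)))  # 820
```

For a 20-column profile and maximum lag G, this yields 20 + 20G + 380G features (820 for G = 2). In a pipeline like the one outlined in the abstract, such vectors would then be ranked for feature selection, for example via the `feature_importances_` attribute of a fitted `xgboost.XGBClassifier`, and the retained subset passed to an SVM such as `sklearn.svm.SVC`. This record does not state how HMMPred sets the lag range or the selection cutoff.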
id pubmed-7142336
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7142336
2020-04-16
Comput Math Methods Med
Research Article
Hindawi 2020-03-28
Text en
Copyright © 2020 Xiuzhi Sang et al.
http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article