
SXGBsite: Prediction of Protein–Ligand Binding Sites Using Sequence Information and Extreme Gradient Boosting

The prediction of protein–ligand binding sites is important in drug discovery and drug design. Computational methods for predicting protein–ligand binding sites are inexpensive and fast compared with experimental methods. This paper proposes a new computational method, SXGBsite, which includes the synthe...


Bibliographic Details
Main Authors: Zhao, Ziqi; Xu, Yonghong; Zhao, Yong
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6947422/
https://www.ncbi.nlm.nih.gov/pubmed/31771119
http://dx.doi.org/10.3390/genes10120965
author Zhao, Ziqi
Xu, Yonghong
Zhao, Yong
collection PubMed
description The prediction of protein–ligand binding sites is important in drug discovery and drug design. Computational methods for predicting protein–ligand binding sites are inexpensive and fast compared with experimental methods. This paper proposes a new computational method, SXGBsite, which includes the synthetic minority over-sampling technique (SMOTE) and Extreme Gradient Boosting (XGBoost). SXGBsite uses the position-specific scoring matrix discrete cosine transform (PSSM-DCT) and predicted solvent accessibility (PSA) to extract features containing sequence information. A new balanced dataset was generated by SMOTE to improve classifier performance, and a prediction model was constructed using XGBoost. The parallel computing and regularization techniques of XGBoost enabled high-quality and fast predictions and mitigated the overfitting caused by SMOTE. An evaluation on independent test sets for 12 different types of ligand binding sites showed that SXGBsite performs similarly to existing methods on eight of the test sets while requiring less computation time. SXGBsite may be applied as a complement to biological experiments.
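The two preprocessing ideas named in the description, DCT-based compression of a PSSM window and SMOTE-style over-sampling of the minority (binding) class, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names, the window shape, the number of retained coefficients (`n_coeffs`), and the neighbour count (`k`) are all assumptions chosen for illustration, and the XGBoost training step is not shown.

```python
import math
import random


def dct2(signal):
    """Unnormalized DCT-II of a 1-D sequence of numbers."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        for k in range(n)
    ]


def pssm_dct_features(pssm_window, n_coeffs):
    """Apply the DCT down each column of a PSSM window (rows = residues,
    columns = amino-acid scores) and keep the first n_coeffs low-frequency
    coefficients per column. n_coeffs is a hypothetical compression choice."""
    n_cols = len(pssm_window[0])
    features = []
    for c in range(n_cols):
        column = [row[c] for row in pssm_window]
        features.extend(dct2(column)[:n_coeffs])
    return features


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by linear interpolation between
    a randomly chosen sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data only: a 3-residue window with 2 score columns, and 3 minority points.
window = [[1.0, -2.0], [0.5, 0.0], [-1.0, 3.0]]
feature_vector = pssm_dct_features(window, n_coeffs=2)
extra_samples = smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_new=5)
```

In a pipeline of the kind the description outlines, feature vectors like `feature_vector` would be computed per residue, the minority class would be over-sampled on the training set only, and the balanced set would then be passed to a gradient-boosted classifier.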
format Online
Article
Text
id pubmed-6947422
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6947422 2020-01-13 SXGBsite: Prediction of Protein–Ligand Binding Sites Using Sequence Information and Extreme Gradient Boosting Zhao, Ziqi Xu, Yonghong Zhao, Yong Genes (Basel) Article The prediction of protein–ligand binding sites is important in drug discovery and drug design. Computational methods for predicting protein–ligand binding sites are inexpensive and fast compared with experimental methods. This paper proposes a new computational method, SXGBsite, which includes the synthetic minority over-sampling technique (SMOTE) and Extreme Gradient Boosting (XGBoost). SXGBsite uses the position-specific scoring matrix discrete cosine transform (PSSM-DCT) and predicted solvent accessibility (PSA) to extract features containing sequence information. A new balanced dataset was generated by SMOTE to improve classifier performance, and a prediction model was constructed using XGBoost. The parallel computing and regularization techniques of XGBoost enabled high-quality and fast predictions and mitigated the overfitting caused by SMOTE. An evaluation on independent test sets for 12 different types of ligand binding sites showed that SXGBsite performs similarly to existing methods on eight of the test sets while requiring less computation time. SXGBsite may be applied as a complement to biological experiments. MDPI 2019-11-22 /pmc/articles/PMC6947422/ /pubmed/31771119 http://dx.doi.org/10.3390/genes10120965 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title SXGBsite: Prediction of Protein–Ligand Binding Sites Using Sequence Information and Extreme Gradient Boosting
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6947422/
https://www.ncbi.nlm.nih.gov/pubmed/31771119
http://dx.doi.org/10.3390/genes10120965