SXGBsite: Prediction of Protein–Ligand Binding Sites Using Sequence Information and Extreme Gradient Boosting
Main authors: | Zhao, Ziqi; Xu, Yonghong; Zhao, Yong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6947422/ https://www.ncbi.nlm.nih.gov/pubmed/31771119 http://dx.doi.org/10.3390/genes10120965 |
_version_ | 1783485547153457152 |
---|---|
author | Zhao, Ziqi Xu, Yonghong Zhao, Yong |
collection | PubMed |
description | The prediction of protein–ligand binding sites is important in drug discovery and drug design. Computational methods for predicting protein–ligand binding sites are inexpensive and fast compared with experimental methods. This paper proposes a new computational method, SXGBsite, which combines the synthetic minority over-sampling technique (SMOTE) with Extreme Gradient Boosting (XGBoost). SXGBsite uses the position-specific scoring matrix discrete cosine transform (PSSM-DCT) and predicted solvent accessibility (PSA) to extract features containing sequence information. A new balanced dataset was generated by SMOTE to improve classifier performance, and a prediction model was constructed using XGBoost. Parallel computing and regularization techniques enabled high-quality, fast predictions and mitigated the overfitting caused by SMOTE. An evaluation on independent test sets for 12 different types of ligand binding sites showed that SXGBsite performs similarly to existing methods on eight of the independent test sets, with a faster computation time. SXGBsite may be applied as a complement to biological experiments. |
format | Online Article Text |
id | pubmed-6947422 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-6947422; 2020-01-13; Genes (Basel); Article; MDPI, 2019-11-22; /pmc/articles/PMC6947422/; /pubmed/31771119; http://dx.doi.org/10.3390/genes10120965; Text; en; © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
title | SXGBsite: Prediction of Protein–Ligand Binding Sites Using Sequence Information and Extreme Gradient Boosting |
topic | Article |
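
The description names two preprocessing steps ahead of the XGBoost classifier: PSSM-DCT feature extraction and SMOTE over-sampling. The following is a minimal pure-Python sketch of both ideas under stated assumptions, not the authors' implementation. The function names, the sliding-window size (`half_window`), the zero padding at sequence ends, and the number of retained low-frequency coefficients (`keep_rows`, `keep_cols`) are hypothetical choices for illustration. PSA features are omitted. The resulting vectors would then be passed to a gradient-boosting classifier such as `xgboost.XGBClassifier`.

```python
import math
import random


def dct2(block):
    """Orthonormal 2D DCT-II of a rectangular matrix (list of equal-length rows)."""
    rows, cols = len(block), len(block[0])

    def alpha(k, n):
        # Orthonormal scaling factor for DCT-II along an axis of length n.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            s = 0.0
            for x in range(rows):
                for y in range(cols):
                    s += (block[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / (2 * rows))
                          * math.cos(math.pi * (2 * y + 1) * v / (2 * cols)))
            out[u][v] = alpha(u, rows) * alpha(v, cols) * s
    return out


def pssm_dct_features(pssm, center, half_window=3, keep_rows=3, keep_cols=5):
    """Hypothetical per-residue features: DCT of a PSSM window around `center`.

    `pssm` is an L x 20 matrix. Rows that fall outside the sequence are
    zero-padded (an assumption). Only the top-left, low-frequency
    keep_rows x keep_cols coefficients are returned, flattened row by row.
    """
    width = len(pssm[0])
    window = []
    for i in range(center - half_window, center + half_window + 1):
        window.append(list(pssm[i]) if 0 <= i < len(pssm) else [0.0] * width)
    coeffs = dct2(window)
    return [coeffs[u][v] for u in range(keep_rows) for v in range(keep_cols)]


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE: interpolate between a random minority sample and one of its
    k nearest minority neighbours (Euclidean). Needs at least 2 samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority samples.
        neighbours = sorted(
            (s for s in minority if s is not base),
            key=lambda s: math.dist(base, s),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

With the defaults, `pssm_dct_features(pssm, center=0)` on an L x 20 PSSM returns 15 coefficients (a 7 x 20 window, top-left 3 x 5 block). The direct quadruple-loop DCT is slow for large inputs but adequate for a small window; a library routine such as SciPy's orthonormal DCT would be the practical choice.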