
LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone

BACKGROUND: Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite recent advances in computational prediction of protein-ligand binding sites, the state-of-the-art m...


Bibliographic Details
Main Authors: Chen, Peng, Huang, Jianhua Z, Gao, Xin
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4271564/
https://www.ncbi.nlm.nih.gov/pubmed/25474163
http://dx.doi.org/10.1186/1471-2105-15-S15-S4
author Chen, Peng
Huang, Jianhua Z
Gao, Xin
collection PubMed
description BACKGROUND: Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite recent advances in computational prediction of protein-ligand binding sites, the state-of-the-art methods search for similar, known structures of the query and predict the binding sites based on the solved structures. However, such structural information is not commonly available. RESULTS: In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effects of different sliding residue windows in the process of encoding input feature vectors. Moreover, because the samples of ligand-binding sites and non-ligand-binding sites are highly imbalanced, we construct several balanced data sets, for each of which a random forest (RF)-based classifier is trained. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor. CONCLUSIONS: Experimental results on CASP9 and CASP8 data sets demonstrate that our method compares favorably with the state-of-the-art protein-ligand binding site prediction methods.
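The abstract names three steps: sliding residue windows, several balanced data sets, and an ensemble of classifiers. The following is a minimal Python sketch of those steps under stated assumptions, not the authors' implementation. The function names, the padding symbol "X", the undersampling of non-binding residues, and the score averaging are all hypothetical choices made for illustration; this record does not specify the paper's feature encoding, its window-combination technique, or its voting rule. Classifier training is left out; each balanced subset would be passed to a random forest implementation of the reader's choice.

```python
import random


def sliding_windows(sequence, half_width, pad="X"):
    """Return one window of 2*half_width + 1 residues centered on each position.

    The padding symbol is an assumption; it fills positions past the termini.
    """
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


def balanced_subsets(labels, n_subsets, seed=0):
    """Build index sets with equal numbers of binding (1) and non-binding (0) residues.

    Every subset keeps all binding residues and draws a fresh random sample
    of non-binding residues of the same size (undersampling is an assumption).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if len(neg) < len(pos):
        raise ValueError("expected non-binding residues to be the majority class")
    return [sorted(pos + rng.sample(neg, len(pos))) for _ in range(n_subsets)]


def ensemble_score(member_scores):
    """Average the per-residue scores produced by the individual classifiers."""
    return [sum(column) / len(column) for column in zip(*member_scores)]


if __name__ == "__main__":
    print(sliding_windows("MKVL", half_width=1))
    print(balanced_subsets([1, 0, 0, 0, 1, 0, 0, 0], n_subsets=3))
    print(ensemble_score([[0.2, 0.8], [0.4, 0.6]]))
```

Averaging is only one way to combine the members; majority voting over thresholded predictions would fit the same structure.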
format Online
Article
Text
id pubmed-4271564
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4271564 2015-01-02 LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone. Chen, Peng; Huang, Jianhua Z; Gao, Xin. BMC Bioinformatics. Proceedings. BioMed Central 2014-12-03 /pmc/articles/PMC4271564/ /pubmed/25474163 http://dx.doi.org/10.1186/1471-2105-15-S15-S4 Text en
Copyright © 2014 Chen et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone
topic Proceedings