LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone
BACKGROUND: Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite recent advances in the computational prediction of protein-ligand binding sites, the state-of-the-art m...
Main authors: | Chen, Peng; Huang, Jianhua Z; Gao, Xin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4271564/ https://www.ncbi.nlm.nih.gov/pubmed/25474163 http://dx.doi.org/10.1186/1471-2105-15-S15-S4 |
_version_ | 1782349628437430272 |
---|---|
author | Chen, Peng Huang, Jianhua Z Gao, Xin |
author_sort | Chen, Peng |
collection | PubMed |
description | BACKGROUND: Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite recent advances in the computational prediction of protein-ligand binding sites, the state-of-the-art methods search for known structures similar to the query and predict the binding sites from those solved structures. However, such structural information is not commonly available. RESULTS: In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effects of different sliding residue windows in the process of encoding input feature vectors. Moreover, because the samples are highly imbalanced between ligand-binding sites and non-ligand-binding sites, we construct several balanced data sets, for each of which a random forest (RF)-based classifier is trained. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor. CONCLUSIONS: Experimental results on CASP9 and CASP8 data sets demonstrate that our method compares favorably with the state-of-the-art protein-ligand binding site prediction methods. |
format | Online Article Text |
id | pubmed-4271564 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4271564 2015-01-02 LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone Chen, Peng Huang, Jianhua Z Gao, Xin BMC Bioinformatics Proceedings BACKGROUND: Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite recent advances in the computational prediction of protein-ligand binding sites, the state-of-the-art methods search for known structures similar to the query and predict the binding sites from those solved structures. However, such structural information is not commonly available. RESULTS: In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effects of different sliding residue windows in the process of encoding input feature vectors. Moreover, because the samples are highly imbalanced between ligand-binding sites and non-ligand-binding sites, we construct several balanced data sets, for each of which a random forest (RF)-based classifier is trained. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor. CONCLUSIONS: Experimental results on CASP9 and CASP8 data sets demonstrate that our method compares favorably with the state-of-the-art protein-ligand binding site prediction methods. BioMed Central 2014-12-03 /pmc/articles/PMC4271564/ /pubmed/25474163 http://dx.doi.org/10.1186/1471-2105-15-S15-S4 Text en Copyright © 2014 Chen et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Proceedings Chen, Peng Huang, Jianhua Z Gao, Xin LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone |
title | LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone |
title_sort | ligandrfs: random forest ensemble to identify ligand-binding residues from sequence information alone |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4271564/ https://www.ncbi.nlm.nih.gov/pubmed/25474163 http://dx.doi.org/10.1186/1471-2105-15-S15-S4 |
work_keys_str_mv | AT chenpeng ligandrfsrandomforestensembletoidentifyligandbindingresiduesfromsequenceinformationalone AT huangjianhuaz ligandrfsrandomforestensembletoidentifyligandbindingresiduesfromsequenceinformationalone AT gaoxin ligandrfsrandomforestensembletoidentifyligandbindingresiduesfromsequenceinformationalone |
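
The abstract in this record outlines three steps: encoding each residue from a sliding window of its sequence neighbours, building several balanced data sets from the heavily imbalanced binding and non-binding residues, and combining one classifier per balanced set into an ensemble. The sketch below illustrates that general pattern only; it is not the authors' implementation. The function names, the padding character `X`, the undersampling scheme, and the majority-vote rule are all assumptions made for illustration, and the base learner is left as an arbitrary callable where the paper trains random forests.

```python
import random
from typing import Callable, List, Sequence, Tuple


def window_features(seq: str, i: int, half_width: int, pad: str = "X") -> str:
    """Return the residue window centred on position i.

    The sequence is padded at both termini (the pad symbol is an assumption)
    so that every residue gets a window of length 2 * half_width + 1.
    """
    padded = pad * half_width + seq + pad * half_width
    return padded[i : i + 2 * half_width + 1]


def balanced_subsets(
    positives: Sequence,
    negatives: Sequence,
    n_subsets: int,
    seed: int = 0,
) -> List[Tuple[list, list]]:
    """Build n_subsets balanced data sets by undersampling the majority class.

    Every subset keeps all positives (binding residues) and draws an equally
    sized random sample of negatives (non-binding residues).
    """
    if len(negatives) < len(positives):
        raise ValueError("expected negatives to be the majority class")
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        neg_sample = rng.sample(list(negatives), len(positives))
        subsets.append((list(positives), neg_sample))
    return subsets


def ensemble_predict(
    classifiers: Sequence[Callable[[str], int]],
    x: str,
    threshold: float = 0.5,
) -> bool:
    """Combine 0/1 votes from the per-subset classifiers.

    Majority voting is an assumption; the abstract does not state how the
    individual classifiers are combined.
    """
    votes = [clf(x) for clf in classifiers]
    return sum(votes) / len(votes) >= threshold
```

In practice, each `(positives, negatives)` pair would be used to fit one classifier, for example a random forest, and the fitted models would then be passed to `ensemble_predict`. Windows of several half-widths could be extracted for the same residue and combined, although the specific combination technique used in the paper is not described in this record.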