
LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone

BACKGROUND: Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite the recent advances in computational prediction for protein-ligand binding sites, the state-of-the-art m...


Bibliographic Details
Main Authors:	Chen, Peng; Huang, Jianhua Z; Gao, Xin
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4271564/ https://www.ncbi.nlm.nih.gov/pubmed/25474163 http://dx.doi.org/10.1186/1471-2105-15-S15-S4

author	Chen, Peng; Huang, Jianhua Z; Gao, Xin
collection	PubMed
description	BACKGROUND: Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite recent advances in the computational prediction of protein-ligand binding sites, the state-of-the-art methods search for similar, known structures of the query and predict the binding sites based on the solved structures. However, such structural information is not commonly available. RESULTS: In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effects of different sliding residue windows in the process of encoding input feature vectors. Moreover, because the samples are highly imbalanced between ligand-binding and non-ligand-binding sites, we construct several balanced data sets, for each of which a random forest (RF)-based classifier is trained. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor. CONCLUSIONS: Experimental results on CASP9 and CASP8 data sets demonstrate that our method compares favorably with the state-of-the-art protein-ligand binding site prediction methods.
format	Online Article Text
id	pubmed-4271564
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4271564 2015-01-02 BMC Bioinformatics Proceedings BioMed Central 2014-12-03 /pmc/articles/PMC4271564/ /pubmed/25474163 http://dx.doi.org/10.1186/1471-2105-15-S15-S4 Text en Copyright © 2014 Chen et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone
topic	Proceedings
