Predicting protein-ligand binding residues with deep convolutional neural networks
BACKGROUND: Ligand-binding proteins play key roles in many biological processes. Identification of protein-ligand binding residues is important in understanding the biological functions of proteins. Existing computational methods can be roughly categorized as sequence-based or 3D-structure-based met...
Main authors: | Cui, Yifeng; Dong, Qiwen; Hong, Daocheng; Wang, Xikun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6390579/ https://www.ncbi.nlm.nih.gov/pubmed/30808287 http://dx.doi.org/10.1186/s12859-019-2672-1 |
_version_ | 1783398167306305536 |
---|---|
author | Cui, Yifeng Dong, Qiwen Hong, Daocheng Wang, Xikun |
author_facet | Cui, Yifeng Dong, Qiwen Hong, Daocheng Wang, Xikun |
author_sort | Cui, Yifeng |
collection | PubMed |
description | BACKGROUND: Ligand-binding proteins play key roles in many biological processes. Identification of protein-ligand binding residues is important in understanding the biological functions of proteins. Existing computational methods can be roughly categorized as sequence-based or 3D-structure-based methods. All these methods are based on traditional machine learning. In a series of binding residue prediction tasks, 3D-structure-based methods are widely superior to sequence-based methods. However, due to the great number of proteins with known amino acid sequences, sequence-based methods have considerable room for improvement with the development of deep learning. Therefore, prediction of protein-ligand binding residues with deep learning requires study. RESULTS: In this study, we propose a new sequence-based approach called DeepCSeqSite for ab initio protein-ligand binding residue prediction. DeepCSeqSite includes a standard edition and an enhanced edition. The classifier of DeepCSeqSite is based on a deep convolutional neural network. Several convolutional layers are stacked on top of each other to extract hierarchical features. The size of the effective context scope is expanded as the number of convolutional layers increases. The long-distance dependencies between residues can be captured by the large effective context scope, and stacking several layers enables the maximum length of dependencies to be precisely controlled. The extracted features are ultimately combined through one-by-one convolution kernels and softmax to predict whether the residues are binding residues. The state-of-the-art ligand-binding method COACH and some of its submethods are selected as baselines. The methods are tested on a set of 151 nonredundant proteins and three extended test sets. Experiments show that the improvement of the Matthews correlation coefficient (MCC) is no less than 0.05. 
In addition, a training data augmentation method that slightly improves the performance is discussed in this study. CONCLUSIONS: Without using any templates that include 3D-structure data, DeepCSeqSite significantly outperforms existing sequence-based and 3D-structure-based methods, including COACH. Augmentation of the training sets slightly improves the performance. The model, code and datasets are available at https://github.com/yfCuiFaith/DeepCSeqSite. |
format | Online Article Text |
id | pubmed-6390579 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-63905792019-03-11 Predicting protein-ligand binding residues with deep convolutional neural networks Cui, Yifeng Dong, Qiwen Hong, Daocheng Wang, Xikun BMC Bioinformatics Research Article BACKGROUND: Ligand-binding proteins play key roles in many biological processes. Identification of protein-ligand binding residues is important in understanding the biological functions of proteins. Existing computational methods can be roughly categorized as sequence-based or 3D-structure-based methods. All these methods are based on traditional machine learning. In a series of binding residue prediction tasks, 3D-structure-based methods are widely superior to sequence-based methods. However, due to the great number of proteins with known amino acid sequences, sequence-based methods have considerable room for improvement with the development of deep learning. Therefore, prediction of protein-ligand binding residues with deep learning requires study. RESULTS: In this study, we propose a new sequence-based approach called DeepCSeqSite for ab initio protein-ligand binding residue prediction. DeepCSeqSite includes a standard edition and an enhanced edition. The classifier of DeepCSeqSite is based on a deep convolutional neural network. Several convolutional layers are stacked on top of each other to extract hierarchical features. The size of the effective context scope is expanded as the number of convolutional layers increases. The long-distance dependencies between residues can be captured by the large effective context scope, and stacking several layers enables the maximum length of dependencies to be precisely controlled. The extracted features are ultimately combined through one-by-one convolution kernels and softmax to predict whether the residues are binding residues. The state-of-the-art ligand-binding method COACH and some of its submethods are selected as baselines. The methods are tested on a set of 151 nonredundant proteins and three extended test sets. 
Experiments show that the improvement of the Matthews correlation coefficient (MCC) is no less than 0.05. In addition, a training data augmentation method that slightly improves the performance is discussed in this study. CONCLUSIONS: Without using any templates that include 3D-structure data, DeepCSeqSite significantly outperforms existing sequence-based and 3D-structure-based methods, including COACH. Augmentation of the training sets slightly improves the performance. The model, code and datasets are available at https://github.com/yfCuiFaith/DeepCSeqSite. BioMed Central 2019-02-26 /pmc/articles/PMC6390579/ /pubmed/30808287 http://dx.doi.org/10.1186/s12859-019-2672-1 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Cui, Yifeng Dong, Qiwen Hong, Daocheng Wang, Xikun Predicting protein-ligand binding residues with deep convolutional neural networks |
title | Predicting protein-ligand binding residues with deep convolutional neural networks |
title_full | Predicting protein-ligand binding residues with deep convolutional neural networks |
title_fullStr | Predicting protein-ligand binding residues with deep convolutional neural networks |
title_full_unstemmed | Predicting protein-ligand binding residues with deep convolutional neural networks |
title_short | Predicting protein-ligand binding residues with deep convolutional neural networks |
title_sort | predicting protein-ligand binding residues with deep convolutional neural networks |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6390579/ https://www.ncbi.nlm.nih.gov/pubmed/30808287 http://dx.doi.org/10.1186/s12859-019-2672-1 |
work_keys_str_mv | AT cuiyifeng predictingproteinligandbindingresidueswithdeepconvolutionalneuralnetworks AT dongqiwen predictingproteinligandbindingresidueswithdeepconvolutionalneuralnetworks AT hongdaocheng predictingproteinligandbindingresidueswithdeepconvolutionalneuralnetworks AT wangxikun predictingproteinligandbindingresidueswithdeepconvolutionalneuralnetworks |
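
The description says that the effective context scope grows as convolutional layers are stacked, and that results are reported as Matthews correlation coefficient (MCC). The following is a minimal sketch of both quantities, not code from DeepCSeqSite. It assumes stride-1, undilated 1D convolutions that all share one kernel size; the record does not state the kernel size or layer count, so the function names `effective_context` and `mcc` and the example values are hypothetical.

```python
import math


def effective_context(num_layers: int, kernel_size: int) -> int:
    """Number of input residues visible to one output position.

    Assumes stride-1, undilated 1D convolutions with the same kernel
    size in every layer (an assumption, not stated in the record).
    Each layer widens the window by kernel_size - 1 positions.
    """
    return 1 + num_layers * (kernel_size - 1)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention
    for degenerate cases such as a classifier that predicts one class.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical kernel size of 5; the window widens linearly with depth.
    for layers in (1, 2, 3, 4):
        print(layers, effective_context(layers, kernel_size=5))

    # Hypothetical confusion-matrix counts for binding vs. non-binding residues.
    print(round(mcc(tp=40, fp=10, tn=930, fn=20), 3))
```

Because the window grows by a fixed amount per layer under these assumptions, choosing the number of layers sets the maximum residue-to-residue distance the model can relate, which is the control over dependency length that the description refers to.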