Multi-scale encoding of amino acid sequences for predicting protein interactions using gradient boosting decision tree
Nowadays a number of computational approaches have been developed to effectively and accurately predict protein interactions. However, most of these methods typically perform worse when other biological data sources (e.g., protein structure information, protein domains, or gene neighborhoods informa...
Main Authors: | Zhou, Chang; Yu, Hua; Ding, Yijie; Guo, Fei; Gong, Xiu-Jun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2017 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5549711/ https://www.ncbi.nlm.nih.gov/pubmed/28792503 http://dx.doi.org/10.1371/journal.pone.0181426 |
_version_ | 1783256016546168832 |
---|---|
author | Zhou, Chang Yu, Hua Ding, Yijie Guo, Fei Gong, Xiu-Jun |
author_facet | Zhou, Chang Yu, Hua Ding, Yijie Guo, Fei Gong, Xiu-Jun |
author_sort | Zhou, Chang |
collection | PubMed |
description | Nowadays a number of computational approaches have been developed to effectively and accurately predict protein interactions. However, most of these methods typically perform worse when other biological data sources (e.g., protein structure information, protein domains, or gene neighborhood information) are not available. In the present work, we propose a method for predicting protein interactions that makes full use of the physicochemical characteristics of amino acids. A protein sequence is encoded at multiple scales by seven properties of amino acids, including their qualitative and quantitative descriptions. Five kinds of protein descriptors (frequency, composition, transformation, distribution, and auto covariance) are extracted from these encodings to represent each protein sequence. The newly formed feature representation, consisting of 347 dimensions, is able to capture not only the compositional and positional information but also the statistical significance of amino acids in the sequence. Based on this feature representation, the gradient boosting decision tree algorithm is introduced to predict the protein interaction class. When the proposed method is tested with the PPI data of S. cerevisiae, it achieves a prediction accuracy of 95.28% with a Matthews correlation coefficient of 90.68%. Compared with state-of-the-art works on H. pylori and Human, the accuracies can be raised to 89.27% and 98.00%, respectively. Extensive experiments are performed on a crossover protein-protein interaction network, and the prediction accuracies are also very promising. Because of the learning capabilities of the gradient boosting decision tree and the multi-scale feature representation scheme, the proposed method might be a useful tool for future proteomics studies. |
format | Online Article Text |
id | pubmed-5549711 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-5549711 2017-08-12 Multi-scale encoding of amino acid sequences for predicting protein interactions using gradient boosting decision tree Zhou, Chang Yu, Hua Ding, Yijie Guo, Fei Gong, Xiu-Jun PLoS One Research Article Nowadays a number of computational approaches have been developed to effectively and accurately predict protein interactions. However, most of these methods typically perform worse when other biological data sources (e.g., protein structure information, protein domains, or gene neighborhood information) are not available. In the present work, we propose a method for predicting protein interactions that makes full use of the physicochemical characteristics of amino acids. A protein sequence is encoded at multiple scales by seven properties of amino acids, including their qualitative and quantitative descriptions. Five kinds of protein descriptors (frequency, composition, transformation, distribution, and auto covariance) are extracted from these encodings to represent each protein sequence. The newly formed feature representation, consisting of 347 dimensions, is able to capture not only the compositional and positional information but also the statistical significance of amino acids in the sequence. Based on this feature representation, the gradient boosting decision tree algorithm is introduced to predict the protein interaction class. When the proposed method is tested with the PPI data of S. cerevisiae, it achieves a prediction accuracy of 95.28% with a Matthews correlation coefficient of 90.68%. Compared with state-of-the-art works on H. pylori and Human, the accuracies can be raised to 89.27% and 98.00%, respectively. Extensive experiments are performed on a crossover protein-protein interaction network, and the prediction accuracies are also very promising. Because of the learning capabilities of the gradient boosting decision tree and the multi-scale feature representation scheme, the proposed method might be a useful tool for future proteomics studies. Public Library of Science 2017-08-08 /pmc/articles/PMC5549711/ /pubmed/28792503 http://dx.doi.org/10.1371/journal.pone.0181426 Text en © 2017 Zhou et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Zhou, Chang Yu, Hua Ding, Yijie Guo, Fei Gong, Xiu-Jun Multi-scale encoding of amino acid sequences for predicting protein interactions using gradient boosting decision tree |
title | Multi-scale encoding of amino acid sequences for predicting protein interactions using gradient boosting decision tree |
title_sort | multi-scale encoding of amino acid sequences for predicting protein interactions using gradient boosting decision tree |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5549711/ https://www.ncbi.nlm.nih.gov/pubmed/28792503 http://dx.doi.org/10.1371/journal.pone.0181426 |
work_keys_str_mv | AT zhouchang multiscaleencodingofaminoacidsequencesforpredictingproteininteractionsusinggradientboostingdecisiontree AT yuhua multiscaleencodingofaminoacidsequencesforpredictingproteininteractionsusinggradientboostingdecisiontree AT dingyijie multiscaleencodingofaminoacidsequencesforpredictingproteininteractionsusinggradientboostingdecisiontree AT guofei multiscaleencodingofaminoacidsequencesforpredictingproteininteractionsusinggradientboostingdecisiontree AT gongxiujun multiscaleencodingofaminoacidsequencesforpredictingproteininteractionsusinggradientboostingdecisiontree |
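
The description field in this record reports two evaluation figures for the S. cerevisiae data: a prediction accuracy and a Matthews correlation coefficient (MCC). As a minimal sketch of how these two metrics are conventionally computed from a binary confusion matrix, the Python below uses the standard textbook definitions. The function names and the example counts are illustrative assumptions and are not taken from the article; the record does not give the authors' confusion matrices.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of protein pairs classified correctly."""
    total = tp + tn + fp + fn
    # Guard against an empty confusion matrix.
    return (tp + tn) / total if total else 0.0


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient in [-1, 1] for binary labels."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal sum is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    tp, tn, fp, fn = 45, 45, 5, 5
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
    print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why the two values are often reported together for interaction prediction.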