Prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method
The identification of human-virus protein-protein interactions (PPIs) is an essential and challenging research topic, potentially providing a mechanistic understanding of viral infection. Given that the experimental determination of human-virus PPIs is time-consuming and labor-intensive, computation...
Main Authors: | Yang, Xiaodi; Yang, Shiping; Li, Qinmengge; Wuchty, Stefan; Zhang, Ziding |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Research Network of Computational and Structural Biotechnology
2019
|
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6961065/ https://www.ncbi.nlm.nih.gov/pubmed/31969974 http://dx.doi.org/10.1016/j.csbj.2019.12.005 |
_version_ | 1783487914436460544 |
---|---|
author | Yang, Xiaodi Yang, Shiping Li, Qinmengge Wuchty, Stefan Zhang, Ziding |
author_facet | Yang, Xiaodi Yang, Shiping Li, Qinmengge Wuchty, Stefan Zhang, Ziding |
author_sort | Yang, Xiaodi |
collection | PubMed |
description | The identification of human-virus protein-protein interactions (PPIs) is an essential and challenging research topic, potentially providing a mechanistic understanding of viral infection. Given that the experimental determination of human-virus PPIs is time-consuming and labor-intensive, computational methods are playing an important role in providing testable hypotheses, complementing the determination of large-scale interactomes between species. In this work, we applied an unsupervised sequence embedding technique (doc2vec) to represent protein sequences as rich feature vectors of low dimensionality. Training a Random Forest (RF) classifier on a training dataset that covers known PPIs between human and all viruses, we obtained excellent predictive accuracy, outperforming various combinations of machine learning algorithms and commonly used sequence encoding schemes. In a rigorous comparison with three existing human-virus PPI prediction methods, our proposed computational framework further provided very competitive and promising performance, suggesting that the doc2vec encoding scheme effectively captures context information of protein sequences pertaining to the corresponding protein-protein interactions. Our approach is freely accessible through our web server as part of our host-pathogen PPI prediction platform (http://zzdlab.com/InterSPPI/). Taken together, we hope the current work not only contributes a useful predictor to accelerate the exploration of human-virus PPIs, but also provides some meaningful insights into human-virus relationships. |
format | Online Article Text |
id | pubmed-6961065 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Research Network of Computational and Structural Biotechnology |
record_format | MEDLINE/PubMed |
spelling | pubmed-6961065 2020-01-22 Prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method Yang, Xiaodi Yang, Shiping Li, Qinmengge Wuchty, Stefan Zhang, Ziding Comput Struct Biotechnol J Research Article
Research Network of Computational and Structural Biotechnology 2019-12-26 /pmc/articles/PMC6961065/ /pubmed/31969974 http://dx.doi.org/10.1016/j.csbj.2019.12.005 Text en © 2019 The Authors http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Research Article Yang, Xiaodi Yang, Shiping Li, Qinmengge Wuchty, Stefan Zhang, Ziding Prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method |
title | Prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method |
title_full | Prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method |
title_fullStr | Prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method |
title_full_unstemmed | Prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method |
title_short | Prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method |
title_sort | prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6961065/ https://www.ncbi.nlm.nih.gov/pubmed/31969974 http://dx.doi.org/10.1016/j.csbj.2019.12.005 |
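
The abstract in this record describes representing each protein sequence with doc2vec and classifying human-virus protein pairs with a Random Forest. The following is a minimal sketch of the input preparation such a pipeline might use, not the authors' implementation. It assumes, since the record does not say, that sequences are tokenized into overlapping k-mers before embedding and that a pair is encoded by concatenating the two per-protein vectors. The function names `kmer_words` and `pair_features`, the default `k = 3`, and the example sequence are all hypothetical.

```python
from typing import List


def kmer_words(sequence: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping k-mers ("words").

    Assumption: a doc2vec-style model treats each sequence as a document
    whose words are k-mers; the value of k is illustrative only.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.strip().upper()
    # A sequence shorter than k yields no words.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def pair_features(human_vec: List[float], virus_vec: List[float]) -> List[float]:
    """Concatenate two per-protein embedding vectors into one feature vector.

    Assumption: concatenation is one common way to encode a protein pair
    for a classifier; the record does not state how pairs are encoded.
    """
    return list(human_vec) + list(virus_vec)


if __name__ == "__main__":
    print(kmer_words("MKVLA"))
    print(pair_features([0.1, 0.2], [0.3, 0.4]))
```

In a full pipeline, the k-mer lists would be passed to an embedding model to obtain one vector per protein, and the concatenated pair vectors would serve as classifier input.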