
Prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method

The identification of human-virus protein-protein interactions (PPIs) is an essential and challenging research topic, potentially providing a mechanistic understanding of viral infection. Given that the experimental determination of human-virus PPIs is time-consuming and labor-intensive, computation...


Bibliographic Details
Main authors: Yang, Xiaodi, Yang, Shiping, Li, Qinmengge, Wuchty, Stefan, Zhang, Ziding
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6961065/
https://www.ncbi.nlm.nih.gov/pubmed/31969974
http://dx.doi.org/10.1016/j.csbj.2019.12.005
author Yang, Xiaodi
Yang, Shiping
Li, Qinmengge
Wuchty, Stefan
Zhang, Ziding
collection PubMed
description The identification of human-virus protein-protein interactions (PPIs) is an essential and challenging research topic, potentially providing a mechanistic understanding of viral infection. Given that the experimental determination of human-virus PPIs is time-consuming and labor-intensive, computational methods play an important role in providing testable hypotheses, complementing the determination of large-scale interactomes between species. In this work, we applied an unsupervised sequence embedding technique (doc2vec) to represent protein sequences as rich feature vectors of low dimensionality. Training a Random Forest (RF) classifier on a dataset that covers known PPIs between human and all viruses, we obtained excellent predictive accuracy, outperforming various combinations of machine learning algorithms and commonly used sequence encoding schemes. In a rigorous comparison with three existing human-virus PPI prediction methods, our proposed computational framework also showed very competitive and promising performance, suggesting that the doc2vec encoding scheme effectively captures context information of protein sequences that pertains to the corresponding protein-protein interactions. Our approach is freely accessible through our web server as part of our host-pathogen PPI prediction platform (http://zzdlab.com/InterSPPI/). Taken together, we hope the current work not only contributes a useful predictor to accelerate the exploration of human-virus PPIs, but also provides some meaningful insights into human-virus relationships.
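The encoding step described in the abstract can be illustrated with a minimal sketch of how a human-virus protein pair might be turned into one fixed-length feature vector. Several parts are assumptions that the record does not state: splitting sequences into overlapping k-mer "words" (a common way to prepare sequences for doc2vec-style models), the value k = 3, and concatenating the two protein vectors. The function `toy_embed` is a hypothetical placeholder (a normalised, bucketed word count), not doc2vec; in practice a trained doc2vec model would supply the vectors and a Random Forest classifier would be trained on the resulting pair features.

```python
from typing import Callable, List

Vector = List[float]


def kmer_words(sequence: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping k-mers ("words").

    Assumption: k-mer tokenisation with k=3; the record does not state
    how sequences were tokenised.
    """
    seq = sequence.strip().upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_embed(words: List[str], dim: int = 8) -> Vector:
    """Hypothetical stand-in for a doc2vec model (NOT doc2vec).

    Each word is assigned to a bucket by the sum of its character codes,
    and the bucket counts are normalised by the number of words.
    """
    vec = [0.0] * dim
    if not words:
        return vec
    for word in words:
        vec[sum(ord(c) for c in word) % dim] += 1.0
    return [v / len(words) for v in vec]


def encode_pair(
    human_seq: str,
    virus_seq: str,
    embed: Callable[[List[str]], Vector] = toy_embed,
    k: int = 3,
) -> Vector:
    """Build one feature vector for a human-virus protein pair.

    Assumption: the pair is represented by concatenating the two
    per-protein vectors (human first, then virus).
    """
    return embed(kmer_words(human_seq, k)) + embed(kmer_words(virus_seq, k))


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only.
    features = encode_pair("MKTAYIAK", "MSDNGPQ")
    print(len(features), features)
```

Because `encode_pair` accepts any `embed` callable, the placeholder can be swapped for a function that infers a vector from a trained embedding model without changing the rest of the sketch.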
format Online
Article
Text
id pubmed-6961065
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-6961065 2020-01-22 Prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method Yang, Xiaodi Yang, Shiping Li, Qinmengge Wuchty, Stefan Zhang, Ziding Comput Struct Biotechnol J Research Article
Research Network of Computational and Structural Biotechnology 2019-12-26 /pmc/articles/PMC6961065/ /pubmed/31969974 http://dx.doi.org/10.1016/j.csbj.2019.12.005 Text en © 2019 The Authors http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6961065/
https://www.ncbi.nlm.nih.gov/pubmed/31969974
http://dx.doi.org/10.1016/j.csbj.2019.12.005