
Zero-shot-capable identification of phage–host relationships with whole-genome sequence representation by contrastive learning

Accurately identifying phage–host relationships from their genome sequences is still challenging, especially for those phages and hosts with less homologous sequences. In this work, focusing on identifying the phage–host relationships at the species and genus level, we propose a contrastive learning...


Bibliographic Details
Main authors: Zhang, Yao-zhong, Liu, Yunjie, Bai, Zeheng, Fujimoto, Kosuke, Uematsu, Satoshi, Imoto, Seiya
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Problem Solving Protocol
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516345/
https://www.ncbi.nlm.nih.gov/pubmed/37466138
http://dx.doi.org/10.1093/bib/bbad239
_version_ 1785109111074979840
collection PubMed
description Accurately identifying phage–host relationships from their genome sequences is still challenging, especially for those phages and hosts with less homologous sequences. In this work, focusing on identifying the phage–host relationships at the species and genus level, we propose a contrastive learning based approach to learn whole-genome sequence embeddings that can take account of phage–host interactions (PHIs). Contrastive learning is used to make phages infecting the same hosts close to each other in the new representation space. Specifically, we rephrase whole-genome sequences with frequency chaos game representation (FCGR) and learn latent embeddings that ‘encapsulate’ phages and host relationships through contrastive learning. The contrastive learning method works well on the imbalanced dataset. Based on the learned embeddings, a proposed pipeline named CL4PHI can predict known hosts and unseen hosts in training. We compare our method with two recently proposed state-of-the-art learning-based methods on their benchmark datasets. The experiment results demonstrate that the proposed method using contrastive learning improves the prediction accuracy on known hosts and demonstrates a zero-shot prediction capability on unseen hosts. In terms of potential applications, the rapid pace of genome sequencing across different species has resulted in a vast amount of whole-genome sequencing data that require efficient computational methods for identifying phage–host interactions. The proposed approach is expected to address this need by efficiently processing whole-genome sequences of phages and prokaryotic hosts and capturing features related to phage–host relationships for genome sequence representation. This approach can be used to accelerate the discovery of phage–host interactions and aid in the development of phage-based therapies for infectious diseases.
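As a rough illustration of the two ingredients named in the abstract, the sketch below computes a frequency chaos game representation (FCGR) of a DNA sequence and a pairwise contrastive loss. It is a minimal sketch under stated assumptions, not the CL4PHI implementation: the function names `fcgr` and `contrastive_loss`, the corner assignment of the four bases, the choice of `k`, and the margin-based loss form are all illustrative choices that this record does not specify.

```python
import numpy as np

# Assumed corner assignment for the four bases; conventions differ between tools.
_CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence: str, k: int = 6) -> np.ndarray:
    """Return a (2**k, 2**k) matrix of normalized k-mer frequencies.

    Each k-mer is mapped to exactly one cell by reading one bit per base
    for the column (x) and one bit per base for the row (y).
    """
    size = 2 ** k
    counts = np.zeros((size, size), dtype=np.float64)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip k-mers containing ambiguous bases such as N.
        if any(base not in _CORNERS for base in kmer):
            continue
        row = col = 0
        for base in kmer:
            x, y = _CORNERS[base]
            col = (col << 1) | x
            row = (row << 1) | y
        counts[row, col] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def contrastive_loss(distance: float, is_positive_pair: bool, margin: float = 1.0) -> float:
    """Classic margin-based pairwise contrastive loss (an assumed form).

    Positive pairs (a phage and its host) are pulled together;
    negative pairs are pushed apart until they are at least `margin` away.
    """
    if is_positive_pair:
        return distance ** 2
    return max(0.0, margin - distance) ** 2
```

In a pipeline of this kind, the FCGR matrices of a phage genome and a candidate host genome would each be passed through an encoder network, and the distance between the two resulting embeddings would feed the loss during training and rank candidate hosts at prediction time.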
id pubmed-10516345
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10516345 2023-09-23 Brief Bioinform Problem Solving Protocol Oxford University Press 2023-07-18 /pmc/articles/PMC10516345/ /pubmed/37466138 http://dx.doi.org/10.1093/bib/bbad239 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Problem Solving Protocol