Zero-shot-capable identification of phage–host relationships with whole-genome sequence representation by contrastive learning
Accurately identifying phage–host relationships from their genome sequences is still challenging, especially for those phages and hosts with less homologous sequences. In this work, focusing on identifying the phage–host relationships at the species and genus level, we propose a contrastive learning...
Main Authors: | Zhang, Yao-zhong; Liu, Yunjie; Bai, Zeheng; Fujimoto, Kosuke; Uematsu, Satoshi; Imoto, Seiya |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | Problem Solving Protocol |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516345/ https://www.ncbi.nlm.nih.gov/pubmed/37466138 http://dx.doi.org/10.1093/bib/bbad239 |
_version_ | 1785109111074979840 |
---|---|
author | Zhang, Yao-zhong; Liu, Yunjie; Bai, Zeheng; Fujimoto, Kosuke; Uematsu, Satoshi; Imoto, Seiya |
author_facet | Zhang, Yao-zhong; Liu, Yunjie; Bai, Zeheng; Fujimoto, Kosuke; Uematsu, Satoshi; Imoto, Seiya |
author_sort | Zhang, Yao-zhong |
collection | PubMed |
description | Accurately identifying phage–host relationships from their genome sequences is still challenging, especially for those phages and hosts with less homologous sequences. In this work, focusing on identifying the phage–host relationships at the species and genus level, we propose a contrastive learning based approach to learn whole-genome sequence embeddings that can take account of phage–host interactions (PHIs). Contrastive learning is used to make phages infecting the same hosts close to each other in the new representation space. Specifically, we rephrase whole-genome sequences with frequency chaos game representation (FCGR) and learn latent embeddings that ‘encapsulate’ phages and host relationships through contrastive learning. The contrastive learning method works well on the imbalanced dataset. Based on the learned embeddings, a proposed pipeline named CL4PHI can predict known hosts and unseen hosts in training. We compare our method with two recently proposed state-of-the-art learning-based methods on their benchmark datasets. The experiment results demonstrate that the proposed method using contrastive learning improves the prediction accuracy on known hosts and demonstrates a zero-shot prediction capability on unseen hosts. In terms of potential applications, the rapid pace of genome sequencing across different species has resulted in a vast amount of whole-genome sequencing data that require efficient computational methods for identifying phage–host interactions. The proposed approach is expected to address this need by efficiently processing whole-genome sequences of phages and prokaryotic hosts and capturing features related to phage–host relationships for genome sequence representation. This approach can be used to accelerate the discovery of phage–host interactions and aid in the development of phage-based therapies for infectious diseases. |
format | Online Article Text |
id | pubmed-10516345 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10516345 2023-09-23 Zero-shot-capable identification of phage–host relationships with whole-genome sequence representation by contrastive learning Zhang, Yao-zhong Liu, Yunjie Bai, Zeheng Fujimoto, Kosuke Uematsu, Satoshi Imoto, Seiya Brief Bioinform Problem Solving Protocol Accurately identifying phage–host relationships from their genome sequences is still challenging, especially for those phages and hosts with less homologous sequences. In this work, focusing on identifying the phage–host relationships at the species and genus level, we propose a contrastive learning based approach to learn whole-genome sequence embeddings that can take account of phage–host interactions (PHIs). Contrastive learning is used to make phages infecting the same hosts close to each other in the new representation space. Specifically, we rephrase whole-genome sequences with frequency chaos game representation (FCGR) and learn latent embeddings that ‘encapsulate’ phages and host relationships through contrastive learning. The contrastive learning method works well on the imbalanced dataset. Based on the learned embeddings, a proposed pipeline named CL4PHI can predict known hosts and unseen hosts in training. We compare our method with two recently proposed state-of-the-art learning-based methods on their benchmark datasets. The experiment results demonstrate that the proposed method using contrastive learning improves the prediction accuracy on known hosts and demonstrates a zero-shot prediction capability on unseen hosts. In terms of potential applications, the rapid pace of genome sequencing across different species has resulted in a vast amount of whole-genome sequencing data that require efficient computational methods for identifying phage–host interactions. The proposed approach is expected to address this need by efficiently processing whole-genome sequences of phages and prokaryotic hosts and capturing features related to phage–host relationships for genome sequence representation. This approach can be used to accelerate the discovery of phage–host interactions and aid in the development of phage-based therapies for infectious diseases. Oxford University Press 2023-07-18 /pmc/articles/PMC10516345/ /pubmed/37466138 http://dx.doi.org/10.1093/bib/bbad239 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Problem Solving Protocol Zhang, Yao-zhong Liu, Yunjie Bai, Zeheng Fujimoto, Kosuke Uematsu, Satoshi Imoto, Seiya Zero-shot-capable identification of phage–host relationships with whole-genome sequence representation by contrastive learning |
title | Zero-shot-capable identification of phage–host relationships with whole-genome sequence representation by contrastive learning |
topic | Problem Solving Protocol |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516345/ https://www.ncbi.nlm.nih.gov/pubmed/37466138 http://dx.doi.org/10.1093/bib/bbad239 |
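
The description field mentions that whole-genome sequences are re-expressed with frequency chaos game representation (FCGR) before embeddings are learned. As a rough illustration of that idea only, the following minimal sketch counts k-mers and places each count in a 2^k x 2^k grid. The corner assignment for A, C, G, T, the default `k`, the handling of ambiguous bases, and the function names `kmer_cell` and `fcgr` are all assumptions made for this example; this record does not state the settings used in CL4PHI.

```python
# Corner assignment is one common CGR convention (an assumption, not from the record).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def kmer_cell(kmer):
    """Map a k-mer to its (row, col) cell in a 2^k x 2^k CGR grid."""
    x = y = 0
    # In CGR the last nucleotide decides the coarsest quadrant, so read it first.
    for base in reversed(kmer):
        bx, by = CORNERS[base]
        x = (x << 1) | bx
        y = (y << 1) | by
    return y, x


def fcgr(sequence, k=3):
    """Return a 2^k x 2^k matrix of k-mer frequencies; all cells sum to 1."""
    size = 1 << k
    grid = [[0.0] * size for _ in range(size)]
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers containing ambiguous bases such as N
        row, col = kmer_cell(kmer)
        grid[row][col] += 1
        total += 1
    if total:
        grid = [[value / total for value in row] for row in grid]
    return grid


if __name__ == "__main__":
    # Toy sequence, purely illustrative; real inputs would be whole genomes.
    for row in fcgr("ACGTTGCAACGT", k=2):
        print(row)
```

The resulting matrix has a fixed size regardless of genome length, which is what makes it usable as an image-like input to an encoder. The contrastive training step itself is not sketched here because the record gives no details of the loss or network.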