Protein embeddings improve phage-host interaction prediction

With the growing interest in using phages to combat antimicrobial resistance, computational methods for predicting phage-host interactions have been explored to help shortlist candidate phages. Most existing models consider entire proteomes and rely on manual feature engineering, which poses difficu...

Bibliographic Details
Main authors:	Gonzales, Mark Edward M.; Ureta, Jennifer C.; Shrestha, Anish M. S.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2023
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10365317/ https://www.ncbi.nlm.nih.gov/pubmed/37486915 http://dx.doi.org/10.1371/journal.pone.0289030

_version_	1785077016883625984
author	Gonzales, Mark Edward M. Ureta, Jennifer C. Shrestha, Anish M. S.
author_facet	Gonzales, Mark Edward M. Ureta, Jennifer C. Shrestha, Anish M. S.
author_sort	Gonzales, Mark Edward M.
collection	PubMed
description	With the growing interest in using phages to combat antimicrobial resistance, computational methods for predicting phage-host interactions have been explored to help shortlist candidate phages. Most existing models consider entire proteomes and rely on manual feature engineering, which poses difficulty in selecting the most informative sequence properties to serve as input to the model. In this paper, we framed phage-host interaction prediction as a multiclass classification problem that takes as input the embeddings of a phage’s receptor-binding proteins, which are known to be the key machinery for host recognition, and predicts the host genus. We explored different protein language models to automatically encode these protein sequences into dense embeddings without the need for additional alignment or structural information. We show that the use of embeddings of receptor-binding proteins presents improvements over handcrafted genomic and protein sequence features. The highest performance was obtained using the transformer-based protein language model ProtT5, resulting in a 3% to 4% increase in weighted F1 and recall scores across different prediction confidence thresholds, compared to using selected handcrafted sequence features.
format	Online Article Text
id	pubmed-10365317
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-10365317 2023-07-25 Protein embeddings improve phage-host interaction prediction Gonzales, Mark Edward M. Ureta, Jennifer C. Shrestha, Anish M. S. PLoS One Research Article With the growing interest in using phages to combat antimicrobial resistance, computational methods for predicting phage-host interactions have been explored to help shortlist candidate phages. Most existing models consider entire proteomes and rely on manual feature engineering, which poses difficulty in selecting the most informative sequence properties to serve as input to the model. In this paper, we framed phage-host interaction prediction as a multiclass classification problem that takes as input the embeddings of a phage’s receptor-binding proteins, which are known to be the key machinery for host recognition, and predicts the host genus. We explored different protein language models to automatically encode these protein sequences into dense embeddings without the need for additional alignment or structural information. We show that the use of embeddings of receptor-binding proteins presents improvements over handcrafted genomic and protein sequence features. The highest performance was obtained using the transformer-based protein language model ProtT5, resulting in a 3% to 4% increase in weighted F1 and recall scores across different prediction confidence thresholds, compared to using selected handcrafted sequence features. Public Library of Science 2023-07-24 /pmc/articles/PMC10365317/ /pubmed/37486915 http://dx.doi.org/10.1371/journal.pone.0289030 Text en © 2023 Gonzales et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Gonzales, Mark Edward M. Ureta, Jennifer C. Shrestha, Anish M. S. Protein embeddings improve phage-host interaction prediction
title	Protein embeddings improve phage-host interaction prediction
title_full	Protein embeddings improve phage-host interaction prediction
title_fullStr	Protein embeddings improve phage-host interaction prediction
title_full_unstemmed	Protein embeddings improve phage-host interaction prediction
title_short	Protein embeddings improve phage-host interaction prediction
title_sort	protein embeddings improve phage-host interaction prediction
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10365317/ https://www.ncbi.nlm.nih.gov/pubmed/37486915 http://dx.doi.org/10.1371/journal.pone.0289030
work_keys_str_mv	AT gonzalesmarkedwardm proteinembeddingsimprovephagehostinteractionprediction AT uretajenniferc proteinembeddingsimprovephagehostinteractionprediction AT shresthaanishms proteinembeddingsimprovephagehostinteractionprediction
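
The description field in this record reports weighted F1 scores "across different prediction confidence thresholds". The following is a minimal illustrative sketch of that evaluation idea, not the authors' code. It assumes a classifier has already produced per-genus probabilities from the protein embeddings, and it does not run ProtT5 or train any model. The function names, the `"unknown"` fallback label, the abstain-below-threshold policy, and the toy probabilities are all hypothetical, since the record does not say how the thresholds were applied.

```python
from collections import Counter


def predict_with_threshold(class_probs, threshold, fallback="unknown"):
    """Return the most probable genus, or `fallback` when the top
    probability is below `threshold` (assumed abstention policy)."""
    genus, prob = max(class_probs.items(), key=lambda kv: kv[1])
    return genus if prob >= threshold else fallback


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share
    of the true labels."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n / total
    return score


# Toy, made-up classifier outputs for three phages (not data from the paper).
probs = [
    {"Escherichia": 0.90, "Klebsiella": 0.10},
    {"Escherichia": 0.55, "Klebsiella": 0.45},
    {"Escherichia": 0.20, "Klebsiella": 0.80},
]
y_true = ["Escherichia", "Klebsiella", "Klebsiella"]

for threshold in (0.5, 0.7):
    y_pred = [predict_with_threshold(p, threshold) for p in probs]
    print(threshold, y_pred, round(weighted_f1(y_true, y_pred), 4))
```

In this toy data, raising the threshold turns the one low-confidence wrong call into an abstention, which is why a metric reported at several thresholds can differ from a single-threshold score.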

Similar Items