Generating interacting protein sequences using domain-to-domain translation

MOTIVATION: Being able to artificially design novel proteins of desired function is pivotal in many biological and biomedical applications. Generative statistical modeling has recently emerged as a new paradigm for designing amino acid sequences, including in particular models and embedding methods...

Bibliographic Details
Main authors: Meynard-Piganeau, Barthelemy; Fabbri, Caterina; Weigt, Martin; Pagnani, Andrea; Feinauer, Christoph
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10329493/
https://www.ncbi.nlm.nih.gov/pubmed/37399105
http://dx.doi.org/10.1093/bioinformatics/btad401
_version_ 1785070029734150144
author Meynard-Piganeau, Barthelemy
Fabbri, Caterina
Weigt, Martin
Pagnani, Andrea
Feinauer, Christoph
collection PubMed
description MOTIVATION: Being able to artificially design novel proteins of desired function is pivotal in many biological and biomedical applications. Generative statistical modeling has recently emerged as a new paradigm for designing amino acid sequences, including in particular models and embedding methods borrowed from natural language processing (NLP). However, most approaches target single proteins or protein domains, and do not take into account any functional specificity or interaction with the context. To extend beyond current computational strategies, we develop a method for generating protein domain sequences intended to interact with another protein domain. Using data from natural multidomain proteins, we cast the problem as a translation problem from a given interactor domain to the new domain to be generated, i.e. we generate artificial partner sequences conditional on an input sequence. We also show in an example that the same procedure can be applied to interactions between distinct proteins. RESULTS: Evaluating our model’s quality using diverse metrics, in part related to distinct biological questions, we show that our method outperforms state-of-the-art shallow autoregressive strategies. We also explore the possibility of fine-tuning pretrained large language models for the same task and of using AlphaFold 2 for assessing the quality of sampled sequences. AVAILABILITY AND IMPLEMENTATION: Data and code are available at https://github.com/barthelemymp/Domain2DomainProteinTranslation.
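The translation framing in the description (generating a partner sequence conditional on an input sequence) can be illustrated with a toy residue-by-residue sampler. This is only a sketch under stated assumptions, not the authors' model: `toy_conditional_probs` is a hypothetical stand-in for a trained network, its "favor the aligned source residue" rule is an arbitrary placeholder with no biological meaning, and all names here are invented for illustration. The record itself does not describe the architecture; the actual implementation is in the repository linked above.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_conditional_probs(source, prefix):
    """Hypothetical stand-in for a trained model.

    Returns p(next residue | source, prefix) as a list aligned with AMINO_ACIDS.
    Placeholder rule (an assumption, not biology): up-weight the residue found
    at the same position in the source sequence.
    """
    pos = len(prefix)
    weights = [1.0] * len(AMINO_ACIDS)
    if pos < len(source) and source[pos] in AMINO_ACIDS:
        weights[AMINO_ACIDS.index(source[pos])] += 5.0
    total = sum(weights)
    return [w / total for w in weights]


def sample_partner(source, length, rng):
    """Sample a target sequence one residue at a time.

    Each residue is drawn conditional on the source sequence and on the
    residues generated so far; the log-probability of the sample is
    accumulated along the way.
    """
    target = []
    log_prob = 0.0
    for _ in range(length):
        probs = toy_conditional_probs(source, target)
        idx = rng.choices(range(len(AMINO_ACIDS)), weights=probs, k=1)[0]
        target.append(AMINO_ACIDS[idx])
        log_prob += math.log(probs[idx])
    return "".join(target), log_prob


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same sample
    partner, log_prob = sample_partner("MKTAYIAK", 8, rng)
    print(partner, log_prob)
```

Replacing `toy_conditional_probs` with a learned conditional distribution is what would turn this loop into an actual sequence-to-sequence generator; the sampling structure itself stays the same.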
format Online
Article
Text
id pubmed-10329493
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10329493 2023-07-09 Generating interacting protein sequences using domain-to-domain translation Bioinformatics Original Paper Oxford University Press 2023-07-03 /pmc/articles/PMC10329493/ /pubmed/37399105 http://dx.doi.org/10.1093/bioinformatics/btad401 Text en © The Author(s) 2023. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Generating interacting protein sequences using domain-to-domain translation
topic Original Paper