
Searching for protein variants with desired properties using deep generative models


Bibliographic Details
Main Authors: Li, Yan, Yao, Yinying, Xia, Yu, Tang, Mingjing
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10362698/
https://www.ncbi.nlm.nih.gov/pubmed/37480001
http://dx.doi.org/10.1186/s12859-023-05415-9
_version_ 1785076484081188864
author Li, Yan
Yao, Yinying
Xia, Yu
Tang, Mingjing
author_facet Li, Yan
Yao, Yinying
Xia, Yu
Tang, Mingjing
author_sort Li, Yan
collection PubMed
description BACKGROUND: Protein engineering aims to improve the functional properties of existing proteins to meet practical needs. Current deep learning-based models have captured the evolutionary, functional, and biochemical features contained in amino acid sequences. However, existing generative models still struggle to capture the relationships between amino acid sites in longer sequences. At the same time, the sequences of a homologous protein family occupy specific positional relationships in the latent space, and we want to exploit these relationships to search for new variants directly in the vicinity of better-performing variants. RESULTS: To improve the model's representation learning for longer sequences and the similarity between the generated sequences and the original sequences, we propose a temporal variational autoencoder (T-VAE) model. T-VAE consists of an encoder and a decoder. The encoder expands the receptive field of neurons in the network through dilated causal convolution, thereby improving its ability to encode longer sequences. The decoder decodes sampled latent data into variants that closely resemble the original sequence. CONCLUSION: Compared to other models, the Pearson correlation coefficient between the protein fitness values predicted by T-VAE and the true values was higher, and the mean absolute deviation was lower. In addition, when comparing the encoding of protein sequences of different lengths, T-VAE showed better representation learning for longer sequences. These results show that our model has an advantage in representation learning for longer sequences. To verify the model's generative performance, we also calculated the sequence identity between the generated data and the input data; the sequence identity obtained by T-VAE improved by 12.9% over the baseline model.
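The abstract's central architectural idea is an encoder built from dilated causal convolutions, whose receptive field grows exponentially with depth so that distant amino acid positions can contribute to the encoding of long sequences. The sketch below illustrates that idea in PyTorch; it is not the authors' implementation, and the alphabet size, sequence length, latent dimensionality, layer widths, and the simple MLP decoder are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch (assumptions, not the paper's code) of a VAE whose encoder
# stacks dilated causal 1-D convolutions over one-hot amino acid sequences.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_AA = 21          # 20 amino acids + gap/pad symbol (assumed alphabet)
SEQ_LEN = 512        # illustrative maximum sequence length
LATENT_DIM = 32      # illustrative latent dimensionality


class CausalConv1d(nn.Module):
    """1-D convolution padded only on the left, so position t never sees t+1."""

    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                      # x: (batch, channels, length)
        x = F.pad(x, (self.pad, 0))            # left-pad only -> causal
        return self.conv(x)


class TVAESketch(nn.Module):
    """Encoder: stacked dilated causal convs -> mean/log-variance of q(z|x).
    Decoder: a simple MLP mapping a latent sample back to per-position
    amino acid logits (a stand-in for whatever decoder the paper uses)."""

    def __init__(self):
        super().__init__()
        dilations = [1, 2, 4, 8]               # receptive field doubles per layer
        layers, ch = [], NUM_AA
        for d in dilations:
            layers += [CausalConv1d(ch, 64, kernel_size=3, dilation=d), nn.ReLU()]
            ch = 64
        self.encoder = nn.Sequential(*layers)
        self.to_mu = nn.Linear(64 * SEQ_LEN, LATENT_DIM)
        self.to_logvar = nn.Linear(64 * SEQ_LEN, LATENT_DIM)
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, SEQ_LEN * NUM_AA),
        )

    def forward(self, x):                      # x: (batch, NUM_AA, SEQ_LEN), one-hot
        h = self.encoder(x).flatten(1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterisation
        logits = self.decoder(z).view(-1, NUM_AA, SEQ_LEN)
        return logits, mu, logvar


if __name__ == "__main__":
    x = F.one_hot(torch.randint(0, NUM_AA, (2, SEQ_LEN)), NUM_AA).float().transpose(1, 2)
    logits, mu, logvar = TVAESketch()(x)
    print(logits.shape, mu.shape)              # torch.Size([2, 21, 512]) torch.Size([2, 32])
```

With kernel size 3 and dilations 1, 2, 4, and 8, each output position in this sketch sees 1 + 2·(1 + 2 + 4 + 8) = 31 preceding positions; adding layers or larger dilations widens the receptive field further without enlarging the kernel, which is the property the abstract credits for better encoding of longer sequences.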
format Online
Article
Text
id pubmed-10362698
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-103626982023-07-23 Searching for protein variants with desired properties using deep generative models Li, Yan Yao, Yinying Xia, Yu Tang, Mingjing BMC Bioinformatics Research BioMed Central 2023-07-21 /pmc/articles/PMC10362698/ /pubmed/37480001 http://dx.doi.org/10.1186/s12859-023-05415-9 Text en © The Author(s) 2023. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/); the Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Li, Yan
Yao, Yinying
Xia, Yu
Tang, Mingjing
Searching for protein variants with desired properties using deep generative models
title Searching for protein variants with desired properties using deep generative models
title_full Searching for protein variants with desired properties using deep generative models
title_fullStr Searching for protein variants with desired properties using deep generative models
title_full_unstemmed Searching for protein variants with desired properties using deep generative models
title_short Searching for protein variants with desired properties using deep generative models
title_sort searching for protein variants with desired properties using deep generative models
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10362698/
https://www.ncbi.nlm.nih.gov/pubmed/37480001
http://dx.doi.org/10.1186/s12859-023-05415-9
work_keys_str_mv AT liyan searchingforproteinvariantswithdesiredpropertiesusingdeepgenerativemodels
AT yaoyinying searchingforproteinvariantswithdesiredpropertiesusingdeepgenerativemodels
AT xiayu searchingforproteinvariantswithdesiredpropertiesusingdeepgenerativemodels
AT tangmingjing searchingforproteinvariantswithdesiredpropertiesusingdeepgenerativemodels