A sequence embedding method for enzyme optimal condition analysis

BACKGROUND: Enzyme activity is influenced by the external environment, and it is important for an enzyme to retain high activity under a specific condition. The usual approach is to first determine the optimal condition of an enzyme, either by a gradient test or from its tertiary structure, and then to use protein...

Bibliographic Details
Main Authors:	Li, Xiangjun; Dou, Zhixin; Sun, Yuqing; Wang, Lushan; Gong, Bin; Wan, Lin
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7653822/ https://www.ncbi.nlm.nih.gov/pubmed/33167861 http://dx.doi.org/10.1186/s12859-020-03851-5

_version_	1783607952984244224
author	Li, Xiangjun; Dou, Zhixin; Sun, Yuqing; Wang, Lushan; Gong, Bin; Wan, Lin
author_sort	Li, Xiangjun
collection	PubMed
description	BACKGROUND: Enzyme activity is influenced by the external environment, and it is important for an enzyme to retain high activity under a specific condition. The usual approach is to first determine the optimal condition of an enzyme, either by a gradient test or from its tertiary structure, and then to use protein engineering to mutate the wild-type enzyme for higher activity under the expected condition. RESULTS: In this paper, we investigate the optimal condition of an enzyme by directly analyzing its sequence. We propose an embedding method that represents the amino acids and the structural information as vectors in a latent space. These vectors capture the correlations between amino acids and sites in the aligned amino acid sequences, as well as their correlation with the optimal condition. We crawled and processed the amino acid sequences of the glycoside hydrolase GH11 family and obtained 125 amino acid sequences with their optimal pH condition. We used a probabilistic approximation method to implement the embedding learning on these samples. Based on the embedding vectors, we design a computational score that determines which of two given amino acid sequences has the better optimal condition; it achieves 80% accuracy on test proteins from the same family. We also suggest mutations that give a sequence higher activity in an expected environment, and these suggestions are consistent with previous professional wet-lab experiments and analysis. CONCLUSION: A new computational method is proposed for sequence-based analysis of the enzyme optimal condition. Compared with the traditional process, which involves many wet-lab experiments and requires multiple mutations, this method can efficiently and effectively recommend the direction and location of amino acid substitutions, with reference significance for an expected condition.
format	Online Article Text
id	pubmed-7653822
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-7653822 2020-11-16 A sequence embedding method for enzyme optimal condition analysis Li, Xiangjun; Dou, Zhixin; Sun, Yuqing; Wang, Lushan; Gong, Bin; Wan, Lin BMC Bioinformatics Methodology Article BioMed Central 2020-11-10 /pmc/articles/PMC7653822/ /pubmed/33167861 http://dx.doi.org/10.1186/s12859-020-03851-5 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	A sequence embedding method for enzyme optimal condition analysis
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7653822/ https://www.ncbi.nlm.nih.gov/pubmed/33167861 http://dx.doi.org/10.1186/s12859-020-03851-5
