Accurate and efficient protein sequence design through learning concise local environment of residues

MOTIVATION: Computational protein sequence design has been widely applied in rational protein engineering and increasing the design accuracy and efficiency is highly desired. RESULTS: Here, we present ProDESIGN-LE, an accurate and efficient approach to protein sequence design. ProDESIGN-LE adopts a...

Bibliographic Details
Main authors: Huang, Bin, Fan, Tingwen, Wang, Kaiyue, Zhang, Haicang, Yu, Chungong, Nie, Shuyu, Qi, Yangshuo, Zheng, Wei-Mou, Han, Jian, Fan, Zheng, Sun, Shiwei, Ye, Sheng, Yang, Huaiyi, Bu, Dongbo
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10027430/
https://www.ncbi.nlm.nih.gov/pubmed/36916746
http://dx.doi.org/10.1093/bioinformatics/btad122
author Huang, Bin
Fan, Tingwen
Wang, Kaiyue
Zhang, Haicang
Yu, Chungong
Nie, Shuyu
Qi, Yangshuo
Zheng, Wei-Mou
Han, Jian
Fan, Zheng
Sun, Shiwei
Ye, Sheng
Yang, Huaiyi
Bu, Dongbo
collection PubMed
description MOTIVATION: Computational protein sequence design has been widely applied in rational protein engineering and increasing the design accuracy and efficiency is highly desired. RESULTS: Here, we present ProDESIGN-LE, an accurate and efficient approach to protein sequence design. ProDESIGN-LE adopts a concise but informative representation of the residue’s local environment and trains a transformer to learn the correlation between local environment of residues and their amino acid types. For a target backbone structure, ProDESIGN-LE uses the transformer to assign an appropriate residue type for each position based on its local environment within this structure, eventually acquiring a designed sequence with all residues fitting well with their local environments. We applied ProDESIGN-LE to design sequences for 68 naturally occurring and 129 hallucinated proteins within 20 s per protein on average. The designed proteins have their predicted structures perfectly resembling the target structures with a state-of-the-art average TM-score exceeding 0.80. We further experimentally validated ProDESIGN-LE by designing five sequences for an enzyme, chloramphenicol O-acetyltransferase type III (CAT III), and recombinantly expressing the proteins in Escherichia coli. Of these proteins, three exhibited excellent solubility, and one yielded monomeric species with circular dichroism spectra consistent with the natural CAT III protein. AVAILABILITY AND IMPLEMENTATION: The source code of ProDESIGN-LE is available at https://github.com/bigict/ProDESIGN-LE.
format Online
Article
Text
id pubmed-10027430
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10027430 2023-03-21 Bioinformatics Original Paper
Oxford University Press 2023-03-14 /pmc/articles/PMC10027430/ /pubmed/36916746 http://dx.doi.org/10.1093/bioinformatics/btad122 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Accurate and efficient protein sequence design through learning concise local environment of residues
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10027430/
https://www.ncbi.nlm.nih.gov/pubmed/36916746
http://dx.doi.org/10.1093/bioinformatics/btad122