Cargando…

Evaluation of protein descriptors in computer-aided rational protein engineering tasks and its application in property prediction in SARS-CoV-2 spike glycoprotein

The importance of protein engineering in the research and development of biopharmaceuticals and biomaterials has increased. Machine learning in computer-aided protein engineering can markedly reduce the experimental effort in identifying optimal sequences that satisfy the desired properties from a l...

Descripción completa

Detalles Bibliográficos
Autores principales: Lim, Hocheol, Jeon, Hyeon-Nae, Lim, Seungcheol, Jang, Yuil, Kim, Taehee, Cho, Hyein, Pan, Jae-Gu, No, Kyoung Tai
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Research Network of Computational and Structural Biotechnology 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8841378/
https://www.ncbi.nlm.nih.gov/pubmed/35222841
http://dx.doi.org/10.1016/j.csbj.2022.01.027
Descripción
Sumario:The importance of protein engineering in the research and development of biopharmaceuticals and biomaterials has increased. Machine learning in computer-aided protein engineering can markedly reduce the experimental effort in identifying optimal sequences that satisfy the desired properties from a large number of possible protein sequences. To develop general protein descriptors for computer-aided protein engineering tasks, we devised new protein descriptors, one sequence-based descriptor (PCgrades), and three structure-based descriptors (PCspairs, 3D-SPIEs_5.4 Å, and 3D-SPIEs_8Å). While the PCgrades and PCspairs include general and statistical information in physicochemical properties in single and pairwise amino acids respectively, the 3D-SPIEs include specific and quantum–mechanical information with parameterized quantum mechanical calculations (FMO2-DFTB3/D/PCM). To evaluate the protein descriptors, we made prediction models with the new descriptors and previously developed descriptors for diverse protein datasets including protein expression and binding affinity change in SARS-CoV-2 spike glycoprotein. As a result, the newly devised descriptors showed a good performance in diverse datasets, in which the PCspairs showed the best performance ([Formula: see text] for protein expression and [Formula: see text] for binding affinity). As a result, the newly devised descriptors showed a good performance in diverse datasets, in which the PCspairs showed the best performance. Similar approaches with those descriptors would be promising and useful if the prediction models are trained with sufficient quantitative experimental data from high-throughput assays for industrial enzymes or protein drugs.