Modeling mutational effects on biochemical phenotypes using convolutional neural networks: application to SARS-CoV-2

Biochemical phenotypes are major indexes for protein structure and function characterization. They are determined, at least in part, by the intrinsic physicochemical properties of amino acids and may be reflected in the protein three-dimensional structure. Modeling mutational effects on biochemical...

Bibliographic Details
Main authors: Wang, Bo; Gamazon, Eric R.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7852230/
https://www.ncbi.nlm.nih.gov/pubmed/33532766
http://dx.doi.org/10.1101/2021.01.28.428521
collection PubMed
description Biochemical phenotypes are major indexes for protein structure and function characterization. They are determined, at least in part, by the intrinsic physicochemical properties of amino acids and may be reflected in the protein three-dimensional structure. Modeling mutational effects on biochemical phenotypes is a critical step for understanding protein function and disease mechanism as well as enabling drug discovery. Deep Mutational Scanning (DMS) experiments have been performed on SARS-CoV-2's spike receptor binding domain and the human ACE2 zinc-binding peptidase domain (both central players in viral infection, evolution, and antibody evasion), quantifying how mutations impact binding affinity and protein expression. Here, we modeled biochemical phenotypes from massively parallel assays, using convolutional neural networks trained on protein sequence mutations in the virus and human host. We found that neural networks are significantly predictive of binding affinity, protein expression, and antibody escape, learning complex interactions and higher-order features that are difficult to capture with conventional methods from structural biology. Integrating the intrinsic physicochemical properties of amino acids, including hydrophobicity, solvent-accessible surface area, and long-range non-bonded energy per atom, significantly improved prediction (empirical p<0.01), although sequence data alone already yielded reasonably good prediction. We observed concordance of the DMS data and our neural network predictions with an independent study on intermolecular interactions from molecular dynamics (multiple 500 ns or 1 μs all-atom) simulations of the spike protein-ACE2 interface, with critical implications for the use of deep learning to dissect molecular mechanisms. The mutation- or genetically-determined component of a biochemical phenotype estimated from the neural networks has improved causal inference properties relative to the original phenotype and can facilitate crucial insights into disease pathophysiology and therapeutic design.
id pubmed-7852230
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7852230 2021-02-03. bioRxiv. Article. Cold Spring Harbor Laboratory, 2021-02-08. Text, en. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.