Neural networks to learn protein sequence–function relationships from deep mutational scanning data

Bibliographic Details
Main Authors: Gelman, Sam; Fahlberg, Sarah A.; Heinzelman, Pete; Romero, Philip A.; Gitter, Anthony
Format: Online Article Text
Language: English
Published: National Academy of Sciences 2021
Subjects: Biological Sciences
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8640744/
https://www.ncbi.nlm.nih.gov/pubmed/34815338
http://dx.doi.org/10.1073/pnas.2104878118
author Gelman, Sam
Fahlberg, Sarah A.
Heinzelman, Pete
Romero, Philip A.
Gitter, Anthony
collection PubMed
description The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. We present a supervised deep learning framework to learn the sequence–function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network’s internal representation affects its ability to learn the sequence–function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks’ ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models’ ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.
format Online
Article
Text
id pubmed-8640744
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher National Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-8640744 2021-12-13 Proc Natl Acad Sci U S A Biological Sciences National Academy of Sciences 2021-11-23 2021-11-30 /pmc/articles/PMC8640744/ /pubmed/34815338 http://dx.doi.org/10.1073/pnas.2104878118 Text en Copyright © 2021 the Author(s). Published by PNAS. This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY) (https://creativecommons.org/licenses/by/4.0/).
title Neural networks to learn protein sequence–function relationships from deep mutational scanning data
topic Biological Sciences
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8640744/
https://www.ncbi.nlm.nih.gov/pubmed/34815338
http://dx.doi.org/10.1073/pnas.2104878118