Neural networks to learn protein sequence–function relationships from deep mutational scanning data
The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. We present a supervised deep learning framework to learn the sequence–function mapping from deep mutational scanning data and make p...
Main authors: | Gelman, Sam; Fahlberg, Sarah A.; Heinzelman, Pete; Romero, Philip A.; Gitter, Anthony |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | National Academy of Sciences, 2021 |
Subjects: | Biological Sciences |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8640744/ https://www.ncbi.nlm.nih.gov/pubmed/34815338 http://dx.doi.org/10.1073/pnas.2104878118 |
_version_ | 1784609393076076544 |
---|---|
author | Gelman, Sam Fahlberg, Sarah A. Heinzelman, Pete Romero, Philip A. Gitter, Anthony |
author_facet | Gelman, Sam Fahlberg, Sarah A. Heinzelman, Pete Romero, Philip A. Gitter, Anthony |
author_sort | Gelman, Sam |
collection | PubMed |
description | The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. We present a supervised deep learning framework to learn the sequence–function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network’s internal representation affects its ability to learn the sequence–function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks’ ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models’ ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1. |
format | Online Article Text |
id | pubmed-8640744 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | National Academy of Sciences |
record_format | MEDLINE/PubMed |
spelling | pubmed-8640744 2021-12-13 Neural networks to learn protein sequence–function relationships from deep mutational scanning data Gelman, Sam Fahlberg, Sarah A. Heinzelman, Pete Romero, Philip A. Gitter, Anthony Proc Natl Acad Sci U S A Biological Sciences The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. We present a supervised deep learning framework to learn the sequence–function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network’s internal representation affects its ability to learn the sequence–function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks’ ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models’ ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1. National Academy of Sciences 2021-11-23 2021-11-30 /pmc/articles/PMC8640744/ /pubmed/34815338 http://dx.doi.org/10.1073/pnas.2104878118 Text en Copyright © 2021 the Author(s). Published by PNAS. https://creativecommons.org/licenses/by/4.0/ This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY) (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Biological Sciences Gelman, Sam Fahlberg, Sarah A. Heinzelman, Pete Romero, Philip A. Gitter, Anthony Neural networks to learn protein sequence–function relationships from deep mutational scanning data |
title | Neural networks to learn protein sequence–function relationships from deep mutational scanning data |
topic | Biological Sciences |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8640744/ https://www.ncbi.nlm.nih.gov/pubmed/34815338 http://dx.doi.org/10.1073/pnas.2104878118 |