Unsupervised Inference of Protein Fitness Landscape from Deep Mutational Scan
The recent technological advances underlying the screening of large combinatorial libraries in high-throughput mutational scans deepen our understanding of adaptive protein evolution and boost its applications in protein design. Nevertheless, the large number of possible genotypes requires suitable...
Main authors: | Fernandez-de-Cossio-Diaz, Jorge; Uguzzoni, Guido; Pagnani, Andrea |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Subjects: | Methods |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7783173/ https://www.ncbi.nlm.nih.gov/pubmed/32770229 http://dx.doi.org/10.1093/molbev/msaa204 |
_version_ | 1783632058353975296 |
---|---|
author | Fernandez-de-Cossio-Diaz, Jorge; Uguzzoni, Guido; Pagnani, Andrea |
author_facet | Fernandez-de-Cossio-Diaz, Jorge; Uguzzoni, Guido; Pagnani, Andrea |
author_sort | Fernandez-de-Cossio-Diaz, Jorge |
collection | PubMed |
description | The recent technological advances underlying the screening of large combinatorial libraries in high-throughput mutational scans deepen our understanding of adaptive protein evolution and boost its applications in protein design. Nevertheless, the large number of possible genotypes requires suitable computational methods for data analysis, the prediction of mutational effects, and the generation of optimized sequences. We describe a computational method that, trained on sequencing samples from multiple rounds of a screening experiment, provides a model of the genotype–fitness relationship. We tested the method on five large-scale mutational scans, yielding accurate predictions of the mutational effects on fitness. The inferred fitness landscape is robust to experimental and sampling noise and exhibits high generalization power in terms of broader sequence space exploration and higher fitness variant predictions. We investigate the role of epistasis and show that the inferred model provides structural information about the 3D contacts in the molecular fold. |
format | Online Article Text |
id | pubmed-7783173 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-77831732021-01-08 Unsupervised Inference of Protein Fitness Landscape from Deep Mutational Scan Fernandez-de-Cossio-Diaz, Jorge Uguzzoni, Guido Pagnani, Andrea Mol Biol Evol Methods The recent technological advances underlying the screening of large combinatorial libraries in high-throughput mutational scans deepen our understanding of adaptive protein evolution and boost its applications in protein design. Nevertheless, the large number of possible genotypes requires suitable computational methods for data analysis, the prediction of mutational effects, and the generation of optimized sequences. We describe a computational method that, trained on sequencing samples from multiple rounds of a screening experiment, provides a model of the genotype–fitness relationship. We tested the method on five large-scale mutational scans, yielding accurate predictions of the mutational effects on fitness. The inferred fitness landscape is robust to experimental and sampling noise and exhibits high generalization power in terms of broader sequence space exploration and higher fitness variant predictions. We investigate the role of epistasis and show that the inferred model provides structural information about the 3D contacts in the molecular fold. Oxford University Press 2020-08-08 /pmc/articles/PMC7783173/ /pubmed/32770229 http://dx.doi.org/10.1093/molbev/msaa204 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Methods Fernandez-de-Cossio-Diaz, Jorge Uguzzoni, Guido Pagnani, Andrea Unsupervised Inference of Protein Fitness Landscape from Deep Mutational Scan |
title | Unsupervised Inference of Protein Fitness Landscape from Deep Mutational Scan |
title_full | Unsupervised Inference of Protein Fitness Landscape from Deep Mutational Scan |
title_fullStr | Unsupervised Inference of Protein Fitness Landscape from Deep Mutational Scan |
title_full_unstemmed | Unsupervised Inference of Protein Fitness Landscape from Deep Mutational Scan |
title_short | Unsupervised Inference of Protein Fitness Landscape from Deep Mutational Scan |
title_sort | unsupervised inference of protein fitness landscape from deep mutational scan |
topic | Methods |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7783173/ https://www.ncbi.nlm.nih.gov/pubmed/32770229 http://dx.doi.org/10.1093/molbev/msaa204 |
work_keys_str_mv | AT fernandezdecossiodiazjorge unsupervisedinferenceofproteinfitnesslandscapefromdeepmutationalscan AT uguzzoniguido unsupervisedinferenceofproteinfitnesslandscapefromdeepmutationalscan AT pagnaniandrea unsupervisedinferenceofproteinfitnesslandscapefromdeepmutationalscan |