Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention
The established approach to unsupervised protein contact prediction estimates coevolving positions using undirected graphical models. This approach trains a Potts model on a Multiple Sequence Alignment. Increasingly large Transformers are being pretrained on unlabeled, unaligned protein sequence databases and showing competitive performance on protein contact prediction. We argue that attention is a principled model of protein interactions, grounded in real properties of protein family data. We introduce an energy-based attention layer, factored attention, which, in a certain limit, recovers a Potts model, and use it to contrast Potts and Transformers. We show that the Transformer leverages hierarchical signal in protein family databases not captured by single-layer models. This raises the exciting possibility for the development of powerful structured models of protein family databases.
Main Authors: | Bhattacharya, Nicholas; Thomas, Neil; Rao, Roshan; Dauparas, Justas; Koo, Peter K.; Baker, David; Song, Yun S.; Ovchinnikov, Sergey |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8752338/ https://www.ncbi.nlm.nih.gov/pubmed/34890134 |
_version_ | 1784631867189755904 |
author | Bhattacharya, Nicholas Thomas, Neil Rao, Roshan Dauparas, Justas Koo, Peter K. Baker, David Song, Yun S. Ovchinnikov, Sergey |
author_facet | Bhattacharya, Nicholas Thomas, Neil Rao, Roshan Dauparas, Justas Koo, Peter K. Baker, David Song, Yun S. Ovchinnikov, Sergey |
author_sort | Bhattacharya, Nicholas |
collection | PubMed |
description | The established approach to unsupervised protein contact prediction estimates coevolving positions using undirected graphical models. This approach trains a Potts model on a Multiple Sequence Alignment. Increasingly large Transformers are being pretrained on unlabeled, unaligned protein sequence databases and showing competitive performance on protein contact prediction. We argue that attention is a principled model of protein interactions, grounded in real properties of protein family data. We introduce an energy-based attention layer, factored attention, which, in a certain limit, recovers a Potts model, and use it to contrast Potts and Transformers. We show that the Transformer leverages hierarchical signal in protein family databases not captured by single-layer models. This raises the exciting possibility for the development of powerful structured models of protein family databases. |
format | Online Article Text |
id | pubmed-8752338 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
record_format | MEDLINE/PubMed |
spelling | pubmed-87523382022-01-11 Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention Bhattacharya, Nicholas Thomas, Neil Rao, Roshan Dauparas, Justas Koo, Peter K. Baker, David Song, Yun S. Ovchinnikov, Sergey Pac Symp Biocomput Article The established approach to unsupervised protein contact prediction estimates coevolving positions using undirected graphical models. This approach trains a Potts model on a Multiple Sequence Alignment. Increasingly large Transformers are being pretrained on unlabeled, unaligned protein sequence databases and showing competitive performance on protein contact prediction. We argue that attention is a principled model of protein interactions, grounded in real properties of protein family data. We introduce an energy-based attention layer, factored attention, which, in a certain limit, recovers a Potts model, and use it to contrast Potts and Transformers. We show that the Transformer leverages hierarchical signal in protein family databases not captured by single-layer models. This raises the exciting possibility for the development of powerful structured models of protein family databases. 2022 /pmc/articles/PMC8752338/ /pubmed/34890134 Text en https://creativecommons.org/licenses/by-nc/4.0/Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License. |
spellingShingle | Article Bhattacharya, Nicholas Thomas, Neil Rao, Roshan Dauparas, Justas Koo, Peter K. Baker, David Song, Yun S. Ovchinnikov, Sergey Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention |
title | Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention |
title_full | Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention |
title_fullStr | Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention |
title_full_unstemmed | Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention |
title_short | Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention |
title_sort | interpreting potts and transformer protein models through the lens of simplified attention |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8752338/ https://www.ncbi.nlm.nih.gov/pubmed/34890134 |
work_keys_str_mv | AT bhattacharyanicholas interpretingpottsandtransformerproteinmodelsthroughthelensofsimplifiedattention AT thomasneil interpretingpottsandtransformerproteinmodelsthroughthelensofsimplifiedattention AT raoroshan interpretingpottsandtransformerproteinmodelsthroughthelensofsimplifiedattention AT dauparasjustas interpretingpottsandtransformerproteinmodelsthroughthelensofsimplifiedattention AT koopeterk interpretingpottsandtransformerproteinmodelsthroughthelensofsimplifiedattention AT bakerdavid interpretingpottsandtransformerproteinmodelsthroughthelensofsimplifiedattention AT songyuns interpretingpottsandtransformerproteinmodelsthroughthelensofsimplifiedattention AT ovchinnikovsergey interpretingpottsandtransformerproteinmodelsthroughthelensofsimplifiedattention |
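The abstract's central construct, factored attention, can be illustrated with a minimal sketch. This is not the authors' code; it is a hedged illustration of the general idea, where attention over positions is input-independent (computed from learned position embeddings, here named `WQ`/`WK`) and each head carries an amino-acid interaction matrix (`WV`), so the layer implies an effective Potts-style coupling tensor. All array names and shapes are assumptions for illustration only.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def factored_attention_couplings(WQ, WK, WV):
    """Effective pairwise couplings implied by a factored attention layer.

    WQ, WK : (H, L, d) input-independent per-head position embeddings
    WV     : (H, A, A) per-head amino-acid interaction matrices
    Returns J : (L, L, A, A), analogous to the coupling tensor of a
    Potts model over a length-L protein with an A-letter alphabet.
    """
    H, L, d = WQ.shape
    # Attention over positions depends only on position, not sequence content.
    logits = np.einsum('hid,hjd->hij', WQ, WK) / np.sqrt(d)  # (H, L, L)
    A = softmax(logits, axis=-1)
    # Each position pair's coupling is an attention-weighted sum of the
    # heads' shared interaction matrices.
    return np.einsum('hij,hab->ijab', A, WV)  # (L, L, A, A)

rng = np.random.default_rng(0)
H, L, d, alphabet = 4, 10, 8, 20
J = factored_attention_couplings(
    rng.normal(size=(H, L, d)),
    rng.normal(size=(H, L, d)),
    rng.normal(size=(H, alphabet, alphabet)),
)
print(J.shape)  # (10, 10, 20, 20)
```

The contrast with a Potts model is visible in the parameterization: a Potts model stores an independent A x A coupling matrix for every position pair, while factored attention shares a small set of interaction matrices across all pairs and mixes them with attention, which is the sense in which it recovers a Potts model only in a certain limit (roughly, with enough heads to represent arbitrary pairwise patterns).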