Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention

The established approach to unsupervised protein contact prediction estimates coevolving positions using undirected graphical models. This approach trains a Potts model on a Multiple Sequence Alignment. Increasingly large Transformers are being pretrained on unlabeled, unaligned protein sequence databases and showing competitive performance on protein contact prediction. We argue that attention is a principled model of protein interactions, grounded in real properties of protein family data. We introduce an energy-based attention layer, factored attention, which, in a certain limit, recovers a Potts model, and use it to contrast Potts and Transformers. We show that the Transformer leverages hierarchical signal in protein family databases not captured by single-layer models. This raises the exciting possibility for the development of powerful structured models of protein family databases.
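
To make the abstract's central claim concrete, the limit relating factored attention to a Potts model can be sketched in standard Markov random field notation. This is a minimal reconstruction from the abstract alone, not the paper's exact formulation; the coupling parameterization below and the symbols Q_h, K_h, W_h, and H (per-head query/key projections, shared amino-acid interaction matrices, and head count) are assumptions.

% Standard Potts energy for an aligned sequence x = (x_1, ..., x_L),
% with per-position fields h_i and pairwise couplings J_ij.
\[
E(x) = -\sum_{i=1}^{L} h_i(x_i) \;-\; \sum_{1 \le i < j \le L} J_{ij}(x_i, x_j)
\]
% Assumed factored-attention coupling: position-only attention weights
% times a shared amino-acid interaction matrix W_h per head.
\[
J_{ij}(a, b) \;\approx\; \sum_{h=1}^{H} \operatorname{softmax}_{j}\!\big(Q_h K_h^{\top}\big)_{ij}\,(W_h)_{ab}
\]
% With enough heads and unconstrained W_h, the right-hand side can match
% an arbitrary coupling tensor J_ij, which is one reading of the sense in
% which factored attention "recovers a Potts model in a certain limit."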

Bibliographic Details
Main Authors: Bhattacharya, Nicholas; Thomas, Neil; Rao, Roshan; Dauparas, Justas; Koo, Peter K.; Baker, David; Song, Yun S.; Ovchinnikov, Sergey
Format: Online Article Text
Language: English
Published: 2022
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8752338/
https://www.ncbi.nlm.nih.gov/pubmed/34890134
Collection: PubMed
ID: pubmed-8752338
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: Pac Symp Biocomput
License: Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/).