
Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention


Bibliographic Details
Main authors:	Bhattacharya, Nicholas; Thomas, Neil; Rao, Roshan; Dauparas, Justas; Koo, Peter K.; Baker, David; Song, Yun S.; Ovchinnikov, Sergey
Format:	Online Article Text
Language:	English
Published:	2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8752338/ https://www.ncbi.nlm.nih.gov/pubmed/34890134

Description
Summary:	The established approach to unsupervised protein contact prediction estimates coevolving positions using undirected graphical models. This approach trains a Potts model on a Multiple Sequence Alignment. Increasingly large Transformers are being pretrained on unlabeled, unaligned protein sequence databases and showing competitive performance on protein contact prediction. We argue that attention is a principled model of protein interactions, grounded in real properties of protein family data. We introduce an energy-based attention layer, factored attention, which, in a certain limit, recovers a Potts model, and use it to contrast Potts and Transformers. We show that the Transformer leverages hierarchical signal in protein family databases not captured by single-layer models. This raises the exciting possibility for the development of powerful structured models of protein family databases.
