
Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention

Bibliographic Details
Main Authors: Bhattacharya, Nicholas, Thomas, Neil, Rao, Roshan, Dauparas, Justas, Koo, Peter K., Baker, David, Song, Yun S., Ovchinnikov, Sergey
Format: Online Article Text
Language: English
Published: 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8752338/
https://www.ncbi.nlm.nih.gov/pubmed/34890134
Description
Summary: The established approach to unsupervised protein contact prediction estimates coevolving positions using undirected graphical models. This approach trains a Potts model on a Multiple Sequence Alignment. Increasingly large Transformers are being pretrained on unlabeled, unaligned protein sequence databases and showing competitive performance on protein contact prediction. We argue that attention is a principled model of protein interactions, grounded in real properties of protein family data. We introduce an energy-based attention layer, factored attention, which, in a certain limit, recovers a Potts model, and use it to contrast Potts and Transformers. We show that the Transformer leverages hierarchical signal in protein family databases not captured by single-layer models. This raises the exciting possibility for the development of powerful structured models of protein family databases.
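
To make the factored-attention idea in the summary concrete, below is a minimal sketch of a single-layer factored attention module. It is not the authors' released implementation; the parameter names, initialization, and the pseudolikelihood-style training objective are illustrative assumptions. The defining property, per the abstract, is that attention is a function of position only (one attention map shared by every sequence in the family), while per-head value matrices act on amino-acid identities, so the summed heads yield pairwise couplings with the same shape as a Potts model's interaction tensor W_ij(a, b).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FactoredAttention(nn.Module):
    """Sketch of a single factored-attention layer over an aligned family.

    Queries and keys depend only on sequence position, so the attention
    map is shared across all sequences in the MSA; per-head value matrices
    act on amino-acid identities. Hypothetical names, not the paper's code.
    """

    def __init__(self, seq_len, vocab_size=21, num_heads=32, head_dim=64):
        super().__init__()
        # Position-only queries and keys: (heads, L, d).
        self.Q = nn.Parameter(0.01 * torch.randn(num_heads, seq_len, head_dim))
        self.K = nn.Parameter(0.01 * torch.randn(num_heads, seq_len, head_dim))
        # Per-head value matrix over the amino-acid alphabet: (heads, A, A).
        self.W_V = nn.Parameter(0.01 * torch.randn(num_heads, vocab_size, vocab_size))
        # Potts-style single-site term (assumed here for the sketch).
        self.fields = nn.Parameter(torch.zeros(seq_len, vocab_size))
        self.scale = head_dim ** 0.5

    def forward(self, x):
        # x: (batch, L, A) one-hot encoded aligned sequences.
        scores = torch.einsum("hid,hjd->hij", self.Q, self.K) / self.scale
        # Mask the diagonal so position i is predicted from the other
        # positions, as in Potts pseudolikelihood training.
        eye = torch.eye(scores.shape[-1], dtype=torch.bool, device=x.device)
        attn = F.softmax(scores.masked_fill(eye, float("-inf")), dim=-1)  # (H, L, L)
        v = torch.einsum("bja,hac->bhjc", x, self.W_V)   # value transform of residues
        out = torch.einsum("hij,bhjc->bhic", attn, v)    # position mixing, per head
        return out.sum(dim=1) + self.fields              # (batch, L, A) logits
```

A pseudolikelihood-style training step under the same assumptions, predicting each column of the alignment from the others:

```python
layer = FactoredAttention(seq_len=128)
msa = F.one_hot(torch.randint(0, 21, (64, 128)), num_classes=21).float()
logits = layer(msa)                                   # (64, 128, 21)
loss = F.cross_entropy(logits.transpose(1, 2), msa.argmax(-1))
```

Because the attention map is position-based and shared across sequences, the quantity sum over heads of attn[h, i, j] * W_V[h] plays the role of the Potts coupling W_ij(a, b), which is what lets this layer recover a Potts model in the limit described in the summary.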