
End-to-end learning of multiple sequence alignments with differentiable Smith–Waterman

MOTIVATION: Multiple sequence alignments (MSAs) of homologous sequences contain information on structural and functional constraints and their evolutionary histories. Despite their importance for many downstream tasks, such as structure prediction, MSA generation is often treated as a separate pre-p...


Bibliographic Details
Main Authors: Petti, Samantha; Bhattacharya, Nicholas; Rao, Roshan; Dauparas, Justas; Thomas, Neil; Zhou, Juannan; Rush, Alexander M; Koo, Peter; Ovchinnikov, Sergey
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9805565/
https://www.ncbi.nlm.nih.gov/pubmed/36355460
http://dx.doi.org/10.1093/bioinformatics/btac724
author Petti, Samantha
Bhattacharya, Nicholas
Rao, Roshan
Dauparas, Justas
Thomas, Neil
Zhou, Juannan
Rush, Alexander M
Koo, Peter
Ovchinnikov, Sergey
collection PubMed
description MOTIVATION: Multiple sequence alignments (MSAs) of homologous sequences contain information on structural and functional constraints and their evolutionary histories. Despite their importance for many downstream tasks, such as structure prediction, MSA generation is often treated as a separate pre-processing step, without any guidance from the application it will be used for. RESULTS: Here, we implement a smooth and differentiable version of the Smith–Waterman pairwise alignment algorithm that enables jointly learning an MSA and a downstream machine learning system in an end-to-end fashion. To demonstrate its utility, we introduce SMURF (Smooth Markov Unaligned Random Field), a new method that jointly learns an alignment and the parameters of a Markov Random Field for unsupervised contact prediction. We find that SMURF learns MSAs that mildly improve contact prediction on a diverse set of protein and RNA families. As a proof of concept, we demonstrate that by connecting our differentiable alignment module to AlphaFold2 and maximizing predicted confidence, we can learn MSAs that improve structure predictions over the initial MSAs. Interestingly, the alignments that improve AlphaFold predictions are self-inconsistent and can be viewed as adversarial. This work highlights the potential of differentiable dynamic programming to improve neural network pipelines that rely on an alignment and the potential dangers of optimizing predictions of protein sequences with methods that are not fully understood. AVAILABILITY AND IMPLEMENTATION: Our code and examples are available at: https://github.com/spetti/SMURF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
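The "smooth and differentiable" Smith–Waterman idea in the description can be illustrated with a small NumPy sketch. It is not the authors' implementation (their code is at the GitHub URL above). It assumes a linear gap penalty and simply swaps each max in the classic recurrence for a temperature-scaled log-sum-exp. The names smooth_max, match_matrix and smith_waterman and all parameter values are hypothetical, chosen only for this example.

```python
import numpy as np


def smooth_max(values, temperature):
    """Smooth maximum: temperature * log(sum(exp(values / temperature)))."""
    v = np.asarray(values, dtype=float) / temperature
    m = v.max()  # subtract the max before exponentiating for numerical stability
    return temperature * (m + np.log(np.exp(v - m).sum()))


def match_matrix(a, b, match=2.0, mismatch=-1.0):
    """Toy similarity matrix (illustrative scores, not from the paper)."""
    return np.array([[match if x == y else mismatch for y in b] for x in a])


def smith_waterman(scores, gap=-1.0, temperature=None):
    """Local alignment score for a (len_a, len_b) similarity matrix.

    temperature=None gives the classic hard-max recurrence with a linear
    gap penalty; a positive temperature replaces every max by smooth_max,
    so the result varies smoothly with the entries of `scores`.
    """
    if temperature is None:
        reduce = lambda vals: float(np.max(vals))
    else:
        reduce = lambda vals: smooth_max(vals, temperature)

    n, m = scores.shape
    h = np.zeros((n + 1, m + 1))  # first row and column stay at 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i, j] = reduce([
                0.0,                                    # start a new local alignment
                h[i - 1, j - 1] + scores[i - 1, j - 1],  # align position i with j
                h[i - 1, j] + gap,                       # gap in the second sequence
                h[i, j - 1] + gap,                       # gap in the first sequence
            ])
    # Classic Smith–Waterman takes the best cell; the smooth version
    # takes a smooth maximum over all cells instead.
    return reduce(h[1:, 1:].ravel())


s = match_matrix("ACGT", "ACGT")
hard = smith_waterman(s)                    # classic score
soft = smith_waterman(s, temperature=1.0)   # smooth upper bound on the classic score
```

Because log-sum-exp is never smaller than the maximum, the smooth score is an upper bound on the classic score, and it approaches the classic score as the temperature goes to zero. An automatic differentiation framework such as JAX could differentiate a recurrence of this shape with respect to the similarity matrix, which is what makes joint training with a downstream model possible in principle.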
format Online
Article
Text
id pubmed-9805565
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9805565 2023-01-03 End-to-end learning of multiple sequence alignments with differentiable Smith–Waterman. Bioinformatics. Original Paper. Oxford University Press 2022-11-10 /pmc/articles/PMC9805565/ /pubmed/36355460 http://dx.doi.org/10.1093/bioinformatics/btac724 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title End-to-end learning of multiple sequence alignments with differentiable Smith–Waterman
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9805565/
https://www.ncbi.nlm.nih.gov/pubmed/36355460
http://dx.doi.org/10.1093/bioinformatics/btac724