
Discovering molecular features of intrinsically disordered regions by using evolution for contrastive learning

A major challenge to the characterization of intrinsically disordered regions (IDRs), which are widespread in the proteome, but relatively poorly understood, is the identification of molecular features that mediate functions of these regions, such as short motifs, amino acid repeats and physicochemi...


Bibliographic Details
Main authors: Lu, Alex X., Lu, Amy X., Pritišanac, Iva, Zarin, Taraneh, Forman-Kay, Julie D., Moses, Alan M.
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9275697/
https://www.ncbi.nlm.nih.gov/pubmed/35767567
http://dx.doi.org/10.1371/journal.pcbi.1010238
author Lu, Alex X.
Lu, Amy X.
Pritišanac, Iva
Zarin, Taraneh
Forman-Kay, Julie D.
Moses, Alan M.
collection PubMed
description A major challenge to the characterization of intrinsically disordered regions (IDRs), which are widespread in the proteome, but relatively poorly understood, is the identification of molecular features that mediate functions of these regions, such as short motifs, amino acid repeats and physicochemical properties. Here, we introduce a proteome-scale feature discovery approach for IDRs. Our approach, which we call “reverse homology”, exploits the principle that important functional features are conserved over evolution. We use this as a contrastive learning signal for deep learning: given a set of homologous IDRs, the neural network has to correctly choose a held-out homolog from another set of IDRs sampled randomly from the proteome. We pair reverse homology with a simple architecture and standard interpretation techniques, and show that the network learns conserved features of IDRs that can be interpreted as motifs, repeats, or bulk features like charge or amino acid propensities. We also show that our model can be used to produce visualizations of what residues and regions are most important to IDR function, generating hypotheses for uncharacterized IDRs. Our results suggest that feature discovery using unsupervised neural networks is a promising avenue to gain systematic insight into poorly understood protein sequences.
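The contrastive task described in the abstract (pick the held-out homolog out of a pool of randomly sampled IDRs) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the record does not give the network architecture or the loss, so `embed` is a hypothetical stand-in encoder based on amino acid composition, and `reverse_homology_loss` uses an InfoNCE-style softmax cross-entropy, which is one common way to score such a task.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(seq: str) -> np.ndarray:
    """Hypothetical stand-in encoder: amino acid composition of the sequence.

    A trained neural network would take this role; composition is used here
    only so the sketch runs without any training.
    """
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    return counts / max(len(seq), 1)


def reverse_homology_loss(query_set, held_out, negatives):
    """Score the held-out homolog against random IDRs (assumed InfoNCE-style loss).

    query_set: homologous IDR sequences that define the query
    held_out:  one further homolog the model should pick (the positive)
    negatives: IDR sequences sampled at random (the distractors)
    Returns (loss, index of the chosen candidate); index 0 is the positive.
    """
    # Summarize the homolog set as the mean of its embeddings.
    query = np.mean([embed(s) for s in query_set], axis=0)
    # Candidate 0 is the held-out homolog, the rest are negatives.
    candidates = np.stack([embed(held_out)] + [embed(s) for s in negatives])
    logits = candidates @ query
    # Numerically stable log-softmax over the candidates.
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[0]), int(np.argmax(logits))


if __name__ == "__main__":
    homologs = ["SRSRSRSS", "RSRSSRSR"]   # toy serine/arginine-rich "family"
    positive = "SSRSRSRS"                 # held-out member of the same family
    random_idrs = ["LLVVAILL", "DEDEDEED"]
    loss, choice = reverse_homology_loss(homologs, positive, random_idrs)
    print(f"loss={loss:.3f}, chosen candidate={choice}")
```

In a trained setting the loss would be minimized over many homolog sets, which pushes the encoder toward features that are conserved within a family and absent from unrelated IDRs; the toy composition encoder here only shows the shape of the objective.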
format Online
Article
Text
id pubmed-9275697
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9275697 2022-07-13 PLoS Comput Biol Research Article
Public Library of Science 2022-06-29 /pmc/articles/PMC9275697/ /pubmed/35767567 http://dx.doi.org/10.1371/journal.pcbi.1010238 Text en © 2022 Lu et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Discovering molecular features of intrinsically disordered regions by using evolution for contrastive learning
topic Research Article