
Thousands of protein linear motif classes may still be undiscovered

Linear motifs are short protein subsequences that mediate protein interactions. Hundreds of motif classes including thousands of motif instances are known. Our theory estimates how many motif classes remain undiscovered. As commonly done, we describe motif classes as regular expressions specifying m...


Bibliographic Details
Main authors: Bulavka, Denys; Aptekmann, Ariel A.; Méndez, Nicolás A.; Krick, Teresa; Sánchez, Ignacio E.
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8092775/
https://www.ncbi.nlm.nih.gov/pubmed/33939703
http://dx.doi.org/10.1371/journal.pone.0248841
author Bulavka, Denys
Aptekmann, Ariel A.
Méndez, Nicolás A.
Krick, Teresa
Sánchez, Ignacio E.
collection PubMed
description Linear motifs are short protein subsequences that mediate protein interactions. Hundreds of motif classes including thousands of motif instances are known. Our theory estimates how many motif classes remain undiscovered. As commonly done, we describe motif classes as regular expressions specifying motif length and the allowed amino acids at each motif position. We measure motif specificity for a pair of motif classes by quantifying how many motif-discriminating positions prevent a protein subsequence from matching the two classes at once. We derive theorems for the maximal number of motif classes that can simultaneously maintain a certain number of motif-discriminating positions between all pairs of classes in the motif universe, for a given amino acid alphabet. We also calculate the fraction of all protein subsequences that would belong to a motif class if all potential motif classes came into existence. Naturally occurring pairs of motif classes present most often a single motif-discriminating position. This mild specificity maximizes the potential number of coexisting motif classes, the expansion of the motif universe due to amino acid modifications and the fraction of amino acid sequences that code for a motif instance. As a result, thousands of linear motif classes may remain undiscovered.
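The notion of a motif-discriminating position can be illustrated with a minimal sketch. It assumes that a motif class is stored as a list of allowed amino acid sets (one per position), that the two classes have equal length and are compared position by position, and that a position is discriminating when the two allowed sets share no amino acid. The toy classes and the function names are hypothetical and are not taken from the article, whose exact definitions may differ.

```python
from fractions import Fraction
from math import prod

# Standard 20-letter amino acid alphabet (one-letter codes).
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def discriminating_positions(class_a, class_b):
    """Count aligned positions whose allowed amino acid sets are disjoint.

    Assumption of this sketch: at such a position no residue satisfies both
    classes, so no subsequence can match the two classes at once.
    """
    if len(class_a) != len(class_b):
        raise ValueError("this sketch only compares classes of equal length")
    return sum(1 for a, b in zip(class_a, class_b) if not (a & b))


def matching_fraction(motif_class, alphabet=AMINO_ACIDS):
    """Fraction of all sequences of the motif's length that match the class."""
    return prod(Fraction(len(pos & alphabet), len(alphabet)) for pos in motif_class)


# Hypothetical toy classes, written as regular expressions in the comments.
class_1 = [frozenset("RK"), AMINO_ACIDS, frozenset("ST")]  # [RK].[ST]
class_2 = [frozenset("RK"), AMINO_ACIDS, frozenset("DE")]  # [RK].[DE]
class_3 = [frozenset("P"), AMINO_ACIDS, frozenset("DE")]   # P.[DE]

print(discriminating_positions(class_1, class_2))  # 1
print(discriminating_positions(class_1, class_3))  # 2
print(matching_fraction(class_1))                  # 1/100
```

In this toy universe, class_1 and class_2 differ at a single discriminating position, the case the abstract reports as most common among naturally occurring pairs of motif classes.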
format Online
Article
Text
id pubmed-8092775
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8092775 2021-05-07 PLoS One Research Article Public Library of Science 2021-05-03 /pmc/articles/PMC8092775/ /pubmed/33939703 http://dx.doi.org/10.1371/journal.pone.0248841 Text en © 2021 Bulavka et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Thousands of protein linear motif classes may still be undiscovered
topic Research Article