Regions with two amino acids in protein sequences: A step forward from homorepeats into the low complexity landscape

Bibliographic Details
Main authors: Mier, Pablo, Andrade-Navarro, Miguel A.
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9550522/
https://www.ncbi.nlm.nih.gov/pubmed/36249567
http://dx.doi.org/10.1016/j.csbj.2022.09.011
_version_ 1784805906610913280
author Mier, Pablo
Andrade-Navarro, Miguel A.
author_facet Mier, Pablo
Andrade-Navarro, Miguel A.
author_sort Mier, Pablo
collection PubMed
description Low complexity regions (LCRs) differ in amino acid composition from the background provided by the corresponding proteomes. The simplest LCRs are homorepeats (or polyX), regions composed of mostly-one amino acid type. Extensive research has been done to characterize homorepeats, and their taxonomic, functional and structural features depend on the amino acid type and sequence context. From them, the next step towards the study of LCRs are the regions composed of two types of amino acids, which we call polyXY. We classify polyXY in three categories based on the arrangement of the two amino acid types ‘X’ and ‘Y’: direpeats (e.g. ‘XYXYXY’), joined (e.g. ‘XXXYYY’) and shuffled (e.g. ‘XYYXXY’). We developed a script to search for polyXY, and located them in a comprehensive set of 20,340 reference proteomes. These results are available in a dedicated web server called XYs, in which the user can also submit their own protein datasets to detect polyXY. We studied the distribution of polyXY types by amino acid pair XY and category, and show that polyXY in Eukaryota are mainly located within intrinsically disordered regions. Our study provides a first step towards the characterization of polyXY as protein motifs.
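As an illustration of the three categories named in the description above, here is a minimal Python sketch. It assumes a region made of exactly two residue types, with no tolerance for other residues. The function name `classify_polyxy` and its rules are hypothetical and are not the authors' script, whose length and purity thresholds are not stated in this record.

```python
def classify_polyxy(region: str) -> str:
    """Toy classifier for a region built from exactly two residue types.

    Assumption: the region is "pure" (only two distinct letters); the
    published method may tolerate other residues and apply thresholds.
    """
    if len(set(region)) != 2:
        raise ValueError("region must contain exactly two amino acid types")

    # Count positions where the residue type changes between neighbours.
    switches = sum(a != b for a, b in zip(region, region[1:]))

    # Direpeat: strict alternation, so every adjacent pair differs.
    if switches == len(region) - 1:
        return "direpeat"
    # Joined: one run of X followed by one run of Y (a single switch).
    if switches == 1:
        return "joined"
    # Anything else mixes the two residue types irregularly.
    return "shuffled"


if __name__ == "__main__":
    for example in ("XYXYXY", "XXXYYY", "XYYXXY"):
        print(example, classify_polyxy(example))
```

Under these assumptions, the three example strings from the description fall into the direpeat, joined and shuffled categories respectively. A two-residue string such as 'XY' satisfies both the direpeat and joined rules, and the sketch arbitrarily reports it as a direpeat.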
format Online
Article
Text
id pubmed-9550522
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-95505222022-10-14 Regions with two amino acids in protein sequences: A step forward from homorepeats into the low complexity landscape Mier, Pablo Andrade-Navarro, Miguel A. Comput Struct Biotechnol J Research Article Low complexity regions (LCRs) differ in amino acid composition from the background provided by the corresponding proteomes. The simplest LCRs are homorepeats (or polyX), regions composed of mostly-one amino acid type. Extensive research has been done to characterize homorepeats, and their taxonomic, functional and structural features depend on the amino acid type and sequence context. From them, the next step towards the study of LCRs are the regions composed of two types of amino acids, which we call polyXY. We classify polyXY in three categories based on the arrangement of the two amino acid types ‘X’ and ‘Y’: direpeats (e.g. ‘XYXYXY’), joined (e.g. ‘XXXYYY’) and shuffled (e.g. ‘XYYXXY’). We developed a script to search for polyXY, and located them in a comprehensive set of 20,340 reference proteomes. These results are available in a dedicated web server called XYs, in which the user can also submit their own protein datasets to detect polyXY. We studied the distribution of polyXY types by amino acid pair XY and category, and show that polyXY in Eukaryota are mainly located within intrinsically disordered regions. Our study provides a first step towards the characterization of polyXY as protein motifs. Research Network of Computational and Structural Biotechnology 2022-09-18 /pmc/articles/PMC9550522/ /pubmed/36249567 http://dx.doi.org/10.1016/j.csbj.2022.09.011 Text en © 2022 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Research Article
Mier, Pablo
Andrade-Navarro, Miguel A.
Regions with two amino acids in protein sequences: A step forward from homorepeats into the low complexity landscape
title Regions with two amino acids in protein sequences: A step forward from homorepeats into the low complexity landscape
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9550522/
https://www.ncbi.nlm.nih.gov/pubmed/36249567
http://dx.doi.org/10.1016/j.csbj.2022.09.011