Regions with two amino acids in protein sequences: A step forward from homorepeats into the low complexity landscape
Low complexity regions (LCRs) differ in amino acid composition from the background provided by the corresponding proteomes. The simplest LCRs are homorepeats (or polyX), regions composed of mostly-one amino acid type. Extensive research has been done to characterize homorepeats, and their taxonomic,...
Main authors: | Mier, Pablo; Andrade-Navarro, Miguel A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9550522/ https://www.ncbi.nlm.nih.gov/pubmed/36249567 http://dx.doi.org/10.1016/j.csbj.2022.09.011 |
_version_ | 1784805906610913280 |
---|---|
author | Mier, Pablo Andrade-Navarro, Miguel A. |
author_facet | Mier, Pablo Andrade-Navarro, Miguel A. |
author_sort | Mier, Pablo |
collection | PubMed |
description | Low complexity regions (LCRs) differ in amino acid composition from the background provided by the corresponding proteomes. The simplest LCRs are homorepeats (or polyX), regions composed of mostly-one amino acid type. Extensive research has been done to characterize homorepeats, and their taxonomic, functional and structural features depend on the amino acid type and sequence context. From them, the next step towards the study of LCRs are the regions composed of two types of amino acids, which we call polyXY. We classify polyXY in three categories based on the arrangement of the two amino acid types ‘X’ and ‘Y’: direpeats (e.g. ‘XYXYXY’), joined (e.g. ‘XXXYYY’) and shuffled (e.g. ‘XYYXXY’). We developed a script to search for polyXY, and located them in a comprehensive set of 20,340 reference proteomes. These results are available in a dedicated web server called XYs, in which the user can also submit their own protein datasets to detect polyXY. We studied the distribution of polyXY types by amino acid pair XY and category, and show that polyXY in Eukaryota are mainly located within intrinsically disordered regions. Our study provides a first step towards the characterization of polyXY as protein motifs. |
format | Online Article Text |
id | pubmed-9550522 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Research Network of Computational and Structural Biotechnology |
record_format | MEDLINE/PubMed |
spelling | pubmed-9550522 2022-10-14 Regions with two amino acids in protein sequences: A step forward from homorepeats into the low complexity landscape Mier, Pablo Andrade-Navarro, Miguel A. Comput Struct Biotechnol J Research Article Low complexity regions (LCRs) differ in amino acid composition from the background provided by the corresponding proteomes. The simplest LCRs are homorepeats (or polyX), regions composed of mostly-one amino acid type. Extensive research has been done to characterize homorepeats, and their taxonomic, functional and structural features depend on the amino acid type and sequence context. From them, the next step towards the study of LCRs are the regions composed of two types of amino acids, which we call polyXY. We classify polyXY in three categories based on the arrangement of the two amino acid types ‘X’ and ‘Y’: direpeats (e.g. ‘XYXYXY’), joined (e.g. ‘XXXYYY’) and shuffled (e.g. ‘XYYXXY’). We developed a script to search for polyXY, and located them in a comprehensive set of 20,340 reference proteomes. These results are available in a dedicated web server called XYs, in which the user can also submit their own protein datasets to detect polyXY. We studied the distribution of polyXY types by amino acid pair XY and category, and show that polyXY in Eukaryota are mainly located within intrinsically disordered regions. Our study provides a first step towards the characterization of polyXY as protein motifs. Research Network of Computational and Structural Biotechnology 2022-09-18 /pmc/articles/PMC9550522/ /pubmed/36249567 http://dx.doi.org/10.1016/j.csbj.2022.09.011 Text en © 2022 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Research Article Mier, Pablo Andrade-Navarro, Miguel A. Regions with two amino acids in protein sequences: A step forward from homorepeats into the low complexity landscape |
title | Regions with two amino acids in protein sequences: A step forward from homorepeats into the low complexity landscape |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9550522/ https://www.ncbi.nlm.nih.gov/pubmed/36249567 http://dx.doi.org/10.1016/j.csbj.2022.09.011 |
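
The description above names three polyXY categories and gives one example string for each. As a rough illustration only, the following Python sketch sorts a region made of exactly two residue types into those categories. It is not the authors' search script: the function name `classify_polyxy` is hypothetical, and the category rules used here (strict alternation for direpeats, a single switch point for joined, everything else shuffled) are assumptions inferred from the three example strings, not definitions taken from the paper.

```python
def classify_polyxy(region: str) -> str:
    """Classify a region composed of exactly two residue types.

    Assumed rules (inferred from the examples in the abstract, not from the paper):
      direpeat: the two residues strictly alternate, e.g. 'XYXYXY'
      joined:   one run of one residue followed by one run of the other, e.g. 'XXXYYY'
      shuffled: any other arrangement, e.g. 'XYYXXY'
    """
    if len(set(region)) != 2:
        raise ValueError("region must contain exactly two residue types")

    # Count the positions where the residue differs from the previous one.
    switches = sum(1 for a, b in zip(region, region[1:]) if a != b)

    if switches == len(region) - 1:
        # Every adjacent pair differs, so the residues alternate.
        return "direpeat"
    if switches == 1:
        # A single change point means two contiguous blocks.
        return "joined"
    return "shuffled"


if __name__ == "__main__":
    for example in ("XYXYXY", "XXXYYY", "XYYXXY"):
        print(example, classify_polyxy(example))
```

Under these assumed rules a two-residue string such as 'XY' counts as a direpeat, so a real search would presumably also need a minimum length and a tolerance for other residues, neither of which is stated in this record.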