
The importance of definitions in the study of polyQ regions: A tale of thresholds, impurities and sequence context

Polyglutamine (polyQ) regions are one of the most prevalent homorepeats in eukaryotes. It is however difficult to evaluate their prevalence because various studies claim different results. The reason is the lack of a consensus to define what is indeed a polyQ region. We have tackled this issue by st...


Bibliographic Details
Main Authors: Mier, Pablo, Elena-Real, Carlos, Urbanek, Annika, Bernadó, Pau, Andrade-Navarro, Miguel A.
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7016039/
https://www.ncbi.nlm.nih.gov/pubmed/32071707
http://dx.doi.org/10.1016/j.csbj.2020.01.012
_version_ 1783496907761385472
author Mier, Pablo
Elena-Real, Carlos
Urbanek, Annika
Bernadó, Pau
Andrade-Navarro, Miguel A.
author_sort Mier, Pablo
collection PubMed
description Polyglutamine (polyQ) regions are among the most prevalent homorepeats in eukaryotes. Their prevalence is nevertheless difficult to evaluate because different studies report different results. The reason is the lack of a consensus definition of what constitutes a polyQ region. We have tackled this issue by studying how the use of different thresholds (i.e., the minimum number of glutamines required in a protein region of a given size) to detect polyQ regions in the human proteome influences not only their prevalence but also their general features and sequence context. Threshold definition shapes the length distribution of the polyQ dataset and changes the observed number and position of impurities (amino acids other than glutamine) within polyQ regions. Irrespective of the chosen threshold, leucine and proline residues are enriched both within and around polyQ regions. While leucine is enriched at the N-terminus of polyQ, especially at position −1 (the amino acid preceding the polyQ), proline is prevalent at the C-terminus (positions +1 to +5, that is, the first five amino acids after the polyQ). We also checked the suitability of these thresholds for other species and compared their polyQ features with those found in humans. As the sequence context and features of polyQ regions are threshold-dependent, we propose a method to quickly scan the polyQ landscape of a proteome. We complement our results with a summary of the biases to be expected per threshold when studying polyQ regions.
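The threshold idea in the description (a minimum number of glutamines within a protein region of a given size) can be illustrated with a small sliding-window scan. The following is only a sketch under stated assumptions, not the authors' method: the function names `find_polyq`, `count_impurities` and `flanks`, the merging and trimming rules, and the example threshold of 8 glutamines in a window of 10 residues are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def find_polyq(seq: str, min_q: int, window: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals of candidate polyQ regions.

    Assumed rule (illustrative only): a window of `window` residues
    qualifies when it contains at least `min_q` glutamines (Q).
    Overlapping or touching qualifying windows are merged, and each
    merged region is trimmed so that it starts and ends with Q.
    """
    if not 1 <= min_q <= window:
        raise ValueError("expected 1 <= min_q <= window")
    merged: List[List[int]] = []
    for i in range(len(seq) - window + 1):
        if seq[i:i + window].count("Q") >= min_q:
            if merged and i <= merged[-1][1]:
                merged[-1][1] = i + window      # extend the current region
            else:
                merged.append([i, i + window])  # open a new region
    regions = []
    for start, end in merged:
        # Each merged region holds at least one Q, so both loops terminate.
        while seq[start] != "Q":
            start += 1
        while seq[end - 1] != "Q":
            end -= 1
        regions.append((start, end))
    return regions


def count_impurities(seq: str, start: int, end: int) -> int:
    """Count residues other than Q inside the region seq[start:end]."""
    region = seq[start:end]
    return len(region) - region.count("Q")


def flanks(seq: str, start: int, end: int, size: int = 5) -> Tuple[str, str]:
    """Return the residues before and after a region.

    The last character of the first string is position -1; the second
    string covers positions +1 to +size.
    """
    return seq[max(0, start - size):start], seq[end:end + size]


if __name__ == "__main__":
    # Toy sequence, not a real protein: 8 glutamines flanked by L and P.
    toy = "AAAL" + "Q" * 8 + "PPPPPAA"
    for start, end in find_polyq(toy, min_q=8, window=10):
        before, after = flanks(toy, start, end)
        print(start, end, count_impurities(toy, start, end), before, after)
```

Changing `min_q` and `window` on the same sequence changes which regions are reported, how long they are, and how many impurities they contain, which is the threshold dependence the description refers to.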
format Online
Article
Text
id pubmed-7016039
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-7016039 2020-02-18 Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2020-02-04 /pmc/articles/PMC7016039/ /pubmed/32071707 http://dx.doi.org/10.1016/j.csbj.2020.01.012 Text en © 2020 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title The importance of definitions in the study of polyQ regions: A tale of thresholds, impurities and sequence context
topic Research Article