Low Complexity Regions in Proteins and DNA are Poorly Correlated
Low complexity sequences (LCRs) are well known within coding as well as non-coding sequences. A low complexity region within a protein must be encoded by the underlying DNA sequence. Here, we examine the relationship between the entropy of the protein sequence and that of the DNA sequence which enco...
Main authors: | Enright, Johanna M; Dickson, Zachery W; Golding, G Brian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | Discoveries |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10124876/ https://www.ncbi.nlm.nih.gov/pubmed/37036379 http://dx.doi.org/10.1093/molbev/msad084 |
_version_ | 1785029927023673344 |
---|---|
author | Enright, Johanna M Dickson, Zachery W Golding, G Brian |
collection | PubMed |
description | Low complexity sequences (LCRs) are well known within coding as well as non-coding sequences. A low complexity region within a protein must be encoded by the underlying DNA sequence. Here, we examine the relationship between the entropy of the protein sequence and that of the DNA sequence which encodes it. We show that they are poorly correlated whether starting with a low complexity region within the protein and comparing it to the corresponding sequence in the DNA or by finding a low complexity region within coding DNA and comparing it to the corresponding sequence in the protein. We show this is the case within the proteomes of five model organisms: Homo sapiens, Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, and Arabidopsis thaliana. We also report a significant bias against mononucleic codons in LCR encoding sequences. By comparison with simulated proteomes, we show that highly repetitive LCRs may be explained by neutral, slippage-based evolution, but compositionally biased LCRs with cryptic repeats are not. We demonstrate that other biological biases and forces must be acting to create and maintain these LCRs. Uncovering these forces will improve our understanding of protein LCR evolution. |
format | Online Article Text |
id | pubmed-10124876 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10124876 2023-04-25 Low Complexity Regions in Proteins and DNA are Poorly Correlated Enright, Johanna M Dickson, Zachery W Golding, G Brian Mol Biol Evol Discoveries Oxford University Press 2023-04-10 /pmc/articles/PMC10124876/ /pubmed/37036379 http://dx.doi.org/10.1093/molbev/msad084 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Low Complexity Regions in Proteins and DNA are Poorly Correlated |
topic | Discoveries |
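
The description in this record compares the entropy of a protein sequence with that of the DNA encoding it, and mentions a bias against mononucleic codons. The following is a minimal sketch of those two ideas, not the authors' pipeline: it assumes plain Shannon entropy over symbol frequencies and takes "mononucleic" to mean a codon made of a single repeated nucleotide (such as `AAA`). The function names and the example sequences are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits per symbol, of the symbol frequencies in seq."""
    if not seq:
        return 0.0
    n = len(seq)
    # Sum -p * log2(p) over every distinct symbol in the sequence.
    return sum(-(c / n) * math.log2(c / n) for c in Counter(seq).values())


def split_codons(dna: str) -> list[str]:
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def mononucleic_fraction(dna: str) -> float:
    """Fraction of codons built from one repeated nucleotide (assumed meaning of 'mononucleic')."""
    codons = split_codons(dna)
    if not codons:
        return 0.0
    return sum(1 for codon in codons if len(set(codon)) == 1) / len(codons)


if __name__ == "__main__":
    # Hypothetical example: a polyglutamine run encoded by alternating CAG/CAA codons.
    protein = "QQQQQQ"
    dna = "CAGCAACAGCAACAGCAA"
    print("protein entropy (bits):", shannon_entropy(protein))
    print("DNA entropy (bits):", shannon_entropy(dna))
    print("mononucleic codon fraction:", mononucleic_fraction(dna))
```

In this toy case the protein run has zero entropy while its encoding DNA does not, which illustrates how a region can be low complexity at one level without being equally so at the other.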