Short sequence motifs, overrepresented in mammalian conserved non-coding sequences
BACKGROUND: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5% of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play i...
Main authors: | Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S; Dubchak, Inna |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2007 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2176071/ https://www.ncbi.nlm.nih.gov/pubmed/17945028 http://dx.doi.org/10.1186/1471-2164-8-378 |
_version_ | 1782145484822937600 |
---|---|
author | Minovitsky, Simon Stegmaier, Philip Kel, Alexander Kondrashov, Alexey S Dubchak, Inna |
author_facet | Minovitsky, Simon Stegmaier, Philip Kel, Alexander Kondrashov, Alexey S Dubchak, Inna |
author_sort | Minovitsky, Simon |
collection | PubMed |
description | BACKGROUND: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5% of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. RESULTS: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. CONCLUSION: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in mutation, suggesting that selection which causes their conservation is not always very strong. |
format | Text |
id | pubmed-2176071 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2176071 2008-01-09 Short sequence motifs, overrepresented in mammalian conserved non-coding sequences Minovitsky, Simon Stegmaier, Philip Kel, Alexander Kondrashov, Alexey S Dubchak, Inna BMC Genomics Research Article BioMed Central 2007-10-18 /pmc/articles/PMC2176071/ /pubmed/17945028 http://dx.doi.org/10.1186/1471-2164-8-378 Text en Copyright © 2007 Minovitsky et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Minovitsky, Simon Stegmaier, Philip Kel, Alexander Kondrashov, Alexey S Dubchak, Inna Short sequence motifs, overrepresented in mammalian conserved non-coding sequences |
title | Short sequence motifs, overrepresented in mammalian conserved non-coding sequences |
title_full | Short sequence motifs, overrepresented in mammalian conserved non-coding sequences |
title_fullStr | Short sequence motifs, overrepresented in mammalian conserved non-coding sequences |
title_full_unstemmed | Short sequence motifs, overrepresented in mammalian conserved non-coding sequences |
title_short | Short sequence motifs, overrepresented in mammalian conserved non-coding sequences |
title_sort | short sequence motifs, overrepresented in mammalian conserved non-coding sequences |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2176071/ https://www.ncbi.nlm.nih.gov/pubmed/17945028 http://dx.doi.org/10.1186/1471-2164-8-378 |
work_keys_str_mv | AT minovitskysimon shortsequencemotifsoverrepresentedinmammalianconservednoncodingsequences AT stegmaierphilip shortsequencemotifsoverrepresentedinmammalianconservednoncodingsequences AT kelalexander shortsequencemotifsoverrepresentedinmammalianconservednoncodingsequences AT kondrashovalexeys shortsequencemotifsoverrepresentedinmammalianconservednoncodingsequences AT dubchakinna shortsequencemotifsoverrepresentedinmammalianconservednoncodingsequences |
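
The abstract in this record describes comparing the abundance of short motifs in CNSs against background sets, including random sequences with the same nucleotide composition. The following is a minimal Python sketch of that general idea, not the authors' actual procedure: the abstract does not specify the statistics used, so the independent-base expectation, the pseudocount, and all function names (`kmer_counts`, `base_freqs`, `composition_enrichment`, `relative_abundance`) are illustrative assumptions.

```python
from collections import Counter
from math import prod


def kmer_counts(seqs, k):
    """Count overlapping k-mers made only of A/C/G/T across all sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def base_freqs(seqs):
    """Single-nucleotide frequencies of the sequence set (ambiguous bases ignored)."""
    counts = Counter(c for s in seqs for c in s.upper() if c in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def composition_enrichment(seqs, k):
    """Observed/expected ratio per k-mer, where the expectation assumes
    independent bases drawn with the set's own nucleotide composition
    (an assumed stand-in for a composition-matched random background)."""
    counts = kmer_counts(seqs, k)
    freqs = base_freqs(seqs)
    total = sum(counts.values())
    return {
        kmer: n / (total * prod(freqs[b] for b in kmer))
        for kmer, n in counts.items()
    }


def relative_abundance(foreground, background, k, pseudo=1.0):
    """Ratio of k-mer frequencies in a foreground set (e.g. CNSs) to a
    background set (e.g. non-CNSs), with an assumed additive pseudocount
    so that k-mers absent from one set do not cause division by zero."""
    fg, bg = kmer_counts(foreground, k), kmer_counts(background, k)
    fg_n = sum(fg.values()) + pseudo * 4 ** k
    bg_n = sum(bg.values()) + pseudo * 4 ** k
    return {
        m: ((fg[m] + pseudo) / fg_n) / ((bg[m] + pseudo) / bg_n)
        for m in set(fg) | set(bg)
    }
```

A ratio above 1 from `composition_enrichment` marks a k-mer that is more frequent than single-nucleotide composition alone would predict; a ratio above 1 from `relative_abundance` marks a k-mer that is more frequent in the foreground set than in the chosen background. A real analysis would also need a significance test and a correction for multiple comparisons, which this sketch omits.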