Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

BACKGROUND: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5% of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play i...

Bibliographic Details
Main authors: Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S; Dubchak, Inna
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2176071/
https://www.ncbi.nlm.nih.gov/pubmed/17945028
http://dx.doi.org/10.1186/1471-2164-8-378
author Minovitsky, Simon
Stegmaier, Philip
Kel, Alexander
Kondrashov, Alexey S
Dubchak, Inna
collection PubMed
description BACKGROUND: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5% of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure.
RESULTS: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, the abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of the short motifs overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family.
CONCLUSION: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in mutation, suggesting that the selection which causes their conservation is not always very strong.
format Text
id pubmed-2176071
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2176071 2008-01-09 Short sequence motifs, overrepresented in mammalian conserved non-coding sequences. Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S; Dubchak, Inna. BMC Genomics. Research Article. BioMed Central 2007-10-18 /pmc/articles/PMC2176071/ /pubmed/17945028 http://dx.doi.org/10.1186/1471-2164-8-378 Text en
Copyright © 2007 Minovitsky et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Short sequence motifs, overrepresented in mammalian conserved non-coding sequences
topic Research Article
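The abstract in this record describes comparing short-motif abundances in CNSs against background sets, including random sequences with the same nucleotide composition. The following is a minimal sketch of that kind of comparison, not the authors' actual pipeline. All function names are hypothetical. Modeling the "random sequences" background as independent draws from the observed base frequencies is an assumption, as is the pseudocount used to avoid division by zero.

```python
from collections import Counter
from math import prod

BASES = "ACGT"

def kmer_counts(seqs, k):
    """Count overlapping k-mers, skipping any window with a non-ACGT symbol."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if all(c in BASES for c in kmer):
                counts[kmer] += 1
    return counts

def base_freqs(seqs):
    """Fraction of each of A, C, G, T among the unambiguous bases."""
    counts = Counter(c for s in seqs for c in s.upper() if c in BASES)
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES}

def obs_exp_ratio(seqs, k):
    """Observed k-mer count divided by the count expected if bases were
    drawn independently with the set's own composition (assumed model)."""
    counts = kmer_counts(seqs, k)
    total = sum(counts.values())
    freqs = base_freqs(seqs)
    return {
        kmer: n / (total * prod(freqs[b] for b in kmer))
        for kmer, n in counts.items()
    }

def relative_abundance(fg_seqs, bg_seqs, k, pseudocount=1.0):
    """Ratio of k-mer frequency in a foreground set (e.g. CNSs) to a
    background set (e.g. non-CNSs), smoothed with an assumed pseudocount."""
    fg, bg = kmer_counts(fg_seqs, k), kmer_counts(bg_seqs, k)
    n_kmers = len(BASES) ** k
    fg_total = sum(fg.values()) + pseudocount * n_kmers
    bg_total = sum(bg.values()) + pseudocount * n_kmers
    return {
        kmer: ((fg[kmer] + pseudocount) / fg_total)
              / ((bg[kmer] + pseudocount) / bg_total)
        for kmer in set(fg) | set(bg)
    }
```

With real data, `obs_exp_ratio(cns_seqs, 2)["CG"]` would give a CpG observed/expected ratio for a hypothetical list `cns_seqs`, and `relative_abundance(cns_seqs, non_cns_seqs, 6)` would rank hexamers by enrichment against one of the background sets. Whether such ratios are significant requires a statistical test that this sketch does not include.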