
Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

BACKGROUND: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5% of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play i...


Bibliographic Details
Main authors:	Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S; Dubchak, Inna
Format:	Text
Language:	English
Published:	BioMed Central 2007
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2176071/ https://www.ncbi.nlm.nih.gov/pubmed/17945028 http://dx.doi.org/10.1186/1471-2164-8-378

_version_	1782145484822937600
author	Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S; Dubchak, Inna
author_sort	Minovitsky, Simon
collection	PubMed
description	BACKGROUND: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5% of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. RESULTS: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs, are similar to binding sites of transcription factors from the FOX family. CONCLUSION: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in mutation, suggesting that selection which causes their conservation is not always very strong.
format	Text
id	pubmed-2176071
institution	National Center for Biotechnology Information
language	English
publishDate	2007
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2176071 2008-01-09 BMC Genomics Research Article BioMed Central 2007-10-18 /pmc/articles/PMC2176071/ /pubmed/17945028 http://dx.doi.org/10.1186/1471-2164-8-378 Text en Copyright © 2007 Minovitsky et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Short sequence motifs, overrepresented in mammalian conserved non-coding sequences
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2176071/ https://www.ncbi.nlm.nih.gov/pubmed/17945028 http://dx.doi.org/10.1186/1471-2164-8-378
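
The abstract in this record describes comparing the abundance of short motifs in CNSs against background sequence sets and against random sequences of matching nucleotide composition. The following is a minimal Python sketch of that general kind of comparison, not the authors' actual pipeline: the function names, the independent-base model used in place of generated random sequences, the pseudocount, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from math import prod


def count_kmers(seqs, k):
    """Count overlapping k-mers (A/C/G/T only) across a list of sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def base_frequencies(seqs):
    """Single-nucleotide frequencies pooled over all sequences."""
    counts = Counter(b for seq in seqs for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def enrichment_vs_composition(seqs, k):
    """Observed/expected ratio per k-mer, where the expectation assumes
    independent bases drawn with the sequences' own composition
    (an assumed stand-in for composition-matched random sequences)."""
    observed = count_kmers(seqs, k)
    freqs = base_frequencies(seqs)
    n_windows = sum(observed.values())
    return {
        kmer: obs / (n_windows * prod(freqs[b] for b in kmer))
        for kmer, obs in observed.items()
    }


def enrichment_vs_background(fg_seqs, bg_seqs, k, pseudocount=1.0):
    """Ratio of k-mer relative frequencies, foreground over background.
    The pseudocount (an assumption) avoids division by zero."""
    fg, bg = count_kmers(fg_seqs, k), count_kmers(bg_seqs, k)
    fg_total, bg_total = sum(fg.values()), sum(bg.values())
    n_kmers = 4 ** k
    return {
        kmer: ((fg[kmer] + pseudocount) / (fg_total + pseudocount * n_kmers))
        / ((bg[kmer] + pseudocount) / (bg_total + pseudocount * n_kmers))
        for kmer in set(fg) | set(bg)
    }


if __name__ == "__main__":
    # Made-up toy sequences, not data from the study.
    toy_cns = ["AAAATTTTGGCC", "TTTTAAAAGCAT"]
    toy_non_cns = ["ACGTACGTACGT", "CGCGATCGATCG"]
    print(enrichment_vs_composition(toy_cns, 2).get("CG"))
    print(enrichment_vs_background(toy_cns, toy_non_cns, 2)["AA"])
```

Ratios above 1 would indicate a motif that is more frequent than the chosen reference predicts. Which reference is used matters: the abstract reports AT-rich motifs in excess relative to non-CNS and near-promoter backgrounds, but GC-rich motifs lacking CpG in excess relative to composition-matched random sequences.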
