GC-compositional strand bias around transcription start sites in plants and fungi

BACKGROUND: A GC-compositional strand bias or GC-skew (=(C-G)/(C+G)), where C and G denote the numbers of cytosine and guanine residues, was recently reported near the transcription start sites (TSS) of Arabidopsis genes. However, it is unclear whether other eukaryotic species have equally prominent...


Bibliographic Details
Main authors: Fujimori, Shigeo, Washio, Takanori, Tomita, Masaru
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC555766/
https://www.ncbi.nlm.nih.gov/pubmed/15733327
http://dx.doi.org/10.1186/1471-2164-6-26
author Fujimori, Shigeo
Washio, Takanori
Tomita, Masaru
collection PubMed
description BACKGROUND: A GC-compositional strand bias or GC-skew (=(C-G)/(C+G)), where C and G denote the numbers of cytosine and guanine residues, was recently reported near the transcription start sites (TSS) of Arabidopsis genes. However, it is unclear whether other eukaryotic species have equally prominent GC-skews, and the biological meaning of this trait remains unknown.
RESULTS: Our study confirmed a significant GC-skew (C > G) in the TSS of Oryza sativa (rice) genes. The full-length cDNAs and genomic sequences from Arabidopsis and rice were compared using statistical analyses. Despite marked differences in the G+C content around the TSS in the two plants, the degrees of bias were almost identical. Although slight GC-skew peaks, including opposite skews (C < G), were detected around the TSS of genes in human and Drosophila, they were qualitatively and quantitatively different from those identified in plants. However, plant-like GC-skew in regions upstream of the translation initiation sites (TIS) in some fungi was identified following analyses of the expressed sequence tags and/or genomic sequences from other species. On the basis of our dataset, we estimated that >70% and 68% of Arabidopsis and rice genes, respectively, had a strong GC-skew (>0.33) in a 100-bp window (that is, the number of C residues was more than double the number of G residues in a +/-100-bp window around the TSS). The mean GC-skew value in the TSS of highly expressed genes in Arabidopsis was significantly greater than that of genes with low expression levels. Many of the GC-skew peaks were preferentially located near the TSS, so we examined the potential value of GC-skew as an index for TSS identification. Our results confirm that the GC-skew can be used to assist TSS prediction in plant genomes.
CONCLUSION: The GC-skew (C > G) around the TSS is strictly conserved between monocot and eudicot plants (i.e., angiosperms in general), and a similar skew has been observed in some fungi. Highly expressed Arabidopsis genes had an overall more marked GC-skew in the TSS than genes with low expression levels. We therefore propose that the GC-skew around the TSS in some plants and fungi is related to transcription. It might be caused by mutations during transcription initiation or the frequent use of transcription factor-binding sites having a strand preference. In addition, GC-skew is a good candidate index for TSS prediction in plant genomes, where there is a lack of correlation between CpG islands and genes.
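The skew definition and the 0.33 threshold quoted in the abstract can be checked with a short sketch. With skew = (C-G)/(C+G), the condition skew > 1/3 rearranges to 3(C-G) > C+G, that is C > 2G, which matches the abstract's "more than double" reading. The Python below is only an illustration: the function names `gc_skew`, `skew_profile` and `strongest_window`, the step size, and the idea of picking the single highest-skew window are assumptions made here, not the authors' published procedure.

```python
def gc_skew(seq: str) -> float:
    """Return (C - G) / (C + G) for a DNA string, or 0.0 if it has no C or G."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    return (c - g) / (c + g) if (c + g) else 0.0


def skew_profile(seq: str, window: int = 100, step: int = 10):
    """Yield (window_start, skew) for fixed-size windows slid along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_skew(seq[start:start + window])


def strongest_window(seq: str, window: int = 100, step: int = 10):
    """Return the (start, skew) pair with the highest skew (hypothetical heuristic)."""
    return max(skew_profile(seq, window, step), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy sequence: a C-rich block embedded in a G-rich background.
    toy = "G" * 50 + "C" * 20 + "G" * 50
    print(gc_skew("CCCG"))                          # 3 C and 1 G
    print(strongest_window(toy, window=20, step=10))
```

A real analysis would take windows centred on annotated TSS positions from full-length cDNA alignments, as the abstract describes; the sliding scan here only shows how a skew peak might be located along a sequence.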
format Text
id pubmed-555766
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-555766 2005-04-01 GC-compositional strand bias around transcription start sites in plants and fungi Fujimori, Shigeo Washio, Takanori Tomita, Masaru BMC Genomics Research Article BioMed Central 2005-02-28 /pmc/articles/PMC555766/ /pubmed/15733327 http://dx.doi.org/10.1186/1471-2164-6-26 Text en Copyright © 2005 Fujimori et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title GC-compositional strand bias around transcription start sites in plants and fungi
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC555766/
https://www.ncbi.nlm.nih.gov/pubmed/15733327
http://dx.doi.org/10.1186/1471-2164-6-26