The word landscape of the non-coding segments of the Arabidopsis thaliana genome
BACKGROUND: Genome sequences can be conceptualized as arrangements of motifs or words. The frequencies and positional distributions of these words within particular non-coding genomic segments provide important insights into how the words function in processes such as mRNA stability and regulation o...
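Main authors: | Lichtenberg, Jens; Yilmaz, Alper; Welch, Joshua D; Kurz, Kyle; Liang, Xiaoyu; Drews, Frank; Ecker, Klaus; Lee, Stephen S; Geisler, Matt; Grotewold, Erich; Welch, Lonnie R |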
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2009 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2770528/ https://www.ncbi.nlm.nih.gov/pubmed/19814816 http://dx.doi.org/10.1186/1471-2164-10-463 |
_version_ | 1782173676193447936 |
---|---|
author | Lichtenberg, Jens Yilmaz, Alper Welch, Joshua D Kurz, Kyle Liang, Xiaoyu Drews, Frank Ecker, Klaus Lee, Stephen S Geisler, Matt Grotewold, Erich Welch, Lonnie R |
author_sort | Lichtenberg, Jens |
collection | PubMed |
description | BACKGROUND: Genome sequences can be conceptualized as arrangements of motifs or words. The frequencies and positional distributions of these words within particular non-coding genomic segments provide important insights into how the words function in processes such as mRNA stability and regulation of gene expression. RESULTS: Using an enumerative word discovery approach, we investigated the frequencies and positional distributions of all 65,536 different 8-letter words in the genome of Arabidopsis thaliana. Focusing on promoter regions, introns, and 3' and 5' untranslated regions (3'UTRs and 5'UTRs), we compared word frequencies in these segments to genome-wide frequencies. The statistically interesting words in each segment were clustered with similar words to generate motif logos. We investigated whether words were clustered at particular locations or were distributed randomly within each genomic segment, and we classified the words using gene expression information from public repositories. Finally, we investigated whether particular sets of words appeared together more frequently than others. CONCLUSION: Our studies provide a detailed view of the word composition of several segments of the non-coding portion of the Arabidopsis genome. Each segment contains a unique word-based signature. The respective signatures consist of the sets of enriched words, 'unwords', and word pairs within a segment, as well as the preferential locations and functional classifications for the signature words. Additionally, the positional distributions of enriched words within the segments highlight possible functional elements, and the co-associations of words in promoter regions likely represent the formation of higher order regulatory modules. This work is an important step toward fully cataloguing the functional elements of the Arabidopsis genome. |
format | Text |
id | pubmed-2770528 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2770528 2009-10-30 The word landscape of the non-coding segments of the Arabidopsis thaliana genome Lichtenberg, Jens Yilmaz, Alper Welch, Joshua D Kurz, Kyle Liang, Xiaoyu Drews, Frank Ecker, Klaus Lee, Stephen S Geisler, Matt Grotewold, Erich Welch, Lonnie R BMC Genomics Research Article BACKGROUND: Genome sequences can be conceptualized as arrangements of motifs or words. The frequencies and positional distributions of these words within particular non-coding genomic segments provide important insights into how the words function in processes such as mRNA stability and regulation of gene expression. RESULTS: Using an enumerative word discovery approach, we investigated the frequencies and positional distributions of all 65,536 different 8-letter words in the genome of Arabidopsis thaliana. Focusing on promoter regions, introns, and 3' and 5' untranslated regions (3'UTRs and 5'UTRs), we compared word frequencies in these segments to genome-wide frequencies. The statistically interesting words in each segment were clustered with similar words to generate motif logos. We investigated whether words were clustered at particular locations or were distributed randomly within each genomic segment, and we classified the words using gene expression information from public repositories. Finally, we investigated whether particular sets of words appeared together more frequently than others. CONCLUSION: Our studies provide a detailed view of the word composition of several segments of the non-coding portion of the Arabidopsis genome. Each segment contains a unique word-based signature. The respective signatures consist of the sets of enriched words, 'unwords', and word pairs within a segment, as well as the preferential locations and functional classifications for the signature words. Additionally, the positional distributions of enriched words within the segments highlight possible functional elements, and the co-associations of words in promoter regions likely represent the formation of higher order regulatory modules. This work is an important step toward fully cataloguing the functional elements of the Arabidopsis genome. BioMed Central 2009-10-08 /pmc/articles/PMC2770528/ /pubmed/19814816 http://dx.doi.org/10.1186/1471-2164-10-463 Text en Copyright © 2009 Lichtenberg et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | The word landscape of the non-coding segments of the Arabidopsis thaliana genome |
title_sort | word landscape of the non-coding segments of the arabidopsis thaliana genome |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2770528/ https://www.ncbi.nlm.nih.gov/pubmed/19814816 http://dx.doi.org/10.1186/1471-2164-10-463 |
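The description above mentions enumerating all 65,536 (4^8) 8-letter words and comparing their frequencies in a genomic segment with genome-wide frequencies. The following is only a minimal sketch of that general idea, not the authors' method: this record does not state the statistic the paper uses, so the frequency ratio and pseudocount here are assumptions, and the names `count_words` and `enrichment` are hypothetical.

```python
from collections import Counter

K = 8  # word length named in the abstract; 4**8 == 65,536 possible words
ALPHABET = set("ACGT")

def count_words(sequences, k=K):
    """Count every overlapping k-letter word over A/C/G/T in the sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            # Skip windows containing N or any other non-ACGT symbol.
            if set(word) <= ALPHABET:
                counts[word] += 1
    return counts

def enrichment(word, segment_counts, genome_counts, pseudocount=1.0):
    """Relative frequency of `word` in a segment divided by its genome-wide
    relative frequency. The pseudocount (an assumption of this sketch) keeps
    the ratio defined for words that never occur."""
    n_words = 4 ** len(word)
    seg_total = sum(segment_counts.values()) + pseudocount * n_words
    gen_total = sum(genome_counts.values()) + pseudocount * n_words
    seg_freq = (segment_counts[word] + pseudocount) / seg_total
    gen_freq = (genome_counts[word] + pseudocount) / gen_total
    return seg_freq / gen_freq
```

Under these assumptions, a ratio well above 1 would flag a word as a candidate enriched word for the segment, and a word with a zero count would be a candidate 'unword'. Judging significance would need a proper statistical model, which this sketch does not attempt.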