The word landscape of the non-coding segments of the Arabidopsis thaliana genome

BACKGROUND: Genome sequences can be conceptualized as arrangements of motifs or words. The frequencies and positional distributions of these words within particular non-coding genomic segments provide important insights into how the words function in processes such as mRNA stability and regulation o...

Bibliographic Details
Main Authors: Lichtenberg, Jens; Yilmaz, Alper; Welch, Joshua D; Kurz, Kyle; Liang, Xiaoyu; Drews, Frank; Ecker, Klaus; Lee, Stephen S; Geisler, Matt; Grotewold, Erich; Welch, Lonnie R
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2770528/
https://www.ncbi.nlm.nih.gov/pubmed/19814816
http://dx.doi.org/10.1186/1471-2164-10-463
author Lichtenberg, Jens
Yilmaz, Alper
Welch, Joshua D
Kurz, Kyle
Liang, Xiaoyu
Drews, Frank
Ecker, Klaus
Lee, Stephen S
Geisler, Matt
Grotewold, Erich
Welch, Lonnie R
collection PubMed
description BACKGROUND: Genome sequences can be conceptualized as arrangements of motifs or words. The frequencies and positional distributions of these words within particular non-coding genomic segments provide important insights into how the words function in processes such as mRNA stability and regulation of gene expression. RESULTS: Using an enumerative word discovery approach, we investigated the frequencies and positional distributions of all 65,536 different 8-letter words in the genome of Arabidopsis thaliana. Focusing on promoter regions, introns, and 3' and 5' untranslated regions (3'UTRs and 5'UTRs), we compared word frequencies in these segments to genome-wide frequencies. The statistically interesting words in each segment were clustered with similar words to generate motif logos. We investigated whether words were clustered at particular locations or were distributed randomly within each genomic segment, and we classified the words using gene expression information from public repositories. Finally, we investigated whether particular sets of words appeared together more frequently than others. CONCLUSION: Our studies provide a detailed view of the word composition of several segments of the non-coding portion of the Arabidopsis genome. Each segment contains a unique word-based signature. The respective signatures consist of the sets of enriched words, 'unwords', and word pairs within a segment, as well as the preferential locations and functional classifications for the signature words. Additionally, the positional distributions of enriched words within the segments highlight possible functional elements, and the co-associations of words in promoter regions likely represent the formation of higher order regulatory modules. This work is an important step toward fully cataloguing the functional elements of the Arabidopsis genome.
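The enumerative approach summarized in the description can be illustrated with a small sketch. It counts every overlapping 8-letter word in a sequence (there are 4^8 = 65,536 possible words over A, C, G, T) and compares a word's relative frequency in a segment with its genome-wide relative frequency. This is not the authors' software: the function names, the pseudocount smoothing, and the simple frequency ratio are assumptions made for illustration, and the record does not say which statistic the paper actually used.

```python
from collections import Counter


def count_words(seq, k=8):
    """Count every overlapping k-letter word made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def enrichment_ratio(word, segment_counts, genome_counts, k=8, pseudocount=1.0):
    """Relative frequency in the segment divided by the genome-wide one.

    The pseudocount (an assumption of this sketch) keeps the ratio
    finite for words that never occur in one of the two sequences.
    """
    n_words = 4 ** k  # 65,536 possible words when k = 8
    seg_total = sum(segment_counts.values()) + pseudocount * n_words
    gen_total = sum(genome_counts.values()) + pseudocount * n_words
    seg_freq = (segment_counts[word] + pseudocount) / seg_total
    gen_freq = (genome_counts[word] + pseudocount) / gen_total
    return seg_freq / gen_freq


if __name__ == "__main__":
    # Toy sequences, not real Arabidopsis data.
    genome = count_words("AAAACCCC", k=2)
    segment = count_words("AAAA", k=2)
    print(enrichment_ratio("AA", segment, genome, k=2))
```

A ratio above 1 suggests the word is over-represented in the segment relative to the whole genome. A real analysis would need a proper statistical test and multiple-testing correction before calling a word enriched.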
format Text
id pubmed-2770528
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2770528 2009-10-30 The word landscape of the non-coding segments of the Arabidopsis thaliana genome Lichtenberg, Jens; Yilmaz, Alper; Welch, Joshua D; Kurz, Kyle; Liang, Xiaoyu; Drews, Frank; Ecker, Klaus; Lee, Stephen S; Geisler, Matt; Grotewold, Erich; Welch, Lonnie R BMC Genomics Research Article BioMed Central 2009-10-08 /pmc/articles/PMC2770528/ /pubmed/19814816 http://dx.doi.org/10.1186/1471-2164-10-463 Text en Copyright © 2009 Lichtenberg et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The word landscape of the non-coding segments of the Arabidopsis thaliana genome
topic Research Article