Genome-wide characterization of intergenic polyadenylation sites redefines gene spaces in Arabidopsis thaliana
BACKGROUND: Messenger RNA polyadenylation is an essential step for the maturation of most eukaryotic mRNAs. Accurate determination of poly(A) sites helps define the 3’-ends of genes, which is important for genome annotation and gene function research. Genomic studies have revealed the presence of po...
Main Authors: | Wu, Xiaohui, Zeng, Yong, Guan, Jinting, Ji, Guoli, Huang, Rongting, Li, Qingshun Q. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4568572/ https://www.ncbi.nlm.nih.gov/pubmed/26155789 http://dx.doi.org/10.1186/s12864-015-1691-1 |
_version_ | 1782389929405317120 |
---|---|
author | Wu, Xiaohui Zeng, Yong Guan, Jinting Ji, Guoli Huang, Rongting Li, Qingshun Q. |
author_facet | Wu, Xiaohui Zeng, Yong Guan, Jinting Ji, Guoli Huang, Rongting Li, Qingshun Q. |
author_sort | Wu, Xiaohui |
collection | PubMed |
description | BACKGROUND: Messenger RNA polyadenylation is an essential step for the maturation of most eukaryotic mRNAs. Accurate determination of poly(A) sites helps define the 3’-ends of genes, which is important for genome annotation and gene function research. Genomic studies have revealed the presence of poly(A) sites in intergenic regions, which may be attributed to 3’-UTR extensions and novel transcript units. However, there has been no systematic evaluation of intergenic poly(A) sites in plants. RESULTS: Approximately 16,000 intergenic poly(A) site clusters (IPACs) in Arabidopsis thaliana were discovered and evaluated at the whole genome level. Based on the distributions of distance from IPACs to nearby sense and antisense genes, these IPACs were classified into three categories. About 70 % of them were from previously unannotated 3’-UTR extensions to known genes, which would extend 6985 transcripts of the TAIR10 genome annotation beyond their 3’-ends, with a mean extension of 134 nucleotides. 1317 IPACs originated from novel intergenic transcripts, 37 of which were likely to be associated with protein-coding transcripts. 2957 IPACs corresponded to antisense transcripts for genes on the reverse strand, which might affect 2265 protein-coding genes and 39 non-protein-coding genes, including long non-coding RNA genes. The rest of the IPACs could have originated from transcriptional read-through or gene mis-annotations. CONCLUSIONS: The identified IPACs corresponding to novel transcripts, 3’-UTR extensions, and antisense transcription should be incorporated into the current Arabidopsis genome annotation. Comprehensive characterization of IPACs from this study provides insights into alternative polyadenylation and antisense transcription in plants. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1691-1) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4568572 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-45685722015-09-15 Genome-wide characterization of intergenic polyadenylation sites redefines gene spaces in Arabidopsis thaliana Wu, Xiaohui Zeng, Yong Guan, Jinting Ji, Guoli Huang, Rongting Li, Qingshun Q. BMC Genomics Research Article BACKGROUND: Messenger RNA polyadenylation is an essential step for the maturation of most eukaryotic mRNAs. Accurate determination of poly(A) sites helps define the 3’-ends of genes, which is important for genome annotation and gene function research. Genomic studies have revealed the presence of poly(A) sites in intergenic regions, which may be attributed to 3’-UTR extensions and novel transcript units. However, there has been no systematic evaluation of intergenic poly(A) sites in plants. RESULTS: Approximately 16,000 intergenic poly(A) site clusters (IPACs) in Arabidopsis thaliana were discovered and evaluated at the whole genome level. Based on the distributions of distance from IPACs to nearby sense and antisense genes, these IPACs were classified into three categories. About 70 % of them were from previously unannotated 3’-UTR extensions to known genes, which would extend 6985 transcripts of the TAIR10 genome annotation beyond their 3’-ends, with a mean extension of 134 nucleotides. 1317 IPACs originated from novel intergenic transcripts, 37 of which were likely to be associated with protein-coding transcripts. 2957 IPACs corresponded to antisense transcripts for genes on the reverse strand, which might affect 2265 protein-coding genes and 39 non-protein-coding genes, including long non-coding RNA genes. The rest of the IPACs could have originated from transcriptional read-through or gene mis-annotations. CONCLUSIONS: The identified IPACs corresponding to novel transcripts, 3’-UTR extensions, and antisense transcription should be incorporated into the current Arabidopsis genome annotation. Comprehensive characterization of IPACs from this study provides insights into alternative polyadenylation and antisense transcription in plants. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1691-1) contains supplementary material, which is available to authorized users. BioMed Central 2015-07-09 /pmc/articles/PMC4568572/ /pubmed/26155789 http://dx.doi.org/10.1186/s12864-015-1691-1 Text en © Wu et al. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Wu, Xiaohui Zeng, Yong Guan, Jinting Ji, Guoli Huang, Rongting Li, Qingshun Q. Genome-wide characterization of intergenic polyadenylation sites redefines gene spaces in Arabidopsis thaliana |
title | Genome-wide characterization of intergenic polyadenylation sites redefines gene spaces in Arabidopsis thaliana |
title_full | Genome-wide characterization of intergenic polyadenylation sites redefines gene spaces in Arabidopsis thaliana |
title_fullStr | Genome-wide characterization of intergenic polyadenylation sites redefines gene spaces in Arabidopsis thaliana |
title_full_unstemmed | Genome-wide characterization of intergenic polyadenylation sites redefines gene spaces in Arabidopsis thaliana |
title_short | Genome-wide characterization of intergenic polyadenylation sites redefines gene spaces in Arabidopsis thaliana |
title_sort | genome-wide characterization of intergenic polyadenylation sites redefines gene spaces in arabidopsis thaliana |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4568572/ https://www.ncbi.nlm.nih.gov/pubmed/26155789 http://dx.doi.org/10.1186/s12864-015-1691-1 |
work_keys_str_mv | AT wuxiaohui genomewidecharacterizationofintergenicpolyadenylationsitesredefinesgenespacesinarabidopsisthaliana AT zengyong genomewidecharacterizationofintergenicpolyadenylationsitesredefinesgenespacesinarabidopsisthaliana AT guanjinting genomewidecharacterizationofintergenicpolyadenylationsitesredefinesgenespacesinarabidopsisthaliana AT jiguoli genomewidecharacterizationofintergenicpolyadenylationsitesredefinesgenespacesinarabidopsisthaliana AT huangrongting genomewidecharacterizationofintergenicpolyadenylationsitesredefinesgenespacesinarabidopsisthaliana AT liqingshunq genomewidecharacterizationofintergenicpolyadenylationsitesredefinesgenespacesinarabidopsisthaliana |