
Sequence determinants in human polyadenylation site selection

BACKGROUND: Differential polyadenylation is a widespread mechanism in higher eukaryotes producing mRNAs with different 3' ends in different contexts. This involves several alternative polyadenylation sites in the 3' UTR, each with its specific strength. Here, we analyze the vicinity of hum...


Bibliographic Details
Main authors: Legendre, Matthieu, Gautheret, Daniel
Format: Text
Language: English
Published: BioMed Central 2003
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC151664/
https://www.ncbi.nlm.nih.gov/pubmed/12600277
http://dx.doi.org/10.1186/1471-2164-4-7
collection PubMed
description BACKGROUND: Differential polyadenylation is a widespread mechanism in higher eukaryotes producing mRNAs with different 3' ends in different contexts. This involves several alternative polyadenylation sites in the 3' UTR, each with its specific strength. Here, we analyze the vicinity of human polyadenylation signals in search of patterns that would help discriminate strong and weak polyadenylation sites, or true sites from randomly occurring signals. RESULTS: We used human genomic sequences to retrieve the region downstream of polyadenylation signals, usually absent from cDNA or mRNA databases. Analyzing 4956 EST-validated polyadenylation sites and their -300/+300 nt flanking regions, we clearly visualized the upstream (USE) and downstream (DSE) sequence elements, both characterized by U-rich (not GU-rich) segments. The presence of a USE and a DSE is the main feature distinguishing true polyadenylation sites from randomly occurring A(A/U)UAAA hexamers. While USEs are associated with strong and weak poly(A) sites alike, DSEs are more conspicuous near strong poly(A) sites. We then used the region encompassing the hexamer and DSE as a training set for poly(A) site identification by the ERPIN program and achieved a prediction specificity of 69 to 85% for a sensitivity of 56%. CONCLUSION: The availability of complete genomes and large EST sequence databases now permits large-scale observation of polyadenylation sites. U-rich sequences flanking both sides of poly(A) signals contribute to the definition of "true" sites. However, the downstream U-rich sequences may also play an enhancing role. Based on this information, poly(A) site prediction accuracy was moderately but consistently improved compared to the best previously available algorithm.
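The abstract describes locating A(A/U)UAAA hexamers and looking for U-rich segments on either side of them. A minimal Python sketch of that idea follows; it is an illustration only, not the authors' pipeline and not the ERPIN program. The function names, the 30 nt default window, and the use of a plain U fraction as the richness measure are all assumptions introduced here.

```python
import re

# The two signal variants named in the abstract: AAUAAA and AUUAAA.
# A lookahead is used so that overlapping hexamers are all reported.
HEXAMER = re.compile(r"(?=(A[AU]UAAA))")


def to_rna(seq):
    """Upper-case the sequence and read T as U, so DNA input also works."""
    return seq.upper().replace("T", "U")


def find_hexamers(seq):
    """Return 0-based start positions of every A(A/U)UAAA hexamer."""
    return [m.start() for m in HEXAMER.finditer(to_rna(seq))]


def u_fraction(seq, start, end):
    """Fraction of U in seq[start:end], clipped to the sequence bounds."""
    rna = to_rna(seq)
    window = rna[max(start, 0):min(end, len(rna))]
    return window.count("U") / len(window) if window else 0.0


def flank_u_content(seq, flank=30):
    """For each hexamer, report (position, upstream U fraction, downstream U fraction).

    The window size `flank` is a hypothetical choice for illustration.
    """
    results = []
    for pos in find_hexamers(seq):
        upstream = u_fraction(seq, pos - flank, pos)
        downstream = u_fraction(seq, pos + 6, pos + 6 + flank)
        results.append((pos, upstream, downstream))
    return results
```

Calling `flank_u_content` on a genomic segment gives one tuple per candidate signal; under the abstract's observation, true sites would tend to show elevated U fractions on both sides, whereas a randomly occurring hexamer would not.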
format Text
id pubmed-151664
institution National Center for Biotechnology Information
language English
publishDate 2003
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-151664 2003-03-20 Sequence determinants in human polyadenylation site selection Legendre, Matthieu Gautheret, Daniel BMC Genomics Research Article
BioMed Central 2003-02-25 /pmc/articles/PMC151664/ /pubmed/12600277 http://dx.doi.org/10.1186/1471-2164-4-7 Text en Copyright © 2003 Legendre and Gautheret; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.