Cargando…

Inference of the human polyadenylation code

MOTIVATION: Processing of transcripts at the 3′-end involves cleavage at a polyadenylation site followed by the addition of a poly(A)-tail. By selecting which site is cleaved, the process of alternative polyadenylation enables genes to produce transcript isoforms with different 3′-ends. To facilitat...

Descripción completa

Detalles Bibliográficos
Autores principales: Leung, Michael K K, Delong, Andrew, Frey, Brendan J
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Oxford University Press 2018
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129302/
https://www.ncbi.nlm.nih.gov/pubmed/29648582
http://dx.doi.org/10.1093/bioinformatics/bty211
Descripción
Sumario:MOTIVATION: Processing of transcripts at the 3′-end involves cleavage at a polyadenylation site followed by the addition of a poly(A)-tail. By selecting which site is cleaved, the process of alternative polyadenylation enables genes to produce transcript isoforms with different 3′-ends. To facilitate the identification and treatment of disease-causing mutations that affect polyadenylation and to understand the sequence determinants underlying this regulatory process, a computational model that can accurately predict polyadenylation patterns from genomic features is desirable. RESULTS: Previous works have focused on identifying candidate polyadenylation sites and classifying tissue-specific sites. By training on how multiple sites in genes are competitively selected for polyadenylation from 3′-end sequencing data, we developed a deep learning model that can predict the tissue-specific strength of a polyadenylation site in the 3′ untranslated region of the human genome given only its genomic sequence. We demonstrate the model’s broad utility on multiple tasks, without any application-specific training. The model can be used to predict which polyadenylation site is more likely to be selected in genes with multiple sites. It can be used to scan the 3′ untranslated region to find candidate polyadenylation sites. It can be used to classify the pathogenicity of variants near annotated polyadenylation sites in ClinVar. It can also be used to anticipate the effect of antisense oligonucleotide experiments to redirect polyadenylation. We provide analysis on how different features affect the model’s predictive performance and a method to identify sensitive regions of the genome at the single-based resolution that can affect polyadenylation regulation. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.