
Inference of the human polyadenylation code

MOTIVATION: Processing of transcripts at the 3′-end involves cleavage at a polyadenylation site followed by the addition of a poly(A)-tail. By selecting which site is cleaved, the process of alternative polyadenylation enables genes to produce transcript isoforms with different 3′-ends. To facilitat...


Bibliographic Details
Main authors:	Leung, Michael K K, Delong, Andrew, Frey, Brendan J
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2018
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129302/ https://www.ncbi.nlm.nih.gov/pubmed/29648582 http://dx.doi.org/10.1093/bioinformatics/bty211

author	Leung, Michael K K Delong, Andrew Frey, Brendan J
collection	PubMed
description	MOTIVATION: Processing of transcripts at the 3′-end involves cleavage at a polyadenylation site followed by the addition of a poly(A)-tail. By selecting which site is cleaved, the process of alternative polyadenylation enables genes to produce transcript isoforms with different 3′-ends. To facilitate the identification and treatment of disease-causing mutations that affect polyadenylation and to understand the sequence determinants underlying this regulatory process, a computational model that can accurately predict polyadenylation patterns from genomic features is desirable. RESULTS: Previous work has focused on identifying candidate polyadenylation sites and classifying tissue-specific sites. By training on how multiple sites in genes are competitively selected for polyadenylation from 3′-end sequencing data, we developed a deep learning model that can predict the tissue-specific strength of a polyadenylation site in the 3′ untranslated region of the human genome given only its genomic sequence. We demonstrate the model’s broad utility on multiple tasks, without any application-specific training. The model can be used to predict which polyadenylation site is more likely to be selected in genes with multiple sites. It can be used to scan the 3′ untranslated region to find candidate polyadenylation sites. It can be used to classify the pathogenicity of variants near annotated polyadenylation sites in ClinVar. It can also be used to anticipate the effect of antisense oligonucleotide experiments to redirect polyadenylation. We provide an analysis of how different features affect the model’s predictive performance and a method to identify sensitive regions of the genome at single-base resolution that can affect polyadenylation regulation. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-6129302
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-6129302 2018-09-12 Inference of the human polyadenylation code Leung, Michael K K Delong, Andrew Frey, Brendan J Bioinformatics Original Papers Oxford University Press 2018-09-01 2018-04-10 /pmc/articles/PMC6129302/ /pubmed/29648582 http://dx.doi.org/10.1093/bioinformatics/bty211 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Inference of the human polyadenylation code
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129302/ https://www.ncbi.nlm.nih.gov/pubmed/29648582 http://dx.doi.org/10.1093/bioinformatics/bty211
