Deep learning of human polyadenylation sites at nucleotide resolution reveals molecular determinants of site usage and relevance in disease

Bibliographic Details
Main authors: Stroup, Emily Kunce; Ji, Zhe
Format: Online article (text)
Language: English
Published: Nature Publishing Group UK, 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10651852/
https://www.ncbi.nlm.nih.gov/pubmed/37968271
http://dx.doi.org/10.1038/s41467-023-43266-3
collection PubMed
description The genomic distribution of cleavage and polyadenylation (polyA) sites should be co-evolutionally optimized with the local gene structure. Otherwise, spurious polyadenylation can cause premature transcription termination and generate aberrant proteins. To obtain mechanistic insights into polyA site optimization across the human genome, we develop deep/machine learning models to identify genome-wide putative polyA sites at unprecedented nucleotide-level resolution and calculate their strength and usage in the genomic context. Our models quantitatively measure position-specific motif importance and their crosstalk in polyA site formation and cleavage heterogeneity. The intronic site expression is governed by the surrounding splicing landscape. The usage of alternative polyA sites in terminal exons is modulated by their relative locations and distance to downstream genes. Finally, we apply our models to reveal thousands of disease- and trait-associated genetic variants altering polyadenylation activity. Altogether, our models represent a valuable resource to dissect molecular mechanisms mediating genome-wide polyA site expression and characterize their functional roles in human diseases.
id pubmed-10651852
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Nat Commun (Nature Communications), 2023-11-15
License: © The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
topic Article