PolyAMiner-Bulk: A Machine Learning Based Bioinformatics Algorithm to Infer and Decode Alternative Polyadenylation Dynamics from bulk RNA-seq data

More than half of human genes exercise alternative polyadenylation (APA) and generate mRNA transcripts with varying 3’ untranslated regions (UTR). However, current computational approaches for identifying cleavage and polyadenylation sites (C/PASs) and quantifying 3’UTR length changes from bulk RNA-...

Bibliographic Details
Main authors: Jonnakuti, Venkata Soumith; Wagner, Eric J.; Maletić-Savatić, Mirjana; Liu, Zhandong; Yalamanchili, Hari Krishna
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9900750/
https://www.ncbi.nlm.nih.gov/pubmed/36747700
http://dx.doi.org/10.1101/2023.01.23.523471
author Jonnakuti, Venkata Soumith
Wagner, Eric J.
Maletić-Savatić, Mirjana
Liu, Zhandong
Yalamanchili, Hari Krishna
collection PubMed
description More than half of human genes exercise alternative polyadenylation (APA) and generate mRNA transcripts with varying 3’ untranslated regions (UTR). However, current computational approaches for identifying cleavage and polyadenylation sites (C/PASs) and quantifying 3’UTR length changes from bulk RNA-seq data fail to unravel tissue- and disease-specific APA dynamics. Here, we developed a next-generation bioinformatics algorithm and application, PolyAMiner-Bulk, that utilizes an attention-based machine learning architecture and an improved vector projection-based engine to infer differential APA dynamics accurately. When applied to earlier studies, PolyAMiner-Bulk accurately identified more than twice the number of APA changes in an RBM17 knockdown bulk RNA-seq dataset compared to current generation tools. Moreover, on a separate dataset, PolyAMiner-Bulk revealed novel APA dynamics and pathways in scleroderma pathology and identified differential APA in a gene that was identified as being involved in scleroderma pathogenesis in an independent study. Lastly, we used PolyAMiner-Bulk to analyze the RNA-seq data of post-mortem prefrontal cortexes from the ROSMAP data consortium and unraveled novel APA dynamics in Alzheimer’s Disease. Our method, PolyAMiner-Bulk, creates a paradigm for future alternative polyadenylation analysis from bulk RNA-seq data.
format Online
Article
Text
id pubmed-9900750
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-9900750 2023-02-07 bioRxiv Article
Cold Spring Harbor Laboratory 2023-01-24 /pmc/articles/PMC9900750/ /pubmed/36747700 http://dx.doi.org/10.1101/2023.01.23.523471 Text en
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title PolyAMiner-Bulk: A Machine Learning Based Bioinformatics Algorithm to Infer and Decode Alternative Polyadenylation Dynamics from bulk RNA-seq data
topic Article