PolyA-miner: accurate assessment of differential alternative poly-adenylation from 3′Seq data using vector projections and non-negative matrix factorization

Bibliographic Details
Main authors: Yalamanchili, Hari Krishna; Alcott, Callison E; Ji, Ping; Wagner, Eric J; Zoghbi, Huda Y; Liu, Zhandong
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7337927/
https://www.ncbi.nlm.nih.gov/pubmed/32463457
http://dx.doi.org/10.1093/nar/gkaa398
author Yalamanchili, Hari Krishna
Alcott, Callison E
Ji, Ping
Wagner, Eric J
Zoghbi, Huda Y
Liu, Zhandong
collection PubMed
description Almost 70% of human genes undergo alternative polyadenylation (APA) and generate mRNA transcripts of varying length, typically differing in the 3′ untranslated region (UTR). APA plays an important role in development and cellular differentiation, and its dysregulation can cause neuropsychiatric diseases and increase cancer severity. Increasing awareness of APA’s role in human health and disease has propelled the development of several 3′ sequencing (3′Seq) techniques that allow for precise identification of APA sites. However, despite the recent data explosion, there are no robust computational tools specifically designed to analyze 3′Seq data. The analytical approaches applied to these data so far rely predominantly on proximal-to-distal usage. Because about 50% of human genes have more than two APA isoforms, current methods fail to capture the entirety of APA changes and do not account for non-proximal to non-distal changes. Addressing these key challenges, this study presents PolyA-miner, an algorithm to accurately detect and assess differential alternative polyadenylation specifically from 3′Seq data. Genes are abstracted as APA matrices, and differential APA usage is inferred using iterative consensus non-negative matrix factorization (NMF) based clustering. PolyA-miner accounts for all non-proximal to non-distal APA switches using vector projections and reflects precise gene-level 3′UTR changes. It can also effectively identify novel APA sites that are otherwise undetected when using reference-based approaches. Evaluation on multiple datasets strongly supports the value and protocol-independent applicability of PolyA-miner. These datasets are first-generation MicroArray Quality Control (MAQC) brain and Universal Human Reference (UHR) PolyA-seq data, recent glioblastoma cell line NUDT21 knockdown Poly(A)-ClickSeq (PAC-seq) data, and our own mouse hippocampal and human stem cell-derived neuron PAC-seq data. Strikingly, in the glioblastoma cell line data, PolyA-miner identified more than twice as many genes with APA changes as initially reported. With the emerging importance of APA in human development and disease, PolyA-miner can significantly improve data analysis and help decode the underlying APA dynamics.
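
The two ideas named in the abstract, an APA matrix clustered by consensus NMF and a vector projection that summarizes multi-site changes, can be illustrated with a small sketch. This is not the authors' implementation, and the abstract does not specify the details. The site-by-sample count matrix, the proximal-to-distal axis with evenly spaced weights, the rank of 2, the multiplicative-update NMF, and all function names (`usage`, `shift_score`, `nmf`, `consensus`) are assumptions made for illustration only.

```python
import random

def usage(counts):
    """Turn read counts per APA site (ordered proximal to distal) into proportions."""
    total = sum(counts)
    return [c / total for c in counts]

def shift_score(control_counts, treated_counts):
    """Scalar projection of the usage change onto an assumed proximal-to-distal axis.

    Positive values suggest a shift toward distal sites, negative toward proximal.
    Every site contributes, so middle-site switches are not ignored.
    """
    n = len(control_counts)
    axis = [i / (n - 1) for i in range(n)]  # hypothetical evenly spaced site weights
    delta = [t - c for c, t in zip(usage(control_counts), usage(treated_counts))]
    dot = sum(d * a for d, a in zip(delta, axis))
    norm = sum(a * a for a in axis) ** 0.5
    return dot / norm

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (sites x samples) as W @ H.

    Uses standard multiplicative updates for the squared-error objective.
    """
    rng = random.Random(seed)
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in V]
    H = [[rng.random() + 0.1 for _ in V[0]] for _ in range(rank)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, matmul(W, H))
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(matmul(W, H), Ht)
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

def consensus(V, rank=2, runs=10):
    """Fraction of NMF runs (different seeds) in which two samples share a cluster."""
    m = len(V[0])
    together = [[0] * m for _ in range(m)]
    for seed in range(runs):
        _, H = nmf(V, rank, seed=seed)
        # Each sample is assigned to the factor with the largest coefficient.
        labels = [col.index(max(col)) for col in transpose(H)]
        for i in range(m):
            for j in range(m):
                together[i][j] += labels[i] == labels[j]
    return [[c / runs for c in row] for row in together]

# Made-up APA matrix for one gene: 3 sites (rows) x 4 samples (2 control, 2 treated).
apa_matrix = [[80, 75, 10, 12],
              [15, 18, 20, 22],
              [5, 7, 70, 66]]
score = shift_score([80, 15, 5], [10, 20, 70])
cons = consensus(apa_matrix)
```

In this toy setup, a consensus matrix whose block structure matches the condition labels would hint at differential APA usage for the gene, and the sign of `score` would indicate the direction of the 3′UTR change. How PolyA-miner actually tests significance is not described in this record.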
format Online
Article
Text
id pubmed-7337927
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7337927 2020-07-13 Nucleic Acids Res Methods Online Oxford University Press 2020-05-28 /pmc/articles/PMC7337927/ /pubmed/32463457 http://dx.doi.org/10.1093/nar/gkaa398 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title PolyA-miner: accurate assessment of differential alternative poly-adenylation from 3′Seq data using vector projections and non-negative matrix factorization
topic Methods Online