
Benchmarking sequencing methods and tools that facilitate the study of alternative polyadenylation


Bibliographic details
Main authors: Shah, Ankeeta; Mittleman, Briana E.; Gilad, Yoav; Li, Yang I.
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2021
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8518154/
https://www.ncbi.nlm.nih.gov/pubmed/34649612
http://dx.doi.org/10.1186/s13059-021-02502-z
Description: BACKGROUND: Alternative cleavage and polyadenylation (APA), an RNA processing event, occurs in over 70% of human protein-coding genes. APA results in mRNA transcripts with distinct 3′ ends. Most APA occurs within 3′ UTRs, which harbor regulatory elements that can impact mRNA stability, translation, and localization. RESULTS: APA can be profiled using a number of established computational tools that infer polyadenylation sites from standard, short-read RNA-seq datasets. Here, we benchmarked several such tools (TAPAS, QAPA, DaPars2, GETUTR, and APATrap) against two specialized sequencing methods: 3′-Seq, an RNA-seq protocol that enriches for reads at the 3′ ends of genes, and Iso-Seq, a Pacific Biosciences (PacBio) single-molecule, full-length RNA-seq method. We compared all of these approaches on their ability to identify polyadenylation sites and to quantify polyadenylation site usage. We demonstrate that 3′-Seq and Iso-Seq identify polyadenylation sites and quantify their usage more reliably than computational tools that take short-read RNA-seq as input. However, we find that one such tool, QAPA, can reliably quantify variation in APA across conditions, such as across genotypes, when it is run with polyadenylation site annotations derived from small quantities of 3′-Seq or Iso-Seq data, as demonstrated by the successful mapping of alternative polyadenylation quantitative trait loci (apaQTL). CONCLUSIONS: We envisage that our analyses will shed light on the advantages of studying APA with more specialized sequencing protocols, such as 3′-Seq or Iso-Seq, and on the limitations of studying APA with short-read RNA-seq. We provide a computational pipeline to aid in the identification of polyadenylation sites and the quantification of polyadenylation site usage using Iso-Seq data as input. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-021-02502-z.
Published in: Genome Biology (Research article), BioMed Central, 2021-10-14. PMC8518154; PMID 34649612; doi: 10.1186/s13059-021-02502-z.
License: © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.