Aptardi predicts polyadenylation sites in sample-specific transcriptomes using high-throughput RNA sequencing and DNA sequence

Annotation of polyadenylation sites from short-read RNA sequencing alone is a challenging computational task. Other algorithms rooted in DNA sequence predict potential polyadenylation sites; however, in vivo expression of a particular site varies based on a myriad of conditions. Here, we introduce a...

Bibliographic Details
Main authors: Lusk, Ryan, Stene, Evan, Banaei-Kashani, Farnoush, Tabakoff, Boris, Kechris, Katerina, Saba, Laura M.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7955126/
https://www.ncbi.nlm.nih.gov/pubmed/33712618
http://dx.doi.org/10.1038/s41467-021-21894-x
author Lusk, Ryan
Stene, Evan
Banaei-Kashani, Farnoush
Tabakoff, Boris
Kechris, Katerina
Saba, Laura M.
collection PubMed
description Annotation of polyadenylation sites from short-read RNA sequencing alone is a challenging computational task. Other algorithms rooted in DNA sequence predict potential polyadenylation sites; however, in vivo expression of a particular site varies based on a myriad of conditions. Here, we introduce aptardi (alternative polyadenylation transcriptome analysis from RNA-Seq data and DNA sequence information), which leverages both DNA sequence and RNA sequencing in a machine learning paradigm to predict expressed polyadenylation sites. Specifically, as input aptardi takes DNA nucleotide sequence, genome-aligned RNA-Seq data, and an initial transcriptome. The program evaluates these initial transcripts to identify expressed polyadenylation sites in the biological sample and refines transcript 3′-ends accordingly. The average precision of the aptardi model is twice that of a standard transcriptome assembler. In particular, the recall of the aptardi model (the proportion of true polyadenylation sites detected by the algorithm) is improved by over three-fold. Also, the model—trained using the Human Brain Reference RNA commercial standard—performs well when applied to RNA-sequencing samples from different tissues and different mammalian species. Finally, aptardi’s input is simple to compile and its output is easily amenable to downstream analyses such as quantitation and differential expression.
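The abstract defines recall as the proportion of true polyadenylation sites detected by the algorithm. As a minimal sketch of that definition (and of plain precision alongside it), the Python below compares a set of predicted sites against a set of true sites. The function name `precision_recall`, the `(chromosome, strand, position)` tuples, and the exact-match criterion are all assumptions made for illustration; they are not taken from aptardi, and the "average precision" reported in the abstract is a different, threshold-swept metric that this sketch does not compute.

```python
def precision_recall(predicted, true_sites):
    """Return (precision, recall) for two collections of site identifiers.

    Assumption: a predicted site counts as correct only if it matches a
    true site exactly. A real evaluation may use a tolerance window instead.
    """
    predicted = set(predicted)
    true_sites = set(true_sites)
    true_positives = len(predicted & true_sites)
    # Precision: fraction of predicted sites that are true sites.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of true sites that were predicted.
    recall = true_positives / len(true_sites) if true_sites else 0.0
    return precision, recall


# Hypothetical toy data: (chromosome, strand, position) tuples.
true_sites = {
    ("chr1", "+", 1000),
    ("chr1", "+", 2500),
    ("chr2", "-", 400),
    ("chr3", "+", 90),
}
predicted = {
    ("chr1", "+", 1000),
    ("chr2", "-", 400),
    ("chr2", "-", 900),
}

precision, recall = precision_recall(predicted, true_sites)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With this toy data, two of the three predictions are correct and two of the four true sites are recovered, so a method can have reasonable precision while still missing half of the expressed sites.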
format Online
Article
Text
id pubmed-7955126
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7955126 2021-03-28 Aptardi predicts polyadenylation sites in sample-specific transcriptomes using high-throughput RNA sequencing and DNA sequence Lusk, Ryan Stene, Evan Banaei-Kashani, Farnoush Tabakoff, Boris Kechris, Katerina Saba, Laura M. Nat Commun Article
Nature Publishing Group UK 2021-03-12 /pmc/articles/PMC7955126/ /pubmed/33712618 http://dx.doi.org/10.1038/s41467-021-21894-x Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Aptardi predicts polyadenylation sites in sample-specific transcriptomes using high-throughput RNA sequencing and DNA sequence
topic Article