Aptardi predicts polyadenylation sites in sample-specific transcriptomes using high-throughput RNA sequencing and DNA sequence
Main authors: | Lusk, Ryan; Stene, Evan; Banaei-Kashani, Farnoush; Tabakoff, Boris; Kechris, Katerina; Saba, Laura M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7955126/ https://www.ncbi.nlm.nih.gov/pubmed/33712618 http://dx.doi.org/10.1038/s41467-021-21894-x |
author | Lusk, Ryan; Stene, Evan; Banaei-Kashani, Farnoush; Tabakoff, Boris; Kechris, Katerina; Saba, Laura M. |
---|---|
collection | PubMed |
description | Annotation of polyadenylation sites from short-read RNA sequencing alone is a challenging computational task. Other algorithms rooted in DNA sequence predict potential polyadenylation sites; however, in vivo expression of a particular site varies based on a myriad of conditions. Here, we introduce aptardi (alternative polyadenylation transcriptome analysis from RNA-Seq data and DNA sequence information), which leverages both DNA sequence and RNA sequencing in a machine learning paradigm to predict expressed polyadenylation sites. Specifically, as input aptardi takes DNA nucleotide sequence, genome-aligned RNA-Seq data, and an initial transcriptome. The program evaluates these initial transcripts to identify expressed polyadenylation sites in the biological sample and refines transcript 3′-ends accordingly. The average precision of the aptardi model is twice that of a standard transcriptome assembler. In particular, the recall of the aptardi model (the proportion of true polyadenylation sites detected by the algorithm) is improved by over three-fold. Also, the model—trained using the Human Brain Reference RNA commercial standard—performs well when applied to RNA-sequencing samples from different tissues and different mammalian species. Finally, aptardi’s input is simple to compile and its output is easily amenable to downstream analyses such as quantitation and differential expression. |
format | Online Article Text |
id | pubmed-7955126 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-7955126 2021-03-28. Nat Commun, Article. Nature Publishing Group UK, 2021-03-12. /pmc/articles/PMC7955126/ /pubmed/33712618 http://dx.doi.org/10.1038/s41467-021-21894-x Text en. © The Author(s) 2021. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Aptardi predicts polyadenylation sites in sample-specific transcriptomes using high-throughput RNA sequencing and DNA sequence |
topic | Article |