Shark: fishing relevant reads in an RNA-Seq sample

MOTIVATION: Recent advances in high-throughput RNA-Seq technologies make it possible to produce massive datasets. When a study focuses on only a handful of genes, most reads are irrelevant and degrade the performance of the tools used to analyze the data. Removing irrelevant reads from the input dataset lead...

Bibliographic Details
Main authors: Denti, Luca; Pirola, Yuri; Previtali, Marco; Ceccato, Tamara; Della Vedova, Gianluca; Rizzi, Raffaella; Bonizzoni, Paola
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8088329/
https://www.ncbi.nlm.nih.gov/pubmed/32926128
http://dx.doi.org/10.1093/bioinformatics/btaa779
author Denti, Luca
Pirola, Yuri
Previtali, Marco
Ceccato, Tamara
Della Vedova, Gianluca
Rizzi, Raffaella
Bonizzoni, Paola
author_sort Denti, Luca
collection PubMed
description MOTIVATION: Recent advances in high-throughput RNA-Seq technologies make it possible to produce massive datasets. When a study focuses on only a handful of genes, most reads are irrelevant and degrade the performance of the tools used to analyze the data. Removing irrelevant reads from the input dataset improves efficiency without compromising the results of the study. RESULTS: We introduce a novel computational problem, called gene assignment, and propose an efficient alignment-free approach to solve it. Given an RNA-Seq sample and a panel of genes, a gene assignment consists of extracting from the sample the reads that were most probably sequenced from those genes. The problem becomes more complicated when the sample exhibits evidence of novel alternative splicing events. We implemented our approach in a tool called Shark and assessed its effectiveness in speeding up differential splicing analysis pipelines. This evaluation shows that Shark significantly improves the performance of RNA-Seq analysis tools without any impact on the final results. AVAILABILITY AND IMPLEMENTATION: The tool is distributed as a stand-alone module and the software is freely available at https://github.com/AlgoLab/shark. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8088329
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8088329 2021-05-05 Shark: fishing relevant reads in an RNA-Seq sample. Denti, Luca; Pirola, Yuri; Previtali, Marco; Ceccato, Tamara; Della Vedova, Gianluca; Rizzi, Raffaella; Bonizzoni, Paola. Bioinformatics. Original Papers. Oxford University Press 2020-09-14 /pmc/articles/PMC8088329/ /pubmed/32926128 http://dx.doi.org/10.1093/bioinformatics/btaa779 Text en © The Author(s) 2020. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Shark: fishing relevant reads in an RNA-Seq sample
topic Original Papers
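
The abstract in this record defines gene assignment as extracting, from an RNA-Seq sample, the reads most probably sequenced from a given panel of genes, and says only that the approach is alignment-free. As a rough illustration of what such a filter could look like, the sketch below assumes a simple shared k-mer heuristic. It is not Shark's implementation: the function names, the k-mer length `k`, the threshold `min_fraction`, and the toy sequences are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_reads(
    reads: Iterable[Tuple[str, str]],
    genes: Dict[str, str],
    k: int = 5,
    min_fraction: float = 0.6,
) -> Dict[str, List[str]]:
    """Map each gene name to the ids of the reads assigned to it.

    Assumed heuristic (not taken from the paper): a read goes to the gene
    with which it shares the largest fraction of its k-mers, provided that
    fraction is at least min_fraction. All other reads are discarded.
    """
    gene_kmers = {name: kmers(seq, k) for name, seq in genes.items()}
    assignment: Dict[str, List[str]] = {name: [] for name in genes}
    for read_id, read_seq in reads:
        read_kmers = kmers(read_seq, k)
        if not read_kmers:
            continue  # read is shorter than k, nothing to compare
        best_gene, best_score = None, 0.0
        for name, gene_set in gene_kmers.items():
            score = len(read_kmers & gene_set) / len(read_kmers)
            if score > best_score:
                best_gene, best_score = name, score
        if best_gene is not None and best_score >= min_fraction:
            assignment[best_gene].append(read_id)
    return assignment


# Toy data: r1 comes from geneA, r2 from geneB, r3 from neither.
genes = {
    "geneA": "ACGTACGTTAGCCGATAGGCTA",
    "geneB": "TTTTGGGGCCCCAAAATTGGCCAA",
}
reads = [
    ("r1", "ACGTTAGCCGAT"),
    ("r2", "GGGGCCCCAAAA"),
    ("r3", "CATCATCATCAT"),
]
result = assign_reads(reads, genes)
print(result)  # {'geneA': ['r1'], 'geneB': ['r2']}
```

A threshold below 1.0 is one plausible way to keep reads that only partly match a known gene sequence, such as reads spanning a novel splice junction, which the abstract notes makes the problem harder.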