
SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/Bioconductor-powered RNA-seq analyses

BACKGROUND: RNA sequencing (RNA-seq) is a common and widespread biological assay, and an increasing amount of data is generated with it. In practice, there are a large number of individual steps a researcher must perform before raw RNA-seq reads yield directly valuable information, such as different...


Bibliographic Details
Main Authors: Eagles, Nicholas J., Burke, Emily E., Leonard, Jacob, Barry, Brianna K., Stolz, Joshua M., Huuki, Louise, Phan, BaDoi N., Serrato, Violeta Larios, Gutiérrez-Millán, Everardo, Aguilar-Ordoñez, Israel, Jaffe, Andrew E., Collado-Torres, Leonardo
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8088074/
https://www.ncbi.nlm.nih.gov/pubmed/33932985
http://dx.doi.org/10.1186/s12859-021-04142-3
author Eagles, Nicholas J.
Burke, Emily E.
Leonard, Jacob
Barry, Brianna K.
Stolz, Joshua M.
Huuki, Louise
Phan, BaDoi N.
Serrato, Violeta Larios
Gutiérrez-Millán, Everardo
Aguilar-Ordoñez, Israel
Jaffe, Andrew E.
Collado-Torres, Leonardo
collection PubMed
description BACKGROUND: RNA sequencing (RNA-seq) is a common and widespread biological assay, and an increasing amount of data is generated with it. In practice, there are a large number of individual steps a researcher must perform before raw RNA-seq reads yield directly valuable information, such as differential gene expression data. Existing software tools are typically specialized, performing only one step of a larger workflow, such as alignment of reads to a reference genome. The demand for a more comprehensive and reproducible workflow has led to the production of a number of publicly available RNA-seq pipelines. However, we have found that most require computational expertise to set up or share among several users, are not actively maintained, or lack features we have found to be important in our own analyses. RESULTS: In response to these concerns, we have developed a Scalable Pipeline for Expression Analysis and Quantification (SPEAQeasy), which is easy to install and share, and provides a bridge towards R/Bioconductor downstream analysis solutions. SPEAQeasy is portable across computational frameworks (SGE, SLURM, local, Docker integration) and different configuration files are provided (http://research.libd.org/SPEAQeasy/). CONCLUSIONS: SPEAQeasy is user-friendly and lowers the computational-domain entry barrier for biologists and clinicians to RNA-seq data processing, as the main input file is a table with sample names and their corresponding FASTQ files. The goal is to provide a flexible pipeline that is immediately usable by researchers, regardless of their technical background or computing environment. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04142-3.
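The abstract notes that the main input is a table of sample names and their corresponding FASTQ files. As a rough illustration only, the Python sketch below groups FASTQ paths by sample and writes one tab-separated row per sample. The file-naming convention (`<sample>_1` / `<sample>_2` for paired-end reads), the column order, and the function names `build_sample_table` and `write_sample_table` are all assumptions made for this example; they are not SPEAQeasy's documented input format, which should be taken from the project documentation.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical naming convention (an assumption, not SPEAQeasy's spec):
#   paired-end: <sample>_1.fastq.gz and <sample>_2.fastq.gz
#   single-end: <sample>.fastq.gz
FASTQ_PATTERN = re.compile(
    r"^(?P<sample>.+?)(?:_(?P<mate>[12]))?\.(?:fastq|fq)(?:\.gz)?$"
)


def build_sample_table(fastq_paths):
    """Group FASTQ paths by sample name; return sorted (sample, files) rows."""
    by_sample = defaultdict(list)
    for path in fastq_paths:
        filename = path.rsplit("/", 1)[-1]
        match = FASTQ_PATTERN.match(filename)
        if match is None:
            raise ValueError(f"not a FASTQ file name: {path}")
        by_sample[match.group("sample")].append(path)
    # Sort samples and their files so the table is reproducible.
    return [(sample, sorted(paths)) for sample, paths in sorted(by_sample.items())]


def write_sample_table(rows, handle):
    """Write one tab-separated line per sample: name, then its FASTQ files."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for sample, paths in rows:
        writer.writerow([sample, *paths])


if __name__ == "__main__":
    example = [
        "data/sampleA_2.fastq.gz",
        "data/sampleA_1.fastq.gz",
        "data/sampleB.fastq.gz",
    ]
    buffer = io.StringIO()
    write_sample_table(build_sample_table(example), buffer)
    print(buffer.getvalue(), end="")
```

Sorting both the samples and each sample's files keeps the resulting table identical across runs, which matters when the table is shared among several users.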
format Online
Article
Text
id pubmed-8088074
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8088074 2021-05-03 BMC Bioinformatics Software BioMed Central 2021-05-01 /pmc/articles/PMC8088074/ /pubmed/33932985 http://dx.doi.org/10.1186/s12859-021-04142-3 Text en © The Author(s) 2021, corrected publication 2021. https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/Bioconductor-powered RNA-seq analyses
topic Software