SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses
BACKGROUND: RNA sequencing (RNA-seq) is a common and widespread biological assay, and an increasing amount of data is generated with it. In practice, there are a large number of individual steps a researcher must perform before raw RNA-seq reads yield directly valuable information, such as different...
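Main Authors: | Eagles, Nicholas J.; Burke, Emily E.; Leonard, Jacob; Barry, Brianna K.; Stolz, Joshua M.; Huuki, Louise; Phan, BaDoi N.; Serrato, Violeta Larios; Gutiérrez-Millán, Everardo; Aguilar-Ordoñez, Israel; Jaffe, Andrew E.; Collado-Torres, Leonardo |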
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8088074/ https://www.ncbi.nlm.nih.gov/pubmed/33932985 http://dx.doi.org/10.1186/s12859-021-04142-3 |
_version_ | 1783686782501519360 |
---|---|
author | Eagles, Nicholas J. Burke, Emily E. Leonard, Jacob Barry, Brianna K. Stolz, Joshua M. Huuki, Louise Phan, BaDoi N. Serrato, Violeta Larios Gutiérrez-Millán, Everardo Aguilar-Ordoñez, Israel Jaffe, Andrew E. Collado-Torres, Leonardo |
author_facet | Eagles, Nicholas J. Burke, Emily E. Leonard, Jacob Barry, Brianna K. Stolz, Joshua M. Huuki, Louise Phan, BaDoi N. Serrato, Violeta Larios Gutiérrez-Millán, Everardo Aguilar-Ordoñez, Israel Jaffe, Andrew E. Collado-Torres, Leonardo |
author_sort | Eagles, Nicholas J. |
collection | PubMed |
description | BACKGROUND: RNA sequencing (RNA-seq) is a common and widespread biological assay, and an increasing amount of data is generated with it. In practice, there are a large number of individual steps a researcher must perform before raw RNA-seq reads yield directly valuable information, such as differential gene expression data. Existing software tools are typically specialized, only performing one step–such as alignment of reads to a reference genome–of a larger workflow. The demand for a more comprehensive and reproducible workflow has led to the production of a number of publicly available RNA-seq pipelines. However, we have found that most require computational expertise to set up or share among several users, are not actively maintained, or lack features we have found to be important in our own analyses. RESULTS: In response to these concerns, we have developed a Scalable Pipeline for Expression Analysis and Quantification (SPEAQeasy), which is easy to install and share, and provides a bridge towards R/Bioconductor downstream analysis solutions. SPEAQeasy is portable across computational frameworks (SGE, SLURM, local, docker integration) and different configuration files are provided (http://research.libd.org/SPEAQeasy/). CONCLUSIONS: SPEAQeasy is user-friendly and lowers the computational-domain entry barrier for biologists and clinicians to RNA-seq data processing as the main input file is a table with sample names and their corresponding FASTQ files. The goal is to provide a flexible pipeline that is immediately usable by researchers, regardless of their technical background or computing environment. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04142-3. |
format | Online Article Text |
id | pubmed-8088074 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8088074 2021-05-03 SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses Eagles, Nicholas J. Burke, Emily E. Leonard, Jacob Barry, Brianna K. Stolz, Joshua M. Huuki, Louise Phan, BaDoi N. Serrato, Violeta Larios Gutiérrez-Millán, Everardo Aguilar-Ordoñez, Israel Jaffe, Andrew E. Collado-Torres, Leonardo BMC Bioinformatics Software BACKGROUND: RNA sequencing (RNA-seq) is a common and widespread biological assay, and an increasing amount of data is generated with it. In practice, there are a large number of individual steps a researcher must perform before raw RNA-seq reads yield directly valuable information, such as differential gene expression data. Existing software tools are typically specialized, only performing one step–such as alignment of reads to a reference genome–of a larger workflow. The demand for a more comprehensive and reproducible workflow has led to the production of a number of publicly available RNA-seq pipelines. However, we have found that most require computational expertise to set up or share among several users, are not actively maintained, or lack features we have found to be important in our own analyses. RESULTS: In response to these concerns, we have developed a Scalable Pipeline for Expression Analysis and Quantification (SPEAQeasy), which is easy to install and share, and provides a bridge towards R/Bioconductor downstream analysis solutions. SPEAQeasy is portable across computational frameworks (SGE, SLURM, local, docker integration) and different configuration files are provided (http://research.libd.org/SPEAQeasy/). CONCLUSIONS: SPEAQeasy is user-friendly and lowers the computational-domain entry barrier for biologists and clinicians to RNA-seq data processing as the main input file is a table with sample names and their corresponding FASTQ files. The goal is to provide a flexible pipeline that is immediately usable by researchers, regardless of their technical background or computing environment. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04142-3. BioMed Central 2021-05-01 /pmc/articles/PMC8088074/ /pubmed/33932985 http://dx.doi.org/10.1186/s12859-021-04142-3 Text en © The Author(s) 2021, corrected publication 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Software Eagles, Nicholas J. Burke, Emily E. Leonard, Jacob Barry, Brianna K. Stolz, Joshua M. Huuki, Louise Phan, BaDoi N. Serrato, Violeta Larios Gutiérrez-Millán, Everardo Aguilar-Ordoñez, Israel Jaffe, Andrew E. Collado-Torres, Leonardo SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses |
title | SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses |
title_full | SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses |
title_fullStr | SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses |
title_full_unstemmed | SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses |
title_short | SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses |
title_sort | speaqeasy: a scalable pipeline for expression analysis and quantification for r/bioconductor-powered rna-seq analyses |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8088074/ https://www.ncbi.nlm.nih.gov/pubmed/33932985 http://dx.doi.org/10.1186/s12859-021-04142-3 |
work_keys_str_mv | AT eaglesnicholasj speaqeasyascalablepipelineforexpressionanalysisandquantificationforrbioconductorpoweredrnaseqanalyses AT burkeemilye speaqeasyascalablepipelineforexpressionanalysisandquantificationforrbioconductorpoweredrnaseqanalyses AT leonardjacob speaqeasyascalablepipelineforexpressionanalysisandquantificationforrbioconductorpoweredrnaseqanalyses AT barrybriannak speaqeasyascalablepipelineforexpressionanalysisandquantificationforrbioconductorpoweredrnaseqanalyses AT stolzjoshuam speaqeasyascalablepipelineforexpressionanalysisandquantificationforrbioconductorpoweredrnaseqanalyses AT huukilouise speaqeasyascalablepipelineforexpressionanalysisandquantificationforrbioconductorpoweredrnaseqanalyses AT phanbadoin speaqeasyascalablepipelineforexpressionanalysisandquantificationforrbioconductorpoweredrnaseqanalyses AT serratovioletalarios speaqeasyascalablepipelineforexpressionanalysisandquantificationforrbioconductorpoweredrnaseqanalyses AT gutierrezmillaneverardo speaqeasyascalablepipelineforexpressionanalysisandquantificationforrbioconductorpoweredrnaseqanalyses AT aguilarordonezisrael speaqeasyascalablepipelineforexpressionanalysisandquantificationforrbioconductorpoweredrnaseqanalyses AT jaffeandrewe speaqeasyascalablepipelineforexpressionanalysisandquantificationforrbioconductorpoweredrnaseqanalyses AT colladotorresleonardo speaqeasyascalablepipelineforexpressionanalysisandquantificationforrbioconductorpoweredrnaseqanalyses |
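
The abstract states that the main input file is a table with sample names and their corresponding FASTQ files. As an illustration only, the sketch below assembles such a table from a list of file names. The `_R1`/`_R2` naming convention, the column names, the tab-separated layout, and the helper names `build_sample_table` and `write_sample_table` are all assumptions made for this example; none of them is taken from this record or from the SPEAQeasy documentation, which should be consulted for the actual input format.

```python
import csv
import io
import re
from collections import defaultdict

# Assumed (hypothetical) naming convention: <sample>_R1.fastq[.gz] / <sample>_R2.fastq[.gz]
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq(\.gz)?$")


def build_sample_table(filenames):
    """Group FASTQ file names by sample and return (sample, read1, read2) rows."""
    reads = defaultdict(dict)
    for name in filenames:
        match = FASTQ_PATTERN.match(name)
        if match is None:
            continue  # ignore files that do not follow the assumed convention
        reads[match["sample"]][match["read"]] = name
    # Missing mates are left as empty strings (e.g. single-end samples).
    return [
        (sample, reads[sample].get("1", ""), reads[sample].get("2", ""))
        for sample in sorted(reads)
    ]


def write_sample_table(rows, handle):
    """Write the rows as a tab-separated table with a header line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample", "read1", "read2"])
    writer.writerows(rows)


files = ["liver_R2.fastq.gz", "brain_R1.fastq.gz", "liver_R1.fastq.gz", "notes.txt"]
rows = build_sample_table(files)
buffer = io.StringIO()
write_sample_table(rows, buffer)
print(buffer.getvalue(), end="")
```

Keeping one row per sample makes it straightforward to spot samples with a missing mate file before any processing is started.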