Empirical assessment of the impact of sample number and read depth on RNA-Seq analysis workflow performance

BACKGROUND: RNA-Sequencing analysis methods are rapidly evolving, and the tool choice for each step of one common workflow, differential expression analysis, which includes read alignment, expression modeling, and differentially expressed gene identification, has a dramatic impact on performance cha...

Bibliographic Details
Main Authors:	Baccarella, Alyssa; Williams, Claire R.; Parrish, Jay Z.; Kim, Charles C.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6234607/ https://www.ncbi.nlm.nih.gov/pubmed/30428853 http://dx.doi.org/10.1186/s12859-018-2445-2

_version_	1783370729961553920
author	Baccarella, Alyssa; Williams, Claire R.; Parrish, Jay Z.; Kim, Charles C.
author_facet	Baccarella, Alyssa; Williams, Claire R.; Parrish, Jay Z.; Kim, Charles C.
author_sort	Baccarella, Alyssa
collection	PubMed
description	BACKGROUND: RNA-Sequencing analysis methods are rapidly evolving, and the tool choice for each step of one common workflow, differential expression analysis, which includes read alignment, expression modeling, and differentially expressed gene identification, has a dramatic impact on performance characteristics. Although a number of workflows are emerging as high performers that are robust to diverse input types, the relative performance characteristics of these workflows when either read depth or sample number is limited, a common occurrence in real-world practice, remain unexplored.
	RESULTS: Here, we evaluate the impact of varying read depth and sample number on the performance of differential gene expression identification workflows, as measured by precision, or the fraction of genes correctly identified as differentially expressed, and by recall, or the fraction of differentially expressed genes identified. We focus our analysis on 30 high-performing workflows, systematically varying the read depth and number of biological replicates of patient monocyte samples provided as input. We find that, for most workflows, read depth has little effect on workflow performance when held above two million reads per sample, with reduced workflow performance below this threshold. The greatest impact of decreased sample number is seen below seven samples per group, when more heterogeneity in workflow performance is observed. The choice of differential expression identification tool, in particular, has a large impact on the response to limited inputs.
	CONCLUSIONS: Among the tested workflows, the recall/precision balance remains relatively stable at a range of read depths and sample numbers, although some workflows are more sensitive to input restriction. At ranges typically recommended for biological studies, performance is more greatly impacted by the number of biological replicates than by read depth. Caution should be used when selecting analysis workflows and interpreting results from low sample number experiments, as all workflows exhibit poorer performance at lower sample numbers near typically reported values, with variable impact on recall versus precision. These analyses highlight the performance characteristics of common differential gene expression workflows at varying read depths and sample numbers, and provide empirical guidance in experimental and analytical design.
	ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2445-2) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-6234607
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6234607 2018-11-23 Empirical assessment of the impact of sample number and read depth on RNA-Seq analysis workflow performance Baccarella, Alyssa; Williams, Claire R.; Parrish, Jay Z.; Kim, Charles C. BMC Bioinformatics Research Article BioMed Central 2018-11-14 /pmc/articles/PMC6234607/ /pubmed/30428853 http://dx.doi.org/10.1186/s12859-018-2445-2 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Article Baccarella, Alyssa Williams, Claire R. Parrish, Jay Z. Kim, Charles C. Empirical assessment of the impact of sample number and read depth on RNA-Seq analysis workflow performance
title	Empirical assessment of the impact of sample number and read depth on RNA-Seq analysis workflow performance
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6234607/ https://www.ncbi.nlm.nih.gov/pubmed/30428853 http://dx.doi.org/10.1186/s12859-018-2445-2
work_keys_str_mv	AT baccarellaalyssa empiricalassessmentoftheimpactofsamplenumberandreaddepthonrnaseqanalysisworkflowperformance AT williamsclairer empiricalassessmentoftheimpactofsamplenumberandreaddepthonrnaseqanalysisworkflowperformance AT parrishjayz empiricalassessmentoftheimpactofsamplenumberandreaddepthonrnaseqanalysisworkflowperformance AT kimcharlesc empiricalassessmentoftheimpactofsamplenumberandreaddepthonrnaseqanalysisworkflowperformance
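
The description field defines precision as the fraction of genes called differentially expressed that are truly differentially expressed, and recall as the fraction of truly differentially expressed genes that are called. As a minimal sketch of those two definitions (not the authors' code), the Python below computes both from a set of called genes and a reference set. The function name `precision_recall` and the gene identifiers are hypothetical and chosen only for illustration.

```python
def precision_recall(called, truth):
    """Return (precision, recall) for called DE genes against a reference set.

    called: genes a workflow reports as differentially expressed
    truth:  genes treated as truly differentially expressed (the reference)
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    # Precision: share of called genes that are in the reference set.
    precision = true_positives / len(called) if called else 0.0
    # Recall: share of reference genes that were called.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical gene identifiers, for illustration only.
called = {"geneA", "geneB", "geneC", "geneD"}
truth = {"geneA", "geneB", "geneE"}

precision, recall = precision_recall(called, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.50 recall=0.67
```

Here two of the four called genes are in the reference set (precision 0.50), and two of the three reference genes are recovered (recall 0.67). Returning 0.0 for an empty input set is a convention assumed for this sketch, not something the record specifies.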
