
Benchmarking RNA-seq differential expression analysis methods using spike-in and simulation data

Benchmarking RNA-seq differential expression analysis methods using spike-in and simulated RNA-seq data has often yielded inconsistent results. The spike-in data, which were generated from the same bulk RNA sample, only represent technical variability, making the test results less reliable. We compa...


Bibliographic Details
Main Authors: Baik, Bukyung; Yoon, Sora; Nam, Dougu
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7192453/
https://www.ncbi.nlm.nih.gov/pubmed/32353015
http://dx.doi.org/10.1371/journal.pone.0232271
_version_ 1783528012108529664
author Baik, Bukyung
Yoon, Sora
Nam, Dougu
collection PubMed
description Benchmarking RNA-seq differential expression analysis methods using spike-in and simulated RNA-seq data has often yielded inconsistent results. The spike-in data, which were generated from the same bulk RNA sample, only represent technical variability, making the test results less reliable. We compared the performance of 12 differential expression analysis methods for RNA-seq data, including recent variants in widely used software packages, using both RNA spike-in and simulation data for negative binomial (NB) model. Performance of edgeR, DESeq2, and ROTS was particularly different between the two benchmark tests. Then, each method was tested under most extensive simulation conditions especially demonstrating the large impacts of proportion, dispersion, and balance of differentially expressed (DE) genes. DESeq2, a robust version of edgeR (edgeR.rb), voom with TMM normalization (voom.tmm) and sample weights (voom.sw) showed an overall good performance regardless of presence of outliers and proportion of DE genes. The performance of RNA-seq DE gene analysis methods substantially depended on the benchmark used. Based on the simulation results, suitable methods were suggested under various test conditions.
format Online
Article
Text
id pubmed-7192453
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-71924532020-05-11 Benchmarking RNA-seq differential expression analysis methods using spike-in and simulation data Baik, Bukyung Yoon, Sora Nam, Dougu PLoS One Research Article Benchmarking RNA-seq differential expression analysis methods using spike-in and simulated RNA-seq data has often yielded inconsistent results. The spike-in data, which were generated from the same bulk RNA sample, only represent technical variability, making the test results less reliable. We compared the performance of 12 differential expression analysis methods for RNA-seq data, including recent variants in widely used software packages, using both RNA spike-in and simulation data for negative binomial (NB) model. Performance of edgeR, DESeq2, and ROTS was particularly different between the two benchmark tests. Then, each method was tested under most extensive simulation conditions especially demonstrating the large impacts of proportion, dispersion, and balance of differentially expressed (DE) genes. DESeq2, a robust version of edgeR (edgeR.rb), voom with TMM normalization (voom.tmm) and sample weights (voom.sw) showed an overall good performance regardless of presence of outliers and proportion of DE genes. The performance of RNA-seq DE gene analysis methods substantially depended on the benchmark used. Based on the simulation results, suitable methods were suggested under various test conditions. Public Library of Science 2020-04-30 /pmc/articles/PMC7192453/ /pubmed/32353015 http://dx.doi.org/10.1371/journal.pone.0232271 Text en © 2020 Baik et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Benchmarking RNA-seq differential expression analysis methods using spike-in and simulation data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7192453/
https://www.ncbi.nlm.nih.gov/pubmed/32353015
http://dx.doi.org/10.1371/journal.pone.0232271