
An evaluation of RNA-seq differential analysis methods

RNA-seq is a high-throughput sequencing technology widely used for gene transcript discovery and quantification under different biological or biomedical conditions. A fundamental research question in most RNA-seq experiments is the identification of differentially expressed genes among experimental...


Bibliographic Details
Main authors: Li, Dongmei; Zand, Martin S.; Dye, Timothy D.; Goniewicz, Maciej L.; Rahman, Irfan; Xie, Zidian
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9480998/
https://www.ncbi.nlm.nih.gov/pubmed/36112652
http://dx.doi.org/10.1371/journal.pone.0264246
author Li, Dongmei
Zand, Martin S.
Dye, Timothy D.
Goniewicz, Maciej L.
Rahman, Irfan
Xie, Zidian
collection PubMed
description RNA-seq is a high-throughput sequencing technology widely used for gene transcript discovery and quantification under different biological or biomedical conditions. A fundamental research question in most RNA-seq experiments is the identification of differentially expressed genes among experimental conditions or sample groups. Numerous statistical methods for RNA-seq differential analysis have been proposed since the emergence of the RNA-seq assay. To evaluate popular differential analysis methods used in the open source R and Bioconductor packages, we conducted multiple simulation studies to compare the performance of eight RNA-seq differential analysis methods used in RNA-seq data analysis (edgeR, DESeq, DESeq2, baySeq, EBSeq, NOISeq, SAMSeq, Voom). The comparisons were across different scenarios with either equal or unequal library sizes, different distribution assumptions and sample sizes. We measured performance using false discovery rate (FDR) control, power, and stability. No significant differences were observed for FDR control, power, or stability across methods, whether with equal or unequal library sizes. For RNA-seq count data with negative binomial distribution, when sample size is 3 in each group, EBSeq performed better than the other methods as indicated by FDR control, power, and stability. When sample sizes increase to 6 or 12 in each group, DESeq2 performed slightly better than other methods. All methods have improved performance when sample size increases to 12 in each group except DESeq. For RNA-seq count data with log-normal distribution, both DESeq and DESeq2 methods performed better than other methods in terms of FDR control, power, and stability across all sample sizes. Real RNA-seq experimental data were also used to compare the total number of discoveries and stability of discoveries for each method. For RNA-seq data analysis, the EBSeq method is recommended for studies with sample size as small as 3 in each group, and the DESeq2 method is recommended for sample size of 6 or higher in each group when the data follow the negative binomial distribution. Both DESeq and DESeq2 methods are recommended when the data follow the log-normal distribution.
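The description above reports FDR control and power measured in simulation studies, where the truly differentially expressed genes are known in advance. The sketch below shows how these two quantities are commonly computed for a single simulated data set. It is an illustration under stated assumptions, not the authors' code: the function name `fdr_and_power` and the gene labels are hypothetical, and this record does not give the paper's exact definitions or its stability measure.

```python
def fdr_and_power(called, truly_de):
    """Return (empirical FDR, empirical power) for one simulated data set.

    called   -- genes a method reported as differentially expressed
    truly_de -- genes simulated as truly differentially expressed
    (Both arguments are hypothetical inputs for illustration.)
    """
    called, truly_de = set(called), set(truly_de)
    true_pos = len(called & truly_de)   # correct discoveries
    false_pos = len(called - truly_de)  # false discoveries
    # Convention assumed here: FDR is 0 when nothing is called.
    fdr = false_pos / len(called) if called else 0.0
    power = true_pos / len(truly_de) if truly_de else 0.0
    return fdr, power


# Made-up example: 4 truly DE genes, 5 genes called by some method.
truly_de = {"g1", "g2", "g3", "g4"}
called = {"g1", "g2", "g3", "g7", "g9"}
fdr, power = fdr_and_power(called, truly_de)
print(f"FDR = {fdr:.2f}, power = {power:.2f}")  # FDR = 0.40, power = 0.75
```

In a simulation study, these per-data-set values would typically be averaged over many replicates and compared with the nominal FDR level.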
format Online
Article
Text
id pubmed-9480998
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9480998 2022-09-17 An evaluation of RNA-seq differential analysis methods. Li, Dongmei; Zand, Martin S.; Dye, Timothy D.; Goniewicz, Maciej L.; Rahman, Irfan; Xie, Zidian. PLoS One. Research Article. Public Library of Science 2022-09-16 /pmc/articles/PMC9480998/ /pubmed/36112652 http://dx.doi.org/10.1371/journal.pone.0264246 Text en © 2022 Li et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title An evaluation of RNA-seq differential analysis methods
topic Research Article