An evaluation of RNA-seq differential analysis methods
RNA-seq is a high-throughput sequencing technology widely used for gene transcript discovery and quantification under different biological or biomedical conditions. A fundamental research question in most RNA-seq experiments is the identification of differentially expressed genes among experimental...
Main Authors: | Li, Dongmei; Zand, Martin S.; Dye, Timothy D.; Goniewicz, Maciej L.; Rahman, Irfan; Xie, Zidian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2022 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9480998/ https://www.ncbi.nlm.nih.gov/pubmed/36112652 http://dx.doi.org/10.1371/journal.pone.0264246 |
_version_ | 1784791163931197440 |
---|---|
author | Li, Dongmei; Zand, Martin S.; Dye, Timothy D.; Goniewicz, Maciej L.; Rahman, Irfan; Xie, Zidian |
author_facet | Li, Dongmei; Zand, Martin S.; Dye, Timothy D.; Goniewicz, Maciej L.; Rahman, Irfan; Xie, Zidian |
author_sort | Li, Dongmei |
collection | PubMed |
description | RNA-seq is a high-throughput sequencing technology widely used for gene transcript discovery and quantification under different biological or biomedical conditions. A fundamental research question in most RNA-seq experiments is the identification of differentially expressed genes among experimental conditions or sample groups. Numerous statistical methods for RNA-seq differential analysis have been proposed since the emergence of the RNA-seq assay. To evaluate popular differential analysis methods used in the open source R and Bioconductor packages, we conducted multiple simulation studies to compare the performance of eight RNA-seq differential analysis methods used in RNA-seq data analysis (edgeR, DESeq, DESeq2, baySeq, EBSeq, NOISeq, SAMSeq, Voom). The comparisons were across different scenarios with either equal or unequal library sizes, different distribution assumptions and sample sizes. We measured performance using false discovery rate (FDR) control, power, and stability. No significant differences were observed for FDR control, power, or stability across methods, whether with equal or unequal library sizes. For RNA-seq count data with negative binomial distribution, when sample size is 3 in each group, EBSeq performed better than the other methods as indicated by FDR control, power, and stability. When sample sizes increase to 6 or 12 in each group, DESeq2 performed slightly better than other methods. All methods have improved performance when sample size increases to 12 in each group except DESeq. For RNA-seq count data with log-normal distribution, both DESeq and DESeq2 methods performed better than other methods in terms of FDR control, power, and stability across all sample sizes. Real RNA-seq experimental data were also used to compare the total number of discoveries and stability of discoveries for each method. For RNA-seq data analysis, the EBSeq method is recommended for studies with sample size as small as 3 in each group, and the DESeq2 method is recommended for sample size of 6 or higher in each group when the data follow the negative binomial distribution. Both DESeq and DESeq2 methods are recommended when the data follow the log-normal distribution. |
format | Online Article Text |
id | pubmed-9480998 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-9480998 2022-09-17 An evaluation of RNA-seq differential analysis methods Li, Dongmei Zand, Martin S. Dye, Timothy D. Goniewicz, Maciej L. Rahman, Irfan Xie, Zidian PLoS One Research Article RNA-seq is a high-throughput sequencing technology widely used for gene transcript discovery and quantification under different biological or biomedical conditions. A fundamental research question in most RNA-seq experiments is the identification of differentially expressed genes among experimental conditions or sample groups. Numerous statistical methods for RNA-seq differential analysis have been proposed since the emergence of the RNA-seq assay. To evaluate popular differential analysis methods used in the open source R and Bioconductor packages, we conducted multiple simulation studies to compare the performance of eight RNA-seq differential analysis methods used in RNA-seq data analysis (edgeR, DESeq, DESeq2, baySeq, EBSeq, NOISeq, SAMSeq, Voom). The comparisons were across different scenarios with either equal or unequal library sizes, different distribution assumptions and sample sizes. We measured performance using false discovery rate (FDR) control, power, and stability. No significant differences were observed for FDR control, power, or stability across methods, whether with equal or unequal library sizes. For RNA-seq count data with negative binomial distribution, when sample size is 3 in each group, EBSeq performed better than the other methods as indicated by FDR control, power, and stability. When sample sizes increase to 6 or 12 in each group, DESeq2 performed slightly better than other methods. All methods have improved performance when sample size increases to 12 in each group except DESeq. For RNA-seq count data with log-normal distribution, both DESeq and DESeq2 methods performed better than other methods in terms of FDR control, power, and stability across all sample sizes. Real RNA-seq experimental data were also used to compare the total number of discoveries and stability of discoveries for each method. For RNA-seq data analysis, the EBSeq method is recommended for studies with sample size as small as 3 in each group, and the DESeq2 method is recommended for sample size of 6 or higher in each group when the data follow the negative binomial distribution. Both DESeq and DESeq2 methods are recommended when the data follow the log-normal distribution. Public Library of Science 2022-09-16 /pmc/articles/PMC9480998/ /pubmed/36112652 http://dx.doi.org/10.1371/journal.pone.0264246 Text en © 2022 Li et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Li, Dongmei Zand, Martin S. Dye, Timothy D. Goniewicz, Maciej L. Rahman, Irfan Xie, Zidian An evaluation of RNA-seq differential analysis methods |
title | An evaluation of RNA-seq differential analysis methods |
title_full | An evaluation of RNA-seq differential analysis methods |
title_fullStr | An evaluation of RNA-seq differential analysis methods |
title_full_unstemmed | An evaluation of RNA-seq differential analysis methods |
title_short | An evaluation of RNA-seq differential analysis methods |
title_sort | evaluation of rna-seq differential analysis methods |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9480998/ https://www.ncbi.nlm.nih.gov/pubmed/36112652 http://dx.doi.org/10.1371/journal.pone.0264246 |