
How well do RNA-Seq differential gene expression tools perform in a complex eukaryote? A case study in Arabidopsis thaliana

MOTIVATION: RNA-seq experiments are usually carried out in three or fewer replicates. In order to work well with so few samples, differential gene expression (DGE) tools typically assume the form of the underlying gene expression distribution. In this paper, the statistical properties of gene expres...

Bibliographic Details
Main authors: Froussios, Kimon; Schurch, Nick J; Mackinnon, Katarzyna; Gierliński, Marek; Duc, Céline; Simpson, Gordon G; Barton, Geoffrey J
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6748783/
https://www.ncbi.nlm.nih.gov/pubmed/30726870
http://dx.doi.org/10.1093/bioinformatics/btz089
_version_ 1783452148011368448
author Froussios, Kimon
Schurch, Nick J
Mackinnon, Katarzyna
Gierliński, Marek
Duc, Céline
Simpson, Gordon G
Barton, Geoffrey J
author_sort Froussios, Kimon
collection PubMed
description MOTIVATION: RNA-seq experiments are usually carried out with three or fewer replicates. To work well with so few samples, differential gene expression (DGE) tools typically assume the form of the underlying gene expression distribution. In this paper, the statistical properties of gene expression from RNA-seq are investigated in the complex eukaryote Arabidopsis thaliana, extending and generalizing the results of previous work in the simple eukaryote Saccharomyces cerevisiae. RESULTS: We show that, as in S. cerevisiae, more gene expression measurements in A. thaliana are consistent with being drawn from an underlying negative binomial distribution than from either a log-normal or a normal distribution, and that the size and complexity of the A. thaliana transcriptome do not influence the false positive rate performance of the nine widely used DGE tools tested here. We therefore recommend the use of DGE tools that are based on the negative binomial distribution. AVAILABILITY AND IMPLEMENTATION: The raw data for the 17 WT Arabidopsis thaliana datasets are available from the European Nucleotide Archive (E-MTAB-5446). The processed and aligned data can be visualized in context using IGB (Freese et al., 2016), or downloaded directly, using our publicly available IGB quickload server at https://compbio.lifesci.dundee.ac.uk/arabidopsisQuickload/public_quickload/ under ‘RNAseq>Froussios2019’. All scripts and commands are available from GitHub at https://github.com/bartongroup/KF_arabidopsis-GRNA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6748783
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6748783 2019-09-23 Bioinformatics Original Papers
Oxford University Press 2019-09-15 2019-02-06 /pmc/articles/PMC6748783/ /pubmed/30726870 http://dx.doi.org/10.1093/bioinformatics/btz089 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title How well do RNA-Seq differential gene expression tools perform in a complex eukaryote? A case study in Arabidopsis thaliana
title_sort how well do rna-seq differential gene expression tools perform in a complex eukaryote? a case study in arabidopsis thaliana
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6748783/
https://www.ncbi.nlm.nih.gov/pubmed/30726870
http://dx.doi.org/10.1093/bioinformatics/btz089