Comparative studies of differential gene calling using RNA-Seq data

Bibliographic Details
Main authors: Zheng, Ximeng; Moriyama, Etsuko N
Format: Online Article Text
Language: English
Published: BioMed Central, 2013
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3891352/
https://www.ncbi.nlm.nih.gov/pubmed/24267181
http://dx.doi.org/10.1186/1471-2105-14-S13-S7

BACKGROUND: With the massive amount of data it produces, gene-expression profiling by RNA-Seq has many advantages over microarray experiments. RNA-Seq analysis, however, is fundamentally different from microarray data analysis, so techniques developed for microarray data are not directly applicable to digital gene-expression data. Over the past few years, several statistical methods have been developed specifically for identifying differentially expressed genes from RNA-Seq data.

RESULTS: In this study, we examined the performance of differential gene-calling methods on RNA-Seq data in practical situations. We focused on two representative methods: a parametric method, DESeq, and a nonparametric method, NOISeq, and evaluated them on both simulated and real datasets. Our simulation followed the RNA-Seq process and produced more realistic short-read data. Both DESeq and NOISeq identified over-expressed genes more accurately than under-expressed genes. DESeq was more likely to call longer genes differentially expressed than shorter ones, whereas NOISeq showed no such bias. As the underlying variation increased, both methods showed higher false-positive rates. When replicates were not available, both methods showed lower true-positive rates and higher false-positive rates.

CONCLUSIONS: The level of variation clearly affected the performance of both methods. This shows the importance of understanding the variation in the data and of including replicates in RNA-Seq experiments. We showed that combining the results of the two methods can improve differential gene calling, and we suggest strategies for using the two methods individually or in combination according to the characteristics of the data.
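
The abstract reports true- and false-positive rates measured against simulated data and says that combining the two methods' results can improve calls. The Python below is a minimal sketch under stated assumptions. It scores a set of differential-expression (DE) calls against a known truth set, as is possible only with simulated data, and it combines two call sets by intersection or union. The functions `rates` and `combine`, the gene IDs, and the call sets are all hypothetical. Intersection and union are generic choices and are not necessarily the combination strategy the authors used. The sketch does not run DESeq or NOISeq.

```python
# Hypothetical sketch: scoring and combining two methods' DE calls.
# Assumes each method has already produced a set of gene IDs called DE and
# that the truly DE genes are known (as in a simulation). All data is invented.
from __future__ import annotations


def rates(called: set[str], truly_de: set[str], all_genes: set[str]) -> tuple[float, float]:
    """Return (true-positive rate, false-positive rate) for a set of DE calls."""
    truly_not_de = all_genes - truly_de
    tp = len(called & truly_de)      # DE genes correctly called
    fp = len(called & truly_not_de)  # non-DE genes wrongly called
    tpr = tp / len(truly_de) if truly_de else 0.0
    fpr = fp / len(truly_not_de) if truly_not_de else 0.0
    return tpr, fpr


def combine(calls_a: set[str], calls_b: set[str], mode: str = "intersection") -> set[str]:
    """Combine two call sets (hypothetical rule: plain intersection or union)."""
    if mode == "intersection":
        return calls_a & calls_b  # keep genes both methods agree on
    if mode == "union":
        return calls_a | calls_b  # keep genes either method calls
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # Invented toy data: 10 genes, of which g0-g3 are truly DE.
    genes = {f"g{i}" for i in range(10)}
    truth = {"g0", "g1", "g2", "g3"}
    method_a = {"g0", "g1", "g2", "g5", "g6"}  # hypothetical DESeq-like calls
    method_b = {"g0", "g1", "g3", "g6", "g7"}  # hypothetical NOISeq-like calls

    candidates = {
        "method A": method_a,
        "method B": method_b,
        "intersection": combine(method_a, method_b, "intersection"),
        "union": combine(method_a, method_b, "union"),
    }
    for label, calls in candidates.items():
        tpr, fpr = rates(calls, truth, genes)
        print(f"{label:>12}: TPR={tpr:.2f}  FPR={fpr:.2f}")
```

On this toy data, each method alone reaches TPR 3/4 and FPR 2/6. The intersection lowers the false-positive rate to 1/6 but also lowers the true-positive rate to 2/4. The union does the reverse, with TPR 4/4 and FPR 3/6. Which trade-off is preferable depends on the data, which is consistent with the abstract's point about choosing a strategy according to the data's characteristics.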

Journal: BMC Bioinformatics (Research article), BioMed Central, 2013-10-01
Copyright © 2013 Zheng and Moriyama; licensee BioMed Central Ltd.