
Grouped False-Discovery Rate for Removing the Gene-Set-Level Bias of RNA-seq

In recent years, RNA-seq has become a very competitive alternative to microarrays. In RNA-seq experiments, the expected read count for a gene is proportional to its expression level multiplied by its transcript length. Even when two genes are expressed at the same level, differences in length will y...


Bibliographic Details
Main Authors: Yang, Tae Young; Jeong, Seongmun
Format: Online Article Text
Language: English
Published: Libertas Academica 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3836564/
https://www.ncbi.nlm.nih.gov/pubmed/24277981
http://dx.doi.org/10.4137/EBO.S13099
_version_ 1782292320912146432
author Yang, Tae Young
Jeong, Seongmun
collection PubMed
description In recent years, RNA-seq has become a very competitive alternative to microarrays. In RNA-seq experiments, the expected read count for a gene is proportional to its expression level multiplied by its transcript length. Even when two genes are expressed at the same level, differences in length will yield differing numbers of total reads. These characteristics of RNA-seq experiments create a gene-level bias: the proportion of significantly differentially expressed genes increases with transcript length, whereas no such bias is present in microarray data. Gene-set analysis seeks to identify the gene sets that are enriched in the list of identified significant genes. In the gene-set analysis of RNA-seq data, the gene-level bias in turn yields a gene-set-level bias: a gene set composed of long genes is more likely to show up as enriched than a gene set composed of shorter genes. Because a gene's expression is not related to its transcript length, a gene set containing long genes is of no greater biological interest than gene sets with shorter genes. Accordingly, the gene-set-level bias should be removed to accurately calculate the statistical significance of each gene-set enrichment in RNA-seq data. We present a new gene-set analysis method for RNA-seq, called FDRseq, which can accurately calculate the statistical significance of a gene-set enrichment score using the grouped false-discovery rate. Numerical examples indicated that FDRseq is appropriate for controlling the transcript length bias in the gene-set analysis of RNA-seq data. To implement FDRseq, we developed an R program, which can be downloaded at no cost from http://home.mju.ac.kr/home/index.action?siteId=tyang.
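The two ideas in the description above (read counts that scale with transcript length, and a false-discovery rate computed within groups) can be illustrated with a minimal sketch. This is not the authors' FDRseq R program. The function names, the `depth` parameter, and the choice to group genes by transcript-length bin and apply the standard Benjamini-Hochberg adjustment within each bin are all assumptions made here for illustration only.

```python
from collections import defaultdict


def expected_reads(expression, length, depth=1.0):
    """Expected read count, proportional to expression level times
    transcript length. `depth` is a hypothetical scaling constant."""
    return depth * expression * length


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def grouped_bh(pvalues, groups):
    """Apply BH separately within each group (assumed here to be a
    transcript-length bin), so long and short genes are not pooled."""
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    adjusted = [0.0] * len(pvalues)
    for idxs in by_group.values():
        for i, value in zip(idxs, bh_adjust([pvalues[i] for i in idxs])):
            adjusted[i] = value
    return adjusted


# Two genes at the same expression level but different lengths.
short_gene = expected_reads(expression=10, length=1000)
long_gene = expected_reads(expression=10, length=4000)

# Hypothetical p-values for four genes, two per length bin.
pvals = [0.01, 0.04, 0.03, 0.02]
bins = ["short", "short", "long", "long"]
pooled = bh_adjust(pvals)
grouped = grouped_bh(pvals, bins)
```

In this sketch the longer gene receives four times the expected reads of the shorter one despite identical expression, which is the source of the length bias. Adjusting within length bins is one simple way to keep long genes from dominating the significance ranking; the published method may define its groups and enrichment scores differently.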
format Online
Article
Text
id pubmed-3836564
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-3836564 2013-11-25 Grouped False-Discovery Rate for Removing the Gene-Set-Level Bias of RNA-seq Yang, Tae Young Jeong, Seongmun Evol Bioinform Online Methodology
Libertas Academica 2013-11-13 /pmc/articles/PMC3836564/ /pubmed/24277981 http://dx.doi.org/10.4137/EBO.S13099 Text en © 2013 the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article published under the Creative Commons CC-BY-NC 3.0 license.
spellingShingle Methodology
Yang, Tae Young
Jeong, Seongmun
Grouped False-Discovery Rate for Removing the Gene-Set-Level Bias of RNA-seq
title Grouped False-Discovery Rate for Removing the Gene-Set-Level Bias of RNA-seq
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3836564/
https://www.ncbi.nlm.nih.gov/pubmed/24277981
http://dx.doi.org/10.4137/EBO.S13099