Grouped False-Discovery Rate for Removing the Gene-Set-Level Bias of RNA-seq
In recent years, RNA-seq has become a very competitive alternative to microarrays. In RNA-seq experiments, the expected read count for a gene is proportional to its expression level multiplied by its transcript length. Even when two genes are expressed at the same level, differences in length will y...
Main Authors: | Yang, Tae Young, Jeong, Seongmun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Libertas Academica 2013 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3836564/ https://www.ncbi.nlm.nih.gov/pubmed/24277981 http://dx.doi.org/10.4137/EBO.S13099 |
_version_ | 1782292320912146432 |
---|---|
author | Yang, Tae Young Jeong, Seongmun |
author_facet | Yang, Tae Young Jeong, Seongmun |
author_sort | Yang, Tae Young |
collection | PubMed |
description | In recent years, RNA-seq has become a very competitive alternative to microarrays. In RNA-seq experiments, the expected read count for a gene is proportional to its expression level multiplied by its transcript length. Even when two genes are expressed at the same level, differences in length will yield differing numbers of total reads. The characteristics of these RNA-seq experiments create a gene-level bias such that the proportion of significantly differentially expressed genes increases with the transcript length, whereas such bias is not present in microarray data. Gene-set analysis seeks to identify the gene sets that are enriched in the list of the identified significant genes. In the gene-set analysis of RNA-seq, the gene-level bias subsequently yields the gene-set-level bias that a gene set with genes of long length will be more likely to show up as enriched than will a gene set with genes of shorter length. Because gene expression is not related to its transcript length, any gene set containing long genes is not of biologically greater interest than gene sets with shorter genes. Accordingly the gene-set-level bias should be removed to accurately calculate the statistical significance of each gene-set enrichment in the RNA-seq. We present a new gene set analysis method of RNA-seq, called FDRseq, which can accurately calculate the statistical significance of a gene-set enrichment score by the grouped false-discovery rate. Numerical examples indicated that FDRseq is appropriate for controlling the transcript length bias in the gene-set analysis of RNA-seq data. To implement FDRseq, we developed the R program, which can be downloaded at no cost from http://home.mju.ac.kr/home/index.action?siteId=tyang. |
format | Online Article Text |
id | pubmed-3836564 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Libertas Academica |
record_format | MEDLINE/PubMed |
spelling | pubmed-38365642013-11-25 Grouped False-Discovery Rate for Removing the Gene-Set-Level Bias of RNA-seq Yang, Tae Young Jeong, Seongmun Evol Bioinform Online Methodology In recent years, RNA-seq has become a very competitive alternative to microarrays. In RNA-seq experiments, the expected read count for a gene is proportional to its expression level multiplied by its transcript length. Even when two genes are expressed at the same level, differences in length will yield differing numbers of total reads. The characteristics of these RNA-seq experiments create a gene-level bias such that the proportion of significantly differentially expressed genes increases with the transcript length, whereas such bias is not present in microarray data. Gene-set analysis seeks to identify the gene sets that are enriched in the list of the identified significant genes. In the gene-set analysis of RNA-seq, the gene-level bias subsequently yields the gene-set-level bias that a gene set with genes of long length will be more likely to show up as enriched than will a gene set with genes of shorter length. Because gene expression is not related to its transcript length, any gene set containing long genes is not of biologically greater interest than gene sets with shorter genes. Accordingly the gene-set-level bias should be removed to accurately calculate the statistical significance of each gene-set enrichment in the RNA-seq. We present a new gene set analysis method of RNA-seq, called FDRseq, which can accurately calculate the statistical significance of a gene-set enrichment score by the grouped false-discovery rate. Numerical examples indicated that FDRseq is appropriate for controlling the transcript length bias in the gene-set analysis of RNA-seq data. To implement FDRseq, we developed the R program, which can be downloaded at no cost from http://home.mju.ac.kr/home/index.action?siteId=tyang. 
Libertas Academica 2013-11-13 /pmc/articles/PMC3836564/ /pubmed/24277981 http://dx.doi.org/10.4137/EBO.S13099 Text en © 2013 the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article published under the Creative Commons CC-BY-NC 3.0 license. |
spellingShingle | Methodology Yang, Tae Young Jeong, Seongmun Grouped False-Discovery Rate for Removing the Gene-Set-Level Bias of RNA-seq |
title | Grouped False-Discovery Rate for Removing the Gene-Set-Level Bias of RNA-seq |
title_full | Grouped False-Discovery Rate for Removing the Gene-Set-Level Bias of RNA-seq |
title_fullStr | Grouped False-Discovery Rate for Removing the Gene-Set-Level Bias of RNA-seq |
title_full_unstemmed | Grouped False-Discovery Rate for Removing the Gene-Set-Level Bias of RNA-seq |
title_short | Grouped False-Discovery Rate for Removing the Gene-Set-Level Bias of RNA-seq |
title_sort | grouped false-discovery rate for removing the gene-set-level bias of rna-seq |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3836564/ https://www.ncbi.nlm.nih.gov/pubmed/24277981 http://dx.doi.org/10.4137/EBO.S13099 |
work_keys_str_mv | AT yangtaeyoung groupedfalsediscoveryrateforremovingthegenesetlevelbiasofrnaseq AT jeongseongmun groupedfalsediscoveryrateforremovingthegenesetlevelbiasofrnaseq |
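
The description states that the expected read count for a gene is proportional to its expression level multiplied by its transcript length, and that this makes longer genes more likely to be called differentially expressed. The following is a minimal Python sketch of that argument only; it is not the FDRseq method or the authors' R program. The function names `expected_reads` and `z_for_fold_change`, the `depth_scale` constant, and the simple two-sample Poisson approximation are all assumptions introduced here for illustration.

```python
import math


def expected_reads(expression: float, length_bp: int, depth_scale: float = 1e-3) -> float:
    """Expected read count: proportional to expression level times transcript length.

    depth_scale is a hypothetical proportionality constant (sequencing depth).
    """
    return depth_scale * expression * length_bp


def z_for_fold_change(expression: float, length_bp: int,
                      fold_change: float = 1.5, depth_scale: float = 1e-3) -> float:
    """Approximate z statistic for a fixed fold change between two conditions.

    Assumes (for illustration only) independent Poisson counts, so the variance
    of the difference equals the sum of the two expected counts. The statistic
    is evaluated at the expected counts rather than at sampled counts.
    """
    mu_a = expected_reads(expression, length_bp, depth_scale)
    mu_b = fold_change * mu_a
    return (mu_b - mu_a) / math.sqrt(mu_a + mu_b)


if __name__ == "__main__":
    # Two genes with the same expression level and the same fold change,
    # differing only in transcript length.
    for length in (1000, 4000):
        reads = expected_reads(10.0, length)
        z = z_for_fold_change(10.0, length)
        print(f"length={length} bp  expected reads={reads:.1f}  z={z:.2f}")
```

Under these assumptions, a fourfold increase in transcript length quadruples the expected read count and doubles the test statistic (about 1.0 versus 2.0 here), even though expression level and fold change are identical. This is the gene-level bias that, per the description, carries over to gene sets composed of long genes.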