Gene set analysis controlling for length bias in RNA-seq experiments

BACKGROUND: In gene set analysis, the researchers are interested in determining the gene sets that are significantly correlated with an outcome, e.g. disease status or treatment. With the rapid development of high throughput sequencing technologies, Ribonucleic acid sequencing (RNA-seq) has become a...

Bibliographic Details
Main Authors: Ren, Xing; Hu, Qiang; Liu, Song; Wang, Jianmin; Miecznikowski, Jeffrey C.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects: Methodology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5294840/
https://www.ncbi.nlm.nih.gov/pubmed/28184252
http://dx.doi.org/10.1186/s13040-017-0125-9
author Ren, Xing
Hu, Qiang
Liu, Song
Wang, Jianmin
Miecznikowski, Jeffrey C.
author_sort Ren, Xing
collection PubMed
description BACKGROUND: In gene set analysis, researchers are interested in identifying gene sets that are significantly correlated with an outcome, e.g. disease status or treatment. With the rapid development of high-throughput sequencing technologies, ribonucleic acid sequencing (RNA-seq) has become an important alternative to traditional expression arrays in gene expression studies. Adapting existing algorithms to RNA-seq data is challenging because of intrinsic differences between the technologies and the data they produce. In RNA-seq experiments, the measure of gene expression is correlated with gene length. This inherent correlation may cause bias in gene set analysis. RESULTS: We develop SeqGSA, a new method for gene set analysis with length bias adjustment for RNA-seq data. It extends the R package GSA, which was designed for microarrays. Our method compares the gene set maxmean statistic against permutations, while also taking into account the statistics of the other gene sets. To adjust for the gene length bias, we implement a flexible weighted sampling scheme in the restandardization step of our algorithm. We show that our method improves the power to identify significant gene sets that are affected by the length bias. We also show that our method maintains the type I error rate when compared with another representative method for gene set enrichment testing. CONCLUSIONS: SeqGSA is a promising tool for testing significant gene pathways with RNA-seq data while adjusting for the inherent gene length effect. It enhances the power to detect gene sets affected by the bias and maintains the type I error rate under various situations.
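The restandardization idea mentioned in the description can be illustrated with a rough sketch. This is not the SeqGSA implementation: the record does not specify the weighting scheme, so the length-bin weights below (`length_matched_weights`), the sampler (`weighted_sample`), and every function and parameter name are hypothetical choices made for illustration. The maxmean statistic follows its usual definition from the GSA method (average positive part versus average negative part of the gene scores). Only the randomization half of restandardization is shown; the comparison against label permutations is omitted.

```python
import random
import statistics
from bisect import bisect_right


def maxmean(scores):
    """Maxmean statistic: the larger of the average positive part and the
    average negative part of the scores, signed by its direction."""
    n = len(scores)
    s_pos = sum(max(s, 0.0) for s in scores) / n
    s_neg = sum(max(-s, 0.0) for s in scores) / n
    return s_pos if s_pos >= s_neg else -s_neg


def length_matched_weights(lengths, set_idx, n_bins=5):
    """Hypothetical weighting (an assumption, not the paper's scheme):
    weight each gene by the number of gene-set members in its length bin,
    divided by the bin size, so random sets mimic the set's length profile."""
    ordered = sorted(lengths)
    # Interior bin edges taken at equally spaced ranks of the sorted lengths.
    edges = [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]
    bins = [bisect_right(edges, x) for x in lengths]
    bin_sizes = [0] * n_bins
    for b in bins:
        bin_sizes[b] += 1
    set_counts = [0] * n_bins
    for i in set_idx:
        set_counts[bins[i]] += 1
    return [set_counts[b] / bin_sizes[b] for b in bins]


def weighted_sample(weights, k, rng):
    """Draw k distinct indices, favoring larger weights, using
    Efraimidis-Spirakis keys u ** (1 / w) and keeping the k largest."""
    keys = [(rng.random() ** (1.0 / w), i)
            for i, w in enumerate(weights) if w > 0]
    return [i for _, i in sorted(keys, reverse=True)[:k]]


def restandardized_score(scores, lengths, set_idx, n_draws=1000, seed=0):
    """Standardize the observed maxmean by the mean and standard deviation
    of maxmean over random, length-matched gene sets of the same size."""
    rng = random.Random(seed)
    weights = length_matched_weights(lengths, set_idx)
    null = [
        maxmean([scores[i] for i in weighted_sample(weights, len(set_idx), rng)])
        for _ in range(n_draws)
    ]
    observed = maxmean([scores[i] for i in set_idx])
    return (observed - statistics.mean(null)) / statistics.stdev(null)
```

Under these assumptions, `restandardized_score(scores, lengths, set_idx)` takes one score and one length per gene plus the (distinct) indices of the genes in the set. Drawing the reference sets with length-dependent weights, rather than uniformly, is what keeps a set of unusually long genes from looking significant merely because long genes tend to score higher.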
format Online
Article
Text
id pubmed-5294840
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5294840 2017-02-09 Gene set analysis controlling for length bias in RNA-seq experiments Ren, Xing Hu, Qiang Liu, Song Wang, Jianmin Miecznikowski, Jeffrey C. BioData Min Methodology
BioMed Central 2017-02-06 /pmc/articles/PMC5294840/ /pubmed/28184252 http://dx.doi.org/10.1186/s13040-017-0125-9 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Gene set analysis controlling for length bias in RNA-seq experiments
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5294840/
https://www.ncbi.nlm.nih.gov/pubmed/28184252
http://dx.doi.org/10.1186/s13040-017-0125-9