
SCIA: A Novel Gene Set Analysis Applicable to Data With Different Characteristics

Gene set analysis is commonly used in functional enrichment and molecular pathway analyses. Most of the present methods are based on the competitive testing methods which assume each gene is independent of the others. However, the false discovery rates of competitive methods are amplified when they...


Bibliographic Details
Main Authors: Li, Yiqun, Wu, Ying, Zhang, Xiaohan, Bai, Yunfan, Akthar, Luqman Muhammad, Lu, Xin, Shi, Ming, Zhao, Jianxiang, Jiang, Qinghua, Li, Yu
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6603225/
https://www.ncbi.nlm.nih.gov/pubmed/31293623
http://dx.doi.org/10.3389/fgene.2019.00598
author Li, Yiqun
Wu, Ying
Zhang, Xiaohan
Bai, Yunfan
Akthar, Luqman Muhammad
Lu, Xin
Shi, Ming
Zhao, Jianxiang
Jiang, Qinghua
Li, Yu
author_sort Li, Yiqun
collection PubMed
description Gene set analysis is commonly used in functional enrichment and molecular pathway analyses. Most current methods are based on competitive testing, which assumes that each gene is independent of the others. However, the false discovery rates of competitive methods are inflated when they are applied to datasets with high inter-gene correlations. Self-contained testing methods could solve this problem, but they impose other restrictions on data characteristics. Therefore, a statistically rigorous testing method applicable to datasets with various complex characteristics is needed to obtain unbiased and comparable results. We propose a self-contained and competitive incorporated analysis (SCIA) to alleviate the bias caused by the limited application scope of existing gene set analysis methods. This is accomplished through a novel permutation strategy that uses a priori biological networks to selectively permute gene labels with different probabilities. In simulation studies, SCIA was compared with four representative analysis methods (GSEA, CAMERA, ROAST, and NES) and produced the best performance in both false discovery rate and sensitivity under most conditions with different parameter settings. Further, KEGG pathway analysis on two real lung cancer datasets showed that SCIA identified considerably more results than GSEA in both datasets, and that most of them are supported by the literature. Overall, SCIA promises to offer researchers more reliable and comparable results across different datasets.
format Online
Article
Text
id pubmed-6603225
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6603225 2019-07-10 SCIA: A Novel Gene Set Analysis Applicable to Data With Different Characteristics Li, Yiqun Wu, Ying Zhang, Xiaohan Bai, Yunfan Akthar, Luqman Muhammad Lu, Xin Shi, Ming Zhao, Jianxiang Jiang, Qinghua Li, Yu Front Genet Genetics Frontiers Media S.A. 
2019-06-25 /pmc/articles/PMC6603225/ /pubmed/31293623 http://dx.doi.org/10.3389/fgene.2019.00598 Text en Copyright © 2019 Li, Wu, Zhang, Bai, Akthar, Lu, Shi, Zhao, Jiang and Li. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title SCIA: A Novel Gene Set Analysis Applicable to Data With Different Characteristics
topic Genetics