paraGSEA: a scalable approach for large-scale gene expression profiling
Main authors: | Peng, Shaoliang; Yang, Shunyun; Bo, Xiaochen; Li, Fei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2017 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737394/ https://www.ncbi.nlm.nih.gov/pubmed/28973463 http://dx.doi.org/10.1093/nar/gkx679 |
author | Peng, Shaoliang; Yang, Shunyun; Bo, Xiaochen; Li, Fei |
---|---|
collection | PubMed |
description | An increasing number of studies use gene expression similarity to identify functional connections among genes, diseases and drugs. Gene Set Enrichment Analysis (GSEA) is a powerful analytical method for interpreting gene expression data. However, because of the enormous computational overhead of its significance-level estimation and multiple-hypothesis-testing steps, its scalability and efficiency are poor on large-scale datasets. We propose paraGSEA for efficient large-scale transcriptome data analysis. Through optimization, paraGSEA reduces the overall time complexity from O(mn) to O(m+n), where m is the length of the gene sets and n is the length of the gene expression profiles; this yields a more than 100-fold performance increase over other popular GSEA implementations such as GSEA-P, SAM-GS and GSEA2. Further parallelization achieves a near-linear speed-up on both workstations and clusters, with high scalability and performance on large-scale datasets. The analysis time for the whole LINCS phase I dataset (GSE92742) was reduced to nearly half an hour on a 1000-node cluster of Tianhe-2, or to within 120 hours on a 96-core workstation. The source code of paraGSEA is licensed under the GPLv3 and available at http://github.com/ysycloud/paraGSEA. |
format | Online Article Text |
id | pubmed-5737394 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
journal | Nucleic Acids Res (Methods Online) |
publication dates | 2017-09-29; 2017-07-31 |
rights | © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | paraGSEA: a scalable approach for large-scale gene expression profiling |
topic | Methods Online |
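The abstract's central complexity claim, that scoring drops from O(mn) to O(m+n), can be illustrated with a toy version of the classic unweighted (Kolmogorov-Smirnov style) running-sum enrichment score. The sketch below is a hypothetical illustration, not code from the paraGSEA repository. The function names, the unweighted statistic, and the gene-set permutation null model are all assumptions made for this example. It contrasts two ways of testing membership: an O(mn) approach that scans the gene-set list at every profile position, and an O(m+n) approach that builds a hash set once. It also shows why the significance step is expensive: every permutation recomputes the score.

```python
"""Hypothetical sketch of an unweighted (KS-style) GSEA enrichment score.

Illustrates the O(m*n) vs O(m+n) contrast described in the abstract.
This is NOT paraGSEA's source code; all names here are illustrative.
"""
import random
from typing import List, Sequence


def _running_sum_es(is_hit: List[bool]) -> float:
    """Walk the ranked profile once; return the signed maximum deviation."""
    n_hit = sum(is_hit)
    n_miss = len(is_hit) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the profile without covering all of it")
    running, best = 0.0, 0.0
    for hit in is_hit:
        # Step up on a gene-set member, step down on a non-member.
        running += 1.0 / n_hit if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def enrichment_score_naive(profile: Sequence[str], gene_set: Sequence[str]) -> float:
    """O(m*n): each of the n profile genes is looked up by scanning an m-item list."""
    members = list(gene_set)
    return _running_sum_es([gene in members for gene in profile])


def enrichment_score_fast(profile: Sequence[str], gene_set: Sequence[str]) -> float:
    """O(m + n) expected: build a hash set once (O(m)), then scan the profile (O(n))."""
    members = set(gene_set)
    return _running_sum_es([gene in members for gene in profile])


def permutation_p_value(profile: Sequence[str], gene_set: Sequence[str],
                        n_perm: int = 1000, seed: int = 0) -> float:
    """Assumed gene-set permutation test: each permutation recomputes the ES,
    so the per-ES cost is multiplied by n_perm."""
    rng = random.Random(seed)
    genes = list(profile)
    observed = enrichment_score_fast(genes, gene_set)
    size = len(set(gene_set) & set(genes))  # overlap size of the real gene set
    extreme = 0
    for _ in range(n_perm):
        random_set = rng.sample(genes, size)  # random gene set of the same size
        es = enrichment_score_fast(genes, random_set)
        # Count null scores at least as extreme, on the same side as observed.
        if (observed >= 0 and es >= observed) or (observed < 0 and es <= observed):
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    ranked = ["A", "B", "C", "D", "E", "F"]  # toy profile, most up-regulated first
    genes = ["A", "C", "X"]                  # "X" is absent from the profile
    print(enrichment_score_naive(ranked, genes))  # 0.75
    print(enrichment_score_fast(ranked, genes))   # 0.75
    print(permutation_p_value(ranked, genes, n_perm=200))
```

Both scoring functions return the same value. Only the cost of the membership test differs. The permutation loop calls the scoring routine `n_perm` times for every profile and gene-set pair, so any per-call saving is multiplied accordingly. paraGSEA's actual optimizations and parallel layout are described in the paper and its repository.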