
paraGSEA: a scalable approach for large-scale gene expression profiling

More studies have been conducted using gene expression similarity to identify functional connections among genes, diseases and drugs. Gene Set Enrichment Analysis (GSEA) is a powerful analytical method for interpreting gene expression data. However, due to its enormous computational overhead in the...


Bibliographic Details
Main Authors:	Peng, Shaoliang; Yang, Shunyun; Bo, Xiaochen; Li, Fei
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2017
Subjects:	Methods Online
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737394/ https://www.ncbi.nlm.nih.gov/pubmed/28973463 http://dx.doi.org/10.1093/nar/gkx679

collection	PubMed
description	More studies have been conducted using gene expression similarity to identify functional connections among genes, diseases and drugs. Gene Set Enrichment Analysis (GSEA) is a powerful analytical method for interpreting gene expression data. However, because of the enormous computational overhead of its significance-level estimation and multiple hypothesis testing steps, its computational scalability and efficiency are poor on large-scale datasets. We proposed paraGSEA for efficient large-scale transcriptome data analysis. Through optimization, the overall time complexity of paraGSEA is reduced from O(mn) to O(m+n), where m is the length of the gene sets and n is the length of the gene expression profiles, which yields a more than 100-fold increase in performance compared with other popular GSEA implementations such as GSEA-P, SAM-GS and GSEA2. With further parallelization, a near-linear speed-up is achieved on both workstations and clusters, with high scalability and performance on large-scale datasets. The analysis time for the whole LINCS phase I dataset (GSE92742) was reduced to nearly half an hour on a 1000-node cluster on Tianhe-2, or to within 120 hours on a 96-core workstation. The source code of paraGSEA is licensed under the GPLv3 and available at http://github.com/ysycloud/paraGSEA.
format	Online Article Text
id	pubmed-5737394
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-5737394 2018-01-08 paraGSEA: a scalable approach for large-scale gene expression profiling. Peng, Shaoliang; Yang, Shunyun; Bo, Xiaochen; Li, Fei. Nucleic Acids Res. Methods Online. Oxford University Press 2017-09-29 2017-07-31 /pmc/articles/PMC5737394/ /pubmed/28973463 http://dx.doi.org/10.1093/nar/gkx679 Text en © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	paraGSEA: a scalable approach for large-scale gene expression profiling
topic	Methods Online
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737394/ https://www.ncbi.nlm.nih.gov/pubmed/28973463 http://dx.doi.org/10.1093/nar/gkx679
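
The description above refers to reducing the cost of GSEA from O(mn) to O(m+n), with m the gene set length and n the profile length. The following is a minimal sketch of that general idea only, not the paraGSEA implementation: it computes an unweighted, Kolmogorov-Smirnov-style running-sum enrichment score and uses a hash set for membership tests so that one pass over the ranked profile suffices. The function name `enrichment_score` and its arguments are hypothetical, chosen for illustration.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted GSEA-style running-sum statistic (illustrative sketch).

    ranked_genes: gene IDs ordered by expression change (length n).
    gene_set:     gene IDs of the set being tested (length m).
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    # Building the set costs O(m); each lookup is O(1) on average, so the
    # scans below cost O(n) instead of O(mn) for a list-based membership test.
    members = set(gene_set)
    n = len(ranked_genes)
    hits = sum(1 for gene in ranked_genes if gene in members)
    if hits == 0 or hits == n:
        return 0.0  # score is undefined without both hits and misses

    hit_step = 1.0 / hits          # increment when a gene is in the set
    miss_step = 1.0 / (n - hits)   # decrement when it is not
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        running += hit_step if gene in members else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    profile = ["a", "b", "c", "d"]
    print(enrichment_score(profile, ["a", "b"]))  # set concentrated at the top
    print(enrichment_score(profile, ["c", "d"]))  # set concentrated at the bottom
```

The significance estimation and multiple hypothesis testing steps mentioned in the description would repeat a computation like this over many permutations and gene sets, which is where a per-call saving of this kind accumulates.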