Probabilistic prioritization of candidate pathway association with pathway score
BACKGROUND: Current methods for gene-set or pathway analysis are usually designed to test the enrichment of a single gene-set. Once the analysis is carried out for each of the sets under study, a list of significant sets can be obtained. However, if one wishes to further prioritize the importance or...
Main authors: | Lin, Shu-Ju; Lu, Tzu-Pin; Yu, Qi-You; Hsiao, Chuhsing Kate |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6201593/ https://www.ncbi.nlm.nih.gov/pubmed/30355338 http://dx.doi.org/10.1186/s12859-018-2411-z |
_version_ | 1783365538040250368 |
---|---|
author | Lin, Shu-Ju Lu, Tzu-Pin Yu, Qi-You Hsiao, Chuhsing Kate |
author_facet | Lin, Shu-Ju Lu, Tzu-Pin Yu, Qi-You Hsiao, Chuhsing Kate |
author_sort | Lin, Shu-Ju |
collection | PubMed |
description | BACKGROUND: Current methods for gene-set or pathway analysis are usually designed to test the enrichment of a single gene-set. Once the analysis is carried out for each of the sets under study, a list of significant sets can be obtained. However, if one wishes to further prioritize the importance or strength of association of these sets, no such quantitative measure is available. Using the magnitude of p-value to rank the pathways may not be appropriate because p-value is not a measure for strength of significance. In addition, when testing each pathway, these analyses are often implicitly affected by the number of differentially expressed genes included in the set and/or affected by the dependence among genes. RESULTS: Here we propose a two-stage procedure to prioritize the pathways/gene-sets. In the first stage we develop a pathway-level measure with three properties. First, it contains all genes (differentially expressed or not) in the same set, and summarizes the collective effect of all genes per sample. Second, this pathway score accounts for the correlation between genes by synchronizing their correlation directions. Third, the score includes a rank transformation to enhance the variation among samples as well as to avoid the influence of extreme heterogeneity among genes. In the second stage, all scores are included simultaneously in a Bayesian logistic regression model which can evaluate the strength of association for each set and rank the sets based on posterior probabilities. Simulations from Gaussian distributions and human microarray data, and a breast cancer study with RNA-Seq are considered for demonstration and comparison with other existing methods. CONCLUSIONS: The proposed summary pathway score provides for each sample an overall evaluation of gene expression in a gene-set. It demonstrates the advantages of including all genes in the set and the synchronization of correlation direction. The simultaneous utilization of all pathway-level scores in a Bayesian model not only offers a probabilistic evaluation and ranking of the pathway association but also presents good accuracy in identifying the top-ranking pathways. The resulting recommendation list of ranked pathways can be a reference for potential target therapy or for future allocation of research resources. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2411-z) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6201593 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6201593 2018-10-31 Probabilistic prioritization of candidate pathway association with pathway score Lin, Shu-Ju Lu, Tzu-Pin Yu, Qi-You Hsiao, Chuhsing Kate BMC Bioinformatics Methodology Article BACKGROUND: Current methods for gene-set or pathway analysis are usually designed to test the enrichment of a single gene-set. Once the analysis is carried out for each of the sets under study, a list of significant sets can be obtained. However, if one wishes to further prioritize the importance or strength of association of these sets, no such quantitative measure is available. Using the magnitude of p-value to rank the pathways may not be appropriate because p-value is not a measure for strength of significance. In addition, when testing each pathway, these analyses are often implicitly affected by the number of differentially expressed genes included in the set and/or affected by the dependence among genes. RESULTS: Here we propose a two-stage procedure to prioritize the pathways/gene-sets. In the first stage we develop a pathway-level measure with three properties. First, it contains all genes (differentially expressed or not) in the same set, and summarizes the collective effect of all genes per sample. Second, this pathway score accounts for the correlation between genes by synchronizing their correlation directions. Third, the score includes a rank transformation to enhance the variation among samples as well as to avoid the influence of extreme heterogeneity among genes. In the second stage, all scores are included simultaneously in a Bayesian logistic regression model which can evaluate the strength of association for each set and rank the sets based on posterior probabilities. Simulations from Gaussian distributions and human microarray data, and a breast cancer study with RNA-Seq are considered for demonstration and comparison with other existing methods. CONCLUSIONS: The proposed summary pathway score provides for each sample an overall evaluation of gene expression in a gene-set. It demonstrates the advantages of including all genes in the set and the synchronization of correlation direction. The simultaneous utilization of all pathway-level scores in a Bayesian model not only offers a probabilistic evaluation and ranking of the pathway association but also presents good accuracy in identifying the top-ranking pathways. The resulting recommendation list of ranked pathways can be a reference for potential target therapy or for future allocation of research resources. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2411-z) contains supplementary material, which is available to authorized users. BioMed Central 2018-10-24 /pmc/articles/PMC6201593/ /pubmed/30355338 http://dx.doi.org/10.1186/s12859-018-2411-z Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Lin, Shu-Ju Lu, Tzu-Pin Yu, Qi-You Hsiao, Chuhsing Kate Probabilistic prioritization of candidate pathway association with pathway score |
title | Probabilistic prioritization of candidate pathway association with pathway score |
title_full | Probabilistic prioritization of candidate pathway association with pathway score |
title_fullStr | Probabilistic prioritization of candidate pathway association with pathway score |
title_full_unstemmed | Probabilistic prioritization of candidate pathway association with pathway score |
title_short | Probabilistic prioritization of candidate pathway association with pathway score |
title_sort | probabilistic prioritization of candidate pathway association with pathway score |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6201593/ https://www.ncbi.nlm.nih.gov/pubmed/30355338 http://dx.doi.org/10.1186/s12859-018-2411-z |
work_keys_str_mv | AT linshuju probabilisticprioritizationofcandidatepathwayassociationwithpathwayscore AT lutzupin probabilisticprioritizationofcandidatepathwayassociationwithpathwayscore AT yuqiyou probabilisticprioritizationofcandidatepathwayassociationwithpathwayscore AT hsiaochuhsingkate probabilisticprioritizationofcandidatepathwayassociationwithpathwayscore |
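
The first stage summarized in the description field (use every gene in the set, synchronize correlation directions, apply a rank transformation, and produce one score per sample) could be sketched roughly as follows. This is only an illustrative reading of the abstract, not the authors' published formula: the choice of an anchor gene for fixing directions, the centring of ranks, the final averaging step, and the names `rank_transform`, `pearson`, and `pathway_score` are all assumptions introduced here.

```python
from statistics import mean


def rank_transform(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def pathway_score(expr):
    """expr: one row per gene, each row holding per-sample expression.

    Returns one score per sample. Assumed recipe (hypothetical):
    rank each gene across samples, flip genes that correlate negatively
    with an anchor gene, then average the centred ranks per sample.
    """
    ranked = [rank_transform(gene) for gene in expr]
    n_genes = len(ranked)
    n_samples = len(ranked[0])
    corr = [[pearson(ranked[a], ranked[b]) for b in range(n_genes)]
            for a in range(n_genes)]
    # Assumption: anchor = gene with the largest total absolute correlation.
    anchor = max(range(n_genes), key=lambda a: sum(abs(c) for c in corr[a]))
    signs = [1.0 if corr[anchor][g] >= 0 else -1.0 for g in range(n_genes)]
    centre = (n_samples + 1) / 2  # mean rank; centring makes sign flips reverse ranks
    return [
        sum(s * (r[j] - centre) for s, r in zip(signs, ranked)) / n_genes
        for j in range(n_samples)
    ]


# Toy pathway: three genes, four samples; the third gene runs opposite.
toy = [[1, 2, 3, 4], [10, 20, 30, 40], [8, 6, 4, 2]]
print(pathway_score(toy))  # [-1.5, -0.5, 0.5, 1.5]
```

In the toy input the third gene is anti-correlated with the other two, so a plain average of ranks would partly cancel out; after the sign flip all three genes contribute in the same direction. Per-sample scores of this kind, one per pathway, are what the abstract describes entering the second-stage Bayesian logistic regression together.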