Combining Shapley value and statistics to the analysis of gene expression data in children exposed to air pollution

BACKGROUND: In gene expression analysis, statistical tests for differential gene expression provide lists of candidate genes having, individually, a sufficiently low p-value. However, the interpretation of each single p-value within complex systems involving several interacting genes is problematic....

Bibliographic Details
Main authors: Moretti, Stefano, van Leeuwen, Danitsja, Gmuender, Hans, Bonassi, Stefano, van Delft, Joost, Kleinjans, Jos, Patrone, Fioravante, Merlo, Domenico Franco
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2556684/
https://www.ncbi.nlm.nih.gov/pubmed/18764936
http://dx.doi.org/10.1186/1471-2105-9-361
author Moretti, Stefano
van Leeuwen, Danitsja
Gmuender, Hans
Bonassi, Stefano
van Delft, Joost
Kleinjans, Jos
Patrone, Fioravante
Merlo, Domenico Franco
collection PubMed
description BACKGROUND: In gene expression analysis, statistical tests for differential gene expression provide lists of candidate genes having, individually, a sufficiently low p-value. However, the interpretation of each single p-value within complex systems involving several interacting genes is problematic. In parallel, over the last sixty years, game theory has been applied to political and social problems to assess the power of interacting agents in forcing a decision and, more recently, to represent the relevance of genes in response to certain conditions. RESULTS: In this paper we introduce a bootstrap procedure to test the null hypothesis that each gene has the same relevance between two conditions, where the relevance is represented by the Shapley value of a particular coalitional game defined on a microarray data-set. This method, called Comparative Analysis of Shapley value (CASh for short), is applied to data concerning gene expression in children differentially exposed to air pollution. The results provided by CASh are compared with the results from a parametric statistical test for differential gene expression. The gene lists provided by CASh and by the t-test are both informative enough to discriminate exposed subjects on the basis of their gene expression profiles. While many genes are selected in common by CASh and the parametric test, it turns out that the biological interpretation of the differences between these two selections is more interesting, suggesting a different interpretation of the main biological pathways in gene expression regulation for exposed individuals. A simulation study suggests that CASh offers more power than the t-test for the detection of differential gene expression variability. CONCLUSION: CASh is successfully applied to gene expression analysis of a data-set where the joint expression behavior of genes may be critical to characterize the expression response to air pollution. We demonstrate a synergistic effect between coalitional games and statistics that resulted in a selection of genes with a potential impact on the regulation of complex pathways.
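The abstract above describes comparing per-gene Shapley values between two conditions with a bootstrap test. The following is only a minimal sketch of that general idea, not the authors' CASh implementation. It assumes that expression data have already been dichotomized into 0/1 matrices (genes by samples), that the coalitional game is an average of unanimity games on each sample's set of active genes, and that the null distribution is obtained by resampling samples from the pooled conditions. The function names `shapley_microarray` and `bootstrap_pvalues`, and the pooled resampling scheme, are hypothetical choices made for illustration.

```python
import random


def shapley_microarray(B):
    """Per-gene Shapley value for a 0/1 matrix B (rows = genes, columns = samples).

    Assumption: the game is the average of unanimity games on each sample's
    set of active genes, so each sample splits a weight of 1/n_samples
    equally among the genes that are active in it.
    """
    n_genes, n_samples = len(B), len(B[0])
    phi = [0.0] * n_genes
    for j in range(n_samples):
        support = [i for i in range(n_genes) if B[i][j]]
        for i in support:
            phi[i] += 1.0 / (len(support) * n_samples)
    return phi


def _columns(B, idx):
    """Build a new matrix from the selected sample (column) indices."""
    return [[row[j] for j in idx] for row in B]


def bootstrap_pvalues(B1, B2, n_boot=1000, seed=0):
    """Bootstrap p-value per gene for 'same Shapley value in both conditions'.

    Assumption (not taken from the record): under the null hypothesis the
    samples of both conditions are exchangeable, so pseudo-conditions are
    drawn with replacement from the pooled samples.
    """
    rng = random.Random(seed)
    n1, n2 = len(B1[0]), len(B2[0])
    pooled = [list(r1) + list(r2) for r1, r2 in zip(B1, B2)]
    observed = [abs(a - b)
                for a, b in zip(shapley_microarray(B1), shapley_microarray(B2))]
    exceed = [0] * len(B1)
    for _ in range(n_boot):
        s1 = _columns(pooled, [rng.randrange(n1 + n2) for _ in range(n1)])
        s2 = _columns(pooled, [rng.randrange(n1 + n2) for _ in range(n2)])
        diffs = [abs(a - b)
                 for a, b in zip(shapley_microarray(s1), shapley_microarray(s2))]
        for i, d in enumerate(diffs):
            if d >= observed[i]:
                exceed[i] += 1
    # Add-one correction keeps every p-value strictly positive.
    return [(e + 1) / (n_boot + 1) for e in exceed]
```

Calling `bootstrap_pvalues(B_exposed, B_control)` on two dichotomized matrices with the same gene order would return one p-value per gene; in practice these would still need a multiple-testing correction before any gene list is reported.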
format Text
id pubmed-2556684
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2556684 2008-10-01 BMC Bioinformatics Methodology Article BioMed Central 2008-09-02 /pmc/articles/PMC2556684/ /pubmed/18764936 http://dx.doi.org/10.1186/1471-2105-9-361 Text en Copyright © 2008 Moretti et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Combining Shapley value and statistics to the analysis of gene expression data in children exposed to air pollution
topic Methodology Article