
Genetic association studies for gene expressions: permutation-based mutual information in a comparison with standard ANOVA and as a novel approach for feature selection

Mutual information (MI) is a robust nonparametric statistical approach for identifying associations between genotypes and gene expression levels. Using the data of Problem 1 provided for the Genetic Analysis Workshop 15, we first compared a quantitative MI (Tsalenko et al. 2006 J Bioinform Comput Bi...
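The permutation-based significance test for MI mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple plug-in MI estimate after cutting the expression values into equal-frequency bins, which differs from the quantitative MI of Tsalenko et al. The function names, `n_bins`, `n_perm`, and `seed` are hypothetical choices made for illustration only.

```python
import numpy as np


def mutual_information(genotype, expression, n_bins=3):
    """Plug-in MI (in bits) between a discrete genotype and binned expression.

    Assumption: expression is discretized into equal-frequency bins; this is
    an illustrative choice, not the estimator used in the paper.
    """
    genotype = np.asarray(genotype)
    expression = np.asarray(expression, dtype=float)
    # Interior quantiles serve as bin edges, giving n_bins bins.
    edges = np.quantile(expression, np.linspace(0, 1, n_bins + 1)[1:-1])
    binned = np.digitize(expression, edges)  # values in 0 .. n_bins - 1
    # Joint contingency table: genotype class x expression bin.
    _, g_idx = np.unique(genotype, return_inverse=True)
    joint = np.zeros((g_idx.max() + 1, n_bins))
    np.add.at(joint, (g_idx, binned), 1)
    joint /= joint.sum()
    p_g = joint.sum(axis=1, keepdims=True)  # genotype marginal
    p_e = joint.sum(axis=0, keepdims=True)  # expression-bin marginal
    nz = joint > 0  # skip empty cells, which contribute 0 to MI
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (p_g @ p_e)[nz])))


def permutation_pvalue(genotype, expression, n_perm=1000, n_bins=3, seed=0):
    """Return (observed MI, permutation p-value).

    Genotype labels are shuffled to break any association with expression;
    the p-value is the share of permuted MI values at least as large as the
    observed one, with the usual +1 correction.
    """
    rng = np.random.default_rng(seed)
    observed = mutual_information(genotype, expression, n_bins)
    exceed = 0
    for _ in range(n_perm):
        permuted = rng.permutation(genotype)
        if mutual_information(permuted, expression, n_bins) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Simulated data: three genotype classes with shifted expression means.
    rng = np.random.default_rng(42)
    g = rng.integers(0, 3, size=120)
    x = 0.8 * g + rng.normal(size=120)
    mi, p = permutation_pvalue(g, x, n_perm=500)
    print(f"MI = {mi:.3f} bits, permutation p = {p:.4f}")
```

Because the null distribution is built from the data themselves, no normality assumption is needed, which is in line with the robustness argument made in the abstract. The number of permutations bounds the smallest attainable p-value at 1 / (n_perm + 1).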


Bibliographic Details
Main authors: Szymczak, Silke; Nuzzo, Angelo; Fuchsberger, Christian; Schwarz, Daniel F; Ziegler, Andreas; Bellazzi, Riccardo; Igl, Bernd-Wolfgang
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2359872/
https://www.ncbi.nlm.nih.gov/pubmed/18466593
author Szymczak, Silke
Nuzzo, Angelo
Fuchsberger, Christian
Schwarz, Daniel F
Ziegler, Andreas
Bellazzi, Riccardo
Igl, Bernd-Wolfgang
author_sort Szymczak, Silke
collection PubMed
description Mutual information (MI) is a robust nonparametric statistical approach for identifying associations between genotypes and gene expression levels. Using the data of Problem 1 provided for the Genetic Analysis Workshop 15, we first compared a quantitative MI (Tsalenko et al. 2006 J Bioinform Comput Biol 4:259–4) with the standard analysis of variance (ANOVA) and the nonparametric Kruskal-Wallis (KW) test. We then proposed a novel feature selection approach using MI in a classification scenario to address the small n, large p problem and compared it with a feature selection that relies on an asymptotic χ² distribution. In both applications, we used a permutation-based approach for evaluating the significance of MI. Substantial discrepancies in significance were observed between MI, ANOVA, and KW that can be explained by different empirical distributions of the data. In contrast to ANOVA and KW, MI detects shifts in location when the data are non-normally distributed, skewed, or contaminated with outliers. ANOVA but not MI is often significant if one genotype with a small frequency has a remarkable difference in the average gene expression level relative to the other two genotypes. MI depends on genotype frequencies and cannot detect these differences. In the classification scenario, we show that our novel approach for feature selection identifies a smaller list of markers with higher accuracy compared to the standard method. In conclusion, permutation-based MI approaches provide reliable and flexible statistical frameworks which seem to be well suited for data that are non-normal, skewed, or have an otherwise peculiar distribution. They merit further methodological investigation.
format Text
id pubmed-2359872
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2359872 2008-05-06 Genetic association studies for gene expressions: permutation-based mutual information in a comparison with standard ANOVA and as a novel approach for feature selection Szymczak, Silke Nuzzo, Angelo Fuchsberger, Christian Schwarz, Daniel F Ziegler, Andreas Bellazzi, Riccardo Igl, Bernd-Wolfgang BMC Proc Proceedings Mutual information (MI) is a robust nonparametric statistical approach for identifying associations between genotypes and gene expression levels. Using the data of Problem 1 provided for the Genetic Analysis Workshop 15, we first compared a quantitative MI (Tsalenko et al. 2006 J Bioinform Comput Biol 4:259–4) with the standard analysis of variance (ANOVA) and the nonparametric Kruskal-Wallis (KW) test. We then proposed a novel feature selection approach using MI in a classification scenario to address the small n, large p problem and compared it with a feature selection that relies on an asymptotic χ² distribution. In both applications, we used a permutation-based approach for evaluating the significance of MI. Substantial discrepancies in significance were observed between MI, ANOVA, and KW that can be explained by different empirical distributions of the data. In contrast to ANOVA and KW, MI detects shifts in location when the data are non-normally distributed, skewed, or contaminated with outliers. ANOVA but not MI is often significant if one genotype with a small frequency has a remarkable difference in the average gene expression level relative to the other two genotypes. MI depends on genotype frequencies and cannot detect these differences. In the classification scenario, we show that our novel approach for feature selection identifies a smaller list of markers with higher accuracy compared to the standard method. In conclusion, permutation-based MI approaches provide reliable and flexible statistical frameworks which seem to be well suited for data that are non-normal, skewed, or have an otherwise peculiar distribution. They merit further methodological investigation. BioMed Central 2007-12-18 /pmc/articles/PMC2359872/ /pubmed/18466593 Text en Copyright © 2007 Szymczak et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Genetic association studies for gene expressions: permutation-based mutual information in a comparison with standard ANOVA and as a novel approach for feature selection
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2359872/
https://www.ncbi.nlm.nih.gov/pubmed/18466593