
SNP-based pathway enrichment analysis for genome-wide association studies

BACKGROUND: Recently we have witnessed a surge of interest in using genome-wide association studies (GWAS) to discover the genetic basis of complex diseases. Many genetic variations, mostly in the form of single nucleotide polymorphisms (SNPs), have been identified in a wide spectrum of diseases, in...


Bibliographic Details
Main Authors: Weng, Lingjie, Macciardi, Fabio, Subramanian, Aravind, Guffanti, Guia, Potkin, Steven G, Yu, Zhaoxia, Xie, Xiaohui
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3102637/
https://www.ncbi.nlm.nih.gov/pubmed/21496265
http://dx.doi.org/10.1186/1471-2105-12-99
author Weng, Lingjie
Macciardi, Fabio
Subramanian, Aravind
Guffanti, Guia
Potkin, Steven G
Yu, Zhaoxia
Xie, Xiaohui
collection PubMed
description BACKGROUND: Recently we have witnessed a surge of interest in using genome-wide association studies (GWAS) to discover the genetic basis of complex diseases. Many genetic variations, mostly in the form of single nucleotide polymorphisms (SNPs), have been identified in a wide spectrum of diseases, including diabetes, cancer, and psychiatric diseases. A common theme arising from these studies is that the genetic variations discovered by GWAS can explain only a small fraction of the genetic risk associated with complex diseases. New strategies and statistical approaches are needed to address this lack of explanation. One such approach is pathway analysis, which considers the genetic variations underlying a biological pathway jointly, rather than separately as in traditional GWAS. A critical challenge in pathway analysis is how to combine evidence of association over multiple SNPs within a gene and multiple genes within a pathway. Most current methods choose the most significant SNP from each gene as a representative, ignoring the joint action of multiple SNPs within a gene. This approach leads to preferential identification of genes with a greater number of SNPs.
RESULTS: We describe a SNP-based pathway enrichment method for GWAS. The method consists of two main steps: 1) for a given pathway, using an adaptive truncated product statistic to identify all representative (potentially more than one) SNPs of each gene, calculating the average number of representative SNPs for the genes, then re-selecting the representative SNPs of genes in the pathway based on this number; and 2) ranking all selected SNPs by the significance of their statistical association with a trait of interest, and testing whether the set of SNPs from a particular pathway is significantly enriched with high ranks using a weighted Kolmogorov-Smirnov test. We applied our method to two large, genetically distinct GWAS data sets of schizophrenia, one from a European-American (EA) sample and the other from an African-American (AA) sample. In the EA data set, we found 22 pathways with a nominal P-value less than or equal to 0.001 and a corresponding false discovery rate (FDR) less than 5%. In the AA data set, we found 11 pathways at the same nominal P-value and FDR thresholds. Interestingly, 8 of these pathways overlap with those found in the EA sample. We have implemented our method in a Java software package, called SNP Set Enrichment Analysis (SSEA), which contains a user-friendly interface and is freely available at http://cbcl.ics.uci.edu/SSEA.
CONCLUSIONS: The SNP-based pathway enrichment method described here offers a new alternative approach for analysing GWAS data. By applying it to schizophrenia GWAS, we show that our method is able to identify statistically significant pathways and, importantly, pathways that can be replicated in large, genetically distinct samples.
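Step 2 of the abstract (ranking SNPs and testing a pathway's SNP set for enrichment near the top of the ranking) can be illustrated with a generic weighted Kolmogorov-Smirnov running-sum statistic. The sketch below is not the SSEA implementation: the function names, the weight exponent, the SNP identifiers, the toy p-values, and the choice of random same-size SNP sets as the null distribution are all assumptions made for illustration.

```python
import math
import random


def enrichment_score(ranked_snps, scores, snp_set, weight=1.0):
    """Weighted Kolmogorov-Smirnov running-sum statistic (generic sketch).

    ranked_snps: SNP ids ordered from most to least significant.
    scores: dict mapping SNP id -> association score (here -log10 p).
    snp_set: set of SNP ids selected for one pathway.
    """
    hits = [snp in snp_set for snp in ranked_snps]
    n_miss = len(ranked_snps) - sum(hits)
    hit_weight = sum(
        abs(scores[snp]) ** weight for snp, hit in zip(ranked_snps, hits) if hit
    )
    if hit_weight == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for snp, hit in zip(ranked_snps, hits):
        if hit:
            # Step up in proportion to the SNP's association score.
            running += abs(scores[snp]) ** weight / hit_weight
        else:
            # Step down uniformly for SNPs outside the pathway set.
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_snps, scores, snp_set, n_perm=1000, seed=0):
    """Empirical p-value against random SNP sets of the same size.

    Assumption: a SNP-set null is used here for simplicity; permuting
    phenotype labels is another common choice.
    """
    rng = random.Random(seed)
    observed = enrichment_score(ranked_snps, scores, snp_set)
    k = sum(1 for snp in ranked_snps if snp in snp_set)
    exceed = 0
    for _ in range(n_perm):
        null_set = set(rng.sample(ranked_snps, k))
        if enrichment_score(ranked_snps, scores, null_set) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical toy data: SNP ids and p-values are invented.
p_values = {
    "rs1": 1e-6, "rs2": 1e-5, "rs3": 1e-4, "rs4": 0.01, "rs5": 0.05,
    "rs6": 0.1, "rs7": 0.2, "rs8": 0.4, "rs9": 0.6, "rs10": 0.9,
}
ranked = sorted(p_values, key=p_values.get)
scores = {snp: -math.log10(p) for snp, p in p_values.items()}
pathway = {"rs1", "rs2", "rs3"}

print(enrichment_score(ranked, scores, pathway))
print(permutation_p_value(ranked, scores, pathway, n_perm=200, seed=1))
```

A pathway whose SNPs cluster at the top of the ranking yields a score near 1, while one whose SNPs cluster at the bottom yields a score near -1. Step 1 (selecting representative SNPs per gene with the adaptive truncated product statistic) is not covered by this sketch.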
format Text
id pubmed-3102637
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3102637 2011-05-27 SNP-based pathway enrichment analysis for genome-wide association studies BMC Bioinformatics Methodology Article BioMed Central 2011-04-15 /pmc/articles/PMC3102637/ /pubmed/21496265 http://dx.doi.org/10.1186/1471-2105-12-99 Text en Copyright ©2011 Weng et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title SNP-based pathway enrichment analysis for genome-wide association studies
topic Methodology Article