
Importance of SNP Dependency Correction and Association Integration for Gene Set Analysis in Genome-Wide Association Studies

A typical genome-wide association study (GWAS) analyzes millions of single-nucleotide polymorphisms (SNPs), several of which are in a region of the same gene. To conduct gene set analysis (GSA), information from SNPs needs to be unified at the gene level. A widely used practice is to use only the mo...


Bibliographic Details
Main Authors:	Marczyk, Michal, Macioszek, Agnieszka, Tobiasz, Joanna, Polanska, Joanna, Zyla, Joanna
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2021
Subjects:	Genetics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8696167/ https://www.ncbi.nlm.nih.gov/pubmed/34956320 http://dx.doi.org/10.3389/fgene.2021.767358

_version_	1784619746799386624
author	Marczyk, Michal Macioszek, Agnieszka Tobiasz, Joanna Polanska, Joanna Zyla, Joanna
author_facet	Marczyk, Michal Macioszek, Agnieszka Tobiasz, Joanna Polanska, Joanna Zyla, Joanna
author_sort	Marczyk, Michal
collection	PubMed
description	A typical genome-wide association study (GWAS) analyzes millions of single-nucleotide polymorphisms (SNPs), several of which are in a region of the same gene. To conduct gene set analysis (GSA), information from SNPs needs to be unified at the gene level. A widely used practice is to use only the most relevant SNP per gene; however, there are other methods of integration that could be applied here. Also, the problem of nonrandom association of alleles at two or more loci is often neglected. Here, we tested the impact of incorporation of different integrations and linkage disequilibrium (LD) correction on the performance of several GSA methods. Matched normal and breast cancer samples from The Cancer Genome Atlas database were used to evaluate the performance of six GSA algorithms: Coincident Extreme Ranks in Numerical Observations (CERNO), Gene Set Enrichment Analysis (GSEA), GSEA-SNP, improved GSEA for GWAS (i-GSEA4GWAS), Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA), and Over-Representation Analysis (ORA). Association of SNPs to phenotype was calculated using modified McNemar’s test. Results for SNPs mapped to the same gene were integrated using Fisher and Stouffer methods and compared with the minimum p-value method. Four common measures were used to quantify the performance of all combinations of methods. Results of GSA analysis on GWAS were compared to the one performed on gene expression data. Comparing all evaluation metrics across different GSA algorithms, integrations, and LD correction, we highlighted CERNO, and MAGENTA with Stouffer as the most efficient. Applying LD correction increased prioritization and specificity of enrichment outcomes for all tested algorithms. When Fisher or Stouffer were used with LD, sensitivity and reproducibility were also better. Using any integration method was beneficial in comparison with a minimum p-value method in specific combinations. The correlation between GSA results from genomic and transcriptomic level was the highest when Stouffer integration was combined with LD correction. We thoroughly evaluated different approaches to GSA in GWAS in terms of performance to guide others to select the most effective combinations. We showed that LD correction and Stouffer integration could increase the performance of enrichment analysis and encourage the usage of these techniques.
format	Online Article Text
id	pubmed-8696167
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-86961672021-12-24 Importance of SNP Dependency Correction and Association Integration for Gene Set Analysis in Genome-Wide Association Studies Marczyk, Michal Macioszek, Agnieszka Tobiasz, Joanna Polanska, Joanna Zyla, Joanna Front Genet Genetics A typical genome-wide association study (GWAS) analyzes millions of single-nucleotide polymorphisms (SNPs), several of which are in a region of the same gene. To conduct gene set analysis (GSA), information from SNPs needs to be unified at the gene level. A widely used practice is to use only the most relevant SNP per gene; however, there are other methods of integration that could be applied here. Also, the problem of nonrandom association of alleles at two or more loci is often neglected. Here, we tested the impact of incorporation of different integrations and linkage disequilibrium (LD) correction on the performance of several GSA methods. Matched normal and breast cancer samples from The Cancer Genome Atlas database were used to evaluate the performance of six GSA algorithms: Coincident Extreme Ranks in Numerical Observations (CERNO), Gene Set Enrichment Analysis (GSEA), GSEA-SNP, improved GSEA for GWAS (i-GSEA4GWAS), Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA), and Over-Representation Analysis (ORA). Association of SNPs to phenotype was calculated using modified McNemar’s test. Results for SNPs mapped to the same gene were integrated using Fisher and Stouffer methods and compared with the minimum p-value method. Four common measures were used to quantify the performance of all combinations of methods. Results of GSA analysis on GWAS were compared to the one performed on gene expression data. Comparing all evaluation metrics across different GSA algorithms, integrations, and LD correction, we highlighted CERNO, and MAGENTA with Stouffer as the most efficient. Applying LD correction increased prioritization and specificity of enrichment outcomes for all tested algorithms. When Fisher or Stouffer were used with LD, sensitivity and reproducibility were also better. Using any integration method was beneficial in comparison with a minimum p-value method in specific combinations. The correlation between GSA results from genomic and transcriptomic level was the highest when Stouffer integration was combined with LD correction. We thoroughly evaluated different approaches to GSA in GWAS in terms of performance to guide others to select the most effective combinations. We showed that LD correction and Stouffer integration could increase the performance of enrichment analysis and encourage the usage of these techniques. Frontiers Media S.A. 2021-12-09 /pmc/articles/PMC8696167/ /pubmed/34956320 http://dx.doi.org/10.3389/fgene.2021.767358 Text en Copyright © 2021 Marczyk, Macioszek, Tobiasz, Polanska and Zyla. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle	Genetics Marczyk, Michal Macioszek, Agnieszka Tobiasz, Joanna Polanska, Joanna Zyla, Joanna Importance of SNP Dependency Correction and Association Integration for Gene Set Analysis in Genome-Wide Association Studies
title	Importance of SNP Dependency Correction and Association Integration for Gene Set Analysis in Genome-Wide Association Studies
title_full	Importance of SNP Dependency Correction and Association Integration for Gene Set Analysis in Genome-Wide Association Studies
title_fullStr	Importance of SNP Dependency Correction and Association Integration for Gene Set Analysis in Genome-Wide Association Studies
title_full_unstemmed	Importance of SNP Dependency Correction and Association Integration for Gene Set Analysis in Genome-Wide Association Studies
title_short	Importance of SNP Dependency Correction and Association Integration for Gene Set Analysis in Genome-Wide Association Studies
title_sort	importance of snp dependency correction and association integration for gene set analysis in genome-wide association studies
topic	Genetics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8696167/ https://www.ncbi.nlm.nih.gov/pubmed/34956320 http://dx.doi.org/10.3389/fgene.2021.767358
work_keys_str_mv	AT marczykmichal importanceofsnpdependencycorrectionandassociationintegrationforgenesetanalysisingenomewideassociationstudies AT macioszekagnieszka importanceofsnpdependencycorrectionandassociationintegrationforgenesetanalysisingenomewideassociationstudies AT tobiaszjoanna importanceofsnpdependencycorrectionandassociationintegrationforgenesetanalysisingenomewideassociationstudies AT polanskajoanna importanceofsnpdependencycorrectionandassociationintegrationforgenesetanalysisingenomewideassociationstudies AT zylajoanna importanceofsnpdependencycorrectionandassociationintegrationforgenesetanalysisingenomewideassociationstudies
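
Illustrative note (not part of the catalogued record): the abstract in this record compares three ways of collapsing the p-values of SNPs mapped to one gene into a single gene-level p-value: minimum p-value, Fisher, and Stouffer. The Python sketch below shows the textbook, unweighted forms of those three rules. It assumes the SNP p-values are independent and lie strictly between 0 and 1, so it does not reproduce the LD correction or the modified McNemar's test used in the article. The function names are hypothetical and chosen only for this example.

```python
import math
from statistics import NormalDist


def min_p(pvals):
    """Minimum p-value rule: keep only the most significant SNP."""
    return min(pvals)


def fisher(pvals):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df
    under the null, ASSUMING independent p-values (no LD correction)."""
    k = len(pvals)
    half_x = -sum(math.log(p) for p in pvals)  # this is X / 2
    # Closed-form chi-square survival function for even df = 2k:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def stouffer(pvals):
    """Unweighted Stouffer's method: average the z-scores of one-sided
    p-values, ASSUMING independence (no LD correction)."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvals) / math.sqrt(len(pvals))
    return 1.0 - nd.cdf(z)


if __name__ == "__main__":
    # Made-up p-values for SNPs mapped to one hypothetical gene.
    snp_pvals = [0.01, 0.20, 0.03, 0.45]
    for name, rule in (("min-p", min_p), ("Fisher", fisher), ("Stouffer", stouffer)):
        print(f"{name}: {rule(snp_pvals):.4g}")
```

The minimum rule ignores every SNP but one, whereas Fisher and Stouffer use all of them; that is the contrast the abstract evaluates. With correlated SNPs the independence assumption fails, which is where an LD correction would come in.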

