
Stability of Bivariate GWAS Biomarker Detection


Bibliographic Details
Main authors: Bedő, Justin; Rawlinson, David; Goudey, Benjamin; Ong, Cheng Soon
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4005767/
https://www.ncbi.nlm.nih.gov/pubmed/24787002
http://dx.doi.org/10.1371/journal.pone.0093319
_version_ 1782314153845719040
author Bedő, Justin
Rawlinson, David
Goudey, Benjamin
Ong, Cheng Soon
collection PubMed
description Given the difficulty and effort required to confirm candidate causal SNPs detected in genome-wide association studies (GWAS), there is no practical way to definitively filter false positives. Recent advances in algorithmics and statistics have enabled repeated exhaustive search for bivariate features in a practical amount of time using standard computational resources, allowing us to use cross-validation to evaluate the stability. We performed 10 trials of 2-fold cross-validation of exhaustive bivariate analysis on seven Wellcome–Trust Case–Control Consortium GWAS datasets, comparing the traditional χ² test for association, the high-performance GBOOST method and the recently proposed GSS statistic (available at http://bioinformatics.research.nicta.com.au/software/gwis/). We use Spearman's correlation to measure the similarity between the folds of cross-validation. To compare incomplete lists of ranks we propose an extension to Spearman's correlation. The extension allows us to consider a natural threshold for feature selection where the correlation is zero. This is the first reported cross-validation study of exhaustive bivariate GWAS feature selection. We found that stability between ranked lists from different cross-validation folds was higher for GSS in the majority of diseases. A thorough analysis of the correlation between SNP-frequency and univariate χ² score demonstrated that the χ² test for association is highly confounded by main effects: SNPs with high univariate significance replicably dominate the ranked results. We show that removal of the univariately significant SNPs improves χ² replicability but risks filtering pairs involving SNPs with univariate effects. We empirically confirm that the stability of GSS and GBOOST was not affected by removal of univariately significant SNPs. These results suggest that the GSS and GBOOST tests are successfully targeting bivariate association with phenotype and that GSS is able to reliably detect a larger set of SNP-pairs than GBOOST in the majority of the data we analysed. However, the χ² test for association was confounded by main effects.
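The stability measure named in the description, Spearman's correlation between the rankings produced by two cross-validation folds, can be illustrated with a minimal sketch. This is plain Spearman correlation on complete lists only; the authors' extension to incomplete rank lists is not specified in this record and is not reproduced here. The function names and the example scores are hypothetical and chosen only for illustration.

```python
def ranks(scores):
    """Rank items by descending score (1 = best); tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(scores_a, scores_b):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    return pearson(ranks(scores_a), ranks(scores_b))


# Hypothetical association scores for the same four SNP pairs in two folds.
fold_a = [9.1, 7.4, 5.0, 2.2]
fold_b = [8.7, 4.9, 6.3, 1.0]
print(spearman(fold_a, fold_b))
```

A value near 1 would indicate that the two folds rank the SNP pairs in nearly the same order, which is the sense of "stability" used in the description.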
format Online
Article
Text
id pubmed-4005767
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4005767 2014-05-09 Stability of Bivariate GWAS Biomarker Detection Bedő, Justin Rawlinson, David Goudey, Benjamin Ong, Cheng Soon PLoS One Research Article Public Library of Science 2014-04-30 /pmc/articles/PMC4005767/ /pubmed/24787002 http://dx.doi.org/10.1371/journal.pone.0093319 Text en © 2014 Bedő et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Stability of Bivariate GWAS Biomarker Detection
topic Research Article