Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies
The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine le...
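Main authors: | Mieth, Bettina; Kloft, Marius; Rodríguez, Juan Antonio; Sonnenburg, Sören; Vobruba, Robin; Morcillo-Suárez, Carlos; Farré, Xavier; Marigorta, Urko M.; Fehr, Ernst; Dickhaus, Thorsten; Blanchard, Gilles; Schunk, Daniel; Navarro, Arcadi; Müller, Klaus-Robert |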
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5125008/ https://www.ncbi.nlm.nih.gov/pubmed/27892471 http://dx.doi.org/10.1038/srep36671 |
_version_ | 1782469917892673536 |
---|---|
author | Mieth, Bettina Kloft, Marius Rodríguez, Juan Antonio Sonnenburg, Sören Vobruba, Robin Morcillo-Suárez, Carlos Farré, Xavier Marigorta, Urko M. Fehr, Ernst Dickhaus, Thorsten Blanchard, Gilles Schunk, Daniel Navarro, Arcadi Müller, Klaus-Robert |
author_facet | Mieth, Bettina Kloft, Marius Rodríguez, Juan Antonio Sonnenburg, Sören Vobruba, Robin Morcillo-Suárez, Carlos Farré, Xavier Marigorta, Urko M. Fehr, Ernst Dickhaus, Thorsten Blanchard, Gilles Schunk, Daniel Navarro, Arcadi Müller, Klaus-Robert |
author_sort | Mieth, Bettina |
collection | PubMed |
description | The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008–2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0. |
format | Online Article Text |
id | pubmed-5125008 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-5125008 2016-12-08 Sci Rep Article Nature Publishing Group 2016-11-28 /pmc/articles/PMC5125008/ /pubmed/27892471 http://dx.doi.org/10.1038/srep36671 Text en Copyright © 2016, The Author(s) http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ |
spellingShingle | Article Mieth, Bettina Kloft, Marius Rodríguez, Juan Antonio Sonnenburg, Sören Vobruba, Robin Morcillo-Suárez, Carlos Farré, Xavier Marigorta, Urko M. Fehr, Ernst Dickhaus, Thorsten Blanchard, Gilles Schunk, Daniel Navarro, Arcadi Müller, Klaus-Robert Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies |
title | Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5125008/ https://www.ncbi.nlm.nih.gov/pubmed/27892471 http://dx.doi.org/10.1038/srep36671 |
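
The description field above outlines a two-step procedure: a support vector machine selects a subset of candidate SNPs, and hypothesis tests are then run only on those candidates against a corrected threshold. The following is a minimal sketch of that idea, not the COMBI implementation from the GWASpi toolbox. Everything named here is an assumption made for illustration: the function names (`train_linear_svm`, `allelic_chi2_pvalue`, `combi_sketch`), the Pegasos-style solver standing in for whatever SVM the authors used, the allelic chi-square test, the genotype encoding as minor-allele counts 0/1/2, and the toy data. The threshold `alpha` is simply passed in by the caller; the record does not say how the "adequate threshold correction" is computed, so the sketch does not attempt it.

```python
import math
import random


def train_linear_svm(X, y, lam=0.1, epochs=50, seed=0):
    """Linear SVM without bias, fitted by Pegasos-style stochastic subgradient descent."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrinkage
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def allelic_chi2_pvalue(genotypes, labels):
    """1-df chi-square test on the 2x2 table of allele counts (cases vs. controls)."""
    a = sum(g for g, l in zip(genotypes, labels) if l == 1)       # minor alleles, cases
    b = sum(2 - g for g, l in zip(genotypes, labels) if l == 1)   # major alleles, cases
    c = sum(g for g, l in zip(genotypes, labels) if l == -1)      # minor alleles, controls
    d = sum(2 - g for g, l in zip(genotypes, labels) if l == -1)  # major alleles, controls
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # a monomorphic SNP carries no information
        return 1.0
    stat = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(stat / 2.0))  # chi-square survival function for 1 df


def combi_sketch(X, y, k, alpha):
    """Step 1: keep the k SNPs with the largest |SVM weight|. Step 2: test only those."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]  # centre each SNP column
    w = train_linear_svm(Xc, y)
    ranked = sorted(range(d), key=lambda j: -abs(w[j]))
    candidates = sorted(ranked[:k])
    pvalues = {j: allelic_chi2_pvalue([row[j] for row in X], y) for j in candidates}
    discoveries = [j for j in candidates if pvalues[j] < alpha]
    return candidates, discoveries


# Toy data (synthetic, for illustration only): 20 cases, 20 controls, 4 SNPs.
# SNP 0 separates cases from controls; SNPs 1-3 are identical in both groups.
y = [1] * 20 + [-1] * 20
X = [[2 if label == 1 else 0] + [((i % 20) * j) % 3 for j in (1, 2, 3)]
     for i, label in enumerate(y)]
candidates, discoveries = combi_sketch(X, y, k=2, alpha=1e-5)
print("candidates:", candidates, "discoveries:", discoveries)
```

Restricting the tests to the SVM-selected candidates is what lets the second step use a less severe threshold than testing every SNP individually would require; how that threshold is calibrated is left open here and should be taken from the article itself.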