Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies

The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine le...

Bibliographic Details
Main Authors: Mieth, Bettina, Kloft, Marius, Rodríguez, Juan Antonio, Sonnenburg, Sören, Vobruba, Robin, Morcillo-Suárez, Carlos, Farré, Xavier, Marigorta, Urko M., Fehr, Ernst, Dickhaus, Thorsten, Blanchard, Gilles, Schunk, Daniel, Navarro, Arcadi, Müller, Klaus-Robert
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5125008/
https://www.ncbi.nlm.nih.gov/pubmed/27892471
http://dx.doi.org/10.1038/srep36671
_version_ 1782469917892673536
author Mieth, Bettina
Kloft, Marius
Rodríguez, Juan Antonio
Sonnenburg, Sören
Vobruba, Robin
Morcillo-Suárez, Carlos
Farré, Xavier
Marigorta, Urko M.
Fehr, Ernst
Dickhaus, Thorsten
Blanchard, Gilles
Schunk, Daniel
Navarro, Arcadi
Müller, Klaus-Robert
author_sort Mieth, Bettina
collection PubMed
description The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008–2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0.
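The two-step idea in the description above (screen SNPs with a support vector machine, then test only the screened candidates against a threshold) can be sketched as follows. This is an illustrative toy under stated assumptions, not the COMBI implementation from the GWASpi toolbox: the function names, the plain subgradient-descent SVM, the trend-type test (chi-square statistic n·r² with one degree of freedom), the toy data, and the caller-supplied `threshold` are all hypothetical choices made for this example. In particular, the record only says the threshold is corrected "adequately"; how it is calibrated is not shown here.

```python
import math
import random


def standardize(X):
    """Centre and scale each SNP column (constant columns become all zeros)."""
    n, d = len(X), len(X[0])
    cols = []
    for j in range(d):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        cols.append([(v - mean) / sd for v in col])
    return [[cols[j][i] for j in range(d)] for i in range(n)]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the L2-regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1.0:  # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w


def trend_test_pvalue(genotypes, labels):
    """P-value of a trend-type test: chi2 = n * r^2 with 1 degree of freedom."""
    n = len(genotypes)
    mg, my = sum(genotypes) / n, sum(labels) / n
    sgg = sum((g - mg) ** 2 for g in genotypes)
    syy = sum((v - my) ** 2 for v in labels)
    if sgg == 0.0 or syy == 0.0:
        return 1.0  # no variation, no evidence of association
    sgy = sum((g - mg) * (v - my) for g, v in zip(genotypes, labels))
    chi2 = n * sgy * sgy / (sgg * syy)
    return math.erfc(math.sqrt(chi2 / 2.0))  # chi-square(1 df) survival function


def screen_then_test(X, y, k, threshold):
    """Step 1: keep the k SNPs with largest |SVM weight|. Step 2: test only those."""
    w = train_linear_svm(standardize(X), y)
    ranked = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    candidates = sorted(ranked[:k])
    hits = {}
    for j in candidates:
        p = trend_test_pvalue([row[j] for row in X], y)
        if p < threshold:
            hits[j] = p
    return candidates, hits


def make_toy_data(n=200, d=20, causal=3, seed=0):
    """Random genotypes in {0, 1, 2}; only column `causal` drives the phenotype."""
    rng = random.Random(seed)
    X = [[rng.choice((0, 1, 2)) for _ in range(d)] for _ in range(n)]
    y = []
    for row in X:
        g = row[causal]
        y.append(1 if g == 2 else -1 if g == 0 else rng.choice((-1, 1)))
    return X, y


X, y = make_toy_data()
candidates, hits = screen_then_test(X, y, k=5, threshold=1e-4)
print("candidate SNPs:", candidates)
print("SNPs below threshold:", hits)
```

Restricting the tests to the screened subset is the point of the sketch: SNPs that the classifier does not rank highly are never tested, so the threshold only has to account for the candidates that remain.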
format Online
Article
Text
id pubmed-5125008
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-5125008 2016-12-08 Sci Rep Article
Nature Publishing Group 2016-11-28 /pmc/articles/PMC5125008/ /pubmed/27892471 http://dx.doi.org/10.1038/srep36671 Text en Copyright © 2016, The Author(s) http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5125008/
https://www.ncbi.nlm.nih.gov/pubmed/27892471
http://dx.doi.org/10.1038/srep36671