An efficient algorithm to perform multiple testing in epistasis screening

Bibliographic Details
Main Authors: Lishout, François Van, Mahachie John, Jestinah M, Gusareva, Elena S, Urrea, Victor, Cleynen, Isabelle, Théâtre, Emilie, Charloteaux, Benoît, Calle, Malu Luz, Wehenkel, Louis, Steen, Kristel Van
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3648350/
https://www.ncbi.nlm.nih.gov/pubmed/23617239
http://dx.doi.org/10.1186/1471-2105-14-138
_version_ 1782268823562354688
author Lishout, François Van
Mahachie John, Jestinah M
Gusareva, Elena S
Urrea, Victor
Cleynen, Isabelle
Théâtre, Emilie
Charloteaux, Benoît
Calle, Malu Luz
Wehenkel, Louis
Steen, Kristel Van
author_sort Lishout, François Van
collection PubMed
description BACKGROUND: Research in epistasis, or gene-gene interaction, detection for human complex traits has grown over the last few years. It has been marked by promising methodological developments, improved efforts to translate statistical epistasis into biological epistasis, and attempts to integrate different omics information sources into epistasis screening to enhance power. The quest for gene-gene interactions poses severe multiple-testing problems. In this context, the maxT algorithm is one technique to control the false-positive rate. However, the memory needed by this algorithm rises linearly with the number of hypothesis tests. Gene-gene interaction studies therefore require memory proportional to the square of the number of SNPs, and a genome-wide epistasis search would require terabytes of memory. Hence, cache problems are likely to occur, increasing the computation time. In this work we present a new version of maxT that requires an amount of memory independent of the number of genetic effects to be investigated. This algorithm was implemented in C++ in our epistasis screening software MBMDR-3.0.3. We evaluate the new implementation in terms of memory efficiency and speed using simulated data. The software is illustrated on real-life data for Crohn’s disease.
RESULTS: In the case of a binary (affected/unaffected) trait, the parallel workflow of MBMDR-3.0.3 analyzes all gene-gene interactions in a dataset of 100,000 SNPs typed on 1000 individuals within 4 days and 9 hours, using 999 permutations of the trait to assess statistical significance, on a cluster composed of 10 blades, each containing four Quad-Core AMD Opteron(tm) Processor 2352 2.1 GHz. In the case of a continuous trait, a similar run takes 9 days. Our program found 14 SNP-SNP interactions with a multiple-testing corrected p-value of less than 0.05 on real-life Crohn’s disease (CD) data.
CONCLUSIONS: Our software is the first implementation of the MB-MDR methodology able to solve large-scale SNP-SNP interaction problems within a few days, without using much memory, while adequately controlling the type I error rate. A new implementation to reach genome-wide epistasis screening is under construction. In the context of Crohn’s disease, MBMDR-3.0.3 identified epistasis involving regions that are well known in the field and that can be explained from a biological point of view. This demonstrates the power of our software to find relevant higher-order phenotype-genotype associations.
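The memory argument in the description can be illustrated with a minimal sketch. It is not the authors' C++ implementation and not the MB-MDR statistic: it is a simplified single-step maxT in Python, written under the assumption that only the few largest observed statistics are of interest. The names `streaming_maxt`, `pair_stats`, `top_n`, and the toy statistic are hypothetical and were chosen for illustration only. The point it shows is that each permutation needs only its running maximum, so memory depends on `top_n` and not on the number of SNP pairs (for 100,000 SNPs there are 100,000 × 99,999 / 2 = 4,999,950,000 pairs).

```python
import heapq
import random
from itertools import combinations


def pair_stats(genotypes, trait):
    """Yield ((a, b), statistic) for every SNP pair, one at a time.

    Toy statistic (an assumption, NOT MB-MDR): absolute difference between
    cases and controls in the mean product of the two genotype codes.
    """
    cases = [i for i, y in enumerate(trait) if y == 1]
    controls = [i for i, y in enumerate(trait) if y == 0]
    for a, b in combinations(range(len(genotypes)), 2):
        prod = [x * y for x, y in zip(genotypes[a], genotypes[b])]
        mean_case = sum(prod[i] for i in cases) / len(cases)
        mean_ctrl = sum(prod[i] for i in controls) / len(controls)
        yield (a, b), abs(mean_case - mean_ctrl)


def streaming_maxt(stats_for, trait, n_perm=999, top_n=10, seed=0):
    """Single-step maxT adjusted p-values for the top_n observed statistics.

    stats_for(trait) must return an iterable of (label, statistic) pairs.
    Statistics are consumed as a stream and never stored in full.
    """
    rng = random.Random(seed)
    # Keep only the top_n largest observed statistics (sorted descending).
    top = heapq.nlargest(top_n, stats_for(trait), key=lambda kv: kv[1])
    exceed = [0] * len(top)
    perm = list(trait)
    for _ in range(n_perm):
        rng.shuffle(perm)  # permute the trait, keep genotypes fixed
        # Only the maximum over all tests is needed for this permutation.
        perm_max = max(stat for _, stat in stats_for(perm))
        for i, (_, observed) in enumerate(top):
            if perm_max >= observed:
                exceed[i] += 1
    # The observed data counts as one extra permutation in the p-value.
    return [
        (label, observed, (1 + count) / (n_perm + 1))
        for (label, observed), count in zip(top, exceed)
    ]


if __name__ == "__main__":
    data_rng = random.Random(42)
    genotypes = [[data_rng.randint(0, 2) for _ in range(40)] for _ in range(8)]
    trait = [1] * 20 + [0] * 20
    results = streaming_maxt(lambda t: pair_stats(genotypes, t), trait,
                             n_perm=199, top_n=3)
    for label, observed, p_adj in results:
        print(label, round(observed, 3), p_adj)
```

With 999 permutations, as in the run reported above, the smallest adjusted p-value this scheme can return is 1/1000. The published method may differ in its details (for example, a step-down correction), so this sketch should be read only as an illustration of why the permutation maxima suffice.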
format Online
Article
Text
id pubmed-3648350
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3648350 2013-05-10 An efficient algorithm to perform multiple testing in epistasis screening Lishout, François Van Mahachie John, Jestinah M Gusareva, Elena S Urrea, Victor Cleynen, Isabelle Théâtre, Emilie Charloteaux, Benoît Calle, Malu Luz Wehenkel, Louis Steen, Kristel Van BMC Bioinformatics Software BioMed Central 2013-04-24 /pmc/articles/PMC3648350/ /pubmed/23617239 http://dx.doi.org/10.1186/1471-2105-14-138 Text en Copyright © 2013 Van Lishout et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An efficient algorithm to perform multiple testing in epistasis screening
topic Software