
Prioritizing tests of epistasis through hierarchical representation of genomic redundancies


Bibliographic Details
Main Authors: Cowman, Tyler; Koyutürk, Mehmet
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737499/
https://www.ncbi.nlm.nih.gov/pubmed/28605458
http://dx.doi.org/10.1093/nar/gkx505
author Cowman, Tyler
Koyutürk, Mehmet
collection PubMed
description Epistasis is defined as a statistical interaction between two or more genomic loci in terms of their association with a phenotype of interest. Epistatic loci that are identified using data from Genome-Wide Association Studies (GWAS) provide insights into the interplay among multiple genetic factors, with applications including assessment of susceptibility to complex diseases, decision making in precision medicine, and gaining insights into disease mechanisms. Since the number of genomic loci assayed by GWAS is extremely large (usually on the order of millions), identification of epistatic loci is a statistically difficult and computationally intensive problem. Even when only pairwise interactions are considered, the size of the search space ranges from hundreds of millions to billions of locus pairs. The large number of statistical tests performed also makes sufficient correction for type I error imperative. Consequently, efficient algorithms are required to filter the tests that are performed and to evaluate large GWAS data sets in a reasonable amount of computation time. It has been observed that many pairwise tests are redundant due to correlations between loci in their genotype values across samples, known as linkage disequilibrium. However, algorithms that have been developed for efficient identification of epistatic loci do not systematically exploit linkage disequilibrium. Here, we propose a new algorithm for fast epistasis detection based on hierarchical representation of linkage disequilibrium (LinDen). We utilize redundancies in genotype patterns between neighboring loci to generate a hierarchical structure and execute a branch-and-bound search to prioritize the testing of loci based on approximations of a test statistic for pairs of locus groups. The hierarchical organization of tests performed by LinDen allows for efficient scaling based on the screened loci. 
We test LinDen comprehensively on three data sets obtained from the Wellcome Trust Case Control Consortium: type 2 diabetes, psoriasis, and hypertension. Our results show that, compared with other state-of-the-art tools for fast epistasis detection, LinDen drastically reduces the number of tests performed while discovering statistically significant locus pairs. LinDen is implemented in C++ and is available as open source at http://compbio.case.edu/linden/.
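The abstract does not specify LinDen's grouping rule or the approximation it uses for the test statistic, so the Python sketch below illustrates only the general idea of a branch-and-bound search over a hierarchy of neighboring loci, under assumptions of our own. The score is a hypothetical toy interaction score (not a real epistasis test and not LinDen's statistic), groups are formed by simply halving the locus order rather than from linkage disequilibrium, and all function names (`score`, `build`, `search`, `find_pairs`, `simulate`) are hypothetical. Because genotypes take values in {0, 1, 2}, replacing a locus by its group representative changes the toy score by at most twice the L1 distance between their genotype vectors, so any pair of groups whose bound falls below the threshold can be discarded without testing its members.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Genotypes = List[List[int]]  # genotypes[locus][sample], values in {0, 1, 2}


def score(g: Genotypes, signs: List[int], i: int, j: int) -> int:
    """Hypothetical toy interaction score, NOT LinDen's test statistic:
    |sum_k g_i[k] * g_j[k] * s_k|, where s_k = +1 (case) or -1 (control)."""
    return abs(sum(a * b * s for a, b, s in zip(g[i], g[j], signs)))


def l1(u: List[int], v: List[int]) -> int:
    return sum(abs(a - b) for a, b in zip(u, v))


@dataclass
class Node:
    lo: int        # first locus in the group (inclusive)
    hi: int        # end of the group (exclusive)
    rep: int       # representative locus of the group
    radius: int    # max L1 genotype distance from any member to rep
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.hi - self.lo == 1


def build(g: Genotypes, lo: int, hi: int) -> Node:
    """Group neighboring loci by halving the locus order (a simplification;
    LinDen derives its hierarchy from linkage disequilibrium)."""
    radius = max(l1(g[k], g[lo]) for k in range(lo, hi))
    if hi - lo == 1:
        return Node(lo, hi, lo, radius)
    mid = (lo + hi) // 2
    return Node(lo, hi, lo, radius, build(g, lo, mid), build(g, mid, hi))


def search(g, signs, a: Node, b: Node, threshold: int,
           hits: List[Tuple[int, int]], stats: Dict[str, int]) -> None:
    """Collect pairs i < j with score >= threshold.
    Invariant: a is b, or every locus in a precedes every locus in b."""
    if a is b and a.is_leaf():
        return  # a single locus forms no pair
    if a is not b and a.is_leaf() and b.is_leaf():
        stats["tests"] += 1  # an actual pairwise test
        if score(g, signs, a.lo, b.lo) >= threshold:
            hits.append((a.lo, b.lo))
        return
    # Valid bound: score(i, j) <= score(rep_a, rep_b) + 2 * (radius_a + radius_b).
    stats["bounds"] += 1
    if score(g, signs, a.rep, b.rep) + 2 * (a.radius + b.radius) < threshold:
        return  # prune: no pair drawn from (a, b) can reach the threshold
    if a is b:
        for x, y in ((a.left, a.left), (a.right, a.right), (a.left, a.right)):
            search(g, signs, x, y, threshold, hits, stats)
    elif not a.is_leaf() and (b.is_leaf() or a.hi - a.lo >= b.hi - b.lo):
        search(g, signs, a.left, b, threshold, hits, stats)
        search(g, signs, a.right, b, threshold, hits, stats)
    else:
        search(g, signs, a, b.left, threshold, hits, stats)
        search(g, signs, a, b.right, threshold, hits, stats)


def find_pairs(g: Genotypes, signs: List[int], threshold: int):
    root = build(g, 0, len(g))
    hits: List[Tuple[int, int]] = []
    stats = {"tests": 0, "bounds": 0}
    search(g, signs, root, root, threshold, hits, stats)
    return sorted(hits), stats


def brute_force(g: Genotypes, signs: List[int], threshold: int):
    n = len(g)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if score(g, signs, i, j) >= threshold]


def simulate(n_loci: int, n_samples: int, seed: int = 0):
    """Hypothetical data: blocks of 8 near-identical neighboring loci (mimicking LD)."""
    rng = random.Random(seed)
    g: Genotypes = []
    base: List[int] = []
    for k in range(n_loci):
        if k % 8 == 0:  # start a new block of correlated loci
            base = [rng.choice((0, 1, 2)) for _ in range(n_samples)]
        g.append([rng.choice((0, 1, 2)) if rng.random() < 0.01 else x for x in base])
    signs = [rng.choice((1, -1)) for _ in range(n_samples)]
    return g, signs


if __name__ == "__main__":
    g, signs = simulate(n_loci=64, n_samples=200)
    hits, stats = find_pairs(g, signs, threshold=60)
    assert hits == brute_force(g, signs, threshold=60)  # pruning is exact
    print(f"{len(hits)} pairs; {stats['tests']} of {64 * 63 // 2} pairwise tests run")
```

Because the bound holds for every pair it covers, pruning never discards a pair that would pass the threshold; how many tests it saves depends on how closely the loci within a group agree, which is the kind of redundancy that linkage disequilibrium provides.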
format Online
Article
Text
id pubmed-5737499
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5737499 2018-01-09 Prioritizing tests of epistasis through hierarchical representation of genomic redundancies Cowman, Tyler Koyutürk, Mehmet Nucleic Acids Res Methods Online Oxford University Press 2017-08-21 2017-06-09 /pmc/articles/PMC5737499/ /pubmed/28605458 http://dx.doi.org/10.1093/nar/gkx505 Text en © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Prioritizing tests of epistasis through hierarchical representation of genomic redundancies
topic Methods Online