Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants

Gene-based tests of association (e.g., variance components and burden tests) are now common practice for analyses attempting to elucidate the contribution of rare genetic variants to common disease. As sequencing datasets continue to grow in size, the number of variants within each set (e.g., gene)...

Bibliographic Details
Main Authors: Fore, Ruby; Boehme, Jaden; Li, Kevin; Westra, Jason; Tintle, Nathan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7680887/
https://www.ncbi.nlm.nih.gov/pubmed/33240333
http://dx.doi.org/10.3389/fgene.2020.591606
_version_ 1783612523798331392
author Fore, Ruby
Boehme, Jaden
Li, Kevin
Westra, Jason
Tintle, Nathan
collection PubMed
description Gene-based tests of association (e.g., variance components and burden tests) are now common practice for analyses attempting to elucidate the contribution of rare genetic variants to common disease. As sequencing datasets continue to grow in size, the number of variants within each set (e.g., gene) being tested also continues to grow. Pathway-based methods have been used to allow for the initial aggregation of gene-based statistical evidence and then the subsequent aggregation of evidence across the pathway. This “multi-set” approach (first gene-based test, followed by pathway-based) has not been thoroughly explored for evaluating genotype–phenotype associations in the age of large, sequenced datasets. In particular, we ask whether there are statistical and biological characteristics that make the multi-set approach optimal compared with simply running all gene-based tests. In this paper, we provide an intuitive framework for evaluating these questions and use simulated data to affirm this intuition. A real data application is provided demonstrating how our insights manifest themselves in practice. Ultimately, we find that when initial subsets are biologically informative (e.g., tending to aggregate causal genetic variants within one or more subsets, often genes), multi-set strategies can improve statistical power, with particular gains in cases where causal variants are aggregated in subsets with fewer variants overall (high proportion of causal variants in the subset). However, we find that there is little advantage when the sets are non-informative (similar proportion of causal variants in the subsets). Our application to real data further demonstrates this intuition. In practice, we recommend wider use of pathway-based methods and further exploration of optimal ways of aggregating variants into subsets based on emerging biological evidence of the genetic architecture of complex disease.
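The second stage of the "multi-set" idea described above (gene-level evidence aggregated across a pathway) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which combination rule the authors use, so Fisher's method is a stand-in, the gene-level tests are assumed independent, and the function name `fisher_combine` and the example gene labels are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method (an assumed
    stand-in for the pathway-level aggregation step)."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    # Fisher statistic X = -2 * sum(log p) is chi-square with 2k degrees
    # of freedom under the null; we work with X / 2 directly.
    half_stat = -sum(math.log(p) for p in p_values)

    # Exact chi-square survival function for even degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical gene-level p-values (stage 1, e.g. from burden tests).
gene_p = {"GENE_A": 0.001, "GENE_B": 0.40, "GENE_C": 0.75}

# Stage 2: one pathway-level p-value from the gene-level evidence.
pathway_p = fisher_combine(list(gene_p.values()))
print(f"pathway-level p-value: {pathway_p:.4g}")
```

With a single gene the combined value reduces to that gene's own p-value, which is a convenient sanity check. Correlated gene-level tests would violate the independence assumption and call for a different combination rule.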
format Online
Article
Text
id pubmed-7680887
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7680887 2020-11-24 Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants Fore, Ruby Boehme, Jaden Li, Kevin Westra, Jason Tintle, Nathan Front Genet Genetics Frontiers Media S.A. 2020-11-09 /pmc/articles/PMC7680887/ /pubmed/33240333 http://dx.doi.org/10.3389/fgene.2020.591606 Text en Copyright © 2020 Fore, Boehme, Li, Westra and Tintle. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants
topic Genetics