Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants
Gene-based tests of association (e.g., variance components and burden tests) are now common practice for analyses attempting to elucidate the contribution of rare genetic variants on common disease. As sequencing datasets continue to grow in size, the number of variants within each set (e.g., gene)...
Main Authors: | Fore, Ruby; Boehme, Jaden; Li, Kevin; Westra, Jason; Tintle, Nathan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2020 |
Subjects: | Genetics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7680887/ https://www.ncbi.nlm.nih.gov/pubmed/33240333 http://dx.doi.org/10.3389/fgene.2020.591606 |
_version_ | 1783612523798331392 |
---|---|
author | Fore, Ruby Boehme, Jaden Li, Kevin Westra, Jason Tintle, Nathan |
author_facet | Fore, Ruby Boehme, Jaden Li, Kevin Westra, Jason Tintle, Nathan |
author_sort | Fore, Ruby |
collection | PubMed |
description | Gene-based tests of association (e.g., variance components and burden tests) are now common practice for analyses attempting to elucidate the contribution of rare genetic variants on common disease. As sequencing datasets continue to grow in size, the number of variants within each set (e.g., gene) being tested is also continuing to grow. Pathway-based methods have been used to allow for the initial aggregation of gene-based statistical evidence and then the subsequent aggregation of evidence across the pathway. This “multi-set” approach (first gene-based test, followed by pathway-based) lacks thorough exploration in regard to evaluating genotype–phenotype associations in the age of large, sequenced datasets. In particular, we wonder whether there are statistical and biological characteristics that make the multi-set approach optimal vs. simply doing all gene-based tests. In this paper, we provide an intuitive framework for evaluating these questions and use simulated data to affirm this intuition. A real data application is provided demonstrating how our insights manifest themselves in practice. Ultimately, we find that when initial subsets are biologically informative (e.g., tending to aggregate causal genetic variants within one or more subsets, often genes), multi-set strategies can improve statistical power, with particular gains in cases where causal variants are aggregated in subsets with fewer variants overall (high proportion of causal variants in the subset). However, we find that there is little advantage when the sets are non-informative (similar proportion of causal variants in the subsets). Our application to real data further demonstrates this intuition. In practice, we recommend wider use of pathway-based methods and further exploration of optimal ways of aggregating variants into subsets based on emerging biological evidence of the genetic architecture of complex disease. |
format | Online Article Text |
id | pubmed-7680887 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-76808872020-11-24 Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants Fore, Ruby Boehme, Jaden Li, Kevin Westra, Jason Tintle, Nathan Front Genet Genetics Gene-based tests of association (e.g., variance components and burden tests) are now common practice for analyses attempting to elucidate the contribution of rare genetic variants on common disease. As sequencing datasets continue to grow in size, the number of variants within each set (e.g., gene) being tested is also continuing to grow. Pathway-based methods have been used to allow for the initial aggregation of gene-based statistical evidence and then the subsequent aggregation of evidence across the pathway. This “multi-set” approach (first gene-based test, followed by pathway-based) lacks thorough exploration in regard to evaluating genotype–phenotype associations in the age of large, sequenced datasets. In particular, we wonder whether there are statistical and biological characteristics that make the multi-set approach optimal vs. simply doing all gene-based tests. In this paper, we provide an intuitive framework for evaluating these questions and use simulated data to affirm this intuition. A real data application is provided demonstrating how our insights manifest themselves in practice. Ultimately, we find that when initial subsets are biologically informative (e.g., tending to aggregate causal genetic variants within one or more subsets, often genes), multi-set strategies can improve statistical power, with particular gains in cases where causal variants are aggregated in subsets with fewer variants overall (high proportion of causal variants in the subset). However, we find that there is little advantage when the sets are non-informative (similar proportion of causal variants in the subsets). Our application to real data further demonstrates this intuition. In practice, we recommend wider use of pathway-based methods and further exploration of optimal ways of aggregating variants into subsets based on emerging biological evidence of the genetic architecture of complex disease. Frontiers Media S.A. 2020-11-09 /pmc/articles/PMC7680887/ /pubmed/33240333 http://dx.doi.org/10.3389/fgene.2020.591606 Text en Copyright © 2020 Fore, Boehme, Li, Westra and Tintle. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Fore, Ruby Boehme, Jaden Li, Kevin Westra, Jason Tintle, Nathan Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants |
title | Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants |
title_full | Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants |
title_fullStr | Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants |
title_full_unstemmed | Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants |
title_short | Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants |
title_sort | multi-set testing strategies show good behavior when applied to very large sets of rare variants |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7680887/ https://www.ncbi.nlm.nih.gov/pubmed/33240333 http://dx.doi.org/10.3389/fgene.2020.591606 |
work_keys_str_mv | AT foreruby multisettestingstrategiesshowgoodbehaviorwhenappliedtoverylargesetsofrarevariants AT boehmejaden multisettestingstrategiesshowgoodbehaviorwhenappliedtoverylargesetsofrarevariants AT likevin multisettestingstrategiesshowgoodbehaviorwhenappliedtoverylargesetsofrarevariants AT westrajason multisettestingstrategiesshowgoodbehaviorwhenappliedtoverylargesetsofrarevariants AT tintlenathan multisettestingstrategiesshowgoodbehaviorwhenappliedtoverylargesetsofrarevariants |
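
The description field above contrasts a two-stage "multi-set" analysis (a test per gene, then aggregation across the pathway) with a single test over all variants at once. The sketch below is only an illustration of that contrast, not the authors' method: this record does not say which statistics the paper used, so the simple correlation-based burden test, the use of Fisher's method to combine gene-level p-values, and all function names and toy data are assumptions chosen for brevity.

```python
import math

def burden_pvalue(genotypes, phenotype):
    """Two-sided p-value for a simple burden test on one variant set.

    genotypes: one list per person of rare-allele counts (one entry per variant).
    phenotype: one quantitative trait value per person.
    Illustrative choice: correlation test via the Fisher z-transform.
    """
    n = len(phenotype)
    burden = [sum(person) for person in genotypes]  # collapse variants into one score
    mean_b = sum(burden) / n
    mean_y = sum(phenotype) / n
    sxy = sum((b - mean_b) * (y - mean_y) for b, y in zip(burden, phenotype))
    sxx = sum((b - mean_b) ** 2 for b in burden)
    syy = sum((y - mean_y) ** 2 for y in phenotype)
    if sxx == 0 or syy == 0 or n <= 3:
        return 1.0  # no variation or too few people: treat as no evidence
    r = max(-0.999999, min(0.999999, sxy / math.sqrt(sxx * syy)))
    z = math.atanh(r) * math.sqrt(n - 3)  # approximately N(0, 1) under the null
    return math.erfc(abs(z) / math.sqrt(2))

def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method (assumed aggregator)."""
    k = len(pvalues)
    # half_stat is X/2, where X = -2 * sum(ln p) is chi-square with 2k df under the null
    half_stat = -sum(math.log(max(p, 1e-300)) for p in pvalues)
    # Closed-form chi-square survival function for an even number of df
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total

def multi_set_pvalue(genes, phenotype):
    """Stage 1: one burden test per gene. Stage 2: combine across the pathway."""
    return fisher_combine([burden_pvalue(g, phenotype) for g in genes])

def single_set_pvalue(genes, phenotype):
    """Pool every variant in the pathway into one large set and test once."""
    pooled = [sum((g[i] for g in genes), []) for i in range(len(phenotype))]
    return burden_pvalue(pooled, phenotype)

# Hypothetical toy data: 8 people, gene_a tracks the trait, gene_b does not.
phenotype = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
gene_a = [[0, 0], [0, 0], [0, 0], [0, 0], [1, 0], [0, 1], [1, 0], [1, 1]]
gene_b = [[0], [1], [0], [1], [0], [1], [0], [1]]

print("multi-set p:", multi_set_pvalue([gene_a, gene_b], phenotype))
print("single-set p:", single_set_pvalue([gene_a, gene_b], phenotype))
```

Fisher's method assumes the gene-level p-values are independent, which may not hold for genes in linkage disequilibrium or for tests that share one phenotype in small samples; a real analysis would need to account for that.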