Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants

Gene-based tests of association (e.g., variance components and burden tests) are now common practice for analyses attempting to elucidate the contribution of rare genetic variants to common disease. As sequencing datasets continue to grow in size, the number of variants within each set (e.g., gene)...

Bibliographic Details
Main authors:	Fore, Ruby, Boehme, Jaden, Li, Kevin, Westra, Jason, Tintle, Nathan
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2020
Subjects:	Genetics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7680887/ https://www.ncbi.nlm.nih.gov/pubmed/33240333 http://dx.doi.org/10.3389/fgene.2020.591606

_version_	1783612523798331392
author	Fore, Ruby Boehme, Jaden Li, Kevin Westra, Jason Tintle, Nathan
author_facet	Fore, Ruby Boehme, Jaden Li, Kevin Westra, Jason Tintle, Nathan
author_sort	Fore, Ruby
collection	PubMed
description	Gene-based tests of association (e.g., variance components and burden tests) are now common practice for analyses attempting to elucidate the contribution of rare genetic variants to common disease. As sequencing datasets continue to grow in size, the number of variants within each set (e.g., gene) being tested is also continuing to grow. Pathway-based methods have been used to allow for the initial aggregation of gene-based statistical evidence and then the subsequent aggregation of evidence across the pathway. This “multi-set” approach (first gene-based test, followed by pathway-based) lacks thorough exploration in regard to evaluating genotype–phenotype associations in the age of large, sequenced datasets. In particular, we wonder whether there are statistical and biological characteristics that make the multi-set approach optimal vs. simply doing all gene-based tests. In this paper, we provide an intuitive framework for evaluating these questions and use simulated data to affirm this intuition. A real data application is provided demonstrating how our insights manifest themselves in practice. Ultimately, we find that when initial subsets are biologically informative (e.g., tending to aggregate causal genetic variants within one or more subsets, often genes), multi-set strategies can improve statistical power, with particular gains in cases where causal variants are aggregated in subsets with fewer variants overall (high proportion of causal variants in the subset). However, we find that there is little advantage when the sets are non-informative (similar proportion of causal variants in the subsets). Our application to real data further demonstrates this intuition. In practice, we recommend wider use of pathway-based methods and further exploration of optimal ways of aggregating variants into subsets based on emerging biological evidence of the genetic architecture of complex disease.
format	Online Article Text
id	pubmed-7680887
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-7680887 2020-11-24 Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants Fore, Ruby Boehme, Jaden Li, Kevin Westra, Jason Tintle, Nathan Front Genet Genetics Gene-based tests of association (e.g., variance components and burden tests) are now common practice for analyses attempting to elucidate the contribution of rare genetic variants to common disease. As sequencing datasets continue to grow in size, the number of variants within each set (e.g., gene) being tested is also continuing to grow. Pathway-based methods have been used to allow for the initial aggregation of gene-based statistical evidence and then the subsequent aggregation of evidence across the pathway. This “multi-set” approach (first gene-based test, followed by pathway-based) lacks thorough exploration in regard to evaluating genotype–phenotype associations in the age of large, sequenced datasets. In particular, we wonder whether there are statistical and biological characteristics that make the multi-set approach optimal vs. simply doing all gene-based tests. In this paper, we provide an intuitive framework for evaluating these questions and use simulated data to affirm this intuition. A real data application is provided demonstrating how our insights manifest themselves in practice. Ultimately, we find that when initial subsets are biologically informative (e.g., tending to aggregate causal genetic variants within one or more subsets, often genes), multi-set strategies can improve statistical power, with particular gains in cases where causal variants are aggregated in subsets with fewer variants overall (high proportion of causal variants in the subset). However, we find that there is little advantage when the sets are non-informative (similar proportion of causal variants in the subsets). Our application to real data further demonstrates this intuition. In practice, we recommend wider use of pathway-based methods and further exploration of optimal ways of aggregating variants into subsets based on emerging biological evidence of the genetic architecture of complex disease. Frontiers Media S.A. 2020-11-09 /pmc/articles/PMC7680887/ /pubmed/33240333 http://dx.doi.org/10.3389/fgene.2020.591606 Text en Copyright © 2020 Fore, Boehme, Li, Westra and Tintle. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle	Genetics Fore, Ruby Boehme, Jaden Li, Kevin Westra, Jason Tintle, Nathan Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants
title	Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants
title_full	Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants
title_fullStr	Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants
title_full_unstemmed	Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants
title_short	Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants
title_sort	multi-set testing strategies show good behavior when applied to very large sets of rare variants
topic	Genetics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7680887/ https://www.ncbi.nlm.nih.gov/pubmed/33240333 http://dx.doi.org/10.3389/fgene.2020.591606
work_keys_str_mv	AT foreruby multisettestingstrategiesshowgoodbehaviorwhenappliedtoverylargesetsofrarevariants AT boehmejaden multisettestingstrategiesshowgoodbehaviorwhenappliedtoverylargesetsofrarevariants AT likevin multisettestingstrategiesshowgoodbehaviorwhenappliedtoverylargesetsofrarevariants AT westrajason multisettestingstrategiesshowgoodbehaviorwhenappliedtoverylargesetsofrarevariants AT tintlenathan multisettestingstrategiesshowgoodbehaviorwhenappliedtoverylargesetsofrarevariants
