Predicting the difficulty of pure, strict, epistatic models: metrics for simulated model selection

BACKGROUND: Algorithms designed to detect complex genetic disease associations are initially evaluated using simulated datasets. Typical evaluations vary constraints that influence the correct detection of underlying models (i.e. number of loci, heritability, and minor allele frequency). Such studie...

Bibliographic Details
Main Authors:	Urbanowicz, Ryan J; Kiralis, Jeff; Fisher, Jonathan M; Moore, Jason H
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Methodology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3549792/ https://www.ncbi.nlm.nih.gov/pubmed/23014095 http://dx.doi.org/10.1186/1756-0381-5-15

author	Urbanowicz, Ryan J; Kiralis, Jeff; Fisher, Jonathan M; Moore, Jason H
collection	PubMed
description	BACKGROUND: Algorithms designed to detect complex genetic disease associations are initially evaluated using simulated datasets. Typical evaluations vary constraints that influence the correct detection of underlying models (i.e. number of loci, heritability, and minor allele frequency). Such studies neglect to account for model architecture (i.e. the unique specification and arrangement of penetrance values comprising the genetic model), which alone can influence the detectability of a model. In order to design a simulation study which efficiently takes architecture into account, a reliable metric is needed for model selection. RESULTS: We evaluate three metrics as predictors of relative model detection difficulty derived from previous works: (1) Penetrance table variance (PTV), (2) customized odds ratio (COR), and (3) our own Ease of Detection Measure (EDM), calculated from the penetrance values and respective genotype frequencies of each simulated genetic model. We evaluate the reliability of these metrics across three very different data search algorithms, each with the capacity to detect epistatic interactions. We find that a model’s EDM and COR are each stronger predictors of model detection success than heritability. CONCLUSIONS: This study formally identifies and evaluates metrics which quantify model detection difficulty. We utilize these metrics to intelligently select models from a population of potential architectures. This allows for an improved simulation study design which accounts for differences in detection difficulty attributed to model architecture. We implement the calculation and utilization of EDM and COR into GAMETES, an algorithm which rapidly and precisely generates pure, strict, n-locus epistatic models.
format	Online Article Text
id	pubmed-3549792
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3549792 2013-01-23 BioData Min Methodology BioMed Central 2012-09-26 /pmc/articles/PMC3549792/ /pubmed/23014095 http://dx.doi.org/10.1186/1756-0381-5-15 Text en Copyright ©2012 Urbanowicz et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Predicting the difficulty of pure, strict, epistatic models: metrics for simulated model selection
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3549792/ https://www.ncbi.nlm.nih.gov/pubmed/23014095 http://dx.doi.org/10.1186/1756-0381-5-15
