
Evolving hard problems: Generating human genetics datasets with a complex etiology

BACKGROUND: A goal of human genetics is to discover genetic factors that influence individuals' susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components instead of single component failures. This greatly complicat...

Bibliographic Details
Main authors:	Himmelstein, Daniel S; Greene, Casey S; Moore, Jason H
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Methodology
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3154150/ https://www.ncbi.nlm.nih.gov/pubmed/21736753 http://dx.doi.org/10.1186/1756-0381-4-21

_version_	1782209983431049216
author	Himmelstein, Daniel S; Greene, Casey S; Moore, Jason H
author_sort	Himmelstein, Daniel S
collection	PubMed
description	BACKGROUND: A goal of human genetics is to discover genetic factors that influence individuals' susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components rather than from single-component failures. This greatly complicates both the task of selecting informative genetic variants and the task of modeling interactions between them. We and others have previously developed algorithms to detect and model the relationships between these genetic factors and disease. Previously, these methods have been evaluated with datasets simulated according to pre-defined genetic models. RESULTS: Here we develop and evaluate a model-free evolution strategy to generate datasets that display a complex relationship between individual genotype and disease susceptibility. We show that this model-free approach is capable of generating a diverse array of datasets with distinct gene-disease relationships for an arbitrary interaction order and sample size. We specifically generate eight hundred Pareto fronts, one for each independent run of our algorithm. In each run, the predictiveness of single genetic variants and pairs of genetic variants is minimized, while the predictiveness of third-, fourth-, or fifth-order combinations is maximized. Two hundred runs of the algorithm are further dedicated to creating datasets with predictive fourth- or fifth-order interactions and minimized lower-level effects. CONCLUSIONS: This method and the resulting datasets will allow the capabilities of novel methods to be tested without pre-specified genetic models. This allows researchers to evaluate which methods will succeed on human genetics problems where the model is not known in advance. We further make freely available to the community the entire Pareto-optimal front of datasets from each run so that novel methods may be rigorously evaluated. These 76,600 datasets are available from http://discovery.dartmouth.edu/model_free_data/.
format	Online Article Text
id	pubmed-3154150
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3154150 2011-08-11 Evolving hard problems: Generating human genetics datasets with a complex etiology. Himmelstein, Daniel S; Greene, Casey S; Moore, Jason H. BioData Min. Methodology. BioMed Central 2011-07-07 /pmc/articles/PMC3154150/ /pubmed/21736753 http://dx.doi.org/10.1186/1756-0381-4-21 Text en Copyright ©2011 Himmelstein et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Evolving hard problems: Generating human genetics datasets with a complex etiology
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3154150/ https://www.ncbi.nlm.nih.gov/pubmed/21736753 http://dx.doi.org/10.1186/1756-0381-4-21
