Determination of nonlinear genetic architecture using compressed sensing

Bibliographic Details
Main Authors: Ho, Chiu Man; Hsu, Stephen DH
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4570224/
https://www.ncbi.nlm.nih.gov/pubmed/26380078
http://dx.doi.org/10.1186/s13742-015-0081-6
_version_ 1782390168718671872
author Ho, Chiu Man
Hsu, Stephen DH
author_facet Ho, Chiu Man
Hsu, Stephen DH
author_sort Ho, Chiu Man
collection PubMed
description BACKGROUND: One of the fundamental problems of modern genomics is to extract the genetic architecture of a complex trait from a data set of individual genotypes and trait values. Establishing this important connection between genotype and phenotype is complicated by the large number of candidate genes, the potentially large number of causal loci, and the likely presence of some nonlinear interactions between different genes. Compressed Sensing methods obtain solutions to under-constrained systems of linear equations. These methods can be applied to the problem of determining the best model relating genotype to phenotype, and generally deliver better performance than simply regressing the phenotype against each genetic variant, one at a time. We introduce a Compressed Sensing method that can reconstruct nonlinear genetic models (i.e., including epistasis, or gene-gene interactions) from phenotype-genotype (GWAS) data. Our method uses L1-penalized regression applied to nonlinear functions of the sensing matrix.
RESULTS: The computational and data resource requirements for our method are similar to those necessary for reconstruction of linear genetic models (or identification of gene-trait associations), assuming a condition of generalized sparsity, which limits the total number of gene-gene interactions. An example of a sparse nonlinear model is one in which a typical locus interacts with several or even many others, but only a small subset of all possible interactions exists. It seems plausible that most genetic architectures fall in this category. We give theoretical arguments suggesting that the method is nearly optimal in performance, and demonstrate its effectiveness on broad classes of nonlinear genetic models using simulated human genomes and the small amount of currently available real data. A phase transition (i.e., dramatic and qualitative change) in the behavior of the algorithm indicates when sufficient data is available for its successful application.
CONCLUSION: Our results indicate that predictive models for many complex traits, including a variety of human disease susceptibilities (e.g., with additive heritability h² ∼ 0.5), can be extracted from data sets comprising n⋆ ∼ 100s individuals (that is, about 100 times s), where s is the number of distinct causal variants influencing the trait. For example, given a trait controlled by ∼10k loci (s ∼ 10⁴), roughly a million individuals would be sufficient for application of the method.
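The abstract's core idea, L1-penalized regression applied to nonlinear functions of the genotype matrix, can be illustrated with a small sketch. This is not the authors' algorithm or code: it substitutes a plain coordinate-descent Lasso over explicitly expanded pairwise products, and every name (`expand_features`, `lasso_cd`, `simulate_and_fit`), the -1/0/1 genotype coding, the toy trait, and the penalty value `lam=0.05` are assumptions chosen only for illustration.

```python
import random
from itertools import combinations


def expand_features(genotypes):
    """Return rows holding every locus plus every pairwise product g_i * g_j."""
    p = len(genotypes[0])
    pairs = list(combinations(range(p), 2))
    rows = [list(g) + [g[i] * g[j] for i, j in pairs] for g in genotypes]
    labels = [(i,) for i in range(p)] + pairs
    return rows, labels


def soft_threshold(x, t):
    """Shrink x toward zero by t (the proximal operator of the L1 penalty)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual for b = 0, with y centred
    cols = []
    for c in range(d):
        col = [X[r][c] for r in range(n)]
        m = sum(col) / n
        cols.append([v - m for v in col])  # centre each feature column
    scale = [sum(v * v for v in col) / n for col in cols]
    beta = [0.0] * d
    for _ in range(n_sweeps):
        for c in range(d):
            if scale[c] == 0.0:
                continue  # constant column carries no information
            col = cols[c]
            rho = sum(a * e for a, e in zip(col, resid)) / n + scale[c] * beta[c]
            new = soft_threshold(rho, lam) / scale[c]
            delta = new - beta[c]
            if delta != 0.0:
                resid = [e - delta * a for e, a in zip(resid, col)]
                beta[c] = new
    return beta


def simulate_and_fit(n=80, p=15, lam=0.05, seed=0):
    """Fit a toy trait with two additive loci and one pairwise interaction."""
    rng = random.Random(seed)
    # Hypothetical coding: minor-allele count minus one, allele frequency 0.5.
    genotypes = [[rng.choice((-1, 0, 0, 1)) for _ in range(p)] for _ in range(n)]
    y = [g[0] - 1.5 * g[3] + 2.0 * g[5] * g[7] + rng.gauss(0.0, 0.1)
         for g in genotypes]
    X, labels = expand_features(genotypes)  # 15 + 105 = 120 columns, 80 rows
    beta = lasso_cd(X, y, lam)
    return dict(zip(labels, beta))


if __name__ == "__main__":
    fit = simulate_and_fit()
    # Show the five largest coefficients; labels are locus indices or index pairs.
    for label, coef in sorted(fit.items(), key=lambda kv: -abs(kv[1]))[:5]:
        print(label, round(coef, 3))
```

The toy system has more unknowns (120) than individuals (80), so ordinary least squares is under-determined, yet only three coefficients are truly nonzero, which is the sparsity condition the abstract relies on. Expanding all pairwise products is feasible only for a handful of loci; the sketch does not attempt the scale or the efficiency claims made in the abstract.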
format Online
Article
Text
id pubmed-4570224
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4570224 2015-09-16 Determination of nonlinear genetic architecture using compressed sensing Ho, Chiu Man Hsu, Stephen DH Gigascience Research BioMed Central 2015-09-14 /pmc/articles/PMC4570224/ /pubmed/26380078 http://dx.doi.org/10.1186/s13742-015-0081-6 Text en © Ho and Hsu. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Ho, Chiu Man
Hsu, Stephen DH
Determination of nonlinear genetic architecture using compressed sensing
title Determination of nonlinear genetic architecture using compressed sensing
title_full Determination of nonlinear genetic architecture using compressed sensing
title_fullStr Determination of nonlinear genetic architecture using compressed sensing
title_full_unstemmed Determination of nonlinear genetic architecture using compressed sensing
title_short Determination of nonlinear genetic architecture using compressed sensing
title_sort determination of nonlinear genetic architecture using compressed sensing
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4570224/
https://www.ncbi.nlm.nih.gov/pubmed/26380078
http://dx.doi.org/10.1186/s13742-015-0081-6
work_keys_str_mv AT hochiuman determinationofnonlineargeneticarchitectureusingcompressedsensing
AT hsustephendh determinationofnonlineargeneticarchitectureusingcompressedsensing