Will Big Data Close the Missing Heritability Gap?
Despite the important discoveries reported by genome-wide association (GWA) studies, for most traits and diseases the prediction R-squared (R-sq.) achieved with genetic scores remains considerably lower than the trait heritability. Modern biobanks will soon deliver unprecedentedly large biomedical d...
Main authors: | Kim, Hwasoon; Grueneberg, Alexander; Vazquez, Ana I.; Hsu, Stephen; de los Campos, Gustavo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Genetics Society of America, 2017 |
Subjects: | Investigations |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5676235/ https://www.ncbi.nlm.nih.gov/pubmed/28893854 http://dx.doi.org/10.1534/genetics.117.300271 |
_version_ | 1783277026214412288 |
---|---|
author | Kim, Hwasoon Grueneberg, Alexander Vazquez, Ana I. Hsu, Stephen de los Campos, Gustavo |
author_facet | Kim, Hwasoon Grueneberg, Alexander Vazquez, Ana I. Hsu, Stephen de los Campos, Gustavo |
author_sort | Kim, Hwasoon |
collection | PubMed |
description | Despite the important discoveries reported by genome-wide association (GWA) studies, for most traits and diseases the prediction R-squared (R-sq.) achieved with genetic scores remains considerably lower than the trait heritability. Modern biobanks will soon deliver unprecedentedly large biomedical data sets: Will the advent of big data close the gap between the trait heritability and the proportion of variance that can be explained by a genomic predictor? We addressed this question using Bayesian methods and a data analysis approach that produces a surface response relating prediction R-sq. with sample size and model complexity (e.g., number of SNPs). We applied the methodology to data from the interim release of the UK Biobank. Focusing on human height as a model trait and using 80,000 records for model training, we achieved a prediction R-sq. in testing (n = 22,221) of 0.24 (95% C.I.: 0.23–0.25). Our estimates show that prediction R-sq. increases with sample size, reaching an estimated plateau at values that ranged from 0.1 to 0.37 for models using 500 and 50,000 (GWA-selected) SNPs, respectively. Soon much larger data sets will become available. Using the estimated surface response, we forecast that larger sample sizes will lead to further improvements in prediction R-sq. We conclude that big data will lead to a substantial reduction of the gap between trait heritability and the proportion of interindividual differences that can be explained with a genomic predictor. However, even with the power of big data, for complex traits we anticipate that the gap between prediction R-sq. and trait heritability will not be fully closed. |
format | Online Article Text |
id | pubmed-5676235 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Genetics Society of America |
record_format | MEDLINE/PubMed |
spelling | pubmed-5676235 2017-11-09 Will Big Data Close the Missing Heritability Gap? Kim, Hwasoon Grueneberg, Alexander Vazquez, Ana I. Hsu, Stephen de los Campos, Gustavo Genetics Investigations Despite the important discoveries reported by genome-wide association (GWA) studies, for most traits and diseases the prediction R-squared (R-sq.) achieved with genetic scores remains considerably lower than the trait heritability. Modern biobanks will soon deliver unprecedentedly large biomedical data sets: Will the advent of big data close the gap between the trait heritability and the proportion of variance that can be explained by a genomic predictor? We addressed this question using Bayesian methods and a data analysis approach that produces a surface response relating prediction R-sq. with sample size and model complexity (e.g., number of SNPs). We applied the methodology to data from the interim release of the UK Biobank. Focusing on human height as a model trait and using 80,000 records for model training, we achieved a prediction R-sq. in testing (n = 22,221) of 0.24 (95% C.I.: 0.23–0.25). Our estimates show that prediction R-sq. increases with sample size, reaching an estimated plateau at values that ranged from 0.1 to 0.37 for models using 500 and 50,000 (GWA-selected) SNPs, respectively. Soon much larger data sets will become available. Using the estimated surface response, we forecast that larger sample sizes will lead to further improvements in prediction R-sq. We conclude that big data will lead to a substantial reduction of the gap between trait heritability and the proportion of interindividual differences that can be explained with a genomic predictor. However, even with the power of big data, for complex traits we anticipate that the gap between prediction R-sq. and trait heritability will not be fully closed. 
Genetics Society of America 2017-11 2017-09-11 /pmc/articles/PMC5676235/ /pubmed/28893854 http://dx.doi.org/10.1534/genetics.117.300271 Text en Copyright © 2017 by the Genetics Society of America Available freely online through the author-supported open access option. |
spellingShingle | Investigations Kim, Hwasoon Grueneberg, Alexander Vazquez, Ana I. Hsu, Stephen de los Campos, Gustavo Will Big Data Close the Missing Heritability Gap? |
title | Will Big Data Close the Missing Heritability Gap? |
title_full | Will Big Data Close the Missing Heritability Gap? |
title_fullStr | Will Big Data Close the Missing Heritability Gap? |
title_full_unstemmed | Will Big Data Close the Missing Heritability Gap? |
title_short | Will Big Data Close the Missing Heritability Gap? |
title_sort | will big data close the missing heritability gap? |
topic | Investigations |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5676235/ https://www.ncbi.nlm.nih.gov/pubmed/28893854 http://dx.doi.org/10.1534/genetics.117.300271 |
work_keys_str_mv | AT kimhwasoon willbigdataclosethemissingheritabilitygap AT gruenebergalexander willbigdataclosethemissingheritabilitygap AT vazquezanai willbigdataclosethemissingheritabilitygap AT hsustephen willbigdataclosethemissingheritabilitygap AT deloscamposgustavo willbigdataclosethemissingheritabilitygap |
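
The abstract describes prediction R-sq. rising with training sample size toward a plateau, and using the fitted response surface to forecast larger samples. The sketch below illustrates that general idea for a single model size only. The saturating form `plateau * n / (n + half_n)`, the function names, and every number in it are assumptions chosen for illustration; the record does not state the functional form the authors fitted, and the data here are synthetic, not results from the study.

```python
# Minimal sketch: fit a saturating curve R2(n) = plateau * n / (n + half_n)
# to (sample size, prediction R-sq.) pairs, then extrapolate.
# ASSUMPTION: this functional form is illustrative; it is not taken from the paper.

def saturating_r2(n, plateau, half_n):
    """Prediction R-sq. at training size n under the assumed saturating form."""
    return plateau * n / (n + half_n)


def fit_saturating(sizes, r2_values):
    """Estimate (plateau, half_n) by ordinary least squares on the
    linearized form 1/R2 = 1/plateau + (half_n/plateau) * (1/n)."""
    xs = [1.0 / n for n in sizes]
    ys = [1.0 / r for r in r2_values]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    plateau = 1.0 / intercept
    half_n = slope * plateau
    return plateau, half_n


# Synthetic, noise-free inputs (hypothetical values, NOT from the study).
TRUE_PLATEAU = 0.40
TRUE_HALF_N = 50_000
sizes = [10_000, 20_000, 40_000, 80_000]
observed = [saturating_r2(n, TRUE_PLATEAU, TRUE_HALF_N) for n in sizes]

plateau_hat, half_n_hat = fit_saturating(sizes, observed)

# Extrapolate to a much larger (hypothetical) training set.
forecast = saturating_r2(500_000, plateau_hat, half_n_hat)

print(f"estimated plateau: {plateau_hat:.3f}")
print(f"forecast R-sq. at n = 500,000: {forecast:.3f}")
```

Under this assumed form the forecast always stays below the fitted plateau, which mirrors the abstract's qualitative conclusion that larger samples narrow the gap without fully closing it. With real, noisy estimates a nonlinear or Bayesian fit over both sample size and number of SNPs would be more appropriate than this one-dimensional linearization.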