
Will Big Data Close the Missing Heritability Gap?

Bibliographic Details
Main authors: Kim, Hwasoon; Grueneberg, Alexander; Vazquez, Ana I.; Hsu, Stephen; de los Campos, Gustavo
Format: Online, Article, Text
Language: English
Published: Genetics Society of America, 2017
Subjects: Investigations
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5676235/
https://www.ncbi.nlm.nih.gov/pubmed/28893854
http://dx.doi.org/10.1534/genetics.117.300271
collection PubMed
description Despite the important discoveries reported by genome-wide association (GWA) studies, for most traits and diseases the prediction R-squared (R-sq.) achieved with genetic scores remains considerably lower than the trait heritability. Modern biobanks will soon deliver unprecedentedly large biomedical data sets: Will the advent of big data close the gap between the trait heritability and the proportion of variance that can be explained by a genomic predictor? We addressed this question using Bayesian methods and a data analysis approach that produces a response surface relating prediction R-sq. to sample size and model complexity (e.g., number of SNPs). We applied the methodology to data from the interim release of the UK Biobank. Focusing on human height as a model trait and using 80,000 records for model training, we achieved a prediction R-sq. in testing (n = 22,221) of 0.24 (95% C.I.: 0.23–0.25). Our estimates show that prediction R-sq. increases with sample size, reaching estimated plateaus ranging from 0.1 to 0.37 for models using 500 and 50,000 (GWA-selected) SNPs, respectively. Much larger data sets will soon become available. Using the estimated response surface, we forecast that larger sample sizes will lead to further improvements in prediction R-sq. We conclude that big data will lead to a substantial reduction of the gap between trait heritability and the proportion of interindividual differences that can be explained with a genomic predictor. However, even with the power of big data, we anticipate that for complex traits the gap between prediction R-sq. and trait heritability will not be fully closed.
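The abstract's central quantitative idea, that prediction R-sq. rises with training sample size toward a plateau set by how much genetic variance the marker set captures, can be illustrated with the deterministic approximation of Daetwyler et al. (2008). This is a swapped-in textbook formula, not the Bayesian response-surface method used in the study. The function name `expected_prediction_r2`, the parameter `h2_captured` (variance tagged by a given SNP set), the effective number of segments `m_e`, and every numeric value below are hypothetical and chosen only for illustration.

```python
"""Sketch: expected prediction R-sq. versus training sample size.

Assumes the Daetwyler et al. (2008) approximation, NOT the Bayesian
response-surface model of the study. All numbers are hypothetical.
"""


def expected_prediction_r2(n: float, h2_captured: float, m_e: float) -> float:
    """Expected squared correlation between a genomic score and phenotype.

    n: number of training records
    h2_captured: phenotypic variance tagged by the SNP set (<= heritability)
    m_e: effective number of independent chromosome segments (assumed)
    """
    if n < 0 or not (0.0 < h2_captured <= 1.0) or m_e <= 0:
        raise ValueError("require n >= 0, 0 < h2_captured <= 1, m_e > 0")
    # Squared accuracy of the estimated genetic values (Daetwyler et al., 2008).
    accuracy_sq = n * h2_captured / (n * h2_captured + m_e)
    # Scale by the captured variance to get R-sq. on the phenotype scale;
    # as n grows this approaches h2_captured, not the full heritability.
    return h2_captured * accuracy_sq


if __name__ == "__main__":
    m_e = 60_000.0  # hypothetical, held fixed across SNP sets for simplicity
    # Two hypothetical marker sets: a small one tagging little variance and
    # a large one tagging more. Neither value is taken from the study.
    for label, h2_cap in [("small SNP set", 0.15), ("large SNP set", 0.45)]:
        for n in (10_000, 80_000, 500_000, 5_000_000):
            r2 = expected_prediction_r2(n, h2_cap, m_e)
            print(f"{label:14s} n={n:>9,d}  R-sq.={r2:.3f}")
```

Under these assumptions the curve is concave in n and saturates at `h2_captured`, which mirrors the qualitative pattern the abstract reports: plateaus that differ with SNP-set size and remain below the trait heritability. Holding `m_e` fixed across SNP sets is itself a simplification.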
id pubmed-5676235
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5676235 (2017-11-09); Genetics, Investigations; Genetics Society of America 2017-11, 2017-09-11. Copyright © 2017 by the Genetics Society of America. Available freely online through the author-supported open access option.
topic Investigations