
An alternative covariance estimator to investigate genetic heterogeneity in populations

BACKGROUND: For genomic prediction and genome-wide association studies (GWAS) using mixed models, covariance between individuals is estimated using molecular markers. Based on the properties of mixed models, using available molecular data for prediction is optimal if this covariance is known. Under...


Bibliographic Details
Main authors: Heslot, Nicolas, Jannink, Jean-Luc
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4661961/
https://www.ncbi.nlm.nih.gov/pubmed/26612537
http://dx.doi.org/10.1186/s12711-015-0171-z
author Heslot, Nicolas
Jannink, Jean-Luc
collection PubMed
description BACKGROUND: For genomic prediction and genome-wide association studies (GWAS) using mixed models, covariance between individuals is estimated using molecular markers. Based on the properties of mixed models, using available molecular data for prediction is optimal if this covariance is known. Under this assumption, adding individuals to the analysis should never be detrimental. However, some empirical studies showed that increasing training population size decreased prediction accuracy. Recently, results from theoretical models indicated that even if marker density is high and the genetic architecture of traits is controlled by many loci with small additive effects, the covariance between individuals, which depends on relationships at causal loci, is not always well estimated by the whole-genome kinship. RESULTS: We propose an alternative covariance estimator, named K-kernel, to account for potential genetic heterogeneity between populations that is characterized by a lack of genetic correlation, and to limit the information flow between a priori unknown populations in a trait-specific manner. The approach is similar to a multi-trait model, with parameters estimated by restricted maximum likelihood (REML); in extreme cases, it can allow for an independent genetic architecture between populations. As such, K-kernel is useful for studying the design of training populations. K-kernel was compared to other covariance estimators or kernels to examine its fit to the data, cross-validated accuracy and suitability for GWAS on several datasets. It provides a significantly better fit to the data than the genomic best linear unbiased prediction model and, in some cases, it performs better than other kernels such as the Gaussian kernel, as shown by an empirical null distribution. In GWAS simulations, alternative kernels control type I errors as well as or better than the classical whole-genome kinship and increase statistical power. Gains in cross-validated prediction accuracy were small or absent. CONCLUSIONS: This alternative covariance estimator can be used to gain insight into trait-specific genetic heterogeneity by identifying relevant sub-populations that lack genetic correlation between them. The genetic correlation between identified sub-populations can be 0, which amounts to automatically selecting the relevant sets of individuals to include in the training population. It may also increase statistical power in GWAS. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12711-015-0171-z) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4661961
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4661961 2015-11-28 An alternative covariance estimator to investigate genetic heterogeneity in populations Heslot, Nicolas Jannink, Jean-Luc Genet Sel Evol Research Article BioMed Central 2015-11-26 /pmc/articles/PMC4661961/ /pubmed/26612537 http://dx.doi.org/10.1186/s12711-015-0171-z Text en © Heslot and Jannink. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title An alternative covariance estimator to investigate genetic heterogeneity in populations
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4661961/
https://www.ncbi.nlm.nih.gov/pubmed/26612537
http://dx.doi.org/10.1186/s12711-015-0171-z