Cargando…

On the impact of relatedness on SNP association analysis

BACKGROUND: When testing for SNP (single nucleotide polymorphism) associations in related individuals, observations are not independent. Simple linear regression assuming independent normally distributed residuals results in an increased type I error and the power of the test is also affected in a m...

Descripción completa

Detalles Bibliográficos
Autores principales: Gross, Arnd, Tönjes, Anke, Scholz, Markus
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2017
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5719591/
https://www.ncbi.nlm.nih.gov/pubmed/29212447
http://dx.doi.org/10.1186/s12863-017-0571-x
_version_ 1783284524264718336
author Gross, Arnd
Tönjes, Anke
Scholz, Markus
author_facet Gross, Arnd
Tönjes, Anke
Scholz, Markus
author_sort Gross, Arnd
collection PubMed
description BACKGROUND: When testing for SNP (single nucleotide polymorphism) associations in related individuals, observations are not independent. Simple linear regression assuming independent normally distributed residuals results in an increased type I error and the power of the test is also affected in a more complicate manner. Inflation of type I error is often successfully corrected by genomic control. However, this reduces the power of the test when relatedness is of concern. In the present paper, we derive explicit formulae to investigate how heritability and strength of relatedness contribute to variance inflation of the effect estimate of the linear model. Further, we study the consequences of variance inflation on hypothesis testing and compare the results with those of genomic control correction. We apply the developed theory to the publicly available HapMap trio data (N=129), the Sorbs (a self-contained population with N=977 characterised by a cryptic relatedness structure) and synthetic family studies with different sample sizes (ranging from N=129 to N=999) and different degrees of relatedness. RESULTS: We derive explicit and easily to apply approximation formulae to estimate the impact of relatedness on the variance of the effect estimate of the linear regression model. Variance inflation increases with increasing heritability. Relatedness structure also impacts the degree of variance inflation as shown for example family structures. Variance inflation is smallest for HapMap trios, followed by a synthetic family study corresponding to the trio data but with larger sample size than HapMap. Next strongest inflation is observed for the Sorbs, and finally, for a synthetic family study with a more extreme relatedness structure but with similar sample size as the Sorbs. Type I error increases rapidly with increasing inflation. However, for smaller significance levels, power increases with increasing inflation while the opposite holds for larger significance levels. When genomic control is applied, type I error is preserved while power decreases rapidly with increasing variance inflation. CONCLUSIONS: Stronger relatedness as well as higher heritability result in increased variance of the effect estimate of simple linear regression analysis. While type I error rates are generally inflated, the behaviour of power is more complex since power can be increased or reduced in dependence on relatedness and the heritability of the phenotype. Genomic control cannot be recommended to deal with inflation due to relatedness. Although it preserves type I error, the loss in power can be considerable. We provide a simple formula for estimating variance inflation given the relatedness structure and the heritability of a trait of interest. As a rule of thumb, variance inflation below 1.05 does not require correction and simple linear regression analysis is still appropriate. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12863-017-0571-x) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5719591
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-57195912017-12-08 On the impact of relatedness on SNP association analysis Gross, Arnd Tönjes, Anke Scholz, Markus BMC Genet Methodology Article BACKGROUND: When testing for SNP (single nucleotide polymorphism) associations in related individuals, observations are not independent. Simple linear regression assuming independent normally distributed residuals results in an increased type I error and the power of the test is also affected in a more complicate manner. Inflation of type I error is often successfully corrected by genomic control. However, this reduces the power of the test when relatedness is of concern. In the present paper, we derive explicit formulae to investigate how heritability and strength of relatedness contribute to variance inflation of the effect estimate of the linear model. Further, we study the consequences of variance inflation on hypothesis testing and compare the results with those of genomic control correction. We apply the developed theory to the publicly available HapMap trio data (N=129), the Sorbs (a self-contained population with N=977 characterised by a cryptic relatedness structure) and synthetic family studies with different sample sizes (ranging from N=129 to N=999) and different degrees of relatedness. RESULTS: We derive explicit and easily to apply approximation formulae to estimate the impact of relatedness on the variance of the effect estimate of the linear regression model. Variance inflation increases with increasing heritability. Relatedness structure also impacts the degree of variance inflation as shown for example family structures. Variance inflation is smallest for HapMap trios, followed by a synthetic family study corresponding to the trio data but with larger sample size than HapMap. Next strongest inflation is observed for the Sorbs, and finally, for a synthetic family study with a more extreme relatedness structure but with similar sample size as the Sorbs. Type I error increases rapidly with increasing inflation. However, for smaller significance levels, power increases with increasing inflation while the opposite holds for larger significance levels. When genomic control is applied, type I error is preserved while power decreases rapidly with increasing variance inflation. CONCLUSIONS: Stronger relatedness as well as higher heritability result in increased variance of the effect estimate of simple linear regression analysis. While type I error rates are generally inflated, the behaviour of power is more complex since power can be increased or reduced in dependence on relatedness and the heritability of the phenotype. Genomic control cannot be recommended to deal with inflation due to relatedness. Although it preserves type I error, the loss in power can be considerable. We provide a simple formula for estimating variance inflation given the relatedness structure and the heritability of a trait of interest. As a rule of thumb, variance inflation below 1.05 does not require correction and simple linear regression analysis is still appropriate. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12863-017-0571-x) contains supplementary material, which is available to authorized users. BioMed Central 2017-12-06 /pmc/articles/PMC5719591/ /pubmed/29212447 http://dx.doi.org/10.1186/s12863-017-0571-x Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Gross, Arnd
Tönjes, Anke
Scholz, Markus
On the impact of relatedness on SNP association analysis
title On the impact of relatedness on SNP association analysis
title_full On the impact of relatedness on SNP association analysis
title_fullStr On the impact of relatedness on SNP association analysis
title_full_unstemmed On the impact of relatedness on SNP association analysis
title_short On the impact of relatedness on SNP association analysis
title_sort on the impact of relatedness on snp association analysis
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5719591/
https://www.ncbi.nlm.nih.gov/pubmed/29212447
http://dx.doi.org/10.1186/s12863-017-0571-x
work_keys_str_mv AT grossarnd ontheimpactofrelatednessonsnpassociationanalysis
AT tonjesanke ontheimpactofrelatednessonsnpassociationanalysis
AT scholzmarkus ontheimpactofrelatednessonsnpassociationanalysis