Cargando…
On the impact of relatedness on SNP association analysis
BACKGROUND: When testing for SNP (single nucleotide polymorphism) associations in related individuals, observations are not independent. Simple linear regression assuming independent normally distributed residuals results in an increased type I error and the power of the test is also affected in a m...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2017
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5719591/ https://www.ncbi.nlm.nih.gov/pubmed/29212447 http://dx.doi.org/10.1186/s12863-017-0571-x |
_version_ | 1783284524264718336 |
---|---|
author | Gross, Arnd Tönjes, Anke Scholz, Markus |
author_facet | Gross, Arnd Tönjes, Anke Scholz, Markus |
author_sort | Gross, Arnd |
collection | PubMed |
description | BACKGROUND: When testing for SNP (single nucleotide polymorphism) associations in related individuals, observations are not independent. Simple linear regression assuming independent normally distributed residuals results in an increased type I error and the power of the test is also affected in a more complicate manner. Inflation of type I error is often successfully corrected by genomic control. However, this reduces the power of the test when relatedness is of concern. In the present paper, we derive explicit formulae to investigate how heritability and strength of relatedness contribute to variance inflation of the effect estimate of the linear model. Further, we study the consequences of variance inflation on hypothesis testing and compare the results with those of genomic control correction. We apply the developed theory to the publicly available HapMap trio data (N=129), the Sorbs (a self-contained population with N=977 characterised by a cryptic relatedness structure) and synthetic family studies with different sample sizes (ranging from N=129 to N=999) and different degrees of relatedness. RESULTS: We derive explicit and easily to apply approximation formulae to estimate the impact of relatedness on the variance of the effect estimate of the linear regression model. Variance inflation increases with increasing heritability. Relatedness structure also impacts the degree of variance inflation as shown for example family structures. Variance inflation is smallest for HapMap trios, followed by a synthetic family study corresponding to the trio data but with larger sample size than HapMap. Next strongest inflation is observed for the Sorbs, and finally, for a synthetic family study with a more extreme relatedness structure but with similar sample size as the Sorbs. Type I error increases rapidly with increasing inflation. However, for smaller significance levels, power increases with increasing inflation while the opposite holds for larger significance levels. When genomic control is applied, type I error is preserved while power decreases rapidly with increasing variance inflation. CONCLUSIONS: Stronger relatedness as well as higher heritability result in increased variance of the effect estimate of simple linear regression analysis. While type I error rates are generally inflated, the behaviour of power is more complex since power can be increased or reduced in dependence on relatedness and the heritability of the phenotype. Genomic control cannot be recommended to deal with inflation due to relatedness. Although it preserves type I error, the loss in power can be considerable. We provide a simple formula for estimating variance inflation given the relatedness structure and the heritability of a trait of interest. As a rule of thumb, variance inflation below 1.05 does not require correction and simple linear regression analysis is still appropriate. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12863-017-0571-x) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5719591 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-57195912017-12-08 On the impact of relatedness on SNP association analysis Gross, Arnd Tönjes, Anke Scholz, Markus BMC Genet Methodology Article BACKGROUND: When testing for SNP (single nucleotide polymorphism) associations in related individuals, observations are not independent. Simple linear regression assuming independent normally distributed residuals results in an increased type I error and the power of the test is also affected in a more complicate manner. Inflation of type I error is often successfully corrected by genomic control. However, this reduces the power of the test when relatedness is of concern. In the present paper, we derive explicit formulae to investigate how heritability and strength of relatedness contribute to variance inflation of the effect estimate of the linear model. Further, we study the consequences of variance inflation on hypothesis testing and compare the results with those of genomic control correction. We apply the developed theory to the publicly available HapMap trio data (N=129), the Sorbs (a self-contained population with N=977 characterised by a cryptic relatedness structure) and synthetic family studies with different sample sizes (ranging from N=129 to N=999) and different degrees of relatedness. RESULTS: We derive explicit and easily to apply approximation formulae to estimate the impact of relatedness on the variance of the effect estimate of the linear regression model. Variance inflation increases with increasing heritability. Relatedness structure also impacts the degree of variance inflation as shown for example family structures. Variance inflation is smallest for HapMap trios, followed by a synthetic family study corresponding to the trio data but with larger sample size than HapMap. Next strongest inflation is observed for the Sorbs, and finally, for a synthetic family study with a more extreme relatedness structure but with similar sample size as the Sorbs. Type I error increases rapidly with increasing inflation. However, for smaller significance levels, power increases with increasing inflation while the opposite holds for larger significance levels. When genomic control is applied, type I error is preserved while power decreases rapidly with increasing variance inflation. CONCLUSIONS: Stronger relatedness as well as higher heritability result in increased variance of the effect estimate of simple linear regression analysis. While type I error rates are generally inflated, the behaviour of power is more complex since power can be increased or reduced in dependence on relatedness and the heritability of the phenotype. Genomic control cannot be recommended to deal with inflation due to relatedness. Although it preserves type I error, the loss in power can be considerable. We provide a simple formula for estimating variance inflation given the relatedness structure and the heritability of a trait of interest. As a rule of thumb, variance inflation below 1.05 does not require correction and simple linear regression analysis is still appropriate. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12863-017-0571-x) contains supplementary material, which is available to authorized users. BioMed Central 2017-12-06 /pmc/articles/PMC5719591/ /pubmed/29212447 http://dx.doi.org/10.1186/s12863-017-0571-x Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Gross, Arnd Tönjes, Anke Scholz, Markus On the impact of relatedness on SNP association analysis |
title | On the impact of relatedness on SNP association analysis |
title_full | On the impact of relatedness on SNP association analysis |
title_fullStr | On the impact of relatedness on SNP association analysis |
title_full_unstemmed | On the impact of relatedness on SNP association analysis |
title_short | On the impact of relatedness on SNP association analysis |
title_sort | on the impact of relatedness on snp association analysis |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5719591/ https://www.ncbi.nlm.nih.gov/pubmed/29212447 http://dx.doi.org/10.1186/s12863-017-0571-x |
work_keys_str_mv | AT grossarnd ontheimpactofrelatednessonsnpassociationanalysis AT tonjesanke ontheimpactofrelatednessonsnpassociationanalysis AT scholzmarkus ontheimpactofrelatednessonsnpassociationanalysis |