
Statistical distributions of test statistics used for quantitative trait association mapping in structured populations

Bibliographic details
Main authors: Teyssèdre, Simon; Elsen, Jean-Michel; Ricard, Anne
Format: Online Article Text
Language: English
Published: BioMed Central, 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3817592/
https://www.ncbi.nlm.nih.gov/pubmed/23146127
http://dx.doi.org/10.1186/1297-9686-44-32
author Teyssèdre, Simon
Elsen, Jean-Michel
Ricard, Anne
collection PubMed
description BACKGROUND: Spurious associations between single nucleotide polymorphisms (SNPs) and phenotypes are a major issue in genome-wide association studies; they have led to underestimation of the type 1 error rate and overestimation of the number of quantitative trait loci detected. Many authors have used simulation to investigate how population structure affects the robustness of association methods. This paper further develops the algebraic formalization of power and type 1 error rate for several classical statistical methods: simple regression; two approximations of the mixed model that includes the effect of a SNP and a random polygenic effect (GRAMMAR and FASTA); and the transmission/disequilibrium test for quantitative traits in nuclear families. Analytical formulae for the first and second moments of the test statistics were derived using matrix algebra, assuming a true mixed model with a polygenic effect and SNP effects. RESULTS: The expectations and variances of the test statistics, and their marginal expectations and variances over the distribution of genotypes and of the variance-component estimators, are given as functions of the relationship matrix and of the heritability of the polygenic effect. These formulae were used to compute type 1 error rate and power for any relationship matrix between phenotyped and genotyped individuals and for any level of heritability. With the regression method, the type 1 error rate increased with the variability of relationships and with heritability, whereas with the GRAMMAR method it decreased, and with the FASTA and quantitative transmission/disequilibrium test methods it was unaffected. CONCLUSIONS: The formulae can readily be used to set the correct threshold for a given type 1 error rate and to calculate power when designing experiments or data collection protocols.
The results on the efficacy of each method agree with simulation results in the literature and generalize them. At the same type 1 error rate, the GRAMMAR and FASTA methods had equal power. The power of the quantitative transmission/disequilibrium test was low. The FASTA method, which is very close to the full mixed model, is therefore recommended for association mapping studies.
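The abstract's statement that simple regression inflates the type 1 error rate among related individuals can be illustrated with a minimal sketch. This is not the authors' derivation. It assumes the following: phenotypic variance is scaled to 1; the relationship matrix A has a unit diagonal (no inbreeding); genotypes x are treated as fixed; the residual variance estimate is close to 1; and the test statistic is approximately normal. Under H0 (no SNP effect), Var(y) = V = h²A + (1 − h²)I. The naive regression z-statistic on centred genotypes then has variance r = x'Vx / x'x. This gives an actual type 1 error of 2(1 − Φ(z₁₋α/₂ / √r)) and a corrected threshold of z₁₋α/₂·√r. All function names are hypothetical.

```python
from statistics import NormalDist


def center(x):
    """Centre genotypes; the OLS slope with an intercept equals xc'y / xc'xc."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def quad_form(x, m):
    """Return x' M x for a vector x and a square matrix M given as nested lists."""
    n = len(x)
    return sum(x[i] * m[i][j] * x[j] for i in range(n) for j in range(n))


def variance_inflation(x, a, h2):
    """Hypothetical helper: r = x'Vx / x'x with V = h2*A + (1 - h2)*I.

    Assumes phenotypic variance 1, unit diagonal in A, fixed genotypes x,
    and a residual-variance estimate close to 1 (large n).
    """
    xc = center(x)
    xx = sum(v * v for v in xc)
    xax = quad_form(xc, a)
    # x'Vx = h2 * x'Ax + (1 - h2) * x'x
    return (h2 * xax + (1.0 - h2) * xx) / xx


def actual_type1_error(alpha, r):
    """Two-sided error rate when the nominal N(0, 1) threshold is used
    but the statistic is actually distributed N(0, r)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - NormalDist().cdf(z / r ** 0.5))


def corrected_threshold(alpha, r):
    """Threshold on the naive statistic that restores a two-sided level alpha."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0) * r ** 0.5


if __name__ == "__main__":
    # Toy relationship matrix: two pairs of full sibs (0.5 within a pair).
    A = [[1.0, 0.5, 0.0, 0.0],
         [0.5, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.5],
         [0.0, 0.0, 0.5, 1.0]]
    x = [2, 2, 0, 0]  # allele count shared within each family
    for h2 in (0.0, 0.25, 0.5, 0.75):
        r = variance_inflation(x, A, h2)
        print(h2, round(r, 3), round(actual_type1_error(0.05, r), 4),
              round(corrected_threshold(0.05, r), 3))
```

When genotypes follow family membership, x'Ax exceeds x'x, so r grows with h² and the nominal threshold becomes too permissive. This matches the direction the abstract reports for the regression method, within the simplifications above. When genotypes cut across families, r can fall below 1 and the naive test becomes conservative.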
format Online
Article
Text
id pubmed-3817592
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3817592 2013-11-07 Genet Sel Evol Research BioMed Central 2012-11-12 /pmc/articles/PMC3817592/ /pubmed/23146127 http://dx.doi.org/10.1186/1297-9686-44-32 Text en Copyright © 2012 Teyssèdre et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Statistical distributions of test statistics used for quantitative trait association mapping in structured populations
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3817592/
https://www.ncbi.nlm.nih.gov/pubmed/23146127
http://dx.doi.org/10.1186/1297-9686-44-32