Comparison of multimarker logistic regression models, with application to a genomewide scan of schizophrenia

Bibliographic Details
Main authors: Wason, James MS, Dudbridge, Frank
Format: Text
Language: English
Published: BioMed Central 2010
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2949738/
https://www.ncbi.nlm.nih.gov/pubmed/20828390
http://dx.doi.org/10.1186/1471-2156-11-80
Collection: PubMed
Abstract:
BACKGROUND: Genome-wide association studies (GWAS) are a widely used study design for detecting genetic causes of complex diseases. Current studies provide good coverage of common causal SNPs, but not of rare ones. A popular method for detecting rare causal variants is haplotype testing. A disadvantage of this approach is that many parameters are estimated simultaneously, which can reduce power and slow model fitting on large datasets. Haplotype testing effectively tests both the allele frequencies and the linkage disequilibrium (LD) structure of the data. LD structure has previously been shown to be mostly attributable to LD between adjacent SNPs. We propose a generalised linear model (GLM) that models the effect of each SNP in a region as well as the statistical interactions between adjacent pairs. We compare it with two other commonly used multimarker GLMs: one with a main-effect parameter for each SNP, and one with a parameter for each haplotype.
RESULTS: We show that the haplotype model has higher power for rare untyped causal SNPs, the main-effects model has higher power for common untyped causal SNPs, and the proposed model generally has power between those of the other two. The relative power of the three methods depends on the number of marker haplotypes on which the causal allele is present, which in turn depends on the age of the mutation. Except for a common causal variant in high LD with markers, all three multimarker models are more powerful than single-SNP tests. Including the adjacent statistical interactions results in less inflation of test statistics when a dataset contains a realistic level of population stratification. Using the multimarker models, we analyse data from the Molecular Genetics of Schizophrenia study. The multimarker models find potential associations that single-SNP tests do not. However, multimarker models also require stricter data-quality control, because biases can inflate multimarker test statistics more than single-SNP test statistics.
CONCLUSIONS: Analysing a GWAS with multimarker models can yield candidate regions that may contain rare untyped causal variants. This is useful for increasing the prior odds of association in future whole-genome sequence analyses.
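The abstract describes the proposed model only in words. The sketch below is a minimal, standard-library Python illustration of that idea, not the authors' implementation. It assumes additive 0/1/2 genotype coding and simple product terms for adjacent-SNP interactions (the paper's exact parameterisation and its handling of phase may differ), fits a logistic regression by Newton-Raphson, and computes a likelihood-ratio statistic against an intercept-only model. All function names (`adjacent_interaction_design`, `fit_logistic`, `lrt_adjacent_model`, `n_parameters`) and the simulated data are hypothetical. `n_parameters` counts the non-intercept parameters of the three multimarker models for m SNPs and h observed haplotypes, assuming the haplotype model uses one reference haplotype; this is the quantity behind the abstract's point that estimating many parameters at once can cost power.

```python
import math
import random

def adjacent_interaction_design(genotypes):
    """Rows [1, x_1..x_m, x_1*x_2, ..., x_(m-1)*x_m] for one region.

    Assumption: additive 0/1/2 coding and product terms for adjacent
    SNP pairs; the paper's exact parameterisation may differ.
    """
    rows = []
    for g in genotypes:
        main = [float(x) for x in g]
        pairs = [main[j] * main[j + 1] for j in range(len(main) - 1)]
        rows.append([1.0] + main + pairs)
    return rows

def n_parameters(m_snps, n_haplotypes):
    """Non-intercept parameters of the three models (haplotype model: one reference haplotype assumed)."""
    return {
        "main_effects": m_snps,
        "adjacent_interactions": 2 * m_snps - 1,
        "haplotype": n_haplotypes - 1,
    }

def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def fit_logistic(X, y, max_iter=25, tol=1e-8):
    """Newton-Raphson logistic regression; returns (beta, maximised log-likelihood)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        score = [0.0] * p                     # X^T (y - mu)
        info = [[0.0] * p for _ in range(p)]  # X^T W X with W = mu(1 - mu)
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                score[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = _solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        # y*eta - log(1 + e^eta), with log(1 + e^eta) computed stably
        loglik += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return beta, loglik

def lrt_adjacent_model(genotypes, y):
    """LR statistic and degrees of freedom: adjacent-interaction model vs intercept only."""
    X = adjacent_interaction_design(genotypes)
    _, ll_full = fit_logistic(X, y)
    _, ll_null = fit_logistic([[1.0] for _ in X], y)
    return 2.0 * (ll_full - ll_null), len(X[0]) - 1

if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical region: 4 SNPs, allele frequency 0.3, 400 individuals.
    genos = [[sum(rng.random() < 0.3 for _ in range(2)) for _ in range(4)]
             for _ in range(400)]
    # Hypothetical disease model driven by the second SNP only.
    y = [1 if rng.random() < 1 / (1 + math.exp(0.5 - 0.4 * g[1])) else 0
         for g in genos]
    stat, df = lrt_adjacent_model(genos, y)
    print(f"LR statistic = {stat:.2f} on {df} df")
```

Under the usual asymptotics the statistic would be compared with a chi-square distribution on `df` degrees of freedom (2m − 1 for m SNPs); the p-value step is omitted here to stay within the standard library. For comparison, the main-effects model needs m parameters, while the haplotype model's count grows with the number of distinct haplotypes observed, which for m biallelic SNPs can be as large as 2^m.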
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Genetics (Methodology Article), BioMed Central, 2010-09-09
Copyright ©2010 Wason and Dudbridge; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.