Conditional Random Fields for Fast, Large-Scale Genome-Wide Association Studies
Main authors: | Huang, Jim C.; Meek, Christopher; Kadie, Carl; Heckerman, David |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2011 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3134455/ https://www.ncbi.nlm.nih.gov/pubmed/21765897 http://dx.doi.org/10.1371/journal.pone.0021591 |
author | Huang, Jim C.; Meek, Christopher; Kadie, Carl; Heckerman, David |
collection | PubMed |
description | Understanding the role of genetic variation in human diseases remains an important open problem in genomics. An important component of such variation consists of variation at single sites in DNA, or single nucleotide polymorphisms (SNPs). Typically, the problem of associating particular SNPs with phenotypes has been confounded by hidden factors such as population structure, family structure or cryptic relatedness in the sample of individuals being analyzed. Such confounding factors lead to large numbers of both spurious and missed associations. Various statistical methods have been proposed to account for these confounding factors, such as linear mixed-effect models (LMMs) or methods that adjust the data based on a principal components analysis (PCA), but these methods either suffer from low power or cease to be tractable for larger numbers of individuals in the sample. Here we present a statistical model for conducting genome-wide association studies (GWAS) that accounts for such confounding factors. The runtime of our method scales quadratically in the number of individuals being studied, with only a modest loss in statistical power compared to LMM-based and PCA-based methods when tested on synthetic data generated from a generalized LMM. Applying our method to both real and synthetic human genotype/phenotype data, we demonstrate the ability of our model to correct for confounding factors while requiring significantly less runtime than LMMs. We have implemented methods for fitting these models, which are available at http://www.microsoft.com/science. |
format | Online Article Text |
id | pubmed-3134455 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-3134455; 2011-07-15; PLoS One; Research Article; Public Library of Science 2011-07-12; /pmc/articles/PMC3134455/ /pubmed/21765897 http://dx.doi.org/10.1371/journal.pone.0021591; Text; en; Huang et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Conditional Random Fields for Fast, Large-Scale Genome-Wide Association Studies |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3134455/ https://www.ncbi.nlm.nih.gov/pubmed/21765897 http://dx.doi.org/10.1371/journal.pone.0021591 |