
Identifying main effects and epistatic interactions from large-scale SNP data via adaptive group Lasso

BACKGROUND: Single nucleotide polymorphism (SNP) based association studies aim at identifying SNPs associated with phenotypes, for example, complex diseases. The associated SNPs may influence the disease risk individually (main effects) or behave jointly (epistatic interactions). For the analysis of...


Bibliographic Details
Main authors: Yang, Can, Wan, Xiang, Yang, Qiang, Xue, Hong, Yu, Weichuan
Format: Online Article Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3203332/
https://www.ncbi.nlm.nih.gov/pubmed/20122189
http://dx.doi.org/10.1186/1471-2105-11-S1-S18
author Yang, Can
Wan, Xiang
Yang, Qiang
Xue, Hong
Yu, Weichuan
collection PubMed
description BACKGROUND: Single nucleotide polymorphism (SNP) based association studies aim at identifying SNPs associated with phenotypes, for example, complex diseases. The associated SNPs may influence the disease risk individually (main effects) or behave jointly (epistatic interactions). For the analysis of high-throughput data, the main difficulty is that the number of SNPs far exceeds the number of samples. This difficulty is amplified when identifying interactions. RESULTS: In this paper, we propose an Adaptive Group Lasso (AGL) model for large-scale association studies. Our model enables us to analyze SNPs and their interactions simultaneously. We achieve this by introducing a sparsity constraint in our model, based on the fact that only a small fraction of SNPs is disease-associated. In order to reduce the number of false positive findings, we develop an adaptive reweighting scheme to enhance sparsity. In addition, our method treats SNPs and their interactions as factors, and identifies them in a grouped manner. Thus, it is flexible enough to analyze various disease models, especially for interaction detection. However, because the computation is intensive when millions of interaction terms need to be searched during model fitting, our method needs to be combined with some filtering methods when applied to genome-wide data for detecting interactions. CONCLUSION: By using a wide range of simulated datasets and a real dataset from WTCCC, we demonstrate the advantages of our method.
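The grouped, reweighted sparsity idea in the description above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a squared-error loss rather than a case/control likelihood, a plain proximal-gradient solver, and the reweighting rule w_g = 1 / (||beta_g|| + eps), all chosen here for brevity. The function names, the penalty level `lam`, and the synthetic data (five factors coded as two columns each, with only the first carrying signal) are hypothetical.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Block soft-thresholding: shrink the whole group v toward zero by t."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def group_lasso(X, y, groups, lam, weights, n_iter=500):
    """Proximal gradient for 0.5/n * ||y - X b||^2 + lam * sum_g w_g * ||b_g||_2."""
    n, p = X.shape
    # Step size = 1 / Lipschitz constant of the gradient of the smooth part.
    step = n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad
        for g, idx in enumerate(groups):
            # Each factor (group of columns) is kept or dropped as a unit.
            beta[idx] = group_soft_threshold(z[idx], step * lam * weights[g])
    return beta

def adaptive_group_lasso(X, y, groups, lam, n_rounds=3, eps=1e-3):
    """Refit with weights that penalise weak groups more (assumed reweighting rule)."""
    weights = np.ones(len(groups))
    for _ in range(n_rounds):
        beta = group_lasso(X, y, groups, lam, weights)
        norms = np.array([np.linalg.norm(beta[idx]) for idx in groups])
        weights = 1.0 / (norms + eps)
    return beta

# Hypothetical toy data: 5 factors with 2 columns each; only factor 0 has an effect.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = X[:, :2] @ np.array([2.0, -1.5]) + 0.1 * rng.standard_normal(200)
groups = [np.arange(2 * g, 2 * g + 2) for g in range(5)]

beta_hat = adaptive_group_lasso(X, y, groups, lam=0.3)
selected = [g for g, idx in enumerate(groups) if np.any(beta_hat[idx] != 0)]
print("selected groups:", selected)
```

Because the penalty acts on the Euclidean norm of each group, all columns of a factor enter or leave the model together, which is what lets a SNP or an interaction term be treated as a single unit regardless of how many columns encode it.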
format Online
Article
Text
id pubmed-3203332
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3203332 2011-10-29 Identifying main effects and epistatic interactions from large-scale SNP data via adaptive group Lasso Yang, Can Wan, Xiang Yang, Qiang Xue, Hong Yu, Weichuan BMC Bioinformatics Research BioMed Central 2010-01-18 /pmc/articles/PMC3203332/ /pubmed/20122189 http://dx.doi.org/10.1186/1471-2105-11-S1-S18 Text en Copyright ©2010 Yang et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Identifying main effects and epistatic interactions from large-scale SNP data via adaptive group Lasso
topic Research