Network-based regularization for high dimensional SNP data in the case–control study of Type 2 diabetes
BACKGROUND: Over the past decades, the prevalence of type 2 diabetes mellitus (T2D) has been steadily increasing around the world. Despite large efforts devoted to better understand the genetic basis of the disease, the identified susceptibility loci can only account for a small portion of the T2D h...
Main authors: | Ren, Jie; He, Tao; Li, Ye; Liu, Sai; Du, Yinhao; Jiang, Yu; Wu, Cen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
BioMed Central
2017
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5434559/ https://www.ncbi.nlm.nih.gov/pubmed/28511641 http://dx.doi.org/10.1186/s12863-017-0495-5 |
_version_ | 1783237070269972480 |
---|---|
author | Ren, Jie He, Tao Li, Ye Liu, Sai Du, Yinhao Jiang, Yu Wu, Cen |
author_sort | Ren, Jie |
collection | PubMed |
description | BACKGROUND: Over the past decades, the prevalence of type 2 diabetes mellitus (T2D) has been steadily increasing around the world. Despite substantial efforts devoted to better understanding the genetic basis of the disease, the identified susceptibility loci can only account for a small portion of the T2D heritability. Some of the existing approaches proposed for the high dimensional genetic data from the T2D case–control study are limited because they analyze only a small number of SNPs at a time from a large pool, ignore the correlations among SNPs, and adopt inefficient selection techniques. METHODS: We propose a network-constrained regularization method to select important SNPs by taking the linkage disequilibrium into account. To accommodate the case–control design, an iteratively reweighted least squares (IRLS) algorithm has been developed within the coordinate descent framework, in which the regularized logistic loss function is optimized with respect to one parameter at a time, cycling through all the parameters until convergence. RESULTS: In this article, a novel approach is developed to identify important SNPs more effectively by incorporating the interconnections among them in the regularized selection. A coordinate descent based iteratively reweighted least squares (IRLS) algorithm has been proposed. CONCLUSIONS: Both the simulation study and the analysis of the Nurses' Health Study, a case–control study of type 2 diabetes data with high dimensional SNP measurements, demonstrate the advantage of the network-based approach over the competing alternatives. |
format | Online Article Text |
id | pubmed-5434559 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5434559 2017-05-18 Network-based regularization for high dimensional SNP data in the case–control study of Type 2 diabetes Ren, Jie He, Tao Li, Ye Liu, Sai Du, Yinhao Jiang, Yu Wu, Cen BMC Genet Methodology Article BACKGROUND: Over the past decades, the prevalence of type 2 diabetes mellitus (T2D) has been steadily increasing around the world. Despite substantial efforts devoted to better understanding the genetic basis of the disease, the identified susceptibility loci can only account for a small portion of the T2D heritability. Some of the existing approaches proposed for the high dimensional genetic data from the T2D case–control study are limited because they analyze only a small number of SNPs at a time from a large pool, ignore the correlations among SNPs, and adopt inefficient selection techniques. METHODS: We propose a network-constrained regularization method to select important SNPs by taking the linkage disequilibrium into account. To accommodate the case–control design, an iteratively reweighted least squares (IRLS) algorithm has been developed within the coordinate descent framework, in which the regularized logistic loss function is optimized with respect to one parameter at a time, cycling through all the parameters until convergence. RESULTS: In this article, a novel approach is developed to identify important SNPs more effectively by incorporating the interconnections among them in the regularized selection. A coordinate descent based iteratively reweighted least squares (IRLS) algorithm has been proposed. CONCLUSIONS: Both the simulation study and the analysis of the Nurses' Health Study, a case–control study of type 2 diabetes data with high dimensional SNP measurements, demonstrate the advantage of the network-based approach over the competing alternatives. BioMed Central 2017-05-16 /pmc/articles/PMC5434559/ /pubmed/28511641 http://dx.doi.org/10.1186/s12863-017-0495-5 Text en © The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Ren, Jie He, Tao Li, Ye Liu, Sai Du, Yinhao Jiang, Yu Wu, Cen Network-based regularization for high dimensional SNP data in the case–control study of Type 2 diabetes |
title | Network-based regularization for high dimensional SNP data in the case–control study of Type 2 diabetes |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5434559/ https://www.ncbi.nlm.nih.gov/pubmed/28511641 http://dx.doi.org/10.1186/s12863-017-0495-5 |
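
The METHODS summary in the description field (IRLS inside a coordinate descent loop, with a network constraint built from the correlations among SNPs) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a lasso (L1) sparsity penalty, which may differ from the penalty used in the article, and an unsigned graph Laplacian built from thresholded absolute correlations as a simple stand-in for linkage disequilibrium. The function names, the threshold, the tuning values, and the simulated genotypes are all hypothetical.

```python
import numpy as np


def laplacian_from_correlation(X, threshold=0.3):
    """Hypothetical helper: unsigned Laplacian L = D - A from thresholded |corr|."""
    corr = np.corrcoef(X, rowvar=False)
    A = (np.abs(corr) > threshold).astype(float)
    np.fill_diagonal(A, 0.0)
    return np.diag(A.sum(axis=1)) - A


def soft_threshold(u, lam):
    """Soft-thresholding operator used by the L1 coordinate update."""
    return np.sign(u) * max(abs(u) - lam, 0.0)


def network_logistic_cd(X, y, L, lam1, lam2, n_outer=50, n_inner=100, tol=1e-6):
    """Sketch of IRLS + coordinate descent for
    -loglik/n + lam1 * ||beta||_1 + (lam2 / 2) * beta' L beta  (assumed objective)."""
    n, p = X.shape
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_outer):
        b0_old, beta_old = b0, beta.copy()
        # IRLS step: quadratic approximation of the logistic loss
        eta = b0 + X @ beta
        prob = 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))
        w = np.clip(prob * (1.0 - prob), 1e-5, None)   # working weights
        z = eta + (y - prob) / w                        # working response
        resid = z - eta
        for _ in range(n_inner):
            max_change = 0.0
            # unpenalized intercept: weighted mean of the residuals
            delta = np.sum(w * resid) / np.sum(w)
            b0 += delta
            resid -= delta
            # one coefficient at a time, cycling through all of them
            for j in range(p):
                xj = X[:, j]
                v = np.mean(w * xj ** 2)
                neighbours = L[j] @ beta - L[j, j] * beta[j]
                u = np.mean(w * xj * resid) + v * beta[j] - lam2 * neighbours
                new = soft_threshold(u, lam1) / (v + lam2 * L[j, j])
                d = new - beta[j]
                if d != 0.0:
                    resid -= d * xj
                    beta[j] = new
                    max_change = max(max_change, abs(d))
            if max_change < tol:
                break
        if max(np.max(np.abs(beta - beta_old)), abs(b0 - b0_old)) < tol:
            break
    return b0, beta


# Simulated additive genotype codes (0/1/2); purely illustrative data.
rng = np.random.default_rng(0)
X = rng.binomial(2, 0.3, size=(200, 10)).astype(float)
logits = X[:, 0] + X[:, 1] - 1.2
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits))).astype(float)
L = laplacian_from_correlation(X)
b0, beta = network_logistic_cd(X, y, L, lam1=0.02, lam2=0.1)
```

In this sketch the Laplacian term shrinks the coefficients of connected SNPs toward each other, while the L1 term performs the selection; in practice both tuning parameters would be chosen by cross-validation or a similar criterion.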