
Network-based regularization for high dimensional SNP data in the case–control study of Type 2 diabetes

BACKGROUND: Over the past decades, the prevalence of type 2 diabetes mellitus (T2D) has been steadily increasing around the world. Despite considerable effort devoted to better understanding the genetic basis of the disease, the identified susceptibility loci can only account for a small portion of the T2D h...


Bibliographic Details
Main authors: Ren, Jie, He, Tao, Li, Ye, Liu, Sai, Du, Yinhao, Jiang, Yu, Wu, Cen
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5434559/
https://www.ncbi.nlm.nih.gov/pubmed/28511641
http://dx.doi.org/10.1186/s12863-017-0495-5
_version_ 1783237070269972480
author Ren, Jie
He, Tao
Li, Ye
Liu, Sai
Du, Yinhao
Jiang, Yu
Wu, Cen
author_facet Ren, Jie
He, Tao
Li, Ye
Liu, Sai
Du, Yinhao
Jiang, Yu
Wu, Cen
author_sort Ren, Jie
collection PubMed
description BACKGROUND: Over the past decades, the prevalence of type 2 diabetes mellitus (T2D) has been steadily increasing around the world. Despite considerable effort devoted to better understanding the genetic basis of the disease, the identified susceptibility loci can only account for a small portion of the T2D heritability. Some of the existing approaches proposed for the high dimensional genetic data from the T2D case–control study are limited by analyzing only a small number of SNPs at a time from a large pool of SNPs, by ignoring the correlations among SNPs, and by adopting inefficient selection techniques. METHODS: We propose a network-constrained regularization method to select important SNPs by taking the linkage disequilibrium into account. To accommodate the case–control study, an iteratively reweighted least squares algorithm has been developed within the coordinate descent framework, where optimization of the regularized logistic loss function is performed with respect to one parameter at a time, cycling iteratively through all the parameters until convergence. RESULTS: In this article, a novel approach is developed to identify important SNPs more effectively by incorporating the interconnections among them in the regularized selection. A coordinate descent based iteratively reweighted least squares (IRLS) algorithm has been proposed. CONCLUSIONS: Both the simulation study and the analysis of the Nurses’ Health Study, a case–control study of type 2 diabetes data with high dimensional SNP measurements, demonstrate the advantage of the network-based approach over the competing alternatives.
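The METHODS summary above can be illustrated with a minimal sketch. The abstract does not give the exact penalty, so everything specific here is an assumption for illustration: an L1 penalty combined with a graph-Laplacian quadratic penalty, a network built by thresholding absolute pairwise SNP correlations as a stand-in for linkage disequilibrium, and the hypothetical names `ld_laplacian`, `network_logistic`, `lam1`, `lam2`, and `cutoff`. It is not the authors' implementation. The outer loop forms the IRLS working response and weights for the logistic loss; the inner loop updates one coefficient at a time by coordinate descent.

```python
import numpy as np


def soft_threshold(x, t):
    """Soft-thresholding operator used by the L1 coordinate update."""
    return np.sign(x) * max(abs(x) - t, 0.0)


def ld_laplacian(X, cutoff=0.5):
    """Graph Laplacian from thresholded |correlation| (assumed LD proxy)."""
    A = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(A, 0.0)
    A[A < cutoff] = 0.0          # keep only strongly correlated SNP pairs
    return np.diag(A.sum(axis=1)) - A


def network_logistic(X, y, L, lam1, lam2, n_irls=25, n_cd=50, tol=1e-6):
    """Penalized logistic regression via IRLS + coordinate descent.

    Minimizes (assumed objective):
        logistic loss / n + lam1 * ||beta||_1 + (lam2 / 2) * beta' L beta
    Columns of X are assumed standardized and non-constant.
    """
    n, p = X.shape
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_irls):
        beta_old = beta.copy()
        # IRLS step: quadratic approximation of the logistic loss.
        eta = b0 + X @ beta
        prob = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(prob * (1.0 - prob), 1e-5, None)   # working weights
        z = eta + (y - prob) / w                       # working response
        for _ in range(n_cd):
            max_change = 0.0
            r = z - b0 - X @ beta                      # current residual
            delta = np.sum(w * r) / np.sum(w)          # unpenalized intercept
            b0 += delta
            r -= delta
            for j in range(p):
                xj = X[:, j]
                wxx = np.mean(w * xj * xj)
                # Contribution of network neighbours: sum_{k != j} L_jk beta_k
                neigh = L[j] @ beta - L[j, j] * beta[j]
                c = np.mean(w * xj * r) + wxx * beta[j] - lam2 * neigh
                new = soft_threshold(c, lam1) / (wxx + lam2 * L[j, j])
                if new != beta[j]:
                    r -= xj * (new - beta[j])
                    max_change = max(max_change, abs(new - beta[j]))
                    beta[j] = new
            if max_change < tol:
                break
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return b0, beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 400, 10
    latent = rng.normal(size=(n, 1))
    X = rng.normal(size=(n, p))
    X[:, :3] += 1.5 * latent                 # first three columns correlated
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    logit = 1.5 * X[:, 0] + 1.5 * X[:, 1]
    y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
    L = ld_laplacian(X, cutoff=0.5)
    b0, beta = network_logistic(X, y, L, lam1=0.02, lam2=0.05)
    print("selected columns:", np.flatnonzero(beta))
```

In this sketch `lam1` controls sparsity and `lam2` controls how strongly coefficients of connected SNPs are pulled toward each other; both would normally be chosen by cross-validation.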
format Online
Article
Text
id pubmed-5434559
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-54345592017-05-18 Network-based regularization for high dimensional SNP data in the case–control study of Type 2 diabetes Ren, Jie He, Tao Li, Ye Liu, Sai Du, Yinhao Jiang, Yu Wu, Cen BMC Genet Methodology Article BACKGROUND: Over the past decades, the prevalence of type 2 diabetes mellitus (T2D) has been steadily increasing around the world. Despite considerable effort devoted to better understanding the genetic basis of the disease, the identified susceptibility loci can only account for a small portion of the T2D heritability. Some of the existing approaches proposed for the high dimensional genetic data from the T2D case–control study are limited by analyzing only a small number of SNPs at a time from a large pool of SNPs, by ignoring the correlations among SNPs, and by adopting inefficient selection techniques. METHODS: We propose a network-constrained regularization method to select important SNPs by taking the linkage disequilibrium into account. To accommodate the case–control study, an iteratively reweighted least squares algorithm has been developed within the coordinate descent framework, where optimization of the regularized logistic loss function is performed with respect to one parameter at a time, cycling iteratively through all the parameters until convergence. RESULTS: In this article, a novel approach is developed to identify important SNPs more effectively by incorporating the interconnections among them in the regularized selection. A coordinate descent based iteratively reweighted least squares (IRLS) algorithm has been proposed. CONCLUSIONS: Both the simulation study and the analysis of the Nurses’ Health Study, a case–control study of type 2 diabetes data with high dimensional SNP measurements, demonstrate the advantage of the network-based approach over the competing alternatives. BioMed Central 2017-05-16 /pmc/articles/PMC5434559/ /pubmed/28511641 http://dx.doi.org/10.1186/s12863-017-0495-5 Text en © The Author(s). 2017
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Ren, Jie
He, Tao
Li, Ye
Liu, Sai
Du, Yinhao
Jiang, Yu
Wu, Cen
Network-based regularization for high dimensional SNP data in the case–control study of Type 2 diabetes
title Network-based regularization for high dimensional SNP data in the case–control study of Type 2 diabetes
title_sort network-based regularization for high dimensional snp data in the case–control study of type 2 diabetes
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5434559/
https://www.ncbi.nlm.nih.gov/pubmed/28511641
http://dx.doi.org/10.1186/s12863-017-0495-5