Integrative analysis of multiple diverse omics datasets by sparse group multitask regression

Bibliographic Details
Main authors: Lin, Dongdong, Zhang, Jigang, Li, Jingyao, He, Hao, Deng, Hong-Wen, Wang, Yu-Ping
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2014
Subjects: Physiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4209817/
https://www.ncbi.nlm.nih.gov/pubmed/25364766
http://dx.doi.org/10.3389/fcell.2014.00062
author Lin, Dongdong
Zhang, Jigang
Li, Jingyao
He, Hao
Deng, Hong-Wen
Wang, Yu-Ping
collection PubMed
description A variety of high-throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have had a remarkable impact on identifying susceptibility biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining individual studies of different genetic levels/platforms promises to improve the power and consistency of biomarker identification. In this paper, we propose a novel integrative method, namely sparse group multitask regression, for integrating diverse omics datasets, platforms, and populations to identify risk genes/factors of complex diseases. This method combines multitask learning with sparse group regularization, which will: (1) treat the biomarker identification in each single study as a task and then combine the tasks by multitask learning; (2) group variables from all studies for identifying significant genes; (3) enforce a sparsity constraint on groups of variables to overcome the “small sample, but large variables” problem. We introduce two sparse group penalties, sparse group lasso and sparse group ridge, in our multitask model, and provide an effective algorithm for each model. In addition, we propose a significance test for the identification of potential risk genes. Two simulation studies are performed to evaluate the performance of our integrative method by comparing it with a conventional meta-analysis method. The results show that our sparse group multitask method significantly outperforms the meta-analysis method. In an application to our osteoporosis studies, seven genes are identified as significant by our method and are found to have significant effects in three other independent studies used for validation. The most significant gene, SOD2, was identified in our previous osteoporosis study involving the same expression dataset. Several other genes, such as TREML2, HTR1E, and GLO1, are shown to be novel susceptibility genes for osteoporosis, as confirmed by other studies.
format Online
Article
Text
id pubmed-4209817
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-4209817 2014-10-31 Integrative analysis of multiple diverse omics datasets by sparse group multitask regression Lin, Dongdong Zhang, Jigang Li, Jingyao He, Hao Deng, Hong-Wen Wang, Yu-Ping Front Cell Dev Biol Physiology Frontiers Media S.A. 2014-10-27 /pmc/articles/PMC4209817/ /pubmed/25364766 http://dx.doi.org/10.3389/fcell.2014.00062 Text en Copyright © 2014 Lin, Zhang, Li, He, Deng and Wang. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Integrative analysis of multiple diverse omics datasets by sparse group multitask regression
topic Physiology