Integration Analysis of Three Omics Data Using Penalized Regression Methods: An Application to Bladder Cancer
Omics data integration is becoming necessary to investigate the genomic mechanisms involved in complex diseases. During the integration process, many challenges arise such as data heterogeneity, the smaller number of individuals in comparison to the number of parameters, multicollinearity, and inter...
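Main authors: | Pineda, Silvia; Real, Francisco X.; Kogevinas, Manolis; Carrato, Alfredo; Chanock, Stephen J.; Malats, Núria; Van Steen, Kristel |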
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4672920/ https://www.ncbi.nlm.nih.gov/pubmed/26646822 http://dx.doi.org/10.1371/journal.pgen.1005689 |
_version_ | 1782404645643091968 |
---|---|
author | Pineda, Silvia; Real, Francisco X.; Kogevinas, Manolis; Carrato, Alfredo; Chanock, Stephen J.; Malats, Núria; Van Steen, Kristel |
author_sort | Pineda, Silvia |
collection | PubMed |
description | Omics data integration is becoming necessary to investigate the genomic mechanisms involved in complex diseases. During the integration process, many challenges arise, such as data heterogeneity, the small number of individuals relative to the number of parameters, multicollinearity, and the interpretation and validation of results, which are hampered by their complexity and by limited knowledge of the underlying biological processes. To overcome some of these issues, innovative statistical approaches are being developed. In this work, we propose a permutation-based method to concomitantly assess significance and correct for multiple testing with the MaxT algorithm. This was applied with penalized regression methods (LASSO and ENET) when exploring relationships between common genetic variants, DNA methylation and gene expression measured in bladder tumor samples. The overall analysis flow consisted of three steps: (1) SNPs/CpGs were selected for each gene probe within a 1 Mb window upstream and downstream of the gene; (2) LASSO and ENET were applied to assess the association between each expression probe and the selected SNPs/CpGs in three multivariable models (SNP, CpG, and Global models, the latter integrating SNPs and CpGs); and (3) the significance of each model was assessed using the permutation-based MaxT method. We identified 48 genes whose expression levels were significantly associated with both SNPs and CpGs. Importantly, 36 (75%) of them were replicated in an independent data set (TCGA), and the performance of the proposed method was checked with a simulation study. We further support our results with a biological interpretation based on an enrichment analysis. The approach we propose reduces computational time and is flexible and easy to implement when analyzing several types of omics data. Our results highlight the importance of integrating omics data by applying appropriate statistical strategies to discover new insights into the complex genetic mechanisms involved in disease conditions. |
format | Online Article Text |
id | pubmed-4672920 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4672920 2015-12-16 Integration Analysis of Three Omics Data Using Penalized Regression Methods: An Application to Bladder Cancer Pineda, Silvia; Real, Francisco X.; Kogevinas, Manolis; Carrato, Alfredo; Chanock, Stephen J.; Malats, Núria; Van Steen, Kristel PLoS Genet Research Article Public Library of Science 2015-12-08 /pmc/articles/PMC4672920/ /pubmed/26646822 http://dx.doi.org/10.1371/journal.pgen.1005689 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. |
title | Integration Analysis of Three Omics Data Using Penalized Regression Methods: An Application to Bladder Cancer |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4672920/ https://www.ncbi.nlm.nih.gov/pubmed/26646822 http://dx.doi.org/10.1371/journal.pgen.1005689 |
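The three-step flow summarized in the description (window selection, penalized regression, permutation-based MaxT) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. Several things are assumptions made for brevity: all function and variable names are hypothetical, the data are simulated, only LASSO is shown (ENET would add a ridge term to the same update), the penalty `lam` is fixed rather than tuned, and the per-model test statistic is taken to be the fraction of variance explained, which the record does not specify.

```python
import random

def window_select(gene_start, gene_end, positions, window=1_000_000):
    """Step 1: indices of markers within `window` bp of the gene on either side."""
    lo, hi = gene_start - window, gene_end + window
    return [i for i, pos in enumerate(positions) if lo <= pos <= hi]

def standardize(values):
    """Center to mean 0 and scale to unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]

def soft_threshold(z, lam):
    return z - lam if z > lam else z + lam if z < -lam else 0.0

def lasso_r2(columns, y, lam, max_iter=50, tol=1e-8):
    """Step 2: coordinate-descent LASSO; returns the fraction of variance explained.
    The choice of this statistic is an assumption of the sketch."""
    n = len(y)
    cols = [standardize(c) for c in columns]
    resid = standardize(y)
    tss = sum(r * r for r in resid)
    if tss == 0:
        return 0.0
    beta = [0.0] * len(cols)
    for _ in range(max_iter):
        biggest_change = 0.0
        for j, col in enumerate(cols):
            # Non-constant standardized columns satisfy x_j'x_j / n = 1,
            # so each coordinate update is a single soft-thresholding step.
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            delta = soft_threshold(rho, lam) - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] += delta
                biggest_change = max(biggest_change, abs(delta))
        if biggest_change < tol:
            break
    return 1.0 - sum(r * r for r in resid) / tss

def maxt_pvalues(models, lam=0.1, n_perm=99, seed=0):
    """Step 3: single-step maxT adjusted p-values, one per gene model."""
    rng = random.Random(seed)
    observed = {g: lasso_r2(cols, y, lam) for g, (cols, y) in models.items()}
    n = len(next(iter(models.values()))[1])
    exceed = dict.fromkeys(models, 0)
    for _ in range(n_perm):
        perm = rng.sample(range(n), n)  # one shuffle of samples shared by all genes
        null_max = max(lasso_r2(cols, [y[i] for i in perm], lam)
                       for cols, y in models.values())
        for g in models:
            exceed[g] += null_max >= observed[g]
    return {g: (1 + exceed[g]) / (n_perm + 1) for g in models}

# Simulated toy data (hypothetical): 60 samples, 4 SNP-like markers.
rng = random.Random(1)
n = 60
positions = [4_200_000, 5_020_000, 6_040_000, 7_500_000]
genotypes = [[rng.choice([0, 1, 2]) for _ in range(n)] for _ in positions]
nearby = window_select(5_000_000, 5_050_000, positions)
cis = [genotypes[i] for i in nearby]
expr_a = [2.0 * g + rng.gauss(0, 0.5) for g in genotypes[1]]  # driven by marker 1
expr_b = [rng.gauss(0, 1) for _ in range(n)]                  # pure noise
pvals = maxt_pvalues({"gene_a": (cis, expr_a), "gene_b": (cis, expr_b)})
print(pvals)
```

Comparing each observed statistic with the permutation distribution of the maximum statistic across all models is what lets a single set of permutations yield p-values that are already adjusted for multiple testing. For brevity the toy example gives both genes the same marker window; in a real analysis each gene would have its own window and marker set.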