
IPF-LASSO: Integrative L1-Penalized Regression with Penalty Factors for Prediction Based on Multi-Omics Data

As modern biotechnologies advance, it has become increasingly frequent that different modalities of high-dimensional molecular data (termed “omics” data in this paper), such as gene expression, methylation, and copy number, are collected from the same patient cohort to predict the clinical outcome....


Bibliographic Details
Main Authors: Boulesteix, Anne-Laure, De Bin, Riccardo, Jiang, Xiaoyu, Fuchs, Mathias
Format: Online Article Text
Language: English
Published: Hindawi 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5435977/
https://www.ncbi.nlm.nih.gov/pubmed/28546826
http://dx.doi.org/10.1155/2017/7691937
_version_ 1783237319866712064
author Boulesteix, Anne-Laure
De Bin, Riccardo
Jiang, Xiaoyu
Fuchs, Mathias
collection PubMed
description As modern biotechnologies advance, it has become increasingly frequent that different modalities of high-dimensional molecular data (termed “omics” data in this paper), such as gene expression, methylation, and copy number, are collected from the same patient cohort to predict the clinical outcome. While prediction based on omics data has been widely studied in the last fifteen years, little has been done in the statistical literature on the integration of multiple omics modalities to select a subset of variables for prediction, which is a critical task in personalized medicine. In this paper, we propose a simple penalized regression method to address this problem by assigning different penalty factors to different data modalities for feature selection and prediction. The penalty factors can be chosen in a fully data-driven fashion by cross-validation or by taking practical considerations into account. In simulation studies, we compare the prediction performance of our approach, called IPF-LASSO (Integrative LASSO with Penalty Factors) and implemented in the R package ipflasso, with the standard LASSO and sparse group LASSO. The use of IPF-LASSO is also illustrated through applications to two real-life cancer datasets. All data and codes are available on the companion website to ensure reproducibility.
format Online
Article
Text
id pubmed-5435977
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-54359772017-05-25 IPF-LASSO: Integrative L (1)-Penalized Regression with Penalty Factors for Prediction Based on Multi-Omics Data Boulesteix, Anne-Laure De Bin, Riccardo Jiang, Xiaoyu Fuchs, Mathias Comput Math Methods Med Research Article As modern biotechnologies advance, it has become increasingly frequent that different modalities of high-dimensional molecular data (termed “omics” data in this paper), such as gene expression, methylation, and copy number, are collected from the same patient cohort to predict the clinical outcome. While prediction based on omics data has been widely studied in the last fifteen years, little has been done in the statistical literature on the integration of multiple omics modalities to select a subset of variables for prediction, which is a critical task in personalized medicine. In this paper, we propose a simple penalized regression method to address this problem by assigning different penalty factors to different data modalities for feature selection and prediction. The penalty factors can be chosen in a fully data-driven fashion by cross-validation or by taking practical considerations into account. In simulation studies, we compare the prediction performance of our approach, called IPF-LASSO (Integrative LASSO with Penalty Factors) and implemented in the R package ipflasso, with the standard LASSO and sparse group LASSO. The use of IPF-LASSO is also illustrated through applications to two real-life cancer datasets. All data and codes are available on the companion website to ensure reproducibility. Hindawi 2017 2017-05-04 /pmc/articles/PMC5435977/ /pubmed/28546826 http://dx.doi.org/10.1155/2017/7691937 Text en Copyright © 2017 Anne-Laure Boulesteix et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title IPF-LASSO: Integrative L1-Penalized Regression with Penalty Factors for Prediction Based on Multi-Omics Data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5435977/
https://www.ncbi.nlm.nih.gov/pubmed/28546826
http://dx.doi.org/10.1155/2017/7691937
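The description in this record says the method assigns different penalty factors to different data modalities inside an L1-penalized regression. The following is a minimal sketch of that idea for a linear outcome, written from scratch in plain Python. It is not the ipflasso R package and makes no claim to match its implementation. The function names (`soft_threshold`, `weighted_lasso`), the toy data, the base penalty `lam`, and the penalty factors are all assumptions made for illustration. The sketch minimizes (1/2n)·||y − Xβ||² + Σ_j penalties[j]·|β_j| by coordinate descent, where every variable in a modality shares that modality's penalty.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in L1-penalized coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, penalties, n_sweeps=200):
    """Coordinate descent for (1/2n)*||y - X b||^2 + sum_j penalties[j]*|b_j|.

    Hypothetical helper: no intercept is fitted, so the columns of X and
    the response y are assumed to be (roughly) centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # current residual y - X beta (beta starts at zero)
    col_ms = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ms[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (x_j's own fit added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ms[j] * beta[j]
            new_bj = soft_threshold(rho, penalties[j]) / col_ms[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Toy data (assumed): two "modalities" of 3 variables each; only the first
# variable of modality 1 drives the outcome.
rng = random.Random(0)
n = 60
block_sizes = [3, 3]
X = [[rng.gauss(0.0, 1.0) for _ in range(sum(block_sizes))] for _ in range(n)]
y = [2.0 * row[0] + rng.gauss(0.0, 0.1) for row in X]

# One penalty per modality: base penalty times a modality-specific factor.
lam = 0.05
penalty_factors = [1.0, 1000.0]  # modality 2 is penalized far more heavily
penalties = [lam * pf
             for pf, size in zip(penalty_factors, block_sizes)
             for _ in range(size)]

beta = weighted_lasso(X, y, penalties)
print([round(b, 3) for b in beta])
```

In this toy setup the heavily penalized second modality is excluded from the model while the first is only mildly shrunk. The record's description states that the penalty factors can be chosen by cross-validation or from practical considerations; that selection step is not shown here.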