
Improving stability of prediction models based on correlated omics data by using network approaches

Building prediction models based on complex omics datasets such as transcriptomics, proteomics, and metabolomics remains a challenge in bioinformatics and biostatistics. Regularized regression techniques are typically used to deal with the high dimensionality of these datasets. However, due to the prese...


Bibliographic Details
Main authors:	Tissier, Renaud; Houwing-Duistermaat, Jeanine; Rodríguez-Girondo, Mar
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2018
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5819809/ https://www.ncbi.nlm.nih.gov/pubmed/29462177 http://dx.doi.org/10.1371/journal.pone.0192853

author	Tissier, Renaud; Houwing-Duistermaat, Jeanine; Rodríguez-Girondo, Mar
collection	PubMed
description	Building prediction models based on complex omics datasets such as transcriptomics, proteomics, and metabolomics remains a challenge in bioinformatics and biostatistics. Regularized regression techniques are typically used to deal with the high dimensionality of these datasets. However, due to the presence of correlation in the datasets, it is difficult to select the best model, and application of these methods yields unstable results. We propose a novel strategy for model selection where the obtained models also perform well in terms of overall predictability. Several three-step approaches are considered, where the steps are 1) network construction, 2) clustering to empirically derive modules or pathways, and 3) building a prediction model incorporating the information on the modules. For the first step, we use weighted correlation networks and Gaussian graphical modelling. Identification of groups of features is performed by hierarchical clustering. The grouping information is included in the prediction model by using group-based variable selection or group-specific penalization. We compare the performance of our new approaches with standard regularized regression via simulations. Based on these results, we provide recommendations for selecting a strategy for building a prediction model given the specific goal of the analysis and the sizes of the datasets. Finally, we illustrate the advantages of our approach by applying the methodology to two problems: prediction of body mass index in the DIetary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome study (DILGOM), and prediction of the response of each breast cancer cell line to treatment with specific drugs using a breast cancer cell line pharmacogenomics dataset.
format	Online Article Text
id	pubmed-5819809
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5819809 2018-03-15 Improving stability of prediction models based on correlated omics data by using network approaches. Tissier, Renaud; Houwing-Duistermaat, Jeanine; Rodríguez-Girondo, Mar. PLoS One. Research Article.
Public Library of Science 2018-02-20 /pmc/articles/PMC5819809/ /pubmed/29462177 http://dx.doi.org/10.1371/journal.pone.0192853 Text en © 2018 Tissier et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Improving stability of prediction models based on correlated omics data by using network approaches
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5819809/ https://www.ncbi.nlm.nih.gov/pubmed/29462177 http://dx.doi.org/10.1371/journal.pone.0192853
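
The three steps named in the description (network construction, clustering into modules, and a prediction model that uses the modules) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the article: the function names, the soft-threshold power `beta=6`, the choice of average linkage, the fixed number of modules, the hand-picked penalty values, and the simulated data. Step 3 uses a ridge regression with one penalty per module as a simple stand-in for group-specific penalization; it does not reproduce the authors' implementation.

```python
import numpy as np


def correlation_network(X, beta=6):
    """Step 1: weighted correlation network, adjacency = |cor|^beta.

    beta is an assumed soft-threshold power, not a value from the article.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.abs(corr) ** beta


def average_linkage_modules(adjacency, n_modules):
    """Step 2: agglomerative average-linkage clustering on 1 - adjacency."""
    dist = 1.0 - adjacency
    clusters = [[j] for j in range(adjacency.shape[0])]
    while len(clusters) > n_modules:
        best = (np.inf, 0, 1)
        # Find the pair of clusters with the smallest mean dissimilarity.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(adjacency.shape[0], dtype=int)
    for g, members in enumerate(clusters):
        labels[members] = g
    return labels


def group_ridge(X, y, labels, penalties):
    """Step 3: ridge regression with one penalty per module."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Expand the per-module penalties to one penalty per feature.
    lam = np.asarray(penalties, dtype=float)[labels]
    coef = np.linalg.solve(Xc.T @ Xc + np.diag(lam), Xc.T @ yc)
    intercept = y.mean() - X.mean(axis=0) @ coef
    return intercept, coef


# Simulated data (hypothetical): two modules of three correlated features;
# only the first module is related to the outcome.
rng = np.random.default_rng(0)
n = 200
z1, z2 = rng.normal(size=(2, n))
X = np.column_stack(
    [z1 + 0.3 * rng.normal(size=n) for _ in range(3)]
    + [z2 + 0.3 * rng.normal(size=n) for _ in range(3)]
)
y = X[:, :3].sum(axis=1) + rng.normal(size=n)

adjacency = correlation_network(X)
labels = average_linkage_modules(adjacency, n_modules=2)
# Penalty values are arbitrary here; in practice they would be tuned,
# for example by cross-validation.
intercept, coef = group_ridge(X, y, labels, penalties=[1.0, 50.0])
print(labels, np.round(coef, 2))
```

The clustering is written out by hand only to keep the sketch dependent on NumPy alone; on real omics data a library routine for hierarchical clustering would be the more practical choice, since the loop above scales poorly with the number of features.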
