An analytic approach for interpretable predictive models in high‐dimensional data in the presence of interactions with exposures

Bibliographic Details
Main authors: Bhatnagar, Sahir Rai; Yang, Yi; Khundrakpam, Budhachandra; Evans, Alan C.; Blanchette, Mathieu; Bouchard, Luigi; Greenwood, Celia M.T.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2018
Subjects: Research Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6175336/
https://www.ncbi.nlm.nih.gov/pubmed/29423954
http://dx.doi.org/10.1002/gepi.22112
author Bhatnagar, Sahir Rai
Yang, Yi
Khundrakpam, Budhachandra
Evans, Alan C.
Blanchette, Mathieu
Bouchard, Luigi
Greenwood, Celia M.T.
collection PubMed
description Predicting a phenotype and understanding which variables improve that prediction are two very challenging and overlapping problems in the analysis of high‐dimensional (HD) data such as those arising from genomic and brain imaging studies. It is often believed that the number of truly important predictors is small relative to the total number of variables, making computational approaches to variable selection and dimension reduction extremely important. To reduce dimensionality, commonly used two‐step methods first cluster the data in some way, and build models using cluster summaries to predict the phenotype. It is known that important exposure variables can alter correlation patterns between clusters of HD variables, that is, alter network properties of the variables. However, it is not well understood whether such altered clustering is informative in prediction. Here, assuming there is a binary exposure with such network‐altering effects, we explore whether the use of exposure‐dependent clustering relationships in dimension reduction can improve predictive modeling in a two‐step framework. Hence, we propose a modeling framework called ECLUST to test this hypothesis, and evaluate its performance through extensive simulations. With ECLUST, we found improved prediction and variable selection performance compared to methods that do not consider the environment in the clustering step, or to methods that use the original data as features. We further illustrate this modeling framework through the analysis of three data sets from very different fields, each with HD data, a binary exposure, and a phenotype of interest. Our method is available in the eclust CRAN package.
format Online
Article
Text
id pubmed-6175336
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-6175336 2018-10-19 Genet Epidemiol Research Articles John Wiley and Sons Inc. 2018-02-08 2018-04 /pmc/articles/PMC6175336/ /pubmed/29423954 http://dx.doi.org/10.1002/gepi.22112 Text en © 2018 The Authors. Genetic Epidemiology published by Wiley Periodicals, Inc. This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
title An analytic approach for interpretable predictive models in high‐dimensional data in the presence of interactions with exposures
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6175336/
https://www.ncbi.nlm.nih.gov/pubmed/29423954
http://dx.doi.org/10.1002/gepi.22112
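
The two-step idea summarized in the description field (cluster the HD variables, then predict from cluster summaries, with clusters allowed to depend on a binary exposure) can be illustrated with a rough Python sketch. This is not the API of the eclust CRAN package. The function names, the toy data, the use of Euclidean distance between rows of the similarity matrix, average linkage, cluster means as summaries, and ordinary least squares in the second step are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_features(similarity, n_clusters):
    """Group variables by hierarchical clustering of the rows of a similarity matrix.

    Assumption: Euclidean distance between rows and average linkage.
    """
    tree = linkage(pdist(similarity, metric="euclidean"), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def cluster_means(X, labels):
    """Summarize each cluster of columns by its per-sample mean."""
    return np.column_stack(
        [X[:, labels == k].mean(axis=1) for k in np.unique(labels)]
    )


def exposure_aware_features(X, e, n_clusters=3):
    """Step 1: build cluster summaries with and without using the exposure."""
    # Clustering that ignores the exposure: correlation over all samples.
    corr_all = np.corrcoef(X, rowvar=False)
    # Exposure-dependent clustering: absolute difference between the
    # correlation matrices of exposed (e == 1) and unexposed (e == 0) samples.
    corr_diff = np.abs(
        np.corrcoef(X[e == 1], rowvar=False) - np.corrcoef(X[e == 0], rowvar=False)
    )
    summaries = np.column_stack([
        cluster_means(X, cluster_features(corr_all, n_clusters)),
        cluster_means(X, cluster_features(corr_diff, n_clusters)),
    ])
    # Main effects, the exposure itself, and summary-by-exposure interactions.
    return np.column_stack([summaries, e, summaries * e[:, None]])


# Hypothetical toy data: the first four variables share a latent factor
# only in exposed samples, so their correlation depends on the exposure.
rng = np.random.default_rng(0)
n, p = 200, 12
e = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
latent = rng.normal(size=n)
X[:, :4] += (e * latent)[:, None]
y = X[:, :4].mean(axis=1) * e + rng.normal(scale=0.5, size=n)

# Step 2: fit a model on the cluster summaries. Least squares is a
# stand-in here for whichever regression method one would actually use.
features = exposure_aware_features(X, e)
design = np.column_stack([np.ones(n), features])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
```

In practice the number of clusters and the second-step model would need to be chosen on training data only, so that test-set prediction error is not optimistic.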