Data integration with high dimensionality

We consider situations where the data consist of a number of responses for each individual, which may include a mix of discrete and continuous variables. The data also include a class of predictors, where the same predictor may have different physical measurements across different experiments depend...

Bibliographic Details
Main Authors: Gao, Xin, Carroll, Raymond J.
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5532816/
https://www.ncbi.nlm.nih.gov/pubmed/28757650
http://dx.doi.org/10.1093/biomet/asx023
author Gao, Xin
Carroll, Raymond J.
collection PubMed
description We consider situations where the data consist of a number of responses for each individual, which may include a mix of discrete and continuous variables. The data also include a class of predictors, where the same predictor may have different physical measurements across different experiments depending on how the predictor is measured. The goal is to select which predictors affect any of the responses, where the number of such informative predictors tends to infinity as the sample size increases. There are marginal likelihoods for each experiment; we specify a pseudolikelihood combining the marginal likelihoods, and propose a pseudolikelihood information criterion. Under regularity conditions, we establish selection consistency for this criterion with unbounded true model size. The proposed method includes a Bayesian information criterion with appropriate penalty term as a special case. Simulations indicate that data integration can dramatically improve upon using only one data source.
format Online
Article
Text
id pubmed-5532816
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5532816 2017-07-28 Data integration with high dimensionality Gao, Xin Carroll, Raymond J. Biometrika Articles Oxford University Press 2017-06 2017-05-09 /pmc/articles/PMC5532816/ /pubmed/28757650 http://dx.doi.org/10.1093/biomet/asx023 Text en © 2017 Biometrika Trust
title Data integration with high dimensionality
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5532816/
https://www.ncbi.nlm.nih.gov/pubmed/28757650
http://dx.doi.org/10.1093/biomet/asx023