An Information Criterion for Auxiliary Variable Selection in Incomplete Data Analysis

Statistical inference is considered for variables of interest, called primary variables, when auxiliary variables are observed along with the primary variables. We consider the setting of incomplete data analysis, where some primary variables are not observed. Utilizing a parametric model of joint d...

Bibliographic Details
Main authors: Imori, Shinpei, Shimodaira, Hidetoshi
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514761/
https://www.ncbi.nlm.nih.gov/pubmed/33266996
http://dx.doi.org/10.3390/e21030281
author Imori, Shinpei
Shimodaira, Hidetoshi
collection PubMed
description Statistical inference is considered for variables of interest, called primary variables, when auxiliary variables are observed along with the primary variables. We consider the setting of incomplete data analysis, where some primary variables are not observed. Utilizing a parametric model of joint distribution of primary and auxiliary variables, it is possible to improve the estimation of parametric model for the primary variables when the auxiliary variables are closely related to the primary variables. However, the estimation accuracy reduces when the auxiliary variables are irrelevant to the primary variables. For selecting useful auxiliary variables, we formulate the problem as model selection, and propose an information criterion for predicting primary variables by leveraging auxiliary variables. The proposed information criterion is an asymptotically unbiased estimator of the Kullback–Leibler divergence for complete data of primary variables under some reasonable conditions. We also clarify an asymptotic equivalence between the proposed information criterion and a variant of leave-one-out cross validation. Performance of our method is demonstrated via a simulation study and a real data example.
format Online
Article
Text
id pubmed-7514761
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7514761 2020-11-09 Entropy (Basel) Article MDPI 2019-03-14 /pmc/articles/PMC7514761/ /pubmed/33266996 http://dx.doi.org/10.3390/e21030281 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title An Information Criterion for Auxiliary Variable Selection in Incomplete Data Analysis
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514761/
https://www.ncbi.nlm.nih.gov/pubmed/33266996
http://dx.doi.org/10.3390/e21030281