
Estimating missing values in China’s official socioeconomic statistics using progressive spatiotemporal Bayesian hierarchical modeling

Bibliographic Details
Main authors: Song, Chao; Yang, Xiu; Shi, Xun; Bo, Yanchen; Wang, Jinfeng
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6030081/
https://www.ncbi.nlm.nih.gov/pubmed/29968777
http://dx.doi.org/10.1038/s41598-018-28322-z
author Song, Chao
Yang, Xiu
Shi, Xun
Bo, Yanchen
Wang, Jinfeng
collection PubMed
description Because of the large number of missing values, both spatial and temporal, China has not published a complete official socioeconomic statistics dataset at the county level, which is the basic scale at which the country collects official statistics. We developed a procedure to impute the missing values within the Bayesian hierarchical modeling framework. The procedure incorporates two novelties. First, for the easier-to-impute variables with small missing percentages, it accounts for spatial autocorrelation and temporal trends. Second, it uses the variables completed in the first step as covariates to improve the modeling of the more-difficult-to-impute variables with large missing percentages. We applied this progressive spatiotemporal (PST) method to China’s official socioeconomic statistics for 2002–2011 and compared it with four other widely used imputation methods: k-nearest neighbors (kNN), expectation maximization (EM), singular value decomposition (SVD) and random forest (RF). The results show that the PST method outperforms these methods, demonstrating the benefit of incorporating additional spatial and temporal information and of progressively using covariate information. The outcome of this study allows China to construct a complete socioeconomic dataset, and the methodology can be generally useful for estimating missing values in large spatiotemporal datasets.
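The two-step ("progressive") idea in the description can be illustrated with a deliberately simplified sketch. It is not the authors' PST method: instead of fitting Bayesian hierarchical models, it swaps in a deterministic stand-in that (1) fills a variable with few gaps by blending the mean of its spatial neighbours in the same year with the county's own least-squares linear time trend, and (2) fills a variable with many gaps by ordinary least squares on the variable completed in step 1. The function names `spatiotemporal_fill` and `covariate_fill`, the blend weight `w_space`, the neighbour lists and the data are all hypothetical, chosen only for illustration.

```python
import numpy as np


def spatiotemporal_fill(y, neighbors, w_space=0.5):
    """Hypothetical step 1: fill NaNs in y (counties x years).

    Each missing cell is a weighted blend of (a) the mean of the county's
    neighbours observed in the same year and (b) the county's own
    least-squares linear time trend. A simple stand-in, not a Bayesian model.
    """
    n_counties, n_years = y.shape
    years = np.arange(n_years)
    filled = y.copy()
    for i in range(n_counties):
        obs = ~np.isnan(y[i])
        # Temporal component: straight line through the county's observed years
        if obs.sum() >= 2:
            slope, intercept = np.polyfit(years[obs], y[i, obs], 1)
            trend = slope * years + intercept
        elif obs.sum() == 1:
            trend = np.full(n_years, y[i, obs][0])
        else:
            trend = np.full(n_years, np.nan)
        for t in np.where(~obs)[0]:
            # Spatial component: mean of neighbours that are observed in year t
            vals = [y[j, t] for j in neighbors[i] if not np.isnan(y[j, t])]
            spatial = np.mean(vals) if vals else np.nan
            # Keep available components with positive weight, then renormalise
            parts = [(w, v) for w, v in ((w_space, spatial), (1.0 - w_space, trend[t]))
                     if w > 0 and not np.isnan(v)]
            if parts:
                total = sum(w for w, _ in parts)
                filled[i, t] = sum(w * v for w, v in parts) / total
    return filled


def covariate_fill(z, x):
    """Hypothetical step 2: fill NaNs in z by OLS regression on a covariate x
    that step 1 has already completed (same shape; x is assumed NaN-free)."""
    obs = ~np.isnan(z)
    design = np.column_stack([np.ones(obs.sum()), x[obs]])
    beta, *_ = np.linalg.lstsq(design, z[obs], rcond=None)
    filled = z.copy()
    filled[~obs] = beta[0] + beta[1] * x[~obs]
    return filled


if __name__ == "__main__":
    # Synthetic data: 3 counties in a chain (0-1-2), 4 years
    y = np.array([[1.0, 2.0, np.nan, 4.0],
                  [2.0, 3.0, 4.0, 5.0],
                  [3.0, np.nan, 5.0, 6.0]])
    x = spatiotemporal_fill(y, neighbors=[[1], [0, 2], [1]])
    z = 2.0 * x + 1.0
    z[0, 0] = z[1, 2] = z[2, 3] = np.nan  # a "harder" variable with more gaps
    print(np.round(x, 2))
    print(np.round(covariate_fill(z, x), 2))
```

The point of the ordering is that the harder variable is only modeled once its covariate has no gaps, which is what lets step 2 use every county-year. A Bayesian hierarchical version would typically replace the fixed blend and the OLS fit with jointly estimated spatial and temporal effects, and would yield a posterior distribution (and hence an uncertainty estimate) for each imputed value rather than a single point estimate.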
format Online
Article
Text
id pubmed-6030081
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-6030081 2018-07-11 Estimating missing values in China’s official socioeconomic statistics using progressive spatiotemporal Bayesian hierarchical modeling Song, Chao Yang, Xiu Shi, Xun Bo, Yanchen Wang, Jinfeng Sci Rep Article
Nature Publishing Group UK 2018-07-03 /pmc/articles/PMC6030081/ /pubmed/29968777 http://dx.doi.org/10.1038/s41598-018-28322-z Text en © The Author(s) 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Estimating missing values in China’s official socioeconomic statistics using progressive spatiotemporal Bayesian hierarchical modeling
topic Article