Estimating missing values in China’s official socioeconomic statistics using progressive spatiotemporal Bayesian hierarchical modeling
Due to a large number of missing values, both spatially and temporally, China has not published a complete official socioeconomic statistics dataset at the county level, which is the country’s basic scale of official statistics data collection. We developed a procedure to impute the missing values u...
Main authors: | Song, Chao; Yang, Xiu; Shi, Xun; Bo, Yanchen; Wang, Jinfeng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Nature Publishing Group UK
2018
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6030081/ https://www.ncbi.nlm.nih.gov/pubmed/29968777 http://dx.doi.org/10.1038/s41598-018-28322-z |
_version_ | 1783337073976016896 |
---|---|
author | Song, Chao Yang, Xiu Shi, Xun Bo, Yanchen Wang, Jinfeng |
author_sort | Song, Chao |
collection | PubMed |
description | Because of extensive missing values across both space and time, China has not published a complete official socioeconomic statistics dataset at the county level, the country’s basic unit of official statistical data collection. We developed a procedure to impute the missing values under a Bayesian hierarchical modeling framework. The procedure incorporates two novelties. First, it accounts for spatial autocorrelation and temporal trends when imputing the easier-to-impute variables, those with small missing percentages. Second, it uses the variables completed in the first step as covariates to improve the modeling of the harder-to-impute variables, those with large missing percentages. We applied this progressive spatiotemporal (PST) method to China’s official socioeconomic statistics for 2002–2011 and compared it with four widely used imputation methods: k-nearest neighbors (kNN), expectation maximization (EM), singular value decomposition (SVD) and random forest (RF). The results show that the PST method outperforms these methods, demonstrating the benefit of incorporating the additional spatial and temporal information and of progressively using the covariate information. The outcome of this study allows China to construct a complete socioeconomic dataset, and the methodology can be generally useful for estimating missing values in large spatiotemporal datasets. |
format | Online Article Text |
id | pubmed-6030081 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-6030081 2018-07-11 Estimating missing values in China’s official socioeconomic statistics using progressive spatiotemporal Bayesian hierarchical modeling Song, Chao Yang, Xiu Shi, Xun Bo, Yanchen Wang, Jinfeng Sci Rep Article
Nature Publishing Group UK 2018-07-03 /pmc/articles/PMC6030081/ /pubmed/29968777 http://dx.doi.org/10.1038/s41598-018-28322-z Text en © The Author(s) 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Song, Chao Yang, Xiu Shi, Xun Bo, Yanchen Wang, Jinfeng Estimating missing values in China’s official socioeconomic statistics using progressive spatiotemporal Bayesian hierarchical modeling |
title | Estimating missing values in China’s official socioeconomic statistics using progressive spatiotemporal Bayesian hierarchical modeling |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6030081/ https://www.ncbi.nlm.nih.gov/pubmed/29968777 http://dx.doi.org/10.1038/s41598-018-28322-z |
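
The two-step, progressive order described in the abstract can be illustrated with a deliberately simplified sketch. This is not the authors' Bayesian hierarchical model: linear interpolation over years and a neighbor average stand in for the temporal and spatial terms, and an ordinary least-squares fit stands in for the covariate model. The function names, the `neighbors` adjacency list, and the equal weighting of the temporal and spatial estimates are all assumptions made for illustration.

```python
import numpy as np


def impute_easy(y, neighbors):
    """Step 1 stand-in: fill a lightly missing variable.

    y         : (n_counties, n_years) array, np.nan marks missing cells.
    neighbors : list of lists; neighbors[i] holds the row indices of the
                counties adjacent to county i (hypothetical adjacency).
    Assumes every county has at least one observed year.
    """
    y = np.asarray(y, dtype=float)
    n, t = y.shape
    years = np.arange(t)

    # Temporal estimate: linear interpolation along each county's series.
    temporal = np.empty_like(y)
    for i in range(n):
        obs = ~np.isnan(y[i])
        temporal[i] = np.interp(years, years[obs], y[i, obs])

    # Spatial estimate: mean of the neighbors' temporal estimates.
    filled = y.copy()
    for i in range(n):
        if neighbors[i]:
            spatial = temporal[neighbors[i]].mean(axis=0)
        else:
            spatial = temporal[i]  # isolated county: fall back to time only
        gap = np.isnan(y[i])
        # Equal weighting is an arbitrary choice for this sketch.
        filled[i, gap] = 0.5 * (temporal[i, gap] + spatial[gap])
    return filled


def impute_hard(z, x_complete):
    """Step 2 stand-in: fill a heavily missing variable from a completed one.

    z          : (n_counties, n_years) array with many np.nan cells.
    x_complete : same shape, already completed in step 1.
    """
    z = np.asarray(z, dtype=float)
    obs = ~np.isnan(z)
    # Least-squares fit z ~ intercept + slope * x on the observed cells only.
    design = np.column_stack([np.ones(obs.sum()), x_complete[obs]])
    coef, *_ = np.linalg.lstsq(design, z[obs], rcond=None)
    filled = z.copy()
    filled[~obs] = coef[0] + coef[1] * x_complete[~obs]
    return filled
```

A caller would first run `impute_easy` on a variable with few gaps and then pass its result to `impute_hard` as the covariate for a variable with many gaps, which mirrors the progressive order in the abstract. The published method instead estimates both steps within a Bayesian hierarchical framework.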