Evaluating geographic imputation approaches for zip code level data: an application to a study of pediatric diabetes

BACKGROUND: There is increasing interest in the study of place effects on health, facilitated in part by geographic information systems. Incomplete or missing address information reduces geocoding success. Several geographic imputation methods have been suggested to overcome this limitation. Accurac...

Bibliographic Details
Main authors: Hibbert, James D; Liese, Angela D; Lawson, Andrew; Porter, Dwayne E; Puett, Robin C; Standiford, Debra; Liu, Lenna; Dabelea, Dana
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2763852/
https://www.ncbi.nlm.nih.gov/pubmed/19814809
http://dx.doi.org/10.1186/1476-072X-8-54
_version_ 1782173042457182208
author Hibbert, James D
Liese, Angela D
Lawson, Andrew
Porter, Dwayne E
Puett, Robin C
Standiford, Debra
Liu, Lenna
Dabelea, Dana
author_sort Hibbert, James D
collection PubMed
description BACKGROUND: There is increasing interest in the study of place effects on health, facilitated in part by geographic information systems. Incomplete or missing address information reduces geocoding success. Several geographic imputation methods have been suggested to overcome this limitation. Accuracy evaluation of these methods can be focused at the level of individuals and at higher group-levels (e.g., spatial distribution).
METHODS: We evaluated the accuracy of eight geo-imputation methods for address allocation from ZIP codes to census tracts at the individual and group level. The spatial apportioning approaches underlying the imputation methods included four fixed (deterministic) and four random (stochastic) allocation methods using land area, total population, population under age 20, and race/ethnicity as weighting factors. Data included more than 2,000 geocoded cases of diabetes mellitus among youth aged 0-19 in four U.S. regions. The imputed distribution of cases across tracts was compared to the true distribution using a chi-squared statistic.
RESULTS: At the individual level, population-weighted (total or under age 20) fixed allocation showed the greatest level of accuracy, with correct census tract assignments averaging 30.01% across all regions, followed by the race/ethnicity-weighted random method (23.83%). The true distribution of cases across census tracts was that 58.2% of tracts exhibited no cases, 26.2% had one case, 9.5% had two cases, and less than 3% had three or more. This distribution was best captured by random allocation methods, with no significant differences (p-value > 0.90). However, significant differences in distributions based on fixed allocation methods were found (p-value < 0.0003).
CONCLUSION: Fixed imputation methods seemed to yield greatest accuracy at the individual level, suggesting use for studies on area-level environmental exposures. Fixed methods result in artificial clusters in single census tracts. For studies focusing on spatial distribution of disease, random methods seemed superior, as they most closely replicated the true spatial distribution. When selecting an imputation approach, researchers should consider carefully the study aims.
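The contrast between fixed and random allocation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a fixed method sends every case in a ZIP code to the tract with the largest weight, and that a random method draws a tract with probability proportional to its weight. The ZIP code, tract names, and population weights below are hypothetical.

```python
import random
from collections import Counter

def fixed_allocation(tract_weights):
    """Deterministic rule (assumed): pick the tract with the largest weight."""
    return max(tract_weights, key=tract_weights.get)

def random_allocation(tract_weights, rng):
    """Stochastic rule (assumed): draw a tract with probability
    proportional to its weight."""
    tracts = list(tract_weights)
    weights = [tract_weights[t] for t in tracts]
    return rng.choices(tracts, weights=weights, k=1)[0]

# Hypothetical ZIP code overlapping three census tracts; the numbers
# stand in for a weighting factor such as total population.
zip_to_tracts = {
    "00000": {"tract_A": 5200, "tract_B": 3100, "tract_C": 1700},
}

# Ten hypothetical cases known only by ZIP code.
cases = ["00000"] * 10

rng = random.Random(0)  # seeded so the random draw is reproducible
fixed_counts = Counter(fixed_allocation(zip_to_tracts[z]) for z in cases)
random_counts = Counter(random_allocation(zip_to_tracts[z], rng) for z in cases)

print("fixed: ", dict(fixed_counts))   # all ten cases land in tract_A
print("random:", dict(random_counts))  # cases spread across tracts
```

Under these assumptions the fixed rule piles all ten cases into a single tract, which is the artificial clustering the abstract warns about, while the random rule spreads cases roughly in proportion to the weights.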
format Text
id pubmed-2763852
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-27638522009-10-20 Evaluating geographic imputation approaches for zip code level data: an application to a study of pediatric diabetes Hibbert, James D Liese, Angela D Lawson, Andrew Porter, Dwayne E Puett, Robin C Standiford, Debra Liu, Lenna Dabelea, Dana Int J Health Geogr Research BACKGROUND: There is increasing interest in the study of place effects on health, facilitated in part by geographic information systems. Incomplete or missing address information reduces geocoding success. Several geographic imputation methods have been suggested to overcome this limitation. Accuracy evaluation of these methods can be focused at the level of individuals and at higher group-levels (e.g., spatial distribution). METHODS: We evaluated the accuracy of eight geo-imputation methods for address allocation from ZIP codes to census tracts at the individual and group level. The spatial apportioning approaches underlying the imputation methods included four fixed (deterministic) and four random (stochastic) allocation methods using land area, total population, population under age 20, and race/ethnicity as weighting factors. Data included more than 2,000 geocoded cases of diabetes mellitus among youth aged 0-19 in four U.S. regions. The imputed distribution of cases across tracts was compared to the true distribution using a chi-squared statistic. RESULTS: At the individual level, population-weighted (total or under age 20) fixed allocation showed the greatest level of accuracy, with correct census tract assignments averaging 30.01% across all regions, followed by the race/ethnicity-weighted random method (23.83%). The true distribution of cases across census tracts was that 58.2% of tracts exhibited no cases, 26.2% had one case, 9.5% had two cases, and less than 3% had three or more. This distribution was best captured by random allocation methods, with no significant differences (p-value > 0.90). 
However, significant differences in distributions based on fixed allocation methods were found (p-value < 0.0003). CONCLUSION: Fixed imputation methods seemed to yield greatest accuracy at the individual level, suggesting use for studies on area-level environmental exposures. Fixed methods result in artificial clusters in single census tracts. For studies focusing on spatial distribution of disease, random methods seemed superior, as they most closely replicated the true spatial distribution. When selecting an imputation approach, researchers should consider carefully the study aims. BioMed Central 2009-10-08 /pmc/articles/PMC2763852/ /pubmed/19814809 http://dx.doi.org/10.1186/1476-072X-8-54 Text en Copyright © 2009 Hibbert et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Evaluating geographic imputation approaches for zip code level data: an application to a study of pediatric diabetes
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2763852/
https://www.ncbi.nlm.nih.gov/pubmed/19814809
http://dx.doi.org/10.1186/1476-072X-8-54