Using Imputation to Provide Location Information for Nongeocoded Addresses

BACKGROUND: The importance of geography as a source of variation in health research continues to receive sustained attention in the literature. The inclusion of geographic information in such research often begins by adding data to a map, which is predicated on some knowledge of location. A precise l...

Bibliographic Details
Main Authors: Curriero, Frank C., Kulldorff, Martin, Boscoe, Francis P., Klassen, Ann C.
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2818716/
https://www.ncbi.nlm.nih.gov/pubmed/20161766
http://dx.doi.org/10.1371/journal.pone.0008998
author Curriero, Frank C.
Kulldorff, Martin
Boscoe, Francis P.
Klassen, Ann C.
collection PubMed
description BACKGROUND: The importance of geography as a source of variation in health research continues to receive sustained attention in the literature. The inclusion of geographic information in such research often begins by adding data to a map, which is predicated on some knowledge of location. A precise level of spatial information is conventionally achieved through geocoding, the geographic information system (GIS) process of translating mailing address information to coordinates on a map. The geocoding process has limitations, though, since there is always a percentage of addresses that cannot be converted successfully (nongeocodable). This raises concerns about bias, because the traditional practice has been to exclude nongeocoded data records from analysis. METHODOLOGY/PRINCIPAL FINDINGS: In this manuscript we develop and evaluate a set of imputation strategies for dealing with missing spatial information from nongeocoded addresses. The strategies are developed assuming a known zip code, with increasing use of collateral information, namely the spatial distribution of the population at risk. Strategies are evaluated using prostate cancer data obtained from the Maryland Cancer Registry. We consider total case enumerations at the Census county, tract, and block group levels as the outcome of interest when applying and evaluating the methods. Multiple imputation is used to provide estimated total case counts based on complete data (geocodes plus imputed nongeocodes) with a measure of uncertainty. Results indicate that the imputation strategy based on available population-based age, gender, and race information performed the best overall at the county, tract, and block group levels.
CONCLUSIONS/SIGNIFICANCE: The procedure allows the potentially biased and likely underreported outcome, case enumerations based on only the geocoded records, to be presented together with a statistically adjusted count (imputed count) and a measure of uncertainty, both based on all the case data (the geocodes and imputed nongeocodes). Similar strategies can be applied in other analysis settings.
format Text
id pubmed-2818716
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2818716 2010-02-17 Using Imputation to Provide Location Information for Nongeocoded Addresses Curriero, Frank C. Kulldorff, Martin Boscoe, Francis P. Klassen, Ann C. PLoS One Research Article Public Library of Science 2010-02-10 /pmc/articles/PMC2818716/ /pubmed/20161766 http://dx.doi.org/10.1371/journal.pone.0008998 Text en Curriero et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Using Imputation to Provide Location Information for Nongeocoded Addresses
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2818716/
https://www.ncbi.nlm.nih.gov/pubmed/20161766
http://dx.doi.org/10.1371/journal.pone.0008998