Neighborhood level chronic respiratory disease prevalence estimation using search query data

Bibliographic Details
Main authors: Abdur Rehman, Nabeel; Counts, Scott
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8189491/
https://www.ncbi.nlm.nih.gov/pubmed/34106982
http://dx.doi.org/10.1371/journal.pone.0252383
author Abdur Rehman, Nabeel
Counts, Scott
collection PubMed
description Estimating disease prevalence at the sub-city neighborhood scale allows early, targeted interventions that can help save lives and reduce public health burdens. However, the prohibitive cost of highly localized data collection and the sparsity of representative signals have made it challenging to identify neighborhood-scale disease prevalence. To overcome this challenge, we use alternative data sources that are both less sparse and representative of localized disease prevalence: using query data from a large commercial search engine, we identify the prevalence of respiratory illness in the United States, localized to census tract granularity. Focusing on asthma and Chronic Obstructive Pulmonary Disease (COPD), we construct a set of features based on searches for symptoms, medications, and disease-related information, and use these to identify illness rates in more than 23 thousand tracts in 500 cities across the United States. Out-of-sample model estimates from search data alone correlate with ground truth illness rate estimates from the CDC at 0.69 to 0.76, with simple additions to these models raising those correlations to as high as 0.84. We then show that, in practice, search query data can be combined with other relevant data, such as census or land cover data, to boost results: models that incorporate all data sources correlate with ground truth data at 0.91 for asthma and 0.88 for COPD.
format Online
Article
Text
id pubmed-8189491
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8189491 2021-06-16 Neighborhood level chronic respiratory disease prevalence estimation using search query data Abdur Rehman, Nabeel Counts, Scott PLoS One Research Article
Public Library of Science 2021-06-09 /pmc/articles/PMC8189491/ /pubmed/34106982 http://dx.doi.org/10.1371/journal.pone.0252383 Text en © 2021 Abdur Rehman, Counts https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Neighborhood level chronic respiratory disease prevalence estimation using search query data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8189491/
https://www.ncbi.nlm.nih.gov/pubmed/34106982
http://dx.doi.org/10.1371/journal.pone.0252383