Neighborhood level chronic respiratory disease prevalence estimation using search query data
Main authors: | Abdur Rehman, Nabeel; Counts, Scott |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8189491/ https://www.ncbi.nlm.nih.gov/pubmed/34106982 http://dx.doi.org/10.1371/journal.pone.0252383 |
Field | Value |
---|---|
author | Abdur Rehman, Nabeel; Counts, Scott |
collection | PubMed |
description | Estimation of disease prevalence at sub-city neighborhood scale allows early and targeted interventions that can help save lives and reduce public health burdens. However, the cost-prohibitive nature of highly localized data collection and the sparsity of representative signals have made it challenging to identify neighborhood-scale prevalence of disease. To overcome this challenge, we utilize alternative data sources, which are both less sparse and representative of localized disease prevalence: using query data from a large commercial search engine, we identify the prevalence of respiratory illness in the United States, localized to census tract geographic granularity. Focusing on asthma and Chronic Obstructive Pulmonary Disease (COPD), we construct a set of features based on searches for symptoms, medications, and disease-related information, and use these to identify illness rates in more than 23,000 tracts in 500 cities across the United States. Out-of-sample model estimates from search data alone correlate with ground-truth illness rate estimates from the CDC at 0.69 to 0.76, with simple additions to these models raising those correlations to as high as 0.84. We then show that in practice search query data can be added to other relevant data, such as census or land cover data, to boost results, with models that incorporate all data sources correlating with ground-truth data at 0.91 for asthma and 0.88 for COPD. |
format | Online Article Text |
id | pubmed-8189491 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-8189491 (2021-06-16); PLoS One; Research Article; Public Library of Science, 2021-06-09; /pmc/articles/PMC8189491/; /pubmed/34106982; http://dx.doi.org/10.1371/journal.pone.0252383; Text; en; © 2021 Abdur Rehman, Counts. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Neighborhood level chronic respiratory disease prevalence estimation using search query data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8189491/ https://www.ncbi.nlm.nih.gov/pubmed/34106982 http://dx.doi.org/10.1371/journal.pone.0252383 |
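The description reports out-of-sample correlations between model estimates and CDC tract-level estimates. The following is a minimal Python sketch of that kind of evaluation, not the authors' pipeline. It assumes one hypothetical search-derived feature per census tract (for example, a normalized rate of symptom searches), a single-feature least-squares model, and deterministic k-fold splitting. The function names `pearson_r`, `fit_simple_ols`, and `out_of_sample_r` are illustrative only.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_simple_ols(xs, ys):
    """Least-squares fit of y = a + b * x for a single feature; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b


def out_of_sample_r(feature, prevalence, k=5):
    """Hypothetical k-fold evaluation: each tract is predicted by a model fit
    only on the other folds, then r is computed against ground truth."""
    n = len(feature)
    preds = [0.0] * n
    for fold in range(k):
        # Deterministic fold assignment by index (assumption; no shuffling).
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        a, b = fit_simple_ols(
            [feature[i] for i in train], [prevalence[i] for i in train]
        )
        for i in test:
            preds[i] = a + b * feature[i]
    return pearson_r(preds, prevalence)
```

Because each tract's estimate comes from a model fit on the other folds, the resulting r is an out-of-sample figure. A fuller version would shuffle tracts before splitting and combine several feature groups (symptoms, medications, disease information) in a multivariate or regularized model.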