
Estimating influenza incidence using search query deceptiveness and generalized ridge regression

Bibliographic Details
Main authors: Priedhorsky, Reid; Daughton, Ashlynn R.; Barnard, Martha; O’Connell, Fiona; Osthus, Dave
Format: Online Article Text
Language: English
Published: Public Library of Science, 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6771994/
https://www.ncbi.nlm.nih.gov/pubmed/31574086
http://dx.doi.org/10.1371/journal.pcbi.1007165
author Priedhorsky, Reid
Daughton, Ashlynn R.
Barnard, Martha
O’Connell, Fiona
Osthus, Dave
collection PubMed
description Seasonal influenza is a sometimes surprisingly impactful disease, causing thousands of deaths per year along with much additional morbidity. Timely knowledge of the outbreak state is valuable for managing an effective response. The current state of the art is to gather this knowledge using in-person patient contact. While accurate, this is time-consuming and expensive. This has motivated inquiry into new approaches using internet activity traces, based on the theory that lay observations of health status lead to informative features in internet data. These approaches risk being deceived by activity traces having a coincidental, rather than informative, relationship to disease incidence; to our knowledge, this risk has not yet been quantitatively explored. We evaluated both simulated and real activity traces of varying deceptiveness for influenza incidence estimation using linear regression. We found that deceptiveness knowledge does reduce error in such estimates, that it may help automatically-selected features perform as well or better than features that require human curation, and that a semantic distance measure derived from the Wikipedia article category tree serves as a useful proxy for deceptiveness. This suggests that disease incidence estimation models should incorporate not only data about how internet features map to incidence but also additional data to estimate feature deceptiveness. By doing so, we may gain one more step along the path to accurate, reliable disease incidence estimation using internet data. This capability would improve public health by decreasing the cost and increasing the timeliness of such estimates.
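The title names generalized ridge regression, which differs from ordinary ridge regression in giving each feature its own penalty: the coefficients are b = (XᵀX + diag(λ))⁻¹Xᵀy. The abstract does not say how deceptiveness enters the model, so the pure-Python sketch below assumes, hypothetically, that a feature's penalty grows with a deceptiveness score in [0, 1], shrinking likely-coincidental features harder. The helper names `generalized_ridge` and `deceptiveness_penalties` and the `base`/`scale` parameters are illustrative only, not the authors' code; no intercept is fitted.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy so the caller's matrix is not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def generalized_ridge(X: Matrix, y: Vector, penalties: Vector) -> Vector:
    """Minimize ||y - X b||^2 + sum_j penalties[j] * b_j^2 (no intercept).

    Closed form: b = (X^T X + diag(penalties))^{-1} X^T y.
    Ordinary ridge is the special case where all penalties are equal.
    """
    p = len(penalties)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    for j in range(p):
        xtx[j][j] += penalties[j]  # per-feature penalty on the diagonal
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def deceptiveness_penalties(scores: Vector, base: float = 1.0,
                            scale: float = 10.0) -> Vector:
    """Hypothetical mapping: penalty rises linearly with deceptiveness in [0, 1]."""
    return [base * (1.0 + scale * d) for d in scores]


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    y = [2.0, 3.0, 5.0]
    print(generalized_ridge(X, y, [1.0, 1.0]))  # equal penalties
    print(generalized_ridge(X, y, deceptiveness_penalties([0.0, 0.9])))
```

With the second feature marked as deceptive, its coefficient shrinks well below its equal-penalty value while the trusted feature absorbs more of the signal. Any real deceptiveness-to-penalty mapping would need to be taken from the article itself.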
format Online
Article
Text
id pubmed-6771994
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6771994 2019-10-12 Estimating influenza incidence using search query deceptiveness and generalized ridge regression Priedhorsky, Reid Daughton, Ashlynn R. Barnard, Martha O’Connell, Fiona Osthus, Dave PLoS Comput Biol Research Article
Public Library of Science 2019-10-01 /pmc/articles/PMC6771994/ /pubmed/31574086 http://dx.doi.org/10.1371/journal.pcbi.1007165 Text en © 2019 Priedhorsky et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Estimating influenza incidence using search query deceptiveness and generalized ridge regression
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6771994/
https://www.ncbi.nlm.nih.gov/pubmed/31574086
http://dx.doi.org/10.1371/journal.pcbi.1007165