Using search engine big data for predicting new HIV diagnoses
BACKGROUND: A large and growing body of “big data” is generated by internet search engines, such as Google. Because people often search for information about public health and medical issues, researchers may be able to use search engine data to monitor and predict public health problems, such as HIV...
Main authors: | Young, Sean D.; Zhang, Qingpeng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6042696/ https://www.ncbi.nlm.nih.gov/pubmed/30001360 http://dx.doi.org/10.1371/journal.pone.0199527 |
_version_ | 1783339198872289280 |
---|---|
author | Young, Sean D.; Zhang, Qingpeng |
collection | PubMed |
description | BACKGROUND: A large and growing body of “big data” is generated by internet search engines, such as Google. Because people often search for information about public health and medical issues, researchers may be able to use search engine data to monitor and predict public health problems, such as HIV. We sought to assess the feasibility of using Google search data to analyze and predict new HIV diagnoses in the United States. METHODS AND FINDINGS: From 2007 to 2014, we collected search volume data on HIV-related Google search keywords across the United States. State-level data on new HIV diagnoses were collected from the Centers for Disease Control and Prevention (CDC) and AIDSVu.org. We developed a negative binomial model to predict HIV cases using a subset of significant predictor keywords identified by LASSO. The Google search data were combined with state-level HIV case reports provided by the CDC. We used historical data to train the model and predicted new HIV diagnoses from 2011 to 2014, obtaining an average R² of 0.99 between predicted and actual cases and an average root-mean-square error (RMSE) of 108.75. CONCLUSIONS: Results indicate that Google Trends is a feasible tool for predicting new HIV cases at the state level. We discuss the implications of integrating visualization maps and tools based on these models into public health and HIV monitoring and surveillance. |
format | Online Article Text |
id | pubmed-6042696 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-6042696, 2018-07-19. Using search engine big data for predicting new HIV diagnoses. Young, Sean D.; Zhang, Qingpeng. PLoS One. Research Article.
Public Library of Science, 2018-07-12. © 2018 Young, Zhang. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Using search engine big data for predicting new HIV diagnoses |
topic | Research Article |
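The METHODS AND FINDINGS part of the description reports an average R² and RMSE for out-of-sample predictions of new HIV diagnoses from 2011 to 2014. The Python sketch below shows one way such an evaluation could be computed; it is not the authors' code. It rests on stated assumptions: R² is read as the squared Pearson correlation between predicted and actual state-level counts (the abstract does not say whether it means this or the coefficient of determination); "historical data" is read as an expanding window (all years from 2007 up to each test year); and per-year metrics are averaged with equal weight. The function names (`r_squared`, `rmse`, `expanding_window_splits`, `average_metrics`) are hypothetical, and the LASSO keyword selection and negative binomial fit themselves are not reproduced.

```python
import math
from statistics import mean


def r_squared(actual, predicted):
    """Squared Pearson correlation between actual and predicted counts.

    Assumption: the reported R^2 is read as squared correlation.
    Undefined (division by zero) if either series is constant.
    """
    ma, mp = mean(actual), mean(predicted)
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_a * var_p)


def rmse(actual, predicted):
    """Root-mean-square error, in the same units as the case counts."""
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


def expanding_window_splits(first_year, test_years):
    """Yield (train_years, test_year): train on every year before the test year.

    Assumption: "historical data" means all earlier years (expanding window).
    """
    for year in test_years:
        yield list(range(first_year, year)), year


def average_metrics(results_by_year):
    """Average the per-year R^2 and RMSE over the test years.

    results_by_year maps year -> (actual_counts, predicted_counts),
    with one entry per state in each list.
    """
    r2s = [r_squared(a, p) for a, p in results_by_year.values()]
    errors = [rmse(a, p) for a, p in results_by_year.values()]
    return mean(r2s), mean(errors)


if __name__ == "__main__":
    # Hypothetical test years matching the abstract's 2011 to 2014 window.
    for train_years, test_year in expanding_window_splits(2007, range(2011, 2015)):
        print(test_year, "trained on", train_years[0], "to", train_years[-1])
    # Illustrative toy counts for three hypothetical states; not study data.
    toy = {2011: ([100, 200, 300], [110, 190, 310])}
    print(average_metrics(toy))
```

The toy counts are illustrative only and say nothing about the study's results. Because RMSE is in raw case counts, states with many diagnoses dominate it, which is one reason a high R² and an RMSE in the hundreds can coexist.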