Mining Google Trends data for nowcasting and forecasting colorectal cancer (CRC) prevalence
BACKGROUND: Colorectal cancer (CRC) is the third most prevalent and second most lethal form of cancer in the world. Consequently, CRC prevalence projections are essential for assessing the future burden of the disease, planning resource allocation, and developing service delivery strategies,...
Main Authors: | Tudor, Cristiana; Sova, Robert Aurelian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2023 |
Subjects: | Bioinformatics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10588692/ https://www.ncbi.nlm.nih.gov/pubmed/37869464 http://dx.doi.org/10.7717/peerj-cs.1518 |
_version_ | 1785123633372332032 |
---|---|
author | Tudor, Cristiana; Sova, Robert Aurelian |
author_facet | Tudor, Cristiana; Sova, Robert Aurelian |
author_sort | Tudor, Cristiana |
collection | PubMed |
description | BACKGROUND: Colorectal cancer (CRC) is the third most prevalent and second most lethal form of cancer in the world. Consequently, CRC prevalence projections are essential for assessing the future burden of the disease, planning resource allocation, and developing service delivery strategies, as well as for grasping the shifting environment of cancer risk factors. However, unlike cancer incidence and mortality rates, national and international agencies do not routinely issue projections for cancer prevalence. Moreover, the limited or even nonexistent cancer statistics for large portions of the world, along with the high heterogeneity among world nations, further complicate the task of producing timely and accurate CRC prevalence projections. In this situation, population interest, as shown by Internet searches, can be very important for improving cancer statistics and, in the long run, for helping cancer research. METHODS: This study aims to model, nowcast, and forecast CRC prevalence at the global level using a three-step framework that incorporates three well-established univariate statistical and machine-learning models. First, data mining is performed to evaluate the relevance of Google Trends (GT) data as a surrogate for the number of CRC survivors. The results demonstrate that population web-search interest in the term “colonoscopy” is the most reliable indicator for nowcasting CRC prevalence. Then, various statistical and machine-learning models, including ARIMA, ETS, and FNNAR, are trained and tested using relevant GT time series. Finally, the updated monthly query series spanning 2004–2022 and the best forecasting model in terms of out-of-sample forecasting ability (i.e., the neural network autoregression) are used to generate point forecasts up to 2025. RESULTS: Results show that the number of people with colorectal cancer will continue to rise over the next 24 months. This in turn emphasizes the urgency for public policies aimed at reducing the population's exposure to the principal modifiable risk factors, such as lifestyle and nutrition. In addition, given the major drop in population interest in CRC during the first wave of the COVID-19 pandemic, the findings suggest that public health authorities should implement measures to increase cancer screening rates during pandemics. This in turn would deliver positive externalities, including the mitigation of the global burden and the enhancement of the quality of official statistics. |
format | Online Article Text |
id | pubmed-10588692 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10588692 2023-10-21 Mining Google Trends data for nowcasting and forecasting colorectal cancer (CRC) prevalence Tudor, Cristiana Sova, Robert Aurelian PeerJ Comput Sci Bioinformatics BACKGROUND: Colorectal cancer (CRC) is the third most prevalent and second most lethal form of cancer in the world. Consequently, CRC prevalence projections are essential for assessing the future burden of the disease, planning resource allocation, and developing service delivery strategies, as well as for grasping the shifting environment of cancer risk factors. However, unlike cancer incidence and mortality rates, national and international agencies do not routinely issue projections for cancer prevalence. Moreover, the limited or even nonexistent cancer statistics for large portions of the world, along with the high heterogeneity among world nations, further complicate the task of producing timely and accurate CRC prevalence projections. In this situation, population interest, as shown by Internet searches, can be very important for improving cancer statistics and, in the long run, for helping cancer research. METHODS: This study aims to model, nowcast, and forecast CRC prevalence at the global level using a three-step framework that incorporates three well-established univariate statistical and machine-learning models. First, data mining is performed to evaluate the relevance of Google Trends (GT) data as a surrogate for the number of CRC survivors. The results demonstrate that population web-search interest in the term “colonoscopy” is the most reliable indicator for nowcasting CRC prevalence. Then, various statistical and machine-learning models, including ARIMA, ETS, and FNNAR, are trained and tested using relevant GT time series. Finally, the updated monthly query series spanning 2004–2022 and the best forecasting model in terms of out-of-sample forecasting ability (i.e., the neural network autoregression) are used to generate point forecasts up to 2025. RESULTS: Results show that the number of people with colorectal cancer will continue to rise over the next 24 months. This in turn emphasizes the urgency for public policies aimed at reducing the population's exposure to the principal modifiable risk factors, such as lifestyle and nutrition. In addition, given the major drop in population interest in CRC during the first wave of the COVID-19 pandemic, the findings suggest that public health authorities should implement measures to increase cancer screening rates during pandemics. This in turn would deliver positive externalities, including the mitigation of the global burden and the enhancement of the quality of official statistics. PeerJ Inc. 2023-10-04 /pmc/articles/PMC10588692/ /pubmed/37869464 http://dx.doi.org/10.7717/peerj-cs.1518 Text en © 2023 Tudor and Sova https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics Tudor, Cristiana Sova, Robert Aurelian Mining Google Trends data for nowcasting and forecasting colorectal cancer (CRC) prevalence |
title | Mining Google Trends data for nowcasting and forecasting colorectal cancer (CRC) prevalence |
title_full | Mining Google Trends data for nowcasting and forecasting colorectal cancer (CRC) prevalence |
title_fullStr | Mining Google Trends data for nowcasting and forecasting colorectal cancer (CRC) prevalence |
title_full_unstemmed | Mining Google Trends data for nowcasting and forecasting colorectal cancer (CRC) prevalence |
title_short | Mining Google Trends data for nowcasting and forecasting colorectal cancer (CRC) prevalence |
title_sort | mining google trends data for nowcasting and forecasting colorectal cancer (crc) prevalence |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10588692/ https://www.ncbi.nlm.nih.gov/pubmed/37869464 http://dx.doi.org/10.7717/peerj-cs.1518 |
work_keys_str_mv | AT tudorcristiana mininggoogletrendsdatafornowcastingandforecastingcolorectalcancercrcprevalence AT sovarobertaurelian mininggoogletrendsdatafornowcastingandforecastingcolorectalcancercrcprevalence |
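
The abstract describes choosing the forecasting model by out-of-sample accuracy before producing point forecasts. The sketch below illustrates only that selection step, and it is not the authors' implementation: the study compares ARIMA, ETS, and a neural network autoregression on Google Trends series, whereas this example assumes two much simpler stand-in forecasters (seasonal naive and simple exponential smoothing), a synthetic monthly series, RMSE as the accuracy measure, and a 24-month holdout. All function names are hypothetical.

```python
import math


def seasonal_naive(train, horizon, period=12):
    """Stand-in model: repeat the last observed seasonal cycle."""
    start = len(train) - period
    return [train[start + (h % period)] for h in range(horizon)]


def simple_exp_smoothing(train, horizon, alpha=0.3):
    """Stand-in model: flat forecast at the exponentially smoothed level."""
    level = train[0]
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def select_model(series, holdout, models):
    """Fit on the training part, score each model on the held-out tail."""
    train, test = series[:-holdout], series[-holdout:]
    scores = {name: rmse(test, fn(train, holdout)) for name, fn in models.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic monthly series (assumption): linear trend plus yearly seasonality.
series = [50 + 0.1 * t + 5 * math.sin(2 * math.pi * t / 12) for t in range(120)]

models = {"seasonal_naive": seasonal_naive, "ses": simple_exp_smoothing}
best, scores = select_model(series, holdout=24, models=models)

for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {score:.3f}")
print("selected:", best)
```

In a fuller analysis, the selected model would then be refit on the complete series before forecasting beyond the last observation, which mirrors the final step the abstract describes.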