
Mining Google Trends data for nowcasting and forecasting colorectal cancer (CRC) prevalence

BACKGROUND: Colorectal cancer (CRC) is the third most prevalent and second most lethal form of cancer in the world. Consequently, CRC prevalence projections are essential for assessing the future burden of the disease, planning resource allocation, and developing service delivery strategies,...


Bibliographic Details
Main authors: Tudor, Cristiana, Sova, Robert Aurelian
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2023
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10588692/
https://www.ncbi.nlm.nih.gov/pubmed/37869464
http://dx.doi.org/10.7717/peerj-cs.1518
_version_ 1785123633372332032
author Tudor, Cristiana
Sova, Robert Aurelian
author_facet Tudor, Cristiana
Sova, Robert Aurelian
author_sort Tudor, Cristiana
collection PubMed
description BACKGROUND: Colorectal cancer (CRC) is the third most prevalent and second most lethal form of cancer in the world. Consequently, CRC prevalence projections are essential for assessing the future burden of the disease, planning resource allocation, and developing service delivery strategies, as well as for understanding the shifting landscape of cancer risk factors. However, unlike cancer incidence and mortality rates, national and international agencies do not routinely issue projections for cancer prevalence. Moreover, limited or even nonexistent cancer statistics for large portions of the world, along with high heterogeneity among nations, further complicate the task of producing timely and accurate CRC prevalence projections. In this situation, population interest, as shown by Internet searches, can be very important for improving cancer statistics and, in the long run, for helping cancer research.
METHODS: This study aims to model, nowcast and forecast CRC prevalence at the global level using a three-step framework that incorporates three well-established univariate statistical and machine-learning models. First, data mining is performed to evaluate the relevance of Google Trends (GT) data as a surrogate for the number of CRC survivors. The results demonstrate that population web-search interest in the term “colonoscopy” is the most reliable indicator to nowcast CRC disease prevalence. Then, various statistical and machine-learning models, including ARIMA, ETS, and FNNAR, are trained and tested using relevant GT time series. Finally, the updated monthly query series spanning 2004–2022 and the best forecasting model in terms of out-of-sample forecasting ability (i.e., the neural network autoregression) are used to generate point forecasts up to 2025.
RESULTS: The results show that the number of people with colorectal cancer is projected to continue rising over the next 24 months. This in turn emphasizes the urgency of public policies aimed at reducing the population's exposure to the principal modifiable risk factors, such as lifestyle and nutrition. In addition, given the major drop in population interest in CRC during the first wave of the COVID-19 pandemic, the findings suggest that public health authorities should implement measures to increase cancer screening rates during pandemics. This in turn would deliver positive externalities, including the mitigation of the global burden and the enhancement of the quality of official statistics.
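The model-selection step described in METHODS (fit several candidate models, compare them on a held-out window, then forecast with the winner) can be illustrated with a minimal sketch. This is not the authors' code: the series is synthetic rather than real Google Trends data, and the three simple forecasters (`naive`, `seasonal_naive`, `drift`) are hypothetical stand-ins for ARIMA, ETS, and neural network autoregression. All function names and the RMSE criterion are assumptions made for illustration.

```python
import math


def make_synthetic_series(n_months=120):
    """Synthetic monthly index with trend and yearly seasonality.

    Assumption: a stand-in for a search-interest series, not real data.
    """
    return [50 + 0.2 * t + 5 * math.sin(2 * math.pi * t / 12)
            for t in range(n_months)]


def naive(train, h):
    """Repeat the last observed value for h steps."""
    return [train[-1]] * h


def seasonal_naive(train, h, period=12):
    """Repeat the last full seasonal cycle for h steps."""
    return [train[len(train) - period + (i % period)] for i in range(h)]


def drift(train, h):
    """Extend the straight line joining the first and last observations."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (i + 1) for i in range(h)]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def select_best(series, horizon, candidates):
    """Hold out the last `horizon` points and score each candidate on them."""
    train, test = series[:-horizon], series[-horizon:]
    scores = {name: rmse(test, f(train, horizon))
              for name, f in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical candidate set; the study compares ARIMA, ETS, and FNNAR instead.
CANDIDATES = {"naive": naive, "seasonal_naive": seasonal_naive, "drift": drift}

series = make_synthetic_series()
best_name, scores = select_best(series, horizon=24, candidates=CANDIDATES)

# Refit the selected model on the full series and forecast 24 months ahead.
forecast = CANDIDATES[best_name](series, 24)
```

Holding out the most recent observations, rather than a random subset, keeps the evaluation faithful to how a time-series forecast is actually used: the model only ever sees data that precedes the points it is scored on.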
format Online
Article
Text
id pubmed-10588692
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-10588692 2023-10-21 Mining Google Trends data for nowcasting and forecasting colorectal cancer (CRC) prevalence Tudor, Cristiana Sova, Robert Aurelian PeerJ Comput Sci Bioinformatics PeerJ Inc. 2023-10-04 /pmc/articles/PMC10588692/ /pubmed/37869464 http://dx.doi.org/10.7717/peerj-cs.1518 Text en © 2023 Tudor and Sova https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
spellingShingle Bioinformatics
Tudor, Cristiana
Sova, Robert Aurelian
Mining Google Trends data for nowcasting and forecasting colorectal cancer (CRC) prevalence
title Mining Google Trends data for nowcasting and forecasting colorectal cancer (CRC) prevalence
title_sort mining google trends data for nowcasting and forecasting colorectal cancer (crc) prevalence
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10588692/
https://www.ncbi.nlm.nih.gov/pubmed/37869464
http://dx.doi.org/10.7717/peerj-cs.1518
work_keys_str_mv AT tudorcristiana mininggoogletrendsdatafornowcastingandforecastingcolorectalcancercrcprevalence
AT sovarobertaurelian mininggoogletrendsdatafornowcastingandforecastingcolorectalcancercrcprevalence