Early prediction of SARS-CoV-2 reproductive number from environmental, atmospheric and mobility data: A supervised machine learning approach

Bibliographic Details
Main authors: Caruso, Pier Francesco; Angelotti, Giovanni; Greco, Massimiliano; Guzzetta, Giorgio; Cereda, Danilo; Merler, Stefano; Cecconi, Maurizio
Format: Online article (text)
Language: English
Published: Elsevier B.V. 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8970608/
https://www.ncbi.nlm.nih.gov/pubmed/35390590
http://dx.doi.org/10.1016/j.ijmedinf.2022.104755

Full description:
INTRODUCTION: SARS-CoV-2 was declared a pandemic by the WHO on March 11th, 2020. Public protective measures were enforced in every country to limit the spread of SARS-CoV-2. Its transmission, mainly by droplets, has been measured by the effective reproduction number (Rt), which counts the number of secondary cases caused in a population by an average infectious individual at time t. Current strategies for calculating Rt reflect the number of secondary cases only after several days, because of the delay from symptom onset to reporting. We propose a complementary Rt estimation using supervised machine learning techniques to predict short-term variations with more timely results.
MATERIAL AND METHODS: Our primary goal was to predict the Rt of the current day in the twelve provinces of Lombardy with the highest possible accuracy, and without influence from local testing strategies. We gathered data on mobility, weather, and pollution from different public sources as a proxy for human behavior and public health measures. We built four supervised machine learning algorithms with different strategies; the outcome variable was the daily median Rt value per province obtained from officially adopted algorithms.
RESULTS: Data from 243 days for every province (from February 15th, 2020, to October 14th, 2020) were presented to our four models. The two models that used a differential calculation of Rt instead of the raw values showed the highest mean coefficient of determination (0.93 for both), and their residuals had the lowest mean error (-0.03 and 0.01) and standard deviation (0.13 for both). The one with access to the previous day's Rt value relied heavily on that feature for prediction, while the other had more evenly distributed weights.
DISCUSSION: The model that did not have access to the previous day's Rt value and used the Rt differential as its outcome (FDRt) was considered the most robust according to the metrics. Its forecasts captured the trend that Rt values would follow over different weeks, but it was not particularly accurate in predicting the precise value of Rt. A correlation among mobility, atmospheric features, pollution, and Rt values is plausible, but further testing should be performed.
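
The abstract describes modelling the day-over-day change in Rt instead of its raw value, and scoring models by the coefficient of determination and by the mean and standard deviation of the residuals. The Python sketch below illustrates only those quantities, under stated assumptions; it is not the authors' code. The helper names (`differential_target`, `reconstruct_rt`, `r_squared`, `residual_summary`) are hypothetical. The abstract does not name the four algorithms, so no regressor is included: any model that outputs a predicted change would plug in. Adding a predicted change to the previous day's Rt estimate is one assumed way of recovering an Rt value, since the abstract does not say how this was done. Residuals are taken as observed minus predicted with a sample standard deviation; both are illustrative choices.

```python
from __future__ import annotations

from statistics import mean, stdev
from typing import Sequence


def differential_target(rt: Sequence[float]) -> list[float]:
    """Day-over-day change in Rt: delta[t] = rt[t] - rt[t-1].

    The first day has no predecessor, so the result is one element shorter.
    """
    return [rt[t] - rt[t - 1] for t in range(1, len(rt))]


def reconstruct_rt(previous_rt: Sequence[float], predicted_delta: Sequence[float]) -> list[float]:
    """ASSUMPTION (not stated in the abstract): recover an Rt value by adding
    each predicted change to the prior day's Rt estimate."""
    return [prev + delta for prev, delta in zip(previous_rt, predicted_delta)]


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) if every observed value is identical.
    """
    mu = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mu) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def residual_summary(observed: Sequence[float], predicted: Sequence[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of residuals (observed - predicted).

    ASSUMPTION: the sign convention and the n - 1 denominator are illustrative.
    Requires at least two points.
    """
    residuals = [o - p for o, p in zip(observed, predicted)]
    return mean(residuals), stdev(residuals)


if __name__ == "__main__":
    # Hypothetical daily median Rt for one province (illustrative numbers only).
    rt = [1.20, 1.10, 1.05, 0.98, 0.95]
    target = differential_target(rt)  # what a differential model would be trained on
    # Hypothetical stand-in for a fitted model's predicted changes on the same days.
    predicted_delta = [-0.08, -0.06, -0.05, -0.04]
    predicted_rt = reconstruct_rt(rt[:-1], predicted_delta)
    print("target deltas:", target)
    print("R^2:", r_squared(rt[1:], predicted_rt))
    print("residual mean, SD:", residual_summary(rt[1:], predicted_rt))
```

Averaging `r_squared` over the twelve provinces would be one reading of the "mean coefficient of determination" reported above; that aggregation is also an assumption.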

Journal: Int J Med Inform (Elsevier B.V.), 2022-06; record dated 2022-04-01.
Rights: © 2022 Elsevier B.V. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.