Modeling household online shopping demand in the U.S.: a machine learning approach and comparative investigation between 2009 and 2017

Bibliographic Details
Main Authors: Barua, Limon; Zou, Bo; Zhou, Yan; Liu, Yulin
Format: Online Article Text
Language: English
Published: Springer US 2021
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8637526/
https://www.ncbi.nlm.nih.gov/pubmed/34873350
http://dx.doi.org/10.1007/s11116-021-10250-z
author Barua, Limon
Zou, Bo
Zhou, Yan
Liu, Yulin
collection PubMed
description Despite the rapid growth of online shopping and research interest in the relationship between online and in-store shopping, national-level modeling and investigation of the demand for online shopping with a prediction focus remain limited in the literature. This paper differs from prior work and leverages two recent releases of the U.S. National Household Travel Survey (NHTS) data, for 2009 and 2017, to develop machine learning (ML) models, specifically gradient boosting machine (GBM), for predicting household-level online shopping purchases. The NHTS data allow investigation not only nationwide but also at the level of households, which is more appropriate than the individual level given the connected consumption and shopping needs of household members. We follow a systematic procedure for model development, including employing the Recursive Feature Elimination algorithm to select input variables (features) in order to reduce the risk of model overfitting and increase model explainability. Among several ML models, GBM is found to yield the best prediction accuracy. Extensive post-modeling investigation is conducted in a comparative manner between 2009 and 2017, including quantifying the importance of each input variable in predicting online shopping demand and characterizing value-dependent relationships between demand and the input variables. In doing so, two recent advances in machine learning techniques, namely Shapley value-based feature importance and Accumulated Local Effects plots, are adopted to overcome inherent drawbacks of the popular techniques in current ML modeling. The modeling and investigation are performed at the national level, with a number of findings obtained. The models developed and insights gained can be used for online shopping-related freight demand generation and may also be considered for evaluating the potential impact of relevant policies on online shopping demand.
format Online
Article
Text
id pubmed-8637526
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-8637526 2021-12-02 Transportation (Amst) Article Springer US 2021-12-02 2023 /pmc/articles/PMC8637526/ /pubmed/34873350 http://dx.doi.org/10.1007/s11116-021-10250-z Text en © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Modeling household online shopping demand in the U.S.: a machine learning approach and comparative investigation between 2009 and 2017
topic Article