Development of a Machine Learning Model Using Multiple, Heterogeneous Data Sources to Estimate Weekly US Suicide Fatalities

Bibliographic Details
Main authors: Choi, Daejin, Sumner, Steven A., Holland, Kristin M., Draper, John, Murphy, Sean, Bowen, Daniel A., Zwald, Marissa, Wang, Jing, Law, Royal, Taylor, Jordan, Konjeti, Chaitanya, De Choudhury, Munmun
Format: Online Article Text
Language: English
Published: American Medical Association 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7758810/
https://www.ncbi.nlm.nih.gov/pubmed/33355678
http://dx.doi.org/10.1001/jamanetworkopen.2020.30932
author Choi, Daejin
Sumner, Steven A.
Holland, Kristin M.
Draper, John
Murphy, Sean
Bowen, Daniel A.
Zwald, Marissa
Wang, Jing
Law, Royal
Taylor, Jordan
Konjeti, Chaitanya
De Choudhury, Munmun
collection PubMed
description IMPORTANCE: Suicide is a leading cause of death in the US. However, official national statistics on suicide rates are delayed by 1 to 2 years, hampering evidence-based public health planning and decision-making.
OBJECTIVE: To estimate weekly suicide fatalities in the US in near real time.
DESIGN, SETTING, AND PARTICIPANTS: This cross-sectional national study used a machine learning pipeline to combine signals from several streams of real-time information to estimate weekly suicide fatalities in the US in near real time. This 2-phase approach first fits optimal machine learning models to each individual data stream and subsequently combines predictions made from each data stream via an artificial neural network. National-level US administrative data on suicide deaths, health services, and economic, meteorological, and online data were variously obtained from 2014 to 2017. Data were analyzed from January 1, 2014, to December 31, 2017.
EXPOSURES: Longitudinal data on suicide-related exposures were obtained from multiple, heterogeneous streams: emergency department visits for suicide ideation and attempts collected via the National Syndromic Surveillance Program (2015-2017); calls to the National Suicide Prevention Lifeline (2014-2017); calls to US poison control centers for intentional self-harm (2014-2017); consumer price index and seasonality-adjusted unemployment rate, hourly earnings, home price index, and 3-month and 10-year yield curves from the Federal Reserve Economic Data (2014-2017); weekly daylight hours (2014-2017); Google and YouTube search trends related to suicide (2014-2017); and public posts on suicide on Reddit (2 314 533 posts), Twitter (9 327 472 tweets; 2015-2017), and Tumblr (1 670 378 posts; 2014-2017).
MAIN OUTCOMES AND MEASURES: Weekly estimates of suicide fatalities in the US were obtained through a machine learning pipeline that integrated the above data sources. Estimates were compared statistically with actual fatalities recorded by the National Vital Statistics System.
RESULTS: Combining information from multiple data streams, the machine learning method yielded estimates of weekly suicide deaths with high correlation to actual counts and trends (Pearson correlation, 0.811; P < .001), while estimating annual suicide rates with low error (0.55%).
CONCLUSIONS AND RELEVANCE: The proposed ensemble machine learning framework reduces the error for annual suicide rate estimation to less than one-tenth of that of current forecasting approaches that use only historical information on suicide deaths. These findings establish a novel approach for tracking suicide fatalities in near real time and provide the potential for an effective public health response such as supporting budgetary decisions or deploying interventions.
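
The 2-phase idea in the abstract (one model per data stream, then a second model that combines the per-stream predictions) can be illustrated with a small sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the data are synthetic, the function names (`fit_line`, `fit_combiner`, `pearson`) are hypothetical, each stream gets a plain least-squares line rather than a tuned machine learning model, and a single linear unit trained by gradient descent stands in for the artificial neural network. For brevity it fits and scores on the same weeks, whereas a real evaluation would hold out later weeks.

```python
import math
import random


def fit_line(x, y):
    """Least-squares fit of y ~ a + b*x for one data stream; returns (a, b)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def fit_combiner(stream_preds, y, lr=0.1, epochs=500):
    """Phase 2 stand-in (assumption): one linear unit trained by gradient descent.

    Predictions and target are standardised with the target's mean and
    standard deviation so that one learning rate suits every input.
    Returns the combined estimate on the original scale.
    """
    n, k = len(y), len(stream_preds)
    my = sum(y) / n
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / n)
    z = [[(p - my) / sy for p in preds] for preds in stream_preds]
    target = [(yi - my) / sy for yi in y]
    w, bias = [1.0 / k] * k, 0.0  # start from a plain average of the streams
    for _ in range(epochs):
        err = [sum(w[j] * z[j][t] for j in range(k)) + bias - target[t]
               for t in range(n)]
        for j in range(k):
            w[j] -= lr * sum(e * zt for e, zt in zip(err, z[j])) / n
        bias -= lr * sum(err) / n
    return [my + sy * (sum(w[j] * z[j][t] for j in range(k)) + bias)
            for t in range(n)]


# Synthetic data (made up): three "years" of 52 weeks with a seasonal signal.
random.seed(0)
weeks = 156
signal = [40 * math.sin(2 * math.pi * t / 52) for t in range(weeks)]
actual = [850 + s + random.gauss(0, 10) for s in signal]
streams = [  # hypothetical proxies, each a noisy function of the same signal
    [2.0 * s + random.gauss(0, 20) for s in signal],
    [-0.5 * s + random.gauss(0, 10) for s in signal],
    [1.0 * s + random.gauss(0, 30) for s in signal],
]

# Phase 1: one model per stream, each producing its own weekly estimate.
stream_preds = []
for x in streams:
    a, b = fit_line(x, actual)
    stream_preds.append([a + b * xi for xi in x])

# Phase 2: combine the per-stream estimates into a single weekly estimate.
estimate = fit_combiner(stream_preds, actual)

# Evaluation in the spirit of the abstract: weekly correlation, annual error.
r = pearson(estimate, actual)
annual_error_pct = [
    100 * abs(sum(estimate[i:i + 52]) - sum(actual[i:i + 52]))
    / sum(actual[i:i + 52])
    for i in range(0, weeks, 52)
]
print(f"Pearson r = {r:.3f}")
print("annual error (%):", [round(e, 2) for e in annual_error_pct])
```

The two printed quantities mirror the kinds of measures reported in the abstract (weekly Pearson correlation and annual error), but the numbers come from made-up data and say nothing about the study's results.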
format Online
Article
Text
id pubmed-7758810
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher American Medical Association
record_format MEDLINE/PubMed
spelling pubmed-7758810 2021-01-04 JAMA Netw Open Original Investigation
American Medical Association 2020-12-23 /pmc/articles/PMC7758810/ /pubmed/33355678 http://dx.doi.org/10.1001/jamanetworkopen.2020.30932 Text en Copyright 2020 Choi D et al. JAMA Network Open. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the CC-BY License.
title Development of a Machine Learning Model Using Multiple, Heterogeneous Data Sources to Estimate Weekly US Suicide Fatalities
topic Original Investigation
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7758810/
https://www.ncbi.nlm.nih.gov/pubmed/33355678
http://dx.doi.org/10.1001/jamanetworkopen.2020.30932