Cargando…

Machine Learning and Natural Language Processing for Geolocation-Centric Monitoring and Characterization of Opioid-Related Social Media Chatter

IMPORTANCE: Automatic curation of consumer-generated, opioid-related social media big data may enable real-time monitoring of the opioid epidemic in the United States. OBJECTIVE: To develop and validate an automatic text-processing pipeline for geospatial and temporal analysis of opioid-mentioning s...

Descripción completa

Detalles Bibliográficos
Autores principales: Sarker, Abeed, Gonzalez-Hernandez, Graciela, Ruan, Yucheng, Perrone, Jeanmarie
Formato: Online Artículo Texto
Lenguaje:English
Publicado: American Medical Association 2019
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6865282/
https://www.ncbi.nlm.nih.gov/pubmed/31693125
http://dx.doi.org/10.1001/jamanetworkopen.2019.14672
Descripción
Sumario:IMPORTANCE: Automatic curation of consumer-generated, opioid-related social media big data may enable real-time monitoring of the opioid epidemic in the United States. OBJECTIVE: To develop and validate an automatic text-processing pipeline for geospatial and temporal analysis of opioid-mentioning social media chatter. DESIGN, SETTING, AND PARTICIPANTS: This cross-sectional, population-based study was conducted from December 1, 2017, to August 31, 2019, and used more than 3 years of publicly available social media posts on Twitter, dated from January 1, 2012, to October 31, 2015, that were geolocated in Pennsylvania. Opioid-mentioning tweets were extracted using prescription and illicit opioid names, including street names and misspellings. Social media posts (tweets) (n = 9006) were manually categorized into 4 classes, and training and evaluation of several machine learning algorithms were performed. Temporal and geospatial patterns were analyzed with the best-performing classifier on unlabeled data. MAIN OUTCOMES AND MEASURES: Pearson and Spearman correlations of county- and substate-level abuse-indicating tweet rates with opioid overdose death rates from the Centers for Disease Control and Prevention WONDER database and with 4 metrics from the National Survey on Drug Use and Health for 3 years were calculated. Classifier performances were measured through microaveraged F1 scores (harmonic mean of precision and recall) or accuracies and 95% CIs. RESULTS: A total of 9006 social media posts were annotated, of which 1748 (19.4%) were related to abuse, 2001 (22.2%) were related to information, 4830 (53.6%) were unrelated, and 427 (4.7%) were not in the English language. Yearly rates of abuse-indicating social media post showed statistically significant correlation with county-level opioid-related overdose death rates (n = 75) for 3 years (Pearson r = 0.451, P < .001; Spearman r = 0.331, P = .004). Abuse-indicating tweet rates showed consistent correlations with 4 NSDUH metrics (n = 13) associated with nonmedical prescription opioid use (Pearson r = 0.683, P = .01; Spearman r = 0.346, P = .25), illicit drug use (Pearson r = 0.850, P < .001; Spearman r = 0.341, P = .25), illicit drug dependence (Pearson r = 0.937, P < .001; Spearman r = 0.495, P = .09), and illicit drug dependence or abuse (Pearson r = 0.935, P < .001; Spearman r = 0.401, P = .17) over the same 3-year period, although the tests lacked power to demonstrate statistical significance. A classification approach involving an ensemble of classifiers produced the best performance in accuracy or microaveraged F1 score (0.726; 95% CI, 0.708-0.743). CONCLUSIONS AND RELEVANCE: The correlations obtained in this study suggest that a social media–based approach reliant on supervised machine learning may be suitable for geolocation-centric monitoring of the US opioid epidemic in near real time.