Prospective Evaluation of Adverse Event Recognition Systems in Twitter: Results from the Web-RADR Project

INTRODUCTION: A large number of studies on systems to detect and sometimes normalize adverse events (AEs) in social media have been published, but evidence of their practical utility is scarce. This raises the question of the transferability of such systems to new settings. OBJECTIVES: The aims of t...

Bibliographic Details
Main authors: Gattepaille, Lucie M., Hedfors Vidlin, Sara, Bergvall, Tomas, Pierce, Carrie E., Ellenius, Johan
Format: Online Article Text
Language: English
Published: Springer International Publishing 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7395913/
https://www.ncbi.nlm.nih.gov/pubmed/32410156
http://dx.doi.org/10.1007/s40264-020-00942-3
author Gattepaille, Lucie M.
Hedfors Vidlin, Sara
Bergvall, Tomas
Pierce, Carrie E.
Ellenius, Johan
collection PubMed
description INTRODUCTION: A large number of studies on systems to detect and sometimes normalize adverse events (AEs) in social media have been published, but evidence of their practical utility is scarce. This raises the question of the transferability of such systems to new settings. OBJECTIVES: The aims of this study were to develop an AE recognition system, prospectively evaluate its performance on an external benchmark dataset and identify potential factors influencing the transferability of AE recognition systems. METHODS: A pipeline based on dictionary lookups and logistic regression classifiers was developed using a proprietary dataset of 196,533 Tweets manually annotated for AE relations, and the system was prospectively evaluated on the publicly available WEB-RADR reference dataset, exploring different aspects affecting transferability. RESULTS: Our system achieved 0.53 precision, 0.52 recall and 0.52 F1-score on the development test set; however, when applied to the WEB-RADR reference dataset, system performance dropped to 0.38 precision, 0.20 recall and 0.26 F1-score. Similarly, a previously published method aiming at automatically detecting adverse event posts reported 0.5 precision, 0.92 recall and 0.65 F1-score on another dataset, while its performance on the WEB-RADR reference dataset was reduced to 0.37 precision, 0.63 recall and 0.46 F1-score. We identified four potential factors leading to poor transferability: overfitting, selection bias, label bias and prevalence. CONCLUSION: We warn the community about a potentially large discrepancy between the expected performance of automated AE recognition systems based on published results and the actual observed performance on independent data. This study highlights the difficulty of implementing an all-purpose system for automatic adverse event recognition in Twitter, which could explain the lack of such systems in practical pharmacovigilance settings. Our recommendation is to use independent benchmark datasets, such as the WEB-RADR reference, to investigate the transferability of adverse event recognition systems and ultimately enforce rigorous comparisons across studies on the task. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1007/s40264-020-00942-3) contains supplementary material, which is available to authorized users.
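The F1-scores quoted in the abstract can be related to the quoted precision and recall through the standard definition of F1 as their harmonic mean. The sketch below is illustrative only and is not the authors' code: the function name `f1_score` and the dictionary labels are assumptions, and the inputs are the rounded figures from the abstract, so a recomputed value may differ from the reported one in the last digit (presumably because the published scores were computed from unrounded values).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as quoted in the abstract.
# The labels are descriptive names chosen for this sketch.
reported = {
    "study system, development test set": (0.53, 0.52, 0.52),
    "study system, WEB-RADR reference": (0.38, 0.20, 0.26),
    "published method, original dataset": (0.50, 0.92, 0.65),
    "published method, WEB-RADR reference": (0.37, 0.63, 0.46),
}

for name, (p, r, f1) in reported.items():
    recomputed = f1_score(p, r)
    print(f"{name}: recomputed {recomputed:.3f}, reported {f1:.2f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the fall in recall from 0.52 to 0.20 on the WEB-RADR reference dataset accounts for most of the F1 drop for the study's own system.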
format Online
Article
Text
id pubmed-7395913
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-7395913 2020-08-18 Prospective Evaluation of Adverse Event Recognition Systems in Twitter: Results from the Web-RADR Project Gattepaille, Lucie M. Hedfors Vidlin, Sara Bergvall, Tomas Pierce, Carrie E. Ellenius, Johan Drug Saf Original Research Article Springer International Publishing 2020-05-14 2020 /pmc/articles/PMC7395913/ /pubmed/32410156 http://dx.doi.org/10.1007/s40264-020-00942-3 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which permits any non-commercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc/4.0/.
title Prospective Evaluation of Adverse Event Recognition Systems in Twitter: Results from the Web-RADR Project
topic Original Research Article