Sorting Through the Safety Data Haystack: Using Machine Learning to Identify Individual Case Safety Reports in Social-Digital Media

Bibliographic Details
Main authors:	Comfort, Shaun; Perera, Sujan; Hudson, Zoe; Dorrell, Darren; Meireis, Shawman; Nagarajan, Meenakshi; Ramakrishnan, Cartic; Fine, Jennifer
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2018
Subjects:	Original Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5966485/ https://www.ncbi.nlm.nih.gov/pubmed/29446035 http://dx.doi.org/10.1007/s40264-018-0641-7

collection	PubMed
description	INTRODUCTION: There is increasing interest in social digital media (SDM) as a data source for pharmacovigilance activities; however, SDM is considered a low-information-content source of safety data. Because pharmacovigilance itself operates in a high-noise, lower-validity environment without objective 'gold standards' beyond process definitions, introducing large volumes of SDM into the pharmacovigilance workflow could exacerbate the problem of limited manual resources for adverse event identification and processing. Recent advances in medical informatics have produced methods for developing programs that can assist human experts in detecting valid individual case safety reports (ICSRs) within SDM.
OBJECTIVE: In this study, we developed rule-based and machine learning (ML) models for classifying ICSRs from SDM and compared their performance with that of human pharmacovigilance experts.
METHODS: We used a random sample from a collection of 311,189 SDM posts, sourced from Twitter, Tumblr, Facebook, and a range of news media blogs, that mentioned Roche products and brands in combination with common medical and scientific terms. This sample was used to develop and evaluate three iterations of an automated ICSR classifier. Each ICSR classifier model consisted of sub-components that annotate the relevant ICSR elements and a component that makes the final decision on whether a post is a valid ICSR. Agreement with human pharmacovigilance experts was chosen as the preferred performance metric and was evaluated by calculating the Gwet AC1 statistic (gKappa). The best-performing model was tested against the Roche global pharmacovigilance expert on a blind dataset and put through a time test on the full 311,189-post dataset.
RESULTS: The initial strict rule-based approach to ICSR classification produced a model with an accuracy of 65% and a gKappa of 46%. Adding an ML-based adverse event annotator improved accuracy to 74% and gKappa to 60%. Adding a further ML ICSR detector improved performance again. On a blind test set of 2500 posts, the final model achieved a gKappa of 78% and an accuracy of 83%. In the time test, the final model took 48 h to complete a task estimated to require 44,000 h of human expert effort.
CONCLUSION: The results of this study indicate that an effective and scalable solution to ICSR detection in SDM includes a workflow in which an automated ML classifier identifies likely ICSRs for further review by human subject matter experts (SMEs).
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1007/s40264-018-0641-7) contains supplementary material, which is available to authorized users.
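The abstract reports agreement with human experts as Gwet's AC1 ("gKappa") alongside accuracy. The sketch below shows the standard two-rater AC1 formula, AC1 = (p_a − p_e) / (1 − p_e), where p_a is observed agreement and p_e = (1 / (K − 1)) · Σ_k π_k(1 − π_k), with π_k the pooled proportion of labels in category k. It is a minimal illustration under that textbook definition, not the authors' implementation (the record does not describe their code); the function names `observed_agreement` and `gwet_ac1` and the toy labels are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Iterable, Sequence
from typing import Optional


def observed_agreement(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Fraction of items on which both raters assign the same label (p_a).

    When one rater is the expert reference, this equals the other rater's accuracy.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same, non-empty set of items")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def gwet_ac1(
    rater_a: Sequence[Hashable],
    rater_b: Sequence[Hashable],
    categories: Optional[Iterable[Hashable]] = None,
) -> float:
    """Gwet's AC1 agreement coefficient for two raters and K nominal categories."""
    p_a = observed_agreement(rater_a, rater_b)
    n = len(rater_a)
    observed = set(rater_a) | set(rater_b)
    cats = set(categories) if categories is not None else observed
    if not observed <= cats:
        raise ValueError("labels found outside the declared categories")
    k = len(cats)
    if k < 2:
        raise ValueError("AC1 needs at least two categories")
    # Pooled label counts over both raters; Counter returns 0 for unseen categories.
    counts = Counter(rater_a) + Counter(rater_b)
    # pi_k = counts[c] / (2n): average share of items the two raters put in category c.
    p_e = sum((counts[c] / (2 * n)) * (1 - counts[c] / (2 * n)) for c in cats) / (k - 1)
    # p_e is at most 1/K, so the denominator below is never zero.
    return (p_a - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical toy labels: 1 = valid ICSR, 0 = not an ICSR.
    expert = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    model = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
    print(f"accuracy = {observed_agreement(model, expert):.2f}")  # accuracy = 0.80
    print(f"AC1      = {gwet_ac1(model, expert):.4f}")            # AC1      = 0.6154
```

On these toy labels the chance term is p_e = 0.48, giving AC1 = 0.32 / 0.52 ≈ 0.615, whereas Cohen's kappa on the same data would be 0.28 / 0.48 ≈ 0.583. Because AC1's chance term is bounded by 1/K, it is generally less sensitive than Cohen's kappa to skewed label prevalence; the record does not say whether this motivated the authors' choice.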
id	pubmed-5966485
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Drug Saf
published online	2018-02-14
rights	© The Author(s) 2018. Open Access: This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Similar Items