ADE Eval: An Evaluation of Text Processing Systems for Adverse Event Extraction from Drug Labels for Pharmacovigilance
Main authors:
Format: Online Article Text
Language: English
Published: Springer International Publishing, 2020
Online access:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7813736/
https://www.ncbi.nlm.nih.gov/pubmed/33006728
http://dx.doi.org/10.1007/s40264-020-00996-3
Summary:

INTRODUCTION: The US FDA is interested in a tool that would enable pharmacovigilance safety evaluators to automate the identification of adverse drug events (ADEs) mentioned in FDA prescribing information. The MITRE Corporation (MITRE) and the FDA organized a shared task, Adverse Drug Event Evaluation (ADE Eval), to determine whether the performance of algorithms currently used for natural language processing (NLP) might be good enough for real-world use.

OBJECTIVE: ADE Eval was conducted to evaluate a range of NLP techniques for identifying ADEs mentioned in publicly available FDA-approved drug labels (package inserts). It was designed specifically to reflect pharmacovigilance practices within the FDA and to model possible pharmacovigilance use cases.

METHODS: Pharmacovigilance-specific annotation guidelines and annotated corpora were created. Two metrics modeled the experiences of FDA safety evaluators: one measured the ability of an algorithm to identify the correct Medical Dictionary for Regulatory Activities (MedDRA®) terms for the text from the annotated corpora, and the other assessed the quality of the evidence extracted from the corpora to support the selected MedDRA® term by measuring the portion of annotated text an algorithm correctly identified. A third metric assessed the cost of correcting system output for subsequent training (an averaged, weighted F1-measure for mention finding).

RESULTS: In total, 13 teams submitted 23 runs: the top MedDRA® coding F1-measure was 0.79, the top quality score was 0.96, and the top mention-finding F1-measure was 0.89.

CONCLUSION: While NLP techniques do not perform at levels that would allow them to be used without intervention, it is now worthwhile to explore making NLP outputs available in human pharmacovigilance workflows.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1007/s40264-020-00996-3) contains supplementary material, which is available to authorized users.
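For readers unfamiliar with the scoring terminology, the F1-measures cited in the results follow the standard definition, the harmonic mean of precision and recall. The abstract describes the mention-finding metric as an averaged, weighted F1-measure; the standard weighted form is shown below, but the specific weights used in ADE Eval are defined in the full paper and are not reproduced here:

$$\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad \mathrm{F1}_{\mathrm{weighted}} = \sum_{i} w_i \, \mathrm{F1}_i$$

where $w_i$ is the weight assigned to mention type $i$, with $\sum_i w_i = 1$.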