Development of a Pipeline for Adverse Drug Reaction Identification in Clinical Notes: Word Embedding Models and String Matching

Bibliographic Details
Main authors: Siegersma, Klaske R; Evers, Maxime; Bots, Sophie H; Groepenhoff, Floor; Appelman, Yolande; Hofstra, Leonard; Tulevski, Igor I; Somsen, G Aernout; den Ruijter, Hester M; Spruit, Marco; Onland-Moret, N Charlotte
Format: Online Article Text
Language: English
Published: JMIR Publications 2022
Subjects: Original Paper
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8826143/
https://www.ncbi.nlm.nih.gov/pubmed/35076407
http://dx.doi.org/10.2196/31063
author Siegersma, Klaske R
Evers, Maxime
Bots, Sophie H
Groepenhoff, Floor
Appelman, Yolande
Hofstra, Leonard
Tulevski, Igor I
Somsen, G Aernout
den Ruijter, Hester M
Spruit, Marco
Onland-Moret, N Charlotte
author_sort Siegersma, Klaske R
collection PubMed
description BACKGROUND: Knowledge about adverse drug reactions (ADRs) in the population is limited because of underreporting, which hampers surveillance and assessment of drug safety. Therefore, gathering accurate information about the incidence of ADRs from clinical notes is of great relevance. However, manual labeling of these notes is time-consuming, and automation can improve the use of free-text clinical notes for the identification of ADRs. Furthermore, tools for language processing in languages other than English are not widely available.
OBJECTIVE: The aim of this study is to design and evaluate a method for automatic extraction of medication and Adverse Drug Reaction Identification in Clinical Notes (ADRIN).
METHODS: Dutch free-text clinical notes (N=277,398) and medication registrations (N=499,435) from the Cardiology Centers of the Netherlands database were used. All clinical notes were used to develop word embedding models. Vector representations of word embedding models and string matching with a medical dictionary (Medical Dictionary for Regulatory Activities [MedDRA]) were used for identification of ADRs and medication in a test set of clinical notes that were manually labeled. Several settings, including search area and punctuation, could be adjusted in the prototype to identify its optimal version.
RESULTS: The ADRIN method was evaluated using a test set of 988 clinical notes written on the stop date of a drug. Multiple versions of the prototype were evaluated for a variety of tasks. Binary classification of ADR presence achieved the highest accuracy, 0.84. Reduced search area and inclusion of punctuation improved performance, whereas incorporation of the MedDRA did not improve the performance of the pipeline.
CONCLUSIONS: The ADRIN method and prototype are effective in recognizing ADRs in Dutch clinical notes from cardiac diagnostic screening centers. Surprisingly, incorporation of the MedDRA did not result in improved identification on top of word embedding models. The implementation of the ADRIN tool may help increase the identification of ADRs, resulting in better care and saving substantial health care costs.
format Online
Article
Text
id pubmed-8826143
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8826143 2022-02-11 JMIR Med Inform Original Paper JMIR Publications 2022-01-25 /pmc/articles/PMC8826143/ /pubmed/35076407 http://dx.doi.org/10.2196/31063 Text en
©Klaske R Siegersma, Maxime Evers, Sophie H Bots, Floor Groepenhoff, Yolande Appelman, Leonard Hofstra, Igor I Tulevski, G Aernout Somsen, Hester M den Ruijter, Marco Spruit, N Charlotte Onland-Moret. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 25.01.2022.
https://creativecommons.org/licenses/by/4.0/
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title Development of a Pipeline for Adverse Drug Reaction Identification in Clinical Notes: Word Embedding Models and String Matching
topic Original Paper