Cargando…
Evaluating Drug Risk Using GAN and SMOTE Based on CFDA's Spontaneous Reporting Data
Adverse drug reactions (ADRs) pose health threats to humans. Therefore, the risk re-evaluation of post-marketing drugs has become an important part of the pharmacovigilance work of various countries. In China, drugs are mainly divided into three categories, from high-risk to low-risk drugs, namely,...
Autores principales: | , , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Hindawi
2021
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8418931/ https://www.ncbi.nlm.nih.gov/pubmed/34493954 http://dx.doi.org/10.1155/2021/6033860 |
Sumario: | Adverse drug reactions (ADRs) pose health threats to humans. Therefore, the risk re-evaluation of post-marketing drugs has become an important part of the pharmacovigilance work of various countries. In China, drugs are mainly divided into three categories, from high-risk to low-risk drugs, namely, prescription drugs (Rx), over-the-counter drugs A (OTC-A), and over-the-counter drugs B (OTC-B). Until now, there has been a lack of automated evaluation methods for the three status switch of drugs. Based on China Food and Drug Administration's (CFDA) spontaneous reporting database (CSRD), we proposed a classification model to predict risk level of drugs by using feature enhancement based on Generative Adversarial Networks (GAN) and Synthetic Minority Over-Sampling Technique (SMOTE). A total of 985,960 spontaneous reports from 2011 to 2018 were selected from CSRD in Jiangsu Province as experimental data. After data preprocessing, a class-imbalance data set was obtained, which contained 887 Rx (accounting for 84.72%), 113 OTC-A (10.79%), and 47 OTC-B (4.49%). Taking drugs as the samples, ADRs as the features, and signal detection results obtained by proportional reporting ratio (PRR) method as the feature values, we constructed the original data matrix, where the last column represents the category label of each drug. Our proposed model expands the ADR data from both the sample space and the feature space. In terms of feature space, we use feature selection (FS) to screen ADR symptoms with higher importance scores. Then, we use GAN to generate artificial data, which are added to the feature space to achieve feature enhancement. In terms of sample space, we use SMOTE technology to expand the minority samples to balance three categories of drugs and minimize the classification deviation caused by the gap in the sample size. Finally, we use random forest (RF) algorithm to classify the feature-enhanced and balanced data set. The experimental results show that the accuracy of the proposed classification model reaches 98%. Our proposed model can well evaluate drug risk levels and provide automated methods for status switch of post-marketing drugs. |
---|