Mining FDA drug labels using an unsupervised learning technique - topic modeling


Bibliographic Details
Main authors: Bisgin, Halil; Liu, Zhichao; Fang, Hong; Xu, Xiaowei; Tong, Weida
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3236833/
https://www.ncbi.nlm.nih.gov/pubmed/22166012
http://dx.doi.org/10.1186/1471-2105-12-S10-S11
collection PubMed
description BACKGROUND: Drug labels approved by the Food and Drug Administration (FDA) contain a broad array of information, ranging from adverse drug reactions (ADRs) to drug efficacy, risk-benefit considerations, and more. However, this information is written as free text that often contains ambiguous semantic descriptions, which makes it difficult to retrieve useful information from the labeling consistently and accurately for comparative analysis across drugs. Consequently, this task has largely relied on experts manually reading the full text, which is time-consuming and labor-intensive.
METHOD: In this study, topic modeling, a novel text mining method that is unsupervised in nature, was applied to drug labeling with the goal of discovering “topics” that group together drugs with similar safety concerns and/or therapeutic uses. A total of 794 FDA-approved drug labels were used. First, three sections of each drug label (Boxed Warning, Warnings and Precautions, and Adverse Reactions) were processed with the Medical Dictionary for Regulatory Activities (MedDRA) to convert the free text of each label into standard ADR terms. Next, topic modeling with latent Dirichlet allocation (LDA) was applied to generate 100 topics, each associated with a set of drugs grouped together based on probability analysis. Lastly, the effectiveness of topic modeling was evaluated against known information about the therapeutic uses and safety data of the drugs.
RESULTS: The results demonstrate that drugs grouped by topic are associated with the same safety concerns and/or therapeutic uses with statistical significance (P<0.05). The identified topics have distinct contexts that can be directly linked to specific adverse events (e.g., liver injury or kidney injury) or therapeutic applications (e.g., antiinfectives for systemic use). We were also able to identify, via topics, potential adverse events that might arise from specific medications.
CONCLUSIONS: The successful application of topic modeling to FDA drug labeling demonstrates its potential utility as a means of hypothesis generation for inferring hidden relationships between concepts in biomedical documents, in this study drug safety and therapeutic use.
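The METHOD and RESULTS paragraphs compress a pipeline: treat each label as a bag of MedDRA ADR terms, fit LDA, group drugs by topic, then test whether a group is enriched for a known safety or therapeutic category. The following is a minimal, standard-library-only Python sketch of that idea, not the authors' implementation. The abstract does not say which LDA inference algorithm or which statistical test was used; collapsed Gibbs sampling and a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) are assumptions here, as are the hyperparameters `alpha` and `beta`, the "most probable topic" grouping rule, the function names `lda_gibbs` and `hypergeom_sf`, and the toy drugs and terms, which are all hypothetical.

```python
import math
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed inference method).

    docs maps a drug name to its list of ADR terms (one token per mention).
    Returns (theta, phi): per-drug topic distributions, per-topic term distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({t for terms in docs.values() for t in terms})
    w_index = {t: i for i, t in enumerate(vocab)}
    V, names = len(vocab), list(docs)
    n_dk = [[0] * n_topics for _ in names]       # drug-topic counts
    n_kw = [[0] * V for _ in range(n_topics)]    # topic-term counts
    n_k = [0] * n_topics                         # tokens per topic
    z = []                                       # topic assignment of every token
    for d, name in enumerate(names):             # random initialisation
        z.append([])
        for t in docs[name]:
            k = rng.randrange(n_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w_index[t]] += 1
            n_k[k] += 1
    for _ in range(n_iter):
        for d, name in enumerate(names):
            for i, t in enumerate(docs[name]):
                w, k = w_index[t], z[d][i]
                n_dk[d][k] -= 1                  # take the token out of the counts
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = j | all other assignments), unnormalised.
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                           for j in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k                      # put it back under the sampled topic
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    theta = {name: [(n_dk[d][k] + alpha) / (len(docs[name]) + n_topics * alpha)
                    for k in range(n_topics)] for d, name in enumerate(names)}
    phi = [{vocab[w]: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)}
           for k in range(n_topics)]
    return theta, phi


def hypergeom_sf(k, N, K, n):
    """P(X >= k), X ~ Hypergeometric: n drugs drawn from N, of which K carry the label."""
    hits = range(k, min(K, n) + 1)
    return sum(math.comb(K, i) * math.comb(N - K, n - i) for i in hits) / math.comb(N, n)


# Hypothetical toy labels: two "liver" drugs and two "kidney" drugs.
labels = {
    "drug_A": ["hepatotoxicity", "jaundice", "hepatitis"] * 3,
    "drug_B": ["hepatotoxicity", "jaundice", "hepatic failure"] * 3,
    "drug_C": ["renal failure", "nephrotoxicity", "proteinuria"] * 3,
    "drug_D": ["renal failure", "nephrotoxicity", "haematuria"] * 3,
}
theta, phi = lda_gibbs(labels, n_topics=2, seed=1)  # the study itself used 100 topics

# Assumed grouping rule: each drug joins its single most probable topic.
groups = defaultdict(list)
for drug, dist in theta.items():
    groups[dist.index(max(dist))].append(drug)

# Enrichment of a hypothetical known set, e.g. drugs with documented liver injury.
known_liver = {"drug_A", "drug_B"}
for topic, members in sorted(groups.items()):
    hits = len(known_liver.intersection(members))
    p = hypergeom_sf(hits, len(labels), len(known_liver), len(members))
    top_terms = sorted(phi[topic], key=phi[topic].get, reverse=True)[:3]
    print(topic, sorted(members), top_terms, round(p, 3))
```

With only four toy drugs the smallest attainable p-value is 1/6, so nothing here can reach P<0.05; a meaningful test needs the full label collection and a correction for the number of topics tested. Because LDA gives each drug a distribution over topics, the "most probable topic" rule is only one simple way to turn that distribution into groups.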
format Online
Article
Text
id pubmed-3236833
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3236833 2011-12-14 BMC Bioinformatics Proceedings BioMed Central 2011-10-18 /pmc/articles/PMC3236833/ /pubmed/22166012 http://dx.doi.org/10.1186/1471-2105-12-S10-S11 Text en Copyright ©2011 Bisgin et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Mining FDA drug labels using an unsupervised learning technique - topic modeling
topic Proceedings