BICEPP: an example-based statistical text mining method for predicting the binary characteristics of drugs

BACKGROUND: The identification of drug characteristics is a clinically important task, but it requires much expert knowledge and consumes substantial resources. We have developed a statistical text-mining approach (BInary Characteristics Extractor and biomedical Properties Predictor: BICEPP) to help...

Bibliographic Details
Main Authors: Lin, Frank PY; Anthony, Stephen; Polasek, Thomas M; Tsafnat, Guy; Doogue, Matthew P
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3110144/
https://www.ncbi.nlm.nih.gov/pubmed/21510898
http://dx.doi.org/10.1186/1471-2105-12-112
author Lin, Frank PY
Anthony, Stephen
Polasek, Thomas M
Tsafnat, Guy
Doogue, Matthew P
collection PubMed
description BACKGROUND: The identification of drug characteristics is a clinically important task, but it requires much expert knowledge and consumes substantial resources. We have developed a statistical text-mining approach (BInary Characteristics Extractor and biomedical Properties Predictor: BICEPP) to help experts screen drugs that may have important clinical characteristics of interest.
RESULTS: BICEPP first retrieves MEDLINE abstracts containing drug names, then selects tokens that best predict the list of drugs which represents the characteristic of interest. Machine learning is then used to classify drugs using a document frequency-based measure. Evaluation experiments were performed to validate BICEPP's performance on 484 characteristics of 857 drugs, identified from the Australian Medicines Handbook (AMH) and the PharmacoKinetic Interaction Screening (PKIS) database. Stratified cross-validations revealed that BICEPP was able to classify drugs into all 20 major therapeutic classes (100%) and 157 (of 197) minor drug classes (80%) with areas under the receiver operating characteristic curve (AUC) > 0.80. Similarly, AUC > 0.80 could be obtained in the classification of 173 (of 238) adverse events (73%), up to 12 (of 15) groups of clinically significant cytochrome P450 enzyme (CYP) inducers or inhibitors (80%), and up to 11 (of 14) groups of narrow therapeutic index drugs (79%). Interestingly, it was observed that the keywords used to describe a drug characteristic were not necessarily the most predictive ones for the classification task.
CONCLUSIONS: BICEPP has sufficient classification power to automatically distinguish a wide range of clinical properties of drugs. This may be used in pharmacovigilance applications to assist with rapid screening of large drug databases to identify important characteristics for further evaluation.
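The abstract above names two ideas that a small example may make concrete: a document frequency-based measure per drug, and evaluation by AUC. The following is a minimal sketch under stated assumptions, not the authors' implementation. The record does not specify how BICEPP selects tokens or which classifier it uses, so the token list is fixed by hand and the "classifier" is simply the mean document frequency of those tokens. All function names, drug labels (`drug_a` to `drug_d`), and abstract texts are hypothetical toy data.

```python
from collections import defaultdict


def document_frequencies(abstracts_by_drug):
    """Map each drug to {token: fraction of that drug's abstracts containing the token}.

    Assumes every drug has at least one abstract.
    """
    result = {}
    for drug, abstracts in abstracts_by_drug.items():
        counts = defaultdict(int)
        for text in abstracts:
            # Count each token at most once per abstract (document frequency).
            for token in set(text.lower().split()):
                counts[token] += 1
        result[drug] = {tok: n / len(abstracts) for tok, n in counts.items()}
    return result


def score_drugs(doc_freqs, tokens):
    """Score a drug as the mean document frequency of the chosen tokens (a stand-in measure)."""
    return {
        drug: sum(freqs.get(tok, 0.0) for tok in tokens) / len(tokens)
        for drug, freqs in doc_freqs.items()
    }


def auc(scores, positives):
    """AUC as the probability that a positive drug outscores a negative one (ties count 0.5)."""
    pos = [s for d, s in scores.items() if d in positives]
    neg = [s for d, s in scores.items() if d not in positives]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy corpus: invented texts, not real MEDLINE abstracts.
toy_abstracts = {
    "drug_a": ["liver injury reported", "elevated liver enzymes"],
    "drug_b": ["liver failure case", "renal clearance study"],
    "drug_c": ["renal clearance study", "blood pressure trial"],
    "drug_d": ["blood pressure trial", "headache reported"],
}
toy_positives = {"drug_a", "drug_b"}  # drugs assumed to have the characteristic
toy_tokens = ["liver"]                # hand-picked; BICEPP selects tokens automatically

toy_scores = score_drugs(document_frequencies(toy_abstracts), toy_tokens)
toy_auc = auc(toy_scores, toy_positives)
print(toy_scores, toy_auc)
```

In a realistic setting the tokens would be chosen on training folds only and the AUC computed on held-out drugs, which is what stratified cross-validation provides; the toy run above skips that split for brevity.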
format Online
Article
Text
id pubmed-3110144
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3110144 2011-06-08 BMC Bioinformatics Methodology Article BioMed Central 2011-04-21 /pmc/articles/PMC3110144/ /pubmed/21510898 http://dx.doi.org/10.1186/1471-2105-12-112 Text en
Copyright ©2011 Lin et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title BICEPP: an example-based statistical text mining method for predicting the binary characteristics of drugs
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3110144/
https://www.ncbi.nlm.nih.gov/pubmed/21510898
http://dx.doi.org/10.1186/1471-2105-12-112