
Phonetic Spelling Filter for Keyword Selection in Drug Mention Mining from Social Media



Bibliographic Details
Main authors: Pimpalkhute, Pranoti, Patki, Apurv, Nikfarjam, Azadeh, Gonzalez, Graciela
Format: Online Article Text
Language: English
Published: American Medical Informatics Association 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4333687/
https://www.ncbi.nlm.nih.gov/pubmed/25717407
Description
Summary: Social media postings are rich in information that often remains hidden and inaccessible to automatic extraction because of inherent limitations of the sites' APIs, which mostly restrict access to specific keyword-based searches (and limit both the number of keywords and the number of postings returned). When mining social media for drug mentions, one of the first problems to solve is how to derive a list of variants of the drug name (common misspellings) that captures a sufficient number of postings. We present an approach that filters the potential variants based on the intuition that, faced with the task of writing an unfamiliar, complex word (the drug name), users will tend to revert to phonetic spelling; we therefore give preference to variants that reflect the phonemes of the correct spelling. The algorithm allowed us to capture 50.4–56.0% of the user comments using only about 18% of the variants.
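To illustrate the filtering idea described in the summary, here is a minimal Python sketch. It is not the authors' implementation: it assumes candidate variants are all single-character edits (deletion, transposition, substitution, insertion) of the correct spelling, and it uses American Soundex as a stand-in phonetic key. The paper's actual variant generator and phonetic encoding may differ, and the helper names `soundex`, `edits1`, and `phonetic_variants` are hypothetical.

```python
import string

# Assumption: American Soundex is used here as a stand-in phonetic key;
# the paper's own phonetic filter may use a different encoding.
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of an ASCII word."""
    word = "".join(ch for ch in word.lower() if ch in string.ascii_lowercase)
    if not word:
        return ""
    result = word[0].upper()
    prev = _CODES.get(word[0], "")  # first letter's code counts for adjacency
    for ch in word[1:]:
        if ch in "hw":
            continue  # h/w do not separate identical codes
        code = _CODES.get(ch, "")
        if not code:
            prev = ""  # vowels (and y) separate identical codes
            continue
        if code != prev:
            result += code
        prev = code
        if len(result) == 4:
            break
    return result.ljust(4, "0")


def edits1(word: str) -> set:
    """Hypothetical variant generator: every string one edit away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def phonetic_variants(drug: str) -> list:
    """Keep only the edit-1 variants that share the drug name's Soundex code."""
    drug = drug.lower()
    target = soundex(drug)
    candidates = edits1(drug) - {drug}
    return sorted(v for v in candidates if soundex(v) == target)


if __name__ == "__main__":
    kept = phonetic_variants("seroquel")
    print(len(kept), "of", len(edits1("seroquel")), "candidates kept")
```

Soundex keeps the first letter and encodes the following consonants into at most three digits, so a variant such as "seraquel" survives the filter, while "zeroquel" (different first letter) and "seroquet" (different final consonant code) are dropped. A finer-grained phonetic algorithm would reject more candidates; Soundex is used here only because it is short to implement.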