
Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features

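Objective: Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, particularly for pharmacovigilance, via the use of natural language processing (NLP) techniques. However, the language in social media is highly informal, and user-expressed medical concepts are often nontechnical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and thus far, advanced machine learning-based NLP techniques have been underutilized. Our objective is to design a machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media.

Methods: We introduce ADRMine, a machine learning-based concept extraction system that uses conditional random fields (CRFs). ADRMine utilizes a variety of features, including a novel feature for modeling words’ semantic similarities. The similarities are modeled by clustering words based on unsupervised, pretrained word representation vectors (embeddings) generated from unlabeled user posts in social media using a deep learning technique.

Results: ADRMine outperforms several strong baseline systems in the ADR extraction task by achieving an F-measure of 0.82. Feature analysis demonstrates that the proposed word cluster features significantly improve extraction performance.

Conclusion: It is possible to extract complex medical concepts, with relatively high performance, from informal, user-generated content. Our approach is particularly scalable, suitable for social media mining, as it relies on large volumes of unlabeled data, thus diminishing the need for large, annotated training data sets.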
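The word cluster feature described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the record: the toy 2-D vectors, the use of k-means as the clustering method, the one-token context window, and the names `build_cluster_map` and `token_features`. A real system would load embeddings trained on unlabeled posts and pass the resulting feature dictionaries to a CRF toolkit.

```python
# Toy 2-D "embeddings" (hypothetical values, for illustration only).
TOY_EMBEDDINGS = {
    "headache": (0.9, 0.1),
    "great": (0.1, 0.9),
    "migraine": (1.0, 0.2),
    "awesome": (0.0, 1.0),
    "nausea": (0.8, 0.0),
    "fine": (0.2, 0.8),
}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_cluster_map(embeddings, k, iterations=10):
    """Cluster words with plain k-means and return {word: cluster_id}.

    Initialisation is deterministic: the first k vectors act as the
    starting centroids (an assumption chosen to keep the sketch simple).
    """
    words = list(embeddings)
    centroids = [embeddings[w] for w in words[:k]]
    assignment = {}
    for _ in range(iterations):
        # Assignment step: attach each word to its nearest centroid.
        assignment = {
            w: min(range(k), key=lambda c: squared_distance(embeddings[w], centroids[c]))
            for w in words
        }
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [embeddings[w] for w in words if assignment[w] == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment


def token_features(tokens, i, cluster_of):
    """Feature dict for token i: the word plus cluster IDs in a +/-1 window."""
    def cluster(j):
        return str(cluster_of.get(tokens[j].lower(), "UNK"))

    feats = {"word": tokens[i].lower(), "cluster": cluster(i)}
    if i > 0:
        feats["prev_cluster"] = cluster(i - 1)
    if i < len(tokens) - 1:
        feats["next_cluster"] = cluster(i + 1)
    return feats


if __name__ == "__main__":
    cluster_of = build_cluster_map(TOY_EMBEDDINGS, k=2)
    sentence = ["Terrible", "migraine", "and", "nausea"]
    for i in range(len(sentence)):
        print(token_features(sentence, i, cluster_of))
```

The point of the design is that words never seen with an ADR label in training can still share a cluster ID with labeled words, so the sequence labeler can generalize from one to the other.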

Bibliographic Details
Main Authors:	Nikfarjam, Azadeh; Sarker, Abeed; O’Connor, Karen; Ginn, Rachel; Gonzalez, Graciela
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2015
Subjects:	Research and Applications
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4457113/ https://www.ncbi.nlm.nih.gov/pubmed/25755127 http://dx.doi.org/10.1093/jamia/ocu041

_version_	1782374940064874496
author	Nikfarjam, Azadeh; Sarker, Abeed; O’Connor, Karen; Ginn, Rachel; Gonzalez, Graciela
author_facet	Nikfarjam, Azadeh; Sarker, Abeed; O’Connor, Karen; Ginn, Rachel; Gonzalez, Graciela
author_sort	Nikfarjam, Azadeh
collection	PubMed
description	Objective Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, particularly for pharmacovigilance, via the use of natural language processing (NLP) techniques. However, the language in social media is highly informal, and user-expressed medical concepts are often nontechnical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and thus far, advanced machine learning-based NLP techniques have been underutilized. Our objective is to design a machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media. Methods We introduce ADRMine, a machine learning-based concept extraction system that uses conditional random fields (CRFs). ADRMine utilizes a variety of features, including a novel feature for modeling words’ semantic similarities. The similarities are modeled by clustering words based on unsupervised, pretrained word representation vectors (embeddings) generated from unlabeled user posts in social media using a deep learning technique. Results ADRMine outperforms several strong baseline systems in the ADR extraction task by achieving an F-measure of 0.82. Feature analysis demonstrates that the proposed word cluster features significantly improve extraction performance. Conclusion It is possible to extract complex medical concepts, with relatively high performance, from informal, user-generated content. Our approach is particularly scalable, suitable for social media mining, as it relies on large volumes of unlabeled data, thus diminishing the need for large, annotated training data sets.
format	Online Article Text
id	pubmed-4457113
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-4457113 | 2016-05-01 | J Am Med Inform Assoc | Research and Applications | Oxford University Press | 2015-05 | 2015-03-09 | /pmc/articles/PMC4457113/ | /pubmed/25755127 | http://dx.doi.org/10.1093/jamia/ocu041 | Text | en | © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. | http://creativecommons.org/licenses/by-nc/4.0/ | This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features
topic	Research and Applications
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4457113/ https://www.ncbi.nlm.nih.gov/pubmed/25755127 http://dx.doi.org/10.1093/jamia/ocu041
