Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task

OBJECTIVE: We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective wa...

Bibliographic Details
Main authors: Sarker, Abeed; Belousov, Maksim; Friedrichs, Jasper; Hakala, Kai; Kiritchenko, Svetlana; Mehryary, Farrokh; Han, Sifei; Tran, Tung; Rios, Anthony; Kavuluru, Ramakanth; de Bruijn, Berry; Ginter, Filip; Mahata, Debanjan; Mohammad, Saif M; Nenadic, Goran; Gonzalez-Hernandez, Graciela
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6188524/
https://www.ncbi.nlm.nih.gov/pubmed/30272184
http://dx.doi.org/10.1093/jamia/ocy114
author Sarker, Abeed
Belousov, Maksim
Friedrichs, Jasper
Hakala, Kai
Kiritchenko, Svetlana
Mehryary, Farrokh
Han, Sifei
Tran, Tung
Rios, Anthony
Kavuluru, Ramakanth
de Bruijn, Berry
Ginter, Filip
Mahata, Debanjan
Mohammad, Saif M
Nenadic, Goran
Gonzalez-Hernandez, Graciela
author_sort Sarker, Abeed
collection PubMed
description OBJECTIVE: We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data. MATERIALS AND METHODS: We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR expressions. Training data consisted of 15 717 annotated tweets for (1), 10 260 for (2), and 6650 ADR phrases and identifiers for (3); and exhibited typical properties of social-media-based health-related texts. Systems were evaluated using 9961, 7513, and 2500 instances for the 3 subtasks, respectively. We evaluated performances of classes of methods and ensembles of system combinations following the shared tasks. RESULTS: Among 55 system runs, the best system scores for the 3 subtasks were 0.435 (ADR class F1-score) for subtask-1, 0.693 (micro-averaged F1-score over two classes) for subtask-2, and 88.5% (accuracy) for subtask-3. Ensembles of system combinations obtained best scores of 0.476, 0.702, and 88.7%, outperforming individual systems. DISCUSSION: Among individual systems, support vector machines and convolutional neural networks showed high performance. Performance gains achieved by ensembles of system combinations suggest that such strategies may be suitable for operational systems relying on difficult text classification tasks (eg, subtask-1). CONCLUSIONS: Data imbalance and lack of context remain challenges for natural language processing of social media text. Annotated data from the shared task have been made available as reference standards for future studies (http://dx.doi.org/10.17632/rxwfb3tysd.1).
format Online
Article
Text
id pubmed-6188524
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6188524 2018-10-19 J Am Med Inform Assoc Research and Applications Oxford University Press 2018-10-01 /pmc/articles/PMC6188524/ /pubmed/30272184 http://dx.doi.org/10.1093/jamia/ocy114 Text en © The Author(s) 2018. Published by Oxford University Press on behalf of the American Medical Informatics Association. http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task
topic Research and Applications