Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task
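Main authors: | Sarker, Abeed; Belousov, Maksim; Friedrichs, Jasper; Hakala, Kai; Kiritchenko, Svetlana; Mehryary, Farrokh; Han, Sifei; Tran, Tung; Rios, Anthony; Kavuluru, Ramakanth; de Bruijn, Berry; Ginter, Filip; Mahata, Debanjan; Mohammad, Saif M; Nenadic, Goran; Gonzalez-Hernandez, Graciela |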
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6188524/ https://www.ncbi.nlm.nih.gov/pubmed/30272184 http://dx.doi.org/10.1093/jamia/ocy114 |
author | Sarker, Abeed; Belousov, Maksim; Friedrichs, Jasper; Hakala, Kai; Kiritchenko, Svetlana; Mehryary, Farrokh; Han, Sifei; Tran, Tung; Rios, Anthony; Kavuluru, Ramakanth; de Bruijn, Berry; Ginter, Filip; Mahata, Debanjan; Mohammad, Saif M; Nenadic, Goran; Gonzalez-Hernandez, Graciela |
collection | PubMed |
description | OBJECTIVE: We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data. MATERIALS AND METHODS: We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR expressions. Training data consisted of 15 717 annotated tweets for (1), 10 260 for (2), and 6650 ADR phrases and identifiers for (3), and exhibited typical properties of social-media-based health-related texts. Systems were evaluated using 9961, 7513, and 2500 instances for the 3 subtasks, respectively. We evaluated performances of classes of methods and ensembles of system combinations following the shared tasks. RESULTS: Among 55 system runs, the best system scores for the 3 subtasks were 0.435 (ADR class F1-score) for subtask-1, 0.693 (micro-averaged F1-score over two classes) for subtask-2, and 88.5% (accuracy) for subtask-3. Ensembles of system combinations obtained best scores of 0.476, 0.702, and 88.7%, outperforming individual systems. DISCUSSION: Among individual systems, support vector machines and convolutional neural networks showed high performance. Performance gains achieved by ensembles of system combinations suggest that such strategies may be suitable for operational systems relying on difficult text classification tasks (eg, subtask-1). CONCLUSIONS: Data imbalance and lack of context remain challenges for natural language processing of social media text. Annotated data from the shared task have been made available as reference standards for future studies (http://dx.doi.org/10.17632/rxwfb3tysd.1). |
format | Online Article Text |
id | pubmed-6188524 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6188524; 2018-10-19; J Am Med Inform Assoc; Research and Applications; Oxford University Press; 2018-10-01; /pmc/articles/PMC6188524/; /pubmed/30272184; http://dx.doi.org/10.1093/jamia/ocy114; Text; en; © The Author(s) 2018. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
topic | Research and Applications |
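The abstract reports three different scores: an F1-score for the ADR class alone (subtask-1), a micro-averaged F1-score over two classes (subtask-2), and accuracy (subtask-3). It also states that ensembles of system combinations outperformed individual systems. As an illustration only, the Python sketch below shows one way such scores could be computed, together with a per-instance majority vote over several systems' predictions. All function names and label strings ("ADR", "intake", "possible", "none") are hypothetical. The sketch assumes that "over two classes" means true positives, false positives, and false negatives are pooled across the two scored classes while any remaining class is left unscored. Majority voting is an assumed ensembling rule, not necessarily the procedure used in the paper, and the shared task's official evaluation scripts may differ in details.

```python
from collections import Counter
from typing import Collection, Hashable, Sequence

Label = Hashable


def _f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def class_f1(gold: Sequence[Label], pred: Sequence[Label], positive: Label) -> float:
    """F1-score of one class only (e.g. a hypothetical 'ADR' label)."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    return _f1(tp, fp, fn)


def micro_f1(gold: Sequence[Label], pred: Sequence[Label], scored: Collection[Label]) -> float:
    """Micro-averaged F1 with TP/FP/FN pooled over the classes in `scored` only.
    Assumption: labels outside `scored` never count as true positives."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p in scored:
            if p == g:
                tp += 1
            else:
                fp += 1
        if g in scored and p != g:
            fn += 1
    return _f1(tp, fp, fn)


def accuracy(gold: Sequence[Label], pred: Sequence[Label]) -> float:
    """Share of instances whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def majority_vote(system_predictions: Sequence[Sequence[Label]]) -> list:
    """Combine systems by per-instance majority vote (an assumed ensembling rule).
    Ties go to the label seen first, i.e. from the earliest-listed system."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*system_predictions)]


if __name__ == "__main__":
    # Hypothetical toy labels; not shared-task data.
    gold = ["ADR", "noADR", "ADR", "noADR", "ADR"]
    systems = [
        ["ADR", "ADR", "noADR", "noADR", "ADR"],
        ["ADR", "noADR", "noADR", "noADR", "noADR"],
        ["noADR", "noADR", "ADR", "ADR", "ADR"],
    ]
    ensemble = majority_vote(systems)
    print(ensemble)  # ['ADR', 'noADR', 'noADR', 'noADR', 'ADR']
    print(round(class_f1(gold, ensemble, "ADR"), 3))  # 0.8

    gold3 = ["intake", "possible", "none", "intake", "none", "possible"]
    pred3 = ["intake", "none", "possible", "intake", "none", "possible"]
    print(micro_f1(gold3, pred3, {"intake", "possible"}))  # 0.75
    print(round(accuracy(gold3, pred3), 3))  # 0.667
```

In this toy run the vote-based ensemble reaches an ADR-class F1 of 0.8, while the three individual systems reach about 0.667, 0.5, and 0.667. This only illustrates the mechanism of combining systems and says nothing about the shared-task results themselves.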