Tweet Classification Toward Twitter-Based Disease Surveillance: New Data, Methods, and Evaluations

BACKGROUND: The amount of medical and clinical-related information on the Web is increasing. Among the different types of information available, social media–based data obtained directly from people are particularly valuable and are attracting significant attention. To encourage medical natural lang...

Bibliographic Details
Main Authors:	Wakamiya, Shoko; Morita, Mizuki; Kano, Yoshinobu; Ohkuma, Tomoko; Aramaki, Eiji
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2019
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6401666/ https://www.ncbi.nlm.nih.gov/pubmed/30785407 http://dx.doi.org/10.2196/12783

_version_	1783400195248095232
author	Wakamiya, Shoko Morita, Mizuki Kano, Yoshinobu Ohkuma, Tomoko Aramaki, Eiji
author_sort	Wakamiya, Shoko
collection	PubMed
description	BACKGROUND: The amount of medical and clinical-related information on the Web is increasing. Among the different types of information available, social media–based data obtained directly from people are particularly valuable and are attracting significant attention. To encourage medical natural language processing (NLP) research exploiting social media data, the 13th NII Testbeds and Community for Information access Research (NTCIR-13) Medical natural language processing for Web document (MedWeb) provides pseudo-Twitter messages in a cross-language and multi-label corpus, covering 3 languages (Japanese, English, and Chinese) and annotated with 8 symptom labels (such as cold, fever, and flu). Then, participants classify each tweet into 1 of the 2 categories: those containing a patient’s symptom and those that do not. OBJECTIVE: This study aimed to present the results of groups participating in a Japanese subtask, English subtask, and Chinese subtask along with discussions, to clarify the issues that need to be resolved in the field of medical NLP. METHODS: In summary, 8 groups (19 systems) participated in the Japanese subtask, 4 groups (12 systems) participated in the English subtask, and 2 groups (6 systems) participated in the Chinese subtask. In total, 2 baseline systems were constructed for each subtask. The performance of the participant and baseline systems was assessed using the exact match accuracy, F-measure based on precision and recall, and Hamming loss. RESULTS: The best system achieved 0.880 in exact match accuracy, 0.920 in F-measure, and 0.019 in Hamming loss. The averages of exact match accuracy, F-measure, and Hamming loss for the Japanese subtask were 0.720, 0.820, and 0.051; those for the English subtask were 0.770, 0.850, and 0.037; and those for the Chinese subtask were 0.810, 0.880, and 0.032, respectively. CONCLUSIONS: This paper presented and discussed the performance of systems participating in the NTCIR-13 MedWeb task. As the MedWeb task settings can be formalized as the factualization of text, the achievement of this task could be directly applied to practical clinical applications.
format	Online Article Text
id	pubmed-6401666
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-6401666 2019-03-29 Tweet Classification Toward Twitter-Based Disease Surveillance: New Data, Methods, and Evaluations Wakamiya, Shoko Morita, Mizuki Kano, Yoshinobu Ohkuma, Tomoko Aramaki, Eiji J Med Internet Res Original Paper JMIR Publications 2019-02-20 /pmc/articles/PMC6401666/ /pubmed/30785407 http://dx.doi.org/10.2196/12783 Text en ©Shoko Wakamiya, Mizuki Morita, Yoshinobu Kano, Tomoko Ohkuma, Eiji Aramaki. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 20.02.2019. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
spellingShingle	Original Paper Wakamiya, Shoko Morita, Mizuki Kano, Yoshinobu Ohkuma, Tomoko Aramaki, Eiji Tweet Classification Toward Twitter-Based Disease Surveillance: New Data, Methods, and Evaluations
title	Tweet Classification Toward Twitter-Based Disease Surveillance: New Data, Methods, and Evaluations
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6401666/ https://www.ncbi.nlm.nih.gov/pubmed/30785407 http://dx.doi.org/10.2196/12783
