Task reformulation and data-centric approach for Twitter medication name extraction

Automatically extracting medication names from tweets is challenging in the real world. There are many tweets; however, only a small proportion mentions medications. Thus, datasets are usually highly imbalanced. Moreover, the length of tweets is very short, which makes it hard to recognize medicatio...

Bibliographic Details
Main Authors:	Zhang, Yu; Lee, Jong Kang; Han, Jen-Chieh; Tsai, Richard Tzong-Han
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2022
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9397573/ https://www.ncbi.nlm.nih.gov/pubmed/35998105 http://dx.doi.org/10.1093/database/baac067

author	Zhang, Yu; Lee, Jong Kang; Han, Jen-Chieh; Tsai, Richard Tzong-Han
collection	PubMed
description	Automatically extracting medication names from tweets is challenging in the real world. There are many tweets; however, only a small proportion mentions medications. Thus, datasets are usually highly imbalanced. Moreover, the length of tweets is very short, which makes it hard to recognize medication names from the limited context. This paper proposes a data-centric approach for extracting medications in the BioCreative VII Track 3 (Automatic Extraction of Medication Names in Tweets). Our approach formulates the sequence labeling problem as text entailment and question–answer tasks. As a result, without using the dictionary and ensemble method, our single model achieved a Strict F1 of 0.77 (the official baseline system is 0.758, and the average performance of participants is 0.696). Moreover, combining the dictionary filtering and ensemble method achieved a Strict F1 of 0.804 and had the highest performance for all participants. Furthermore, domain-specific and task-specific pretrained language models, as well as data-centric approaches, are proposed for further improvements. Database URL https://competitions.codalab.org/competitions/23925 and https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-3/
format	Online Article Text
id	pubmed-9397573
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-9397573 2022-08-24 Database (Oxford) Original Article Oxford University Press 2022-08-23 /pmc/articles/PMC9397573/ /pubmed/35998105 http://dx.doi.org/10.1093/database/baac067 Text en © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Task reformulation and data-centric approach for Twitter medication name extraction
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9397573/ https://www.ncbi.nlm.nih.gov/pubmed/35998105 http://dx.doi.org/10.1093/database/baac067
