Identifying COVID-19 english informative tweets using limited labelled data
Identifying COVID-19 informative tweets is very useful in building monitoring systems to track the latest updates. Existing approaches to identify informative tweets rely on a large number of labelled tweets to achieve good performance. As labelling is an expensive and laborious process, there is a...
Main Authors: | Kothuru, Srinivasulu; Santhanavijayan, A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Vienna 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9844936/ https://www.ncbi.nlm.nih.gov/pubmed/36686376 http://dx.doi.org/10.1007/s13278-023-01025-8 |
_version_ | 1784870772390494208 |
---|---|
author | Kothuru, Srinivasulu Santhanavijayan, A. |
author_facet | Kothuru, Srinivasulu Santhanavijayan, A. |
author_sort | Kothuru, Srinivasulu |
collection | PubMed |
description | Identifying COVID-19 informative tweets is very useful in building monitoring systems to track the latest updates. Existing approaches to identify informative tweets rely on a large number of labelled tweets to achieve good performance. As labelling is an expensive and laborious process, there is a need to develop approaches that can identify COVID-19 informative tweets using limited labelled data. In this paper, we propose a simple yet novel labelled data-efficient approach that achieves the state-of-the-art (SOTA) F1-score of 91.23 on the WNUT COVID-19 dataset using just 1000 tweets (14.3% of the full training set). Our labelled data-efficient approach starts with limited labelled data, augments it using data augmentation methods and then fine-tunes the model on the augmented data set. It is the first work to approach the task of identifying COVID-19 English informative tweets using limited labelled data while still achieving new SOTA performance. |
format | Online Article Text |
id | pubmed-9844936 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Springer Vienna |
record_format | MEDLINE/PubMed |
spelling | pubmed-98449362023-01-18 Identifying COVID-19 english informative tweets using limited labelled data Kothuru, Srinivasulu Santhanavijayan, A. Soc Netw Anal Min Original Article Identifying COVID-19 informative tweets is very useful in building monitoring systems to track the latest updates. Existing approaches to identify informative tweets rely on a large number of labelled tweets to achieve good performances. As labelling is an expensive and laborious process, there is a need to develop approaches that can identify COVID-19 informative tweets using limited labelled data. In this paper, we propose a simple yet novel labelled data-efficient approach that achieves the state-of-the-art (SOTA) F1-score of 91.23 on the WNUT COVID-19 dataset using just 1000 tweets (14.3% of the full training set). Our labelled data-efficient approach starts with limited labelled data, augment it using data augmentation methods and then fine-tune the model using augmented data set. It is the first work to approach the task of identifying COVID-19 English informative tweets using limited labelled data yet achieve the new SOTA performance. Springer Vienna 2023-01-17 2023 /pmc/articles/PMC9844936/ /pubmed/36686376 http://dx.doi.org/10.1007/s13278-023-01025-8 Text en © The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2023, Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. 
These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Original Article Kothuru, Srinivasulu Santhanavijayan, A. Identifying COVID-19 english informative tweets using limited labelled data |
title | Identifying COVID-19 english informative tweets using limited labelled data |
title_full | Identifying COVID-19 english informative tweets using limited labelled data |
title_fullStr | Identifying COVID-19 english informative tweets using limited labelled data |
title_full_unstemmed | Identifying COVID-19 english informative tweets using limited labelled data |
title_short | Identifying COVID-19 english informative tweets using limited labelled data |
title_sort | identifying covid-19 english informative tweets using limited labelled data |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9844936/ https://www.ncbi.nlm.nih.gov/pubmed/36686376 http://dx.doi.org/10.1007/s13278-023-01025-8 |
work_keys_str_mv | AT kothurusrinivasulu identifyingcovid19englishinformativetweetsusinglimitedlabelleddata AT santhanavijayana identifyingcovid19englishinformativetweetsusinglimitedlabelleddata |
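
The abstract in this record describes a three-step pipeline: start from a small labelled set, enlarge it with data augmentation, then fine-tune a model on the enlarged set. The record does not say which augmentation methods or which model the authors used, so the following is only a minimal sketch of the augmentation step under stated assumptions: the random-swap and random-deletion operations, the function names (`random_swap`, `random_delete`, `augment`) and the example tweets are all hypothetical and are not taken from the paper.

```python
import random


def random_swap(tokens, rng):
    """Return a copy of tokens with two randomly chosen positions swapped."""
    if len(tokens) < 2:
        return list(tokens)
    i, j = rng.sample(range(len(tokens)), 2)
    out = list(tokens)
    out[i], out[j] = out[j], out[i]
    return out


def random_delete(tokens, rng, p=0.1):
    """Drop each token with probability p, always keeping at least one token."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(dataset, copies=2, seed=0):
    """Assumed augmentation step: keep every original (text, label) pair and
    add `copies` perturbed variants of each text with the same label."""
    rng = random.Random(seed)
    augmented = list(dataset)
    for text, label in dataset:
        tokens = text.split()
        if not tokens:
            continue  # nothing to perturb in an empty text
        for _ in range(copies):
            op = rng.choice([random_swap, random_delete])
            augmented.append((" ".join(op(tokens, rng)), label))
    return augmented


# Hypothetical labelled tweets (1 = informative, 0 = uninformative).
seed_tweets = [
    ("health ministry confirms 120 new cases in the region today", 1),
    ("stay safe everyone and wash your hands", 0),
]

train_set = augment(seed_tweets, copies=2)
# A classifier would then be fine-tuned on train_set; that step is omitted
# here because the record does not name the model used.
```

With `copies=2`, each labelled tweet contributes itself plus two perturbed variants, so the training set triples in size while every variant keeps the label of its source tweet.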