CLICK-ID: A novel dataset for Indonesian clickbait headlines
Main authors: | William, Andika; Sari, Yunita |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier, 2020 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7479324/ https://www.ncbi.nlm.nih.gov/pubmed/32939383 http://dx.doi.org/10.1016/j.dib.2020.106231 |
author | William, Andika; Sari, Yunita |
---|---|
collection | PubMed |
description | News analysis is a popular task in Natural Language Processing (NLP). In particular, the problem of clickbait in news analysis has gained attention in recent years [1, 2]. However, most of this work has focused on English news, for which rich representative resources already exist. For other languages, such as Indonesian, resources for clickbait tasks are still lacking. Therefore, we introduce the CLICK-ID dataset of Indonesian news headlines extracted from 12 Indonesian online news publishers. It comprises 15,000 headlines annotated with clickbait and non-clickbait labels. Using the CLICK-ID dataset, we then developed an Indonesian clickbait classification model that achieves favourable performance. We believe that this corpus will be useful for replicable experiments in clickbait detection and in other areas of NLP. |
format | Online Article Text |
id | pubmed-7479324 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-7479324 (2020-09-15); Data Brief, Data Article; Elsevier, 2020-08-27; © 2020 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
title | CLICK-ID: A novel dataset for Indonesian clickbait headlines |
topic | Data Article |