
CLICK-ID: A novel dataset for Indonesian clickbait headlines

News analysis is a popular task in Natural Language Processing (NLP). In particular, the problem of clickbait in news analysis has gained attention in recent years [1, 2]. However, most of this work has focused on English news, for which rich, representative resources already exist. F...


Bibliographic Details
Main Authors: William, Andika; Sari, Yunita
Format: Online Article Text
Language: English
Published: Elsevier 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7479324/
https://www.ncbi.nlm.nih.gov/pubmed/32939383
http://dx.doi.org/10.1016/j.dib.2020.106231
_version_ 1783580247247028224
author William, Andika
Sari, Yunita
author_facet William, Andika
Sari, Yunita
author_sort William, Andika
collection PubMed
description News analysis is a popular task in Natural Language Processing (NLP). In particular, the problem of clickbait in news analysis has gained attention in recent years [1, 2]. However, most of this work has focused on English news, for which rich, representative resources already exist. For other languages, such as Indonesian, resources for clickbait tasks are still lacking. Therefore, we introduce the CLICK-ID dataset of Indonesian news headlines extracted from 12 Indonesian online news publishers. It comprises 15,000 annotated headlines with clickbait and non-clickbait labels. Using the CLICK-ID dataset, we then developed an Indonesian clickbait classification model achieving favourable performance. We believe that this corpus will be useful for replicable experiments in clickbait detection or other experiments in NLP areas.
format Online
Article
Text
id pubmed-7479324
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-74793242020-09-15 CLICK-ID: A novel dataset for Indonesian clickbait headlines William, Andika Sari, Yunita Data Brief Data Article News analysis is a popular task in Natural Language Processing (NLP). In particular, the problem of clickbait in news analysis has gained attention in recent years [1, 2]. However, the majority of the tasks has been focused on English news, in which there is already a rich representative resource. For other languages, such as Indonesian, there is still a lack of resource for clickbait tasks. Therefore, we introduce the CLICK-ID dataset of Indonesian news headlines extracted from 12 Indonesian online news publishers. It is comprised of 15,000 annotated headlines with clickbait and non-clickbait labels. Using the CLICK-ID dataset, we then developed an Indonesian clickbait classification model achieving favourable performance. We believe that this corpus will be useful for replicable experiments in clickbait detection or other experiments in NLP areas. Elsevier 2020-08-27 /pmc/articles/PMC7479324/ /pubmed/32939383 http://dx.doi.org/10.1016/j.dib.2020.106231 Text en © 2020 The Authors. Published by Elsevier Inc. http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Data Article
William, Andika
Sari, Yunita
CLICK-ID: A novel dataset for Indonesian clickbait headlines
title CLICK-ID: A novel dataset for Indonesian clickbait headlines
title_full CLICK-ID: A novel dataset for Indonesian clickbait headlines
title_fullStr CLICK-ID: A novel dataset for Indonesian clickbait headlines
title_full_unstemmed CLICK-ID: A novel dataset for Indonesian clickbait headlines
title_short CLICK-ID: A novel dataset for Indonesian clickbait headlines
title_sort click-id: a novel dataset for indonesian clickbait headlines
topic Data Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7479324/
https://www.ncbi.nlm.nih.gov/pubmed/32939383
http://dx.doi.org/10.1016/j.dib.2020.106231
work_keys_str_mv AT williamandika clickidanoveldatasetforindonesianclickbaitheadlines
AT sariyunita clickidanoveldatasetforindonesianclickbaitheadlines