A data package for abstractive opinion summarization, title generation, and rating-based sentiment prediction for airline reviews

Bibliographic Details
Main Authors: Syed, Ayesha Ayub; Gaol, Ford Lumban; Boediman, Alfred; Matsuo, Tokuro; Budiharto, Widodo
Format: Online Article Text
Language: English
Published: Elsevier 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10504493/
https://www.ncbi.nlm.nih.gov/pubmed/37720686
http://dx.doi.org/10.1016/j.dib.2023.109535
collection PubMed
description Customer reviews are valuable resources containing customers' opinions and sentiments toward a product. Reviews are informative but can be lengthy or repetitive, which calls for opinion summarization systems that retain only the significant opinion information from a review. Abstractive summarization is a form of text summarization that generates a summary mimicking a human-written one [1]. When pretrained language models are fine-tuned for abstractive review summarization, a problem known as ‘domain shift’ usually arises because the source and target domains have different data distributions [2]. This degrades the model's performance on the target domain. This paper contributes a data package comprising three datasets: an annotated abstractive summarization dataset (annotated_abs_summ) of 500 airline review and abstractive summary pairs; a dataset (review_titles_data) of 7079 review and review title pairs, intended for review title generation or for domain adaptive training [3] to address the domain shift problem in abstractive opinion summarization; and an annotated reviews dataset (annotated_sentiment) for rating-based sentiment classification. All datasets were collected from the Skytrax Review Portal via web scraping using the Python programming language. The datasets have several potential use cases. The abstractive summarization dataset can serve as a benchmark for airline review summarization. The dataset for domain adaptive training can also be used on its own for review title generation. The sentiment analysis dataset is multipurpose: its columns, such as user rating and recommendation value, can be used for statistical analysis (for example, finding correlations between these data items) as well as for other Natural Language Processing (NLP) tasks, such as predicting the rating or recommendation value from a customer review.
The datasets can be extended using various data augmentation techniques [4,5]. Moreover, because the datasets are related, they can be used together to develop a multi-task learning model [6] for better learning efficiency and improved performance.
id pubmed-10504493
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Data Brief (Data Article), Elsevier, 2023-09-01
rights © 2023 The Author(s). This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).
topic Data Article