
QAScore—An Unsupervised Unreferenced Metric for the Question Generation Evaluation


Bibliographic Details
Main Authors: Ji, Tianbo, Lyu, Chenyang, Jones, Gareth, Zhou, Liting, Graham, Yvette
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9689497/
https://www.ncbi.nlm.nih.gov/pubmed/36359608
http://dx.doi.org/10.3390/e24111514
_version_ 1784836549949521920
author Ji, Tianbo
Lyu, Chenyang
Jones, Gareth
Zhou, Liting
Graham, Yvette
author_facet Ji, Tianbo
Lyu, Chenyang
Jones, Gareth
Zhou, Liting
Graham, Yvette
author_sort Ji, Tianbo
collection PubMed
description Question Generation (QG) aims to automate the task of composing questions for a passage with a set of chosen answers found within the passage. In recent years, the introduction of neural generation models has resulted in substantial improvements in the quality of automatically generated questions, especially compared to traditional approaches that rely on manually crafted heuristics. However, current QG evaluation metrics rely solely on comparing the generated questions with references, ignoring the passages and answers, and they are widely criticized for their low agreement with human judgement. We therefore propose a new reference-free evaluation metric, QAScore, which provides a better mechanism for evaluating QG systems. QAScore evaluates a question by computing the cross entropy of the probabilities that a language model assigns to the masked words of the answer to that question. Compared with existing metrics such as BLEU and BERTScore, QAScore achieves a stronger correlation with human judgement in our human evaluation experiment, meaning that applying QAScore to the QG task leads to higher evaluation accuracy.
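The scoring mechanism described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: the function name `qascore` and the toy probability lists are hypothetical, and the actual metric obtains each token probability from a pretrained masked language model conditioned on the passage, the generated question, and the remaining answer tokens.

```python
import math

def qascore(answer_token_probs):
    """Sketch of the QAScore idea: sum the log-probabilities a masked
    language model assigns to each masked answer token (i.e. the negative
    cross entropy). A higher, less negative score means the model recovers
    the answer more easily, suggesting the question fits the answer better."""
    return sum(math.log(p) for p in answer_token_probs)

# Toy illustration with made-up LM probabilities for two candidate questions:
good_question_probs = [0.9, 0.8, 0.95]  # model confidently recovers the answer
poor_question_probs = [0.2, 0.1, 0.3]   # model struggles to recover the answer

assert qascore(good_question_probs) > qascore(poor_question_probs)
```

Because the score depends only on a language model's predictions, no reference questions are needed, which is what makes the metric reference-free and unsupervised.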
format Online
Article
Text
id pubmed-9689497
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9689497 2022-11-25 QAScore—An Unsupervised Unreferenced Metric for the Question Generation Evaluation Ji, Tianbo; Lyu, Chenyang; Jones, Gareth; Zhou, Liting; Graham, Yvette. Entropy (Basel) Article. MDPI 2022-10-24 /pmc/articles/PMC9689497/ /pubmed/36359608 http://dx.doi.org/10.3390/e24111514 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Ji, Tianbo
Lyu, Chenyang
Jones, Gareth
Zhou, Liting
Graham, Yvette
QAScore—An Unsupervised Unreferenced Metric for the Question Generation Evaluation
title QAScore—An Unsupervised Unreferenced Metric for the Question Generation Evaluation
title_full QAScore—An Unsupervised Unreferenced Metric for the Question Generation Evaluation
title_fullStr QAScore—An Unsupervised Unreferenced Metric for the Question Generation Evaluation
title_full_unstemmed QAScore—An Unsupervised Unreferenced Metric for the Question Generation Evaluation
title_short QAScore—An Unsupervised Unreferenced Metric for the Question Generation Evaluation
title_sort qascore—an unsupervised unreferenced metric for the question generation evaluation
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9689497/
https://www.ncbi.nlm.nih.gov/pubmed/36359608
http://dx.doi.org/10.3390/e24111514
work_keys_str_mv AT jitianbo qascoreanunsupervisedunreferencedmetricforthequestiongenerationevaluation
AT lyuchenyang qascoreanunsupervisedunreferencedmetricforthequestiongenerationevaluation
AT jonesgareth qascoreanunsupervisedunreferencedmetricforthequestiongenerationevaluation
AT zhouliting qascoreanunsupervisedunreferencedmetricforthequestiongenerationevaluation
AT grahamyvette qascoreanunsupervisedunreferencedmetricforthequestiongenerationevaluation