QAScore—An Unsupervised Unreferenced Metric for the Question Generation Evaluation
Question Generation (QG) aims to automate the task of composing questions for a passage with a set of chosen answers found within the passage. In recent years, the introduction of neural generation models has resulted in substantial improvements of automatically generated questions in terms of quality…
Main Authors: | Ji, Tianbo; Lyu, Chenyang; Jones, Gareth; Zhou, Liting; Graham, Yvette |
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2022 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9689497/ https://www.ncbi.nlm.nih.gov/pubmed/36359608 http://dx.doi.org/10.3390/e24111514 |
_version_ | 1784836549949521920 |
author | Ji, Tianbo Lyu, Chenyang Jones, Gareth Zhou, Liting Graham, Yvette |
author_facet | Ji, Tianbo Lyu, Chenyang Jones, Gareth Zhou, Liting Graham, Yvette |
author_sort | Ji, Tianbo |
collection | PubMed |
description | Question Generation (QG) aims to automate the task of composing questions for a passage with a set of chosen answers found within the passage. In recent years, the introduction of neural generation models has resulted in substantial improvements in the quality of automatically generated questions, especially compared to traditional approaches that employ manually crafted heuristics. However, current QG evaluation metrics rely solely on the comparison between the generated questions and references, ignoring the passages or answers. Moreover, these metrics are widely criticized for their low agreement with human judgement. We therefore propose a new reference-free evaluation metric called QAScore, which is capable of providing a better mechanism for evaluating QG systems. QAScore evaluates a question by computing the cross entropy according to the probability that the language model can correctly generate the masked words in the answer to that question. Compared to existing metrics such as BLEU and BERTScore, QAScore obtains a stronger correlation with human judgement according to our human evaluation experiment, meaning that applying QAScore in the QG task leads to a higher level of evaluation accuracy. |
format | Online Article Text |
id | pubmed-9689497 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-96894972022-11-25 QAScore—An Unsupervised Unreferenced Metric for the Question Generation Evaluation Ji, Tianbo Lyu, Chenyang Jones, Gareth Zhou, Liting Graham, Yvette Entropy (Basel) Article Question Generation (QG) aims to automate the task of composing questions for a passage with a set of chosen answers found within the passage. In recent years, the introduction of neural generation models has resulted in substantial improvements in the quality of automatically generated questions, especially compared to traditional approaches that employ manually crafted heuristics. However, current QG evaluation metrics rely solely on the comparison between the generated questions and references, ignoring the passages or answers. Moreover, these metrics are widely criticized for their low agreement with human judgement. We therefore propose a new reference-free evaluation metric called QAScore, which is capable of providing a better mechanism for evaluating QG systems. QAScore evaluates a question by computing the cross entropy according to the probability that the language model can correctly generate the masked words in the answer to that question. Compared to existing metrics such as BLEU and BERTScore, QAScore obtains a stronger correlation with human judgement according to our human evaluation experiment, meaning that applying QAScore in the QG task leads to a higher level of evaluation accuracy. MDPI 2022-10-24 /pmc/articles/PMC9689497/ /pubmed/36359608 http://dx.doi.org/10.3390/e24111514 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Ji, Tianbo Lyu, Chenyang Jones, Gareth Zhou, Liting Graham, Yvette QAScore—An Unsupervised Unreferenced Metric for the Question Generation Evaluation |
title | QAScore—An Unsupervised Unreferenced Metric for the Question Generation Evaluation |
title_full | QAScore—An Unsupervised Unreferenced Metric for the Question Generation Evaluation |
title_fullStr | QAScore—An Unsupervised Unreferenced Metric for the Question Generation Evaluation |
title_full_unstemmed | QAScore—An Unsupervised Unreferenced Metric for the Question Generation Evaluation |
title_short | QAScore—An Unsupervised Unreferenced Metric for the Question Generation Evaluation |
title_sort | qascore—an unsupervised unreferenced metric for the question generation evaluation |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9689497/ https://www.ncbi.nlm.nih.gov/pubmed/36359608 http://dx.doi.org/10.3390/e24111514 |
work_keys_str_mv | AT jitianbo qascoreanunsupervisedunreferencedmetricforthequestiongenerationevaluation AT lyuchenyang qascoreanunsupervisedunreferencedmetricforthequestiongenerationevaluation AT jonesgareth qascoreanunsupervisedunreferencedmetricforthequestiongenerationevaluation AT zhouliting qascoreanunsupervisedunreferencedmetricforthequestiongenerationevaluation AT grahamyvette qascoreanunsupervisedunreferencedmetricforthequestiongenerationevaluation |