Factoid Question Answering with Distant Supervision

Automatic question answering (QA), which can greatly facilitate the access to information, is an important task in artificial intelligence. Recent years have witnessed the development of QA methods based on deep learning. However, a great amount of data is needed to train deep neural networks, and i...

Bibliographic Details
Main Authors:	Zhang, Hongzhi; Liang, Xiao; Xu, Guangluan; Fu, Kun; Li, Feng; Huang, Tinglei
Format:	Online Article Text
Language:	English
Published:	MDPI 2018
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7512957/ https://www.ncbi.nlm.nih.gov/pubmed/33265529 http://dx.doi.org/10.3390/e20060439

_version_	1783586277252136960
author	Zhang, Hongzhi Liang, Xiao Xu, Guangluan Fu, Kun Li, Feng Huang, Tinglei
author_facet	Zhang, Hongzhi Liang, Xiao Xu, Guangluan Fu, Kun Li, Feng Huang, Tinglei
author_sort	Zhang, Hongzhi
collection	PubMed
description	Automatic question answering (QA), which can greatly facilitate the access to information, is an important task in artificial intelligence. Recent years have witnessed the development of QA methods based on deep learning. However, a great amount of data is needed to train deep neural networks, and it is laborious to annotate training data for factoid QA of new domains or languages. In this paper, a distantly supervised method is proposed to automatically generate QA pairs. Additional effort is made to ensure that the generated questions reflect the query interests and expression styles of users by exploiting community QA data. Specifically, the generated questions are selected according to the estimated probability that they would be asked. Diverse paraphrases of questions are mined from community QA data, because a model trained on monotonous synthetic questions is very sensitive to variations in how a question is expressed. Experimental results show that a model trained solely on data generated via distant supervision and on mined paraphrases can answer real-world questions with an accuracy of 49.34%. When only limited annotated training data is available, significant improvements can be achieved by incorporating the generated data. An improvement of 1.35 absolute points is still observed on WebQA, a dataset with large-scale annotated training samples.
format	Online Article Text
id	pubmed-7512957
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7512957 2020-11-09 Factoid Question Answering with Distant Supervision Zhang, Hongzhi Liang, Xiao Xu, Guangluan Fu, Kun Li, Feng Huang, Tinglei Entropy (Basel) Article MDPI 2018-06-05 /pmc/articles/PMC7512957/ /pubmed/33265529 http://dx.doi.org/10.3390/e20060439 Text en © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Zhang, Hongzhi Liang, Xiao Xu, Guangluan Fu, Kun Li, Feng Huang, Tinglei Factoid Question Answering with Distant Supervision
title	Factoid Question Answering with Distant Supervision
title_full	Factoid Question Answering with Distant Supervision
title_fullStr	Factoid Question Answering with Distant Supervision
title_full_unstemmed	Factoid Question Answering with Distant Supervision
title_short	Factoid Question Answering with Distant Supervision
title_sort	factoid question answering with distant supervision
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7512957/ https://www.ncbi.nlm.nih.gov/pubmed/33265529 http://dx.doi.org/10.3390/e20060439
work_keys_str_mv	AT zhanghongzhi factoidquestionansweringwithdistantsupervision AT liangxiao factoidquestionansweringwithdistantsupervision AT xuguangluan factoidquestionansweringwithdistantsupervision AT fukun factoidquestionansweringwithdistantsupervision AT lifeng factoidquestionansweringwithdistantsupervision AT huangtinglei factoidquestionansweringwithdistantsupervision
