Table to text generation with accurate content copying
Generating fluent, coherent, and informative text from structured data is called table-to-text generation. Copying words from the table is a common method to solve the “out-of-vocabulary” problem, but it’s difficult to achieve accurate copying. In order to overcome this problem, we invent an auto-regressive framework based on the transformer that combines a copying mechanism and language modeling to generate target texts. Firstly, to make the model better learn the semantic relevance between table and text, we apply a word transformation method, which incorporates the field and position information into the target text to acquire the position of where to copy. Then we propose two auxiliary learning objectives, namely table-text constraint loss and copy loss. Table-text constraint loss is used to effectively model table inputs, whereas copy loss is exploited to precisely copy word fragments from a table. Furthermore, we improve the text search strategy to reduce the probability of generating incoherent and repetitive sentences. The model is verified by experiments on two datasets and better results are obtained than the baseline model. On WIKIBIO, the result is improved from 45.47 to 46.87 on BLEU and from 41.54 to 42.28 on ROUGE. On ROTOWIRE, the result is increased by 4.29% on CO metric, and 1.93 points higher on BLEU.
| Main Authors: | Yang, Yang; Cao, Juan; Wen, Yujun; Zhang, Pengzhou |
|---|---|
| Format: | Online Article Text |
| Language: | English |
| Published: | Nature Publishing Group UK, 2021 |
| Subjects: | Article |
| Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8611016/ https://www.ncbi.nlm.nih.gov/pubmed/34815423 http://dx.doi.org/10.1038/s41598-021-00813-6 |
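The abstract above describes combining a copying mechanism with language modeling so that rare table entries can be copied verbatim. The paper's exact formulation is not given in this record; as a rough illustration only, a pointer-style copy mechanism mixes a generation distribution over the vocabulary with an attention distribution over the source (table) tokens, weighted by a gate. The function and variable names below (`copy_mixture`, `p_gen`) are hypothetical, not taken from the paper.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def copy_mixture(vocab_logits, attn_scores, source_ids, p_gen):
    """Mix generating from the vocabulary with copying from the source.

    vocab_logits: scores over the output vocabulary (generate path)
    attn_scores:  scores over source/table token positions (copy path)
    source_ids:   vocabulary id of the token at each source position
    p_gen:        gate in [0, 1]; probability mass given to generation
    Returns the final probability distribution over the vocabulary.
    """
    gen_dist = softmax(vocab_logits)   # P(w | generate)
    copy_attn = softmax(attn_scores)   # attention over source positions
    final = [p_gen * p for p in gen_dist]
    # Scatter the copy mass onto the vocabulary ids of the source tokens;
    # repeated source tokens accumulate probability.
    for pos, tok_id in enumerate(source_ids):
        final[tok_id] += (1.0 - p_gen) * copy_attn[pos]
    return final

# Toy example: 4-word vocabulary, table with 2 cells mapping to ids 1 and 3.
dist = copy_mixture([0.1, 2.0, -1.0, 0.5], [1.0, 0.2], [1, 3], p_gen=0.6)
```

In pointer-generator-style models the gate itself is predicted from the decoder state; here it is a fixed constant purely to keep the sketch self-contained.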
| _version_ | 1784603216483188736 |
|---|---|
| author | Yang, Yang; Cao, Juan; Wen, Yujun; Zhang, Pengzhou |
| author_facet | Yang, Yang; Cao, Juan; Wen, Yujun; Zhang, Pengzhou |
| author_sort | Yang, Yang |
| collection | PubMed |
| description | Generating fluent, coherent, and informative text from structured data is called table-to-text generation. Copying words from the table is a common method to solve the “out-of-vocabulary” problem, but it’s difficult to achieve accurate copying. In order to overcome this problem, we invent an auto-regressive framework based on the transformer that combines a copying mechanism and language modeling to generate target texts. Firstly, to make the model better learn the semantic relevance between table and text, we apply a word transformation method, which incorporates the field and position information into the target text to acquire the position of where to copy. Then we propose two auxiliary learning objectives, namely table-text constraint loss and copy loss. Table-text constraint loss is used to effectively model table inputs, whereas copy loss is exploited to precisely copy word fragments from a table. Furthermore, we improve the text search strategy to reduce the probability of generating incoherent and repetitive sentences. The model is verified by experiments on two datasets and better results are obtained than the baseline model. On WIKIBIO, the result is improved from 45.47 to 46.87 on BLEU and from 41.54 to 42.28 on ROUGE. On ROTOWIRE, the result is increased by 4.29% on CO metric, and 1.93 points higher on BLEU. |
| format | Online Article Text |
| id | pubmed-8611016 |
| institution | National Center for Biotechnology Information |
| language | English |
| publishDate | 2021 |
| publisher | Nature Publishing Group UK |
| record_format | MEDLINE/PubMed |
| spelling | pubmed-8611016 2021-11-24 Table to text generation with accurate content copying. Yang, Yang; Cao, Juan; Wen, Yujun; Zhang, Pengzhou. Sci Rep, Article. Nature Publishing Group UK, 2021-11-23. /pmc/articles/PMC8611016/ /pubmed/34815423 http://dx.doi.org/10.1038/s41598-021-00813-6 Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/ |
| spellingShingle | Article; Yang, Yang; Cao, Juan; Wen, Yujun; Zhang, Pengzhou; Table to text generation with accurate content copying |
| title | Table to text generation with accurate content copying |
| title_full | Table to text generation with accurate content copying |
| title_fullStr | Table to text generation with accurate content copying |
| title_full_unstemmed | Table to text generation with accurate content copying |
| title_short | Table to text generation with accurate content copying |
| title_sort | table to text generation with accurate content copying |
| topic | Article |
| url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8611016/ https://www.ncbi.nlm.nih.gov/pubmed/34815423 http://dx.doi.org/10.1038/s41598-021-00813-6 |
| work_keys_str_mv | AT yangyang tabletotextgenerationwithaccuratecontentcopying; AT caojuan tabletotextgenerationwithaccuratecontentcopying; AT wenyujun tabletotextgenerationwithaccuratecontentcopying; AT zhangpengzhou tabletotextgenerationwithaccuratecontentcopying |
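The abstract also mentions an improved text search strategy that reduces incoherent and repetitive output. The record does not specify the strategy; one common repetition-reducing device in beam search is to prune any hypothesis whose newest n-gram already appeared earlier in it. The sketch below shows only that check, under that assumption; `blocks_repeat` is a hypothetical name, not the paper's.

```python
def blocks_repeat(tokens, n=3):
    """Return True if the final n-gram of `tokens` already occurred
    earlier in the sequence, i.e. the last emitted token completed a
    repeated n-gram and the hypothesis should be pruned or penalized."""
    if len(tokens) < n:
        return False
    last = tuple(tokens[-n:])
    # All n-grams that end strictly before the final one.
    seen = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n)}
    return last in seen

# "the final score the final" repeats the bigram ("the", "final").
hypothesis = ["the", "final", "score", "the", "final"]
```

During decoding, a beam-search loop would call this check on each candidate extension and drop (or heavily penalize) extensions for which it returns True.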