Question-driven summarization of answers to consumer health questions

Bibliographic Details
Main authors: Savery, Max, Abacha, Asma Ben, Gayen, Soumya, Demner-Fushman, Dina
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects: Data Descriptor
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7532186/
https://www.ncbi.nlm.nih.gov/pubmed/33009402
http://dx.doi.org/10.1038/s41597-020-00667-z
_version_ 1783589873248108544
author Savery, Max
Abacha, Asma Ben
Gayen, Soumya
Demner-Fushman, Dina
author_facet Savery, Max
Abacha, Asma Ben
Gayen, Soumya
Demner-Fushman, Dina
author_sort Savery, Max
collection PubMed
description Automatic summarization of natural language is a widely studied area in computer science, one that is broadly applicable to anyone who needs to understand large quantities of information. In the medical domain, automatic summarization has the potential to make health information more accessible to people without medical expertise. However, to evaluate the quality of summaries generated by summarization algorithms, researchers first require gold standard, human generated summaries. Unfortunately there is no available data for the purpose of assessing summaries that help consumers of health information answer their questions. To address this issue, we present the MEDIQA-Answer Summarization dataset, the first dataset designed for question-driven, consumer-focused summarization. It contains 156 health questions asked by consumers, answers to these questions, and manually generated summaries of these answers. The dataset’s unique structure allows it to be used for at least eight different types of summarization evaluations. We also benchmark the performance of baseline and state-of-the-art deep learning approaches on the dataset, demonstrating how it can be used to evaluate automatically generated summaries.
format Online
Article
Text
id pubmed-7532186
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7532186 2020-10-19 Question-driven summarization of answers to consumer health questions Savery, Max Abacha, Asma Ben Gayen, Soumya Demner-Fushman, Dina Sci Data Data Descriptor Automatic summarization of natural language is a widely studied area in computer science, one that is broadly applicable to anyone who needs to understand large quantities of information. In the medical domain, automatic summarization has the potential to make health information more accessible to people without medical expertise. However, to evaluate the quality of summaries generated by summarization algorithms, researchers first require gold standard, human generated summaries. Unfortunately there is no available data for the purpose of assessing summaries that help consumers of health information answer their questions. To address this issue, we present the MEDIQA-Answer Summarization dataset, the first dataset designed for question-driven, consumer-focused summarization. It contains 156 health questions asked by consumers, answers to these questions, and manually generated summaries of these answers. The dataset’s unique structure allows it to be used for at least eight different types of summarization evaluations. We also benchmark the performance of baseline and state-of-the-art deep learning approaches on the dataset, demonstrating how it can be used to evaluate automatically generated summaries. Nature Publishing Group UK 2020-10-02 /pmc/articles/PMC7532186/ /pubmed/33009402 http://dx.doi.org/10.1038/s41597-020-00667-z Text en © This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.
spellingShingle Data Descriptor
Savery, Max
Abacha, Asma Ben
Gayen, Soumya
Demner-Fushman, Dina
Question-driven summarization of answers to consumer health questions
title Question-driven summarization of answers to consumer health questions
title_full Question-driven summarization of answers to consumer health questions
title_fullStr Question-driven summarization of answers to consumer health questions
title_full_unstemmed Question-driven summarization of answers to consumer health questions
title_short Question-driven summarization of answers to consumer health questions
title_sort question-driven summarization of answers to consumer health questions
topic Data Descriptor
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7532186/
https://www.ncbi.nlm.nih.gov/pubmed/33009402
http://dx.doi.org/10.1038/s41597-020-00667-z
work_keys_str_mv AT saverymax questiondrivensummarizationofanswerstoconsumerhealthquestions
AT abachaasmaben questiondrivensummarizationofanswerstoconsumerhealthquestions
AT gayensoumya questiondrivensummarizationofanswerstoconsumerhealthquestions
AT demnerfushmandina questiondrivensummarizationofanswerstoconsumerhealthquestions