SEOSS-Queries - a software engineering dataset for text-to-SQL and question answering tasks

Stakeholders of software development projects have various information needs for making rational decisions during their daily work. Satisfying these needs requires substantial knowledge of where and how the relevant information is stored and consumes valuable time that is often not available. Easing...

Bibliographic Details
Main authors:	Tomova, Mihaela Todorova; Hofmann, Martin; Mäder, Patrick
Format:	Online Article Text
Language:	English
Published:	Elsevier 2022
Subjects:	Data Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9079685/ https://www.ncbi.nlm.nih.gov/pubmed/35539028 http://dx.doi.org/10.1016/j.dib.2022.108211

author	Tomova, Mihaela Todorova; Hofmann, Martin; Mäder, Patrick
collection	PubMed
description	Stakeholders of software development projects have various information needs for making rational decisions during their daily work. Satisfying these needs requires substantial knowledge of where and how the relevant information is stored and consumes valuable time that is often not available. Easing the need for this knowledge is an ideal text-to-SQL benchmark problem, a field where public datasets are scarce and needed. We propose the SEOSS-Queries dataset consisting of natural language utterances and accompanying SQL queries extracted from previous studies, software projects, issue tracking tools, and through expert surveys to cover a large variety of information need perspectives. Our dataset consists of 1,162 English utterances translating into 166 SQL queries; each query has four precise utterances and three more general ones. Furthermore, the dataset contains 393,086 labeled utterances extracted from issue tracker comments. We provide pre-trained SQLNet and RatSQL baseline models for benchmark comparisons, a replication package facilitating a seamless application, and discuss various other tasks that may be solved and evaluated using the dataset. The whole dataset with paraphrased natural language utterances and SQL queries is hosted at figshare.com/s/75ed49ef01ac2f83b3e2.
format	Online Article Text
id	pubmed-9079685
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-9079685 2022-05-09 SEOSS-Queries - a software engineering dataset for text-to-SQL and question answering tasks Tomova, Mihaela Todorova; Hofmann, Martin; Mäder, Patrick Data Brief Data Article Stakeholders of software development projects have various information needs for making rational decisions during their daily work. Satisfying these needs requires substantial knowledge of where and how the relevant information is stored and consumes valuable time that is often not available. Easing the need for this knowledge is an ideal text-to-SQL benchmark problem, a field where public datasets are scarce and needed. We propose the SEOSS-Queries dataset consisting of natural language utterances and accompanying SQL queries extracted from previous studies, software projects, issue tracking tools, and through expert surveys to cover a large variety of information need perspectives. Our dataset consists of 1,162 English utterances translating into 166 SQL queries; each query has four precise utterances and three more general ones. Furthermore, the dataset contains 393,086 labeled utterances extracted from issue tracker comments. We provide pre-trained SQLNet and RatSQL baseline models for benchmark comparisons, a replication package facilitating a seamless application, and discuss various other tasks that may be solved and evaluated using the dataset. The whole dataset with paraphrased natural language utterances and SQL queries is hosted at figshare.com/s/75ed49ef01ac2f83b3e2. Elsevier 2022-04-27 /pmc/articles/PMC9079685/ /pubmed/35539028 http://dx.doi.org/10.1016/j.dib.2022.108211 Text en © 2022 The Authors. Published by Elsevier Inc. https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title	SEOSS-Queries - a software engineering dataset for text-to-SQL and question answering tasks
topic	Data Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9079685/ https://www.ncbi.nlm.nih.gov/pubmed/35539028 http://dx.doi.org/10.1016/j.dib.2022.108211
