ScienceQA: a novel resource for question answering on scholarly articles

Machine Reading Comprehension (MRC) of a document is a challenging problem that requires discourse-level understanding. Information extraction from scholarly articles nowadays is a critical use case for researchers to understand the underlying research quickly and move forward, especially in this ag...

Bibliographic Details
Main Authors:	Saikh, Tanik; Ghosal, Tirthankar; Mittal, Amish; Ekbal, Asif; Bhattacharyya, Pushpak
Format:	Online Article Text
Language:	English
Published:	Springer Berlin Heidelberg 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9297303/ https://www.ncbi.nlm.nih.gov/pubmed/35873651 http://dx.doi.org/10.1007/s00799-022-00329-y

author	Saikh, Tanik Ghosal, Tirthankar Mittal, Amish Ekbal, Asif Bhattacharyya, Pushpak
author_sort	Saikh, Tanik
collection	PubMed
description	Machine Reading Comprehension (MRC) of a document is a challenging problem that requires discourse-level understanding. Information extraction from scholarly articles nowadays is a critical use case for researchers to understand the underlying research quickly and move forward, especially in this age of infodemic. MRC on research articles can also provide helpful information to the reviewers and editors. However, the main bottleneck in building such models is the availability of human-annotated data. In this paper, firstly, we introduce a dataset to facilitate question answering (QA) on scientific articles. We prepare the dataset in a semi-automated fashion having more than 100k human-annotated context–question–answer triples. Secondly, we implement one baseline QA model based on Bidirectional Encoder Representations from Transformers (BERT). Additionally, we implement two models: the first one is based on Science BERT (SciBERT), and the second is the combination of SciBERT and Bi-Directional Attention Flow (Bi-DAF). The best model (i.e., SciBERT) obtains an F1 score of 75.46%. Our dataset is novel, and our work opens up a new avenue for scholarly document processing research by providing a benchmark QA dataset and standard baseline. We make our dataset and codes available here at https://github.com/TanikSaikh/Scientific-Question-Answering.
format	Online Article Text
id	pubmed-9297303
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Springer Berlin Heidelberg
record_format	MEDLINE/PubMed
spelling	pubmed-9297303 2022-07-20 Int J Digit Libr Article Springer Berlin Heidelberg 2022-07-20 2022 /pmc/articles/PMC9297303/ /pubmed/35873651 http://dx.doi.org/10.1007/s00799-022-00329-y Text en © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title	ScienceQA: a novel resource for question answering on scholarly articles
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9297303/ https://www.ncbi.nlm.nih.gov/pubmed/35873651 http://dx.doi.org/10.1007/s00799-022-00329-y
