
Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings

In this paper, we formulate keyphrase extraction from scholarly articles as a sequence labeling task solved using a BiLSTM-CRF, where the words in the input text are represented using deep contextualized embeddings. We evaluate the proposed architecture using both contextualized and fixed word embedding models on three different benchmark datasets, and compare with existing popular unsupervised and supervised techniques.


Bibliographic Details
Main Authors: Sahrawat, Dhruva, Mahata, Debanjan, Zhang, Haimin, Kulkarni, Mayank, Sharma, Agniv, Gosangi, Rakesh, Stent, Amanda, Kumar, Yaman, Shah, Rajiv Ratn, Zimmermann, Roger
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148038/
http://dx.doi.org/10.1007/978-3-030-45442-5_41
_version_ 1783520517365432320
author Sahrawat, Dhruva
Mahata, Debanjan
Zhang, Haimin
Kulkarni, Mayank
Sharma, Agniv
Gosangi, Rakesh
Stent, Amanda
Kumar, Yaman
Shah, Rajiv Ratn
Zimmermann, Roger
author_facet Sahrawat, Dhruva
Mahata, Debanjan
Zhang, Haimin
Kulkarni, Mayank
Sharma, Agniv
Gosangi, Rakesh
Stent, Amanda
Kumar, Yaman
Shah, Rajiv Ratn
Zimmermann, Roger
author_sort Sahrawat, Dhruva
collection PubMed
description In this paper, we formulate keyphrase extraction from scholarly articles as a sequence labeling task solved using a BiLSTM-CRF, where the words in the input text are represented using deep contextualized embeddings. We evaluate the proposed architecture using both contextualized and fixed word embedding models on three different benchmark datasets, and compare with existing popular unsupervised and supervised techniques. Our results quantify the benefits of: (a) using contextualized embeddings over fixed word embeddings; (b) using a BiLSTM-CRF architecture with contextualized word embeddings over fine-tuning the contextualized embedding model directly; and (c) using domain-specific contextualized embeddings (SciBERT). Through error analysis, we also provide some insights into why particular models work better than the others. Lastly, we present a case study where we analyze different self-attention layers of the two best models (BERT and SciBERT) to better understand their predictions.
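The description above outlines the architecture at a high level: contextualized token embeddings (e.g. from BERT or SciBERT) feed a BiLSTM-CRF that labels each word as part of a keyphrase or not. Purely as an illustration, a minimal sketch of such a tagger follows; it is not the authors' released code. The embedding dimension, hidden size, and the three-label BIO tag set (B-KP / I-KP / O) are illustrative assumptions, and the sketch relies on PyTorch plus the third-party pytorch-crf package for the CRF layer.

```python
# Minimal sketch (not the paper's implementation): a BiLSTM-CRF tagger over
# precomputed contextualized token embeddings, labeling tokens with BIO tags.
# Assumes PyTorch and the third-party `pytorch-crf` package (pip install pytorch-crf).
import torch
import torch.nn as nn
from torchcrf import CRF


class BiLSTMCRFTagger(nn.Module):
    def __init__(self, emb_dim: int = 768, hidden_dim: int = 256, num_tags: int = 3):
        super().__init__()
        # BiLSTM over contextualized embeddings (e.g. BERT/SciBERT token outputs).
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * hidden_dim, num_tags)  # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)       # learns tag-transition scores

    def forward(self, embeddings, tags=None):
        # embeddings: (batch, seq_len, emb_dim) contextual vectors for each token
        hidden, _ = self.lstm(embeddings)
        emissions = self.emit(hidden)
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence under the CRF.
            return -self.crf(emissions, tags, reduction="mean")
        # Inference: Viterbi decoding of the best tag sequence per sentence.
        return self.crf.decode(emissions)


# Toy usage with random vectors standing in for contextualized embeddings.
batch, seq_len, dim = 2, 10, 768
x = torch.randn(batch, seq_len, dim)
y = torch.randint(0, 3, (batch, seq_len))  # 0 = O, 1 = B-KP, 2 = I-KP (assumed mapping)
model = BiLSTMCRFTagger()
loss = model(x, y)   # scalar training loss
pred = model(x)      # list of predicted tag sequences
```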
format Online
Article
Text
id pubmed-7148038
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7148038 2020-04-13 Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings Sahrawat, Dhruva Mahata, Debanjan Zhang, Haimin Kulkarni, Mayank Sharma, Agniv Gosangi, Rakesh Stent, Amanda Kumar, Yaman Shah, Rajiv Ratn Zimmermann, Roger Advances in Information Retrieval Article In this paper, we formulate keyphrase extraction from scholarly articles as a sequence labeling task solved using a BiLSTM-CRF, where the words in the input text are represented using deep contextualized embeddings. We evaluate the proposed architecture using both contextualized and fixed word embedding models on three different benchmark datasets, and compare with existing popular unsupervised and supervised techniques. Our results quantify the benefits of: (a) using contextualized embeddings over fixed word embeddings; (b) using a BiLSTM-CRF architecture with contextualized word embeddings over fine-tuning the contextualized embedding model directly; and (c) using domain-specific contextualized embeddings (SciBERT). Through error analysis, we also provide some insights into why particular models work better than the others. Lastly, we present a case study where we analyze different self-attention layers of the two best models (BERT and SciBERT) to better understand their predictions. 2020-03-24 /pmc/articles/PMC7148038/ http://dx.doi.org/10.1007/978-3-030-45442-5_41 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Sahrawat, Dhruva
Mahata, Debanjan
Zhang, Haimin
Kulkarni, Mayank
Sharma, Agniv
Gosangi, Rakesh
Stent, Amanda
Kumar, Yaman
Shah, Rajiv Ratn
Zimmermann, Roger
Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings
title Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings
title_full Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings
title_fullStr Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings
title_full_unstemmed Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings
title_short Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings
title_sort keyphrase extraction as sequence labeling using contextualized embeddings
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148038/
http://dx.doi.org/10.1007/978-3-030-45442-5_41
work_keys_str_mv AT sahrawatdhruva keyphraseextractionassequencelabelingusingcontextualizedembeddings
AT mahatadebanjan keyphraseextractionassequencelabelingusingcontextualizedembeddings
AT zhanghaimin keyphraseextractionassequencelabelingusingcontextualizedembeddings
AT kulkarnimayank keyphraseextractionassequencelabelingusingcontextualizedembeddings
AT sharmaagniv keyphraseextractionassequencelabelingusingcontextualizedembeddings
AT gosangirakesh keyphraseextractionassequencelabelingusingcontextualizedembeddings
AT stentamanda keyphraseextractionassequencelabelingusingcontextualizedembeddings
AT kumaryaman keyphraseextractionassequencelabelingusingcontextualizedembeddings
AT shahrajivratn keyphraseextractionassequencelabelingusingcontextualizedembeddings
AT zimmermannroger keyphraseextractionassequencelabelingusingcontextualizedembeddings