
Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings

In this paper, we formulate keyphrase extraction from scholarly articles as a sequence labeling task solved using a BiLSTM-CRF, where the words in the input text are represented using deep contextualized embeddings. We evaluate the proposed architecture using both contextualized and fixed word embedding models on three different benchmark datasets, and compare with existing popular unsupervised and supervised techniques.


Bibliographic Details
Main Authors: Sahrawat, Dhruva, Mahata, Debanjan, Zhang, Haimin, Kulkarni, Mayank, Sharma, Agniv, Gosangi, Rakesh, Stent, Amanda, Kumar, Yaman, Shah, Rajiv Ratn, Zimmermann, Roger
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148038/
http://dx.doi.org/10.1007/978-3-030-45442-5_41
_version_ 1783520517365432320
author Sahrawat, Dhruva
Mahata, Debanjan
Zhang, Haimin
Kulkarni, Mayank
Sharma, Agniv
Gosangi, Rakesh
Stent, Amanda
Kumar, Yaman
Shah, Rajiv Ratn
Zimmermann, Roger
author_facet Sahrawat, Dhruva
Mahata, Debanjan
Zhang, Haimin
Kulkarni, Mayank
Sharma, Agniv
Gosangi, Rakesh
Stent, Amanda
Kumar, Yaman
Shah, Rajiv Ratn
Zimmermann, Roger
author_sort Sahrawat, Dhruva
collection PubMed
description In this paper, we formulate keyphrase extraction from scholarly articles as a sequence labeling task solved using a BiLSTM-CRF, where the words in the input text are represented using deep contextualized embeddings. We evaluate the proposed architecture using both contextualized and fixed word embedding models on three different benchmark datasets, and compare with existing popular unsupervised and supervised techniques. Our results quantify the benefits of: (a) using contextualized embeddings over fixed word embeddings; (b) using a BiLSTM-CRF architecture with contextualized word embeddings over fine-tuning the contextualized embedding model directly; and (c) using domain-specific contextualized embeddings (SciBERT). Through error analysis, we also provide some insights into why particular models work better than the others. Lastly, we present a case study where we analyze different self-attention layers of the two best models (BERT and SciBERT) to better understand their predictions.
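The description above outlines the architecture at a high level: contextualized token embeddings (e.g. from BERT or SciBERT) feed a BiLSTM-CRF that labels each word as part of a keyphrase or not. Purely as an illustration, a minimal sketch of such a tagger follows; it is not the authors' released code. The embedding dimension, hidden size, and the three-label BIO tag set (B-KP / I-KP / O) are illustrative assumptions, and the sketch relies on PyTorch plus the third-party pytorch-crf package for the CRF layer.

```python
# Minimal sketch (not the paper's implementation): a BiLSTM-CRF tagger over
# precomputed contextualized token embeddings, labeling tokens with BIO tags.
# Assumes PyTorch and the third-party `pytorch-crf` package (pip install pytorch-crf).
import torch
import torch.nn as nn
from torchcrf import CRF


class BiLSTMCRFTagger(nn.Module):
    def __init__(self, emb_dim: int = 768, hidden_dim: int = 256, num_tags: int = 3):
        super().__init__()
        # BiLSTM over contextualized embeddings (e.g. BERT/SciBERT token outputs).
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * hidden_dim, num_tags)  # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)       # learns tag-transition scores

    def forward(self, embeddings, tags=None):
        # embeddings: (batch, seq_len, emb_dim) contextual vectors for each token
        hidden, _ = self.lstm(embeddings)
        emissions = self.emit(hidden)
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence under the CRF.
            return -self.crf(emissions, tags, reduction="mean")
        # Inference: Viterbi decoding of the best tag sequence per sentence.
        return self.crf.decode(emissions)


# Toy usage with random vectors standing in for contextualized embeddings.
batch, seq_len, dim = 2, 10, 768
x = torch.randn(batch, seq_len, dim)
y = torch.randint(0, 3, (batch, seq_len))  # 0 = O, 1 = B-KP, 2 = I-KP (assumed mapping)
model = BiLSTMCRFTagger()
loss = model(x, y)   # scalar training loss
pred = model(x)      # list of predicted tag sequences
```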
format Online
Article
Text
id pubmed-7148038
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7148038 2020-04-13 Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings Sahrawat, Dhruva Mahata, Debanjan Zhang, Haimin Kulkarni, Mayank Sharma, Agniv Gosangi, Rakesh Stent, Amanda Kumar, Yaman Shah, Rajiv Ratn Zimmermann, Roger Advances in Information Retrieval Article In this paper, we formulate keyphrase extraction from scholarly articles as a sequence labeling task solved using a BiLSTM-CRF, where the words in the input text are represented using deep contextualized embeddings. We evaluate the proposed architecture using both contextualized and fixed word embedding models on three different benchmark datasets, and compare with existing popular unsupervised and supervised techniques. Our results quantify the benefits of: (a) using contextualized embeddings over fixed word embeddings; (b) using a BiLSTM-CRF architecture with contextualized word embeddings over fine-tuning the contextualized embedding model directly; and (c) using domain-specific contextualized embeddings (SciBERT). Through error analysis, we also provide some insights into why particular models work better than the others. Lastly, we present a case study where we analyze different self-attention layers of the two best models (BERT and SciBERT) to better understand their predictions. 2020-03-24 /pmc/articles/PMC7148038/ http://dx.doi.org/10.1007/978-3-030-45442-5_41 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Sahrawat, Dhruva
Mahata, Debanjan
Zhang, Haimin
Kulkarni, Mayank
Sharma, Agniv
Gosangi, Rakesh
Stent, Amanda
Kumar, Yaman
Shah, Rajiv Ratn
Zimmermann, Roger
Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings
title Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings
title_full Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings
title_fullStr Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings
title_full_unstemmed Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings
title_short Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings
title_sort keyphrase extraction as sequence labeling using contextualized embeddings
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148038/
http://dx.doi.org/10.1007/978-3-030-45442-5_41
work_keys_str_mv AT sahrawatdhruva keyphraseextractionassequencelabelingusingcontextualizedembeddings
AT mahatadebanjan keyphraseextractionassequencelabelingusingcontextualizedembeddings
AT zhanghaimin keyphraseextractionassequencelabelingusingcontextualizedembeddings
AT kulkarnimayank keyphraseextractionassequencelabelingusingcontextualizedembeddings
AT sharmaagniv keyphraseextractionassequencelabelingusingcontextualizedembeddings
AT gosangirakesh keyphraseextractionassequencelabelingusingcontextualizedembeddings
AT stentamanda keyphraseextractionassequencelabelingusingcontextualizedembeddings
AT kumaryaman keyphraseextractionassequencelabelingusingcontextualizedembeddings
AT shahrajivratn keyphraseextractionassequencelabelingusingcontextualizedembeddings
AT zimmermannroger keyphraseextractionassequencelabelingusingcontextualizedembeddings