Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings
In this paper, we formulate keyphrase extraction from scholarly articles as a sequence labeling task solved using a BiLSTM-CRF, where the words in the input text are represented using deep contextualized embeddings. We evaluate the proposed architecture using both contextualized and fixed word embedding models on three different benchmark datasets, and compare with existing popular unsupervised and supervised techniques. Our results quantify the benefits of: (a) using contextualized embeddings over fixed word embeddings; (b) using a BiLSTM-CRF architecture with contextualized word embeddings over fine-tuning the contextualized embedding model directly; and (c) using domain-specific contextualized embeddings (SciBERT). Through error analysis, we also provide some insights into why particular models work better than the others. Lastly, we present a case study where we analyze different self-attention layers of the two best models (BERT and SciBERT) to better understand their predictions.
Main Authors: | Sahrawat, Dhruva; Mahata, Debanjan; Zhang, Haimin; Kulkarni, Mayank; Sharma, Agniv; Gosangi, Rakesh; Stent, Amanda; Kumar, Yaman; Shah, Rajiv Ratn; Zimmermann, Roger |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148038/ http://dx.doi.org/10.1007/978-3-030-45442-5_41 |
_version_ | 1783520517365432320 |
---|---|
author | Sahrawat, Dhruva Mahata, Debanjan Zhang, Haimin Kulkarni, Mayank Sharma, Agniv Gosangi, Rakesh Stent, Amanda Kumar, Yaman Shah, Rajiv Ratn Zimmermann, Roger |
author_facet | Sahrawat, Dhruva Mahata, Debanjan Zhang, Haimin Kulkarni, Mayank Sharma, Agniv Gosangi, Rakesh Stent, Amanda Kumar, Yaman Shah, Rajiv Ratn Zimmermann, Roger |
author_sort | Sahrawat, Dhruva |
collection | PubMed |
description | In this paper, we formulate keyphrase extraction from scholarly articles as a sequence labeling task solved using a BiLSTM-CRF, where the words in the input text are represented using deep contextualized embeddings. We evaluate the proposed architecture using both contextualized and fixed word embedding models on three different benchmark datasets, and compare with existing popular unsupervised and supervised techniques. Our results quantify the benefits of: (a) using contextualized embeddings over fixed word embeddings; (b) using a BiLSTM-CRF architecture with contextualized word embeddings over fine-tuning the contextualized embedding model directly; and (c) using domain-specific contextualized embeddings (SciBERT). Through error analysis, we also provide some insights into why particular models work better than the others. Lastly, we present a case study where we analyze different self-attention layers of the two best models (BERT and SciBERT) to better understand their predictions. |
format | Online Article Text |
id | pubmed-7148038 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-71480382020-04-13 Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings Sahrawat, Dhruva Mahata, Debanjan Zhang, Haimin Kulkarni, Mayank Sharma, Agniv Gosangi, Rakesh Stent, Amanda Kumar, Yaman Shah, Rajiv Ratn Zimmermann, Roger Advances in Information Retrieval Article In this paper, we formulate keyphrase extraction from scholarly articles as a sequence labeling task solved using a BiLSTM-CRF, where the words in the input text are represented using deep contextualized embeddings. We evaluate the proposed architecture using both contextualized and fixed word embedding models on three different benchmark datasets, and compare with existing popular unsupervised and supervised techniques. Our results quantify the benefits of: (a) using contextualized embeddings over fixed word embeddings; (b) using a BiLSTM-CRF architecture with contextualized word embeddings over fine-tuning the contextualized embedding model directly; and (c) using domain-specific contextualized embeddings (SciBERT). Through error analysis, we also provide some insights into why particular models work better than the others. Lastly, we present a case study where we analyze different self-attention layers of the two best models (BERT and SciBERT) to better understand their predictions. 2020-03-24 /pmc/articles/PMC7148038/ http://dx.doi.org/10.1007/978-3-030-45442-5_41 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Article Sahrawat, Dhruva Mahata, Debanjan Zhang, Haimin Kulkarni, Mayank Sharma, Agniv Gosangi, Rakesh Stent, Amanda Kumar, Yaman Shah, Rajiv Ratn Zimmermann, Roger Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings |
title | Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings |
title_full | Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings |
title_fullStr | Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings |
title_full_unstemmed | Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings |
title_short | Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings |
title_sort | keyphrase extraction as sequence labeling using contextualized embeddings |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148038/ http://dx.doi.org/10.1007/978-3-030-45442-5_41 |
work_keys_str_mv | AT sahrawatdhruva keyphraseextractionassequencelabelingusingcontextualizedembeddings AT mahatadebanjan keyphraseextractionassequencelabelingusingcontextualizedembeddings AT zhanghaimin keyphraseextractionassequencelabelingusingcontextualizedembeddings AT kulkarnimayank keyphraseextractionassequencelabelingusingcontextualizedembeddings AT sharmaagniv keyphraseextractionassequencelabelingusingcontextualizedembeddings AT gosangirakesh keyphraseextractionassequencelabelingusingcontextualizedembeddings AT stentamanda keyphraseextractionassequencelabelingusingcontextualizedembeddings AT kumaryaman keyphraseextractionassequencelabelingusingcontextualizedembeddings AT shahrajivratn keyphraseextractionassequencelabelingusingcontextualizedembeddings AT zimmermannroger keyphraseextractionassequencelabelingusingcontextualizedembeddings |
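The abstract frames keyphrase extraction as token-level sequence labeling: each word receives a B (begin), I (inside), or O (outside) tag, and contiguous B-I… spans are read back as keyphrases. Below is a minimal pure-Python sketch of that BIO encoding/decoding scheme for illustration only; it is not the paper's BiLSTM-CRF model, and the function names and example sentence are invented here.

```python
# Keyphrase extraction cast as sequence labeling: tag each token
# B (begins a keyphrase), I (inside one), or O (outside all).

def bio_encode(tokens, keyphrases):
    """Assign B/I/O labels to tokens given a list of gold keyphrases."""
    labels = ["O"] * len(tokens)
    for phrase in keyphrases:
        p = phrase.split()
        # Mark every occurrence of the phrase in the token sequence.
        for i in range(len(tokens) - len(p) + 1):
            if tokens[i:i + len(p)] == p:
                labels[i] = "B"
                for j in range(i + 1, i + len(p)):
                    labels[j] = "I"
    return labels

def bio_decode(tokens, labels):
    """Recover keyphrases from a (possibly predicted) B/I/O tag sequence."""
    phrases, current = [], []
    for tok, lab in zip(tokens, labels):
        if lab == "B":                      # a new phrase starts
            if current:
                phrases.append(" ".join(current))
            current = [tok]
        elif lab == "I" and current:        # phrase continues
            current.append(tok)
        else:                               # O tag ends any open phrase
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

tokens = "we use deep contextualized embeddings for keyphrase extraction".split()
labels = bio_encode(tokens, ["contextualized embeddings", "keyphrase extraction"])
print(labels)                    # ['O', 'O', 'O', 'B', 'I', 'O', 'B', 'I']
print(bio_decode(tokens, labels))  # ['contextualized embeddings', 'keyphrase extraction']
```

In the paper's setup, a model (BiLSTM-CRF over contextualized embeddings, or a fine-tuned transformer) predicts these tags directly, and decoding the tag sequence yields the extracted keyphrases.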