Canonicalizing Knowledge Bases for Recruitment Domain

Bibliographic Details
Main authors:	Fatma, Nausheen; Choudhary, Vijay; Sachdeva, Niharika; Rajput, Nitendra
Format:	Online Article Text
Language:	English
Published:	2020
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206229/ http://dx.doi.org/10.1007/978-3-030-47436-2_38

author	Fatma, Nausheen; Choudhary, Vijay; Sachdeva, Niharika; Rajput, Nitendra
collection	PubMed
description	The online recruitment industry holds a large amount of user-generated content in the form of job postings, resumes, etc. This content finds its way into knowledge bases (KBs), causing duplicate and non-standard representations of entities (such as company names, institute names, designations, and skills). These non-standard entity representations affect applications such as search, recommendations, and information retrieval. Therefore, KB canonicalization, i.e., mapping multiple references to the same entity into unique clusters, is imperative for online recruitment platforms. Existing research proposes various approaches that use enriched semantic context or external context (from sources like Freebase) to perform KB canonicalization. In domains where such external sources of context do not exist, the problem remains challenging. To address these challenges, we propose a novel deep Siamese architecture with character-based attention and word embeddings that (a) estimates pairwise similarity between all entity mentions, and (b) then uses these similarity scores to create canonical clusters, each representing a unique entity in the KB. Our experiments on a recruitment-domain dataset comprising 62,288 unique entities of various types, such as companies, institutes, skills, and designations, demonstrate the effectiveness of our approach. We also provide insights into different network architectures, each of which encapsulates a different set of variations while performing canonicalization.
format	Online Article Text
id	pubmed-7206229
institution	National Center for Biotechnology Information
language	English
publishDate	2020
record_format	MEDLINE/PubMed
spelling	pubmed-7206229 2020-05-08 Canonicalizing Knowledge Bases for Recruitment Domain Fatma, Nausheen Choudhary, Vijay Sachdeva, Niharika Rajput, Nitendra Advances in Knowledge Discovery and Data Mining Article
2020-04-17 /pmc/articles/PMC7206229/ http://dx.doi.org/10.1007/978-3-030-47436-2_38 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title	Canonicalizing Knowledge Bases for Recruitment Domain
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206229/ http://dx.doi.org/10.1007/978-3-030-47436-2_38
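The description field above outlines a two-step pipeline: (a) score every pair of entity mentions, then (b) turn those scores into canonical clusters. The Python sketch below illustrates only the shape of that pipeline and is not the authors' implementation. Assumptions: the learned Siamese similarity is replaced by a simple character-trigram Jaccard score, two mentions are linked when their score meets a hypothetical `threshold`, and clusters are taken as connected components of the resulting similarity graph (found with union-find), which is one simple clustering choice among several. The names `char_ngrams`, `similarity` and `canonical_clusters` are illustrative.

```python
from itertools import combinations


def char_ngrams(text, n=3):
    """Set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a, b):
    """Jaccard similarity over character trigrams.

    Hypothetical stand-in for the learned Siamese similarity score.
    """
    ga, gb = char_ngrams(a), char_ngrams(b)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


def canonical_clusters(mentions, threshold=0.4):
    """Link every pair scoring >= threshold and return connected components.

    The threshold value is an assumption, not a value from the paper.
    """
    parent = list(range(len(mentions)))

    def find(i):
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step (a): pairwise similarity between all mentions.
    # Step (b): merge the clusters of any pair above the threshold.
    for i, j in combinations(range(len(mentions)), 2):
        if similarity(mentions[i], mentions[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, mention in enumerate(mentions):
        groups.setdefault(find(i), []).append(mention)
    # Sort for a deterministic result.
    return sorted(sorted(group) for group in groups.values())
```

For example, `canonical_clusters(["Infosys Ltd", "Infosys Limited", "Tata Consultancy Services", "Tata Consultancy Services Ltd"], threshold=0.4)` returns `[["Infosys Limited", "Infosys Ltd"], ["Tata Consultancy Services", "Tata Consultancy Services Ltd"]]`. Because connected components are transitive, a single spurious high score can chain two distinct entities into one cluster; a more discriminative similarity model or a stricter clustering criterion (for example, average-linkage agglomerative clustering) can reduce this.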


Similar Items