Deep Multilabel Multilingual Document Learning for Cross-Lingual Document Retrieval

Cross-lingual document retrieval, which takes a query in one language and retrieves relevant documents in another, has attracted strong research interest in recent decades. Most studies on this task start with cross-lingual comparisons at the word level and then represent documents via word e...

Bibliographic Details
Main Authors:	Feng, Kai; Huang, Lan; Xu, Hao; Wang, Kangping; Wei, Wei; Zhang, Rui
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9318374/ https://www.ncbi.nlm.nih.gov/pubmed/35885166 http://dx.doi.org/10.3390/e24070943

author	Feng, Kai; Huang, Lan; Xu, Hao; Wang, Kangping; Wei, Wei; Zhang, Rui
author_sort	Feng, Kai
collection	PubMed
description	Cross-lingual document retrieval, which takes a query in one language and retrieves relevant documents in another, has attracted strong research interest in recent decades. Most studies on this task start with cross-lingual comparisons at the word level and then represent documents via word embeddings, which captures too little structural information. In this work, cross-lingual comparison is performed at the document level through a cross-lingual semantic space. Our method, MDL (deep multilabel multilingual document learning), uses a six-layer fully connected network to project cross-lingual documents into a shared semantic space. Once the documents are transformed into embeddings in this space, the semantic distances between them can be calculated. The supervision signals are automatically extracted from the data and then used to construct the semantic space via a linear classifier. This avoids the ambiguity of manual labels and yields multilabel supervision signals instead of a single label. The multilabel supervision signals enrich the representation of the semantic space, which improves the discriminative ability of the embeddings. MDL is easy to extend to other fields since it does not depend on specific data. Furthermore, MDL is more efficient than models that train all languages jointly, since each language is trained individually. Experiments on Wikipedia data showed that the proposed method outperforms state-of-the-art cross-lingual document retrieval methods.
format	Online Article Text
id	pubmed-9318374
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9318374 2022-07-27 Deep Multilabel Multilingual Document Learning for Cross-Lingual Document Retrieval Feng, Kai; Huang, Lan; Xu, Hao; Wang, Kangping; Wei, Wei; Zhang, Rui Entropy (Basel) Article MDPI 2022-07-07 /pmc/articles/PMC9318374/ /pubmed/35885166 http://dx.doi.org/10.3390/e24070943 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Deep Multilabel Multilingual Document Learning for Cross-Lingual Document Retrieval
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9318374/ https://www.ncbi.nlm.nih.gov/pubmed/35885166 http://dx.doi.org/10.3390/e24070943
