Deep Multilabel Multilingual Document Learning for Cross-Lingual Document Retrieval

Cross-lingual document retrieval, which aims to take a query in one language to retrieve relevant documents in another, has attracted strong research interest in recent decades. Most studies on this task start with cross-lingual comparisons at the word level and then represent documents via word e...

Bibliographic Details
Main authors: Feng, Kai, Huang, Lan, Xu, Hao, Wang, Kangping, Wei, Wei, Zhang, Rui
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9318374/
https://www.ncbi.nlm.nih.gov/pubmed/35885166
http://dx.doi.org/10.3390/e24070943
author Feng, Kai
Huang, Lan
Xu, Hao
Wang, Kangping
Wei, Wei
Zhang, Rui
collection PubMed
description Cross-lingual document retrieval, which aims to take a query in one language to retrieve relevant documents in another, has attracted strong research interest in recent decades. Most studies on this task start with cross-lingual comparisons at the word level and then represent documents via word embeddings, which leaves the document representation with insufficient structural information. In this work, the cross-lingual comparison is made at the document level through a cross-lingual semantic space. Our method, MDL (deep multilabel multilingual document learning), leverages a six-layer fully connected network to project cross-lingual documents into a shared semantic space. Once the cross-lingual documents are transformed into embeddings in this space, the semantic distances between them can be calculated. The supervision signals are automatically extracted from the data and then used to construct the semantic space via a linear classifier. This avoids the ambiguity of manual labels and yields multilabel supervision signals rather than a single label. The representation of the semantic space is enriched by the multilabel supervision signals, which improves the discriminative ability of the embeddings. MDL is easy to extend to other fields since it does not depend on specific data. Furthermore, MDL is more efficient than models that train all languages jointly, since each language is trained individually. Experiments on Wikipedia data showed that the proposed method outperforms the state-of-the-art cross-lingual document retrieval methods.
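The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the ReLU activation, the random untrained weights, the random input features, the use of cosine distance, the single classifier shared by both languages, and the names `en_encoder` and `zh_encoder` are all assumptions made for illustration. The sketch shows only the shape of the pipeline: one six-layer fully connected encoder per language, a linear classifier with one independent sigmoid per label for the multilabel signal, and ranking of candidate documents by distance in the shared space.

```python
import math
import random

# All sizes are assumptions; 7 sizes give 6 fully connected layers.
LAYER_SIZES = [32, 64, 64, 48, 48, 32, 16]
NUM_LABELS = 5  # hypothetical number of automatically extracted labels


def make_layers(sizes, seed):
    """Return randomly initialised (weights, biases) pairs, one per layer."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        bound = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def affine(weights, biases, x):
    """Compute W x + b for one layer."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def embed(layers, features):
    """Project a document feature vector into the shared space.

    ReLU (an assumed activation) follows every layer except the last.
    """
    x = features
    for i, (weights, biases) in enumerate(layers):
        x = affine(weights, biases, x)
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def label_probabilities(classifier, embedding):
    """Linear classifier with an independent sigmoid per label (multilabel)."""
    weights, biases = classifier
    return [1.0 / (1.0 + math.exp(-z))
            for z in affine(weights, biases, embedding)]


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm > 0.0 else 1.0


# One encoder per language, built independently (weights here are untrained).
en_encoder = make_layers(LAYER_SIZES, seed=0)
zh_encoder = make_layers(LAYER_SIZES, seed=1)
classifier = make_layers([LAYER_SIZES[-1], NUM_LABELS], seed=2)[0]

# Random stand-ins for real document features.
rng = random.Random(3)
query_features = [rng.random() for _ in range(LAYER_SIZES[0])]
candidate_features = [[rng.random() for _ in range(LAYER_SIZES[0])]
                      for _ in range(3)]

query_embedding = embed(en_encoder, query_features)
candidate_embeddings = [embed(zh_encoder, f) for f in candidate_features]

# Rank candidate documents by distance to the query in the shared space.
ranking = sorted(
    range(len(candidate_embeddings)),
    key=lambda i: cosine_distance(query_embedding, candidate_embeddings[i]),
)
probs = label_probabilities(classifier, query_embedding)
print("ranking:", ranking)
print("label probabilities:", probs)
```

Because the weights are untrained, the ranking printed here is arbitrary. In a trained system, each encoder would be fitted against the multilabel targets so that documents sharing labels land close together regardless of language.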
format Online
Article
Text
id pubmed-9318374
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9318374 2022-07-27 Entropy (Basel) Article MDPI 2022-07-07 /pmc/articles/PMC9318374/ /pubmed/35885166 http://dx.doi.org/10.3390/e24070943 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Deep Multilabel Multilingual Document Learning for Cross-Lingual Document Retrieval
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9318374/
https://www.ncbi.nlm.nih.gov/pubmed/35885166
http://dx.doi.org/10.3390/e24070943