Deep Multilabel Multilingual Document Learning for Cross-Lingual Document Retrieval
Cross-lingual document retrieval, which aims to take a query in one language to retrieve relevant documents in another, has attracted strong research interest in the last decades. Most studies on this task start with cross-lingual comparisons at the word level and then represent documents via word e...
Main authors: | Feng, Kai; Huang, Lan; Xu, Hao; Wang, Kangping; Wei, Wei; Zhang, Rui |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9318374/ https://www.ncbi.nlm.nih.gov/pubmed/35885166 http://dx.doi.org/10.3390/e24070943 |
_version_ | 1784755274657038336 |
---|---|
author | Feng, Kai Huang, Lan Xu, Hao Wang, Kangping Wei, Wei Zhang, Rui |
author_facet | Feng, Kai Huang, Lan Xu, Hao Wang, Kangping Wei, Wei Zhang, Rui |
author_sort | Feng, Kai |
collection | PubMed |
description | Cross-lingual document retrieval, which takes a query in one language and retrieves relevant documents in another, has attracted strong research interest in recent decades. Most studies on this task start with cross-lingual comparisons at the word level and then represent documents via word embeddings, which captures too little structural information. In this work, cross-lingual comparison at the document level is achieved through a cross-lingual semantic space. Our method, MDL (deep multilabel multilingual document learning), uses a six-layer fully connected network to project cross-lingual documents into a shared semantic space. Once the documents are transformed into embeddings in this space, the semantic distances between them can be calculated. The supervision signals are automatically extracted from the data and then used to construct the semantic space via a linear classifier. This avoids the ambiguity of manual labels and yields multilabel supervision signals instead of a single label. The multilabel supervision signals enrich the representation of the semantic space, which improves the discriminative ability of the embeddings. MDL is easy to extend to other fields since it does not depend on specific data. Furthermore, MDL is more efficient than models that train all languages jointly, since each language is trained individually. Experiments on Wikipedia data showed that the proposed method outperforms state-of-the-art cross-lingual document retrieval methods. |
format | Online Article Text |
id | pubmed-9318374 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9318374 2022-07-27 Deep Multilabel Multilingual Document Learning for Cross-Lingual Document Retrieval Feng, Kai Huang, Lan Xu, Hao Wang, Kangping Wei, Wei Zhang, Rui Entropy (Basel) Article MDPI 2022-07-07 /pmc/articles/PMC9318374/ /pubmed/35885166 http://dx.doi.org/10.3390/e24070943 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Feng, Kai Huang, Lan Xu, Hao Wang, Kangping Wei, Wei Zhang, Rui Deep Multilabel Multilingual Document Learning for Cross-Lingual Document Retrieval |
title | Deep Multilabel Multilingual Document Learning for Cross-Lingual Document Retrieval |
title_full | Deep Multilabel Multilingual Document Learning for Cross-Lingual Document Retrieval |
title_fullStr | Deep Multilabel Multilingual Document Learning for Cross-Lingual Document Retrieval |
title_full_unstemmed | Deep Multilabel Multilingual Document Learning for Cross-Lingual Document Retrieval |
title_short | Deep Multilabel Multilingual Document Learning for Cross-Lingual Document Retrieval |
title_sort | deep multilabel multilingual document learning for cross-lingual document retrieval |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9318374/ https://www.ncbi.nlm.nih.gov/pubmed/35885166 http://dx.doi.org/10.3390/e24070943 |