Distributionally robust learning-to-rank under the Wasserstein metric
Despite their satisfactory performance, most existing listwise Learning-To-Rank (LTR) models do not consider the crucial issue of robustness. A data set can be contaminated in various ways, including human error in labeling or annotation, distributional data shift, and malicious adversaries who wish to degrade the algorithm's performance.
Main Authors: Sotudian, Shahabeddin; Chen, Ruidi; Paschalidis, Ioannis Ch.
Format: Online Article Text
Language: English
Published: Public Library of Science, 2023
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10062629/ https://www.ncbi.nlm.nih.gov/pubmed/36996130 http://dx.doi.org/10.1371/journal.pone.0283574
_version_ | 1785017536987791360 |
author | Sotudian, Shahabeddin; Chen, Ruidi; Paschalidis, Ioannis Ch.
author_facet | Sotudian, Shahabeddin; Chen, Ruidi; Paschalidis, Ioannis Ch.
author_sort | Sotudian, Shahabeddin |
collection | PubMed |
description | Despite their satisfactory performance, most existing listwise Learning-To-Rank (LTR) models do not consider the crucial issue of robustness. A data set can be contaminated in various ways, including human error in labeling or annotation, distributional data shift, and malicious adversaries who wish to degrade the algorithm's performance. Distributionally Robust Optimization (DRO) has been shown to be resilient against such noise and perturbations. To fill this gap, we introduce a new listwise LTR model called Distributionally Robust Multi-output Regression Ranking (DRMRR). Unlike existing methods, the scoring function of DRMRR is designed as a multivariate mapping from a feature vector to a vector of deviation scores, which captures local context information and cross-document interactions; this allows the LTR metrics to be incorporated directly into the model. DRMRR uses a Wasserstein DRO framework to minimize a multi-output loss function under the most adverse distributions in the neighborhood of the empirical data distribution, defined by a Wasserstein ball. We present a compact and computationally solvable reformulation of the resulting min-max problem. Experiments on two real-world applications, medical document retrieval and drug response prediction, show that DRMRR notably outperforms state-of-the-art LTR models. We also conducted an extensive analysis of DRMRR's resilience against various types of noise: Gaussian noise, adversarial perturbations, and label poisoning. DRMRR not only achieves significantly better performance than the other baselines, but also maintains relatively stable performance as more noise is added to the data.
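As a reading aid, here is a minimal sketch of the min-max problem the description refers to. The notation (loss ℓ_θ, Wasserstein order, radius ε) is ours and chosen for illustration; the paper's exact formulation may differ.

```latex
% Sketch of the Wasserstein DRO objective (our notation, not taken from the paper):
%   \hat{P}_N    empirical distribution of the N training pairs (x_i, y_i),
%                where y_i is the vector of deviation scores for sample i
%   \epsilon     radius of the Wasserstein ambiguity ball
%   \ell_\theta  multi-output ranking loss with model parameters \theta
\min_{\theta} \;
  \sup_{Q \,:\, W_1(Q,\, \hat{P}_N) \le \epsilon}
  \mathbb{E}_{(x,y) \sim Q}\!\left[ \ell_\theta(x, y) \right],
\qquad
\hat{P}_N = \frac{1}{N} \sum_{i=1}^{N} \delta_{(x_i, y_i)} .
```

The inner supremum ranges over all distributions within ε (in Wasserstein distance) of the empirical distribution; the "compact and computationally solvable reformulation" mentioned above replaces this infinite-dimensional problem with a finite, tractable one. To make the multi-output scoring idea concrete, here is a toy stand-in in Python: plain ridge regression plays the role of the learned scoring function, all names, shapes, and data are invented, and the DRO objective itself is not implemented.

```python
# Toy sketch of multi-output regression ranking (hypothetical, not the
# authors' code): map each document's feature vector to a *vector* of
# deviation scores, then rank documents by an aggregate of that vector.
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_features, n_outputs = 100, 20, 5

X = rng.normal(size=(n_docs, n_features))   # document feature vectors
Y = rng.normal(size=(n_docs, n_outputs))    # deviation-score targets (synthetic)

# Multi-output linear scoring function theta: R^d -> R^k,
# fit by ridge regression as a stand-in for the robust objective.
lam = 0.1
theta = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ Y)

scores = X @ theta                           # predicted deviation-score vectors
ranking = np.argsort(-scores.mean(axis=1))   # rank by mean predicted score
print(ranking[:10])                          # indices of the top-10 documents
```

Ranking by the mean of the predicted score vector is one simple aggregation choice; the point is only that the scoring function is vector-valued, so cross-document and per-position information can feed into the ranking step.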
format | Online Article Text |
id | pubmed-10062629 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-10062629 2023-03-31 Distributionally robust learning-to-rank under the Wasserstein metric Sotudian, Shahabeddin; Chen, Ruidi; Paschalidis, Ioannis Ch. PLoS One Research Article Public Library of Science 2023-03-30 /pmc/articles/PMC10062629/ /pubmed/36996130 http://dx.doi.org/10.1371/journal.pone.0283574 Text en © 2023 Sotudian et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle | Research Article Sotudian, Shahabeddin; Chen, Ruidi; Paschalidis, Ioannis Ch. Distributionally robust learning-to-rank under the Wasserstein metric
title | Distributionally robust learning-to-rank under the Wasserstein metric |
title_full | Distributionally robust learning-to-rank under the Wasserstein metric |
title_fullStr | Distributionally robust learning-to-rank under the Wasserstein metric |
title_full_unstemmed | Distributionally robust learning-to-rank under the Wasserstein metric |
title_short | Distributionally robust learning-to-rank under the Wasserstein metric |
title_sort | distributionally robust learning-to-rank under the wasserstein metric |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10062629/ https://www.ncbi.nlm.nih.gov/pubmed/36996130 http://dx.doi.org/10.1371/journal.pone.0283574 |
work_keys_str_mv | AT sotudianshahabeddin distributionallyrobustlearningtorankunderthewassersteinmetric AT chenruidi distributionallyrobustlearningtorankunderthewassersteinmetric AT paschalidisioannisch distributionallyrobustlearningtorankunderthewassersteinmetric |