Skip to content

Latest commit

 

History

History
56 lines (56 loc) · 2.1 KB

2024-04-18-baudry24a.md

File metadata and controls

56 lines (56 loc) · 2.1 KB
title software abstract layout series publisher issn id month tex_title firstpage lastpage page order cycles bibtex_author author date address container-title volume genre issued pdf extras
Multi-armed bandits with guaranteed revenue per arm
We consider a Multi-Armed Bandit problem with covering constraints, where the primary goal is to ensure that each arm receives a minimum expected reward while maximizing the total cumulative reward. In this scenario, the optimal policy then belongs to some unknown feasible set. Unlike much of the existing literature, we do not assume the presence of a safe policy or a feasibility margin, which hinders the exclusive use of conservative approaches. Consequently, we propose and analyze an algorithm that switches between pessimism and optimism in the face of uncertainty. We prove both precise problem-dependent and problem-independent bounds, demonstrating that our algorithm achieves the best of the two approaches—depending on the presence or absence of a feasibility margin—in terms of constraint violation guarantees. Furthermore, our results indicate that playing greedily on the constraints actually outperforms pessimism when considering long-term violations rather than violations on a per-round basis.
inproceedings
Proceedings of Machine Learning Research
PMLR
2640-3498
baudry24a
0
Multi-armed bandits with guaranteed revenue per arm
379
387
379-387
379
false
Baudry, Dorian and Merlis, Nadav and Benjamin Molina, Mathieu and Richard, Hugo and Perchet, Vianney
given family
Dorian
Baudry
given family
Nadav
Merlis
given family
Mathieu
Benjamin Molina
given family
Hugo
Richard
given family
Vianney
Perchet
2024-04-18
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
238
inproceedings
date-parts
2024
4
18