
Optimal Thompson Sampling strategies for support-aware CVaR bandits

Baudry Dorian, Gautron Romain, Kaufmann Emilie, Maillard Odalric-Ambrym. 2021. Optimal Thompson Sampling strategies for support-aware CVaR bandits. In: Proceedings of Machine Learning Research. Meila Marina (ed.), Zhang Tong (ed.). Cambridge: PMLR, 716-726. (Proceedings of Machine Learning Research, 139) International Conference on Machine Learning. 38, s.l., 18 July 2021/24 July 2021.

Conference paper with proceedings
Published version - English
Use subject to authorization by the author or Cirad.
baudry21a-supp.pdf


URL - other associated data: https://github.com/rgautron/DssatBanditEnv

General note: The conference was held online

Abstract: In this paper we study a multi-arm bandit problem in which the quality of each arm is measured by the Conditional Value at Risk (CVaR) at some level alpha of the reward distribution. While existing works in this setting mainly focus on Upper Confidence Bound algorithms, we introduce a new Thompson Sampling approach for CVaR bandits on bounded rewards that is flexible enough to solve a variety of problems grounded on physical resources. Building on a recent work by Riou & Honda (2020), we introduce B-CVTS for continuous bounded rewards and M-CVTS for multinomial distributions. On the theoretical side, we provide a non-trivial extension of their analysis that enables to theoretically bound their CVaR regret minimization performance. Strikingly, our results show that these strategies are the first to provably achieve asymptotic optimality in CVaR bandits, matching the corresponding asymptotic lower bounds for this setting. Further, we illustrate empirically the benefit of Thompson Sampling approaches both in a realistic environment simulating a use-case in agriculture and on various synthetic examples.

Free keywords: Bandit algorithm, Risk awareness, Sequential decision making, Machine learning

Authors and affiliations

  • Baudry Dorian, CNRS (FRA)
  • Gautron Romain, CIRAD-PERSYST-UPR AIDA (COL)
  • Kaufmann Emilie, CNRS (FRA)
  • Maillard Odalric-Ambrym, INRIA (FRA)

Source: Cirad-Agritrop (https://agritrop.cirad.fr/600903/)
