Covariance-Adaptive Least-Squares Algorithm for Stochastic Combinatorial Semi-Bandits - Apprentissage de modèles visuels à partir de données massives Accéder directement au contenu
Pré-Publication, Document De Travail (Preprint/Prepublication) Année : 2024

Covariance-Adaptive Least-Squares Algorithm for Stochastic Combinatorial Semi-Bandits

Résumé

We address the problem of stochastic combinatorial semi-bandits, where a player can select from P subsets of a set containing d base items. Most existing algorithms (e.g. CUCB, ESCB, OLS-UCB) require prior knowledge on the reward distribution, like an upper bound on a sub-Gaussian proxy-variance, which is hard to estimate tightly. In this work, we design a variance-adaptive version of OLS-UCB, relying on an online estimation of the covariance structure. Estimating the coefficients of a covariance matrix is much more manageable in practical settings and results in improved regret upper bounds compared to proxy variance-based algorithms. When covariance coefficients are all non-negative, we show that our approach efficiently leverages the semi-bandit feedback and provably outperforms bandit feedback approaches, not only in exponential regimes where P ≫ d but also when P ≤ d, which is not straightforward from most existing analyses.
Fichier principal
Vignette du fichier
main_hal.pdf (736.67 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)

Dates et versions

hal-04470568 , version 1 (22-02-2024)

Licence

Paternité

Identifiants

  • HAL Id : hal-04470568 , version 1

Citer

Julien Zhou, Pierre Gaillard, Thibaud Rahier, Houssam Zenati, Julyan Arbel. Covariance-Adaptive Least-Squares Algorithm for Stochastic Combinatorial Semi-Bandits. 2024. ⟨hal-04470568⟩
71 Consultations
63 Téléchargements

Partager

Gmail Facebook X LinkedIn More