Covariance-Adaptive Least-Squares Algorithm for Stochastic Combinatorial Semi-Bandits

We address the problem of stochastic combinatorial semi-bandits, where a player can select from P subsets of a set containing d base items. Most existing algorithms (e.g. CUCB, ESCB, OLS-UCB) require prior knowledge on the reward distribution, like an upper bound on a sub-Gaussian proxy-variance, which is hard to estimate tightly. In this work, we design a variance-adaptive version of OLS-UCB, relying on an online estimation of the covariance structure. Estimating the coefficients of a covariance matrix is much more manageable in practical settings and results in improved regret upper bounds compared to proxy variance-based algorithms. When covariance coefficients are all non-negative, we show that our approach efficiently leverages the semi-bandit feedback and provably outperforms bandit feedback approaches, not only in exponential regimes where P ≫ d but also when P ≤ d, which is not straightforward from most existing analyses.

Mots clés

Covariance-Adaptive Bandits Stochastic Combinatorial Semi-Bandit Confidence Ellipsoid

Domaines

Machine Learning [stat.ML] Apprentissage [cs.LG] Statistiques [math.ST]

Fichier principal

main_hal.pdf (736.67 Ko)

Origine : Fichiers produits par l'(les) auteur(s)

Julien Zhou : Connectez-vous pour contacter le contributeur

https://hal.science/hal-04470568

Soumis le : jeudi 22 février 2024-12:07:55

Dernière modification le : vendredi 26 avril 2024-13:52:29

Dates et versions

hal-04470568 , version 1 (22-02-2024)

Licence

Paternité

Identifiants

HAL Id : hal-04470568 , version 1

Citer

Julien Zhou, Pierre Gaillard, Thibaud Rahier, Houssam Zenati, Julyan Arbel. Covariance-Adaptive Least-Squares Algorithm for Stochastic Combinatorial Semi-Bandits. 2024. ⟨hal-04470568⟩

Exporter

BibTeX XML-TEI Dublin Core DC Terms EndNote DataCite

Collections

CEA UGA CNRS INRIA INSMI LJK LJK_GI LJK_PS INRIA2 LJK-GI-THOTH LJK-PS-STATIFY GS-COMPUTER-SCIENCE

71 Consultations

63 Téléchargements