S. Agrawal and N. Goyal, Analysis of Thompson sampling for the multi-armed bandit problem, Conference On Learning Theory (COLT), 2012.

S. Agrawal and N. Goyal, Further optimal regret bounds for Thompson sampling, Sixteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.

S. Agrawal and N. Goyal, Thompson sampling for contextual bandits with linear payoffs, 30th International Conference on Machine Learning (ICML), 2013.

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequalities, 2013.
DOI : 10.1007/978-1-4757-2440-0
URL : https://hal.archives-ouvertes.fr/hal-00751496

S. Bubeck and C. Liu, A note on the Bayesian regret of Thompson sampling with an arbitrary prior, 2013.

O. Cappé, A. Garivier, O. Maillard, R. Munos, and G. Stoltz, Kullback-Leibler upper confidence bounds for optimal sequential allocation, The Annals of Statistics, vol.41, issue.3, 2013.
DOI : 10.1214/13-AOS1119SUPP

A. Garivier and O. Cappé, The KL-UCB algorithm for bounded stochastic bandits and beyond, Conference On Learning Theory (COLT), 2011.

J. Honda and A. Takemura, An asymptotically optimal bandit algorithm for bounded support models, Conference On Learning Theory (COLT), 2010.

E. Kaufmann, N. Korda, and R. Munos, Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, Algorithmic Learning Theory, pp.199-213, 2012.
DOI : 10.1007/978-3-642-34106-9_18
URL : https://hal.archives-ouvertes.fr/hal-00830033

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8

B. C. May, N. Korda, A. Lee, and D. Leslie, Optimistic Bayesian sampling in contextual bandit problems, Journal of Machine Learning Research, vol.13, pp.2069-2106, 2012.

D. Russo and B. Van Roy, Learning to Optimize via Posterior Sampling, Mathematics of Operations Research, vol.39, issue.4, 2013.
DOI : 10.1287/moor.2014.0650

W. R. Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, issue.3-4, pp.285-294, 1933.
DOI : 10.1093/biomet/25.3-4.285

A. W. van der Vaart, Asymptotic Statistics, 1998.
DOI : 10.1017/CBO9780511802256

L. Wasserman, All of Statistics: A Concise Course in Statistical Inference, 2010.
DOI : 10.1007/978-0-387-21736-9