UCB algorithm

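The UCB index of each price is the sum of an estimated "expected reward" (the empirical mean) and an "exploration bonus"; in every round, the price with the largest index is offered.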

UCB1, Auer et al. (2002)

$\text{UCB}1_{kt}=\bar{\pi}_{kt}+\sqrt{\frac{2\log t}{n_{kt}}}$
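A minimal sketch of the UCB1 index for a single price, assuming we already track the mean profit and the number of times each price has been offered. The statistics below are hypothetical and are chosen only to show how the bonus can favour a rarely tried price.

```python
import math

def ucb1_index(mean_profit: float, n_pulls: int, t: int) -> float:
    """UCB1 index: empirical mean profit plus the bonus sqrt(2 log t / n)."""
    if n_pulls == 0:
        return math.inf  # a price that was never offered is tried first
    return mean_profit + math.sqrt(2.0 * math.log(t) / n_pulls)

# Hypothetical statistics at round t = 100 for three candidate prices
means = [0.30, 0.45, 0.40]  # average (scaled) profit per price
pulls = [10, 60, 30]        # times each price was offered
t = 100
indices = [ucb1_index(m, n, t) for m, n in zip(means, pulls)]
best = max(range(len(indices)), key=indices.__getitem__)
```

Here the first price has the lowest mean but has been offered only 10 times. Its bonus (about 0.96) is large enough that its index (about 1.26) beats the others, so UCB1 offers it again.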

  • Assumes the profits of different prices are uncorrelated (each price's estimate is updated only from its own sales)

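Here $\bar{\pi}_{kt}$ is the average profit observed for price $k$ up to round $t$, and $n_{kt}$ is the number of times price $k$ has been offered. The bonus term is the special case $\alpha=2$ of $\sqrt{\frac{\alpha\log t}{n_{kt}}}$; the paper proposes a different value of $\alpha$ to achieve better performance.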
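A minimal simulation sketch of UCB pricing with a tunable $\alpha$ in the bonus $\sqrt{\alpha\log t / n_{kt}}$. The price grid, the demand curve `purchase_prob`, and the scaling of profits by the highest price (so rewards lie in [0, 1], as UCB1 assumes) are all illustrative assumptions. This is not the paper's model, and it does not reproduce the paper's proposed value of $\alpha$.

```python
import math
import random

def ucb_pricing(prices, purchase_prob, horizon, alpha=2.0, seed=0):
    """Run a UCB policy with bonus sqrt(alpha * log t / n_k) over a fixed price grid.

    purchase_prob is a hypothetical demand curve: the probability that a
    customer buys at each price. The profit in a round is the price on a sale,
    else 0. Profits are scaled by max(prices) so estimated rewards lie in [0, 1].
    """
    rng = random.Random(seed)
    k = len(prices)
    p_max = max(prices)
    counts = [0] * k    # n_k: times each price was offered
    sums = [0.0] * k    # running sum of scaled profit per price
    total_profit = 0.0  # unscaled profit collected
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # offer every price once to initialise the estimates
        else:
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(alpha * math.log(t) / counts[i]))
        sale = rng.random() < purchase_prob[arm]
        profit = prices[arm] if sale else 0.0
        counts[arm] += 1
        sums[arm] += profit / p_max
        total_profit += profit
    return counts, total_profit

# Hypothetical market: expected profits per price are 0.9, 1.2, 1.5, 0.8
prices = [1.0, 2.0, 3.0, 4.0]
demand = [0.9, 0.6, 0.5, 0.2]
results = {}
for a in (2.0, 0.5):  # alpha = 2 is UCB1; a smaller alpha explores less
    results[a] = ucb_pricing(prices, demand, horizon=5000, alpha=a)
```

Comparing `results[2.0]` with `results[0.5]` shows the trade-off that $\alpha$ controls. A smaller $\alpha$ shrinks the bonus and concentrates offers on the currently best-looking price sooner. A larger $\alpha$ keeps sampling the other prices for longer.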

Pricing with Federated Learning
Xuhang Fan, Duke University
Dynamic Online Pricing Using MAB Experiments
2023/01/01