UCB with Partial Identification (PI)

UCB-PI and UCB-PI-untuned

UCB-PI-untunedkt={πˉkt+pk2logtnkt if UBt(π(pk))>maxlLBt(π(pl)),0 if UBt(π(pk))maxlLBt(π(pl)).UCB-PI-tunedkt={πˉkt+2pkδ^logtnktmin(14, Vkt) if UBt(π(pk))>maxlLBt(π(pl)),0 if UBt(π(pk))maxlLBt(π(pl)).\begin{align} \text{UCB-PI-untuned}_{k t}&=\{\begin{array}{ll} \bar{\pi}_{k t}+p_{k} \sqrt{\frac{2 \log t}{n_{k t}}} & \text { if } U B_{t}(\pi(p_{k}))>\max _{l} L B_{t}(\pi(p_{l})), \\ 0 & \text { if } U B_{t}(\pi(p_{k})) \leq \max _{l} L B_{t}(\pi(p_{l})) . \end{array} \\ \text{UCB-PI-tuned}_{k t}&=\{\begin{aligned} \bar{\pi}_{k t}+ & 2 p_{k} \hat{\delta} \sqrt{\frac{\log t}{n_{k t}} \min (\frac{1}{4}, \mathrm{~V}_{k t})} & \text{ if } U B_{t}(\pi(p_{k}))>\max _{l} L B_{t}(\pi(p_{l})), \\ 0 \quad & \text{ if } U B_{t}(\pi(p_{k})) \leq \max _{l} L B_{t}(\pi(p_{l})) . \end{aligned} \end{align}

Improvements in UCB-PI-untuned:

  • assign an action a zero value if the upper bound of potential returns is lower than the highest lower bound across all actions. Economically, we do not explore dominated options (the optimal return is still lower than the other lower returns).
  • scale the exploration bonus by price pkp_k since the original reward in UCB1 is [0,1][0, 1] but now [0,pk][0, p_k] for dummy demand.

Improvements in UCB-PI-tuned: add an additional tuning factor 2δ^2\hat{\delta}: size of uncertainty ↑ exploration bonus↑

Short Summary
Model setup
Modified Algorithms
Some Thoughts
Pricing with Federated Learning
Xuhang Fan, Duke University
Dynamic Online Pricing Using MAB Experiments
12 / 19
2023/01/01