Criterion of performance
Use minimax regret: minimize the maximum deviation from the optimal profit.

$$
\begin{aligned}
\mathrm{Regret}(\Psi,\{\pi(p_k)\},t) &= \mathbb{E}\left[\sum_{\tau=1}^{t}\bigl(\pi^*-\pi(p_\tau)\bigr)\right] \\
&= \pi^* t-\mathbb{E}\left[\sum_{\tau=1}^{t}\pi(p_\tau)\right] \\
&= \pi^* t-\sum_{k=1}^{K}\pi(p_k)\,\mathbb{E}[n_{kt}]
\end{aligned}
$$
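The last line of the decomposition can be checked numerically. The sketch below is illustrative only: the function name `expected_regret` and the sample numbers are hypothetical, and it assumes that π* is the largest true profit among the K candidate prices and that the expected pull counts E[n_kt] sum to t.

```python
from typing import Sequence


def expected_regret(profits: Sequence[float], expected_pulls: Sequence[float]) -> float:
    """Return pi* * t - sum_k pi(p_k) * E[n_kt].

    profits[k]        -- true expected profit pi(p_k) of price k (assumed known here)
    expected_pulls[k] -- expected number of times E[n_kt] price k is chosen by time t
    """
    if len(profits) != len(expected_pulls):
        raise ValueError("profits and expected_pulls must have the same length")
    t = sum(expected_pulls)   # total number of periods, assuming the counts sum to t
    best = max(profits)       # pi*, assumed to be the best of the K candidate prices
    earned = sum(p * n for p, n in zip(profits, expected_pulls))
    return best * t - earned


# Hypothetical example: three prices, 100 periods in total.
profits = [1.0, 2.0, 1.5]
expected_pulls = [10.0, 70.0, 20.0]
regret = expected_regret(profits, expected_pulls)
print(regret)  # 2.0 * 100 - (10 + 140 + 30) = 20.0
```

Regret is zero only when every period is spent on the best price, so a policy Ψ lowers regret by keeping E[n_kt] small for the suboptimal prices.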

Xuhang Fan, Duke University
Dynamic Online Pricing Using MAB Experiments
2023/01/01