Improved confidence bounds for the linear logistic model and applications to bandits
Published in International Conference on Machine Learning, 2021
Recommended citation: Jun, K. S., Jain, L., Mason, B., & Nassif, H. (2021, July). Improved confidence bounds for the linear logistic model and applications to bandits. In International Conference on Machine Learning (pp. 5148-5157). PMLR. http://proceedings.mlr.press/v139/jun21a/jun21a.pdf
We propose improved fixed-design confidence bounds for the linear logistic model. Our bounds significantly improve upon the state-of-the-art bound by Li et al.(2017) via recent developments of the self-concordant analysis of the logistic loss (Faury et al., 2020). Specifically, our confidence bound avoids a direct dependence on 1/kappa, where kappa is the minimal variance over all arms’ reward distributions. In general, 1/kappa scales exponentially with the norm of the unknown linear parameter theta. Instead of relying on this worst case quantity, our confidence bound for the reward of any given arm depends directly on the variance of that arm’s reward distribution. We present two applications of our novel bounds to pure exploration and regret minimization logistic bandits improving upon state-of-the-art performance guarantees. For pure exploration we also provide a lower bound highlighting a dependence on 1/kappa for a family of instances.
Recommended citation: Jun, K. S., Jain, L., Mason, B., & Nassif, H. (2021, July). Improved confidence bounds for the linear logistic model and applications to bandits. In International Conference on Machine Learning (pp. 5148-5157). PMLR.