This repository includes implementations and performance reports of several bandit algorithms. The study covers:
2. Upper Confidence Bound (UCB)
5. Linear Thompson Sampling (LinTS)
6. Generalized Linear Model Bandit (GLM)
To simulate a specific algorithm, edit the Simulation.py
script by enabling the desired algorithm and disabling the others.
For example, to run the UCB algorithm with
## Initiate Bandit Algorithms ##
algorithms = {}
#algorithms['EpsilonGreedyLinearBandit'] = EpsilonGreedyLinearBandit(dimension=context_dimension, lambda_=0.1, epsilon=None)
#algorithms['EpsilonGreedyMultiArmedBandit'] = EpsilonGreedyMultiArmedBandit(num_arm=n_articles, epsilon=0.1)
#algorithms['ExplorethenCommit'] = ExplorethenCommit(num_arm=n_articles, m=30)
algorithms['UCBBandit'] = UCBBandit(num_arm=n_articles, alpha=0.5)
#algorithms['ThompsonSamplingGaussianMAB'] = ThompsonSamplingGaussianMAB(num_arm=n_articles)
#algorithms['LinearUCBBandit'] = LinearUCBBandit(dimension=context_dimension, lambda_=0.1, alpha=0.5) #delta=0.05, alpha=2.358
#algorithms['LinearThompsonSamplingMAB'] = LinearThompsonSamplingMAB(dimension=context_dimension, lambda_=0.1)
After selecting your algorithm, run the Simulation.py
script.
Hyperparameter (m) | Cumulative Regret |
---|---|
10 | 1001.40 |
20 | 214.90 |
30 | 334.02 |
data:image/s3,"s3://crabby-images/58262/58262fe22b529f752486d09eb565db907d57fe6c" alt=""
data:image/s3,"s3://crabby-images/5f7a1/5f7a1458f986e6dadd2fa1b3608f89e265d27947" alt=""
data:image/s3,"s3://crabby-images/eef18/eef18f4b2a7c9e17e06657d23c14d204567c202d" alt=""
(a) m = 10 (b) m = 20 (c) m = 30
Figure 1: Explore then Commit accumulated regret
Hyperparameter (α) | Cumulative Regret |
---|---|
0.1 | 256.50 |
0.5 | 977.03 |
1.0 | 1906.65 |
data:image/s3,"s3://crabby-images/861f6/861f6d37efd611592c38d201fa43848f268f0b08" alt=""
data:image/s3,"s3://crabby-images/cd56d/cd56da15a5d6653ed7912c822e1d1886443365c3" alt=""
data:image/s3,"s3://crabby-images/f53db/f53dbbb9baa303a3e18464602f695ec58ace073c" alt=""
(a)
Figure 2: UCB Bandit accumulated regret
Hyperparameter (α) | Cumulative Regret |
---|---|
0.5 | 24.43 |
1.5 | 177.89 |
2.5 | 487.73 |
data:image/s3,"s3://crabby-images/b165b/b165b022ca7bbfb00d5f3a5bcbe99888ab623830" alt=""
data:image/s3,"s3://crabby-images/d53f5/d53f52a83524cf684494aedbfe2ff45f0078f92c" alt=""
data:image/s3,"s3://crabby-images/33f0b/33f0b24e10a36152f7772e353999184e0365bdca" alt=""
(a)
Figure 4: Linear UCB accumulated regret
data:image/s3,"s3://crabby-images/58663/586632d367295b2f736264e737a8dc74151c13e3" alt=""
data:image/s3,"s3://crabby-images/dd249/dd249e34ad6a3058221ca9607cde9b156b9b2d32" alt=""
data:image/s3,"s3://crabby-images/64dd5/64dd5f71e14cdbdf3c15d0cdb7e8159bf53377fd" alt=""
(a)
Figure 5: Linear UCB estimation error
Cumulative Regret |
---|
1098.24 |
data:image/s3,"s3://crabby-images/8e314/8e314daf05c5c123e79bdbab9c9e187ab1ce8337" alt=""
data:image/s3,"s3://crabby-images/6a596/6a596c1bcfbf047cc5db0989fd6fa3d55b4e16dd" alt=""
Figure 6: Linear Thompson Sampling accumulated regret and estimation error