My Blog

Nuts and Bolts of RL
Multi-Armed Bandits and how to solve them
Policy Gradient theorem
REINFORCE what you've learnt