Improved Sample Complexity Bounds for Distributionally Robust Reinforcement Learning

This repository studies the behavior of the robust value iteration algorithm. In particular, our paper introduces RPVL for finite-horizon tabular MDPs. The Gambler's problem (also known as Gambler's Ruin) serves as a simple example for characterizing the behavior of an optimal robust policy.
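To make the setting concrete, here is a minimal sketch of finite-horizon robust value iteration on the Gambler's problem. It is not the RPVL algorithm from the paper. It assumes a simple interval uncertainty set on the win probability, and the function name `robust_gambler_vi` and the parameters `goal`, `horizon`, `p_nominal`, and `radius` are hypothetical names chosen for illustration.

```python
import numpy as np


def robust_gambler_vi(goal=10, horizon=20, p_nominal=0.5, radius=0.1):
    """Finite-horizon robust value iteration for the Gambler's problem.

    Assumption (not from the paper): at every (step, capital, stake) the
    adversary may pick any win probability in the interval
    [p_nominal - radius, p_nominal + radius], clipped to [0, 1].
    """
    p_lo = max(0.0, p_nominal - radius)
    p_hi = min(1.0, p_nominal + radius)

    # V[h, s]: worst-case probability of reaching `goal` by the end of the
    # horizon, starting step h with capital s.
    V = np.zeros((horizon + 1, goal + 1))
    V[:, goal] = 1.0  # the goal is absorbing; capital 0 stays at value 0
    policy = np.zeros((horizon, goal + 1), dtype=int)

    # Backward induction over the horizon.
    for h in range(horizon - 1, -1, -1):
        for s in range(1, goal):
            best_value, best_stake = -1.0, 0
            for stake in range(1, min(s, goal - s) + 1):
                win = V[h + 1, s + stake]
                lose = V[h + 1, s - stake]
                # The backup is linear in p, so the worst case over the
                # interval is attained at one of its two endpoints.
                worst = min(p * win + (1.0 - p) * lose for p in (p_lo, p_hi))
                if worst > best_value + 1e-12:
                    best_value, best_stake = worst, stake
            V[h, s] = best_value
            policy[h, s] = best_stake
    return V, policy


if __name__ == "__main__":
    V_robust, pi_robust = robust_gambler_vi(radius=0.1)
    V_nominal, pi_nominal = robust_gambler_vi(radius=0.0)
    print("robust values at step 0: ", np.round(V_robust[0], 3))
    print("nominal values at step 0:", np.round(V_nominal[0], 3))
    print("robust stakes at step 0: ", pi_robust[0])
```

Setting `radius=0.0` recovers ordinary (non-robust) value iteration, so comparing the two outputs shows how the worst-case model changes the values and the chosen stakes.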