Commit

Update codes
ZhiqingXiao committed Nov 26, 2023
1 parent bb5d7a0 commit 80c6443
Showing 9 changed files with 38 additions and 12 deletions.
2 changes: 1 addition & 1 deletion en2023/code/PongNoFrameskip-v4_CategoricalDQN_tf.html
@@ -14826,7 +14826,7 @@ <h1 id="Use-Categorical-DQN-to-Play-Pong">Use Categorical DQN to Play Pong<a cla

<span class="k">def</span> <span class="nf">build_net</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">action_n</span><span class="p">,</span> <span class="n">atom_count</span><span class="p">):</span>
<span class="n">net</span> <span class="o">=</span> <span class="n">keras</span><span class="o">.</span><span class="n">Sequential</span><span class="p">([</span>
-<span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Permute</span><span class="p">((</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">84</span><span class="p">,</span> <span class="mi">84</span><span class="p">)),</span>
+<span class="n">layers</span><span class="o">.</span><span class="n">Permute</span><span class="p">((</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">84</span><span class="p">,</span> <span class="mi">84</span><span class="p">)),</span>
<span class="n">layers</span><span class="o">.</span><span class="n">Conv2D</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">strides</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="n">nn</span><span class="o">.</span><span class="n">relu</span><span class="p">),</span>
<span class="n">layers</span><span class="o">.</span><span class="n">Conv2D</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">strides</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="n">nn</span><span class="o">.</span><span class="n">relu</span><span class="p">),</span>
<span class="n">layers</span><span class="o">.</span><span class="n">Conv2D</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">strides</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="n">nn</span><span class="o">.</span><span class="n">relu</span><span class="p">),</span>
2 changes: 1 addition & 1 deletion en2023/code/PongNoFrameskip-v4_CategoricalDQN_tf.ipynb
@@ -203,7 +203,7 @@
"\n",
" def build_net(self, action_n, atom_count):\n",
" net = keras.Sequential([\n",
-" keras.layers.Permute((2, 3, 1), input_shape=(4, 84, 84)),\n",
+" layers.Permute((2, 3, 1), input_shape=(4, 84, 84)),\n",
" layers.Conv2D(32, kernel_size=8, strides=4, activation=nn.relu),\n",
" layers.Conv2D(64, kernel_size=4, strides=2, activation=nn.relu),\n",
" layers.Conv2D(64, kernel_size=3, strides=1, activation=nn.relu),\n",
2 changes: 1 addition & 1 deletion en2023/code/PongNoFrameskip-v4_IQN_tf.html
@@ -14812,7 +14812,7 @@ <h1 id="Use-Implict-Quantile-Network-to-Play-Pong">Use Implict Quantile Network
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">cosine_count</span> <span class="o">=</span> <span class="n">cosine_count</span>
<span class="bp">self</span><span class="o">.</span><span class="n">conv</span> <span class="o">=</span> <span class="n">keras</span><span class="o">.</span><span class="n">Sequential</span><span class="p">([</span>
-<span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Permute</span><span class="p">((</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">84</span><span class="p">,</span> <span class="mi">84</span><span class="p">)),</span>
+<span class="n">layers</span><span class="o">.</span><span class="n">Permute</span><span class="p">((</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">84</span><span class="p">,</span> <span class="mi">84</span><span class="p">)),</span>
<span class="n">layers</span><span class="o">.</span><span class="n">Conv2D</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">strides</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="n">nn</span><span class="o">.</span><span class="n">relu</span><span class="p">),</span>
<span class="n">layers</span><span class="o">.</span><span class="n">Conv2D</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">strides</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="n">nn</span><span class="o">.</span><span class="n">relu</span><span class="p">),</span>
<span class="n">layers</span><span class="o">.</span><span class="n">Conv2D</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">strides</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="n">nn</span><span class="o">.</span><span class="n">relu</span><span class="p">),</span>
2 changes: 1 addition & 1 deletion en2023/code/PongNoFrameskip-v4_IQN_tf.ipynb
@@ -189,7 +189,7 @@
" super().__init__()\n",
" self.cosine_count = cosine_count\n",
" self.conv = keras.Sequential([\n",
-" keras.layers.Permute((2, 3, 1), input_shape=(4, 84, 84)),\n",
+" layers.Permute((2, 3, 1), input_shape=(4, 84, 84)),\n",
" layers.Conv2D(32, kernel_size=8, strides=4, activation=nn.relu),\n",
" layers.Conv2D(64, kernel_size=4, strides=2, activation=nn.relu),\n",
" layers.Conv2D(64, kernel_size=3, strides=1, activation=nn.relu),\n",
2 changes: 1 addition & 1 deletion en2023/code/PongNoFrameskip-v4_QRDQN_tf.html
@@ -14824,7 +14824,7 @@ <h1 id="Use-QR-DQN-to-Play-Pong">Use QR-DQN to Play Pong<a class="anchor-link" h

<span class="k">def</span> <span class="nf">build_net</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">action_n</span><span class="p">,</span> <span class="n">quantile_count</span><span class="p">):</span>
<span class="n">net</span> <span class="o">=</span> <span class="n">keras</span><span class="o">.</span><span class="n">Sequential</span><span class="p">([</span>
-<span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Permute</span><span class="p">((</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">84</span><span class="p">,</span> <span class="mi">84</span><span class="p">)),</span>
+<span class="n">layers</span><span class="o">.</span><span class="n">Permute</span><span class="p">((</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">84</span><span class="p">,</span> <span class="mi">84</span><span class="p">)),</span>
<span class="n">layers</span><span class="o">.</span><span class="n">Conv2D</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">strides</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="n">nn</span><span class="o">.</span><span class="n">relu</span><span class="p">),</span>
<span class="n">layers</span><span class="o">.</span><span class="n">Conv2D</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">strides</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="n">nn</span><span class="o">.</span><span class="n">relu</span><span class="p">),</span>
<span class="n">layers</span><span class="o">.</span><span class="n">Conv2D</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">strides</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="n">nn</span><span class="o">.</span><span class="n">relu</span><span class="p">),</span>
2 changes: 1 addition & 1 deletion en2023/code/PongNoFrameskip-v4_QRDQN_tf.ipynb
@@ -201,7 +201,7 @@
"\n",
" def build_net(self, action_n, quantile_count):\n",
" net = keras.Sequential([\n",
-" keras.layers.Permute((2, 3, 1), input_shape=(4, 84, 84)),\n",
+" layers.Permute((2, 3, 1), input_shape=(4, 84, 84)),\n",
" layers.Conv2D(32, kernel_size=8, strides=4, activation=nn.relu),\n",
" layers.Conv2D(64, kernel_size=4, strides=2, activation=nn.relu),\n",
" layers.Conv2D(64, kernel_size=3, strides=1, activation=nn.relu),\n",
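The six code hunks above all touch the same `Permute((2, 3, 1), input_shape=(4, 84, 84))` line. As a rough illustration of what that layer does to shapes, here is a minimal sketch in plain Python. It assumes the Keras convention that `dims` is 1-indexed and excludes the batch axis; `permute_shape` is a hypothetical helper written for this note, not a Keras function.

```python
def permute_shape(input_shape, dims):
    """Per-sample output shape of a Keras-style Permute layer (sketch).

    `dims` is 1-indexed and excludes the batch axis, so output axis i
    takes its size from input axis dims[i].
    """
    if sorted(dims) != list(range(1, len(input_shape) + 1)):
        raise ValueError("dims must be a permutation of 1..len(input_shape)")
    return tuple(input_shape[d - 1] for d in dims)


# Four stacked 84x84 frames, channels first, as in the hunks above.
channels_first = (4, 84, 84)
channels_last = permute_shape(channels_first, (2, 3, 1))
print(channels_last)  # (84, 84, 4)
```

The result is a channels-last shape, which is the layout `Conv2D` expects by default.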
2 changes: 1 addition & 1 deletion en2023/notation.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion en2023/notation_zh.html

Large diffs are not rendered by default.

34 changes: 30 additions & 4 deletions zh2023/errata/202307.md
@@ -32,22 +32,31 @@ $\sum\limits_{t=0}^{+\infty}$

## Page 42, lines 3-4

-$p_\ast\left(\mathsfit{s'},\mathsfit{a'}|\mathsfit{s},\mathsfit{a}\right)=\sum\limits_{\mathsfit{a'}}{\pi_\ast\left(\mathsfit{a'}\mid\mathsfit{s'} \right)\sum\limits_\mathsfit{a}{p\left(\mathsfit{s'}\mid\mathsfit{s},\mathsfit{a}\right)}}$,
+$p_\ast\left(\mathsfit{s'},\mathsfit{a'}\middle\vert\mathsfit{s},\mathsfit{a}\right)=\sum\limits_{\mathsfit{a'}}{\pi_\ast\left(\mathsfit{a'}\middle\vert\mathsfit{s'} \right)\sum\limits_\mathsfit{a}{p\left(\mathsfit{s'}\middle\vert\mathsfit{s},\mathsfit{a}\right)}}$,

$\mathsfit{s}\in\mathcal{S},\mathsfit{a}\in\mathcal{A}\left(\mathsfit{s}\right),\mathsfit{s'}\in\mathcal{S},\mathsfit{a}\in\mathcal{A}\left(\mathsfit{s'}\right)$

#### Change to

-$p_\ast\left({\mathsfit{s'},\mathsfit{a'}|\mathsfit{s},\mathsfit{a}}\right)=\pi_\ast\left(\mathsfit{a'}\mid\mathsfit{s'}\right)p\left( \mathsfit{s'}\mid\mathsfit{s},\mathsfit{a}\right),\quad\mathsfit{s}\in\mathcal{S},\mathsfit{a}\in\mathcal{A}\left(\mathsfit{s}\right),\mathsfit{s'}\in\mathcal{S},\mathsfit{a'}\in\mathcal{A}\left(\mathsfit{s'}\right)$
+$p_\ast\left({\mathsfit{s'},\mathsfit{a'}|\mathsfit{s},\mathsfit{a}}\right)=\pi_\ast\left(\mathsfit{a'}\middle\vert\mathsfit{s'}\right)p\left( \mathsfit{s'}\mid\mathsfit{s},\mathsfit{a}\right),\quad\mathsfit{s}\in\mathcal{S},\mathsfit{a}\in\mathcal{A}\left(\mathsfit{s}\right),\mathsfit{s'}\in\mathcal{S},\mathsfit{a'}\in\mathcal{A}\left(\mathsfit{s'}\right)$
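
The factored form $p_\ast(s',a'\mid s,a)=\pi_\ast(a'\mid s')\,p(s'\mid s,a)$ can be sanity-checked numerically: for every $(s,a)$ it should sum to 1 over all $(s',a')$. The sketch below does this on a made-up two-state MDP. The state names, action names, and probabilities are illustrative assumptions, and `state_action_transition` is a hypothetical helper, not code from the book.

```python
import math


def state_action_transition(p, pi_star):
    """Build p_*(s', a' | s, a) = pi_*(a' | s') * p(s' | s, a) (sketch)."""
    return {
        sa: {
            (s_next, a_next): p_next * pi_star[s_next][a_next]
            for s_next, p_next in row.items()
            for a_next in pi_star[s_next]
        }
        for sa, row in p.items()
    }


# Hypothetical dynamics p(s' | s, a), keyed by (s, a).
p = {
    ("s0", "left"): {"s0": 0.7, "s1": 0.3},
    ("s0", "right"): {"s0": 0.1, "s1": 0.9},
    ("s1", "left"): {"s0": 0.5, "s1": 0.5},
}
# Hypothetical policy pi_*(a | s); state s1 only allows "left".
pi_star = {
    "s0": {"left": 0.4, "right": 0.6},
    "s1": {"left": 1.0},
}

p_star = state_action_transition(p, pi_star)

# Each conditional distribution over (s', a') sums to 1.
for sa, row in p_star.items():
    assert math.isclose(sum(row.values()), 1.0), sa
```

Note that there is no summation over $a$ or $a'$ on the right-hand side: both are fixed by the left-hand side, and $a'$ ranges over $\mathcal{A}(s')$.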


+## Page 80, line 10 from the bottom
+
+$\alpha _k\mathrm{E}\left[\left|F{\left(X_ {k-1}\right)}^2\right|\middle\vert{X}_ {k-1}\right]$
+
+#### Change to
+
+$\alpha _k\mathrm{E}\left[\left|F\left(X_ {k-1}\right)\right|^2\middle\vert{X}_ {k-1}\right]$


## Page 117, last full-width math expression

-$\rho_{t+1:t+n-1}=\frac{\Pr_\pi\left[R_{t+1},\mathsfit{S}_{t+1},\mathsfit{A}_{t+1},\ldots,\mathsfit{S}_{t+n}\mid\mathsfit{S}_t\right]}{\Pr_b\left[R_{t+1},\mathsfit{S}_{t+1},\mathsfit{A}_{t+1},\ldots,\mathsfit{S}_{t+n}\mid\mathsfit{S}_t\right]}=\prod\limits_{\tau=t+1}^{t+n-1}{\frac{\pi\left(\mathsfit{A}_\tau\mid\mathsfit{S}_\tau\right)}{b\left(\mathsfit{A}_\tau\mid\mathsfit{S}_\tau\right)}}$
+$\rho_{t+1:t+n-1}=\frac{\Pr_\pi\left[R_{t+1},\mathsfit{S}_{t+1},\mathsfit{A}_{t+1},\ldots,\mathsfit{S}_{t+n}\middle\vert\mathsfit{S}_t\right]}{\Pr_b\left[R_{t+1},\mathsfit{S}_{t+1},\mathsfit{A}_{t+1},\ldots,\mathsfit{S}_{t+n}\middle\vert\mathsfit{S}_t\right]}=\prod\limits_{\tau=t+1}^{t+n-1}{\frac{\pi\left(\mathsfit{A}_\tau\mid\mathsfit{S}_\tau\right)}{b\left(\mathsfit{A}_\tau\mid\mathsfit{S}_\tau\right)}}$

#### Change to

-$\rho_{t+1:t+n-1}=\frac{\Pr_\pi\left[R_{t+1},\mathsfit{S}_{t+1},\mathsfit{A}_{t+1},\ldots,\mathsfit{S}_{t+n}\mid\mathsfit{S}_t,\mathsfit{A}_t\right]}{\Pr_b\left[R_{t+1},\mathsfit{S}_{t+1},\mathsfit{A}_{t+1},\ldots,\mathsfit{S}_{t+n}\mid\mathsfit{S}_t,\mathsfit{A}_t\right]}=\prod\limits_{\tau=t+1}^{t+n-1}{\frac{\pi\left(\mathsfit{A}_\tau\mid\mathsfit{S}_\tau\right)}{b\left(\mathsfit{A}_\tau\mid\mathsfit{S}_\tau\right)}}$
+$\rho_{t+1:t+n-1}=\frac{\Pr_\pi\left[R_{t+1},\mathsfit{S}_{t+1},\mathsfit{A}_{t+1},\ldots,\mathsfit{S}_{t+n}\mid\mathsfit{S}_t,\mathsfit{A}_t\right]}{\Pr_b\left[R_{t+1},\mathsfit{S}_{t+1},\mathsfit{A}_{t+1},\ldots,\mathsfit{S}_{t+n}\mid\mathsfit{S}_t,\mathsfit{A}_t\right]}=\prod\limits_{\tau=t+1}^{t+n-1}{\frac{\pi\left(\mathsfit{A}_\tau\middle\vert\mathsfit{S}_\tau\right)}{b\left(\mathsfit{A}_\tau\middle\vert\mathsfit{S}_\tau\right)}}$
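
Because the trajectory probability is conditioned on both $\mathsfit{S}_t$ and $\mathsfit{A}_t$, the action at time $t$ contributes no factor, and the product runs over $\tau=t+1,\ldots,t+n-1$ only. A minimal Python sketch of that product follows. The tabular policies and the sample trajectory are made-up values for illustration, and `importance_ratio` is a hypothetical helper name.

```python
import math


def importance_ratio(pi, b, states, actions, t, n):
    """rho_{t+1:t+n-1}: product of pi(A_tau|S_tau) / b(A_tau|S_tau)
    for tau = t+1, ..., t+n-1 (sketch with tabular policies)."""
    return math.prod(
        pi[states[tau]][actions[tau]] / b[states[tau]][actions[tau]]
        for tau in range(t + 1, t + n)
    )


# Hypothetical target policy pi and uniform behavior policy b.
pi = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
b = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}

# Hypothetical trajectory: S_0..S_3 and A_0..A_3.
states = [0, 1, 0, 1]
actions = [1, 1, 0, 0]

rho = importance_ratio(pi, b, states, actions, t=0, n=3)
print(round(rho, 2))  # 2.88, i.e. (0.8 / 0.5) * (0.9 / 0.5)
```

With `n=1` the range is empty and the ratio is 1, which matches the convention that a one-step target conditioned on $(\mathsfit{S}_t,\mathsfit{A}_t)$ needs no correction.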


## Page 177, last line
@@ -59,6 +68,23 @@ $\gamma^2\mathrm{E}_{\pi\left(\boldsymbol\theta\right)}\left[\nabla{v_{\pi\left(
$\gamma^2\mathrm{E}_{\pi\left(\boldsymbol\theta\right)}\left[\nabla{v_{\pi\left(\boldsymbol\theta\right)}}\left(\mathsfit{S}_2\right)\right]$


+## Page 279, line 0
+
+$\gamma\sum\limits_\mathsfit{s'}{p_{\pi\left(\boldsymbol\theta\right)}\left(\mathsfit{s'}\middle\vert\mathsfit{s}\right)\nabla v_{\pi\left(\boldsymbol\theta\right)}^\left(\mathrm{H}\right)\left(\mathsfit{s}\right)}$
+
+### Change to
+
+$\gamma\sum\limits_\mathsfit{s'}{p_{\pi\left(\boldsymbol\theta\right)}\left(\mathsfit{s'}\middle\vert\mathsfit{s}\right)\nabla v_{\pi\left(\boldsymbol\theta\right)}^\left(\mathrm{H}\right)\left(\mathsfit{s'}\right)}$
+
+
+## Page 279, lines 2-3 and line 6 (2 occurrences)
+
+$\mathrm{E}_ {\pi\left(\boldsymbol\theta\right)}\left[\sum\limits_ \mathsfit{a}q_{\pi\left(\boldsymbol\theta\right)}^\left(\mathrm{H}\right)\left(\mathsfit{S}_ t,\mathsfit{a}\right)\nabla\pi\left(\mathsfit{a}\middle\vert{\mathsfit{S}_ t};\boldsymbol\theta\right)\right]+\nabla\left(\alpha^\left(\mathrm{H}\right)\mathrm{H}\left[\pi\left(\cdot\middle\vert\mathsfit{S}_ t;\boldsymbol\theta\right)\right]\right)$
+
+### Change to
+
+$\mathrm{E}_ {\pi\left(\boldsymbol\theta\right)}\left[\sum\limits_ \mathsfit{a}q_{\pi\left(\boldsymbol\theta\right)}^\left(\mathrm{H}\right)\left(\mathsfit{S}_ t,\mathsfit{a}\right)\nabla\pi\left(\mathsfit{a}\middle\vert{\mathsfit{S}_ t};\boldsymbol\theta\right)+\nabla\left(\alpha^\left(\mathrm{H}\right)\mathrm{H}\left[\pi\left(\cdot\middle\vert\mathsfit{S}_ t;\boldsymbol\theta\right)\right]\right)\right]$

## Page 288, Code 10-2

```python