diff --git a/totrans/gen-dl_08.yaml b/totrans/gen-dl_08.yaml
index 078996c..2cc10e3 100644
--- a/totrans/gen-dl_08.yaml
+++ b/totrans/gen-dl_08.yaml
@@ -815,15 +815,19 @@
  id: totrans-117
  prefs: []
  type: TYPE_NORMAL
+ zh: LSTM单元格维护一个单元格状态,C t,可以被视为单元格对序列当前状态的内部信念。这与隐藏状态,h t,是不同的,隐藏状态最终由单元格输出。单元格状态与隐藏状态长度相同(即单元格中的单元数)。
- en: Let’s look more closely at a single cell and how the hidden state is updated
    ([Figure 5-6](#lstm_cell)).
  id: totrans-118
  prefs: []
  type: TYPE_NORMAL
+ zh: 让我们更仔细地看一下单个单元格,以及隐藏状态是如何更新的([图5-6](#lstm_cell))。
- en: 'The hidden state is updated in six steps:'
  id: totrans-119
  prefs: []
  type: TYPE_NORMAL
+ zh: 隐藏状态在六个步骤中更新:
- en: The hidden state of the previous timestep, h
    t-1 , and the current word embedding, x
@@ -840,15 +844,23 @@
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 上一个时间步的隐藏状态,h t-1,和当前的单词嵌入,x t,被连接起来并通过*遗忘*门传递。这个门只是一个带有权重矩阵
+   W f,偏置 b f 和 sigmoid 激活函数的稠密层。得到的向量,f t,长度等于单元格中的单元数,并包含介于0和1之间的值,确定了应该保留多少先前的单元格状态,C t-1。
- en: '![](Images/gdl2_0506.png)'
  id: totrans-121
  prefs: []
  type: TYPE_IMG
+ zh: '![](Images/gdl2_0506.png)'
- en: Figure 5-6\. An LSTM cell
  id: totrans-122
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 图5-6\. LSTM单元格
- en: The concatenated vector is also passed through an *input* gate that, like the
    forget gate, is a dense layer with weights matrix W i , bias b i , and sigmoid
    activation function. The output from this gate, i t , has length equal to the
    number of units in the cell and contains values between 0 and 1 that determine
    how much new information will be added to the previous cell state, C t-1 .
+ zh: 连接的向量也通过一个*输入*门,它像遗忘门一样,是一个带有权重矩阵 W i,偏置 b i
+   和 sigmoid 激活函数的稠密层。这个门的输出,i t,长度等于单元格中的单元数,并包含介于0和1之间的值,确定了新信息将被添加到先前单元格状态,C t-1,的程度。
- en: The concatenated vector is passed through a dense layer with weights matrix
    W C , bias b C
@@ -874,6 +890,10 @@
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 连接的向量也通过一个带有权重矩阵 W C,偏置
+   b C 和 tanh 激活函数的稠密层,生成一个向量
+   C ˜
+   t,其中包含单元格希望考虑保留的新信息。它的长度也等于单元格中的单元数,并包含介于-1和1之间的值。
- en: f t and C t-1 are multiplied element-wise and added to the element-wise multiplication
    of i t and C ˜ t . This represents forgetting parts of the previous cell state
    and adding new relevant information to produce the updated cell state, C t .
+ zh: f t 和 C t-1
+   逐元素相乘,并加到 i t 和 C ˜
+   t 的逐元素乘积中。这代表了遗忘先前单元格状态的部分,并添加新的相关信息,以生成更新后的单元格状态,C t。
- en: 'The concatenated vector is passed through an *output* gate: a dense layer
    with weights matrix W o , bias b o
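The six update steps described in the entries above amount to a handful of element-wise equations. Here is a minimal NumPy sketch of a single cell update (an illustration of the description only, not the book's code; the function and weight names are assumptions, with each W acting on the concatenated vector):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(h_prev, c_prev, x_t, W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
    # 1. Concatenate the previous hidden state and the current word embedding.
    v = np.concatenate([h_prev, x_t])
    # 2. Forget gate: values in [0, 1] deciding how much of C_{t-1} to keep.
    f_t = sigmoid(W_f @ v + b_f)
    # 3. Input gate: values in [0, 1] deciding how much new information to add.
    i_t = sigmoid(W_i @ v + b_i)
    # 4. Candidate cell state: new information in [-1, 1] the cell may keep.
    c_tilde = np.tanh(W_C @ v + b_C)
    # 5. Updated cell state: forget part of the old state, add the new part.
    c_t = f_t * c_prev + i_t * c_tilde
    # 6. Output gate, then the new hidden state as a gated tanh of C_t.
    o_t = sigmoid(W_o @ v + b_o)
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```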
@@ -1789,16 +1815,19 @@
  id: totrans-262
  prefs: []
  type: TYPE_NORMAL
+ zh: 请注意,这个简化的例子假设是灰度图像(即只有一个通道)。如果是彩色图像,我们将有三个颜色通道,我们也可以对它们进行排序,例如,红色通道在蓝色通道之前,蓝色通道在绿色通道之前。
- en: Residual Blocks
  id: totrans-263
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 残差块
- en: Now that we have seen how to mask the convolutional layer, we can start to
    build our PixelCNN. The core building block that we will use is the residual
    block.
  id: totrans-264
  prefs: []
  type: TYPE_NORMAL
+ zh: 现在我们已经看到如何对卷积层进行掩码,我们可以开始构建我们的PixelCNN。我们将使用的核心构建块是残差块。
- en: A *residual block* is a set of layers where the output is added to the input
    before being passed on to the rest of the network. In other words, the input
    has a *fast-track* route to the output, without having to go through the intermediate
    layers—this is known as a *skip connection*. The rationale behind including a
    skip connection is that if the optimal transformation is just to keep the input
    the same, this can be achieved by simply zeroing the weights of the intermediate
    layers. Without the skip connection, the network would have to find an identity
    mapping through the intermediate layers, which is much harder.
  id: totrans-265
  prefs: []
  type: TYPE_NORMAL
+ zh: '*残差块*是一组层,其输出在传递到网络的其余部分之前被加到输入上。换句话说,输入有一条到输出的*快速通道*,而无需经过中间层——这被称为*跳跃连接*。包含跳跃连接的理由是,如果最佳变换只是保持输入不变,这可以通过简单地将中间层的权重置零来实现。如果没有跳跃连接,网络将不得不通过中间层找到一个恒等映射,这要困难得多。'
- en: A diagram of the residual block in our PixelCNN is shown in [Figure 5-14](#residual_block_pixelcnn).
  id: totrans-266
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们的PixelCNN中残差块的图示见[图5-14](#residual_block_pixelcnn)。
- en: '![](Images/gdl2_0514.png)'
  id: totrans-267
  prefs: []
  type: TYPE_IMG
+ zh: '![](Images/gdl2_0514.png)'
- en: Figure 5-14\. A PixelCNN residual block (the numbers of filters are next to
    the arrows and the filter sizes are next to the layers)
  id: totrans-268
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 图5-14。一个PixelCNN残差块(箭头旁边是滤波器的数量,层旁边是滤波器大小)
- en: We can build a `ResidualBlock` using the code shown in [Example 5-13](#residual_block_code_pixelcnn).
  id: totrans-269
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们可以使用[示例5-13](#residual_block_code_pixelcnn)中显示的代码构建一个`ResidualBlock`。
- en: Example 5-13\. A `ResidualBlock`
  id: totrans-270
  prefs:
  - PREF_H5
  type: TYPE_NORMAL
+ zh: 示例5-13。一个`ResidualBlock`
- en: '[PRE12]'
  id: totrans-271
  prefs: []
@@ -1842,43 +1877,52 @@
  id: totrans-272
  prefs: []
  type: TYPE_NORMAL
+ zh: '[![1](Images/1.png)](#co_autoregressive_models_CO6-1)'
- en: The initial `Conv2D` layer halves the number of channels.
  id: totrans-273
  prefs: []
  type: TYPE_NORMAL
+ zh: 初始的`Conv2D`层将通道数量减半。
- en: '[![2](Images/2.png)](#co_autoregressive_models_CO6-2)'
  id: totrans-274
  prefs: []
  type: TYPE_NORMAL
+ zh: '[![2](Images/2.png)](#co_autoregressive_models_CO6-2)'
- en: The Type B `MaskedConv2D` layer with kernel size of 3 only uses information
    from five pixels—three pixels in the row above the focus pixel, one to the left,
    and the focus pixel itself.
  id: totrans-275
  prefs: []
  type: TYPE_NORMAL
+ zh: Type B `MaskedConv2D`层,核大小为3,仅使用来自五个像素的信息——上面一行中的三个像素,左边一个像素和焦点像素本身。
- en: '[![3](Images/3.png)](#co_autoregressive_models_CO6-3)'
  id: totrans-276
  prefs: []
  type: TYPE_NORMAL
+ zh: '[![3](Images/3.png)](#co_autoregressive_models_CO6-3)'
- en: The final `Conv2D` layer doubles the number of channels to again match the
    input shape.
  id: totrans-277
  prefs: []
  type: TYPE_NORMAL
+ zh: 最终的`Conv2D`层将通道数量加倍,以再次匹配输入形状。
- en: '[![4](Images/4.png)](#co_autoregressive_models_CO6-4)'
  id: totrans-278
  prefs: []
  type: TYPE_NORMAL
+ zh: '[![4](Images/4.png)](#co_autoregressive_models_CO6-4)'
- en: The output from the convolutional layers is added to the input—this is the
    skip connection.
  id: totrans-279
  prefs: []
  type: TYPE_NORMAL
+ zh: 卷积层的输出与输入相加——这就是跳跃连接。
- en: Training the PixelCNN
  id: totrans-280
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 训练PixelCNN
- en: In [Example 5-14](#pixelcnn_architecture) we put together the whole PixelCNN
    network, approximately following the structure laid out in the original paper.
    In the original paper, the output layer is a 256-filter `Conv2D` layer, with softmax
@@ -1890,6 +1934,7 @@
  id: totrans-281
  prefs: []
  type: TYPE_NORMAL
+ zh: 在[示例5-14](#pixelcnn_architecture)中,我们组合了整个PixelCNN网络,大致遵循原始论文中的结构。在原始论文中,输出层是一个有256个滤波器的`Conv2D`层,使用softmax激活。换句话说,网络试图通过预测正确的像素值来重新创建其输入,有点像自动编码器。不同之处在于,由于网络使用`MaskedConv2D`层的设计方式,PixelCNN受到约束,每个像素的预测只能受到其之前像素的信息影响,而不会受到该像素本身或其后像素的信息影响。
- en: A challenge with this approach is that the network has no way to understand
    that a pixel value of, say, 200 is very close to a pixel value of 201\. It must
    learn every pixel output value independently, which means training can be very
    slow, even for the simplest datasets. Therefore, in our implementation, we instead
    simplify the input so that each pixel can take only one of four values. This
    way, we can use a 4-filter `Conv2D` output layer instead of 256.
  id: totrans-282
  prefs: []
  type: TYPE_NORMAL
+ zh: 这种方法的一个挑战是网络无法理解,比如说,像素值200非常接近像素值201。它必须独立学习每个像素输出值,这意味着即使对于最简单的数据集,训练也可能非常缓慢。因此,在我们的实现中,我们简化输入,使每个像素只能取四个值之一。这样,我们可以使用一个有4个滤波器的`Conv2D`输出层,而不是256个。
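To make the four-level simplification concrete, here is a minimal sketch of one way the pixel values could be discretized (the function name and the floor-based binning are assumptions, not the book's preprocessing code):

```python
import numpy as np

def quantize_pixels(images, levels=4):
    """Map images scaled to [0, 1] onto integer pixel levels {0, ..., levels - 1}."""
    return np.clip(np.floor(images * levels), 0, levels - 1).astype(np.int32)

# Hypothetical usage: the integer levels act as sparse targets for the 4-way
# softmax output, while the network input is the same image rescaled to [0, 1].
# target = quantize_pixels(train_images)        # e.g., shape (N, 16, 16, 1)
# model_input = target.astype(np.float32) / 4   # values in {0, 0.25, 0.5, 0.75}
```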
- en: Example 5-14\. The PixelCNN architecture
  id: totrans-283
  prefs:
  - PREF_H5
  type: TYPE_NORMAL
+ zh: 示例5-14。PixelCNN架构
- en: '[PRE13]'
  id: totrans-284
  prefs: []
@@ -1913,42 +1960,51 @@
  id: totrans-285
  prefs: []
  type: TYPE_NORMAL
+ zh: '[![1](Images/1.png)](#co_autoregressive_models_CO7-1)'
- en: The model `Input` is a grayscale image of size 16 × 16 × 1, with inputs scaled
    between 0 and 1.
  id: totrans-286
  prefs: []
  type: TYPE_NORMAL
+ zh: 模型的`Input`是一个尺寸为16×16×1的灰度图像,输入值在0到1之间缩放。
- en: '[![2](Images/2.png)](#co_autoregressive_models_CO7-2)'
  id: totrans-287
  prefs: []
  type: TYPE_NORMAL
+ zh: '[![2](Images/2.png)](#co_autoregressive_models_CO7-2)'
- en: The first Type A `MaskedConv2D` layer with a kernel size of 7 uses information
    from 24 pixels—21 pixels in the three rows above the focus pixel and 3 to the
    left (the focus pixel itself is not used).
  id: totrans-288
  prefs: []
  type: TYPE_NORMAL
+ zh: 第一个Type A `MaskedConv2D`层,核大小为7,使用来自24个像素的信息——焦点像素上面三行中的21个像素和左边的3个像素(不使用焦点像素本身)。
- en: '[![3](Images/3.png)](#co_autoregressive_models_CO7-3)'
  id: totrans-289
  prefs: []
  type: TYPE_NORMAL
+ zh: '[![3](Images/3.png)](#co_autoregressive_models_CO7-3)'
- en: Five `ResidualBlock` layer groups are stacked sequentially.
  id: totrans-290
  prefs: []
  type: TYPE_NORMAL
+ zh: 五个`ResidualBlock`层组被顺序堆叠。
- en: '[![4](Images/4.png)](#co_autoregressive_models_CO7-4)'
  id: totrans-291
  prefs: []
  type: TYPE_NORMAL
+ zh: '[![4](Images/4.png)](#co_autoregressive_models_CO7-4)'
- en: Two Type B `MaskedConv2D` layers with a kernel size of 1 act as `Dense` layers
    across the number of channels for each pixel.
  id: totrans-292
  prefs: []
  type: TYPE_NORMAL
+ zh: 核大小为1的两个Type B `MaskedConv2D`层,相当于作用在每个像素各通道上的`Dense`层。
- en: '[![5](Images/5.png)](#co_autoregressive_models_CO7-5)'
  id: totrans-293
  prefs: []
  type: TYPE_NORMAL
+ zh: '[![5](Images/5.png)](#co_autoregressive_models_CO7-5)'
- en: The final `Conv2D` layer reduces the number of channels to four—the number
    of pixel levels for this example.
  id: totrans-294
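Since the architecture listing appears here only as the [PRE13] placeholder, the following is a minimal sketch of how such a model might be compiled and trained: the 4-filter softmax output pairs naturally with sparse categorical cross-entropy, using the integer pixel levels (0 to 3) as targets. The names `pixel_cnn`, `model_input`, and `target` are assumed to come from the elided listing and the quantization sketch above, and the optimizer, batch size, and epoch count are placeholder choices, not the book's settings.

```python
# A sketch, not the book's listing: assumes `pixel_cnn` is the Keras Model
# built in Example 5-14, `model_input` holds images scaled to [0, 1], and
# `target` holds the matching integer pixel levels (0-3).
pixel_cnn.compile(
    optimizer="adam",
    # Targets are integers, output is a 4-way per-pixel softmax.
    loss="sparse_categorical_crossentropy",
)
pixel_cnn.fit(model_input, target, batch_size=128, epochs=150)  # illustrative hyperparameters
```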