From e61b1bfdaa85574169f46cd88ab94c3b70dbf06a Mon Sep 17 00:00:00 2001 From: wizardforcel <562826179@qq.com> Date: Thu, 8 Feb 2024 19:05:20 +0800 Subject: [PATCH] 2024-02-08 19:05:18 --- totrans/gen-dl_08.yaml | 56 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 56 insertions(+) diff --git a/totrans/gen-dl_08.yaml b/totrans/gen-dl_08.yaml index 078996c..2cc10e3 100644 --- a/totrans/gen-dl_08.yaml +++ b/totrans/gen-dl_08.yaml @@ -815,15 +815,19 @@ id: totrans-117 prefs: [] type: TYPE_NORMAL + zh: LSTM单元格维护一个单元格状态,C t,可以被视为单元格对序列当前状态的内部信念。这与隐藏状态,h t,是不同的,隐藏状态最终在最后一个时间步输出。单元格状态与隐藏状态相同长度(单元格中的单元数)。 - en: Let’s look more closely at a single cell and how the hidden state is updated ([Figure 5-6](#lstm_cell)). id: totrans-118 prefs: [] type: TYPE_NORMAL + zh: 让我们更仔细地看一下单个单元格以及隐藏状态是如何更新的([图5-6](#lstm_cell))。 - en: 'The hidden state is updated in six steps:' id: totrans-119 prefs: [] type: TYPE_NORMAL + zh: 隐藏状态在六个步骤中更新: - en: The hidden state of the previous timestep, h t-1 , and the current word embedding, x @@ -840,15 +844,23 @@ prefs: - PREF_OL type: TYPE_NORMAL + zh: 上一个时间步的隐藏状态,h t-1,和当前的单词嵌入,x t,被连接起来并通过*遗忘*门传递。这个门只是一个带有权重矩阵 + W f,偏置 b f 和 sigmoid 激活函数的稠密层。得到的向量,f t,长度等于单元格中的单元数,并包含介于0和1之间的值,确定了应该保留多少先前的单元格状态,C t-1。 - en: '![](Images/gdl2_0506.png)' id: totrans-121 prefs: [] type: TYPE_IMG + zh: '![](Images/gdl2_0506.png)' - en: Figure 5-6\. An LSTM cell id: totrans-122 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图5-6\. LSTM单元格 - en: The concatenated vector is also passed through an *input* gate that, like the forget gate, is a dense layer with weights matrix W i , bias W + i,偏置 b i + 和 sigmoid 激活函数的稠密层。这个门的输出,i t,长度等于单元格中的单元数,并包含介于0和1之间的值,确定了新信息将被添加到先前单元格状态,C t-1,的程度。 - en: The concatenated vector is passed through a dense layer with weights matrix W C , bias b C @@ -874,6 +890,10 @@ prefs: - PREF_OL type: TYPE_NORMAL + zh: 连接的向量也通过一个带有权重矩阵 W C,偏置 + b C 和 tanh 激活函数的稠密层,生成一个向量 + C ˜ + t,其中包含单元格希望考虑保留的新信息。它的长度也等于单元格中的单元数,并包含介于-1和1之间的值。 - en: f t and C t-1 are multiplied element-wise and added to the element-wise multiplication of f tC t-1 + 逐元素相乘并加到 i tC ˜ + t 的逐元素乘积中。这代表了遗忘先前单元格状态的部分,并添加新的相关信息以生成更新后的单元格状态,C t。 - en: 'The concatenated vector is passed through an *output* gate: a dense layer with weights matrix W o , bias b o @@ -1789,16 +1815,19 @@ id: totrans-262 prefs: [] type: TYPE_NORMAL + zh: 请注意,这个简化的例子假设是灰度图像(即,只有一个通道)。如果是彩色图像,我们将有三个颜色通道,我们也可以对它们进行排序,例如,红色通道在蓝色通道之前,蓝色通道在绿色通道之前。 - en: Residual Blocks id: totrans-263 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 残差块 - en: Now that we have seen how to mask the convolutional layer, we can start to build our PixelCNN. The core building block that we will use is the residual block. id: totrans-264 prefs: [] type: TYPE_NORMAL + zh: 现在我们已经看到如何对卷积层进行掩码,我们可以开始构建我们的PixelCNN。我们将使用的核心构建块是残差块。 - en: A *residual block* is a set of layers where the output is added to the input before being passed on to the rest of the network. In other words, the input has a *fast-track* route to the output, without having to go through the intermediate @@ -1810,29 +1839,35 @@ id: totrans-265 prefs: [] type: TYPE_NORMAL + zh: '*残差块*是一组层,其中输出在传递到网络的其余部分之前添加到输入中。换句话说,输入有一条*快速通道*到输出,而无需经过中间层——这被称为*跳跃连接*。包含跳跃连接的理由是,如果最佳转换只是保持输入不变,这可以通过简单地将中间层的权重置零来实现。如果没有跳跃连接,网络将不得不通过中间层找到一个恒等映射,这要困难得多。' - en: A diagram of the residual block in our PixelCNN is shown in [Figure 5-14](#residual_block_pixelcnn). id: totrans-266 prefs: [] type: TYPE_NORMAL + zh: 我们在PixelCNN中的残差块的图示在[图5-14](#residual_block_pixelcnn)中显示。 - en: '![](Images/gdl2_0514.png)' id: totrans-267 prefs: [] type: TYPE_IMG + zh: '![](Images/gdl2_0514.png)' - en: Figure 5-14\. A PixelCNN residual block (the numbers of filters are next to the arrows and the filter sizes are next to the layers) id: totrans-268 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图5-14。一个PixelCNN残差块(箭头旁边是滤波器的数量,层旁边是滤波器大小) - en: We can build a `ResidualBlock` using the code shown in [Example 5-13](#residual_block_code_pixelcnn). id: totrans-269 prefs: [] type: TYPE_NORMAL + zh: 我们可以使用[示例5-13](#residual_block_code_pixelcnn)中显示的代码构建一个`ResidualBlock`。 - en: Example 5-13\. A `ResidualBlock` id: totrans-270 prefs: - PREF_H5 type: TYPE_NORMAL + zh: 示例5-13。一个`ResidualBlock` - en: '[PRE12]' id: totrans-271 prefs: [] @@ -1842,43 +1877,52 @@ id: totrans-272 prefs: [] type: TYPE_NORMAL + zh: '[![1](Images/1.png)](#co_autoregressive_models_CO6-1)' - en: The initial `Conv2D` layer halves the number of channels. id: totrans-273 prefs: [] type: TYPE_NORMAL + zh: 初始的`Conv2D`层将通道数量减半。 - en: '[![2](Images/2.png)](#co_autoregressive_models_CO6-2)' id: totrans-274 prefs: [] type: TYPE_NORMAL + zh: '[![2](Images/2.png)](#co_autoregressive_models_CO6-2)' - en: The Type B `MaskedConv2D` layer with kernel size of 3 only uses information from five pixels—three pixels in the row above the focus pixel, one to the left, and the focus pixel itself. id: totrans-275 prefs: [] type: TYPE_NORMAL + zh: Type B `MaskedConv2D`层,核大小为3,仅使用来自五个像素的信息——上面一行中的三个像素,左边一个像素和焦点像素本身。 - en: '[![3](Images/3.png)](#co_autoregressive_models_CO6-3)' id: totrans-276 prefs: [] type: TYPE_NORMAL + zh: '[![3](Images/3.png)](#co_autoregressive_models_CO6-3)' - en: The final `Conv2D` layer doubles the number of channels to again match the input shape. id: totrans-277 prefs: [] type: TYPE_NORMAL + zh: 最终的`Conv2D`层将通道数量加倍,以再次匹配输入形状。 - en: '[![4](Images/4.png)](#co_autoregressive_models_CO6-4)' id: totrans-278 prefs: [] type: TYPE_NORMAL + zh: '[![4](Images/4.png)](#co_autoregressive_models_CO6-4)' - en: The output from the convolutional layers is added to the input—this is the skip connection. id: totrans-279 prefs: [] type: TYPE_NORMAL + zh: 卷积层的输出与输入相加——这是跳跃连接。 - en: Training the PixelCNN id: totrans-280 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 训练PixelCNN - en: In [Example 5-14](#pixelcnn_architecture) we put together the whole PixelCNN network, approximately following the structure laid out in the original paper. In the original paper, the output layer is a 256-filter `Conv2D` layer, with softmax @@ -1890,6 +1934,7 @@ id: totrans-281 prefs: [] type: TYPE_NORMAL + zh: 在[示例5-14](#pixelcnn_architecture)中,我们组合了整个PixelCNN网络,大致遵循原始论文中的结构。在原始论文中,输出层是一个有256个滤波器的`Conv2D`层,使用softmax激活。换句话说,网络试图通过预测正确的像素值来重新创建其输入,有点像自动编码器。不同之处在于,PixelCNN受到限制,以便不允许来自早期像素的信息流通过影响每个像素的预测,这是由于网络设计方式,使用`MaskedConv2D`层。 - en: A challenge with this approach is that the network has no way to understand that a pixel value of, say, 200 is very close to a pixel value of 201\. It must learn every pixel output value independently, which means training can be very @@ -1899,11 +1944,13 @@ id: totrans-282 prefs: [] type: TYPE_NORMAL + zh: 这种方法的一个挑战是网络无法理解,比如说,像素值200非常接近像素值201。它必须独立学习每个像素输出值,这意味着即使对于最简单的数据集,训练也可能非常缓慢。因此,在我们的实现中,我们简化输入,使每个像素只能取四个值之一。这样,我们可以使用一个有4个滤波器的`Conv2D`输出层,而不是256个。 - en: Example 5-14\. The PixelCNN architecture id: totrans-283 prefs: - PREF_H5 type: TYPE_NORMAL + zh: 示例5-14。PixelCNN架构 - en: '[PRE13]' id: totrans-284 prefs: [] @@ -1913,42 +1960,51 @@ id: totrans-285 prefs: [] type: TYPE_NORMAL + zh: '[![1](Images/1.png)](#co_autoregressive_models_CO7-1)' - en: The model `Input` is a grayscale image of size 16 × 16 × 1, with inputs scaled between 0 and 1. id: totrans-286 prefs: [] type: TYPE_NORMAL + zh: 模型的`Input`是一个尺寸为16×16×1的灰度图像,输入值在0到1之间缩放。 - en: '[![2](Images/2.png)](#co_autoregressive_models_CO7-2)' id: totrans-287 prefs: [] type: TYPE_NORMAL + zh: '[![2](Images/2.png)](#co_autoregressive_models_CO7-2)' - en: The first Type A `MaskedConv2D` layer with a kernel size of 7 uses information from 24 pixels—21 pixels in the three rows above the focus pixel and 3 to the left (the focus pixel itself is not used). id: totrans-288 prefs: [] type: TYPE_NORMAL + zh: 第一个Type A `MaskedConv2D`层,核大小为7,使用来自24个像素的信息——在焦点像素上面的三行中的21个像素和左边的3个像素(焦点像素本身不使用)。 - en: '[![3](Images/3.png)](#co_autoregressive_models_CO7-3)' id: totrans-289 prefs: [] type: TYPE_NORMAL + zh: '[![3](Images/3.png)](#co_autoregressive_models_CO7-3)' - en: Five `ResidualBlock` layer groups are stacked sequentially. id: totrans-290 prefs: [] type: TYPE_NORMAL + zh: 五个`ResidualBlock`层组被顺序堆叠。 - en: '[![4](Images/4.png)](#co_autoregressive_models_CO7-4)' id: totrans-291 prefs: [] type: TYPE_NORMAL + zh: '[![4](Images/4.png)](#co_autoregressive_models_CO7-4)' - en: Two Type B `MaskedConv2D` layers with a kernel size of 1 act as `Dense` layers across the number of channels for each pixel. id: totrans-292 prefs: [] type: TYPE_NORMAL + zh: 两个Type B `MaskedConv2D`层,核大小为1,作为每个像素通道数量的`Dense`层。 - en: '[![5](Images/5.png)](#co_autoregressive_models_CO7-5)' id: totrans-293 prefs: [] type: TYPE_NORMAL + zh: '[![5](Images/5.png)](#co_autoregressive_models_CO7-5)' - en: The final `Conv2D` layer reduces the number of channels to four—the number of pixel levels for this example. id: totrans-294