
Commit a1ebb4d

2024-02-08 19:12:19

wizardforcel committed Feb 8, 2024
1 parent e4ab2e1 commit a1ebb4d
Showing 2 changed files with 387 additions and 0 deletions.
45 changes: 45 additions & 0 deletions totrans/gen-dl_13.yaml
@@ -753,51 +753,63 @@
id: totrans-100
prefs: []
type: TYPE_NORMAL
zh: 构成`TransformerBlock`层的子层在初始化函数中定义。
- en: '[![2](Images/2.png)](#co_transformers_CO2-2)'
id: totrans-101
prefs: []
type: TYPE_NORMAL
zh: '[![2](Images/2.png)](#co_transformers_CO2-2)'
- en: The causal mask is created to hide future keys from the query.
id: totrans-102
prefs: []
type: TYPE_NORMAL
zh: 创建因果掩码,以便对查询隐藏未来的键。
- en: '[![3](Images/3.png)](#co_transformers_CO2-3)'
id: totrans-103
prefs: []
type: TYPE_NORMAL
zh: '[![3](Images/3.png)](#co_transformers_CO2-3)'
- en: The multihead attention layer is created, with the attention masks specified.
id: totrans-104
prefs: []
type: TYPE_NORMAL
zh: 创建了多头注意力层,并指定了注意力掩码。
- en: '[![4](Images/4.png)](#co_transformers_CO2-4)'
id: totrans-105
prefs: []
type: TYPE_NORMAL
zh: '[![4](Images/4.png)](#co_transformers_CO2-4)'
- en: The first *add and normalization* layer.
id: totrans-106
prefs: []
type: TYPE_NORMAL
zh: 第一个*加和归一化*层。
- en: '[![5](Images/5.png)](#co_transformers_CO2-5)'
id: totrans-107
prefs: []
type: TYPE_NORMAL
zh: '[![5](Images/5.png)](#co_transformers_CO2-5)'
- en: The feed-forward layers.
id: totrans-108
prefs: []
type: TYPE_NORMAL
zh: 前馈层。
- en: '[![6](Images/6.png)](#co_transformers_CO2-6)'
id: totrans-109
prefs: []
type: TYPE_NORMAL
zh: '[![6](Images/6.png)](#co_transformers_CO2-6)'
- en: The second *add and normalization* layer.
id: totrans-110
prefs: []
type: TYPE_NORMAL
zh: 第二个*加和归一化*层。
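The callouts above describe the sub-layers of the `TransformerBlock`; the listing itself is not reproduced in this diff. A minimal Keras sketch of such a block follows (an illustration only: constructor arguments, layer names, and the dropout rate are assumptions, not the book's exact code).

```python
# A minimal sketch of a causal Transformer block in Keras, mirroring the
# sub-layers described in the callouts above. Names and arguments are
# illustrative assumptions, not the book's exact listing.
import tensorflow as tf
from tensorflow.keras import layers


class TransformerBlock(layers.Layer):
    def __init__(self, num_heads, key_dim, embed_dim, ff_dim, dropout_rate=0.1):
        super().__init__()
        # Multihead attention sub-layer (callout 3)
        self.attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)
        self.dropout_1 = layers.Dropout(dropout_rate)
        # First add-and-normalization sub-layer (callout 4)
        self.ln_1 = layers.LayerNormalization(epsilon=1e-6)
        # Feed-forward sub-layers (callout 5)
        self.ffn_1 = layers.Dense(ff_dim, activation="relu")
        self.ffn_2 = layers.Dense(embed_dim)
        self.dropout_2 = layers.Dropout(dropout_rate)
        # Second add-and-normalization sub-layer (callout 6)
        self.ln_2 = layers.LayerNormalization(epsilon=1e-6)

    def call(self, inputs):
        batch_size = tf.shape(inputs)[0]
        seq_len = tf.shape(inputs)[1]
        # Causal mask (callout 2): a lower-triangular matrix that hides
        # future keys from each query, tiled across the batch.
        tril = tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)
        causal_mask = tf.cast(tf.tile(tril[tf.newaxis, :, :], [batch_size, 1, 1]), tf.bool)
        attn_output = self.attn(inputs, inputs, attention_mask=causal_mask)
        out_1 = self.ln_1(inputs + self.dropout_1(attn_output))
        ffn_output = self.ffn_2(self.ffn_1(out_1))
        return self.ln_2(out_1 + self.dropout_2(ffn_output))
```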
- en: Positional Encoding
id: totrans-111
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 位置编码
- en: 'There is one final step to cover before we can put everything together to train
our GPT model. You may have noticed that in the multihead attention layer, there
is nothing that cares about the ordering of the keys. The dot product between
@@ -808,67 +820,79 @@
id: totrans-112
prefs: []
type: TYPE_NORMAL
zh: 在将所有内容整合起来训练我们的GPT模型之前,还有最后一个步骤需要处理。您可能已经注意到,在多头注意力层中,没有任何机制关注键的顺序。每个键和查询之间的点积是并行计算的,而不是像循环神经网络那样按顺序计算。这既是优势(由于并行化带来的效率提升),也是一个问题,因为我们显然需要注意力层对以下两个句子预测出不同的输出:
- en: The dog looked at the boy and …​ (barked?)
id: totrans-113
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 狗看着男孩然后…(叫?)
- en: The boy looked at the dog and …​ (smiled?)
id: totrans-114
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 男孩看着狗然后…(微笑?)
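The claim that nothing in dot-product attention cares about token order can be checked directly: leaving the causal mask aside, permuting the input tokens simply permutes the output rows of plain self-attention, so the layer itself carries no notion of position. A small illustrative NumPy sketch (values and dimensions are arbitrary):

```python
# Illustrative check that plain dot-product self-attention is
# permutation-equivariant: shuffling the inputs just shuffles the outputs.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))     # 5 token embeddings of dimension 8
perm = rng.permutation(5)       # a reordering of the tokens

def self_attention(x):
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ x

out = self_attention(x)
out_perm = self_attention(x[perm])

# The permuted inputs give exactly the permuted outputs: no order information.
print(np.allclose(out_perm, out[perm]))   # True
```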
- en: To solve this problem, we use a technique called *positional encoding* when
creating the inputs to the initial Transformer block. Instead of only encoding
each token using a *token embedding*, we also encode the position of the token,
using a *position embedding*.
id: totrans-115
prefs: []
type: TYPE_NORMAL
zh: 为了解决这个问题,我们在创建初始Transformer块的输入时使用一种称为*位置编码*的技术。我们不仅使用*标记嵌入*对每个标记进行编码,还使用*位置嵌入*对标记的位置进行编码。
- en: The *token embedding* is created using a standard `Embedding` layer to convert
each token into a learned vector. We can create the *positional embedding* in
the same way, using a standard `Embedding` layer to convert each integer position
into a learned vector.
id: totrans-116
prefs: []
type: TYPE_NORMAL
zh: '*标记嵌入*是使用标准的`Embedding`层创建的,将每个标记转换为一个学习到的向量。我们可以以相同的方式创建*位置嵌入*,使用标准的`Embedding`层将每个整数位置转换为一个学习到的向量。'
- en: Tip
id: totrans-117
prefs:
- PREF_H6
type: TYPE_NORMAL
zh: 提示
- en: While GPT uses an `Embedding` layer to embed the position, the original Transformer
paper used trigonometric functions—we’ll cover this alternative in [Chapter 11](ch11.xhtml#chapter_music),
when we explore music generation.
id: totrans-118
prefs: []
type: TYPE_NORMAL
zh: 虽然GPT使用`Embedding`层来嵌入位置,但原始Transformer论文使用的是三角函数。我们将在探索音乐生成的[第11章](ch11.xhtml#chapter_music)中介绍这种替代方法。
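For reference, the trigonometric scheme from the original Transformer paper assigns each position a fixed vector of interleaved sine and cosine values, sin(pos / 10000^(2i/d_model)) and cos(pos / 10000^(2i/d_model)). A brief sketch of that scheme follows; it is not the book's code, which defers this alternative to Chapter 11.

```python
# A minimal sketch of the sinusoidal positional encoding from the original
# Transformer paper ("Attention Is All You Need"); not the book's listing.
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    positions = np.arange(max_len)[:, np.newaxis]      # (max_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]           # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((max_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])        # even dimensions: sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])        # odd dimensions: cosine
    return encoding
```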
- en: To construct the joint token–position encoding, the token embedding is added
to the positional embedding, as shown in [Figure 9-8](#positional_enc). This way,
the meaning and position of each word in the sequence are captured in a single
vector.
id: totrans-119
prefs: []
type: TYPE_NORMAL
zh: 为构建联合标记-位置编码,将标记嵌入加到位置嵌入中,如[图9-8](#positional_enc)所示。这样,序列中每个单词的含义和位置都被捕捉在一个向量中。
- en: '![](Images/gdl2_0908.png)'
id: totrans-120
prefs: []
type: TYPE_IMG
zh: '![](Images/gdl2_0908.png)'
- en: Figure 9-8\. The token embeddings are added to the positional embeddings to
give the token position encoding
id: totrans-121
prefs:
- PREF_H6
type: TYPE_NORMAL
zh: 图9-8\. 将标记嵌入添加到位置嵌入以给出标记位置编码
- en: The code that defines our `TokenAndPositionEmbedding` layer is shown in [Example 9-5](#positional_embedding_code).
id: totrans-122
prefs: []
type: TYPE_NORMAL
zh: 定义我们的`TokenAndPositionEmbedding`层的代码显示在[示例9-5](#positional_embedding_code)中。
- en: Example 9-5\. The `TokenAndPositionEmbedding` layer
id: totrans-123
prefs:
- PREF_H5
type: TYPE_NORMAL
zh: 示例9-5\. `TokenAndPositionEmbedding`层
- en: '[PRE5]'
id: totrans-124
prefs: []
@@ -878,67 +902,81 @@
id: totrans-125
prefs: []
type: TYPE_NORMAL
zh: '[![1](Images/1.png)](#co_transformers_CO3-1)'
- en: The tokens are embedded using an `Embedding` layer.
id: totrans-126
prefs: []
type: TYPE_NORMAL
zh: 标记使用`Embedding`层进行嵌入。
- en: '[![2](Images/2.png)](#co_transformers_CO3-2)'
id: totrans-127
prefs: []
type: TYPE_NORMAL
zh: '[![2](Images/2.png)](#co_transformers_CO3-2)'
- en: The positions of the tokens are also embedded using an `Embedding` layer.
id: totrans-128
prefs: []
type: TYPE_NORMAL
zh: 标记的位置也使用`Embedding`层进行嵌入。
- en: '[![3](Images/3.png)](#co_transformers_CO3-3)'
id: totrans-129
prefs: []
type: TYPE_NORMAL
zh: '[![3](Images/3.png)](#co_transformers_CO3-3)'
- en: The output from the layer is the sum of the token and position embeddings.
id: totrans-130
prefs: []
type: TYPE_NORMAL
zh: 该层的输出是标记和位置嵌入的总和。
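Taken together, the three callouts describe a layer along the following lines. This is a minimal Keras sketch; the constructor argument names are assumptions, and the book's actual listing is Example 9-5.

```python
# A minimal sketch of a token-plus-position embedding layer in Keras,
# following the callouts above (illustrative; argument names are assumptions).
import tensorflow as tf
from tensorflow.keras import layers


class TokenAndPositionEmbedding(layers.Layer):
    def __init__(self, max_len, vocab_size, embed_dim):
        super().__init__()
        # Callout 1: embed the tokens with a standard Embedding layer
        self.token_emb = layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)
        # Callout 2: embed the integer positions 0..max_len-1 the same way
        self.pos_emb = layers.Embedding(input_dim=max_len, output_dim=embed_dim)

    def call(self, x):
        seq_len = tf.shape(x)[-1]
        positions = tf.range(start=0, limit=seq_len, delta=1)
        # Callout 3: the output is the sum of the token and position embeddings
        return self.token_emb(x) + self.pos_emb(positions)
```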
- en: Training GPT
id: totrans-131
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 训练GPT
- en: Now we are ready to build and train our GPT model! To put everything together,
we need to pass our input text through the token and position embedding layer,
then through our Transformer block. The final output of the network is a simple
`Dense` layer with softmax activation over the number of words in the vocabulary.
id: totrans-132
prefs: []
type: TYPE_NORMAL
zh: 现在我们准备构建和训练我们的GPT模型!为了将所有内容整合在一起,我们需要将输入文本先通过标记和位置嵌入层,再通过Transformer块。网络的最终输出是一个简单的`Dense`层,使用softmax激活,输出维度为词汇表中的单词数量。
- en: Tip
id: totrans-133
prefs:
- PREF_H6
type: TYPE_NORMAL
zh: 提示
- en: For simplicity, we will use just one Transformer block, rather than the 12 in
the paper.
id: totrans-134
prefs: []
type: TYPE_NORMAL
zh: 为简单起见,我们将只使用一个Transformer块,而不是论文中的12个。
- en: The overall architecture is shown in [Figure 9-9](#transformer) and the equivalent
code is provided in [Example 9-6](#transformer_code).
id: totrans-135
prefs: []
type: TYPE_NORMAL
zh: 整体架构显示在[图9-9](#transformer)中,相应的代码在[示例9-6](#transformer_code)中提供。
- en: '![](Images/gdl2_0909.png)'
id: totrans-136
prefs: []
type: TYPE_IMG
zh: '![](Images/gdl2_0909.png)'
- en: Figure 9-9\. The simplified GPT model architecture
id: totrans-137
prefs:
- PREF_H6
type: TYPE_NORMAL
zh: 图9-9\. 简化的GPT模型架构
- en: Example 9-6\. A GPT model in Keras
id: totrans-138
prefs:
- PREF_H5
type: TYPE_NORMAL
zh: 示例9-6\. 在Keras中的GPT模型
- en: '[PRE6]'
id: totrans-139
prefs: []
@@ -948,30 +986,37 @@
id: totrans-140
prefs: []
type: TYPE_NORMAL
zh: '[![1](Images/1.png)](#co_transformers_CO4-1)'
- en: The input is padded (with zeros).
id: totrans-141
prefs: []
type: TYPE_NORMAL
zh: 输入被填充(用零)。
- en: '[![2](Images/2.png)](#co_transformers_CO4-2)'
id: totrans-142
prefs: []
type: TYPE_NORMAL
zh: '[![2](Images/2.png)](#co_transformers_CO4-2)'
- en: The text is encoded using a `TokenAndPositionEmbedding` layer.
id: totrans-143
prefs: []
type: TYPE_NORMAL
zh: 文本使用`TokenAndPositionEmbedding`层进行编码。
- en: '[![3](Images/3.png)](#co_transformers_CO4-3)'
id: totrans-144
prefs: []
type: TYPE_NORMAL
zh: '[![3](Images/3.png)](#co_transformers_CO4-3)'
- en: The encoding is passed through a `TransformerBlock`.
id: totrans-145
prefs: []
type: TYPE_NORMAL
zh: 编码通过`TransformerBlock`传递。
- en: '[![4](Images/4.png)](#co_transformers_CO4-4)'
id: totrans-146
prefs: []
type: TYPE_NORMAL
zh: '[![4](Images/4.png)](#co_transformers_CO4-4)'
- en: The transformed output is passed through a `Dense` layer with softmax activation
to predict a distribution over the subsequent word.
id: totrans-147
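The four callouts correspond roughly to the functional-API sketch below, reusing the illustrative TokenAndPositionEmbedding and TransformerBlock classes sketched earlier in this section. The hyperparameter values are placeholder assumptions, not the book's Example 9-6.

```python
# A rough functional-API sketch of the simplified GPT model described by the
# callouts above. Assumes the illustrative TokenAndPositionEmbedding and
# TransformerBlock classes sketched earlier are in scope; hyperparameter
# values are placeholder assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 10000   # assumed vocabulary size
MAX_LEN = 80         # assumed maximum (padded) sequence length
EMBED_DIM = 256
N_HEADS = 2
KEY_DIM = 256
FF_DIM = 256

# Callout 1: the zero-padded token sequence is the model input
inputs = layers.Input(shape=(None,), dtype=tf.int32)
# Callout 2: encode tokens and their positions together
x = TokenAndPositionEmbedding(MAX_LEN, VOCAB_SIZE, EMBED_DIM)(inputs)
# Callout 3: pass the encoding through a single Transformer block
x = TransformerBlock(N_HEADS, KEY_DIM, EMBED_DIM, FF_DIM)(x)
# Callout 4: a Dense layer with softmax predicts a distribution over the vocabulary
outputs = layers.Dense(VOCAB_SIZE, activation="softmax")(x)

gpt = models.Model(inputs, outputs)
gpt.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```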
