
Commit a1ebb4d

2024-02-08 19:12:19

wizardforcel committed Feb 8, 2024
1 parent e4ab2e1 commit a1ebb4d
Showing 2 changed files with 387 additions and 0 deletions.
45 changes: 45 additions & 0 deletions totrans/gen-dl_13.yaml
@@ -753,51 +753,63 @@
id: totrans-100
prefs: []
type: TYPE_NORMAL
zh: 构成`TransformerBlock`层的子层在初始化函数中定义。
- en: '[![2](Images/2.png)](#co_transformers_CO2-2)'
id: totrans-101
prefs: []
type: TYPE_NORMAL
zh: '[![2](Images/2.png)](#co_transformers_CO2-2)'
- en: The causal mask is created to hide future keys from the query.
id: totrans-102
prefs: []
type: TYPE_NORMAL
zh: 创建因果掩码,以便对查询隐藏未来的键。
- en: '[![3](Images/3.png)](#co_transformers_CO2-3)'
id: totrans-103
prefs: []
type: TYPE_NORMAL
zh: '[![3](Images/3.png)](#co_transformers_CO2-3)'
- en: The multihead attention layer is created, with the attention masks specified.
id: totrans-104
prefs: []
type: TYPE_NORMAL
zh: 创建了多头注意力层,并指定了注意力掩码。
- en: '[![4](Images/4.png)](#co_transformers_CO2-4)'
id: totrans-105
prefs: []
type: TYPE_NORMAL
zh: '[![4](Images/4.png)](#co_transformers_CO2-4)'
- en: The first *add and normalization* layer.
id: totrans-106
prefs: []
type: TYPE_NORMAL
zh: 第一个*加和归一化*层。
- en: '[![5](Images/5.png)](#co_transformers_CO2-5)'
id: totrans-107
prefs: []
type: TYPE_NORMAL
zh: '[![5](Images/5.png)](#co_transformers_CO2-5)'
- en: The feed-forward layers.
id: totrans-108
prefs: []
type: TYPE_NORMAL
zh: 前馈层。
- en: '[![6](Images/6.png)](#co_transformers_CO2-6)'
id: totrans-109
prefs: []
type: TYPE_NORMAL
zh: '[![6](Images/6.png)](#co_transformers_CO2-6)'
- en: The second *add and normalization* layer.
id: totrans-110
prefs: []
type: TYPE_NORMAL
zh: 第二个*加和归一化*层。
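The callouts above describe the sub-layers of the `TransformerBlock`; the listing itself is not reproduced in this diff. A minimal Keras sketch of such a block follows (an illustration only: constructor arguments, layer names, and the dropout rate are assumptions, not the book's exact code).

```python
# A minimal sketch of a causal Transformer block in Keras, mirroring the
# sub-layers described in the callouts above. Names and arguments are
# illustrative assumptions, not the book's exact listing.
import tensorflow as tf
from tensorflow.keras import layers


class TransformerBlock(layers.Layer):
    def __init__(self, num_heads, key_dim, embed_dim, ff_dim, dropout_rate=0.1):
        super().__init__()
        # Multihead attention sub-layer (callout 3)
        self.attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)
        self.dropout_1 = layers.Dropout(dropout_rate)
        # First add-and-normalization sub-layer (callout 4)
        self.ln_1 = layers.LayerNormalization(epsilon=1e-6)
        # Feed-forward sub-layers (callout 5)
        self.ffn_1 = layers.Dense(ff_dim, activation="relu")
        self.ffn_2 = layers.Dense(embed_dim)
        self.dropout_2 = layers.Dropout(dropout_rate)
        # Second add-and-normalization sub-layer (callout 6)
        self.ln_2 = layers.LayerNormalization(epsilon=1e-6)

    def call(self, inputs):
        batch_size = tf.shape(inputs)[0]
        seq_len = tf.shape(inputs)[1]
        # Causal mask (callout 2): a lower-triangular matrix that hides
        # future keys from each query, tiled across the batch.
        tril = tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)
        causal_mask = tf.cast(tf.tile(tril[tf.newaxis, :, :], [batch_size, 1, 1]), tf.bool)
        attn_output = self.attn(inputs, inputs, attention_mask=causal_mask)
        out_1 = self.ln_1(inputs + self.dropout_1(attn_output))
        ffn_output = self.ffn_2(self.ffn_1(out_1))
        return self.ln_2(out_1 + self.dropout_2(ffn_output))
```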
- en: Positional Encoding
id: totrans-111
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 位置编码
- en: 'There is one final step to cover before we can put everything together to train
our GPT model. You may have noticed that in the multihead attention layer, there
is nothing that cares about the ordering of the keys. The dot product between
@@ -808,67 +820,79 @@
id: totrans-112
prefs: []
type: TYPE_NORMAL
zh: 在将所有内容整合起来训练我们的GPT模型之前,还有最后一个步骤需要处理。您可能已经注意到,在多头注意力层中,没有任何机制关注键的顺序。每个键和查询之间的点积是并行计算的,而不是像循环神经网络那样按顺序计算。这既是优势(由于并行化带来的效率提升),也是一个问题,因为我们显然需要注意力层对以下两个句子预测出不同的输出:
- en: The dog looked at the boy and …​ (barked?)
id: totrans-113
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 狗看着男孩然后…(叫?)
- en: The boy looked at the dog and …​ (smiled?)
id: totrans-114
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 男孩看着狗然后…(微笑?)
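The claim that nothing in dot-product attention cares about token order can be checked directly: leaving the causal mask aside, permuting the input tokens simply permutes the output rows of plain self-attention, so the layer itself carries no notion of position. A small illustrative NumPy sketch (values and dimensions are arbitrary):

```python
# Illustrative check that plain dot-product self-attention is
# permutation-equivariant: shuffling the inputs just shuffles the outputs.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))     # 5 token embeddings of dimension 8
perm = rng.permutation(5)       # a reordering of the tokens

def self_attention(x):
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ x

out = self_attention(x)
out_perm = self_attention(x[perm])

# The permuted inputs give exactly the permuted outputs: no order information.
print(np.allclose(out_perm, out[perm]))   # True
```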
- en: To solve this problem, we use a technique called *positional encoding* when
creating the inputs to the initial Transformer block. Instead of only encoding
each token using a *token embedding*, we also encode the position of the token,
using a *position embedding*.
id: totrans-115
prefs: []
type: TYPE_NORMAL
zh: 为了解决这个问题,我们在创建初始Transformer块的输入时使用一种称为*位置编码*的技术。我们不仅使用*标记嵌入*对每个标记进行编码,还使用*位置嵌入*对标记的位置进行编码。
- en: The *token embedding* is created using a standard `Embedding` layer to convert
each token into a learned vector. We can create the *positional embedding* in
the same way, using a standard `Embedding` layer to convert each integer position
into a learned vector.
id: totrans-116
prefs: []
type: TYPE_NORMAL
zh: '*标记嵌入*是使用标准的`Embedding`层创建的,将每个标记转换为一个学习到的向量。我们可以以相同的方式创建*位置嵌入*,使用标准的`Embedding`层将每个整数位置转换为一个学习到的向量。'
- en: Tip
id: totrans-117
prefs:
- PREF_H6
type: TYPE_NORMAL
zh: 提示
- en: While GPT uses an `Embedding` layer to embed the position, the original Transformer
paper used trigonometric functions—we’ll cover this alternative in [Chapter 11](ch11.xhtml#chapter_music),
when we explore music generation.
id: totrans-118
prefs: []
type: TYPE_NORMAL
zh: 虽然GPT使用`Embedding`层来嵌入位置,但原始Transformer论文使用的是三角函数。我们将在探索音乐生成的[第11章](ch11.xhtml#chapter_music)中介绍这种替代方法。
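For reference, the trigonometric scheme from the original Transformer paper assigns each position a fixed vector of interleaved sine and cosine values, sin(pos / 10000^(2i/d_model)) and cos(pos / 10000^(2i/d_model)). A brief sketch of that scheme follows; it is not the book's code, which defers this alternative to Chapter 11.

```python
# A minimal sketch of the sinusoidal positional encoding from the original
# Transformer paper ("Attention Is All You Need"); not the book's listing.
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    positions = np.arange(max_len)[:, np.newaxis]      # (max_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]           # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((max_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])        # even dimensions: sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])        # odd dimensions: cosine
    return encoding
```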
- en: To construct the joint token–position encoding, the token embedding is added
to the positional embedding, as shown in [Figure 9-8](#positional_enc). This way,
the meaning and position of each word in the sequence are captured in a single
vector.
id: totrans-119
prefs: []
type: TYPE_NORMAL
zh: 为构建联合标记-位置编码,将标记嵌入加到位置嵌入中,如[图9-8](#positional_enc)所示。这样,序列中每个单词的含义和位置都被捕捉在一个向量中。
- en: '![](Images/gdl2_0908.png)'
id: totrans-120
prefs: []
type: TYPE_IMG
zh: '![](Images/gdl2_0908.png)'
- en: Figure 9-8\. The token embeddings are added to the positional embeddings to
give the token position encoding
id: totrans-121
prefs:
- PREF_H6
type: TYPE_NORMAL
zh: 图9-8\. 将标记嵌入添加到位置嵌入以给出标记位置编码
- en: The code that defines our `TokenAndPositionEmbedding` layer is shown in [Example 9-5](#positional_embedding_code).
id: totrans-122
prefs: []
type: TYPE_NORMAL
zh: 定义我们的`TokenAndPositionEmbedding`层的代码显示在[示例9-5](#positional_embedding_code)中。
- en: Example 9-5\. The `TokenAndPositionEmbedding` layer
id: totrans-123
prefs:
- PREF_H5
type: TYPE_NORMAL
zh: 示例9-5\. `TokenAndPositionEmbedding`层
- en: '[PRE5]'
id: totrans-124
prefs: []
@@ -878,67 +902,81 @@
id: totrans-125
prefs: []
type: TYPE_NORMAL
zh: '[![1](Images/1.png)](#co_transformers_CO3-1)'
- en: The tokens are embedded using an `Embedding` layer.
id: totrans-126
prefs: []
type: TYPE_NORMAL
zh: 标记使用`Embedding`层进行嵌入。
- en: '[![2](Images/2.png)](#co_transformers_CO3-2)'
id: totrans-127
prefs: []
type: TYPE_NORMAL
zh: '[![2](Images/2.png)](#co_transformers_CO3-2)'
- en: The positions of the tokens are also embedded using an `Embedding` layer.
id: totrans-128
prefs: []
type: TYPE_NORMAL
zh: 标记的位置也使用`Embedding`层进行嵌入。
- en: '[![3](Images/3.png)](#co_transformers_CO3-3)'
id: totrans-129
prefs: []
type: TYPE_NORMAL
zh: '[![3](Images/3.png)](#co_transformers_CO3-3)'
- en: The output from the layer is the sum of the token and position embeddings.
id: totrans-130
prefs: []
type: TYPE_NORMAL
zh: 该层的输出是标记和位置嵌入的总和。
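Taken together, the three callouts describe a layer along the following lines. This is a minimal Keras sketch; the constructor argument names are assumptions, and the book's actual listing is Example 9-5.

```python
# A minimal sketch of a token-plus-position embedding layer in Keras,
# following the callouts above (illustrative; argument names are assumptions).
import tensorflow as tf
from tensorflow.keras import layers


class TokenAndPositionEmbedding(layers.Layer):
    def __init__(self, max_len, vocab_size, embed_dim):
        super().__init__()
        # Callout 1: embed the tokens with a standard Embedding layer
        self.token_emb = layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)
        # Callout 2: embed the integer positions 0..max_len-1 the same way
        self.pos_emb = layers.Embedding(input_dim=max_len, output_dim=embed_dim)

    def call(self, x):
        seq_len = tf.shape(x)[-1]
        positions = tf.range(start=0, limit=seq_len, delta=1)
        # Callout 3: the output is the sum of the token and position embeddings
        return self.token_emb(x) + self.pos_emb(positions)
```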
- en: Training GPT
id: totrans-131
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 训练GPT
- en: Now we are ready to build and train our GPT model! To put everything together,
we need to pass our input text through the token and position embedding layer,
then through our Transformer block. The final output of the network is a simple
`Dense` layer with softmax activation over the number of words in the vocabulary.
id: totrans-132
prefs: []
type: TYPE_NORMAL
zh: 现在我们准备构建和训练我们的GPT模型!为了将所有内容整合在一起,我们需要将输入文本先通过标记和位置嵌入层,再通过Transformer块。网络的最终输出是一个简单的`Dense`层,使用softmax激活,输出维度为词汇表中的单词数量。
- en: Tip
id: totrans-133
prefs:
- PREF_H6
type: TYPE_NORMAL
zh: 提示
- en: For simplicity, we will use just one Transformer block, rather than the 12 in
the paper.
id: totrans-134
prefs: []
type: TYPE_NORMAL
zh: 为简单起见,我们将只使用一个Transformer块,而不是论文中的12个。
- en: The overall architecture is shown in [Figure 9-9](#transformer) and the equivalent
code is provided in [Example 9-6](#transformer_code).
id: totrans-135
prefs: []
type: TYPE_NORMAL
zh: 整体架构显示在[图9-9](#transformer)中,相应的代码在[示例9-6](#transformer_code)中提供。
- en: '![](Images/gdl2_0909.png)'
id: totrans-136
prefs: []
type: TYPE_IMG
zh: '![](Images/gdl2_0909.png)'
- en: Figure 9-9\. The simplified GPT model architecture
id: totrans-137
prefs:
- PREF_H6
type: TYPE_NORMAL
zh: 图9-9\. 简化的GPT模型架构
- en: Example 9-6\. A GPT model in Keras
id: totrans-138
prefs:
- PREF_H5
type: TYPE_NORMAL
zh: 示例9-6\. 在Keras中的GPT模型
- en: '[PRE6]'
id: totrans-139
prefs: []
@@ -948,30 +986,37 @@
id: totrans-140
prefs: []
type: TYPE_NORMAL
zh: '[![1](Images/1.png)](#co_transformers_CO4-1)'
- en: The input is padded (with zeros).
id: totrans-141
prefs: []
type: TYPE_NORMAL
zh: 输入被填充(用零)。
- en: '[![2](Images/2.png)](#co_transformers_CO4-2)'
id: totrans-142
prefs: []
type: TYPE_NORMAL
zh: '[![2](Images/2.png)](#co_transformers_CO4-2)'
- en: The text is encoded using a `TokenAndPositionEmbedding` layer.
id: totrans-143
prefs: []
type: TYPE_NORMAL
zh: 文本使用`TokenAndPositionEmbedding`层进行编码。
- en: '[![3](Images/3.png)](#co_transformers_CO4-3)'
id: totrans-144
prefs: []
type: TYPE_NORMAL
zh: '[![3](Images/3.png)](#co_transformers_CO4-3)'
- en: The encoding is passed through a `TransformerBlock`.
id: totrans-145
prefs: []
type: TYPE_NORMAL
zh: 编码通过`TransformerBlock`传递。
- en: '[![4](Images/4.png)](#co_transformers_CO4-4)'
id: totrans-146
prefs: []
type: TYPE_NORMAL
zh: '[![4](Images/4.png)](#co_transformers_CO4-4)'
- en: The transformed output is passed through a `Dense` layer with softmax activation
to predict a distribution over the subsequent word.
id: totrans-147
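The four callouts correspond roughly to the functional-API sketch below, reusing the illustrative TokenAndPositionEmbedding and TransformerBlock classes sketched earlier in this section. The hyperparameter values are placeholder assumptions, not the book's Example 9-6.

```python
# A rough functional-API sketch of the simplified GPT model described by the
# callouts above. Assumes the illustrative TokenAndPositionEmbedding and
# TransformerBlock classes sketched earlier are in scope; hyperparameter
# values are placeholder assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 10000   # assumed vocabulary size
MAX_LEN = 80         # assumed maximum (padded) sequence length
EMBED_DIM = 256
N_HEADS = 2
KEY_DIM = 256
FF_DIM = 256

# Callout 1: the zero-padded token sequence is the model input
inputs = layers.Input(shape=(None,), dtype=tf.int32)
# Callout 2: encode tokens and their positions together
x = TokenAndPositionEmbedding(MAX_LEN, VOCAB_SIZE, EMBED_DIM)(inputs)
# Callout 3: pass the encoding through a single Transformer block
x = TransformerBlock(N_HEADS, KEY_DIM, EMBED_DIM, FF_DIM)(x)
# Callout 4: a Dense layer with softmax predicts a distribution over the vocabulary
outputs = layers.Dense(VOCAB_SIZE, activation="softmax")(x)

gpt = models.Model(inputs, outputs)
gpt.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```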
