diff --git a/totrans/gen-dl_13.yaml b/totrans/gen-dl_13.yaml index 33d04c0..6745916 100644 --- a/totrans/gen-dl_13.yaml +++ b/totrans/gen-dl_13.yaml @@ -753,51 +753,63 @@ id: totrans-100 prefs: [] type: TYPE_NORMAL + zh: 构成`TransformerBlock`层的子层在初始化函数中定义。 - en: '[![2](Images/2.png)](#co_transformers_CO2-2)' id: totrans-101 prefs: [] type: TYPE_NORMAL + zh: '[![2](Images/2.png)](#co_transformers_CO2-2)' - en: The causal mask is created to hide future keys from the query. id: totrans-102 prefs: [] type: TYPE_NORMAL + zh: 因果掩码被创建用来隐藏查询中的未来键。 - en: '[![3](Images/3.png)](#co_transformers_CO2-3)' id: totrans-103 prefs: [] type: TYPE_NORMAL + zh: '[![3](Images/3.png)](#co_transformers_CO2-3)' - en: The multihead attention layer is created, with the attention masks specified. id: totrans-104 prefs: [] type: TYPE_NORMAL + zh: 创建了多头注意力层,并指定了注意力掩码。 - en: '[![4](Images/4.png)](#co_transformers_CO2-4)' id: totrans-105 prefs: [] type: TYPE_NORMAL + zh: '[![4](Images/4.png)](#co_transformers_CO2-4)' - en: The first *add and normalization* layer. id: totrans-106 prefs: [] type: TYPE_NORMAL + zh: 第一个*加和归一化*层。 - en: '[![5](Images/5.png)](#co_transformers_CO2-5)' id: totrans-107 prefs: [] type: TYPE_NORMAL + zh: '[![5](Images/5.png)](#co_transformers_CO2-5)' - en: The feed-forward layers. id: totrans-108 prefs: [] type: TYPE_NORMAL + zh: 前馈层。 - en: '[![6](Images/6.png)](#co_transformers_CO2-6)' id: totrans-109 prefs: [] type: TYPE_NORMAL + zh: '[![6](Images/6.png)](#co_transformers_CO2-6)' - en: The second *add and normalization* layer. id: totrans-110 prefs: [] type: TYPE_NORMAL + zh: 第二个*加和归一化*层。 - en: Positional Encoding id: totrans-111 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 位置编码 - en: 'There is one final step to cover before we can put everything together to train our GPT model. You may have noticed that in the multihead attention layer, there is nothing that cares about the ordering of the keys. The dot product between @@ -808,16 +820,19 @@ id: totrans-112 prefs: [] type: TYPE_NORMAL + zh: 在我们能够将所有内容整合在一起训练我们的GPT模型之前,还有一个最后的步骤要解决。您可能已经注意到,在多头注意力层中,没有任何关心键的顺序的内容。每个键和查询之间的点积是并行计算的,而不是像递归神经网络那样顺序计算。这是一种优势(因为并行化效率提高),但也是一个问题,因为我们显然需要注意力层能够预测以下两个句子的不同输出: - en: The dog looked at the boy and …​ (barked?) id: totrans-113 prefs: - PREF_UL type: TYPE_NORMAL + zh: 狗看着男孩然后…(叫?) - en: The boy looked at the dog and …​ (smiled?) id: totrans-114 prefs: - PREF_UL type: TYPE_NORMAL + zh: 男孩看着狗然后…(微笑?) - en: To solve this problem, we use a technique called *positional encoding* when creating the inputs to the initial Transformer block. Instead of only encoding each token using a *token embedding*, we also encode the position of the token, @@ -825,6 +840,7 @@ id: totrans-115 prefs: [] type: TYPE_NORMAL + zh: 为了解决这个问题,我们在创建初始Transformer块的输入时使用一种称为*位置编码*的技术。我们不仅使用*标记嵌入*对每个标记进行编码,还使用*位置嵌入*对标记的位置进行编码。 - en: The *token embedding* is created using a standard `Embedding` layer to convert each token into a learned vector. We can create the *positional embedding* in the same way, using a standard `Embedding` layer to convert each integer position @@ -832,17 +848,20 @@ id: totrans-116 prefs: [] type: TYPE_NORMAL + zh: '*标记嵌入*是使用标准的`Embedding`层创建的,将每个标记转换为一个学习到的向量。我们可以以相同的方式创建*位置嵌入*,使用标准的`Embedding`层将每个整数位置转换为一个学习到的向量。' - en: Tip id: totrans-117 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 提示 - en: While GPT uses an `Embedding` layer to embed the position, the original Transformer paper used trigonometric functions—we’ll cover this alternative in [Chapter 11](ch11.xhtml#chapter_music), when we explore music generation. 
id: totrans-118 prefs: [] type: TYPE_NORMAL + zh: 虽然GPT使用`Embedding`层来嵌入位置,但原始Transformer论文使用三角函数——我们将在[第11章](ch11.xhtml#chapter_music)中介绍这种替代方法,当我们探索音乐生成时。 - en: To construct the joint token–position encoding, the token embedding is added to the positional embedding, as shown in [Figure 9-8](#positional_enc). This way, the meaning and position of each word in the sequence are captured in a single @@ -850,25 +869,30 @@ id: totrans-119 prefs: [] type: TYPE_NORMAL + zh: 为构建联合标记-位置编码,将标记嵌入加到位置嵌入中,如[图9-8](#positional_enc)所示。这样,序列中每个单词的含义和位置都被捕捉在一个向量中。 - en: '![](Images/gdl2_0908.png)' id: totrans-120 prefs: [] type: TYPE_IMG + zh: '![](Images/gdl2_0908.png)' - en: Figure 9-8\. The token embeddings are added to the positional embeddings to give the token position encoding id: totrans-121 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图9-8\. 将标记嵌入添加到位置嵌入以给出标记位置编码 - en: The code that defines our `TokenAndPositionEmbedding` layer is shown in [Example 9-5](#positional_embedding_code). id: totrans-122 prefs: [] type: TYPE_NORMAL + zh: 定义我们的`TokenAndPositionEmbedding`层的代码显示在[示例9-5](#positional_embedding_code)中。 - en: Example 9-5\. The `TokenAndPositionEmbedding` layer id: totrans-123 prefs: - PREF_H5 type: TYPE_NORMAL + zh: 示例9-5\. `TokenAndPositionEmbedding`层 - en: '[PRE5]' id: totrans-124 prefs: [] @@ -878,31 +902,38 @@ id: totrans-125 prefs: [] type: TYPE_NORMAL + zh: '[![1](Images/1.png)](#co_transformers_CO3-1)' - en: The tokens are embedded using an `Embedding` layer. id: totrans-126 prefs: [] type: TYPE_NORMAL + zh: 标记使用`Embedding`层进行嵌入。 - en: '[![2](Images/2.png)](#co_transformers_CO3-2)' id: totrans-127 prefs: [] type: TYPE_NORMAL + zh: '[![2](Images/2.png)](#co_transformers_CO3-2)' - en: The positions of the tokens are also embedded using an `Embedding` layer. id: totrans-128 prefs: [] type: TYPE_NORMAL + zh: 标记的位置也使用`Embedding`层进行嵌入。 - en: '[![3](Images/3.png)](#co_transformers_CO3-3)' id: totrans-129 prefs: [] type: TYPE_NORMAL + zh: '[![3](Images/3.png)](#co_transformers_CO3-3)' - en: The output from the layer is the sum of the token and position embeddings. id: totrans-130 prefs: [] type: TYPE_NORMAL + zh: 该层的输出是标记和位置嵌入的总和。 - en: Training GPT id: totrans-131 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 训练GPT - en: Now we are ready to build and train our GPT model! To put everything together, we need to pass our input text through the token and position embedding layer, then through our Transformer block. The final output of the network is a simple @@ -910,35 +941,42 @@ id: totrans-132 prefs: [] type: TYPE_NORMAL + zh: 现在我们准备构建和训练我们的GPT模型!为了将所有内容整合在一起,我们需要将输入文本通过标记和位置嵌入层,然后通过我们的Transformer块。网络的最终输出是一个简单的具有softmax激活函数的`Dense`层,覆盖词汇表中的单词数量。 - en: Tip id: totrans-133 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 提示 - en: For simplicity, we will use just one Transformer block, rather than the 12 in the paper. id: totrans-134 prefs: [] type: TYPE_NORMAL + zh: 为简单起见,我们将只使用一个Transformer块,而不是论文中的12个。 - en: The overall architecture is shown in [Figure 9-9](#transformer) and the equivalent code is provided in [Example 9-6](#transformer_code). id: totrans-135 prefs: [] type: TYPE_NORMAL + zh: 整体架构显示在[图9-9](#transformer)中,相应的代码在[示例9-6](#transformer_code)中提供。 - en: '![](Images/gdl2_0909.png)' id: totrans-136 prefs: [] type: TYPE_IMG + zh: '![](Images/gdl2_0909.png)' - en: Figure 9-9\. The simplified GPT model architecture id: totrans-137 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图9-9\. 简化的GPT模型架构 - en: Example 9-6\. A GPT model in Keras id: totrans-138 prefs: - PREF_H5 type: TYPE_NORMAL + zh: 示例9-6\. 
在Keras中的GPT模型 - en: '[PRE6]' id: totrans-139 prefs: [] @@ -948,30 +986,37 @@ id: totrans-140 prefs: [] type: TYPE_NORMAL + zh: '[![1](Images/1.png)](#co_transformers_CO4-1)' - en: The input is padded (with zeros). id: totrans-141 prefs: [] type: TYPE_NORMAL + zh: 输入被填充(用零填充)。 - en: '[![2](Images/2.png)](#co_transformers_CO4-2)' id: totrans-142 prefs: [] type: TYPE_NORMAL + zh: '[![2](Images/2.png)](#co_transformers_CO4-2)' - en: The text is encoded using a `TokenAndPositionEmbedding` layer. id: totrans-143 prefs: [] type: TYPE_NORMAL + zh: 文本使用`TokenAndPositionEmbedding`层进行编码。 - en: '[![3](Images/3.png)](#co_transformers_CO4-3)' id: totrans-144 prefs: [] type: TYPE_NORMAL + zh: '[![3](Images/3.png)](#co_transformers_CO4-3)' - en: The encoding is passed through a `TransformerBlock`. id: totrans-145 prefs: [] type: TYPE_NORMAL + zh: 编码通过`TransformerBlock`传递。 - en: '[![4](Images/4.png)](#co_transformers_CO4-4)' id: totrans-146 prefs: [] type: TYPE_NORMAL + zh: '[![4](Images/4.png)](#co_transformers_CO4-4)' - en: The transformed output is passed through a `Dense` layer with softmax activation to predict a distribution over the subsequent word. id: totrans-147 diff --git a/totrans/gen-dl_14.yaml b/totrans/gen-dl_14.yaml index 9b577d3..7bc60e3 100644 --- a/totrans/gen-dl_14.yaml +++ b/totrans/gen-dl_14.yaml @@ -1,92 +1,126 @@ - en: Chapter 10\. Advanced GANs + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 第10章. 高级GANs - en: '[Chapter 4](ch04.xhtml#chapter_gan) introduced generative adversarial networks (GANs), a class of generative model that has produced state-of-the-art results across a wide variety of image generation tasks. The flexibility in the model architecture and training process has led academics and deep learning practitioners to find new ways to design and train GANs, leading to many different advanced *flavors* of the architecture that we shall explore in this chapter.' + id: totrans-1 prefs: [] type: TYPE_NORMAL + zh: '[第4章](ch04.xhtml#chapter_gan)介绍了生成对抗网络(GANs),这是一类生成模型,在各种图像生成任务中取得了最先进的结果。模型架构和训练过程的灵活性导致学术界和深度学习从业者找到了设计和训练GAN的新方法,从而产生了许多不同的高级架构,我们将在本章中探讨。' - en: Introduction + id: totrans-2 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 介绍 - en: Explaining all GAN developments and their repercussions in detail could easily fill another book. The [GAN Zoo repository](https://oreil.ly/Oy6bR) on GitHub contains over 500 distinct examples of GANs with linked papers, ranging from ABC-GAN to ZipNet-GAN! + id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: 详细解释所有GAN发展及其影响可能需要另一本书。GitHub上的[GAN Zoo代码库](https://oreil.ly/Oy6bR)包含了500多个不同的GAN示例,涵盖了从ABC-GAN到ZipNet-GAN的各种GAN,并附有相关论文链接! - en: In this chapter we will cover the main GANs that have been influential in the field, including a detailed explanation of the model architecture and training process for each. + id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: 在本章中,我们将介绍对该领域产生影响的主要GANs,包括对每个模型的模型架构和训练过程的详细解释。 - en: 'We will first explore three important models from NVIDIA that have pushed the boundaries of image generation: ProGAN, StyleGAN, and StyleGAN2\. We will analyze each of these models in enough detail to understand the fundamental concepts that underpin the architectures and see how they have each built on ideas from earlier papers.' 
+ id: totrans-5 prefs: [] type: TYPE_NORMAL + zh: 我们将首先探讨NVIDIA推动图像生成边界的三个重要模型:ProGAN、StyleGAN和StyleGAN2。我们将对每个模型进行足够详细的分析,以理解支撑架构的基本概念,并看看它们如何各自建立在早期论文的想法基础上。 - en: 'We will also explore two other important GAN architectures that incorporate attention: the Self-Attention GAN (SAGAN) and BigGAN, which built on many of the ideas in the SAGAN paper. We have already seen the power of the attention mechanism in the context of Transformers in [Chapter 9](ch09.xhtml#chapter_transformer).' + id: totrans-6 prefs: [] type: TYPE_NORMAL + zh: 我们还将探讨另外两种重要的GAN架构,包括引入注意力机制的Self-Attention GAN(SAGAN)和BigGAN,后者在SAGAN论文中的许多想法基础上构建。我们已经在[第9章](ch09.xhtml#chapter_transformer)中看到了注意力机制在变换器中的威力。 - en: Lastly, we will cover VQ-GAN and ViT VQ-GAN, which incorporate a blend of ideas from variational autoencoders, Transformers, and GANs. VQ-GAN is a key component of Google’s state-of-the-art text-to-image generation model Muse.^([1](ch10.xhtml#idm45387005226448)) We will explore so-called multimodal models in more detail in [Chapter 13](ch13.xhtml#chapter_multimodal). + id: totrans-7 prefs: [] type: TYPE_NORMAL + zh: 最后,我们将介绍VQ-GAN和ViT VQ-GAN,它们融合了变分自动编码器、变换器和GAN的思想。VQ-GAN是谷歌最先进的文本到图像生成模型Muse的关键组成部分。我们将在[第13章](ch13.xhtml#chapter_multimodal)中更详细地探讨所谓的多模型。 - en: Training Your Own Models + id: totrans-8 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 训练您自己的模型 - en: For conciseness I have chosen not to include code to directly build these models in the code repository for this book, but instead will point to publicly available implementations where possible, so that you can train your own versions if you wish. + id: totrans-9 prefs: [] type: TYPE_NORMAL + zh: 为了简洁起见,我选择不在本书的代码库中直接构建这些模型的代码,而是将尽可能指向公开可用的实现,以便您可以根据需要训练自己的版本。 - en: ProGAN + id: totrans-10 prefs: - PREF_H1 type: TYPE_NORMAL + zh: ProGAN - en: ProGAN is a technique developed by NVIDIA Labs in 2017^([2](ch10.xhtml#idm45387005216528)) to improve both the speed and stability of GAN training. Instead of immediately training a GAN on full-resolution images, the ProGAN paper suggests first training the generator and discriminator on low-resolution images of, say, 4 × 4 pixels and then incrementally adding layers throughout the training process to increase the resolution. + id: totrans-11 prefs: [] type: TYPE_NORMAL + zh: ProGAN是NVIDIA实验室在2017年开发的一种技术,旨在提高GAN训练的速度和稳定性。ProGAN论文建议,不要立即在全分辨率图像上训练GAN,而是首先在低分辨率图像(例如4×4像素)上训练生成器和鉴别器,然后在训练过程中逐步添加层以增加分辨率。 - en: Let’s take a look at the concept of *progressive training* in more detail. + id: totrans-12 prefs: [] type: TYPE_NORMAL + zh: 让我们更详细地了解*渐进式训练*的概念。 - en: Training Your Own ProGAN + id: totrans-13 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 训练您自己的ProGAN - en: There is an excellent tutorial by Bharath K on training your own ProGAN using Keras available on the [Paperspace blog](https://oreil.ly/b2CJm). Bear in mind that training a ProGAN to achieve the results from the paper requires a significant amount of computing power. + id: totrans-14 prefs: [] type: TYPE_NORMAL + zh: Bharath K在[Paperspace博客](https://oreil.ly/b2CJm)上提供了一个关于使用Keras训练自己的ProGAN的优秀教程。请记住,训练ProGAN以达到论文中的结果需要大量的计算能力。 - en: Progressive Training + id: totrans-15 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 渐进式训练 - en: As always with GANs, we build two independent networks, the generator and discriminator, with a fight for dominance taking place during the training process. 
+ id: totrans-16 prefs: [] type: TYPE_NORMAL + zh: 与GANs一样,我们构建两个独立的网络,生成器和鉴别器,在训练过程中进行统治之争。 - en: In a normal GAN, the generator always outputs full-resolution images, even in the early stages of training. It is reasonable to think that this strategy might not be optimal—the generator might be slow to learn high-level structures in the @@ -94,60 +128,84 @@ images. Wouldn’t it be better to first train a lightweight GAN to output accurate low-resolution images and then see if we can build on this to gradually increase the resolution? + id: totrans-17 prefs: [] type: TYPE_NORMAL + zh: 在普通的GAN中,生成器总是输出全分辨率图像,即使在训练的早期阶段也是如此。可以合理地认为,这种策略可能不是最佳的——生成器可能在训练的早期阶段学习高级结构较慢,因为它立即在复杂的高分辨率图像上操作。首先训练一个轻量级的GAN以输出准确的低分辨率图像,然后逐渐增加分辨率,这样做会更好吗? - en: This simple idea leads us to *progressive training*, one of the key contributions of the ProGAN paper. The ProGAN is trained in stages, starting with a training set that has been condensed down to 4 × 4–pixel images using interpolation, as shown in [Figure 10-1](Images/#condensed_images). + id: totrans-18 prefs: [] type: TYPE_NORMAL + zh: 这个简单的想法引导我们进入*渐进式训练*,这是ProGAN论文的一个关键贡献。ProGAN分阶段训练,从一个已经通过插值压缩到4×4像素图像的训练集开始,如[图10-1](Images/#condensed_images)所示。 - en: '![](Images/gdl2_1001.png)' + id: totrans-19 prefs: [] type: TYPE_IMG + zh: '![](Images/gdl2_1001.png)' - en: Figure 10-1\. Images in the dataset can be compressed to lower resolution using interpolation + id: totrans-20 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图10-1。数据集中的图像可以使用插值压缩到较低分辨率 - en: We can then initially train the generator to transform a latent input noise vector z (say, of length 512) into an image of shape 4 × 4 × 3\. The matching discriminator will need to transform an input image of size 4 × 4 × 3 into a single scalar prediction. The network architectures for this first step are shown in [Figure 10-2](#progan_4). + id: totrans-21 prefs: [] type: TYPE_NORMAL + zh: 然后,我们可以最初训练生成器,将潜在输入噪声向量z(比如长度为512)转换为形状为4×4×3的图像。匹配的鉴别器需要将大小为4×4×3的输入图像转换为单个标量预测。这第一步的网络架构如[图10-2](#progan_4)所示。 - en: The blue box in the generator represents the convolutional layer that converts the set of feature maps into an RGB image (`toRGB`), and the blue box in the discriminator represents the convolutional layer that converts the RGB images into a set of feature maps (`fromRGB`). + id: totrans-22 prefs: [] type: TYPE_NORMAL + zh: 生成器中的蓝色框表示将特征图转换为RGB图像的卷积层(`toRGB`),鉴别器中的蓝色框表示将RGB图像转换为一组特征图的卷积层(`fromRGB`)。 - en: '![](Images/gdl2_1002.png)' + id: totrans-23 prefs: [] type: TYPE_IMG + zh: '![](Images/gdl2_1002.png)' - en: Figure 10-2\. The generator and discriminator architectures for the first stage of the ProGAN training process + id: totrans-24 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图10-2。ProGAN训练过程的第一阶段的生成器和鉴别器架构 - en: In the paper, the authors train this pair of networks until the discriminator has seen 800,000 real images. We now need to understand how the generator and discriminator are expanded to work with 8 × 8–pixel images. + id: totrans-25 prefs: [] type: TYPE_NORMAL + zh: 在论文中,作者训练这对网络,直到鉴别器看到了800,000张真实图像。现在我们需要了解如何扩展生成器和鉴别器以处理8×8像素图像。 - en: To expand the generator and discriminator, we need to blend in additional layers. This is managed in two phases, transition and stabilization, as shown in [Figure 10-3](#progan_training_gen). + id: totrans-26 prefs: [] type: TYPE_NORMAL + zh: 为了扩展生成器和鉴别器,我们需要融入额外的层。这在两个阶段中进行,过渡和稳定,如[图10-3](#progan_training_gen)所示。 - en: '![](Images/gdl2_1003.png)' + id: totrans-27 prefs: [] type: TYPE_IMG + zh: '![](Images/gdl2_1003.png)' - en: Figure 10-3\. 
The ProGAN generator training process, expanding the network
    from 4 × 4 images to 8 × 8 (dotted lines represent the rest of the network, not
    shown)
+ id: totrans-28
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 图10-3。ProGAN生成器训练过程,将网络从4×4图像扩展到8×8(虚线代表网络的其余部分,未显示)
- en: Let’s first look at the generator. During the *transition phase*, new upsampling
    and convolutional layers are appended to the existing network, with a residual
    connection set up to maintain the output from the existing trained `toRGB` layer.
@@ -155,70 +213,96 @@
    that is gradually increased from 0 to 1 throughout the transition phase to allow
    more of the new `toRGB` output through and less of the existing `toRGB` layer.
    This is to avoid a *shock* to the network as the new layers take over.
+ id: totrans-29
  prefs: []
  type: TYPE_NORMAL
+ zh: 让我们首先看一下生成器。在*过渡阶段*中,新的上采样和卷积层被附加到现有网络中,建立了一个残差连接以保持现有训练过的`toRGB`层的输出。关键的是,新层最初使用一个参数α进行掩蔽,该参数在整个过渡阶段逐渐从0增加到1,以允许更多新的`toRGB`输出通过,减少现有的`toRGB`层。这是为了避免网络在新层接管时出现*冲击*。
- en: Eventually, there is no flow through the old `toRGB` layer and the network enters
    the *stabilization phase*—a further period of training where the network can
    fine-tune the output, without any flow through the old `toRGB` layer.
+ id: totrans-30
  prefs: []
  type: TYPE_NORMAL
+ zh: 最终,旧的`toRGB`层不再有输出流,网络进入*稳定阶段*——进一步的训练期间,网络可以微调输出,而不经过旧的`toRGB`层。
- en: The discriminator uses a similar process, as shown in [Figure 10-4](#progan_training_dis).
+ id: totrans-31
  prefs: []
  type: TYPE_NORMAL
+ zh: 鉴别器使用类似的过程,如[图10-4](#progan_training_dis)所示。
- en: '![](Images/gdl2_1004.png)'
+ id: totrans-32
  prefs: []
  type: TYPE_IMG
+ zh: '![](Images/gdl2_1004.png)'
- en: Figure 10-4\. The ProGAN discriminator training process, expanding the network
    from 4 × 4 images to 8 × 8 (dotted lines represent the rest of the network, not
    shown)
+ id: totrans-33
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 图10-4。ProGAN鉴别器训练过程,将网络从4×4图像扩展到8×8(虚线代表网络的其余部分,未显示)
- en: Here, we need to blend in additional downscaling and convolutional layers. Again,
    the layers are injected into the network—this time at the start of the network,
    just after the input image. The existing `fromRGB` layer is connected via a residual
    connection and gradually phased out as the new layers take over during the transition
    phase. The stabilization phase allows the discriminator to fine-tune using the
    new layers.
+ id: totrans-34
  prefs: []
  type: TYPE_NORMAL
+ zh: 在这里,我们需要融入额外的降采样和卷积层。同样,这些层被注入到网络中——这次是在网络的开始部分,就在输入图像之后。现有的`fromRGB`层通过残差连接连接,并在新层于过渡阶段逐渐接管时逐渐淡出。稳定阶段允许鉴别器使用新层进行微调。
- en: All transition and stabilization phases last until the discriminator has been
    shown 800,000 real images. Note that even though the network is trained progressively,
    no layers are *frozen*. Throughout the training process, all layers remain fully
    trainable.
+ id: totrans-35
  prefs: []
  type: TYPE_NORMAL
+ zh: 所有过渡和稳定阶段持续到鉴别器已经看到了800,000张真实图像。请注意,即使网络是渐进训练的,也没有层被*冻结*。在整个训练过程中,所有层都保持完全可训练。
- en: This process continues, growing the GAN from 4 × 4 images to 8 × 8, then 16
    × 16, 32 × 32, and so on, until it reaches full resolution (1,024 × 1,024), as
    shown in [Figure 10-5](#progan).
+ id: totrans-36
  prefs: []
  type: TYPE_NORMAL
+ zh: 这个过程继续进行,将GAN从4×4图像扩展到8×8,然后16×16,32×32,依此类推,直到达到完整分辨率(1,024×1,024),如[图10-5](#progan)所示。
- en: '![](Images/gdl2_1005.png)'
+ id: totrans-37
  prefs: []
  type: TYPE_IMG
+ zh: '![](Images/gdl2_1005.png)'
- en: 'Figure 10-5\. 
The ProGAN training mechanism, and some example generated faces (source: [Karras et al., 2017](https://arxiv.org/abs/1710.10196))' + id: totrans-38 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图10-5。ProGAN训练机制,以及一些示例生成的人脸(来源:[Karras等人,2017](https://arxiv.org/abs/1710.10196)) - en: The overall structure of the generator and discriminator after the full progressive training process is complete is shown in [Figure 10-6](#progan_network_diagram). + id: totrans-39 prefs: [] type: TYPE_NORMAL - en: '![](Images/gdl2_1006.png)' + id: totrans-40 prefs: [] type: TYPE_IMG - en: 'Figure 10-6\. The ProGAN generator and discriminator used to generate 1,024 × 1,024–pixel CelebA faces (source: [Karras et al., 2018](https://arxiv.org/abs/1812.04948))' + id: totrans-41 prefs: - PREF_H6 type: TYPE_NORMAL - en: The paper also makes several other important contributions, namely minibatch standard deviation, equalized learning rates, and pixelwise normalization, which are described briefly in the following sections. + id: totrans-42 prefs: [] type: TYPE_NORMAL - en: Minibatch standard deviation + id: totrans-43 prefs: - PREF_H3 type: TYPE_NORMAL @@ -230,9 +314,11 @@ can use this feature to distinguish the fake batches from the real batches! Therefore, the generator is incentivized to ensure it generates a similar amount of variety as is present in the real training data. + id: totrans-44 prefs: [] type: TYPE_NORMAL - en: Equalized learning rates + id: totrans-45 prefs: - PREF_H3 type: TYPE_NORMAL @@ -243,6 +329,7 @@ layer. This way, layers with a greater number of inputs will be initialized with weights that have a smaller deviation from zero, which generally improves the stability of the training process. + id: totrans-46 prefs: [] type: TYPE_NORMAL - en: The authors of the ProGAN paper found that this was causing problems when used @@ -254,6 +341,7 @@ more inputs). It was found that this causes an imbalance between the speed of training of the different layers of the generator and discriminator in ProGAN, so they used *equalized learning rates* to solve this problem. + id: totrans-47 prefs: [] type: TYPE_NORMAL - en: In ProGAN, weights are initialized using a simple standard Gaussian, regardless @@ -262,9 +350,11 @@ the optimizer sees each weight as having approximately the same dynamic range, so it applies the same learning rate. It is only when the layer is called that the weight is scaled by the factor from the He initializer. + id: totrans-48 prefs: [] type: TYPE_NORMAL - en: Pixelwise normalization + id: totrans-49 prefs: - PREF_H3 type: TYPE_NORMAL @@ -273,9 +363,11 @@ a unit length and helps to prevent the signal from spiraling out of control as it propagates through the network. The pixelwise normalization layer has no trainable weights. + id: totrans-50 prefs: [] type: TYPE_NORMAL - en: Outputs + id: totrans-51 prefs: - PREF_H2 type: TYPE_NORMAL @@ -284,25 +376,31 @@ in [Figure 10-7](#progan_examples). This demonstrated the power of ProGAN over earlier GAN architectures and paved the way for future iterations such as StyleGAN and StyleGAN2, which we shall explore in the next sections. + id: totrans-52 prefs: [] type: TYPE_NORMAL - en: '![](Images/gdl2_1007.png)' + id: totrans-53 prefs: [] type: TYPE_IMG - en: 'Figure 10-7\. 
Generated examples from a ProGAN trained progressively on the LSUN dataset at 256 × 256 resolution (source: [Karras et al., 2017](https://arxiv.org/abs/1710.10196))' + id: totrans-54 prefs: - PREF_H6 type: TYPE_NORMAL - en: StyleGAN + id: totrans-55 prefs: - PREF_H1 type: TYPE_NORMAL - en: StyleGAN^([3](ch10.xhtml#idm45387005140128)) is a GAN architecture from 2018 that builds on the earlier ideas in the ProGAN paper. In fact, the discriminator is identical; only the generator is changed. + id: totrans-56 prefs: [] type: TYPE_NORMAL + zh: StyleGAN^([3](ch10.xhtml#idm45387005140128))是2018年的一个GAN架构,建立在ProGAN论文中的早期思想基础上。实际上,鉴别器是相同的;只有生成器被改变。 - en: Often when training GANs it is difficult to separate out vectors in the latent space corresponding to high-level attributes—they are frequently *entangled*, meaning that adjusting an image in the latent space to give a face more freckles, @@ -310,40 +408,56 @@ generates fantastically realistic images, it is no exception to this general rule. We would ideally like to have full control of the style of the image, and this requires a disentangled separation of features in the latent space. + id: totrans-57 prefs: [] type: TYPE_NORMAL + zh: 通常在训练GAN时,很难将潜在空间中对应于高级属性的向量分离出来——它们经常是*纠缠在一起*,这意味着调整潜在空间中的图像以使脸部更多雀斑,例如,可能也会无意中改变背景颜色。虽然ProGAN生成了极其逼真的图像,但它也不例外。我们理想情况下希望完全控制图像的风格,这需要在潜在空间中对特征进行分离。 - en: 'StyleGAN achieves this by explicitly injecting style vectors into the network at different points: some that control high-level features (e.g., face orientation) and some that control low-level details (e.g., the way the hair falls across the forehead).' + id: totrans-58 prefs: [] type: TYPE_NORMAL + zh: StyleGAN通过在网络的不同点显式注入风格向量来实现这一点:一些控制高级特征(例如,面部方向)的向量,一些控制低级细节(例如,头发如何落在额头上)的向量。 - en: The overall architecture of the StyleGAN generator is shown in [Figure 10-8](#stylegan_arch). Let’s walk through this architecture step by step, starting with the mapping network. + id: totrans-59 prefs: [] type: TYPE_NORMAL + zh: StyleGAN生成器的整体架构如[图10-8](#stylegan_arch)所示。让我们逐步走过这个架构,从映射网络开始。 - en: '![](Images/gdl2_1008.png)' + id: totrans-60 prefs: [] type: TYPE_IMG + zh: '![](Images/gdl2_1008.png)' - en: 'Figure 10-8\. The StyleGAN generator architecture (source: [Karras et al., 2018](https://arxiv.org/abs/1812.04948))' + id: totrans-61 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图10-8。StyleGAN生成器架构(来源:[Karras et al., 2018](https://arxiv.org/abs/1812.04948)) - en: Training Your Own StyleGAN + id: totrans-62 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 训练您自己的StyleGAN - en: There is an excellent tutorial by Soon-Yau Cheong on training your own StyleGAN using Keras available on the [Keras website](https://oreil.ly/MooSe). Bear in mind that training a StyleGAN to achieve the results from the paper requires a significant amount of computing power. + id: totrans-63 prefs: [] type: TYPE_NORMAL + zh: Soon-Yau Cheong在[Keras网站](https://oreil.ly/MooSe)上提供了一个关于使用Keras训练自己的StyleGAN的优秀教程。请记住,要实现论文中的结果,训练StyleGAN需要大量的计算资源。 - en: The Mapping Network + id: totrans-64 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 映射网络 - en: The *mapping network* f is a simple feed-forward network that converts the input noise 𝐳 𝒵 into a different @@ -351,17 +465,26 @@ 𝒲 . This gives the generator the opportunity to disentangle the noisy input vector into distinct factors of variation, which can be easily picked up by the downstream style-generating layers. 
+ id: totrans-65 prefs: [] type: TYPE_NORMAL + zh: '*映射网络* f 是一个简单的前馈网络,将输入噪声 𝐳 𝒵 + 转换为不同的潜在空间 𝐰 + 𝒲。这使得生成器有机会将嘈杂的输入向量分解为不同的变化因素,这些因素可以被下游的风格生成层轻松捕捉到。' - en: The point of doing this is to separate out the process of choosing a style for the image (the mapping network) from the generation of an image with a given style (the synthesis network). + id: totrans-66 prefs: [] type: TYPE_NORMAL + zh: 这样做的目的是将图像的风格选择过程(映射网络)与生成具有给定风格的图像的过程(合成网络)分开。 - en: The Synthesis Network + id: totrans-67 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 合成网络 - en: 'The synthesis network is the generator of the actual image with a given style, as provided by the mapping network. As can be seen from [Figure 10-8](#stylegan_arch), the style vector 𝐰 is injected into the @@ -374,16 +497,26 @@ the specific style that should be injected at this point in the network—that is, they tell the synthesis network how to adjust the feature maps to move the generated image in the direction of the specified style.' + id: totrans-68 prefs: [] type: TYPE_NORMAL + zh: 合成网络是生成具有给定风格的实际图像的生成器,由映射网络提供。如[图10-8](#stylegan_arch)所示,风格向量 𝐰 被注入到合成网络的不同点,每次通过不同的密集连接层 A i,生成两个向量:一个偏置向量 𝐲 b,i + 和一个缩放向量 𝐲 s,i。这些向量定义了应该在网络中的这一点注入的特定风格,也就是告诉合成网络如何调整特征图以使生成的图像朝着指定的风格方向移动。 - en: This adjustment is achieved through *adaptive instance normalization* (AdaIN) layers. + id: totrans-69 prefs: [] type: TYPE_NORMAL + zh: 通过*自适应实例归一化*(AdaIN)层实现这种调整。 - en: Adaptive instance normalization + id: totrans-70 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 自适应实例归一化 - en: 'An AdaIN layer is a type of neural network layer that adjusts the mean and variance of each feature map 𝐱 i with a reference style bias 𝐱 i -μ(𝐱 i ) σ(𝐱 i ) + 𝐲 b,i + id: totrans-72 prefs: [] type: TYPE_NORMAL + zh: AdaIN + ( 𝐱 i , 𝐲 ) + = 𝐲 s,i + 𝐱 i -μ(𝐱 + i ) σ(𝐱 + i ) + 𝐲 b,i - en: The adaptive instance normalization layers ensure that the style vectors that are injected into each layer only affect features at that layer, by preventing any style information from leaking through between layers. The authors show that this results in the latent vectors 𝐰 being significantly more disentangled than the original 𝐳 vectors. + id: totrans-73 prefs: [] type: TYPE_NORMAL - en: Since the synthesis network is based on the ProGAN architecture, it is trained @@ -424,9 +571,11 @@ the latent vector 𝐰 , but we can also switch the 𝐰 vector at different points in the synthesis network to change the style at a variety of levels of detail. + id: totrans-74 prefs: [] type: TYPE_NORMAL - en: Style mixing + id: totrans-75 prefs: - PREF_H3 type: TYPE_NORMAL @@ -445,9 +594,11 @@ w bold 2 right-parenthesis">𝐰 2 ) is chosen at random, to break any possible correlation between the vectors. + id: totrans-76 prefs: [] type: TYPE_NORMAL - en: Stochastic variation + id: totrans-77 prefs: - PREF_H3 type: TYPE_NORMAL @@ -456,26 +607,32 @@ for stochastic details such as the placement of individual hairs, or the background behind the face. Again, the depth at which the noise is injected affects the coarseness of the impact on the image. + id: totrans-78 prefs: [] type: TYPE_NORMAL - en: This also means that the initial input to the synthesis network can simply be a learned constant, rather than additional noise. There is enough stochasticity already present in the style inputs and the noise inputs to generate sufficient variation in the images. + id: totrans-79 prefs: [] type: TYPE_NORMAL - en: Outputs from StyleGAN + id: totrans-80 prefs: - PREF_H2 type: TYPE_NORMAL - en: '[Figure 10-9](#stylegan_w) shows StyleGAN in action.' 
+ id: totrans-81 prefs: [] type: TYPE_NORMAL - en: '![](Images/gdl2_1009.png)' + id: totrans-82 prefs: [] type: TYPE_IMG - en: 'Figure 10-9\. Merging styles between two generated images at different levels of detail (source: [Karras et al., 2018](https://arxiv.org/abs/1812.04948))' + id: totrans-83 prefs: - PREF_H6 type: TYPE_NORMAL @@ -488,9 +645,11 @@ A. However, if the switch happens later, only fine-grained detail is carried across from source B, such as colors and microstructure of the face, while the coarse features from source A are preserved. + id: totrans-84 prefs: [] type: TYPE_NORMAL - en: StyleGAN2 + id: totrans-85 prefs: - PREF_H1 type: TYPE_NORMAL @@ -500,22 +659,27 @@ do not suffer as greatly from *artifacts*—water droplet–like areas of the image that were found to be caused by the adaptive instance normalization layers in StyleGAN, as shown in [Figure 10-10](#artifacts_stylegan). + id: totrans-86 prefs: [] type: TYPE_NORMAL - en: '![](Images/gdl2_1010.png)' + id: totrans-87 prefs: [] type: TYPE_IMG - en: 'Figure 10-10\. An artifact in a StyleGAN-generated image of a face (source: [Karras et al., 2019](https://arxiv.org/abs/1912.04958))' + id: totrans-88 prefs: - PREF_H6 type: TYPE_NORMAL - en: Both the generator and the discriminator in StyleGAN2 are different from the StyleGAN. In the next sections we will explore the key differences between the architectures. + id: totrans-89 prefs: [] type: TYPE_NORMAL - en: Training Your Own StyleGAN2 + id: totrans-90 prefs: - PREF_H1 type: TYPE_NORMAL @@ -523,9 +687,11 @@ on [GitHub](https://oreil.ly/alB6w). Bear in mind that training a StyleGAN2 to achieve the results from the paper requires a significant amount of computing power. + id: totrans-91 prefs: [] type: TYPE_NORMAL - en: Weight Modulation and Demodulation + id: totrans-92 prefs: - PREF_H2 type: TYPE_NORMAL @@ -536,6 +702,7 @@ by the modulation and demodulation steps in StyleGAN2 at runtime. In comparison, the AdaIN layers of StyleGAN operate on the image tensor as it flows through the network. + id: totrans-93 prefs: [] type: TYPE_NORMAL - en: The AdaIN layer in StyleGAN is simply an instance normalization followed by @@ -544,12 +711,15 @@ layers at runtime, rather than the output from the convolutional layers, as shown in [Figure 10-11](#stylegan2_styleblock). The authors show how this removes the artifact issue while retaining control of the image style. + id: totrans-94 prefs: [] type: TYPE_NORMAL - en: '![](Images/gdl2_1011.png)' + id: totrans-95 prefs: [] type: TYPE_IMG - en: Figure 10-11\. A comparison between the StyleGAN and StyleGAN2 style blocks + id: totrans-96 prefs: - PREF_H6 type: TYPE_NORMAL @@ -558,22 +728,30 @@ , where i indexes the number of input channels in the corresponding convolutional layer. This style vector is then applied to the weights of the convolutional layer as follows:' + id: totrans-97 prefs: [] type: TYPE_NORMAL - en: w i,j,k ' = s i · w i,j,k + id: totrans-98 prefs: [] type: TYPE_NORMAL + zh: w + i,j,k ' + = s i · w i,j,k - en: Here, j indexes the output channels of the layer and k indexes the spatial dimensions. This is the *modulation* step of the process. + id: totrans-99 prefs: [] type: TYPE_NORMAL - en: 'Then, we need to normalize the weights so that they again have a unit standard deviation, to ensure stability in the training process. 
This is the *demodulation* step:' + id: totrans-100 prefs: [] type: TYPE_NORMAL - en: w i,j,k + '' = w + i,j,k ' + i,k + w i,j,k + ' 2 +ε - en: where ϵ is a small constant value that prevents division by zero. + id: totrans-102 prefs: [] type: TYPE_NORMAL - en: In the paper, the authors show how this simple change is enough to prevent water-droplet artifacts, while retaining control over the generated images via the style vectors and ensuring the quality of the output remains high. + id: totrans-103 prefs: [] type: TYPE_NORMAL - en: Path Length Regularization + id: totrans-104 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 路径长度正则化 - en: Another change made to the StyleGAN architecture is the inclusion of an additional penalty term in the loss function—*this is known as path length regularization*. + id: totrans-105 prefs: [] type: TYPE_NORMAL + zh: StyleGAN架构的另一个变化是在损失函数中包含了额外的惩罚项——*这被称为路径长度正则化*。 - en: We would like the latent space to be as smooth and uniform as possible, so that a fixed-size step in the latent space in any direction results in a fixed-magnitude change in the image. + id: totrans-106 prefs: [] type: TYPE_NORMAL + zh: 我们希望潜在空间尽可能平滑和均匀,这样在任何方向上潜在空间中的固定大小步长会导致图像的固定幅度变化。 - en: 'To encourage this property, StyleGAN2 aims to minimize the following term, alongside the usual Wasserstein loss with gradient penalty:' + id: totrans-107 prefs: [] type: TYPE_NORMAL + zh: 为了鼓励这一属性,StyleGAN2旨在最小化以下术语,以及通常的Wasserstein损失和梯度惩罚: - en: 𝔼 @@ -621,8 +820,16 @@ open="(" close=")">𝐉 𝑤 𝑦 2 -a 2 + id: totrans-108 prefs: [] type: TYPE_NORMAL + zh: 𝔼 + 𝑤,𝑦 𝐉 + 𝑤 𝑦 2 -a + 2 - en: Here, 𝑤 is a set of style vectors created by the mapping network, 𝑦 is a set of noisy images drawn from 𝐉 𝑤 = g 𝑤 is the Jacobian of the generator network with respect to the style vectors. + id: totrans-109 prefs: [] type: TYPE_NORMAL + zh: 在这里,𝑤是由映射网络创建的一组样式向量,𝑦是从𝒩 + ( 0 , 𝐈 )中绘制的一组嘈杂图像,𝐉 𝑤 + = g 𝑤是生成器网络相对于样式向量的雅可比矩阵。 - en: The term 𝐉 𝑤 𝑦 2 @@ -644,36 +858,56 @@ w Superscript down-tack Baseline y parallel-to Subscript 2">𝐉 𝑤 𝑦 2 as the training progresses. + id: totrans-110 prefs: [] type: TYPE_NORMAL + zh: 术语𝐉 + 𝑤 𝑦 2测量了经雅可比矩阵给出的梯度变换后图像𝑦的幅度。我们希望这个值接近一个常数a,这个常数是动态计算的,作为训练进行时𝐉 + 𝑤 𝑦 2的指数移动平均值。 - en: The authors find that this additional term makes exploring the latent space more reliable and consistent. Moreover, the regularization terms in the loss function are only applied once every 16 minibatches, for efficiency. This technique, called *lazy regularization*, does not cause a measurable drop in performance. + id: totrans-111 prefs: [] type: TYPE_NORMAL + zh: 作者发现,这个额外的术语使探索潜在空间更可靠和一致。此外,损失函数中的正则化项仅在每16个小批次中应用一次,以提高效率。这种技术称为*懒惰正则化*,不会导致性能的明显下降。 - en: No Progressive Growing + id: totrans-112 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 没有渐进增长 - en: Another major update is in how StyleGAN2 is trained. Rather than adopting the usual progressive training mechanism, StyleGAN2 utilizes skip connections in the generator and residual connections in the discriminator to train the entire network as one. It no longer requires different resolutions to be trained independently and blended as part of the training process. + id: totrans-113 prefs: [] type: TYPE_NORMAL + zh: StyleGAN2训练的另一个重大更新是在训练方式上。StyleGAN2不再采用通常的渐进式训练机制,而是利用生成器中的跳过连接和鉴别器中的残差连接来将整个网络作为一个整体进行训练。它不再需要独立训练不同分辨率,并将其作为训练过程的一部分混合。 - en: '[Figure 10-12](#stylegan2_gen_dis) shows the generator and discriminator blocks in StyleGAN2.' 
+ id: totrans-114 prefs: [] type: TYPE_NORMAL + zh: '[图10-12](#stylegan2_gen_dis)展示了StyleGAN2中的生成器和鉴别器块。' - en: '![](Images/gdl2_1012.png)' + id: totrans-115 prefs: [] type: TYPE_IMG + zh: '![](Images/gdl2_1012.png)' - en: Figure 10-12\. The generator and discriminator blocks in StyleGAN2 + id: totrans-116 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图10-12。StyleGAN2中的生成器和鉴别器块 - en: The crucial property that we would like to be able to preserve is that the StyleGAN2 starts by learning low-resolution features and gradually refines the output as training progresses. The authors show that this property is indeed preserved using @@ -684,43 +918,57 @@ begin to dominate, as the generator discovers more intricate ways to improve the realism of the images in order to fool the discriminator. This process is demonstrated in [Figure 10-13](#stylegan2_contrib). + id: totrans-117 prefs: [] type: TYPE_NORMAL + zh: 我们希望能够保留的关键属性是,StyleGAN2从学习低分辨率特征开始,并随着训练的进行逐渐完善输出。作者表明,使用这种架构确实保留了这一属性。在训练的早期阶段,每个网络都受益于在较低分辨率层中细化卷积权重,而通过跳过和残差连接将输出传递到较高分辨率层的方式基本上不受影响。随着训练的进行,较高分辨率层开始占主导地位,因为生成器发现了更复杂的方法来改善图像的逼真度,以欺骗鉴别器。这个过程在[图10-13](#stylegan2_contrib)中展示。 - en: '![](Images/gdl2_1013.png)' + id: totrans-118 prefs: [] type: TYPE_IMG + zh: '![](Images/gdl2_1013.png)' - en: Figure 10-13\. The contribution of each resolution layer to the output of the generator, by training time (adapted from [Karras et al., 2019](https://arxiv.org/pdf/1912.04958.pdf)) + id: totrans-119 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图10-13。每个分辨率层对生成器输出的贡献,按训练时间(改编自[Karras等人,2019](https://arxiv.org/pdf/1912.04958.pdf)) - en: Outputs from StyleGAN2 + id: totrans-120 prefs: - PREF_H2 type: TYPE_NORMAL + zh: StyleGAN2的输出 - en: Some examples of StyleGAN2 output are shown in [Figure 10-14](#stylegan2_output). To date, the StyleGAN2 architecture (and scaled variations such as StyleGAN-XL^([6](ch10.xhtml#idm45387004898624))) remain state of the art for image generation on datasets such as Flickr-Faces-HQ (FFHQ) and CIFAR-10, according to the benchmarking website [Papers with Code](https://oreil.ly/VwH2r). + id: totrans-121 prefs: [] type: TYPE_NORMAL - en: '![](Images/gdl2_1014.png)' + id: totrans-122 prefs: [] type: TYPE_IMG - en: 'Figure 10-14\. Uncurated StyleGAN2 output for the FFHQ face dataset and LSUN car dataset (source: [Karras et al., 2019](https://arxiv.org/pdf/1912.04958.pdf))' + id: totrans-123 prefs: - PREF_H6 type: TYPE_NORMAL - en: Other Important GANs + id: totrans-124 prefs: - PREF_H1 type: TYPE_NORMAL - en: In this section, we will explore two more architectures that have also contributed significantly to the development of GANs—SAGAN and BigGAN. + id: totrans-125 prefs: [] type: TYPE_NORMAL - en: Self-Attention GAN (SAGAN) + id: totrans-126 prefs: - PREF_H2 type: TYPE_NORMAL @@ -729,13 +977,16 @@ models such as the Transformer can also be incorporated into GAN-based models for image generation. [Figure 10-15](#sagan_attention) shows the self-attention mechanism from the paper introducing this architecture. + id: totrans-127 prefs: [] type: TYPE_NORMAL - en: '![](Images/gdl2_1015.png)' + id: totrans-128 prefs: [] type: TYPE_IMG - en: 'Figure 10-15\. The self-attention mechanism within the SAGAN model (source: [Zhang et al., 2018](https://arxiv.org/abs/1805.08318))' + id: totrans-129 prefs: - PREF_H6 type: TYPE_NORMAL @@ -749,14 +1000,17 @@ solves this problem by incorporating the attention mechanism that we explored earlier in this chapter into the GAN. The effect of this inclusion is shown in [Figure 10-16](Images/#sagan_images). 
+ id: totrans-130 prefs: [] type: TYPE_NORMAL - en: '![](Images/gdl2_1016.png)' + id: totrans-131 prefs: [] type: TYPE_IMG - en: 'Figure 10-16\. A SAGAN-generated image of a bird (leftmost cell) and the attention maps of the final attention-based generator layer for the pixels covered by the three colored dots (rightmost cells) (source: [Zhang et al., 2018](https://arxiv.org/abs/1805.08318))' + id: totrans-132 prefs: - PREF_H6 type: TYPE_NORMAL @@ -767,31 +1021,38 @@ falls on other tail pixels, some of which are distant from the blue dot. It would be difficult to maintain this long-range dependency for pixels without attention, especially for long, thin structures in the image (such as the tail in this case). + id: totrans-133 prefs: [] type: TYPE_NORMAL - en: Training Your Own SAGAN + id: totrans-134 prefs: - PREF_H1 type: TYPE_NORMAL - en: The official code for training your own SAGAN using TensorFlow is available on [GitHub](https://oreil.ly/rvej0). Bear in mind that training a SAGAN to achieve the results from the paper requires a significant amount of computing power. + id: totrans-135 prefs: [] type: TYPE_NORMAL - en: BigGAN + id: totrans-136 prefs: - PREF_H2 type: TYPE_NORMAL - en: BigGAN,^([8](ch10.xhtml#idm45387004870736)) developed at DeepMind, extends the ideas from the SAGAN paper. [Figure 10-17](#biggan_examples) shows some of the images generated by BigGAN, trained on the ImageNet dataset at 128 × 128 resolution. + id: totrans-137 prefs: [] type: TYPE_NORMAL - en: '![](Images/gdl2_1017.png)' + id: totrans-138 prefs: [] type: TYPE_IMG - en: 'Figure 10-17\. Examples of images generated by BigGAN (source: [Brock et al., 2018](https://arxiv.org/abs/1809.11096))' + id: totrans-139 prefs: - PREF_H6 type: TYPE_NORMAL @@ -806,16 +1067,26 @@ that have magnitude greater than a certain threshold). The smaller the truncation threshold, the greater the believability of generated samples, at the expense of reduced variability. This concept is shown in [Figure 10-18](#truncation). + id: totrans-140 prefs: [] type: TYPE_NORMAL + zh: 除了对基本 SAGAN 模型进行一些增量更改外,论文中还概述了将模型提升到更高层次的几项创新。其中一项创新是所谓的“截断技巧”。这是指用于采样的潜在分布与训练期间使用的 + z + 𝒩 ( 0 , 𝐈 ) + 分布不同。具体来说,采样期间使用的分布是“截断正态分布”(重新采样具有大于一定阈值的 z + 值)。截断阈值越小,生成样本的可信度越高,但变异性降低。这个概念在[图 10-18](#truncation)中展示。 - en: '![](Images/gdl2_1018.png)' + id: totrans-141 prefs: [] type: TYPE_IMG + zh: '![](Images/gdl2_1018.png)' - en: 'Figure 10-18\. The truncation trick: from left to right, the threshold is set to 2, 1, 0.5, and 0.04 (source: [Brock et al., 2018](https://arxiv.org/abs/1809.11096))' + id: totrans-142 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图 10-18\. 截断技巧:从左到右,阈值设置为 2、1、0.5 和 0.04(来源:[Brock 等人,2018](https://arxiv.org/abs/1809.11096)) - en: Also, as the name suggests, BigGAN is an improvement over SAGAN in part simply by being *bigger*. BigGAN uses a batch size of 2,048—8 times larger than the batch size of 256 used in SAGAN—and a channel size that is increased by 50% in each @@ -823,24 +1094,36 @@ by the inclusion of a shared embedding, by orthogonal regularization, and by incorporating the latent vector z into each layer of the generator, rather than just the initial layer. 
+ id: totrans-143 prefs: [] type: TYPE_NORMAL + zh: 正如其名称所示,BigGAN 在某种程度上是对 SAGAN 的改进,仅仅是因为它更“大”。BigGAN 使用的批量大小为 2,048,比 SAGAN 中使用的 + 256 的批量大小大 8 倍,并且每一层的通道大小增加了 50%。然而,BigGAN 还表明,通过包含共享嵌入、正交正则化以及将潜在向量 z + 包含到生成器的每一层中,而不仅仅是初始层,可以在结构上改进 SAGAN。 - en: For a full description of the innovations introduced by BigGAN, I recommend reading the original paper and [accompanying presentation material](https://oreil.ly/vPn8T). + id: totrans-144 prefs: [] type: TYPE_NORMAL + zh: 要全面了解 BigGAN 引入的创新,我建议阅读原始论文和[相关演示材料](https://oreil.ly/vPn8T)。 - en: Using BigGAN + id: totrans-145 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 使用 BigGAN - en: A tutorial for generating images using a pre-trained BigGAN is available on [the TensorFlow website](https://oreil.ly/YLbLb). + id: totrans-146 prefs: [] type: TYPE_NORMAL + zh: 在[ TensorFlow 网站](https://oreil.ly/YLbLb)上提供了一个使用预训练的 BigGAN 生成图像的教程。 - en: VQ-GAN + id: totrans-147 prefs: - PREF_H2 type: TYPE_NORMAL + zh: VQ-GAN - en: Another important type of GAN is the Vector Quantized GAN (VQ-GAN), introduced in 2020.^([9](ch10.xhtml#idm45387004838864)) This model architecture builds upon an idea introduced in the 2017 paper “Neural Discrete Representation Learning”^([10](ch10.xhtml#idm45387004834704))—namely, @@ -849,17 +1132,26 @@ high-quality images while avoiding some of the issues often seen with traditional continuous latent space VAEs, such as *posterior collapse* (where the learned latent space becomes uninformative due to an overly powerful decoder). + id: totrans-148 prefs: [] type: TYPE_NORMAL + zh: 另一种重要的 GAN 类型是 2020 年推出的 Vector Quantized GAN(VQ-GAN)。这种模型架构建立在 2017 年的论文“神经离散表示学习”中提出的一个想法之上,即 + VAE 学习到的表示可以是离散的,而不是连续的。这种新型模型,即 Vector Quantized VAE(VQ-VAE),被证明可以生成高质量的图像,同时避免了传统连续潜在空间 + VAE 经常出现的一些问题,比如“后验坍缩”(学习到的潜在空间由于过于强大的解码器而变得无信息)。 - en: Tip + id: totrans-149 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 提示 - en: The first version of DALL.E, a text-to-image model released by OpenAI in 2021 (see [Chapter 13](ch13.xhtml#chapter_multimodal)), utilized a VAE with a discrete latent space, similar to VQ-VAE. + id: totrans-150 prefs: [] type: TYPE_NORMAL + zh: OpenAI 在 2021 年发布的文本到图像模型 DALL.E 的第一个版本(参见[第 13 章](ch13.xhtml#chapter_multimodal))使用了具有离散潜在空间的 + VAE,类似于 VQ-VAE。 - en: By a *discrete latent space*, we mean a learned list of vectors (the *codebook*), each associated with a corresponding index. The job of the encoder in a VQ-VAE is to collapse the input image to a smaller grid of vectors that can then be compared @@ -869,15 +1161,23 @@ (the embedding size) that matches the number of channels in the output of the encoder and input to the decoder. For example, e 1 is a vector that can be interpreted as *background*. + id: totrans-151 prefs: [] type: TYPE_NORMAL + zh: 通过“离散潜在空间”,我们指的是一个学习到的向量列表(“码书”),每个向量与相应的索引相关联。VQ-VAE 中编码器的工作是将输入图像折叠到一个较小的向量网格中,然后将其与码书进行比较。然后,将每个网格方格向量(通过欧氏距离)最接近的码书向量传递给解码器进行解码,如[图 + 10-19](#vqvae)所示。码书是一个长度为 d(嵌入大小)的学习向量列表,与编码器输出和解码器输入中的通道数相匹配。例如,e 1 是一个可以解释为“背景”的向量。 - en: '![](Images/gdl2_1019.png)' + id: totrans-152 prefs: [] type: TYPE_IMG + zh: '![](Images/gdl2_1019.png)' - en: Figure 10-19\. A diagram of a VQ-VAE + id: totrans-153 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图 10-19\. VQ-VAE 的示意图 - en: The codebook can be thought of as a set of learned discrete concepts that are shared by the encoder and decoder in order to describe the contents of a given image. The VQ-VAE must find a way to make this set of discrete concepts as informative @@ -888,6 +1188,7 @@ as possible to vectors in the codebook. 
These terms replace the KL divergence
    term between the encoded distribution and the standard Gaussian prior in a typical
    VAE.
+ id: totrans-154
  prefs: []
  type: TYPE_NORMAL
- en: However, this architecture poses a question—how do we sample novel code grids
@@ -900,26 +1201,32 @@
    to predict the next code vector in the grid, given previous code vectors. In
    other words, the prior is learned by the model, rather than static as in the
    case of the vanilla VAE.
+ id: totrans-155
  prefs: []
  type: TYPE_NORMAL
- en: Training Your Own VQ-VAE
+ id: totrans-156
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
- en: There is an excellent tutorial by Sayak Paul on training your own VQ-VAE using
    Keras available on the [Keras website](https://oreil.ly/dmcb4).
+ id: totrans-157
  prefs: []
  type: TYPE_NORMAL
- en: The VQ-GAN paper details several key changes to the VQ-VAE architecture, as
    shown in [Figure 10-20](#vqgan).
+ id: totrans-158
  prefs: []
  type: TYPE_NORMAL
- en: '![](Images/gdl2_1020.png)'
+ id: totrans-159
  prefs: []
  type: TYPE_IMG
- en: 'Figure 10-20\. A diagram of a VQ-GAN: the GAN discriminator helps to encourage
    the VAE to generate less blurry images through an additional adversarial loss
    term'
+ id: totrans-160
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
@@ -931,6 +1238,7 @@
    GAN discriminator is an additional component rather than a replacement of the
    VAE. The idea of combining a VAE with a GAN discriminator (VAE-GAN) was first
    introduced by Larsen et al. in their 2015 paper.^([11](ch10.xhtml#idm45387004808112))
+ id: totrans-161
  prefs: []
  type: TYPE_NORMAL
- en: Secondly, the GAN discriminator predicts if small patches of the images are
@@ -948,6 +1256,7 @@
    that VAEs produce images that are stylistically more blurry than real images,
    so the PatchGAN discriminator can encourage the VAE decoder to generate sharper
    images than it would naturally produce.
+ id: totrans-162
  prefs: []
  type: TYPE_NORMAL
- en: Thirdly, rather than use a single MSE reconstruction loss that compares the
@@ -957,6 +1266,7 @@
    idea is from the 2016 paper by Hou et al.,^([14](ch10.xhtml#idm45387004793216))
    where the authors show that this change to the loss function results in more
    realistic image generations.
+ id: totrans-163
  prefs: []
  type: TYPE_NORMAL
- en: Lastly, instead of PixelCNN, a Transformer is used as the autoregressive part
@@ -966,9 +1276,11 @@
    use tokens that fall within a sliding window around the token to be predicted.
    This ensures that the model scales to larger images, which require a larger latent
    grid size and therefore more tokens to be generated by the Transformer.
+ id: totrans-164
  prefs: []
  type: TYPE_NORMAL
- en: ViT VQ-GAN
+ id: totrans-165
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
@@ -976,6 +1288,7 @@
    entitled “Vector-Quantized Image Modeling with Improved VQGAN.”^([15](ch10.xhtml#idm45387004783968))
    Here, the authors show how the convolutional encoder and decoder of the VQ-GAN
    can be replaced with Transformers as shown in [Figure 10-21](#vit_vqgan).
+ id: totrans-166
  prefs: []
  type: TYPE_NORMAL
- en: For the encoder, the authors use a *Vision Transformer* (ViT).^([16](ch10.xhtml#idm45387004780000))
@@ -983,6 +1296,7 @@
    designed for natural language processing, to image data. Instead of using convolutional
    layers to extract features from an image, a ViT divides the image into a sequence
    of patches, which are tokenized and then fed as input to an encoder Transformer. 
+ id: totrans-167
  prefs: []
  type: TYPE_NORMAL
- en: Specifically, in the ViT VQ-GAN, the nonoverlapping input patches (each of size
@@ -993,14 +1307,17 @@
    model, with the overall output being a sequence of patches that can be stitched
    back together to form the original image. The overall encoder-decoder model is
    trained end-to-end as an autoencoder.
+ id: totrans-168
  prefs: []
  type: TYPE_NORMAL
- en: '![](Images/gdl2_1021.png)'
+ id: totrans-169
  prefs: []
  type: TYPE_IMG
- en: 'Figure 10-21\. A diagram of a ViT VQ-GAN: the GAN discriminator helps to encourage
    the VAE to generate less blurry images through an additional adversarial loss
    term (source: [Yu and Koh, 2022](https://ai.googleblog.com/2022/05/vector-quantized-image-modeling-with.html))^([17](ch10.xhtml#idm45387004774560))'
+ id: totrans-170
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
@@ -1009,23 +1326,28 @@
    in total, there are three Transformers in a ViT VQ-GAN, in addition to the GAN
    discriminator and learned codebook. Examples of images generated by the ViT VQ-GAN
    from the paper are shown in [Figure 10-22](#vit_vqgan_ex).
+ id: totrans-171
  prefs: []
  type: TYPE_NORMAL
- en: '![](Images/gdl2_1022.png)'
+ id: totrans-172
  prefs: []
  type: TYPE_IMG
- en: 'Figure 10-22\. Example images generated by a ViT VQ-GAN trained on ImageNet
    (source: [Yu et al., 2021](https://arxiv.org/pdf/2110.04627.pdf))'
+ id: totrans-173
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
- en: Summary
+ id: totrans-174
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
- en: In this chapter, we have taken a tour of some of the most important and influential
    GAN papers since 2017\. In particular, we have explored ProGAN, StyleGAN, StyleGAN2,
    SAGAN, BigGAN, VQ-GAN, and ViT VQ-GAN.
+ id: totrans-175
  prefs: []
  type: TYPE_NORMAL
- en: We started by exploring the concept of progressive training that was pioneered
@@ -1037,6 +1359,7 @@
    alongside additional enhancements such as path regularization. The paper also
    showed how the desirable property of gradual resolution refinement could be retained
    without having to train the network progressively.
+ id: totrans-176
  prefs: []
  type: TYPE_NORMAL
- en: We also saw how the concept of attention could be built into a GAN, with the
@@ -1046,6 +1369,7 @@
    spatial dimensions of the image. BigGAN was an extension of this idea that made
    several key changes and trained a larger network to improve the image quality
    further.
+ id: totrans-177
  prefs: []
  type: TYPE_NORMAL
- en: In the VQ-GAN paper, the authors show how several different types of generative
@@ -1056,78 +1380,96 @@
    used to construct a novel sequence of code tokens that can be decoded by the VAE
    decoder to produce novel images. The ViT VQ-GAN paper extends this idea even further,
    by replacing the convolutional encoder and decoder of VQ-GAN with Transformers.
+ id: totrans-178
  prefs: []
  type: TYPE_NORMAL
- en: '^([1](ch10.xhtml#idm45387005226448-marker)) Huiwen Chang et al., “Muse: Text-to-Image
    Generation via Masked Generative Transformers,” January 2, 2023, [*https://arxiv.org/abs/2301.00704*](https://arxiv.org/abs/2301.00704).'
+ id: totrans-179
  prefs: []
  type: TYPE_NORMAL
- en: ^([2](ch10.xhtml#idm45387005216528-marker)) Tero Karras et al., “Progressive
    Growing of GANs for Improved Quality, Stability, and Variation,” October 27, 2017,
    [*https://arxiv.org/abs/1710.10196*](https://arxiv.org/abs/1710.10196). 
+ id: totrans-180 prefs: [] type: TYPE_NORMAL - en: ^([3](ch10.xhtml#idm45387005140128-marker)) Tero Karras et al., “A Style-Based Generator Architecture for Generative Adversarial Networks,” December 12, 2018, [*https://arxiv.org/abs/1812.04948*](https://arxiv.org/abs/1812.04948). + id: totrans-181 prefs: [] type: TYPE_NORMAL - en: ^([4](ch10.xhtml#idm45387005090240-marker)) Xun Huang and Serge Belongie, “Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization,” March 20, 2017, [*https://arxiv.org/abs/1703.06868*](https://arxiv.org/abs/1703.06868). + id: totrans-182 prefs: [] type: TYPE_NORMAL - en: ^([5](ch10.xhtml#idm45387005019232-marker)) Tero Karras et al., “Analyzing and Improving the Image Quality of StyleGAN,” December 3, 2019, [*https://arxiv.org/abs/1912.04958*](https://arxiv.org/abs/1912.04958). + id: totrans-183 prefs: [] type: TYPE_NORMAL - en: '^([6](ch10.xhtml#idm45387004898624-marker)) Axel Sauer et al., “StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets,” February 1, 2022, [*https://arxiv.org/abs/2202.00273v2*](https://arxiv.org/abs/2202.00273v2).' + id: totrans-184 prefs: [] type: TYPE_NORMAL - en: ^([7](ch10.xhtml#idm45387004886752-marker)) Han Zhang et al., “Self-Attention Generative Adversarial Networks,” May 21, 2018, [*https://arxiv.org/abs/1805.08318*](https://arxiv.org/abs/1805.08318). + id: totrans-185 prefs: [] type: TYPE_NORMAL - en: ^([8](ch10.xhtml#idm45387004870736-marker)) Andrew Brock et al., “Large Scale GAN Training for High Fidelity Natural Image Synthesis,” September 28, 2018, [*https://arxiv.org/abs/1809.11096*](https://arxiv.org/abs/1809.11096). + id: totrans-186 prefs: [] type: TYPE_NORMAL - en: ^([9](ch10.xhtml#idm45387004838864-marker)) Patrick Esser et al., “Taming Transformers for High-Resolution Image Synthesis,” December 17, 2020, [*https://arxiv.org/abs/2012.09841*](https://arxiv.org/abs/2012.09841). + id: totrans-187 prefs: [] type: TYPE_NORMAL - en: ^([10](ch10.xhtml#idm45387004834704-marker)) Aaron van den Oord et al., “Neural Discrete Representation Learning,” November 2, 2017, [*https://arxiv.org/abs/1711.00937v2*](https://arxiv.org/abs/1711.00937v2). + id: totrans-188 prefs: [] type: TYPE_NORMAL - en: ^([11](ch10.xhtml#idm45387004808112-marker)) Anders Boesen Lindbo Larsen et al., “Autoencoding Beyond Pixels Using a Learned Similarity Metric,” December 31, 2015, [*https://arxiv.org/abs/1512.09300*](https://arxiv.org/abs/1512.09300). + id: totrans-189 prefs: [] type: TYPE_NORMAL - en: ^([12](ch10.xhtml#idm45387004801680-marker)) Phillip Isola et al., “Image-to-Image Translation with Conditional Adversarial Networks,” November 21, 2016, [*https://arxiv.org/abs/1611.07004v3*](https://arxiv.org/abs/1611.07004v3). + id: totrans-190 prefs: [] type: TYPE_NORMAL - en: ^([13](ch10.xhtml#idm45387004798080-marker)) Jun-Yan Zhu et al., “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks,” March 30, 2017, [*https://arxiv.org/abs/1703.10593*](https://arxiv.org/abs/1703.10593). + id: totrans-191 prefs: [] type: TYPE_NORMAL - en: ^([14](ch10.xhtml#idm45387004793216-marker)) Xianxu Hou et al., “Deep Feature Consistent Variational Autoencoder,” October 2, 2016, [*https://arxiv.org/abs/1610.00291*](https://arxiv.org/abs/1610.00291). + id: totrans-192 prefs: [] type: TYPE_NORMAL - en: ^([15](ch10.xhtml#idm45387004783968-marker)) Jiahui Yu et al., “Vector-Quantized Image Modeling with Improved VQGAN,” October 9, 2021, [*https://arxiv.org/abs/2110.04627*](https://arxiv.org/abs/2110.04627). 
+ id: totrans-193 prefs: [] type: TYPE_NORMAL - en: '^([16](ch10.xhtml#idm45387004780000-marker)) Alexey Dosovitskiy et al., “An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale,” October 22, 2020, [*https://arxiv.org/abs/2010.11929v2*](https://arxiv.org/abs/2010.11929v2).' + id: totrans-194 prefs: [] type: TYPE_NORMAL - en: ^([17](ch10.xhtml#idm45387004774560-marker)) Jiahui Yu and Jing Yu Koh, “Vector-Quantized Image Modeling with Improved VQGAN,” May 18, 2022, [*https://ai.googleblog.com/2022/05/vector-quantized-image-modeling-with.html*](https://ai.googleblog.com/2022/05/vector-quantized-image-modeling-with.html). + id: totrans-195 prefs: [] type: TYPE_NORMAL