diff --git a/totrans/gen-dl_11.yaml b/totrans/gen-dl_11.yaml
index d125ed2..fd0b17a 100644
--- a/totrans/gen-dl_11.yaml
+++ b/totrans/gen-dl_11.yaml
@@ -1,7 +1,9 @@
- en: Chapter 8\. Diffusion Models
+ id: totrans-0
prefs:
- PREF_H1
type: TYPE_NORMAL
+ zh: 第8章。扩散模型
- en: Alongside GANs, diffusion models are one of the most influential and impactful
generative modeling techniques for image generation to have been introduced over
the last decade. Across many benchmarks, diffusion models now outperform previously
@@ -10,16 +12,21 @@
2 and Google’s ImageGen for text-to-image generation). Recently, there has been
  an explosion of diffusion models being applied across a wide range of tasks, reminiscent
  of the GAN proliferation that took place between 2017–2020.
+ id: totrans-1
prefs: []
type: TYPE_NORMAL
+ zh: 与GAN并列,扩散模型是过去十年中提出的最具影响力、最具冲击力的图像生成建模技术之一。在许多基准测试中,扩散模型如今已超越此前最先进的GAN,并迅速成为生成建模从业者的首选,特别是在视觉领域(例如,用于文本到图像生成的OpenAI的DALL.E
+ 2和Google的ImageGen)。最近,扩散模型在各种任务中的应用呈爆炸式增长,类似于2017年至2020年间GAN的普及。
- en: 'Many of the core ideas that underpin diffusion models share similarities with
earlier types of generative models that we have already explored in this book
(e.g., denoising autoencoders, energy-based models). Indeed, the name *diffusion*
takes inspiration from the well-studied property of thermodynamic diffusion: an
important link was made between this purely physical field and deep learning in
2015.^([1](ch08.xhtml#idm45387010500320))'
+ id: totrans-2
prefs: []
type: TYPE_NORMAL
+ zh: 许多支撑扩散模型的核心思想与本书中已经探索过的早期类型的生成模型(例如,去噪自动编码器,基于能量的模型)有相似之处。事实上,名称*扩散*灵感来自热力学扩散的深入研究:在2015年,这一纯物理领域与深度学习之间建立了重要联系。^([1](ch08.xhtml#idm45387010500320))
- en: Important progress was also being made in the field of score-based generative
models,^([2](ch08.xhtml#idm45387010496240))^,^([3](ch08.xhtml#idm45387010494000))
a branch of energy-based modeling that directly estimates the gradient of the
@@ -28,141 +35,211 @@
Stefano Ermon used multiple scales of noise perturbations applied to the raw data
to ensure the model—a *noise conditional score network* (NCSN)—performs well on
regions of low data density.
+ id: totrans-3
prefs: []
type: TYPE_NORMAL
+ zh: 在基于分数的生成模型领域也取得了重要进展,^([2](ch08.xhtml#idm45387010496240))^,^([3](ch08.xhtml#idm45387010494000))这是基于能量的模型的一个分支,它直接估计对数分布的梯度(也称为分数函数)来训练模型,作为使用对比散度的替代方法。特别是,杨松(Yang Song)和斯特凡诺·厄尔蒙(Stefano Ermon)对原始数据应用多个尺度的噪声扰动,以确保模型——一个*噪声条件分数网络*(NCSN)——在低数据密度区域表现良好。
- en: The breakthrough diffusion model paper came in the summer of 2020.^([4](ch08.xhtml#idm45387010490880))
Standing on the shoulders of earlier works, the paper uncovers a deep connection
between diffusion models and score-based generative models, and the authors use
this fact to train a diffusion model that can rival GANs across several datasets,
called the *Denoising Diffusion Probabilistic Model* (DDPM).
+ id: totrans-4
prefs: []
type: TYPE_NORMAL
+ zh: 突破性的扩散模型论文于2020年夏天发表。^([4](ch08.xhtml#idm45387010490880))在前人的基础上,该论文揭示了扩散模型和基于分数的生成模型之间的深刻联系,作者利用这一事实训练了一个可以在几个数据集上与GANs匹敌的扩散模型,称为*去噪扩散概率模型*(DDPM)。
- en: This chapter will walk through the theoretical requirements for understanding
how a denoising diffusion model works. You will then learn how to build your own
denoising diffusion model using Keras.
+ id: totrans-5
prefs: []
type: TYPE_NORMAL
+ zh: 本章将介绍理解去噪扩散模型工作原理的理论要求。然后,您将学习如何使用Keras构建自己的去噪扩散模型。
- en: Introduction
+ id: totrans-6
prefs:
- PREF_H1
type: TYPE_NORMAL
+ zh: 介绍
- en: To help explain the key ideas that underpin diffusion models, let’s begin with
a short story!
+ id: totrans-7
prefs: []
type: TYPE_NORMAL
+ zh: 为了帮助解释支撑扩散模型的关键思想,让我们从一个简短的故事开始!
- en: The DiffuseTV story describes the general idea behind a diffusion model. Now
let’s dive into the technicalities of how we build such a model using Keras.
+ id: totrans-8
prefs: []
type: TYPE_NORMAL
+ zh: DiffuseTV故事描述了扩散模型背后的一般思想。现在让我们深入探讨如何使用Keras构建这样一个模型的技术细节。
- en: Denoising Diffusion Models (DDM)
+ id: totrans-9
prefs:
- PREF_H1
type: TYPE_NORMAL
+ zh: 去噪扩散模型(DDM)
- en: The core idea behind a denoising diffusion model is simple—we train a deep learning
model to denoise an image over a series of very small steps. If we start from
pure random noise, in theory we should be able to keep applying the model until
we obtain an image that looks as if it were drawn from the training set. What’s
amazing is that this simple concept works so well in practice!
+ id: totrans-10
prefs: []
type: TYPE_NORMAL
+ zh: 去噪扩散模型背后的核心思想很简单——我们训练一个深度学习模型,在一系列非常小的步骤中对图像去噪。如果我们从纯随机噪声开始,理论上我们应该能够不断应用该模型,直到获得一幅看起来仿佛取自训练集的图像。令人惊奇的是,这个简单的概念在实践中效果如此出色!
- en: Let’s first get set up with a dataset and then walk through the forward (noising)
and backward (denoising) diffusion processes.
+ id: totrans-11
prefs: []
type: TYPE_NORMAL
+ zh: 让我们首先准备一个数据集,然后逐步介绍前向(加噪)和后向(去噪)扩散过程。
- en: Running the Code for This Example
+ id: totrans-12
prefs:
- PREF_H1
type: TYPE_NORMAL
+ zh: 运行此示例的代码
- en: The code for this example can be found in the Jupyter notebook located at *notebooks/08_diffusion/01_ddm/ddm.ipynb*
in the book repository.
+ id: totrans-13
prefs: []
type: TYPE_NORMAL
+ zh: 此示例的代码可以在书籍存储库中位于*notebooks/08_diffusion/01_ddm/ddm.ipynb*的Jupyter笔记本中找到。
- en: The code is adapted from the excellent [tutorial on denoising diffusion implicit
models](https://oreil.ly/srPCe) created by András Béres available on the Keras
website.
+ id: totrans-14
prefs: []
type: TYPE_NORMAL
+ zh: 该代码改编自András Béres在Keras网站上创建的优秀[去噪扩散隐式模型教程](https://oreil.ly/srPCe)。
- en: The Flowers Dataset
+ id: totrans-15
prefs:
- PREF_H2
type: TYPE_NORMAL
+ zh: 花卉数据集
- en: We’ll be using the [Oxford 102 Flower dataset](https://oreil.ly/HfrKV) that
is available through Kaggle. This is a set of over 8,000 color images of a variety
of flowers.
+ id: totrans-16
prefs: []
type: TYPE_NORMAL
+ zh: 我们将使用通过Kaggle提供的[牛津102花卉数据集](https://oreil.ly/HfrKV)。这是一组包含各种花卉的8000多张彩色图像。
- en: You can download the dataset by running the Kaggle dataset downloader script
in the book repository, as shown in [Example 8-1](#downloading-flower-dataset).
This will save the flower images to the */data* folder.
+ id: totrans-17
prefs: []
type: TYPE_NORMAL
+ zh: 您可以通过在书籍存储库中运行Kaggle数据集下载脚本来下载数据集,如[示例8-1](#downloading-flower-dataset)所示。这将把花卉图像保存到*/data*文件夹中。
- en: Example 8-1\. Downloading the Oxford 102 Flower dataset
+ id: totrans-18
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例8-1。下载牛津102花卉数据集
- en: '[PRE0]'
+ id: totrans-19
prefs: []
type: TYPE_PRE
+ zh: '[PRE0]'
- en: '`As usual, we’ll load the images in using the Keras `image_dataset_from_directory`
function, resize the images to 64 × 64 pixels, and scale the pixel values to the
range [0, 1]. We’ll also repeat the dataset five times to increase the epoch length
and batch the data into groups of 64 images, as shown in [Example 8-2](#flower-preprocessing-ex).'
+ id: totrans-20
prefs: []
type: TYPE_NORMAL
+ zh: 通常情况下,我们将使用Keras的`image_dataset_from_directory`函数加载图像,将图像调整为64×64像素,并将像素值缩放到[0, 1]范围。我们还会将数据集重复五次以增加每个epoch的长度,并将数据按每组64张图像进行分批,如[示例8-2](#flower-preprocessing-ex)所示。
- en: Example 8-2\. Loading the Oxford 102 Flower dataset
+ id: totrans-21
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例8-2。加载牛津102花卉数据集
- en: '[PRE1]'
+ id: totrans-22
prefs: []
type: TYPE_PRE
+ zh: '[PRE1]'
- en: '[![1](Images/1.png)](#co_diffusion_models_CO1-1)'
+ id: totrans-23
prefs: []
type: TYPE_NORMAL
+ zh: '[![1](Images/1.png)](#co_diffusion_models_CO1-1)'
- en: Load dataset (when required during training) using the Keras `image_dataset_from_directory`
function.
+ id: totrans-24
prefs: []
type: TYPE_NORMAL
+ zh: 使用Keras的`image_dataset_from_directory`函数加载数据集(在训练期间需要时)。
- en: '[![2](Images/2.png)](#co_diffusion_models_CO1-2)'
+ id: totrans-25
prefs: []
type: TYPE_NORMAL
+ zh: '[![2](Images/2.png)](#co_diffusion_models_CO1-2)'
- en: Scale the pixel values to the range [0, 1].
+ id: totrans-26
prefs: []
type: TYPE_NORMAL
+ zh: 将像素值缩放到范围[0, 1]。
- en: '[![3](Images/3.png)](#co_diffusion_models_CO1-3)'
+ id: totrans-27
prefs: []
type: TYPE_NORMAL
+ zh: '[![3](Images/3.png)](#co_diffusion_models_CO1-3)'
- en: Repeat the dataset five times.
+ id: totrans-28
prefs: []
type: TYPE_NORMAL
+ zh: 将数据集重复五次。
- en: '[![4](Images/4.png)](#co_diffusion_models_CO1-4)'
+ id: totrans-29
prefs: []
type: TYPE_NORMAL
+ zh: '[![4](Images/4.png)](#co_diffusion_models_CO1-4)'
- en: Batch the dataset into groups of 64 images.
+ id: totrans-30
prefs: []
type: TYPE_NORMAL
+ zh: 将数据集分成64张图像一组。
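The callouts above describe the full input pipeline. A minimal, hedged sketch of how such a pipeline could look in Keras follows (the dataset path and constant names are assumptions; the book's actual code is the [PRE1] block of Example 8-2):

```python
import tensorflow as tf

IMAGE_SIZE = 64
BATCH_SIZE = 64
DATASET_REPETITIONS = 5

# Load the flower images without labels, resized to 64 x 64 pixels.
train_data = tf.keras.utils.image_dataset_from_directory(
    "/data/flower-dataset",  # assumed location of the downloaded images
    labels=None,
    image_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=None,
    shuffle=True,
    seed=42,
)

# Scale pixel values to [0, 1], repeat to lengthen each epoch, then batch.
train = (
    train_data.map(lambda img: tf.cast(img, tf.float32) / 255.0)
    .repeat(DATASET_REPETITIONS)
    .batch(BATCH_SIZE, drop_remainder=True)
)
```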
- en: Example images from the dataset are shown in [Figure 8-2](Images/#flower_example_images).
+ id: totrans-31
prefs: []
type: TYPE_NORMAL
+ zh: 数据集中的示例图像显示在[图8-2](Images/#flower_example_images)中。
- en: '![](Images/gdl2_0802.png)'
+ id: totrans-32
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0802.png)'
- en: Figure 8-2\. Example images from the Oxford 102 Flower dataset
+ id: totrans-33
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图8-2。牛津102花卉数据集中的示例图像
- en: Now that we have our dataset we can explore how we should add noise to the images,
using a forward diffusion process.` `## The Forward Diffusion Process
+ id: totrans-34
prefs: []
type: TYPE_NORMAL
+ zh: 现在我们有了数据集,我们可以探讨如何向图像添加噪声,使用前向扩散过程。` `##前向扩散过程
- en: Suppose we have an image 𝐱_0 that we want to corrupt gradually over a large number of steps (say, T = 1,000), so that eventually it is indistinguishable from standard Gaussian noise (i.e., 𝐱_T should have zero mean and unit variance). How should we go about doing this?
+ id: totrans-35
prefs: []
type: TYPE_NORMAL
+ zh: 假设我们有一幅图像 𝐱_0,我们希望在大量步骤(比如 T = 1,000)中逐渐对其加噪破坏,以至于最终它与标准高斯噪声无法区分(即 𝐱_T 应具有零均值和单位方差)。我们应该如何做到这一点呢?
- en: We can define a function q that adds a small amount of Gaussian noise with variance β_t to an image 𝐱_{t-1}, to generate a new image 𝐱_t
+ id: totrans-48
prefs: []
type: TYPE_NORMAL
+ zh: 𝐱_t = \sqrt{\alpha_t}\,𝐱_{t-1} + \sqrt{1-\alpha_t}\,\epsilon_{t-1} = \sqrt{\alpha_t \alpha_{t-1}}\,𝐱_{t-2} + \sqrt{1-\alpha_t \alpha_{t-1}}\,\epsilon = \cdots = \sqrt{\bar{\alpha}_t}\,𝐱_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon
- en: Note that the second line uses the fact that we can add two Gaussians to obtain
a new Gaussian. We therefore have a way to jump from the original image 𝐱 0 to any step of the
@@ -323,12 +510,25 @@
t">1 - α ¯
t is the variance due to the noise ( ϵ
).
+ id: totrans-49
prefs: []
type: TYPE_NORMAL
+ zh: 请注意,第二行利用了两个高斯分布相加可得到一个新高斯分布这一事实。因此,我们有办法从原始图像 𝐱_0 直接跳到前向扩散过程的任意一步 𝐱_t。此外,我们可以用 \bar{\alpha}_t 值来定义扩散进度表,而不是原始的 β_t 值,并将 \bar{\alpha}_t 解释为由信号(原始图像 𝐱_0)引起的方差,1-\bar{\alpha}_t 解释为由噪声(\epsilon)引起的方差。
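To make the reparameterization concrete, here is a small NumPy sketch (the schedule values are illustrative assumptions) that jumps straight from 𝐱_0 to a noised 𝐱_t in a single step:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # an illustrative beta schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # alpha_bar_t = alpha_1 * ... * alpha_t

def noise_image(x0, t):
    """Jump straight to step t: x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

x0 = rng.random((64, 64, 3))         # a stand-in "image" with values in [0, 1]
x_t = noise_image(x0, t=500)         # heavily noised version of x0, in one step
```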
- en: 'The forward diffusion process q can therefore
also be written as follows:'
+ id: totrans-50
prefs: []
type: TYPE_NORMAL
+ zh: 前向扩散过程q也可以写成如下形式:
- en: q ( 𝐱_t | 𝐱_0 ) = \mathcal{N} ( 𝐱_t ; \sqrt{\bar{\alpha}_t}\,𝐱_0 , ( 1 - \bar{\alpha}_t ) \mathbf{I} )
+ id: totrans-51
prefs: []
type: TYPE_NORMAL
+ zh: q ( 𝐱_t | 𝐱_0 ) = \mathcal{N} ( 𝐱_t ; \sqrt{\bar{\alpha}_t}\,𝐱_0 , ( 1 - \bar{\alpha}_t ) \mathbf{I} )
- en: Diffusion Schedules
+ id: totrans-52
prefs:
- PREF_H2
type: TYPE_NORMAL
+ zh: 扩散进度表
- en: Notice that we are also free to choose a different β_t at each timestep—they don't all have to be the same. How the β_t (or \bar{\alpha}_t) values change with t is called the *diffusion* *schedule*.
+ id: totrans-53
prefs: []
type: TYPE_NORMAL
+ zh: 请注意,我们也可以在每个时间步选择不同的 β_t——它们不必全部相同。β_t(或 \bar{\alpha}_t)值随 t 变化的方式称为*扩散进度表*。
- en: In the original paper (Ho et al., 2020), the authors chose a *linear diffusion
schedule* for β t
—that is, β t
@@ -363,54 +582,91 @@
T = 0.02\. This ensures that in the early
stages of the noising process we take smaller noising steps than in the later
stages, when the image is already very noisy.
+ id: totrans-54
prefs: []
type: TYPE_NORMAL
+ zh: 在原始论文中(Ho等人,2020年),作者为 β_t 选择了*线性扩散进度表*——即 β_t 随 t 线性增加,从 β_1 = 0.0001 到 β_T = 0.02。这确保了在加噪过程的早期阶段,我们采取的加噪步长比后期(此时图像已经非常嘈杂)更小。
- en: We can code up a linear diffusion schedule as shown in [Example 8-3](#linear_diffusion_schedule).
+ id: totrans-55
prefs: []
type: TYPE_NORMAL
+ zh: 我们可以编写一个线性扩散进度表,如[示例8-3](#linear_diffusion_schedule)所示。
- en: Example 8-3\. The linear diffusion schedule
+ id: totrans-56
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例8-3。线性扩散进度表
- en: '[PRE2]'
+ id: totrans-57
prefs: []
type: TYPE_PRE
+ zh: '[PRE2]'
- en: '[![1](Images/1.png)](#co_diffusion_models_CO2-1)'
+ id: totrans-58
prefs: []
type: TYPE_NORMAL
+ zh: '[![1](Images/1.png)](#co_diffusion_models_CO2-1)'
- en: The diffusion times are equally spaced steps between 0 and 1.
+ id: totrans-59
prefs: []
type: TYPE_NORMAL
+ zh: 扩散时间是0到1之间等间隔的步骤。
- en: '[![2](Images/2.png)](#co_diffusion_models_CO2-2)'
+ id: totrans-60
prefs: []
type: TYPE_NORMAL
+ zh: '[![2](Images/2.png)](#co_diffusion_models_CO2-2)'
- en: The linear diffusion schedule is applied to the diffusion times to produce the
noise and signal rates.
+ id: totrans-61
prefs: []
type: TYPE_NORMAL
+ zh: 线性扩散进度表应用于扩散时间以产生噪声和信号速率。
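A hedged sketch of how the linear schedule of Example 8-3 could be written, mapping equally spaced diffusion times in [0, 1] to noise and signal rates (the helper name and rate bounds follow the description above; treat it as illustrative rather than the book's exact code):

```python
import tensorflow as tf

def linear_diffusion_schedule(diffusion_times):
    min_rate = 0.0001   # beta_1
    max_rate = 0.02     # beta_T
    betas = min_rate + diffusion_times * (max_rate - min_rate)
    alphas = 1.0 - betas
    alpha_bars = tf.math.cumprod(alphas)
    signal_rates = tf.sqrt(alpha_bars)       # sqrt(alpha_bar_t), multiplies x_0
    noise_rates = tf.sqrt(1.0 - alpha_bars)  # sqrt(1 - alpha_bar_t), multiplies eps
    return noise_rates, signal_rates

T = 1000
diffusion_times = tf.linspace(0.0, 1.0, T)   # equally spaced steps between 0 and 1
noise_rates, signal_rates = linear_diffusion_schedule(diffusion_times)
```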
- en: 'In a later paper it was found that a *cosine diffusion schedule* outperformed
the linear schedule from the original paper.^([5](ch08.xhtml#idm45387010764208))
A cosine schedule defines the following values of α ¯ t
:'
+ id: totrans-62
prefs: []
type: TYPE_NORMAL
+ zh: 在后续的一篇论文中发现,*余弦扩散进度表*优于原始论文中的线性进度表。^([5](ch08.xhtml#idm45387010764208)) 余弦进度表将 \bar{\alpha}_t 定义为以下值:
- en: \bar{\alpha}_t = \cos^2 \left( \frac{t}{T} \cdot \frac{\pi}{2} \right)
+ id: totrans-63
prefs: []
type: TYPE_NORMAL
+ zh: \bar{\alpha}_t = \cos^2 \left( \frac{t}{T} \cdot \frac{\pi}{2} \right)
- en: 'The updated equation is therefore as follows (using the trigonometric identity \cos^2(x) + \sin^2(x) = 1):'
+ id: totrans-64
prefs: []
type: TYPE_NORMAL
+ zh: 因此,更新后的方程如下(使用三角恒等式 \cos^2(x) + \sin^2(x) = 1):
- en: 𝐱_t = \cos \left( \frac{t}{T} \cdot \frac{\pi}{2} \right) 𝐱_0 + \sin \left( \frac{t}{T} \cdot \frac{\pi}{2} \right) \epsilon
+ id: totrans-65
prefs: []
type: TYPE_NORMAL
+ zh: 𝐱_t = \cos \left( \frac{t}{T} \cdot \frac{\pi}{2} \right) 𝐱_0 + \sin \left( \frac{t}{T} \cdot \frac{\pi}{2} \right) \epsilon
- en: This equation is a simplified version of the actual cosine diffusion schedule
used in the paper. The authors also add an offset term and scaling to prevent
the noising steps from being too small at the beginning of the diffusion process.
We can code up the cosine and offset cosine diffusion schedules as shown in [Example 8-4](#cosine_diffusion_schedule).
+ id: totrans-66
prefs: []
type: TYPE_NORMAL
+ zh: 这个方程是论文中使用的实际余弦扩散时间表的简化版本。作者还添加了一个偏移项和缩放,以防止扩散过程开始时噪声步骤太小。我们可以编写余弦和偏移余弦扩散时间表,如[示例8-4](#cosine_diffusion_schedule)所示。
- en: Example 8-4\. The cosine and offset cosine diffusion schedules
+ id: totrans-67
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例8-4\. 余弦和偏移余弦扩散时间表
- en: '[PRE3]'
+ id: totrans-68
prefs: []
type: TYPE_PRE
+ zh: '[PRE3]'
- en: '[![1](Images/1.png)](#co_diffusion_models_CO3-1)'
+ id: totrans-69
prefs: []
type: TYPE_NORMAL
+ zh: '[![1](Images/1.png)](#co_diffusion_models_CO3-1)'
- en: The pure cosine diffusion schedule (without offset or rescaling).
+ id: totrans-70
prefs: []
type: TYPE_NORMAL
+ zh: 纯余弦扩散时间表(不包括偏移或重新缩放)。
- en: '[![2](Images/2.png)](#co_diffusion_models_CO3-2)'
+ id: totrans-71
prefs: []
type: TYPE_NORMAL
+ zh: '[![2](Images/2.png)](#co_diffusion_models_CO3-2)'
- en: The offset cosine diffusion schedule that we will be using, which adjusts the
schedule to ensure the noising steps are not too small at the start of the noising
process.
+ id: totrans-72
prefs: []
type: TYPE_NORMAL
+ zh: 我们将使用的偏移余弦扩散时间表会调整时间表,以确保在扩散过程开始时噪声步骤不会太小。
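A hedged sketch along the lines of Example 8-4; the pure cosine schedule follows directly from the equation above, while the min/max signal rates in the offset version are assumed values chosen so that the first noising steps are not too small:

```python
import math
import tensorflow as tf

def cosine_diffusion_schedule(diffusion_times):
    # Pure cosine schedule: signal = cos(t/T * pi/2), noise = sin(t/T * pi/2).
    signal_rates = tf.cos(diffusion_times * math.pi / 2)
    noise_rates = tf.sin(diffusion_times * math.pi / 2)
    return noise_rates, signal_rates

def offset_cosine_diffusion_schedule(diffusion_times):
    min_signal_rate = 0.02   # assumed offset/rescaling values
    max_signal_rate = 0.95
    start_angle = tf.acos(max_signal_rate)
    end_angle = tf.acos(min_signal_rate)
    diffusion_angles = start_angle + diffusion_times * (end_angle - start_angle)
    signal_rates = tf.cos(diffusion_angles)
    noise_rates = tf.sin(diffusion_angles)
    return noise_rates, signal_rates
```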
- en: We can compute the \bar{\alpha}_t values for each t to show how much signal ( \bar{\alpha}_t ) and noise ( 1 - \bar{\alpha}_t ) is let through at each stage of the process for the linear, cosine, and offset cosine diffusion schedules, as shown in [Figure 8-4](#signal_and_noise_linear).
+ id: totrans-73
prefs: []
type: TYPE_NORMAL
+ zh: 我们可以计算每个 t 的 \bar{\alpha}_t 值,以显示在线性、余弦和偏移余弦扩散时间表的每个阶段中有多少信号( \bar{\alpha}_t )和噪声( 1 - \bar{\alpha}_t )通过,如[图8-4](#signal_and_noise_linear)所示。
- en: '![](Images/gdl2_0804.png)'
+ id: totrans-74
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0804.png)'
- en: Figure 8-4\. The signal and noise at each step of the noising process, for the
linear, cosine, and offset cosine diffusion schedules
+ id: totrans-75
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图8-4\. 在扩散过程的每个步骤中的信号和噪声,对于线性、余弦和偏移余弦扩散时间表
- en: Notice how the noise level ramps up more slowly in the cosine diffusion schedule.
A cosine diffusion schedule adds noise to the image more gradually than a linear
diffusion schedule, which improves training efficiency and generation quality.
This can also be seen in images that have been corrupted by the linear and cosine
schedules ([Figure 8-5](#diff_schedule_examples)).
+ id: totrans-76
prefs: []
type: TYPE_NORMAL
+ zh: 请注意,余弦扩散时间表中的噪声水平上升得更慢。余弦扩散时间表比线性扩散时间表更渐进地向图像添加噪声,从而提高了训练效率和生成质量。这一点也可以从被线性和余弦时间表加噪破坏的图像中看出([图8-5](#diff_schedule_examples))。
- en: '![](Images/gdl2_0805.png)'
+ id: totrans-77
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0805.png)'
- en: 'Figure 8-5\. An image being corrupted by the linear (top) and cosine (bottom)
diffusion schedules, at equally spaced values of t from 0 to T (source: [Ho et
al., 2020](https://arxiv.org/abs/2006.11239))'
+ id: totrans-78
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图8-5\. 一个图像被线性(顶部)和余弦(底部)扩散时间表破坏,从0到T的等间距值(来源:[Ho等人,2020](https://arxiv.org/abs/2006.11239))
- en: The Reverse Diffusion Process
+ id: totrans-79
prefs:
- PREF_H2
type: TYPE_NORMAL
+ zh: 反向扩散过程
- en: Now let’s look at the reverse diffusion process. To recap, we are looking to
build a neural network p
@@ -501,27 +800,47 @@
upper I right-parenthesis">𝒩 ( 0 , 𝐈
) and then apply the reverse diffusion process multiple
times in order to generate a novel image. This is visualized in [Figure 8-6](#reverse_diff).
+ id: totrans-80
prefs: []
type: TYPE_NORMAL
+ zh: 现在让我们看一下反向扩散过程。简而言之,我们要构建一个神经网络 p_\theta ( 𝐱_{t-1} | 𝐱_t ),它可以*撤销*扩散过程,即近似反向分布 q ( 𝐱_{t-1} | 𝐱_t )。如果我们能做到这一点,就可以从 \mathcal{N} ( 0 , \mathbf{I} ) 中采样随机噪声,然后多次应用反向扩散过程以生成新颖的图像。这在[图8-6](#reverse_diff)中可视化。
- en: '![](Images/gdl2_0806.png)'
+ id: totrans-81
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0806.png)'
- en: Figure 8-6\. The reverse diffusion process p_\theta ( 𝐱_{t-1} | 𝐱_t ) tries to *undo* the noise produced by the forward diffusion process
+ id: totrans-82
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图8-6。反向扩散过程 p_\theta ( 𝐱_{t-1} | 𝐱_t ) 试图*撤销*由前向扩散过程产生的噪声
- en: There are many similarities between the reverse diffusion process and the decoder
of a variational autoencoder. In both, we aim to transform random noise into meaningful
output using a neural network. The difference between diffusion models and VAEs
is that in a VAE the forward process (converting images to noise) is part of the
model (i.e., it is learned), whereas in a diffusion model it is unparameterized.
+ id: totrans-83
prefs: []
type: TYPE_NORMAL
+ zh: 反向扩散过程和变分自动编码器的解码器之间存在许多相似之处。 在两者中,我们的目标都是使用神经网络将随机噪声转换为有意义的输出。 扩散模型和VAE之间的区别在于,在VAE中,正向过程(将图像转换为噪声)是模型的一部分(即,它是学习的),而在扩散模型中,它是非参数化的。
- en: Therefore, it makes sense to apply the same loss function as in a variational autoencoder. The original DDPM paper derives the exact form of this loss function and shows that it can be optimized by training a network \epsilon_\theta to predict the noise \epsilon that has been added to a given image 𝐱_0 at timestep t.
+ id: totrans-84
prefs: []
type: TYPE_NORMAL
+ zh: 因此,将与变分自动编码器中相同的损失函数应用是有意义的。 原始的DDPM论文推导出了这个损失函数的确切形式,并表明可以通过训练一个网络ϵ θ来预测已添加到给定图像𝐱 0的噪声ϵ在时间步t。
- en: In other words, we sample an image 𝐱_0 and transform it by t noising steps to get the image 𝐱_t = \sqrt{\bar{\alpha}_t}\,𝐱_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon. We provide this new image and the noising rate \bar{\alpha}_t to the neural network and ask it to predict \epsilon, taking a gradient step against the squared error between the prediction \epsilon_\theta ( 𝐱_t ) and the true \epsilon.
+ id: totrans-85
prefs: []
type: TYPE_NORMAL
+ zh: 换句话说,我们对图像 𝐱_0 进行采样,并通过 t 个加噪步骤将其转换为图像 𝐱_t = \sqrt{\bar{\alpha}_t}\,𝐱_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon。我们将这个新图像和噪声率 \bar{\alpha}_t 提供给神经网络,并要求它预测 \epsilon,针对预测值 \epsilon_\theta ( 𝐱_t ) 与真实 \epsilon 之间的平方误差采取梯度步骤。
- en: 'We’ll take a look at the structure of the neural network in the next section.
    It is worth noting here that the diffusion model actually maintains two copies
    of the network: one that is actively trained using gradient descent and another
@@ -557,109 +895,170 @@
not as susceptible to short-term fluctuations and spikes in the training process,
making it more robust for generation than the actively trained network. We therefore
use the EMA network whenever we want to produce generated output from the network.'
+ id: totrans-86
prefs: []
type: TYPE_NORMAL
+ zh: 我们将在下一节中查看神经网络的结构。 值得注意的是,扩散模型实际上维护了两个网络副本:一个是通过梯度下降主动训练的网络,另一个是权重的指数移动平均(EMA)网络,该网络是在先前的训练步骤中对主动训练网络的权重进行指数移动平均。
+ EMA网络不太容易受到训练过程中的短期波动和峰值的影响,因此在生成方面比主动训练网络更稳健。 因此,每当我们想要从网络生成输出时,我们都会使用EMA网络。
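The EMA update itself is just a running weighted average of the weights taken after each gradient step; a minimal sketch (the 0.999 decay rate is an assumption):

```python
EMA = 0.999  # assumed decay rate

def update_ema_weights(network, ema_network):
    # Blend each EMA weight a small step toward the actively trained weight.
    for weight, ema_weight in zip(network.weights, ema_network.weights):
        ema_weight.assign(EMA * ema_weight + (1 - EMA) * weight)
```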
- en: The training process for the model is shown in [Figure 8-7](#diff_training_process).
+ id: totrans-87
prefs: []
type: TYPE_NORMAL
+ zh: 模型的训练过程如[图8-7](#diff_training_process)所示。
- en: '![](Images/gdl2_0807.png)'
+ id: totrans-88
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0807.png)'
- en: 'Figure 8-7\. The training process for a denoising diffusion model (source:
[Ho et al., 2020](https://arxiv.org/abs/2006.11239))'
+ id: totrans-89
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图8-7。去噪扩散模型的训练过程(来源:[Ho等人,2020](https://arxiv.org/abs/2006.11239))
- en: In Keras, we can code up this training step as illustrated in [Example 8-5](#diffusion_train_step).
+ id: totrans-90
prefs: []
type: TYPE_NORMAL
+ zh: 在Keras中,我们可以将这个训练步骤编码为[示例8-5](#diffusion_train_step)所示。
- en: Example 8-5\. The `train_step` function of the Keras diffusion model
+ id: totrans-91
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例8-5。Keras扩散模型的`train_step`函数
- en: '[PRE4]'
+ id: totrans-92
prefs: []
type: TYPE_PRE
+ zh: '[PRE4]'
- en: '[![1](Images/1.png)](#co_diffusion_models_CO4-1)'
+ id: totrans-93
prefs: []
type: TYPE_NORMAL
+ zh: '[![1](Images/1.png)](#co_diffusion_models_CO4-1)'
- en: We first normalize the batch of images to have zero mean and unit variance.
+ id: totrans-94
prefs: []
type: TYPE_NORMAL
+ zh: 我们首先将图像批次归一化为零均值和单位方差。
- en: '[![2](Images/2.png)](#co_diffusion_models_CO4-2)'
+ id: totrans-95
prefs: []
type: TYPE_NORMAL
+ zh: '[![2](Images/2.png)](#co_diffusion_models_CO4-2)'
- en: Next, we sample noise to match the shape of the input images.
+ id: totrans-96
prefs: []
type: TYPE_NORMAL
+ zh: 接下来,我们对形状与输入图像匹配的噪声进行采样。
- en: '[![3](Images/3.png)](#co_diffusion_models_CO4-3)'
+ id: totrans-97
prefs: []
type: TYPE_NORMAL
+ zh: '[![3](Images/3.png)](#co_diffusion_models_CO4-3)'
- en: We also sample random diffusion times…
+ id: totrans-98
prefs: []
type: TYPE_NORMAL
+ zh: 我们还对随机扩散时间进行采样…
- en: '[![4](Images/4.png)](#co_diffusion_models_CO4-4)'
+ id: totrans-99
prefs: []
type: TYPE_NORMAL
+ zh: '[![4](Images/4.png)](#co_diffusion_models_CO4-4)'
- en: …and use these to generate the noise and signal rates according to the cosine
diffusion schedule.
+ id: totrans-100
prefs: []
type: TYPE_NORMAL
+ zh: …并使用这些根据余弦扩散计划生成噪声和信号速率。
- en: '[![5](Images/5.png)](#co_diffusion_models_CO4-5)'
+ id: totrans-101
prefs: []
type: TYPE_NORMAL
+ zh: '[![5](Images/5.png)](#co_diffusion_models_CO4-5)'
- en: Then we apply the signal and noise weightings to the input images to generate
the noisy images.
+ id: totrans-102
prefs: []
type: TYPE_NORMAL
+ zh: 然后我们将信号和噪声权重应用于输入图像以生成嘈杂的图像。
- en: '[![6](Images/6.png)](#co_diffusion_models_CO4-6)'
+ id: totrans-103
prefs: []
type: TYPE_NORMAL
+ zh: '[![6](Images/6.png)](#co_diffusion_models_CO4-6)'
- en: Next, we denoise the noisy images by asking the network to predict the noise
and then undoing the noising operation, using the provided `noise_rates` and `signal_rates`.
+ id: totrans-104
prefs: []
type: TYPE_NORMAL
+ zh: 接下来,我们通过要求网络预测噪声然后撤消添加噪声的操作,使用提供的`noise_rates`和`signal_rates`来去噪嘈杂的图像。
- en: '[![7](Images/7.png)](#co_diffusion_models_CO4-7)'
+ id: totrans-105
prefs: []
type: TYPE_NORMAL
+ zh: '[![7](Images/7.png)](#co_diffusion_models_CO4-7)'
- en: We can then calculate the loss (mean absolute error) between the predicted noise
and the true noise…
+ id: totrans-106
prefs: []
type: TYPE_NORMAL
+ zh: 然后我们可以计算预测噪声和真实噪声之间的损失(平均绝对误差)…
- en: '[![8](Images/8.png)](#co_diffusion_models_CO4-8)'
+ id: totrans-107
prefs: []
type: TYPE_NORMAL
+ zh: '[![8](Images/8.png)](#co_diffusion_models_CO4-8)'
- en: …and take a gradient step against this loss function.
+ id: totrans-108
prefs: []
type: TYPE_NORMAL
+ zh: …并根据这个损失函数采取梯度步骤。
- en: '[![9](Images/9.png)](#co_diffusion_models_CO4-9)'
+ id: totrans-109
prefs: []
type: TYPE_NORMAL
+ zh: '[![9](Images/9.png)](#co_diffusion_models_CO4-9)'
- en: The EMA network weights are updated to a weighted average of the existing EMA
weights and the trained network weights after the gradient step.
+ id: totrans-110
prefs: []
type: TYPE_NORMAL
+ zh: EMA网络权重更新为现有EMA权重和训练后的网络权重在梯度步骤后的加权平均值。
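Putting the callouts together, here is a hedged sketch of what a `train_step` along the lines of Example 8-5 could look like (the `normalizer` layer, the `cosine_diffusion_schedule` helper, the `BATCH_SIZE` constant, and the attribute names are assumptions; the book's actual implementation is the [PRE4] block above):

```python
import tensorflow as tf

BATCH_SIZE = 64  # assumed to match the data pipeline

class DiffusionModel(tf.keras.Model):
    # Assumes __init__ builds self.normalizer, self.network, and self.ema_network.

    def denoise(self, noisy_images, noise_rates, signal_rates, training):
        # Use the EMA copy for generation, the trained copy during training.
        network = self.network if training else self.ema_network
        pred_noises = network([noisy_images, noise_rates**2], training=training)
        pred_images = (noisy_images - noise_rates * pred_noises) / signal_rates
        return pred_noises, pred_images

    def train_step(self, images):
        images = self.normalizer(images, training=True)      # 1. zero mean, unit variance
        noises = tf.random.normal(shape=tf.shape(images))    # 2. sample noise
        diffusion_times = tf.random.uniform(                  # 3. random diffusion times
            shape=(BATCH_SIZE, 1, 1, 1), minval=0.0, maxval=1.0
        )
        noise_rates, signal_rates = cosine_diffusion_schedule(diffusion_times)  # 4.
        noisy_images = signal_rates * images + noise_rates * noises             # 5.

        with tf.GradientTape() as tape:
            pred_noises, _ = self.denoise(                     # 6. predict the noise
                noisy_images, noise_rates, signal_rates, training=True
            )
            noise_loss = self.loss(noises, pred_noises)        # 7. MAE loss

        grads = tape.gradient(noise_loss, self.network.trainable_weights)
        self.optimizer.apply_gradients(                        # 8. gradient step
            zip(grads, self.network.trainable_weights)
        )
        for w, ema_w in zip(self.network.weights, self.ema_network.weights):
            ema_w.assign(0.999 * ema_w + 0.001 * w)            # 9. EMA update
        return {"noise_loss": noise_loss}
```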
- en: The U-Net Denoising Model
+ id: totrans-111
prefs:
- PREF_H2
type: TYPE_NORMAL
+ zh: U-Net去噪模型
- en: Now that we have seen the kind of neural network that we need to build (one
that predicts the noise added to a given image), we can look at the architecture
that makes this possible.
+ id: totrans-112
prefs: []
type: TYPE_NORMAL
+ zh: 现在我们已经看到了我们需要构建的神经网络的类型(一个预测添加到给定图像的噪声的网络),我们可以看一下使这种可能的架构。
- en: The authors of the DDPM paper used a type of architecture known as a *U-Net*.
A diagram of this network is shown in [Figure 8-8](#unet_diffusion), explicitly
showing the shape of the tensor as it passes through the network.
+ id: totrans-113
prefs: []
type: TYPE_NORMAL
+ zh: DDPM论文的作者使用了一种称为*U-Net*的架构类型。这个网络的图表显示在[图8-8](#unet_diffusion)中,明确显示了张量在通过网络时的形状。
- en: '![](Images/gdl2_0808.png)'
+ id: totrans-114
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0808.png)'
- en: Figure 8-8\. U-Net architecture diagram
+ id: totrans-115
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图8-8\. U-Net架构图
- en: 'In a similar manner to a variational autoencoder, a U-Net consists of two halves:
the downsampling half, where input images are compressed spatially but expanded
channel-wise, and the upsampling half, where representations are expanded spatially
@@ -669,129 +1068,198 @@
the network from input to output, one layer after another. A U-Net is different,
because the skip connections allow information to shortcut parts of the network
and flow through to later layers.'
+ id: totrans-116
prefs: []
type: TYPE_NORMAL
+ zh: 类似于变分自动编码器,U-Net由两部分组成:下采样部分,其中输入图像在空间上被压缩但在通道上被扩展,以及上采样部分,其中表示在空间上被扩展,而通道数量减少。然而,与VAE不同的是,在网络的上采样和下采样部分之间还有*跳跃连接*。VAE是顺序的;数据从输入到输出依次通过网络的每一层。U-Net不同,因为跳跃连接允许信息绕过网络的部分并流向后续层。
- en: A U-Net is particularly useful when we want the output to have the same shape
as the input. In our diffusion model example, we want to predict the noise added
to an image, which has exactly the same shape as the image itself, so a U-Net
is the natural choice for the network architecture.
+ id: totrans-117
prefs: []
type: TYPE_NORMAL
+ zh: 当我们希望输出具有与输入相同的形状时,U-Net特别有用。在我们的扩散模型示例中,我们希望预测添加到图像中的噪声,这个噪声与图像本身的形状完全相同,因此U-Net是网络架构的自然选择。
- en: First let’s take a look at the code that builds this U-Net in Keras, shown in
[Example 8-6](#unet_keras).
+ id: totrans-118
prefs: []
type: TYPE_NORMAL
+ zh: 首先让我们看一下在Keras中构建这个U-Net的代码,显示在[示例8-6](#unet_keras)中。
- en: Example 8-6\. A U-Net model in Keras
+ id: totrans-119
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例8-6\. Keras中的U-Net模型
- en: '[PRE5]'
+ id: totrans-120
prefs: []
type: TYPE_PRE
+ zh: '[PRE5]'
- en: '[![1](Images/1.png)](#co_diffusion_models_CO5-1)'
+ id: totrans-121
prefs: []
type: TYPE_NORMAL
+ zh: '[![1](Images/1.png)](#co_diffusion_models_CO5-1)'
- en: The first input to the U-Net is the image that we wish to denoise.
+ id: totrans-122
prefs: []
type: TYPE_NORMAL
+ zh: U-Net的第一个输入是我们希望去噪的图像。
- en: '[![2](Images/2.png)](#co_diffusion_models_CO5-2)'
+ id: totrans-123
prefs: []
type: TYPE_NORMAL
+ zh: '[![2](Images/2.png)](#co_diffusion_models_CO5-2)'
- en: This image is passed through a `Conv2D` layer to increase the number of channels.
+ id: totrans-124
prefs: []
type: TYPE_NORMAL
+ zh: 这个图像通过一个`Conv2D`层传递,以增加通道数量。
- en: '[![3](Images/3.png)](#co_diffusion_models_CO5-3)'
+ id: totrans-125
prefs: []
type: TYPE_NORMAL
+ zh: '[![3](Images/3.png)](#co_diffusion_models_CO5-3)'
- en: The second input to the U-Net is the noise variance (a scalar).
+ id: totrans-126
prefs: []
type: TYPE_NORMAL
+ zh: U-Net的第二个输入是噪声方差(一个标量)。
- en: '[![4](Images/4.png)](#co_diffusion_models_CO5-4)'
+ id: totrans-127
prefs: []
type: TYPE_NORMAL
+ zh: '[![4](Images/4.png)](#co_diffusion_models_CO5-4)'
- en: This is encoded using a sinusoidal embedding.
+ id: totrans-128
prefs: []
type: TYPE_NORMAL
+ zh: 这是使用正弦嵌入编码的。
- en: '[![5](Images/5.png)](#co_diffusion_models_CO5-5)'
+ id: totrans-129
prefs: []
type: TYPE_NORMAL
+ zh: '[![5](Images/5.png)](#co_diffusion_models_CO5-5)'
- en: This embedding is copied across spatial dimensions to match the size of the
input image.
+ id: totrans-130
prefs: []
type: TYPE_NORMAL
+ zh: 这个嵌入被复制到空间维度以匹配输入图像的大小。
- en: '[![6](Images/6.png)](#co_diffusion_models_CO5-6)'
+ id: totrans-131
prefs: []
type: TYPE_NORMAL
+ zh: '[![6](Images/6.png)](#co_diffusion_models_CO5-6)'
- en: The two input streams are concatenated across channels.
+ id: totrans-132
prefs: []
type: TYPE_NORMAL
+ zh: 两个输入流在通道上连接。
- en: '[![7](Images/7.png)](#co_diffusion_models_CO5-7)'
+ id: totrans-133
prefs: []
type: TYPE_NORMAL
+ zh: '[![7](Images/7.png)](#co_diffusion_models_CO5-7)'
- en: The `skips` list will hold the output from the `DownBlock` layers that we wish
to connect to `UpBlock` layers downstream.
+ id: totrans-134
prefs: []
type: TYPE_NORMAL
+ zh: '`skips`列表将保存我们希望连接到下游`UpBlock`层的`DownBlock`层的输出。'
- en: '[![8](Images/8.png)](#co_diffusion_models_CO5-8)'
+ id: totrans-135
prefs: []
type: TYPE_NORMAL
+ zh: '[![8](Images/8.png)](#co_diffusion_models_CO5-8)'
- en: The tensor is passed through a series of `DownBlock` layers that reduce the
size of the image, while increasing the number of channels.
+ id: totrans-136
prefs: []
type: TYPE_NORMAL
+ zh: 张量通过一系列`DownBlock`层传递,这些层减小了图像的大小,同时增加了通道的数量。
- en: '[![9](Images/9.png)](#co_diffusion_models_CO5-9)'
+ id: totrans-137
prefs: []
type: TYPE_NORMAL
+ zh: '[![9](Images/9.png)](#co_diffusion_models_CO5-9)'
- en: The tensor is then passed through two `ResidualBlock` layers that hold the image
size and number of channels constant.
+ id: totrans-138
prefs: []
type: TYPE_NORMAL
+ zh: 然后,张量通过两个`ResidualBlock`层传递,这些层保持图像大小和通道数量恒定。
- en: '[![10](Images/10.png)](#co_diffusion_models_CO5-10)'
+ id: totrans-139
prefs: []
type: TYPE_NORMAL
+ zh: '[![10](Images/10.png)](#co_diffusion_models_CO5-10)'
- en: Next, the tensor is passed through a series of `UpBlock` layers that increase
the size of the image, while decreasing the number of channels. The skip connections
incorporate output from the earlier `DownBlock` layers.
+ id: totrans-140
prefs: []
type: TYPE_NORMAL
+ zh: 接下来,张量通过一系列`UpBlock`层传递,这些层增加图像的大小,同时减少通道数。跳跃连接将输出与较早的`DownBlock`层的输出合并。
- en: '[![11](Images/11.png)](#co_diffusion_models_CO5-11)'
+ id: totrans-141
prefs: []
type: TYPE_NORMAL
+ zh: '[![11](Images/11.png)](#co_diffusion_models_CO5-11)'
- en: The final `Conv2D` layer reduces the number of channels to three (RGB).
+ id: totrans-142
prefs: []
type: TYPE_NORMAL
+ zh: 最终的`Conv2D`层将通道数减少到三(RGB)。
- en: '[![12](Images/12.png)](#co_diffusion_models_CO5-12)'
+ id: totrans-143
prefs: []
type: TYPE_NORMAL
+ zh: '[![12](Images/12.png)](#co_diffusion_models_CO5-12)'
- en: The U-Net is a Keras `Model` that takes the noisy images and noise variances
as input and outputs a predicted noise map.
+ id: totrans-144
prefs: []
type: TYPE_NORMAL
+ zh: U-Net是一个Keras `Model`,它以嘈杂的图像和噪声方差作为输入,并输出预测的噪声图。
- en: 'To understand the U-Net in detail, we need to explore four more concepts: the
sinusoidal embedding of the noise variance, the `ResidualBlock`, the `DownBlock`,
and the `UpBlock`.'
+ id: totrans-145
prefs: []
type: TYPE_NORMAL
+ zh: 要详细了解U-Net,我们需要探索四个概念:噪声方差的正弦嵌入、`ResidualBlock`、`DownBlock`和`UpBlock`。
- en: Sinusoidal embedding
+ id: totrans-146
prefs:
- PREF_H3
type: TYPE_NORMAL
+ zh: 正弦嵌入
- en: '*Sinusoidal embedding* was first introduced in a paper by Vaswani et al.^([6](ch08.xhtml#idm45387008220416))
We will be using an adaptation of that original idea as utilized in Mildenhall
et al.’s paper titled “NeRF: Representing Scenes as Neural Radiance Fields for
View Synthesis.”^([7](ch08.xhtml#idm45387008216736))'
+ id: totrans-147
prefs: []
type: TYPE_NORMAL
+ zh: '*正弦嵌入*最初是由Vaswani等人在一篇论文中引入的。我们将使用Mildenhall等人在题为“NeRF: Representing Scenes
+ as Neural Radiance Fields for View Synthesis”的论文中使用的这个原始想法的改编。'
- en: The idea is that we want to be able to convert a scalar value (the noise variance)
into a distinct higher-dimensional vector that is able to provide a more complex
representation, for use downstream in the network. The original paper used this
idea to encode the discrete position of words in a sentence into vectors; the
NeRF paper extends this idea to continuous values.
+ id: totrans-148
prefs: []
type: TYPE_NORMAL
+ zh: 我们希望能够将标量值(噪声方差)转换为一个不同的高维向量,能够提供更复杂的表示,以便在网络中下游使用。原始论文使用这个想法将句子中单词的离散位置编码为向量;NeRF论文将这个想法扩展到连续值。
- en: 'Specifically, a scalar value *x* is encoded as shown in the following equation:'
+ id: totrans-149
prefs: []
type: TYPE_NORMAL
+ zh: 具体来说,标量值*x*被编码如下方程所示:
- en: \gamma ( x ) = \left( \sin ( 2 \pi e^{0f} x ) , \cdots , \sin ( 2 \pi e^{(L-1)f} x ) , \cos ( 2 \pi e^{0f} x ) , \cdots , \cos ( 2 \pi e^{(L-1)f} x ) \right)
+ id: totrans-150
prefs: []
type: TYPE_NORMAL
+ zh: \gamma ( x ) = \left( \sin ( 2 \pi e^{0f} x ) , \cdots , \sin ( 2 \pi e^{(L-1)f} x ) , \cos ( 2 \pi e^{0f} x ) , \cdots , \cos ( 2 \pi e^{(L-1)f} x ) \right)
- en: where we choose L = 16 to be half the size of our desired noise embedding length and f = \frac{\ln(1000)}{L-1} to be the maximum scaling factor for the frequencies.
+ id: totrans-151
prefs: []
type: TYPE_NORMAL
+ zh: 其中我们选择 L = 16,即我们期望的噪声嵌入长度的一半,f = \frac{\ln(1000)}{L-1} 是频率的最大缩放因子。
- en: This produces the embedding pattern shown in [Figure 8-9](#sinusoidal_embedding_image).
+ id: totrans-152
prefs: []
type: TYPE_NORMAL
+ zh: 这产生了[图8-9](#sinusoidal_embedding_image)中显示的嵌入模式。
- en: '![](Images/gdl2_0809.png)'
+ id: totrans-153
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0809.png)'
- en: Figure 8-9\. The pattern of sinusoidal embeddings for noise variances from 0
to 1
+ id: totrans-154
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图8-9。噪声方差从0到1的正弦嵌入模式
- en: We can code this sinusoidal embedding function as shown in [Example 8-7](#sinusoidal_embedding_diffusion).
This converts a single noise variance scalar value into a vector of length 32.
+ id: totrans-155
prefs: []
type: TYPE_NORMAL
+ zh: 我们可以将这个正弦嵌入函数编码如[示例8-7](#sinusoidal_embedding_diffusion)所示。这将一个单一的噪声方差标量值转换为长度为32的向量。
- en: Example 8-7\. The `sinusoidal_embedding` function that encodes the noise variance
+ id: totrans-156
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例8-7。编码噪声方差的`sinusoidal_embedding`函数
- en: '[PRE6]'
+ id: totrans-157
prefs: []
type: TYPE_PRE
+ zh: '[PRE6]'
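A hedged sketch of how such a sinusoidal embedding could be computed (the book's actual implementation is the [PRE6] block above); it assumes the noise variance `x` arrives with shape (batch, 1, 1, 1) and uses L = 16 frequencies so the result has 32 channels, matching the description:

```python
import math
import tensorflow as tf

NOISE_EMBEDDING_SIZE = 32  # 2 * L, with L = 16

def sinusoidal_embedding(x):
    # Frequencies spaced geometrically between 1 and 1000, i.e. f = ln(1000)/(L-1).
    frequencies = tf.exp(
        tf.linspace(
            tf.math.log(1.0), tf.math.log(1000.0), NOISE_EMBEDDING_SIZE // 2
        )
    )
    angular_speeds = 2.0 * math.pi * frequencies
    # Concatenate sin and cos components along the channel axis.
    return tf.concat(
        [tf.sin(angular_speeds * x), tf.cos(angular_speeds * x)], axis=3
    )
```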
- en: ResidualBlock
+ id: totrans-158
prefs:
- PREF_H3
type: TYPE_NORMAL
+ zh: 残差块
- en: Both the `DownBlock` and the `UpBlock` contain `ResidualBlock` layers, so let’s
start with these. We already explored residual blocks in [Chapter 5](ch05.xhtml#chapter_autoregressive),
when we built a PixelCNN, but we will recap here for completeness.
+ id: totrans-159
prefs: []
type: TYPE_NORMAL
+ zh: '`DownBlock`和`UpBlock`都包含`ResidualBlock`层,所以让我们从这些层开始。我们在[第5章](ch05.xhtml#chapter_autoregressive)中构建PixelCNN时已经探讨过残差块,但为了完整起见,我们将在这里进行回顾。'
- en: A *residual block* is a group of layers that contains a skip connection that
adds the input to the output. Residual blocks help us to build deeper networks
that can learn more complex patterns without suffering as greatly from vanishing
@@ -859,77 +1365,100 @@
that as neural networks become deeper, they are not necessarily as accurate as
their shallower counterparts—accuracy seems to become saturated at a certain depth
and then degrade rapidly.
+ id: totrans-160
prefs: []
type: TYPE_NORMAL
+ zh: '*残差块*是一组包含跳跃连接的层,将输入添加到输出中。残差块帮助我们构建更深的网络,可以学习更复杂的模式,而不会受到梯度消失和退化问题的严重影响。梯度消失问题是指随着网络变得更深,通过更深层传播的梯度很小,因此学习速度非常慢。退化问题是指随着神经网络变得更深,它们不一定像较浅的对应网络那样准确——准确性似乎在一定深度上饱和,然后迅速退化。'
- en: Degradation
+ id: totrans-161
prefs:
- PREF_H1
type: TYPE_NORMAL
+ zh: 退化
- en: The degradation problem is somewhat counterintuitive, but observed in practice
as the deeper layers must at least learn the identity mapping, which is not trivial—especially
considering other problems deeper networks face, such as the vanishing gradient
problem.
+ id: totrans-162
prefs: []
type: TYPE_NORMAL
+ zh: 退化问题有点反直觉,但在实践中观察到,因为更深的层至少必须学习恒等映射,这并不是微不足道的——尤其考虑到更深的网络面临的其他问题,比如梯度消失问题。
- en: The solution, first introduced in the ResNet paper by He et al. in 2015,^([8](ch08.xhtml#idm45387008052288))
is very simple. By including a skip connection *highway* around the main weighted
layers, the block has the option to bypass the complex weight updates and simply
pass through the identity mapping. This allows the network to be trained to great
depth without sacrificing gradient size or network accuracy.
+ id: totrans-163
prefs: []
type: TYPE_NORMAL
- en: A diagram of a `ResidualBlock` is shown in [Figure 8-10](#diffusion_residual).
Note that in some residual blocks, we also include an extra `Conv2D` layer with
kernel size 1 on the skip connection, to bring the number of channels in line
with the rest of the block.
+ id: totrans-164
prefs: []
type: TYPE_NORMAL
- en: '![](Images/gdl2_0810.png)'
+ id: totrans-165
prefs: []
type: TYPE_IMG
- en: Figure 8-10\. The `ResidualBlock` in the U-Net
+ id: totrans-166
prefs:
- PREF_H6
type: TYPE_NORMAL
- en: We can code a `ResidualBlock` in Keras as shown in [Example 8-8](#diffusion_residual_code).
+ id: totrans-167
prefs: []
type: TYPE_NORMAL
- en: Example 8-8\. Code for the `ResidualBlock` in the U-Net
+ id: totrans-168
prefs:
- PREF_H5
type: TYPE_NORMAL
- en: '[PRE7]'
+ id: totrans-169
prefs: []
type: TYPE_PRE
+ zh: '[PRE7]'
- en: '[![1](Images/1.png)](#co_diffusion_models_CO6-1)'
+ id: totrans-170
prefs: []
type: TYPE_NORMAL
- en: Check if the number of channels in the input matches the number of channels
that we would like the block to output. If not, include an extra `Conv2D` layer
on the skip connection to bring the number of channels in line with the rest of
the block.
+ id: totrans-171
prefs: []
type: TYPE_NORMAL
- en: '[![2](Images/2.png)](#co_diffusion_models_CO6-2)'
+ id: totrans-172
prefs: []
type: TYPE_NORMAL
- en: Apply a `BatchNormalization` layer.
+ id: totrans-173
prefs: []
type: TYPE_NORMAL
- en: '[![3](Images/3.png)](#co_diffusion_models_CO6-3)'
+ id: totrans-174
prefs: []
type: TYPE_NORMAL
- en: Apply two `Conv2D` layers.
+ id: totrans-175
prefs: []
type: TYPE_NORMAL
- en: '[![4](Images/4.png)](#co_diffusion_models_CO6-4)'
+ id: totrans-176
prefs: []
type: TYPE_NORMAL
- en: Add the original block input to the output to provide the final output from
the block.
+ id: totrans-177
prefs: []
type: TYPE_NORMAL
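A hedged sketch of a `ResidualBlock` along the lines of the callouts above (the kernel sizes and activation are assumptions; `width` is the desired number of output channels):

```python
from tensorflow.keras import layers

def ResidualBlock(width):
    def apply(x):
        input_width = x.shape[3]
        if input_width == width:
            residual = x                                        # channels already match
        else:
            residual = layers.Conv2D(width, kernel_size=1)(x)   # 1x1 conv on the skip
        x = layers.BatchNormalization(center=False, scale=False)(x)
        x = layers.Conv2D(width, kernel_size=3, padding="same", activation="swish")(x)
        x = layers.Conv2D(width, kernel_size=3, padding="same")(x)
        x = layers.Add()([x, residual])                          # skip connection
        return x

    return apply
```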
- en: DownBlocks and UpBlocks
+ id: totrans-178
prefs:
- PREF_H3
type: TYPE_NORMAL
@@ -937,6 +1466,7 @@
(=2 in our example) `ResidualBlock`s, while also applying a final `AveragePooling2D`
layer in order to halve the size of the image. Each `ResidualBlock` is added to
a list for use later by the `UpBlock` layers as skip connections across the U-Net.
+ id: totrans-179
prefs: []
type: TYPE_NORMAL
- en: An `UpBlock` first applies an `UpSampling2D` layer that doubles the size of
@@ -944,128 +1474,173 @@
the number of channels via `block_depth` (=2) `ResidualBlock`s, while also concatenating
the outputs from the `DownBlock`s through skip connections across the U-Net. A
diagram of this process is shown in [Figure 8-11](#diffusion_down_up_block).
+ id: totrans-180
prefs: []
type: TYPE_NORMAL
- en: '![](Images/gdl2_0811.png)'
+ id: totrans-181
prefs: []
type: TYPE_IMG
- en: Figure 8-11\. The `DownBlock` and corresponding `UpBlock` in the U-Net
+ id: totrans-182
prefs:
- PREF_H6
type: TYPE_NORMAL
- en: We can code the `DownBlock` and `UpBlock` using Keras as illustrated in [Example 8-9](#diffusion_down_up_code).
+ id: totrans-183
prefs: []
type: TYPE_NORMAL
- en: Example 8-9\. Code for the `DownBlock` and `UpBlock` in the U-Net model
+ id: totrans-184
prefs:
- PREF_H5
type: TYPE_NORMAL
- en: '[PRE8]'
+ id: totrans-185
prefs: []
type: TYPE_PRE
+ zh: '[PRE8]'
- en: '[![1](Images/1.png)](#co_diffusion_models_CO7-1)'
+ id: totrans-186
prefs: []
type: TYPE_NORMAL
- en: The `DownBlock` increases the number of channels in the image using a `ResidualBlock`
of a given `width`…
+ id: totrans-187
prefs: []
type: TYPE_NORMAL
- en: '[![2](Images/2.png)](#co_diffusion_models_CO7-2)'
+ id: totrans-188
prefs: []
type: TYPE_NORMAL
- en: …each of which are saved to a list (`skips`) for use later by the `UpBlock`s.
+ id: totrans-189
prefs: []
type: TYPE_NORMAL
- en: '[![3](Images/3.png)](#co_diffusion_models_CO7-3)'
+ id: totrans-190
prefs: []
type: TYPE_NORMAL
- en: A final `AveragePooling2D` layer reduces the dimensionality of the image by
half.
+ id: totrans-191
prefs: []
type: TYPE_NORMAL
- en: '[![4](Images/4.png)](#co_diffusion_models_CO7-4)'
+ id: totrans-192
prefs: []
type: TYPE_NORMAL
- en: The `UpBlock` begins with an `UpSampling2D` layer that doubles the size of the
image.
+ id: totrans-193
prefs: []
type: TYPE_NORMAL
- en: '[![5](Images/5.png)](#co_diffusion_models_CO7-5)'
+ id: totrans-194
prefs: []
type: TYPE_NORMAL
- en: The output from a `DownBlock` layer is glued to the current output using a `Concatenate`
layer.
+ id: totrans-195
prefs: []
type: TYPE_NORMAL
- en: '[![6](Images/6.png)](#co_diffusion_models_CO7-6)'
+ id: totrans-196
prefs: []
type: TYPE_NORMAL
- en: A `ResidualBlock` is used to reduce the number of channels in the image as it
passes through the `UpBlock`.
+ id: totrans-197
prefs: []
type: TYPE_NORMAL
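A hedged sketch of the `DownBlock` and `UpBlock` described in the callouts, assuming the `ResidualBlock` helper sketched earlier and a `block_depth` of 2:

```python
from tensorflow.keras import layers

def DownBlock(width, block_depth):
    def apply(x):
        x, skips = x
        for _ in range(block_depth):
            x = ResidualBlock(width)(x)   # increase channels to `width`
            skips.append(x)               # saved for the corresponding UpBlock
        x = layers.AveragePooling2D(pool_size=2)(x)  # halve the spatial size
        return x

    return apply

def UpBlock(width, block_depth):
    def apply(x):
        x, skips = x
        x = layers.UpSampling2D(size=2, interpolation="bilinear")(x)  # double the size
        for _ in range(block_depth):
            x = layers.Concatenate()([x, skips.pop()])  # skip connection across the U
            x = ResidualBlock(width)(x)                 # reduce channels to `width`
        return x

    return apply
```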
- en: Training the Diffusion Model
+ id: totrans-198
prefs:
- PREF_H2
type: TYPE_NORMAL
- en: We now have all the components in place to train our denoising diffusion model!
[Example 8-10](#diffusion_train_code) creates, compiles, and fits the diffusion
model.
+ id: totrans-199
prefs: []
type: TYPE_NORMAL
- en: Example 8-10\. Code for training the `DiffusionModel`
+ id: totrans-200
prefs:
- PREF_H5
type: TYPE_NORMAL
- en: '[PRE9]'
+ id: totrans-201
prefs: []
type: TYPE_PRE
+ zh: '[PRE9]'
- en: '[![1](Images/1.png)](#co_diffusion_models_CO8-1)'
+ id: totrans-202
prefs: []
type: TYPE_NORMAL
- en: Instantiate the model.
+ id: totrans-203
prefs: []
type: TYPE_NORMAL
- en: '[![2](Images/2.png)](#co_diffusion_models_CO8-2)'
+ id: totrans-204
prefs: []
type: TYPE_NORMAL
- en: Compile the model, using the AdamW optimizer (similar to Adam but with weight
decay, which helps stabilize the training process) and mean absolute error loss
function.
+ id: totrans-205
prefs: []
type: TYPE_NORMAL
- en: '[![3](Images/3.png)](#co_diffusion_models_CO8-3)'
+ id: totrans-206
prefs: []
type: TYPE_NORMAL
- en: Calculate the normalization statistics using the training set.
+ id: totrans-207
prefs: []
type: TYPE_NORMAL
+ zh: 使用训练集计算归一化统计数据。
- en: '[![4](Images/4.png)](#co_diffusion_models_CO8-4)'
+ id: totrans-208
prefs: []
type: TYPE_NORMAL
+ zh: '[![4](Images/4.png)](#co_diffusion_models_CO8-4)'
- en: Fit the model over 50 epochs.
+ id: totrans-209
prefs: []
type: TYPE_NORMAL
+ zh: 在50个epoch内拟合模型。
- en: The loss curve (noise mean absolute error [MAE]) is shown in [Figure 8-12](#diffusion_loss).
+ id: totrans-210
prefs: []
type: TYPE_NORMAL
+ zh: 损失曲线(噪音平均绝对误差[MAE])显示在[图8-12](#diffusion_loss)中。
- en: '![](Images/gdl2_0812.png)'
+ id: totrans-211
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0812.png)'
- en: Figure 8-12\. The noise mean absolute error loss curve, by epoch
+ id: totrans-212
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图8-12。按epoch绘制的噪声平均绝对误差损失曲线
- en: Sampling from the Denoising Diffusion Model
+ id: totrans-213
prefs:
- PREF_H2
type: TYPE_NORMAL
+ zh: 从去噪扩散模型中采样
- en: In order to sample images from our trained model, we need to apply the reverse
diffusion process—that is, we need to start with random noise and use the model
to gradually undo the noise, until we are left with a recognizable picture of
a flower.
+ id: totrans-214
prefs: []
type: TYPE_NORMAL
+ zh: 为了从我们训练好的模型中采样图像,我们需要应用反向扩散过程-也就是说,我们需要从随机噪音开始,并使用模型逐渐消除噪音,直到我们得到一个可以识别的花朵图片。
- en: We must bear in mind that our model is trained to predict the total amount of
noise that has been added to a given noisy image from the training set, not just
the noise that was added at the last timestep of the noising process. However,
@@ -1073,8 +1648,10 @@
noise in one shot is clearly not going to work! We would rather mimic the forward
process and undo the predicted noise gradually over many small steps, to allow
the model to adjust to its own predictions.
+ id: totrans-215
prefs: []
type: TYPE_NORMAL
+ zh: 我们必须记住,我们的模型是经过训练的,用于预测在训练集中添加到给定嘈杂图像的总噪音量,而不仅仅是在噪音过程的最后一个时间步骤中添加的噪音。然而,我们不希望一次性消除所有噪音-在一次预测中从纯随机噪音中预测图像显然不会奏效!我们宁愿模仿正向过程,并在许多小步骤中逐渐消除预测的噪音,以使模型能够适应自己的预测。
- en: To achieve this, we can jump from x
t to x
t-1 in two steps—first by
@@ -1084,26 +1661,40 @@
- 1 timesteps, to produce x t-1
. This idea is shown in [Figure 8-13](#diffusion_one_step_sample).
+ id: totrans-216
prefs: []
type: TYPE_NORMAL
+ zh: 为了实现这一点,我们可以在两个步骤中从x t跳到x t-1,首先使用我们模型的噪音预测来计算原始图像x 0的估计,然后重新应用预测的噪音到这个图像,但只在t - 1个时间步骤内,产生x t-1。这个想法在[图8-13](#diffusion_one_step_sample)中显示。
- en: '![](Images/gdl2_0813.png)'
+ id: totrans-217
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0813.png)'
- en: Figure 8-13\. One step of the sampling process for our diffusion model
+ id: totrans-218
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图8-13。扩散模型采样过程的一步
- en: If we repeat this process over a number of steps, we’ll eventually get back
to an estimate for x 0
that has been guided gradually over many small steps. In fact, we are free to
choose the number of steps we take, and crucially, it doesn’t have to be the same
as the large number of steps in the training noising process (i.e., 1,000). It
can be much smaller—in this example we choose 20.
+ id: totrans-219
prefs: []
type: TYPE_NORMAL
+ zh: 如果我们重复这个过程多次,最终我们将得到一个经过许多小步骤逐渐引导的x 0的估计。实际上,我们可以自由选择采取的步数,关键是,它不必与训练噪音过程中的大量步数(即1,000)相同。它可以小得多-在这个例子中,我们选择了20。
- en: 'The following equation (Song et al., 2020) describes this process mathematically:'
+ id: totrans-220
prefs: []
type: TYPE_NORMAL
+ zh: 以下方程(Song等,2020)数学上描述了这个过程:
- en: 𝐱_{t-1} = \sqrt{\bar{\alpha}_{t-1}} \underbrace{\left( \frac{𝐱_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta^{(t)} ( 𝐱_t )}{\sqrt{\bar{\alpha}_t}} \right)}_{\text{predicted } 𝐱_0} + \underbrace{\sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2} \cdot \epsilon_\theta^{(t)} ( 𝐱_t )}_{\text{direction pointing to } 𝐱_t} + \underbrace{\sigma_t \epsilon_t}_{\text{random noise}}
+ id: totrans-221
prefs: []
type: TYPE_NORMAL
+ zh: 𝐱_{t-1} = \sqrt{\bar{\alpha}_{t-1}} \underbrace{\left( \frac{𝐱_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta^{(t)} ( 𝐱_t )}{\sqrt{\bar{\alpha}_t}} \right)}_{\text{predicted } 𝐱_0} + \underbrace{\sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2} \cdot \epsilon_\theta^{(t)} ( 𝐱_t )}_{\text{direction pointing to } 𝐱_t} + \underbrace{\sigma_t \epsilon_t}_{\text{random noise}}
- en: Let’s break this down. The first term inside the brackets on the righthand side of the equation is the estimated image 𝐱_0, calculated using the noise \epsilon_\theta^{(t)} ( 𝐱_t ) predicted by our network. We then scale this by the t-1 signal rate \sqrt{\bar{\alpha}_{t-1}} and reapply the predicted noise, but this time scaled by the t-1 noise rate \sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2}. Additional Gaussian random noise \sigma_t \epsilon_t is also added, with \sigma_t determining how random we want our generation process to be.
- prefs: []
- type: TYPE_NORMAL
+ id: totrans-222
+ prefs: []
+ type: TYPE_NORMAL
+ zh: 让我们来分解一下。方程式右侧括号内的第一项是估计的图像 𝐱_0,它使用我们网络预测的噪声 \epsilon_\theta^{(t)} ( 𝐱_t ) 计算得到。然后我们用 t-1 时刻的信号率 \sqrt{\bar{\alpha}_{t-1}} 缩放这个值,并重新应用预测的噪声,但这次用 t-1 时刻的噪声率 \sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2} 进行缩放。还会加入额外的高斯随机噪声 \sigma_t \epsilon_t,其中 \sigma_t 决定了我们希望生成过程有多随机。
- en: The special case σ
t = 0 for all t
corresponds to a type of model known as a *Denoising Diffusion Implicit Model*
@@ -1165,139 +1802,217 @@
random noise input will always give the same output. This is desirable as then
we have a well-defined mapping between samples from the latent space and the generated
outputs in pixel space.
+ id: totrans-223
prefs: []
type: TYPE_NORMAL
+ zh: 特殊情况 σ
+ t = 0 对于所有的 t
+ 对应于一种称为*去噪扩散隐式模型*(DDIM)的模型,由Song等人在2020年提出。^([9](ch08.xhtml#idm45387007342688))
+ 使用DDIM,生成过程完全是确定性的—也就是说,相同的随机噪声输入将始终产生相同的输出。这是可取的,因为这样我们在潜在空间的样本和像素空间中生成的输出之间有一个明确定义的映射。
- en: In our example, we will implement a DDIM, thus making our generation process
deterministic. The code for the DDIM sampling process (reverse diffusion) is shown
in [Example 8-11](#diffusion_sampling).
+ id: totrans-224
prefs: []
type: TYPE_NORMAL
+ zh: 在我们的示例中,我们将实现一个DDIM,从而使我们的生成过程确定性。DDIM采样过程(反向扩散)的代码显示在[示例 8-11](#diffusion_sampling)中。
- en: Example 8-11\. Sampling from the diffusion model
+ id: totrans-225
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例 8-11\. 从扩散模型中采样
- en: '[PRE10]'
+ id: totrans-226
prefs: []
type: TYPE_PRE
+ zh: '[PRE10]'
- en: '[![1](Images/1.png)](#co_diffusion_models_CO9-1)'
+ id: totrans-227
prefs: []
type: TYPE_NORMAL
+ zh: '[![1](Images/1.png)](#co_diffusion_models_CO9-1)'
- en: Loop over a fixed number of steps (e.g., 20).
+ id: totrans-228
prefs: []
type: TYPE_NORMAL
+ zh: 循环执行固定数量的步骤(例如,20步)。
- en: '[![2](Images/2.png)](#co_diffusion_models_CO9-2)'
+ id: totrans-229
prefs: []
type: TYPE_NORMAL
+ zh: '[![2](Images/2.png)](#co_diffusion_models_CO9-2)'
- en: The diffusion times are all set to 1 (i.e., at the start of the reverse diffusion
process).
+ id: totrans-230
prefs: []
type: TYPE_NORMAL
+ zh: 扩散时间都设置为1(即在反向扩散过程开始时)。
- en: '[![3](Images/3.png)](#co_diffusion_models_CO9-3)'
+ id: totrans-231
prefs: []
type: TYPE_NORMAL
+ zh: '[![3](Images/3.png)](#co_diffusion_models_CO9-3)'
- en: The noise and signal rates are calculated according to the diffusion schedule.
+ id: totrans-232
prefs: []
type: TYPE_NORMAL
+ zh: 根据扩散计划计算噪声和信号率。
- en: '[![4](Images/4.png)](#co_diffusion_models_CO9-4)'
+ id: totrans-233
prefs: []
type: TYPE_NORMAL
+ zh: '[![4](Images/4.png)](#co_diffusion_models_CO9-4)'
- en: The U-Net is used to predict the noise, allowing us to calculate the denoised
image estimate.
+ id: totrans-234
prefs: []
type: TYPE_NORMAL
+ zh: U-Net用于预测噪声,从而使我们能够计算去噪图像的估计。
- en: '[![5](Images/5.png)](#co_diffusion_models_CO9-5)'
+ id: totrans-235
prefs: []
type: TYPE_NORMAL
+ zh: '[![5](Images/5.png)](#co_diffusion_models_CO9-5)'
- en: The diffusion times are reduced by one step.
+ id: totrans-236
prefs: []
type: TYPE_NORMAL
+ zh: 扩散时间减少一步。
- en: '[![6](Images/6.png)](#co_diffusion_models_CO9-6)'
+ id: totrans-237
prefs: []
type: TYPE_NORMAL
+ zh: '[![6](Images/6.png)](#co_diffusion_models_CO9-6)'
- en: The new noise and signal rates are calculated.
+ id: totrans-238
prefs: []
type: TYPE_NORMAL
+ zh: 计算新的噪声和信号率。
- en: '[![7](Images/7.png)](#co_diffusion_models_CO9-7)'
+ id: totrans-239
prefs: []
type: TYPE_NORMAL
+ zh: '[![7](Images/7.png)](#co_diffusion_models_CO9-7)'
- en: The `t-1` images are calculated by reapplying the predicted noise to the predicted
image, according to the `t-1` diffusion schedule rates.
+ id: totrans-240
prefs: []
type: TYPE_NORMAL
+ zh: 通过根据扩散计划率重新应用预测噪声到预测图像,计算出 `t-1` 图像。
- en: '[![8](Images/8.png)](#co_diffusion_models_CO9-8)'
+ id: totrans-241
prefs: []
type: TYPE_NORMAL
+ zh: '[![8](Images/8.png)](#co_diffusion_models_CO9-8)'
- en: After 20 steps, the final 𝐱 0
predicted images are returned.
+ id: totrans-242
prefs: []
type: TYPE_NORMAL
+ zh: 经过20步,最终的 𝐱 0
+ 预测图像被返回。
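A hedged sketch of the DDIM reverse-diffusion loop described in the callouts (the `denoise` method and `offset_cosine_diffusion_schedule` helper are assumptions carried over from the earlier sketches; the book's actual code is the [PRE10] block above):

```python
import tensorflow as tf

def reverse_diffusion(model, initial_noise, diffusion_steps):
    num_images = initial_noise.shape[0]
    step_size = 1.0 / diffusion_steps
    current_images = initial_noise
    for step in range(diffusion_steps):                       # 1. fixed number of steps
        diffusion_times = (                                   # 2. start at 1, walk toward 0
            tf.ones((num_images, 1, 1, 1)) - step * step_size
        )
        noise_rates, signal_rates = offset_cosine_diffusion_schedule(  # 3. current rates
            diffusion_times
        )
        pred_noises, pred_images = model.denoise(             # 4. predict noise and x_0
            current_images, noise_rates, signal_rates, training=False
        )
        next_diffusion_times = diffusion_times - step_size    # 5. one step back
        next_noise_rates, next_signal_rates = offset_cosine_diffusion_schedule(
            next_diffusion_times                              # 6. new rates
        )
        current_images = (                                     # 7. reapply noise at t-1
            next_signal_rates * pred_images + next_noise_rates * pred_noises
        )
    return pred_images                                         # 8. final predicted x_0
```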
- en: Analysis of the Diffusion Model
+ id: totrans-243
prefs:
- PREF_H2
type: TYPE_NORMAL
+ zh: 扩散模型的分析
- en: 'We’ll now take a look at three different ways that we can use our trained model:
for generation of new images, testing how the number of reverse diffusion steps
affects quality, and interpolating between two images in the latent space.'
+ id: totrans-244
prefs: []
type: TYPE_NORMAL
+ zh: 现在我们将看一下我们训练模型的三种不同用法:用于生成新图像,测试反向扩散步数如何影响质量,以及在潜在空间中两个图像之间的插值。
- en: Generating images
+ id: totrans-245
prefs:
- PREF_H3
type: TYPE_NORMAL
+ zh: 生成图像
- en: In order to produce samples from our trained model, we can simply run the reverse
diffusion process, ensuring that we denormalize the output at the end (i.e., take
the pixel values back into the range [0, 1]). We can achieve this using the code
in [Example 8-12](#diffusion_generation) inside the `DiffusionModel` class.
+ id: totrans-246
prefs: []
type: TYPE_NORMAL
+ zh: 为了从我们训练的模型中生成样本,我们只需运行逆扩散过程,确保最终去标准化输出(即,将像素值带回范围[0, 1])。我们可以在`DiffusionModel`类中使用[示例8-12](#diffusion_generation)中的代码来实现这一点。
- en: Example 8-12\. Generating images using the diffusion model
+ id: totrans-247
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例8-12。使用扩散模型生成图像
- en: '[PRE11]'
+ id: totrans-248
prefs: []
type: TYPE_PRE
+ zh: '[PRE11]'
- en: '[![1](Images/1.png)](#co_diffusion_models_CO10-1)'
+ id: totrans-249
prefs: []
type: TYPE_NORMAL
+ zh: '[![1](Images/1.png)](#co_diffusion_models_CO10-1)'
- en: Generate some initial noise maps.
+ id: totrans-250
prefs: []
type: TYPE_NORMAL
+ zh: 生成一些初始噪声图。
- en: '[![2](Images/2.png)](#co_diffusion_models_CO10-3)'
+ id: totrans-251
prefs: []
type: TYPE_NORMAL
+ zh: '[![2](Images/2.png)](#co_diffusion_models_CO10-3)'
- en: Apply the reverse diffusion process.
+ id: totrans-252
prefs: []
type: TYPE_NORMAL
+ zh: 应用逆扩散过程。
- en: '[![3](Images/3.png)](#co_diffusion_models_CO10-4)'
+ id: totrans-253
prefs: []
type: TYPE_NORMAL
+ zh: '[![3](Images/3.png)](#co_diffusion_models_CO10-4)'
- en: The images output by the network will have mean zero and unit variance, so we
need to denormalize by reapplying the mean and variance calculated from the training
data.
+ id: totrans-254
prefs: []
type: TYPE_NORMAL
+ zh: 网络输出的图像将具有零均值和单位方差,因此我们需要通过重新应用从训练数据计算得出的均值和方差来去标准化。
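A hedged sketch of a `generate` routine matching the callouts (the `normalizer` attribute and the `reverse_diffusion` helper are assumptions from the earlier sketches):

```python
import tensorflow as tf

def generate(model, num_images, diffusion_steps, image_size=64):
    # 1. Start from pure Gaussian noise maps.
    initial_noise = tf.random.normal(shape=(num_images, image_size, image_size, 3))
    # 2. Run the reverse diffusion process.
    generated_images = reverse_diffusion(model, initial_noise, diffusion_steps)
    # 3. Denormalize using the mean and variance computed from the training data.
    mean = model.normalizer.mean
    variance = model.normalizer.variance
    generated_images = mean + generated_images * tf.sqrt(variance)
    return tf.clip_by_value(generated_images, 0.0, 1.0)
```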
- en: In [Figure 8-14](#diffusion_samples_epoch) we can observe some samples from
the diffusion model at different epochs of the training process.
+ id: totrans-255
prefs: []
type: TYPE_NORMAL
+ zh: 在[图8-14](#diffusion_samples_epoch)中,我们可以观察到训练过程中不同时期扩散模型的一些样本。
- en: '![](Images/gdl2_0814.png)'
+ id: totrans-256
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0814.png)'
- en: Figure 8-14\. Samples from the diffusion model at different epochs of the training
process
+ id: totrans-257
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图8-14。训练过程中不同时期扩散模型的样本
- en: Adjusting the number of diffusion steps
+ id: totrans-258
prefs:
- PREF_H3
type: TYPE_NORMAL
+ zh: 调整扩散步数
- en: We can also test to see how adjusting the number of diffusion steps in the reverse
process affects image quality. Intuitively, the more steps taken by the process,
the higher the quality of the image generation.
+ id: totrans-259
prefs: []
type: TYPE_NORMAL
+ zh: 我们还可以测试调整逆向过程中扩散步数如何影响图像质量。直观地,过程中步数越多,图像生成的质量就越高。
- en: We can see in [Figure 8-15](#diffusion_steps_quality) that the quality of the
generations does indeed improve with the number of diffusion steps. With one giant
leap from the initial sampled noise, the model can only predict a hazy blob of
@@ -1306,19 +2021,27 @@
of diffusion steps, so there is a trade-off. There is minimal improvement between
20 and 100 diffusion steps, so we choose 20 as a reasonable compromise between
quality and speed in this example.
+ id: totrans-260
prefs: []
type: TYPE_NORMAL
+ zh: 我们可以在[图8-15](#diffusion_steps_quality)中看到,随着扩散步数的增加,生成的质量确实会提高。从初始抽样的噪声中一次性跳跃,模型只能预测出一个朦胧的颜色斑块。随着步数的增加,模型能够改进和锐化生成物。然而,生成图像所需的时间与扩散步数成线性关系,因此存在权衡。在20和100个扩散步之间的改进很小,因此在这个例子中我们选择20作为质量和速度之间的合理折衷。
- en: '![](Images/gdl2_0815.png)'
+ id: totrans-261
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0815.png)'
- en: Figure 8-15\. Image quality improves with the number of diffusion steps
+ id: totrans-262
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图8-15。随着扩散步数的增加,图像质量提高
- en: Interpolating between images
+ id: totrans-263
prefs:
- PREF_H3
type: TYPE_NORMAL
+ zh: 在图像之间进行插值
- en: Lastly, as we have seen previously with variational autoencoders, we can interpolate
between points in the Gaussian latent space in order to smoothly transition between
images in pixel space. Here we choose to use a form of spherical interpolation
@@ -1333,46 +2056,70 @@
ranges smoothly from 0 to 1 and a and b are the two randomly sampled Gaussian noise tensors
that we wish to interpolate between.
+ id: totrans-264
prefs: []
type: TYPE_NORMAL
+ zh: 最后,正如我们之前在变分自动编码器中看到的那样,我们可以在高斯潜在空间中的点之间进行插值,以便在像素空间中平滑过渡。在这里,我们选择使用一种球面插值的形式,以确保在混合两个高斯噪声图时方差保持恒定。具体来说,每一步的初始噪声图由 a \sin ( \frac{\pi}{2} t ) + b \cos ( \frac{\pi}{2} t ) 给出,其中 t 从0平滑地变化到1,a 和 b 是我们希望在其间插值的两个随机采样的高斯噪声张量。
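A minimal sketch of the spherical interpolation between two noise tensors described above (the helper name is an assumption); each interpolated noise map could then be passed through the reverse diffusion process to produce the images in Figure 8-16:

```python
import math
import tensorflow as tf

def spherical_interpolation(a, b, t):
    # Blend two Gaussian noise maps while keeping the overall variance constant.
    return tf.sin(t * math.pi / 2) * a + tf.cos(t * math.pi / 2) * b

a = tf.random.normal(shape=(64, 64, 3))
b = tf.random.normal(shape=(64, 64, 3))
noise_maps = [spherical_interpolation(a, b, t) for t in tf.linspace(0.0, 1.0, 10)]
```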
- en: The resulting images are shown in [Figure 8-16](#diffusion_interpolation).
+ id: totrans-265
prefs: []
type: TYPE_NORMAL
+ zh: 生成的图像显示在[图8-16](#diffusion_interpolation)中。
- en: '![](Images/gdl2_0816.png)'
+ id: totrans-266
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0816.png)'
- en: Figure 8-16\. Interpolating between images using the denoising diffusion model` `#
Summary
+ id: totrans-267
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图8-16。使用去噪扩散模型在图像之间进行插值` `# 总结
- en: 'In this chapter we have explored one of the most exciting and promising areas
of generative modeling in recent times: diffusion models. In particular, we implemented
the ideas from a key paper on generative diffusion models (Ho et al., 2020) that
introduced the original Denoising Diffusion Probabilistic Model (DDPM). We then
extended this with the ideas from the Denoising Diffusion Implicit Model (DDIM)
paper to make the generation process fully deterministic.'
+ id: totrans-268
prefs: []
type: TYPE_NORMAL
+ zh: 在本章中,我们探索了近期最令人兴奋和有前途的生成建模领域之一:扩散模型。特别是,我们实现了一篇关于生成扩散模型的关键论文(Ho等人,2020)中介绍的原始去噪扩散概率模型(DDPM)的思想。然后,我们借鉴了去噪扩散隐式模型(DDIM)论文中的思想,使生成过程完全确定性。
- en: We have seen how diffusion models are formed of a forward diffusion process
and a reverse diffusion process. The forward diffusion process adds noise to the
training data through a series of small steps, while the reverse diffusion process
consists of a model that tries to predict the noise added.
+ id: totrans-269
prefs: []
type: TYPE_NORMAL
+ zh: 我们已经看到扩散模型由前向扩散过程和逆扩散过程组成。 前向扩散过程通过一系列小步骤向训练数据添加噪声,而逆扩散过程包括试图预测添加的噪声的模型。
- en: We make use of a reparameterization trick in order to calculate the noised images
at any step of the forward process without having to go through multiple noising
steps. We have seen how the chosen schedule of parameters used to add noise to
the data plays an important part in the overall success of the model.
+ id: totrans-270
prefs: []
type: TYPE_NORMAL
+ zh: 我们利用重新参数化技巧,以便在前向过程的任何步骤中计算带噪声的图像,而无需经历多个加噪步骤。 我们已经看到,用于向数据添加噪声的参数选择计划在模型的整体成功中起着重要作用。
- en: The reverse diffusion process is parameterized by a U-Net that tries to predict
the noise at each timestep, given the noised image and the noise rate at that
step. A U-Net consists of `DownBlock`s that increase the number of channels while
reducing the size of the image and `UpBlock`s that decrease the number of channels
while increasing the size. The noise rate is encoded using sinusoidal embedding.
+ id: totrans-271
prefs: []
type: TYPE_NORMAL
+ zh: 逆扩散过程由一个U-Net参数化,试图在每个时间步预测噪声,给定在该步骤的噪声图像和噪声率。 U-Net由`DownBlock`组成,它们增加通道数同时减小图像的大小,以及`UpBlock`,它们减少通道数同时增加大小。
+ 噪声率使用正弦嵌入进行编码。
- en: Sampling from the diffusion model is conducted over a series of steps. The U-Net
is used to predict the noise added to a given noised image, which is then used
to calculate an estimate for the original image. The predicted noise is then reapplied
@@ -1380,46 +2127,69 @@
may be significantly smaller than the number of steps used during training), starting
from a random point sampled from a standard Gaussian noise distribution, to obtain
the final generation.
+ id: totrans-272
prefs: []
type: TYPE_NORMAL
+ zh: 从扩散模型中进行采样是在一系列步骤中进行的。 使用U-Net来预测添加到给定噪声图像的噪声,然后用于计算原始图像的估计。 然后使用较小的噪声率重新应用预测的噪声。
+ 从标准高斯噪声分布中随机抽取的随机点开始,重复这个过程一系列步骤(可能明显小于训练过程中使用的步骤数),以获得最终生成。
- en: We saw how increasing the number of diffusion steps in the reverse process improves
the image generation quality, at the expense of speed. We also performed latent
space arithmetic in order to interpolate between two images.
+ id: totrans-273
prefs: []
type: TYPE_NORMAL
+ zh: 我们看到,在逆过程中增加扩散步骤的数量会提高图像生成质量,但会降低速度。 我们还执行了潜在空间算术,以在两个图像之间插值。
- en: ^([1](ch08.xhtml#idm45387010500320-marker)) Jascha Sohl-Dickstein et al., “Deep
Unsupervised Learning Using Nonequilibrium Thermodynamics,” March 12, 2015, [*https://arxiv.org/abs/1503.03585*](https://arxiv.org/abs/1503.03585)
+ id: totrans-274
prefs: []
type: TYPE_NORMAL
+ zh: ^([1](ch08.xhtml#idm45387010500320-marker)) Jascha Sohl-Dickstein等,“使用非平衡热力学进行深度无监督学习”,2015年3月12日,[*https://arxiv.org/abs/1503.03585*](https://arxiv.org/abs/1503.03585)
- en: ^([2](ch08.xhtml#idm45387010496240-marker)) Yang Song and Stefano Ermon, “Generative
Modeling by Estimating Gradients of the Data Distribution,” July 12, 2019, [*https://arxiv.org/abs/1907.05600*](https://arxiv.org/abs/1907.05600).
+ id: totrans-275
prefs: []
type: TYPE_NORMAL
+ zh: ^([2](ch08.xhtml#idm45387010496240-marker)) 杨松和Stefano Ermon,“通过估计数据分布的梯度进行生成建模”,2019年7月12日,[*https://arxiv.org/abs/1907.05600*](https://arxiv.org/abs/1907.05600)。
- en: ^([3](ch08.xhtml#idm45387010494000-marker)) Yang Song and Stefano Ermon, “Improved
Techniques for Training Score-Based Generative Models,” June 16, 2020, [*https://arxiv.org/abs/2006.09011*](https://arxiv.org/abs/2006.09011).
+ id: totrans-276
prefs: []
type: TYPE_NORMAL
+ zh: ^([3](ch08.xhtml#idm45387010494000-marker)) 杨松和Stefano Ermon,“改进训练基于分数的生成模型的技术”,2020年6月16日,[*https://arxiv.org/abs/2006.09011*](https://arxiv.org/abs/2006.09011)。
- en: ^([4](ch08.xhtml#idm45387010490880-marker)) Jonathon Ho et al., “Denoising Diffusion
Probabilistic Models,” June 19, 2020, [*https://arxiv.org/abs/2006.11239*](https://arxiv.org/abs/2006.11239).
+ id: totrans-277
prefs: []
type: TYPE_NORMAL
+ zh: ^([4](ch08.xhtml#idm45387010490880-marker)) Jonathon Ho等,“去噪扩散概率模型”,2020年6月19日,[*https://arxiv.org/abs/2006.11239*](https://arxiv.org/abs/2006.11239)。
- en: ^([5](ch08.xhtml#idm45387010764208-marker)) Alex Nichol and Prafulla Dhariwal,
“Improved Denoising Diffusion Probabilistic Models,” February 18, 2021, [*https://arxiv.org/abs/2102.09672*](https://arxiv.org/abs/2102.09672).
+ id: totrans-278
prefs: []
type: TYPE_NORMAL
+ zh: ^([5](ch08.xhtml#idm45387010764208-marker)) Alex Nichol和Prafulla Dhariwal,“改进去噪扩散概率模型”,2021年2月18日,[*https://arxiv.org/abs/2102.09672*](https://arxiv.org/abs/2102.09672)。
- en: ^([6](ch08.xhtml#idm45387008220416-marker)) Ashish Vaswani et al., “Attention
Is All You Need,” June 12, 2017, [*https://arxiv.org/abs/1706.03762*](https://arxiv.org/abs/1706.03762).
+ id: totrans-279
prefs: []
type: TYPE_NORMAL
+ zh: ^([6](ch08.xhtml#idm45387008220416-marker)) Ashish Vaswani等,“注意力就是一切”,2017年6月12日,[*https://arxiv.org/abs/1706.03762*](https://arxiv.org/abs/1706.03762)。
- en: '^([7](ch08.xhtml#idm45387008216736-marker)) Ben Mildenhall et al., “NeRF: Representing
Scenes as Neural Radiance Fields for View Synthesis,” March 1, 2020, [*https://arxiv.org/abs/2003.08934*](https://arxiv.org/abs/2003.08934).'
+ id: totrans-280
prefs: []
type: TYPE_NORMAL
+ zh: ^([7](ch08.xhtml#idm45387008216736-marker)) Ben Mildenhall等,“NeRF:将场景表示为神经辐射场进行视图合成”,2020年3月1日,[*https://arxiv.org/abs/2003.08934*](https://arxiv.org/abs/2003.08934)。
- en: ^([8](ch08.xhtml#idm45387008052288-marker)) Kaiming He et al., “Deep Residual
Learning for Image Recognition,” December 10, 2015, [*https://arxiv.org/abs/1512.03385*](https://arxiv.org/abs/1512.03385).
+ id: totrans-281
prefs: []
type: TYPE_NORMAL
+ zh: ^([8](ch08.xhtml#idm45387008052288-marker)) Kaiming He等,“用于图像识别的深度残差学习”,2015年12月10日,[*https://arxiv.org/abs/1512.03385*](https://arxiv.org/abs/1512.03385)。
- en: ^([9](ch08.xhtml#idm45387007342688-marker)) Jiaming Song et al., “Denoising
Diffusion Implicit Models,” October 6, 2020, [*https://arxiv.org/abs/2010.02502*](https://arxiv.org/abs/2010.02502)`
+ id: totrans-282
prefs: []
type: TYPE_NORMAL
+ zh: ^([9](ch08.xhtml#idm45387007342688-marker)) 宋嘉明等,“去噪扩散隐式模型”,2020年10月6日,[*https://arxiv.org/abs/2010.02502*](https://arxiv.org/abs/2010.02502)`