From 444ef71c5158bf483e100d08c6cdd4a3779470af Mon Sep 17 00:00:00 2001 From: wizardforcel <562826179@qq.com> Date: Thu, 8 Feb 2024 18:58:20 +0800 Subject: [PATCH] 2024-02-08 18:58:18 --- totrans/gen-dl_04.yaml | 687 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 687 insertions(+) diff --git a/totrans/gen-dl_04.yaml b/totrans/gen-dl_04.yaml index 408551e..4c305d8 100644 --- a/totrans/gen-dl_04.yaml +++ b/totrans/gen-dl_04.yaml @@ -1,13 +1,16 @@ - en: Chapter 2\. Deep Learning + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL - en: 'Let’s start with a basic definition of deep learning:' + id: totrans-1 prefs: [] type: TYPE_NORMAL - en: Deep learning is a class of machine learning algorithms that uses *multiple stacked layers of processing units* to learn high-level representations from *unstructured* data. + id: totrans-2 prefs: - PREF_BQ type: TYPE_NORMAL @@ -17,9 +20,11 @@ building multiple stacked layers of processing units to solve classification tasks. This will provide the foundation for future chapters where we focus on deep learning for generative tasks. + id: totrans-3 prefs: [] type: TYPE_NORMAL - en: Data for Deep Learning + id: totrans-4 prefs: - PREF_H1 type: TYPE_NORMAL @@ -32,6 +37,7 @@ to predict the binary response variable—did the person subscribe (1) or not (0)? Here, each individual feature contains a nugget of information about the observation, and the model would learn how these features interact to influence the response. + id: totrans-5 prefs: [] type: TYPE_NORMAL - en: '*Unstructured* data refers to any data that is not naturally arranged into @@ -39,12 +45,15 @@ structure to an image, temporal structure to a recording or passage of text, and both spatial and temporal structure to video data, but since the data does not arrive in columns of features, it is considered unstructured, as shown in [Figure 2-1](#structured_unstructured).' + id: totrans-6 prefs: [] type: TYPE_NORMAL - en: '![](Images/gdl2_0201.png)' + id: totrans-7 prefs: [] type: TYPE_IMG - en: Figure 2-1\. The difference between structured and unstructured data + id: totrans-8 prefs: - PREF_H6 type: TYPE_NORMAL @@ -53,6 +62,7 @@ is a muddy shade of brown doesn’t really help identify if the image is of a house or a dog, and knowing that character 24 of a sentence is an *e* doesn’t help predict if the text is about football or politics. + id: totrans-9 prefs: [] type: TYPE_NORMAL - en: Pixels or characters are really just the dimples of the canvas into which higher-level @@ -64,6 +74,7 @@ positions would provide this information. The granularity of the data combined with the high degree of spatial dependence destroys the concept of the pixel or character as an informative feature in its own right. + id: totrans-10 prefs: [] type: TYPE_NORMAL - en: For this reason, if we train logistic regression, random forest, or XGBoost @@ -72,6 +83,7 @@ to be informative and not spatially dependent. A deep learning model, on the other hand, can learn how to build high-level informative features by itself, directly from the unstructured data. + id: totrans-11 prefs: [] type: TYPE_NORMAL - en: Deep learning can be applied to structured data, but its real power, especially @@ -79,9 +91,11 @@ data. Most often, we want to generate unstructured data such as new images or original strings of text, which is why deep learning has had such a profound impact on the field of generative modeling. 
+ id: totrans-12 prefs: [] type: TYPE_NORMAL - en: Deep Neural Networks + id: totrans-13 prefs: - PREF_H1 type: TYPE_NORMAL @@ -90,49 +104,67 @@ this reason, *deep learning* has now almost become synonymous with *deep neural networks*. However, any system that employs many layers to learn high-level representations of the input data is also a form of deep learning (e.g., deep belief networks). + id: totrans-14 prefs: [] type: TYPE_NORMAL + zh: 大多数深度学习系统是*人工神经网络*(ANNs,或简称*神经网络*)具有多个堆叠的隐藏层。因此,*深度学习*现在几乎已经成为*深度神经网络*的同义词。然而,任何使用多层学习输入数据的高级表示的系统也是一种深度学习形式(例如,深度信念网络)。 - en: Let’s start by breaking down exactly what we mean by a neural network and then see how they can be used to learn high-level features from unstructured data. + id: totrans-15 prefs: [] type: TYPE_NORMAL + zh: 让我们首先详细解释一下神经网络的含义,然后看看它们如何用于从非结构化数据中学习高级特征。 - en: What Is a Neural Network? + id: totrans-16 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 什么是神经网络? - en: A neural network consists of a series of stacked *layers*. Each layer contains *units* that are connected to the previous layer’s units through a set of *weights*. As we shall see, there are many different types of layers, but one of the most common is the *fully connected* (or *dense*) layer that connects all units in the layer directly to every unit in the previous layer. + id: totrans-17 prefs: [] type: TYPE_NORMAL + zh: 神经网络由一系列堆叠的*层*组成。每一层包含通过一组*权重*连接到前一层单元的*单元*。正如我们将看到的,有许多不同类型的层,但其中最常见的是*全连接*(或*密集*)层,它将该层中的所有单元直接连接到前一层的每个单元。 - en: Neural networks where all adjacent layers are fully connected are called *multilayer perceptrons* (MLPs). This is the first type of neural network that we will study. An example of an MLP is shown in [Figure 2-2](#deep_learning_diagram). + id: totrans-18 prefs: [] type: TYPE_NORMAL + zh: 所有相邻层都是全连接的神经网络称为*多层感知器*(MLPs)。这是我们将要学习的第一种神经网络。[图2-2](#deep_learning_diagram)中显示了一个MLP的示例。 - en: '![](Images/gdl2_0202.png)' + id: totrans-19 prefs: [] type: TYPE_IMG + zh: '![](Images/gdl2_0202.png)' - en: Figure 2-2\. An example of a multilayer perceptron that predicts if a face is smiling + id: totrans-20 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图2-2。一个预测脸部是否微笑的多层感知器的示例 - en: The input (e.g., an image) is transformed by each layer in turn, in what is known as a *forward pass* through the network, until it reaches the output layer. Specifically, each unit applies a nonlinear transformation to a weighted sum of its inputs and passes the output through to the subsequent layer. The final output layer is the culmination of this process, where the single unit outputs a probability that the original input belongs to a particular category (e.g., *smiling*). + id: totrans-21 prefs: [] type: TYPE_NORMAL + zh: 输入(例如,一张图像)依次通过网络中的每一层进行转换,直到达到输出层,这被称为网络的*前向传递*。具体来说,每个单元对其输入的加权和应用非线性变换,并将输出传递到后续层。最终的输出层是这个过程的结尾,单个单元输出一个概率,表明原始输入属于特定类别(例如,*微笑*)。 - en: The magic of deep neural networks lies in finding the set of weights for each layer that results in the most accurate predictions. The process of finding these weights is what we mean by *training* the network. + id: totrans-22 prefs: [] type: TYPE_NORMAL + zh: 深度神经网络的魔力在于找到每一层的权重集,以获得最准确的预测。找到这些权重的过程就是我们所说的*训练*网络。 - en: During the training process, batches of images are passed through the network and the predicted outputs are compared to the ground truth. For example, the network might output a probability of 80% for an image of someone who really is smiling @@ -143,184 +175,258 @@ the prediction most significantly. This process is appropriately called *backpropagation*. 
Gradually, each unit becomes skilled at identifying a particular feature that ultimately helps the network to make better predictions. + id: totrans-23 prefs: [] type: TYPE_NORMAL + zh: 在训练过程中,一批图像通过网络传递,并将预测输出与真实值进行比较。例如,网络可能为一个真正微笑的人的图像输出80%的概率,为一个真正不微笑的人的图像输出23%的概率。对于这些示例,完美的预测将输出100%和0%,因此存在一定的误差。然后,预测中的误差通过网络向后传播,调整每组权重,使其朝着最显著改善预测的方向微调。这个过程被适当地称为*反向传播*。逐渐地,每个单元变得擅长识别一个特定的特征,最终帮助网络做出更好的预测。 - en: Learning High-Level Features + id: totrans-24 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 学习高级特征 - en: The critical property that makes neural networks so powerful is their ability to learn features from the input data, without human guidance. In other words, we do not need to do any feature engineering, which is why neural networks are so useful! We can let the model decide how it wants to arrange its weights, guided only by its desire to minimize the error in its predictions. + id: totrans-25 prefs: [] type: TYPE_NORMAL + zh: 使神经网络如此强大的关键属性是它们能够从输入数据中学习特征,而无需人类指导。换句话说,我们不需要进行任何特征工程,这就是为什么神经网络如此有用!我们可以让模型决定如何安排其权重,只受其希望最小化预测误差的影响。 - en: 'For example, let’s walk through the network shown in [Figure 2-2](#deep_learning_diagram), assuming it has already been trained to accurately predict if a given input face is smiling:' + id: totrans-26 prefs: [] type: TYPE_NORMAL + zh: 例如,让我们来解释一下[图2-2](#deep_learning_diagram)中所示的网络,假设它已经被训练得可以准确预测给定输入脸部是否微笑: - en: Unit A receives the value for an individual channel of an input pixel. + id: totrans-27 prefs: - PREF_OL type: TYPE_NORMAL + zh: 单元A接收输入像素的单个通道的值。 - en: Unit B combines its input values so that it fires strongest when a particular low-level feature such as an edge is present. + id: totrans-28 prefs: - PREF_OL type: TYPE_NORMAL + zh: 单元B组合其输入值,使得当存在特定的低级特征,例如边缘时,它发射最强。 - en: Unit C combines the low-level features so that it fires strongest when a higher-level feature such as *teeth* are seen in the image. + id: totrans-29 prefs: - PREF_OL type: TYPE_NORMAL + zh: 单元C组合低级特征,使得当图像中看到高级特征,例如*牙齿*时,它发射最强。 - en: Unit D combines the high-level features so that it fires strongest when the person in the original image is smiling. + id: totrans-30 prefs: - PREF_OL type: TYPE_NORMAL + zh: 单元D结合高级特征,使得当原始图像中的人在微笑时它发射最强。 - en: Units in each subsequent layer are able to represent increasingly sophisticated aspects of the original input, by combining lower-level features from the previous layer. Amazingly, this arises naturally out of the training process—we do not need to *tell* each unit what to look for, or whether it should look for high-level features or low-level features. + id: totrans-31 prefs: [] type: TYPE_NORMAL + zh: 每个后续层中的单元能够通过结合来自前一层的低级特征来表示原始输入的越来越复杂的方面。令人惊讶的是,这是训练过程中自然产生的——我们不需要*告诉*每个单元要寻找什么,或者它应该寻找高级特征还是低级特征。 - en: The layers between the input and output layers are called *hidden* layers. While our example only has two hidden layers, deep neural networks can have many more. Stacking large numbers of layers allows the neural network to learn progressively higher-level features by gradually building up information from the lower-level features in previous layers. For example, ResNet,^([1](ch02.xhtml#idm45387028957520)) designed for image recognition, contains 152 layers. + id: totrans-32 prefs: [] type: TYPE_NORMAL + zh: 输入层和输出层之间的层被称为*隐藏*层。虽然我们的例子只有两个隐藏层,但深度神经网络可以有更多层。堆叠大量层允许神经网络逐渐构建信息,从先前层中的低级特征逐渐构建出更高级别的特征。例如,用于图像识别的ResNet包含152层。 - en: Next, we’ll dive straight into the practical side of deep learning and get set up with TensorFlow and Keras so that you can start building your own deep neural networks. 
+ id: totrans-33 prefs: [] type: TYPE_NORMAL + zh: 接下来,我们将直接深入深度学习的实践方面,并使用TensorFlow和Keras进行设置,以便您可以开始构建自己的深度神经网络。 - en: TensorFlow and Keras + id: totrans-34 prefs: - PREF_H2 type: TYPE_NORMAL + zh: TensorFlow和Keras - en: '[*TensorFlow*](https://www.tensorflow.org) is an open source Python library for machine learning, developed by Google. TensorFlow is one of the most utilized frameworks for building machine learning solutions, with particular emphasis on the manipulation of tensors (hence the name). It provides the low-level functionality required to train neural networks, such as computing the gradient of arbitrary differentiable expressions and efficiently executing tensor operations.' + id: totrans-35 prefs: [] type: TYPE_NORMAL + zh: '[*TensorFlow*](https://www.tensorflow.org)是由谷歌开发的用于机器学习的开源Python库。TensorFlow是构建机器学习解决方案中最常用的框架之一,特别强调张量的操作(因此得名)。它提供了训练神经网络所需的低级功能,例如计算任意可微表达式的梯度和高效执行张量操作。' - en: '[*Keras*](https://keras.io) is a high-level API for building neural networks, built on top of TensorFlow ([Figure 2-3](#tf_keras_logos)). It is extremely flexible and very user-friendly, making it an ideal choice for getting started with deep learning. Moreover, Keras provides numerous useful building blocks that can be plugged together to create highly complex deep learning architectures through its functional API.' + id: totrans-36 prefs: [] type: TYPE_NORMAL + zh: '[*Keras*](https://keras.io)是一个用于构建神经网络的高级API,构建在TensorFlow之上([图2-3](#tf_keras_logos))。它非常灵活和用户友好,是开始深度学习的理想选择。此外,Keras提供了许多有用的构建模块,可以通过其功能API组合在一起,创建高度复杂的深度学习架构。' - en: '![](Images/gdl2_0203.png)' + id: totrans-37 prefs: [] type: TYPE_IMG + zh: '![](Images/gdl2_0203.png)' - en: Figure 2-3\. TensorFlow and Keras are excellent tools for building deep learning solutions + id: totrans-38 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图2-3\. TensorFlow和Keras是构建深度学习解决方案的优秀工具 - en: If you are just getting started with deep learning, I can highly recommend using TensorFlow and Keras. This setup will allow you to build any network that you can think of in a production environment, while also giving you an easy-to-learn API that enables rapid development of new ideas and concepts. Let’s start by seeing how easy it is to build a multilayer perceptron using Keras. + id: totrans-39 prefs: [] type: TYPE_NORMAL + zh: 如果您刚开始学习深度学习,我强烈推荐使用TensorFlow和Keras。这个设置将允许您在生产环境中构建任何您能想到的网络,同时还提供易于学习的API,可以快速开发新的想法和概念。让我们从看看使用Keras构建多层感知器有多容易开始。 - en: Multilayer Perceptron (MLP) + id: totrans-40 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 多层感知器(MLP) - en: In this section, we will train an MLP to classify a given image using *supervised learning*. Supervised learning is a type of machine learning algorithm in which the computer is trained on a labeled dataset. In other words, the dataset used for training includes input data with corresponding output labels. The goal of the algorithm is to learn a mapping between the input data and the output labels, so that it can make predictions on new, unseen data. + id: totrans-41 prefs: [] type: TYPE_NORMAL + zh: 在本节中,我们将使用*监督学习*训练一个MLP来对给定的图像进行分类。监督学习是一种机器学习算法,计算机在标记的数据集上进行训练。换句话说,用于训练的数据集包括带有相应输出标签的输入数据。算法的目标是学习输入数据和输出标签之间的映射,以便它可以对新的、未见过的数据进行预测。 - en: The MLP is a discriminative (rather than generative) model, but supervised learning will still play a role in many types of generative models that we will explore in later chapters of this book, so it is a good place to start our journey. 
+ id: totrans-42
  prefs: []
  type: TYPE_NORMAL
+ zh: MLP是一种判别模型(而不是生成模型),但在本书后面的章节中,监督学习仍将在许多类型的生成模型中发挥作用,因此这是我们旅程的一个好起点。
- en: Running the Code for This Example
+ id: totrans-43
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: 运行此示例的代码
- en: The code for this example can be found in the Jupyter notebook located at *notebooks/02_deeplearning/01_mlp/mlp.ipynb* in the book repository.
+ id: totrans-44
  prefs: []
  type: TYPE_NORMAL
+ zh: 这个例子的代码可以在位于书籍存储库中的Jupyter笔记本中找到,位置为*notebooks/02_deeplearning/01_mlp/mlp.ipynb*。
- en: Preparing the Data
+ id: totrans-45
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 准备数据
- en: For this example we will be using the [CIFAR-10](https://oreil.ly/cNbFG) dataset, a collection of 60,000 32 × 32–pixel color images that comes bundled with Keras out of the box. Each image is classified into exactly one of 10 classes, as shown in [Figure 2-4](#cifar).
+ id: totrans-46
  prefs: []
  type: TYPE_NORMAL
+ zh: 在这个例子中,我们将使用[CIFAR-10](https://oreil.ly/cNbFG)数据集,这是一个包含60,000个32×32像素彩色图像的集合,与Keras捆绑在一起。每个图像被分类为10个类别中的一个,如[图2-4](#cifar)所示。
- en: '![](Images/gdl2_0204.png)'
+ id: totrans-47
  prefs: []
  type: TYPE_IMG
+ zh: '![](Images/gdl2_0204.png)'
- en: 'Figure 2-4\. Example images from the CIFAR-10 dataset (source: [Krizhevsky, 2009](https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf))^([2](ch02.xhtml#idm45387033163216))'
+ id: totrans-48
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 图2-4\. CIFAR-10数据集中的示例图像(来源:[Krizhevsky, 2009](https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf))^([2](ch02.xhtml#idm45387033163216))
- en: By default, the image data consists of integers between 0 and 255 for each pixel channel. We first need to preprocess the images by scaling these values to lie between 0 and 1, as neural networks work best when the absolute value of each input is less than 1.
+ id: totrans-49
  prefs: []
  type: TYPE_NORMAL
+ zh: 默认情况下,图像数据由每个像素通道的0到255之间的整数组成。我们首先需要通过将这些值缩放到0到1之间来预处理图像,因为当每个输入的绝对值小于1时,神经网络的效果最好。
- en: We also need to change the integer labeling of the images to one-hot encoded vectors, because the neural network output will be a probability that the image belongs to each class. If the class integer label of an image is *i*, then its one-hot encoding is a vector of length 10 (the number of classes) that has 0s in all but the *i*th element, which is 1\. These steps are shown in [Example 2-1](#preprocessing-cifar-10).
+ id: totrans-50
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们还需要将图像的整数标签更改为独热编码向量,因为神经网络的输出将是图像属于每个类的概率。如果图像的类整数标签是*i*,那么它的独热编码是一个长度为10的向量(类的数量),除了第*i*个元素为1之外,其他元素都为0。这些步骤在[示例2-1](#preprocessing-cifar-10)中显示。
- en: Example 2-1\. Preprocessing the CIFAR-10 dataset
+ id: totrans-51
  prefs:
  - PREF_H5
  type: TYPE_NORMAL
+ zh: 示例2-1。预处理CIFAR-10数据集
- en: '[PRE0]'
+ id: totrans-52
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE0]'
- en: '[![1](Images/1.png)](#co_deep_learning_CO1-1)'
+ id: totrans-53
  prefs: []
  type: TYPE_NORMAL
+ zh: '[![1](Images/1.png)](#co_deep_learning_CO1-1)'
- en: Load the CIFAR-10 dataset. `x_train` and `x_test` are `numpy` arrays of shape `[50000, 32, 32, 3]` and `[10000, 32, 32, 3]`, respectively. `y_train` and `y_test` are `numpy` arrays of shape `[50000, 1]` and `[10000, 1]`, respectively, containing the integer labels in the range 0 to 9 for the class of each image.
+ id: totrans-54 prefs: [] type: TYPE_NORMAL + zh: 加载CIFAR-10数据集。`x_train`和`x_test`分别是形状为`[50000, 32, 32, 3]`和`[10000, 32, 32, + 3]`的`numpy`数组。`y_train`和`y_test`分别是形状为`[50000, 1]`和`[10000, 1]`的`numpy`数组,包含每个图像类的范围为0到9的整数标签。 - en: '[![2](Images/2.png)](#co_deep_learning_CO1-2)' + id: totrans-55 prefs: [] type: TYPE_NORMAL + zh: '[![2](Images/2.png)](#co_deep_learning_CO1-2)' - en: Scale each image so that the pixel channel values lie between 0 and 1. + id: totrans-56 prefs: [] type: TYPE_NORMAL + zh: 缩放每个图像,使像素通道值介于0和1之间。 - en: '[![3](Images/3.png)](#co_deep_learning_CO1-3)' + id: totrans-57 prefs: [] type: TYPE_NORMAL + zh: '[![3](Images/3.png)](#co_deep_learning_CO1-3)' - en: One-hot encode the labels—the new shapes of `y_train` and `y_test` are `[50000, 10]` and `[10000, 10]`, respectively. + id: totrans-58 prefs: [] type: TYPE_NORMAL + zh: 对标签进行独热编码——`y_train`和`y_test`的新形状分别为`[50000, 10]`和`[10000, 10]`。 - en: We can see that the training image data (`x_train`) is stored in a *tensor* of shape `[50000, 32, 32, 3]`. There are no *columns* or *rows* in this dataset; instead, this is a tensor with four dimensions. A tensor is just a multidimensional @@ -328,108 +434,154 @@ first dimension of this tensor references the index of the image in the dataset, the second and third relate to the size of the image, and the last is the channel (i.e., red, green, or blue, since these are RGB images). + id: totrans-59 prefs: [] type: TYPE_NORMAL + zh: 我们可以看到训练图像数据(`x_train`)存储在形状为`[50000, 32, 32, 3]`的*张量*中。在这个数据集中没有*列*或*行*;相反,这是一个具有四个维度的张量。张量只是一个多维数组——它是矩阵向超过两个维度的自然扩展。这个张量的第一个维度引用数据集中图像的索引,第二和第三个维度与图像的大小有关,最后一个是通道(即红色、绿色或蓝色,因为这些是RGB图像)。 - en: For example, [Example 2-2](#pixel-value) shows how we can find the channel value of a specific pixel in an image. + id: totrans-60 prefs: [] type: TYPE_NORMAL + zh: 例如,[示例2-2](#pixel-value)展示了如何找到图像中特定像素的通道值。 - en: Example 2-2\. The green channel (1) value of the pixel in the (12,13) position of image 54 + id: totrans-61 prefs: - PREF_H5 type: TYPE_NORMAL + zh: 示例2-2。图像54中位置为(12,13)的像素的绿色通道(1)值 - en: '[PRE1]' + id: totrans-62 prefs: [] type: TYPE_PRE + zh: '[PRE1]' - en: Building the Model + id: totrans-63 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 构建模型 - en: In Keras you can either define the structure of a neural network as a `Sequential` model or using the functional API. + id: totrans-64 prefs: [] type: TYPE_NORMAL + zh: 在Keras中,您可以将神经网络的结构定义为`Sequential`模型或使用功能API。 - en: A `Sequential` model is useful for quickly defining a linear stack of layers (i.e., where one layer follows on directly from the previous layer without any branching). We can define our MLP model using the `Sequential` class as shown in [Example 2-3](#sequential_functional). + id: totrans-65 prefs: [] type: TYPE_NORMAL + zh: '`Sequential`模型适用于快速定义一系列层的线性堆叠(即一个层直接跟在前一个层后面,没有任何分支)。我们可以使用`Sequential`类来定义我们的MLP模型,如[示例2-3](#sequential_functional)所示。' - en: Example 2-3\. Building our MLP using a `Sequential` model + id: totrans-66 prefs: - PREF_H5 type: TYPE_NORMAL + zh: 示例2-3。使用`Sequential`模型构建我们的MLP - en: '[PRE2]' + id: totrans-67 prefs: [] type: TYPE_PRE + zh: '[PRE2]' - en: Many of the models in this book require that the output from a layer is passed to multiple subsequent layers, or conversely, that a layer receives input from multiple preceding layers. For these models, the `Sequential` class is not suitable and we would need to use the functional API instead, which is a lot more flexible. 
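+ id: totrans-68
  prefs: []
  type: TYPE_NORMAL
+ zh: 本书中的许多模型要求从一层输出传递到多个后续层,或者反过来,一层接收来自多个前面层的输入。对于这些模型,`Sequential`类不适用,我们需要使用功能API,这样更加灵活。
- en: 'As a quick illustration of that flexibility, the following sketch (an illustrative example, not code from the book repository) uses the functional API to build a toy branched network that a `Sequential` model cannot express: one input feeds two parallel `Dense` branches whose outputs are concatenated before the output layer.'
  prefs: []
  type: TYPE_NORMAL
- en: |-
    from tensorflow.keras import layers, models

    # One input tensor feeds two parallel Dense branches...
    inputs = layers.Input(shape=(32, 32, 3))
    x = layers.Flatten()(inputs)
    branch_a = layers.Dense(64, activation="relu")(x)
    branch_b = layers.Dense(64, activation="relu")(x)

    # ...and the branch outputs are merged before the output layer --
    # a branching structure that cannot be written as a linear stack.
    merged = layers.Concatenate()([branch_a, branch_b])
    outputs = layers.Dense(10, activation="softmax")(merged)
    model = models.Model(inputs, outputs)
  prefs: []
  type: TYPE_PRE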
- en: Tip
+ id: totrans-69
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 提示
- en: I recommend that even if you are just starting out building linear models with Keras, you still use the functional API rather than `Sequential` models, since it will serve you better in the long run as your neural networks become more architecturally complex. The functional API will give you complete freedom over the design of your deep neural network.
+ id: totrans-70
  prefs: []
  type: TYPE_NORMAL
+ zh: 我建议即使您刚开始使用Keras构建线性模型,也应该使用功能API而不是`Sequential`模型,因为随着您的神经网络变得更加复杂,功能API将在长远中为您提供更好的服务。功能API将为您提供对深度神经网络设计的完全自由。
- en: '[Example 2-4](#sequential_functional-2) shows the same MLP coded using the functional API. When using the functional API, we use the `Model` class to define the overall input and output layers of the model.'
+ id: totrans-71
  prefs: []
  type: TYPE_NORMAL
+ zh: '[示例2-4](#sequential_functional-2)展示了使用功能API编码的相同MLP。在使用功能API时,我们使用`Model`类来定义模型的整体输入和输出层。'
- en: Example 2-4\. Building our MLP using the functional API
+ id: totrans-72
  prefs:
  - PREF_H5
  type: TYPE_NORMAL
+ zh: 示例2-4。使用功能API构建我们的MLP
- en: '[PRE3]'
+ id: totrans-73
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE3]'
- en: Both methods give identical models—a diagram of the architecture is shown in [Figure 2-5](#cifar_nn).
+ id: totrans-74
  prefs: []
  type: TYPE_NORMAL
+ zh: 这两种方法提供相同的模型——架构的图表显示在[图2-5](#cifar_nn)中。
- en: '![](Images/gdl2_0205.png)'
+ id: totrans-75
  prefs: []
  type: TYPE_IMG
+ zh: '![](Images/gdl2_0205.png)'
- en: Figure 2-5\. A diagram of the MLP architecture
+ id: totrans-76
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 图2-5。MLP架构的图表
- en: Let’s now look in more detail at the different layers and activation functions used within the MLP.
+ id: totrans-77
  prefs: []
  type: TYPE_NORMAL
+ zh: 现在让我们更详细地看一下MLP中使用的不同层和激活函数。
- en: Layers
+ id: totrans-78
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 层
- en: 'To build our MLP, we used three different types of layers: `Input`, `Flatten`, and `Dense`.'
+ id: totrans-79
  prefs: []
  type: TYPE_NORMAL
+ zh: 为构建我们的MLP,我们使用了三种不同类型的层:`Input`、`Flatten`和`Dense`。
- en: The `Input` layer is an entry point into the network. We tell the network the shape of each data element to expect as a tuple. Notice that we do not specify the batch size; this isn’t necessary as we can pass any number of images into the `Input` layer simultaneously.
+ id: totrans-80
  prefs: []
  type: TYPE_NORMAL
+ zh: '`Input`层是网络的入口点。我们告诉网络每个数据元素的形状应该是一个元组。请注意,我们不指定批量大小;这是不必要的,因为我们可以同时将任意数量的图像传递到`Input`层中。'
- en: Next we flatten this input into a vector, using a `Flatten` layer. This results in a vector of length 3,072 (= 32 × 32 × 3). We do this because the subsequent `Dense` layer requires its input to be flat, rather than a multidimensional array. As we shall see later, other layer types require multidimensional arrays as input, so you need to be aware of the required input and output shape of each layer type to understand when it is necessary to use `Flatten`.
+ id: totrans-81
  prefs: []
  type: TYPE_NORMAL
+ zh: 接下来,我们将这个输入展平成一个向量,使用`Flatten`层。这将导致一个长度为3072的向量(= 32 × 32 × 3)。我们这样做是因为后续的`Dense`层要求其输入是平坦的,而不是多维数组。正如我们将在后面看到的,其他类型的层需要多维数组作为输入,因此您需要了解每种层类型所需的输入和输出形状,以便了解何时需要使用`Flatten`。
- en: The `Dense` layer is one of the most fundamental building blocks of a neural network.
    It contains a given number of units that are densely connected to the previous layer—that is, every unit in the layer is connected to every unit in the previous layer through a single connection that carries a weight (which can be positive or negative). The output from a given unit is the weighted sum of the input it receives from the previous layer, which is then passed through a nonlinear *activation function* before being sent to the following layer. The activation function is critical to ensure the neural network is able to learn complex functions and doesn’t just output a linear combination of its inputs.
+ id: totrans-82
  prefs: []
  type: TYPE_NORMAL
+ zh: '`Dense`层是神经网络中最基本的构建块之一。它包含一定数量的单元,这些单元与前一层密切连接,也就是说,层中的每个单元都与前一层中的每个单元连接,通过一个携带权重的单一连接(可以是正数或负数)。给定单元的输出是它从前一层接收的输入的加权和,然后通过非线性*激活函数*传递到下一层。激活函数对于确保神经网络能够学习复杂函数并且不仅仅输出其输入的线性组合至关重要。'
- en: Activation functions
+ id: totrans-83
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 激活函数
- en: There are many kinds of activation functions, but three of the most important are ReLU, sigmoid, and softmax.
+ id: totrans-84
  prefs: []
  type: TYPE_NORMAL
+ zh: 有许多种激活函数,但其中最重要的三种是ReLU、sigmoid和softmax。
- en: 'The *ReLU* (rectified linear unit) activation function is defined to be 0 if the input is negative and is otherwise equal to the input. The *LeakyReLU* activation function is very similar to ReLU, with one key difference: whereas the ReLU activation function returns 0 for input values less than 0, the LeakyReLU function returns a small negative number proportional to the input. ReLU units can sometimes die if they always output 0, because of a large bias toward negative preactivation values. In this case, the gradient is 0, so no error is propagated back through this unit. LeakyReLU activations fix this issue by always ensuring the gradient is nonzero. ReLU-based functions are among the most reliable activations to use between the layers of a deep network to encourage stable training.'
+ id: totrans-85
  prefs: []
  type: TYPE_NORMAL
+ zh: '*ReLU*(修正线性单元)激活函数被定义为如果输入为负数则为0,否则等于输入。*LeakyReLU*激活函数与ReLU非常相似,但有一个关键区别:ReLU激活函数对于小于0的输入值返回0,而LeakyReLU函数返回与输入成比例的一个小负数。如果ReLU单元总是输出0,有时会出现死亡现象,因为存在对负值预激活的大偏差。在这种情况下,梯度为0,因此没有错误通过该单元向后传播。LeakyReLU激活通过始终确保梯度为非零来解决这个问题。基于ReLU的函数是在深度网络的层之间使用的最可靠的激活函数之一,以鼓励稳定的训练。'
- en: The *sigmoid* activation is useful if you wish the output from the layer to be scaled between 0 and 1—for example, for binary classification problems with one output unit or multilabel classification problems, where each observation can belong to more than one class. [Figure 2-6](#activations) shows ReLU, LeakyReLU, and sigmoid activation functions side by side for comparison.
+ id: totrans-86
  prefs: []
  type: TYPE_NORMAL
+ zh: 如果您希望从该层输出的结果在0和1之间缩放,那么*sigmoid*激活函数是有用的,例如,对于具有一个输出单元的二元分类问题或多标签分类问题,其中每个观察结果可以属于多个类。[图2-6](#activations)显示了ReLU、LeakyReLU和sigmoid激活函数并排进行比较。
- en: '![](Images/gdl2_0206.png)'
+ id: totrans-87
  prefs: []
  type: TYPE_IMG
+ zh: '![](Images/gdl2_0206.png)'
- en: Figure 2-6\. The ReLU, LeakyReLU, and sigmoid activation functions
+ id: totrans-88
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 图2-6。ReLU、LeakyReLU和sigmoid激活函数
- en: 'The *softmax* activation function is useful if you want the total sum of the output from the layer to equal 1; for example, for multiclass classification problems where each observation only belongs to exactly one class. It is defined as:'
+ id: totrans-89
  prefs: []
  type: TYPE_NORMAL
+ zh: 如果您希望从该层输出的总和等于1,则*softmax*激活函数是有用的;例如,对于每个观察结果只属于一个类的多类分类问题。它被定义为:
- en: $y_i = \frac{e^{x_i}}{\sum_{j=1}^{J} e^{x_j}}$
+ id: totrans-90
  prefs: []
  type: TYPE_NORMAL
+ zh: $y_i = \frac{e^{x_i}}{\sum_{j=1}^{J} e^{x_j}}$
- en: Here, *J* is the total number of units in the layer. In our neural network, we use a softmax activation in the final layer to ensure that the output is a set of 10 probabilities that sum to 1, which can be interpreted as the likelihood that the image belongs to each class.
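+ id: totrans-91
  prefs: []
  type: TYPE_NORMAL
+ zh: 在这里,*J*是层中单元的总数。在我们的神经网络中,我们在最后一层使用softmax激活,以确保输出是一组总和为1的10个概率,这可以被解释为图像属于每个类的可能性。
- en: 'To make the formula concrete, here is a small numeric check of the softmax definition above (an illustrative sketch, not code from the book repository):'
  prefs: []
  type: TYPE_NORMAL
- en: |-
    import numpy as np

    # Three raw unit outputs x_j, mapped through the softmax formula above:
    # each y_i is positive and the y_i sum to 1, so they can be read as
    # class probabilities.
    x = np.array([2.0, 1.0, 0.1])
    y = np.exp(x) / np.sum(np.exp(x))
    print(y)        # [0.659 0.242 0.099] (approximately)
    print(y.sum())  # 1.0
  prefs: []
  type: TYPE_PRE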
- en: In Keras, activation functions can be defined within a layer ([Example 2-5](#activation-function-together)) or as a separate layer ([Example 2-6](#activation-function-separate)).
+ id: totrans-92
  prefs: []
  type: TYPE_NORMAL
+ zh: 在Keras中,激活函数可以在层内定义([示例2-5](#activation-function-together))或作为单独的层定义([示例2-6](#activation-function-separate))。
- en: Example 2-5\. A ReLU activation function defined as part of a `Dense` layer
+ id: totrans-93
  prefs:
  - PREF_H5
  type: TYPE_NORMAL
+ zh: 示例2-5。作为`Dense`层的一部分定义的ReLU激活函数
- en: '[PRE4]'
+ id: totrans-94
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE4]'
- en: Example 2-6\. A ReLU activation function defined as its own layer
+ id: totrans-95
  prefs:
  - PREF_H5
  type: TYPE_NORMAL
+ zh: 示例2-6。作为自己的层定义的ReLU激活函数
- en: '[PRE5]'
+ id: totrans-96
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE5]'
- en: In our example, we pass the input through two `Dense` layers, the first with 200 units and the second with 150, both with ReLU activation functions.
+ id: totrans-97
  prefs: []
  type: TYPE_NORMAL
+ zh: 在我们的示例中,我们通过两个`Dense`层传递输入,第一个有200个单元,第二个有150个,两者都带有ReLU激活函数。
- en: Inspecting the model
+ id: totrans-98
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 检查模型
- en: We can use the `model.summary()` method to inspect the shape of the network at each layer, as shown in [Table 2-1](#first_nn_shape).
+ id: totrans-99
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们可以使用`model.summary()`方法来检查每一层网络的形状,如[表2-1](#first_nn_shape)所示。
- en: Table 2-1\. Output from the `model.summary()` method
+ id: totrans-100
  prefs: []
  type: TYPE_NORMAL
+ zh: 表2-1. `model.summary()`方法的输出
- en: '| Layer (type) | Output shape | Param # |'
+ id: totrans-101
  prefs: []
  type: TYPE_TB
+ zh: '| 层(类型) | 输出形状 | 参数 # |'
- en: '| --- | --- | --- |'
+ id: totrans-102
  prefs: []
  type: TYPE_TB
+ zh: '| --- | --- | --- |'
- en: '| InputLayer | (None, 32, 32, 3) | 0 |'
+ id: totrans-103
  prefs: []
  type: TYPE_TB
+ zh: '| InputLayer | (None, 32, 32, 3) | 0 |'
- en: '| Flatten | (None, 3072) | 0 |'
+ id: totrans-104
  prefs: []
  type: TYPE_TB
+ zh: '| 展平 | (None, 3072) | 0 |'
- en: '| Dense | (None, 200) | 614,600 |'
+ id: totrans-105
  prefs: []
  type: TYPE_TB
+ zh: '| Dense | (None, 200) | 614,600 |'
- en: '| Dense | (None, 150) | 30,150 |'
+ id: totrans-106
  prefs: []
  type: TYPE_TB
+ zh: '| Dense | (None, 150) | 30,150 |'
- en: '| Dense | (None, 10) | 1,510 |'
+ id: totrans-107
  prefs: []
  type: TYPE_TB
+ zh: '| Dense | (None, 10) | 1,510 |'
- en: '| Total params | 646,260 |'
+ id: totrans-108
  prefs: []
  type: TYPE_TB
+ zh: '| 总参数 | 646,260 |'
- en: '| Trainable params | 646,260 |'
+ id: totrans-109
  prefs: []
  type: TYPE_TB
+ zh: '| 可训练参数 | 646,260 |'
- en: '| Non-trainable params | 0 |'
+ id: totrans-110
  prefs: []
  type: TYPE_TB
+ zh: '| 不可训练参数 | 0 |'
- en: 'Notice how the shape of our `Input` layer matches the shape of `x_train` and the shape of our `Dense` output layer matches the shape of `y_train`. Keras uses `None` as a marker for the first dimension to show that it doesn’t yet know the number of observations that will be passed into the network. In fact, it doesn’t need to; we could just as happily pass 1 observation or 1,000 observations through the network at once. That’s because tensor operations are carried out across all observations simultaneously using linear algebra—this is the part handled by TensorFlow. It is also the reason why you get a performance increase when training deep neural networks on GPUs instead of CPUs: GPUs are optimized for large tensor operations since these calculations are also necessary for complex graphics manipulation.'
+ id: totrans-111 prefs: [] type: TYPE_NORMAL + zh: 注意我们的`Input`层的形状与`x_train`的形状匹配,而我们的`Dense`输出层的形状与`y_train`的形状匹配。Keras使用`None`作为第一维的标记,以显示它尚不知道将传递到网络中的观测数量。实际上,它不需要知道;我们可以一次通过1个观测或1000个观测通过网络。这是因为张量操作是使用线性代数同时在所有观测上进行的—这是由TensorFlow处理的部分。这也是为什么在GPU上训练深度神经网络而不是在CPU上时性能会提高的原因:GPU针对大型张量操作进行了优化,因为这些计算对于复杂的图形处理也是必要的。 - en: The `summary` method also gives the number of parameters (weights) that will be trained at each layer. If ever you find that your model is training too slowly, check the summary to see if there are any layers that contain a huge number of weights. If so, you should consider whether the number of units in the layer could be reduced to speed up training. + id: totrans-112 prefs: [] type: TYPE_NORMAL + zh: '`summary`方法还会给出每一层将被训练的参数(权重)的数量。如果你发现你的模型训练速度太慢,检查摘要看看是否有任何包含大量权重的层。如果有的话,你应该考虑是否可以减少该层中的单元数量以加快训练速度。' - en: Tip + id: totrans-113 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 提示 - en: Make sure you understand how the number of parameters is calculated in each layer! It’s important to remember that by default, each unit within a given layer is also connected to one additional *bias* unit that always outputs 1\. This ensures that the output from the unit can still be nonzero even when all inputs from the previous layer are 0. + id: totrans-114 prefs: [] type: TYPE_NORMAL + zh: 确保你理解每一层中参数是如何计算的!重要的是要记住,默认情况下,给定层中的每个单元也连接到一个额外的*偏置*单元,它总是输出1。这确保了即使来自前一层的所有输入为0,单元的输出仍然可以是非零的。 - en: Therefore, the number of parameters in the 200-unit `Dense` layer is 200 * (3,072 + 1) = 614,600. + id: totrans-115 prefs: [] type: TYPE_NORMAL + zh: 因此,200单元`Dense`层中的参数数量为200 * (3,072 + 1) = 614,600。 - en: Compiling the Model + id: totrans-116 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 编译模型 - en: In this step, we compile the model with an optimizer and a loss function, as shown in [Example 2-7](#optimizer-loss). + id: totrans-117 prefs: [] type: TYPE_NORMAL + zh: 在这一步中,我们使用一个优化器和一个损失函数来编译模型,如[示例2-7](#optimizer-loss)所示。 - en: Example 2-7\. Defining the optimizer and the loss function + id: totrans-118 prefs: - PREF_H5 type: TYPE_NORMAL + zh: 示例2-7. 定义优化器和损失函数 - en: '[PRE6]' + id: totrans-119 prefs: [] type: TYPE_PRE + zh: '[PRE6]' - en: Let’s now look in more detail at what we mean by loss functions and optimizers. + id: totrans-120 prefs: [] type: TYPE_NORMAL + zh: 现在让我们更详细地看一下我们所说的损失函数和优化器。 - en: Loss functions + id: totrans-121 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 损失函数 - en: The *loss function* is used by the neural network to compare its predicted output to the ground truth. It returns a single number for each observation; the greater this number, the worse the network has performed for this observation. + id: totrans-122 prefs: [] type: TYPE_NORMAL + zh: '*损失函数*被神经网络用来比较其预测输出与实际情况的差异。它为每个观测返回一个单一数字;这个数字越大,网络在这个观测中的表现就越差。' - en: Keras provides many built-in loss functions to choose from, or you can create your own. Three of the most commonly used are mean squared error, categorical cross-entropy, and binary cross-entropy. It is important to understand when it is appropriate to use each. + id: totrans-123 prefs: [] type: TYPE_NORMAL + zh: Keras提供了许多内置的损失函数可供选择,或者你可以创建自己的损失函数。最常用的三个是均方误差、分类交叉熵和二元交叉熵。重要的是要理解何时适合使用每种损失函数。 - en: 'If your neural network is designed to solve a regression problem (i.e., the output is continuous), then you might use the *mean squared error* loss. 
    This is the mean of the squared difference between the ground truth $y_i$ and the predicted value $p_i$ of each output unit, where the mean is taken over all $n$ output units:'
+ id: totrans-124
  prefs: []
  type: TYPE_NORMAL
+ zh: 如果你的神经网络旨在解决回归问题(即输出是连续的),那么你可能会使用*均方误差*损失。这是每个输出单元的实际值$y_i$和预测值$p_i$之间的平方差的平均值,其中平均值是在所有$n$个输出单元上取得的:
- en: $\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - p_i)^2$
+ id: totrans-125
  prefs: []
  type: TYPE_NORMAL
+ zh: $\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - p_i)^2$
- en: 'If you are working on a classification problem where each observation only belongs to one class, then *categorical cross-entropy* is the correct loss function. This is defined as follows:'
+ id: totrans-126
  prefs: []
  type: TYPE_NORMAL
+ zh: 如果你正在处理一个分类问题,其中每个观测只属于一个类,那么*分类交叉熵*是正确的损失函数。它定义如下:
- en: $-\sum_{i=1}^{n} y_i \log(p_i)$
+ id: totrans-127
  prefs: []
  type: TYPE_NORMAL
+ zh: $-\sum_{i=1}^{n} y_i \log(p_i)$
- en: 'Finally, if you are working on a binary classification problem with one output unit, or a multilabel problem where each observation can belong to multiple classes simultaneously, you should use *binary cross-entropy*:'
+ id: totrans-128
  prefs: []
  type: TYPE_NORMAL
+ zh: 最后,如果你正在处理一个具有一个输出单元的二元分类问题,或者一个每个观测可以同时属于多个类的多标签问题,你应该使用*二元交叉熵*:
- en: $-\frac{1}{n} \sum_{i=1}^{n} \left( y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right)$
+ id: totrans-129
  prefs: []
  type: TYPE_NORMAL
+ zh: $-\frac{1}{n} \sum_{i=1}^{n} \left( y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right)$
- en: Optimizers
+ id: totrans-130
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 优化器
- en: The *optimizer* is the algorithm that will be used to update the weights in the neural network based on the gradient of the loss function. One of the most commonly used and stable optimizers is *Adam* (Adaptive Moment Estimation).^([3](ch02.xhtml#idm45387032147088)) In most cases, you shouldn’t need to adjust the default parameters of the Adam optimizer, except the *learning rate*. The greater the learning rate, the larger the change in weights at each training step. While training is initially faster with a large learning rate, the downside is that it may result in less stable training and may not find the global minimum of the loss function. This is a parameter that you may want to tune or adjust during training.
+ id: totrans-131
  prefs: []
  type: TYPE_NORMAL
+ zh: '*优化器* 是基于损失函数的梯度更新神经网络权重的算法。最常用和稳定的优化器之一是 *Adam*(自适应矩估计)。^([3](ch02.xhtml#idm45387032147088)) 在大多数情况下,您不需要调整Adam优化器的默认参数,除了 *学习率*。学习率越大,每个训练步骤中权重的变化就越大。虽然初始时使用较大的学习率训练速度更快,但缺点是可能导致训练不稳定,无法找到损失函数的全局最小值。这是您可能需要在训练过程中调整的参数。'
- en: Another common optimizer that you may come across is *RMSProp* (Root Mean Squared Propagation). Again, you shouldn’t need to adjust the parameters of this optimizer too much, but it is worth reading the [Keras documentation](https://keras.io/optimizers) to understand the role of each parameter.
+ id: totrans-132
  prefs: []
  type: TYPE_NORMAL
+ zh: 另一个您可能遇到的常见优化器是 *RMSProp*(均方根传播)。同样,您不需要太多调整这个优化器的参数,但值得阅读[Keras文档](https://keras.io/optimizers)以了解每个参数的作用。
- en: We pass both the loss function and the optimizer into the `compile` method of the model, as well as a `metrics` parameter where we can specify any additional metrics that we would like to report on during training, such as accuracy.
+ id: totrans-133
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们将损失函数和优化器一起传递给模型的 `compile` 方法,还有一个 `metrics` 参数,我们可以在训练过程中指定任何额外的指标,如准确率。
- en: Training the Model
+ id: totrans-134
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 训练模型
- en: Thus far, we haven’t shown the model any data. We have just set up the architecture and compiled the model with a loss function and optimizer.
+ id: totrans-135
  prefs: []
  type: TYPE_NORMAL
+ zh: 到目前为止,我们还没有向模型展示任何数据。我们只是设置了架构并使用损失函数和优化器编译了模型。
- en: To train the model against the data, we simply call the `fit` method, as shown in [Example 2-8](#training-mlp).
+ id: totrans-136 prefs: [] type: TYPE_NORMAL + zh: 要针对数据训练模型,我们只需调用 `fit` 方法,如[示例2-8](#training-mlp)所示。 - en: Example 2-8\. Calling the `fit` method to train the model + id: totrans-137 prefs: - PREF_H5 type: TYPE_NORMAL + zh: 示例2-8\. 调用 `fit` 方法来训练模型 - en: '[PRE7]' + id: totrans-138 prefs: [] type: TYPE_PRE + zh: '[PRE7]' - en: '[![1](Images/1.png)](#co_deep_learning_CO2-1)' + id: totrans-139 prefs: [] type: TYPE_NORMAL + zh: '[![1](Images/1.png)](#co_deep_learning_CO2-1)' - en: The raw image data. + id: totrans-140 prefs: [] type: TYPE_NORMAL + zh: 原始图像数据。 - en: '[![2](Images/2.png)](#co_deep_learning_CO2-2)' + id: totrans-141 prefs: [] type: TYPE_NORMAL + zh: '[![2](Images/2.png)](#co_deep_learning_CO2-2)' - en: The one-hot encoded class labels. + id: totrans-142 prefs: [] type: TYPE_NORMAL + zh: 独热编码的类标签。 - en: '[![3](Images/3.png)](#co_deep_learning_CO2-3)' + id: totrans-143 prefs: [] type: TYPE_NORMAL + zh: '[![3](Images/3.png)](#co_deep_learning_CO2-3)' - en: The `batch_size` determines how many observations will be passed to the network at each training step. + id: totrans-144 prefs: [] type: TYPE_NORMAL + zh: '`batch_size` 确定每个训练步骤将传递给网络多少观察值。' - en: '[![4](Images/4.png)](#co_deep_learning_CO2-4)' + id: totrans-145 prefs: [] type: TYPE_NORMAL + zh: '[![4](Images/4.png)](#co_deep_learning_CO2-4)' - en: The `epochs` determine how many times the network will be shown the full training data. + id: totrans-146 prefs: [] type: TYPE_NORMAL + zh: '`epochs` 确定网络将被展示完整训练数据的次数。' - en: '[![5](Images/5.png)](#co_deep_learning_CO2-5)' + id: totrans-147 prefs: [] type: TYPE_NORMAL + zh: '[![5](Images/5.png)](#co_deep_learning_CO2-5)' - en: If `shuffle = True`, the batches will be drawn randomly without replacement from the training data at each training step. + id: totrans-148 prefs: [] type: TYPE_NORMAL + zh: 如果 `shuffle = True`,每个训练步骤将从训练数据中随机抽取批次而不重复。 - en: This will start training a deep neural network to predict the category of an image from the CIFAR-10 dataset. The training process works as follows. + id: totrans-149 prefs: [] type: TYPE_NORMAL + zh: 这将开始训练一个深度神经网络,以预测来自CIFAR-10数据集的图像的类别。训练过程如下。 - en: First, the weights of the network are initialized to small random values. Then the network performs a series of training steps. At each training step, one *batch* of images is passed through the network and the errors are backpropagated to update the weights. The `batch_size` determines how many images are in each training step batch. The larger the batch size, the more stable the gradient calculation, but the slower each training step. + id: totrans-150 prefs: [] type: TYPE_NORMAL + zh: 首先,网络的权重被初始化为小的随机值。然后网络执行一系列训练步骤。在每个训练步骤中,通过网络传递一个 *batch* 图像,并将错误反向传播以更新权重。`batch_size` + 确定每个训练步骤批次中有多少图像。批量大小越大,梯度计算越稳定,但每个训练步骤越慢。 - en: Tip + id: totrans-151 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 提示 - en: It would be far too time-consuming and computationally intensive to use the entire dataset to calculate the gradient at each training step, so generally a batch size between 32 and 256 is used. It is also now recommended practice to increase the batch size as training progresses.^([4](ch02.xhtml#idm45387032068928)) + id: totrans-152 prefs: [] type: TYPE_NORMAL + zh: 使用整个数据集在每个训练步骤中计算梯度将耗费太多时间和计算资源,因此通常使用32到256之间的批量大小。现在推荐的做法是随着训练的进行增加批量大小。^([4](ch02.xhtml#idm45387032068928)) - en: This continues until all observations in the dataset have been seen once. This completes the first *epoch*. The data is then passed through the network again in batches as part of the second epoch. 
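    This process repeats until the specified number of epochs have elapsed.
+ id: totrans-153
  prefs: []
  type: TYPE_NORMAL
+ zh: 这将持续到数据集中的所有观察值都被看到一次。这完成了第一个 *epoch*。然后数据再次以批次的形式通过网络,作为第二个epoch的一部分。这个过程重复,直到指定的epoch数已经过去。
- en: 'As a quick check of this batch/epoch arithmetic (an illustrative sketch, not code from the book repository), 50,000 training images with a batch size of 32 give 1,563 training steps per epoch—the same figure that appears in the Keras progress bar discussed next:'
  prefs: []
  type: TYPE_NORMAL
- en: |-
    import math

    # 50,000 CIFAR-10 training images with batch_size = 32: the final batch
    # holds fewer than 32 images, so the step count per epoch rounds up.
    steps_per_epoch = math.ceil(50000 / 32)
    print(steps_per_epoch)  # 1563
  prefs: []
  type: TYPE_PRE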
- en: During training, Keras outputs the progress of the procedure, as shown in [Figure 2-7](#first_nn_fit). We can see that the training dataset has been split into 1,563 batches (each containing 32 images) and it has been shown to the network 10 times (i.e., over 10 epochs), at a rate of approximately 2 milliseconds per batch. The categorical cross-entropy loss has fallen from 1.8377 to 1.3696, resulting in an accuracy increase from 33.69% after the first epoch to 51.67% after the tenth epoch.
+ id: totrans-154
  prefs: []
  type: TYPE_NORMAL
+ zh: 在训练过程中,Keras会输出过程的进展,如[图2-7](#first_nn_fit)所示。我们可以看到训练数据集已经被分成了1,563批次(每批包含32张图片),并且已经被展示给网络10次(即10个epochs),每批大约需要2毫秒的时间。分类交叉熵损失从1.8377下降到1.3696,导致准确率从第一个epoch后的33.69%增加到第十个epoch后的51.67%。
- en: '![](Images/gdl2_0207.png)'
+ id: totrans-155
  prefs: []
  type: TYPE_IMG
+ zh: '![](Images/gdl2_0207.png)'
- en: Figure 2-7\. The output from the `fit` method
+ id: totrans-156
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 图2-7\. `fit` 方法的输出
- en: Evaluating the Model
+ id: totrans-157
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 评估模型
- en: We know the model achieves an accuracy of 51.9% on the training set, but how does it perform on data it has never seen?
+ id: totrans-158
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们知道模型在训练集上的准确率为51.9%,但它在从未见过的数据上表现如何?
- en: To answer this question we can use the `evaluate` method provided by Keras, as shown in [Example 2-9](#evaluate-mlp).
+ id: totrans-159
  prefs: []
  type: TYPE_NORMAL
+ zh: 为了回答这个问题,我们可以使用Keras提供的`evaluate`方法,如[示例2-9](#evaluate-mlp)所示。
- en: Example 2-9\. Evaluating the model performance on the test set
+ id: totrans-160
  prefs:
  - PREF_H5
  type: TYPE_NORMAL
+ zh: 示例2-9。在测试集上评估模型性能
- en: '[PRE8]'
+ id: totrans-161
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE8]'
- en: '[Figure 2-8](#first_nn_evaluate) shows the output from this method.'
+ id: totrans-162
  prefs: []
  type: TYPE_NORMAL
+ zh: '[图2-8](#first_nn_evaluate)显示了这种方法的输出。'
- en: '![](Images/gdl2_0208.png)'
+ id: totrans-163
  prefs: []
  type: TYPE_IMG
+ zh: '![](Images/gdl2_0208.png)'
- en: Figure 2-8\. The output from the `evaluate` method
+ id: totrans-164
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 图2-8。`evaluate`方法的输出
- en: 'The output is a list of the metrics we are monitoring: categorical cross-entropy and accuracy. We can see that model accuracy is still 49.0% even on images that it has never seen before. Note that if the model were guessing randomly, it would achieve approximately 10% accuracy (because there are 10 classes), so 49.0% is a good result, given that we have used a very basic neural network.'
+ id: totrans-165
  prefs: []
  type: TYPE_NORMAL
+ zh: 输出是我们正在监控的指标列表:分类交叉熵和准确率。我们可以看到,即使在它从未见过的图像上,模型的准确率仍然是49.0%。请注意,如果模型是随机猜测的,它将达到大约10%的准确率(因为有10个类别),因此49.0%是一个很好的结果,考虑到我们使用了一个非常基本的神经网络。
- en: We can view some of the predictions on the test set using the `predict` method, as shown in [Example 2-10](#predict-mlp).
+ id: totrans-166
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们可以使用`predict`方法查看测试集上的一些预测,如[示例2-10](#predict-mlp)所示。
- en: Example 2-10\.
Viewing predictions on the test set using the `predict` method + id: totrans-167 prefs: - PREF_H5 type: TYPE_NORMAL + zh: 示例2-10。使用`predict`方法查看测试集上的预测 - en: '[PRE9]' + id: totrans-168 prefs: [] type: TYPE_PRE + zh: '[PRE9]' - en: '[![1](Images/1.png)](#co_deep_learning_CO3-1)' + id: totrans-169 prefs: [] type: TYPE_NORMAL + zh: '[![1](Images/1.png)](#co_deep_learning_CO3-1)' - en: '`preds` is an array of shape `[10000, 10]`—i.e., a vector of 10 class probabilities for each observation.' + id: totrans-170 prefs: [] type: TYPE_NORMAL + zh: '`preds`是一个形状为`[10000, 10]`的数组,即每个观测的10个类别概率的向量。' - en: '[![2](Images/2.png)](#co_deep_learning_CO3-2)' + id: totrans-171 prefs: [] type: TYPE_NORMAL + zh: '[![2](Images/2.png)](#co_deep_learning_CO3-2)' - en: We convert this array of probabilities back into a single prediction using `numpy`’s `argmax` function. Here, `axis = –1` tells the function to collapse the array over the last dimension (the classes dimension), so that the shape of `preds_single` is then `[10000, 1]`. + id: totrans-172 prefs: [] type: TYPE_NORMAL + zh: 我们将这个概率数组转换回一个单一的预测,使用`numpy`的`argmax`函数。这里,`axis = -1`告诉函数将数组折叠到最后一个维度(类别维度),因此`preds_single`的形状为`[10000, + 1]`。 - en: We can view some of the images alongside their labels and predictions with the code in [Example 2-11](#display-mlp). As expected, around half are correct. + id: totrans-173 prefs: [] type: TYPE_NORMAL + zh: 我们可以使用[示例2-11](#display-mlp)中的代码查看一些图像以及它们的标签和预测。如预期的那样,大约一半是正确的。 - en: Example 2-11\. Displaying predictions of the MLP against the actual labels + id: totrans-174 prefs: - PREF_H5 type: TYPE_NORMAL + zh: 示例2-11。显示MLP的预测与实际标签 - en: '[PRE10]' + id: totrans-175 prefs: [] type: TYPE_PRE + zh: '[PRE10]' - en: '[Figure 2-9](#first_nn_preds) shows a randomly chosen selection of predictions made by the model, alongside the true labels.' + id: totrans-176 prefs: [] type: TYPE_NORMAL + zh: '[图2-9](#first_nn_preds)显示了模型随机选择的一些预测,以及真实标签。' - en: '![](Images/gdl2_0209.png)' + id: totrans-177 prefs: [] type: TYPE_IMG + zh: '![](Images/gdl2_0209.png)' - en: Figure 2-9\. Some predictions made by the model, alongside the actual labels + id: totrans-178 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图2-9。模型进行的一些预测,以及实际标签 - en: Congratulations! You’ve just built a multilayer perceptron using Keras and used it to make predictions on new data. Even though this is a supervised learning problem, when we come to building generative models in future chapters many of the core ideas from this chapter (such as loss functions, activation functions, and understanding layer shapes) will still be extremely important. Next we’ll look at ways of improving this model, by introducing a few new layer types. + id: totrans-179 prefs: [] type: TYPE_NORMAL + zh: 恭喜!您刚刚使用Keras构建了一个多层感知器,并用它对新数据进行了预测。即使这是一个监督学习问题,但当我们在未来的章节中构建生成模型时,本章的许多核心思想(如损失函数、激活函数和理解层形状)仍然非常重要。接下来,我们将探讨通过引入一些新的层类型来改进这个模型的方法。 - en: Convolutional Neural Network (CNN) + id: totrans-180 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 卷积神经网络(CNN) - en: One of the reasons our network isn’t yet performing as well as it might is because there isn’t anything in the network that takes into account the spatial structure of the input images. In fact, our first step is to flatten the image into a single vector, so that we can pass it to the first `Dense` layer! + id: totrans-181 prefs: [] type: TYPE_NORMAL + zh: 我们的网络尚未表现得像它可能表现得那样好的原因之一是网络中没有考虑输入图像的空间结构。事实上,我们的第一步是将图像展平为一个单一向量,以便我们可以将其传递给第一个`Dense`层! - en: To achieve this we need to use a *convolutional layer*. 
+ id: totrans-182 prefs: [] type: TYPE_NORMAL + zh: 为了实现这一点,我们需要使用*卷积层*。 - en: Convolutional Layers + id: totrans-183 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 卷积层 - en: First, we need to understand what is meant by a *convolution* in the context of deep learning. + id: totrans-184 prefs: [] type: TYPE_NORMAL + zh: 首先,我们需要了解在深度学习背景下*卷积*的含义。 - en: '[Figure 2-10](#simple_conv) shows two different 3 × 3 × 1 portions of a grayscale image being convoluted with a 3 × 3 × 1 *filter* (or *kernel*). The convolution is performed by multiplying the filter pixelwise with the portion of the image, @@ -915,104 +1303,147 @@ the inverse of the filter. The top example resonates strongly with the filter, so it produces a large positive value. The bottom example does not resonate much with the filter, so it produces a value near zero.' + id: totrans-185 prefs: [] type: TYPE_NORMAL + zh: '[图2-10](#simple_conv)显示了一个灰度图像的两个不同的3×3×1部分,与一个3×3×1*滤波器*(或*核心*)进行卷积。卷积是通过将滤波器逐像素地与图像部分相乘,并将结果求和来执行的。当图像部分与滤波器紧密匹配时,输出更为正向,当图像部分与滤波器的反向匹配时,输出更为负向。顶部示例与滤波器强烈共振,因此产生一个较大的正值。底部示例与滤波器的共振不大,因此产生一个接近零的值。' - en: '![](Images/gdl2_0210.png)' + id: totrans-186 prefs: [] type: TYPE_IMG + zh: '![](Images/gdl2_0210.png)' - en: Figure 2-10\. A 3 × 3 convolutional filter applied to two portions of a grayscale image + id: totrans-187 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图2-10。应用于灰度图像两个部分的3×3卷积滤波器 - en: If we move the filter across the entire image from left to right and top to bottom, recording the convolutional output as we go, we obtain a new array that picks out a particular feature of the input, depending on the values in the filter. For example, [Figure 2-11](#conv_layer_2d) shows two different filters that highlight horizontal and vertical edges. + id: totrans-188 prefs: [] type: TYPE_NORMAL + zh: 如果我们将滤波器从左到右和从上到下移动到整个图像上,并记录卷积输出,我们将获得一个新的数组,根据滤波器中的值选择输入的特定特征。例如,图2-11显示了突出显示水平和垂直边缘的两个不同滤波器。 - en: Running the Code for This Example + id: totrans-189 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 运行此示例的代码 - en: You can see this convolutional process worked through manually in the Jupyter notebook located at *notebooks/02_deeplearning/02_cnn/convolutions.ipynb* in the book repository. + id: totrans-190 prefs: [] type: TYPE_NORMAL + zh: 您可以在位于书籍存储库中的*notebooks/02_deeplearning/02_cnn/convolutions.ipynb*的Jupyter笔记本中手动查看这个卷积过程。 - en: '![](Images/gdl2_0211.png)' + id: totrans-191 prefs: [] type: TYPE_IMG + zh: '![](Images/gdl2_0211.png)' - en: Figure 2-11\. Two convolutional filters applied to a grayscale image + id: totrans-192 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图2-11。应用于灰度图像的两个卷积滤波器 - en: A convolutional layer is simply a collection of filters, where the values stored in the filters are the weights that are learned by the neural network through training. Initially these are random, but gradually the filters adapt their weights to start picking out interesting features such as edges or particular color combinations. + id: totrans-193 prefs: [] type: TYPE_NORMAL + zh: 卷积层只是一组滤波器,其中存储在滤波器中的值是通过训练的神经网络学习的权重。最初这些是随机的,但逐渐滤波器调整它们的权重以开始选择有趣的特征,如边缘或特定的颜色组合。 - en: In Keras, the `Conv2D` layer applies convolutions to an input tensor with two spatial dimensions (such as an image). For example, the code shown in [Example 2-12](#conv-layer) builds a convolutional layer with two filters, to match the example in [Figure 2-11](#conv_layer_2d). + id: totrans-194 prefs: [] type: TYPE_NORMAL + zh: 在Keras中,`Conv2D`层将卷积应用于具有两个空间维度(如图像)的输入张量。例如,[示例2-12](#conv-layer)中显示的代码构建了一个具有两个滤波器的卷积层,以匹配[图2-11](#conv_layer_2d)中的示例。 - en: Example 2-12\. 
A `Conv2D` layer applied to grayscale input images + id: totrans-195 prefs: - PREF_H5 type: TYPE_NORMAL + zh: 示例2-12。应用于灰度输入图像的`Conv2D`层 - en: '[PRE11]' + id: totrans-196 prefs: [] type: TYPE_PRE + zh: '[PRE11]' - en: Next, let’s look at two of the arguments to the `Conv2D` layer in more detail—`strides` and `padding`. + id: totrans-197 prefs: [] type: TYPE_NORMAL + zh: 接下来,让我们更详细地看一下`Conv2D`层的两个参数——`strides`和`padding`。 - en: Stride + id: totrans-198 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 步幅 - en: The `strides` parameter is the step size used by the layer to move the filters across the input. Increasing the stride therefore reduces the size of the output tensor. For example, when `strides = 2`, the height and width of the output tensor will be half the size of the input tensor. This is useful for reducing the spatial size of the tensor as it passes through the network, while increasing the number of channels. + id: totrans-199 prefs: [] type: TYPE_NORMAL + zh: '`strides`参数是层用来在输入上移动滤波器的步长。增加步长会减小输出张量的大小。例如,当`strides = 2`时,输出张量的高度和宽度将是输入张量大小的一半。这对于通过网络传递时减小张量的空间大小,同时增加通道数量是有用的。' - en: Padding + id: totrans-200 prefs: - PREF_H3 type: TYPE_NORMAL + zh: 填充 - en: The `padding = "same"` input parameter pads the input data with zeros so that the output size from the layer is exactly the same as the input size when `strides = 1`. + id: totrans-201 prefs: [] type: TYPE_NORMAL + zh: '`padding = "same"`输入参数使用零填充输入数据,以便当`strides = 1`时,从层的输出大小与输入大小完全相同。' - en: '[Figure 2-12](#padding_example) shows a 3 × 3 kernel being passed over a 5 × 5 input image, with `padding = "same"` and `strides = 1`. The output size from this convolutional layer would also be 5 × 5, as the padding allows the kernel to extend over the edge of the image, so that it fits five times in both directions. Without padding, the kernel could only fit three times along each direction, giving an output size of 3 × 3.' + id: totrans-202 prefs: [] type: TYPE_NORMAL + zh: 图2-12显示了一个3×3的卷积核在一个5×5的输入图像上进行传递,其中`padding = "same"`和`strides = 1`。这个卷积层的输出大小也将是5×5,因为填充允许卷积核延伸到图像的边缘,使其在两个方向上都适合五次。没有填充,卷积核只能在每个方向上适合三次,从而给出一个3×3的输出大小。 - en: '![](Images/gdl2_0212.png)' + id: totrans-203 prefs: [] type: TYPE_IMG + zh: '![](Images/gdl2_0212.png)' - en: 'Figure 2-12\. A 3 × 3 × 1 kernel (gray) being passed over a 5 × 5 × 1 input image (blue), with `padding = "same"` and `strides = 1`, to generate the 5 × 5 × 1 output (green) (source: [Dumoulin and Visin, 2018](https://arxiv.org/abs/1603.07285))^([5](ch02.xhtml#idm45387031545152))' + id: totrans-204 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图2-12。一个3×3×1的卷积核(灰色)在一个5×5×1的输入图像(蓝色)上进行传递,其中`padding = "same"`和`strides = + 1`,生成5×5×1的输出(绿色)(来源:Dumoulin和Visin,2018) - en: 'Setting `padding = "same"` is a good way to ensure that you are able to easily keep track of the size of the tensor as it passes through many convolutional layers. 
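    The shape of the output from a convolutional layer with `padding = "same"` is:'
+ id: totrans-205
  prefs: []
  type: TYPE_NORMAL
+ zh: 设置`padding = "same"`是一种确保您能够轻松跟踪张量大小的好方法,因为它通过许多卷积层时。具有`padding = "same"`的卷积层的输出形状是:
- en: $\left( \frac{\text{input height}}{\text{stride}}, \frac{\text{input width}}{\text{stride}}, \text{filters} \right)$
+ id: totrans-206
  prefs: []
  type: TYPE_NORMAL
+ zh: $\left( \frac{\text{input height}}{\text{stride}}, \frac{\text{input width}}{\text{stride}}, \text{filters} \right)$
- en: 'The following minimal shape check (an illustrative sketch, not code from the book repository) confirms this formula for a single `Conv2D` layer with `strides = 2` and `padding = "same"`:'
  prefs: []
  type: TYPE_NORMAL
- en: |-
    from tensorflow.keras import layers

    # With padding="same" and strides=2, a 32 x 32 input halves to 16 x 16,
    # and the channel count becomes the number of filters (10).
    x = layers.Input(shape=(32, 32, 3))
    y = layers.Conv2D(filters=10, kernel_size=(4, 4), strides=2, padding="same")(x)
    print(y.shape)  # (None, 16, 16, 10)
  prefs: []
  type: TYPE_PRE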
- en: Stacking convolutional layers
+ id: totrans-207
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 堆叠卷积层
- en: The output of a `Conv2D` layer is another four-dimensional tensor, now of shape `(batch_size, height, width, filters)`, so we can stack `Conv2D` layers on top of each other to grow the depth of our neural network and make it more powerful. To demonstrate this, let’s imagine we are applying `Conv2D` layers to the CIFAR-10 dataset and wish to predict the label of a given image. Note that this time, instead of one input channel (grayscale) we have three (red, green, and blue).
+ id: totrans-208
  prefs: []
  type: TYPE_NORMAL
+ zh: '`Conv2D`层的输出是另一个四维张量,现在的形状是`(batch_size, height, width, filters)`,因此我们可以将`Conv2D`层堆叠在一起,以增加神经网络的深度并使其更强大。为了演示这一点,让我们想象我们正在将`Conv2D`层应用于CIFAR-10数据集,并希望预测给定图像的标签。请注意,这一次,我们不是一个输入通道(灰度),而是三个(红色、绿色和蓝色)。'
- en: '[Example 2-13](#conv-network) shows how to build a simple convolutional neural network that we could train to succeed at this task.'
+ id: totrans-209
  prefs: []
  type: TYPE_NORMAL
+ zh: '[示例2-13](#conv-network)展示了如何构建一个简单的卷积神经网络,我们可以训练它成功完成这项任务。'
- en: Example 2-13\. Code to build a convolutional neural network model using Keras
+ id: totrans-210
  prefs:
  - PREF_H5
  type: TYPE_NORMAL
+ zh: 示例2-13。使用Keras构建卷积神经网络模型的代码
- en: '[PRE12]'
+ id: totrans-211
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE12]'
- en: This code corresponds to the diagram shown in [Figure 2-13](#conv_2d_complex).
+ id: totrans-212
  prefs: []
  type: TYPE_NORMAL
+ zh: 这段代码对应于[图2-13](#conv_2d_complex)中显示的图表。
- en: '![](Images/gdl2_0213.png)'
+ id: totrans-213
  prefs: []
  type: TYPE_IMG
+ zh: '![](Images/gdl2_0213.png)'
- en: Figure 2-13\. A diagram of a convolutional neural network
+ id: totrans-214
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 图2-13。卷积神经网络的图表
- en: Note that now that we are working with color images, each filter in the first convolutional layer has a depth of 3 rather than 1 (i.e., each filter has shape 4 × 4 × 3, rather than 4 × 4 × 1). This is to match the three channels (red, green, blue) of the input image. The same idea applies to the filters in the second convolutional layer that have a depth of 10, to match the 10 channels output by the first convolutional layer.
+ id: totrans-215
  prefs: []
  type: TYPE_NORMAL
+ zh: 请注意,现在我们正在处理彩色图像,第一个卷积层中的每个滤波器的深度为3,而不是1(即每个滤波器的形状为4×4×3,而不是4×4×1)。这是为了匹配输入图像的三个通道(红色、绿色、蓝色)。同样的想法也适用于第二个卷积层中的深度为10的滤波器,以匹配第一个卷积层输出的10个通道。
- en: Tip
+ id: totrans-216
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 提示
- en: In general, the depth of the filters in a layer is always equal to the number of channels output by the preceding layer.
+ id: totrans-217
  prefs: []
  type: TYPE_NORMAL
+ zh: 一般来说,层中滤波器的深度总是等于前一层输出的通道数。
- en: Inspecting the model
+ id: totrans-218
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 检查模型
- en: It’s really informative to look at how the shape of the tensor changes as data flows through from one convolutional layer to the next. We can use the `model.summary()` method to inspect the shape of the tensor as it passes through the network ([Table 2-2](#conv_net_example_summary)).
- en: Stacking convolutional layers
+ id: totrans-207
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 堆叠卷积层
- en: The output of a `Conv2D` layer is another four-dimensional tensor, now of shape
    `(batch_size, height, width, filters)`, so we can stack `Conv2D` layers on top
    of each other to grow the depth of our neural network and make it more powerful.
    To demonstrate this, let’s imagine we are applying `Conv2D` layers to the CIFAR-10
    dataset and wish to predict the label of a given image. Note that this time,
    instead of one input channel (grayscale) we have three (red, green, and blue).
+ id: totrans-208
  prefs: []
  type: TYPE_NORMAL
+ zh: '`Conv2D`层的输出是另一个四维张量,现在的形状是`(batch_size, height, width, filters)`,因此我们可以将`Conv2D`层堆叠在一起,以增加神经网络的深度并使其更强大。为了演示这一点,让我们想象我们正在将`Conv2D`层应用于CIFAR-10数据集,并希望预测给定图像的标签。请注意,这一次我们不是只有一个输入通道(灰度),而是有三个(红色、绿色和蓝色)。'
- en: '[Example 2-13](#conv-network) shows how to build a simple convolutional neural
    network that we could train to succeed at this task.'
+ id: totrans-209
  prefs: []
  type: TYPE_NORMAL
+ zh: '[示例2-13](#conv-network)展示了如何构建一个简单的卷积神经网络,我们可以训练它成功完成这项任务。'
- en: Example 2-13\. Code to build a convolutional neural network model using Keras
+ id: totrans-210
  prefs:
  - PREF_H5
  type: TYPE_NORMAL
+ zh: 示例2-13\. 使用Keras构建卷积神经网络模型的代码
- en: '[PRE12]'
+ id: totrans-211
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE12]'
- en: This code corresponds to the diagram shown in [Figure 2-13](#conv_2d_complex).
+ id: totrans-212
  prefs: []
  type: TYPE_NORMAL
+ zh: 这段代码对应于[图2-13](#conv_2d_complex)中显示的图表。
- en: '![](Images/gdl2_0213.png)'
+ id: totrans-213
  prefs: []
  type: TYPE_IMG
+ zh: '![](Images/gdl2_0213.png)'
- en: Figure 2-13\. A diagram of a convolutional neural network
+ id: totrans-214
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 图2-13。卷积神经网络的图表
- en: Note that now that we are working with color images, each filter in the first
    convolutional layer has a depth of 3 rather than 1 (i.e., each filter has shape
    4 × 4 × 3, rather than 4 × 4 × 1). This is to match the three channels (red,
    green, blue) of the input image. The same idea applies to the filters in the
    second convolutional layer that have a depth of 10, to match the 10 channels
    output by the first convolutional layer.
+ id: totrans-215
  prefs: []
  type: TYPE_NORMAL
+ zh: 请注意,现在我们正在处理彩色图像,第一个卷积层中的每个滤波器的深度为3,而不是1(即每个滤波器的形状为4×4×3,而不是4×4×1)。这是为了匹配输入图像的三个通道(红色、绿色、蓝色)。同样的想法也适用于第二个卷积层中深度为10的滤波器,以匹配第一个卷积层输出的10个通道。
- en: Tip
+ id: totrans-216
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 提示
- en: In general, the depth of the filters in a layer is always equal to the number
    of channels output by the preceding layer.
+ id: totrans-217
  prefs: []
  type: TYPE_NORMAL
+ zh: 一般来说,层中滤波器的深度总是等于前一层输出的通道数。
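- en: 'The code listing for [Example 2-13](#conv-network) appears in this file only
    as the placeholder [PRE12]. A minimal sketch of the architecture it describes—two
    strided `Conv2D` layers followed by a `Flatten` layer and a softmax `Dense`
    layer—might look as follows, assuming the Keras functional API (the variable
    names are ours, not from the original listing):'
  prefs: []
  type: TYPE_NORMAL
- en: |-
    from tensorflow.keras import layers, models

    # CIFAR-10 images are 32 x 32 pixels with 3 channels (red, green, blue)
    input_layer = layers.Input(shape=(32, 32, 3))
    x = layers.Conv2D(filters=10, kernel_size=(4, 4),
                      strides=2, padding="same")(input_layer)
    x = layers.Conv2D(filters=20, kernel_size=(3, 3),
                      strides=2, padding="same")(x)
    x = layers.Flatten()(x)
    output_layer = layers.Dense(units=10, activation="softmax")(x)
    model = models.Model(input_layer, output_layer)
  prefs: []
  type: TYPE_PRE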
- en: Inspecting the model
+ id: totrans-218
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 检查模型
- en: It’s really informative to look at how the shape of the tensor changes as data
    flows through from one convolutional layer to the next. We can use the `model.summary()`
    method to inspect the shape of the tensor as it passes through the network ([Table 2-2](#conv_net_example_summary)).
+ id: totrans-219
  prefs: []
  type: TYPE_NORMAL
+ zh: 观察张量的形状在数据从一个卷积层流向下一个卷积层时如何变化,是非常有启发性的。我们可以使用`model.summary()`方法检查张量在网络中传递时的形状([表2-2](#conv_net_example_summary))。
- en: Table 2-2\. CNN model summary
+ id: totrans-220
  prefs: []
  type: TYPE_NORMAL
+ zh: 表2-2\. CNN模型摘要
- en: '| Layer (type) | Output shape | Param # |'
+ id: totrans-221
  prefs: []
  type: TYPE_TB
+ zh: '| 层(类型) | 输出形状 | 参数数量 |'
- en: '| --- | --- | --- |'
+ id: totrans-222
  prefs: []
  type: TYPE_TB
+ zh: '| --- | --- | --- |'
- en: '| InputLayer | (None, 32, 32, 3) | 0 |'
+ id: totrans-223
  prefs: []
  type: TYPE_TB
+ zh: '| InputLayer | (None, 32, 32, 3) | 0 |'
- en: '| Conv2D | (None, 16, 16, 10) | 490 |'
+ id: totrans-224
  prefs: []
  type: TYPE_TB
+ zh: '| Conv2D | (None, 16, 16, 10) | 490 |'
- en: '| Conv2D | (None, 8, 8, 20) | 1,820 |'
+ id: totrans-225
  prefs: []
  type: TYPE_TB
+ zh: '| Conv2D | (None, 8, 8, 20) | 1,820 |'
- en: '| Flatten | (None, 1280) | 0 |'
+ id: totrans-226
  prefs: []
  type: TYPE_TB
+ zh: '| Flatten | (None, 1280) | 0 |'
- en: '| Dense | (None, 10) | 12,810 |'
+ id: totrans-227
  prefs: []
  type: TYPE_TB
+ zh: '| Dense | (None, 10) | 12,810 |'
- en: '| Total params | 15,120 |'
+ id: totrans-228
  prefs: []
  type: TYPE_TB
+ zh: '| 总参数 | 15,120 |'
- en: '| Trainable params | 15,120 |'
+ id: totrans-229
  prefs: []
  type: TYPE_TB
+ zh: '| 可训练参数 | 15,120 |'
- en: '| Non-trainable params | 0 |'
+ id: totrans-230
  prefs: []
  type: TYPE_TB
+ zh: '| 不可训练参数 | 0 |'
- en: 'Let’s walk through our network layer by layer, noting the shape of the tensor
    as we go:'
+ id: totrans-231
  prefs: []
  type: TYPE_NORMAL
+ zh: 让我们逐层检查我们的网络,注意张量形状的变化:
- en: The input shape is `(None, 32, 32, 3)`—Keras uses `None` to represent the fact
    that we can pass any number of images through the network simultaneously. Since
    the network is just performing tensor algebra, we don’t need to pass images
    through the network individually, but instead can pass them through together
    as a batch.
+ id: totrans-232
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 输入形状为`(None, 32, 32, 3)`—Keras使用`None`表示我们可以同时通过网络传递任意数量的图像。由于网络只是执行张量代数运算,我们不需要单独通过网络传递图像,而是可以一起作为批次传递它们。
@@ -1136,106 +1628,141 @@
- en: The shape of each of the 10 filters in the first convolutional layer is 4 ×
    4 × 3\. This is because we have chosen each filter to have a height and width
    of 4 (`kernel_size = (4,4)`) and there are three channels in the preceding layer
    (red, green, and blue). Therefore, the number of parameters (or weights) in
    the layer is (4 × 4 × 3 + 1) × 10 = 490, where the + 1 is due to a bias term
    attached to each of the filters. The output from each filter will be the pixelwise
    multiplication of the filter weights with the 4 × 4 × 3 portion of the image
    it covers. Since `strides = 2` and `padding = "same"`, the width
    and height of the output are both halved to 16, and since there are 10 filters
    the output of the first layer is a batch of tensors each having shape `[16,
    16, 10]`.
+ id: totrans-233
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 第一个卷积层中每个滤波器的形状是4×4×3。这是因为我们选择每个滤波器的高度和宽度为4(`kernel_size=(4,4)`),并且在前一层中有三个通道(红色、绿色和蓝色)。因此,该层中的参数(或权重)数量为(4×4×3+1)×10=490,其中+1是由于每个滤波器附加了一个偏置项。每个滤波器的输出将是滤波器权重和它所覆盖的图像的4×4×3部分的逐像素乘积。由于`strides=2`和`padding="same"`,输出的宽度和高度都减半为16,由于有10个滤波器,第一层的输出是一批张量,每个张量的形状为`[16,16,10]`。
- en: In the second convolutional layer, we choose the filters to be 3 × 3 and they
    now have depth 10, to match the number of channels in the previous layer. Since
    there are 20 filters in this layer, this gives a total number of parameters
    (weights) of (3 × 3 × 10 + 1) × 20 = 1,820\. Again, we use `strides = 2` and
    `padding = "same"`, so the width and height both halve. This gives us an overall
    output shape of `(None, 8, 8, 20)`.
+ id: totrans-234
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 在第二个卷积层中,我们选择滤波器为3×3,它们现在的深度为10,以匹配前一层中的通道数。由于这一层中有20个滤波器,这给出了总参数(权重)数量为(3×3×10+1)×20=1,820。同样,我们使用`strides=2`和`padding="same"`,所以宽度和高度都减半。这给出了一个总体输出形状为`(None,
    8, 8, 20)`。
- en: We now flatten the tensor using the Keras `Flatten` layer. This results in
    a set of 8 × 8 × 20 = 1,280 units. Note that there are no parameters to learn
    in a `Flatten` layer as the operation is just a restructuring of the tensor.
+ id: totrans-235
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 现在我们使用Keras的`Flatten`层展平张量。这会产生一组8×8×20=1,280个单元。请注意,在`Flatten`层中没有需要学习的参数,因为该操作只是对张量进行重组。
- en: We finally connect these units to a 10-unit `Dense` layer with softmax activation,
    which represents the probability of each category in a 10-category classification
    task. This creates an extra (1,280 × 10) + 10 = 12,810 parameters (weights)
    to learn, where the extra 10 are the bias terms (see the sketch after this list
    for a numerical check of these counts).
+ id: totrans-236
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
+ zh: 最后,我们将这些单元连接到一个具有softmax激活函数的10单元`Dense`层,表示10类分类任务中每个类别的概率。这会创建额外的(1,280×10)+10=12,810个参数(权重)需要学习,其中额外的10个是偏置项(这些参数数量可在本列表后的示意代码中进行数值验证)。
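- en: 'A hypothetical sketch to check those parameter counts numerically (the helper
    function is ours, not from the book):'
  prefs: []
  type: TYPE_NORMAL
- en: |-
    def conv2d_params(kernel_h, kernel_w, channels_in, filters):
        # One bias term per filter, hence the + 1
        return (kernel_h * kernel_w * channels_in + 1) * filters

    print(conv2d_params(4, 4, 3, 10))   # 490: first Conv2D layer
    print(conv2d_params(3, 3, 10, 20))  # 1820: second Conv2D layer
    print(8 * 8 * 20 * 10 + 10)         # 12810: Dense layer (+10 bias terms)
  prefs: []
  type: TYPE_PRE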
- en: 'This example demonstrates how we can chain convolutional layers together to
    create a convolutional neural network. Before we see how this compares in accuracy
    to our densely connected neural network, we’ll examine two more techniques that
    can also improve performance: batch normalization and dropout.'
+ id: totrans-237
  prefs: []
  type: TYPE_NORMAL
+ zh: 这个例子演示了如何将卷积层链接在一起创建卷积神经网络。在我们看到这与我们密集连接的神经网络在准确性上的比较之前,我们将研究另外两种也可以提高性能的技术:批量归一化和dropout。
- en: Batch Normalization
+ id: totrans-238
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 批量归一化
- en: One common problem when training a deep neural network is ensuring that the
    weights of the network remain within a reasonable range of values—if they start
    to become too large, this is a sign that your network is suffering from what
    is known as the *exploding gradient* problem. As errors are propagated backward
    through the network, the calculation of the gradient in the earlier layers can
    sometimes grow exponentially large, causing wild fluctuations in the weight
    values.
+ id: totrans-239
  prefs: []
  type: TYPE_NORMAL
+ zh: 训练深度神经网络时的一个常见问题是确保网络的权重保持在合理的数值范围内——如果它们开始变得过大,这表明您的网络正在遭受所谓的*梯度爆炸*问题。当误差向后传播通过网络时,早期层中梯度的计算有时可能会呈指数增长,导致权重值出现剧烈波动。
- en: Warning
+ id: totrans-240
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 警告
- en: If your loss function starts to return `NaN`, chances are that your weights
    have grown large enough to cause an overflow error.
+ id: totrans-241
  prefs: []
  type: TYPE_NORMAL
+ zh: 如果您的损失函数开始返回`NaN`,那么很有可能是您的权重已经变得足够大,导致溢出错误。
- en: This doesn’t necessarily happen immediately as you start training the network.
    Sometimes it can be happily training for hours when suddenly the loss function
    returns `NaN` and your network has exploded. This can be incredibly annoying.
    To prevent it from happening, you need to understand the root cause of the exploding
    gradient problem.
+ id: totrans-242
  prefs: []
  type: TYPE_NORMAL
+ zh: 这并不一定会在您开始训练网络时立即发生。有时网络可能愉快地训练了几个小时,损失函数突然返回`NaN`,您的网络就爆炸了。这可能非常恼人。为了防止这种情况发生,您需要了解梯度爆炸问题的根本原因。
- en: Covariate shift
+ id: totrans-243
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 协变量转移
- en: One of the reasons for scaling input data to a neural network is to ensure
    a stable start to training over the first few iterations. Since the weights
    of the network are initially randomized, unscaled input could potentially create
    huge activation values that immediately lead to exploding gradients. For example,
    instead of passing pixel values from 0–255 into the input layer, we usually
    scale these values to between –1 and 1.
+ id: totrans-244
  prefs: []
  type: TYPE_NORMAL
+ zh: 将输入数据缩放后再输入神经网络的原因之一,是确保训练在最初几次迭代中稳定开始。由于网络的权重最初是随机化的,未缩放的输入可能会产生巨大的激活值,立即导致梯度爆炸。例如,我们通常不会将0-255的像素值直接传入输入层,而是将这些值缩放到-1到1之间。
- en: Because the input is scaled, it’s natural to expect the activations from all
    future layers to be relatively well scaled as well. Initially this may be true,
    but as the network trains and the weights move further away from their random
    initial values, this assumption can start to break down. This phenomenon is
    known as *covariate shift*.
+ id: totrans-245
  prefs: []
  type: TYPE_NORMAL
+ zh: 因为输入经过了缩放,我们自然期望所有后续层的激活值也相对良好地缩放。最初这可能是正确的,但随着网络的训练,权重逐渐远离其随机初始值,这个假设可能开始失效。这种现象被称为*协变量转移*。
- en: Covariate Shift Analogy
+ id: totrans-246
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: 协变量转移类比
- en: Imagine you’re carrying a tall pile of books, and you get hit by a gust of
    wind. You move the books in a direction opposite to the wind to compensate,
    but as you do so, some of the books shift, so that the tower is slightly more
    unstable than before. Initially, this is OK, but with every gust the pile becomes
    more and more unstable, until eventually the books have shifted so much that
    the pile collapses. This is covariate shift.
+ id: totrans-247
  prefs: []
  type: TYPE_NORMAL
+ zh: 想象一下,你正拿着一摞高高的书,突然被一阵风吹袭。你将书向与风相反的方向移动以补偿,但在这样做的过程中,一些书会移动,使得整摞书比以前稍微不稳定。最初,这没关系,但随着每阵风,这摞书变得越来越不稳定,直到最终书移动得太多,整摞书倒塌。这就是协变量转移。
- en: Relating this to neural networks, each layer is like a book in the pile. To
    remain stable, when the network updates the weights, each layer implicitly assumes
    that the distribution of its input from the layer beneath is approximately consistent
    across iterations. However, since there is nothing to stop any of the activation
    distributions shifting significantly in a certain direction, this can sometimes
    lead to runaway weight values and an overall collapse of the network.
+ id: totrans-248
  prefs: []
  type: TYPE_NORMAL
+ zh: 将这与神经网络联系起来,每一层就像书堆中的一本书。为了保持稳定,当网络更新权重时,每一层都隐含地假设其来自下一层的输入分布在迭代中大致保持一致。然而,由于没有任何东西可以阻止任何激活分布在某个方向上发生显著变化,这有时会导致权重值失控和网络整体崩溃。
- en: Training using batch normalization
+ id: totrans-249
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 使用批量归一化进行训练
@@ -1243,41 +1770,57 @@
- en: '*Batch normalization* is a technique that drastically reduces this problem.
    The solution is surprisingly simple. During training, a batch normalization
    layer calculates the mean and standard deviation of each of its input channels
    across the batch and normalizes by subtracting the mean and dividing by the
    standard deviation. There are then two learned parameters for each channel,
    the scale (gamma) and shift (beta). The output is simply the normalized input,
    scaled by gamma and shifted by beta. [Figure 2-14](#batch_norm) shows the whole
    process.'
+ id: totrans-250
  prefs: []
  type: TYPE_NORMAL
+ zh: '*批量归一化*是一种极大缓解这个问题的技术。解决方案出奇地简单。在训练期间,批量归一化层计算每个输入通道在批次上的均值和标准差,并通过减去均值再除以标准差来进行归一化。然后,每个通道有两个学习参数,即缩放(gamma)和移位(beta)。输出只是归一化后的输入,由gamma缩放并由beta移位。[图2-14](#batch_norm)展示了整个过程。'
- en: '![](Images/gdl2_0214.png)'
+ id: totrans-251
  prefs: []
  type: TYPE_IMG
+ zh: '![](Images/gdl2_0214.png)'
- en: 'Figure 2-14\. The batch normalization process (source: [Ioffe and Szegedy,
    2015](https://arxiv.org/abs/1502.03167))^([6](ch02.xhtml#idm45387025136368))'
+ id: totrans-252
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 图2-14。批量归一化过程(来源:[Ioffe and Szegedy, 2015](https://arxiv.org/abs/1502.03167))^([6](ch02.xhtml#idm45387025136368))
- en: We can place batch normalization layers after dense or convolutional layers
    to normalize the output.
+ id: totrans-253
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们可以在密集层或卷积层之后放置批量归一化层来归一化输出。
- en: Tip
+ id: totrans-254
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 提示
- en: Referring to our previous example, it’s a bit like connecting the layers of
    books with small sets of adjustable springs that ensure there aren’t any overall
    huge shifts in their positions over time.
+ id: totrans-255
  prefs: []
  type: TYPE_NORMAL
+ zh: 参考我们之前的例子,这有点像用一小组可调节弹簧连接书层,以确保它们的位置随时间不会发生明显的整体移动。
- en: Prediction using batch normalization
+ id: totrans-256
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 使用批量归一化进行预测
- en: You might be wondering how this layer works at prediction time. When it comes
    to prediction, we may only want to predict a single observation, so there is
    no *batch* over which to calculate the mean and standard deviation. To get around
    this problem, during training a batch normalization layer also calculates the
    moving average of the mean and standard deviation of each channel and stores
    this value as part of the layer to use at test time.
+ id: totrans-257
  prefs: []
  type: TYPE_NORMAL
+ zh: 您可能想知道这个层在预测时是如何工作的。在预测时,我们可能只想预测单个观测值,因此没有*批次*可以计算平均值和标准差。为了解决这个问题,在训练期间,批量归一化层还会计算每个通道的平均值和标准差的移动平均值,并将这个值作为该层的一部分存储起来,以便在测试时使用。
@@ -1286,27 +1829,39 @@
- en: 'How many parameters are contained within a batch normalization layer? For
    every channel in the preceding layer, two weights need to be learned: the scale
    (gamma) and shift (beta). These are the *trainable* parameters. The moving average
    and standard deviation of each channel also need to be calculated, but since
    they are derived from the data passing through the layer rather than trained
    through backpropagation, they are called *nontrainable* parameters. In total,
    this gives four parameters for each channel in the preceding layer, where two
    are trainable and two are nontrainable.'
+ id: totrans-258
  prefs: []
  type: TYPE_NORMAL
+ zh: 批归一化层中包含多少参数?对于前一层中的每个通道,需要学习两个权重:比例(gamma)和偏移(beta)。这些是*可训练*参数。每个通道的移动平均值和标准差也需要进行计算,但由于它们是从通过该层的数据派生而来,而不是通过反向传播进行训练,因此被称为*不可训练*参数。总共,这为前一层中的每个通道提供了四个参数,其中两个是可训练的,两个是不可训练的。
- en: In Keras, the `BatchNormalization` layer implements the batch normalization
    functionality, as shown in [Example 2-14](#batchnorm-layer).
+ id: totrans-259
  prefs: []
  type: TYPE_NORMAL
+ zh: 在Keras中,`BatchNormalization`层实现了批归一化功能,如[示例2-14](#batchnorm-layer)所示。
- en: Example 2-14\. A `BatchNormalization` layer in Keras
+ id: totrans-260
  prefs:
  - PREF_H5
  type: TYPE_NORMAL
+ zh: 示例2-14\. Keras中的`BatchNormalization`层
- en: '[PRE13]'
+ id: totrans-261
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE13]'
- en: The `momentum` parameter is the weight given to the previous value when calculating
    the moving average and moving standard deviation.
+ id: totrans-262
  prefs: []
  type: TYPE_NORMAL
+ zh: 在计算移动平均值和移动标准差时,`momentum`参数是给予先前值的权重。
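- en: 'As a rough numerical check of the parameter arithmetic above (a hypothetical
    sketch, assuming TensorFlow/Keras; the momentum value and channel count are
    assumptions):'
  prefs: []
  type: TYPE_NORMAL
- en: |-
    from tensorflow.keras import layers, models

    # With 64 input channels we expect 2 x 64 = 128 trainable parameters
    # (gamma and beta) plus 128 non-trainable ones (the moving statistics)
    model = models.Sequential([
        layers.BatchNormalization(momentum=0.9),
    ])
    model.build(input_shape=(None, 32, 32, 64))
    model.summary()  # reports 128 trainable and 128 non-trainable parameters
  prefs: []
  type: TYPE_PRE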
- en: Dropout
+ id: totrans-263
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: Dropout
- en: When studying for an exam, it is common practice for students to use past papers
    and sample questions to improve their knowledge of the subject material. Some
    students try to memorize the answers to these questions, but then come unstuck
    in the exam because they haven’t truly understood the subject matter. The best
    students use the practice material to further their general understanding, so
    that they are still able to answer correctly when faced with new questions that
    they haven’t seen before.
+ id: totrans-264
  prefs: []
  type: TYPE_NORMAL
+ zh: 在备考考试时,学生通常会使用过去的试卷和样题来提高对学科材料的了解。一些学生试图记住这些问题的答案,但在考试中却因为没有真正理解学科内容而失败。最好的学生利用练习材料来进一步提高他们对学科的整体理解,这样当面对以前没有见过的新问题时,他们仍然能够正确回答。
- en: The same principle holds for machine learning. Any successful machine learning
    algorithm must ensure that it generalizes to unseen data, rather than simply
    *remembering* the training dataset. If an algorithm performs well on the training
    dataset, but not the test dataset, we say that it is suffering from *overfitting*.
    To counteract this problem, we use *regularization* techniques, which ensure
    that the model is penalized if it starts to overfit.
+ id: totrans-265
  prefs: []
  type: TYPE_NORMAL
+ zh: 相同的原则适用于机器学习。任何成功的机器学习算法必须确保它能泛化到未见过的数据,而不仅仅是*记住*训练数据集。如果一个算法在训练数据集上表现良好,但在测试数据集上表现不佳,我们称其为*过拟合*。为了解决这个问题,我们使用*正则化*技术,确保模型在开始过拟合时受到惩罚。
- en: There are many ways to regularize a machine learning algorithm, but for deep
    learning, one of the most common is by using *dropout* layers. This idea was
    introduced by Hinton et al. in 2012^([7](ch02.xhtml#idm45387025089232)) and
    presented in a 2014 paper by Srivastava et al.^([8](ch02.xhtml#idm45387025086976))
+ id: totrans-266
  prefs: []
  type: TYPE_NORMAL
+ zh: 有许多方法可以对机器学习算法进行正则化,但对于深度学习来说,最常见的一种方法是使用*dropout*层。这个想法由Hinton等人在2012年提出^([7](ch02.xhtml#idm45387025089232)),并在2014年由Srivastava等人发表的一篇论文中呈现^([8](ch02.xhtml#idm45387025086976))
- en: Dropout layers are very simple. During training, each dropout layer chooses
    a random set of units from the preceding layer and sets their output to 0, as
    shown in [Figure 2-15](#dropout).
+ id: totrans-267
  prefs: []
  type: TYPE_NORMAL
+ zh: Dropout层非常简单。在训练期间,每个dropout层从前一层中选择一组随机单元,并将它们的输出设置为0,如[图2-15](#dropout)所示。
- en: Incredibly, this simple addition drastically reduces overfitting by ensuring
    that the network doesn’t become overdependent on certain units or groups of
    units that, in effect, just remember observations from the training set. If
    we use dropout layers, the network cannot rely too much on any one unit and
    therefore knowledge is more evenly spread across the whole network.
+ id: totrans-268
  prefs: []
  type: TYPE_NORMAL
+ zh: 令人难以置信的是,这个简单的添加通过确保网络不会过度依赖某些单元或单元组而大大减少了过拟合,这些单元或单元组实际上只是记住了训练集中的观察结果。如果我们使用dropout层,网络就不能太依赖任何一个单元,因此知识更均匀地分布在整个网络中。
- en: '![](Images/gdl2_0215.png)'
+ id: totrans-269
  prefs: []
  type: TYPE_IMG
+ zh: '![](Images/gdl2_0215.png)'
- en: Figure 2-15\. A dropout layer
+ id: totrans-270
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 图2-15\. 一个dropout层
- en: This makes the model much better at generalizing to unseen data, because the
    network has been trained to produce accurate predictions even under unfamiliar
    conditions, such as those caused by dropping random units. There are no weights
    to learn within a dropout layer, as the units to drop are decided stochastically.
    At prediction time, the dropout layer doesn’t drop any units, so that the full
    network is used to make predictions.
+ id: totrans-271
  prefs: []
  type: TYPE_NORMAL
+ zh: 这使得模型在泛化到未见过的数据时表现更好,因为网络已经过训练,即使在由于丢弃随机单元而产生的陌生条件下,也能产生准确的预测。在dropout层内没有需要学习的权重,因为要丢弃的单元是随机决定的。在预测时,dropout层不会丢弃任何单元,因此整个网络用于进行预测。
- en: Dropout Analogy
+ id: totrans-272
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: Dropout类比
@@ -1368,190 +1941,285 @@
- en: Returning to our analogy, it’s a bit like a math student practicing past papers
    with a random selection of key formulae missing from their formula book. This
    way, they learn how to answer questions through an understanding of the core
    principles, rather than always looking up the formulae in the same places in
    the book. When it comes to test time, they will find it much easier to answer
    questions that they have never seen before, due to their ability to generalize
    beyond the training material.
+ id: totrans-273
  prefs: []
  type: TYPE_NORMAL
+ zh: 回到我们的类比,这有点像数学学生练习过去试卷,其中随机选择了公式书中缺失的关键公式。通过这种方式,他们学会了通过对核心原则的理解来回答问题,而不是总是在书中相同的地方查找公式。当考试时,他们会发现更容易回答以前从未见过的问题,因为他们能够超越训练材料进行泛化。
- en: The `Dropout` layer in Keras implements this functionality, with the `rate`
    parameter specifying the proportion of units to drop from the preceding layer,
    as shown in [Example 2-15](#dropout-layer).
+ id: totrans-274
  prefs: []
  type: TYPE_NORMAL
+ zh: Keras中的`Dropout`层实现了这种功能,`rate`参数指定了要从前一层中丢弃的单元的比例,如[示例2-15](#dropout-layer)所示。
- en: Example 2-15\. A `Dropout` layer in Keras
+ id: totrans-275
  prefs:
  - PREF_H5
  type: TYPE_NORMAL
+ zh: 示例2-15\. Keras中的`Dropout`层
- en: '[PRE14]'
+ id: totrans-276
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE14]'
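- en: '[PRE14] is a placeholder for the listing; the layer usage it illustrates is
    a one-liner along these lines (the rate value here is an assumption, not taken
    from the original listing):'
  prefs: []
  type: TYPE_NORMAL
- en: |-
    from tensorflow.keras import layers

    # During training, randomly set 25% of the incoming units to 0
    dropout_layer = layers.Dropout(rate=0.25)
  prefs: []
  type: TYPE_PRE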
- en: Dropout layers are used most commonly after dense layers since these are the
    most prone to overfitting due to the higher number of weights, though you can
    also use them after convolutional layers.
+ id: totrans-277
  prefs: []
  type: TYPE_NORMAL
+ zh: 由于密集层的权重数量较多,最容易过拟合,因此通常在密集层之后使用Dropout层,尽管也可以在卷积层之后使用。
- en: Tip
+ id: totrans-278
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 提示
- en: Batch normalization also has been shown to reduce overfitting, and therefore
    many modern deep learning architectures don’t use dropout at all, relying solely
    on batch normalization for regularization. As with most deep learning principles,
    there is no golden rule that applies in every situation—the only way to know
    for sure what’s best is to test different architectures and see which performs
    best on a holdout set of data.
+ id: totrans-279
  prefs: []
  type: TYPE_NORMAL
+ zh: 批量归一化也被证明可以减少过拟合,因此许多现代深度学习架构根本不使用dropout,完全依赖批量归一化进行正则化。与大多数深度学习原则一样,没有适用于每种情况的黄金法则,唯一确定最佳方法的方式是测试不同的架构,看看哪种在保留数据集上表现最好。
- en: Building the CNN
+ id: totrans-280
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 构建CNN
- en: 'You’ve now seen three new Keras layer types: `Conv2D`, `BatchNormalization`,
    and `Dropout`. Let’s put these pieces together into a CNN model and see how
    it performs on the CIFAR-10 dataset.'
+ id: totrans-281
  prefs: []
  type: TYPE_NORMAL
+ zh: 您现在已经看到了三种新的Keras层类型:`Conv2D`、`BatchNormalization`和`Dropout`。让我们将这些部分组合成一个CNN模型,并看看它在CIFAR-10数据集上的表现。
- en: Running the Code for This Example
+ id: totrans-282
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: 运行此示例的代码
- en: You can run the following example in the Jupyter notebook in the book repository
    called *notebooks/02_deeplearning/02_cnn/cnn.ipynb*.
+ id: totrans-283
  prefs: []
  type: TYPE_NORMAL
+ zh: 您可以在书籍存储库中名为*notebooks/02_deeplearning/02_cnn/cnn.ipynb*的Jupyter笔记本中运行以下示例。
- en: The model architecture we shall test is shown in [Example 2-16](#conv-network-2).
+ id: totrans-284
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们将测试的模型架构显示在[示例2-16](#conv-network-2)中。
- en: Example 2-16\. Code to build a CNN model using Keras
+ id: totrans-285
  prefs:
  - PREF_H5
  type: TYPE_NORMAL
+ zh: 示例2-16\. 使用Keras构建CNN模型的代码
- en: '[PRE15]'
+ id: totrans-286
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE15]'
- en: We use four stacked `Conv2D` layers, each followed by a `BatchNormalization`
    and a `LeakyReLU` layer. After flattening the resulting tensor, we pass the
    data through a `Dense` layer of size 128, again followed by a `BatchNormalization`
    and a `LeakyReLU` layer. This is immediately followed by a `Dropout` layer for
    regularization, and the network is concluded with an output `Dense` layer of
    size 10.
+ id: totrans-287
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们使用四个堆叠的`Conv2D`层,每个后面跟一个`BatchNormalization`和一个`LeakyReLU`层。在展平结果张量后,我们通过一个大小为128的`Dense`层,再次跟一个`BatchNormalization`和一个`LeakyReLU`层。紧接着是一个用于正则化的`Dropout`层,网络最后是一个大小为10的输出`Dense`层。
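- en: 'The listing itself appears here only as the placeholder [PRE15]. A sketch
    consistent with this description and with the shapes in [Table 2-3](#cnn_model_summary)
    follows; the kernel sizes and strides are inferred from the table, while the
    dropout rate is an assumption, not taken from the original listing:'
  prefs: []
  type: TYPE_NORMAL
- en: |-
    from tensorflow.keras import layers, models

    input_layer = layers.Input(shape=(32, 32, 3))

    # Four Conv2D layers, each followed by BatchNormalization and LeakyReLU;
    # the second and fourth use strides=2 to halve the spatial dimensions
    x = layers.Conv2D(32, kernel_size=3, strides=1, padding="same")(input_layer)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)

    x = layers.Conv2D(32, kernel_size=3, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)

    x = layers.Conv2D(64, kernel_size=3, strides=1, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)

    x = layers.Conv2D(64, kernel_size=3, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)

    x = layers.Flatten()(x)

    x = layers.Dense(128)(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    x = layers.Dropout(rate=0.5)(x)

    output_layer = layers.Dense(10, activation="softmax")(x)
    model = models.Model(input_layer, output_layer)
  prefs: []
  type: TYPE_PRE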
- en: Tip
+ id: totrans-288
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 提示
- en: The order in which to use the batch normalization and activation layers is
    a matter of preference. Usually batch normalization layers are placed before
    the activation, but some successful architectures use these layers the other
    way around. If you do choose to use batch normalization before activation, you
    can remember the order using the acronym *BAD* (batch normalization, activation,
    then dropout)!
+ id: totrans-289
  prefs: []
  type: TYPE_NORMAL
+ zh: 批量归一化层和激活层的使用顺序是个人偏好的问题。通常批量归一化层放在激活层之前,但一些成功的架构会反过来使用这些层。如果选择在激活之前使用批量归一化,可以使用缩写*BAD*(批量归一化,激活,然后是dropout)来记住顺序!
- en: The model summary is shown in [Table 2-3](#cnn_model_summary).
+ id: totrans-290
  prefs: []
  type: TYPE_NORMAL
+ zh: 模型摘要显示在[表2-3](#cnn_model_summary)中。
- en: Table 2-3\. Model summary of the CNN for CIFAR-10
+ id: totrans-291
  prefs: []
  type: TYPE_NORMAL
+ zh: 表2-3\. CIFAR-10的CNN模型摘要
- en: '| Layer (type) | Output shape | Param # |'
+ id: totrans-292
  prefs: []
  type: TYPE_TB
+ zh: '| 层(类型) | 输出形状 | 参数数量 |'
- en: '| --- | --- | --- |'
+ id: totrans-293
  prefs: []
  type: TYPE_TB
+ zh: '| --- | --- | --- |'
- en: '| InputLayer | (None, 32, 32, 3) | 0 |'
+ id: totrans-294
  prefs: []
  type: TYPE_TB
+ zh: '| InputLayer | (None, 32, 32, 3) | 0 |'
- en: '| Conv2D | (None, 32, 32, 32) | 896 |'
+ id: totrans-295
  prefs: []
  type: TYPE_TB
+ zh: '| Conv2D | (None, 32, 32, 32) | 896 |'
- en: '| BatchNormalization | (None, 32, 32, 32) | 128 |'
+ id: totrans-296
  prefs: []
  type: TYPE_TB
+ zh: '| BatchNormalization | (None, 32, 32, 32) | 128 |'
- en: '| LeakyReLU | (None, 32, 32, 32) | 0 |'
+ id: totrans-297
  prefs: []
  type: TYPE_TB
+ zh: '| LeakyReLU | (None, 32, 32, 32) | 0 |'
- en: '| Conv2D | (None, 16, 16, 32) | 9,248 |'
+ id: totrans-298
  prefs: []
  type: TYPE_TB
+ zh: '| Conv2D | (None, 16, 16, 32) | 9,248 |'
- en: '| BatchNormalization | (None, 16, 16, 32) | 128 |'
+ id: totrans-299
  prefs: []
  type: TYPE_TB
+ zh: '| BatchNormalization | (None, 16, 16, 32) | 128 |'
- en: '| LeakyReLU | (None, 16, 16, 32) | 0 |'
+ id: totrans-300
  prefs: []
  type: TYPE_TB
+ zh: '| LeakyReLU | (None, 16, 16, 32) | 0 |'
- en: '| Conv2D | (None, 16, 16, 64) | 18,496 |'
+ id: totrans-301
  prefs: []
  type: TYPE_TB
+ zh: '| Conv2D | (None, 16, 16, 64) | 18,496 |'
- en: '| BatchNormalization | (None, 16, 16, 64) | 256 |'
+ id: totrans-302
  prefs: []
  type: TYPE_TB
+ zh: '| BatchNormalization | (None, 16, 16, 64) | 256 |'
- en: '| LeakyReLU | (None, 16, 16, 64) | 0 |'
+ id: totrans-303
  prefs: []
  type: TYPE_TB
+ zh: '| LeakyReLU | (None, 16, 16, 64) | 0 |'
- en: '| Conv2D | (None, 8, 8, 64) | 36,928 |'
+ id: totrans-304
  prefs: []
  type: TYPE_TB
+ zh: '| Conv2D | (None, 8, 8, 64) | 36,928 |'
- en: '| BatchNormalization | (None, 8, 8, 64) | 256 |'
+ id: totrans-305
  prefs: []
  type: TYPE_TB
+ zh: '| BatchNormalization | (None, 8, 8, 64) | 256 |'
- en: '| LeakyReLU | (None, 8, 8, 64) | 0 |'
+ id: totrans-306
  prefs: []
  type: TYPE_TB
+ zh: '| LeakyReLU | (None, 8, 8, 64) | 0 |'
- en: '| Flatten | (None, 4096) | 0 |'
+ id: totrans-307
  prefs: []
  type: TYPE_TB
+ zh: '| Flatten | (None, 4096) | 0 |'
- en: '| Dense | (None, 128) | 524,416 |'
+ id: totrans-308
  prefs: []
  type: TYPE_TB
+ zh: '| Dense | (None, 128) | 524,416 |'
- en: '| BatchNormalization | (None, 128) | 512 |'
+ id: totrans-309
  prefs: []
  type: TYPE_TB
+ zh: '| BatchNormalization | (None, 128) | 512 |'
- en: '| LeakyReLU | (None, 128) | 0 |'
+ id: totrans-310
  prefs: []
  type: TYPE_TB
+ zh: '| LeakyReLU | (None, 128) | 0 |'
- en: '| Dropout | (None, 128) | 0 |'
+ id: totrans-311
  prefs: []
  type: TYPE_TB
+ zh: '| Dropout | (None, 128) | 0 |'
- en: '| Dense | (None, 10) | 1,290 |'
+ id: totrans-312
  prefs: []
  type: TYPE_TB
+ zh: '| Dense | (None, 10) | 1,290 |'
- en: '| Total params | 592,554 |'
+ id: totrans-313
  prefs: []
  type: TYPE_TB
+ zh: '| 总参数 | 592,554 |'
- en: '| Trainable params | 591,914 |'
+ id: totrans-314
  prefs: []
  type: TYPE_TB
+ zh: '| 可训练参数 | 591,914 |'
- en: '| Non-trainable params | 640 |'
+ id: totrans-315
  prefs: []
  type: TYPE_TB
+ zh: '| 不可训练参数 | 640 |'
- en: Tip
+ id: totrans-316
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 提示
- en: Before moving on, make sure you are able to calculate the output shape and
    number of parameters for each layer by hand. It’s a good exercise to prove to
    yourself that you have fully understood how each layer is constructed and how
    it is connected to the preceding layer! Don’t forget to include the bias weights
    that are included as part of the `Conv2D` and `Dense` layers.
+ id: totrans-317
  prefs: []
  type: TYPE_NORMAL
- en: Training and Evaluating the CNN
+ id: totrans-318
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
- en: We compile and train the model in exactly the same way as before and call the
    `evaluate` method to determine its accuracy on the holdout set ([Figure 2-16](#cnn_model_evaluate)).
+ id: totrans-319
  prefs: []
  type: TYPE_NORMAL
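- en: 'A sketch of those steps follows; the optimizer settings, batch size, and epoch
    count are assumptions, and `x_train`, `y_train`, `x_test`, and `y_test` are
    taken to be the preprocessed CIFAR-10 splits used earlier in the chapter:'
  prefs: []
  type: TYPE_NORMAL
- en: |-
    from tensorflow.keras import optimizers

    # Compile with a categorical loss since the labels are one-hot encoded
    model.compile(
        loss="categorical_crossentropy",
        optimizer=optimizers.Adam(learning_rate=0.0005),
        metrics=["accuracy"],
    )
    model.fit(x_train, y_train, batch_size=32, epochs=10, shuffle=True)

    # Accuracy on the holdout set
    model.evaluate(x_test, y_test)
  prefs: []
  type: TYPE_PRE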
- en: '![](Images/gdl2_0216.png)'
+ id: totrans-320
  prefs: []
  type: TYPE_IMG
- en: Figure 2-16\. CNN performance
+ id: totrans-321
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
- en: As you can see, this model is now achieving 71.5% accuracy, up from 49.0% previously.
    Much better! [Figure 2-17](#cnn_preds) shows some predictions from our new convolutional
    model.
+ id: totrans-322
  prefs: []
  type: TYPE_NORMAL
@@ -1563,16 +2231,20 @@
- en: This improvement has been achieved simply by changing the architecture of the
    model to include convolutional, batch normalization, and dropout layers. When
    building generative models, it becomes even more important to understand the
    inner workings of your model since it is the middle layers of your network that
    capture the high-level features that you are most interested in.
+ id: totrans-323
  prefs: []
  type: TYPE_NORMAL
- en: '![](Images/gdl2_0217.png)'
+ id: totrans-324
  prefs: []
  type: TYPE_IMG
- en: Figure 2-17\. CNN predictions
+ id: totrans-325
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
- en: Summary
+ id: totrans-326
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
@@ -1582,6 +2254,7 @@
- en: In this chapter, we covered the core building blocks of deep learning and built
    a multilayer perceptron (MLP) to predict the label of images
    from the CIFAR-10 dataset. Then, we improved upon this architecture by introducing
    convolutional, batch normalization, and dropout layers to create a convolutional
    neural network (CNN).
+ id: totrans-327
  prefs: []
  type: TYPE_NORMAL
@@ -1591,43 +2264,57 @@
- en: A really important point to take away from this chapter is that deep neural
    networks are completely flexible by design, and there really are no fixed rules
    when it comes to the model architecture. There are guidelines and best practices,
    but you should feel free to experiment with layers and the order in which they
    appear. Don’t feel constrained to only use the architectures that you have read
    about in this book or elsewhere! Like a child with a set of building blocks,
    the design of your neural network is only limited by your own imagination.
+ id: totrans-328
  prefs: []
  type: TYPE_NORMAL
- en: In the next chapter, we shall see how we can use these building blocks to design
    a network that can generate images.
+ id: totrans-329
  prefs: []
  type: TYPE_NORMAL
- en: ^([1](ch02.xhtml#idm45387028957520-marker)) Kaiming He et al., “Deep Residual
    Learning for Image Recognition,” December 10, 2015, [*https://arxiv.org/abs/1512.03385*](https://arxiv.org/abs/1512.03385).
+ id: totrans-330
  prefs: []
  type: TYPE_NORMAL
- en: ^([2](ch02.xhtml#idm45387033163216-marker)) Alex Krizhevsky, “Learning Multiple
    Layers of Features from Tiny Images,” April 8, 2009, [*https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf*](https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf).
+ id: totrans-331
  prefs: []
  type: TYPE_NORMAL
- en: '^([3](ch02.xhtml#idm45387032147088-marker)) Diederik Kingma and Jimmy Ba,
    “Adam: A Method for Stochastic Optimization,” December 22, 2014, [*https://arxiv.org/abs/1412.6980v8*](https://arxiv.org/abs/1412.6980v8).'
+ id: totrans-332
  prefs: []
  type: TYPE_NORMAL
- en: ^([4](ch02.xhtml#idm45387032068928-marker)) Samuel L. Smith et al., “Don’t
    Decay the Learning Rate, Increase the Batch Size,” November 1, 2017, [*https://arxiv.org/abs/1711.00489*](https://arxiv.org/abs/1711.00489).
+ id: totrans-333
  prefs: []
  type: TYPE_NORMAL
- en: ^([5](ch02.xhtml#idm45387031545152-marker)) Vincent Dumoulin and Francesco
    Visin, “A Guide to Convolution Arithmetic for Deep Learning,” January 12, 2018,
    [*https://arxiv.org/abs/1603.07285*](https://arxiv.org/abs/1603.07285).
+ id: totrans-334
  prefs: []
  type: TYPE_NORMAL
- en: '^([6](ch02.xhtml#idm45387025136368-marker)) Sergey Ioffe and Christian Szegedy,
    “Batch Normalization: Accelerating Deep Network Training by Reducing Internal
    Covariate Shift,” February 11, 2015, [*https://arxiv.org/abs/1502.03167*](https://arxiv.org/abs/1502.03167).'
+ id: totrans-335
  prefs: []
  type: TYPE_NORMAL
+ zh: ^([6](ch02.xhtml#idm45387025136368-marker)) Sergey Ioffe和Christian Szegedy,“批量归一化:通过减少内部协变量转移加速深度网络训练”,2015年2月11日,[*https://arxiv.org/abs/1502.03167*](https://arxiv.org/abs/1502.03167)。
- en: ^([7](ch02.xhtml#idm45387025089232-marker)) Geoffrey E. Hinton et al., “Improving
    Neural Networks by Preventing Co-Adaptation of Feature Detectors,” July 3, 2012,
    [*https://arxiv.org/abs/1207.0580*](https://arxiv.org/abs/1207.0580).
+ id: totrans-336
  prefs: []
  type: TYPE_NORMAL
+ zh: ^([7](ch02.xhtml#idm45387025089232-marker)) Geoffrey E. Hinton等人,“通过防止特征检测器的共同适应来改进神经网络”,2012年7月3日,[*https://arxiv.org/abs/1207.0580*](https://arxiv.org/abs/1207.0580)。
- en: '^([8](ch02.xhtml#idm45387025086976-marker)) Nitish Srivastava et al., “Dropout:
    A Simple Way to Prevent Neural Networks from Overfitting,” *Journal of Machine
    Learning Research* 15 (2014): 1929–1958, [*http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf*](http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf).'
+ id: totrans-337
  prefs: []
  type: TYPE_NORMAL
+ zh: '^([8](ch02.xhtml#idm45387025086976-marker)) Nitish Srivastava等人,“Dropout:防止神经网络过拟合的简单方法”,*机器学习研究杂志*
    15 (2014): 1929–1958,[*http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf*](http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf)。'