From 444ef71c5158bf483e100d08c6cdd4a3779470af Mon Sep 17 00:00:00 2001
From: wizardforcel <562826179@qq.com>
Date: Thu, 8 Feb 2024 18:58:20 +0800
Subject: [PATCH] 2024-02-08 18:58:18
---
totrans/gen-dl_04.yaml | 687 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 687 insertions(+)
diff --git a/totrans/gen-dl_04.yaml b/totrans/gen-dl_04.yaml
index 408551e..4c305d8 100644
--- a/totrans/gen-dl_04.yaml
+++ b/totrans/gen-dl_04.yaml
@@ -1,13 +1,16 @@
- en: Chapter 2\. Deep Learning
+ id: totrans-0
prefs:
- PREF_H1
type: TYPE_NORMAL
- en: 'Let’s start with a basic definition of deep learning:'
+ id: totrans-1
prefs: []
type: TYPE_NORMAL
- en: Deep learning is a class of machine learning algorithms that uses *multiple
stacked layers of processing units* to learn high-level representations from *unstructured*
data.
+ id: totrans-2
prefs:
- PREF_BQ
type: TYPE_NORMAL
@@ -17,9 +20,11 @@
building multiple stacked layers of processing units to solve classification tasks.
This will provide the foundation for future chapters where we focus on deep learning
for generative tasks.
+ id: totrans-3
prefs: []
type: TYPE_NORMAL
- en: Data for Deep Learning
+ id: totrans-4
prefs:
- PREF_H1
type: TYPE_NORMAL
@@ -32,6 +37,7 @@
to predict the binary response variable—did the person subscribe (1) or not (0)?
Here, each individual feature contains a nugget of information about the observation,
and the model would learn how these features interact to influence the response.
+ id: totrans-5
prefs: []
type: TYPE_NORMAL
- en: '*Unstructured* data refers to any data that is not naturally arranged into
@@ -39,12 +45,15 @@
structure to an image, temporal structure to a recording or passage of text, and
both spatial and temporal structure to video data, but since the data does not
arrive in columns of features, it is considered unstructured, as shown in [Figure 2-1](#structured_unstructured).'
+ id: totrans-6
prefs: []
type: TYPE_NORMAL
- en: '![](Images/gdl2_0201.png)'
+ id: totrans-7
prefs: []
type: TYPE_IMG
- en: Figure 2-1\. The difference between structured and unstructured data
+ id: totrans-8
prefs:
- PREF_H6
type: TYPE_NORMAL
@@ -53,6 +62,7 @@
is a muddy shade of brown doesn’t really help identify if the image is of a house
or a dog, and knowing that character 24 of a sentence is an *e* doesn’t help predict
if the text is about football or politics.
+ id: totrans-9
prefs: []
type: TYPE_NORMAL
- en: Pixels or characters are really just the dimples of the canvas into which higher-level
@@ -64,6 +74,7 @@
positions would provide this information. The granularity of the data combined
with the high degree of spatial dependence destroys the concept of the pixel or
character as an informative feature in its own right.
+ id: totrans-10
prefs: []
type: TYPE_NORMAL
- en: For this reason, if we train logistic regression, random forest, or XGBoost
@@ -72,6 +83,7 @@
to be informative and not spatially dependent. A deep learning model, on the other
hand, can learn how to build high-level informative features by itself, directly
from the unstructured data.
+ id: totrans-11
prefs: []
type: TYPE_NORMAL
- en: Deep learning can be applied to structured data, but its real power, especially
@@ -79,9 +91,11 @@
data. Most often, we want to generate unstructured data such as new images or
original strings of text, which is why deep learning has had such a profound impact
on the field of generative modeling.
+ id: totrans-12
prefs: []
type: TYPE_NORMAL
- en: Deep Neural Networks
+ id: totrans-13
prefs:
- PREF_H1
type: TYPE_NORMAL
@@ -90,49 +104,67 @@
this reason, *deep learning* has now almost become synonymous with *deep neural
networks*. However, any system that employs many layers to learn high-level representations
of the input data is also a form of deep learning (e.g., deep belief networks).
+ id: totrans-14
prefs: []
type: TYPE_NORMAL
+ zh: 大多数深度学习系统是具有多个堆叠隐藏层的*人工神经网络*(ANNs,或简称*神经网络*)。因此,*深度学习*现在几乎已经成为*深度神经网络*的同义词。然而,任何使用多层来学习输入数据高级表示的系统也是一种深度学习形式(例如,深度信念网络)。
- en: Let’s start by breaking down exactly what we mean by a neural network and then
see how they can be used to learn high-level features from unstructured data.
+ id: totrans-15
prefs: []
type: TYPE_NORMAL
+ zh: 让我们首先详细解释一下神经网络的含义,然后看看它们如何用于从非结构化数据中学习高级特征。
- en: What Is a Neural Network?
+ id: totrans-16
prefs:
- PREF_H2
type: TYPE_NORMAL
+ zh: 什么是神经网络?
- en: A neural network consists of a series of stacked *layers*. Each layer contains
*units* that are connected to the previous layer’s units through a set of *weights*.
As we shall see, there are many different types of layers, but one of the most
common is the *fully connected* (or *dense*) layer that connects all units in
the layer directly to every unit in the previous layer.
+ id: totrans-17
prefs: []
type: TYPE_NORMAL
+ zh: 神经网络由一系列堆叠的*层*组成。每一层包含通过一组*权重*连接到前一层单元的*单元*。正如我们将看到的,有许多不同类型的层,但其中最常见的是*全连接*(或*密集*)层,它将该层中的所有单元直接连接到前一层的每个单元。
- en: Neural networks where all adjacent layers are fully connected are called *multilayer
perceptrons* (MLPs). This is the first type of neural network that we will study.
An example of an MLP is shown in [Figure 2-2](#deep_learning_diagram).
+ id: totrans-18
prefs: []
type: TYPE_NORMAL
+ zh: 所有相邻层都是全连接的神经网络称为*多层感知器*(MLPs)。这是我们将要学习的第一种神经网络。[图2-2](#deep_learning_diagram)中显示了一个MLP的示例。
- en: '![](Images/gdl2_0202.png)'
+ id: totrans-19
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0202.png)'
- en: Figure 2-2\. An example of a multilayer perceptron that predicts if a face is
smiling
+ id: totrans-20
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图2-2。一个预测脸部是否微笑的多层感知器的示例
- en: The input (e.g., an image) is transformed by each layer in turn, in what is
known as a *forward pass* through the network, until it reaches the output layer.
Specifically, each unit applies a nonlinear transformation to a weighted sum of
its inputs and passes the output through to the subsequent layer. The final output
layer is the culmination of this process, where the single unit outputs a probability
that the original input belongs to a particular category (e.g., *smiling*).
+ id: totrans-21
prefs: []
type: TYPE_NORMAL
+ zh: 输入(例如,一张图像)依次通过网络中的每一层进行转换,直到达到输出层,这被称为网络的*前向传递*。具体来说,每个单元对其输入的加权和应用非线性变换,并将输出传递到后续层。最终的输出层是这个过程的结尾,单个单元输出一个概率,表明原始输入属于特定类别(例如,*微笑*)。
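As a minimal sketch of what a single unit computes during the forward pass (the input values, weights, and the choice of a sigmoid nonlinearity here are illustrative assumptions, not taken from the figure):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

inputs = np.array([0.2, -0.5, 0.1])   # outputs from the previous layer's units
weights = np.array([0.4, 0.8, -0.3])  # one weight per incoming connection
bias = 0.1

pre_activation = np.dot(weights, inputs) + bias  # weighted sum of the inputs
output = sigmoid(pre_activation)                 # nonlinear transformation passed on
```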
- en: The magic of deep neural networks lies in finding the set of weights for each
layer that results in the most accurate predictions. The process of finding these
weights is what we mean by *training* the network.
+ id: totrans-22
prefs: []
type: TYPE_NORMAL
+ zh: 深度神经网络的魔力在于找到每一层的权重集,以获得最准确的预测。找到这些权重的过程就是我们所说的*训练*网络。
- en: During the training process, batches of images are passed through the network
and the predicted outputs are compared to the ground truth. For example, the network
might output a probability of 80% for an image of someone who really is smiling
@@ -143,184 +175,258 @@
the prediction most significantly. This process is appropriately called *backpropagation*.
Gradually, each unit becomes skilled at identifying a particular feature that
ultimately helps the network to make better predictions.
+ id: totrans-23
prefs: []
type: TYPE_NORMAL
+ zh: 在训练过程中,一批图像通过网络传递,并将预测输出与真实值进行比较。例如,网络可能为一个真正微笑的人的图像输出80%的概率,为一个真正不微笑的人的图像输出23%的概率。对于这些示例,完美的预测将输出100%和0%,因此存在一定的误差。然后,预测中的误差通过网络向后传播,调整每组权重,使其朝着最显著改善预测的方向微调。这个过程被适当地称为*反向传播*。逐渐地,每个单元变得擅长识别一个特定的特征,最终帮助网络做出更好的预测。
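A toy sketch of a single gradient-descent weight update for one sigmoid unit with a squared-error loss (everything here is an illustrative simplification; Keras computes and applies these gradients automatically across all layers):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, -1.0, 0.3])     # inputs to the unit
w = np.array([0.01, -0.02, 0.03])  # current weights (initially small random values)
b, lr, y = 0.0, 0.1, 1.0           # bias, learning rate, ground truth

p = sigmoid(np.dot(w, x) + b)      # forward pass: predicted probability
delta = (p - y) * p * (1 - p)      # error signal at the unit for a squared-error loss
w -= lr * delta * x                # nudge each weight against its gradient
b -= lr * delta
```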
- en: Learning High-Level Features
+ id: totrans-24
prefs:
- PREF_H2
type: TYPE_NORMAL
+ zh: 学习高级特征
- en: The critical property that makes neural networks so powerful is their ability
to learn features from the input data, without human guidance. In other words,
we do not need to do any feature engineering, which is why neural networks are
so useful! We can let the model decide how it wants to arrange its weights, guided
only by its desire to minimize the error in its predictions.
+ id: totrans-25
prefs: []
type: TYPE_NORMAL
+ zh: 使神经网络如此强大的关键属性是它们能够从输入数据中学习特征,而无需人类指导。换句话说,我们不需要进行任何特征工程,这就是为什么神经网络如此有用!我们可以让模型决定如何安排其权重,只受其希望最小化预测误差的影响。
- en: 'For example, let’s walk through the network shown in [Figure 2-2](#deep_learning_diagram),
assuming it has already been trained to accurately predict if a given input face
is smiling:'
+ id: totrans-26
prefs: []
type: TYPE_NORMAL
+ zh: 例如,让我们来解释一下[图2-2](#deep_learning_diagram)中所示的网络,假设它已经被训练得可以准确预测给定输入脸部是否微笑:
- en: Unit A receives the value for an individual channel of an input pixel.
+ id: totrans-27
prefs:
- PREF_OL
type: TYPE_NORMAL
+ zh: 单元A接收输入像素的单个通道的值。
- en: Unit B combines its input values so that it fires strongest when a particular
low-level feature such as an edge is present.
+ id: totrans-28
prefs:
- PREF_OL
type: TYPE_NORMAL
+ zh: 单元B组合其输入值,使得当存在特定的低级特征,例如边缘时,它发射最强。
- en: Unit C combines the low-level features so that it fires strongest when a higher-level
    feature such as *teeth* is seen in the image.
+ id: totrans-29
prefs:
- PREF_OL
type: TYPE_NORMAL
+ zh: 单元C组合低级特征,使得当图像中看到高级特征,例如*牙齿*时,它发射最强。
- en: Unit D combines the high-level features so that it fires strongest when the
person in the original image is smiling.
+ id: totrans-30
prefs:
- PREF_OL
type: TYPE_NORMAL
+ zh: 单元D结合高级特征,使得当原始图像中的人在微笑时它发射最强。
- en: Units in each subsequent layer are able to represent increasingly sophisticated
aspects of the original input, by combining lower-level features from the previous
layer. Amazingly, this arises naturally out of the training process—we do not
need to *tell* each unit what to look for, or whether it should look for high-level
features or low-level features.
+ id: totrans-31
prefs: []
type: TYPE_NORMAL
+ zh: 每个后续层中的单元能够通过结合来自前一层的低级特征来表示原始输入的越来越复杂的方面。令人惊讶的是,这是训练过程中自然产生的——我们不需要*告诉*每个单元要寻找什么,或者它应该寻找高级特征还是低级特征。
- en: The layers between the input and output layers are called *hidden* layers. While
our example only has two hidden layers, deep neural networks can have many more.
Stacking large numbers of layers allows the neural network to learn progressively
higher-level features by gradually building up information from the lower-level
features in previous layers. For example, ResNet,^([1](ch02.xhtml#idm45387028957520))
designed for image recognition, contains 152 layers.
+ id: totrans-32
prefs: []
type: TYPE_NORMAL
+ zh: 输入层和输出层之间的层被称为*隐藏*层。虽然我们的例子只有两个隐藏层,但深度神经网络可以有更多层。堆叠大量层允许神经网络从先前层中的低级特征逐渐构建信息,从而学习越来越高级的特征。例如,用于图像识别的ResNet^([1](ch02.xhtml#idm45387028957520))包含152层。
- en: Next, we’ll dive straight into the practical side of deep learning and get set
up with TensorFlow and Keras so that you can start building your own deep neural
networks.
+ id: totrans-33
prefs: []
type: TYPE_NORMAL
+ zh: 接下来,我们将直接深入深度学习的实践方面,并使用TensorFlow和Keras进行设置,以便您可以开始构建自己的深度神经网络。
- en: TensorFlow and Keras
+ id: totrans-34
prefs:
- PREF_H2
type: TYPE_NORMAL
+ zh: TensorFlow和Keras
- en: '[*TensorFlow*](https://www.tensorflow.org) is an open source Python library
for machine learning, developed by Google. TensorFlow is one of the most utilized
frameworks for building machine learning solutions, with particular emphasis on
the manipulation of tensors (hence the name). It provides the low-level functionality
required to train neural networks, such as computing the gradient of arbitrary
differentiable expressions and efficiently executing tensor operations.'
+ id: totrans-35
prefs: []
type: TYPE_NORMAL
+ zh: '[*TensorFlow*](https://www.tensorflow.org)是由谷歌开发的用于机器学习的开源Python库。TensorFlow是构建机器学习解决方案中最常用的框架之一,特别强调张量的操作(因此得名)。它提供了训练神经网络所需的低级功能,例如计算任意可微表达式的梯度和高效执行张量操作。'
- en: '[*Keras*](https://keras.io) is a high-level API for building neural networks,
built on top of TensorFlow ([Figure 2-3](#tf_keras_logos)). It is extremely flexible
and very user-friendly, making it an ideal choice for getting started with deep
learning. Moreover, Keras provides numerous useful building blocks that can be
plugged together to create highly complex deep learning architectures through
its functional API.'
+ id: totrans-36
prefs: []
type: TYPE_NORMAL
+ zh: '[*Keras*](https://keras.io)是一个用于构建神经网络的高级API,构建在TensorFlow之上([图2-3](#tf_keras_logos))。它非常灵活和用户友好,是开始深度学习的理想选择。此外,Keras提供了许多有用的构建模块,可以通过其功能API组合在一起,创建高度复杂的深度学习架构。'
- en: '![](Images/gdl2_0203.png)'
+ id: totrans-37
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0203.png)'
- en: Figure 2-3\. TensorFlow and Keras are excellent tools for building deep learning
solutions
+ id: totrans-38
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图2-3\. TensorFlow和Keras是构建深度学习解决方案的优秀工具
- en: If you are just getting started with deep learning, I can highly recommend using
TensorFlow and Keras. This setup will allow you to build any network that you
can think of in a production environment, while also giving you an easy-to-learn
API that enables rapid development of new ideas and concepts. Let’s start by seeing
how easy it is to build a multilayer perceptron using Keras.
+ id: totrans-39
prefs: []
type: TYPE_NORMAL
+ zh: 如果您刚开始学习深度学习,我强烈推荐使用TensorFlow和Keras。这个设置将允许您在生产环境中构建任何您能想到的网络,同时还提供易于学习的API,可以快速开发新的想法和概念。让我们从看看使用Keras构建多层感知器有多容易开始。
- en: Multilayer Perceptron (MLP)
+ id: totrans-40
prefs:
- PREF_H1
type: TYPE_NORMAL
+ zh: 多层感知器(MLP)
- en: In this section, we will train an MLP to classify a given image using *supervised
learning*. Supervised learning is a type of machine learning algorithm in which
the computer is trained on a labeled dataset. In other words, the dataset used
for training includes input data with corresponding output labels. The goal of
the algorithm is to learn a mapping between the input data and the output labels,
so that it can make predictions on new, unseen data.
+ id: totrans-41
prefs: []
type: TYPE_NORMAL
+ zh: 在本节中,我们将使用*监督学习*训练一个MLP来对给定的图像进行分类。监督学习是一种机器学习算法,计算机在标记的数据集上进行训练。换句话说,用于训练的数据集包括带有相应输出标签的输入数据。算法的目标是学习输入数据和输出标签之间的映射,以便它可以对新的、未见过的数据进行预测。
- en: The MLP is a discriminative (rather than generative) model, but supervised learning
will still play a role in many types of generative models that we will explore
in later chapters of this book, so it is a good place to start our journey.
+ id: totrans-42
prefs: []
type: TYPE_NORMAL
+ zh: MLP是一种判别模型(而不是生成模型),但在本书后面的章节中,监督学习仍将在许多类型的生成模型中发挥作用,因此这是我们旅程的一个好起点。
- en: Running the Code for This Example
+ id: totrans-43
prefs:
- PREF_H1
type: TYPE_NORMAL
+ zh: 运行此示例的代码
- en: The code for this example can be found in the Jupyter notebook located at *notebooks/02_deeplearning/01_mlp/mlp.ipynb*
in the book repository.
+ id: totrans-44
prefs: []
type: TYPE_NORMAL
+ zh: 这个例子的代码可以在位于书籍存储库中的Jupyter笔记本中找到,位置为*notebooks/02_deeplearning/01_mlp/mlp.ipynb*。
- en: Preparing the Data
+ id: totrans-45
prefs:
- PREF_H2
type: TYPE_NORMAL
+ zh: 准备数据
- en: For this example we will be using the [CIFAR-10](https://oreil.ly/cNbFG) dataset,
a collection of 60,000 32 × 32–pixel color images that comes bundled with Keras
out of the box. Each image is classified into exactly one of 10 classes, as shown
in [Figure 2-4](#cifar).
+ id: totrans-46
prefs: []
type: TYPE_NORMAL
+ zh: 在这个例子中,我们将使用[CIFAR-10](https://oreil.ly/cNbFG)数据集,这是一个包含60,000个32×32像素彩色图像的集合,与Keras捆绑在一起。每个图像被分类为10个类别中的一个,如[图2-4](#cifar)所示。
- en: '![](Images/gdl2_0204.png)'
+ id: totrans-47
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0204.png)'
- en: 'Figure 2-4\. Example images from the CIFAR-10 dataset (source: [Krizhevsky,
2009](https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf))^([2](ch02.xhtml#idm45387033163216))'
+ id: totrans-48
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图2-4\. CIFAR-10数据集中的示例图像(来源:[Krizhevsky, 2009](https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf))^([2](ch02.xhtml#idm45387033163216))
- en: By default, the image data consists of integers between 0 and 255 for each pixel
channel. We first need to preprocess the images by scaling these values to lie
between 0 and 1, as neural networks work best when the absolute value of each
input is less than 1.
+ id: totrans-49
prefs: []
type: TYPE_NORMAL
+ zh: 默认情况下,图像数据由每个像素通道的0到255之间的整数组成。我们首先需要通过将这些值缩放到0到1之间来预处理图像,因为当每个输入的绝对值小于1时,神经网络的效果最好。
- en: We also need to change the integer labeling of the images to one-hot encoded
    vectors, because the neural network output will be a probability that the image
    belongs to each class. If the class integer label of an image is *i*, then its
    one-hot encoding is a vector of length 10 (the number of classes) that has 0s
    in all but the *i*th element, which is 1\. These steps are shown in [Example 2-1](#preprocessing-cifar-10).
+ id: totrans-50
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们还需要将图像的整数标签更改为独热编码向量,因为神经网络的输出将是图像属于每个类的概率。如果图像的类整数标签是*i*,那么它的独热编码是一个长度为10的向量(类的数量),除了第*i*个元素为1之外,其他元素都为0。这些步骤在[示例2-1](#preprocessing-cifar-10)中显示。
- en: Example 2-1\. Preprocessing the CIFAR-10 dataset
+ id: totrans-51
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例2-1。预处理CIFAR-10数据集
- en: '[PRE0]'
+ id: totrans-52
prefs: []
type: TYPE_PRE
+ zh: '[PRE0]'
- en: '[![1](Images/1.png)](#co_deep_learning_CO1-1)'
+ id: totrans-53
prefs: []
type: TYPE_NORMAL
+ zh: '[![1](Images/1.png)](#co_deep_learning_CO1-1)'
- en: Load the CIFAR-10 dataset. `x_train` and `x_test` are `numpy` arrays of shape
`[50000, 32, 32, 3]` and `[10000, 32, 32, 3]`, respectively. `y_train` and `y_test`
are `numpy` arrays of shape `[50000, 1]` and `[10000, 1]`, respectively, containing
the integer labels in the range 0 to 9 for the class of each image.
+ id: totrans-54
prefs: []
type: TYPE_NORMAL
+ zh: 加载CIFAR-10数据集。`x_train`和`x_test`分别是形状为`[50000, 32, 32, 3]`和`[10000, 32, 32,
+ 3]`的`numpy`数组。`y_train`和`y_test`分别是形状为`[50000, 1]`和`[10000, 1]`的`numpy`数组,包含每个图像类的范围为0到9的整数标签。
- en: '[![2](Images/2.png)](#co_deep_learning_CO1-2)'
+ id: totrans-55
prefs: []
type: TYPE_NORMAL
+ zh: '[![2](Images/2.png)](#co_deep_learning_CO1-2)'
- en: Scale each image so that the pixel channel values lie between 0 and 1.
+ id: totrans-56
prefs: []
type: TYPE_NORMAL
+ zh: 缩放每个图像,使像素通道值介于0和1之间。
- en: '[![3](Images/3.png)](#co_deep_learning_CO1-3)'
+ id: totrans-57
prefs: []
type: TYPE_NORMAL
+ zh: '[![3](Images/3.png)](#co_deep_learning_CO1-3)'
- en: One-hot encode the labels—the new shapes of `y_train` and `y_test` are `[50000,
10]` and `[10000, 10]`, respectively.
+ id: totrans-58
prefs: []
type: TYPE_NORMAL
+ zh: 对标签进行独热编码——`y_train`和`y_test`的新形状分别为`[50000, 10]`和`[10000, 10]`。
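Since the code itself appears above only as the placeholder `[PRE0]`, here is a minimal sketch of the three steps the callouts describe, assuming the standard `tensorflow.keras` dataset and utility functions:

```python
from tensorflow.keras import datasets, utils

# 1. Load CIFAR-10: x arrays of shape [50000/10000, 32, 32, 3], integer labels 0-9.
(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()

# 2. Scale pixel channel values from 0-255 integers to floats between 0 and 1.
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# 3. One-hot encode labels: y arrays become shape [50000, 10] and [10000, 10].
NUM_CLASSES = 10
y_train = utils.to_categorical(y_train, NUM_CLASSES)
y_test = utils.to_categorical(y_test, NUM_CLASSES)
```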
- en: We can see that the training image data (`x_train`) is stored in a *tensor*
of shape `[50000, 32, 32, 3]`. There are no *columns* or *rows* in this dataset;
instead, this is a tensor with four dimensions. A tensor is just a multidimensional
@@ -328,108 +434,154 @@
first dimension of this tensor references the index of the image in the dataset,
the second and third relate to the size of the image, and the last is the channel
(i.e., red, green, or blue, since these are RGB images).
+ id: totrans-59
prefs: []
type: TYPE_NORMAL
+ zh: 我们可以看到训练图像数据(`x_train`)存储在形状为`[50000, 32, 32, 3]`的*张量*中。在这个数据集中没有*列*或*行*;相反,这是一个具有四个维度的张量。张量只是一个多维数组——它是矩阵向超过两个维度的自然扩展。这个张量的第一个维度引用数据集中图像的索引,第二和第三个维度与图像的大小有关,最后一个是通道(即红色、绿色或蓝色,因为这些是RGB图像)。
- en: For example, [Example 2-2](#pixel-value) shows how we can find the channel value
of a specific pixel in an image.
+ id: totrans-60
prefs: []
type: TYPE_NORMAL
+ zh: 例如,[示例2-2](#pixel-value)展示了如何找到图像中特定像素的通道值。
- en: Example 2-2\. The green channel (1) value of the pixel in the (12,13) position
of image 54
+ id: totrans-61
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例2-2。图像54中位置为(12,13)的像素的绿色通道(1)值
- en: '[PRE1]'
+ id: totrans-62
prefs: []
type: TYPE_PRE
+ zh: '[PRE1]'
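The code behind `[PRE1]` is a single tensor lookup; a sketch consistent with the caption:

```python
# Index order is [image, row, column, channel]; channel 1 is green in RGB.
x_train[54, 12, 13, 1]
```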
- en: Building the Model
+ id: totrans-63
prefs:
- PREF_H2
type: TYPE_NORMAL
+ zh: 构建模型
- en: In Keras you can define the structure of a neural network either as a `Sequential`
    model or by using the functional API.
+ id: totrans-64
prefs: []
type: TYPE_NORMAL
+ zh: 在Keras中,您可以将神经网络的结构定义为`Sequential`模型或使用功能API。
- en: A `Sequential` model is useful for quickly defining a linear stack of layers
(i.e., where one layer follows on directly from the previous layer without any
branching). We can define our MLP model using the `Sequential` class as shown
in [Example 2-3](#sequential_functional).
+ id: totrans-65
prefs: []
type: TYPE_NORMAL
+ zh: '`Sequential`模型适用于快速定义一系列层的线性堆叠(即一个层直接跟在前一个层后面,没有任何分支)。我们可以使用`Sequential`类来定义我们的MLP模型,如[示例2-3](#sequential_functional)所示。'
- en: Example 2-3\. Building our MLP using a `Sequential` model
+ id: totrans-66
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例2-3。使用`Sequential`模型构建我们的MLP
- en: '[PRE2]'
+ id: totrans-67
prefs: []
type: TYPE_PRE
+ zh: '[PRE2]'
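A sketch of what the `Sequential` definition in `[PRE2]` plausibly contains, with the layer sizes (200 and 150 ReLU units, a 10-unit softmax output) taken from the descriptions and summary table later in this section:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Flatten(input_shape=(32, 32, 3)),  # 32 x 32 x 3 image -> vector of 3,072
    layers.Dense(200, activation="relu"),
    layers.Dense(150, activation="relu"),
    layers.Dense(10, activation="softmax"),   # one probability per CIFAR-10 class
])
```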
- en: Many of the models in this book require that the output from a layer is passed
to multiple subsequent layers, or conversely, that a layer receives input from
multiple preceding layers. For these models, the `Sequential` class is not suitable
and we would need to use the functional API instead, which is a lot more flexible.
+ id: totrans-68
prefs: []
type: TYPE_NORMAL
+ zh: 本书中的许多模型要求从一层输出传递到多个后续层,或者反过来,一层接收来自多个前面层的输入。对于这些模型,`Sequential`类不适用,我们需要使用功能API,这样更加灵活。
- en: Tip
+ id: totrans-69
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 提示
- en: I recommend that even if you are just starting out building linear models with
Keras, you still use the functional API rather than `Sequential` models, since
it will serve you better in the long run as your neural networks become more architecturally
complex. The functional API will give you complete freedom over the design of
your deep neural network.
+ id: totrans-70
prefs: []
type: TYPE_NORMAL
+ zh: 我建议即使您刚开始使用Keras构建线性模型,也应该使用功能API而不是`Sequential`模型,因为随着您的神经网络变得更加复杂,功能API将在长远中为您提供更好的服务。功能API将为您提供对深度神经网络设计的完全自由。
- en: '[Example 2-4](#sequential_functional-2) shows the same MLP coded using the
functional API. When using the functional API, we use the `Model` class to define
the overall input and output layers of the model.'
+ id: totrans-71
prefs: []
type: TYPE_NORMAL
+ zh: '[示例2-4](#sequential_functional-2)展示了使用功能API编码的相同MLP。在使用功能API时,我们使用`Model`类来定义模型的整体输入和输出层。'
- en: Example 2-4\. Building our MLP using the functional API
+ id: totrans-72
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例2-4。使用功能API构建我们的MLP
- en: '[PRE3]'
+ id: totrans-73
prefs: []
type: TYPE_PRE
+ zh: '[PRE3]'
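A sketch of the equivalent functional-API definition in `[PRE3]`, under the same assumed layer sizes:

```python
from tensorflow.keras import layers, models

input_layer = layers.Input(shape=(32, 32, 3))
x = layers.Flatten()(input_layer)
x = layers.Dense(200, activation="relu")(x)
x = layers.Dense(150, activation="relu")(x)
output_layer = layers.Dense(10, activation="softmax")(x)

# The Model class ties together the overall input and output layers.
model = models.Model(input_layer, output_layer)
```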
- en: Both methods give identical models—a diagram of the architecture is shown in
[Figure 2-5](#cifar_nn).
+ id: totrans-74
prefs: []
type: TYPE_NORMAL
+ zh: 这两种方法提供相同的模型——架构的图表显示在[图2-5](#cifar_nn)中。
- en: '![](Images/gdl2_0205.png)'
+ id: totrans-75
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0205.png)'
- en: Figure 2-5\. A diagram of the MLP architecture
+ id: totrans-76
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图2-5。MLP架构的图表
- en: Let’s now look in more detail at the different layers and activation functions
used within the MLP.
+ id: totrans-77
prefs: []
type: TYPE_NORMAL
+ zh: 现在让我们更详细地看一下MLP中使用的不同层和激活函数。
- en: Layers
+ id: totrans-78
prefs:
- PREF_H3
type: TYPE_NORMAL
+ zh: 层
- en: 'To build our MLP, we used three different types of layers: `Input`, `Flatten`,
and `Dense`.'
+ id: totrans-79
prefs: []
type: TYPE_NORMAL
+ zh: 为构建我们的MLP,我们使用了三种不同类型的层:`Input`、`Flatten`和`Dense`。
- en: The `Input` layer is an entry point into the network. We tell the network the
    shape of each data element to expect as a tuple. Notice that we do not specify
    the batch size in the `Input` layer definition; this isn’t necessary, as we can
    pass any number of images into the `Input` layer simultaneously.
+ id: totrans-80
  prefs: []
  type: TYPE_NORMAL
+ zh: '`Input`层是网络的入口点。我们以元组的形式告诉网络所期望的每个数据元素的形状。请注意,我们不需要在`Input`层定义中明确指定批量大小;这是不必要的,因为我们可以同时将任意数量的图像传递到`Input`层中。'
- en: Next we flatten this input into a vector, using a `Flatten` layer. This results
    in a vector of length 3,072 (= 32 × 32 × 3). We do this because the subsequent
    `Dense` layer requires that its input is flat, rather than a multidimensional
    array. As we shall see later, other layer types require multidimensional arrays
    as input, so you need to be aware of the required input and output shape of each
    layer type to understand when it is necessary to use `Flatten`.
+ id: totrans-81
  prefs: []
  type: TYPE_NORMAL
+ zh: 接下来,我们使用`Flatten`层将这个输入展平成一个向量。这将产生一个长度为3072的向量(= 32 × 32 × 3)。我们这样做是因为后续的`Dense`层要求其输入是平坦的,而不是多维数组。正如我们将在后面看到的,其他类型的层需要多维数组作为输入,因此您需要了解每种层类型所需的输入和输出形状,以便了解何时需要使用`Flatten`。
- en: The `Dense` layer is one of the most fundamental building blocks of a neural
network. It contains a given number of units that are densely connected to the
previous layer—that is, every unit in the layer is connected to every unit in
@@ -439,16 +591,22 @@
nonlinear *activation function* before being sent to the following layer. The
activation function is critical to ensure the neural network is able to learn
complex functions and doesn’t just output a linear combination of its inputs.
+ id: totrans-82
prefs: []
type: TYPE_NORMAL
+ zh: '`Dense`层是神经网络中最基本的构建块之一。它包含一定数量的单元,这些单元与前一层密切连接,也就是说,层中的每个单元都与前一层中的每个单元连接,通过一个携带权重的单一连接(可以是正数或负数)。给定单元的输出是它从前一层接收的输入的加权和,然后通过非线性*激活函数*传递到下一层。激活函数对于确保神经网络能够学习复杂函数并且不仅仅输出其输入的线性组合至关重要。'
- en: Activation functions
+ id: totrans-83
prefs:
- PREF_H3
type: TYPE_NORMAL
+ zh: 激活函数
- en: There are many kinds of activation function, but three of the most important
are ReLU, sigmoid, and softmax.
+ id: totrans-84
prefs: []
type: TYPE_NORMAL
+ zh: 有许多种激活函数,但其中最重要的三种是ReLU、sigmoid和softmax。
- en: 'The *ReLU* (rectified linear unit) activation function is defined to be 0 if
the input is negative and is otherwise equal to the input. The *LeakyReLU* activation
function is very similar to ReLU, with one key difference: whereas the ReLU activation
@@ -459,104 +617,161 @@
this unit. LeakyReLU activations fix this issue by always ensuring the gradient
is nonzero. ReLU-based functions are among the most reliable activations to use
between the layers of a deep network to encourage stable training.'
+ id: totrans-85
prefs: []
type: TYPE_NORMAL
+ zh: '*ReLU*(修正线性单元)激活函数被定义为如果输入为负数则为0,否则等于输入。*LeakyReLU*激活函数与ReLU非常相似,但有一个关键区别:ReLU激活函数对于小于0的输入值返回0,而LeakyReLU函数返回与输入成比例的一个小负数。如果ReLU单元总是输出0,有时会出现死亡现象,因为存在对负值预激活的大偏差。在这种情况下,梯度为0,因此没有错误通过该单元向后传播。LeakyReLU激活通过始终确保梯度为非零来解决这个问题。基于ReLU的函数是在深度网络的层之间使用的最可靠的激活函数之一,以鼓励稳定的训练。'
- en: The *sigmoid* activation is useful if you wish the output from the layer to
be scaled between 0 and 1—for example, for binary classification problems with
one output unit or multilabel classification problems, where each observation
can belong to more than one class. [Figure 2-6](#activations) shows ReLU, LeakyReLU,
and sigmoid activation functions side by side for comparison.
+ id: totrans-86
prefs: []
type: TYPE_NORMAL
+ zh: 如果您希望从该层输出的结果在0和1之间缩放,那么*sigmoid*激活函数是有用的,例如,对于具有一个输出单元的二元分类问题或多标签分类问题,其中每个观察结果可以属于多个类。[图2-6](#activations)显示了ReLU、LeakyReLU和sigmoid激活函数并排进行比较。
- en: '![](Images/gdl2_0206.png)'
+ id: totrans-87
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0206.png)'
- en: Figure 2-6\. The ReLU, LeakyReLU, and sigmoid activation functions
+ id: totrans-88
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图2-6。ReLU、LeakyReLU和sigmoid激活函数
- en: 'The *softmax* activation function is useful if you want the total sum of the
output from the layer to equal 1; for example, for multiclass classification problems
where each observation only belongs to exactly one class. It is defined as:'
+ id: totrans-89
prefs: []
type: TYPE_NORMAL
+ zh: 如果您希望从该层输出的总和等于1,则*softmax*激活函数是有用的;例如,对于每个观察结果只属于一个类的多类分类问题。它被定义为:
- en: $y_j = \frac{e^{x_j}}{\sum_{k=1}^{J} e^{x_k}}$
+ id: totrans-90
  prefs: []
  type: TYPE_NORMAL
+ zh: $y_j = \frac{e^{x_j}}{\sum_{k=1}^{J} e^{x_k}}$
- en: Here, *J* is the total number of units in the layer. In our neural network,
we use a softmax activation in the final layer to ensure that the output is a
set of 10 probabilities that sum to 1, which can be interpreted as the likelihood
that the image belongs to each class.
+ id: totrans-91
prefs: []
type: TYPE_NORMAL
+ zh: 在这里,*J*是层中单元的总数。在我们的神经网络中,我们在最后一层使用softmax激活,以确保输出是一组总和为1的10个概率,这可以被解释为图像属于每个类的可能性。
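The four activation functions discussed here can be sketched directly in NumPy (the LeakyReLU slope `alpha` is an assumed parameter value):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.2):            # alpha: assumed small slope for negative inputs
    return np.where(x > 0, x, alpha * x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))          # squashes outputs to lie between 0 and 1

def softmax(x):
    e = np.exp(x - np.max(x))            # shifting by max(x) improves numerical stability
    return e / np.sum(e)                 # the J outputs sum to 1

softmax(np.array([1.0, 2.0, 3.0]))       # approximately array([0.090, 0.245, 0.665])
```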
- en: In Keras, activation functions can be defined within a layer ([Example 2-5](#activation-function-together))
or as a separate layer ([Example 2-6](#activation-function-separate)).
+ id: totrans-92
prefs: []
type: TYPE_NORMAL
+ zh: 在Keras中,激活函数可以在层内定义([示例2-5](#activation-function-together))或作为单独的层定义([示例2-6](#activation-function-separate))。
- en: Example 2-5\. A ReLU activation function defined as part of a `Dense` layer
+ id: totrans-93
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例2-5。作为`Dense`层的一部分定义的ReLU激活函数
- en: '[PRE4]'
+ id: totrans-94
prefs: []
type: TYPE_PRE
+ zh: '[PRE4]'
- en: Example 2-6\. A ReLU activation function defined as its own layer
+ id: totrans-95
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例2-6。作为自己的层定义的ReLU激活函数
- en: '[PRE5]'
+ id: totrans-96
prefs: []
type: TYPE_PRE
+ zh: '[PRE5]'
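A sketch of the two equivalent styles behind `[PRE4]` and `[PRE5]`, assuming a flattened input of length 3,072:

```python
from tensorflow.keras import layers

inputs = layers.Input(shape=(3072,))

# Example 2-5 style: the activation is defined as part of the Dense layer.
x = layers.Dense(200, activation="relu")(inputs)

# Example 2-6 style: the same computation, with the activation as its own layer.
y = layers.Dense(200)(inputs)
y = layers.Activation("relu")(y)
```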
- en: In our example, we pass the input through two `Dense` layers, the first with
200 units and the second with 150, both with ReLU activation functions.
+ id: totrans-97
prefs: []
type: TYPE_NORMAL
+ zh: 在我们的示例中,我们通过两个`Dense`层传递输入,第一个有200个单元,第二个有150个,两者都带有ReLU激活函数。
- en: Inspecting the model
+ id: totrans-98
prefs:
- PREF_H3
type: TYPE_NORMAL
+ zh: 检查模型
- en: We can use the `model.summary()` method to inspect the shape of the network
at each layer, as shown in [Table 2-1](#first_nn_shape).
+ id: totrans-99
prefs: []
type: TYPE_NORMAL
+ zh: 我们可以使用`model.summary()`方法来检查每一层网络的形状,如[表2-1](#first_nn_shape)所示。
- en: Table 2-1\. Output from the `model.summary()` method
+ id: totrans-100
prefs: []
type: TYPE_NORMAL
+ zh: 表2-1. `model.summary()`方法的输出
- en: '| Layer (type) | Output shape | Param # |'
+ id: totrans-101
prefs: []
type: TYPE_TB
+ zh: '| 层(类型) | 输出形状 | 参数 # |'
- en: '| --- | --- | --- |'
+ id: totrans-102
prefs: []
type: TYPE_TB
+ zh: '| --- | --- | --- |'
- en: '| InputLayer | (None, 32, 32, 3) | 0 |'
+ id: totrans-103
prefs: []
type: TYPE_TB
+ zh: '| InputLayer | (None, 32, 32, 3) | 0 |'
- en: '| Flatten | (None, 3072) | 0 |'
+ id: totrans-104
prefs: []
type: TYPE_TB
+ zh: '| 展平 | (None, 3072) | 0 |'
- en: '| Dense | (None, 200) | 614,600 |'
+ id: totrans-105
prefs: []
type: TYPE_TB
+ zh: '| Dense | (None, 200) | 614,600 |'
- en: '| Dense | (None, 150) | 30,150 |'
+ id: totrans-106
prefs: []
type: TYPE_TB
+ zh: '| Dense | (None, 150) | 30,150 |'
- en: '| Dense | (None, 10) | 1,510 |'
+ id: totrans-107
prefs: []
type: TYPE_TB
+ zh: '| Dense | (None, 10) | 1,510 |'
- en: '| Total params | 646,260 |'
+ id: totrans-108
prefs: []
type: TYPE_TB
+ zh: '| 总参数 | 646,260 |'
- en: '| Trainable params | 646,260 |'
+ id: totrans-109
prefs: []
type: TYPE_TB
+ zh: '| 可训练参数 | 646,260 |'
- en: '| Non-trainable params | 0 |'
+ id: totrans-110
prefs: []
type: TYPE_TB
+ zh: '| 不可训练参数 | 0 |'
- en: 'Notice how the shape of our `Input` layer matches the shape of `x_train` and
the shape of our `Dense` output layer matches the shape of `y_train`. Keras uses
`None` as a marker for the first dimension to show that it doesn’t yet know the
@@ -567,63 +782,89 @@
is also the reason why you get a performance increase when training deep neural
networks on GPUs instead of CPUs: GPUs are optimized for large tensor operations
since these calculations are also necessary for complex graphics manipulation.'
+ id: totrans-111
prefs: []
type: TYPE_NORMAL
+ zh: 注意我们的`Input`层的形状与`x_train`的形状匹配,而我们的`Dense`输出层的形状与`y_train`的形状匹配。Keras使用`None`作为第一维的标记,以显示它尚不知道将传递到网络中的观测数量。实际上,它不需要知道;我们可以一次通过1个观测或1000个观测通过网络。这是因为张量操作是使用线性代数同时在所有观测上进行的—这是由TensorFlow处理的部分。这也是为什么在GPU上训练深度神经网络而不是在CPU上时性能会提高的原因:GPU针对大型张量操作进行了优化,因为这些计算对于复杂的图形处理也是必要的。
- en: The `summary` method also gives the number of parameters (weights) that will
be trained at each layer. If ever you find that your model is training too slowly,
check the summary to see if there are any layers that contain a huge number of
weights. If so, you should consider whether the number of units in the layer could
be reduced to speed up training.
+ id: totrans-112
prefs: []
type: TYPE_NORMAL
+ zh: '`summary`方法还会给出每一层将被训练的参数(权重)的数量。如果你发现你的模型训练速度太慢,检查摘要看看是否有任何包含大量权重的层。如果有的话,你应该考虑是否可以减少该层中的单元数量以加快训练速度。'
- en: Tip
+ id: totrans-113
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 提示
- en: Make sure you understand how the number of parameters is calculated in each
layer! It’s important to remember that by default, each unit within a given layer
is also connected to one additional *bias* unit that always outputs 1\. This ensures
that the output from the unit can still be nonzero even when all inputs from the
previous layer are 0.
+ id: totrans-114
prefs: []
type: TYPE_NORMAL
+ zh: 确保你理解每一层中参数是如何计算的!重要的是要记住,默认情况下,给定层中的每个单元也连接到一个额外的*偏置*单元,它总是输出1。这确保了即使来自前一层的所有输入为0,单元的输出仍然可以是非零的。
- en: Therefore, the number of parameters in the 200-unit `Dense` layer is 200 * (3,072
+ 1) = 614,600.
+ id: totrans-115
prefs: []
type: TYPE_NORMAL
+ zh: 因此,200单元`Dense`层中的参数数量为200 * (3,072 + 1) = 614,600。
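The parameter counts in Table 2-1 can be reproduced with the same arithmetic:

```python
dense_1 = 200 * (3072 + 1)           # 614,600: each unit sees 3,072 inputs plus a bias
dense_2 = 150 * (200 + 1)            # 30,150
dense_3 = 10 * (150 + 1)             # 1,510
total = dense_1 + dense_2 + dense_3  # 646,260, matching the summary output
```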
- en: Compiling the Model
+ id: totrans-116
prefs:
- PREF_H2
type: TYPE_NORMAL
+ zh: 编译模型
- en: In this step, we compile the model with an optimizer and a loss function, as
shown in [Example 2-7](#optimizer-loss).
+ id: totrans-117
prefs: []
type: TYPE_NORMAL
+ zh: 在这一步中,我们使用一个优化器和一个损失函数来编译模型,如[示例2-7](#optimizer-loss)所示。
- en: Example 2-7\. Defining the optimizer and the loss function
+ id: totrans-118
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例2-7. 定义优化器和损失函数
- en: '[PRE6]'
+ id: totrans-119
prefs: []
type: TYPE_PRE
+ zh: '[PRE6]'
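A sketch of what `[PRE6]` plausibly contains, assuming the MLP `model` defined earlier; the learning rate shown is an illustrative value rather than one prescribed by the text:

```python
from tensorflow.keras import optimizers

opt = optimizers.Adam(learning_rate=0.0005)  # assumed learning rate
model.compile(
    loss="categorical_crossentropy",  # each image belongs to exactly one class
    optimizer=opt,
    metrics=["accuracy"],             # extra metric to report during training
)
```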
- en: Let’s now look in more detail at what we mean by loss functions and optimizers.
+ id: totrans-120
prefs: []
type: TYPE_NORMAL
+ zh: 现在让我们更详细地看一下我们所说的损失函数和优化器。
- en: Loss functions
+ id: totrans-121
prefs:
- PREF_H3
type: TYPE_NORMAL
+ zh: 损失函数
- en: The *loss function* is used by the neural network to compare its predicted output
to the ground truth. It returns a single number for each observation; the greater
this number, the worse the network has performed for this observation.
+ id: totrans-122
prefs: []
type: TYPE_NORMAL
+ zh: '*损失函数*被神经网络用来比较其预测输出与实际情况的差异。它为每个观测返回一个单一数字;这个数字越大,网络在这个观测中的表现就越差。'
- en: Keras provides many built-in loss functions to choose from, or you can create
your own. Three of the most commonly used are mean squared error, categorical
cross-entropy, and binary cross-entropy. It is important to understand when it
is appropriate to use each.
+ id: totrans-123
prefs: []
type: TYPE_NORMAL
+ zh: Keras提供了许多内置的损失函数可供选择,或者你可以创建自己的损失函数。最常用的三个是均方误差、分类交叉熵和二元交叉熵。重要的是要理解何时适合使用每种损失函数。
- en: 'If your neural network is designed to solve a regression problem (i.e., the
    output is continuous), then you might use the *mean squared error* loss. This
    is the mean of the squared difference between the ground truth and the predicted
    value of each output unit, where the mean is taken over all *n* output units:'
+ id: totrans-124
  prefs: []
  type: TYPE_NORMAL
+ zh: 如果你的神经网络旨在解决回归问题(即输出是连续的),那么你可能会使用*均方误差*损失。这是每个输出单元的实际值和预测值之间的平方差的平均值,其中平均值是在所有*n*个输出单元上取得的:
- en: $\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - p_i)^2$
+ id: totrans-125
  prefs: []
  type: TYPE_NORMAL
+ zh: $\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - p_i)^2$
- en: 'If you are working on a classification problem where each observation only
belongs to one class, then *categorical cross-entropy* is the correct loss function.
This is defined as follows:'
+ id: totrans-126
prefs: []
type: TYPE_NORMAL
+ zh: 如果你正在处理一个分类问题,其中每个观测只属于一个类,那么*分类交叉熵*是正确的损失函数。它定义如下:
- en: $-\sum_{i=1}^{n} y_i \log(p_i)$
+ id: totrans-127
  prefs: []
  type: TYPE_NORMAL
+ zh: $-\sum_{i=1}^{n} y_i \log(p_i)$
- en: 'Finally, if you are working on a binary classification problem with one output
unit, or a multilabel problem where each observation can belong to multiple classes
simultaneously, you should use *binary cross-entropy*:'
+ id: totrans-128
prefs: []
type: TYPE_NORMAL
+ zh: 最后,如果你正在处理一个具有一个输出单元的二元分类问题,或者一个每个观测可以同时属于多个类的多标签问题,你应该使用*二元交叉熵*:
- en: $-\frac{1}{n} \sum_{i=1}^{n} \left( y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right)$
+ id: totrans-129
  prefs: []
  type: TYPE_NORMAL
+ zh: $-\frac{1}{n} \sum_{i=1}^{n} \left( y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right)$
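The three losses can be checked numerically against the formulas above (the ground-truth and prediction vectors are made-up values):

```python
import numpy as np

y = np.array([0.0, 1.0, 0.0])  # ground truth (one-hot in the categorical case)
p = np.array([0.1, 0.7, 0.2])  # predicted outputs

mse = np.mean((y - p) ** 2)                                    # regression
categorical_ce = -np.sum(y * np.log(p))                        # one class per observation
binary_ce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))  # binary / multilabel
```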
- en: Optimizers
+ id: totrans-130
prefs:
- PREF_H3
type: TYPE_NORMAL
+ zh: 优化器
- en: The *optimizer* is the algorithm that will be used to update the weights in
the neural network based on the gradient of the loss function. One of the most
commonly used and stable optimizers is *Adam* (Adaptive Moment Estimation).^([3](ch02.xhtml#idm45387032147088))
@@ -685,228 +962,339 @@
with a large learning rate, the downside is that it may result in less stable
training and may not find the global minimum of the loss function. This is a parameter
that you may want to tune or adjust during training.
+ id: totrans-131
prefs: []
type: TYPE_NORMAL
+ zh: '*优化器* 是基于损失函数的梯度更新神经网络权重的算法。最常用和稳定的优化器之一是 *Adam*(自适应矩估计)。^([3](ch02.xhtml#idm45387032147088))
+ 在大多数情况下,您不需要调整Adam优化器的默认参数,除了 *学习率*。学习率越大,每个训练步骤中权重的变化就越大。虽然初始时使用较大的学习率训练速度更快,但缺点是可能导致训练不太稳定,并且可能无法找到损失函数的全局最小值。这是您可能需要在训练过程中调整的参数。'
- en: Another common optimizer that you may come across is *RMSProp* (Root Mean Squared
Propagation). Again, you shouldn’t need to adjust the parameters of this optimizer
too much, but it is worth reading the [Keras documentation](https://keras.io/optimizers)
to understand the role of each parameter.
+ id: totrans-132
prefs: []
type: TYPE_NORMAL
+ zh: 另一个您可能遇到的常见优化器是 *RMSProp*(均方根传播)。同样,您不需要太多调整这个优化器的参数,但值得阅读[Keras文档](https://keras.io/optimizers)以了解每个参数的作用。
- en: We pass both the loss function and the optimizer into the `compile` method of
the model, as well as a `metrics` parameter where we can specify any additional
metrics that we would like to report on during training, such as accuracy.
+ id: totrans-133
prefs: []
type: TYPE_NORMAL
+ zh: 我们将损失函数和优化器一起传递给模型的 `compile` 方法,还有一个 `metrics` 参数,我们可以在训练过程中指定任何额外的指标,如准确率。
- en: Training the Model
+ id: totrans-134
prefs:
- PREF_H2
type: TYPE_NORMAL
+ zh: 训练模型
- en: Thus far, we haven’t shown the model any data. We have just set up the architecture
and compiled the model with a loss function and optimizer.
+ id: totrans-135
prefs: []
type: TYPE_NORMAL
+ zh: 到目前为止,我们还没有向模型展示任何数据。我们只是设置了架构并使用损失函数和优化器编译了模型。
- en: To train the model against the data, we simply call the `fit` method, as shown
in [Example 2-8](#training-mlp).
+ id: totrans-136
prefs: []
type: TYPE_NORMAL
+ zh: 要针对数据训练模型,我们只需调用 `fit` 方法,如[示例2-8](#training-mlp)所示。
- en: Example 2-8\. Calling the `fit` method to train the model
+ id: totrans-137
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例2-8\. 调用 `fit` 方法来训练模型
- en: '[PRE7]'
+ id: totrans-138
prefs: []
type: TYPE_PRE
+ zh: '[PRE7]'
- en: '[![1](Images/1.png)](#co_deep_learning_CO2-1)'
+ id: totrans-139
prefs: []
type: TYPE_NORMAL
+ zh: '[![1](Images/1.png)](#co_deep_learning_CO2-1)'
- en: The raw image data.
+ id: totrans-140
prefs: []
type: TYPE_NORMAL
+ zh: 原始图像数据。
- en: '[![2](Images/2.png)](#co_deep_learning_CO2-2)'
+ id: totrans-141
prefs: []
type: TYPE_NORMAL
+ zh: '[![2](Images/2.png)](#co_deep_learning_CO2-2)'
- en: The one-hot encoded class labels.
+ id: totrans-142
prefs: []
type: TYPE_NORMAL
+ zh: 独热编码的类标签。
- en: '[![3](Images/3.png)](#co_deep_learning_CO2-3)'
+ id: totrans-143
prefs: []
type: TYPE_NORMAL
+ zh: '[![3](Images/3.png)](#co_deep_learning_CO2-3)'
- en: The `batch_size` determines how many observations will be passed to the network
at each training step.
+ id: totrans-144
prefs: []
type: TYPE_NORMAL
+ zh: '`batch_size` 确定每个训练步骤将传递给网络多少观察值。'
- en: '[![4](Images/4.png)](#co_deep_learning_CO2-4)'
+ id: totrans-145
prefs: []
type: TYPE_NORMAL
+ zh: '[![4](Images/4.png)](#co_deep_learning_CO2-4)'
- en: The `epochs` determine how many times the network will be shown the full training
data.
+ id: totrans-146
prefs: []
type: TYPE_NORMAL
+ zh: '`epochs` 确定网络将被展示完整训练数据的次数。'
- en: '[![5](Images/5.png)](#co_deep_learning_CO2-5)'
+ id: totrans-147
prefs: []
type: TYPE_NORMAL
+ zh: '[![5](Images/5.png)](#co_deep_learning_CO2-5)'
- en: If `shuffle = True`, the batches will be drawn randomly without replacement
from the training data at each training step.
+ id: totrans-148
prefs: []
type: TYPE_NORMAL
+ zh: 如果 `shuffle = True`,每个训练步骤将从训练数据中随机抽取批次而不重复。
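Putting the five callouts together, the `fit` call in `[PRE7]` plausibly reads as follows (batch size 32 and 10 epochs match the training output described below):

```python
model.fit(
    x_train,        # the raw image data
    y_train,        # the one-hot encoded class labels
    batch_size=32,  # observations passed to the network at each training step
    epochs=10,      # times the network is shown the full training data
    shuffle=True,   # draw batches randomly without replacement at each step
)
```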
- en: This will start training a deep neural network to predict the category of an
image from the CIFAR-10 dataset. The training process works as follows.
+ id: totrans-149
prefs: []
type: TYPE_NORMAL
+ zh: 这将开始训练一个深度神经网络,以预测来自CIFAR-10数据集的图像的类别。训练过程如下。
- en: First, the weights of the network are initialized to small random values. Then
the network performs a series of training steps. At each training step, one *batch*
of images is passed through the network and the errors are backpropagated to update
the weights. The `batch_size` determines how many images are in each training
step batch. The larger the batch size, the more stable the gradient calculation,
but the slower each training step.
+ id: totrans-150
prefs: []
type: TYPE_NORMAL
+ zh: 首先,网络的权重被初始化为小的随机值。然后网络执行一系列训练步骤。在每个训练步骤中,通过网络传递一个 *batch* 图像,并将错误反向传播以更新权重。`batch_size`
+ 确定每个训练步骤批次中有多少图像。批量大小越大,梯度计算越稳定,但每个训练步骤越慢。
- en: Tip
+ id: totrans-151
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 提示
- en: It would be far too time-consuming and computationally intensive to use the
entire dataset to calculate the gradient at each training step, so generally a
batch size between 32 and 256 is used. It is also now recommended practice to
increase the batch size as training progresses.^([4](ch02.xhtml#idm45387032068928))
+ id: totrans-152
prefs: []
type: TYPE_NORMAL
+ zh: 使用整个数据集在每个训练步骤中计算梯度将耗费太多时间和计算资源,因此通常使用32到256之间的批量大小。现在推荐的做法是随着训练的进行增加批量大小。^([4](ch02.xhtml#idm45387032068928))
- en: This continues until all observations in the dataset have been seen once. This
completes the first *epoch*. The data is then passed through the network again
in batches as part of the second epoch. This process repeats until the specified
number of epochs have elapsed.
+ id: totrans-153
prefs: []
type: TYPE_NORMAL
+ zh: 这将持续到数据集中的所有观察值都被看到一次。这完成了第一个 *epoch*。然后数据再次以批次的形式通过网络,作为第二个epoch的一部分。这个过程重复,直到指定的epoch数已经过去。
- en: During training, Keras outputs the progress of the procedure, as shown in [Figure 2-7](#first_nn_fit).
We can see that the training dataset has been split into 1,563 batches (each containing
32 images) and it has been shown to the network 10 times (i.e., over 10 epochs),
at a rate of approximately 2 milliseconds per batch. The categorical cross-entropy
loss has fallen from 1.8377 to 1.3696, resulting in an accuracy increase from
33.69% after the first epoch to 51.67% after the tenth epoch.
+ id: totrans-154
prefs: []
type: TYPE_NORMAL
+ zh: 在训练过程中,Keras会输出过程的进展,如[图2-7](#first_nn_fit)所示。我们可以看到训练数据集已经被分成了1,563批次(每批包含32张图片),并且已经被展示给网络10次(即10个epochs),每批大约需要2毫秒的时间。分类交叉熵损失从1.8377下降到1.3696,导致准确率从第一个epoch后的33.69%增加到第十个epoch后的51.67%。
- en: '![](Images/gdl2_0207.png)'
+ id: totrans-155
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0207.png)'
- en: Figure 2-7\. The output from the `fit` method
+ id: totrans-156
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图2-7\. `fit` 方法的输出
- en: Evaluating the Model
+ id: totrans-157
prefs:
- PREF_H2
type: TYPE_NORMAL
+ zh: 评估模型
- en: We know the model achieves an accuracy of 51.9% on the training set, but how
does it perform on data it has never seen?
+ id: totrans-158
prefs: []
type: TYPE_NORMAL
+ zh: 我们知道模型在训练集上的准确率为51.9%,但它在从未见过的数据上表现如何?
- en: To answer this question we can use the `evaluate` method provided by Keras,
as shown in [Example 2-9](#evaluate-mlp).
+ id: totrans-159
prefs: []
type: TYPE_NORMAL
+ zh: 为了回答这个问题,我们可以使用Keras提供的`evaluate`方法,如[示例2-9](#evaluate-mlp)所示。
- en: Example 2-9\. Evaluating the model performance on the test set
+ id: totrans-160
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例2-9。在测试集上评估模型性能
- en: '[PRE8]'
+ id: totrans-161
prefs: []
type: TYPE_PRE
+ zh: '[PRE8]'
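A sketch of the `evaluate` call in `[PRE8]`; it returns the loss followed by any metrics passed to `compile`:

```python
model.evaluate(x_test, y_test)  # e.g. [categorical cross-entropy, accuracy]
```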
- en: '[Figure 2-8](#first_nn_evaluate) shows the output from this method.'
+ id: totrans-162
prefs: []
type: TYPE_NORMAL
+ zh: '[图2-8](#first_nn_evaluate)显示了这种方法的输出。'
- en: '![](Images/gdl2_0208.png)'
+ id: totrans-163
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0208.png)'
- en: Figure 2-8\. The output from the `evaluate` method
+ id: totrans-164
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图2-8。`evaluate`方法的输出
- en: 'The output is a list of the metrics we are monitoring: categorical cross-entropy
and accuracy. We can see that model accuracy is still 49.0% even on images that
it has never seen before. Note that if the model were guessing randomly, it would
achieve approximately 10% accuracy (because there are 10 classes), so 49.0% is
a good result, given that we have used a very basic neural network.'
+ id: totrans-165
prefs: []
type: TYPE_NORMAL
+ zh: 输出是我们正在监控的指标列表:分类交叉熵和准确率。我们可以看到,即使在它从未见过的图像上,模型的准确率仍然是49.0%。请注意,如果模型是随机猜测的,它将达到大约10%的准确率(因为有10个类别),因此49.0%是一个很好的结果,考虑到我们使用了一个非常基本的神经网络。
- en: We can view some of the predictions on the test set using the `predict` method,
as shown in [Example 2-10](#predict-mlp).
+ id: totrans-166
prefs: []
type: TYPE_NORMAL
+ zh: 我们可以使用`predict`方法查看测试集上的一些预测,如[示例2-10](#predict-mlp)所示。
- en: Example 2-10\. Viewing predictions on the test set using the `predict` method
+ id: totrans-167
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例2-10。使用`predict`方法查看测试集上的预测
- en: '[PRE9]'
+ id: totrans-168
prefs: []
type: TYPE_PRE
+ zh: '[PRE9]'
- en: '[![1](Images/1.png)](#co_deep_learning_CO3-1)'
+ id: totrans-169
prefs: []
type: TYPE_NORMAL
+ zh: '[![1](Images/1.png)](#co_deep_learning_CO3-1)'
- en: '`preds` is an array of shape `[10000, 10]`—i.e., a vector of 10 class probabilities
for each observation.'
+ id: totrans-170
prefs: []
type: TYPE_NORMAL
+ zh: '`preds`是一个形状为`[10000, 10]`的数组,即每个观测的10个类别概率的向量。'
- en: '[![2](Images/2.png)](#co_deep_learning_CO3-2)'
+ id: totrans-171
prefs: []
type: TYPE_NORMAL
+ zh: '[![2](Images/2.png)](#co_deep_learning_CO3-2)'
- en: We convert this array of probabilities back into a single prediction using `numpy`’s
    `argmax` function. Here, `axis = -1` tells the function to collapse the array
    over the last dimension (the classes dimension), so that the shape of `preds_single`
    is then `[10000, 1]`.
+ id: totrans-172
prefs: []
type: TYPE_NORMAL
+ zh: 我们将这个概率数组转换回一个单一的预测,使用`numpy`的`argmax`函数。这里,`axis = -1`告诉函数将数组折叠到最后一个维度(类别维度),因此`preds_single`的形状为`[10000,
+ 1]`。
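A sketch matching the two callouts for `[PRE9]`:

```python
import numpy as np

preds = model.predict(x_test)             # shape [10000, 10]: class probabilities
preds_single = np.argmax(preds, axis=-1)  # collapse the class dimension to one prediction
```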
- en: We can view some of the images alongside their labels and predictions with the
code in [Example 2-11](#display-mlp). As expected, around half are correct.
+ id: totrans-173
prefs: []
type: TYPE_NORMAL
+ zh: 我们可以使用[示例2-11](#display-mlp)中的代码查看一些图像以及它们的标签和预测。如预期的那样,大约一半是正确的。
- en: Example 2-11\. Displaying predictions of the MLP against the actual labels
+ id: totrans-174
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例2-11。显示MLP的预测与实际标签
- en: '[PRE10]'
+ id: totrans-175
prefs: []
type: TYPE_PRE
+ zh: '[PRE10]'
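One plausible sketch of the display code in `[PRE10]`, reusing `preds_single` from the previous sketch; the class-name array and plot layout here are assumptions:

```python
import matplotlib.pyplot as plt
import numpy as np

CLASSES = np.array(["airplane", "automobile", "bird", "cat", "deer",
                    "dog", "frog", "horse", "ship", "truck"])

indices = np.random.choice(len(x_test), 10)  # a random selection of test images
fig, axes = plt.subplots(1, 10, figsize=(20, 3))
for ax, i in zip(axes, indices):
    ax.imshow(x_test[i])
    ax.set_title(f"pred: {CLASSES[preds_single[i]]}\nact: {CLASSES[np.argmax(y_test[i])]}")
    ax.axis("off")
plt.show()
```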
- en: '[Figure 2-9](#first_nn_preds) shows a randomly chosen selection of predictions
made by the model, alongside the true labels.'
+ id: totrans-176
prefs: []
type: TYPE_NORMAL
+ zh: '[图2-9](#first_nn_preds)显示了模型随机选择的一些预测,以及真实标签。'
- en: '![](Images/gdl2_0209.png)'
+ id: totrans-177
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0209.png)'
- en: Figure 2-9\. Some predictions made by the model, alongside the actual labels
+ id: totrans-178
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图2-9。模型进行的一些预测,以及实际标签
- en: Congratulations! You’ve just built a multilayer perceptron using Keras and used
it to make predictions on new data. Even though this is a supervised learning
problem, when we come to building generative models in future chapters many of
the core ideas from this chapter (such as loss functions, activation functions,
and understanding layer shapes) will still be extremely important. Next we’ll
look at ways of improving this model, by introducing a few new layer types.
+ id: totrans-179
prefs: []
type: TYPE_NORMAL
+ zh: 恭喜!您刚刚使用Keras构建了一个多层感知器,并用它对新数据进行了预测。即使这是一个监督学习问题,但当我们在未来的章节中构建生成模型时,本章的许多核心思想(如损失函数、激活函数和理解层形状)仍然非常重要。接下来,我们将探讨通过引入一些新的层类型来改进这个模型的方法。
- en: Convolutional Neural Network (CNN)
+ id: totrans-180
prefs:
- PREF_H1
type: TYPE_NORMAL
+ zh: 卷积神经网络(CNN)
- en: One of the reasons our network isn’t yet performing as well as it might is that
    nothing in the network takes into account the spatial structure of the input
    images. In fact, our first step is to flatten the image into a single vector,
    so that we can pass it to the first `Dense` layer!
+ id: totrans-181
prefs: []
type: TYPE_NORMAL
+ zh: 我们的网络尚未表现得像它可能表现得那样好的原因之一是网络中没有考虑输入图像的空间结构。事实上,我们的第一步是将图像展平为一个单一向量,以便我们可以将其传递给第一个`Dense`层!
- en: To achieve this we need to use a *convolutional layer*.
+ id: totrans-182
prefs: []
type: TYPE_NORMAL
+ zh: 为了实现这一点,我们需要使用*卷积层*。
- en: Convolutional Layers
+ id: totrans-183
prefs:
- PREF_H2
type: TYPE_NORMAL
+ zh: 卷积层
- en: First, we need to understand what is meant by a *convolution* in the context
of deep learning.
+ id: totrans-184
prefs: []
type: TYPE_NORMAL
+ zh: 首先,我们需要了解在深度学习背景下*卷积*的含义。
- en: '[Figure 2-10](#simple_conv) shows two different 3 × 3 × 1 portions of a grayscale
image being convoluted with a 3 × 3 × 1 *filter* (or *kernel*). The convolution
is performed by multiplying the filter pixelwise with the portion of the image,
@@ -915,104 +1303,147 @@
the inverse of the filter. The top example resonates strongly with the filter,
so it produces a large positive value. The bottom example does not resonate much
with the filter, so it produces a value near zero.'
+ id: totrans-185
prefs: []
type: TYPE_NORMAL
+ zh: '[图2-10](#simple_conv)显示了一个灰度图像的两个不同的3×3×1部分,与一个3×3×1*滤波器*(或*核*)进行卷积。卷积是通过将滤波器与图像部分逐像素相乘并将结果求和来执行的。当图像部分与滤波器紧密匹配时,输出越正;当图像部分与滤波器相反时,输出越负。顶部示例与滤波器强烈共振,因此产生一个较大的正值。底部示例与滤波器共振不强,因此产生一个接近零的值。'
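The pixelwise multiply-and-sum can be sketched in a few lines (the values are illustrative, chosen so the image portion matches the filter exactly):

```python
import numpy as np

image_portion = np.array([[ 1,  1,  1],
                          [ 0,  0,  0],
                          [-1, -1, -1]])  # a 3 x 3 x 1 portion of a grayscale image
filt = np.array([[ 1,  1,  1],
                 [ 0,  0,  0],
                 [-1, -1, -1]])           # a 3 x 3 x 1 filter (kernel)

output = np.sum(image_portion * filt)     # multiply pixelwise and sum: 6, a strong match
```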
- en: '![](Images/gdl2_0210.png)'
+ id: totrans-186
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0210.png)'
- en: Figure 2-10\. A 3 × 3 convolutional filter applied to two portions of a grayscale
image
+ id: totrans-187
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图2-10。应用于灰度图像两个部分的3×3卷积滤波器
- en: If we move the filter across the entire image from left to right and top to
bottom, recording the convolutional output as we go, we obtain a new array that
picks out a particular feature of the input, depending on the values in the filter.
For example, [Figure 2-11](#conv_layer_2d) shows two different filters that highlight
horizontal and vertical edges.
+ id: totrans-188
prefs: []
type: TYPE_NORMAL
+ zh: 如果我们将滤波器从左到右、从上到下移动到整个图像上,并记录卷积输出,我们将获得一个新的数组,它根据滤波器中的值选择输入的特定特征。例如,[图2-11](#conv_layer_2d)显示了突出显示水平和垂直边缘的两个不同滤波器。
- en: Running the Code for This Example
+ id: totrans-189
prefs:
- PREF_H1
type: TYPE_NORMAL
+ zh: 运行此示例的代码
- en: You can see this convolutional process worked through manually in the Jupyter
notebook located at *notebooks/02_deeplearning/02_cnn/convolutions.ipynb* in the
book repository.
+ id: totrans-190
prefs: []
type: TYPE_NORMAL
+ zh: 您可以在位于书籍存储库中的*notebooks/02_deeplearning/02_cnn/convolutions.ipynb*的Jupyter笔记本中手动查看这个卷积过程。
- en: '![](Images/gdl2_0211.png)'
+ id: totrans-191
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0211.png)'
- en: Figure 2-11\. Two convolutional filters applied to a grayscale image
+ id: totrans-192
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图2-11。应用于灰度图像的两个卷积滤波器
- en: A convolutional layer is simply a collection of filters, where the values stored
in the filters are the weights that are learned by the neural network through
training. Initially these are random, but gradually the filters adapt their weights
to start picking out interesting features such as edges or particular color combinations.
+ id: totrans-193
prefs: []
type: TYPE_NORMAL
+ zh: 卷积层只是一组滤波器,其中存储在滤波器中的值是通过训练的神经网络学习的权重。最初这些是随机的,但逐渐滤波器调整它们的权重以开始选择有趣的特征,如边缘或特定的颜色组合。
- en: In Keras, the `Conv2D` layer applies convolutions to an input tensor with two
spatial dimensions (such as an image). For example, the code shown in [Example 2-12](#conv-layer)
builds a convolutional layer with two filters, to match the example in [Figure 2-11](#conv_layer_2d).
+ id: totrans-194
prefs: []
type: TYPE_NORMAL
+ zh: 在Keras中,`Conv2D`层将卷积应用于具有两个空间维度(如图像)的输入张量。例如,[示例2-12](#conv-layer)中显示的代码构建了一个具有两个滤波器的卷积层,以匹配[图2-11](#conv_layer_2d)中的示例。
- en: Example 2-12\. A `Conv2D` layer applied to grayscale input images
+ id: totrans-195
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例2-12。应用于灰度输入图像的`Conv2D`层
- en: '[PRE11]'
+ id: totrans-196
prefs: []
type: TYPE_PRE
+ zh: '[PRE11]'
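A sketch of the `Conv2D` layer in `[PRE11]`; the input spatial size is an assumption, with one channel for grayscale:

```python
from tensorflow.keras import layers

input_layer = layers.Input(shape=(64, 64, 1))  # assumed grayscale input size
conv_layer = layers.Conv2D(
    filters=2,        # two filters, matching Figure 2-11
    kernel_size=(3, 3),
    strides=1,
    padding="same",
)(input_layer)
```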
- en: Next, let’s look at two of the arguments to the `Conv2D` layer in more detail—`strides`
and `padding`.
+ id: totrans-197
prefs: []
type: TYPE_NORMAL
+ zh: 接下来,让我们更详细地看一下`Conv2D`层的两个参数——`strides`和`padding`。
- en: Stride
+ id: totrans-198
prefs:
- PREF_H3
type: TYPE_NORMAL
+ zh: 步幅
- en: The `strides` parameter is the step size used by the layer to move the filters
across the input. Increasing the stride therefore reduces the size of the output
tensor. For example, when `strides = 2`, the height and width of the output tensor
will be half the size of the input tensor. This is useful for reducing the spatial
size of the tensor as it passes through the network, while increasing the number
of channels.
+ id: totrans-199
prefs: []
type: TYPE_NORMAL
+ zh: '`strides`参数是层用来在输入上移动滤波器的步长。增加步长会减小输出张量的大小。例如,当`strides = 2`时,输出张量的高度和宽度将是输入张量大小的一半。这对于通过网络传递时减小张量的空间大小,同时增加通道数量是有用的。'
- en: Padding
+ id: totrans-200
prefs:
- PREF_H3
type: TYPE_NORMAL
+ zh: 填充
- en: The `padding = "same"` input parameter pads the input data with zeros so that
the output size from the layer is exactly the same as the input size when `strides
= 1`.
+ id: totrans-201
prefs: []
type: TYPE_NORMAL
+ zh: '`padding = "same"`输入参数使用零填充输入数据,以便当`strides = 1`时,从层的输出大小与输入大小完全相同。'
- en: '[Figure 2-12](#padding_example) shows a 3 × 3 kernel being passed over a 5
× 5 input image, with `padding = "same"` and `strides = 1`. The output size from
this convolutional layer would also be 5 × 5, as the padding allows the kernel
to extend over the edge of the image, so that it fits five times in both directions.
Without padding, the kernel could only fit three times along each direction, giving
an output size of 3 × 3.'
+ id: totrans-202
prefs: []
type: TYPE_NORMAL
+ zh: '[图2-12](#padding_example)显示了一个3×3的卷积核在一个5×5的输入图像上进行传递,其中`padding = "same"`和`strides = 1`。这个卷积层的输出大小也将是5×5,因为填充允许卷积核延伸到图像的边缘,使其在两个方向上都适合五次。没有填充,卷积核只能在每个方向上适合三次,从而给出一个3×3的输出大小。'
- en: '![](Images/gdl2_0212.png)'
+ id: totrans-203
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0212.png)'
- en: 'Figure 2-12\. A 3 × 3 × 1 kernel (gray) being passed over a 5 × 5 × 1 input
image (blue), with `padding = "same"` and `strides = 1`, to generate the 5 × 5
× 1 output (green) (source: [Dumoulin and Visin, 2018](https://arxiv.org/abs/1603.07285))^([5](ch02.xhtml#idm45387031545152))'
+ id: totrans-204
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图2-12。一个3×3×1的卷积核(灰色)在一个5×5×1的输入图像(蓝色)上进行传递,其中`padding = "same"`和`strides = 1`,生成5×5×1的输出(绿色)(来源:[Dumoulin and Visin, 2018](https://arxiv.org/abs/1603.07285))^([5](ch02.xhtml#idm45387031545152))
- en: 'Setting `padding = "same"` is a good way to ensure that you are able to easily
keep track of the size of the tensor as it passes through many convolutional layers.
The shape of the output from a convolutional layer with `padding = "same"` is:'
+ id: totrans-205
prefs: []
type: TYPE_NORMAL
+ zh: 设置`padding = "same"`是一种确保您能够轻松跟踪张量大小的好方法,因为它通过许多卷积层时。具有`padding = "same"`的卷积层的输出形状是:
- en: $\left( \text{None},\ \frac{\text{input height}}{\text{stride}},\ \frac{\text{input width}}{\text{stride}},\ \text{filters} \right)$
+ id: totrans-206
  prefs: []
  type: TYPE_NORMAL
+ zh: $\left( \text{None},\ \frac{\text{input height}}{\text{stride}},\ \frac{\text{input width}}{\text{stride}},\ \text{filters} \right)$
- en: Stacking convolutional layers
+ id: totrans-207
prefs:
- PREF_H3
type: TYPE_NORMAL
+ zh: 堆叠卷积层
- en: The output of a `Conv2D` layer is another four-dimensional tensor, now of shape
`(batch_size, height, width, filters)`, so we can stack `Conv2D` layers on top
of each other to grow the depth of our neural network and make it more powerful.
To demonstrate this, let’s imagine we are applying `Conv2D` layers to the CIFAR-10
dataset and wish to predict the label of a given image. Note that this time, instead
of one input channel (grayscale) we have three (red, green, and blue).
+ id: totrans-208
prefs: []
type: TYPE_NORMAL
+ zh: '`Conv2D`层的输出是另一个四维张量,现在的形状是`(batch_size, height, width, filters)`,因此我们可以将`Conv2D`层堆叠在一起,以增加神经网络的深度并使其更强大。为了演示这一点,让我们想象我们正在将`Conv2D`层应用于CIFAR-10数据集,并希望预测给定图像的标签。请注意,这一次,我们不是一个输入通道(灰度),而是三个(红色、绿色和蓝色)。'
- en: '[Example 2-13](#conv-network) shows how to build a simple convolutional neural
network that we could train to succeed at this task.'
+ id: totrans-209
prefs: []
type: TYPE_NORMAL
+ zh: '[示例2-13](#conv-network)展示了如何构建一个简单的卷积神经网络,我们可以训练它成功完成这项任务。'
- en: Example 2-13\. Code to build a convolutional neural network model using Keras
+ id: totrans-210
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例2-13。使用Keras构建卷积神经网络模型的代码
- en: '[PRE12]'
+ id: totrans-211
prefs: []
type: TYPE_PRE
+ zh: '[PRE12]'
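A sketch of the network in `[PRE12]`, reconstructed from the layer walkthrough and the summary in Table 2-2 (filter counts, kernel sizes, and strides follow that description; any other details are assumptions):

```python
from tensorflow.keras import layers, models

input_layer = layers.Input(shape=(32, 32, 3))
x = layers.Conv2D(filters=10, kernel_size=(4, 4), strides=2, padding="same")(input_layer)
x = layers.Conv2D(filters=20, kernel_size=(3, 3), strides=2, padding="same")(x)
x = layers.Flatten()(x)                     # (8, 8, 20) -> vector of 1,280
output_layer = layers.Dense(10, activation="softmax")(x)

model = models.Model(input_layer, output_layer)
```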
- en: This code corresponds to the diagram shown in [Figure 2-13](#conv_2d_complex).
+ id: totrans-212
prefs: []
type: TYPE_NORMAL
+ zh: 这段代码对应于[图2-13](#conv_2d_complex)中显示的图表。
- en: '![](Images/gdl2_0213.png)'
+ id: totrans-213
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0213.png)'
- en: Figure 2-13\. A diagram of a convolutional neural network
+ id: totrans-214
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图2-13。卷积神经网络的图表
- en: Note that now that we are working with color images, each filter in the first
convolutional layer has a depth of 3 rather than 1 (i.e., each filter has shape
4 × 4 × 3, rather than 4 × 4 × 1). This is to match the three channels (red, green,
blue) of the input image. The same idea applies to the filters in the second convolutional
layer that have a depth of 10, to match the 10 channels output by the first convolutional
layer.
+ id: totrans-215
prefs: []
type: TYPE_NORMAL
+ zh: 请注意,现在我们正在处理彩色图像,第一个卷积层中的每个滤波器的深度为3,而不是1(即每个滤波器的形状为4×4×3,而不是4×4×1)。这是为了匹配输入图像的三个通道(红色、绿色、蓝色)。同样的想法也适用于第二个卷积层中的深度为10的滤波器,以匹配第一个卷积层输出的10个通道。
- en: Tip
+ id: totrans-216
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 提示
- en: In general, the depth of the filters in a layer is always equal to the number
of channels output by the preceding layer.
+ id: totrans-217
prefs: []
type: TYPE_NORMAL
+ zh: 一般来说,层中滤波器的深度总是等于前一层输出的通道数。
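- en: 'Keras handles this automatically: the kernel is created with a depth equal
  to the number of input channels. A small sketch (assumed arguments, for illustration
  only) verifies this by inspecting the layer’s weights:'
  prefs: []
  type: TYPE_NORMAL
- en: |
    import numpy as np
    from tensorflow.keras import layers

    conv = layers.Conv2D(filters=10, kernel_size=(4, 4))
    conv(np.zeros((1, 32, 32, 3), dtype="float32"))  # build the layer on an RGB input
    print(conv.kernel.shape)  # (4, 4, 3, 10): depth 3 matches the three input channels
  prefs: []
  type: TYPE_PRE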
- en: Inspecting the model
+ id: totrans-218
prefs:
- PREF_H3
type: TYPE_NORMAL
+ zh: 检查模型
- en: It’s really informative to look at how the shape of the tensor changes as data
flows through from one convolutional layer to the next. We can use the `model.summary()`
method to inspect the shape of the tensor as it passes through the network ([Table 2-2](#conv_net_example_summary)).
+ id: totrans-219
prefs: []
type: TYPE_NORMAL
+ zh: 观察数据从一个卷积层流向下一个卷积层时张量形状的变化,是非常有启发性的。我们可以使用`model.summary()`方法检查张量在网络中传递时的形状([表2-2](#conv_net_example_summary))。
- en: Table 2-2\. CNN model summary
+ id: totrans-220
prefs: []
type: TYPE_NORMAL
+ zh: 表2-2。CNN模型摘要
- en: '| Layer (type) | Output shape | Param # |'
+ id: totrans-221
prefs: []
type: TYPE_TB
+ zh: '| 层(类型) | 输出形状 | 参数数量 |'
- en: '| --- | --- | --- |'
+ id: totrans-222
prefs: []
type: TYPE_TB
+ zh: '| --- | --- | --- |'
- en: '| InputLayer | (None, 32, 32, 3) | 0 |'
+ id: totrans-223
prefs: []
type: TYPE_TB
+ zh: '| InputLayer | (None, 32, 32, 3) | 0 |'
- en: '| Conv2D | (None, 16, 16, 10) | 490 |'
+ id: totrans-224
prefs: []
type: TYPE_TB
+ zh: '| Conv2D | (None, 16, 16, 10) | 490 |'
- en: '| Conv2D | (None, 8, 8, 20) | 1,820 |'
+ id: totrans-225
prefs: []
type: TYPE_TB
+ zh: '| Conv2D | (None, 8, 8, 20) | 1,820 |'
- en: '| Flatten | (None, 1280) | 0 |'
+ id: totrans-226
prefs: []
type: TYPE_TB
+ zh: '| Flatten | (None, 1280) | 0 |'
- en: '| Dense | (None, 10) | 12,810 |'
+ id: totrans-227
prefs: []
type: TYPE_TB
+ zh: '| Dense | (None, 10) | 12,810 |'
- en: '| Total params | 15,120 |'
+ id: totrans-228
prefs: []
type: TYPE_TB
+ zh: '| 总参数 | 15,120 |'
- en: '| Trainable params | 15,120 |'
+ id: totrans-229
prefs: []
type: TYPE_TB
+ zh: '| 可训练参数 | 15,120 |'
- en: '| Non-trainable params | 0 |'
+ id: totrans-230
prefs: []
type: TYPE_TB
+ zh: '| 不可训练参数 | 0 |'
- en: 'Let’s walk through our network layer by layer, noting the shape of the tensor
as we go:'
+ id: totrans-231
prefs: []
type: TYPE_NORMAL
+ zh: 让我们逐层遍历网络,并记录张量形状的变化:
- en: The input shape is `(None, 32, 32, 3)`—Keras uses `None` to represent the fact
that we can pass any number of images through the network simultaneously. Since
the network is just performing tensor algebra, we don’t need to pass images through
the network individually, but instead can pass them through together as a batch.
+ id: totrans-232
prefs:
- PREF_OL
type: TYPE_NORMAL
+ zh: 输入形状为`(None, 32, 32, 3)`—Keras使用`None`表示我们可以同时通过网络传递任意数量的图像。由于网络只是执行张量代数运算,我们不需要单独通过网络传递图像,而是可以一起作为批次传递它们。
- en: The shape of each of the 10 filters in the first convolutional layer is 4 ×
  4 × 3\. This is because we have chosen each filter to have a height and width
  of 4 (`kernel_size = (4,4)`) and there are three channels in the preceding layer
  (red, green, and blue). Therefore, the number of parameters (or weights) in this
  layer is (4 × 4 × 3 + 1) × 10 = 490, where the + 1 is due to the bias term attached
  to each of the filters. The output from each filter is the pixelwise multiplication
  of the filter weights with the 4 × 4 × 3 section of the image it covers. Since
  `strides = 2` and `padding = "same"`, the width and height of the output are both
  halved to 16, and since there are 10 filters the output of the first layer is
  a batch of tensors each having shape `[16, 16, 10]`.
+ id: totrans-233
prefs:
- PREF_OL
type: TYPE_NORMAL
+ zh: 第一个卷积层中每个滤波器的形状是4×4×3。这是因为我们选择每个滤波器的高度和宽度为4(`kernel_size=(4,4)`),并且在前一层中有三个通道(红色、绿色和蓝色)。因此,该层中的参数(或权重)数量为(4×4×3+1)×10=490,其中+1是由于每个滤波器附加了一个偏置项。每个滤波器的输出将是滤波器权重和它所覆盖的图像的4×4×3部分的逐像素乘积。由于`strides=2`和`padding="same"`,输出的宽度和高度都减半为16,由于有10个滤波器,第一层的输出是一批张量,每个张量的形状为`[16,16,10]`。
- en: In the second convolutional layer, we choose the filters to be 3 × 3 and they
now have depth 10, to match the number of channels in the previous layer. Since
there are 20 filters in this layer, this gives a total number of parameters (weights)
of (3 × 3 × 10 + 1) × 20 = 1,820\. Again, we use `strides = 2` and `padding =
"same"`, so the width and height both halve. This gives us an overall output shape
of `(None, 8, 8, 20)`.
+ id: totrans-234
prefs:
- PREF_OL
type: TYPE_NORMAL
+ zh: 在第二个卷积层中,我们选择滤波器为3×3,它们现在的深度为10,以匹配前一层中的通道数。由于这一层中有20个滤波器,这给出了总参数(权重)数量为(3×3×10+1)×20=1,820。同样,我们使用`strides=2`和`padding="same"`,所以宽度和高度都减半。这给出了一个总体输出形状为`(None,
+ 8, 8, 20)`。
- en: We now flatten the tensor using the Keras `Flatten` layer. This results in a
set of 8 × 8 × 20 = 1,280 units. Note that there are no parameters to learn in
a `Flatten` layer as the operation is just a restructuring of the tensor.
+ id: totrans-235
prefs:
- PREF_OL
type: TYPE_NORMAL
+ zh: 现在我们使用Keras的`Flatten`层展平张量。这会产生一组8×8×20=1,280个单元。请注意,在`Flatten`层中没有需要学习的参数,因为该操作只是对张量进行重组。
- en: We finally connect these units to a 10-unit `Dense` layer with softmax activation,
  which represents the probability of each category in a 10-category classification
  task. This creates an extra (1,280 + 1) × 10 = 12,810 parameters (weights) to learn,
  where the + 1 again accounts for the bias attached to each unit; a quick sanity
  check of these counts follows this list.
+ id: totrans-236
prefs:
- PREF_OL
type: TYPE_NORMAL
+ zh: 最后,我们将这些单元连接到一个具有softmax激活函数的10单元`Dense`层,表示10类分类任务中每个类别的概率。这会创建额外的(1,280+1)×10=12,810个参数(权重)需要学习,其中+1同样对应每个单元附带的偏置项;这些数字的快速验证见本列表之后。
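- en: 'The sanity check below (plain arithmetic, using only the numbers quoted in
  the walkthrough above) reproduces the parameter counts from [Table 2-2](#conv_net_example_summary):'
  prefs: []
  type: TYPE_NORMAL
- en: |
    conv1 = (4 * 4 * 3 + 1) * 10   # 490: 4 x 4 kernel over 3 channels, + 1 bias, 10 filters
    conv2 = (3 * 3 * 10 + 1) * 20  # 1,820: 3 x 3 kernel over 10 channels, + 1 bias, 20 filters
    dense = (8 * 8 * 20 + 1) * 10  # 12,810: 1,280 flattened units + 1 bias, per output unit
    print(conv1, conv2, dense, conv1 + conv2 + dense)  # 490 1820 12810 15120
  prefs: []
  type: TYPE_PRE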
- en: 'This example demonstrates how we can chain convolutional layers together to
create a convolutional neural network. Before we see how this compares in accuracy
to our densely connected neural network, we’ll examine two more techniques that
can also improve performance: batch normalization and dropout.'
+ id: totrans-237
prefs: []
type: TYPE_NORMAL
+ zh: 这个例子演示了如何将卷积层链接在一起创建卷积神经网络。在我们看到这与我们密集连接的神经网络在准确性上的比较之前,我们将研究另外两种也可以提高性能的技术:批量归一化和dropout。
- en: Batch Normalization
+ id: totrans-238
prefs:
- PREF_H2
type: TYPE_NORMAL
+ zh: 批量归一化
- en: One common problem when training a deep neural network is ensuring that the
weights of the network remain within a reasonable range of values—if they start
to become too large, this is a sign that your network is suffering from what is
known as the *exploding gradient* problem. As errors are propagated backward through
the network, the calculation of the gradient in the earlier layers can sometimes
grow exponentially large, causing wild fluctuations in the weight values.
+ id: totrans-239
prefs: []
type: TYPE_NORMAL
+ zh: 训练深度神经网络时的一个常见问题是确保网络的权重保持在合理的数值范围内:如果权重开始变得过大,这表明您的网络正在遭受所谓的*梯度爆炸*问题。当误差通过网络反向传播时,较早层中梯度的计算有时会呈指数级增长,导致权重值出现剧烈波动。
- en: Warning
+ id: totrans-240
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 警告
- en: If your loss function starts to return `NaN`, chances are that your weights
have grown large enough to cause an overflow error.
+ id: totrans-241
prefs: []
type: TYPE_NORMAL
+ zh: 如果您的损失函数开始返回`NaN`,那么很有可能是您的权重已经变得足够大,导致溢出错误。
- en: This doesn’t necessarily happen immediately as you start training the network.
Sometimes it can be happily training for hours when suddenly the loss function
returns `NaN` and your network has exploded. This can be incredibly annoying.
To prevent it from happening, you need to understand the root cause of the exploding
gradient problem.
+ id: totrans-242
prefs: []
type: TYPE_NORMAL
+ zh: 这并不一定会在您开始训练网络时立即发生。有时网络可能已经顺利训练了几个小时,损失函数却突然返回`NaN`,网络就此爆炸。这可能非常恼人。为了防止这种情况发生,您需要了解梯度爆炸问题的根本原因。
- en: Covariate shift
+ id: totrans-243
prefs:
- PREF_H3
type: TYPE_NORMAL
+ zh: 协变量转移
- en: One of the reasons for scaling input data to a neural network is to ensure a
stable start to training over the first few iterations. Since the weights of the
network are initially randomized, unscaled input could potentially create huge
activation values that immediately lead to exploding gradients. For example, instead
of passing pixel values from 0–255 into the input layer, we usually scale these
values to between –1 and 1.
+ id: totrans-244
prefs: []
type: TYPE_NORMAL
+ zh: 对神经网络的输入数据进行缩放的一个原因,是确保训练在最初几次迭代中稳定开始。由于网络的权重最初是随机初始化的,未经缩放的输入可能会立即产生巨大的激活值,从而导致梯度爆炸。例如,我们通常会将0-255的像素值缩放到-1到1之间,而不是直接将其传入输入层。
- en: Because the input is scaled, it’s natural to expect the activations from all
future layers to be relatively well scaled as well. Initially this may be true,
but as the network trains and the weights move further away from their random
initial values, this assumption can start to break down. This phenomenon is known
as *covariate shift*.
+ id: totrans-245
prefs: []
type: TYPE_NORMAL
+ zh: 因为输入已被缩放,我们自然会期望后续所有层的激活值也保持在合理的尺度上。起初这可能成立,但随着网络的训练、权重逐渐偏离其随机初始值,这一假设可能开始失效。这种现象被称为*协变量转移*。
- en: Covariate Shift Analogy
+ id: totrans-246
prefs:
- PREF_H1
type: TYPE_NORMAL
+ zh: 协变量转移类比
- en: Imagine you’re carrying a tall pile of books, and you get hit by a gust of wind.
You move the books in a direction opposite to the wind to compensate, but as you
do so, some of the books shift, so that the tower is slightly more unstable than
before. Initially, this is OK, but with every gust the pile becomes more and more
unstable, until eventually the books have shifted so much that the pile collapses.
This is covariate shift.
+ id: totrans-247
prefs: []
type: TYPE_NORMAL
+ zh: 想象一下,你正拿着一摞高高的书,突然被一阵风吹袭。你将书向与风相反的方向移动以补偿,但在这样做的过程中,一些书会移动,使得整个塔比以前稍微不稳定。最初,这没关系,但随着每阵风,这摞书变得越来越不稳定,直到最终书移动得太多,整摞书倒塌。这就是协变量转移。
- en: Relating this to neural networks, each layer is like a book in the pile. To
remain stable, when the network updates the weights, each layer implicitly assumes
that the distribution of its input from the layer beneath is approximately consistent
across iterations. However, since there is nothing to stop any of the activation
distributions shifting significantly in a certain direction, this can sometimes
lead to runaway weight values and an overall collapse of the network.
+ id: totrans-248
prefs: []
type: TYPE_NORMAL
+ zh: 将这与神经网络联系起来,每一层就像书堆中的一本书。为了保持稳定,当网络更新权重时,每一层都隐含地假设其来自下方一层的输入分布在迭代之间大致保持一致。然而,由于没有任何机制阻止激活分布朝某个方向显著偏移,这有时会导致权重值失控和网络整体崩溃。
- en: Training using batch normalization
+ id: totrans-249
prefs:
- PREF_H3
type: TYPE_NORMAL
+ zh: 使用批量归一化进行训练
- en: '*Batch normalization* is a technique that drastically reduces this problem.
  The solution is surprisingly simple. During training, a batch normalization layer
  calculates the mean and standard deviation of each of its input channels across
  the batch and normalizes by subtracting the mean and dividing by the standard
  deviation. There are then two learned parameters for each channel, the scale (gamma)
  and shift (beta). The output is simply the normalized input, scaled by gamma and
  shifted by beta. [Figure 2-14](#batch_norm) shows the whole process.'
+ id: totrans-250
prefs: []
type: TYPE_NORMAL
+ zh: '*批量归一化*是一种极大地减少这个问题的技术。解决方案出奇地简单。在训练期间,批量归一化层计算每个输入通道在批处理中的均值和标准差,并通过减去均值并除以标准差来进行归一化。然后,每个通道有两个学习参数,即缩放(gamma)和移位(beta)。输出只是归一化的输入,由gamma缩放并由beta移位。[图2-14](#batch_norm)展示了整个过程。'
- en: '![](Images/gdl2_0214.png)'
+ id: totrans-251
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0214.png)'
- en: 'Figure 2-14\. The batch normalization process (source: [Ioffe and Szegedy,
2015](https://arxiv.org/abs/1502.03167))^([6](ch02.xhtml#idm45387025136368))'
+ id: totrans-252
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图2-14。批量归一化过程(来源:[Ioffe and Szegedy, 2015](https://arxiv.org/abs/1502.03167))^([6](ch02.xhtml#idm45387025136368))
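- en: 'The following numpy sketch (illustrative only; a real layer also maintains
  moving statistics and learns gamma and beta per channel) shows the training-time
  computation for a single batch:'
  prefs: []
  type: TYPE_NORMAL
- en: |
    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-3):
        # Normalize each channel across the batch, then apply the learned scale and shift.
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mean) / np.sqrt(var + eps)
        return gamma * x_hat + beta

    batch = np.random.randn(32, 4) * 5 + 10  # 32 observations, 4 poorly scaled channels
    out = batch_norm(batch, gamma=1.0, beta=0.0)
    print(out.mean(axis=0).round(2), out.std(axis=0).round(2))  # ~0 and ~1 per channel
  prefs: []
  type: TYPE_PRE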
- en: We can place batch normalization layers after dense or convolutional layers
to normalize the output.
+ id: totrans-253
prefs: []
type: TYPE_NORMAL
+ zh: 我们可以在密集层或卷积层之后放置批量归一化层来归一化输出。
- en: Tip
+ id: totrans-254
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 提示
- en: Referring to our previous example, it’s a bit like connecting the layers of
books with small sets of adjustable springs that ensure there aren’t any overall
huge shifts in their positions over time.
+ id: totrans-255
prefs: []
type: TYPE_NORMAL
+ zh: 参考我们之前的类比,这有点像用几组可调节的小弹簧把书与书连接起来,确保它们的位置随时间不会发生大的整体移动。
- en: Prediction using batch normalization
+ id: totrans-256
prefs:
- PREF_H3
type: TYPE_NORMAL
+ zh: 使用批量归一化进行预测
- en: You might be wondering how this layer works at prediction time. When it comes
to prediction, we may only want to predict a single observation, so there is no
*batch* over which to calculate the mean and standard deviation. To get around
this problem, during training a batch normalization layer also calculates the
moving average of the mean and standard deviation of each channel and stores this
value as part of the layer to use at test time.
+ id: totrans-257
prefs: []
type: TYPE_NORMAL
+ zh: 您可能想知道这个层在预测时是如何工作的。在预测时,我们可能只想预测单个观测值,因此没有*批次*可以计算平均值和标准差。为了解决这个问题,在训练期间,批量归一化层还会计算每个通道的平均值和标准差的移动平均值,并将这个值作为该层的一部分存储起来,以便在测试时使用。
- en: 'How many parameters are contained within a batch normalization layer? For every
  channel in the preceding layer, two weights need to be learned: the scale (gamma)
  and shift (beta). These are the *trainable* parameters. The moving average and
  standard deviation of each channel also need to be calculated, but since they are
  derived from the data passing through the layer rather than trained through
  backpropagation, they are called *nontrainable* parameters. In total, this gives
  four parameters for each channel in the preceding layer, where two are trainable
  and two are nontrainable.'
+ id: totrans-258
prefs: []
type: TYPE_NORMAL
+ zh: 批量归一化层中包含多少参数?对于前一层中的每个通道,需要学习两个权重:缩放(gamma)和移位(beta)。这些是*可训练*参数。移动平均值和标准差也需要针对每个通道进行计算,但由于它们是从通过该层的数据派生而来,而不是通过反向传播进行训练,因此被称为*不可训练*参数。总共,这为前一层中的每个通道提供了四个参数,其中两个是可训练的,两个是不可训练的。
- en: In Keras, the `BatchNormalization` layer implements the batch normalization
functionality, as shown in [Example 2-14](#batchnorm-layer).
+ id: totrans-259
prefs: []
type: TYPE_NORMAL
+ zh: 在Keras中,`BatchNormalization`层实现了批量归一化功能,如[示例2-14](#batchnorm-layer)所示。
- en: Example 2-14\. A `BatchNormalization` layer in Keras
+ id: totrans-260
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例2-14。Keras中的`BatchNormalization`层
- en: '[PRE13]'
+ id: totrans-261
prefs: []
type: TYPE_PRE
+ zh: '[PRE13]'
- en: The `momentum` parameter is the weight given to the previous value when calculating
the moving average and moving standard deviation.
+ id: totrans-262
prefs: []
type: TYPE_NORMAL
+ zh: 在计算移动平均值和移动标准差时,`momentum`参数是给予先前值的权重。
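- en: 'As a sketch (the `momentum` value here is an assumption, since the listing
  above is a placeholder), we can confirm the four-parameters-per-channel accounting
  from the previous section:'
  prefs: []
  type: TYPE_NORMAL
- en: |
    import numpy as np
    from tensorflow.keras import layers

    bn = layers.BatchNormalization(momentum=0.9)
    bn(np.zeros((1, 8, 8, 64), dtype="float32"))  # build the layer on a 64-channel input
    print(bn.count_params())              # 256 = 4 parameters per channel x 64 channels
    print(len(bn.trainable_weights))      # 2: gamma and beta
    print(len(bn.non_trainable_weights))  # 2: moving mean and moving variance
  prefs: []
  type: TYPE_PRE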
- en: Dropout
+ id: totrans-263
prefs:
- PREF_H2
type: TYPE_NORMAL
+ zh: Dropout
- en: When studying for an exam, it is common practice for students to use past papers
  and sample questions to improve their knowledge of the subject material. Some
  students try to memorize the answers to these questions, but then come unstuck
  in the exam because they have never truly understood the subject material. The
  best students use the practice material to further their general understanding,
  so that they are still able to answer correctly when faced with new questions
  that they haven’t seen before.
+ id: totrans-264
prefs: []
type: TYPE_NORMAL
+ zh: 在备考考试时,学生通常会使用过去的试卷和样题来提高对学科材料的了解。一些学生试图记住这些问题的答案,但在考试中却因为没有真正理解学科内容而失败。最好的学生利用练习材料来进一步提高他们对学科的整体理解,这样当面对以前没有见过的新问题时,他们仍然能够正确回答。
- en: The same principle holds for machine learning. Any successful machine learning
algorithm must ensure that it generalizes to unseen data, rather than simply *remembering*
the training dataset. If an algorithm performs well on the training dataset, but
not the test dataset, we say that it is suffering from *overfitting*. To counteract
this problem, we use *regularization* techniques, which ensure that the model
is penalized if it starts to overfit.
+ id: totrans-265
prefs: []
type: TYPE_NORMAL
+ zh: 相同的原则适用于机器学习。任何成功的机器学习算法必须确保它能泛化到未见过的数据,而不仅仅是*记住*训练数据集。如果一个算法在训练数据集上表现良好,但在测试数据集上表现不佳,我们称其为*过拟合*。为了解决这个问题,我们使用*正则化*技术,确保模型在开始过拟合时受到惩罚。
- en: There are many ways to regularize a machine learning algorithm, but for deep
learning, one of the most common is by using *dropout* layers. This idea was introduced
by Hinton et al. in 2012^([7](ch02.xhtml#idm45387025089232)) and presented in
a 2014 paper by Srivastava et al.^([8](ch02.xhtml#idm45387025086976))
+ id: totrans-266
prefs: []
type: TYPE_NORMAL
+ zh: 有许多方法可以对机器学习算法进行正则化,但对于深度学习来说,最常见的方法之一是使用*dropout*层。这个想法由Hinton等人于2012年提出^([7](ch02.xhtml#idm45387025089232)),并由Srivastava等人在2014年的一篇论文中进行了阐述^([8](ch02.xhtml#idm45387025086976))。
- en: Dropout layers are very simple. During training, each dropout layer chooses
a random set of units from the preceding layer and sets their output to 0, as
shown in [Figure 2-15](#dropout).
+ id: totrans-267
prefs: []
type: TYPE_NORMAL
+ zh: Dropout层非常简单。在训练期间,每个dropout层从前一层中选择一组随机单元,并将它们的输出设置为0,如[图2-15](#dropout)所示。
- en: Incredibly, this simple addition drastically reduces overfitting by ensuring
that the network doesn’t become overdependent on certain units or groups of units
that, in effect, just remember observations from the training set. If we use dropout
layers, the network cannot rely too much on any one unit and therefore knowledge
is more evenly spread across the whole network.
+ id: totrans-268
prefs: []
type: TYPE_NORMAL
+ zh: 令人难以置信的是,这个简单的添加通过确保网络不会过度依赖某些单元或单元组而大大减少了过拟合,这些单元或单元组实际上只是记住了训练集中的观察结果。如果我们使用dropout层,网络就不能太依赖任何一个单元,因此知识更均匀地分布在整个网络中。
- en: '![](Images/gdl2_0215.png)'
+ id: totrans-269
prefs: []
type: TYPE_IMG
+ zh: '![](Images/gdl2_0215.png)'
- en: Figure 2-15\. A dropout layer
+ id: totrans-270
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 图2-15。一个dropout层
- en: This makes the model much better at generalizing to unseen data, because the
network has been trained to produce accurate predictions even under unfamiliar
conditions, such as those caused by dropping random units. There are no weights
to learn within a dropout layer, as the units to drop are decided stochastically.
At prediction time, the dropout layer doesn’t drop any units, so that the full
network is used to make predictions.
+ id: totrans-271
prefs: []
type: TYPE_NORMAL
+ zh: 这使得模型在泛化到未见过的数据时更加出色,因为网络已经经过训练,即使在由于丢弃随机单元引起的陌生条件下,也能产生准确的预测。在dropout层内没有需要学习的权重,因为要丢弃的单元是随机决定的。在预测时,dropout层不会丢弃任何单元,因此整个网络用于进行预测。
- en: Dropout Analogy
+ id: totrans-272
prefs:
- PREF_H1
type: TYPE_NORMAL
+ zh: Dropout类比
- en: Returning to our analogy, it’s a bit like a math student practicing past papers
  with a random selection of key formulae missing from their formula book. This
  way, they learn how to answer questions through an understanding of the core principles,
  rather than always looking up the formulae in the same place in the book. When
  it comes to test time, they will find it much easier to answer questions that
  they have never seen before, due to their ability to generalize beyond the training
  material.
+ id: totrans-273
prefs: []
type: TYPE_NORMAL
+ zh: 回到我们的类比,这有点像一个数学学生在练习历年试卷时,公式手册中随机缺失了一些关键公式。通过这种方式,他们学会了依靠对核心原理的理解来回答问题,而不是总在书中相同的位置查找公式。到了考试时,他们会发现回答从未见过的问题容易得多,因为他们具备了超越练习材料进行泛化的能力。
- en: The `Dropout` layer in Keras implements this functionality, with the `rate`
parameter specifying the proportion of units to drop from the preceding layer,
as shown in [Example 2-15](#dropout-layer).
+ id: totrans-274
prefs: []
type: TYPE_NORMAL
+ zh: Keras中的`Dropout`层实现了这种功能,`rate`参数指定了要从前一层中丢弃的单元的比例,如[示例2-15](#dropout-layer)所示。
- en: Example 2-15\. A `Dropout` layer in Keras
+ id: totrans-275
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例2-15。Keras中的`Dropout`层
- en: '[PRE14]'
+ id: totrans-276
prefs: []
type: TYPE_PRE
+ zh: '[PRE14]'
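- en: 'Since the listing is elided above as `[PRE14]`, here is a minimal sketch (the
  `rate` value is an assumption) showing the training-time versus prediction-time
  behavior described in this section:'
  prefs: []
  type: TYPE_NORMAL
- en: |
    import numpy as np
    from tensorflow.keras import layers

    dropout = layers.Dropout(rate=0.25)
    x = np.ones((1, 8), dtype="float32")
    print(dropout(x, training=True))   # ~25% of units zeroed, the rest rescaled by 1 / (1 - rate)
    print(dropout(x, training=False))  # identity: no units are dropped at prediction time
  prefs: []
  type: TYPE_PRE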
- en: Dropout layers are used most commonly after dense layers since these are the
most prone to overfitting due to the higher number of weights, though you can
also use them after convolutional layers.
+ id: totrans-277
prefs: []
type: TYPE_NORMAL
+ zh: 由于密集层的权重数量较高,最容易过拟合,因此通常在密集层之后使用Dropout层,尽管也可以在卷积层之后使用。
- en: Tip
+ id: totrans-278
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 提示
- en: Batch normalization also has been shown to reduce overfitting, and therefore
many modern deep learning architectures don’t use dropout at all, relying solely
on batch normalization for regularization. As with most deep learning principles,
there is no golden rule that applies in every situation—the only way to know for
sure what’s best is to test different architectures and see which performs best
on a holdout set of data.
+ id: totrans-279
prefs: []
type: TYPE_NORMAL
+ zh: 批量归一化也被证明可以减少过拟合,因此许多现代深度学习架构根本不使用dropout,完全依赖批量归一化进行正则化。与大多数深度学习原则一样,在每种情况下都没有适用的黄金法则,唯一确定最佳方法的方式是测试不同的架构,看看哪种在保留数据集上表现最好。
- en: Building the CNN
+ id: totrans-280
prefs:
- PREF_H2
type: TYPE_NORMAL
+ zh: 构建CNN
- en: 'You’ve now seen three new Keras layer types: `Conv2D`, `BatchNormalization`,
and `Dropout`. Let’s put these pieces together into a CNN model and see how it
performs on the CIFAR-10 dataset.'
+ id: totrans-281
prefs: []
type: TYPE_NORMAL
+ zh: 您现在已经看到了三种新的Keras层类型:`Conv2D`、`BatchNormalization`和`Dropout`。让我们将这些部分组合成一个CNN模型,并看看它在CIFAR-10数据集上的表现。
- en: Running the Code for This Example
+ id: totrans-282
prefs:
- PREF_H1
type: TYPE_NORMAL
+ zh: 运行此示例的代码
- en: You can run the following example in the Jupyter notebook in the book repository
called *notebooks/02_deeplearning/02_cnn/cnn.ipynb*.
+ id: totrans-283
prefs: []
type: TYPE_NORMAL
+ zh: 您可以在书籍存储库中名为*notebooks/02_deeplearning/02_cnn/cnn.ipynb*的Jupyter笔记本中运行以下示例。
- en: The model architecture we shall test is shown in [Example 2-16](#conv-network-2).
+ id: totrans-284
prefs: []
type: TYPE_NORMAL
+ zh: 我们将测试的模型架构显示在[示例2-16](#conv-network-2)中。
- en: Example 2-16\. Code to build a CNN model using Keras
+ id: totrans-285
prefs:
- PREF_H5
type: TYPE_NORMAL
+ zh: 示例2-16。使用Keras构建CNN模型的代码
- en: '[PRE15]'
+ id: totrans-286
prefs: []
type: TYPE_PRE
+ zh: '[PRE15]'
- en: We use four stacked `Conv2D` layers, each followed by a `BatchNormalization`
and a `LeakyReLU` layer. After flattening the resulting tensor, we pass the data
through a `Dense` layer of size 128, again followed by a `BatchNormalization`
and a `LeakyReLU` layer. This is immediately followed by a `Dropout` layer for
regularization, and the network is concluded with an output `Dense` layer of size
10.
+ id: totrans-287
prefs: []
type: TYPE_NORMAL
+ zh: 我们使用四个堆叠的`Conv2D`层,每个后面跟一个`BatchNormalization`和一个`LeakyReLU`层。在展平结果张量后,我们通过一个大小为128的`Dense`层,再次跟一个`BatchNormalization`和一个`LeakyReLU`层。紧接着是一个用于正则化的`Dropout`层,网络最后是一个大小为10的输出`Dense`层。
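- en: 'The listing is elided above as `[PRE15]`; the sketch below is consistent with
  this description and with the shapes in [Table 2-3](#cnn_model_summary) (kernel
  sizes and strides are inferred from the output shapes, and the dropout rate is
  an assumption):'
  prefs: []
  type: TYPE_NORMAL
- en: |
    from tensorflow.keras import layers, models

    input_layer = layers.Input(shape=(32, 32, 3))
    x = layers.Conv2D(32, kernel_size=3, strides=1, padding="same")(input_layer)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    x = layers.Conv2D(32, kernel_size=3, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    x = layers.Conv2D(64, kernel_size=3, strides=1, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    x = layers.Conv2D(64, kernel_size=3, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128)(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    x = layers.Dropout(rate=0.5)(x)  # assumed rate
    output_layer = layers.Dense(10, activation="softmax")(x)
    model = models.Model(input_layer, output_layer)
  prefs: []
  type: TYPE_PRE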
- en: Tip
+ id: totrans-288
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 提示
- en: The order in which to use the batch normalization and activation layers is a
matter of preference. Usually batch normalization layers are placed before the
activation, but some successful architectures use these layers the other way around.
If you do choose to use batch normalization before activation, you can remember
the order using the acronym *BAD* (batch normalization, activation, then dropout)!
+ id: totrans-289
prefs: []
type: TYPE_NORMAL
+ zh: 使用批量归一化和激活层的顺序是个人偏好的问题。通常情况下,批量归一化层放在激活层之前,但一些成功的架构会反过来使用这些层。如果选择在激活之前使用批量归一化,可以使用缩写
+ *BAD*(批量归一化,激活,然后是dropout)来记住顺序!
- en: The model summary is shown in [Table 2-3](#cnn_model_summary).
+ id: totrans-290
prefs: []
type: TYPE_NORMAL
+ zh: 模型摘要显示在[表2-3](#cnn_model_summary)中。
- en: Table 2-3\. Model summary of the CNN for CIFAR-10
+ id: totrans-291
prefs: []
type: TYPE_NORMAL
+ zh: 表2-3。CIFAR-10的CNN模型摘要
- en: '| Layer (type) | Output shape | Param # |'
+ id: totrans-292
prefs: []
type: TYPE_TB
+ zh: '| 层(类型) | 输出形状 | 参数数量 |'
- en: '| --- | --- | --- |'
+ id: totrans-293
prefs: []
type: TYPE_TB
+ zh: '| --- | --- | --- |'
- en: '| InputLayer | (None, 32, 32, 3) | 0 |'
+ id: totrans-294
prefs: []
type: TYPE_TB
+ zh: '| InputLayer | (None, 32, 32, 3) | 0 |'
- en: '| Conv2D | (None, 32, 32, 32) | 896 |'
+ id: totrans-295
prefs: []
type: TYPE_TB
+ zh: '| Conv2D | (None, 32, 32, 32) | 896 |'
- en: '| BatchNormalization | (None, 32, 32, 32) | 128 |'
+ id: totrans-296
prefs: []
type: TYPE_TB
+ zh: '| BatchNormalization | (None, 32, 32, 32) | 128 |'
- en: '| LeakyReLU | (None, 32, 32, 32) | 0 |'
+ id: totrans-297
prefs: []
type: TYPE_TB
+ zh: '| LeakyReLU | (None, 32, 32, 32) | 0 |'
- en: '| Conv2D | (None, 16, 16, 32) | 9,248 |'
+ id: totrans-298
prefs: []
type: TYPE_TB
+ zh: '| Conv2D | (None, 16, 16, 32) | 9,248 |'
- en: '| BatchNormalization | (None, 16, 16, 32) | 128 |'
+ id: totrans-299
prefs: []
type: TYPE_TB
+ zh: '| BatchNormalization | (None, 16, 16, 32) | 128 |'
- en: '| LeakyReLU | (None, 16, 16, 32) | 0 |'
+ id: totrans-300
prefs: []
type: TYPE_TB
+ zh: '| LeakyReLU | (None, 16, 16, 32) | 0 |'
- en: '| Conv2D | (None, 16, 16, 64) | 18,496 |'
+ id: totrans-301
prefs: []
type: TYPE_TB
+ zh: '| Conv2D | (None, 16, 16, 64) | 18,496 |'
- en: '| BatchNormalization | (None, 16, 16, 64) | 256 |'
+ id: totrans-302
prefs: []
type: TYPE_TB
+ zh: '| BatchNormalization | (None, 16, 16, 64) | 256 |'
- en: '| LeakyReLU | (None, 16, 16, 64) | 0 |'
+ id: totrans-303
prefs: []
type: TYPE_TB
+ zh: '| LeakyReLU | (None, 16, 16, 64) | 0 |'
- en: '| Conv2D | (None, 8, 8, 64) | 36,928 |'
+ id: totrans-304
prefs: []
type: TYPE_TB
+ zh: '| Conv2D | (None, 8, 8, 64) | 36,928 |'
- en: '| BatchNormalization | (None, 8, 8, 64) | 256 |'
+ id: totrans-305
prefs: []
type: TYPE_TB
+ zh: '| BatchNormalization | (None, 8, 8, 64) | 256 |'
- en: '| LeakyReLU | (None, 8, 8, 64) | 0 |'
+ id: totrans-306
prefs: []
type: TYPE_TB
+ zh: '| LeakyReLU | (None, 8, 8, 64) | 0 |'
- en: '| Flatten | (None, 4096) | 0 |'
+ id: totrans-307
prefs: []
type: TYPE_TB
+ zh: '| Flatten | (None, 4096) | 0 |'
- en: '| Dense | (None, 128) | 524,416 |'
+ id: totrans-308
prefs: []
type: TYPE_TB
+ zh: '| Dense | (None, 128) | 524,416 |'
- en: '| BatchNormalization | (None, 128) | 512 |'
+ id: totrans-309
prefs: []
type: TYPE_TB
+ zh: '| BatchNormalization | (None, 128) | 512 |'
- en: '| LeakyReLU | (None, 128) | 0 |'
+ id: totrans-310
prefs: []
type: TYPE_TB
+ zh: '| LeakyReLU | (None, 128) | 0 |'
- en: '| Dropout | (None, 128) | 0 |'
+ id: totrans-311
prefs: []
type: TYPE_TB
+ zh: '| Dropout | (None, 128) | 0 |'
- en: '| Dense | (None, 10) | 1,290 |'
+ id: totrans-312
prefs: []
type: TYPE_TB
+ zh: '| Dense | (None, 10) | 1,290 |'
- en: '| Total params | 592,554 |'
+ id: totrans-313
prefs: []
type: TYPE_TB
+ zh: '| 总参数 | 592,554 |'
- en: '| Trainable params | 591,914 |'
+ id: totrans-314
prefs: []
type: TYPE_TB
+ zh: '| 可训练参数 | 591,914 |'
- en: '| Non-trainable params | 640 |'
+ id: totrans-315
prefs: []
type: TYPE_TB
+ zh: '| 不可训练参数 | 640 |'
- en: Tip
+ id: totrans-316
prefs:
- PREF_H6
type: TYPE_NORMAL
+ zh: 提示
- en: Before moving on, make sure you are able to calculate the output shape and number
of parameters for each layer by hand. It’s a good exercise to prove to yourself
that you have fully understood how each layer is constructed and how it is connected
to the preceding layer! Don’t forget to include the bias weights that are included
as part of the `Conv2D` and `Dense` layers.
+ id: totrans-317
prefs: []
type: TYPE_NORMAL
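- en: 'As a worked example of this exercise (plain arithmetic using only the numbers
  in [Table 2-3](#cnn_model_summary)), here is a hand calculation of three of the
  rows:'
  prefs: []
  type: TYPE_NORMAL
- en: |
    first_conv = (3 * 3 * 3 + 1) * 32    # 896: 3 x 3 kernel, 3 input channels, + 1 bias, 32 filters
    second_conv = (3 * 3 * 32 + 1) * 32  # 9,248: depth now matches the 32 preceding channels
    dense = (4096 + 1) * 128             # 524,416: 4,096 flattened units + 1 bias, 128 units
    print(first_conv, second_conv, dense)
  prefs: []
  type: TYPE_PRE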
- en: Training and Evaluating the CNN
+ id: totrans-318
prefs:
- PREF_H2
type: TYPE_NORMAL
- en: We compile and train the model in exactly the same way as before and call the
`evaluate` method to determine its accuracy on the holdout set ([Figure 2-16](#cnn_model_evaluate)).
+ id: totrans-319
prefs: []
type: TYPE_NORMAL
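- en: 'A sketch of this training loop is shown below (the hyperparameter values are
  assumptions for illustration, and `model` is the network built in [Example 2-16](#conv-network-2)):'
  prefs: []
  type: TYPE_NORMAL
- en: |
    from tensorflow.keras import datasets, optimizers, utils

    # Load and preprocess CIFAR-10 (assumed preprocessing: scale pixels, one-hot labels).
    (x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()
    x_train = x_train.astype("float32") / 255.0
    x_test = x_test.astype("float32") / 255.0
    y_train = utils.to_categorical(y_train, 10)
    y_test = utils.to_categorical(y_test, 10)

    model.compile(
        loss="categorical_crossentropy",
        optimizer=optimizers.Adam(learning_rate=0.0005),
        metrics=["accuracy"],
    )
    model.fit(x_train, y_train, batch_size=32, epochs=10, shuffle=True)
    model.evaluate(x_test, y_test)
  prefs: []
  type: TYPE_PRE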
- en: '![](Images/gdl2_0216.png)'
+ id: totrans-320
prefs: []
type: TYPE_IMG
- en: Figure 2-16\. CNN performance
+ id: totrans-321
prefs:
- PREF_H6
type: TYPE_NORMAL
- en: As you can see, this model is now achieving 71.5% accuracy, up from 49.0% previously.
Much better! [Figure 2-17](#cnn_preds) shows some predictions from our new convolutional
model.
+ id: totrans-322
prefs: []
type: TYPE_NORMAL
- en: This improvement has been achieved simply by changing the architecture of the
model to include convolutional, batch normalization, and dropout layers. When building
generative models, it becomes even more important to understand the inner workings
of your model since it is the middle layers of your network that capture the high-level
features that you are most interested in.
+ id: totrans-323
prefs: []
type: TYPE_NORMAL
- en: '![](Images/gdl2_0217.png)'
+ id: totrans-324
prefs: []
type: TYPE_IMG
- en: Figure 2-17\. CNN predictions
+ id: totrans-325
prefs:
- PREF_H6
type: TYPE_NORMAL
- en: Summary
+ id: totrans-326
prefs:
- PREF_H1
type: TYPE_NORMAL
- en: In this chapter, we built a densely connected neural network and trained it
  to predict the category of a given image
from the CIFAR-10 dataset. Then, we improved upon this architecture by introducing
convolutional, batch normalization, and dropout layers to create a convolutional
neural network (CNN).
+ id: totrans-327
prefs: []
type: TYPE_NORMAL
- en: A really important point to take away from this chapter is that deep neural
  networks are completely flexible by design, and there are no fixed rules when it
  comes to the model architecture. There are guidelines and best practices, but you
  should feel free to experiment with layers and the order in which they
  appear. Don’t feel constrained to only use the architectures that you have read
about in this book or elsewhere! Like a child with a set of building blocks, the
design of your neural network is only limited by your own imagination.
+ id: totrans-328
prefs: []
type: TYPE_NORMAL
- en: In the next chapter, we shall see how we can use these building blocks to design
a network that can generate images.
+ id: totrans-329
prefs: []
type: TYPE_NORMAL
- en: ^([1](ch02.xhtml#idm45387028957520-marker)) Kaiming He et al., “Deep Residual
Learning for Image Recognition,” December 10, 2015, [*https://arxiv.org/abs/1512.03385*](https://arxiv.org/abs/1512.03385).
+ id: totrans-330
prefs: []
type: TYPE_NORMAL
- en: ^([2](ch02.xhtml#idm45387033163216-marker)) Alex Krizhevsky, “Learning Multiple
Layers of Features from Tiny Images,” April 8, 2009, [*https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf*](https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf).
+ id: totrans-331
prefs: []
type: TYPE_NORMAL
- en: '^([3](ch02.xhtml#idm45387032147088-marker)) Diederik Kingma and Jimmy Ba, “Adam:
A Method for Stochastic Optimization,” December 22, 2014, [*https://arxiv.org/abs/1412.6980v8*](https://arxiv.org/abs/1412.6980v8).'
+ id: totrans-332
prefs: []
type: TYPE_NORMAL
- en: ^([4](ch02.xhtml#idm45387032068928-marker)) Samuel L. Smith et al., “Don’t Decay
the Learning Rate, Increase the Batch Size,” November 1, 2017, [*https://arxiv.org/abs/1711.00489*](https://arxiv.org/abs/1711.00489).
+ id: totrans-333
prefs: []
type: TYPE_NORMAL
- en: ^([5](ch02.xhtml#idm45387031545152-marker)) Vincent Dumoulin and Francesco Visin,
“A Guide to Convolution Arithmetic for Deep Learning,” January 12, 2018, [*https://arxiv.org/abs/1603.07285*](https://arxiv.org/abs/1603.07285).
+ id: totrans-334
prefs: []
type: TYPE_NORMAL
- en: '^([6](ch02.xhtml#idm45387025136368-marker)) Sergey Ioffe and Christian Szegedy,
“Batch Normalization: Accelerating Deep Network Training by Reducing Internal
Covariate Shift,” February 11, 2015, [*https://arxiv.org/abs/1502.03167*](https://arxiv.org/abs/1502.03167).'
+ id: totrans-335
prefs: []
type: TYPE_NORMAL
+ zh: ^([6](ch02.xhtml#idm45387025136368-marker)) Sergey Ioffe和Christian Szegedy,“批量归一化:通过减少内部协变量转移加速深度网络训练”,2015年2月11日,[*https://arxiv.org/abs/1502.03167*](https://arxiv.org/abs/1502.03167)。
- en: ^([7](ch02.xhtml#idm45387025089232-marker)) Geoffrey Hinton et al., “Improving
  Neural Networks by Preventing Co-Adaptation of Feature Detectors,” July 3, 2012,
  [*https://arxiv.org/abs/1207.0580*](https://arxiv.org/abs/1207.0580).
+ id: totrans-336
prefs: []
type: TYPE_NORMAL
+ zh: ^([7](ch02.xhtml#idm45387025089232-marker)) Geoffrey Hinton等人,“通过防止特征检测器的共适应来改进神经网络”,2012年7月3日,[*https://arxiv.org/abs/1207.0580*](https://arxiv.org/abs/1207.0580)。
- en: '^([8](ch02.xhtml#idm45387025086976-marker)) Nitish Srivastava et al., “Dropout:
A Simple Way to Prevent Neural Networks from Overfitting,” *Journal of Machine
Learning Research* 15 (2014): 1929–1958, [*http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf*](http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf).'
+ id: totrans-337
prefs: []
type: TYPE_NORMAL
+ zh: '^([8](ch02.xhtml#idm45387025086976-marker)) Nitish Srivastava等人,“Dropout:防止神经网络过拟合的简单方法”,*机器学习研究杂志*
+ 15 (2014): 1929–1958,[*http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf*](http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf)。'