diff --git a/totrans/dl-scr_2.yaml b/totrans/dl-scr_2.yaml index ebc4c18..f69bbf6 100644 --- a/totrans/dl-scr_2.yaml +++ b/totrans/dl-scr_2.yaml @@ -1410,6 +1410,9 @@ id: totrans-160 prefs: [] type: TYPE_NORMAL + zh: 做“一堆线性回归”是什么意思?做一个线性回归涉及使用一组参数进行矩阵乘法:如果我们的数据*X*的维度是`[batch_size, num_features]`,那么我们将它乘以一个维度为`[num_features, + 1]`的权重矩阵*W*,得到一个维度为`[batch_size, 1]`的输出;对于批次中的每个观察值,这个输出只是原始特征的一个*加权和*。要做多个线性回归,我们只需将我们的输入乘以一个维度为`[num_features, + num_outputs]`的权重矩阵,得到一个维度为`[batch_size, num_outputs]`的输出;现在,*对于每个观察值*,我们有`num_outputs`个不同的原始特征的加权和。 - en: What are these weighted sums? We should think of each of them as a “learned feature”—a combination of the original features that, once the network is trained, will represent its attempt to learn combinations of features that help it accurately @@ -1418,26 +1421,31 @@ id: totrans-161 prefs: [] type: TYPE_NORMAL + zh: 这些加权和是什么?我们应该将它们中的每一个看作是一个“学习到的特征”——原始特征的组合,一旦网络训练完成,将代表其尝试学习的特征组合,以帮助准确预测房价。我们应该创建多少个学习到的特征?让我们创建13个,因为我们创建了13个原始特征。 - en: 'Step 2: A Nonlinear Function' id: totrans-162 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 步骤2:一个非线性函数 - en: Next, we’ll feed each of these weighted sums through a *non*linear function; the first function we’ll try is the `sigmoid` function that was mentioned in [Chapter 1](ch01.html#foundations). As a refresher, [Figure 2-9](#fig_02-09) plots the `sigmoid` function. id: totrans-163 prefs: [] type: TYPE_NORMAL + zh: 接下来,我们将通过一个非线性函数来处理这些加权和;我们将尝试的第一个函数是在第1章中提到的`sigmoid`函数。作为提醒,[图2-9](#fig_02-09)展示了`sigmoid`函数。 - en: '![Sigmoid](assets/dlfs_0209.png)' id: totrans-164 prefs: [] type: TYPE_IMG + zh: '![Sigmoid](assets/dlfs_0209.png)' - en: Figure 2-9\. Sigmoid function plotted from x = –5 to x = 5 id: totrans-165 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图2-9。从x = -5到x = 5绘制的Sigmoid函数 - en: Why is using this nonlinear function a good idea? Why not the `square` function *f*(*x*) = *x*², for example? There are a couple of reasons. First, we want the function we use here to be *monotonic* so that it “preserves” information about @@ -1450,17 +1458,20 @@ id: totrans-166 prefs: [] type: TYPE_NORMAL + zh: 为什么使用这个非线性函数是个好主意?为什么不使用`square`函数*f*(*x*) = *x*²,例如?有几个原因。首先,我们希望在这里使用的函数是*单调*的,以便“保留”输入的数字的信息。假设,给定输入的日期,我们的两个线性回归分别产生值-3和3。然后通过`square`函数传递这些值将为每个产生一个值9,因此任何接收这些数字作为输入的函数在它们通过`square`函数传递后将“丢失”一个原始为-3,另一个为3的信息。 - en: The second reason, of course, is that the function is nonlinear; this nonlinearity will enable our neural network to model the inherently nonlinear relationship between the features and the target. 
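To make the shapes in the step above concrete, here is a minimal NumPy sketch of “a bunch of linear regressions” followed by the `sigmoid` squashing; the batch size, the random data, and the `sigmoid` helper are illustrative assumptions, not the book’s code:

```python
import numpy as np

np.random.seed(0)
batch_size, num_features, num_outputs = 3, 13, 13

X = np.random.randn(batch_size, num_features)    # data: [batch_size, num_features]
W = np.random.randn(num_features, num_outputs)   # weights: [num_features, num_outputs]

weighted_sums = np.dot(X, W)                     # [batch_size, num_outputs]

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

learned_features = sigmoid(weighted_sums)        # monotonic, nonlinear squash into (0, 1)

print(weighted_sums.shape)                       # (3, 13): 13 weighted sums per observation
```

Each column of `weighted_sums` is one candidate “learned feature” computed for every observation in the batch.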
id: totrans-167 prefs: [] type: TYPE_NORMAL + zh: 当然,第二个原因是这个函数是非线性的;这种非线性将使我们的神经网络能够建模特征和目标之间固有的非线性关系。 - en: 'Finally, the `sigmoid` function has the nice property that its derivative can be expressed in terms of the function itself:' id: totrans-168 prefs: [] type: TYPE_NORMAL + zh: 最后,`sigmoid`函数有一个很好的性质,即它的导数可以用函数本身来表示: - en: σ u ( x ) = σ ( x ) × ( 1 - @@ -1477,11 +1488,13 @@ id: totrans-170 prefs: [] type: TYPE_NORMAL + zh: 我们将很快在神经网络的反向传播中使用`sigmoid`函数时使用它。 - en: 'Step 3: Another Linear Regression' id: totrans-171 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 步骤3:另一个线性回归 - en: Finally, we’ll take the resulting 13 elements—each of which is a combination of the original features, fed through the `sigmoid` function so that they all have values between 0 and 1—and feed them into a regular linear regression, using @@ -1489,6 +1502,7 @@ id: totrans-172 prefs: [] type: TYPE_NORMAL + zh: 最后,我们将得到的13个元素——每个元素都是原始特征的组合,通过`sigmoid`函数传递,使它们的值都在0到1之间——并将它们输入到一个常规线性回归中,使用它们的方式与我们之前使用原始特征的方式相同。 - en: 'Then, we’ll try training the *entire* resulting function in the same way we trained the standard linear regression earlier in this chapter: we’ll feed data through the model, use the chain rule to figure out how much increasing the weights @@ -1499,31 +1513,37 @@ id: totrans-173 prefs: [] type: TYPE_NORMAL + zh: 然后,我们将尝试训练*整个*得到的函数,方式与本章前面训练标准线性回归的方式相同:我们将数据通过模型,使用链式法则来计算增加权重会增加(或减少)损失多少,然后在每次迭代中更新权重,以减少损失。随着时间的推移(我们希望),我们将得到比以前更准确的模型,一个已经“学会”了特征和目标之间固有非线性关系的模型。 - en: It might be tough to wrap your mind around what’s going on based on this description, so let’s look at an illustration. id: totrans-174 prefs: [] type: TYPE_NORMAL + zh: 根据这个描述,可能很难理解正在发生的事情,所以让我们看一个插图。 - en: Diagrams id: totrans-175 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 图表 - en: '[Figure 2-10](#fig_02-10) is a diagram of what our more complicated model now looks like.' id: totrans-176 prefs: [] type: TYPE_NORMAL + zh: '[图2-10](#fig_02-10)是我们更复杂模型的图表。' - en: '![Neural network forward pass](assets/dlfs_0210.png)' id: totrans-177 prefs: [] type: TYPE_IMG + zh: '![神经网络前向传播](assets/dlfs_0210.png)' - en: Figure 2-10\. Steps 1–3 translated into a computational graph of the kind we saw in [Chapter 1](ch01.html#foundations) id: totrans-178 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图2-10。将步骤1-3翻译成我们在第1章中看到的计算图的一种类型 - en: 'You’ll see that we start with matrix multiplication and matrix addition, as before. Now let’s formalize some terminology that was mentioned previously: when we apply these operations in the course of a nested function, we’ll call the first diff --git a/totrans/dl-scr_3.yaml b/totrans/dl-scr_3.yaml index b4cd162..f12057f 100644 --- a/totrans/dl-scr_3.yaml +++ b/totrans/dl-scr_3.yaml @@ -1,4 +1,5 @@ - en: Chapter 3\. Deep Learning from Scratch + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL @@ -13,6 +14,7 @@ learn to represent these building blocks themselves as abstract Python classes and then use these classes to build deep learning models; by the end of this chapter, you will indeed have done “deep learning from scratch”!' + id: totrans-1 prefs: [] type: TYPE_NORMAL - en: 'We’ll also map the descriptions of neural networks in terms of these building @@ -25,9 +27,11 @@ that happen at a low level. In the first part of this chapter, we’ll map this description of models to common higher-level concepts such as “layers” that will ultimately allow us to more easily describe more complex models.' 
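As a quick sanity check of the derivative identity quoted above, σ′(x) = σ(x) × (1 − σ(x)), the following snippet (illustrative only, not from the book) compares it with a central-difference approximation:

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 101)

eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)   # central differences
analytic = sigmoid(x) * (1 - sigmoid(x))                      # the identity from the text

print(np.allclose(numeric, analytic))   # True
```

This is the quantity the backward pass will reuse: once σ(x) has been computed on the forward pass, its derivative costs only one extra subtraction and multiplication.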
+ id: totrans-2 prefs: [] type: TYPE_NORMAL - en: 'Deep Learning Definition: A First Pass' + id: totrans-3 prefs: - PREF_H1 type: TYPE_NORMAL @@ -39,26 +43,31 @@ We found that if we defined the model as a function that included *parameters* as inputs to some of its operations, we could “fit” it to optimally describe the data using the following procedure:' + id: totrans-4 prefs: [] type: TYPE_NORMAL - en: Repeatedly feed observations through the model, keeping track of the quantities computed along the way during this “forward pass.” + id: totrans-5 prefs: - PREF_OL type: TYPE_NORMAL - en: Calculate a *loss* representing how far off our model’s predictions were from the desired outputs or *target*. + id: totrans-6 prefs: - PREF_OL type: TYPE_NORMAL - en: Using the quantities computed on the forward pass and the chain rule math worked out in [Chapter 1](ch01.html#foundations), compute how much each of the input *parameters* ultimately affects this loss. + id: totrans-7 prefs: - PREF_OL type: TYPE_NORMAL - en: Update the values of the parameters so that the loss will hopefully be reduced when the next set of observations is passed through the model. + id: totrans-8 prefs: - PREF_OL type: TYPE_NORMAL @@ -67,6 +76,7 @@ linear regression model). This had the expected limitation that, even when fit “optimally,” the model could nevertheless represent only linear relationships between our features and our target. + id: totrans-9 prefs: [] type: TYPE_NORMAL - en: We then defined a function structure that applied these linear operations first, @@ -75,12 +85,14 @@ something closer to the true, nonlinear relationship between input and output, while having the additional benefit that it could learn relationships between *combinations* of our input features and the target. + id: totrans-10 prefs: [] type: TYPE_NORMAL - en: 'What is the connection between models like these and deep learning models? We’ll start with a somewhat clumsy attempt at a definition: deep learning models are represented by series of operations that have *at least two, nonconsecutive* nonlinear functions involved.' + id: totrans-11 prefs: [] type: TYPE_NORMAL - en: I’ll show where this definition comes from shortly, but first note that since @@ -92,6 +104,7 @@ is differentiable, so as long as the individual operations making up the function are differentiable, the whole function will be differentiable, and we’ll be able to train it using the same four-step training procedure just described. + id: totrans-12 prefs: [] type: TYPE_NORMAL - en: However, so far our approach to actually training these models has been to compute @@ -108,14 +121,17 @@ To guide us in the right direction as far as which abstractions to create, we’ll try to map the operations we’ve been using to traditional descriptions of neural networks as being made up of “layers,” “neurons,” and so on. + id: totrans-13 prefs: [] type: TYPE_NORMAL - en: As our first step, we’ll have to create an abstraction to represent the individual operations we’ve been working with so far, instead of continuing to code the same matrix multiplication and bias addition over and over again. + id: totrans-14 prefs: [] type: TYPE_NORMAL - en: 'The Building Blocks of Neural Networks: Operations' + id: totrans-15 prefs: - PREF_H1 type: TYPE_NORMAL @@ -126,6 +142,7 @@ such as matrix multiplication, seem to have *another* special kind of input, also an `ndarray`: the parameters. 
In our `Operation` class—or perhaps in another class that inherits from it—we should allow for `params` as another instance variable.' + id: totrans-16 prefs: [] type: TYPE_NORMAL - en: 'Another insight is that there seem to be two types of `Operation`s: some, such @@ -141,66 +158,84 @@ network). Also on the backward pass, each `Operation` will send an “input gradient” backward, representing the partial derivative of the loss with respect to each element of the input.' + id: totrans-17 prefs: [] type: TYPE_NORMAL - en: 'These facts place a few important restrictions on the workings of our `Operation`s that will help us ensure we’re computing the gradients correctly:' + id: totrans-18 prefs: [] type: TYPE_NORMAL - en: The shape of the *output gradient* `ndarray` must match the shape of the *output*. + id: totrans-19 prefs: - PREF_UL type: TYPE_NORMAL - en: The shape of the *input gradient* that the `Operation` sends backward during the backward pass must match the shape of the `Operation`’s *input*. + id: totrans-20 prefs: - PREF_UL type: TYPE_NORMAL - en: This will all be clearer once you see it in a diagram; let’s look at that next. + id: totrans-21 prefs: [] type: TYPE_NORMAL - en: Diagram + id: totrans-22 prefs: - PREF_H2 type: TYPE_NORMAL - en: This is all summarized in [Figure 3-1](#fig_03-01), for an operation `O` that is receiving inputs from an operation `N` and passing outputs on to another operation `P`. + id: totrans-23 prefs: [] type: TYPE_NORMAL - en: '![Neural net diagram](assets/dlfs_0301.png)' + id: totrans-24 prefs: [] type: TYPE_IMG - en: Figure 3-1\. An Operation, with input and output + id: totrans-25 prefs: - PREF_H6 type: TYPE_NORMAL - en: '[Figure 3-2](#fig_03-02) covers the case of an `Operation` with parameters.' + id: totrans-26 prefs: [] type: TYPE_NORMAL - en: '![Neural net diagram](assets/dlfs_0302.png)' + id: totrans-27 prefs: [] type: TYPE_IMG - en: Figure 3-2\. A ParamOperation, with input and output and parameters + id: totrans-28 prefs: - PREF_H6 type: TYPE_NORMAL - en: Code + id: totrans-29 prefs: - PREF_H2 type: TYPE_NORMAL - en: 'With all this, we can write the fundamental building block for our neural network, an `Operation`, as:' + id: totrans-30 prefs: [] type: TYPE_NORMAL - en: '[PRE0]' + id: totrans-31 prefs: [] type: TYPE_PRE + zh: '[PRE0]' - en: For any individual `Operation` that we define, we’ll have to implement the `_output` and `_input_grad` functions, so named because of the quantities they compute. + id: totrans-32 prefs: [] type: TYPE_NORMAL - en: Note + id: totrans-33 prefs: - PREF_H6 type: TYPE_NORMAL @@ -209,28 +244,35 @@ throughout deep learning fit this blueprint of sending inputs forward and gradients backward, with the shapes of what they receive on the forward pass matching the shapes of what they send backward on the backward pass, and vice versa.' + id: totrans-34 prefs: [] type: TYPE_NORMAL - en: 'We’ll define the specific `Operation`s we’ve used thus far—matrix multiplication and so on—later in this chapter. First we’ll define another class that inherits from `Operation` that we’ll use specifically for `Operation`s that involve parameters:' + id: totrans-35 prefs: [] type: TYPE_NORMAL - en: '[PRE1]' + id: totrans-36 prefs: [] type: TYPE_PRE + zh: '[PRE1]' - en: Similar to the base `Operation`, an individual `ParamOperation` would have to define the `_param_grad` function in addition to the `_output` and `_input_grad` functions. 
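The `[PRE0]` and `[PRE1]` blocks hold the book’s actual definitions. As a rough sketch of the interface described above (the method names follow the text; the exact bodies are an assumption here), the two base classes might look like this:

```python
from numpy import ndarray


class Operation:
    """One node in the computational graph: saves its input on the forward
    pass and enforces the shape rules on the backward pass."""

    def forward(self, input_: ndarray) -> ndarray:
        self.input_ = input_              # keep the input around for the backward pass
        self.output = self._output()
        return self.output

    def backward(self, output_grad: ndarray) -> ndarray:
        assert self.output.shape == output_grad.shape       # output gradient matches output
        self.input_grad = self._input_grad(output_grad)
        assert self.input_.shape == self.input_grad.shape   # input gradient matches input
        return self.input_grad

    def _output(self) -> ndarray:
        raise NotImplementedError

    def _input_grad(self, output_grad: ndarray) -> ndarray:
        raise NotImplementedError


class ParamOperation(Operation):
    """An Operation whose output also depends on a parameter ndarray."""

    def __init__(self, param: ndarray):
        super().__init__()
        self.param = param

    def backward(self, output_grad: ndarray) -> ndarray:
        input_grad = super().backward(output_grad)
        self.param_grad = self._param_grad(output_grad)
        assert self.param.shape == self.param_grad.shape    # parameter gradient matches parameter
        return input_grad

    def _param_grad(self, output_grad: ndarray) -> ndarray:
        raise NotImplementedError
```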
+ id: totrans-37 prefs: [] type: TYPE_NORMAL - en: 'We have now formalized the neural network building blocks we’ve been using in our models so far. We could skip ahead and define neural networks directly in terms of these `Operation`s, but there is an intermediate class we’ve been dancing around for a chapter and a half that we’ll define first: the `Layer`.' + id: totrans-38 prefs: [] type: TYPE_NORMAL - en: 'The Building Blocks of Neural Networks: Layers' + id: totrans-39 prefs: - PREF_H1 type: TYPE_NORMAL @@ -248,6 +290,7 @@ numbering—also has an important name: it is called a *hidden* layer, since it is the only layer whose values we don’t typically see explicitly during the course of training.' + id: totrans-40 prefs: [] type: TYPE_NORMAL - en: The output layer is an important exception to this definition of layers, in @@ -257,28 +300,34 @@ functions typically “squash down” their input to some subset of that range relevant to the particular problem we’re trying to solve (for example, the `sigmoid` function squashes down its input to between 0 and 1). + id: totrans-41 prefs: [] type: TYPE_NORMAL - en: Diagrams + id: totrans-42 prefs: - PREF_H2 type: TYPE_NORMAL - en: To make the connection explicit, [Figure 3-3](#fig_03-03) shows the diagram of the neural network from the prior chapter with the individual operations grouped into layers. + id: totrans-43 prefs: [] type: TYPE_NORMAL - en: '![Neural net diagram](assets/dlfs_0303.png)' + id: totrans-44 prefs: [] type: TYPE_IMG - en: Figure 3-3\. The neural network from the prior chapter with the operations grouped into layers + id: totrans-45 prefs: - PREF_H6 type: TYPE_NORMAL - en: You can see that the input represents an “input” layer, the next three operations (ending with the `sigmoid` function) represent the next layer, and the last two operations represent the last layer. + id: totrans-46 prefs: [] type: TYPE_NORMAL - en: 'This is, of course, rather cumbersome. And that’s the point: representing neural @@ -286,16 +335,20 @@ networks work and how to train them, is too “low level” for anything more complicated than a two-layer neural network. That’s why the more common way to represent neural networks is in terms of layers, as shown in [Figure 3-4](#fig_03-04).' + id: totrans-47 prefs: [] type: TYPE_NORMAL - en: '![Neural net diagram](assets/dlfs_0304.png)' + id: totrans-48 prefs: [] type: TYPE_IMG - en: Figure 3-4\. The neural network from the prior chapter in terms of layers + id: totrans-49 prefs: - PREF_H6 type: TYPE_NORMAL - en: Connection to the brain + id: totrans-50 prefs: - PREF_H3 type: TYPE_NORMAL @@ -305,6 +358,7 @@ each observation in the layer’s output*. The neural network from the prior example can thus be thought of as having 13 neurons in the input layer, then 13 neurons (again) in the hidden layer, and one neuron in the output layer.' + id: totrans-51 prefs: [] type: TYPE_NORMAL - en: 'Neurons in the brain have the property that they can receive inputs from many @@ -315,127 +369,162 @@ via a nonlinear function. 
Thus, this nonlinear function is called the *activation function*, and the values that come out of it are called the *activations* for that layer.^([1](ch03.html#idm45732624417528))' + id: totrans-52 prefs: [] type: TYPE_NORMAL - en: 'Now that we’ve defined layers, we can state the more conventional definition of deep learning: *deep learning models are neural networks with more than one hidden layer.*' + id: totrans-53 prefs: [] type: TYPE_NORMAL - en: We can see that this is equivalent to the earlier definition that was purely in terms of `Operation`s, since a layer is just a series of `Operation`s with a nonlinear operation at the end. + id: totrans-54 prefs: [] type: TYPE_NORMAL - en: Now that we’ve defined a base class for our `Operation`s, let’s show how it can serve as the fundamental building block of the models we saw in the prior chapter. + id: totrans-55 prefs: [] type: TYPE_NORMAL - en: Building Blocks on Building Blocks + id: totrans-56 prefs: - PREF_H1 type: TYPE_NORMAL - en: 'What specific `Operation`s do we need to implement for the models in the prior chapter to work? Based on our experience of implementing that neural network step by step, we know there are three kinds:' + id: totrans-57 prefs: [] type: TYPE_NORMAL - en: The matrix multiplication of the input with the matrix of parameters + id: totrans-58 prefs: - PREF_UL type: TYPE_NORMAL - en: The addition of a bias term + id: totrans-59 prefs: - PREF_UL type: TYPE_NORMAL - en: The `sigmoid` activation function + id: totrans-60 prefs: - PREF_UL type: TYPE_NORMAL - en: 'Let’s start with the `WeightMultiply` `Operation`:' + id: totrans-61 prefs: [] type: TYPE_NORMAL - en: '[PRE2]' + id: totrans-62 prefs: [] type: TYPE_PRE + zh: '[PRE2]' - en: Here we simply code up the matrix multiplication on the forward pass, as well as the rules for “sending gradients backward” to both the inputs and the parameters on the backward pass (using the rules for doing so that we reasoned through at the end of [Chapter 1](ch01.html#foundations)). As you’ll see shortly, we can now use this as a *building block* that we can simply plug into our `Layer`s. + id: totrans-63 prefs: [] type: TYPE_NORMAL - en: 'Next up is the addition operation, which we’ll call `BiasAdd`:' + id: totrans-64 prefs: [] type: TYPE_NORMAL - en: '[PRE3]' + id: totrans-65 prefs: [] type: TYPE_PRE + zh: '[PRE3]' - en: 'Finally, let’s do `sigmoid`:' + id: totrans-66 prefs: [] type: TYPE_NORMAL - en: '[PRE4]' + id: totrans-67 prefs: [] type: TYPE_PRE + zh: '[PRE4]' - en: This simply implements the math described in the previous chapter. + id: totrans-68 prefs: [] type: TYPE_NORMAL - en: Note + id: totrans-69 prefs: - PREF_H6 type: TYPE_NORMAL - en: 'For both `sigmoid` and the `ParamOperation`, the step during the backward pass where we compute:' + id: totrans-70 prefs: [] type: TYPE_NORMAL - en: '[PRE5]' + id: totrans-71 prefs: [] type: TYPE_PRE + zh: '[PRE5]' - en: 'is the step where we are applying the chain rule, and the corresponding rule for `WeightMultiply`:' + id: totrans-72 prefs: [] type: TYPE_NORMAL - en: '[PRE6]' + id: totrans-73 prefs: [] type: TYPE_PRE + zh: '[PRE6]' - en: is, as I argued in [Chapter 1](ch01.html#foundations), the analogue of the chain rule when the function in question is a matrix multiplication. + id: totrans-74 prefs: [] type: TYPE_NORMAL - en: Now that we’ve defined these `Operation`s precisely, we can use *them* as building blocks to define a `Layer`. 
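The book’s implementations are in the `[PRE2]` to `[PRE4]` blocks. The sketch below follows the gradient rules just described; it assumes the `Operation` and `ParamOperation` sketch from earlier is in scope, and the bodies are an approximation rather than the book’s exact code:

```python
import numpy as np
from numpy import ndarray


class WeightMultiply(ParamOperation):
    """Matrix multiplication of the input with the weight matrix."""

    def __init__(self, W: ndarray):
        super().__init__(W)

    def _output(self) -> ndarray:
        return np.dot(self.input_, self.param)

    def _input_grad(self, output_grad: ndarray) -> ndarray:
        return np.dot(output_grad, self.param.T)      # dL/dX = dL/dout . W^T

    def _param_grad(self, output_grad: ndarray) -> ndarray:
        return np.dot(self.input_.T, output_grad)     # dL/dW = X^T . dL/dout


class BiasAdd(ParamOperation):
    """Adds a bias row vector to every observation in the batch."""

    def __init__(self, B: ndarray):
        assert B.shape[0] == 1
        super().__init__(B)

    def _output(self) -> ndarray:
        return self.input_ + self.param               # broadcast over the batch

    def _input_grad(self, output_grad: ndarray) -> ndarray:
        return output_grad * np.ones_like(self.input_)

    def _param_grad(self, output_grad: ndarray) -> ndarray:
        return np.sum(output_grad, axis=0, keepdims=True)   # sum over the batch dimension


class Sigmoid(Operation):
    """Element-wise sigmoid activation."""

    def _output(self) -> ndarray:
        return 1.0 / (1.0 + np.exp(-self.input_))

    def _input_grad(self, output_grad: ndarray) -> ndarray:
        # sigma'(x) = sigma(x) * (1 - sigma(x)), applied element-wise
        return self.output * (1.0 - self.output) * output_grad
```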
+ id: totrans-75 prefs: [] type: TYPE_NORMAL - en: The Layer Blueprint + id: totrans-76 prefs: - PREF_H2 type: TYPE_NORMAL - en: 'Because of the way we’ve written the `Operation`s, writing the `Layer` class is easy:' + id: totrans-77 prefs: [] type: TYPE_NORMAL - en: 'The `forward` and `backward` methods simply involve sending the input successively forward through a series of `Operation`s—exactly as we’ve been doing in the diagrams all along! This is the most important fact about the working of `Layer`s; the rest of the code is a wrapper around this and mostly involves bookkeeping:' + id: totrans-78 prefs: - PREF_UL type: TYPE_NORMAL - en: Defining the correct series of `Operation`s in the `_setup_layer` function and initializing and storing the parameters in these `Operation`s (which will also take place in the `_setup_layer` function) + id: totrans-79 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL - en: Storing the correct values in `self.input_` and `self.output` on the `forward` method + id: totrans-80 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL - en: Performing the correct assertion checking in the `backward` method + id: totrans-81 prefs: - PREF_IND - PREF_UL @@ -443,27 +532,34 @@ - en: Finally, the `_params` and `_param_grads` functions simply extract the parameters and their gradients (with respect to the loss) from the `ParamOperation`s within the layer. + id: totrans-82 prefs: - PREF_UL type: TYPE_NORMAL - en: 'Here’s what all that looks like:' + id: totrans-83 prefs: [] type: TYPE_NORMAL - en: '[PRE7]' + id: totrans-84 prefs: [] type: TYPE_PRE + zh: '[PRE7]' - en: Just as we moved from an abstract definition of an `Operation` to the implementation of specific `Operation`s needed for the neural network from [Chapter 2](ch02.html#fundamentals), let’s now implement the `Layer` from that network as well. + id: totrans-85 prefs: [] type: TYPE_NORMAL - en: The Dense Layer + id: totrans-86 prefs: - PREF_H2 type: TYPE_NORMAL - en: We called the `Operation`s we’ve been dealing with `WeightMultiply`, `BiasAdd`, and so on. What should we call the layer we’ve been using so far? A `LinearNonLinear` layer? + id: totrans-87 prefs: [] type: TYPE_NORMAL - en: 'A defining characteristic of this layer is that *each output neuron is a function @@ -476,20 +572,25 @@ Thus these layers are often called *fully connected* layers; recently, in the popular `Keras` library, they are also often called `Dense` layers, a more concise term that gets across the same idea.' + id: totrans-88 prefs: [] type: TYPE_NORMAL - en: Now that we know what to call it and why, let’s define the `Dense` layer in terms of the operations we’ve already defined—as you’ll see, because of how we defined our `Layer` base class, all we need to do is to put the `Operation`s defined in the previous section in as a list in the `_setup_layer` function. + id: totrans-89 prefs: [] type: TYPE_NORMAL - en: '[PRE8]' + id: totrans-90 prefs: [] type: TYPE_PRE + zh: '[PRE8]' - en: Note that we’ll make the default activation a `Linear` activation, which really means we apply no activation, and simply apply the identity function to the output of the layer. + id: totrans-91 prefs: [] type: TYPE_NORMAL - en: What building blocks should we now add on top of `Operation` and `Layer`? To @@ -497,9 +598,11 @@ just as `Layer`s wrapped around `Operation`s. It isn’t obvious what other classes will be needed, so we’ll just dive in and build `NeuralNetwork` and figure out the other classes we’ll need as we go. 
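The real `Layer` and `Dense` code is in `[PRE7]` and `[PRE8]`. Below is a compressed sketch of the blueprint just described; it assumes the `Operation`, `ParamOperation`, `WeightMultiply`, `BiasAdd`, and `Sigmoid` sketches from earlier are in scope, and the plain random initialization and the `Linear` identity activation are assumptions consistent with the text:

```python
import numpy as np
from numpy import ndarray


class Layer:
    """A series of Operations; `neurons` is the width of the layer's output."""

    def __init__(self, neurons: int):
        self.neurons = neurons
        self.first = True            # build the Operations lazily, on the first batch
        self.params: list = []
        self.param_grads: list = []
        self.operations: list = []

    def _setup_layer(self, input_: ndarray) -> None:
        raise NotImplementedError

    def forward(self, input_: ndarray) -> ndarray:
        if self.first:
            self._setup_layer(input_)
            self.first = False
        self.input_ = input_
        for operation in self.operations:            # send the input forward, Operation by Operation
            input_ = operation.forward(input_)
        self.output = input_
        return self.output

    def backward(self, output_grad: ndarray) -> ndarray:
        assert self.output.shape == output_grad.shape
        for operation in reversed(self.operations):  # send the gradient backward
            output_grad = operation.backward(output_grad)
        self._param_grads()
        return output_grad

    def _param_grads(self) -> None:
        self.param_grads = [op.param_grad for op in self.operations
                            if isinstance(op, ParamOperation)]

    def _params(self) -> None:
        self.params = [op.param for op in self.operations
                       if isinstance(op, ParamOperation)]


class Linear(Operation):
    """Identity activation: passes values through unchanged."""

    def _output(self) -> ndarray:
        return self.input_

    def _input_grad(self, output_grad: ndarray) -> ndarray:
        return output_grad


class Dense(Layer):
    """Fully connected layer: weight multiply, bias add, then an activation."""

    def __init__(self, neurons: int, activation=None):
        super().__init__(neurons)
        self.activation = activation if activation is not None else Linear()

    def _setup_layer(self, input_: ndarray) -> None:
        W = np.random.randn(input_.shape[1], self.neurons)
        B = np.random.randn(1, self.neurons)
        self.params = [W, B]
        self.operations = [WeightMultiply(W), BiasAdd(B), self.activation]
```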
+ id: totrans-92 prefs: [] type: TYPE_NORMAL - en: The NeuralNetwork Class, and Maybe Others + id: totrans-93 prefs: - PREF_H1 type: TYPE_NORMAL @@ -508,155 +611,216 @@ of data representing “observations” (`X`) and “correct answers” (`y`) and learn the relationship between `X` and `y`, which means learning a function that can transform `X` into predictions `p` that are very close to `y`.' + id: totrans-94 prefs: [] type: TYPE_NORMAL - en: 'How exactly will this learning take place, given the `Layer` and `Operation` classes just defined? Recalling how the model from the last chapter worked, we’ll implement the following:' + id: totrans-95 prefs: [] type: TYPE_NORMAL - en: The neural network should take `X` and pass it successively forward through each `Layer` (which is really a convenient wrapper around feeding it through many `Operation`s), at which point the result will represent the `prediction`. + id: totrans-96 prefs: - PREF_OL type: TYPE_NORMAL + zh: 神经网络应该接受`X`并将其逐步通过每个`Layer`(实际上是一个方便的包装器,用于通过许多`Operation`进行馈送),此时结果将代表`prediction`。 - en: Next, `prediction` should be compared with the value `y` to calculate the loss and generate the “loss gradient,” which is the partial derivative of the loss with respect to each element in the last layer in the network (namely, the one that generated the `prediction`). + id: totrans-97 prefs: - PREF_OL type: TYPE_NORMAL + zh: 接下来,应该将`prediction`与值`y`进行比较,计算损失并生成“损失梯度”,这是与网络中最后一个层(即生成`prediction`的层)中的每个元素相关的损失的偏导数。 - en: Finally, we’ll send this loss gradient successively backward through each layer, along the way computing the “parameter gradients”—the partial derivative of the loss with respect to each of the parameters—and storing them in the corresponding `Operation`s. + id: totrans-98 prefs: - PREF_OL type: TYPE_NORMAL + zh: 最后,我们将通过每个层将这个损失梯度逐步向后发送,同时计算“参数梯度”——损失对每个参数的偏导数,并将它们存储在相应的`Operation`中。 - en: Diagram + id: totrans-99 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 图 - en: '[Figure 3-5](#backpropagation_now_in_terms) captures this description of a neural network in terms of `Layer`s.' + id: totrans-100 prefs: [] type: TYPE_NORMAL + zh: '[图3-5](#backpropagation_now_in_terms)以`Layer`的术语捕捉了神经网络的描述。' - en: '![Neural net diagram](assets/dlfs_0305.png)' + id: totrans-101 prefs: [] type: TYPE_IMG + zh: '![神经网络图](assets/dlfs_0305.png)' - en: Figure 3-5\. Backpropagation, now in terms of Layers instead of Operations + id: totrans-102 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图3-5。反向传播,现在以Layer而不是Operation的术语 - en: Code + id: totrans-103 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 代码 - en: 'How should we implement this? First, we’ll want our neural network to ultimately deal with `Layer`s the same way our `Layer`s dealt with `Operation`s. For example, we want the `forward` method to receive `X` as input and simply do something like:' + id: totrans-104 prefs: [] type: TYPE_NORMAL + zh: 我们应该如何实现这一点?首先,我们希望我们的神经网络最终处理`Layer`的方式与我们的`Layer`处理`Operation`的方式相同。例如,我们希望`forward`方法接收`X`作为输入,然后简单地执行类似以下的操作: - en: '[PRE9]' + id: totrans-105 prefs: [] type: TYPE_PRE + zh: '[PRE9]' - en: 'Similarly, we’ll want our `backward` method to take in an argument—let’s initially call it `grad`—and do something like:' + id: totrans-106 prefs: [] type: TYPE_NORMAL + zh: 同样,我们希望我们的`backward`方法接收一个参数——我们最初称之为`grad`——然后执行类似以下的操作: - en: '[PRE10]' + id: totrans-107 prefs: [] type: TYPE_PRE + zh: '[PRE10]' - en: 'Where will `grad` come from? 
It has to come from the *loss*, a special function that takes in the `prediction` along with `y` and:' + id: totrans-108 prefs: [] type: TYPE_NORMAL + zh: '`grad`将从哪里来?它必须来自*损失*,一个特殊的函数,它接收`prediction`以及`y`,然后:' - en: Computes a single number representing the “penalty” for the network making that `prediction`. + id: totrans-109 prefs: - PREF_UL type: TYPE_NORMAL + zh: 计算代表网络进行该`prediction`的“惩罚”的单个数字。 - en: Sends backward a gradient for every element of the `prediction` with respect to the loss. This gradient is what the last `Layer` in the network will receive as the input to its `backward` function. + id: totrans-110 prefs: - PREF_UL type: TYPE_NORMAL + zh: 针对每个`prediction`中的元素,发送一个梯度与损失相关的反向梯度。这个梯度是网络中最后一个`Layer`将作为其`backward`函数输入接收的内容。 - en: In the example from the prior chapter, the loss function was the squared difference between the `prediction` and the target, and the gradient of the `prediction` with respect to the loss was computed accordingly. + id: totrans-111 prefs: [] type: TYPE_NORMAL + zh: 在前一章的示例中,损失函数是`prediction`和目标之间的平方差,相应地计算了`prediction`相对于损失的梯度。 - en: How should we implement this? It seems like this concept is important enough to deserve its own class. Furthermore, this class can be implemented similarly to the `Layer` class, except the `forward` method will produce an actual number (a `float`) as the loss, instead of an `ndarray` to be sent forward to the next `Layer`. Let’s formalize this. + id: totrans-112 prefs: [] type: TYPE_NORMAL + zh: 我们应该如何实现这一点?这个概念似乎很重要,值得拥有自己的类。此外,这个类可以类似于`Layer`类实现,只是`forward`方法将产生一个实际数字(一个`float`)作为损失,而不是一个`ndarray`被发送到下一个`Layer`。让我们正式化这一点。 - en: Loss Class + id: totrans-113 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 损失类 - en: 'The `Loss` base class will be similar to `Layer`—the `forward` and `backward` methods will check that the shapes of the appropriate `ndarray`s are identical and define two methods, `_output` and `_input_grad`, that any subclass of `Loss` will have to define:' + id: totrans-114 prefs: [] type: TYPE_NORMAL + zh: '`Loss`基类将类似于`Layer`——`forward`和`backward`方法将检查适当的`ndarray`的形状是否相同,并定义两个方法,`_output`和`_input_grad`,任何`Loss`子类都必须定义:' - en: '[PRE11]' + id: totrans-115 prefs: [] type: TYPE_PRE + zh: '[PRE11]' - en: 'As in the `Operation` class, we check that the gradient that the loss sends backward is the same shape as the `prediction` received as input from the last layer of the network:' + id: totrans-116 prefs: [] type: TYPE_NORMAL + zh: 与`Operation`类一样,我们检查损失向后发送的梯度与从网络的最后一层接收的`prediction`的形状是否相同: - en: '[PRE12]' + id: totrans-117 prefs: [] type: TYPE_PRE + zh: '[PRE12]' - en: Here, we simply code the forward and backward rules of the mean squared error loss formula. + id: totrans-118 prefs: [] type: TYPE_NORMAL + zh: 在这里,我们简单地编写均方误差损失公式的前向和反向规则。 - en: This is the last key building block we need to build deep learning from scratch. Let’s review how these pieces fit together and then proceed with building a model! + id: totrans-119 prefs: [] type: TYPE_NORMAL + zh: 这是我们需要从头开始构建深度学习的最后一个关键构建块。让我们回顾一下这些部分如何组合在一起,然后继续构建模型! - en: Deep Learning from Scratch + id: totrans-120 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 从零开始的深度学习 - en: 'We ultimately want to build a `NeuralNetwork` class, using [Figure 3-5](#backpropagation_now_in_terms) as a guide, that we can use to define and train deep learning models. 
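For completeness, here is a compressed sketch of the `Loss` and `MeanSquaredError` pair just described (the book’s versions are in `[PRE11]` and `[PRE12]`); the per-batch normalization of the loss and of its gradient below is a standard choice and an assumption about the book’s exact formula:

```python
import numpy as np
from numpy import ndarray


class Loss:
    """Turns a prediction and a target into a single float penalty plus the
    gradient of that penalty with respect to every element of the prediction."""

    def forward(self, prediction: ndarray, target: ndarray) -> float:
        assert prediction.shape == target.shape
        self.prediction = prediction
        self.target = target
        return self._output()

    def backward(self) -> ndarray:
        self.input_grad = self._input_grad()
        assert self.prediction.shape == self.input_grad.shape   # gradient matches prediction
        return self.input_grad

    def _output(self) -> float:
        raise NotImplementedError

    def _input_grad(self) -> ndarray:
        raise NotImplementedError


class MeanSquaredError(Loss):

    def _output(self) -> float:
        return float(np.sum((self.prediction - self.target) ** 2)
                     / self.prediction.shape[0])

    def _input_grad(self) -> ndarray:
        return 2.0 * (self.prediction - self.target) / self.prediction.shape[0]
```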
Before we dive in and start coding, let’s describe precisely what such a class would be and how it would interact with the `Operation`, `Layer`, and `Loss` classes we just defined:' + id: totrans-121 prefs: [] type: TYPE_NORMAL + zh: 我们最终希望构建一个`NeuralNetwork`类,使用[图3-5](#backpropagation_now_in_terms)作为指南,我们可以用来定义和训练深度学习模型。在我们深入编码之前,让我们准确描述一下这样一个类会是什么样的,以及它将如何与我们刚刚定义的`Operation`、`Layer`和`Loss`类进行交互: - en: A `NeuralNetwork` will have a list of `Layer`s as an attribute. The `Layer`s would be as defined previously, with `forward` and `backward` methods. These methods take in `ndarray` objects and return `ndarray` objects. + id: totrans-122 prefs: - PREF_OL type: TYPE_NORMAL + zh: '`NeuralNetwork`将具有`Layer`列表作为属性。`Layer`将如先前定义的那样,具有`forward`和`backward`方法。这些方法接受`ndarray`对象并返回`ndarray`对象。' - en: Each `Layer` will have a list of `Operation`s saved in the `operations` attribute of the layer during the `_setup_layer` function. + id: totrans-123 prefs: - PREF_OL type: TYPE_NORMAL + zh: 每个`Layer`在`_setup_layer`函数期间的`operations`属性中保存了一个`Operation`列表。 - en: These `Operation`s, just like the `Layer` itself, have `forward` and `backward` methods that take in `ndarray` objects as arguments and return `ndarray` objects as outputs. + id: totrans-124 prefs: - PREF_OL type: TYPE_NORMAL + zh: 这些`Operation`,就像`Layer`本身一样,有`forward`和`backward`方法,接受`ndarray`对象作为参数并返回`ndarray`对象作为输出。 - en: In each operation, the shape of the `output_grad` received in the `backward` method must be the same as the shape of the `output` attribute of the `Layer`. The same is true for the shapes of the `input_grad` passed backward during the `backward` method and the `input_` attribute. + id: totrans-125 prefs: - PREF_OL type: TYPE_NORMAL @@ -665,6 +829,7 @@ shapes apply to `Layer`s and their `forward` and `backward` methods as well—they take in `ndarray` objects and output `ndarray` objects, and the shapes of the `input` and `output` attributes and their corresponding gradients must match. + id: totrans-126 prefs: - PREF_OL type: TYPE_NORMAL @@ -672,87 +837,109 @@ the last operation from the `NeuralNetwork` and the target, check that their shapes are the same, and calculate both a loss value (a number) and an `ndarray` `loss_grad` that will be fed into the output layer, starting backpropagation. + id: totrans-127 prefs: - PREF_OL type: TYPE_NORMAL - en: Implementing Batch Training + id: totrans-128 prefs: - PREF_H2 type: TYPE_NORMAL - en: 'We’ve covered several times the high-level steps for training a model one batch at a time. They are important and worth repeating:' + id: totrans-129 prefs: [] type: TYPE_NORMAL - en: Feed input through the model function (the “forward pass”) to get a prediction. + id: totrans-130 prefs: - PREF_OL type: TYPE_NORMAL - en: Calculate the number representing the loss. + id: totrans-131 prefs: - PREF_OL type: TYPE_NORMAL - en: Calculate the gradient of the loss with respect to the parameters, using the chain rule and the quantities computed during the forward pass. + id: totrans-132 prefs: - PREF_OL type: TYPE_NORMAL - en: Update the parameters using these gradients. + id: totrans-133 prefs: - PREF_OL type: TYPE_NORMAL - en: We would then feed a new batch of data through and repeat these steps. + id: totrans-134 prefs: [] type: TYPE_NORMAL - en: 'Translating these steps into the `NeuralNetwork` framework just described is straightforward:' + id: totrans-135 prefs: [] type: TYPE_NORMAL - en: Receive `X` and `y` as inputs, both `ndarray`s. 
+ id: totrans-136 prefs: - PREF_OL type: TYPE_NORMAL - en: Feed `X` successively forward through each `Layer`. + id: totrans-137 prefs: - PREF_OL type: TYPE_NORMAL - en: Use the `Loss` to produce loss value and the loss gradient to be sent backward. + id: totrans-138 prefs: - PREF_OL type: TYPE_NORMAL - en: Use the loss gradient as input to the `backward` method for the network, which will calculate the `param_grads` for each layer in the network. + id: totrans-139 prefs: - PREF_OL type: TYPE_NORMAL - en: Call the `update_params` function on each layer, which will use the overall learning rate for the `NeuralNetwork` as well as the newly calculated `param_grads`. + id: totrans-140 prefs: - PREF_OL type: TYPE_NORMAL - en: We finally have our full definition of a neural network that can accommodate batch training. Now let’s code it up. + id: totrans-141 prefs: [] type: TYPE_NORMAL - en: 'NeuralNetwork: Code' + id: totrans-142 prefs: - PREF_H2 type: TYPE_NORMAL - en: 'Coding all of this up is pretty straightforward:' + id: totrans-143 prefs: [] type: TYPE_NORMAL - en: '[PRE13]' + id: totrans-144 prefs: [] type: TYPE_PRE + zh: '[PRE13]' - en: With this `NeuralNetwork` class, we can implement the models from the prior chapter in a more modular, flexible way and define other models to represent complex nonlinear relationships between input and output. For example, here’s how to easily instantiate the two models we covered in the last chapter—the linear regression and the neural network:^([3](ch03.html#idm45732622822120)) + id: totrans-145 prefs: [] type: TYPE_NORMAL - en: '[PRE14]' + id: totrans-146 prefs: [] type: TYPE_PRE + zh: '[PRE14]' - en: We’re basically done; now we just feed data repeatedly through the network in order for it to learn. To make this process cleaner and easier to extend to the more complicated deep learning scenarios we’ll see in the following chapter, however, @@ -760,9 +947,11 @@ as an additional class that carries out the “learning,” or the actual updating of the `NeuralNetwork` parameters given the gradients computed on the backward pass. Let’s quickly define these two classes. + id: totrans-147 prefs: [] type: TYPE_NORMAL - en: Trainer and Optimizer + id: totrans-148 prefs: - PREF_H1 type: TYPE_NORMAL @@ -770,13 +959,17 @@ to train the network in [Chapter 2](ch02.html#fundamentals). There, we used the following code to implement the four steps described earlier for training the model:' + id: totrans-149 prefs: [] type: TYPE_NORMAL - en: '[PRE15]' + id: totrans-150 prefs: [] type: TYPE_PRE + zh: '[PRE15]' - en: This code was within a `for` loop that repeatedly fed data through the function defining and updated our network. + id: totrans-151 prefs: [] type: TYPE_NORMAL - en: 'With the classes we have now, we’ll ultimately do this inside a `fit` function @@ -785,22 +978,28 @@ Notebook](https://oreil.ly/2MV0aZI) on the book’s GitHub page.) The main difference is that inside this new function, the first two lines from the preceding code block will be replaced with this line:' + id: totrans-152 prefs: [] type: TYPE_NORMAL - en: '[PRE16]' + id: totrans-153 prefs: [] type: TYPE_PRE + zh: '[PRE16]' - en: Updating the parameters, which happens in the following two lines, will take place in a separate `Optimizer` class. And finally, the `for` loop that previously wrapped around all of this will take place in the `Trainer` class that wraps around the `NeuralNetwork` and the `Optimizer`. 
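For reference, a bare-bones sketch of the `NeuralNetwork` class described above follows (the book’s full version is in `[PRE13]`); the `train_batch` helper and the generator-based `params`/`param_grads` are assumptions, and the earlier `Layer` and `Loss` sketches are assumed to be in scope:

```python
from numpy import ndarray


class NeuralNetwork:
    """A list of Layers plus a Loss."""

    def __init__(self, layers: list, loss):
        self.layers = layers
        self.loss = loss

    def forward(self, X: ndarray) -> ndarray:
        for layer in self.layers:              # pass the data forward through each Layer
            X = layer.forward(X)
        return X

    def backward(self, loss_grad: ndarray) -> None:
        for layer in reversed(self.layers):    # pass the loss gradient backward
            loss_grad = layer.backward(loss_grad)

    def train_batch(self, X: ndarray, y: ndarray) -> float:
        predictions = self.forward(X)
        loss_value = self.loss.forward(predictions, y)
        self.backward(self.loss.backward())
        return loss_value

    def params(self):
        for layer in self.layers:
            layer._params()
            yield from layer.params

    def param_grads(self):
        for layer in self.layers:
            layer._param_grads()
            yield from layer.param_grads
```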
+ id: totrans-154 prefs: [] type: TYPE_NORMAL - en: Next, let’s discuss why we need an `Optimizer` class and what it should look like. + id: totrans-155 prefs: [] type: TYPE_NORMAL - en: Optimizer + id: totrans-156 prefs: - PREF_H2 type: TYPE_NORMAL @@ -811,9 +1010,11 @@ gradient updates from the specific batch that was fed in at that iteration. Creating a separate `Optimizer` class will give us the flexibility to swap in one update rule for another, something that we’ll explore in more detail in the next chapter. + id: totrans-157 prefs: [] type: TYPE_NORMAL - en: Description and code + id: totrans-158 prefs: - PREF_H3 type: TYPE_NORMAL @@ -821,31 +1022,41 @@ `step` function is called, will update the parameters of the network based on their current values, their gradients, and any other information stored in the `Optimizer`:' + id: totrans-159 prefs: [] type: TYPE_NORMAL - en: '[PRE17]' + id: totrans-160 prefs: [] type: TYPE_PRE + zh: '[PRE17]' - en: 'And here’s how this looks with the straightforward update rule we’ve seen so far, known as *stochastic gradient descent*:' + id: totrans-161 prefs: [] type: TYPE_NORMAL - en: '[PRE18]' + id: totrans-162 prefs: [] type: TYPE_PRE + zh: '[PRE18]' - en: Note + id: totrans-163 prefs: - PREF_H6 type: TYPE_NORMAL - en: Note that while our `NeuralNetwork` class does not have an `_update_params` method, we do rely on the `params()` and `param_grads()` methods to extract the correct `ndarray`s for optimization. + id: totrans-164 prefs: [] type: TYPE_NORMAL - en: That’s the basic `Optimizer` class; let’s cover the `Trainer` class next. + id: totrans-165 prefs: [] type: TYPE_NORMAL - en: Trainer + id: totrans-166 prefs: - PREF_H2 type: TYPE_NORMAL @@ -855,30 +1066,38 @@ we didn’t pass in a `NeuralNetwork` when initializing our `Optimizer`; instead, we’ll assign the `NeuralNetwork` to be an attribute of the `Optimizer` when we initialize the `Trainer` class shortly, with this line:' + id: totrans-167 prefs: [] type: TYPE_NORMAL - en: '[PRE19]' + id: totrans-168 prefs: [] type: TYPE_PRE + zh: '[PRE19]' - en: 'In the following subsection, I show a simplified but working version of the `Trainer` class that for now contains just the `fit` method. This method trains our model for a number of *epochs* and prints out the loss value after each set number of epochs. In each epoch, we:' + id: totrans-169 prefs: [] type: TYPE_NORMAL - en: Shuffle the data at the beginning of the epoch + id: totrans-170 prefs: - PREF_OL type: TYPE_NORMAL - en: Feed the data through the network in batches, updating the parameters after each batch has been fed through + id: totrans-171 prefs: - PREF_OL type: TYPE_NORMAL - en: The epoch ends when we have fed the entire training set through the `Trainer`. + id: totrans-172 prefs: [] type: TYPE_NORMAL - en: Trainer code + id: totrans-173 prefs: - PREF_H3 type: TYPE_NORMAL @@ -889,32 +1108,41 @@ epoch. We also include a `restart` argument in the `train` function: if `True` (default), it will reinitialize the model’s parameters to random values upon calling the `train` function:' + id: totrans-174 prefs: [] type: TYPE_NORMAL - en: '[PRE20]' + id: totrans-175 prefs: [] type: TYPE_PRE + zh: '[PRE20]' - en: 'In the full version of this function in the book’s [GitHub repository](https://oreil.ly/2MV0aZI), we also implement *early stopping*, which does the following:' + id: totrans-176 prefs: [] type: TYPE_NORMAL - en: It saves the loss value every `eval_every` epochs. 
+ id: totrans-177 prefs: - PREF_OL type: TYPE_NORMAL - en: It checks whether the validation loss is lower than the last time it was calculated. + id: totrans-178 prefs: - PREF_OL type: TYPE_NORMAL - en: If the validation loss is *not* lower, it uses the model from `eval_every` epochs ago. + id: totrans-179 prefs: - PREF_OL type: TYPE_NORMAL - en: Finally, we have everything we need to train these models! + id: totrans-180 prefs: [] type: TYPE_NORMAL - en: Putting Everything Together + id: totrans-181 prefs: - PREF_H1 type: TYPE_NORMAL @@ -922,78 +1150,110 @@ classes and the two models defined before—`linear_regression` and `neural_network`. We’ll set the learning rate to `0.01` and the maximum number of epochs to `50` and evaluate our models every `10` epochs:' + id: totrans-182 prefs: [] type: TYPE_NORMAL - en: '[PRE21]' + id: totrans-183 prefs: [] type: TYPE_PRE + zh: '[PRE21]' - en: '[PRE22]' + id: totrans-184 prefs: [] type: TYPE_PRE + zh: '[PRE22]' - en: 'Using the same model-scoring functions from [Chapter 2](ch02.html#fundamentals), and wrapping them inside an `eval_regression_model` function, gives us these results:' + id: totrans-185 prefs: [] type: TYPE_NORMAL - en: '[PRE23]' + id: totrans-186 prefs: [] type: TYPE_PRE + zh: '[PRE23]' - en: '[PRE24]' + id: totrans-187 prefs: [] type: TYPE_PRE + zh: '[PRE24]' - en: These are similar to the results of the linear regression we ran in the last chapter, confirming that our framework is working. + id: totrans-188 prefs: [] type: TYPE_NORMAL - en: 'Running the same code with the `neural_network` model with a hidden layer with 13 neurons, we get the following:' + id: totrans-189 prefs: [] type: TYPE_NORMAL - en: '[PRE25]' + id: totrans-190 prefs: [] type: TYPE_PRE + zh: '[PRE25]' - en: '[PRE26]' + id: totrans-191 prefs: [] type: TYPE_PRE + zh: '[PRE26]' - en: '[PRE27]' + id: totrans-192 prefs: [] type: TYPE_PRE + zh: '[PRE27]' - en: Again, these results are similar to what we saw in the prior chapter, and they’re significantly better than our straightforward linear regression. + id: totrans-193 prefs: [] type: TYPE_NORMAL - en: Our First Deep Learning Model (from Scratch) + id: totrans-194 prefs: - PREF_H2 type: TYPE_NORMAL - en: 'Now that all of that setup is out of the way, defining our first deep learning model is trivial:' + id: totrans-195 prefs: [] type: TYPE_NORMAL - en: '[PRE28]' + id: totrans-196 prefs: [] type: TYPE_PRE + zh: '[PRE28]' - en: We won’t even try to be clever with this (yet). We’ll just add a hidden layer with the same dimensionality as the first layer, so that our network now has two hidden layers, each with 13 neurons. + id: totrans-197 prefs: [] type: TYPE_NORMAL - en: 'Training this using the same learning rate and evaluation schedule as the prior models yields the following result:' + id: totrans-198 prefs: [] type: TYPE_NORMAL - en: '[PRE29]' + id: totrans-199 prefs: [] type: TYPE_PRE + zh: '[PRE29]' - en: '[PRE30]' + id: totrans-200 prefs: [] type: TYPE_PRE + zh: '[PRE30]' - en: '[PRE31]' + id: totrans-201 prefs: [] type: TYPE_PRE + zh: '[PRE31]' - en: We finally worked up to doing deep learning from scratch—and indeed, on this real-world problem, without the use of any tricks (just a bit of learning rate tuning), our deep learning model does perform slightly better than a neural network with just one hidden layer. + id: totrans-202 prefs: [] type: TYPE_NORMAL - en: More importantly, we did so by building a framework that is easily extensible. 
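As a final illustration, the snippet below strings the earlier sketches together on synthetic data and applies the plain stochastic gradient descent update by hand, the same update that the `Optimizer` and `Trainer` classes factor out. Everything here (the synthetic data, the seed, the layer sizes, the 500 full-batch steps) is an illustrative assumption, not the housing experiment reported above:

```python
import numpy as np

# Assumes the NeuralNetwork, Dense, Sigmoid, Linear, and MeanSquaredError
# sketches from earlier are in scope; the data below is synthetic.
np.random.seed(42)

X = np.random.randn(200, 13)                      # 200 observations, 13 features
true_w = np.random.randn(13, 1)
y = np.dot(X, true_w) + 0.1 * np.random.randn(200, 1)

net = NeuralNetwork(
    layers=[Dense(13, activation=Sigmoid()),      # one hidden layer of 13 "learned features"
            Dense(1, activation=Linear())],       # linear output layer for regression
    loss=MeanSquaredError(),
)

learning_rate = 0.01
first_loss = None
for step in range(500):
    loss = net.train_batch(X, y)                  # forward pass, loss, backward pass
    if first_loss is None:
        first_loss = loss
    for param, grad in zip(net.params(), net.param_grads()):
        param -= learning_rate * grad             # in-place SGD update

print(first_loss > loss)                          # True: the loss has gone down
```

Swapping the hand-written update loop for an `SGD` optimizer and a `Trainer.fit` call, as described in this chapter, changes none of the gradient math; it only relocates the bookkeeping.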
@@ -1004,9 +1264,11 @@ activation functions into our existing layers and see if it decreases our error metrics; I encourage you to clone the book’s [GitHub repo](https://oreil.ly/deep-learning-github) and try this! + id: totrans-203 prefs: [] type: TYPE_NORMAL - en: Conclusion and Next Steps + id: totrans-204 prefs: - PREF_H1 type: TYPE_NORMAL @@ -1018,27 +1280,32 @@ into the `Optimizer` and `Trainer` classes. Finally, we’ll see Dropout, a new kind of `Operation` that has proven essential for increasing the training stability of deep learning models. Onward! + id: totrans-205 prefs: [] type: TYPE_NORMAL - en: ^([1](ch03.html#idm45732624417528-marker)) Among all activation functions, the `sigmoid` function, which maps inputs to between 0 and 1, most closely mimics the actual activation of neurons in the brain, but in general activation functions can be any monotonic, nonlinear function. + id: totrans-206 prefs: [] type: TYPE_NORMAL - en: '^([2](ch03.html#idm45732623512888-marker)) As we’ll see in [Chapter 5](ch05.html#convolution), this is not true of all layers: in *convolutional* layers, for example, each output feature is a combination of *only a small subset* of the input features.' + id: totrans-207 prefs: [] type: TYPE_NORMAL - en: ^([3](ch03.html#idm45732622822120-marker)) The learning rate of 0.01 isn’t special; we simply found it to be optimal in the course of experimenting while writing the prior chapter. + id: totrans-208 prefs: [] type: TYPE_NORMAL - en: ^([4](ch03.html#idm45732621371848-marker)) Even on this simple problem, changing the hyperparameters slightly can cause the deep learning model to fail to beat the two-layer neural network. Clone the [GitHub repo](https://oreil.ly/deep-learning-github) and try it yourself! + id: totrans-209 prefs: [] type: TYPE_NORMAL