2024-02-08 17:52:16

OpenDocCN · Feb 8, 2024 · 7128437
1 parent 122f827
commit 7128437
Showing 2 changed files with 581 additions and 0 deletions.
diff --git a/totrans/dl-scr_1.yaml b/totrans/dl-scr_1.yaml
@@ -578,46 +578,58 @@
   id: totrans-83
   prefs: []
   type: TYPE_NORMAL
+  zh: 现在我们将介绍一个对理解神经网络至关重要的概念：函数可以“嵌套”形成“复合”函数。我所说的“嵌套”到底是什么意思呢？我指的是如果我们有两个函数，按照数学约定我们称为
+    *f*[1] 和 *f*[2]，其中一个函数的输出成为下一个函数的输入，这样我们就可以“串联”它们。
 - en: Diagram
   id: totrans-84
   prefs:
   - PREF_H2
   type: TYPE_NORMAL
+  zh: 图表
 - en: The most natural way to represent a nested function is with the “minifactory”
     or “box” representation (the second representation from [“Functions”](#functions-section-01)).
   id: totrans-85
   prefs: []
   type: TYPE_NORMAL
+  zh: 表示嵌套函数最自然的方式是使用“迷你工厂”或“盒子”表示法（来自 [“函数”](#functions-section-01) 的第二种表示法）。
 - en: As [Figure 1-6](#fig_01-07) shows, an input goes into the first function, gets
     transformed, and comes out; then it goes into the second function and gets transformed
     again, and we get our final output.
   id: totrans-86
   prefs: []
   type: TYPE_NORMAL
+  zh: 如 [图1-6](#fig_01-07) 所示，一个输入进入第一个函数，被转换，然后出来；然后它进入第二个函数，再次被转换，我们得到最终输出。
 - en: '![f1 and f2 as a chain](assets/dlfs_0106.png)'
   id: totrans-87
   prefs: []
   type: TYPE_IMG
+  zh: '![f1 and f2 as a chain](assets/dlfs_0106.png)'
 - en: Figure 1-6\. Nested functions, naturally
   id: totrans-88
   prefs:
   - PREF_H6
   type: TYPE_NORMAL
+  zh: 图1-6\. 嵌套函数，自然地
 - en: Math
   id: totrans-89
   prefs:
   - PREF_H2
   type: TYPE_NORMAL
+  zh: 数学
 - en: 'We should also include the less intuitive mathematical representation:'
   id: totrans-90
   prefs: []
   type: TYPE_NORMAL
+  zh: 我们还应该包括不太直观的数学表示：
 - en: <math><mrow><msub><mi>f</mi> <mn>2</mn></msub> <mrow><mo>(</mo> <msub><mi>f</mi>
     <mn>1</mn></msub> <mrow><mo>(</mo> <mi>x</mi> <mo>)</mo></mrow> <mo>)</mo></mrow>
     <mo>=</mo> <mi>y</mi></mrow></math>
   id: totrans-91
   prefs: []
   type: TYPE_NORMAL
+  zh: <math><mrow><msub><mi>f</mi> <mn>2</mn></msub> <mrow><mo>(</mo> <msub><mi>f</mi>
+    <mn>1</mn></msub> <mrow><mo>(</mo> <mi>x</mi> <mo>)</mo></mrow> <mo>)</mo></mrow>
+    <mo>=</mo> <mi>y</mi></mrow></math>
 - en: This is less intuitive because of the quirk that nested functions are read “from
     the outside in” but the operations are in fact performed “from the inside out.”
     For example, though <math><mrow><msub><mi>f</mi> <mn>2</mn></msub> <mrow><mo>(</mo>
@@ -628,16 +640,23 @@
   id: totrans-92
   prefs: []
   type: TYPE_NORMAL
+  zh: 这是不太直观的，因为嵌套函数的怪癖是从“外到内”阅读，但实际上操作是“从内到外”执行的。例如，尽管 <math><mrow><msub><mi>f</mi>
+    <mn>2</mn></msub> <mrow><mo>(</mo> <msub><mi>f</mi> <mn>1</mn></msub> <mrow><mo>(</mo>
+    <mi>x</mi> <mo>)</mo></mrow> <mo>)</mo></mrow> <mo>=</mo> <mi>y</mi></mrow></math>
+    读作“f 2 of f 1 of x”，但它实际上意味着“首先将 *f*[1] 应用于 *x*，然后将 *f*[2] 应用于将 *f*[1] 应用于 *x*
+    的结果”。
 - en: Code
   id: totrans-93
   prefs:
   - PREF_H2
   type: TYPE_NORMAL
+  zh: 代码
 - en: 'Finally, in keeping with my promise to explain every concept from three perspectives,
     we’ll code this up. First, we’ll define a data type for nested functions:'
   id: totrans-94
   prefs: []
   type: TYPE_NORMAL
+  zh: 最后，为了遵守我承诺的从三个角度解释每个概念，我们将对此进行编码。首先，我们将为嵌套函数定义一个数据类型：
 - en: '[PRE12]'
   id: totrans-95
   prefs: []
@@ -647,6 +666,7 @@
   id: totrans-96
   prefs: []
   type: TYPE_NORMAL
+  zh: 然后我们将定义数据如何通过长度为2的链传递：
 - en: '[PRE13]'
   id: totrans-97
   prefs: []
@@ -657,21 +677,26 @@
   prefs:
   - PREF_H2
   type: TYPE_NORMAL
+  zh: 另一个图表
 - en: Depicting the nested function using the box representation shows us that this
     composite function is really just a single function. Thus, we can represent this
     function as simply *f*[1] *f*[2], as shown in [Figure 1-7](#fig_01-08).
   id: totrans-99
   prefs: []
   type: TYPE_NORMAL
+  zh: 使用盒子表示法描绘嵌套函数，我们可以看到这个复合函数实际上只是一个单一函数。因此，我们可以简单地表示这个函数为 *f*[1] *f*[2]，如 [图1-7](#fig_01-08)
+    所示。
 - en: '![f1f2 nested](assets/dlfs_0107.png)'
   id: totrans-100
   prefs: []
   type: TYPE_IMG
+  zh: '![f1f2 nested](assets/dlfs_0107.png)'
 - en: Figure 1-7\. Another way to think of nested functions
   id: totrans-101
   prefs:
   - PREF_H6
   type: TYPE_NORMAL
+  zh: 图1-7\. 另一种思考嵌套函数的方式
 - en: Moreover, a theorem from calculus tells us that a composite function made up
     of “mostly differentiable” functions is itself mostly differentiable! Thus, we
     can think of *f*[1]*f*[2] as just another function that we can compute derivatives
@@ -680,34 +705,40 @@
   id: totrans-102
   prefs: []
   type: TYPE_NORMAL
+  zh: 此外，微积分中的一个定理告诉我们，由“大部分可微”的函数组成的复合函数本身也是大部分可微的！因此，我们可以将 *f*[1]*f*[2] 视为另一个我们可以计算导数的函数，计算复合函数的导数将对训练深度学习模型至关重要。
 - en: However, we need a formula to be able to compute this composite function’s derivative
     in terms of the derivatives of its constituent functions. That’s what we’ll cover
     next.
   id: totrans-103
   prefs: []
   type: TYPE_NORMAL
+  zh: 然而，我们需要一个公式来计算这个复合函数的导数，以其组成函数的导数表示。这将是我们接下来要讨论的内容。
 - en: The Chain Rule
   id: totrans-104
   prefs:
   - PREF_H1
   type: TYPE_NORMAL
+  zh: 链式法则
 - en: The chain rule is a mathematical theorem that lets us compute derivatives of
     composite functions. Deep learning models are, mathematically, composite functions,
     and reasoning about their derivatives is essential to training them, as we’ll
     see in the next couple of chapters.
   id: totrans-105
   prefs: []
   type: TYPE_NORMAL
+  zh: 链式法则是一个数学定理，让我们能够计算复合函数的导数。深度学习模型在数学上是复合函数，推理它们的导数对于训练它们是至关重要的，我们将在接下来的几章中看到。
 - en: Math
   id: totrans-106
   prefs:
   - PREF_H2
   type: TYPE_NORMAL
+  zh: 数学
 - en: Mathematically, the theorem states—in a rather nonintuitive form—that, for a
     given value `x`,
   id: totrans-107
   prefs: []
   type: TYPE_NORMAL
+  zh: 从数学上讲，该定理以一种相当不直观的形式指出，对于给定的值 `x`，
 - en: <math display="block"><mrow><mfrac><mrow><mi>d</mi><msub><mi>f</mi> <mn>2</mn></msub></mrow>
     <mrow><mi>d</mi><mi>u</mi></mrow></mfrac> <mrow><mo>(</mo> <mi>x</mi> <mo>)</mo></mrow>
     <mo>=</mo> <mfrac><mrow><mi>d</mi><msub><mi>f</mi> <mn>2</mn></msub></mrow> <mrow><mi>d</mi><mi>u</mi></mrow></mfrac>
@@ -729,11 +760,13 @@
   id: totrans-109
   prefs: []
   type: TYPE_NORMAL
+  zh: 其中 *u* 只是一个代表函数输入的虚拟变量。
 - en: Note
   id: totrans-110
   prefs:
   - PREF_H6
   type: TYPE_NORMAL
+  zh: 注意
 - en: When describing the derivative of a function *f* with one input and output,
     we can denote the *function* that represents the derivative of this function as
     <math><mfrac><mrow><mi>d</mi><mi>f</mi></mrow> <mrow><mi>d</mi><mi>u</mi></mrow></mfrac></math>
@@ -742,6 +775,9 @@
   id: totrans-111
   prefs: []
   type: TYPE_NORMAL
+  zh: 当描述具有一个输入和输出的函数 *f* 的导数时，我们可以将表示该函数导数的 *函数* 表示为 <math><mfrac><mrow><mi>d</mi><mi>f</mi></mrow>
+    <mrow><mi>d</mi><mi>u</mi></mrow></mfrac></math>。我们可以用另一个虚拟变量代替 *u* —— 这无关紧要，就像
+    *f*(*x*) = *x*² 和 *f*(*y*) = *y*² 意思相同。
 - en: On the other hand, later on we’ll deal with functions that take in *multiple*
     inputs, say, both *x* and *y*. Once we get there, it will make sense to write
     <math><mfrac><mrow><mi>d</mi><mi>f</mi></mrow> <mrow><mi>d</mi><mi>x</mi></mrow></mfrac></math>
@@ -1516,73 +1552,88 @@
   id: totrans-212
   prefs: []
   type: TYPE_NORMAL
+  zh: 请注意，这个操作是*矩阵乘法*的特例，只是碰巧是点积，因为*X*有一行，*W*只有一列。
 - en: Next, let’s look at a few ways we could depict this with a diagram.
   id: totrans-213
   prefs: []
   type: TYPE_NORMAL
+  zh: 接下来，让我们看一下我们可以用图示来描述这个操作的几种方式。
 - en: Diagram
   id: totrans-214
   prefs:
   - PREF_H2
   type: TYPE_NORMAL
+  zh: 图
 - en: A simple way of depicting this operation is shown in [Figure 1-14](#fig_01-15).
   id: totrans-215
   prefs: []
   type: TYPE_NORMAL
+  zh: 一种简单的描述这个操作的方式如[图1-14](#fig_01-15)所示。
 - en: '![dlfs 0114](assets/dlfs_0114.png)'
   id: totrans-216
   prefs: []
   type: TYPE_IMG
+  zh: '![dlfs 0114](assets/dlfs_0114.png)'
 - en: Figure 1-14\. Diagram of a vector dot product
   id: totrans-217
   prefs:
   - PREF_H6
   type: TYPE_NORMAL
+  zh: 图1-14\. 矢量点积的图示
 - en: This diagram depicts an operation that takes in two inputs, both of which can
     be `ndarray`s, and produces one output `ndarray`.
   id: totrans-218
   prefs: []
   type: TYPE_NORMAL
+  zh: 这个图示描述了一个接受两个输入的操作，这两个输入都可以是`ndarray`，并产生一个输出`ndarray`。
 - en: But this is really a massive shorthand for many operations that are happening
     on many inputs. We could instead highlight the individual operations and inputs,
     as shown in Figures [1-15](#fig_01-16) and [1-16](#fig_01-17).
   id: totrans-219
   prefs: []
   type: TYPE_NORMAL
+  zh: 但这实际上是对许多操作进行了大量简写，这些操作发生在许多输入上。我们可以选择突出显示各个操作和输入，如图[1-15](#fig_01-16)和[1-16](#fig_01-17)所示。
 - en: '![dlfs 0115](assets/dlfs_0115.png)'
   id: totrans-220
   prefs: []
   type: TYPE_IMG
+  zh: '![dlfs 0115](assets/dlfs_0115.png)'
 - en: Figure 1-15\. Another diagram of a matrix multiplication
   id: totrans-221
   prefs:
   - PREF_H6
   type: TYPE_NORMAL
+  zh: 图1-15\. 矩阵乘法的另一个图示
 - en: '![dlfs 0116](assets/dlfs_0116.png)'
   id: totrans-222
   prefs: []
   type: TYPE_IMG
+  zh: '![dlfs 0116](assets/dlfs_0116.png)'
 - en: Figure 1-16\. A third diagram of a matrix multiplication
   id: totrans-223
   prefs:
   - PREF_H6
   type: TYPE_NORMAL
+  zh: 图1-16\. 矩阵乘法的第三个图示
 - en: The key point is that the dot product (or matrix multiplication) is a concise
     way to represent many individual operations; in addition, as we’ll start to see
     in the next section, using this operation makes our derivative calculations on
     the backward pass extremely concise as well.
   id: totrans-224
   prefs: []
   type: TYPE_NORMAL
+  zh: 关键点是点积（或矩阵乘法）是表示许多个体操作的简洁方式；此外，正如我们将在下一节中开始看到的，使用这个操作也使我们在反向传播中的导数计算变得极其简洁。
 - en: Code
   id: totrans-225
   prefs:
   - PREF_H2
   type: TYPE_NORMAL
+  zh: 代码
 - en: 'Finally, in code this operation is simply:'
   id: totrans-226
   prefs: []
   type: TYPE_NORMAL
+  zh: 最后，在代码中，这个操作只是：
 - en: '[PRE21]'
   id: totrans-227
   prefs: []
@@ -1595,11 +1646,13 @@
   id: totrans-228
   prefs: []
   type: TYPE_NORMAL
+  zh: 我们有一个新的断言，确保矩阵乘法能够进行。（这是必要的，因为这是我们的第一个不仅仅处理大小相同的`ndarray`并对元素进行操作的操作——我们的输出现在实际上与我们的输入大小不同。）
 - en: Derivatives of Functions with Multiple Vector Inputs
   id: totrans-229
   prefs:
   - PREF_H1
   type: TYPE_NORMAL
+  zh: 具有多个矢量输入的函数的导数
 - en: 'For functions that simply take one input as a number and produce one output,
     like *f*(*x*) = *x*² or *f*(*x*) = sigmoid(*x*), computing the derivative is straightforward:
     we simply apply rules from calculus. For vector functions, it isn’t immediately
@@ -1611,36 +1664,46 @@
   id: totrans-230
   prefs: []
   type: TYPE_NORMAL
+  zh: 对于简单将一个数字作为输入并产生一个输出的函数，如*f*(*x*) = *x*²或*f*(*x*) = sigmoid(*x*)，计算导数是直接的：我们只需应用微积分规则。对于矢量函数，导数并不是立即明显的：如果我们将点积写成<math><mrow><mi>ν</mi>
+    <mo>(</mo> <mi>X</mi> <mo>,</mo> <mi>W</mi> <mo>)</mo> <mo>=</mo> <mi>N</mi></mrow></math>，如前一节所示，自然会产生一个问题——<math><mfrac><mrow><mi>∂</mi><mi>N</mi></mrow>
+    <mrow><mi>∂</mi><mi>X</mi></mrow></mfrac></math>和<math><mfrac><mrow><mi>∂</mi><mi>N</mi></mrow>
+    <mrow><mi>∂</mi><mi>W</mi></mrow></mfrac></math>会是什么？
 - en: Diagram
   id: totrans-231
   prefs:
   - PREF_H2
   type: TYPE_NORMAL
+  zh: 图
 - en: Conceptually, we just want to do something like in [Figure 1-17](#fig_01-18).
   id: totrans-232
   prefs: []
   type: TYPE_NORMAL
+  zh: 从概念上讲，我们只是想做类似于[图1-17](#fig_01-18)的事情。
 - en: '![dlfs 0117](assets/dlfs_0117.png)'
   id: totrans-233
   prefs: []
   type: TYPE_IMG
+  zh: '![dlfs 0117](assets/dlfs_0117.png)'
 - en: Figure 1-17\. Backward pass of a matrix multiplication, conceptually
   id: totrans-234
   prefs:
   - PREF_H6
   type: TYPE_NORMAL
+  zh: 图1-17\. 矩阵乘法的反向传播（概念示意）
 - en: Calculating these derivatives was easy when we were just dealing with addition
     and multiplication, as in the prior examples. But how can we do the analogous
     thing with matrix multiplication? To define that precisely, we’ll have to turn
     to the math.
   id: totrans-235
   prefs: []
   type: TYPE_NORMAL
+  zh: 当我们只处理加法和乘法时，计算这些导数是很容易的，就像前面的例子一样。但是如何用矩阵乘法做类似的事情呢？要准确定义这一点，我们将不得不求助于数学。
 - en: Math
   id: totrans-236
   prefs:
   - PREF_H2
   type: TYPE_NORMAL
+  zh: 数学
 - en: 'First, how would we even define “the derivative with respect to a matrix”?
     Recalling that the matrix syntax is just shorthand for a bunch of numbers arranged
     in a particular form, “the derivative with respect to a matrix” really means “the
@@ -1649,6 +1712,7 @@
   id: totrans-237
   prefs: []
   type: TYPE_NORMAL
+  zh: 首先，我们如何定义“关于矩阵的导数”？回想一下，矩阵语法只是一堆数字以特定形式排列的简写，“关于矩阵的导数”实际上意味着“关于矩阵的每个元素的导数”。由于*X*是一行，自然的定义方式是：
 - en: <math display="block"><mrow><mfrac><mrow><mi>∂</mi><mi>ν</mi></mrow> <mrow><mi>∂</mi><mi>X</mi></mrow></mfrac>
     <mo>=</mo> <mfenced close="]" open="["><mtable><mtr><mtd><mfrac><mrow><mi>∂</mi><mi>ν</mi></mrow>
     <mrow><mi>∂</mi><msub><mi>x</mi> <mn>1</mn></msub></mrow></mfrac></mtd> <mtd><mfrac><mrow><mi>∂</mi><mi>ν</mi></mrow>
@@ -1673,6 +1737,12 @@
   id: totrans-239
   prefs: []
   type: TYPE_NORMAL
+  zh: 然而，*ν*的输出只是一个数字：<math><mrow><mi>N</mi> <mo>=</mo> <msub><mi>x</mi> <mn>1</mn></msub>
+    <mo>×</mo> <msub><mi>w</mi> <mn>1</mn></msub> <mo>+</mo> <msub><mi>x</mi> <mn>2</mn></msub>
+    <mo>×</mo> <msub><mi>w</mi> <mn>2</mn></msub> <mo>+</mo> <msub><mi>x</mi> <mn>3</mn></msub>
+    <mo>×</mo> <msub><mi>w</mi> <mn>3</mn></msub></mrow></math>。观察这一点，我们可以看到，例如，如果<math><msub><mi>x</mi>
+    <mn>1</mn></msub></math>变化了*ϵ*单位，那么*N*将变化<math><mrow><msub><mi>w</mi> <mn>1</mn></msub>
+    <mo>×</mo> <mi>ϵ</mi></mrow></math>单位——同样的逻辑也适用于其他*x*[*i*]元素。因此：
 - en: <math display="block"><mrow><mfrac><mrow><mi>∂</mi><mi>ν</mi></mrow> <mrow><mi>∂</mi><msub><mi>x</mi>
     <mn>1</mn></msub></mrow></mfrac> <mo>=</mo> <msub><mi>w</mi> <mn>1</mn></msub></mrow></math><math
     display="block"><mrow><mfrac><mrow><mi>∂</mi><mi>ν</mi></mrow> <mrow><mi>∂</mi><msub><mi>x</mi>