2024-02-08 17:52:16
wizardforcel committed Feb 8, 2024
1 parent 122f827 commit 7128437
Showing 2 changed files with 581 additions and 0 deletions.
70 changes: 70 additions & 0 deletions totrans/dl-scr_1.yaml
@@ -578,46 +578,58 @@
id: totrans-83
prefs: []
type: TYPE_NORMAL
zh: 现在我们将介绍一个对理解神经网络至关重要的概念:函数可以“嵌套”形成“复合”函数。我所说的“嵌套”到底是什么意思呢?我指的是如果我们有两个函数,按照数学约定我们称为
*f*[1] 和 *f*[2],其中一个函数的输出成为下一个函数的输入,这样我们就可以“串联”它们。
- en: Diagram
id: totrans-84
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 图表
- en: The most natural way to represent a nested function is with the “minifactory”
or “box” representation (the second representation from [“Functions”](#functions-section-01)).
id: totrans-85
prefs: []
type: TYPE_NORMAL
zh: 表示嵌套函数最自然的方式是使用“迷你工厂”或“盒子”表示法(来自 [“函数”](#functions-section-01) 的第二种表示法)。
- en: As [Figure 1-6](#fig_01-07) shows, an input goes into the first function, gets
transformed, and comes out; then it goes into the second function and gets transformed
again, and we get our final output.
id: totrans-86
prefs: []
type: TYPE_NORMAL
zh: 如 [图1-6](#fig_01-07) 所示,一个输入进入第一个函数,被转换,然后出来;然后它进入第二个函数,再次被转换,我们得到最终输出。
- en: '![f1 and f2 as a chain](assets/dlfs_0106.png)'
id: totrans-87
prefs: []
type: TYPE_IMG
zh: '![f1 and f2 as a chain](assets/dlfs_0106.png)'
- en: Figure 1-6\. Nested functions, naturally
id: totrans-88
prefs:
- PREF_H6
type: TYPE_NORMAL
  zh: 图1-6\. 嵌套函数的自然表示
- en: Math
id: totrans-89
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 数学
- en: 'We should also include the less intuitive mathematical representation:'
id: totrans-90
prefs: []
type: TYPE_NORMAL
zh: 我们还应该包括不太直观的数学表示:
- en: <math><mrow><msub><mi>f</mi> <mn>2</mn></msub> <mrow><mo>(</mo> <msub><mi>f</mi>
<mn>1</mn></msub> <mrow><mo>(</mo> <mi>x</mi> <mo>)</mo></mrow> <mo>)</mo></mrow>
<mo>=</mo> <mi>y</mi></mrow></math>
id: totrans-91
prefs: []
type: TYPE_NORMAL
zh: <math><mrow><msub><mi>f</mi> <mn>2</mn></msub> <mrow><mo>(</mo> <msub><mi>f</mi>
<mn>1</mn></msub> <mrow><mo>(</mo> <mi>x</mi> <mo>)</mo></mrow> <mo>)</mo></mrow>
<mo>=</mo> <mi>y</mi></mrow></math>
- en: This is less intuitive because of the quirk that nested functions are read “from
the outside in” but the operations are in fact performed “from the inside out.”
For example, though <math><mrow><msub><mi>f</mi> <mn>2</mn></msub> <mrow><mo>(</mo>
@@ -628,16 +640,23 @@
id: totrans-92
prefs: []
type: TYPE_NORMAL
zh: 这是不太直观的,因为嵌套函数的怪癖是从“外到内”阅读,但实际上操作是“从内到外”执行的。例如,尽管 <math><mrow><msub><mi>f</mi>
<mn>2</mn></msub> <mrow><mo>(</mo> <msub><mi>f</mi> <mn>1</mn></msub> <mrow><mo>(</mo>
<mi>x</mi> <mo>)</mo></mrow> <mo>)</mo></mrow> <mo>=</mo> <mi>y</mi></mrow></math>
读作“f 2 of f 1 of x”,但它实际上意味着“首先将 *f*[1] 应用于 *x*,然后将 *f*[2] 应用于将 *f*[1] 应用于 *x*
的结果”。
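To make the "outside in" reading versus "inside out" evaluation concrete, here is a minimal NumPy sketch. The helper `apply_nested` and the sample functions `square` and `add_one` are hypothetical illustrations, not the book's own code; they are chosen so that the two orders of composition give visibly different results.

```python
import numpy as np

def square(x: np.ndarray) -> np.ndarray:
    return np.power(x, 2)

def add_one(x: np.ndarray) -> np.ndarray:
    return x + 1

def apply_nested(f1, f2, x: np.ndarray) -> np.ndarray:
    """Compute f2(f1(x)): written outside in, evaluated inside out."""
    u = f1(x)      # step 1: the inner function f1 runs first
    return f2(u)   # step 2: the outer function f2 runs on f1's output

x = np.array([1.0, 2.0, 3.0])
print(apply_nested(square, add_one, x))  # x**2 + 1, i.e. 2, 5, 10
print(apply_nested(add_one, square, x))  # (x + 1)**2, i.e. 4, 9, 16
```

Swapping the roles of the two functions changes the result, which is why the order of application matters.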
- en: Code
id: totrans-93
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 代码
- en: 'Finally, in keeping with my promise to explain every concept from three perspectives,
we’ll code this up. First, we’ll define a data type for nested functions:'
id: totrans-94
prefs: []
type: TYPE_NORMAL
  zh: 最后,为了兑现我从三个角度解释每个概念的承诺,我们将用代码实现它。首先,我们将为嵌套函数定义一个数据类型:
- en: '[PRE12]'
id: totrans-95
prefs: []
@@ -647,6 +666,7 @@
id: totrans-96
prefs: []
type: TYPE_NORMAL
zh: 然后我们将定义数据如何通过长度为2的链传递:
- en: '[PRE13]'
id: totrans-97
prefs: []
@@ -657,21 +677,26 @@
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 另一个图表
- en: Depicting the nested function using the box representation shows us that this
composite function is really just a single function. Thus, we can represent this
function as simply *f*[1] *f*[2], as shown in [Figure 1-7](#fig_01-08).
id: totrans-99
prefs: []
type: TYPE_NORMAL
zh: 使用盒子表示法描绘嵌套函数,我们可以看到这个复合函数实际上只是一个单一函数。因此,我们可以简单地表示这个函数为 *f*[1] *f*[2],如 [图1-7](#fig_01-08)
所示。
- en: '![f1f2 nested](assets/dlfs_0107.png)'
id: totrans-100
prefs: []
type: TYPE_IMG
zh: '![f1f2 nested](assets/dlfs_0107.png)'
- en: Figure 1-7\. Another way to think of nested functions
id: totrans-101
prefs:
- PREF_H6
type: TYPE_NORMAL
zh: 图1-7\. 另一种思考嵌套函数的方式
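The idea that the composite is "really just a single function" can also be sketched in code: a helper that takes *f*[1] and *f*[2] and returns one new function. This assumes NumPy; `compose` and `sigmoid` are illustrative names introduced here, not the book's API.

```python
from typing import Callable
import numpy as np

ArrayFunction = Callable[[np.ndarray], np.ndarray]

def compose(f1: ArrayFunction, f2: ArrayFunction) -> ArrayFunction:
    """Return one function that applies f1 and then f2 (the f1f2 box)."""
    def f1f2(x: np.ndarray) -> np.ndarray:
        return f2(f1(x))
    return f1f2

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1 / (1 + np.exp(-x))

f1f2 = compose(np.square, sigmoid)  # f1 = square, f2 = sigmoid
x = np.array([-1.0, 0.0, 2.0])
print(f1f2(x))  # same values as sigmoid(np.square(x))
```

Once built, `f1f2` can be passed anywhere an ordinary one-input function is expected, including to a numerical derivative routine.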
- en: Moreover, a theorem from calculus tells us that a composite function made up
of “mostly differentiable” functions is itself mostly differentiable! Thus, we
can think of *f*[1]*f*[2] as just another function that we can compute derivatives
@@ -680,34 +705,40 @@
id: totrans-102
prefs: []
type: TYPE_NORMAL
zh: 此外,微积分中的一个定理告诉我们,由“大部分可微”的函数组成的复合函数本身也是大部分可微的!因此,我们可以将 *f*[1]*f*[2] 视为另一个我们可以计算导数的函数,计算复合函数的导数将对训练深度学习模型至关重要。
- en: However, we need a formula to be able to compute this composite function’s derivative
in terms of the derivatives of its constituent functions. That’s what we’ll cover
next.
id: totrans-103
prefs: []
type: TYPE_NORMAL
zh: 然而,我们需要一个公式来计算这个复合函数的导数,以其组成函数的导数表示。这将是我们接下来要讨论的内容。
- en: The Chain Rule
id: totrans-104
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: 链式法则
- en: The chain rule is a mathematical theorem that lets us compute derivatives of
composite functions. Deep learning models are, mathematically, composite functions,
and reasoning about their derivatives is essential to training them, as we’ll
see in the next couple of chapters.
id: totrans-105
prefs: []
type: TYPE_NORMAL
zh: 链式法则是一个数学定理,让我们能够计算复合函数的导数。深度学习模型在数学上是复合函数,推理它们的导数对于训练它们是至关重要的,我们将在接下来的几章中看到。
- en: Math
id: totrans-106
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 数学
- en: Mathematically, the theorem states—in a rather nonintuitive form—that, for a
given value `x`,
id: totrans-107
prefs: []
type: TYPE_NORMAL
  zh: 从数学上讲,该定理以一种相当不直观的形式表述为:对于给定的值 `x`,
- en: <math display="block"><mrow><mfrac><mrow><mi>d</mi><msub><mi>f</mi> <mn>2</mn></msub></mrow>
<mrow><mi>d</mi><mi>u</mi></mrow></mfrac> <mrow><mo>(</mo> <mi>x</mi> <mo>)</mo></mrow>
<mo>=</mo> <mfrac><mrow><mi>d</mi><msub><mi>f</mi> <mn>2</mn></msub></mrow> <mrow><mi>d</mi><mi>u</mi></mrow></mfrac>
@@ -729,11 +760,13 @@
id: totrans-109
prefs: []
type: TYPE_NORMAL
zh: 其中 *u* 只是一个代表函数输入的虚拟变量。
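A numerical sanity check of the chain rule, as a sketch assuming NumPy: approximate the derivative of the composite directly, then compare it with the derivative of *f*[2] evaluated at *f*[1](*x*) multiplied by the derivative of *f*[1] evaluated at *x*. Squaring as *f*[1] and the sigmoid as *f*[2] are illustrative choices, and `deriv` is a hypothetical central-difference helper.

```python
import numpy as np

def square(x):
    return np.power(x, 2)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def deriv(func, x, delta=1e-3):
    """Central-difference approximation of d(func)/du evaluated at x."""
    return (func(x + delta) - func(x - delta)) / (2 * delta)

x = np.linspace(-3, 3, 7)

# Left-hand side: differentiate the composite f2(f1(x)) numerically.
lhs = deriv(lambda u: sigmoid(square(u)), x)

# Right-hand side: chain rule, df2/du at f1(x) times df1/du at x.
rhs = deriv(sigmoid, square(x)) * deriv(square, x)

# Small: the two sides agree up to finite-difference error.
print(np.max(np.abs(lhs - rhs)))
```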
- en: Note
id: totrans-110
prefs:
- PREF_H6
type: TYPE_NORMAL
zh: 注意
- en: When describing the derivative of a function *f* with one input and output,
we can denote the *function* that represents the derivative of this function as
<math><mfrac><mrow><mi>d</mi><mi>f</mi></mrow> <mrow><mi>d</mi><mi>u</mi></mrow></mfrac></math>
@@ -742,6 +775,9 @@
id: totrans-111
prefs: []
type: TYPE_NORMAL
zh: 当描述具有一个输入和输出的函数 *f* 的导数时,我们可以将表示该函数导数的 *函数* 表示为 <math><mfrac><mrow><mi>d</mi><mi>f</mi></mrow>
<mrow><mi>d</mi><mi>u</mi></mrow></mfrac></math>。我们可以用另一个虚拟变量代替 *u* —— 这无关紧要,就像
*f*(*x*) = *x*² 和 *f*(*y*) = *y*² 意思相同。
- en: On the other hand, later on we’ll deal with functions that take in *multiple*
inputs, say, both *x* and *y*. Once we get there, it will make sense to write
<math><mfrac><mrow><mi>d</mi><mi>f</mi></mrow> <mrow><mi>d</mi><mi>x</mi></mrow></mfrac></math>
@@ -1516,73 +1552,88 @@
id: totrans-212
prefs: []
type: TYPE_NORMAL
zh: 请注意,这个操作是*矩阵乘法*的特例,只是碰巧是点积,因为*X*有一行,*W*只有一列。
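A quick NumPy check of this special case, using illustrative values: multiplying a one-row *X* by a one-column *W* yields a 1×1 matrix whose single entry is the ordinary dot product.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0]])      # one row: shape (1, 3)
W = np.array([[4.0], [5.0], [6.0]])  # one column: shape (3, 1)

N = np.dot(X, W)                     # matrix multiplication
print(N.shape)                       # (1, 1)
print(N[0, 0])                       # 1*4 + 2*5 + 3*6 = 32.0
```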
- en: Next, let’s look at a few ways we could depict this with a diagram.
id: totrans-213
prefs: []
type: TYPE_NORMAL
zh: 接下来,让我们看一下我们可以用图示来描述这个操作的几种方式。
- en: Diagram
id: totrans-214
prefs:
- PREF_H2
type: TYPE_NORMAL
  zh: 图表
- en: A simple way of depicting this operation is shown in [Figure 1-14](#fig_01-15).
id: totrans-215
prefs: []
type: TYPE_NORMAL
zh: 一种简单的描述这个操作的方式如[图1-14](#fig_01-15)所示。
- en: '![dlfs 0114](assets/dlfs_0114.png)'
id: totrans-216
prefs: []
type: TYPE_IMG
zh: '![dlfs 0114](assets/dlfs_0114.png)'
- en: Figure 1-14\. Diagram of a vector dot product
id: totrans-217
prefs:
- PREF_H6
type: TYPE_NORMAL
  zh: 图1-14\. 矢量点积的图示
- en: This diagram depicts an operation that takes in two inputs, both of which can
be `ndarray`s, and produces one output `ndarray`.
id: totrans-218
prefs: []
type: TYPE_NORMAL
zh: 这个图示描述了一个接受两个输入的操作,这两个输入都可以是`ndarray`,并产生一个输出`ndarray`。
- en: But this is really a massive shorthand for many operations that are happening
on many inputs. We could instead highlight the individual operations and inputs,
as shown in Figures [1-15](#fig_01-16) and [1-16](#fig_01-17).
id: totrans-219
prefs: []
type: TYPE_NORMAL
  zh: 但这实际上是对作用于许多输入的许多操作的高度简写。我们也可以转而突出显示其中的各个操作和输入,如图[1-15](#fig_01-16)和[1-16](#fig_01-17)所示。
- en: '![dlfs 0115](assets/dlfs_0115.png)'
id: totrans-220
prefs: []
type: TYPE_IMG
zh: '![dlfs 0115](assets/dlfs_0115.png)'
- en: Figure 1-15\. Another diagram of a matrix multiplication
id: totrans-221
prefs:
- PREF_H6
type: TYPE_NORMAL
  zh: 图1-15\. 矩阵乘法的另一个图示
- en: '![dlfs 0116](assets/dlfs_0116.png)'
id: totrans-222
prefs: []
type: TYPE_IMG
zh: '![dlfs 0116](assets/dlfs_0116.png)'
- en: Figure 1-16\. A third diagram of a matrix multiplication
id: totrans-223
prefs:
- PREF_H6
type: TYPE_NORMAL
  zh: 图1-16\. 矩阵乘法的第三个图示
- en: The key point is that the dot product (or matrix multiplication) is a concise
way to represent many individual operations; in addition, as we’ll start to see
in the next section, using this operation makes our derivative calculations on
the backward pass extremely concise as well.
id: totrans-224
prefs: []
type: TYPE_NORMAL
  zh: 关键点是点积(或矩阵乘法)是表示许多单独操作的简洁方式;此外,正如我们将在下一节中开始看到的,使用这个操作也使我们在反向传播中的导数计算变得极其简洁。
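To see how many individual operations the single dot product stands for, the following sketch spells them out in a loop and compares the result with `np.dot`. `dot_by_hand` is a hypothetical helper and the random inputs are illustrative only.

```python
import numpy as np

def dot_by_hand(X: np.ndarray, W: np.ndarray) -> float:
    """Spell out the multiplications and additions hidden inside np.dot
    for a (1, n) row X and an (n, 1) column W."""
    total = 0.0
    for i in range(X.shape[1]):
        total += X[0, i] * W[i, 0]   # one multiply and one add per element
    return total

rng = np.random.default_rng(0)
X = rng.normal(size=(1, 5))
W = rng.normal(size=(5, 1))

# Same value, up to floating-point rounding.
print(dot_by_hand(X, W), np.dot(X, W)[0, 0])
```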
- en: Code
id: totrans-225
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 代码
- en: 'Finally, in code this operation is simply:'
id: totrans-226
prefs: []
type: TYPE_NORMAL
zh: 最后,在代码中,这个操作只是:
- en: '[PRE21]'
id: totrans-227
prefs: []
@@ -1595,11 +1646,13 @@
id: totrans-228
prefs: []
type: TYPE_NORMAL
zh: 我们有一个新的断言,确保矩阵乘法能够进行。(这是必要的,因为这是我们的第一个不仅仅处理大小相同的`ndarray`并对元素进行操作的操作——我们的输出现在实际上与我们的输入大小不同。)
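The book's own code for this step is not shown in this excerpt. As a hedged stand-in only, a forward function with this kind of shape assertion might look like the following; `matmul_forward` is a hypothetical name and the assertion message is made up for illustration.

```python
import numpy as np

def matmul_forward(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Hypothetical forward pass computing N = XW, with a shape check first."""
    # X's column count must equal W's row count, or the product is undefined.
    assert X.shape[1] == W.shape[0], (
        f"cannot multiply: X has {X.shape[1]} columns, W has {W.shape[0]} rows")
    # Unlike elementwise operations, the output shape generally differs from both inputs.
    return np.dot(X, W)

X = np.ones((1, 3))
W = np.ones((3, 1))
print(matmul_forward(X, W).shape)  # (1, 1)

try:
    matmul_forward(X, np.ones((4, 1)))  # 3 columns vs 4 rows: rejected
except AssertionError as err:
    print("AssertionError:", err)
```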
- en: Derivatives of Functions with Multiple Vector Inputs
id: totrans-229
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: 具有多个矢量输入的函数的导数
- en: 'For functions that simply take one input as a number and produce one output,
like *f*(*x*) = *x*² or *f*(*x*) = sigmoid(*x*), computing the derivative is straightforward:
we simply apply rules from calculus. For vector functions, it isn’t immediately
@@ -1611,36 +1664,46 @@
id: totrans-230
prefs: []
type: TYPE_NORMAL
zh: 对于简单将一个数字作为输入并产生一个输出的函数,如*f*(*x*) = *x*²或*f*(*x*) = sigmoid(*x*),计算导数是直接的:我们只需应用微积分规则。对于矢量函数,导数并不是立即明显的:如果我们将点积写成<math><mrow><mi>ν</mi>
<mo>(</mo> <mi>X</mi> <mo>,</mo> <mi>W</mi> <mo>)</mo> <mo>=</mo> <mi>N</mi></mrow></math>,如前一节所示,自然会产生一个问题——<math><mfrac><mrow><mi>∂</mi><mi>N</mi></mrow>
<mrow><mi>∂</mi><mi>X</mi></mrow></mfrac></math>和<math><mfrac><mrow><mi>∂</mi><mi>N</mi></mrow>
<mrow><mi>∂</mi><mi>W</mi></mrow></mfrac></math>会是什么?
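Before working through the math, these partial derivatives can be estimated numerically: nudge one element of *X* (or *W*) by a small ϵ and measure how much *N* moves per unit of nudge. The sketch below assumes NumPy and uses illustrative values and hypothetical helper names (`nu`, `numerical_grad`); it only approximates the derivatives, it does not derive them.

```python
import numpy as np

def nu(X: np.ndarray, W: np.ndarray) -> float:
    """N = nu(X, W): dot product of a (1, n) row X with an (n, 1) column W."""
    return float(np.dot(X, W)[0, 0])

def numerical_grad(f, A: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Nudge each element of A by eps and record how much f changes per unit."""
    grad = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        A_nudged = A.copy()
        A_nudged[idx] += eps
        grad[idx] = (f(A_nudged) - f(A)) / eps
    return grad

X = np.array([[1.0, 2.0, 3.0]])
W = np.array([[4.0], [5.0], [6.0]])

dN_dX = numerical_grad(lambda A: nu(A, W), X)  # shape (1, 3), like X
dN_dW = numerical_grad(lambda B: nu(X, B), W)  # shape (3, 1), like W
print(dN_dX)  # approximately W.T, i.e. [[4, 5, 6]]
print(dN_dW)  # approximately X.T, i.e. [[1], [2], [3]]
```

For these inputs each estimated derivative with respect to an element of *X* comes out close to the matching element of *W*, and vice versa.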
- en: Diagram
id: totrans-231
prefs:
- PREF_H2
type: TYPE_NORMAL
  zh: 图表
- en: Conceptually, we just want to do something like in [Figure 1-17](#fig_01-18).
id: totrans-232
prefs: []
type: TYPE_NORMAL
zh: 从概念上讲,我们只是想做类似于[图1-17](#fig_01-18)的事情。
- en: '![dlfs 0117](assets/dlfs_0117.png)'
id: totrans-233
prefs: []
type: TYPE_IMG
zh: '![dlfs 0117](assets/dlfs_0117.png)'
- en: Figure 1-17\. Backward pass of a matrix multiplication, conceptually
id: totrans-234
prefs:
- PREF_H6
type: TYPE_NORMAL
  zh: 图1-17\. 矩阵乘法的反向传播(概念示意)
- en: Calculating these derivatives was easy when we were just dealing with addition
and multiplication, as in the prior examples. But how can we do the analogous
thing with matrix multiplication? To define that precisely, we’ll have to turn
to the math.
id: totrans-235
prefs: []
type: TYPE_NORMAL
zh: 当我们只处理加法和乘法时,计算这些导数是很容易的,就像前面的例子一样。但是如何用矩阵乘法做类似的事情呢?要准确定义这一点,我们将不得不求助于数学。
- en: Math
id: totrans-236
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 数学
- en: 'First, how would we even define “the derivative with respect to a matrix”?
Recalling that the matrix syntax is just shorthand for a bunch of numbers arranged
in a particular form, “the derivative with respect to a matrix” really means “the
@@ -1649,6 +1712,7 @@
id: totrans-237
prefs: []
type: TYPE_NORMAL
zh: 首先,我们如何定义“关于矩阵的导数”?回想一下,矩阵语法只是一堆数字以特定形式排列的简写,“关于矩阵的导数”实际上意味着“关于矩阵的每个元素的导数”。由于*X*是一行,自然的定义方式是:
- en: <math display="block"><mrow><mfrac><mrow><mi>∂</mi><mi>ν</mi></mrow> <mrow><mi>∂</mi><mi>X</mi></mrow></mfrac>
<mo>=</mo> <mfenced close="]" open="["><mtable><mtr><mtd><mfrac><mrow><mi>∂</mi><mi>ν</mi></mrow>
<mrow><mi>∂</mi><msub><mi>x</mi> <mn>1</mn></msub></mrow></mfrac></mtd> <mtd><mfrac><mrow><mi>∂</mi><mi>ν</mi></mrow>
@@ -1673,6 +1737,12 @@
id: totrans-239
prefs: []
type: TYPE_NORMAL
zh: 然而,*ν*的输出只是一个数字:<math><mrow><mi>N</mi> <mo>=</mo> <msub><mi>x</mi> <mn>1</mn></msub>
<mo>×</mo> <msub><mi>w</mi> <mn>1</mn></msub> <mo>+</mo> <msub><mi>x</mi> <mn>2</mn></msub>
<mo>×</mo> <msub><mi>w</mi> <mn>2</mn></msub> <mo>+</mo> <msub><mi>x</mi> <mn>3</mn></msub>
<mo>×</mo> <msub><mi>w</mi> <mn>3</mn></msub></mrow></math>。观察这一点,我们可以看到,例如,如果<math><msub><mi>x</mi>
<mn>1</mn></msub></math>变化了*ϵ*单位,那么*N*将变化<math><mrow><msub><mi>w</mi> <mn>1</mn></msub>
<mo>×</mo> <mi>ϵ</mi></mrow></math>单位——同样的逻辑也适用于其他*x*[*i*]元素。因此:
- en: <math display="block"><mrow><mfrac><mrow><mi>∂</mi><mi>ν</mi></mrow> <mrow><mi>∂</mi><msub><mi>x</mi>
<mn>1</mn></msub></mrow></mfrac> <mo>=</mo> <msub><mi>w</mi> <mn>1</mn></msub></mrow></math><math
display="block"><mrow><mfrac><mrow><mi>∂</mi><mi>ν</mi></mrow> <mrow><mi>∂</mi><msub><mi>x</mi>