Commit 331bc2f (wizardforcel, 2024-02-08): totrans/fund-dl_06.yaml, 18 additions and 0 deletions.
@@ -1273,13 +1273,20 @@
id: totrans-131
prefs: []
type: TYPE_NORMAL
zh: $\theta_i = \theta_{i-1} - \frac{\epsilon}{\delta \oplus \sqrt{\tilde{\mathbf{v}}_i}} \odot \tilde{\mathbf{m}}_i$
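To make the update above concrete, here is a minimal NumPy sketch of a single Adam parameter step (an illustration, not the book's own listing; the bias-corrected moment estimates m_tilde and v_tilde, the learning rate eps, and the stability constant delta are assumed inputs):

```py
import numpy as np

def adam_step(theta, m_tilde, v_tilde, eps=0.001, delta=1e-8):
    """One Adam update: theta_i = theta_{i-1} - eps / (delta + sqrt(v_tilde)) * m_tilde.

    m_tilde and v_tilde are the bias-corrected first and second moment
    estimates of the gradient, arrays with the same shape as theta;
    the division and multiplication are element-wise.
    """
    return theta - eps / (delta + np.sqrt(v_tilde)) * m_tilde
```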
- en: 'Recently, Adam has gained popularity because of its corrective measures against
the zero initialization bias (a weakness of RMSProp) and its ability to combine
the core concepts behind RMSProp with momentum more effectively. PyTorch exposes
the Adam optimizer through the following constructor:'
id: totrans-132
prefs: []
type: TYPE_NORMAL
zh: 最近,Adam因其对零初始化偏差(RMSProp的一个弱点)的纠正措施以及其更有效地将RMSProp的核心概念与动量结合起来而变得流行。PyTorch通过以下构造函数公开了Adam优化器:
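The constructor listing itself is elided below as [PRE12]; for orientation, a hedged sketch of the standard torch.optim.Adam signature as exposed by PyTorch (the placeholder model is purely illustrative):

```py
import torch

model = torch.nn.Linear(10, 1)  # placeholder model, for illustration only

# torch.optim.Adam with its usual keyword arguments
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.001,            # step size (epsilon in the update rule)
    betas=(0.9, 0.999),  # decay rates for the first and second moment estimates
    eps=1e-8,            # numerical-stability constant (delta in the update rule)
    weight_decay=0.0,
)
```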
- en: '[PRE12]'
id: totrans-133
prefs: []
@@ -1292,18 +1299,21 @@
id: totrans-134
prefs: []
type: TYPE_NORMAL
zh: PyTorch中Adam的默认超参数设置通常表现良好,但Adam对于超参数选择通常是健壮的。唯一的例外是在某些情况下,学习率可能需要从默认值0.001进行修改。
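As a minimal illustration of the one setting that sometimes needs attention, the learning rate can be overridden while keeping every other default (continuing the hypothetical model above):

```py
# Keep Adam's defaults but lower the learning rate when 0.001 underperforms.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
```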
- en: The Philosophy Behind Optimizer Selection
id: totrans-135
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: 优化器选择背后的哲学
- en: In this chapter, we’ve discussed several strategies that are used to make navigating
the complex error surfaces of deep networks more tractable. These strategies have
culminated in several optimization algorithms, each with its own benefits and
shortcomings.
id: totrans-136
prefs: []
type: TYPE_NORMAL
zh: 在本章中,我们讨论了几种用于使深度网络复杂误差曲面更易处理的策略。这些策略已经发展成为几种优化算法,每种算法都有其自身的优点和缺点。
- en: While it would be awfully nice to know when to use which algorithm, there is
very little consensus among expert practitioners. Currently, the most popular
algorithms are minibatch gradient descent, minibatch gradient descent with momentum, RMSProp, RMSProp with momentum, Adam,
@@ -1313,6 +1323,7 @@
id: totrans-137
prefs: []
type: TYPE_NORMAL
zh: 虽然知道何时使用哪种算法会很好,但在专业从业者中几乎没有共识。目前最流行的算法是小批量梯度下降、带动量的小批量梯度下降、RMSProp、带动量的RMSProp、Adam和AdaDelta(我们这里没有讨论,但PyTorch也支持)。我们鼓励您在我们构建的前馈网络模型上尝试这些优化算法。
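As a hedged sketch of how each of these alternatives can be tried against the same feed-forward model (standard PyTorch constructors; the model here is a stand-in, and only one optimizer would be used per training run):

```py
import torch

model = torch.nn.Linear(784, 10)   # stand-in for the feed-forward model
params = list(model.parameters())  # materialize so the list can be reused below

# One constructor per algorithm mentioned above.
candidates = {
    "minibatch gradient descent":  torch.optim.SGD(params, lr=0.01),
    "minibatch SGD with momentum": torch.optim.SGD(params, lr=0.01, momentum=0.9),
    "RMSProp":                     torch.optim.RMSprop(params, lr=0.01),
    "RMSProp with momentum":       torch.optim.RMSprop(params, lr=0.01, momentum=0.9),
    "Adam":                        torch.optim.Adam(params, lr=0.001),
    "AdaDelta":                    torch.optim.Adadelta(params),
}
```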
- en: One important point, however, is that for most deep learning practitioners,
the best way to push the cutting edge of deep learning is not by building more
advanced optimizers. Instead, the vast majority of breakthroughs in deep learning
@@ -1323,11 +1334,13 @@
id: totrans-138
prefs: []
type: TYPE_NORMAL
zh: 然而,对于大多数深度学习从业者来说,推动深度学习的前沿并不是通过构建更先进的优化器。相反,过去几十年深度学习中的绝大多数突破是通过发现更容易训练的架构来实现的,而不是试图应对恶劣的误差曲面。我们将在本书的其余部分开始关注如何利用架构更有效地训练神经网络。
- en: Summary
id: totrans-139
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: 摘要
- en: In this chapter, we discussed several challenges that arise when trying to train
deep networks with complex error surfaces. We discussed how while the challenges
of spurious local minima are likely exaggerated, saddle points and ill-conditioning
@@ -1339,30 +1352,35 @@
id: totrans-140
prefs: []
type: TYPE_NORMAL
zh: 在本章中,我们讨论了在尝试训练具有复杂误差曲面的深度网络时出现的几个挑战。我们讨论了虽然虚假局部最小值的挑战可能被夸大了,但鞍点和病态确实对香草小批量梯度下降的成功构成了严重威胁。我们描述了如何使用动量来克服病态,并简要讨论了近期关于近似Hessian矩阵的二阶方法的研究。我们还描述了自适应学习率优化器的演变,这些优化器在训练过程中调整学习率以实现更好的收敛。
- en: Next, we’ll begin tackling the larger issue of network architecture and design.
We’ll explore computer vision and how we might design deep networks that learn
effectively from complex images.
id: totrans-141
prefs: []
type: TYPE_NORMAL
zh: 接下来,我们将开始解决网络架构和设计的更大问题。我们将探讨计算机视觉以及我们如何设计能够有效学习复杂图像的深度网络。
- en: '^([1](ch06.xhtml#idm45934168902672-marker)) Bengio, Yoshua, et al. “Greedy
Layer-Wise Training of Deep Networks.” *Advances in Neural Information Processing
Systems* 19 (2007): 153.'
id: totrans-142
prefs: []
type: TYPE_NORMAL
zh: ([1] Bengio, Yoshua等人。“深度网络的贪婪逐层训练。”*神经信息处理系统的进展* 19(2007):153。
- en: ^([2](ch06.xhtml#idm45934168861872-marker)) Goodfellow, Ian J., Oriol Vinyals,
and Andrew M. Saxe. “Qualitatively characterizing neural network optimization
problems.” *arXiv preprint arXiv*:1412.6544 (2014).
id: totrans-143
prefs: []
type: TYPE_NORMAL
zh: ([2] Goodfellow, Ian J.、Oriol Vinyals和Andrew M. Saxe。“定性地表征神经网络优化问题。”*arXiv预印本arXiv*:1412.6544(2014)。
- en: ^([3](ch06.xhtml#idm45934164995408-marker)) Dauphin, Yann N., et al. “Identifying
and attacking the saddle point problem in high-dimensional non-convex optimization.”
*Advances in Neural Information Processing Systems*. 2014.
id: totrans-144
prefs: []
type: TYPE_NORMAL
zh: ([3] Dauphin, Yann N.等人。“在高维非凸优化中识别和攻击鞍点问题。”*神经信息处理系统的进展*。2014年。
- en: '^([4](ch06.xhtml#idm45934167589056-marker)) Polyak, Boris T. “Some methods
of speeding up the convergence of iteration methods.” *USSR Computational Mathematics
and Mathematical Physics* 4.5 (1964): 1-17.'