Commit 331bc2f (wizardforcel, 2024-02-08): totrans/fund-dl_06.yaml, 18 additions and 0 deletions.
@@ -1273,13 +1273,20 @@
id: totrans-131
prefs: []
type: TYPE_NORMAL
zh: $\theta_i = \theta_{i-1} - \frac{\epsilon}{\delta \oplus \sqrt{\tilde{\mathbf{v}}_i}} \odot \tilde{\mathbf{m}}_i$
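To make the update above concrete, here is a minimal NumPy sketch of a single Adam parameter step (an illustration, not the book's own listing; the bias-corrected moment estimates m_tilde and v_tilde, the learning rate eps, and the stability constant delta are assumed inputs):

```py
import numpy as np

def adam_step(theta, m_tilde, v_tilde, eps=0.001, delta=1e-8):
    """One Adam update: theta_i = theta_{i-1} - eps / (delta + sqrt(v_tilde)) * m_tilde.

    m_tilde and v_tilde are the bias-corrected first and second moment
    estimates of the gradient, arrays with the same shape as theta;
    the division and multiplication are element-wise.
    """
    return theta - eps / (delta + np.sqrt(v_tilde)) * m_tilde
```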
- en: 'Recently, Adam has gained popularity because of its corrective measures against
the zero initialization bias (a weakness of RMSProp) and its ability to combine
the core concepts behind RMSProp with momentum more effectively. PyTorch exposes
the Adam optimizer through the following constructor:'
id: totrans-132
prefs: []
type: TYPE_NORMAL
zh: 最近,Adam因其对零初始化偏差(RMSProp的一个弱点)的纠正措施以及其更有效地将RMSProp的核心概念与动量结合起来而变得流行。PyTorch通过以下构造函数公开了Adam优化器:
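The constructor listing itself is elided below as [PRE12]; for orientation, a hedged sketch of the standard torch.optim.Adam signature as exposed by PyTorch (the placeholder model is purely illustrative):

```py
import torch

model = torch.nn.Linear(10, 1)  # placeholder model, for illustration only

# torch.optim.Adam with its usual keyword arguments
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.001,            # step size (epsilon in the update rule)
    betas=(0.9, 0.999),  # decay rates for the first and second moment estimates
    eps=1e-8,            # numerical-stability constant (delta in the update rule)
    weight_decay=0.0,
)
```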
- en: '[PRE12]'
id: totrans-133
prefs: []
@@ -1292,18 +1299,21 @@
id: totrans-134
prefs: []
type: TYPE_NORMAL
zh: PyTorch中Adam的默认超参数设置通常表现良好,但Adam对于超参数选择通常是健壮的。唯一的例外是在某些情况下,学习率可能需要从默认值0.001进行修改。
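As a minimal illustration of the one setting that sometimes needs attention, the learning rate can be overridden while keeping every other default (continuing the hypothetical model above):

```py
# Keep Adam's defaults but lower the learning rate when 0.001 underperforms.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
```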
- en: The Philosophy Behind Optimizer Selection
id: totrans-135
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: 优化器选择背后的哲学
- en: In this chapter, we’ve discussed several strategies that are used to make navigating
the complex error surfaces of deep networks more tractable. These strategies have
culminated in several optimization algorithms, each with its own benefits and
shortcomings.
id: totrans-136
prefs: []
type: TYPE_NORMAL
zh: 在本章中,我们讨论了几种用于使深度网络复杂误差曲面更易处理的策略。这些策略已经发展成为几种优化算法,每种算法都有其自身的优点和缺点。
- en: While it would be awfully nice to know when to use which algorithm, there is
very little consensus among expert practitioners. Currently, the most popular
algorithms are minibatch gradient descent, minibatch gradient descent with momentum, RMSProp, RMSProp with momentum, Adam,
@@ -1313,6 +1323,7 @@
id: totrans-137
prefs: []
type: TYPE_NORMAL
zh: 虽然知道何时使用哪种算法会很好,但在专业从业者中几乎没有共识。目前最流行的算法是小批量梯度下降、带动量的小批量梯度下降、RMSProp、带动量的RMSProp、Adam和AdaDelta(我们这里没有讨论,但PyTorch也支持)。我们鼓励您在我们构建的前馈网络模型上尝试这些优化算法。
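As a hedged sketch of how each of these alternatives can be tried against the same feed-forward model (standard PyTorch constructors; the model here is a stand-in, and only one optimizer would be used per training run):

```py
import torch

model = torch.nn.Linear(784, 10)   # stand-in for the feed-forward model
params = list(model.parameters())  # materialize so the list can be reused below

# One constructor per algorithm mentioned above.
candidates = {
    "minibatch gradient descent":  torch.optim.SGD(params, lr=0.01),
    "minibatch SGD with momentum": torch.optim.SGD(params, lr=0.01, momentum=0.9),
    "RMSProp":                     torch.optim.RMSprop(params, lr=0.01),
    "RMSProp with momentum":       torch.optim.RMSprop(params, lr=0.01, momentum=0.9),
    "Adam":                        torch.optim.Adam(params, lr=0.001),
    "AdaDelta":                    torch.optim.Adadelta(params),
}
```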
- en: One important point, however, is that for most deep learning practitioners,
the best way to push the cutting edge of deep learning is not by building more
advanced optimizers. Instead, the vast majority of breakthroughs in deep learning
@@ -1323,11 +1334,13 @@
id: totrans-138
prefs: []
type: TYPE_NORMAL
zh: 然而,对于大多数深度学习从业者来说,推动深度学习的前沿并不是通过构建更先进的优化器。相反,过去几十年深度学习中的绝大多数突破是通过发现更容易训练的架构来实现的,而不是试图应对恶劣的误差曲面。我们将在本书的其余部分开始关注如何利用架构更有效地训练神经网络。
- en: Summary
id: totrans-139
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: 摘要
- en: In this chapter, we discussed several challenges that arise when trying to train
deep networks with complex error surfaces. We discussed how while the challenges
of spurious local minima are likely exaggerated, saddle points and ill-conditioning
@@ -1339,30 +1352,35 @@
id: totrans-140
prefs: []
type: TYPE_NORMAL
zh: 在本章中,我们讨论了在尝试训练具有复杂误差曲面的深度网络时出现的几个挑战。我们讨论了虽然虚假局部最小值的挑战可能被夸大了,但鞍点和病态确实对香草小批量梯度下降的成功构成了严重威胁。我们描述了如何使用动量来克服病态,并简要讨论了近期关于近似Hessian矩阵的二阶方法的研究。我们还描述了自适应学习率优化器的演变,这些优化器在训练过程中调整学习率以实现更好的收敛。
- en: Next, we’ll begin tackling the larger issue of network architecture and design.
We’ll explore computer vision and how we might design deep networks that learn
effectively from complex images.
id: totrans-141
prefs: []
type: TYPE_NORMAL
zh: 接下来,我们将开始解决网络架构和设计的更大问题。我们将探讨计算机视觉以及我们如何设计能够有效学习复杂图像的深度网络。
- en: '^([1](ch06.xhtml#idm45934168902672-marker)) Bengio, Yoshua, et al. “Greedy
Layer-Wise Training of Deep Networks.” *Advances in Neural Information Processing
Systems* 19 (2007): 153.'
id: totrans-142
prefs: []
type: TYPE_NORMAL
zh: ([1] Bengio, Yoshua等人。“深度网络的贪婪逐层训练。”*神经信息处理系统的进展* 19(2007):153。
- en: ^([2](ch06.xhtml#idm45934168861872-marker)) Goodfellow, Ian J., Oriol Vinyals,
and Andrew M. Saxe. “Qualitatively characterizing neural network optimization
problems.” *arXiv preprint arXiv*:1412.6544 (2014).
id: totrans-143
prefs: []
type: TYPE_NORMAL
zh: ([2] Goodfellow, Ian J.、Oriol Vinyals和Andrew M. Saxe。“定性地表征神经网络优化问题。”*arXiv预印本arXiv*:1412.6544(2014)。
- en: ^([3](ch06.xhtml#idm45934164995408-marker)) Dauphin, Yann N., et al. “Identifying
and attacking the saddle point problem in high-dimensional non-convex optimization.”
*Advances in Neural Information Processing Systems*. 2014.
id: totrans-144
prefs: []
type: TYPE_NORMAL
zh: ([3] Dauphin, Yann N.等人。“在高维非凸优化中识别和攻击鞍点问题。”*神经信息处理系统的进展*。2014年。
- en: '^([4](ch06.xhtml#idm45934167589056-marker)) Polyak, Boris T. “Some methods
of speeding up the convergence of iteration methods.” *USSR Computational Mathematics
and Mathematical Physics* 4.5 (1964): 1-17.'