diff --git a/totrans/fund-dl_06.yaml b/totrans/fund-dl_06.yaml
index 5c8cad9..02176ea 100644
--- a/totrans/fund-dl_06.yaml
+++ b/totrans/fund-dl_06.yaml
@@ -1273,6 +1273,12 @@
   id: totrans-131
   prefs: []
   type: TYPE_NORMAL
+  zh: θ[*i*] = θ[*i*-1] - (ϵ/(δ + √**ṽ**[*i*])) ⊙ **m̃**[*i*]
 - en: 'Recently, Adam has gained popularity because of its corrective measures against
     the zero initialization bias (a weakness of RMSProp) and its ability to combine
     the core concepts behind RMSProp with momentum more effectively. PyTorch exposes
@@ -1280,6 +1286,7 @@
   id: totrans-132
   prefs: []
   type: TYPE_NORMAL
+  zh: 最近,Adam因其对零初始化偏差(RMSProp的一个弱点)的纠正措施以及其更有效地将RMSProp的核心概念与动量结合起来而变得流行。PyTorch通过以下构造函数公开了Adam优化器:
 - en: '[PRE12]'
   id: totrans-133
   prefs: []
   type: TYPE_NORMAL
@@ -1292,11 +1299,13 @@
   id: totrans-134
   prefs: []
   type: TYPE_NORMAL
+  zh: PyTorch中Adam的默认超参数设置通常表现良好,但Adam对于超参数选择通常是健壮的。唯一的例外是在某些情况下,学习率可能需要从默认值0.001进行修改。
 - en: The Philosophy Behind Optimizer Selection
   id: totrans-135
   prefs:
   - PREF_H1
   type: TYPE_NORMAL
+  zh: 优化器选择背后的哲学
 - en: In this chapter, we’ve discussed several strategies that are used to make navigating
     the complex error surfaces of deep networks more tractable. These strategies have
     culminated in several optimization algorithms, each with its own benefits and
@@ -1304,6 +1313,7 @@
   id: totrans-136
   prefs: []
   type: TYPE_NORMAL
+  zh: 在本章中,我们讨论了几种用于使深度网络复杂误差曲面更易处理的策略。这些策略已经发展成为几种优化算法,每种算法都有其自身的优点和缺点。
 - en: While it would be awfully nice to know when to use which algorithm, there is
     very little consensus among expert practitioners. Currently, the most popular algorithms
     are minibatch gradient descent, minibatch gradient with momentum, RMSProp, RMSProp with momentum, Adam,
@@ -1313,6 +1323,7 @@
   id: totrans-137
   prefs: []
   type: TYPE_NORMAL
+  zh: 虽然知道何时使用哪种算法会很好,但在专业从业者中几乎没有共识。目前最流行的算法是小批量梯度下降、带动量的小批量梯度下降、RMSProp、带动量的RMSProp、Adam和AdaDelta(我们这里没有讨论,但PyTorch也支持)。我们鼓励您在我们构建的前馈网络模型上尝试这些优化算法。
 - en: One important point, however, is that for most deep learning practitioners,
     the best way to push the cutting edge of deep learning is not by building more
     advanced optimizers. Instead, the vast majority of breakthroughs in deep learning
@@ -1323,11 +1334,13 @@
   id: totrans-138
   prefs: []
   type: TYPE_NORMAL
+  zh: 然而,有一点很重要:对于大多数深度学习从业者来说,推动深度学习前沿的最佳方式并不是构建更先进的优化器。相反,过去几十年深度学习中的绝大多数突破是通过发现更容易训练的架构来实现的,而不是试图应对恶劣的误差曲面。我们将在本书的其余部分开始关注如何利用架构更有效地训练神经网络。
 - en: Summary
   id: totrans-139
   prefs:
   - PREF_H1
   type: TYPE_NORMAL
+  zh: 摘要
 - en: In this chapter, we discussed several challenges that arise when trying to train
     deep networks with complex error surfaces. We discussed how while the challenges
     of spurious local minima are likely exaggerated, saddle points and ill-conditioning
@@ -1339,30 +1352,35 @@
   id: totrans-140
   prefs: []
   type: TYPE_NORMAL
+  zh: 在本章中,我们讨论了在尝试训练具有复杂误差曲面的深度网络时出现的几个挑战。我们讨论了虽然虚假局部最小值的挑战可能被夸大了,但鞍点和病态确实对普通的小批量梯度下降的成功构成了严重威胁。我们描述了如何使用动量来克服病态,并简要讨论了近期关于近似Hessian矩阵的二阶方法的研究。我们还描述了自适应学习率优化器的演变,这些优化器在训练过程中调整学习率以实现更好的收敛。
 - en: Next, we’ll begin tackling the larger issue of network architecture and design.
     We’ll explore computer vision and how we might design deep networks that learn
     effectively from complex images.
   id: totrans-141
   prefs: []
   type: TYPE_NORMAL
+  zh: 接下来,我们将开始解决网络架构和设计这一更大的问题。我们将探讨计算机视觉,以及我们如何设计能够从复杂图像中有效学习的深度网络。
 - en: '^([1](ch06.xhtml#idm45934168902672-marker)) Bengio, Yoshua, et al. “Greedy
     Layer-Wise Training of Deep Networks.” *Advances in Neural Information Processing
     Systems* 19 (2007): 153.'
   id: totrans-142
   prefs: []
   type: TYPE_NORMAL
+  zh: ^([1]) Bengio, Yoshua等人。“深度网络的贪婪逐层训练。”*神经信息处理系统的进展* 19(2007):153。
 - en: ^([2](ch06.xhtml#idm45934168861872-marker)) Goodfellow, Ian J., Oriol Vinyals,
     and Andrew M. Saxe. “Qualitatively characterizing neural network optimization
     problems.” *arXiv preprint arXiv*:1412.6544 (2014).
   id: totrans-143
   prefs: []
   type: TYPE_NORMAL
+  zh: ^([2]) Goodfellow, Ian J.、Oriol Vinyals和Andrew M. Saxe。“定性地表征神经网络优化问题。”*arXiv预印本arXiv*:1412.6544(2014)。
 - en: ^([3](ch06.xhtml#idm45934164995408-marker)) Dauphin, Yann N., et al. “Identifying
     and attacking the saddle point problem in high-dimensional non-convex optimization.”
     *Advances in Neural Information Processing Systems*. 2014.
   id: totrans-144
   prefs: []
   type: TYPE_NORMAL
+  zh: ^([3]) Dauphin, Yann N.等人。“在高维非凸优化中识别和攻击鞍点问题。”*神经信息处理系统的进展*。2014年。
 - en: '^([4](ch06.xhtml#idm45934167589056-marker)) Polyak, Boris T. “Some methods
     of speeding up the convergence of iteration methods.” *USSR Computational Mathematics
     and Mathematical Physics* 4.5 (1964): 1-17.'
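For reviewers checking the `[PRE12]` constructor reference in totrans-132 and the default learning rate of 0.001 mentioned in totrans-134: the PyTorch constructor in question is presumably `torch.optim.Adam`. Below is a minimal usage sketch; the toy linear model, dummy batch, and explicit `lr=1e-3` are illustrative assumptions, not code taken from the book.

```
# Minimal sketch of constructing and stepping PyTorch's Adam optimizer.
# The model, data, and lr value are assumptions for illustration only.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)        # stand-in model (assumption)
criterion = nn.MSELoss()

# Adam's defaults (lr=1e-3, betas=(0.9, 0.999), eps=1e-8) usually work well;
# the learning rate is the hyperparameter most often changed from its default.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x, y = torch.randn(32, 10), torch.randn(32, 1)   # dummy batch (assumption)
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```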