2024-02-08 19:31:19

OpenDocCN · Feb 8, 2024 · 73f72c0
1 parent 2492d5d
commit 73f72c0
Showing 2 changed files with 569 additions and 0 deletions.
diff --git a/totrans/prac-dl-cld_04.yaml b/totrans/prac-dl-cld_04.yaml
@@ -1928,11 +1928,14 @@
   prefs:
   - PREF_H6
   type: TYPE_NORMAL
+  zh: 图4-22。使用从音频预测的潜在因素进行预测使用模式分布的t-SNE可视化（图片来源：“深度基于内容的音乐推荐”由Aaron van den Oord，Sander
+    Dieleman，Benjamin Schrauwen，NIPS 2013）
 - en: Image Captioning
   id: totrans-275
   prefs:
   - PREF_H2
   type: TYPE_NORMAL
+  zh: 图像字幕
 - en: Image captioning is the science of translating an image into a sentence (as
     illustrated in [Figure 4-23](part0006.html#image_captioning_feature_in_seeing_aicol)).
     Going beyond just object tagging, this requires a deeper visual understanding
@@ -1945,17 +1948,21 @@
   id: totrans-276
   prefs: []
   type: TYPE_NORMAL
+  zh: 图像字幕是将图像翻译成句子的科学（如[图4-23](part0006.html#image_captioning_feature_in_seeing_aicol)所示）。这不仅仅是物体标记，还需要对整个图像和物体之间的关系有更深入的视觉理解。为了训练这些模型，一个名为MS
+    COCO的开源数据集于2014年发布，其中包括超过30万张图像，以及物体类别、句子描述、视觉问答对和物体分割。它作为每年竞赛的基准，用于观察图像字幕、物体检测和分割的进展。
 - en: '![Image captioning feature in Seeing AI: the Talking Camera App for the blind
     community](../images/00309.jpeg)'
   id: totrans-277
   prefs: []
   type: TYPE_IMG
+  zh: '![Seeing AI中的图像字幕功能：盲人社区的Talking Camera App](../images/00309.jpeg)'
 - en: 'Figure 4-23\. Image captioning feature in Seeing AI: the Talking Camera App
     for the blind community'
   id: totrans-278
   prefs:
   - PREF_H6
   type: TYPE_NORMAL
+  zh: 图4-23。Seeing AI中的图像字幕功能：盲人社区的Talking Camera App
 - en: A common strategy applied in the first year of the challenge (2015) was to append
     a language model (LSTM/RNN) with a CNN in such a way that the output of a CNN
     feature vector is taken as the input to the language model (LSTM/RNN). This combined
@@ -1970,6 +1977,7 @@
   id: totrans-279
   prefs: []
   type: TYPE_NORMAL
+  zh: 在挑战的第一年（2015年）中应用的一种常见策略是将语言模型（LSTM/RNN）与CNN结合起来，以使CNN特征向量的输出作为语言模型（LSTM/RNN）的输入。这种组合模型以端到端的方式联合训练，取得了令人印象深刻的结果，震惊了世界。尽管每个研究实验室都在努力超越对方，但后来发现进行简单的最近邻搜索可以产生最先进的结果。对于给定的图像，根据嵌入的相似性找到相似的图像。然后，注意相似图像字幕中的共同词，并打印包含最常见词的字幕。简而言之，懒惰的方法仍然能击败最先进的方法，这暴露了数据集中的一个关键偏见。
 - en: 'This bias has been coined the *Giraffe-Tree* problem by Larry Zitnick. Do an
     image search for “giraffe” on a search engine. Look closely: in addition to giraffe,
     is there grass in almost every image? Chances are you can describe the majority
@@ -1983,28 +1991,33 @@
   id: totrans-280
   prefs: []
   type: TYPE_NORMAL
+  zh: 这种偏见被Larry Zitnick称为*长颈鹿-树*问题。在搜索引擎上搜索“长颈鹿”进行图像搜索。仔细观察：除了长颈鹿，几乎每张图像中都有草吗？很有可能你可以将这些图像的大多数描述为“一只长颈鹿站在草地上”。同样，如果查询图像（如[图4-24](part0006.html#the_giraffe-tree_problem_left_parenthesi)中最左边的照片）包含一只长颈鹿和一棵树，几乎所有相似的图像（右边）都可以描述为“一只长颈鹿站在草地上，旁边有一棵树”。即使没有对图像有更深入的理解，也可以通过简单的最近邻搜索得出正确的字幕。这表明为了衡量系统的真正智能，我们需要在测试集中加入更多语义上新颖/原创的图像。
 - en: '![The Giraffe-Tree problem (image source: Measuring Machine Intelligence Through
     Visual Question Answering, C. Lawrence Zitnick, Aishwarya Agrawal, Stanislaw Antol,
     Margaret Mitchell, Dhruv Batra, Devi Parikh)](../images/00268.jpeg)'
   id: totrans-281
   prefs: []
   type: TYPE_IMG
+  zh: '![长颈鹿-树问题（图片来源：通过视觉问答测量机器智能，C.劳伦斯·齐特尼克，艾丝瓦里亚·阿格拉瓦尔，斯坦尼斯洛夫·安托尔，玛格丽特·米切尔，德鲁夫·巴特拉，黛薇·帕里克）](../images/00268.jpeg)'
 - en: 'Figure 4-24\. The Giraffe-Tree problem (image source: Measuring Machine Intelligence
     Through Visual Question Answering, C. Lawrence Zitnick, Aishwarya Agrawal, Stanislaw
     Antol, Margaret Mitchell, Dhruv Batra, Devi Parikh)'
   id: totrans-282
   prefs:
   - PREF_H6
   type: TYPE_NORMAL
+  zh: 图4-24。长颈鹿-树问题（图片来源：通过视觉问答测量机器智能，C.劳伦斯·齐特尼克，艾丝瓦里亚·阿格拉瓦尔，斯坦尼斯洛夫·安托尔，玛格丽特·米切尔，德鲁夫·巴特拉，黛薇·帕里克）
 - en: In short, don’t underestimate a simple nearest-neighbor approach!
   id: totrans-283
   prefs: []
   type: TYPE_NORMAL
+  zh: 简而言之，不要低估简单的最近邻方法！
 - en: Summary
   id: totrans-284
   prefs:
   - PREF_H1
   type: TYPE_NORMAL
+  zh: 总结
 - en: Now we are at the end of a successful expedition where we explored locating
     similar images with the help of embeddings. We took this one level further by
     exploring how to scale searches from a few thousand to a few billion documents