2024-02-08 19:31:19
wizardforcel committed Feb 8, 2024
1 parent 2492d5d commit 73f72c0
Showing 2 changed files with 569 additions and 0 deletions.
13 changes: 13 additions & 0 deletions totrans/prac-dl-cld_04.yaml
@@ -1928,11 +1928,14 @@
prefs:
- PREF_H6
type: TYPE_NORMAL
zh: 图4-22。使用从音频预测的潜在因素进行预测使用模式分布的t-SNE可视化(图片来源:“深度基于内容的音乐推荐”由Aaron van den Oord,Sander
Dieleman,Benjamin Schrauwen,NIPS 2013)
- en: Image Captioning
id: totrans-275
prefs:
- PREF_H2
type: TYPE_NORMAL
zh: 图像字幕
- en: Image captioning is the science of translating an image into a sentence (as
illustrated in [Figure 4-23](part0006.html#image_captioning_feature_in_seeing_aicol)).
Going beyond just object tagging, this requires a deeper visual understanding
@@ -1945,17 +1948,21 @@
id: totrans-276
prefs: []
type: TYPE_NORMAL
zh: 图像字幕是将图像翻译成句子的科学(如[图4-23](part0006.html#image_captioning_feature_in_seeing_aicol)所示)。这不仅仅是物体标记,还需要对整个图像和物体之间的关系有更深入的视觉理解。为了训练这些模型,一个名为MS
COCO的开源数据集于2014年发布,其中包括超过30万张图像,以及物体类别、句子描述、视觉问答对和物体分割。它作为每年竞赛的基准,用于观察图像字幕、物体检测和分割的进展。
- en: '![Image captioning feature in Seeing AI: the Talking Camera App for the blind
community](../images/00309.jpeg)'
id: totrans-277
prefs: []
type: TYPE_IMG
zh: '![Seeing AI中的图像字幕功能:盲人社区的Talking Camera App](../images/00309.jpeg)'
- en: 'Figure 4-23\. Image captioning feature in Seeing AI: the Talking Camera App
for the blind community'
id: totrans-278
prefs:
- PREF_H6
type: TYPE_NORMAL
zh: 图4-23。Seeing AI中的图像字幕功能:盲人社区的Talking Camera App
- en: A common strategy applied in the first year of the challenge (2015) was to attach
a language model (LSTM/RNN) to a CNN so that the feature vector output by the CNN
is taken as the input to the language model (LSTM/RNN). This combined
@@ -1970,6 +1977,7 @@
id: totrans-279
prefs: []
type: TYPE_NORMAL
zh: 在挑战的第一年(2015年)中应用的一种常见策略是将语言模型(LSTM/RNN)与CNN结合起来,以使CNN特征向量的输出作为语言模型(LSTM/RNN)的输入。这种组合模型以端到端的方式联合训练,取得了令人印象深刻的结果,震惊了世界。尽管每个研究实验室都在努力超越对方,但后来发现进行简单的最近邻搜索可以产生最先进的结果。对于给定的图像,根据嵌入的相似性找到相似的图像。然后,注意相似图像字幕中的共同词,并打印包含最常见词的字幕。简而言之,懒惰的方法仍然能击败最先进的方法,这暴露了数据集中的一个关键偏见。
- en: 'This bias has been coined the *Giraffe-Tree* problem by Larry Zitnick. Do an
image search for “giraffe” on a search engine. Look closely: in addition to giraffe,
is there grass in almost every image? Chances are you can describe the majority
@@ -1983,28 +1991,33 @@
id: totrans-280
prefs: []
type: TYPE_NORMAL
zh: 这种偏见被Larry Zitnick称为*长颈鹿-树*问题。在搜索引擎上搜索“长颈鹿”进行图像搜索。仔细观察:除了长颈鹿,几乎每张图像中都有草吗?很有可能你可以将这些图像的大多数描述为“一只长颈鹿站在草地上”。同样,如果查询图像(如[图4-24](part0006.html#the_giraffe-tree_problem_left_parenthesi)中最左边的照片)包含一只长颈鹿和一棵树,几乎所有相似的图像(右边)都可以描述为“一只长颈鹿站在草地上,旁边有一棵树”。即使没有对图像有更深入的理解,也可以通过简单的最近邻搜索得出正确的字幕。这表明为了衡量系统的真正智能,我们需要在测试集中加入更多语义上新颖/原创的图像。
- en: '![The Giraffe-Tree problem (image source: Measuring Machine Intelligence Through
Visual Question Answering, C. Lawrence Zitnick, Aishwarya Agrawal, Stanislaw Antol,
Margaret Mitchell, Dhruv Batra, Devi Parikh)](../images/00268.jpeg)'
id: totrans-281
prefs: []
type: TYPE_IMG
zh: '![长颈鹿-树问题(图片来源:通过视觉问答测量机器智能,C.劳伦斯·齐特尼克,艾丝瓦里亚·阿格拉瓦尔,斯坦尼斯洛夫·安托尔,玛格丽特·米切尔,德鲁夫·巴特拉,黛薇·帕里克)](../images/00268.jpeg)'
- en: 'Figure 4-24\. The Giraffe-Tree problem (image source: Measuring Machine Intelligence
Through Visual Question Answering, C. Lawrence Zitnick, Aishwarya Agrawal, Stanislaw
Antol, Margaret Mitchell, Dhruv Batra, Devi Parikh)'
id: totrans-282
prefs:
- PREF_H6
type: TYPE_NORMAL
zh: 图4-24。长颈鹿-树问题(图片来源:通过视觉问答测量机器智能,C.劳伦斯·齐特尼克,艾丝瓦里亚·阿格拉瓦尔,斯坦尼斯洛夫·安托尔,玛格丽特·米切尔,德鲁夫·巴特拉,黛薇·帕里克)
- en: In short, don’t underestimate a simple nearest-neighbor approach!
id: totrans-283
prefs: []
type: TYPE_NORMAL
zh: 简而言之,不要低估简单的最近邻方法!
- en: Summary
id: totrans-284
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: 总结
- en: Now we are at the end of a successful expedition where we explored locating
similar images with the help of embeddings. We took this one level further by
exploring how to scale searches from a few thousand to a few billion documents

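The nearest-neighbor captioning baseline described in entry totrans-279 finds images with similar embeddings, notes the words common to their captions, and reuses the caption containing the most common words. The following is a minimal, hypothetical sketch of that idea and is not code from the book or from any competition entry. It assumes that image embeddings have already been computed by some CNN, that cosine similarity is the similarity measure, and that each neighbor caption is scored by the average frequency of its distinct words among the neighbors. The names `nearest_neighbor_caption` and `consensus_score`, the parameter `k`, and the toy vectors are all illustrative assumptions.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbor_caption(query, database, k=3):
    """Caption `query` by reusing a caption from its k nearest neighbors.

    query:    embedding of the new image (list of floats), assumed precomputed
    database: list of (embedding, non-empty caption) pairs for captioned images
    k:        number of neighbors to consult (illustrative default)
    """
    # 1. Rank captioned images by embedding similarity to the query; keep the top k.
    ranked = sorted(
        database,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    neighbor_captions = [caption for _, caption in ranked[:k]]

    # 2. Count how many neighbor captions each word appears in.
    word_counts = Counter()
    for caption in neighbor_captions:
        word_counts.update(set(caption.lower().split()))

    # 3. Score each caption by the average count of its distinct words, so a
    #    caption built only from widely shared words scores highest.
    def consensus_score(caption):
        words = set(caption.lower().split())
        return sum(word_counts[w] for w in words) / len(words)

    return max(neighbor_captions, key=consensus_score)


# Toy 3-D "embeddings" standing in for real CNN features (hypothetical data).
database = [
    ([0.90, 0.10, 0.00], "a giraffe standing in the grass"),
    ([0.80, 0.20, 0.10], "a giraffe standing in the grass next to a tree"),
    ([0.85, 0.15, 0.05], "a tall giraffe standing in grass"),
    ([0.00, 0.10, 0.90], "a red bus driving down the street"),
]
print(nearest_neighbor_caption([0.88, 0.12, 0.02], database))  # → a giraffe standing in the grass
```

On this toy data, the bus image never ranks among the three nearest neighbors. The generic caption "a giraffe standing in the grass" wins because every one of its words is shared by the other neighbors. This is the Giraffe-Tree bias described in totrans-280: a plausible caption can emerge without any deeper understanding of the image. Averaging rather than summing the word counts is a design choice in this sketch. It keeps longer captions from winning simply because they contain more words.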