diff --git a/totrans/prac-dl-cld_03.yaml b/totrans/prac-dl-cld_03.yaml index 9e5e338..6f122c2 100644 --- a/totrans/prac-dl-cld_03.yaml +++ b/totrans/prac-dl-cld_03.yaml @@ -131,15 +131,18 @@ id: totrans-16 prefs: [] type: TYPE_NORMAL + zh: 在机器学习中,我们需要将数据转换为一组可识别的特征,然后添加一个分类算法对它们进行分类。CNN也是如此。它们由两部分组成:卷积层和全连接层。卷积层的工作是将图像的大量像素转换为一个更小的表示;即特征。全连接层将这些特征转换为概率。全连接层实际上是一个具有隐藏层的神经网络,正如我们在[第1章](part0003.html#2RHM3-13fa565533764549a6f0ab7f11eed62b)中看到的那样。总之,卷积层充当特征提取器,而全连接层充当分类器。[图3-2](part0005.html#a_high-level_overview_of_a_convolutional)显示了CNN的高级概述。 - en: '![A high-level overview of a Convolutional Neural Network](../images/00082.jpeg)' id: totrans-17 prefs: [] type: TYPE_IMG + zh: '![卷积神经网络的高级概述](../images/00082.jpeg)' - en: Figure 3-2\. A high-level overview of a CNN id: totrans-18 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图3-2. CNN的高级概述 - en: Imagine that we want to detect a human face. We might want to use a CNN to classify an image and determine whether it contains a face. Such a CNN would be composed of several layers connected one after another. These layers represent mathematical @@ -149,6 +152,7 @@ id: totrans-19 prefs: [] type: TYPE_NORMAL + zh: 想象一下,我们想要检测一个人脸。我们可能想要使用CNN对图像进行分类,并确定其中是否包含人脸。这样的CNN由几个层连接在一起组成。这些层代表数学运算。一个层的输出是下一个层的输入。第一个(或最底层)是输入层,输入图像被馈送到这里。最后一个(或最顶层)是输出层,给出预测。 - en: The way it works is the image is fed into the CNN and passes through a series of layers, with each performing a mathematical operation and passing the result to the subsequent layer. The resulting output is a list of object classes and @@ -158,11 +162,13 @@ id: totrans-20 prefs: [] type: TYPE_NORMAL + zh: 它的工作方式是将图像馈送到CNN中,并通过一系列层,每个层执行数学运算并将结果传递给下一个层。最终的输出是一个对象类别列表及其概率。例如,类别如球—65%,草—20%,等等。如果图像的输出包含一个“人脸”类别,概率为70%,我们可以得出结论,图像中包含人脸的可能性为70%。 - en: Note id: totrans-21 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 注意 - en: An intuitive (and overly simplified) way to look at CNNs is to see them as a series of filters. As the word filter implies, each layer acts as a sieve of information, letting something “pass through” only if it recognizes it. (If you have heard @@ -174,6 +180,7 @@ id: totrans-22 prefs: [] type: TYPE_NORMAL + zh: 看待CNN的一种直观(和过于简化的)方式是将它们视为一系列滤波器。正如“滤波器”一词所暗示的,每个层都充当信息的筛子,只有在识别到信息时才“通过”。(如果你听说过电子学中的高通和低通滤波器,这可能会很熟悉。)我们说该层对该信息“激活”。每个层对类似猫、狗、汽车等部分的视觉模式被激活。如果一个层没有识别信息(由于训练时学到的内容),其输出接近于零。CNN是深度学习世界的“保安”! - en: In the facial detection example, lower-level layers ([Figure 3-3](part0005.html#left_parenthesisaright_parenthesis_lower), a; layers that are closer to the input image) are “activated” for simpler shapes; for example, edges and curves. 
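Before continuing with the face-detection example, the two-part structure described above, convolutional layers acting as the feature extractor and fully connected layers acting as the classifier, can be made concrete with a minimal Keras sketch. The layer sizes and the ten-class output below are illustrative assumptions, not values from the chapter:

```python
from tensorflow.keras import layers, models

# Illustrative sketch only: a tiny CNN showing the two parts described above.
model = models.Sequential([
    # Convolutional layers: turn raw pixels into a smaller set of features
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    # Fully connected layers: turn those features into class probabilities
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax'),  # assumed 10 output classes
])
model.summary()
```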
Because these layers activate only for basic shapes, @@ -190,6 +197,10 @@ id: totrans-23 prefs: [] type: TYPE_NORMAL + zh: 在人脸检测示例中,较低级别的层([图3-3](part0005.html#left_parenthesisaright_parenthesis_lower) + a; 靠近输入图像的层)被“激活”以获取更简单的形状;例如,边缘和曲线。因为这些层仅对基本形状激活,所以它们可以很容易地被重新用于不同于人脸识别的目的,比如检测汽车(毕竟每个图像都由边缘和曲线组成)。中级别的层([图3-3](part0005.html#left_parenthesisaright_parenthesis_lower) + b)被激活以获取更复杂的形状,比如眼睛、鼻子和嘴唇。这些层不像较低级别的层那样容易被重复使用。它们可能不太适用于检测汽车,但可能仍然适用于检测动物。更高级别的层([图3-3](part0005.html#left_parenthesisaright_parenthesis_lower) + c)被激活以获取更复杂的形状,例如大部分人脸。这些层往往更具任务特定性,因此在其他图像分类问题中最不可重复使用。 - en: '![(a) Lower level activations, followed by (b) mid-level activations and (c) upper layer activations (image source: Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations, Lee et al., ICML @@ -197,6 +208,8 @@ id: totrans-24 prefs: [] type: TYPE_IMG + zh: (a)较低级别的激活,接着是(b)中级别的激活和(c)上层的激活(图片来源:Lee等人的《用于可扩展无监督学习的分层表示的卷积深度信念网络》,ICML + 2009) - en: 'Figure 3-3\. (a) Lower-level activations, followed by (b) midlevel activations and (c) upper-layer activations (image source: Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations, Lee et al., diff --git a/totrans/prac-dl-cld_04.yaml b/totrans/prac-dl-cld_04.yaml index a74a2df..7556dae 100644 --- a/totrans/prac-dl-cld_04.yaml +++ b/totrans/prac-dl-cld_04.yaml @@ -1,7 +1,9 @@ - en: 'Chapter 4\. Building a Reverse Image Search Engine: Understanding Embeddings' + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 第四章。构建反向图像搜索引擎:理解嵌入 - en: 'Bob just bought a new home and is looking to fill it up with some fancy modern furniture. He’s flipping endlessly through furniture catalogs and visiting furniture showrooms, but hasn’t yet landed on something he likes. Then one day, he spots @@ -13,16 +15,20 @@ luck: no one knows this particular brand. And searching on the internet with keywords like “white L-shaped,” “modern sofa” gives him thousands of results, but not the one he’s looking for.' + id: totrans-1 prefs: [] type: TYPE_NORMAL + zh: 鲍勃刚刚买了一套新房子,正在寻找一些时尚现代的家具来填充它。他不停地翻阅家具目录,参观家具展厅,但还没有找到自己喜欢的东西。然后有一天,他看到了他梦寐以求的沙发——一张独特的L形白色现代沙发在一个办公室接待处。好消息是他知道自己想要什么。坏消息是他不知道从哪里购买。沙发上没有写品牌和型号号码。询问办公室经理也没有帮助。所以,他从不同角度拍了几张照片,想在当地的家具店里打听,但运气不佳:没有人知道这个特定品牌。在互联网上使用“白色L形”、“现代沙发”等关键词搜索给他带来了成千上万的结果,但没有他在找的那个。 - en: Alice hears Bob’s frustration and asks, “Why don’t you try reverse image search?” Bob uploads his images on Google and Bing’s Reverse Image Search and quickly spots a similar-looking image on an online shopping website. Taking this more perfect image from the website, he does a few more reverse image searches and finds other websites offering the same sofa at cheaper prices. After a few minutes of being online, Bob has officially ordered his dream sofa! + id: totrans-2 prefs: [] type: TYPE_NORMAL + zh: 爱丽丝听到鲍勃的沮丧,问道:“为什么不试试反向图像搜索?”鲍勃将他的图像上传到谷歌和必应的反向图像搜索,很快在一个在线购物网站上发现了一张看起来相似的图像。从网站上找到这张更完美的图像后,他进行了更多的反向图像搜索,发现其他网站以更便宜的价格提供相同的沙发。在上网几分钟后,鲍勃正式订购了他梦寐以求的沙发! - en: '*Reverse image search* (or as it is more technically known, *instance retrieval*) enables developers and researchers to build scenarios beyond simple keyword search. From discovering visually similar objects on Pinterest to recommending similar @@ -31,46 +37,66 @@ infringement when their photographs are posted without consent on the internet. Even face recognition in several security systems uses a similar concept to ascertain the identity of the person.' 
+ id: totrans-3 prefs: [] type: TYPE_NORMAL + zh: '*反向图像搜索*(或更专业地称为*实例检索*)使开发人员和研究人员能够构建超越简单关键字搜索的场景。从在Pinterest上发现视觉上相似的对象到在Spotify上推荐相似的歌曲,再到亚马逊基于相机的产品搜索,底层使用的是一类类似的技术。像TinEye这样的网站在摄影师的照片未经同意发布在互联网上时会发出侵权警告。甚至在几个安全系统中的人脸识别也使用类似的概念来确定人的身份。' - en: The best part is, with the right knowledge, you can build a working replica of many of these products in a few hours. So let’s dig right in! + id: totrans-4 prefs: [] type: TYPE_NORMAL + zh: 最好的部分是,有了正确的知识,你可以在几个小时内构建许多这些产品的工作副本。所以让我们开始吧! - en: 'Here’s what we’re doing in this chapter:' + id: totrans-5 prefs: [] type: TYPE_NORMAL + zh: 这是我们在本章中要做的事情: - en: Performing feature extraction and similarity search on Caltech101 and Caltech256 datasets + id: totrans-6 prefs: - PREF_OL type: TYPE_NORMAL + zh: 在Caltech101和Caltech256数据集上执行特征提取和相似性搜索 - en: Learning how to scale to large datasets (up to billions of images) + id: totrans-7 prefs: - PREF_OL type: TYPE_NORMAL + zh: 学习如何扩展到大型数据集(多达数十亿张图像) - en: Making the system more accurate and optimized + id: totrans-8 prefs: - PREF_OL type: TYPE_NORMAL + zh: 使系统更准确和优化 - en: Analyzing case studies to see how these concepts are used in mainstream products + id: totrans-9 prefs: - PREF_OL type: TYPE_NORMAL + zh: 分析案例研究,看看这些概念在主流产品中如何使用 - en: Image Similarity + id: totrans-10 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 图像相似度 - en: 'The first and foremost question is: given two images, are they similar or not?' + id: totrans-11 prefs: [] type: TYPE_NORMAL + zh: 首要问题是:给定两幅图像,它们是否相似? - en: There are several approaches to this problem. One approach is to compare patches of areas between two images. Although this can help find exact or near-exact images (that might have been cropped), even a slight rotation would result in dissimilarity. By storing the hashes of the patches, duplicates of an image can be found. One use case for this approach would be the identification of plagiarism in photographs. + id: totrans-12 prefs: [] type: TYPE_NORMAL + zh: 解决这个问题有几种方法。一种方法是比较两幅图像之间的区域块。尽管这可以帮助找到精确或接近精确的图像(可能已经被裁剪),但即使轻微旋转也会导致不相似。通过存储区块的哈希值,可以找到图像的重复。这种方法的一个用例是在照片中识别抄袭行为。 - en: Another naive approach is to calculate the histogram of RGB values and compare their similarities. This might help find near-similar images captured in the same environment without much change in the contents. For example, in [Figure 4-1](part0006.html#rgb_histogram-based_quotation_marksimila), @@ -79,15 +105,21 @@ rest. Of course, there is an increasing possibility of false positives as your dataset grows. Another downside to this approach is that small changes to the color, hue, or white balance would make recognition more difficult. + id: totrans-13 prefs: [] type: TYPE_NORMAL + zh: 另一种天真的方法是计算RGB值的直方图并比较它们的相似性。这可能有助于找到在相同环境中拍摄的内容没有太多变化的近似图像。例如,在[图4-1](part0006.html#rgb_histogram-based_quotation_marksimila)中,这种技术用于图像去重软件,旨在找到硬盘上的照片爆发,这样您就可以选择最好的照片并删除其余的照片。当数据集增长时,误报的可能性会增加。这种方法的另一个缺点是,对颜色、色调或白平衡进行小的更改会使识别变得更加困难。 - en: '![RGB histogram-based “Similar Image Detector” program](../images/00280.jpeg)' + id: totrans-14 prefs: [] type: TYPE_IMG + zh: '![基于RGB直方图的“相似图像检测器”程序](../images/00280.jpeg)' - en: Figure 4-1\. RGB histogram-based “Similar Image Detector” program + id: totrans-15 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图4-1。基于RGB直方图的“相似图像检测器”程序 - en: A more robust traditional computer vision-based approach is to find visual features near edges using algorithms like Scale-Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF), and Oriented FAST and Rotated BRIEF (ORB) and then @@ -101,12 +133,15 @@ on the Amazon app. 
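As a rough illustration of the keypoint-based approach just described (this is a generic OpenCV ORB sketch, not the Amazon implementation; the file names are placeholders):

```python
import cv2

# Illustrative sketch of ORB keypoint matching; 'query.jpg' and 'catalog.jpg'
# are placeholder file names.
img1 = cv2.imread('query.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('catalog.jpg', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors with Hamming distance; a large number of close matches
# suggests the two images contain the same object.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(len(matches))
```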
The app displays these features in the form of blue dots ([Figure 4-2](part0006.html#product_scanner_in_amazon_app_with_visua)). When it sees a sufficient number of features, it sends them to the Amazon servers to retrieve product information. + id: totrans-16 prefs: [] type: TYPE_NORMAL - en: '![Product scanner in Amazon app with visual features highlighted](../images/00045.jpeg)' + id: totrans-17 prefs: [] type: TYPE_IMG - en: Figure 4-2\. Product scanner in Amazon app with visual features highlighted + id: totrans-18 prefs: - PREF_H6 type: TYPE_NORMAL @@ -122,6 +157,7 @@ it requires enormous volumes of labeled data to train the classifier for extracting these labels on new images. And every time a new category needs to be added, the model needs to be retrained. + id: totrans-19 prefs: [] type: TYPE_NORMAL - en: Because our aim is to search among millions of images, what we ideally need @@ -129,6 +165,7 @@ image into a smaller representation (of say a few thousand dimensions), and have this summarized representation be close together for similar objects and further away for dissimilar items. + id: totrans-20 prefs: [] type: TYPE_NORMAL - en: Luckily, deep neural networks come to the rescue. As we saw in [Chapter 2](part0004.html#3Q283-13fa565533764549a6f0ab7f11eed62b) @@ -146,9 +183,11 @@ from different classes are separated by larger distances. This is an important property that helps solve so many problems where a classifier can’t be used, especially in unsupervised problems because of a lack of adequate labeled data. + id: totrans-21 prefs: [] type: TYPE_NORMAL - en: Tip + id: totrans-22 prefs: - PREF_H6 type: TYPE_NORMAL @@ -156,18 +195,26 @@ example, pass the images through a pretrained convolutional neural network like ResNet-50, extract the features, and then use a metric to calculate the error rate like the Euclidean distance. + id: totrans-23 prefs: [] type: TYPE_NORMAL + zh: 找到相似图像的理想方法是使用*迁移学习*。例如,通过预训练的卷积神经网络(如ResNet-50)传递图像,提取特征,然后使用度量来计算错误率,如欧几里德距离。 - en: Enough talk, let’s code! + id: totrans-24 prefs: [] type: TYPE_NORMAL + zh: 说了这么多,让我们写代码吧! - en: Feature Extraction + id: totrans-25 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 特征提取 - en: An image is worth a thousand ~~words~~ features. + id: totrans-26 prefs: [] type: TYPE_NORMAL + zh: 一张图像胜过千言万语的特征。 - en: In this section, we play with and understand the concepts of feature extraction, primarily with the Caltech 101 dataset (131 MB, approximately 9,000 images), and then eventually with Caltech 256 (1.2 GB, approximately 30,000 images). Caltech @@ -177,137 +224,210 @@ in the first 101 categories, which needs to be deleted before we begin experimenting. Remember that all of the code we are writing is also available in the [GitHub repository](http://PracticalDeepLearning.ai). 
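One housekeeping step worth spelling out: after the archive is extracted, the extra “BACKGROUND_Google” folder mentioned above has to be removed. A minimal sketch, assuming a local extraction path (the path itself is an assumption):

```python
import shutil
from pathlib import Path

# Assumed local path where the Caltech-101 archive was extracted.
dataset_dir = Path('datasets/caltech101/101_ObjectCategories')
background_dir = dataset_dir / 'BACKGROUND_Google'

# Delete the clutter class so only the 101 real categories remain.
if background_dir.exists():
    shutil.rmtree(background_dir)
```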
+ id: totrans-27 prefs: [] type: TYPE_NORMAL + zh: 在这一部分中,我们将主要使用Caltech 101数据集(131 MB,约9,000张图像)来玩耍和理解特征提取的概念,然后最终使用Caltech 256(1.2 + GB,约30,000张图像)。Caltech 101,顾名思义,包含大约9,000张图像,分为101个类别,每个类别大约有40到800张图像。需要注意的是,有一个第102个类别称为“BACKGROUND_Google”,其中包含随机图像,不包含在前101个类别中,需要在我们开始实验之前删除。请记住,我们编写的所有代码也可以在[GitHub存储库](http://PracticalDeepLearning.ai)中找到。 - en: 'Let’s download the dataset:' + id: totrans-28 prefs: [] type: TYPE_NORMAL + zh: 让我们下载数据集: - en: '[PRE0]' + id: totrans-29 prefs: [] type: TYPE_PRE + zh: '[PRE0]' - en: 'Now, import all of the necessary modules:' + id: totrans-30 prefs: [] type: TYPE_NORMAL + zh: 现在,导入所有必要的模块: - en: '[PRE1]' + id: totrans-31 prefs: [] type: TYPE_PRE + zh: '[PRE1]' - en: 'Load the ResNet-50 model without the top classification layers, so we get only the *bottleneck features.* Then define a function that takes an image path, loads the image, resizes it to proper dimensions supported by ResNet-50, extracts the features, and then normalizes them:' + id: totrans-32 prefs: [] type: TYPE_NORMAL + zh: 加载不带顶部分类层的ResNet-50模型,以便只获取*瓶颈特征*。然后定义一个函数,该函数接受图像路径,加载图像,将其调整为ResNet-50支持的适当尺寸,提取特征,然后对其进行归一化: - en: '[PRE2]' + id: totrans-33 prefs: [] type: TYPE_PRE + zh: '[PRE2]' - en: Tip + id: totrans-34 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 提示 - en: The function defined in the previous example is the `key` function that we use for almost every feature extraction need in Keras. + id: totrans-35 prefs: [] type: TYPE_NORMAL + zh: 在前面示例中定义的函数是我们在Keras中几乎每个特征提取需求中使用的`key`函数。 - en: 'That’s it! Let’s see the feature-length that the model generates:' + id: totrans-36 prefs: [] type: TYPE_NORMAL + zh: 就是这样!让我们看看模型生成的特征长度: - en: '[PRE3]' + id: totrans-37 prefs: [] type: TYPE_PRE + zh: '[PRE3]' - en: annoy + id: totrans-38 prefs: [] type: TYPE_NORMAL + zh: annoy - en: '[PRE4]' + id: totrans-39 prefs: [] type: TYPE_PRE + zh: '[PRE4]' - en: The ResNet-50 model generated 2,048 features from the provided image. Each feature is a floating-point value between 0 and 1. + id: totrans-40 prefs: [] type: TYPE_NORMAL + zh: ResNet-50模型从提供的图像生成了2,048个特征。每个特征都是介于0和1之间的浮点值。 - en: Tip + id: totrans-41 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 提示 - en: If your model is trained or fine tuned on a dataset that is not similar to ImageNet, redefine the “preprocess_input(img)” step accordingly. The mean values used in the function are particular to the ImageNet dataset. Each model in Keras has its own preprocessing function so make sure you are using the right one. + id: totrans-42 prefs: [] type: TYPE_NORMAL + zh: 如果您的模型是在与ImageNet不相似的数据集上训练或微调的,请相应重新定义“preprocess_input(img)”步骤。该函数中使用的均值是特定于ImageNet数据集的。Keras中的每个模型都有自己的预处理函数,因此请确保您使用正确的函数。 - en: 'Now it’s time to extract features for the entire dataset. 
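For reference, the extraction function described a few paragraphs above might be sketched as follows, assuming `tf.keras` and its bundled ResNet-50; the function name and the use of max pooling are illustrative choices rather than the chapter's exact listing:

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image

# Sketch: ResNet-50 without its classifier head, so predict() returns the
# bottleneck features rather than class probabilities.
model = ResNet50(weights='imagenet', include_top=False,
                 input_shape=(224, 224, 3), pooling='max')

def extract_features(img_path, model):
    # Load and resize the image to the 224 x 224 input that ResNet-50 expects.
    img = image.load_img(img_path, target_size=(224, 224))
    img_array = image.img_to_array(img)
    expanded = np.expand_dims(img_array, axis=0)
    preprocessed = preprocess_input(expanded)       # ImageNet-specific preprocessing
    features = model.predict(preprocessed)
    flattened = features.flatten()
    return flattened / np.linalg.norm(flattened)    # normalize to unit length
```

The `preprocess_input` call here is the ImageNet-specific step referred to in the preceding tip.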
First, we get all the filenames with this handy function, which recursively looks for all the image files (defined by their extensions) under a directory:' + id: totrans-43 prefs: [] type: TYPE_NORMAL + zh: 现在是时候为整个数据集提取特征了。首先,我们使用这个方便的函数获取所有文件名,该函数会递归查找目录下所有图像文件(由其扩展名定义): - en: '[PRE5]' + id: totrans-44 prefs: [] type: TYPE_PRE + zh: '[PRE5]' - en: 'Then, we provide the path to our dataset and call the function:' + id: totrans-45 prefs: [] type: TYPE_NORMAL + zh: 然后,我们提供数据集的路径并调用该函数: - en: '[PRE6]' + id: totrans-46 prefs: [] type: TYPE_PRE + zh: '[PRE6]' - en: 'We now define a variable that will store all of the features, go through all filenames in the dataset, extract their features, and append them to the previously defined variable:' + id: totrans-47 prefs: [] type: TYPE_NORMAL + zh: 我们现在定义一个变量,将存储所有特征,遍历数据集中的所有文件名,提取它们的特征,并将它们附加到先前定义的变量中: - en: '[PRE7]' + id: totrans-48 prefs: [] type: TYPE_PRE + zh: '[PRE7]' - en: On a CPU, this should take under an hour. On a GPU, only a few minutes. + id: totrans-49 prefs: [] type: TYPE_NORMAL + zh: 在CPU上,这应该在一个小时内完成。在GPU上,只需要几分钟。 - en: Tip + id: totrans-50 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 提示 - en: To get a better sense of time, use the super handy tool `tqdm`, which shows a progress meter ([Figure 4-3](part0006.html#progress_bar_shown_with_tqdm_notebook)) along with the speed per iteration as well as the time that has passed and expected finishing time. In Python, wrap an iterable with `tqdm;` for example, `tqdm(range(10))`. Its Jupyter Notebook variant is `tqdm_notebook`. + id: totrans-51 prefs: [] type: TYPE_NORMAL + zh: 为了更好地了解时间,使用超级方便的工具`tqdm`,它显示一个进度条([图4-3](part0006.html#progress_bar_shown_with_tqdm_notebook))以及每次迭代的速度,已经过去的时间和预计的完成时间。在Python中,使用`tqdm`包装一个可迭代对象;例如,`tqdm(range(10))`。其Jupyter + Notebook变体是`tqdm_notebook`。 - en: '![Progress bar shown with tqdm_notebook](../images/00322.jpeg)' + id: totrans-52 prefs: [] type: TYPE_IMG + zh: '![使用tqdm_notebook显示的进度条](../images/00322.jpeg)' - en: Figure 4-3\. Progress bar shown with `tqdm_notebook` + id: totrans-53 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图4-3\. 使用`tqdm_notebook`显示的进度条 - en: 'Finally, write these features to a pickle file so that we can use them in the future without having to recalculate them:' + id: totrans-54 prefs: [] type: TYPE_NORMAL + zh: 最后,将这些特征写入pickle文件,以便我们将来可以在不必重新计算的情况下使用它们: - en: '[PRE8]' + id: totrans-55 prefs: [] type: TYPE_PRE + zh: '[PRE8]' - en: That’s all folks! We’re done with the feature extraction part. + id: totrans-56 prefs: [] type: TYPE_NORMAL + zh: 这就是全部!我们已经完成了特征提取部分。 - en: Similarity Search + id: totrans-57 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 相似性搜索 - en: 'Given a photograph, our aim is to find another photo in our dataset similar to the current one. We begin by loading the precomputed features:' + id: totrans-58 prefs: [] type: TYPE_NORMAL + zh: 给定一张照片,我们的目标是在我们的数据集中找到一张与当前照片相似的照片。我们首先加载预先计算的特征: - en: '[PRE9]' + id: totrans-59 prefs: [] type: TYPE_PRE + zh: '[PRE9]' - en: 'We’ll use Python’s machine learning library `scikit-learn` for finding *nearest neighbors* of the query features; that is, features that represent a query image. 
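Putting the earlier pieces together, a sketch of loading the cached features and querying a brute-force index might look like this (the pickle file names and variable names are assumptions):

```python
import pickle
from sklearn.neighbors import NearestNeighbors

# Assumed pickle files written during the extraction step.
feature_list = pickle.load(open('features-caltech101-resnet.pickle', 'rb'))
filenames = pickle.load(open('filenames-caltech101.pickle', 'rb'))

# Brute-force index over the precomputed features, using Euclidean distance.
neighbors = NearestNeighbors(n_neighbors=5, algorithm='brute', metric='euclidean')
neighbors.fit(feature_list)

# Find the five nearest neighbors of the first image's features.
distances, indices = neighbors.kneighbors([feature_list[0]])
```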
We train a nearest-neighbor model using the brute-force algorithm to find the nearest five neighbors based on Euclidean distance (to install `scikit-learn` on your system, use `pip3 install sklearn)`:' + id: totrans-60 prefs: [] type: TYPE_NORMAL + zh: 我们将使用Python的机器学习库`scikit-learn`来查找查询特征的*最近邻居*;也就是代表查询图像的特征。我们使用暴力算法训练一个最近邻模型,以根据欧几里德距离找到最近的五个邻居(要在系统上安装`scikit-learn`,请使用`pip3 + install sklearn)`: - en: '[PRE10]' + id: totrans-61 prefs: [] type: TYPE_PRE + zh: '[PRE10]' - en: Now you have both the indices and distances of the nearest five neighbors of the very first query feature (which represents the first image). Notice the quick execution of the first step—the training step. Unlike training most machine learning @@ -315,100 +435,152 @@ the nearest-neighbor model is instantaneous because at training time there isn’t much processing. This is also called *lazy learning* because all the processing is deferred to classification or inference time. + id: totrans-62 prefs: [] type: TYPE_NORMAL + zh: 现在,您已经知道了最接近第一个查询特征(代表第一张图像)的五个最近邻的索引和距离。请注意第一步——训练步骤的快速执行。与训练大多数机器学习模型不同,这些模型可能需要几分钟到几小时在大型数据集上训练,实例化最近邻模型是瞬时的,因为在训练时没有太多处理。这也被称为*惰性学习*,因为所有处理都推迟到分类或推理时间。 - en: 'Now that we know the indices, let’s see the actual image behind that feature. First, we pick an image to query, located at say, index = 0:' + id: totrans-63 prefs: [] type: TYPE_NORMAL + zh: 现在我们知道了索引,让我们看看那个特征背后的实际图像。首先,我们选择一个图像进行查询,比如说,索引=0: - en: '[PRE11]' + id: totrans-64 prefs: [] type: TYPE_PRE + zh: '[PRE11]' - en: '[Figure 4-4](part0006.html#the_query_image_from_caltech-101_dataset) shows the result.' + id: totrans-65 prefs: [] type: TYPE_NORMAL + zh: '[图4-4](part0006.html#the_query_image_from_caltech-101_dataset)展示了这个结果。' - en: '![The query image from Caltech-101 dataset](../images/00294.jpeg)' + id: totrans-66 prefs: [] type: TYPE_IMG + zh: '![Caltech-101数据集中的查询图像](../images/00294.jpeg)' - en: Figure 4-4\. The query image from the Caltech-101 dataset + id: totrans-67 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图4-4。Caltech-101数据集中的查询图像 - en: Now, let’s examine the nearest neighbors by plotting the first result. + id: totrans-68 prefs: [] type: TYPE_NORMAL + zh: 现在,让我们通过绘制第一个结果来检查最近邻。 - en: '[PRE12]' + id: totrans-69 prefs: [] type: TYPE_PRE + zh: '[PRE12]' - en: '[Figure 4-5](part0006.html#the_nearest_neighbor_to_our_query_image) shows that result.' + id: totrans-70 prefs: [] type: TYPE_NORMAL + zh: '[图4-5](part0006.html#the_nearest_neighbor_to_our_query_image)展示了这个结果。' - en: '![The nearest neighbor to our query image](../images/00034.jpeg)' + id: totrans-71 prefs: [] type: TYPE_IMG + zh: '![我们查询图像的最近邻](../images/00034.jpeg)' - en: Figure 4-5\. The nearest neighbor to our query image + id: totrans-72 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图4-5。我们查询图像的最近邻 - en: 'Wait, isn’t that a duplicate? Actually, the nearest index will be the image itself because that is what is being queried:' + id: totrans-73 prefs: [] type: TYPE_NORMAL + zh: 等等,这不是重复的吗?实际上,最近的索引将是图像本身,因为这就是被查询的内容: - en: '[PRE13]' + id: totrans-74 prefs: [] type: TYPE_PRE + zh: '[PRE13]' - en: '[PRE14]' + id: totrans-75 prefs: [] type: TYPE_PRE + zh: '[PRE14]' - en: 'This is also confirmed by the fact that the distance of the first result is zero. Now let’s plot the real first nearest neighbor:' + id: totrans-76 prefs: [] type: TYPE_NORMAL + zh: 这也得到了第一个结果距离为零的事实的证实。现在让我们绘制真正的第一个最近邻: - en: '[PRE15]' + id: totrans-77 prefs: [] type: TYPE_PRE + zh: '[PRE15]' - en: Take a look at the result this time in [Figure 4-6](part0006.html#the_second_nearest_neighbor_of_the_queri). 
+ id: totrans-78 prefs: [] type: TYPE_NORMAL + zh: 这次看看[图4-6](part0006.html#the_second_nearest_neighbor_of_the_queri)中的结果。 - en: '![The second nearest neighbor of the queried image](../images/00213.jpeg)' + id: totrans-79 prefs: [] type: TYPE_IMG + zh: '![查询图像的第二近邻](../images/00213.jpeg)' - en: Figure 4-6\. The second nearest neighbor of the queried image + id: totrans-80 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图4-6。查询图像的第二近邻 - en: This definitely looks like a similar image. It captured a similar concept, has the same image category (faces), same gender, and similar background with pillars and vegetation. In fact, it’s the same person! + id: totrans-81 prefs: [] type: TYPE_NORMAL + zh: 这绝对看起来像是一张相似的图像。它捕捉到了一个相似的概念,具有相同的图像类别(人脸),相同的性别,以及与柱子和植被相似的背景。事实上,这是同一个人! - en: We would probably use this functionality regularly, so we have already built a helper function `plot_images()` that visualizes several query images with their nearest neighbors. Now let’s call this function to visualize the nearest neighbors of six random images. Also, note that every time you run the following piece of code, the displayed images will be different ([Figure 4-7](part0006.html#nearest_neighbors_for_different_images_r)) because the displayed images are indexed by a random integer. + id: totrans-82 prefs: [] type: TYPE_NORMAL + zh: 我们可能会经常使用这个功能,所以我们已经构建了一个名为`plot_images()`的辅助函数,用于可视化几个查询图像及其最近邻。现在让我们调用这个函数来可视化六个随机图像的最近邻。另外,请注意,每次运行以下代码片段时,显示的图像将不同([图4-7](part0006.html#nearest_neighbors_for_different_images_r)),因为显示的图像是由一个随机整数索引的。 - en: '[PRE16]' + id: totrans-83 prefs: [] type: TYPE_PRE + zh: '[PRE16]' - en: '![Nearest neighbors for different images returns similar-looking images](../images/00179.jpeg)' + id: totrans-84 prefs: [] type: TYPE_IMG + zh: '![不同图像的最近邻返回相似的图像](../images/00179.jpeg)' - en: Figure 4-7\. Nearest neighbor for different images returns similar-looking images + id: totrans-85 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图4-7。不同图像的最近邻返回相似的图像 - en: Visualizing Image Clusters with t-SNE + id: totrans-86 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 用t-SNE可视化图像聚类 - en: Let’s step up the game by visualizing the entire dataset! + id: totrans-87 prefs: [] type: TYPE_NORMAL + zh: 让我们通过可视化整个数据集来提升游戏! - en: 'To do this, we need to reduce the dimensions of the feature vectors because it’s not possible to plot a 2,048-dimension vector (the feature-length) in two dimensions (the paper). The t-distributed stochastic neighbor embedding (t-SNE) @@ -416,72 +588,102 @@ view of the dataset, which is helpful in recognizing clusters and nearby images. t-SNE is difficult to scale to large datasets, so it is a good idea to reduce the dimensionality using Principal Component Analysis (PCA) and then call t-SNE:' + id: totrans-88 prefs: [] type: TYPE_NORMAL + zh: 为了做到这一点,我们需要降低特征向量的维度,因为不可能在两个维度(纸张)中绘制一个2,048维向量(特征长度)。t-分布随机邻居嵌入(t-SNE)算法将高维特征向量降至2D,提供数据集的鸟瞰视图,有助于识别聚类和附近图像。t-SNE难以扩展到大型数据集,因此通过主成分分析(PCA)降低维度然后调用t-SNE是一个好主意: - en: '[PRE17]' + id: totrans-89 prefs: [] type: TYPE_PRE + zh: '[PRE17]' - en: We discuss PCA in more detail in later sections. In order to scale to larger dimensions, use Uniform Manifold Approximation and Projection (UMAP). + id: totrans-90 prefs: [] type: TYPE_NORMAL + zh: 我们将在后面的部分更详细地讨论PCA。为了扩展到更大的维度,使用均匀流形逼近和投影(UMAP)。 - en: '[Figure 4-8](part0006.html#t-sne_visualizing_clusters_of_image_feat) shows clusters of similar classes, and how they are spread close to one another.' 
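Before looking at the plot, here is a sketch of the PCA-then-t-SNE step just described, assuming `scikit-learn` and the `feature_list` computed earlier; the component counts are illustrative:

```python
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Illustrative sketch: compress the 2,048-dimensional features with PCA first,
# then project the reduced vectors to 2D with t-SNE for plotting.
num_feature_dimensions = 100          # assumed PCA size
pca = PCA(n_components=num_feature_dimensions)
feature_list_compressed = pca.fit_transform(feature_list)

tsne_results = TSNE(n_components=2, verbose=1,
                    metric='euclidean').fit_transform(feature_list_compressed)
```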
+ id: totrans-91 prefs: [] type: TYPE_NORMAL + zh: '[图4-8](part0006.html#t-sne_visualizing_clusters_of_image_feat)展示了相似类别的聚类,以及它们是如何靠近彼此的。' - en: '![t-SNE visualizing clusters of image features, each cluster represented one object class in the same color](../images/00136.jpeg)' + id: totrans-92 prefs: [] type: TYPE_IMG + zh: '![t-SNE可视化图像特征聚类,每个聚类用相同颜色表示一个对象类别](../images/00136.jpeg)' - en: Figure 4-8\. t-SNE visualizing clusters of image features, where each cluster represents one object class in the same color + id: totrans-93 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图4-8。t-SNE可视化图像特征聚类,每个聚类用相同颜色表示一个对象类别 - en: Each color in [Figure 4-8](part0006.html#t-sne_visualizing_clusters_of_image_feat) indicates a different class. To make it even more clear, we can use another helper function, `plot_images_in_2d()`, to plot the images in these clusters, as demonstrated in [Figure 4-9](part0006.html#t-sne_visualizations_showing_image_clust). + id: totrans-94 prefs: [] type: TYPE_NORMAL + zh: '[图4-8](part0006.html#t-sne_visualizing_clusters_of_image_feat)中的每种颜色表示不同的类别。为了使其更加清晰,我们可以使用另一个辅助函数`plot_images_in_2d()`来绘制这些集群中的图像,就像[图4-9](part0006.html#t-sne_visualizations_showing_image_clust)中所演示的那样。' - en: '![t-SNE visualizations showing image clusters; similar images are in the same cluster](../images/00036.jpeg)' + id: totrans-95 prefs: [] type: TYPE_IMG + zh: '![t-SNE可视化显示图像集群;相似的图像在同一集群中](../images/00036.jpeg)' - en: Figure 4-9\. t-SNE visualization showing image clusters; similar images are in the same cluster + id: totrans-96 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图4-9\. t-SNE可视化显示图像集群;相似的图像在同一集群中 - en: Neat! There is a clearly demarcated cluster of human faces, flowers, vintage cars, ships, bikes, and a somewhat spread-out cluster of land and marine animals. There are lots of images on top of one another, which makes [Figure 4-9](part0006.html#t-sne_visualizations_showing_image_clust) a tad bit confusing, so let’s try to plot the t-SNE as clear tiles with the helper function `tsne_to_grid_plotter_manual()`, the results of which you can see in [Figure 4-10](part0006.html#t-sne_visualization_with_tiled_imagessem). + id: totrans-97 prefs: [] type: TYPE_NORMAL + zh: 很棒!有一个明显划分的人脸、花朵、老式汽车、船只、自行车的集群,以及一个稍微分散的陆地和海洋动物集群。有很多图像重叠在一起,这使得[图4-9](part0006.html#t-sne_visualizations_showing_image_clust)有点令人困惑,所以让我们尝试使用辅助函数`tsne_to_grid_plotter_manual()`将t-SNE绘制为清晰的瓷砖,其结果可以在[图4-10](part0006.html#t-sne_visualization_with_tiled_imagessem)中看到。 - en: '[PRE18]' + id: totrans-98 prefs: [] type: TYPE_PRE + zh: '[PRE18]' - en: '![t-SNE visualization with tiled images; similar images are close together](../images/00011.jpeg)' + id: totrans-99 prefs: [] type: TYPE_IMG + zh: '![t-SNE可视化与瓷砖图像;相似的图像彼此靠近](../images/00011.jpeg)' - en: Figure 4-10\. t-SNE visualization with tiled images; similar images are close together + id: totrans-100 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图4-10\. t-SNE可视化与瓷砖图像;相似的图像彼此靠近 - en: This is definitely much clearer. We can see similar images are colocated within the clusters of human faces, chairs, bikes, airplanes, ships, laptops, animals, watches, flowers, tilted minarets, vintage cars, anchor signs, and cameras, all close to their own kind. Birds of a feather indeed do flock together! + id: totrans-101 prefs: [] type: TYPE_NORMAL + zh: 这绝对更清晰了。我们可以看到相似的图像在人脸、椅子、自行车、飞机、船只、笔记本电脑、动物、手表、花朵、倾斜的尖塔、老式汽车、锚标志和相机的集群中共同定位,都靠近自己的同类。物以类聚! - en: Tip + id: totrans-102 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 提示 - en: 2D clusters are great, but visualizing them in 3D would look stellar. 
It would be even better if they could be rotated, zoomed in and out, and manipulated using the mouse without any coding. And bonus points if the data could be searched interactively, @@ -492,31 +694,44 @@ shows, it’s reassuring to see deep learning figure out that John Lennon, Led Zeppelin, and Eric Clapton happen to be used in a similar context to the Beatles in the English language. + id: totrans-103 prefs: [] type: TYPE_NORMAL + zh: 2D集群很棒,但在3D中可视化它们会看起来更加出色。如果它们可以旋转、缩放,并且可以使用鼠标进行操作而无需编码,那将更好。如果数据可以以交互方式搜索,显示其邻居,那就更加分数。[TensorFlow + Embedding projector](https://projector.tensorflow.org)在基于浏览器的GUI工具中实现了所有这些功能以及更多。来自图像和文本数据集的预加载嵌入对于更好地理解嵌入的强大性能是有帮助的。正如[图4-11](part0006.html#tensorflow_embedding_projector_showing_a)所示,看到深度学习发现约翰·列侬、齐柏林飞艇和埃里克·克莱普顿恰好在英语中与披头士乐队在类似的语境中使用是令人欣慰的。 - en: '![TensorFlow Embedding projector showing a 3D representation of common 10,000 English words and highlighting related words to “Beatles”](../images/00320.jpeg)' + id: totrans-104 prefs: [] type: TYPE_IMG + zh: '![TensorFlow Embedding projector显示了常见10,000个英语单词的3D表示,并突出显示了与“披头士”相关的单词](../images/00320.jpeg)' - en: Figure 4-11\. TensorFlow Embedding projector showing a 3D representation of 10,000 common English words and highlighting words related to “Beatles” + id: totrans-105 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图4-11\. TensorFlow Embedding projector显示了10,000个常见英语单词的3D表示,并突出显示了与“披头士”相关的单词 - en: Improving the Speed of Similarity Search + id: totrans-106 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 提高相似性搜索的速度 - en: 'There are several opportunities to improve the speed of the similarity search step. For similarity search, we can make use of two strategies: either reduce the feature-length, or use a better algorithm to search among the features. Let’s examine each of these strategies individually.' + id: totrans-107 prefs: [] type: TYPE_NORMAL + zh: 有几个机会可以提高相似性搜索步骤的速度。对于相似性搜索,我们可以利用两种策略:要么减少特征长度,要么使用更好的算法在特征之间进行搜索。让我们分别检查这两种策略。 - en: Length of Feature Vectors + id: totrans-108 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 特征向量长度 - en: Ideally, we would expect that the smaller the amount of data in which to search, the faster the search should be. Recall that the ResNet-50 model gives 2,048 features. With each feature being a 32-bit floating-point, each image is represented by @@ -524,40 +739,63 @@ how slow it would be to search among 8 GB worth of features. To give us a better picture of our scenario, [Table 4-1](part0006.html#top_1percent_accuracy_and_feature_length) gives the feature-lengths that we get from different models. + id: totrans-109 prefs: [] type: TYPE_NORMAL + zh: 理想情况下,我们期望搜索的数据量越小,搜索速度就应该越快。回想一下,ResNet-50模型提供了2048个特征。每个特征都是一个32位浮点数,每个图像由一个8 + KB的特征向量表示。对于一百万张图像,这相当于将近8 GB。想象一下在8 GB的特征中进行搜索会有多慢。为了更好地了解我们的情况,[表4-1](part0006.html#top_1percent_accuracy_and_feature_length)给出了我们从不同模型中获得的特征长度。 - en: Table 4-1\. Top 1% accuracy and feature-lengths for different CNN models + id: totrans-110 prefs: [] type: TYPE_NORMAL + zh: 表4-1\. 
不同CNN模型的Top 1%准确率和特征长度 - en: '| **Model** | **Bottleneck feature-length** | **Top-1% accuracy on ImageNet** |' + id: totrans-111 prefs: [] type: TYPE_TB + zh: '| **模型** | **瓶颈特征长度** | **在ImageNet上的Top-1%准确率** |' - en: '| --- | --- | --- |' + id: totrans-112 prefs: [] type: TYPE_TB + zh: '| --- | --- | --- |' - en: '| VGG16 | 512 | 71.5% |' + id: totrans-113 prefs: [] type: TYPE_TB + zh: '| VGG16 | 512 | 71.5% |' - en: '| VGG19 | 512 | 72.7% |' + id: totrans-114 prefs: [] type: TYPE_TB + zh: '| VGG19 | 512 | 72.7% |' - en: '| MobileNet | 1024 | 66.5% |' + id: totrans-115 prefs: [] type: TYPE_TB + zh: '| MobileNet | 1024 | 66.5% |' - en: '| InceptionV3 | 2048 | 78.8% |' + id: totrans-116 prefs: [] type: TYPE_TB + zh: '| InceptionV3 | 2048 | 78.8% |' - en: '| ResNet-50 | 2048 | 75.9% |' + id: totrans-117 prefs: [] type: TYPE_TB + zh: '| ResNet-50 | 2048 | 75.9% |' - en: '| Xception | 2048 | 79.0% |' + id: totrans-118 prefs: [] type: TYPE_TB + zh: '| Xception | 2048 | 79.0% |' - en: Note + id: totrans-119 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 注意 - en: 'Under the hood, many models available in `tf.keras.applications` yield several thousand features. For example, InceptionV3 yields features in the shape of 1 x 5 x 5 x 2048, which translates to 2,048 feature maps of 5 x 5 convolutions, @@ -565,34 +803,44 @@ this large vector by using an average or max-pooling layer. The pooling layer will condense each convolution (e.g., 5 x 5 layer) into a single value. This can be defined during model instantiation as follows:' + id: totrans-120 prefs: [] type: TYPE_NORMAL - en: '[PRE19]' + id: totrans-121 prefs: [] type: TYPE_PRE + zh: '[PRE19]' - en: For models that yield a large number of features, you will usually find that all code examples make use of this pooling option. [Table 4-2](part0006.html#number_of_features_before_and_after_pool) shows the before and after effect of max pooling on the number of features in different models. + id: totrans-122 prefs: [] type: TYPE_NORMAL - en: Table 4-2\. Number of features before and after pooling for different models + id: totrans-123 prefs: [] type: TYPE_NORMAL - en: '| **Model** | **# features before pooling** | **# features after pooling** |' + id: totrans-124 prefs: [] type: TYPE_TB - en: '| --- | --- | --- |' + id: totrans-125 prefs: [] type: TYPE_TB - en: '| ResNet-50 | [1,1,1,2048] = 2048 | 2048 |' + id: totrans-126 prefs: [] type: TYPE_TB - en: '| InceptionV3 | [1,5,5,2048] = 51200 | 2048 |' + id: totrans-127 prefs: [] type: TYPE_TB - en: '| MobileNet | [1,7,7,1024] = 50176 | 1024 |' + id: totrans-128 prefs: [] type: TYPE_TB - en: As we can see, almost all the models generate a large number of features. Imagine @@ -602,9 +850,11 @@ big data scenarios, for which the data can be loaded into RAM all at once instead of periodically loading parts of it, thus giving an even bigger speedup. PCA will help us make this happen. + id: totrans-129 prefs: [] type: TYPE_NORMAL - en: Reducing Feature-Length with PCA + id: totrans-130 prefs: - PREF_H2 type: TYPE_NORMAL @@ -616,44 +866,59 @@ of features that are a linear combination of the input features. These linear features are orthogonal to one another, which is why all the redundant features are absent. These features are known as *principal components.* + id: totrans-131 prefs: [] type: TYPE_NORMAL - en: 'Performing PCA is pretty simple. 
Using the `scikit-learn` library, execute the following:' + id: totrans-132 prefs: [] type: TYPE_NORMAL - en: '[PRE20]' + id: totrans-133 prefs: [] type: TYPE_PRE + zh: '[PRE20]' - en: 'PCA can also tell us the relative importance of each feature. The very first dimension has the most variance and the variance keeps on decreasing as we go on:' + id: totrans-134 prefs: [] type: TYPE_NORMAL - en: '[PRE21]' + id: totrans-135 prefs: [] type: TYPE_PRE + zh: '[PRE21]' - en: '[PRE22]' + id: totrans-136 prefs: [] type: TYPE_PRE + zh: '[PRE22]' - en: Hmm, why did we pick 100 dimensions from the original 2,048? Why not 200? PCA is representing our original feature vector but in reduced dimensions. Each new dimension has diminishing returns in representing the original vector (i.e., the new dimension might not explain the data much) and takes up valuable space. We can balance between how well the original data is explained versus how much we want to reduce it. Let’s visualize the importance of say the first 200 dimensions. + id: totrans-137 prefs: [] type: TYPE_NORMAL - en: '[PRE23]' + id: totrans-138 prefs: [] type: TYPE_PRE + zh: '[PRE23]' - en: '[Figure 4-12](part0006.html#variance_for_each_pca_dimension) presents the results.' + id: totrans-139 prefs: [] type: TYPE_NORMAL - en: '![Variance for each PCA dimension](../images/00262.jpeg)' + id: totrans-140 prefs: [] type: TYPE_IMG - en: Figure 4-12\. Variance for each PCA dimension + id: totrans-141 prefs: - PREF_H6 type: TYPE_NORMAL @@ -664,47 +929,66 @@ model. Another way to look at this is to visualize how much of the original data is explained by the limited number of features by finding the cumulative variance (see [Figure 4-13](part0006.html#cumulative_variance_with_each_pca_dimens)). + id: totrans-142 prefs: [] type: TYPE_NORMAL - en: '[PRE24]' + id: totrans-143 prefs: [] type: TYPE_PRE + zh: '[PRE24]' - en: '![Cumulative variance with each PCA dimension](../images/00237.jpeg)' + id: totrans-144 prefs: [] type: TYPE_IMG - en: Figure 4-13\. Cumulative variance with each PCA dimension + id: totrans-145 prefs: - PREF_H6 type: TYPE_NORMAL - en: As expected, adding 100 dimensions (from 100 to 200) adds only 0.1 variance and begins to gradually plateau. For reference, using the full 2,048 features would result in a cumulative variance of 1. + id: totrans-146 prefs: [] type: TYPE_NORMAL + zh: 如预期的那样,添加100个维度(从100到200)仅增加了0.1的方差,并开始逐渐趋于平稳。作为参考,使用完整的2,048个特征将导致累积方差为1。 - en: 'The number of dimensions in PCA is an important parameter that we can tune to the problem at hand. 
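Pulling the preceding PCA steps into one place, a compact sketch might look like the following (the 100-dimension target is the value discussed above; `feature_list` is assumed from the extraction step):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Fit PCA with an assumed target of 100 dimensions and transform the features.
num_feature_dimensions = 100
pca = PCA(n_components=num_feature_dimensions)
pca.fit(feature_list)
feature_list_compressed = pca.transform(feature_list)

# Per-dimension variance (decreasing) and the cumulative variance curve.
print(pca.explained_variance_ratio_[:5])
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('PCA dimensions')
plt.ylabel('Cumulative explained variance')
plt.show()
```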
One way to directly justify a good threshold is to find a good balance between the number of features and its effect on accuracy versus speed:' + id: totrans-147 prefs: [] type: TYPE_NORMAL + zh: PCA中的维度数量是一个重要的参数,我们可以根据手头的问题进行调整。直接证明一个好的阈值的一种方法是找到特征数量与其对准确性和速度的影响之间的良好平衡: - en: '[PRE25]' + id: totrans-148 prefs: [] type: TYPE_PRE + zh: '[PRE25]' - en: 'We visualize these results using the graph in [Figure 4-14](part0006.html#test_time_versus_accuracy_for_each_pca_d) and see that after a certain number of dimensions an increase in dimensions does not lead to higher accuracy:' + id: totrans-149 prefs: [] type: TYPE_NORMAL + zh: 我们使用图表[图4-14](part0006.html#test_time_versus_accuracy_for_each_pca_d)来可视化这些结果,并看到在一定数量的维度之后,增加维度并不会导致更高的准确性: - en: '[PRE26]' + id: totrans-150 prefs: [] type: TYPE_PRE + zh: '[PRE26]' - en: '![Test time versus accuracy for each PCA dimension](../images/00191.jpeg)' + id: totrans-151 prefs: [] type: TYPE_IMG + zh: '![每个PCA维度的测试时间与准确性](../images/00191.jpeg)' - en: Figure 4-14\. Test time versus accuracy for each PCA dimension + id: totrans-152 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图4-14。每个PCA维度的测试时间与准确性 - en: As is visible in the graph, there is little improvement in accuracy after increasing beyond a feature-length of 100 dimensions. With almost 20 times fewer dimensions (100) than the original (2,048), this offers drastically higher speed and less @@ -712,8 +996,10 @@ better) accuracy. Hence, 100 would be an ideal feature-length for this dataset. This also means that the first 100 dimensions contain the most information about the dataset. + id: totrans-153 prefs: [] type: TYPE_NORMAL + zh: 正如图中所示,在超过100个维度的特征长度之后,准确性几乎没有提高。与原始数据(2,048)相比,几乎少了20倍的维度(100),这在几乎任何搜索算法中都提供了极高的速度和更少的时间,同时实现了类似(有时略有更好)的准确性。因此,100将是这个数据集的理想特征长度。这也意味着前100个维度包含了关于数据集的大部分信息。 - en: There are a number of benefits to using this reduced representation, like efficient use of computational resources, noise removal, better generalization due to fewer dimensions, and improved performance for machine learning algorithms learning @@ -727,29 +1013,40 @@ space, the majority of points from a real-world dataset seem to be a similar distance away from one another, and the Euclidean distance metric begins to fail in discerning similar versus dissimilar items. PCA helps bring sanity back. + id: totrans-154 prefs: [] type: TYPE_NORMAL + zh: 使用这种减少的表示有许多好处,如有效利用计算资源、去除噪音、由于维度较少而实现更好的泛化,以及对在这些数据上学习的机器学习算法的性能改进。通过将距离计算减少到最重要的特征,我们还可以稍微提高结果的准确性。这是因为以前所有的2,048个特征在距离计算中都是平等贡献的,而现在,只有最重要的100个特征发挥作用。但更重要的是,它使我们摆脱了“维度灾难”。观察到,随着维度数量的增加,两个最接近点和两个最远点之间的欧氏距离比 + tend to become 1。在非常高维的空间中,来自真实世界数据集的大多数点似乎相互之间的距离相似,欧氏距离度量开始无法区分相似和不相似的项目。PCA有助于恢复理智。 - en: You can also experiment with different distances like Minkowski distance, Manhattan distance, Jaccardian distance, and weighted Euclidean distance (where the weight is the contribution of each feature as explained in `pca.explained_variance_ratio_`). + id: totrans-155 prefs: [] type: TYPE_NORMAL + zh: 您还可以尝试不同的距离,如闵可夫斯基距离、曼哈顿距离、杰卡德距离和加权欧氏距离(其中权重是每个特征的贡献,如`pca.explained_variance_ratio_`中所解释的)。 - en: Now, let’s turn our minds toward using this reduced set of features to make our search even faster. + id: totrans-156 prefs: [] type: TYPE_NORMAL + zh: 现在,让我们将注意力转向使用这个减少后的特征集来使我们的搜索更快。 - en: Scaling Similarity Search with Approximate Nearest Neighbors + id: totrans-157 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 使用近似最近邻扩展相似性搜索 - en: What do we want? Nearest neighbors. What is our baseline? Brute-force search. 
Although convenient to implement in two lines, it goes over each element and hence scales linearly with data size (number of items as well as the number of dimensions). Having PCA take our feature vector from a length of 2,048 to 100 will not only yield a 20-times reduction in data size, but also result in an increase in speed of 20 times when using brute force. PCA does pay off! + id: totrans-158 prefs: [] type: TYPE_NORMAL + zh: 我们想要什么?最近邻。我们的基准是什么?暴力搜索。虽然在两行中实现起来很方便,但它会遍历每个元素,因此随着数据大小(项目数量以及维度数量)的增加而呈线性增长。将我们的特征向量从2,048的长度缩减到100,不仅会使数据大小减少20倍,还会使使用暴力搜索时的速度增加20倍。PCA确实是值得的! - en: Let’s assume similarity searching a small collection of 10,000 images, now represented with 100 feature-length vectors, takes approximately 1 ms. Even though this looks fast for 10,000 items, in a real production system with larger data, perhaps 10 @@ -759,8 +1056,10 @@ machine (and loading the search index per thread), you would need multiple machines to be able to serve the traffic. In other words, an inefficient algorithm means money, lots of money, spent on hardware. + id: totrans-159 prefs: [] type: TYPE_NORMAL + zh: 假设相似性搜索一个包含10,000张图像的小集合,现在用100个特征长度向量表示,大约需要1毫秒。即使对于10,000个项目来说这看起来很快,但在一个真实的生产系统中,也许有更大的数据,比如10百万个项目,这将需要超过一秒的时间来搜索。我们的系统可能无法每秒每个CPU核心处理超过一个查询。如果您从用户那里每秒收到100个请求,即使在机器的多个CPU核心上运行(并为每个线程加载搜索索引),您也需要多台机器才能处理流量。换句话说,低效的算法意味着花费大量的硬件成本。 - en: Brute force is our baseline for every comparison. As in most algorithmic approaches, brute force is the slowest approach. Now that we have our baseline set, we will explore approximate nearest-neighbor algorithms. Instead of guaranteeing the correct @@ -769,12 +1068,16 @@ offer some form of tuning to balance between correctness and speed. It is possible to evaluate the quality of the results by comparing against the results of the brute-force baseline. + id: totrans-160 prefs: [] type: TYPE_NORMAL + zh: 蛮力搜索是我们进行每次比较的基准。在大多数算法方法中,蛮力搜索是最慢的方法。现在我们已经设定了基准,我们将探索近似最近邻算法。与蛮力搜索方法保证正确结果不同,近似算法*通常*能够得到正确结果,因为它们是...嗯,近似值。大多数算法提供某种形式的调整来平衡正确性和速度。可以通过与蛮力基准结果进行比较来评估结果的质量。 - en: Approximate Nearest-Neighbor Benchmark + id: totrans-161 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 近似最近邻基准 - en: There are several approximate nearest-neighbor (ANN) libraries out there, including well-known ones like Spotify’s Annoy, FLANN, Facebook’s Faiss, Yahoo’s NGT, and NMSLIB. Benchmarking each of them would be a tedious task (assuming you get past @@ -789,101 +1092,147 @@ of correctness is the fraction of top-*n* closest items returned with respect to the real top-*n* closest items. This ground truth is measured by brute-force search. + id: totrans-162 prefs: [] type: TYPE_NORMAL + zh: 有几个近似最近邻(ANN)库,包括知名的Spotify的Annoy、FLANN、Facebook的Faiss、Yahoo的NGT和NMSLIB。对它们进行基准测试将是一项繁琐的任务(假设您能够安装其中一些)。幸运的是,*[ann-benchmarks.com](http://ann-benchmarks.com)*的热心人(Martin + Aumueller、Erik Bernhardsson和Alec Faitfull)已经为我们做了大量工作,以可重现的方式在19个库上进行了大型公共数据集的基准测试。我们将在一个代表单词的特征嵌入数据集上进行比较(而不是图像),该数据集大小为350 + MB,包含400,000个表示100维单词的特征向量。[图4-15](part0006.html#comparison_of_ann_libraries_left_parenth)展示了它们在正确性调整时的原始性能。性能是以库每秒响应查询的能力来衡量的。请记住,正确性的度量是返回的前n个最接近项与真实前n个最接近项的比例。这个基准是通过蛮力搜索来衡量的真实情况。 - en: '![Comparison of ANN libraries (data from ann-benchmarks.com)](../images/00185.jpeg)' + id: totrans-163 prefs: [] type: TYPE_IMG + zh: '![ANN库比较(数据来自ann-benchmarks.com)](../images/00185.jpeg)' - en: Figure 4-15\. Comparison of ANN libraries (data from [ann-benchmarks.com](http://ann-benchmarks.com)) + id: totrans-164 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图4-15. 
ANN库比较(数据来自[ann-benchmarks.com](http://ann-benchmarks.com)) - en: The strongest performers on this dataset return close to several thousand queries per second at the acceptable 0.8 recall. To put this in perspective, our brute-force search performs under 1 query per second. At the fastest, some of these libraries (like NGT) can return north of 15,000 results per second (albeit at a low recall, making it impractical for usage). + id: totrans-165 prefs: [] type: TYPE_NORMAL + zh: 在这个数据集上表现最强的库在可接受的0.8召回率下每秒返回接近数千个查询。为了让大家有个概念,我们的蛮力搜索每秒执行不到1个查询。在最快的情况下,一些库(如NGT)每秒可以返回超过15,000个结果(尽管召回率较低,使其在使用上不切实际)。 - en: Which Library Should I Use? + id: totrans-166 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 我应该使用哪个库? - en: It goes without saying that the library you use will end up depending heavily on your scenario. Each library presents a trade-off between search speed, accuracy, size of index, memory consumption, hardware use (CPU/GPU), and ease of setup. [Table 4-3](part0006.html#ann_library_recommendations) presents a synopsis of different scenarios and recommendations as to which library might be work best for each scenario. + id: totrans-167 prefs: [] type: TYPE_NORMAL + zh: '毫无疑问,您使用的库将在很大程度上取决于您的场景。每个库在搜索速度、准确性、索引大小、内存消耗、硬件使用(CPU/GPU)和设置便捷性之间存在权衡。[表4-3](part0006.html#ann_library_recommendations)提供了不同场景和推荐的库的摘要。 ' - en: Table 4-3\. ANN library recommendations + id: totrans-168 prefs: [] type: TYPE_NORMAL + zh: 表4-3. ANN库推荐 - en: '| **Scenario** | **Recommendation** |' + id: totrans-169 prefs: [] type: TYPE_TB + zh: '| **场景** | **推荐** |' - en: '| --- | --- |' + id: totrans-170 prefs: [] type: TYPE_TB + zh: '| --- | --- |' - en: '| I want to experiment quickly in Python without too much setup but I also care about fast speed. | Use Annoy or NMSLIB |' + id: totrans-171 prefs: [] type: TYPE_TB + zh: '| 我想在Python中快速进行实验,而不需要太多设置,但我也关心速度。 | 使用Annoy或NMSLIB |' - en: '| I have a large dataset (up to 10 million entries or several thousand dimensions) and care utmost about speed. | Use NGT |' + id: totrans-172 prefs: [] type: TYPE_TB + zh: '| 我有一个大型数据集(多达1000万条记录或数千个维度),并且非常关心速度。 | 使用NGT |' - en: '| I have a ridiculously large dataset (100 million-plus entries) and have a cluster of GPUs, too. | Use Faiss |' + id: totrans-173 prefs: [] type: TYPE_TB + zh: '| 我有一个庞大的数据集(超过1亿条记录)并且有一组GPU。 | 使用Faiss |' - en: '| I want to set a ground-truth baseline with 100% correctness. Then immediately move to a faster library, impress my boss with the orders of magnitude speedup, and get a bonus. | Use brute-force approach |' + id: totrans-174 prefs: [] type: TYPE_TB + zh: '| 我想建立一个100%正确性的基准,然后立即转向更快的库,用数量级的加速打动我的老板,并获得奖金。 | 使用蛮力搜索方法 |' - en: We offer much more detailed examples in code of several libraries on the book’s GitHub website (see [*http://PracticalDeepLearning.ai*](http://PracticalDeepLearning.ai)), but for our purposes here, we’ll showcase our go-to library, Annoy, in detail and compare it with brute-force search on a synthetic dataset. Additionally, we briefly touch on Faiss and NGT. + id: totrans-175 prefs: [] type: TYPE_NORMAL + zh: 我们在书的GitHub网站上提供了几个库的代码详细示例(请参阅[*http://PracticalDeepLearning.ai*](http://PracticalDeepLearning.ai)),但是在这里,我们将详细展示我们常用的库Annoy,并将其与合成数据集上的蛮力搜索进行比较。此外,我们还简要介绍了Faiss和NGT。 - en: Creating a Synthetic Dataset + id: totrans-176 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 创建合成数据集 - en: 'To make an apples-to-apples comparison between different libraries, we first create a million-item dataset composed of random floating-point values with mean 0 and variance 1\. 
Additionally, we pick a random feature vector as our query to find the nearest neighbors:' + id: totrans-177 prefs: [] type: TYPE_NORMAL + zh: 为了进行不同库之间的苹果对苹果比较,我们首先创建一个由随机浮点值组成的百万项目数据集,均值为0,方差为1。此外,我们选择一个随机特征向量作为我们的查询以找到最近的邻居: - en: '[PRE27]' + id: totrans-178 prefs: [] type: TYPE_PRE + zh: '[PRE27]' - en: Brute Force + id: totrans-179 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 蛮力算法 - en: 'First, we calculate the time for searching with the brute-force algorithm. It goes through the entire data serially, calculating the distance between the query and current item one at a time. We use the `timeit` command for calculating the time. First, we create the search index to retrieve the five nearest neighbors and then search with a query:' + id: totrans-180 prefs: [] type: TYPE_NORMAL + zh: 首先,我们计算使用蛮力算法进行搜索所需的时间。它会逐个遍历整个数据,计算查询和当前项目之间的距离。我们使用`timeit`命令来计算时间。首先,我们创建搜索索引以检索五个最近的邻居,然后使用查询进行搜索: - en: '[PRE28]' + id: totrans-181 prefs: [] type: TYPE_PRE + zh: '[PRE28]' - en: '[PRE29]' + id: totrans-182 prefs: [] type: TYPE_PRE + zh: '[PRE29]' - en: Tip + id: totrans-183 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 提示 - en: The `timeit` command is a handy tool. To benchmark the time of a single operation, prefix it with this command. Compared to the time command, which runs a statement for one time, `timeit` runs the subsequent line multiple times to give more precise @@ -891,42 +1240,62 @@ off garbage collection, making independent timings more comparable. That said, this might not reflect timings in real production loads where garbage collection is turned on. + id: totrans-184 prefs: [] type: TYPE_NORMAL + zh: '`timeit`命令是一个方便的工具。要对单个操作的时间进行基准测试,请在其前面加上此命令。与运行一次语句的time命令相比,`timeit`会多次运行后续行以提供更精确的聚合统计数据以及标准偏差。默认情况下,它关闭垃圾收集,使独立的计时更具可比性。也就是说,这可能不反映在实际生产负载中打开垃圾收集时的计时情况。' - en: Annoy + id: totrans-185 prefs: - PREF_H2 type: TYPE_NORMAL + zh: Annoy - en: '[Annoy](https://oreil.ly/1qqfv) (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings for searching nearest neighbors. Synonymous with speed, it was released by Spotify and is used in production to serve its music recommendations. In contrast to its name, it’s actually fun and easy to use.' + id: totrans-186 prefs: [] type: TYPE_NORMAL + zh: '[Annoy](https://oreil.ly/1qqfv)(近似最近邻居)是一个带有Python绑定的C++库,用于搜索最近邻居。以速度闻名,由Spotify发布,并用于生产中提供其音乐推荐。与其名字相反,实际上使用起来很有趣且简单。' - en: 'To use Annoy, we install it using `pip`:' + id: totrans-187 prefs: [] type: TYPE_NORMAL + zh: 要使用Annoy,我们使用`pip`进行安装: - en: '[PRE30]' + id: totrans-188 prefs: [] type: TYPE_PRE + zh: '[PRE30]' - en: 'It’s fairly straightforward to use. First, we build a search index with two hyperparameters: the number of dimensions of the dataset and the number of trees:' + id: totrans-189 prefs: [] type: TYPE_NORMAL + zh: 使用起来相当简单。首先,我们使用两个超参数构建一个搜索索引:数据集的维度数量和树的数量: - en: '[PRE31]' + id: totrans-190 prefs: [] type: TYPE_PRE + zh: '[PRE31]' - en: 'Now let’s find out the time it takes to search the five nearest neighbors of one image:' + id: totrans-191 prefs: [] type: TYPE_NORMAL + zh: 现在让我们看看搜索一个图像的五个最近邻居需要多长时间: - en: '[PRE32]' + id: totrans-192 prefs: [] type: TYPE_PRE + zh: '[PRE32]' - en: '[PRE33]' + id: totrans-193 prefs: [] type: TYPE_PRE + zh: '[PRE33]' - en: Now that is blazing fast! To put this in perspective, even for our million-item dataset, this can serve almost 28,000 requests on a single CPU core. Considering most CPUs have multiple cores, it should be able to handle more than 100,000 requests @@ -934,27 +1303,37 @@ memory between multiple processes. 
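Here is a small, self-contained sketch of the Annoy flow described above, using synthetic data of the same kind (the item count is reduced and all names are illustrative):

```python
import numpy as np
from annoy import AnnoyIndex

num_items, num_dimensions = 100_000, 100   # smaller than the million-item example
dataset = np.random.randn(num_items, num_dimensions)
query = np.random.randn(num_dimensions)

# Build the index: one add_item() call per vector, then build the trees.
annoy_index = AnnoyIndex(num_dimensions, 'euclidean')
for i, vector in enumerate(dataset):
    annoy_index.add_item(i, vector)
annoy_index.build(40)   # number of trees; more trees = higher precision, bigger index

# Retrieve the five approximate nearest neighbors of the query vector.
nearest_ids = annoy_index.get_nns_by_vector(query, 5)
```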
Thus, the biggest index can be equivalent to the size of your overall RAM, making it possible to serve multiple requests on a single system. + id: totrans-194 prefs: [] type: TYPE_NORMAL + zh: 现在这真是飞快!换个角度来看,即使对于我们的百万项目数据集,这在单个CPU核心上可以处理近28,000个请求。考虑到大多数CPU都有多个核心,它应该能够在单个系统上处理超过100,000个请求。最好的部分是它允许您在多个进程之间共享相同的内存中的索引。因此,最大的索引可以等同于您的整体RAM大小,使得在单个系统上处理多个请求成为可能。 - en: Other benefits include that it generates a modestly sized index. Moreover, it decouples creating indexes from loading them, so you can create an index on one machine, pass it around, and then on your serving machine load it in memory and serve it. + id: totrans-195 prefs: [] type: TYPE_NORMAL + zh: 其他好处包括生成一个适度大小的索引。此外,它将创建索引与加载索引分离,因此您可以在一台机器上创建索引,传递它,然后在您的服务机器上将其加载到内存中并提供服务。 - en: Tip + id: totrans-196 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 提示 - en: Wondering about how many trees to use? More trees give higher precision, but larger indexes. Usually, no more than 50 trees are required to attain the highest precision. + id: totrans-197 prefs: [] type: TYPE_NORMAL + zh: 想知道要使用多少棵树吗?更多的树会提供更高的精度,但会增加索引的大小。通常,不需要超过50棵树来获得最高精度。 - en: NGT + id: totrans-198 prefs: - PREF_H2 type: TYPE_NORMAL + zh: NGT - en: Yahoo Japan’s Neighborhood Graph and Tree (NGT) library currently leads most benchmarks and is best suited for large datasets (in millions of items) with large dimensions (in several thousands). Although the library has existed since 2016, @@ -963,12 +1342,16 @@ Neighbor Graph for proximity). Considering multiple threads might be running NGT on a server, it can place the index in shared memory with the help of memory mapped files, helping to reduce memory usage as well as increase load time. + id: totrans-199 prefs: [] type: TYPE_NORMAL + zh: 雅虎日本的邻域图和树(NGT)库目前在大多数基准测试中处于领先地位,最适合具有大维度(数千个)的大型数据集(数百万个项目)。尽管该库自2016年以来就存在,但真正进入行业基准测试场景的时间是在2018年,当时实现了ONNG算法(*k*最近邻图索引优化),考虑到可能有多个线程在服务器上运行NGT,它可以将索引放在共享内存中,借助内存映射文件来帮助减少内存使用量以及增加加载时间。 - en: Faiss + id: totrans-200 prefs: - PREF_H2 type: TYPE_NORMAL + zh: Faiss - en: Faiss is Facebook’s efficient similarity search library. It can scale to billions of vectors in RAM on a single server by storing a compressed representation of the vectors (compact quantization codes) instead of the original values. It’s @@ -978,105 +1361,151 @@ accuracy, memory usage, and indexing time. It’s one of the fastest known implementations of ANN search on GPU. Hey, if it’s good enough for Facebook, it’s good enough for most of us (as long as we have enough data). + id: totrans-201 prefs: [] type: TYPE_NORMAL + zh: Faiss是Facebook的高效相似性搜索库。通过存储向量的压缩表示(紧凑的量化代码)而不是原始值,它可以在单个服务器上扩展到数十亿个向量的RAM。它特别适用于密集向量。通过在GPU内存(VRAM)上存储索引,特别适用于具有GPU的机器。这适用于单GPU和多GPU设置。它提供了根据搜索时间、准确性、内存使用和索引时间配置性能的能力。它是已知的在GPU上最快的ANN搜索实现之一。嘿,如果对Facebook来说足够好,那对我们大多数人来说也足够好(只要我们有足够的数据)。 - en: While showing the entire process is beyond the scope of this book, we recommend installing Faiss using Anaconda or using its Docker containers to quickly get started. + id: totrans-202 prefs: [] type: TYPE_NORMAL + zh: 虽然展示整个过程超出了本书的范围,但我们建议使用Anaconda安装Faiss或使用其Docker容器快速入门。 - en: Improving Accuracy with Fine Tuning + id: totrans-203 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 通过微调提高准确性 - en: Many of the pretrained models were trained on the ImageNet dataset. Therefore, they provide an incredible starting point for similarity computations in most situations. That said, if you tuned these models to adapt to your specific problem, they would perform even more accurately at finding similar images. 
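Although the chapter keeps Faiss installation and usage out of scope, a very small CPU-only sketch of the Faiss API mentioned above might look like this (assuming the `faiss-cpu` package and the synthetic `dataset`, `query`, and `num_dimensions` from before):

```python
import numpy as np
import faiss

# Faiss expects float32; IndexFlatL2 is the exact (brute-force) L2 index, a
# common starting point before moving to compressed or approximate indexes.
index = faiss.IndexFlatL2(num_dimensions)
index.add(dataset.astype(np.float32))

distances, indices = index.search(query.reshape(1, -1).astype(np.float32), 5)
```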
+ id: totrans-204 prefs: [] type: TYPE_NORMAL + zh: 许多预训练模型是在ImageNet数据集上训练的。因此,在大多数情况下,它们为相似性计算提供了一个令人难以置信的起点。也就是说,如果你调整这些模型以适应你的特定问题,它们将更准确地找到相似的图像。 - en: In this portion of the chapter, we identify the worst-performing categories, visualize them with t-SNE, fine tune, and then see how their t-SNE graph changes. + id: totrans-205 prefs: [] type: TYPE_NORMAL + zh: 在本章的这一部分,我们识别了表现最差的类别,用t-SNE进行可视化,微调,然后看看它们的t-SNE图表如何变化。 - en: What is a good metric to check whether you are indeed getting similar images? + id: totrans-206 prefs: [] type: TYPE_NORMAL + zh: 什么是一个好的度量标准,用来检查是否确实获得了相似的图像? - en: Painful option 1 + id: totrans-207 prefs: [] type: TYPE_NORMAL + zh: 痛苦的选择1 - en: Go through the entire dataset one image at a time, and manually score whether the returned images indeed look similar. + id: totrans-208 prefs: [] type: TYPE_NORMAL + zh: 逐个图像浏览整个数据集,并手动评分返回的图像是否确实看起来相似。 - en: Happier option 2 + id: totrans-209 prefs: [] type: TYPE_NORMAL + zh: 更快乐的选择2 - en: Simply calculate accuracy. That is, for an image belonging to category *X*, are the similar images belonging to the same category? We will refer to this similarity accuracy. + id: totrans-210 prefs: [] type: TYPE_NORMAL + zh: 简单地计算准确性。也就是说,对于属于类别*X*的图像,相似的图像是否属于相同的类别?我们将称之为相似性准确性。 - en: 'So, what are our worst-performing categories? And why are they the worst? To answer this, we have predefined a helper function `worst_classes`. For every image in the dataset, it finds the nearest neighbors using the brute-force algorithm and then returns six classes with the least accuracy. To see the effects of fine tuning, we run our analysis on a more difficult dataset: Caltech-256\. Calling this function unveils the least-accurate classes:' + id: totrans-211 prefs: [] type: TYPE_NORMAL + zh: 那么,我们表现最差的类别是什么?为什么它们表现最差?为了回答这个问题,我们预先定义了一个辅助函数`worst_classes`。对于数据集中的每个图像,它使用蛮力算法找到最近邻,然后返回准确性最低的六个类别。为了查看微调的效果,我们在一个更困难的数据集上运行我们的分析:Caltech-256。调用这个函数揭示了准确性最低的类别: - en: '[PRE34]' + id: totrans-212 prefs: [] type: TYPE_PRE + zh: '[PRE34]' - en: '[PRE35]' + id: totrans-213 prefs: [] type: TYPE_PRE + zh: '[PRE35]' - en: To see why they are performing so poorly on certain classes, we’ve plotted a t-SNE graph to visualize the embeddings in 2D space, which you can see in [Figure 4-16](part0006.html#t-sne_visualization_of_feature_vectors_o). To prevent overcrowding on our plot, we use only 50 items from each of the 6 classes. + id: totrans-214 prefs: [] type: TYPE_NORMAL + zh: 为了了解它们在某些类别上表现如此糟糕的原因,我们绘制了一个t-SNE图表,以在2D空间中可视化嵌入,你可以在图4-16中看到。为了防止图表过度拥挤,我们只使用了每个6个类别中的50个项目。 - en: Tip + id: totrans-215 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 提示 - en: To enhance the visibility of the graph we can define different markers and different colors for each class. Matplotlib provides a wide variety of [markers](https://oreil.ly/cnoiE) and [colors](https://oreil.ly/Jox4B). + id: totrans-216 prefs: [] type: TYPE_NORMAL + zh: 为了增强图表的可见性,我们可以为每个类别定义不同的标记和不同的颜色。Matplotlib提供了各种[标记](https://oreil.ly/cnoiE)和[颜色](https://oreil.ly/Jox4B)。 - en: '[PRE36]' + id: totrans-217 prefs: [] type: TYPE_PRE + zh: '[PRE36]' - en: '![t-SNE visualization of feature vectors of least accurate classes before fine-tuning](../images/00117.jpeg)' + id: totrans-218 prefs: [] type: TYPE_IMG + zh: '![在微调之前对最不准确类别的特征向量进行t-SNE可视化](../images/00117.jpeg)' - en: Figure 4-16\. t-SNE visualization of feature vectors of least-accurate classes before fine tuning + id: totrans-219 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图4-16. 
在微调之前对最不准确类别的特征向量进行t-SNE可视化 - en: Aah, these feature vectors are all over the place and on top of one another. Using these feature vectors in other applications such as classification might not be a good idea because it would be difficult to find a clean plane of separation between them. No wonder they performed so poorly in this nearest neighbor–based classification test. + id: totrans-220 prefs: [] type: TYPE_NORMAL + zh: 啊,这些特征向量到处都是,互相重叠。在其他应用程序中使用这些特征向量,如分类,可能不是一个好主意,因为很难找到它们之间的清晰分隔平面。难怪它们在这个基于最近邻的分类测试中表现如此糟糕。 - en: What do you think will be the result if we repeat these steps with the fine-tuned model? We reckon something interesting; let’s take a look at [Figure 4-17](part0006.html#t-sne_visualization_of_feature_v-id00001) to see. + id: totrans-221 prefs: [] type: TYPE_NORMAL + zh: 如果我们使用微调模型重复这些步骤,你认为结果会是什么?我们认为会有一些有趣的事情;让我们看看图4-17,看看。 - en: '![t-SNE visualization of feature vectors of least accurate classes after fine tuning](../images/00078.jpeg)' + id: totrans-222 prefs: [] type: TYPE_IMG + zh: '![在微调后对最不准确类别的特征向量进行t-SNE可视化](../images/00078.jpeg)' - en: Figure 4-17\. t-SNE visualization of feature vectors of least-accurate classes after fine tuning + id: totrans-223 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图4-17. 在微调后对最不准确类别的特征向量进行t-SNE可视化 - en: This is so much cleaner. With just a little bit of fine tuning as shown in [Chapter 3](part0005.html#4OIQ3-13fa565533764549a6f0ab7f11eed62b), the embeddings begin to group together. Compare the noisy/scattered embeddings of the pretrained models against those of the fine-tuned model. A machine learning @@ -1085,109 +1514,155 @@ similar images when not using a classifier. And, remember, these were the classes with the highest misclassifications; imagine how nicely the classes with originally higher accuracy would be after fine tuning. + id: totrans-224 prefs: [] type: TYPE_NORMAL + zh: 这样就清爽多了。就像在[第3章](part0005.html#4OIQ3-13fa565533764549a6f0ab7f11eed62b)中展示的那样,只需轻微微调,嵌入就开始聚集在一起。将预训练模型的嵌入与微调模型的嵌入进行比较。机器学习分类器将能够更轻松地在这些类别之间找到一个分隔平面,从而提高分类准确性,以及在不使用分类器时更相似的图像。请记住,这些是最高误分类的类别;想象一下在微调后原本准确率更高的类别会有多么好。 - en: Previously, the pretrained embeddings achieved 56% accuracy. The new embeddings after fine tuning deliver a whopping 87% accuracy! A little magic goes a long way. + id: totrans-225 prefs: [] type: TYPE_NORMAL + zh: 以前,预训练的嵌入实现了56%的准确率。微调后的新嵌入提供了惊人的87%准确率!一点点魔法就能产生巨大的影响。 - en: The one limitation for fine tuning is the requirement of labeled data, which is not always present. So depending on your use case, you might need to label some amount of data. + id: totrans-226 prefs: [] type: TYPE_NORMAL + zh: 微调的一个限制是需要有标记的数据,这并不总是存在。因此,根据您的用例,您可能需要标记一些数据。 - en: There’s a small unconventional training trick involved, though, which we discuss in the next section. + id: totrans-227 prefs: [] type: TYPE_NORMAL + zh: 不过,还有一个小小的非传统训练技巧,我们将在下一节中讨论。 - en: Fine Tuning Without Fully Connected Layers + id: totrans-228 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 没有全连接层的微调 - en: 'As we already know, a neural network comprises three parts:' + id: totrans-229 prefs: [] type: TYPE_NORMAL + zh: 正如我们已经知道的,神经网络由三部分组成: - en: Convolutional layers, which end up generating the feature vectors + id: totrans-230 prefs: - PREF_UL type: TYPE_NORMAL + zh: 最终生成特征向量的卷积层 - en: Fully connected layers + id: totrans-231 prefs: - PREF_UL type: TYPE_NORMAL + zh: 全连接层 - en: The final classifier layer + id: totrans-232 prefs: - PREF_UL type: TYPE_NORMAL + zh: 最终的分类器层 - en: 'Fine tuning, as the name suggests, involves tweaking a neural network lightly to adapt to a new dataset. 
It usually involves stripping off the fully connected layers (top layers), substituting them with new ones, and then training this newly composed neural network using this dataset. Training in this manner will cause two things:' + id: totrans-233 prefs: [] type: TYPE_NORMAL + zh: 微调,顾名思义,涉及轻微调整神经网络以适应新的数据集。通常涉及剥离全连接层(顶层),用新的层替换它们,然后使用这个数据集训练这个新组合的神经网络。以这种方式训练会导致两件事情: - en: The weights in all the newly added fully connected layers will be significantly affected. + id: totrans-234 prefs: - PREF_UL type: TYPE_NORMAL + zh: 所有新添加的全连接层中的权重将受到显著影响。 - en: The weights in the convolutional layers will be only slightly changed. + id: totrans-235 prefs: - PREF_UL type: TYPE_NORMAL + zh: 卷积层中的权重只会略微改变。 - en: The fully connected layers do a lot of the heavy lifting to get maximum classification accuracy. As a result, the majority of the network that generates the feature vectors will change insignificantly. Thus, the feature vectors, despite fine tuning, will show little change. + id: totrans-236 prefs: [] type: TYPE_NORMAL + zh: 全连接层在获得最大分类准确率方面起着很大作用。因此,生成特征向量的网络的大部分部分将变化微不足道。因此,尽管微调,特征向量将显示很少的变化。 - en: Our aim is for similar-looking objects to have closer feature vectors, which fine tuning as described earlier fails to accomplish. By forcing all of the task-specific learning to happen in the convolutional layers, we can see much better results. How do we achieve that? *By removing all of the fully connected layers and placing a classifier layer directly after the convolutional layers (which generate the feature vectors).* This model is optimized for similarity search rather than classification. + id: totrans-237 prefs: [] type: TYPE_NORMAL + zh: 我们的目标是让外观相似的对象具有更接近的特征向量,而之前描述的微调未能实现这一目标。通过强制所有特定任务的学习发生在卷积层中,我们可以看到更好的结果。我们如何实现这一点呢?*通过移除所有全连接层,并在卷积层之后直接放置一个分类器层(生成特征向量的卷积层)。*这个模型是为相似性搜索而优化的,而不是为分类。 - en: 'To compare the process of fine tuning a model optimized for classification tasks as opposed to similarity search, let’s recall how we fine tuned our model in [Chapter 3](part0005.html#4OIQ3-13fa565533764549a6f0ab7f11eed62b) for classification:' + id: totrans-238 prefs: [] type: TYPE_NORMAL + zh: 为了比较微调模型优化的过程,用于分类任务与相似性搜索,让我们回顾一下我们在[第3章](part0005.html#4OIQ3-13fa565533764549a6f0ab7f11eed62b)中如何微调我们的模型以进行分类: - en: '[PRE37]' + id: totrans-239 prefs: [] type: TYPE_PRE + zh: '[PRE37]' - en: 'And here’s how we fine tune our model for similarity search. Note the missing hidden dense layer in the middle:' + id: totrans-240 prefs: [] type: TYPE_NORMAL + zh: 以下是我们如何为相似性搜索微调我们的模型。请注意中间缺少的隐藏密集层: - en: '[PRE38]' + id: totrans-241 prefs: [] type: TYPE_PRE + zh: '[PRE38]' - en: 'After fine tuning, to use the `model_similarity_optimized` for extracting features instead of giving probabilities for classes, simply `pop` (i.e., remove) the last layer:' + id: totrans-242 prefs: [] type: TYPE_NORMAL + zh: 在微调之后,为了使用`model_similarity_optimized`来提取特征而不是为类别给出概率,只需`pop`(即移除)最后一层: - en: '[PRE39]' + id: totrans-243 prefs: [] type: TYPE_PRE + zh: '[PRE39]' - en: The key thing to appreciate here is if you used the regular fine-tuning process, we would get lower similarity accuracy than `model_similarity_optimized`. Obviously, we would want to use `model_classification_optimized` for classification scenarios and `model_similarity_optimized` for extracting embeddings for similarity search. 
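As a reference point, the following is a minimal sketch of the similarity-optimized variant (assumptions: a MobileNet base, 224 x 224 inputs, and Caltech-256's 256 classes; the book's [PRE37] through [PRE39] listings may differ in their details). The classifier sits directly on top of the pooled convolutional features, with no hidden dense layer in between, and after fine tuning the final layer is popped so the model emits embeddings.

```python
# Illustrative sketch, not the book's exact listing. The classifier is placed
# directly on the pooled convolutional features, so fine tuning pushes the
# task-specific learning into the convolutional layers.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 256  # assumed: Caltech-256

base = tf.keras.applications.MobileNet(include_top=False,
                                       input_shape=(224, 224, 3),
                                       pooling='avg')  # conv layers -> feature vector

model_similarity_optimized = models.Sequential([
    base,
    layers.Dense(NUM_CLASSES, activation='softmax')  # no hidden dense layer
])
model_similarity_optimized.compile(optimizer='adam',
                                   loss='categorical_crossentropy',
                                   metrics=['accuracy'])

# ... fine tune on the labeled target dataset here ...

# After fine tuning, drop the classifier so the model outputs embeddings:
model_similarity_optimized.pop()
# model_similarity_optimized.predict(images) now returns feature vectors.
```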
+ id: totrans-244 prefs: [] type: TYPE_NORMAL + zh: 这里需要注意的关键一点是,如果您使用常规的微调过程,我们将获得比`model_similarity_optimized`更低的相似性准确率。显然,我们希望在分类场景下使用`model_classification_optimized`,在提取嵌入以进行相似性搜索时使用`model_similarity_optimized`。 - en: With all this knowledge, you can now make both a fast and accurate similarity system for any scenario you are working on. It’s time to see how the giants in the AI industry build their products. + id: totrans-245 prefs: [] type: TYPE_NORMAL + zh: 有了所有这些知识,现在你可以为你正在处理的任何场景制作一个快速准确的相似性系统。是时候看看人工智能行业的巨头是如何构建他们的产品的了。 - en: Siamese Networks for One-Shot Face Verification + id: totrans-246 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 用于一次性人脸验证的连体网络 - en: A face verification system is usually trying to ascertain—given two images of faces—whether the two images are of the same person. This is a high-precision binary classifier that needs to robustly work with different lighting, clothing, @@ -1196,22 +1671,30 @@ there might be only a handful of images of the same person available. Similarly, signature identification in banks and product identification on Amazon suffer the same challenge of limited images per item. + id: totrans-247 prefs: [] type: TYPE_NORMAL + zh: 人脸验证系统通常试图确定——给定两张人脸图像——这两张图像是否属于同一个人。这是一个高精度的二元分类器,需要能够稳健地处理不同的光照、服装、发型、背景和面部表情。为了增加挑战,尽管在员工数据库中可能有许多人的图像,但可能只有少数同一个人的图像可用。同样,在银行的签名识别和亚马逊的产品识别中也存在相同的限制每个项目图像数量的挑战。 - en: 'How would you go about training such a classifier? Picking embeddings from a model like ResNet pretrained on ImageNet might not discern these fine facial attributes. One approach is to put each person as a separate class and then train like we usually train a regular network. Two key issues arise:' + id: totrans-248 prefs: [] type: TYPE_NORMAL + zh: 你会如何训练这样的分类器?从ImageNet预训练的ResNet模型中挑选嵌入可能无法区分这些细微的面部特征。一种方法是将每个人作为一个单独的类别,然后像通常训练常规网络一样进行训练。出现了两个关键问题: - en: If we had a million individuals, training for a million categories is not feasible. + id: totrans-249 prefs: - PREF_UL type: TYPE_NORMAL + zh: 如果我们有一百万个个体,训练一百万个类别是不可行的。 - en: Training with a few images per class will lead to overtraining. + id: totrans-250 prefs: - PREF_UL type: TYPE_NORMAL + zh: 每类图像训练数量较少会导致过度训练。 - en: 'Another thought: instead of teaching different categories, we could teach a network to directly compare and decide whether a pair of images are similar or dissimilar by giving guidance on their similarity during training. And this is @@ -1222,17 +1705,23 @@ the network end to end, the embeddings begin to capture the fine-grained representation of the inputs. This approach, shown in [Figure 4-18](part0006.html#a_siamese_network_for_signature_verifica), of directly optimizing for the distance metric is called *metric learning*.' + id: totrans-251 prefs: [] type: TYPE_NORMAL + zh: 另一个想法:我们可以教授网络直接比较并决定一对图像是否相似或不相似,通过在训练期间对它们的相似性提供指导。这就是连体网络背后的关键思想。拿一个模型,输入两张图像,提取两个嵌入,然后计算两个嵌入之间的距离。如果距离低于阈值,则认为它们相似,否则不相似。通过输入一对图像和相关标签,相似或不相似,并将网络端到端地训练,嵌入开始捕捉输入的细粒度表示。这种直接优化距离度量的方法,如[图4-18](part0006.html#a_siamese_network_for_signature_verifica)所示,被称为*度量学习*。 - en: '![A Siamese network for signature verification; note the same CNN was used for both input images](../images/00058.jpeg)' + id: totrans-252 prefs: [] type: TYPE_IMG + zh: '![用于签名验证的连体网络;请注意相同的CNN用于两个输入图像](../images/00058.jpeg)' - en: Figure 4-18\. A Siamese network for signature verification; note that the same CNN was used for both input images + id: totrans-253 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图4-18。用于签名验证的连体网络;请注意相同的CNN用于两个输入图像 - en: We could extend this idea and even feed three images. 
Pick one anchor image, pick another positive sample (of the same category), and another negative sample (of a different category). Let’s now train this network to directly optimize for @@ -1241,28 +1730,39 @@ a *triplet loss* function. In the previous case with a pair of images, the loss function is called a *contrastive loss* function. The triplet loss function tends to give better results. + id: totrans-254 prefs: [] type: TYPE_NORMAL + zh: 我们可以扩展这个想法,甚至输入三张图像。选择一个锚定图像,选择另一个正样本(相同类别),和另一个负样本(不同类别)。现在让我们训练这个网络,直接优化相似项之间的距离最小化,不相似项之间的距离最大化。帮助我们实现这一目标的损失函数称为*三元损失*函数。在前面的一对图像的情况下,损失函数称为*对比损失*函数。三元损失函数往往会产生更好的结果。 - en: After the network is trained, we need only one reference image of a face for deciding at test time whether the person is the same. This methodology opens the doors for *one-shot learning*. Other common uses include signature and logo recognition. One remarkably creative application by Saket Maheshwary and Hemant Misra is to use a Siamese network for matching résumés with job applicants by calculating the semantic similarity between the two. + id: totrans-255 prefs: [] type: TYPE_NORMAL + zh: 网络训练完成后,在测试时我们只需要一张脸部的参考图像来判断这个人是否相同。这种方法为*一次性学习*打开了大门。其他常见用途包括签名和标志识别。Saket + Maheshwary和Hemant Misra提出了一个非常有创意的应用,即使用连体网络通过计算两者之间的语义相似性来匹配简历和求职者。 - en: Case Studies + id: totrans-256 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 案例研究 - en: Let’s look at a few interesting examples that show how what we have learned so far is applied in the industry. + id: totrans-257 prefs: [] type: TYPE_NORMAL + zh: 让我们看一些有趣的例子,展示我们迄今所学知识在工业中的应用。 - en: Flickr + id: totrans-258 prefs: - PREF_H2 type: TYPE_NORMAL + zh: Flickr - en: Flickr is one of the largest photo-sharing websites, especially popular among professional photographers. To help photographers find inspiration as well as showcase content the users might find interesting, Flickr produced a similarity @@ -1270,16 +1770,22 @@ exploring a desert pattern leads to several similarly patterned results. Under the hood, Flickr adopted an ANN algorithm called Locally Optimized Product Quantization (LOPQ), which has been open sourced in Python as well as Spark implementations. + id: totrans-259 prefs: [] type: TYPE_NORMAL + zh: Flickr是最大的照片分享网站之一,尤其受到专业摄影师的欢迎。为了帮助摄影师找到灵感并展示用户可能感兴趣的内容,Flickr推出了一个基于相同语义意义的相似性搜索功能。正如在[图4-19](part0006.html#similar_patterns_of_a_desert_photo_left)中所示,探索沙漠图案会导致几个类似图案的结果。在幕后,Flickr采用了一种名为局部优化产品量化(LOPQ)的ANN算法,该算法已在Python和Spark实现中开源。 - en: '![Similar patterns of a desert photo (image source: code.flickr.com]](../images/00184.jpeg)' + id: totrans-260 prefs: [] type: TYPE_IMG + zh: '![沙漠照片的相似图案(图片来源:code.flickr.com)](../images/00184.jpeg)' - en: Figure 4-19\. Similar patterns of a desert photo ([image source](https://code.flickr.com)) + id: totrans-261 prefs: - PREF_H6 type: TYPE_NORMAL - en: Pinterest + id: totrans-262 prefs: - PREF_H2 type: TYPE_NORMAL @@ -1288,6 +1794,7 @@ companies like Baidu and Alibaba have launched similar visual search systems. Also, Zappos, Google Shopping, and [like.com](http://like.com) are using computer vision for recommendation. + id: totrans-263 prefs: [] type: TYPE_NORMAL - en: Within Pinterest “women’s fashion” is one of the most popular themes of pins @@ -1300,18 +1807,22 @@ implements an incremental fingerprinting service that generates new digital signatures if either a new image is uploaded or if there is feature evolution (due to improvements or modifications in the underlying models by the engineers). 
+ id: totrans-264 prefs: [] type: TYPE_NORMAL - en: '![The Similar Looks feature of the Pinterest application (image source: Pinterest blog)](../images/00080.jpeg)' + id: totrans-265 prefs: [] type: TYPE_IMG - en: 'Figure 4-20\. The Similar Looks feature of the Pinterest application (image source: Pinterest blog)' + id: totrans-266 prefs: - PREF_H6 type: TYPE_NORMAL - en: Celebrity Doppelgangers + id: totrans-267 prefs: - PREF_H2 type: TYPE_NORMAL @@ -1320,18 +1831,22 @@ A similar viral approach was taken by the Google Arts & Culture app in 2018, which shows the nearest existing portrait to your face. Twins or not is another application with a similar aim. + id: totrans-268 prefs: [] type: TYPE_NORMAL - en: '![Testing our friend Pete Warden’s photo (technical lead for mobile and embedded TensorFlow at Google) on the celebslike.me website](../images/00062.jpeg)' + id: totrans-269 prefs: [] type: TYPE_IMG - en: Figure 4-21\. Testing our friend Pete Warden’s photo (technical lead for mobile and embedded TensorFlow at Google) on the celebslike.me website + id: totrans-270 prefs: - PREF_H6 type: TYPE_NORMAL - en: Spotify + id: totrans-271 prefs: - PREF_H2 type: TYPE_NORMAL @@ -1352,21 +1867,25 @@ shows artists whose songs are projected in specific areas. We can discern hip-hop (upper left), rock (upper right), pop (lower left), and electronic music (lower right). As already discussed, Spotify uses Annoy in the background. + id: totrans-272 prefs: [] type: TYPE_NORMAL - en: '![t-SNE visualization of the distribution of predicted usage patterns, using latent factors predicted from audio (image source: Deep content-based music recommendation by Aaron van den Oord, Sander Dieleman, Benjamin Schrauwen)](../images/00021.jpeg)' + id: totrans-273 prefs: [] type: TYPE_IMG - en: 'Figure 4-22\. t-SNE visualization of the distribution of predicted usage patterns, using latent factors predicted from audio (image source: “Deep content-based music recommendation” by Aaron van den Oord, Sander Dieleman, Benjamin Schrauwen, NIPS 2013)' + id: totrans-274 prefs: - PREF_H6 type: TYPE_NORMAL - en: Image Captioning + id: totrans-275 prefs: - PREF_H2 type: TYPE_NORMAL @@ -1379,14 +1898,17 @@ visual question-answer pairs, and object segmentations. It serves as a benchmark for a yearly competition to see progress in image captioning, object detection, and segmentation. + id: totrans-276 prefs: [] type: TYPE_NORMAL - en: '![Image captioning feature in Seeing AI: the Talking Camera App for the blind community](../images/00309.jpeg)' + id: totrans-277 prefs: [] type: TYPE_IMG - en: 'Figure 4-23\. Image captioning feature in Seeing AI: the Talking Camera App for the blind community' + id: totrans-278 prefs: - PREF_H6 type: TYPE_NORMAL @@ -1401,6 +1923,7 @@ similar images, and print the caption containing the most common words. In short, a lazy approach would still beat the state-of-the-art one, and this exposed a critical bias in the dataset. + id: totrans-279 prefs: [] type: TYPE_NORMAL - en: 'This bias has been coined the *Giraffe-Tree* problem by Larry Zitnick. Do an @@ -1413,23 +1936,28 @@ of the image, one would arrive at the correct caption using a simple nearest-neighbor search. This shows that to measure the real intelligence of a system, we need more semantically novel/original images in the test set.' + id: totrans-280 prefs: [] type: TYPE_NORMAL - en: '![The Giraffe-Tree problem (image source: Measuring Machine Intelligence Through Visual Question Answering, C. 
Lawrence Zitnick, Aishwarya Agrawal, Stanislaw Antol, Margaret Mitchell, Dhruv Batra, Devi Parikh)](../images/00268.jpeg)' + id: totrans-281 prefs: [] type: TYPE_IMG - en: 'Figure 4-24\. The Giraffe-Tree problem (image source: Measuring Machine Intelligence Through Visual Question Answering, C. Lawrence Zitnick, Aishwarya Agrawal, Stanislaw Antol, Margaret Mitchell, Dhruv Batra, Devi Parikh)' + id: totrans-282 prefs: - PREF_H6 type: TYPE_NORMAL - en: In short, don’t underestimate a simple nearest-neighbor approach! + id: totrans-283 prefs: [] type: TYPE_NORMAL - en: Summary + id: totrans-284 prefs: - PREF_H1 type: TYPE_NORMAL @@ -1443,5 +1971,7 @@ to do one-shot learning, such as for face verification systems. We finally examined how nearest-neighbor approaches are used in various use cases across the industry. Nearest neighbors are a simple yet powerful tool to have in your toolkit. + id: totrans-285 prefs: [] type: TYPE_NORMAL + zh: 现在我们已经完成了一次成功的探险,我们在其中探索了如何利用嵌入来定位相似的图像。我们通过探索如何利用ANN算法和库(包括Annoy、NGT和Faiss)将搜索从几千个文档扩展到几十亿个文档。我们还了解到,通过对模型进行微调,可以提高在监督设置中嵌入的准确性和代表性能力。最后,我们看了如何使用Siamese网络,利用嵌入的力量进行一次性学习,比如用于人脸验证系统。最后,我们研究了在行业中各种用例中如何使用最近邻方法。最近邻是您工具包中简单但强大的工具。