From 7d91600b6bed4b53ffb16aea6928ef9250ade195 Mon Sep 17 00:00:00 2001 From: wizardforcel <562826179@qq.com> Date: Thu, 8 Feb 2024 19:24:21 +0800 Subject: [PATCH] 2024-02-08 19:24:19 --- totrans/prac-dl-cld_01.yaml | 588 ++++++++++++++++++++++++++++++++++++ 1 file changed, 588 insertions(+) diff --git a/totrans/prac-dl-cld_01.yaml b/totrans/prac-dl-cld_01.yaml index e5d502f..8a4d684 100644 --- a/totrans/prac-dl-cld_01.yaml +++ b/totrans/prac-dl-cld_01.yaml @@ -1,12 +1,16 @@ - en: Chapter 1\. Exploring the Landscape of Artificial Intelligence + id: totrans-0 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 第1章 探索人工智能的领域 - en: 'Following are the words from Dr. May Carson’s ([Figure 1-1](part0003.html#drdot_may_carson)) seminal paper on the changing role of artificial intelligence (AI) in human life in the twenty-first century:' + id: totrans-1 prefs: [] type: TYPE_NORMAL + zh: 以下是Dr. May Carson([图1-1](part0003.html#drdot_may_carson))关于人工智能(AI)在21世纪人类生活中角色变化的重要论文中的话: - en: Artificial Intelligence has often been termed as the electricity of the 21st century. Today, artificial intelligent programs will have the power to drive all forms of industry (including health), design medical devices and build new types @@ -15,10 +19,13 @@ can do their job and, importantly, avoid mistakes or dangerous accidents. Organizations need AI, but they also recognize that not everything they can do with AI is a good idea. + id: totrans-2 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 人工智能经常被称为21世纪的电力。今天,人工智能程序将有能力推动各种行业(包括医疗),设计医疗设备并构建新型产品和服务,包括机器人和汽车。随着AI的发展,组织已经在努力确保这些人工智能程序能够胜任工作,并且重要的是,避免错误或危险事故。组织需要AI,但他们也意识到并非所有可以用AI做的事情都是一个好主意。 - en: '' + id: totrans-3 prefs: - PREF_BQ type: TYPE_NORMAL @@ -27,10 +34,13 @@ money spent on AI programs per person, per year versus the amount used to research, build and produce them is roughly equal. That seems obvious, but it’s not entirely true. + id: totrans-4 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 我们对使用这些技术和政策操作人工智能所需的资金进行了广泛研究。主要结论是,每年每人在AI项目上花费的资金与用于研究、构建和生产这些项目的资金大致相等。这似乎是显而易见的,但并非完全如此。 - en: '' + id: totrans-5 prefs: - PREF_BQ type: TYPE_NORMAL @@ -42,20 +52,28 @@ complex than humans. For example, people will most often work in jobs requiring advanced knowledge but are not necessarily skilled in working with systems that need to be built and maintained. + id: totrans-6 prefs: - PREF_BQ type: TYPE_NORMAL + zh: 首先,AI系统需要支持和维护来帮助它们的功能。为了真正可靠,他们需要人们具备运行它们和帮助他们执行一些任务的技能。AI组织提供工作人员来执行这些服务所需的复杂任务是至关重要的。了解从事这些工作的人员也很重要,尤其是当AI比人类更复杂时。例如,人们通常会从事需要高级知识的工作,但不一定擅长处理需要构建和维护的系统。 - en: '![Dr. May Carson](../images/00247.jpeg)' + id: totrans-7 prefs: [] type: TYPE_IMG + zh: '![Dr. May Carson](../images/00247.jpeg)' - en: Figure 1-1\. Dr. May Carson + id: totrans-8 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图1-1. Dr. May Carson - en: An Apology + id: totrans-9 prefs: - PREF_H1 type: TYPE_NORMAL + zh: 致歉 - en: We now have to come clean and admit that everything in this chapter up to now was entirely fake. Literally everything! All of the text (other than the first italicized sentence, which was written by us as a seed) was generated using the @@ -65,36 +83,49 @@ be real, right? Nope, the picture was generated from the website *[ThisPersonDoesNotExist.com](http://ThisPersonDoesNotExist.com)* which shows us new pictures of nonexistent people each time we reload the page using the magic of Generative Adversarial Networks (GANs). 
+ id: totrans-10
  prefs: []
  type: TYPE_NORMAL
+ zh: 现在我们必须坦白承认,本章直到现在为止的所有内容都是完全虚假的。真的是一切!所有的文本(除了第一个斜体句子,这是我们写的种子)都是使用GPT-2模型(由Adam
    King构建)在网站*[TalkToTransformer.com](http://TalkToTransformer.com)*上生成的。作者的名字是使用网站*[Onitools.moe](http://Onitools.moe)*上的“Nado
    Name Generator”生成的。至少作者的照片必须是真实的,对吧?不,这张照片是从网站*[ThisPersonDoesNotExist.com](http://ThisPersonDoesNotExist.com)*生成的,该网站利用生成对抗网络(GANs)的魔法每次重新加载页面时向我们展示不存在的人的新图片。
- en: Although we feel ambivalent, to say the least, about starting this entire book
    on a dishonest note, we thought it was important to showcase the state-of-the-art
    of AI when you, our reader, least expected it. It is, frankly, mind-boggling and
    amazing and terrifying at the same time to see what AI is already capable of.
    The fact that it can create sentences out of thin air that are more intelligent
    and eloquent than some world leaders is...let’s just say big league.
+ id: totrans-11
  prefs: []
  type: TYPE_NORMAL
+ zh: 尽管我们对以不诚实的方式开始整本书感到矛盾(至少可以这么说),但我们认为,在您(我们的读者)最意想不到的时候展示AI的最新水平是很重要的。坦率地说,看到AI已经具备的能力,既令人难以置信,又令人惊叹和恐惧。它能凭空创造出比某些世界领导人更聪明、更雄辩的句子……对此我们只能说,这是“大联盟”级别的。
- en: That being said, one thing AI can’t appropriate from us just yet is the ability
    to be fun. We’re hoping that those first three fake paragraphs will be the driest
    in this entire book. After all, we don’t want to be known as “the authors more
    boring than a machine.”
+ id: totrans-12
  prefs: []
  type: TYPE_NORMAL
+ zh: 话虽如此,AI目前还无法从我们这里夺走的一样东西,就是风趣的能力。我们希望本书开头这三段虚假的段落会是整本书中最枯燥的部分。毕竟,我们可不想被称为“比机器还无聊的作者”。
- en: The Real Introduction
+ id: totrans-13
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: 真正的介绍
- en: Recall that time you saw a magic show during which a trick dazzled you enough
    to think, “How the heck did they do that?!” Have you ever wondered the same about
    an AI application that made the news? In this book, we want to equip you with
    the knowledge and tools to not only deconstruct but also build a similar one.
+ id: totrans-14
  prefs: []
  type: TYPE_NORMAL
+ zh: 还记得你看过的某场魔术表演吗?其中一个把戏让你眼花缭乱,让你不禁想:“他们到底是怎么做到的?!”你是否对登上新闻的某个AI应用产生过同样的疑问?在这本书中,我们希望为您提供知识和工具,不仅可以解构,还可以构建类似的应用程序。
- en: 'Through accessible, step-by-step explanations, we dissect real-world applications
    that use AI and showcase how you would go about creating them on a wide variety
    of platforms—from the cloud to the browser to smartphones to edge AI devices,
    and finally landing on the ultimate challenge currently in AI: autonomous cars.'
+ id: totrans-15
  prefs: []
  type: TYPE_NORMAL
- en: In most chapters, we begin with a motivating problem and then build an end-to-end
  word “Practical” in the book title. To that effect, we discuss various options
  that are available to us and choose the appropriate options based on performance,
  energy consumption, scalability, reliability, and privacy trade-offs.
+ id: totrans-16
  prefs: []
  type: TYPE_NORMAL
- en: In this first chapter, we take a step back to appreciate this moment in AI history.
  areas of technological progress in the early twenty-first century. We also examine
  the core components underlying a complete deep learning solution, to set us up
  for the subsequent chapters in which we actually get our hands dirty.
+ id: totrans-17
  prefs: []
  type: TYPE_NORMAL
- en: So our journey begins here, with a very fundamental question.
+ id: totrans-18
  prefs: []
  type: TYPE_NORMAL
- en: What Is AI?
+ id: totrans-19
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
  learning,” and “deep learning” frequently, sometimes interchangeably. But in
  the strictest technical terms, they mean different things.
Here’s a synopsis of each (see also [Figure 1-2](part0003.html#the_relationship_between_aicomma_machine)):' + id: totrans-20 prefs: [] type: TYPE_NORMAL - en: AI + id: totrans-21 prefs: [] type: TYPE_NORMAL - en: This gives machines the capabilities to mimic human behavior. IBM’s Deep Blue is a recognizable example of AI. + id: totrans-22 prefs: [] type: TYPE_NORMAL - en: Machine learning + id: totrans-23 prefs: [] type: TYPE_NORMAL - en: This is the branch of AI in which machines use statistical techniques to learn @@ -145,27 +184,34 @@ IBM’s Watson take on Ken Jennings and Brad Rutter on *Jeopardy!*, you saw machine learning in action. More relatably, the next time a spam email doesn’t reach your inbox, you can thank machine learning. + id: totrans-24 prefs: [] type: TYPE_NORMAL - en: Deep learning + id: totrans-25 prefs: [] type: TYPE_NORMAL - en: This is a subfield of machine learning in which deep, multilayered neural networks are used to make predictions, especially excelling in computer vision, speech recognition, natural language understanding, and so on. + id: totrans-26 prefs: [] type: TYPE_NORMAL - en: '![The relationship between AI, machine learning, and deep learning](../images/00160.jpeg)' + id: totrans-27 prefs: [] type: TYPE_IMG - en: Figure 1-2\. The relationship between AI, machine learning, and deep learning + id: totrans-28 prefs: - PREF_H6 type: TYPE_NORMAL - en: Throughout this book, we primarily focus on deep learning. + id: totrans-29 prefs: [] type: TYPE_NORMAL - en: Motivating Examples + id: totrans-30 prefs: - PREF_H2 type: TYPE_NORMAL @@ -173,76 +219,98 @@ your hard-earned money^([1](part0003.html#ch01fn1)) buying this book? Our motivation was simple: to get more people involved in the world of AI. The fact that you’re reading this book means that our job is already halfway done.' 
+ id: totrans-31 prefs: [] type: TYPE_NORMAL - en: 'However, to really pique your interest, let’s take a look at some stellar examples that demonstrate what AI is already capable of doing:' + id: totrans-32 prefs: [] type: TYPE_NORMAL - en: '“DeepMind’s AI agents conquer human pros at StarCraft II”: *The Verge*, 2019' + id: totrans-33 prefs: - PREF_UL type: TYPE_NORMAL - en: '“AI-Generated Art Sells for Nearly Half a Million Dollars at Christie’s”: *AdWeek*, 2018' + id: totrans-34 prefs: - PREF_UL type: TYPE_NORMAL - en: '“AI Beats Radiologists in Detecting Lung Cancer”: *American Journal of Managed Care*, 2019' + id: totrans-35 prefs: - PREF_UL type: TYPE_NORMAL - en: '“Boston Dynamics Atlas Robot Can Do Parkour”: *ExtremeTech*, 2018' + id: totrans-36 prefs: - PREF_UL type: TYPE_NORMAL - en: '“Facebook, Carnegie Mellon build first AI that beats pros in 6-player poker”: *ai.facebook.com*, 2019' + id: totrans-37 prefs: - PREF_UL type: TYPE_NORMAL - en: '“Blind users can now explore photos by touch with Microsoft’s Seeing AI”: *TechCrunch*, 2019' + id: totrans-38 prefs: - PREF_UL type: TYPE_NORMAL - en: '“IBM’s Watson supercomputer defeats humans in final Jeopardy match”: *VentureBeat*, 2011' + id: totrans-39 prefs: - PREF_UL type: TYPE_NORMAL + zh: “IBM的沃森超级计算机在最后的《危险边缘》比赛中击败人类”:*VentureBeat*,2011 - en: '“Google’s ML-Jam challenges musicians to improvise and collaborate with AI”: *VentureBeat*, 2019' + id: totrans-40 prefs: - PREF_UL type: TYPE_NORMAL + zh: “谷歌的ML-Jam挑战音乐家即兴演奏并与人工智能合作”:*VentureBeat*,2019 - en: '“Mastering the Game of Go without Human Knowledge”: *Nature*, 2017' + id: totrans-41 prefs: - PREF_UL type: TYPE_NORMAL + zh: “无需人类知识即可掌握围棋游戏”:*Nature*,2017 - en: '“Chinese AI Beats Doctors in Diagnosing Brain Tumors”: *Popular Mechanics*, 2018' + id: totrans-42 prefs: - PREF_UL type: TYPE_NORMAL + zh: “中国人工智能在诊断脑瘤方面击败医生”:*Popular Mechanics*,2018 - en: '“Two new planets discovered using artificial intelligence”: *Phys.org*, 2019' + id: totrans-43 prefs: - PREF_UL type: TYPE_NORMAL + zh: “使用人工智能发现了两颗新行星”:*Phys.org*,2019 - en: '“Nvidia’s latest AI software turns rough doodles into realistic landscapes”: *The Verge*, 2019' + id: totrans-44 prefs: - PREF_UL type: TYPE_NORMAL + zh: “英伟达最新的人工智能软件将粗糙的涂鸦转化为逼真的风景”:*The Verge*,2019 - en: These applications of AI serve as our North Star. The level of these achievements is the equivalent of a gold-medal-winning Olympic performance. However, applications solving a host of problems in the real world is the equivalent of completing a 5K race. Developing these applications doesn’t require years of training, yet doing so provides the developer immense satisfaction when crossing the finish line. We are here to coach you through that 5K. + id: totrans-45 prefs: [] type: TYPE_NORMAL + zh: 这些人工智能的应用是我们的北极星。这些成就的水平相当于获得金牌的奥运表现。然而,在现实世界中解决一系列问题的应用相当于完成一场5K比赛。开发这些应用并不需要多年的训练,但当开发者跨过终点线时,会获得巨大的满足感。我们在这里指导你完成这场5K比赛。 - en: Throughout this book, we intentionally prioritize breadth. The field of AI is changing so quickly that we can only hope to equip you with the proper mindset and array of tools. In addition to tackling individual problems, we will look @@ -253,25 +321,33 @@ 0 to 80 quickly to tackle real-world problems. If we’ve generated enough interest that you decide you then want to go from 80 to 95, we’d consider our goal achieved. 
As the oft-used phrase goes, we want to “democratize AI.”
+ id: totrans-46
  prefs: []
  type: TYPE_NORMAL
+ zh: 在本书中,我们有意地优先考虑广度。人工智能领域的变化如此之快,以至于我们只能希望为您提供正确的思维方式和一系列工具。除了解决个别问题外,我们还将探讨不同的、看似无关的问题之间的基本重叠,这些重叠可以为我们所用。例如,声音识别使用卷积神经网络(CNNs),这也是现代计算机视觉的基础。我们涉及多个领域的实际方面,因此您将能够迅速从0到80,解决现实世界中的问题。如果我们引起了足够的兴趣,让您决定从80到95,我们将认为我们的目标已经实现。正如常用的短语所说,我们希望“民主化人工智能”。
- en: 'It’s important to note that much of the progress in AI happened in just the
    past few years—it’s difficult to overstate that. To illustrate how far we’ve
    come along, take this example: five years ago, you needed a Ph.D. just to get
    your foot in the door of the industry. Five years later, you don’t even need
    a Ph.D. to write an entire book on the subject. (Seriously, check our profiles!)'
+ id: totrans-47
  prefs: []
  type: TYPE_NORMAL
+ zh: 重要的是要注意,人工智能领域的许多进展都是在过去几年中取得的——这一点怎么强调都不为过。为了说明我们已经取得了多大的进步,举个例子:五年前,你需要一个博士学位才能进入这个行业。五年后,你甚至不需要博士学位就能写一本关于这个主题的整本书。(真的,查看我们的个人资料吧!)
- en: Although modern applications of deep learning seem pretty amazing, they did
    not get there all on their own. They stood on the shoulders of many giants of
    the industry who have been pushing the limits for decades. Indeed, we can’t fully
    appreciate the significance of this time without looking at the entire history.
+ id: totrans-48
  prefs: []
  type: TYPE_NORMAL
+ zh: 尽管现代深度学习的应用似乎非常惊人,但它们并不是靠自己就能达到这一点的。它们站在许多行业巨头的肩膀上,这些人数十年来一直在推动这个领域的极限。事实上,不回顾整段历史,我们就无法充分体会这个时刻的重要性。
- en: A Brief History of AI
+ id: totrans-49
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: 人工智能简史
- en: 'Let’s go back in time a little bit: our whole universe was in a hot dense state.
    Then nearly 14 billion years ago expansion started, wait...okay, we don’t have
    to go back that far (but now we have the song stuck in your head for the rest
    of the day, right?). In reality, the first seeds of AI were planted a mere 70
    years ago. Alan Turing first proposed the question “Can machines think?” in his
    1950 paper, “Computing Machinery and Intelligence.” This really gets into a larger
    philosophical debate of consciousness and what it means to be human. What does
    it mean to have the ability to create a concerto and know that you’ve created
    it? Turing found that framework rather restrictive and instead proposed a test:
    if a human cannot distinguish a machine from another human, does it really matter?
    An AI that can mimic a human is, in essence, human.'
+ id: totrans-50
  prefs: []
  type: TYPE_NORMAL
+ zh: 让我们回到过去一点:我们整个宇宙都处于一个炽热而密集的状态。然后大约在140亿年前开始膨胀,等等...好吧,我们不必回溯那么久(但现在你一整天都会被这首歌困扰,对吧?)。其实,70年前才种下了人工智能的第一颗种子。艾伦·图灵在他1950年的论文《计算机器械与智能》中首次提出了“机器能思考吗?”这实际上涉及到了更大的哲学辩论,即意识和成为人类的含义。拥有创作协奏曲的能力并知道你已经创作了它,这是什么意思?图灵发现这种框架相当狭隘,而提出了一个测试:如果一个人无法区分机器和另一个人,那真的重要吗?一个能够模仿人类的人工智能本质上就是人类。
- en: Exciting Beginnings
+ id: totrans-51
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 令人兴奋的开端
- en: The term “artificial intelligence” was coined by John McCarthy in 1956 at the
    Dartmouth Summer Research Project. Physical computers weren’t even really a thing
    back then, so it’s remarkable that they were able to discuss futuristic areas
    such as language simulation, self-improving learning machines, abstractions on
    sensory data, and more. Much of it was theoretical, of course. This was the first
    time that AI became a field of research rather than a single project.
+ id: totrans-52
  prefs: []
  type: TYPE_NORMAL
+ zh: “人工智能”这个术语是由约翰·麦卡锡在1956年的达特茅斯暑期研究项目中创造的。当时实际的计算机甚至并不是真正存在的东西,所以令人惊讶的是,他们能够讨论诸如语言模拟、自我改进的学习机器、对感官数据的抽象等未来领域。当然,其中很多是理论性的。这是人工智能第一次成为一个研究领域而不是一个单一项目。
- en: 'The paper “Perceptron: A Perceiving and Recognizing Automaton” in 1957 by Frank
    Rosenblatt laid the foundation for deep neural networks. He postulated that it
    should be feasible to construct an electronic or electromechanical system that
    or tonal information. This system would function similar to the human brain.
    Rather than using a rule-based model (which was standard for the algorithms at
    the time), he proposed using statistical models to make predictions.'
+ id: totrans-53
  prefs: []
  type: TYPE_NORMAL
- en: Note
+ id: totrans-54
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
  has *neurons* that activate when encountering something familiar. The different
  neurons are connected via connections (corresponding to synapses in our brain)
  that help information flow from one neuron to another.
+ id: totrans-55
  prefs: []
  type: TYPE_NORMAL
- en: 'In [Figure 1-3](part0003.html#an_example_of_a_perceptron), we can see an example
    of the simplest neural network: a perceptron. Mathematically, the perceptron
    can be expressed as follows:'
+ id: totrans-56
  prefs: []
  type: TYPE_NORMAL
- en: '*output = f(x[1], x[2], x[3]) = x[1] w[1] + x[2] w[2] + x[3] w[3] + b*'
+ id: totrans-57
  prefs: []
  type: TYPE_NORMAL
- en: '![An example of a perceptron](../images/00270.jpeg)'
+ id: totrans-58
  prefs: []
  type: TYPE_IMG
- en: Figure 1-3\. An example of a perceptron
+ id: totrans-59
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
  paper “Group Method of Data Handling—A Rival Method of Stochastic Approximation.”
  There is some controversy in this area, but Ivakhnenko is regarded by some as
  the father of deep learning.
+ id: totrans-60
  prefs: []
  type: TYPE_NORMAL
- en: Around this time, bold predictions were made about what machines would be capable
    of. better than humans. Governments around the world were excited and began opening
    up their wallets to fund these projects. This gold rush started in the late 1950s
    and was alive and well into the mid-1970s.
+ id: totrans-61
  prefs: []
  type: TYPE_NORMAL
- en: The Cold and Dark Days
+ id: totrans-62
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
  they are linear functions, whereas problems in the real world often require a
  nonlinear classifier for accurate predictions. Imagine trying to fit a line to
  a curve!
+ id: totrans-63
  prefs: []
  type: TYPE_NORMAL
- en: So what happens when you over-promise and under-deliver? You lose funding. The
  a lot of the original projects in the United States. However, the lack of results
  over nearly two decades increasingly frustrated the agency. It was easier to
  land a man on the moon than to get a usable speech recognizer!
+ id: totrans-64
  prefs: []
  type: TYPE_NORMAL
- en: Similarly, across the pond, the Lighthill Report was published in 1974, which
  and subsequently across the world, destroying many careers in the process. This
  phase of lost faith in AI lasted about two decades and came to be known as the
  “AI Winter.” If only Ned Stark had been around back then to warn them.
+ id: totrans-65
  prefs: []
  type: TYPE_NORMAL
- en: A Glimmer of Hope
+ id: totrans-66
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
  back the magnitude of the error into the network so it can learn to fix it. You
  repeat this process until the error becomes insignificant. A simple yet powerful
  concept. We use the term backpropagation repeatedly throughout this book.'
+ id: totrans-67
  prefs: []
  type: TYPE_NORMAL
- en: '![An example multilayer neural network (image credit)](../images/00121.jpeg)'
+ id: totrans-68
  prefs: []
  type: TYPE_IMG
- en: Figure 1-4\. An example multilayer neural network ([image source](https://oreil.ly/Jn-T6))
+ id: totrans-69
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
  of this network would quickly pose limitations in the real world. This could
  be overcome somewhat by using multiple hidden layers and training the network
  with…wait for it…backpropagation!
+ id: totrans-70
  prefs: []
  type: TYPE_NORMAL
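- en: 'To make the perceptron equation from [Figure 1-3](part0003.html#an_example_of_a_perceptron)
    and this idea of stacking layers concrete, here is a minimal sketch in plain Python
    (our own illustration, not from the original text). All weights and biases are
    made-up numbers, and a real network would also apply a nonlinear activation between
    layers:'
+ id: totrans-70a
  prefs: []
  type: TYPE_NORMAL
- en: |-
    # A perceptron is a weighted sum of its inputs plus a bias:
    # output = x[1]*w[1] + x[2]*w[2] + x[3]*w[3] + b
    def perceptron(x, w, b):
        return sum(xi * wi for xi, wi in zip(x, w)) + b

    # A layer is several perceptrons reading the same inputs,
    # each with its own weights and bias.
    def layer(x, weights, biases):
        return [perceptron(x, w, b) for w, b in zip(weights, biases)]

    # Three inputs -> two hidden neurons -> one output neuron,
    # mirroring the multilayer network of Figure 1-4.
    x = [1.0, 2.0, 3.0]
    hidden = layer(x, weights=[[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]],
                   biases=[0.0, 0.1])
    output = perceptron(hidden, w=[1.0, -1.0], b=0.5)
    print(output)  # ~0.4 with these made-up weights
+ id: totrans-70b
  prefs: []
  type: TYPE_PRE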
- en: On the more practical side of things, a team at Carnegie Mellon University built
  wheel. This eventually led to NavLab 5 in 1995\. During a demonstration, a car
  drove all but 50 of the 2,850-mile journey from Pittsburgh to San Diego on its
  own. NavLab got its driver’s license before many Tesla engineers were even born!
+ id: totrans-71
  prefs: []
  type: TYPE_NORMAL
- en: '![The autonomous NavLab 1 from 1986 in all its glory (image source)](../images/00116.jpeg)'
+ id: totrans-72
  prefs: []
  type: TYPE_IMG
- en: Figure 1-5\. The autonomous NavLab 1 from 1986 in all its glory ([image source](https://oreil.ly/b2Bnn))
+ id: totrans-73
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
  in the wild. Eventually, in the 1990s, banks would use an evolved version of
  the model called LeNet-5 to read handwritten numbers on checks. This laid the
  foundation for modern computer vision.
+ id: totrans-74
  prefs: []
  type: TYPE_NORMAL
- en: Note
+ id: totrans-75
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
  see in [Figure 1-6](part0003.html#a_sample_of_handwritten_digits_from_the),
  included resizing them to 28 x 28 pixels, centering the digit in that area, antialiasing,
  and so on.
+ id: totrans-76
  prefs: []
  type: TYPE_NORMAL
- en: '![A sample of handwritten digits from the MNIST dataset](../images/00196.jpeg)'
+ id: totrans-77
  prefs: []
  type: TYPE_IMG
- en: Figure 1-6\. A sample of handwritten digits from the MNIST dataset
+ id: totrans-78
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
- en: A few others kept their research going, including Jürgen Schmidhuber, who proposed
    networks like the Long Short-Term Memory (LSTM) with promising applications for
    text and speech.
+ id: totrans-79
  prefs: []
  type: TYPE_NORMAL
- en: At that point, even though the theories were becoming sufficiently advanced,
  a machine learning technique introduced for classification problems in 1995,
  were faster and provided reasonably good results on smaller amounts of data,
  and thus had become the norm.
+ id: totrans-80
  prefs: []
  type: TYPE_NORMAL
- en: As a result, AI and deep learning’s reputation was poor. Graduate students were
  machine learning, and others to dissociate themselves from the AI name. It’s
  a bit like when the U.S. Department of War was rebranded as the Department of
  Defense to be more palatable to the people.
+ id: totrans-81
  prefs: []
  type: TYPE_NORMAL
- en: How Deep Learning Became a Thing
+ id: totrans-82
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
  upload. The data lake was filling up, and gradually there were ample opportunities
  to take a dip. The 14 million-image ImageNet dataset was born from this happy
  confluence and some tremendous work by (then Princeton’s) Fei-Fei Li and company.
+ id: totrans-83
  prefs: []
  type: TYPE_NORMAL
- en: During the same decade, PC and console gaming became really serious. Gamers
  their lighting, and so on all use matrix operations. And GPUs specialize in doing
  them. And you know what else needs a lot of matrix calculations? Neural networks.
  It’s one big happy coincidence.
+ id: totrans-84 prefs: [] type: TYPE_NORMAL - en: With ImageNet ready, the annual ImageNet Large Scale Visual Recognition Challenge @@ -522,6 +636,7 @@ (SIFT) + SVM yielded a 28% (in 2010) and a 25% (2011) top-5 error rate (i.e., if one of the top five guesses ranked by probability matches, it’s considered accurate). + id: totrans-85 prefs: [] type: TYPE_NORMAL - en: And then came 2012, with an entry on the leaderboard that nearly halved the @@ -532,213 +647,324 @@ resulting in a 240 MB model. It was trained over one week using two NVIDIA GPUs. This single event took everyone by surprise, proving the potential of CNNs that snowballed into the modern deep learning era. + id: totrans-86 prefs: [] type: TYPE_NORMAL + zh: 然后到了2012年,排行榜上的一个条目将错误率几乎减少到16%。来自多伦多大学的Alex Krizhevsky,Ilya Sutskever(最终创立了OpenAI)和Geoffrey + Hinton提交了该条目。AlexNet恰如其名,受到LeNet-5的启发。即使只有八层,AlexNet拥有庞大的6000万参数和65万个神经元,导致一个240 + MB的模型。它使用两个NVIDIA GPU在一周内训练。这一事件让所有人都感到惊讶,证明了CNN的潜力,这导致了现代深度学习时代的发展。 - en: '[Figure 1-7](part0003.html#evolution_of_winning_entries_at_imagenet) quantifies the progress that CNNs have made in the past decade. We saw a 40% year-on-year decrease in classification error rate among ImageNet LSVRC–winning entries since the arrival of deep learning in 2012\. As CNNs grew deeper, the error continued to decrease.' + id: totrans-87 prefs: [] type: TYPE_NORMAL + zh: '[图1-7](part0003.html#evolution_of_winning_entries_at_imagenet)量化了CNN在过去十年中取得的进展。自2012年深度学习出现以来,我们看到ImageNet + LSVRC获奖作品的分类错误率每年减少40%。随着CNN变得更深,错误率继续下降。' - en: '![Evolution of winning entries at ImageNet LSVRC](../images/00264.jpeg)' + id: totrans-88 prefs: [] type: TYPE_IMG + zh: '![ImageNet LSVRC获奖作品的演变](../images/00264.jpeg)' - en: Figure 1-7\. Evolution of winning entries at ImageNet LSVRC + id: totrans-89 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 图1-7\. ImageNet LSVRC获奖作品的演变 - en: Keep in mind we are vastly simplifying the history of AI, and we are surely glossing over some of the details. Essentially, it was a confluence of data, GPUs, and better techniques that led to this modern era of deep learning. And the progress kept expanding further into newer territories. As [Table 1-1](part0003.html#a_highlight_reel_of_the_modern_deep_lear) highlights, what was in the realm of science fiction is already a reality. + id: totrans-90 prefs: [] type: TYPE_NORMAL + zh: 请记住,我们正在大大简化AI的历史,并且肯定忽略了一些细节。基本上,这是数据、GPU和更好技术的融合导致了这个现代深度学习时代。进展不断扩展到新的领域。正如[表1-1](part0003.html#a_highlight_reel_of_the_modern_deep_lear)所强调的,科幻小说中的东西已经成为现实。 - en: Table 1-1\. A highlight reel of the modern deep learning era + id: totrans-91 prefs: [] type: TYPE_NORMAL + zh: 表1-1\. 
现代深度学习时代的精华片段 - en: '| 2012 | Neural network from Google Brain team starts recognizing cats after watching YouTube videos |' + id: totrans-92 prefs: [] type: TYPE_TB + zh: '| 2012 | 来自Google Brain团队的神经网络开始观看YouTube视频后识别猫 |' - en: '| 2013 |' + id: totrans-93 prefs: [] type: TYPE_TB + zh: '| 2013 |' - en: Researchers begin tinkering with deep learning on a variety of tasks + id: totrans-94 prefs: - PREF_UL type: TYPE_NORMAL + zh: 研究人员开始在各种任务上尝试深度学习 - en: word2vec brings context to words and phrases, getting one step closer to understanding meanings + id: totrans-95 prefs: - PREF_UL type: TYPE_NORMAL + zh: word2vec为单词和短语带来上下文,使理解含义更接近一步 - en: Error rate for speech recognition went down 25% + id: totrans-96 prefs: - PREF_UL type: TYPE_NORMAL + zh: 语音识别的错误率下降了25% - en: '|' + id: totrans-97 prefs: [] type: TYPE_NORMAL + zh: '|' - en: '| 2014 |' + id: totrans-98 prefs: [] type: TYPE_TB + zh: '| 2014 |' - en: GANs invented + id: totrans-99 prefs: - PREF_UL type: TYPE_NORMAL + zh: GANs被发明 - en: Skype translates speech in real time + id: totrans-100 prefs: - PREF_UL type: TYPE_NORMAL + zh: Skype实时翻译语音 - en: Eugene Goostman, a chatbot, passes the Turing Test + id: totrans-101 prefs: - PREF_UL type: TYPE_NORMAL + zh: 聊天机器人Eugene Goostman通过图灵测试 - en: Sequence-to-sequence learning with neural networks invented + id: totrans-102 prefs: - PREF_UL type: TYPE_NORMAL + zh: 使用神经网络进行序列到序列学习 - en: Image captioning translates images to sentences + id: totrans-103 prefs: - PREF_UL type: TYPE_NORMAL + zh: 图像字幕将图像翻译成句子 - en: '|' + id: totrans-104 prefs: [] type: TYPE_NORMAL + zh: '|' - en: '| 2015 |' + id: totrans-105 prefs: [] type: TYPE_TB + zh: '| 2015 |' - en: Microsoft ResNet beats humans in image accuracy, trains 1,000-layer network + id: totrans-106 prefs: - PREF_UL type: TYPE_NORMAL + zh: 微软ResNet在图像准确性上击败人类,训练1000层网络 - en: Baidu’s Deep Speech 2 does end-to-end speech recognition + id: totrans-107 prefs: - PREF_UL type: TYPE_NORMAL + zh: 百度的Deep Speech 2进行端到端语音识别 - en: Gmail launches Smart Reply + id: totrans-108 prefs: - PREF_UL type: TYPE_NORMAL + zh: Gmail推出智能回复 - en: YOLO (You Only Look Once) does object detection in real time + id: totrans-109 prefs: - PREF_UL type: TYPE_NORMAL + zh: YOLO(You Only Look Once)实时进行目标检测 - en: Visual Question Answering allows asking questions based on images + id: totrans-110 prefs: - PREF_UL type: TYPE_NORMAL + zh: 视觉问答允许根据图像提出问题 - en: '|' + id: totrans-111 prefs: [] type: TYPE_NORMAL + zh: '|' - en: '| 2016 |' + id: totrans-112 prefs: [] type: TYPE_TB + zh: '| 2016 |' - en: AlphaGo wins against professional human Go players + id: totrans-113 prefs: - PREF_UL type: TYPE_NORMAL + zh: AlphaGo击败专业人类围棋选手 - en: Google WaveNets help generate realistic audio + id: totrans-114 prefs: - PREF_UL type: TYPE_NORMAL + zh: Google WaveNets帮助生成逼真的音频 - en: Microsoft achieves human parity in conversational speech recognition + id: totrans-115 prefs: - PREF_UL type: TYPE_NORMAL + zh: 微软在对话语音识别中实现了与人类的平等 - en: '|' + id: totrans-116 prefs: [] type: TYPE_NORMAL + zh: '|' - en: '| 2017 |' + id: totrans-117 prefs: [] type: TYPE_TB + zh: '| 2017 |' - en: AlphaGo Zero learns to play Go itself in 3 days + id: totrans-118 prefs: - PREF_UL type: TYPE_NORMAL + zh: AlphaGo Zero在3天内学会自己下围棋 - en: Capsule Nets fix flaws in CNNs + id: totrans-119 prefs: - PREF_UL type: TYPE_NORMAL + zh: 胶囊网络修复CNN中的缺陷 - en: Tensor Processing Units (TPUs) introduced + id: totrans-120 prefs: - PREF_UL type: TYPE_NORMAL + zh: 引入张量处理单元(TPUs) - en: California allows sale of autonomous cars + id: totrans-121 prefs: - PREF_UL type: 
TYPE_NORMAL + zh: 加利福尼亚州允许出售自动驾驶汽车 - en: Pix2Pix allows generating images from sketches + id: totrans-122 prefs: - PREF_UL type: TYPE_NORMAL + zh: Pix2Pix允许从草图生成图像 - en: '|' + id: totrans-123 prefs: [] type: TYPE_NORMAL + zh: '|' - en: '| 2018 |' + id: totrans-124 prefs: [] type: TYPE_TB + zh: '| 2018 |' - en: AI designs AI better than humans with Neural Architecture Search + id: totrans-125 prefs: - PREF_UL type: TYPE_NORMAL + zh: AI设计比人类更好的AI,使用神经结构搜索 - en: Google Duplex demo makes restaurant reservations on our behalf + id: totrans-126 prefs: - PREF_UL type: TYPE_NORMAL + zh: Google Duplex演示代表我们预订餐厅 - en: Deepfakes swap one face for another in videos + id: totrans-127 prefs: - PREF_UL type: TYPE_NORMAL + zh: Deepfakes在视频中交换一个面孔 - en: Google’s BERT succeeds humans in language understanding tasks + id: totrans-128 prefs: - PREF_UL type: TYPE_NORMAL + zh: Google的BERT在语言理解任务中成功超越人类 - en: DawnBench and MLPerf established to benchmark AI training + id: totrans-129 prefs: - PREF_UL type: TYPE_NORMAL + zh: DawnBench和MLPerf建立用于基准测试AI训练 - en: '|' + id: totrans-130 prefs: [] type: TYPE_NORMAL + zh: '|' - en: '| 2019 |' + id: totrans-131 prefs: [] type: TYPE_TB + zh: '| 2019 |' - en: OpenAI Five crushes Dota2 world champions + id: totrans-132 prefs: - PREF_UL type: TYPE_NORMAL + zh: OpenAI Five击败Dota2世界冠军 - en: StyleGan generates photorealistic images + id: totrans-133 prefs: - PREF_UL type: TYPE_NORMAL + zh: StyleGan生成逼真的图像 - en: OpenAI GPT-2 generates realistic text passages + id: totrans-134 prefs: - PREF_UL type: TYPE_NORMAL + zh: OpenAI GPT-2生成逼真的文本段落 - en: Fujitsu trains ImageNet in 75 seconds + id: totrans-135 prefs: - PREF_UL type: TYPE_NORMAL + zh: 富士通在75秒内训练ImageNet - en: Microsoft invests $1 billion in OpenAI + id: totrans-136 prefs: - PREF_UL type: TYPE_NORMAL + zh: 微软向OpenAI投资10亿美元 - en: AI by the Allen Institute passes 12th-grade science test with 80% score + id: totrans-137 prefs: - PREF_UL type: TYPE_NORMAL + zh: 艾伦研究所的AI通过12年级科学考试,得分80% - en: '|' + id: totrans-138 prefs: [] type: TYPE_NORMAL + zh: '|' - en: Hopefully, you now have a historical context of AI and deep learning and have an understanding of why this moment in time is significant. It’s important to recognize the rapid rate at which progress is happening in this area. But as we have seen so far, this was not always the case. + id: totrans-139 prefs: [] type: TYPE_NORMAL + zh: 希望现在您对AI和深度学习有了历史背景,并了解为什么这一时刻如此重要。重要的是要认识到在这一领域进展迅速的速度。但正如我们迄今所见,情况并非总是如此。 - en: The original estimate for achieving real-world computer vision was “one summer” back in the 1960s, according to two of the field’s pioneers. They were off by only half a century! It’s not easy being a futurist. A study by Alexander Wissner-Gross @@ -748,42 +974,62 @@ it helped achieve! Look at any of the breakthroughs in the past decade. The dataset that enabled that breakthrough was very likely made available just a few years prior. + id: totrans-140 prefs: [] type: TYPE_NORMAL + zh: 根据该领域的两位先驱,实现真实世界的计算机视觉的最初估计是在20世纪60年代的“一个夏天”。他们只差了半个世纪!成为未来学家并不容易。亚历山大·维斯纳-格罗斯的一项研究发现,算法提出和取得突破之间的平均时间约为18年。另一方面,数据集提供和帮助实现突破之间的平均时间仅为三年!看看过去十年的任何突破。实现该突破的数据集很可能在几年前才被提供。 - en: Data was clearly the limiting factor. This shows the crucial role that a good dataset can play for deep learning. However, data is not the only factor. Let’s look at the other pillars that make up the foundation of the perfect deep learning solution. 
+ id: totrans-141
  prefs: []
  type: TYPE_NORMAL
+ zh: 数据显然是限制因素。这显示了一个好数据集对深度学习可以发挥的关键作用。然而,数据并不是唯一的因素。让我们看看构成完美深度学习解决方案基础的其他支柱。
- en: Recipe for the Perfect Deep Learning Solution
+ id: totrans-142
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
+ zh: 完美深度学习解决方案的配方
- en: Before Gordon Ramsay starts cooking, he ensures he has all of the ingredients
    ready to go. The same goes for solving a problem using deep learning ([Figure 1-8](part0003.html#ingredients_for_the_perfect_deep_learnin)).
+ id: totrans-143
  prefs: []
  type: TYPE_NORMAL
+ zh: 在Gordon Ramsay开始烹饪之前,他确保所有的配料都准备就绪。解决问题时使用深度学习也是如此([图1-8](part0003.html#ingredients_for_the_perfect_deep_learnin))。
- en: '![Ingredients for the perfect deep learning solution](../images/00098.jpeg)'
+ id: totrans-144
  prefs: []
  type: TYPE_IMG
+ zh: '![完美深度学习解决方案的成分](../images/00098.jpeg)'
- en: Figure 1-8\. Ingredients for the perfect deep learning solution
+ id: totrans-145
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 图1-8. 完美深度学习解决方案的成分
- en: And here’s your deep learning *mise en place*!
+ id: totrans-146
  prefs: []
  type: TYPE_NORMAL
+ zh: 这就是你的深度学习*mise en place*(一切就绪)!
- en: '[PRE0]'
+ id: totrans-147
  prefs: []
  type: TYPE_PRE
+ zh: '[PRE0]'
- en: Let’s look into each of these in a little more detail.
+ id: totrans-148
  prefs: []
  type: TYPE_NORMAL
+ zh: 让我们更详细地看看这些内容。
- en: Datasets
+ id: totrans-149
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 数据集
- en: Just like Pac-Man is hungry for dots, deep learning is hungry for data—lots
    and lots of data. It needs this amount of data to spot meaningful patterns that
    can help make robust predictions. Traditional machine learning was the norm in
    the 1980s and 1990s because it could function well even with a few hundred to
    a few thousand examples. In contrast, Deep Neural Networks (DNNs), when built
    from scratch, would need orders more data for typical prediction tasks. The upside
    here is far better predictions.
+ id: totrans-150
  prefs: []
  type: TYPE_NORMAL
+ zh: 就像吃豆人渴望豆子一样,深度学习渴望数据——大量的数据。它需要这么多数据来发现有意义的模式,从而帮助做出稳健的预测。传统机器学习在20世纪80年代和90年代是主流,因为即使只有几百到几千个示例,它也可以正常运行。相比之下,从头开始构建的深度神经网络(DNNs)通常需要更多的数据来完成典型的预测任务。这里的好处是预测更准确。
- en: In this century, we are having a data explosion with quintillions of bytes of
    data being created every single day—images, text, videos, sensor data, and more.
    But to make effective use of this data, we need labels. To build a sentiment classifier
    to tell whether an Amazon review is positive or negative, we need thousands of
    sentences with a sentiment assigned to each. To train a face segmentation system
    for a Snapchat lens, we need the precise locations of eyes, lips, nose, and so
    on labeled on thousands of images. To train a self-driving car, we need video
    segments labeled with the human driver’s reactions on controls such as the brakes,
    accelerator, steering wheel, and so forth. These labels act as teachers to our
    AI and are far more valuable than unlabeled data alone.
+ id: totrans-151
  prefs: []
  type: TYPE_NORMAL
+ zh: 在本世纪,我们每天都会产生数量巨大的数据——图像、文本、视频、传感器数据等等,但要有效利用这些数据,我们需要标签。要构建一个情感分类器,以了解亚马逊评论是积极的还是消极的,我们需要成千上万的句子,并为每个句子分配一个情感。要训练一个用于Snapchat镜头的面部分割系统,我们需要在成千上万的图像上准确标记眼睛、嘴唇、鼻子等位置。要训练一辆自动驾驶汽车,我们需要用人类驾驶员对控制设备的反应标记视频片段,例如刹车、油门、方向盘等。这些标签对我们的人工智能起着教师的作用,比仅有未标记数据更有价值。
- en: Getting labels can be pricey. It’s no wonder that there is an entire industry
    around crowdsourcing labeling tasks among thousands of workers. Each label might
    cost from a few cents to dollars, depending on the time spent by the workers to
    assign it. For example, during the development of the Microsoft COCO (Common Objects
    in Context) dataset, it took roughly three seconds to label the name of each object
    in an image, approximately 30 seconds to place a bounding box around each object,
    and 79 seconds to draw the outlines of each object. Repeat that hundreds of thousands
    of times and you can begin to fathom the costs around some of the larger datasets.
    Some labeling companies like Appen and Scale AI are already valued at more than
    a billion dollars each.
+ id: totrans-152 prefs: [] type: TYPE_NORMAL + zh: 获取标签可能会很昂贵。难怪有成千上万的工人在众包标注任务周围形成了一个整个行业。每个标签的成本可能从几分钱到几美元不等,取决于工人分配标签所花费的时间。例如,在开发微软COCO(上下文中的常见对象)数据集期间,大约需要三秒来标记图像中每个对象的名称,大约需要30秒来在每个对象周围放置一个边界框,以及79秒来为每个对象绘制轮廓。重复这个过程数十万次,你就能开始估算一些更大数据集的成本。一些标注公司,如Appen和Scale + AI,已经价值超过十亿美元。 - en: 'We might not have a million dollars in our bank account. But luckily for us, two good things happened in this deep learning revolution:' + id: totrans-153 prefs: [] type: TYPE_NORMAL + zh: 我们的银行账户可能没有一百万美元。但幸运的是,在这次深度学习革命中发生了两件好事: - en: Gigantic labeled datasets have been generously made public by major companies and universities. + id: totrans-154 prefs: - PREF_UL type: TYPE_NORMAL + zh: 大型标记数据集已经被主要公司和大学慷慨地公开。 - en: A technique called *transfer learning,* which allows us to tune our models to datasets with even hundreds of examples—as long as our model was originally trained on a larger dataset similar to our current set. We use this repeatedly in the @@ -833,121 +1090,188 @@ where we experiment and prove even a few tens of examples can get us decent performance with this technique. Transfer learning busts the myth that big data is necessary for training a good model. Welcome to the world of *tiny data*! + id: totrans-155 prefs: - PREF_UL type: TYPE_NORMAL + zh: 一种称为 *迁移学习* 的技术,允许我们将模型调整到具有数百个示例的数据集上,只要我们的模型最初是在类似于我们当前集合的更大数据集上进行训练的。我们在本书中反复使用这种技术,包括在[第 + 5 章](part0007.html#6LJU3-13fa565533764549a6f0ab7f11eed62b)中,我们进行实验并证明即使有几十个示例,也可以通过这种技术获得良好的性能。迁移学习打破了训练良好模型需要大数据的神话。欢迎来到 + *微小数据* 的世界! - en: '[Table 1-2](part0003.html#a_diverse_range_of_public_datasets) showcases some of the popular datasets out there today for a variety of deep learning tasks.' + id: totrans-156 prefs: [] type: TYPE_NORMAL + zh: '[表格 1-2](part0003.html#a_diverse_range_of_public_datasets) 展示了当今一些流行的数据集,用于各种深度学习任务。' - en: Table 1-2\. A diverse range of public datasets + id: totrans-157 prefs: [] type: TYPE_NORMAL + zh: 表格 1-2. 
多样的公共数据集 - en: '| **Data type** | **Name** | **Details** |' + id: totrans-158 prefs: [] type: TYPE_TB + zh: '| **数据类型** | **名称** | **详情** |' - en: '| --- | --- | --- |' + id: totrans-159 prefs: [] type: TYPE_TB + zh: '| --- | --- | --- |' - en: '| Image | Open Images V4(from Google) |' + id: totrans-160 prefs: [] type: TYPE_TB + zh: '| 图像 | 开放图像 V4(来自谷歌) |' - en: Nine million images in 19,700 categories + id: totrans-161 prefs: - PREF_UL type: TYPE_NORMAL + zh: 19700 个类别中的九百万张图像 - en: 1.74 Million images with 600 categories (bounding boxes) + id: totrans-162 prefs: - PREF_UL type: TYPE_NORMAL + zh: 拥有 600 个类别的 174 万张图像(带有边界框) - en: '|' + id: totrans-163 prefs: [] type: TYPE_NORMAL + zh: '|' - en: '|   | Microsoft COCO |' + id: totrans-164 prefs: [] type: TYPE_TB + zh: '|   | Microsoft COCO |' - en: 330,000 images with 80 object categories + id: totrans-165 prefs: - PREF_UL type: TYPE_NORMAL + zh: 拥有 80 个对象类别的 33 万张图像 - en: Contains bounding boxes, segmentation, and five captions per image + id: totrans-166 prefs: - PREF_UL type: TYPE_NORMAL + zh: 包含边界框、分割和每张图像五个标题 - en: '|' + id: totrans-167 prefs: [] type: TYPE_NORMAL + zh: '|' - en: '| Video | YouTube-8M |' + id: totrans-168 prefs: [] type: TYPE_TB + zh: '| 视频 | YouTube-8M |' - en: 6.1 million videos, 3,862 classes, 2.6 billion audio-visual features + id: totrans-169 prefs: - PREF_UL type: TYPE_NORMAL + zh: 610 万个视频,3862 个类别,26 亿个视听特征 - en: 3.0 labels/video + id: totrans-170 prefs: - PREF_UL type: TYPE_NORMAL + zh: 3.0 标签/视频 - en: 1.53 TB of randomly sampled videos + id: totrans-171 prefs: - PREF_UL type: TYPE_NORMAL + zh: 1.53 TB 的随机抽样视频 - en: '|' + id: totrans-172 prefs: [] type: TYPE_NORMAL + zh: '|' - en: '| Video, images | BDD100K(from UC Berkeley) |' + id: totrans-173 prefs: [] type: TYPE_TB + zh: '| 视频、图像 | BDD100K(来自加州大学伯克利分校) |' - en: 100,000 driving videos over 1,100 hours + id: totrans-174 prefs: - PREF_UL type: TYPE_NORMAL + zh: 超过 1100 小时的 10 万个驾驶视频 - en: 100,000 images with bounding boxes for 10 categories + id: totrans-175 prefs: - PREF_UL type: TYPE_NORMAL + zh: 拥有 10 个类别的边界框的 10 万张图像 - en: 100,000 images with lane markings + id: totrans-176 prefs: - PREF_UL type: TYPE_NORMAL + zh: 拥有车道标记的 10 万张图像 - en: 100,000 images with drivable-area segmentation + id: totrans-177 prefs: - PREF_UL type: TYPE_NORMAL + zh: 拥有可驾驶区域分割的 10 万张图像 - en: 10,000 images with pixel-level instance segmentation + id: totrans-178 prefs: - PREF_UL type: TYPE_NORMAL + zh: 拥有像素级实例分割的 1 万张图像 - en: '|' + id: totrans-179 prefs: [] type: TYPE_NORMAL + zh: '|' - en: '| Waymo Open Dataset | 3,000 driving scenes totaling 16.7 hours of video data, 600,000 frames, approximately 25 million 3D bounding boxes, and 22 million 2D bounding boxes |' + id: totrans-180 prefs: [] type: TYPE_TB + zh: '| Waymo 开放数据集 | 总计 3,000 个驾驶场景,16.7 小时的视频数据,60 万帧,约 2500 万个 3D 边界框和 2200 万个 + 2D 边界框 |' - en: '| Text | SQuAD | 150,000 Question and Answer snippets from Wikipedia |' + id: totrans-181 prefs: [] type: TYPE_TB + zh: '| 文本 | SQuAD | 来自维基百科的 15 万个问题和答案片段 |' - en: '|   | Yelp Reviews | Five million Yelp reviews |' + id: totrans-182 prefs: [] type: TYPE_TB + zh: '|   | Yelp 评论 | 五百万条 Yelp 评论 |' - en: '| Satellite data | Landsat Data | Several million satellite images (100 nautical mile width and height), along with eight spectral bands (15- to 60-meter spatial resolution) |' + id: totrans-183 prefs: [] type: TYPE_TB + zh: '| 卫星数据 | Landsat 数据 | 数百万卫星图像(100 海里宽度和高度),以及八个光谱波段(15 到 60 米的空间分辨率) |' - en: '| Audio | Google AudioSet | 2,084,320 10-second sound clips from YouTube with 632 
categories |'
+ id: totrans-184
  prefs: []
  type: TYPE_TB
+ zh: '| 音频 | Google AudioSet | 来自 YouTube 的 2,084,320 个 10 秒音频片段,包含 632 个类别 |'
- en: '| LibriSpeech | 1,000 hours of read English speech |'
+ id: totrans-185
  prefs: []
  type: TYPE_TB
+ zh: '| LibriSpeech | 1000 小时的英语朗读语音 |'
- en: Model Architecture
+ id: totrans-186
  prefs:
  - PREF_H2
  type: TYPE_NORMAL
+ zh: 模型架构
- en: At a high level, a model is just a function. It takes in one or more inputs
    and gives an output. The input might be in the form of text, images, audio, video,
    and more. The output is a prediction. A good model is one whose predictions reliably
    match the expected reality. The accuracy of the model on a dataset is a major
    factor in whether it’s suitable for real-world application. For many
    people, this is all they really need to know about deep learning models. But
    it’s when we peek into the inner workings of a model that it becomes really interesting
    ([Figure 1-9](part0003.html#a_black_box_view_of_a_deep_learning_mode)).
+ id: totrans-187
  prefs: []
  type: TYPE_NORMAL
+ zh: 在高层次上,模型只是一个函数。它接受一个或多个输入并给出一个输出。输入可以是文本、图像、音频、视频等形式。输出是一个预测。一个好的模型是那些预测可靠地匹配预期现实的模型。模型在数据集上的准确性是决定其是否适用于实际应用的主要因素。对于许多人来说,这就是他们真正需要了解的关于深度学习模型的全部。但当我们窥探模型的内部工作时,它变得真正有趣([图
    1-9](part0003.html#a_black_box_view_of_a_deep_learning_mode))。
- en: '![A black box view of a deep learning model](../images/00077.jpeg)'
+ id: totrans-188
  prefs: []
  type: TYPE_IMG
+ zh: '![一个深度学习模型的黑盒视图](../images/00077.jpeg)'
- en: Figure 1-9\. A black box view of a deep learning model
+ id: totrans-189
  prefs:
  - PREF_H6
  type: TYPE_NORMAL
+ zh: 图 1-9. 一个深度学习模型的黑盒视图
- en: Inside the model is a graph that consists of nodes and edges. Nodes represent
    mathematical operations, whereas edges represent how the data flows from one node
    to another. In other words, if the output of one node can become the input to
    one or more nodes, the connections between those nodes are represented by edges.
    The structure of this graph determines the potential for accuracy, its speed,
    how much resources it consumes (memory, compute, and energy), and the type of
    input it’s capable of processing.
+ id: totrans-190
  prefs: []
  type: TYPE_NORMAL
+ zh: 模型内部是一个由节点和边组成的图。节点代表数学运算,而边代表数据如何从一个节点流向另一个节点。换句话说,如果一个节点的输出可以成为一个或多个节点的输入,那么这些节点之间的连接由边表示。这个图的结构决定了准确性的潜力、速度、它消耗的资源(内存、计算和能量)以及它能够处理的输入类型。
- en: The layout of the nodes and edges is known as the *architecture* of the model.
    Essentially, it’s a blueprint. Now, the blueprint is only half the picture. We
    still need the actual building. Training is the process that utilizes this blueprint
    to build that building. We train the model by iteratively 1) feeding in data,
    2) getting output from it, 3) monitoring how far these predictions are from the
    expected reality (i.e., the labels associated with the data), and then,
    4) propagating the magnitude of error back to the model so that it can progressively
    learn to correct itself. This training process is performed iteratively until
    we are satisfied with the accuracy of the predictions.
+ id: totrans-191
  prefs: []
  type: TYPE_NORMAL
+ zh: 节点和边的布局被称为模型的*架构*。本质上,它是一个蓝图。现在,蓝图只是一部分。我们仍然需要实际的建筑。训练是利用这个蓝图来构建那座建筑的过程。我们通过反复进行以下步骤来训练模型:1)输入数据,2)从中获取输出,3)监视这些预测与预期现实(即与数据相关联的标签)之间的差距,然后,4)将错误的大小传播回模型,以便它逐渐学会自我纠正。这个训练过程是迭代进行的,直到我们对预测的准确性感到满意。
- en: The result from this training is a set of numbers (also known as weights) that
    is assigned to each of the nodes. These weights are necessary parameters for the
    nodes in the graph to operate on the input given to them. Before the training
    begins, we usually assign random numbers as the weights. The goal of the training
    process is essentially to gradually tune the values of each set of these weights
    until they, in conjunction with their corresponding nodes, produce satisfactory
    predictions.
+ id: totrans-192
  prefs: []
  type: TYPE_NORMAL
+ zh: 这种训练的结果是一组数字(也称为权重),分配给每个节点。这些权重是图中的节点在给定的输入上操作所必需的参数。在训练开始之前,我们通常将随机数分配为权重。训练过程的目标基本上是逐渐调整每组这些权重的值,直到它们与相应的节点一起产生令人满意的预测。
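- en: 'Here is a minimal sketch of that iterative train-and-correct loop in plain
    Python (our own illustration, not from the original text). It learns the two weights
    of the tiny dataset examined next, starting from arbitrary values and nudging
    each weight slightly after every prediction error; the learning rate and epoch
    count are arbitrary choices:'
+ id: totrans-192a
  prefs: []
  type: TYPE_NORMAL
- en: |-
    # Each sample: two inputs and one labeled output (see Table 1-3).
    data = [
        ((1, 6), 20), ((2, 5), 19), ((3, 4), 18),
        ((4, 3), 17), ((5, 2), 16), ((6, 1), 15),
    ]

    w1, w2 = 0.0, 0.0         # weights start out arbitrary (often random)
    learning_rate = 0.01

    for epoch in range(2000):
        for (x1, x2), label in data:
            prediction = w1 * x1 + w2 * x2    # feed in data, get an output
            error = prediction - label        # compare with the expected label
            w1 -= learning_rate * error * x1  # propagate the error back,
            w2 -= learning_rate * error * x2  # nudging each weight slightly

    print(round(w1, 2), round(w2, 2))  # converges toward 2.0 and 3.0
+ id: totrans-192b
  prefs: []
  type: TYPE_PRE
- en: The recovered weights match the ones we deduce by hand below; a deep neural
    network repeats this same nudge-the-weights idea across millions of weights.
+ id: totrans-192c
  prefs: []
  type: TYPE_NORMAL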
- en: 'To understand weights a little better, let’s examine the following dataset
    with two inputs and one output:'
+ id: totrans-193
  prefs: []
  type: TYPE_NORMAL
+ zh: 为了更好地理解权重,让我们来看下面的数据集,其中有两个输入和一个输出:
- en: Table 1-3\. Example dataset
+ id: totrans-194
  prefs: []
  type: TYPE_NORMAL
+ zh: 表1-3.示例数据集
- en: '| **input[1]** | **input[2]** | **output** |'
+ id: totrans-195
  prefs: []
  type: TYPE_TB
+ zh: '|**input[1]**|**input[2]**|**output**|'
- en: '| --- | --- | --- |'
+ id: totrans-196
  prefs: []
  type: TYPE_TB
+ zh: '|---|---|---|'
- en: '| 1 | 6 | 20 |'
+ id: totrans-197
  prefs: []
  type: TYPE_TB
+ zh: '|1|6|20|'
- en: '| 2 | 5 | 19 |'
+ id: totrans-198
  prefs: []
  type: TYPE_TB
+ zh: '|2|5|19|'
- en: '| 3 | 4 | 18 |'
+ id: totrans-199
  prefs: []
  type: TYPE_TB
+ zh: '|3|4|18|'
- en: '| 4 | 3 | 17 |'
+ id: totrans-200
  prefs: []
  type: TYPE_TB
+ zh: '|4|3|17|'
- en: '| 5 | 2 | 16 |'
+ id: totrans-201
  prefs: []
  type: TYPE_TB
+ zh: '|5|2|16|'
- en: '| 6 | 1 | 15 |'
+ id: totrans-202
  prefs: []
  type: TYPE_TB
+ zh: '|6|1|15|'
- en: 'Using linear algebra (or guesswork in our minds), we can deduce that the equation
    governing this dataset is:'
+ id: totrans-203
  prefs: []
  type: TYPE_NORMAL
+ zh: 使用线性代数(或我们头脑中的猜测),我们可以推断控制这个数据集的方程是:
- en: '*output = f(input[1], input[2]) = 2 x input[1] + 3 x input[2]*'
+ id: totrans-204
  prefs: []
  type: TYPE_NORMAL
+ zh: '*output = f(input[1], input[2]) = 2 x input[1] + 3 x input[2]*'
- en: In this case, the weights for this mathematical operation are 2 and 3\. A deep
    neural network has millions of such weight parameters.
+ id: totrans-205
  prefs: []
  type: TYPE_NORMAL
+ zh: 在这种情况下,这个数学运算的权重是2和3。一个深度神经网络有数百万个这样的权重参数。
- en: Depending on the types of nodes used, different themes of model architectures
    will be better suited for different kinds of input data. For example, CNNs are
    used for image and audio, whereas Recurrent Neural Networks (RNNs) and LSTM are
    often used in text processing.
+ id: totrans-206
  prefs: []
  type: TYPE_NORMAL
+ zh: 根据使用的节点类型不同,不同主题的模型架构将更适合不同类型的输入数据。例如,CNNs用于图像和音频,而循环神经网络(RNNs)和LSTM通常用于文本处理。
- en: In general, training one of these models from scratch can take a pretty significant
    amount of time, potentially weeks. Luckily for us, many researchers have already
    done the difficult work of training them on a generic dataset (like ImageNet)
    and have made them available for everyone to use. What’s even better is that we
    can take these available models and tune them to our specific dataset. This process
    is called transfer learning and accounts for the vast majority of needs by practitioners.
+ id: totrans-207
  prefs: []
  type: TYPE_NORMAL
+ zh: 一般来说,从头开始训练这些模型可能需要相当长的时间,可能需要几周。幸运的是,许多研究人员已经完成了在通用数据集(如ImageNet)上训练它们的艰苦工作,并使它们可供所有人使用。更好的是,我们可以拿这些可用的模型并将它们调整到我们的特定数据集。这个过程称为迁移学习,占了从业者绝大多数需求。
- en: 'Compared to training from scratch, transfer learning provides a two-fold advantage:
    significantly reduced training time (a few minutes to hours instead of weeks),
    and it can work with a substantially smaller dataset (hundreds to thousands of
    data samples instead of millions). [Table 1-4](part0003.html#example_model_architectures_over_the_yea)
    shows some famous examples of model architectures.'
+ id: totrans-208 prefs: [] type: TYPE_NORMAL + zh: 与从头开始训练相比,迁移学习提供了双重优势:显著减少的训练时间(几分钟到几小时,而不是几周),并且可以使用大大较小的数据集(数百到数千个数据样本,而不是数百万个)。[表1-4](part0003.html#example_model_architectures_over_the_yea)显示了一些著名的模型架构示例。 - en: Table 1-4\. Example model architectures over the years + id: totrans-209 prefs: [] type: TYPE_NORMAL + zh: 表1-4.多年来的示例模型架构 - en: '| **Task** | **Example model architectures** |' + id: totrans-210 prefs: [] type: TYPE_TB + zh: '|**任务**|**示例模型架构**|' - en: '| --- | --- |' + id: totrans-211 prefs: [] type: TYPE_TB + zh: '|---|---|' - en: '| Image classification | ResNet-152 (2015), MobileNet (2017) |' + id: totrans-212 prefs: [] type: TYPE_TB + zh: '|图像分类|ResNet-152(2015年),MobileNet(2017年)|' - en: '| Text classification | BERT (2018), XLNet (2019) |' + id: totrans-213 prefs: [] type: TYPE_TB + zh: '|文本分类|BERT(2018年),XLNet(2019年)|' - en: '| Image segmentation | U-Net (2015), DeepLabV3 (2018) |' + id: totrans-214 prefs: [] type: TYPE_TB + zh: '|图像分割|U-Net(2015年),DeepLabV3(2018年)|' - en: '| Image translation | Pix2Pix (2017) |' + id: totrans-215 prefs: [] type: TYPE_TB + zh: '|图像翻译|Pix2Pix(2017年)|' - en: '| Object detection | YOLO9000 (2016), Mask R-CNN (2017) |' + id: totrans-216 prefs: [] type: TYPE_TB + zh: '|目标检测|YOLO9000(2016年),Mask R-CNN(2017年)|' - en: '| Speech generation | WaveNet (2016) |' + id: totrans-217 prefs: [] type: TYPE_TB + zh: '|语音生成|WaveNet(2016年)|' - en: Each one of the models from [Table 1-4](part0003.html#example_model_architectures_over_the_yea) has a published accuracy metric on reference datasets (e.g., ImageNet for classification, MS COCO for detection). Additionally, these architectures have their own characteristic resource requirements (model size in megabytes, computation requirements in floating-point operations, or FLOPS). + id: totrans-218 prefs: [] type: TYPE_NORMAL + zh: '[表1-4](part0003.html#example_model_architectures_over_the_yea)中的每个模型都在参考数据集(例如,分类的ImageNet,检测的MS + COCO)上有一个已发布的准确度指标。此外,这些架构有它们自己的特征资源需求(以兆字节为单位的模型大小,以浮点运算为单位的计算需求,或FLOPS)。' - en: We explore transfer learning in-depth in the upcoming chapters. Now, let’s look at the kinds of deep learning frameworks and services that are available to us. + id: totrans-219 prefs: [] type: TYPE_NORMAL + zh: 我们将在接下来的章节深入探讨迁移学习。现在,让我们看看我们可以使用的深度学习框架和服务。 - en: Note + id: totrans-220 prefs: - PREF_H6 type: TYPE_NORMAL + zh: 注 - en: 'When Kaiming He et al. came up with the 152-layer ResNet architecture in 2015—a feat of its day considering the previous largest GoogLeNet model consisted of 22 layers—there was just one question on everyone’s mind: “Why not 153 layers?” The reason, as it turns out, was that Kaiming ran out of GPU memory!' + id: totrans-221 prefs: [] type: TYPE_NORMAL + zh: 当Kaiming He等人在2015年提出了152层的ResNet架构时——考虑到之前最大的GoogLeNet模型由22层组成,这是当时的壮举——每个人心中只有一个问题:“为什么不是153层?”原来,原因是Kaiming的GPU内存用完了! - en: Frameworks + id: totrans-222 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 框架 - en: There are several deep learning libraries out there that help us train our models. Additionally, there are frameworks that specialize in using those trained models to make predictions (or *inference*), optimizing for where the application resides. + id: totrans-223 prefs: [] type: TYPE_NORMAL + zh: 有几个深度学习库可以帮助我们训练模型。此外,还有专门用于使用这些训练模型进行预测(或*推理*)的框架,优化应用程序所在的位置。 - en: Historically, as is the case with software generally, many libraries have come and gone—Torch (2002), Theano (2007), Caffe (2013), Microsoft Cognitive Toolkit (2015), Caffe2 (2017)—and the landscape has been evolving rapidly. 
    Learnings from each have made the other libraries easier to pick up, driven interest,
    and improved productivity for beginners and experts alike. [Table 1-5](part0003.html#popular_deep_learning_frameworks)
    looks at some of the popular ones.
+ id: totrans-224
  prefs: []
  type: TYPE_NORMAL
+ zh: 从历史上看,就像通常的软件一样,许多库已经出现并消失了——Torch(2002年)、Theano(2007年)、Caffe(2013年)、Microsoft
    Cognitive Toolkit(2015年)、Caffe2(2017年)——并且这个领域一直在迅速发展。从每个库中学到的东西使其他库更容易上手,引起了兴趣,并提高了初学者和专家的生产力。[表1-5](part0003.html#popular_deep_learning_frameworks)列出了其中一些流行的框架。
- en: Table 1-5\. Popular deep learning frameworks
+ id: totrans-225
  prefs: []
  type: TYPE_NORMAL
+ zh: 表1-5. 流行的深度学习框架
- en: '| **Framework** | **Best suited for** | **Typical target platform** |'
+ id: totrans-226
  prefs: []
  type: TYPE_TB
+ zh: '| **框架** | **最适用于** | **典型目标平台** |'
- en: '| --- | --- | --- |'
+ id: totrans-227
  prefs: []
  type: TYPE_TB
+ zh: '| --- | --- | --- |'
- en: '| TensorFlow (including Keras) | Training | Desktops, servers |'
+ id: totrans-228
  prefs: []
  type: TYPE_TB
+ zh: '| TensorFlow(包括Keras)| 训练 | 台式机、服务器 |'
- en: '| PyTorch | Training | Desktops, servers |'
+ id: totrans-229
  prefs: []
  type: TYPE_TB
+ zh: '| PyTorch | 训练 | 台式机、服务器 |'
- en: '| MXNet | Training | Desktops, servers |'
+ id: totrans-230
  prefs: []
  type: TYPE_TB
+ zh: '| MXNet | 训练 | 台式机、服务器 |'
- en: '| TensorFlow Serving | Inference | Servers |'
+ id: totrans-231
  prefs: []
  type: TYPE_TB
+ zh: '| TensorFlow Serving | 推理 | 服务器 |'
- en: '| TensorFlow Lite | Inference | Mobile and embedded devices |'
+ id: totrans-232
  prefs: []
  type: TYPE_TB
+ zh: '| TensorFlow Lite | 推理 | 移动和嵌入式设备 |'
- en: '| TensorFlow.js | Inference | Browsers |'
+ id: totrans-233
  prefs: []
  type: TYPE_TB
+ zh: '| TensorFlow.js | 推理 | 浏览器 |'
- en: '| ml5.js | Inference | Browsers |'
+ id: totrans-234
  prefs: []
  type: TYPE_TB
+ zh: '| ml5.js | 推理 | 浏览器 |'
- en: '| Core ML | Inference | Apple devices |'
+ id: totrans-235
  prefs: []
  type: TYPE_TB
+ zh: '| Core ML | 推理 | 苹果设备 |'
- en: '| Xnor AI2GO | Inference | Embedded devices |'
+ id: totrans-236
  prefs: []
  type: TYPE_TB
+ zh: '| Xnor AI2GO | 推理 | 嵌入式设备 |'
- en: TensorFlow
+ id: totrans-237
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: TensorFlow
- en: In 2011, Google Brain developed the DNN library DistBelief for internal research
    and engineering. It helped train Inception (2014’s winning entry to the ImageNet
    Large Scale Visual Recognition Challenge) as well as helped improve the quality
    of speech recognition in Google products. Tightly coupled to Google’s infrastructure,
    it was not easy to configure or to share code with external machine learning enthusiasts.
    Realizing these limitations, Google started working on a second-generation distributed
    machine learning framework that promised to be general,
    scalable, highly performant, and portable to many hardware platforms. And the
    best part, it was open source. Google called it TensorFlow and announced its release
    on November 2015.
+ id: totrans-238
  prefs: []
  type: TYPE_NORMAL
+ zh: 2011年,Google Brain开发了用于内部研究和工程的DNN库DistBelief。它帮助训练了Inception(2014年ImageNet大规模视觉识别挑战的获奖作品),并帮助提高了Google产品中语音识别的质量。由于与Google的内部基础设施紧密耦合,它既难以配置,也不便与外部的机器学习爱好者共享代码。意识到这些限制,Google开始研发第二代分布式机器学习框架,承诺是通用、可扩展、高性能且可移植到许多硬件平台。而且最重要的是,它是开源的。Google称之为TensorFlow,并于2015年11月宣布发布。
- en: TensorFlow delivered on a lot of these aforementioned promises, developing an
    end-to-end ecosystem from development to deployment, and it gained a massive following
    in the process. With more than 100,000 stars on GitHub, it shows no signs of stopping.
    However, as adoption increased, users of the library rightly criticized it for not
    being easy enough to use. As the joke went, TensorFlow was a library by Google
    engineers, of Google engineers, for Google engineers, and if you were smart enough
    to use TensorFlow, you were smart enough to get hired there.
+ id: totrans-239
  prefs: []
  type: TYPE_NORMAL
+ zh: TensorFlow实现了许多前述承诺,从开发到部署形成了一个端到端的生态系统,并在这个过程中获得了大量的追随者。在GitHub上拥有超过10万颗星星,显示出没有停止的迹象。然而,随着采用的增加,该库的用户理所当然地批评它使用起来不够简单。正如那个笑话所说,TensorFlow是一个由Google工程师编写、属于Google工程师、为Google工程师服务的库,如果你聪明到会用TensorFlow,那你也聪明到可以被那里录用了。
- en: But Google was not alone here. Let’s be honest. Even as late as 2015, it was
    a given that working with deep learning libraries would inevitably be an unpleasant
    experience. Forget even working on these; installing some of these frameworks
    made people want to pull their hair out. (Caffe users out there—does this ring
    a bell?)
+ id: totrans-240
  prefs: []
  type: TYPE_NORMAL
+ zh: 但Google并不孤单。说实话,即使到了2015年,使用深度学习库几乎注定是一种令人不愉快的经历。别说用它们开发了,光是安装其中一些框架就让人想拔头发。(Caffe的老用户们,是不是很耳熟?)
- en: Keras
+ id: totrans-241
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: Keras
- en: As an answer to the hardships faced by deep learning practitioners, François
    Chollet released the open source framework Keras in March 2015, and the world
    hasn’t been the same since. This solution suddenly made deep learning accessible
    to beginners. Keras offered an intuitive and easy-to-use coding interface, which
    then used other deep learning libraries as the backend computational
    framework. Starting with Theano as its first backend, Keras encouraged rapid
    prototyping and reduced the number of lines of code. Eventually, this abstraction
    expanded to other frameworks including Cognitive Toolkit, MXNet, PlaidML, and,
    yes, TensorFlow.
+ id: totrans-242
  prefs: []
  type: TYPE_NORMAL
+ zh: 作为深度学习从业者面临困难的答案,François Chollet于2015年3月发布了开源框架Keras,自那以后世界就变了。这个解决方案突然使深度学习对初学者变得可访问。Keras提供了一个直观且易于使用的编码界面,然后使用其他深度学习库作为后端计算框架。从其第一个后端Theano开始,Keras鼓励快速原型设计并减少代码行数。最终,这种抽象扩展到其他框架,包括Cognitive
    Toolkit、MXNet、PlaidML,以及TensorFlow。
- en: PyTorch
+ id: totrans-243
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: PyTorch
- en: In parallel, PyTorch started at Facebook early in 2016, where engineers had
    the benefit of observing TensorFlow’s limitations. PyTorch supported native Python
    constructs and Python debugging right off the bat, making it flexible and easier
    to use, and it quickly became a favorite among AI researchers. It is the
    second-largest end-to-end deep learning system. Facebook additionally built Caffe2
    to take PyTorch models and deploy them to production to serve more than a billion
    users. Whereas PyTorch drove research, Caffe2 was primarily used in production.
    In 2018, Caffe2 was absorbed into PyTorch to make a full framework.
+ id: totrans-244
  prefs: []
  type: TYPE_NORMAL
+ zh: 同时,PyTorch在2016年初在Facebook开始,工程师们有幸观察到TensorFlow的局限性。PyTorch从一开始就支持原生Python结构和Python调试,使其灵活且易于使用,很快成为AI研究人员的最爱。它是第二大端到端深度学习系统。Facebook另外构建了Caffe2,用于将PyTorch模型部署到生产环境,为超过十亿用户提供服务。PyTorch推动了研究,而Caffe2主要用于生产。2018年,Caffe2被吸收到PyTorch中,形成一个完整的框架。
- en: A continuously evolving landscape
+ id: totrans-245
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 一个不断发展的领域
- en: Had this story ended with the ease of Keras and PyTorch, this book would not
    have the word “TensorFlow” in the subtitle. The TensorFlow team recognized that
    if it truly wanted to broaden the tool’s reach and democratize AI, it needed to
    make the tool easier to use. So it was good news when Keras was officially included
    as part of TensorFlow, offering the best of both worlds. This allowed developers
    to use Keras for defining models and training them, and core TensorFlow for its
    high-performance data pipelines, including distributed training and the ecosystem
    for deployment. It was a match made in heaven! And to top it all, TensorFlow 2.0
    (released in 2019) included support for native Python constructs and eager execution,
    as we saw in PyTorch.
+ id: totrans-246
  prefs: []
  type: TYPE_NORMAL
+ zh: 如果这个故事以Keras和PyTorch的便利结束,这本书的副标题就不会有“TensorFlow”这个词了。TensorFlow团队意识到,如果真的想要扩大工具的影响力并使AI民主化,就需要让工具更容易使用。因此,当Keras正式作为TensorFlow的一部分包含在内时,这是一个好消息,提供了两全其美的选择。这使开发人员可以使用Keras定义模型和训练模型,使用核心TensorFlow进行高性能数据管道,包括分布式训练和部署生态系统。这是天作之合!最重要的是,TensorFlow
    2.0(2019年发布)包括对原生Python结构和急切执行的支持,正如我们在PyTorch中看到的那样。
- en: With so many competing frameworks available, the question of portability inevitably
    arises. Imagine a new research paper published with the state-of-the-art model
    being made public in PyTorch. If we didn’t work in PyTorch, we would be locked
    out of the research, and we would have to reimplement and train it. Developers
    like to be able to share models freely without being restricted to a specific
    ecosystem. Organically, many developers wrote libraries to convert model formats
    from one library to another. It was a simple solution, except the sheer number
    of conversion tools, lacking official support and sufficient quality, led to a
    combinatorial explosion. To solve this problem, major industry players like Microsoft
    and Facebook came together and launched the Open Neural Network Exchange (ONNX).
    ONNX provided a specification for a common model format that several popular libraries
    could officially read and write. Additionally, it provided converters for libraries
    that did not natively support this format. This allowed developers to train in
    one framework and do inferences in a different framework.
+ id: totrans-247
  prefs: []
  type: TYPE_NORMAL
+ zh: 有了这么多竞争框架可用,可移植性的问题不可避免地出现。想象一下,一篇新的研究论文以PyTorch公开发布的最先进模型。如果我们不在PyTorch中工作,我们将无法参与研究,必须重新实现和训练。开发人员喜欢能够自由共享模型,而不受限于特定的生态系统。许多开发人员自然地编写了库,将模型格式从一个库转换为另一个库。这是一个简单的解决方案,但由于转换工具的数量庞大,缺乏官方支持和足够的质量,导致了组合爆炸。为了解决这个问题,微软和Facebook等行业主要参与者发起了开放神经网络交换(ONNX)。ONNX提供了一个通用模型格式的规范,可以被多个流行库官方读写。此外,它为不支持此格式的库提供了转换器。这使开发人员可以在一个框架中进行训练,然后在另一个框架中进行推断。
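- en: 'As a minimal sketch of that workflow (our own illustration, not from the original
    text), the following exports a small PyTorch model to an ONNX file so that any
    ONNX-compatible runtime can load it for inference; the model and the file name
    are arbitrary placeholders:'
+ id: totrans-247a
  prefs: []
  type: TYPE_NORMAL
- en: |-
    import torch

    # A tiny stand-in model: 3 inputs -> 1 output.
    model = torch.nn.Linear(3, 1)
    model.eval()

    # An example input fixes the expected tensor shape for the export.
    dummy_input = torch.randn(1, 3)

    # Write the model out in the common ONNX format.
    torch.onnx.export(model, dummy_input, "tiny.onnx",
                      input_names=["input"], output_names=["output"])

    # A different framework's runtime can now serve it, for example:
    # import onnxruntime, numpy
    # session = onnxruntime.InferenceSession("tiny.onnx")
    # print(session.run(None, {"input": numpy.float32([[1.0, 2.0, 3.0]])}))
+ id: totrans-247b
  prefs: []
  type: TYPE_PRE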
Imagine a new research paper published with the state-of-the-art model being made public in PyTorch. If we didn’t work in PyTorch, we would be locked @@ -1249,44 +1699,68 @@ it provided converters for libraries that did not natively support this format. This allowed developers to train in one framework and do inferences in a different framework. + id: totrans-247 prefs: [] type: TYPE_NORMAL + zh: 有了这么多竞争框架可用,可移植性的问题不可避免地出现。想象一下,一篇新的研究论文以PyTorch公开发布的最先进模型。如果我们不在PyTorch中工作,我们将无法参与研究,必须重新实现和训练。开发人员喜欢能够自由共享模型,而不受限于特定的生态系统。许多开发人员自然地编写了库,将模型格式从一个库转换为另一个库。这是一个简单的解决方案,但由于转换工具的数量庞大,缺乏官方支持和足够的质量,导致了组合爆炸。为了解决这个问题,微软和Facebook等行业主要参与者发起了开放神经网络交换(ONNX)。ONNX提供了一个通用模型格式的规范,可以被多个流行库官方读写。此外,它为不支持此格式的库提供了转换器。这使开发人员可以在一个框架中进行训练,然后在另一个框架中进行推断。 - en: Apart from these frameworks, there are several Graphical User Interface (GUI) systems that make code-free training possible. Using transfer learning, they generate trained models quickly in several formats useful for inference. With point-and-click interfaces, even your grandma can now train a neural network! + id: totrans-248 prefs: [] type: TYPE_NORMAL + zh: 除了这些框架外,还有几个图形用户界面(GUI)系统,可以实现无代码训练。使用迁移学习,它们可以快速生成多种格式的训练模型,用于推断。即使是您的祖母也可以通过点按界面快速训练神经网络! - en: Table 1-6\. Popular GUI-based model training tools + id: totrans-249 prefs: [] type: TYPE_NORMAL + zh: 表1-6.流行的基于GUI的模型训练工具 - en: '| **Service** | **Platform** |' + id: totrans-250 prefs: [] type: TYPE_TB + zh: '| **服务** | **平台** |' - en: '| --- | --- |' + id: totrans-251 prefs: [] type: TYPE_TB + zh: '| --- | --- |' - en: '| Microsoft CustomVision.AI | Web-based |' + id: totrans-252 prefs: [] type: TYPE_TB + zh: '| 微软CustomVision.AI | 基于Web |' - en: '| Google AutoML | Web-based |' + id: totrans-253 prefs: [] type: TYPE_TB + zh: '| Google AutoML | 基于Web |' - en: '| Clarifai | Web-based |' + id: totrans-254 prefs: [] type: TYPE_TB + zh: '| Clarifai | 基于Web |' - en: '| IBM Visual Recognition | Web-based |' + id: totrans-255 prefs: [] type: TYPE_TB + zh: '| IBM视觉识别 | 基于Web |' - en: '| Apple Create ML | macOS |' + id: totrans-256 prefs: [] type: TYPE_TB + zh: '| 苹果Create ML | macOS |' - en: '| NVIDIA DIGITS | Desktop |' + id: totrans-257 prefs: [] type: TYPE_TB + zh: '| NVIDIA DIGITS | 桌面 |' - en: '| Runway ML | Desktop |' + id: totrans-258 prefs: [] type: TYPE_TB + zh: '| Runway ML | 桌面 |' - en: So why did we choose TensorFlow and Keras as the primary frameworks for this book? Considering the sheer amount of material available, including documentation, Stack Overflow answers, online courses, the vast community of contributors, platform @@ -1297,29 +1771,36 @@ discussed in the book are generalizable to other libraries, as well. Picking up a new framework shouldn’t take you too long. So, if you really want to move to a company that uses PyTorch exclusively, don’t hesitate to apply there. + id: totrans-259 prefs: [] type: TYPE_NORMAL + zh: 那么为什么我们选择TensorFlow和Keras作为本书的主要框架?考虑到可用的材料数量,包括文档、Stack Overflow答案、在线课程、庞大的贡献者社区、平台和设备支持、行业采用以及可用的工作岗位(在美国,与PyTorch相比,大约有三倍的TensorFlow相关角色),当涉及到框架时,TensorFlow和Keras目前主导着这个领域。对我们来说选择这种组合是有道理的。也就是说,本书讨论的技术也适用于其他库。学习一个新的框架不应该花费太长时间。因此,如果您真的想要加入一个专门使用PyTorch的公司,不要犹豫去申请。 - en: Hardware + id: totrans-260 prefs: - PREF_H2 type: TYPE_NORMAL + zh: 硬件 - en: In 1848, when James W. Marshall discovered gold in California, the news spread like wildfire across the United States. Hundreds of thousands of people stormed to the state to begin mining for riches. This was known as the *California Gold Rush*. Early movers were able to extract a decent chunk, but the latecomers were not nearly as lucky. 
But the rush did not stop for many years. Can you guess who made the most money throughout this period? The shovel makers! + id: totrans-261 prefs: [] type: TYPE_NORMAL - en: Cloud and hardware companies are the shovel makers of the twenty-first century. Don’t believe us? Look at the stock performance of Microsoft and NVIDIA in the past decade. The only difference between 1849 and now is the mind-bogglingly large amount of shovel choices available to us. + id: totrans-262 prefs: [] type: TYPE_NORMAL - en: Given the variety of hardware available, it is important to make the correct choices for the constraints imposed by resource, latency, budget, privacy, and legal requirements of the application. + id: totrans-263 prefs: [] type: TYPE_NORMAL - en: Depending on how your application interacts with the user, the inference phase @@ -1331,6 +1812,7 @@ processed per second (typically >15 fps). On the other hand, a photo uploaded to an image library such as Google Photos does not need immediate image categorization done on it. A few seconds or few minutes of latency is acceptable. + id: totrans-264 prefs: [] type: TYPE_NORMAL - en: Going to the other extreme, training takes a lot more time; anywhere between @@ -1342,89 +1824,112 @@ experiment to finish from a week to a few hours. This can be the difference in watching a documentary about the Grand Canyon (two hours) versus actually making the trip to visit the Grand Canyon (four days). + id: totrans-265 prefs: [] type: TYPE_NORMAL - en: 'Following are a few fundamental hardware categories to choose from and how they are typically characterized (see also [Figure 1-10](part0003.html#comparison_of_different_types_of_hardwar)):' + id: totrans-266 prefs: [] type: TYPE_NORMAL - en: Central Processing Unit (CPU) + id: totrans-267 prefs: [] type: TYPE_NORMAL - en: Cheap, flexible, slow. For example, Intel Core i9-9900K. + id: totrans-268 prefs: [] type: TYPE_NORMAL - en: GPU + id: totrans-269 prefs: [] type: TYPE_NORMAL - en: High throughput, great for batching to utilize parallel processing, expensive. For example, NVIDIA GeForce RTX 2080 Ti. + id: totrans-270 prefs: [] type: TYPE_NORMAL - en: Field-Programmable Gate Array (FPGA) + id: totrans-271 prefs: [] type: TYPE_NORMAL - en: Fast, low power, reprogrammable for custom solutions, expensive. Known companies include Xilinx, Lattice Semiconductor, Altera (Intel). Because of the ability to run in seconds and configurability to any AI model, Microsoft Bing runs the majority of its AI on FPGAs. + id: totrans-272 prefs: [] type: TYPE_NORMAL - en: Application-Specific Integrated Circuit (ASIC) + id: totrans-273 prefs: [] type: TYPE_NORMAL - en: 'Custom-made chip. Extremely expensive to design, but inexpensive when built for scale. Just like in the pharmaceutical industry, the first item costs the most due to the R&D effort that goes into designing and making it. Producing massive quantities is rather inexpensive. Specific examples include the following:' + id: totrans-274 prefs: [] type: TYPE_NORMAL - en: Tensor Processing Unit (TPU) + id: totrans-275 prefs: [] type: TYPE_NORMAL - en: ASIC specializing in operations for neural networks, available on Google Cloud only. + id: totrans-276 prefs: [] type: TYPE_NORMAL - en: Edge TPU + id: totrans-277 prefs: [] type: TYPE_NORMAL - en: Smaller than a US penny, accelerates inference on the edge. 
+ id: totrans-278 prefs: [] type: TYPE_NORMAL - en: Neural Processing Unit (NPU) + id: totrans-279 prefs: [] type: TYPE_NORMAL - en: Often used by smartphone manufacturers, this is a dedicated chip for accelerating neural network inference. + id: totrans-280 prefs: [] type: TYPE_NORMAL - en: '![Comparison of different types of hardware relative to flexibility, performance, and cost](../images/00092.jpeg)' + id: totrans-281 prefs: [] type: TYPE_IMG - en: Figure 1-10\. Comparison of different types of hardware relative to flexibility, performance, and cost + id: totrans-282 prefs: - PREF_H6 type: TYPE_NORMAL - en: 'Let’s look at a few scenarios for which each one would be used:' + id: totrans-283 prefs: [] type: TYPE_NORMAL - en: Getting started with training → CPU + id: totrans-284 prefs: - PREF_UL type: TYPE_NORMAL - en: Training large networks → GPUs and TPUs + id: totrans-285 prefs: - PREF_UL type: TYPE_NORMAL - en: Inference on smartphones → Mobile CPU, GPU, Digital Signal Processor (DSP), NPU + id: totrans-286 prefs: - PREF_UL type: TYPE_NORMAL - en: Wearables (e.g., smart glasses, smartwatches) → Edge TPU, NPUs + id: totrans-287 prefs: - PREF_UL type: TYPE_NORMAL @@ -1432,91 +1937,111 @@ Accelerators like Google Coral, Intel Movidius with Raspberry Pi, or GPUs like NVIDIA Jetson Nano, all the way down to $15 microcontrollers (MCUs) for wake word detection in smart speakers + id: totrans-288 prefs: - PREF_UL type: TYPE_NORMAL - en: As we go through the book, we will closely explore many of these. + id: totrans-289 prefs: [] type: TYPE_NORMAL - en: Responsible AI + id: totrans-290 prefs: - PREF_H1 type: TYPE_NORMAL - en: So far, we have explored the power and the potential of AI. It shows great promise to enhance our abilities, to make us more productive, to give us superpowers. + id: totrans-291 prefs: [] type: TYPE_NORMAL - en: But with great power comes great responsibility. + id: totrans-292 prefs: [] type: TYPE_NORMAL - en: As much as AI can help humanity, it also has equal potential to harm us when not designed with thought and care (either intentionally or unintentionally). The AI is not to blame; rather, it’s the AI’s designers. + id: totrans-293 prefs: [] type: TYPE_NORMAL - en: Consider some real incidents that made the news in the past few years. + id: totrans-294 prefs: [] type: TYPE_NORMAL - en: '“____ can allegedly determine whether you’re a terrorist just by analyzing your face” ([Figure 1-11](part0003.html#startup_claiming_to_classify_people_base)): *Computer World*, 2016' + id: totrans-295 prefs: - PREF_UL type: TYPE_NORMAL - en: '“AI is sending people to jail—and getting it wrong”: *MIT Tech Review*, 2019' + id: totrans-296 prefs: - PREF_UL type: TYPE_NORMAL - en: '“____ supercomputer recommended ‘unsafe and incorrect’ cancer treatments, internal documents show”: *STAT News*, 2018' + id: totrans-297 prefs: - PREF_UL type: TYPE_NORMAL - en: '“____ built an AI tool to hire people but had to shut it down because it was discriminating against women”: *Business Insider*, 2018' + id: totrans-298 prefs: - PREF_UL type: TYPE_NORMAL - en: '“____ AI study: Major object recognition systems favor people with more money”: *VentureBeat*, 2019' + id: totrans-299 prefs: - PREF_UL type: TYPE_NORMAL - en: '“____ labeled black people ‘gorillas’” *USA Today*, 2015\. “Two years later, ____ solves ‘racist algorithm’ problem by purging ‘gorilla’ label from image classifier”: *Boing Boing*, 2018' + id: totrans-300 prefs: - PREF_UL type: TYPE_NORMAL - en: '“____ silences its new A.I. 
bot Tay, after Twitter users teach it racism”: *TechCrunch*, 2016' + id: totrans-301 prefs: - PREF_UL type: TYPE_NORMAL - en: '“AI Mistakes Bus-Side Ad for Famous CEO, Charges Her With Jaywalking”: *Caixin Global*, 2018' + id: totrans-302 prefs: - PREF_UL type: TYPE_NORMAL - en: '“____ to drop Pentagon AI contract after employee objections to the ‘business of war’”: *Washington Post*, 2018' + id: totrans-303 prefs: - PREF_UL type: TYPE_NORMAL - en: '“Self-driving ____ death car ‘spotted pedestrian six seconds before mowing down and killing her’”: *The Sun*, 2018' + id: totrans-304 prefs: - PREF_UL type: TYPE_NORMAL - en: '![Startup claiming to classify people based on their facial structure](../images/00324.jpeg)' + id: totrans-305 prefs: [] type: TYPE_IMG - en: Figure 1-11\. Startup claiming to classify people based on their facial structure + id: totrans-306 prefs: - PREF_H6 type: TYPE_NORMAL - en: Can you fill in the blanks here? We’ll give you some options—Amazon, Microsoft, Google, IBM, and Uber. Go ahead and fill them out. We’ll wait. + id: totrans-307 prefs: [] type: TYPE_NORMAL - en: There’s a reason we kept them blank. It’s to recognize that it’s not a problem @@ -1524,15 +2049,18 @@ although these things happened in the past, and might not reflect the current state, we can learn from them and try not to make the same mistakes. The silver lining here is that everyone learned from these mistakes. + id: totrans-308 prefs: [] type: TYPE_NORMAL - en: We, as developers, designers, architects, and leaders of AI, have the responsibility to think beyond just the technical problem at face value. Following are just a handful of topics that are relevant to any problem we solve (AI or otherwise). They must not take a backseat. + id: totrans-309 prefs: [] type: TYPE_NORMAL - en: Bias + id: totrans-310 prefs: - PREF_H2 type: TYPE_NORMAL @@ -1542,6 +2070,7 @@ power them were not created in a vacuum—they were created by human beings with their own biases. Computers don’t magically create bias on their own, they reflect and amplify existing ones. + id: totrans-311 prefs: [] type: TYPE_NORMAL - en: Take the example from the early days of the YouTube app when the developers @@ -1553,6 +2082,7 @@ had not accounted for that case during the development and testing of their mobile app, so YouTube uploaded videos to its server in the same orientation for both left-handed and right-handed users. + id: totrans-312 prefs: [] type: TYPE_NORMAL - en: This problem could have been caught much earlier if the developers had even @@ -1562,14 +2092,17 @@ play. Factors such as gender, skin tone, economic status, disability, country of origin, speech patterns, or even something as trivial as hair length can determine life-changing outcomes for someone, including how an algorithm treats them. + id: totrans-313 prefs: [] type: TYPE_NORMAL - en: 'Google’s [machine learning glossary](https://oreil.ly/ySfNv) lists several forms of bias that can affect a machine learning pipeline. The following are just some of them:' + id: totrans-314 prefs: [] type: TYPE_NORMAL - en: Selection bias + id: totrans-315 prefs: [] type: TYPE_NORMAL - en: The dataset is not representative of the distribution of the real-world problem @@ -1577,6 +2110,7 @@ and smart home speakers, some spoken accents are overrepresented, whereas other accents have no data at all in the training dataset, resulting in a poor UX for large chunks of the world’s population. 
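A quick audit of the training data often catches this kind of selection bias before any training happens. The following hypothetical sketch (the field names and sample records are made up for illustration) tallies how a sensitive attribute such as accent is distributed in a speech dataset:

```python
# Compare the distribution of an attribute in the training data against the
# population the product is meant to serve; gaps here predict poor UX later.
from collections import Counter

training_samples = [
    {"utterance": "...", "accent": "US"},
    {"utterance": "...", "accent": "US"},
    {"utterance": "...", "accent": "UK"},
    {"utterance": "...", "accent": "Indian"},
]

counts = Counter(sample["accent"] for sample in training_samples)
total = sum(counts.values())
for accent, count in counts.most_common():
    print(f"{accent}: {count / total:.0%}")

# Any accent your users speak that is missing or rare in this tally is a
# population the trained model will likely serve poorly.
```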
+ id: totrans-316 prefs: [] type: TYPE_NORMAL - en: Selection bias can also happen because of co-occurrence of concepts. For example, @@ -1585,18 +2119,22 @@ the genders, as demonstrated in [Figure 1-12](part0003.html#google_translate_reflecting_the_underlyi). This is likely because the dataset contains a large sample of co-occurrences of male pronouns and the word “doctor,” and female pronouns and the word “nurse.” + id: totrans-317 prefs: [] type: TYPE_NORMAL - en: '![Google Translate reflecting the underlying bias in data (as of September 2019)](../images/00013.jpeg)' + id: totrans-318 prefs: [] type: TYPE_IMG - en: Figure 1-12\. Google Translate reflecting the underlying bias in data (as of September 2019) + id: totrans-319 prefs: - PREF_H6 type: TYPE_NORMAL - en: Implicit bias + id: totrans-320 prefs: [] type: TYPE_NORMAL - en: This type of bias creeps in because of implicit assumptions that we all make @@ -1606,25 +2144,31 @@ toward textures,^([2](part0003.html#idm45475775799448)) most of them will classify the full image as a zebra. Except that we know that the image is of a sofa upholstered in a zebra-like fabric. + id: totrans-321 prefs: [] type: TYPE_NORMAL - en: '![Zebra sofa by Glen Edelson (image source)](../images/00134.jpeg)' + id: totrans-322 prefs: [] type: TYPE_IMG - en: Figure 1-13\. Zebra sofa by Glen Edelson ([image source](https://oreil.ly/Xg4MP)) + id: totrans-323 prefs: - PREF_H6 type: TYPE_NORMAL - en: Reporting bias + id: totrans-324 prefs: [] type: TYPE_NORMAL - en: Sometimes the loudest voices in the room are the most extreme ones and dominate the conversation. One good look at Twitter might make it seem as if the world is ending, whereas most people are busy leading mundane lives. Unfortunately, boring does not sell. + id: totrans-325 prefs: [] type: TYPE_NORMAL - en: In-group/out-group bias + id: totrans-326 prefs: [] type: TYPE_NORMAL - en: An annotator from East Asia might look at a picture of the Statue of Liberty @@ -1633,9 +2177,11 @@ or “Liberty Island.” It’s human nature to see one’s own groups with nuance while seeing other groups as more homogenous, and that reflects in our datasets, as well. + id: totrans-327 prefs: [] type: TYPE_NORMAL - en: Accountability and Explainability + id: totrans-328 prefs: - PREF_H2 type: TYPE_NORMAL @@ -1646,6 +2192,7 @@ it to move? What caused it to stop? What stopped it from burning the person sitting inside it? He had no answers. If this was the origin story of the car, you’d probably not want to get into that contraption. + id: totrans-329 prefs: [] type: TYPE_NORMAL - en: This is precisely what is happening with AI right now. Previously, with traditional @@ -1668,6 +2215,7 @@ there’s momentum to change that with investments in *Explainable AI*, wherein the model would be able to not just provide predictions but also account for the factors that caused it to make a certain prediction, and reveal areas of limitations. + id: totrans-330 prefs: [] type: TYPE_NORMAL - en: Additionally, cities (such as New York) are beginning to make their algorithms @@ -1676,9 +2224,11 @@ and audits by experts, improving expertise in government agencies to better evaluate each system they add, and by providing mechanisms to dispute a decision made by an algorithm. + id: totrans-331 prefs: [] type: TYPE_NORMAL - en: Reproducibility + id: totrans-332 prefs: - PREF_H2 type: TYPE_NORMAL @@ -1699,9 +2249,11 @@ constructed datasets) and open sourcing the code they used for their research. 
Members of the community can piggyback on this code, prove it works, and make it better, thereby leading to newer innovations rapidly. + id: totrans-333 prefs: [] type: TYPE_NORMAL - en: Robustness + id: totrans-334 prefs: - PREF_H2 type: TYPE_NORMAL @@ -1718,9 +2270,11 @@ the road, which led it to change lanes and drive into the oncoming lane. Robust AI that is capable of withstanding noise, slight deviations, and intentional manipulation is necessary if we are to be able to trust it. + id: totrans-335 prefs: [] type: TYPE_NORMAL - en: Privacy + id: totrans-336 prefs: - PREF_H2 type: TYPE_NORMAL @@ -1734,6 +2288,7 @@ attractive target for hackers, who steal personal information and sell it on the black market to criminal enterprises. Moreover, governments are already overreaching in an attempt to track each and every individual. + id: totrans-337 prefs: [] type: TYPE_NORMAL - en: All of this is at odds to the universally recognized human right of privacy. @@ -1741,6 +2296,7 @@ about them, who has access to it, how it’s being used, and mechanisms to opt out of the data collection process, as well to delete data that was already collected on them. + id: totrans-338 prefs: [] type: TYPE_NORMAL - en: As developers, we want to be aware of all the data we are collecting, and ask @@ -1749,6 +2305,7 @@ learning techniques such as Federated Learning (used in Google Keyboard) that allow us to train networks on the users’ devices without having to send any of the Personally Identifiable Information (PII) to a server. + id: totrans-339 prefs: [] type: TYPE_NORMAL - en: It turns out that in many of the aforementioned headlines at the beginning of @@ -1760,9 +2317,11 @@ to set a precedent for decades to come. As AI becomes ubiquitous, we need to come together to ask the tough questions and find answers for them if we want to minimize the potential harm while reaping the maximum benefits. + id: totrans-340 prefs: [] type: TYPE_NORMAL - en: Summary + id: totrans-341 prefs: - PREF_H1 type: TYPE_NORMAL @@ -1774,14 +2333,17 @@ architectures, frameworks, and hardware. This sets us up for further exploration in the upcoming chapters. We hope you enjoy the rest of the book. It’s time to dig in! + id: totrans-342 prefs: [] type: TYPE_NORMAL - en: Frequently Asked Questions + id: totrans-343 prefs: - PREF_H1 type: TYPE_NORMAL - en: I’m just getting started. Do I need to spend a lot of money on buying powerful hardware? + id: totrans-344 prefs: - PREF_OL type: TYPE_NORMAL @@ -1794,39 +2356,47 @@ you might want to get a GPU either by renting one on the cloud (Microsoft Azure, Amazon Web Services (AWS), Google Cloud Platform (GCP), and others) or purchasing the hardware. Watch out for those electricity bills, though! + id: totrans-345 prefs: - PREF_IND type: TYPE_NORMAL - en: '![Screenshot of a notebook on GitHub running on Colab inside Chrome](../images/00278.jpeg)' + id: totrans-346 prefs: - PREF_IND type: TYPE_IMG - en: Figure 1-14\. Screenshot of a notebook on GitHub running on Colab inside Chrome + id: totrans-347 prefs: - PREF_IND - PREF_H6 type: TYPE_NORMAL - en: Colab is great, but I already have a powerful computer that I purchased for playing . How should I set up my environment? + id: totrans-348 prefs: - PREF_OL type: TYPE_NORMAL - en: 'The ideal setup involves Linux, but Windows and macOS work, too. 
For most chapters, you need the following:' + id: totrans-349 prefs: - PREF_IND type: TYPE_NORMAL - en: Python 3 and PIP + id: totrans-350 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL - en: '`tensorflow` or `tensorflow-gpu` PIP package (version 2 or greater)' + id: totrans-351 prefs: - PREF_IND - PREF_UL type: TYPE_NORMAL - en: Pillow + id: totrans-352 prefs: - PREF_IND - PREF_UL @@ -1834,11 +2404,13 @@ - en: We like keeping things clean and self-contained, so we recommend using Python virtual environments. You should use the virtual environment whenever you install a package or run a script or a notebook. + id: totrans-353 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL - en: If you do not have a GPU, you are done with the setup. + id: totrans-354 prefs: - PREF_IND - PREF_IND @@ -1848,25 +2420,30 @@ there’s an easier solution than installing these packages manually, which can be tedious and error prone even for the best of us: simply install the entire environment with just one line using [Lambda Stack](https://oreil.ly/93oon).' + id: totrans-355 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL - en: Alternatively, you could install all of your packages using Anaconda Distribution, which works equally well for Windows, Mac, and Linux. + id: totrans-356 prefs: - PREF_IND - PREF_IND type: TYPE_NORMAL - en: Where will I find the code used in this book? + id: totrans-357 prefs: - PREF_OL type: TYPE_NORMAL - en: You’ll find ready-to-run examples at [*http://PracticalDeepLearning.ai*](http://PracticalDeepLearning.ai). + id: totrans-358 prefs: - PREF_IND type: TYPE_NORMAL - en: What are the minimal prerequisites to be able to read this book? + id: totrans-359 prefs: - PREF_OL type: TYPE_NORMAL @@ -1878,19 +2455,23 @@ understanding of mobile development (with Swift and/or Kotlin) will help, we’ve designed the examples to be self-sufficient and easy enough to be deployed by someone who has never written a mobile app previously. + id: totrans-360 prefs: - PREF_IND type: TYPE_NORMAL - en: What frameworks will we be using? + id: totrans-361 prefs: - PREF_OL type: TYPE_NORMAL - en: Keras + TensorFlow for training. And chapter by chapter, we explore different inference frameworks. + id: totrans-362 prefs: - PREF_IND type: TYPE_NORMAL - en: Will I be an expert when I finish this book? + id: totrans-363 prefs: - PREF_OL type: TYPE_NORMAL @@ -1898,31 +2479,38 @@ the way from training to inference, to maximizing performance. Even though this book primarily focuses on computer vision, you can bring the same know-how to other areas such as text, audio, and so on and get up to speed very quickly. + id: totrans-364 prefs: - PREF_IND type: TYPE_NORMAL - en: Who is the cat from earlier in the chapter? + id: totrans-365 prefs: - PREF_OL type: TYPE_NORMAL - en: That is Meher’s cat, Vader. He will be making multiple cameos throughout this book. And don’t worry, he has already signed a model release form. + id: totrans-366 prefs: - PREF_IND type: TYPE_NORMAL - en: Can I contact you? + id: totrans-367 prefs: - PREF_OL type: TYPE_NORMAL - en: Sure. Drop us an email at [PracticalDLBook@gmail.com](mailto:PracticalDLBook@gmail.com) with any questions, corrections, or whatever, or tweet to us [@PracticalDLBook](https://www.twitter.com/PracticalDLBook). + id: totrans-368 prefs: - PREF_IND type: TYPE_NORMAL - en: ^([1](part0003.html#ch01fn1-marker)) If you’re reading a pirated copy, consider us disappointed in you. 
+ id: totrans-369 prefs: [] type: TYPE_NORMAL - en: ^([2](part0003.html#idm45475775799448-marker)) [Robert Geirhos et al.](https://arxiv.org/pdf/1811.12231.pdf) + id: totrans-370 prefs: [] type: TYPE_NORMAL
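As a closing practical note: once the environment described in the FAQ is set up, a short sanity check confirms everything is wired correctly. This snippet assumes the `tensorflow` (version 2 or greater) and Pillow packages from the setup list; `tf.config.list_physical_devices` is available on recent TensorFlow 2 releases:

```python
# Verify the book's baseline environment: TensorFlow imports, Pillow works,
# and we learn whether a GPU is visible (an empty list means CPU-only,
# which is fine for getting started).
import tensorflow as tf
from PIL import Image

print("TensorFlow version:", tf.__version__)
print("Pillow OK:", Image.new("RGB", (8, 8)).size)
print("GPUs visible to TensorFlow:", tf.config.list_physical_devices("GPU"))
```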