12 - 5 - Kernels II (16 min).srt
1
00:00:00,530 --> 00:00:01,550
In the last video, we started
在上一节视频里 我们讨论了
(字幕整理:中国海洋大学 黄海广,[email protected] )
2
00:00:01,950 --> 00:00:03,230
to talk about the kernels idea
核函数这个想法
3
00:00:03,710 --> 00:00:04,590
and how it can be used to
以及怎样利用它去
4
00:00:04,860 --> 00:00:07,900
define new features for the support vector machine.
实现支持向量机的一些新特性
5
00:00:08,100 --> 00:00:08,910
In this video, I'd like to throw
在这一节视频中 我将
6
00:00:09,230 --> 00:00:10,670
in some of the missing details and,
补充一些缺失的细节
7
00:00:11,020 --> 00:00:12,070
also, say a few words about
并简单的介绍一下
8
00:00:12,270 --> 00:00:14,100
how to use these ideas in practice.
怎么在实际中应用这些想法
9
00:00:14,650 --> 00:00:15,850
Such as, how they pertain
例如 怎么处理
10
00:00:16,340 --> 00:00:20,120
to, for example, the bias variance trade-off in support vector machines.
支持向量机中的偏差方差折中
11
00:00:22,690 --> 00:00:23,680
In the last video, I talked
在上一节课中
12
00:00:24,000 --> 00:00:25,970
about the process of picking a few landmarks.
我谈到过选择标记点
13
00:00:26,660 --> 00:00:28,890
You know, l1, l2, l3 and that
例如 l1 l2 l3
14
00:00:29,150 --> 00:00:30,220
allowed us to define the
这些点使我们能够定义
15
00:00:30,300 --> 00:00:31,900
similarity function also called
相似度函数
16
00:00:32,200 --> 00:00:33,500
the kernel or in this
也称之为核函数
17
00:00:33,690 --> 00:00:34,830
example if you have
在这个例子里
18
00:00:35,070 --> 00:00:37,410
this similarity function this is a Gaussian kernel.
我们的相似度函数为高斯核函数
19
00:00:38,610 --> 00:00:40,370
And that allowed us to build
这使我们能够
20
00:00:40,660 --> 00:00:42,070
this form of a hypothesis function.
构造一个模型
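(A minimal sketch of the Gaussian kernel similarity being referred to here, assuming NumPy arrays for the example x and a landmark l; the name gaussian_kernel and the parameter sigma are illustrative, not from any lecture code.)

    import numpy as np

    def gaussian_kernel(x, l, sigma=1.0):
        # similarity(x, l) = exp(-||x - l||^2 / (2 * sigma^2)):
        # equals 1 when x sits exactly on the landmark, and falls toward 0 far away.
        return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma ** 2))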
21
00:00:43,180 --> 00:00:44,880
But where do we get these landmarks from?
但是 我们从哪里得到这些标记点?
22
00:00:45,150 --> 00:00:45,670
Where do we get l1, l2, l3 from?
我们从哪里得到l1 l2 l3?
23
00:00:45,690 --> 00:00:49,080
And it seems, also, that for complex learning
而且 在一些复杂的学习问题中
24
00:00:49,610 --> 00:00:50,830
problems, maybe we want a
也许我们需要
25
00:00:50,920 --> 00:00:53,060
lot more landmarks than just three of them that we might choose by hand.
更多的标记点 而不是我们手选的这三个
26
00:00:55,160 --> 00:00:56,450
So in practice this is
因此 在实际应用时
27
00:00:56,580 --> 00:00:57,730
how the landmarks are chosen
怎么选取标记点
28
00:00:57,830 --> 00:00:59,910
which is that given the
是机器学习中必须解决的问题
29
00:01:00,150 --> 00:01:01,110
machine learning problem. We have some
这是我们的数据集
30
00:01:01,370 --> 00:01:02,230
data set of some positive
有一些正样本和一些负样本
31
00:01:02,710 --> 00:01:04,460
and negative examples. So, this is the idea here
我们的想法是
32
00:01:05,310 --> 00:01:06,270
which is that we're gonna take the
我们将选取样本点
33
00:01:06,630 --> 00:01:08,200
examples and for every
我们拥有的
34
00:01:08,470 --> 00:01:09,780
training example that we have,
每一个样本点
35
00:01:10,490 --> 00:01:11,430
we are just going to call
我们只需要直接使用它们
36
00:01:11,980 --> 00:01:13,270
it. We're just going
我们直接
37
00:01:13,440 --> 00:01:14,850
to put landmarks as exactly
将训练样本
38
00:01:15,490 --> 00:01:17,600
the same locations as the training examples.
作为标记点
39
00:01:18,930 --> 00:01:20,360
So if I have one training
如果我有一个
40
00:01:20,680 --> 00:01:21,880
example if that is x1,
训练样本x1
41
00:01:22,120 --> 00:01:23,460
well then I'm going
那么
42
00:01:23,670 --> 00:01:24,550
to choose my first landmark
我将在这个样本点
43
00:01:25,100 --> 00:01:26,470
to be at exactly the same location
精确一致的位置上
44
00:01:27,250 --> 00:01:28,170
as my first training example.
选作我的第一个标记点
45
00:01:29,260 --> 00:01:30,180
And if I have a different training
如果我有另一个
46
00:01:30,470 --> 00:01:32,340
example x2. Well we're
训练样本x2
47
00:01:32,500 --> 00:01:33,980
going to set the second landmark
那么 我将把第二个标记点选在
48
00:01:35,060 --> 00:01:37,300
to be the location of my second training example.
与第二个样本点一致的位置上
49
00:01:38,480 --> 00:01:39,320
On the figure on the right, I
在右边的这幅图上
50
00:01:39,480 --> 00:01:40,480
used red and blue dots
我用红点和蓝点
51
00:01:40,820 --> 00:01:41,930
just as illustration, the color
来阐述
52
00:01:42,420 --> 00:01:44,320
of this figure, the color of
这幅图以及这些点的颜色
53
00:01:44,370 --> 00:01:46,030
the dots on the figure on the right is not significant.
并没有什么特殊含义
54
00:01:47,120 --> 00:01:47,930
But what I'm going to end up
但是利用
55
00:01:48,110 --> 00:01:49,660
with using this method is I'm
这个方法
56
00:01:49,790 --> 00:01:51,450
going to end up with m
最终能得到
57
00:01:52,160 --> 00:01:53,690
landmarks of l1, l2
m个标记点 l1 l2
58
00:01:54,950 --> 00:01:56,320
down to l(m) if I
直到lm
59
00:01:56,380 --> 00:01:58,180
have m training examples with
即每一个标记点
60
00:01:58,420 --> 00:02:00,500
one landmark per location of
的位置都与
61
00:02:00,810 --> 00:02:02,680
each
每一个样本点
62
00:02:02,860 --> 00:02:04,810
of my training examples. And this is
的位置精确对应
63
00:02:04,950 --> 00:02:05,920
nice because it is saying that
这个过程很棒
64
00:02:06,120 --> 00:02:07,630
my features are basically going
这说明特征函数基本上
65
00:02:07,700 --> 00:02:09,300
to measure how close an
是在描述
66
00:02:09,380 --> 00:02:10,800
example is to one
每一个样本距离
67
00:02:10,970 --> 00:02:13,150
of the things I saw in my training set.
样本集中其他样本的距离
68
00:02:13,440 --> 00:02:14,180
So, just to write this outline a
我们具体的列出
69
00:02:14,350 --> 00:02:16,270
little more concretely, given m
这个过程的大纲
70
00:02:16,470 --> 00:02:17,870
training examples, I'm going
给定m个训练样本
71
00:02:18,050 --> 00:02:19,100
to choose the location
我将选取与
72
00:02:19,310 --> 00:02:20,430
of my landmarks to be exactly
m个训练样本精确一致
73
00:02:21,190 --> 00:02:23,920
near the locations of my m training examples.
的位置作为我的标记点
74
00:02:25,430 --> 00:02:26,600
When you are given example x,
当输入样本x
75
00:02:26,920 --> 00:02:28,090
and in this example x can be
样本x可以
76
00:02:28,230 --> 00:02:29,260
something in the training set,
属于训练集
77
00:02:29,570 --> 00:02:30,800
it can be something in the cross validation
也可以属于交叉验证集
78
00:02:31,490 --> 00:02:32,470
set, or it can be something in the test set.
也可以属于测试集
79
00:02:33,320 --> 00:02:34,090
Given an example x we are
给定样本x
80
00:02:34,320 --> 00:02:35,470
going to compute, you know,
我们可以计算
81
00:02:35,750 --> 00:02:37,220
these features as so f1,
这些特征 即f1
82
00:02:37,560 --> 00:02:39,180
f2, and so on.
f2 以此类推
83
00:02:39,580 --> 00:02:41,120
Where l1 is actually equal
这里l1等于x1
84
00:02:41,490 --> 00:02:42,850
to x1 and so on.
剩下标记点的以此类推
85
00:02:43,570 --> 00:02:46,080
And these then give me a feature vector.
最终我们能得到一个特征向量
86
00:02:46,840 --> 00:02:49,540
So let me write f as the feature vector.
我将特征向量记为f
87
00:02:50,270 --> 00:02:52,090
I'm going to take these f1, f2 and
我将f1 f2等等
88
00:02:52,290 --> 00:02:53,370
so on, and just group
构造为
89
00:02:53,580 --> 00:02:55,330
them into feature vector.
特征向量
90
00:02:56,330 --> 00:02:58,000
Take those down to fm.
一直写到fm
91
00:02:59,320 --> 00:03:01,080
And, you know, just by convention.
此外 按照惯例
92
00:03:01,610 --> 00:03:02,870
If we want, we can add an
如果我们需要的话
93
00:03:02,990 --> 00:03:06,250
extra feature f0, which is always equal to 1.
可以添加额外的特征f0 f0的值始终为1
94
00:03:06,450 --> 00:03:08,530
So this plays a role similar to what we had previously.
它与我们之前讨论过的
95
00:03:09,480 --> 00:03:11,200
For x0, which was our intercept term.
截距x0的作用相似
96
00:03:13,200 --> 00:03:14,450
So, for example, if we
举个例子
97
00:03:14,580 --> 00:03:16,550
have a training example x(i), y(i),
假设我们有训练样本(x(i), y(i))
98
00:03:18,270 --> 00:03:19,300
the features we would compute for
这个样本对应的
99
00:03:20,080 --> 00:03:21,330
this training example will be
特征向量可以
100
00:03:21,440 --> 00:03:23,440
as follows: given x(i), we
这样计算 给定xi
101
00:03:23,640 --> 00:03:26,560
will then map it to, you know, f1(i).
我们可以通过相似度函数
102
00:03:27,980 --> 00:03:29,670
Which is the similarity. I'm going to
将其映射到f1(i)
103
00:03:29,960 --> 00:03:31,980
abbreviate as SIM instead of writing out the whole
在这里 我将整个单词similarity
104
00:03:32,090 --> 00:03:33,380
word
简记为
105
00:03:35,540 --> 00:03:35,540
similarity, right?
SIM
106
00:03:37,050 --> 00:03:39,180
And f2(i) equals the similarity
f2(i)等于x(i)与l2
107
00:03:40,090 --> 00:03:42,780
between x(i) and l2,
之间的相似度
108
00:03:43,140 --> 00:03:45,050
and so on,
以此类推
109
00:03:45,230 --> 00:03:48,370
down to fm(i) equals
最后有fm(i)
110
00:03:49,600 --> 00:03:54,480
the similarity between x(i) and l(m).
等于x(i)与lm之间的相似度
111
00:03:55,700 --> 00:03:58,700
And somewhere in the middle.
在这一列中间的
112
00:03:59,160 --> 00:04:01,320
Somewhere in this list, you know, at
某个位置
113
00:04:01,480 --> 00:04:03,930
the i-th component, I will
即第i个元素
114
00:04:04,230 --> 00:04:05,740
actually have one feature
有一个特征
115
00:04:06,150 --> 00:04:07,590
component which is f subscript
为fi(i)
116
00:04:08,170 --> 00:04:09,930
i(i), which is
为fi(i)
117
00:04:10,050 --> 00:04:11,180
going to be the similarity
这是xi和li之间的
118
00:04:13,080 --> 00:04:14,550
between x and l(i).
相似度
119
00:04:15,680 --> 00:04:16,990
Where l(i) is equal to
这里l(i)就等于
120
00:04:17,190 --> 00:04:18,560
x(i), and so you know
x(i) 所以
121
00:04:19,140 --> 00:04:20,320
fi(i) is just going to
fi(i)衡量的是
122
00:04:20,410 --> 00:04:22,250
be the similarity between x and itself.
x(i)与其自身的相似度
123
00:04:23,960 --> 00:04:25,380
And if you're using the Gaussian kernel this is
如果你使用高斯核函数的话
124
00:04:25,620 --> 00:04:26,720
actually e to the minus 0
这一项等于
125
00:04:27,170 --> 00:04:29,440
over 2 sigma squared and so, this will be equal to 1 and that's okay.
exp(-0/(2*sigma^2)) 等于1
126
00:04:29,790 --> 00:04:31,060
So one of my features for this
所以 对于这个样本来说
127
00:04:31,370 --> 00:04:32,940
training example is going to be equal to 1.
其中的某一个特征等于1
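(Written out, this is just the kernel evaluated at zero distance, since l(i) = x(i):

    f_i^{(i)} = \exp\!\left(-\frac{\lVert x^{(i)} - l^{(i)}\rVert^2}{2\sigma^2}\right) = \exp(0) = 1

so with the Gaussian kernel, the i-th feature of the i-th training example is always 1.)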
128
00:04:34,290 --> 00:04:35,570
And then similar to what I have above.
接下来 类似于我们之前的过程
129
00:04:35,990 --> 00:04:36,940
I can take all of these
我将这m个特征
130
00:04:37,870 --> 00:04:39,910
m features and group them into a feature vector.
合并为一个特征向量
131
00:04:40,340 --> 00:04:41,730
So instead of representing my example,
于是 相比之前用x(i)来描述样本
132
00:04:42,710 --> 00:04:44,200
using, you know, x(i), which is an
x(i)为n维或者n+1维空间
133
00:04:44,430 --> 00:04:46,970
R(n) or R(n) plus 1 dimensional vector.
的向量
134
00:04:48,290 --> 00:04:49,590
Depending on whether you can
取决于你的具体项数
135
00:04:49,990 --> 00:04:51,120
set terms, is either R(n)
可能为n维向量空间
136
00:04:52,070 --> 00:04:52,750
or R(n) plus 1.
也可能为n+1维向量空间
137
00:04:53,440 --> 00:04:55,140
We can now instead represent my
我们现在可以用
138
00:04:55,300 --> 00:04:56,700
training example using this feature
这个特征向量f
139
00:04:56,980 --> 00:04:58,810
vector f. I am
来描述我的训练样本
140
00:04:58,920 --> 00:05:01,240
going to write this f superscript i. Which
我把它记作f(i)
141
00:05:01,400 --> 00:05:03,060
is going to be taking all
将所有这些项
142
00:05:03,300 --> 00:05:06,010
of these things and stacking them into a vector.
合并为一个向量
143
00:05:06,540 --> 00:05:09,180
So, f1(i) down
即从f1(i)
144
00:05:09,430 --> 00:05:12,740
to fm(i) and if you want and
到fm(i) 如果有需要的话
145
00:05:13,030 --> 00:05:15,160
well, usually we'll also add this
我们通常也会加上
146
00:05:15,420 --> 00:05:16,990
f0(i), where
f0(i)这一项
147
00:05:17,130 --> 00:05:19,370
f0(i) is equal to 1.
f0(i)等于1
148
00:05:19,370 --> 00:05:20,970
And so this vector
那么 这个向量
149
00:05:21,300 --> 00:05:23,260
here gives me my
就是
150
00:05:23,430 --> 00:05:25,180
new feature vector with which
我们用于描述训练样本的
151
00:05:25,480 --> 00:05:28,310
to represent my training example.
特征向量
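(A minimal sketch of building this feature vector, reusing the gaussian_kernel sketch from above and assuming the landmarks are simply the m training examples; the name compute_features is illustrative.)

    def compute_features(x, landmarks, sigma=1.0):
        # landmarks l(1), ..., l(m) are the training examples themselves,
        # so f has m + 1 entries: f0 = 1 plus one similarity per landmark.
        f = [1.0]  # f0, the intercept feature
        for l in landmarks:
            f.append(gaussian_kernel(x, l, sigma))
        return np.array(f)

    # e.g. landmarks = X, the matrix whose rows are the m training examples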
152
00:05:29,040 --> 00:05:30,980
So given these kernels
当给定核函数
153
00:05:31,530 --> 00:05:33,160
and similarity functions, here's how
和相似度函数后
154
00:05:33,400 --> 00:05:35,030
we use a support vector machine.
我们按照这个方法来使用支持向量机
155
00:05:35,720 --> 00:05:37,100
If you already have a learned
如果你已经得到参数theta
156
00:05:37,300 --> 00:05:39,040
set of parameters theta, then if you're given a value of x and you want to make a prediction.
并且想对样本x做出预测
157
00:05:41,680 --> 00:05:42,850
What we do is we compute the
我们先要计算
158
00:05:43,060 --> 00:05:44,170
features f, which is now
特征向量f
159
00:05:44,450 --> 00:05:46,920
an R(m) plus 1 dimensional feature vector.
f是m+1维特征向量
160
00:05:49,040 --> 00:05:50,640
And we have m here because we have
这里之所以有m
161
00:05:51,610 --> 00:05:53,190
m training examples and thus
是因为我们有m个训练样本
162
00:05:53,570 --> 00:05:56,370
m landmarks and what
于是就有m个标记点
163
00:05:57,330 --> 00:05:58,310
we do is we predict
我们在theta的转置乘以f
164
00:05:58,600 --> 00:06:00,180
1 if theta transpose f
大于或等于0时
165
00:06:00,780 --> 00:06:01,860
is greater than or equal to 0.
预测y=1
166
00:06:02,230 --> 00:06:02,430
Right.
具体一点
167
00:06:02,640 --> 00:06:03,770
So, if theta transpose f, of course,
theta的转置乘以f
168
00:06:04,090 --> 00:06:07,200
that's just equal to theta 0, f0 plus theta 1,
等于theta_0*f_0加上theta_1*f_1
169
00:06:07,900 --> 00:06:08,990
f1 plus dot dot
加上点点点
170
00:06:09,120 --> 00:06:11,200
dot, plus theta m
直到theta_m*f_m
171
00:06:12,170 --> 00:06:13,900
f(m). And so my
所以
172
00:06:14,050 --> 00:06:15,720
parameter vector theta is also now
参数向量theta
173
00:06:16,170 --> 00:06:17,730
going to be an m
在这里为
174
00:06:17,990 --> 00:06:21,260
plus 1 dimensional vector.
m+1维向量
175
00:06:21,780 --> 00:06:23,100
And we have m here because
这里有m是因为
176
00:06:23,260 --> 00:06:25,030
the number of landmarks is equal
标记点的个数等于
177
00:06:25,450 --> 00:06:26,600
to the training set size.
训练点的个数
178
00:06:26,910 --> 00:06:28,190
So m was the training set size and now, the
m就是训练集的大小
179
00:06:29,100 --> 00:06:31,950
parameter vector theta is going to be m plus one dimensional.
所以 参数向量theta为m+1维
180
00:06:32,990 --> 00:06:33,990
So that's how you make a prediction
以上就是当已知参数theta时
181
00:06:34,360 --> 00:06:36,870
if you already have a setting for the parameters theta.
怎么做出预测的过程
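(A minimal sketch of that prediction rule, building on the compute_features sketch above; theta is assumed to be an (m + 1)-dimensional NumPy vector, and the name predict is illustrative.)

    def predict(theta, x, landmarks, sigma=1.0):
        # Predict y = 1 exactly when theta' * f >= 0, else predict y = 0.
        f = compute_features(x, landmarks, sigma)
        return 1 if theta @ f >= 0 else 0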
182
00:06:37,840 --> 00:06:39,160
How do you get the parameters theta?
但是怎么得到参数theta?
183
00:06:39,680 --> 00:06:40,650
Well you do that using the
你在使用
184
00:06:40,920 --> 00:06:43,040
SVM learning algorithm, and specifically
SVM学习算法时
185
00:06:43,850 --> 00:06:46,460
what you do is you would solve this minimization problem.
具体来说就是要求解这个最小化问题
186
00:06:46,690 --> 00:06:48,170
You minimize over the parameters
你需要求出能使这个式子取最小值的参数theta
187
00:06:48,540 --> 00:06:51,630
theta, C times this cost function which we had before.
式子为C乘以这个我们之前见过的代价函数
188
00:06:52,430 --> 00:06:54,770
Only now, instead of looking
只是在这里
189
00:06:55,040 --> 00:06:56,650
there instead of making
相比之前使用
190
00:06:56,970 --> 00:06:59,300
predictions using theta transpose
theta的转置乘以x^(i) 即我们的原始特征
191
00:07:00,020 --> 00:07:01,410
x(i) using our original
做出预测
192
00:07:01,720 --> 00:07:03,320
features, x(i). Instead we've
我们将替换
193
00:07:03,520 --> 00:07:04,840
taken the features x(i)
特征向量x^(i)
194
00:07:05,090 --> 00:07:06,260
and replaced them with the new features
并使用这个新的特征向量
195
00:07:07,270 --> 00:07:09,080
so we are using theta transpose
我们使用theta的转置
196
00:07:09,380 --> 00:07:10,840
f(i) to make a
乘以f^(i)来对第i个训练样本
197
00:07:11,130 --> 00:07:12,480
prediction on the i-th training
做出预测
198
00:07:12,860 --> 00:07:13,860
examples and we see that, you know,
我们可以看到
199
00:07:14,230 --> 00:07:16,580
in both places here and
这两个地方(都要做出替换)
200
00:07:16,700 --> 00:07:18,270
it's by solving this minimization problem
通过解决这个最小化问题
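(For reference, the minimization being described, with the kernel features f^{(i)} substituted for the original x^{(i)}, is roughly the objective from the earlier SVM videos:

    \min_{\theta} \; C \sum_{i=1}^{m} \left[ y^{(i)} \, \text{cost}_1\!\big(\theta^T f^{(i)}\big) + \big(1 - y^{(i)}\big) \, \text{cost}_0\!\big(\theta^T f^{(i)}\big) \right] + \frac{1}{2} \sum_{j=1}^{m} \theta_j^2

where cost_1 and cost_0 are the SVM cost functions introduced earlier in the course.)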