2 - 7 - Gradient Descent For Linear Regression (10 min).srt
1
00:00:00,454 --> 00:00:02,208
在以前的视频中我们谈到
In previous videos, we talked
2
00:00:02,238 --> 00:00:04,581
关于梯度下降算法
about the gradient descent algorithm
3
00:00:04,581 --> 00:00:06,005
梯度下降是很常用的算法 它不仅被用在线性回归上
and talked about the linear
4
00:00:06,005 --> 00:00:09,071
和线性回归模型、平方误差代价函数
regression model and the squared error cost function.
5
00:00:09,071 --> 00:00:10,822
在这段视频中 我们要
In this video, we're going to
6
00:00:10,822 --> 00:00:12,695
将梯度下降
put together gradient descent with
7
00:00:12,695 --> 00:00:14,672
和代价函数结合
our cost function, and that
8
00:00:14,672 --> 00:00:16,652
在后面的视频中 我们将用到此算法 并将其应用于
will give us an algorithm for
9
00:00:16,652 --> 00:00:19,431
具体的拟合直线的线性回归算法里
linear regression for fitting a straight line to our data.
10
00:00:21,001 --> 00:00:22,743
这就是
So, this is
11
00:00:22,743 --> 00:00:25,077
我们在之前的课程里所做的工作
what we worked out in the previous videos.
12
00:00:25,077 --> 00:00:27,095
这是梯度下降法
That's our gradient descent algorithm, which
13
00:00:27,095 --> 00:00:29,197
这个算法你应该很熟悉
should be familiar, and you
14
00:00:29,197 --> 00:00:31,199
这是线性回归模型
see the linear regression model
15
00:00:31,199 --> 00:00:36,461
还有线性假设和平方误差代价函数
with our linear hypothesis and our squared error cost function.
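(For reference, the pieces being combined here, written out in standard notation; this is a reconstruction from the narration, not a copy of the slide itself.)

\text{repeat until convergence: } \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad (j = 0, 1)
h_\theta(x) = \theta_0 + \theta_1 x
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)^2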
16
00:00:36,461 --> 00:00:38,612
我们将要做的就是
What we're going to do is apply
17
00:00:38,612 --> 00:00:42,288
用梯度下降的方法
gradient descent to minimize
18
00:00:44,519 --> 00:00:47,537
来最小化平方误差代价函数
our squared error cost function.
19
00:00:47,891 --> 00:00:49,381
为了
Now, in order to apply
20
00:00:49,381 --> 00:00:50,896
使梯度下降 为了
gradient descent, in order
21
00:00:50,896 --> 00:00:52,695
写这段代码
to write this piece of
22
00:00:52,695 --> 00:00:54,191
我们需要的关键项
code, the key term
23
00:00:54,191 --> 00:00:58,384
是这里这个微分项
we need is this derivative term over here.
24
00:00:59,692 --> 00:01:00,798
所以.我们需要弄清楚
So, we need to figure out
25
00:01:00,798 --> 00:01:02,830
这个偏导数项是什么
what is this partial derivative term,
26
00:01:02,830 --> 00:01:04,478
并结合这里的
and plug in the
27
00:01:04,478 --> 00:01:06,249
代价函数J
definition of the cost
28
00:01:06,249 --> 00:01:08,418
的定义
function J, this turns
29
00:01:08,418 --> 00:01:11,074
就是这样
out to be this "inaudible"
30
00:01:12,613 --> 00:01:15,156
一个求和项
equals sum 1 through M of
31
00:01:15,156 --> 00:01:18,856
代价函数就是
this squared error
32
00:01:20,456 --> 00:01:22,023
这个误差平方项
cost function term, and all
33
00:01:22,023 --> 00:01:23,794
我这样做 只是
I did here was I just
34
00:01:23,794 --> 00:01:25,538
把定义好的代价函数
you know plugged in the definition of
35
00:01:25,538 --> 00:01:28,275
插入了这个微分式
the cost function there, and simplifying
36
00:01:28,275 --> 00:01:30,563
再简化一下
a little bit more, this turns
37
00:01:30,563 --> 00:01:34,136
这等于是
out to be equal to, this
38
00:01:34,136 --> 00:01:36,983
这一个求和项
"inaudible" equals sum 1 through M
39
00:01:36,983 --> 00:01:40,609
θ0 + θ1x(i) - y(i)
of theta zero plus theta one, XI
40
00:01:41,163 --> 00:01:43,427
θ0 + θ1x(i) - y(i)
minus YI squared.
41
00:01:43,427 --> 00:01:44,777
这一项其实就是
And all I did there was took
42
00:01:44,777 --> 00:01:46,983
我的假设的定义
the definition for my hypothesis
43
00:01:46,983 --> 00:01:49,376
然后把这一项放进去
and plug that in there.
44
00:01:49,622 --> 00:01:51,669
实际上我们需要
And it turns out we need
45
00:01:51,669 --> 00:01:52,523
弄清楚这两个
to figure out what is
46
00:01:52,523 --> 00:01:54,011
偏导数项是什么
the partial derivative of two
47
00:01:54,011 --> 00:01:55,284
这两项分别是 j=0
cases for J equals
48
00:01:55,284 --> 00:01:57,006
和j=1的情况
0 and for J equals 1. I want
49
00:01:57,006 --> 00:01:58,547
因此我们要弄清楚
to figure out what is this
50
00:01:58,547 --> 00:02:00,767
θ0 和 θ1 对应的
partial derivative for both the
51
00:02:00,767 --> 00:02:04,115
偏导数项是什么
theta(0) case and the theta(1) case.
52
00:02:04,115 --> 00:02:07,012
我只把答案写出来
And I'm just going to write out the answers.
53
00:02:07,012 --> 00:02:10,996
事实上 第一项可简化为
It turns out this first term simplifies
54
00:02:10,996 --> 00:02:14,218
1 / m 乘以求和式
to 1/M, sum over
55
00:02:14,218 --> 00:02:16,446
对所有训练样本求和
my training set of just
56
00:02:16,446 --> 00:02:21,146
求和项是 h(x(i))-y(i)
that, H(X(i)) minus Y(i).
57
00:02:21,146 --> 00:02:23,951
而这一项
And for this term, partial derivative
58
00:02:23,951 --> 00:02:26,186
对θ(1)的微分项
with respect to theta(1), it turns
59
00:02:26,186 --> 00:02:34,954
得到的是这样一项
out I get this term: H(X(i)) minus Y(i), times X(i).
60
00:02:35,031 --> 00:02:36,187
对吧
Okay.
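(In standard notation, the two derivative results just described; reconstructed here as the standard partial derivatives of the squared error cost.)

\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)
\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr) \, x^{(i)}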
61
00:02:36,402 --> 00:02:38,690
计算出这些
And computing these partial
62
00:02:38,690 --> 00:02:40,761
偏导数项
derivatives, so going from
63
00:02:40,761 --> 00:02:43,406
从这个等式
this equation to either
64
00:02:43,406 --> 00:02:46,414
到下面的等式
of these equations down there, computing
65
00:02:46,414 --> 00:02:51,090
计算这些偏导数项需要一些多元微积分
those partial derivative terms requires some multivariate calculus.
66
00:02:51,090 --> 00:02:53,118
如果你掌握了微积分
If you know calculus, feel free
67
00:02:53,118 --> 00:02:54,825
你可以随便自己推导这些
to work through the derivations yourself
68
00:02:54,825 --> 00:02:57,060
然后你检查你的微分
and check that when you take the derivatives
69
00:02:57,060 --> 00:02:59,853
你实际上会得到我给出的答案
you actually get the answers that I got.
70
00:02:59,853 --> 00:03:00,855
但如果你
But if you are less
71
00:03:00,855 --> 00:03:02,611
不太熟悉微积分
familiar with calculus, don't
72
00:03:02,611 --> 00:03:04,235
别担心
worry about it, and it
73
00:03:04,235 --> 00:03:06,260
你可以直接用这些
is fine to just take these equations
74
00:03:06,260 --> 00:03:08,025
已经算出来的结果
worked out, and you
75
00:03:08,025 --> 00:03:09,462
你不需要掌握微积分
won't need to know calculus or
76
00:03:09,462 --> 00:03:10,340
或者别的东西
anything like that in order to
77
00:03:10,340 --> 00:03:14,551
来完成作业 你只需要会用梯度下降就可以
do the homework, so to implement gradient descent you'd get that to work.
78
00:03:14,551 --> 00:03:16,520
在定义这些以后
But so, after these definitions,
79
00:03:16,520 --> 00:03:18,156
在我们算出
or after what we've worked
80
00:03:18,156 --> 00:03:19,993
这些微分项以后
out to be the derivatives, which
81
00:03:19,993 --> 00:03:21,316
这些微分项
is really just the slope of
82
00:03:21,316 --> 00:03:22,889
实际上就是代价函数J的斜率
the cost function J, we
83
00:03:22,889 --> 00:03:24,724
现在可以将它们放回
can now plug them back into
84
00:03:24,724 --> 00:03:27,157
我们的梯度下降算法
our gradient descent algorithm.
85
00:03:27,157 --> 00:03:28,794
所以这就是专用于
So here's gradient descent for
86
00:03:28,794 --> 00:03:30,309
线性回归的梯度下降
linear regression, which is going
87
00:03:30,309 --> 00:03:32,971
反复执行括号中的式子直到收敛
to repeat until convergence, theta 0
88
00:03:32,971 --> 00:03:35,552
θ0和θ1不断被更新
and theta one get updated as,
89
00:03:35,552 --> 00:03:37,163
都是加上一个-α/m
you know, the same minus alpha
90
00:03:37,163 --> 00:03:39,591
乘上后面的求和项
times the derivative term.
91
00:03:39,591 --> 00:03:43,202
所以这里这一项
So, this term here.
92
00:03:43,202 --> 00:03:46,854
所以这就是我们的线性回归算法
So, here's our linear regression algorithm.
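(Written out, the full algorithm being referred to here, reconstructed in standard notation.)

\text{repeat until convergence: } \{
\quad \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)
\quad \theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr) \, x^{(i)}
\} \quad \text{(updating } \theta_0 \text{ and } \theta_1 \text{ simultaneously)}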
93
00:03:46,854 --> 00:03:52,696
这里的第一项
This first term here that
94
00:03:52,696 --> 00:03:54,495
当然
term is, of course, just
95
00:03:54,495 --> 00:03:56,071
这一项就是关于θ0的偏导数
a partial derivative with respect to
96
00:03:56,071 --> 00:03:59,995
在上一张幻灯片中推出的
theta zero, that we worked on in the previous slide.
97
00:03:59,995 --> 00:04:03,454
而第二项
And this second term here,
98
00:04:03,454 --> 00:04:04,199
这一项是刚刚的推导出的
that term is just
99
00:04:04,199 --> 00:04:05,947
关于θ1的
a partial derivative with respect to
100
00:04:05,947 --> 00:04:11,188
偏导数项
theta one that we worked out on the previous line.
101
00:04:11,188 --> 00:04:13,582
提醒一下
And just as a quick reminder,
102
00:04:13,582 --> 00:04:15,485
执行梯度下降时
you must, when implementing gradient descent,
103
00:04:15,485 --> 00:04:17,067
有一个细节要注意
there's actually a detail that, you
104
00:04:17,067 --> 00:04:19,248
就是必须要
know, you should be implementing
105
00:04:19,248 --> 00:04:24,452
同时更新θ0和θ1
it so that you update theta zero and theta one simultaneously.
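(As a concrete illustration of this simultaneous update, here is a minimal Python sketch of the batch gradient descent loop; the function name and default values are illustrative choices, not from the course materials.)

# Minimal sketch of batch gradient descent for h(x) = theta0 + theta1 * x.
# The name gradient_descent and the defaults (alpha, num_iters) are illustrative.
def gradient_descent(x, y, theta0=0.0, theta1=0.0, alpha=0.01, num_iters=1500):
    m = len(x)  # number of training examples
    for _ in range(num_iters):
        # prediction errors under the current hypothesis
        errors = [theta0 + theta1 * x[i] - y[i] for i in range(m)]
        # partial derivatives of the squared error cost J(theta0, theta1)
        grad0 = sum(errors) / m
        grad1 = sum(errors[i] * x[i] for i in range(m)) / m
        # simultaneous update: compute both new values before assigning either
        temp0 = theta0 - alpha * grad0
        temp1 = theta1 - alpha * grad1
        theta0, theta1 = temp0, temp1
    return theta0, theta1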
106
00:04:24,452 --> 00:04:28,270
所以 让我们来看看梯度下降是如何工作的
So, let's see how gradient descent works.
107
00:04:28,270 --> 00:04:29,447
我们之前看到 梯度下降的一个问题是
One of the issues we saw with
108
00:04:29,447 --> 00:04:32,839
它可能会陷入局部最优
gradient descent is that it can be susceptible to local optima.
109
00:04:32,839 --> 00:04:34,433
当我第一次解释梯度下降时
So, when I first explained gradient
110
00:04:34,449 --> 00:04:36,136
我展示过这幅图
descent, I showed you this picture
111
00:04:36,136 --> 00:04:37,724
在表面上
of it, you know, going downhill
112
00:04:37,724 --> 00:04:38,788
不断下降
on the surface and we
113
00:04:38,788 --> 00:04:40,120
并且我们知道了
saw how, depending on where
114
00:04:40,120 --> 00:04:42,872
根据你的初始化 你会得到不同的局部最优解
you're initializing, you can end up with different local optima.
115
00:04:42,872 --> 00:04:44,846
你知道 你最终可能会停在这里 或者这里
You know, you can end up here or here.
116
00:04:44,846 --> 00:04:46,958
但是 事实证明
But, it turns out that
117
00:04:46,958 --> 00:04:49,025
用于线性回归的
the cost function for
118
00:04:49,025 --> 00:04:50,791
代价函数
linear regression
119
00:04:50,791 --> 00:04:52,754
总是这样一个
is always going to be
120
00:04:52,754 --> 00:04:55,305
弓形的样子
a bow-shaped function like this.
121
00:04:55,305 --> 00:04:57,573
这个函数的专业术语是
The technical term for this
122
00:04:57,573 --> 00:05:01,163
这是一个凸函数
is that this is called a convex function.
123
00:05:02,825 --> 00:05:04,920
我不打算在这门课中
And I'm not going
124
00:05:04,920 --> 00:05:06,561
给出凸函数的定义
to give the formal definition for what
125
00:05:06,561 --> 00:05:09,557
凸函数(convex function)
is a convex function, c-o-n-v-e-x, but
126
00:05:09,557 --> 00:05:11,310
但不正式的说法是
informally a convex function
127
00:05:11,310 --> 00:05:14,793
它就是一个弓形的函数
means a bow-shaped function, you know, kind of like a bow shape.
128
00:05:14,793 --> 00:05:18,006
因此 这个函数
And so, this function doesn't
129
00:05:18,006 --> 00:05:19,738
没有任何局部最优解
have any local optima, except
130
00:05:19,738 --> 00:05:22,433
只有一个全局最优解
for the one global optimum.
131
00:05:22,433 --> 00:05:24,265
并且无论什么时候
And so gradient descent on
132
00:05:24,265 --> 00:05:25,590
你对这种代价函数
this type of cost function which
133
00:05:25,590 --> 00:05:27,695
使用线性回归
you get whenever you're using linear
134
00:05:27,695 --> 00:05:29,201
梯度下降法得到的结果
regression, it will always converge
135
00:05:29,201 --> 00:05:33,623
总是收敛到全局最优值 因为没有全局最优以外的其他局部最优点
to the global optimum, because there are no other local optima other than the global optimum.
136
00:05:33,623 --> 00:05:37,272
现在 让我们来看看这个算法的执行过程
So now, let's see this algorithm in action.
137
00:05:38,026 --> 00:05:40,085
像往常一样
As usual, here are plots of
138
00:05:40,085 --> 00:05:42,182
这是假设函数的图
the hypothesis function and of
139
00:05:42,182 --> 00:05:45,024
还有代价函数J的图
my cost function J.
140
00:05:45,763 --> 00:05:47,011
让我们来看看如何
And so, let's see how
141
00:05:47,011 --> 00:05:49,687
初始化参数的值
to initialize my parameters at this value.
142
00:05:49,687 --> 00:05:51,652
通常来说
You know, let's say, usually you
143
00:05:51,652 --> 00:05:53,590
初始化参数为零
initialize your parameters at zero
144
00:05:53,590 --> 00:05:56,287
θ0和θ1都在零
theta zero and theta one both at zero.
145
00:05:56,287 --> 00:05:58,331
但为了展示需要
For illustration in this
146
00:05:58,331 --> 00:05:59,948
在这个梯度下降的实现中
specific presentation, I have
147
00:05:59,948 --> 00:06:02,615
我把θ0初始化为-900
initialized theta zero at
148
00:06:02,615 --> 00:06:06,831
θ1初始化为-0.1
about 900, and theta one at about minus 0.1, okay?
149
00:06:06,831 --> 00:06:09,791
这对应的假设
And so, this corresponds to H
150
00:06:09,791 --> 00:06:12,022
就应该是这样
of X equals, you know,
151
00:06:12,022 --> 00:06:15,859
h(x)是等于-900减0.1x
minus 900 minus 0.1 x
152
00:06:15,859 --> 00:06:19,341
这对应我们的代价函数
is this line, so out here on the cost function.
153
00:06:19,341 --> 00:06:20,491
现在 如果我们进行
Now if we take one
154
00:06:20,491 --> 00:06:22,163
一次梯度下降
step of gradient descent, we end
155
00:06:22,163 --> 00:06:24,298
从这个点开始
up going from this point
156
00:06:24,298 --> 00:06:27,065
在这里.一点点
out here, a little
157
00:06:27,065 --> 00:06:29,564
向左下方移动了一小步
bit down and to the left
158
00:06:29,564 --> 00:06:31,732
这就得到了第二个点
to that second point over there.
159
00:06:31,732 --> 00:06:35,279
而且你注意到这条线改变了一点点
And, you notice that my line changed a little bit.
160
00:06:35,279 --> 00:06:36,547
然后我再进行
And, as I take another step
161
00:06:36,547 --> 00:06:41,425
一步梯度下降 左边这条线又变一点
of gradient descent, my line on the left will change.
162
00:06:41,425 --> 00:06:42,376
对吧
Right.
163
00:06:42,376 --> 00:06:43,804
同样地
And I have also
164
00:06:43,804 --> 00:06:47,544
我又移到代价函数上的另一个点
moved to a new point on my cost function.
165
00:06:47,544 --> 00:06:48,745
再进行一步梯度下降
And as I take a further step
166
00:06:48,745 --> 00:06:50,697
我觉得我的代价项
of gradient descent, I'm going
167
00:06:50,697 --> 00:06:53,058
应该开始下降了
down in cost, right, so
168
00:06:53,058 --> 00:06:55,079
所以我的参数是
my parameters are following
169
00:06:55,079 --> 00:06:58,062
跟随着这个轨迹
this trajectory, and if
170
00:06:58,062 --> 00:07:00,368
再看左边这个图
you look on the left, this corresponds
171
00:07:00,368 --> 00:07:04,025
这个表示的是假设函数h(x)
to hypotheses that seem
172
00:07:04,025 --> 00:07:04,912
它变得好像
to be getting
173
00:07:04,912 --> 00:07:06,429
越来越拟合数据
better and better fits for the
174
00:07:06,429 --> 00:07:09,987
直到它渐渐地
data until eventually,
175
00:07:09,987 --> 00:07:12,663
收敛到全局最小值
I have now wound up at the global minimum.
176
00:07:12,663 --> 00:07:16,189
这个全局最小值
And this global minimum corresponds to
177
00:07:16,189 --> 00:07:20,506
对应的假设函数 给出了最拟合数据的解
this hypothesis, which gives me a good fit to the data.
178
00:07:20,922 --> 00:07:23,605
这就是梯度下降法
And so that's gradient
179
00:07:23,605 --> 00:07:24,969
我们刚刚运行了一遍
descent, and we've just run
180
00:07:24,969 --> 00:07:27,302
并且最终得到了
it and gotten a good
181
00:07:27,302 --> 00:07:31,359
房价数据的最好拟合结果
fit to my data set of housing prices.
182
00:07:31,359 --> 00:07:34,108
现在你可以用它来预测
And you can now use it to predict.
183
00:07:34,108 --> 00:07:35,284
比如说 假如你有个朋友
You know, if your friend has a
184
00:07:35,284 --> 00:07:36,452
他有一套房子
house with a
185
00:07:36,452 --> 00:07:39,116
面积1250平方英尺(约116平米)
size 1250 square feet, you
186
00:07:39,116 --> 00:07:40,684
现在你可以通过这个数据
can now read off the value
187
00:07:40,684 --> 00:07:42,090
然后告诉他们
and tell them that, I don't
188
00:07:42,090 --> 00:07:43,188
也许他的房子
know, maybe they can get
189
00:07:43,188 --> 00:07:47,159
可以卖到35万美元
$350,000 for their house.
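(A hypothetical use of the sketch above for this kind of prediction; the theta values below are placeholders chosen to land near the lecture's ballpark figure, not the parameters actually fitted to the housing data.)

# Placeholder parameters for illustration only (prices in $1000s, size in square feet);
# in practice they would come from gradient_descent(...) run on the training set.
theta0, theta1 = 100.0, 0.2
size = 1250
predicted_price = theta0 + theta1 * size  # h(x) = theta0 + theta1 * x
print(predicted_price)  # 350.0 -> roughly $350,000 under these placeholder values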
190
00:07:48,606 --> 00:07:49,982
最后 我想再给出
Finally, just to give
191
00:07:49,982 --> 00:07:51,930
另一个名字
this another name, it turns out
192
00:07:51,930 --> 00:07:52,991
实际上 我们
that the algorithm that we
193
00:07:52,991 --> 00:07:55,030
刚刚使用的算法
just went over is sometimes
194
00:07:55,030 --> 00:07:57,074
有时也称为批量梯度下降
called batch gradient descent.
195
00:07:57,074 --> 00:07:58,781
实际上 在机器学习中
And it turns out in machine
196
00:07:58,781 --> 00:08:00,203
我们这些搞机器学习的人
learning, I feel like us machine
197
00:08:00,203 --> 00:08:02,050
通常不太会
learning people, we're not always
198
00:08:02,050 --> 00:08:04,381
给算法起名字
great at giving names to algorithms.
199
00:08:04,381 --> 00:08:06,634
但这个名字"批量梯度下降"
But the term batch gradient descent
200
00:08:06,634 --> 00:08:08,212
指的是
refers to the