2 - 7 - Gradient Descent For Linear Regression (10 min).srt
1
00:00:00,454 --> 00:00:02,208
在以前的视频中我们谈到
In previous videos, we talked
2
00:00:02,238 --> 00:00:04,581
关于梯度下降算法
about the gradient descent algorithm
3
00:00:04,581 --> 00:00:06,005
梯度下降是很常用的算法 它不仅被用在线性回归上
and talked about the linear
4
00:00:06,005 --> 00:00:09,071
和线性回归模型、平方误差代价函数
regression model and the squared error cost function.
5
00:00:09,071 --> 00:00:10,822
在这段视频中 我们要
In this video, we're going to
6
00:00:10,822 --> 00:00:12,695
将梯度下降
put together gradient descent with
7
00:00:12,695 --> 00:00:14,672
和代价函数结合
our cost function, and that
8
00:00:14,672 --> 00:00:16,652
在后面的视频中 我们将用到此算法 并将其应用于
will give us an algorithm for
9
00:00:16,652 --> 00:00:19,431
具体的拟合直线的线性回归算法里
linear regression for fitting a straight line to our data.
10
00:00:21,001 --> 00:00:22,743
这就是
So, this is
11
00:00:22,743 --> 00:00:25,077
我们在之前的课程里所做的工作
what we worked out in the previous videos.
12
00:00:25,077 --> 00:00:27,095
这是梯度下降法
That's our gradient descent algorithm, which
13
00:00:27,095 --> 00:00:29,197
这个算法你应该很熟悉
should be familiar, and you
14
00:00:29,197 --> 00:00:31,199
这是线性回归模型
see the linear regression model
15
00:00:31,199 --> 00:00:36,461
还有线性假设和平方误差代价函数
with our linear hypothesis and our squared error cost function.
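(For reference, the pieces being combined here, written out in standard notation; this is a reconstruction from the narration, not a copy of the slide itself.)

\text{repeat until convergence: } \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad (j = 0, 1)
h_\theta(x) = \theta_0 + \theta_1 x
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)^2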
16
00:00:36,461 --> 00:00:38,612
我们将要做的就是
What we're going to do is apply
17
00:00:38,612 --> 00:00:42,288
用梯度下降的方法
gradient descent to minimize
18
00:00:44,519 --> 00:00:47,537
来最小化平方误差代价函数
our squared error cost function.
19
00:00:47,891 --> 00:00:49,381
为了
Now, in order to apply
20
00:00:49,381 --> 00:00:50,896
使梯度下降 为了
gradient descent, in order
21
00:00:50,896 --> 00:00:52,695
写这段代码
to write this piece of
22
00:00:52,695 --> 00:00:54,191
我们需要的关键项
code, the key term
23
00:00:54,191 --> 00:00:58,384
是这里这个微分项
we need is this derivative term over here.
24
00:00:59,692 --> 00:01:00,798
所以.我们需要弄清楚
So, we need to figure out
25
00:01:00,798 --> 00:01:02,830
这个偏导数项是什么
what is this partial derivative term,
26
00:01:02,830 --> 00:01:04,478
并结合这里的
and plug in the
27
00:01:04,478 --> 00:01:06,249
代价函数J
definition of the cost
28
00:01:06,249 --> 00:01:08,418
的定义
function J, this turns
29
00:01:08,418 --> 00:01:11,074
就是这样
out to be this "inaudible"
30
00:01:12,613 --> 00:01:15,156
一个求和项
equals sum 1 through M of
31
00:01:15,156 --> 00:01:18,856
代价函数就是
this squared error
32
00:01:20,456 --> 00:01:22,023
这个误差平方项
cost function term, and all
33
00:01:22,023 --> 00:01:23,794
我这样做 只是
I did here was I just
34
00:01:23,794 --> 00:01:25,538
把定义好的代价函数
you know plugged in the definition of
35
00:01:25,538 --> 00:01:28,275
插入了这个微分式
the cost function there, and simplifying
36
00:01:28,275 --> 00:01:30,563
再简化一下
a little bit more, this turns
37
00:01:30,563 --> 00:01:34,136
这等于是
out to be equal to, this
38
00:01:34,136 --> 00:01:36,983
这一个求和项
"inaudible" equals sum 1 through M
39
00:01:36,983 --> 00:01:40,609
θ0 + θ1x(i) - y(i)
of theta zero plus theta one, XI
40
00:01:41,163 --> 00:01:43,427
θ0 + θ1x(i) - y(i)
minus YI squared.
41
00:01:43,427 --> 00:01:44,777
这一项其实就是
And all I did there was took
42
00:01:44,777 --> 00:01:46,983
我的假设的定义
the definition for my hypothesis
43
00:01:46,983 --> 00:01:49,376
然后把这一项放进去
and plug that in there.
44
00:01:49,622 --> 00:01:51,669
实际上我们需要
And it turns out we need
45
00:01:51,669 --> 00:01:52,523
弄清楚这两个
to figure out what is
46
00:01:52,523 --> 00:01:54,011
偏导数项是什么
the partial derivative of two
47
00:01:54,011 --> 00:01:55,284
这两项分别是 j=0
cases for J equals
48
00:01:55,284 --> 00:01:57,006
和j=1的情况
0 and for J equals 1. I want
49
00:01:57,006 --> 00:01:58,547
因此我们要弄清楚
to figure out what is this
50
00:01:58,547 --> 00:02:00,767
θ0 和 θ1 对应的
partial derivative for both the
51
00:02:00,767 --> 00:02:04,115
偏导数项是什么
theta(0) case and the theta(1) case.
52
00:02:04,115 --> 00:02:07,012
我只把答案写出来
And I'm just going to write out the answers.
53
00:02:07,012 --> 00:02:10,996
事实上 第一项可简化为
It turns out this first term simplifies
54
00:02:10,996 --> 00:02:14,218
1 / m 乘以求和式
to 1/M, sum over
55
00:02:14,218 --> 00:02:16,446
对所有训练样本求和
my training set of just
56
00:02:16,446 --> 00:02:21,146
求和项是 h(x(i))-y(i)
that, H(X(i)) minus Y(i).
57
00:02:21,146 --> 00:02:23,951
而这一项
And for this term, partial derivative
58
00:02:23,951 --> 00:02:26,186
对θ(1)的微分项
with respect to theta(1), it turns
59
00:02:26,186 --> 00:02:34,954
得到的是这样一项
out I get this term: H(X(i)) minus Y(i), times X(i).
60
00:02:35,031 --> 00:02:36,187
对吧
Okay.
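(In standard notation, the two derivative results just described; reconstructed here as the standard partial derivatives of the squared error cost.)

\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)
\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr) \, x^{(i)}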
61
00:02:36,402 --> 00:02:38,690
计算出这些
And computing these partial
62
00:02:38,690 --> 00:02:40,761
偏导数项
derivatives, so going from
63
00:02:40,761 --> 00:02:43,406
从这个等式
this equation to either
64
00:02:43,406 --> 00:02:46,414
到下面的等式
of these equations down there, computing
65
00:02:46,414 --> 00:02:51,090
计算这些偏导数项需要一些多元微积分
those partial derivative terms requires some multivariate calculus.
66
00:02:51,090 --> 00:02:53,118
如果你掌握了微积分
If you know calculus, feel free
67
00:02:53,118 --> 00:02:54,825
你可以随便自己推导这些
to work through the derivations yourself
68
00:02:54,825 --> 00:02:57,060
然后你检查你的微分
and check that when you take the derivatives
69
00:02:57,060 --> 00:02:59,853
你实际上会得到我给出的答案
you actually get the answers that I got.
70
00:02:59,853 --> 00:03:00,855
但如果你
But if you are less
71
00:03:00,855 --> 00:03:02,611
不太熟悉微积分
familiar with calculus, don't
72
00:03:02,611 --> 00:03:04,235
别担心
worry about it, and it
73
00:03:04,235 --> 00:03:06,260
你可以直接用这些
is fine to just take these equations
74
00:03:06,260 --> 00:03:08,025
已经算出来的结果
worked out, and you
75
00:03:08,025 --> 00:03:09,462
你不需要掌握微积分
won't need to know calculus or
76
00:03:09,462 --> 00:03:10,340
或者别的东西
anything like that in order to
77
00:03:10,340 --> 00:03:14,551
来完成作业 你只需要会用梯度下降就可以
do the homework, so to implement gradient descent you'd get that to work.
78
00:03:14,551 --> 00:03:16,520
在定义这些以后
But so, after these definitions,
79
00:03:16,520 --> 00:03:18,156
在我们算出
or after what we've worked
80
00:03:18,156 --> 00:03:19,993
这些微分项以后
out to be the derivatives, which
81
00:03:19,993 --> 00:03:21,316
这些微分项
is really just the slope of
82
00:03:21,316 --> 00:03:22,889
实际上就是代价函数J的斜率
the cost function J, we
83
00:03:22,889 --> 00:03:24,724
现在可以将它们放回
can now plug them back into
84
00:03:24,724 --> 00:03:27,157
我们的梯度下降算法
our gradient descent algorithm.
85
00:03:27,157 --> 00:03:28,794
所以这就是专用于
So here's gradient descent for
86
00:03:28,794 --> 00:03:30,309
线性回归的梯度下降
linear regression, which is going
87
00:03:30,309 --> 00:03:32,971
反复执行括号中的式子直到收敛
to repeat until convergence, theta 0
88
00:03:32,971 --> 00:03:35,552
θ0和θ1不断被更新
and theta one get updated as,
89
00:03:35,552 --> 00:03:37,163
都是加上一个-α/m
you know, the same minus alpha
90
00:03:37,163 --> 00:03:39,591
乘上后面的求和项
times the derivative term.
91
00:03:39,591 --> 00:03:43,202
所以这里这一项
So, this term here.
92
00:03:43,202 --> 00:03:46,854
所以这就是我们的线性回归算法
So, here's our linear regression algorithm.
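(Written out, the full algorithm being referred to here, reconstructed in standard notation.)

\text{repeat until convergence: } \{
\quad \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)
\quad \theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr) \, x^{(i)}
\} \quad \text{(updating } \theta_0 \text{ and } \theta_1 \text{ simultaneously)}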
93
00:03:46,854 --> 00:03:52,696
这里的第一项
This first term here that
94
00:03:52,696 --> 00:03:54,495
当然
term is, of course, just
95
00:03:54,495 --> 00:03:56,071
这一项就是关于θ0的偏导数
a partial derivative with respect to
96
00:03:56,071 --> 00:03:59,995
在上一张幻灯片中推出的
theta zero, that we worked on in the previous slide.
97
00:03:59,995 --> 00:04:03,454
而第二项
And this second term here,
98
00:04:03,454 --> 00:04:04,199
这一项是刚刚的推导出的
that term is just
99
00:04:04,199 --> 00:04:05,947
关于θ1的
a partial derivative with respect to
100
00:04:05,947 --> 00:04:11,188
偏导数项
theta one that we worked out on the previous line.
101
00:04:11,188 --> 00:04:13,582
提醒一下
And just as a quick reminder,
102
00:04:13,582 --> 00:04:15,485
执行梯度下降时
you must, when implementing gradient descent,
103
00:04:15,485 --> 00:04:17,067
有一个细节要注意
there's actually a detail that, you
104
00:04:17,067 --> 00:04:19,248
就是必须要
know, you should be implementing
105
00:04:19,248 --> 00:04:24,452
同时更新θ0和θ1
it so that you update theta zero and theta one simultaneously.
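(As a concrete illustration of this simultaneous update, here is a minimal Python sketch of the batch gradient descent loop; the function name and default values are illustrative choices, not from the course materials.)

# Minimal sketch of batch gradient descent for h(x) = theta0 + theta1 * x.
# The name gradient_descent and the defaults (alpha, num_iters) are illustrative.
def gradient_descent(x, y, theta0=0.0, theta1=0.0, alpha=0.01, num_iters=1500):
    m = len(x)  # number of training examples
    for _ in range(num_iters):
        # prediction errors under the current hypothesis
        errors = [theta0 + theta1 * x[i] - y[i] for i in range(m)]
        # partial derivatives of the squared error cost J(theta0, theta1)
        grad0 = sum(errors) / m
        grad1 = sum(errors[i] * x[i] for i in range(m)) / m
        # simultaneous update: compute both new values before assigning either
        temp0 = theta0 - alpha * grad0
        temp1 = theta1 - alpha * grad1
        theta0, theta1 = temp0, temp1
    return theta0, theta1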
106
00:04:24,452 --> 00:04:28,270
所以 让我们来看看梯度下降是如何工作的
So, let's see how gradient descent works.
107
00:04:28,270 --> 00:04:29,447
我们之前看到 梯度下降的一个问题是
One of the issues we saw with
108
00:04:29,447 --> 00:04:32,839
它可能会陷入局部最优
gradient descent is that it can be susceptible to local optima.
109
00:04:32,839 --> 00:04:34,433
当我第一次解释梯度下降时
So, when I first explained gradient
110
00:04:34,449 --> 00:04:36,136
我展示过这幅图
descent, I showed you this picture
111
00:04:36,136 --> 00:04:37,724
在表面上
of it, you know, going downhill
112
00:04:37,724 --> 00:04:38,788
不断下降
on the surface and we
113
00:04:38,788 --> 00:04:40,120
并且我们知道了
saw how, depending on where
114
00:04:40,120 --> 00:04:42,872
根据你的初始化 你会得到不同的局部最优解
you're initializing, you can end up with different local optima.
115
00:04:42,872 --> 00:04:44,846
你知道 你最终可能会停在这里 或者这里
You know, you can end up here or here.
116
00:04:44,846 --> 00:04:46,958
但是 事实证明
But, it turns out that
117
00:04:46,958 --> 00:04:49,025
用于线性回归的
the cost function for
118
00:04:49,025 --> 00:04:50,791
代价函数
linear regression
119
00:04:50,791 --> 00:04:52,754
总是这样一个
is always going to be
120
00:04:52,754 --> 00:04:55,305
弓形的样子
a bow-shaped function like this.
121
00:04:55,305 --> 00:04:57,573
这个函数的专业术语是
The technical term for this
122
00:04:57,573 --> 00:05:01,163
这是一个凸函数
is that this is called a convex function.
123
00:05:02,825 --> 00:05:04,920
我不打算在这门课中
And I'm not going
124
00:05:04,920 --> 00:05:06,561
给出凸函数的定义
to give the formal definition for what
125
00:05:06,561 --> 00:05:09,557
凸函数(convex function)
is a convex function, c-o-n-v-e-x, but
126
00:05:09,557 --> 00:05:11,310
但不正式的说法是
informally a convex function
127
00:05:11,310 --> 00:05:14,793
它就是一个弓形的函数
means a bow-shaped function, you know, kind of like a bow shape.
128
00:05:14,793 --> 00:05:18,006
因此 这个函数
And so, this function doesn't
129
00:05:18,006 --> 00:05:19,738
没有任何局部最优解
have any local optima, except
130
00:05:19,738 --> 00:05:22,433
只有一个全局最优解
for the one global optimum.
131
00:05:22,433 --> 00:05:24,265
并且无论什么时候
And so gradient descent on
132
00:05:24,265 --> 00:05:25,590
你对这种代价函数
this type of cost function which
133
00:05:25,590 --> 00:05:27,695
使用线性回归
you get whenever you're using linear
134
00:05:27,695 --> 00:05:29,201
梯度下降法得到的结果
regression, it will always converge
135
00:05:29,201 --> 00:05:33,623
总是收敛到全局最优值 因为没有全局最优以外的其他局部最优点
to the global optimum, because there are no other local optima other than the global optimum.
136
00:05:33,623 --> 00:05:37,272
现在 让我们来看看这个算法的执行过程
So now, let's see this algorithm in action.
137
00:05:38,026 --> 00:05:40,085
像往常一样
As usual, here are plots of
138
00:05:40,085 --> 00:05:42,182
这是假设函数的图
the hypothesis function and of
139
00:05:42,182 --> 00:05:45,024
还有代价函数J的图
my cost function J.
140
00:05:45,763 --> 00:05:47,011
让我们来看看如何
And so, let's see how
141
00:05:47,011 --> 00:05:49,687
初始化参数的值
to initialize my parameters at this value.
142
00:05:49,687 --> 00:05:51,652
通常来说
You know, let's say, usually you
143
00:05:51,652 --> 00:05:53,590
初始化参数为零
initialize your parameters at zero
144
00:05:53,590 --> 00:05:56,287
θ0和θ1都在零
theta zero and theta one both at zero.
145
00:05:56,287 --> 00:05:58,331
但为了展示需要
For illustration in this
146
00:05:58,331 --> 00:05:59,948
在这个梯度下降的实现中
specific presentation, I have
147
00:05:59,948 --> 00:06:02,615
我把θ0初始化为-900
initialized theta zero at
148
00:06:02,615 --> 00:06:06,831
θ1初始化为-0.1
about 900, and theta one at about minus 0.1, okay?
149
00:06:06,831 --> 00:06:09,791
这对应的假设
And so, this corresponds to H
150
00:06:09,791 --> 00:06:12,022
就应该是这样
of X equals, you know,
151
00:06:12,022 --> 00:06:15,859
h(x)是等于-900减0.1x
minus 900 minus 0.1 x
152
00:06:15,859 --> 00:06:19,341
这对应我们的代价函数
is this line, so out here on the cost function.
153
00:06:19,341 --> 00:06:20,491
现在 如果我们进行
Now if we take one
154
00:06:20,491 --> 00:06:22,163
一次梯度下降
step of gradient descent, we end
155
00:06:22,163 --> 00:06:24,298
从这个点开始
up going from this point
156
00:06:24,298 --> 00:06:27,065
在这里.一点点
out here, a little
157
00:06:27,065 --> 00:06:29,564
向左下方移动了一小步
bit down and to the left
158
00:06:29,564 --> 00:06:31,732
这就得到了第二个点
to that second point over there.
159
00:06:31,732 --> 00:06:35,279
而且你注意到这条线改变了一点点
And, you notice that my line changed a little bit.
160
00:06:35,279 --> 00:06:36,547
然后我再进行
And, as I take another step
161
00:06:36,547 --> 00:06:41,425
一步梯度下降 左边这条线又变一点
of gradient descent, my line on the left will change.
162
00:06:41,425 --> 00:06:42,376
对吧
Right.
163
00:06:42,376 --> 00:06:43,804
同样地
And I have also
164
00:06:43,804 --> 00:06:47,544
我又移到代价函数上的另一个点
moved to a new point on my cost function.
165
00:06:47,544 --> 00:06:48,745
再进行一步梯度下降
And as I take a further step
166
00:06:48,745 --> 00:06:50,697
我觉得我的代价项
of gradient descent, I'm going
167
00:06:50,697 --> 00:06:53,058
应该开始下降了
down in cost, right, so
168
00:06:53,058 --> 00:06:55,079
所以我的参数是
my parameters are following
169
00:06:55,079 --> 00:06:58,062
跟随着这个轨迹
this trajectory, and if
170
00:06:58,062 --> 00:07:00,368
再看左边这个图
you look on the left, this corresponds
171
00:07:00,368 --> 00:07:04,025
这个表示的是假设函数h(x)
to hypotheses that seem
172
00:07:04,025 --> 00:07:04,912
它变得好像
to be getting
173
00:07:04,912 --> 00:07:06,429
越来越拟合数据
better and better fits for the
174
00:07:06,429 --> 00:07:09,987
直到它渐渐地
data until eventually,
175
00:07:09,987 --> 00:07:12,663
收敛到全局最小值
I have now wound up at the global minimum.
176
00:07:12,663 --> 00:07:16,189
这个全局最小值
And this global minimum corresponds to
177
00:07:16,189 --> 00:07:20,506
对应的假设函数 给出了最拟合数据的解
this hypothesis, which gives me a good fit to the data.
178
00:07:20,922 --> 00:07:23,605
这就是梯度下降法
And so that's gradient
179
00:07:23,605 --> 00:07:24,969
我们刚刚运行了一遍
descent, and we've just run
180
00:07:24,969 --> 00:07:27,302
并且最终得到了
it and gotten a good
181
00:07:27,302 --> 00:07:31,359
房价数据的最好拟合结果
fit to my data set of housing prices.
182
00:07:31,359 --> 00:07:34,108
现在你可以用它来预测
And you can now use it to predict.
183
00:07:34,108 --> 00:07:35,284
比如说 假如你有个朋友
You know, if your friend has a
184
00:07:35,284 --> 00:07:36,452
他有一套房子
house with a
185
00:07:36,452 --> 00:07:39,116
面积1250平方英尺(约116平米)
size 1250 square feet, you
186
00:07:39,116 --> 00:07:40,684
现在你可以通过这个数据
can now read off the value
187
00:07:40,684 --> 00:07:42,090
然后告诉他们
and tell them that, I don't
188
00:07:42,090 --> 00:07:43,188
也许他的房子
know, maybe they can get
189
00:07:43,188 --> 00:07:47,159
可以卖到35万美元
$350,000 for their house.
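(A hypothetical use of the sketch above for this kind of prediction; the theta values below are placeholders chosen to land near the lecture's ballpark figure, not the parameters actually fitted to the housing data.)

# Placeholder parameters for illustration only (prices in $1000s, size in square feet);
# in practice they would come from gradient_descent(...) run on the training set.
theta0, theta1 = 100.0, 0.2
size = 1250
predicted_price = theta0 + theta1 * size  # h(x) = theta0 + theta1 * x
print(predicted_price)  # 350.0 -> roughly $350,000 under these placeholder values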
190
00:07:48,606 --> 00:07:49,982
最后 我想再给出
Finally, just to give
191
00:07:49,982 --> 00:07:51,930
另一个名字
this another name, it turns out
192
00:07:51,930 --> 00:07:52,991
实际上 我们
that the algorithm that we
193
00:07:52,991 --> 00:07:55,030
刚刚使用的算法
just went over is sometimes
194
00:07:55,030 --> 00:07:57,074
有时也称为批量梯度下降
called batch gradient descent.
195
00:07:57,074 --> 00:07:58,781
实际上 在机器学习中
And it turns out in machine
196
00:07:58,781 --> 00:08:00,203
我们这些搞机器学习的人
learning, I feel like us machine
197
00:08:00,203 --> 00:08:02,050
通常不太会
learning people, we're not always
198
00:08:02,050 --> 00:08:04,381
给算法起名字
great at giving names to algorithms.
199
00:08:04,381 --> 00:08:06,634
但这个名字"批量梯度下降"
But the term batch gradient descent
200
00:08:06,634 --> 00:08:08,212
指的是
refers to the