<html><head><title>niplav</title>
<link href="./favicon.png" rel="shortcut icon" type="image/png"/>
<link href="main.css" rel="stylesheet" type="text/css"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<!DOCTYPE HTML>
<style type="text/css">
code.has-jax {font: inherit; font-size: 100%; background: inherit; border: inherit;}
</style>
<script async="" src="./mathjax/latest.js?config=TeX-MML-AM_CHTML" type="text/javascript">
</script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
extensions: ["tex2jax.js"],
jax: ["input/TeX", "output/HTML-CSS"],
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
processEscapes: true,
skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
},
"HTML-CSS": { availableFonts: ["TeX"] }
});
</script>
<script>
document.addEventListener('DOMContentLoaded', function () {
// Change the title to the h1 header
var title = document.querySelector('h1')
if(title) {
var title_elem = document.querySelector('title')
title_elem.textContent=title.textContent + " – niplav"
}
});
</script>
</head><body><h2 id="home"><a href="./index.html">home</a></h2>
<p><em>author: niplav, created: 2022-10-19, modified: 2022-11-25, language: english, status: in progress, importance: 2, confidence: likely</em></p>
<blockquote>
<p><strong>Solutions to the textbook “Introduction to Optimization”.</strong></p>
</blockquote><div class="toc"><div class="toc-title">Contents</div><ul><li><a href="#401">4.0.1</a><ul></ul></li><li><a href="#411">4.11</a><ul><li><a href="#i">(i)</a><ul></ul></li><li><a href="#ii">(ii)</a><ul></ul></li><li><a href="#iii">(iii)</a><ul></ul></li><li><a href="#iv">(iv)</a><ul></ul></li></ul></li><li><a href="#431">4.3.1</a><ul></ul></li><li><a href="#433">4.3.3</a><ul><li><a href="#i_1">(i)</a><ul></ul></li><li><a href="#ii_1">(ii)</a><ul></ul></li></ul></li><li><a href="#441">4.4.1</a><ul><li><a href="#i_2">(i)</a><ul></ul></li><li><a href="#ii_2">(ii)</a><ul></ul></li><li><a href="#iii_1">(iii)</a><ul></ul></li><li><a href="#iv_1">(iv)</a><ul></ul></li></ul></li></ul></div>
<h1 id="Solutions_to_Introduction_to_Optimization"><a class="hanchor" href="#Solutions_to_Introduction_to_Optimization">Solutions to “Introduction to Optimization”</a></h1>
<h3 id="401"><a class="hanchor" href="#401">4.0.1</a></h3>
<p>See <a href="./mfis_solutions.html#24">here</a>.</p>
<h3 id="411"><a class="hanchor" href="#411">4.11</a></h3>
<h4 id="i"><a class="hanchor" href="#i">(i)</a></h4>
<p>Some standard numpy code:</p>
<pre><code>import numpy as np

def givec(x, c=10):
    # diagonal matrix C with entries c**(i/(n-1)), i=0…n-1: condition number c
    n=len(x)
    iexp=np.arange(n)
    cdiag=c**(iexp/(n-1))
    C=np.diag(cdiag)
    return C

def fsq(x, c=10):
    # ill-conditioned quadratic x^T C x
    C=givec(x, c=c)
    return x.T@C@x

def fhole(x, c=10, a=0.1):
    # squashed version of fsq, with values in [0,1)
    retval=fsq(x, c=c)
    return retval/(a**2+retval)
</code></pre>
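<p>A quick spot check (my own, not part of the exercise), using the fact
that for <code>$n=2$</code> and <code>$c=10$</code> we get <code>$C=\text{diag}(1,10)$</code>:</p>
<pre><code>x=np.array([1.0, 1.0])
print(fsq(x))   # 1 + 10 = 11.0
print(fhole(x)) # 11/(0.01+11) ≈ 0.9991
</code></pre>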
<p>The derivative of <code>$f_{\text{sq}}=x^{\top}Cx$</code> is <code>$(C+C^{\top})x$</code>
(as per identity 81 from the <a href="https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf">Matrix
Cookbook</a>).</p>
<p>This agrees with identity 2.3 of Maths for Intelligent
Systems (Toussaint 2022), which gives
<code>$\frac{\partial}{\partial x} x^{\top}Cx=x^{\top}C \frac{\partial}{\partial x}x+x^{\top} C^{\top} \frac{\partial}{\partial x} x=x^{\top}C+x^{\top}C^{\top}$</code>;
transposing this row vector yields <code>$C^{\top}x+Cx=(C^{\top}+C)x$</code>,
the same as above (and as identity 97 from the Matrix
Cookbook).</p>
<p>So then</p>
<pre><code>def fsqgrad(x, c=10):
    C=givec(x, c=c)
    return (C.T+C)@x
</code></pre>
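<p>To make sure I got the gradient right, a quick central-difference check
(my own sketch, not from the book):</p>
<pre><code>def numgrad(f, x, eps=1e-6):
    # central finite differences, one coordinate at a time
    g=np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        e=np.zeros(len(x))
        e[i]=eps
        g[i]=(f(x+e)-f(x-e))/(2*eps)
    return g

x=np.array([1.0, 1.0])
print(numgrad(fsq, x)) # ≈ [2., 20.]
print(fsqgrad(x))      # [2., 20.]
</code></pre>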
<p>Since <code>$f_{\text{hole}}$</code> is a scalar-valued quotient of
<code>$f_{\text{sq}}$</code> and <code>$a^2+f_{\text{sq}}$</code>, one can
use a simple <a href="https://en.wikipedia.org/wiki/Quotient_rule">quotient rule</a> for the gradient:</p>
<div>
$$\nabla f_{\text{hole}}(x)=\\
\frac{(C^{\top}+C)x\cdot(a^2+x^{\top}Cx)-x^{\top}Cx\cdot (C^{\top}+C)x}{(a^2+x^{\top}Cx)^2}= \\
\frac{(C^{\top}+C)x\cdot(a^2+x^{\top}Cx-x^{\top}Cx)}{(a^2+x^{\top}Cx)^2}= \\
\frac{a^2 \cdot (C^{\top}+C)x}{(a^2+x^{\top}Cx)^2}$$
</div>
<p>The implementation is straightforward:</p>
<pre><code>def fholegrad(x, c=10, a=0.1):
    retval_fsq=fsq(x, c=c)
    retval_fsqgrad=fsqgrad(x, c=c)
    return (retval_fsqgrad*(a**2+retval_fsq)-retval_fsq*retval_fsqgrad)/(a**2+retval_fsq)**2
</code></pre>
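<p>Or equivalently, using the simplified form derived above (my own
variant, slightly cheaper):</p>
<pre><code>def fholegrad_simple(x, c=10, a=0.1):
    # a² (C^T+C)x / (a² + x^T C x)², from the simplification above
    return a**2*fsqgrad(x, c=c)/(a**2+fsq(x, c=c))**2
</code></pre>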
<h4 id="ii"><a class="hanchor" href="#ii">(ii)</a></h4>
<p>The function can be pretty directly translated from the pseudocode in
the book to Python (no surprise here):</p>
<pre><code>import math

def graddesc(f, fgrad, alpha=0.001, theta=10e-20, x0=[1,1], c=10, a=0.1):
    x=np.array(x0)
    i=0
    prevcost=-math.inf
    curcost=f(x)
    # stop once the cost barely changes anymore
    while abs(prevcost-curcost)>theta:
        prevcost=curcost
        x=x-alpha*fgrad(x)
        curcost=f(x)
        i=i+1
    print("solution: ", x)
    print("iterations: ", i)
    print("cost: ", f(x))
    return x
</code></pre>
<p>Executing <code>graddesc</code> with <code>fsq</code>/<code>fsqgrad</code> and <code>fhole</code>/<code>fholegrad</code> takes
~9.5k iterations to find the minimum of <code>fsq</code>, and ~185k
iterations for <code>fhole</code>. But it finds the minimum:</p>
<pre><code>print("fsq:")
graddesc(fsq, fsqgrad)
print("fhole:")
graddesc(fhole, fholegrad)
fsq:
solution: [4.98354923e-09 1.65118900e-84]
iterations: 9549
cost: 2.4835762963261225e-17
fhole:
solution: [ 4.13240000e-11 -7.98149419e-19]
iterations: 184068
cost: 1.7076729729684466e-19
</code></pre>
<h4 id="iii"><a class="hanchor" href="#iii">(iii)</a></h4>
<p>Gradient descent with backtracking line search and stepsize adaptation:</p>
<pre><code>def backtrack_graddesc(f, fgrad, alpha=0.05, theta=10e-15, x0=[1,1], c=10, a=0.1, rho_ls=0.01, rho_plus=1.2, rho_minus=0.5, delta_max=math.inf):
    x=np.array(x0)
    i=0
    delta=-fgrad(x)/np.linalg.norm(fgrad(x))
    while np.linalg.norm(alpha*delta)>=theta:
        # backtrack until the step satisfies the Wolfe condition
        while f(x+alpha*delta)>f(x)+rho_ls*fgrad(x)@(alpha*delta):
            alpha=rho_minus*alpha
        x=x+alpha*delta
        alpha=min(rho_plus*alpha, delta_max)
        # recompute the (normalized) steepest-descent direction
        delta=-fgrad(x)/np.linalg.norm(fgrad(x))
        i=i+1
    return x
</code></pre>
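<p>Called the same way as before (my own spot check, parameters left at
their defaults):</p>
<pre><code>print(backtrack_graddesc(fsq, fsqgrad))
print(backtrack_graddesc(fhole, fholegrad))
</code></pre>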
<h4 id="iv"><a class="hanchor" href="#iv">iv)</a></h4>
<p>First we compute the derivative of the objective in <code>$\underset{β}{\text{argmin}}||y-Xβ||^2+λ||β||^2$</code>:</p>
<div>
$$\frac{\partial\; ||y-Xβ||^2+λ||β||^2}{\partial β}= \\
\frac{\partial ||y-Xβ||^2}{\partial β} + \frac{\partial λ||β||^2}{\partial β}= \\
-2X^{\top}(y-Xβ)+2λβ$$
</div>
<p>Setting this to zero gives the closed-form solution
<code>$β=(X^{\top}X+λI)^{-1}X^{\top}y$</code>.</p>
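<p>A sketch of how this slots into <code>graddesc</code> above (the data
<code>X</code>, <code>y</code> and the regularizer <code>lam</code> are made
up for illustration):</p>
<pre><code>rng=np.random.default_rng(0)
X=rng.standard_normal((20, 3))
y=X@np.array([1.0, -2.0, 0.5])+0.1*rng.standard_normal(20)
lam=0.1

def ridge_cost(beta):
    return (y-X@beta)@(y-X@beta)+lam*beta@beta

def ridge_grad(beta):
    return -2*X.T@(y-X@beta)+2*lam*beta

beta_gd=graddesc(ridge_cost, ridge_grad, x0=[0,0,0])
# should agree with the closed-form solution
beta_closed=np.linalg.solve(X.T@X+lam*np.eye(3), X.T@y)
print(beta_closed)
</code></pre>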
<h3 id="431"><a class="hanchor" href="#431">4.3.1</a></h3>
<div>
</div>
<h3 id="433"><a class="hanchor" href="#433">4.3.3</a></h3>
<div>
$$f(x)=\frac{1}{2}x^{\top} A x+b^{\top}x+c=\\
\frac{1}{2} x^{\top} A x=\\
\frac{1}{2} x^{\top} \left [\matrix{2 & 0 \cr 0 & 1}\right ] x$$
</div>
<p>(since <code>$b=\mathbf{0}$</code> and <code>$c=0$</code>). So the negative gradient is</p>
<div>
$$-\nabla f(x)= \\
-\frac{1}{2} x^{\top} (A+A^{\top})=\\
-\frac{1}{2} x^{\top} \left [\matrix{4 & 0 \cr 0 & 2}\right ]$$
</div>
<h4 id="i_1"><a class="hanchor" href="#i_1">(i)</a></h4>
<p><code>$x_0=\left [ \matrix{1 \cr 1} \right ]$</code>, so</p>
<div>
$$δ_0=\\
-\nabla f(x_0)=\\
-\frac{1}{2} \left [ \matrix{1 & 1} \right ] \left [\matrix{4 & 0 \cr 0 & 2}\right ]=\\
-\left [ \matrix{2 & 1} \right ]$$
</div>
<p>Then</p>
<div>
$$f(x+αδ)=\\
\frac{1}{2} (\left [ \matrix{1 \cr 1} \right]-α\left [ \matrix{2 \cr 1} \right ])^{\top}\left [\matrix{2 & 0 \cr 0 & 1}\right ](\left [ \matrix{1 \cr 1} \right]-α\left [ \matrix{2 \cr 1} \right ])=\\
\frac{1}{2} (\left [ \matrix{1 \cr 1} \right]-α\left [ \matrix{2 \cr 1} \right ])^{\top}(\left [ \matrix{2 \cr 1} \right]-α\left [ \matrix{4 \cr 1} \right ])=\\
\frac{1}{2} (\left [ \matrix{1 & 1} \right]-α\left [ \matrix{2 & 1} \right ])(\left [ \matrix{2 \cr 1} \right]-α\left [ \matrix{4 \cr 1} \right ])=\\
\frac{1}{2}(3-10\cdot α+9 \cdot α^2)=\\
1.5-5\cdot α+4.5 \cdot α^2$$
</div>
<p>The derivative of that is <code>$-5+9 \cdot α$</code>. Setting it to zero gives
<code>$α=\frac{5}{9}$</code>.</p>
<p>The next <code>$x$</code> then is (going by the values we computed)
<code>$\left [ \matrix{1 \cr 1} \right ]-\frac{5}{9}\left [ \matrix{2 \cr 1} \right ]=\left [ \matrix{-\frac{1}{9} \cr \frac{4}{9}} \right ]$</code>.</p>
<p>The new negative gradient then is</p>
<div>
$$g=\\
-\nabla f(x_1)=\\
-\frac{1}{2} \left [ \matrix{-\frac{1}{9} & \frac{4}{9}} \right ] \left [\matrix{4 & 0 \cr 0 & 2}\right ]=\\
\left [ \matrix{\frac{2}{9} & -\frac{4}{9}} \right ]$$
</div>
<p>Now computing <code>$β$</code> (where <code>$g'=\left [ \matrix{-2 & -1} \right ]$</code> is the previous negative gradient):</p>
<div>
$$\frac{g^{\top}(g-g')}{g'^{\top}g'}=\\
\frac{\left [ \matrix{\frac{2}{9} & -\frac{4}{9}} \right ]^{\top}(\left [ \matrix{\frac{2}{9} & -\frac{4}{9}} \right ]-\left [ \matrix{-2 & -1} \right ])}{\left [ \matrix{-2 & -1} \right ]^{\top}\left [ \matrix{-2 & -1} \right ]}=\\
\frac{20/81}{5}=\frac{4}{81}\approx 0.049$$
</div>
<p>which is greater than zero, so we keep it.</p>
<p>Then <code>$δ=g+βδ'=\left [ \matrix{\frac{2}{9} & -\frac{4}{9}} \right ]+\frac{4}{81}\left [ \matrix{-2 & -1} \right ]=\left [ \matrix{\frac{10}{81} & -\frac{40}{81}} \right ]$</code>.</p>
<p>We now go through the whole exercise a second time:</p>
<div>
$$f(x+αδ)=\\
\frac{1}{2} (\left [ \matrix{-\frac{1}{9} \cr \frac{4}{9}} \right ]+α\left [ \matrix{\frac{10}{81} \cr -\frac{40}{81}} \right ])^{\top}\left [\matrix{2 & 0 \cr 0 & 1}\right ](\left [ \matrix{-\frac{1}{9} \cr \frac{4}{9}} \right ]+α\left [ \matrix{\frac{10}{81} \cr -\frac{40}{81}} \right ])=\\
\frac{(9-10\cdot α)^2}{729}$$
</div>
<p>The derivative of that is <code>$\frac{-20 \cdot (9-10\cdot α)}{729}$</code>, which is zero at
<code>$α=\frac{9}{10}$</code>.</p>
<p>Then the new <code>$x$</code> is <code>$\left [ \matrix{-\frac{1}{9} \cr \frac{4}{9}} \right ]+\frac{9}{10}\left [ \matrix{\frac{10}{81} \cr -\frac{40}{81}} \right ]=\left [ \matrix{0 \cr 0} \right ]$</code>:
the exact minimum, reached after two steps, as conjugate gradient on a
two-dimensional quadratic should.</p>
<p>…And we're done with the second round. I'm not going to do this for
<code>$x_0=(-1,2)$</code>, in case you were wondering; I hope I've shown that I <em>can</em>
do this, and it's tedious to type out.</p>
<!--TODO: maybe do other ones too?-->
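<p>For peace of mind, a quick numerical check of the two steps above (my
own sketch, not part of the book):</p>
<pre><code>A=np.diag([2.0, 1.0])
grad=lambda x: A@x  # A symmetric, b=0, so the gradient is Ax

x=np.array([1.0, 1.0])
g=-grad(x)          # negative gradient
d=g                 # first search direction
alpha=(g@d)/(d@A@d) # exact line search: 5/9
x=x+alpha*d         # [-1/9, 4/9]

g_old, g=g, -grad(x)
beta=max(0.0, g@(g-g_old)/(g_old@g_old)) # 4/81
d=g+beta*d
alpha=(g@d)/(d@A@d) # 9/10
x=x+alpha*d
print(x)            # ≈ [0, 0]: the minimum, after two steps
</code></pre>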
<h4 id="ii_1"><a class="hanchor" href="#ii_1">(ii)</a></h4>
<p>Intuitively, this makes sense: we walked all the way to the minimum in
the direction of the current gradient, so the next direction should not
go "back" in the direction we came from, or further than we needed
to go.</p>
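<p>A quick argument: exact line search picks <code>$α$</code> such that</p>
<div>
$$\frac{\partial}{\partial α} f(x+αδ)=\nabla f(x+αδ)^{\top}δ=0$$
</div>
<p>so the gradient at the new point is orthogonal to the search direction
we just walked along.</p>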
<!--TODO: actually prove this-->
<h3 id="441"><a class="hanchor" href="#441">4.4.1</a></h3>
<p>Not gonna show you the sketch, of course. (I also don't have time right
now to learn <a href="https://manim.community">manim</a> for that, though it would
be cool).</p>
<h4 id="i_2"><a class="hanchor" href="#i_2">(i)</a></h4>
<p>The solution is <code>$\frac{-π}{2}$</code>.</p>
<h4 id="ii_2"><a class="hanchor" href="#ii_2">(ii)</a></h4>
<p>The solution is
<code>$\left [ \matrix{-\frac{\sqrt{2}}{2} \cr -\frac{\sqrt{2}}{2}} \right ]$</code>.</p>
<h4 id="iii_1"><a class="hanchor" href="#iii_1">(iii)</a></h4>
<p>Since the gradient is "pulling" to the bottom left, the best one can do
is to set <code>$x_1$</code> to zero. So
<code>$x=\left [ \matrix{0 \cr -1} \right ]$</code>.</p>
<h4 id="iv_1"><a class="hanchor" href="#iv_1">(iv)</a></h4>
<p>The second constraint can be pictured the following way: along the <code>$x_2$</code>
axis there is a parabola, forming a parabolic cylinder which is zero where
<code>$x_2$</code> is zero and grows symmetrically in both directions. At
<code>$x_2≥1$</code> and <code>$x_2≤-1$</code> this cylinder is greater than <code>$x_1$</code>,
which we want to avoid. So we know that <code>$x_2=-1$</code>.</p>
<p>But what is <code>$x_1$</code>? It should be as small as possible. Plugging in
<code>$x_1=0$</code> tells us that this is the best we can do.</p>
</body></html>