<!DOCTYPE HTML>
<html><head><title>niplav</title>
<link href="./favicon.png" rel="shortcut icon" type="image/png"/>
<link href="main.css" rel="stylesheet" type="text/css"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<style type="text/css">
code.has-jax {font: inherit; font-size: 100%; background: inherit; border: inherit;}
</style>
<script async="" src="./mathjax/latest.js?config=TeX-MML-AM_CHTML" type="text/javascript">
</script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
extensions: ["tex2jax.js"],
jax: ["input/TeX", "output/HTML-CSS"],
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
processEscapes: true,
skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
},
"HTML-CSS": { availableFonts: ["TeX"] }
});
</script>
<script>
document.addEventListener('DOMContentLoaded', function () {
// Change the title to the h1 header
var title = document.querySelector('h1')
if(title) {
var title_elem = document.querySelector('title')
title_elem.textContent=title.textContent + " – niplav"
}
});
</script>
</head><body><h2 id="home"><a href="./index.html">home</a></h2>
<p><em>author: niplav, created: 2022-10-19, modified: 2022-12-20, language: english, status: in progress, importance: 2, confidence: likely</em></p>
<blockquote>
<p><strong>Solutions to the textbook “Maths for Intelligent Systems”.</strong></p>
</blockquote><div class="toc"><div class="toc-title">Contents</div><ul><li><a href="#Chapter_2">Chapter 2</a><ul><li><a href="#Stray_NonExercise_1">Stray Non-Exercise 1</a><ul></ul></li><li><a href="#24">2.4</a><ul><li><a href="#i">(i)</a><ul></ul></li><li><a href="#ii">(ii)</a><ul></ul></li><li><a href="#iii">(iii)</a><ul></ul></li><li><a href="#iv">(iv)</a><ul></ul></li><li><a href="#v">(v)</a><ul></ul></li><li><a href="#vi">(vi)</a><ul></ul></li></ul></li><li><a href="#261">2.6.1</a><ul></ul></li><li><a href="#262">2.6.2</a><ul></ul></li><li><a href="#263">2.6.3</a><ul><li><a href="#i_1">i)</a><ul></ul></li><li><a href="#ii_1">ii)</a><ul></ul></li></ul></li><li><a href="#264">2.6.4</a><ul></ul></li></ul></li></ul></div>
<h1 id="Solutions_to_Maths_for_Intelligent_Systems"><a class="hanchor" href="#Solutions_to_Maths_for_Intelligent_Systems">Solutions to “Maths for Intelligent Systems”</a></h1>
<h2 id="Chapter_2"><a class="hanchor" href="#Chapter_2">Chapter 2</a></h2>
<h3 id="Stray_NonExercise_1"><a class="hanchor" href="#Stray_NonExercise_1">Stray Non-Exercise 1</a></h3>
<blockquote>
<p>Let me start with an example: We have three real-valued quantities <code>$x, g$</code>
and <code>$f$</code> which depend on each other. Specifically, <code>$f(x,g)=3x+2g$</code>
and <code>$g(x)=2x$</code>.<br/>
Question: What is the “derivative of <code>$f$</code> w.r.t. <code>$x$</code>”?</p>
</blockquote>
<p>Intuitively, I'd say that <code>$\frac{\partial}{\partial x}f(x,g)=3$</code>. But then I notice that <code>$g$</code>
is allegedly a "real-valued quantity". What is that supposed to mean? Is
it not a function of <code>$x$</code>?</p>
<p>Anyway, plugging <code>$g$</code> into <code>$f$</code> gives <code>$f(x)=3x+2(2x)=7x$</code>, so the total derivative is
<code>$\frac{d}{dx}f(x)=3+4=7$</code>.</p>
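<p>The two answers can be reconciled with the chain rule. A short sketch, treating <code>$f$</code> as a function of two arguments and <code>$g$</code> as a function of <code>$x$</code>:</p>
<div>
$$\frac{df}{dx}=\frac{\partial f}{\partial x}+\frac{\partial f}{\partial g}\frac{dg}{dx}=3+2 \cdot 2=7$$
</div>
<p>The 3 is only the first term, the partial derivative with <code>$g$</code> held fixed. The 7 is the total derivative, which also follows the dependence of <code>$g$</code> on <code>$x$</code>.</p>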
<h3 id="24"><a class="hanchor" href="#24">2.4</a></h3>
<h4 id="i"><a class="hanchor" href="#i">(i)</a></h4>
<div>
$$XA+A^{\top}=\mathbf{I} \Leftrightarrow \\
XA=\mathbf{I}-A^{\top} \Leftrightarrow \\
X=(\mathbf{I}-A^{\top})A^{-1}$$
</div>
<h4 id="ii"><a class="hanchor" href="#ii">(ii)</a></h4>
<div>
$$ X^{\top}C=(2A(X+B))^{\top} \Leftrightarrow \\
X^{\top}C=(2AX)^{\top}+(2AB)^{\top} \Leftrightarrow \\
X^{\top}C-X^{\top}(2A)^{\top}=(2AB)^{\top} \Leftrightarrow \\
X^{\top}(C-(2A)^{\top})=(2AB)^{\top} \Leftrightarrow \\
X^{\top}=(2AB)^{\top}(C-(2A)^{\top})^{-1} \Leftrightarrow \\
X=(C^{\top}-2A)^{-1}2AB $$
</div>
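<p>A quick numerical sanity check for (ii), as a sketch only: it assumes square matrices and that <code>$C^{\top}-2A$</code> is invertible, and tests whether <code>$X=(C^{\top}-2A)^{-1}2AB$</code> satisfies the original equation. The size <code>n</code> and the seed are arbitrary choices made for this example.</p>
<pre><code>using LinearAlgebra, Random

Random.seed!(0)                  # arbitrary seed, for reproducibility only
n = 4                            # hypothetical size
A, B, C = randn(n, n), randn(n, n), randn(n, n)

# Candidate solution of X'C = (2A(X+B))'
X = (C' - 2A) \ (2A * B)

# Both sides of the original equation should agree up to rounding
lhs = X' * C
rhs = (2A * (X + B))'
println(isapprox(lhs, rhs; rtol=1e-8))
</code></pre>
<p>Random matrices only make a counterexample unlikely, they do not prove the identity. A check like this does catch algebra slips such as distributing an inverse over a difference.</p>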
<h4 id="iii"><a class="hanchor" href="#iii">(iii)</a></h4>
<div>
$$(Ax-y)^{\top}A=\mathbf{0}_n^{\top} \Leftrightarrow \\
A^{\top}(Ax-y)=\mathbf{0}_n \Leftrightarrow \\
A^{\top}Ax-A^{\top}y=\mathbf{0}_n \Leftrightarrow \\
A^{\top}Ax=A^{\top}y \Leftrightarrow \\
x=(A^{\top}A)^{-1}A^{\top}y$$
</div>
<h4 id="iv"><a class="hanchor" href="#iv">(iv)</a></h4>
<div>
$$(Ax-y)^{\top}A+x^{\top}B=\mathbf{0}_n^{\top} \Leftrightarrow \\
A^{\top}(Ax-y)+B^{\top}x=\mathbf{0}_n \Leftrightarrow \\
A^{\top}Ax-A^{\top}y+B^{\top}x=\mathbf{0}_n \Leftrightarrow \\
A^{\top}Ax+B^{\top}x=A^{\top}y \Leftrightarrow \\
(A^{\top}A+B^{\top})x=A^{\top}y \Leftrightarrow \\
x=(A^{\top}A+B^{\top})^{-1}A^{\top}y $$
</div>
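<p>The closed forms for (i), (iii) and (iv) can be spot-checked numerically. This is a minimal sketch under stated assumptions: random matrices, with the sizes <code>n</code> and <code>m</code> and the seed picked arbitrarily, <code>$A$</code> square and invertible in (i), <code>$A$</code> tall with full column rank in (iii) and (iv), and <code>$A^{\top}A+B^{\top}$</code> invertible in (iv).</p>
<pre><code>using LinearAlgebra, Random

Random.seed!(1)                      # arbitrary seed, for reproducibility only
n, m = 3, 5                          # hypothetical sizes
I_n = Matrix{Float64}(I, n, n)

# (i): X = (I - A')A^{-1} for a square, invertible A
A = randn(n, n)
X = (I_n - A') / A
@assert X * A + A' ≈ I_n

# (iii): x = (A'A)^{-1}A'y for a tall A with full column rank
A, y = randn(m, n), randn(m)
x = (A' * A) \ (A' * y)
@assert isapprox((A * x - y)' * A, zeros(n)'; atol=1e-8)

# (iv): x = (A'A + B')^{-1}A'y, with B square
B = randn(n, n)
x = (A' * A + B') \ (A' * y)
@assert isapprox((A * x - y)' * A + x' * B, zeros(n)'; atol=1e-8)

println("all checks passed")
</code></pre>
<p>The checks plug each solution back into the original row-vector form of its equation, so they do not depend on the intermediate steps of the derivation.</p>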
<h4 id="v"><a class="hanchor" href="#v">(v)</a></h4>
<!--TODO-->
<h4 id="vi"><a class="hanchor" href="#vi">(vi)</a></h4>
<!--TODO-->
<h3 id="261"><a class="hanchor" href="#261">2.6.1</a></h3>
<p>I… I don't know what the skew matrix is :-/, and Wikipedia
isn't very helpful (I don't think it's the <a href="https://en.wikipedia.org/wiki/Skew-Hermitian_matrix">skew-Hermitian
matrix</a>
or the <a href="https://en.wikipedia.org/wiki/Skew-symmetric_matrix">skew-symmetric
matrix</a>
or the <a href="https://en.wikipedia.org/wiki/Skew-Hamiltonian_matrix">skew-Hamiltonian
matrix</a>).</p>
<h3 id="262"><a class="hanchor" href="#262">2.6.2</a></h3>
<!--TODO: do this later when I understand tensors, or am maybe just a
bit more comfortable in general-->
<h3 id="263"><a class="hanchor" href="#263">2.6.3</a></h3>
<p>Writing code: This I can do.</p>
<h4 id="i_1"><a class="hanchor" href="#i_1">i)</a></h4>
<pre><code>using LinearAlgebra

# Compare the analytic Jacobian df(x) with a central finite-difference estimate of it.
function gradient_check(x, f, df)
    n = length(x)
    d = length(f(x))
    ε = 1e-6
    J = zeros(d, n)
    for i in 1:n
        unit = zeros(n)
        unit[i] = 1
        # `.=` also works when f is scalar-valued (d == 1)
        J[:, i] .= (f(x + ε*unit) - f(x - ε*unit)) / (2ε)
    end
    return norm(J - df(x), Inf) < 1e-4
end
</code></pre>
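<p>In formula form, the loop estimates column <code>$i$</code> of the Jacobian with a central difference, where <code>$e_i$</code> (the variable <code>unit</code> in the code) is the <code>$i$</code>-th unit vector:</p>
<div>
$$J_{:,i} \approx \frac{f(x+ε e_i)-f(x-ε e_i)}{2ε}$$
</div>
<p>For a sufficiently smooth <code>$f$</code>, the truncation error of this estimate shrinks like <code>$ε^2$</code>, compared with <code>$ε$</code> for a one-sided difference. That is presumably why the symmetric form is used here.</p>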
<h4 id="ii_1"><a class="hanchor" href="#ii_1">ii)</a></h4>
<pre><code>julia> A=rand(Float64, (10, 15))
julia> f(x)=A*x
julia> df(x)=A
julia> x=randn(15)
15-element Vector{Float64}:
1.536516645971545
1.0136394994998532
-0.09863977762813898
1.3510191388362935
0.84503226122143
0.09296670831415606
-1.5390337565597376
1.4679194319980104
-0.7085023577127753
-0.10676335224166593
-0.8686753109089055
1.2912744597257453
0.7364123079861109
0.5736005534388826
0.5332386427039576
julia> gradient_check(x, f, df)
true
</code></pre>
<p>And now the cooler <code>$f$</code>:</p>
<pre><code>julia> f(x)=transpose(x)*x
f (generic function with 1 method)
julia> df(x)=2*transpose(x)
df (generic function with 1 method)
</code></pre>
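<p>Where that <code>df</code> comes from, as a short expansion for a small perturbation <code>$h$</code> (a symbol introduced here only for illustration):</p>
<div>
$$f(x+h)=(x+h)^{\top}(x+h)=x^{\top}x+2x^{\top}h+h^{\top}h$$
</div>
<p>The term linear in <code>$h$</code> is <code>$2x^{\top}h$</code>, so the Jacobian is the row vector <code>$2x^{\top}$</code>. Since <code>$f$</code> is quadratic, the central difference should reproduce it up to rounding error.</p>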
<h3 id="264"><a class="hanchor" href="#264">2.6.4</a></h3>
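<p>The derivative of <code>$σ(W_0 \times x_0)$</code> with respect to <code>$x_0$</code>, using the chain rule and writing <code>$σ'$</code>
for <code>$\frac{dσ}{dx}$</code>, is <code>$\operatorname{diag}(σ'(W_0 \times x_0)) \times W_0$</code>. The <code>$\operatorname{diag}$</code> appears because <code>$σ$</code> is applied elementwise, so its Jacobian is a diagonal matrix.</p>
<p>Applying this again for <code>$W_1 \times σ(W_0 \times x_0)$</code>,
we get <code>$W_1 \times \operatorname{diag}(σ'(W_0 \times x_0)) \times W_0$</code>.</p>
<p>Again: <code>$\frac{d}{d x_0} σ(W_1 \times σ(W_0 \times x_0))=\operatorname{diag}(σ'(W_1 \times σ(W_0 \times x_0))) \times W_1 \times \operatorname{diag}(σ'(W_0 \times x_0)) \times W_0$</code>.</p>
<p>And finally:
<code>$\frac{d}{d x_0} W_2 \times σ(W_1 \times σ(W_0 \times x_0))=W_2 \times \operatorname{diag}(σ'(W_1 \times σ(W_0 \times x_0))) \times W_1 \times \operatorname{diag}(σ'(W_0 \times x_0)) \times W_0$</code>.</p>
<p>Then the formula for computing <code>$\frac{d f}{d x_0}$</code> is <code>$W_{m-1} \times \prod_{l=0}^{m-2} \operatorname{diag}(σ'(z_{l+1})) \times W_l$</code>,
where <code>$m$</code> is the number of matrices, <code>$z_{l+1}=W_l \times x_l$</code> with <code>$x_{l+1}=σ(z_{l+1})$</code>, and <code>$\prod$</code> is left matrix
multiplication (each new factor is multiplied on from the left, so the <code>$l=0$</code> factor ends up rightmost).</p>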
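<p>The three-matrix case can be checked against finite differences. The following is a sketch under assumptions that are not from the exercise: a logistic sigmoid for <code>$σ$</code>, applied elementwise (so that <code>$σ'$</code> enters as a diagonal matrix), and made-up layer sizes. The names <code>dσ</code> and <code>jacobian</code> are hypothetical helpers.</p>
<pre><code>using LinearAlgebra, Random

Random.seed!(2)                  # arbitrary seed, for reproducibility only
σ(z) = 1 / (1 + exp(-z))         # assumed activation: logistic sigmoid
dσ(z) = σ(z) * (1 - σ(z))        # its derivative

# Hypothetical layer sizes 4 → 5 → 3 → 2
W0, W1, W2 = randn(5, 4), randn(3, 5), randn(2, 3)
f(x) = W2 * σ.(W1 * σ.(W0 * x))

# Jacobian from the chain rule, one diagonal factor per activation
function jacobian(x)
    z1 = W0 * x
    z2 = W1 * σ.(z1)
    return W2 * Diagonal(dσ.(z2)) * W1 * Diagonal(dσ.(z1)) * W0
end

# Central finite differences, one column per input coordinate
x0 = randn(4)
ε = 1e-6
J = zeros(2, 4)
for i in 1:4
    e = zeros(4); e[i] = 1
    J[:, i] = (f(x0 + ε*e) - f(x0 - ε*e)) / (2ε)
end

println(isapprox(J, jacobian(x0); atol=1e-6))
</code></pre>
<p>Using <code>Diagonal</code> avoids building dense diagonal matrices. Scaling the rows of each <code>$W_l$</code> by <code>$σ'(z_{l+1})$</code> would be an equivalent way to write the same product.</p>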
</body></html>