From f8fcf28d91f32bfc9a5d294a3b4538fae3ba21ad Mon Sep 17 00:00:00 2001
From: Quarto GHA Workflow Runner
Date: Mon, 7 Oct 2024 20:07:36 +0000
Subject: [PATCH] Built site for gh-pages

---
 .nojekyll                                |   2 +-
 schedule/slides/11-kernel-smoothers.html |   2 +-
 schedule/slides/12-why-smooth.html       | 285 ++++++++++++++++-------
 search.json                              |  76 +++---
 sitemap.xml                              |  96 ++++----
 5 files changed, 284 insertions(+), 177 deletions(-)

diff --git a/.nojekyll b/.nojekyll
index 87e5c5d..44832bc 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-55738651
\ No newline at end of file
+88cc06e1
\ No newline at end of file

diff --git a/schedule/slides/11-kernel-smoothers.html b/schedule/slides/11-kernel-smoothers.html
index 6d2d438..0b219dc 100644
--- a/schedule/slides/11-kernel-smoothers.html
+++ b/schedule/slides/11-kernel-smoothers.html
@@ -487,7 +487,7 @@

11 Local methods

Last time…

We looked at feature maps as a way to do nonlinear regression.

We used new “features” \(\Phi(x) = \bigg(\phi_1(x),\ \phi_2(x),\ldots,\phi_k(x)\bigg)\)

-

Now we examine an alternative

+

Now we examine a nonparametric alternative

Suppose I just look at the “neighbours” of some point (based on the \(x\)-values)

I just average the \(y\)’s at those locations together
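A minimal sketch of that idea in Python (my own illustration, not the course code; the function name, data, and choice of \(k\) are made up for the example): predict at a point by averaging the \(y\)'s of the \(k\) nearest neighbours.

```python
import numpy as np

def knn_smooth(x0, x, y, k=3):
    """Average the y values of the k training points whose x is closest to x0."""
    nearest = np.argsort(np.abs(x - x0))[:k]   # indices of the k nearest x's
    return y[nearest].mean()

rng = np.random.default_rng(1)
x = rng.uniform(0, 2 * np.pi, size=50)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
print(knn_smooth(1.0, x, y, k=3))   # roughly sin(1.0) ≈ 0.84, up to noise
```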

diff --git a/schedule/slides/12-why-smooth.html b/schedule/slides/12-why-smooth.html
index 596a3f2..dce7bc3 100644
--- a/schedule/slides/12-why-smooth.html
+++ b/schedule/slides/12-why-smooth.html
@@ -333,7 +333,7 @@

12 To(o) smooth or not to(o) smooth?

Stat 406

Geoff Pleiss, Trevor Campbell

-

Last modified – 09 October 2023

+

Last modified – 07 October 2024

\[
\DeclareMathOperator*{\argmin}{argmin}
\DeclareMathOperator*{\argmax}{argmax}
@@ -359,38 +359,73 @@

12 To(o) smooth or not to(o) smooth?

\newcommand{\bls}{\widehat{\beta}_{ols}}
\newcommand{\blt}{\widehat{\beta}^L_{s}}
\newcommand{\bll}{\widehat{\beta}^L_{\lambda}}
+\newcommand{\U}{\mathbf{U}}
+\newcommand{\D}{\mathbf{D}}
+\newcommand{\V}{\mathbf{V}}
\]

-
-

Last time…

-

We’ve been discussing smoothing methods in 1-dimension:

+
+

Smoothing vs Linear Models

+

We’ve been discussing nonlinear methods in 1-dimension:

\[\Expect{Y\given X=x} = f(x),\quad x\in\R\]

-

We looked at basis expansions, e.g.:

-

\[f(x) \approx \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k\]

-

We looked at local methods, e.g.:

-

\[f(x_i) \approx s_i^\top \y\]

+
1. Basis expansions, e.g.:

\[\hat f_\mathrm{basis}(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k\]

2. Local methods, e.g.:

\[\hat f_\mathrm{local}(x_i) = s_i^\top \y\]
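To make the two estimators concrete, here is a small Python sketch (my own illustration; the degree, bandwidth, and data are arbitrary choices, not taken from the slides). The basis fit is an ordinary polynomial least-squares fit; the local fit is a linear smoother, so each fitted value is a weighted average \(s_i^\top \y\) of the responses.

```python
import numpy as np

rng = np.random.default_rng(406)
n = 100
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

# Basis expansion: degree-4 polynomial fit by least squares
beta = np.polyfit(x, y, deg=4)
f_basis = np.polyval(beta, x)

# Local method: Gaussian kernel smoother.  Row i of S holds the weights s_i,
# so the vector of fitted values is S @ y, i.e. f_local[i] = s_i^T y.
h = 0.05
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
S = K / K.sum(axis=1, keepdims=True)
f_local = S @ y

print(np.mean((f_basis - np.sin(2 * np.pi * x)) ** 2),
      np.mean((f_local - np.sin(2 * np.pi * x)) ** 2))
```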

+

Which should we choose?
+Of course, we can do model selection. But can we analyze the risk mathematically?

+
+
+

Risk Decomposition

+

\[
R_n = \mathrm{Bias}^2 + \mathrm{Var} + \sigma^2
\]

+

How does \(R_n^{(\mathrm{basis})}\) compare to \(R_n^{(\mathrm{local})}\) as we change \(n\)?
+

-

What if \(x \in \R^p\) and \(p>1\)?


Variance

+
    +
• Basis: variance decreases as \(n\) increases
• Local: variance decreases as \(n\) increases
  (but at what rate?)
-
-
-

Kernels and interactions

-

In multivariate nonparametric regression, you estimate a surface over the input variables.

-

This is trying to find \(\widehat{f}(x_1,\ldots,x_p)\).

-

Therefore, this function by construction includes interactions, handles categorical data, etc. etc.

-

This is in contrast with explicit linear models which need you to specify these things.

-

This extra complexity (automatically including interactions, as well as other things) comes with tradeoffs.

-

More complicated functions (smooth Kernel regressions vs. linear models) tend to have lower bias but higher variance.


Bias

+
    +
• Basis: bias is fixed
  (assuming \(k\) is fixed)
• Local: bias depends on the choice of bandwidth \(h\).
-
-

Issue 1

-

For \(p=1\), one can show that for kernels (with the correct bandwidth)

-

\[\textrm{MSE}(\hat{f}) = \frac{C_1}{n^{4/5}} + \frac{C_2}{n^{4/5}} + \sigma^2\]

+
+

Risk Decomposition

+
+
+

Basis

+

\[
R_n^{(\mathrm{basis})} =
  \underbrace{C_1^{(b)}}_{\mathrm{bias}^2} +
  \underbrace{\frac{C_2^{(b)}}{n}}_{\mathrm{var}} +
  \sigma^2
\]

+

Local

+

With the optimal bandwidth (\(\propto n^{-1/5}\)), we have

+

\[
R_n^{(\mathrm{local})} =
  \underbrace{\frac{C_1^{(l)}}{n^{4/5}}}_{\mathrm{bias}^2} +
  \underbrace{\frac{C_2^{(l)}}{n^{4/5}}}_{\mathrm{var}} +
  \sigma^2
\]
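Where the \(n^{-1/5}\) bandwidth and the \(n^{-4/5}\) rate come from (a short derivation using the kernel-smoother decomposition \(C_1 h^4 + C_2/(nh) + \sigma^2\) that appears further down): minimize the bias/variance part over \(h\),

\[
\frac{d}{dh}\left( C_1 h^4 + \frac{C_2}{nh} \right)
  = 4 C_1 h^3 - \frac{C_2}{n h^2} = 0
\;\Longrightarrow\;
h^5 = \frac{C_2}{4 C_1 n}
\;\Longrightarrow\;
h \propto n^{-1/5},
\]

and plugging this \(h\) back in makes both terms shrink at the same rate:

\[
C_1 h^4 \propto n^{-4/5},
\qquad
\frac{C_2}{n h} \propto \frac{1}{n \cdot n^{-1/5}} = n^{-4/5}.
\]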

+
+
@@ -401,86 +436,158 @@

Issue 1

you don’t need to memorize these formulas but you should know the intuition

-

the constants don’t matter for the intuition, but they matter for a particular data set. We don’t know them. So you estimate this.

+

The constants don’t matter for the intuition, but they matter for a particular data set. You have to estimate them.

-
-
-

Issue 1

-

For \(p=1\), one can show that for kernels (with the correct bandwidth)

-

\[\textrm{MSE}(\hat{f}) = \frac{C_1}{n^{4/5}} + \frac{C_2}{n^{4/5}} + \sigma^2\]

-

Recall, this decomposition is squared bias + variance + irreducible error

-
    -
  • It depends on the choice of \(h\)
  • -
-

\[\textrm{MSE}(\hat{f}) = C_1 h^4 + \frac{C_2}{nh} + \sigma^2\]

+

What do you notice?

+
    -
• Using \(h = cn^{-1/5}\) balances squared bias and variance, which leads to the above rate. (That balance minimizes the MSE.)
• As \(n\) increases, the optimal bandwidth \(h\) decreases.
• As \(n \to \infty\), \(R_n^{(\mathrm{basis})} \to C_1^{(b)} + \sigma^2\).
• As \(n \to \infty\), \(R_n^{(\mathrm{local})} \to \sigma^2\) (see the simulation sketch below).
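A small simulation sketch of these limits (my own Python illustration; the sine truth, the noise level, the fixed cubic basis, and the use of a \(k\)-nearest-neighbour average with an oracle-chosen \(k\) as a stand-in for a well-tuned smoother are all assumptions of the example, not the slides'):

```python
import numpy as np

rng = np.random.default_rng(406)
sigma = 0.3
x_test = np.linspace(0, 1, 400)

def f(x):
    """The nonlinear 'truth' used in this toy example."""
    return np.sin(2 * np.pi * x)

def knn_fit(x, y, xgrid, k):
    """k-nearest-neighbour average of y, evaluated at each point of xgrid."""
    idx = np.argsort(np.abs(xgrid[:, None] - x[None, :]), axis=1)[:, :k]
    return y[idx].mean(axis=1)

for n in [50, 200, 800, 3200]:
    basis_err, local_err = [], []
    for _ in range(20):                           # average over replicates
        x = rng.uniform(0, 1, n)
        y = f(x) + rng.normal(scale=sigma, size=n)
        # fixed cubic basis: its bias does not vanish as n grows
        fb = np.polyval(np.polyfit(x, y, deg=3), x_test)
        basis_err.append(np.mean((fb - f(x_test)) ** 2))
        # local method: best k on a small grid (an oracle stand-in for CV)
        local_err.append(min(
            np.mean((knn_fit(x, y, x_test, k) - f(x_test)) ** 2)
            for k in (5, 10, 20, 40, 80) if k < n
        ))
    print(f"n={n:5d}   basis risk ≈ {np.mean(basis_err):.4f}   "
          f"local risk ≈ {np.mean(local_err):.4f}")
# The basis risk flattens out near its squared bias; the local risk keeps falling.
```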
+
-
-

Issue 1

-

For \(p=1\), one can show that for kernels (with the correct bandwidth)

-

\[\textrm{MSE}(\hat{f}) = \frac{C_1}{n^{4/5}} + \frac{C_2}{n^{4/5}} + \sigma^2\]

-

Intuition:

-

as you collect data, use a smaller bandwidth and the MSE (on future data) decreases

-
-
-

Issue 1

-

For \(p=1\), one can show that for kernels (with the correct bandwidth)

-

\[\textrm{MSE}(\hat{f}) = \frac{C_1}{n^{4/5}} + \frac{C_2}{n^{4/5}} + \sigma^2\]

-

How does this compare to just using a linear model?

-

Bias

-
    -
1. The bias of using a linear model when the truth is nonlinear is a number \(b > 0\) which doesn’t depend on \(n\).
2. The bias of using kernel regression is \(C_1/n^{4/5}\). This goes to 0 as \(n\rightarrow\infty\).
-

Variance

+
+

1. The variance of using a linear model is \(C/n\) no matter what.
2. The variance of using kernel regression is \(C_2/n^{4/5}\).

Takeaway

1. Local methods are consistent (bias and variance go to 0 as \(n \to \infty\)).
2. Fixed basis expansions are biased, but have lower variance when \(n\) is relatively small:
   \(\underbrace{O(1/n)}_{\text{basis var.}} < \underbrace{O(1/n^{4/5})}_{\text{local var.}}\)
-
-

Issue 1

-

For \(p=1\), one can show that for kernels (with the correct bandwidth)

-

\[\textrm{MSE}(\hat{f}) = \frac{C_1}{n^{4/5}} + \frac{C_2}{n^{4/5}} + \sigma^2\]

-

To conclude:

+
+
+

The Curse of Dimensionality

+

How do local methods perform when \(p > 1\)?

+
+
+

Intuitively

+

Parametric multivariate regressors (e.g. basis expansions) require you to specify nonlinear interaction terms
+e.g. \(x^{(1)} x^{(2)}\), \(\cos( x^{(1)} + x^{(2)})\), etc.

+


+Nonparametric multivariate regressors (e.g. KNN, local methods) automatically handle interactions.
+The distance function (e.g. \(d(x,x') = \Vert x - x' \Vert_2\)) used by kernels implicitly defines infinitely many interactions!
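A toy Python check of that claim (my own example; the target \(y = x^{(1)} x^{(2)}\), the bandwidth, and the sample sizes are made up): an additive basis with no interaction term specified misses the structure, while a kernel smoother that only sees distances recovers it.

```python
import numpy as np

rng = np.random.default_rng(406)
X = rng.uniform(-1, 1, size=(400, 2))
y = X[:, 0] * X[:, 1] + rng.normal(scale=0.05, size=400)   # pure interaction

Xt = rng.uniform(-1, 1, size=(200, 2))
yt = Xt[:, 0] * Xt[:, 1]

# Additive basis (1, x1, x2): no interaction term was specified, so it fails.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred_basis = np.column_stack([np.ones(len(Xt)), Xt]) @ coef

# Kernel smoother: the weights depend only on the distance ||x - x'||,
# yet the fit picks up the interaction automatically.
h = 0.25
d2 = ((Xt[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
W = np.exp(-d2 / (2 * h**2))
pred_local = (W / W.sum(axis=1, keepdims=True)) @ y

print("additive basis MSE:", round(np.mean((pred_basis - yt) ** 2), 4))   # ~0.11
print("kernel smoother MSE:", round(np.mean((pred_local - yt) ** 2), 4))  # much smaller
```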

+


+This extra complexity (automatically including interactions, as well as other things) comes with a tradeoff.

+
+
+

Mathematically

+

Let’s say \(x_1, \ldots, x_n\) are distributed uniformly over the space \(\mathcal B_1(p)\)
+\(\mathcal B_1(p)\) is the “unit ball,” or the set of all \(x\) such that \(\Vert x \Vert_2 \leq 1\).

+
+


+What is the maximum distance between any two points in \(\mathcal B_1(p)\)?

+
+
+

\(\Vert x - x' \Vert_2 \leq \Vert x \Vert_2 + \Vert x' \Vert_2 \leq 1 + 1 = 2.\)

+
+
+


+What about the average distance?

+
+
+
+

The average (sq.) distance between points in \(\mathcal B_1(p)\)

+

\[
\begin{align}
E\left[ \Vert x - x' \Vert_2^2 \right]
&= E\left[ \textstyle \sum_{k=1}^p (x_k - x_k')^2 \right]
\\
&= \textstyle{
  E[ \sum_{k=1}^p x_k^2 ]
  - 2 \sum_{k=1}^p \underbrace{E[ x_k x'_k ]}_{=0}
  + E[ \sum_{k=1}^p x_k^{\prime 2} ]
}
\\
&= 2 E[ \textstyle{\sum_{k=1}^p} x_k^2 ]
 = 2 E[ \Vert x \Vert_2^2 ]
\end{align}
\]

+
+

\(2 E[ \Vert x \Vert_2^2 ] = 2^{1 - 1/p}.\)

+
+
+
    -
• When \(p=2\), \(\frac{\text{avg dist}}{\text{max dist}} = 0.707\)
• When \(p=5\), \(\frac{\text{avg dist}}{\text{max dist}} = 0.871\)!
• When \(p=10\), \(\frac{\text{avg dist}}{\text{max dist}} = 0.933\)!!
• When \(p=100\), \(\frac{\text{avg dist}}{\text{max dist}} = 0.993\)!!!

• bias of kernels goes to zero, bias of lines doesn’t (unless the truth is linear).
• but variance of lines goes to zero faster than for kernels.
-

If the linear model is right, you win.

-

But if it’s wrong, you (eventually) lose as \(n\) grows.

-

How do you know if you have enough data?

-

Compare the CV risk estimate of the kernel version (with CV-selected tuning parameter) with the estimate of the risk for the linear model.

-
-
-
-

☠️☠️ Danger ☠️☠️

-

You can’t just compare the CVM for the kernel version to the CVM for the LM. This is because you used CVM to select the tuning parameter, so we’re back to the usual problem of using the data twice. You have to do another CV to estimate the risk of the kernel version at the CV-selected tuning parameter.

+ + +
+
+ +

Why is this problematic?

+
    +
• All points are nearly maximally far apart from all other points
• Can’t distinguish between “similar” and “different” inputs (a quick check is sketched below)
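A quick Monte Carlo sketch of this concentration effect (my own Python illustration; it does not reproduce the exact ratios quoted above, it just shows the relative spread of pairwise distances collapsing as \(p\) grows, so "near" and "far" points look increasingly alike):

```python
import numpy as np

rng = np.random.default_rng(406)

def sample_unit_ball(n, p):
    """n points drawn uniformly from the unit ball B_1(p)."""
    z = rng.standard_normal((n, p))
    z /= np.linalg.norm(z, axis=1, keepdims=True)   # uniform direction
    r = rng.uniform(size=(n, 1)) ** (1.0 / p)       # radius for a uniform ball
    return r * z

for p in [2, 5, 10, 100]:
    x = sample_unit_ball(300, p)
    sq = (x ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * x @ x.T    # squared pairwise distances
    d = np.sqrt(np.clip(d2, 0, None))
    d = d[np.triu_indices_from(d, k=1)]
    print(f"p={p:3d}  mean dist={d.mean():.2f}  relative spread={d.std() / d.mean():.2f}")
```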
  • +
+
+
+ +
-
-

Issue 2

-

For \(p>1\), there is more trouble.

-

First, let’s look again at \[\textrm{MSE}(\hat{f}) = \frac{C_1}{n^{4/5}} + \frac{C_2}{n^{4/5}} + \sigma^2\]

-

That is for \(p=1\). It’s not that much slower than \(C/n\), the variance for linear models.

-

If \(p>1\) similar calculations show,

-

\[\textrm{MSE}(\hat f) = \frac{C_1+C_2}{n^{4/(4+p)}} + \sigma^2 \hspace{2em} \textrm{MSE}(\hat \beta) = b + \frac{Cp}{n} + \sigma^2 .\]

+
+

Curse of Dimensionality

+

Distance becomes (exponentially) meaningless in high dimensions.*
+*(Unless our data has “low dimensional structure.”)

+
+

Risk decomposition (\(p > 1\))

+

Assuming optimal bandwidth of \(n^{-1/(4+p)}\)

+

\[
R_n^{(\mathrm{basis})} =
  \underbrace{C_1^{(b)}}_{\mathrm{bias}^2} +
  \underbrace{\tfrac{C_2^{(b)}}{n/p}}_{\mathrm{var}} +
  \sigma^2,
\qquad
R_n^{(\mathrm{local})} =
  \underbrace{\tfrac{C_1^{(l)}}{n^{4/(4+p)}}}_{\mathrm{bias}^2} +
  \underbrace{\tfrac{C_2^{(l)}}{n^{4/(4+p)}}}_{\mathrm{var}} +
  \sigma^2.
\]

+
+ +

Observations

+
    +
• \((C_1^{(l)} + C_2^{(l)}) / n^{4/(4+p)}\) is relatively big, but \(C_2^{(b)} / (n/p)\) is relatively small.
• So unless \(C_1^{(b)}\) is big, we should use the linear model.* (See the sketch below.)
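A rough back-of-the-envelope sketch of that gap (my own Python arithmetic, not from the slides): how large must \(n\) be for the local term \(n^{-4/(4+p)}\) to match its value at \(p=1\), \(n=1000\)?

```python
import numpy as np

target = 1000 ** (-4 / 5)                 # the p = 1 value, about 0.004

for p in [1, 2, 5, 10, 20]:
    n_needed = target ** (-(4 + p) / 4)   # solve n^{-4/(4+p)} = target for n
    print(f"p = {p:2d}:  n ≈ {n_needed:,.0f}")
# p = 1 needs 1,000 points; p = 10 needs ~250 million; p = 20 needs ~2.5e14.
```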
    +
  • +
+
+
-
-

Issue 2

-

\[\textrm{MSE}(\hat f) = \frac{C_1+C_2}{n^{4/(4+p)}} + \sigma^2 \hspace{2em} \textrm{MSE}(\hat \beta) = b + \frac{Cp}{n} + \sigma^2 .\]

-

What if \(p\) is big (and \(n\) is really big)?

+
+

In practice

+

The previous math assumes that our data are “densely” distributed throughout \(\R^p\).

+

However, if our data lie on a low-dimensional manifold within \(\R^p\), then local methods can work well!

+

We generally won’t know the “intrinsic dimensionality” of our data though…

+
+ +

How to decide between basis expansions versus local kernel smoothers:

    -
1. Model selection
2. Using a very, very questionable rule of thumb: if \(p>\log(n)\), don’t do smoothing.

1. Then \((C_1 + C_2) / n^{4/(4+p)}\) is still big.
2. But \(Cp / n\) is small.
3. So unless \(b\) is big, we should use the linear model.
-

How do you tell? Do model selection to decide.

-

A very, very questionable rule of thumb: if \(p>\log(n)\), don’t do smoothing.

+
+
+

☠️☠️ Danger ☠️☠️

+

You can’t just compare the GCV/CV/etc. scores for basis models versus local kernel smoothers.

+

You used GCV/CV/etc. to select the tuning parameter, so we’re back to the usual problem of using the data twice. You have to do another CV to estimate the risk of the kernel version once you have used GCV/CV/etc. to select the bandwidth.
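A sketch of the honest comparison in Python (my own illustration using scikit-learn, with kernel ridge regression standing in for the kernel smoother; the data, parameter grids, and fold counts are arbitrary): the inner CV chooses the bandwidth, and the outer CV scores the whole select-then-fit procedure, so the risk estimate is not contaminated by the selection step.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

rng = np.random.default_rng(406)
X = rng.uniform(-2, 2, size=(300, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.3, size=300)

# Inner CV: selects the kernel's bandwidth (gamma) and ridge penalty.
kernel_cv = GridSearchCV(
    KernelRidge(kernel="rbf"),
    param_grid={"gamma": np.logspace(-2, 2, 10), "alpha": [1e-3, 1e-2, 1e-1]},
    scoring="neg_mean_squared_error",
    cv=5,
)

# Outer CV: estimates the risk of each full procedure on held-out folds.
risk_kernel = -cross_val_score(kernel_cv, X, y, cv=5,
                               scoring="neg_mean_squared_error").mean()
risk_linear = -cross_val_score(LinearRegression(), X, y, cv=5,
                               scoring="neg_mean_squared_error").mean()

print(f"kernel (nested CV risk estimate): {risk_kernel:.3f}")
print(f"linear (CV risk estimate):        {risk_linear:.3f}")
```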

+
+

Next time…

Compromises if p is big

diff --git a/search.json b/search.json index 8d9d4fc..23dc0cd 100644 --- a/search.json +++ b/search.json @@ -354,7 +354,7 @@ "href": "schedule/slides/11-kernel-smoothers.html#last-time", "title": "UBC Stat406 2024W", "section": "Last time…", - "text": "Last time…\nWe looked at feature maps as a way to do nonlinear regression.\nWe used new “features” \\(\\Phi(x) = \\bigg(\\phi_1(x),\\ \\phi_2(x),\\ldots,\\phi_k(x)\\bigg)\\)\nNow we examine an alternative\nSuppose I just look at the “neighbours” of some point (based on the \\(x\\)-values)\nI just average the \\(y\\)’s at those locations together" + "text": "Last time…\nWe looked at feature maps as a way to do nonlinear regression.\nWe used new “features” \\(\\Phi(x) = \\bigg(\\phi_1(x),\\ \\phi_2(x),\\ldots,\\phi_k(x)\\bigg)\\)\nNow we examine a nonparametric alternative\nSuppose I just look at the “neighbours” of some point (based on the \\(x\\)-values)\nI just average the \\(y\\)’s at those locations together" }, { "objectID": "schedule/slides/11-kernel-smoothers.html#lets-use-3-neighbours", @@ -2230,70 +2230,70 @@ "href": "schedule/slides/12-why-smooth.html#section", "title": "UBC Stat406 2024W", "section": "12 To(o) smooth or not to(o) smooth?", - "text": "12 To(o) smooth or not to(o) smooth?\nStat 406\nGeoff Pleiss, Trevor Campbell\nLast modified – 09 October 2023\n\\[\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n\\]" + "text": "12 To(o) smooth or not to(o) smooth?\nStat 406\nGeoff Pleiss, Trevor Campbell\nLast modified – 07 October 2024\n\\[\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 
\\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n\\newcommand{\\U}{\\mathbf{U}}\n\\newcommand{\\D}{\\mathbf{D}}\n\\newcommand{\\V}{\\mathbf{V}}\n\\]" }, { - "objectID": "schedule/slides/12-why-smooth.html#last-time", - "href": "schedule/slides/12-why-smooth.html#last-time", + "objectID": "schedule/slides/12-why-smooth.html#smooting-vs-linear-models", + "href": "schedule/slides/12-why-smooth.html#smooting-vs-linear-models", "title": "UBC Stat406 2024W", - "section": "Last time…", - "text": "Last time…\nWe’ve been discussing smoothing methods in 1-dimension:\n\\[\\Expect{Y\\given X=x} = f(x),\\quad x\\in\\R\\]\nWe looked at basis expansions, e.g.:\n\\[f(x) \\approx \\beta_0 + \\beta_1 x + \\beta_2 x^2 + \\cdots + \\beta_k x^k\\]\nWe looked at local methods, e.g.:\n\\[f(x_i) \\approx s_i^\\top \\y\\]\n\nWhat if \\(x \\in \\R^p\\) and \\(p>1\\)?\n\n\n\nNote that \\(p\\) means the dimension of \\(x\\), not the dimension of the space of the polynomial basis or something else. That’s why I put \\(k\\) above." + "section": "Smooting vs Linear Models", + "text": "Smooting vs Linear Models\nWe’ve been discussing nonlinear methods in 1-dimension:\n\\[\\Expect{Y\\given X=x} = f(x),\\quad x\\in\\R\\]\n\nBasis expansions, e.g.:\n\n\\[\\hat f_\\mathrm{basis}(x) = \\beta_0 + \\beta_1 x + \\beta_2 x^2 + \\cdots + \\beta_k x^k\\]\n\nLocal methods, e.g.:\n\n\\[\\hat f_\\mathrm{local}(x_i) = s_i^\\top \\y\\]\nWhich should we choose?\nOf course, we can do model selection. But can we analyze the risk mathematically?" }, { - "objectID": "schedule/slides/12-why-smooth.html#kernels-and-interactions", - "href": "schedule/slides/12-why-smooth.html#kernels-and-interactions", + "objectID": "schedule/slides/12-why-smooth.html#risk-decomposition", + "href": "schedule/slides/12-why-smooth.html#risk-decomposition", "title": "UBC Stat406 2024W", - "section": "Kernels and interactions", - "text": "Kernels and interactions\nIn multivariate nonparametric regression, you estimate a surface over the input variables.\nThis is trying to find \\(\\widehat{f}(x_1,\\ldots,x_p)\\).\nTherefore, this function by construction includes interactions, handles categorical data, etc. etc.\nThis is in contrast with explicit linear models which need you to specify these things.\nThis extra complexity (automatically including interactions, as well as other things) comes with tradeoffs.\n\nMore complicated functions (smooth Kernel regressions vs. linear models) tend to have lower bias but higher variance." + "section": "Risk Decomposition", + "text": "Risk Decomposition\n\\[\nR_n = \\mathrm{Bias}^2 + \\mathrm{Var} + \\sigma^2\n\\]\nHow does \\(R_n^{(\\mathrm{basis})}\\) compare to \\(R_n^{(\\mathrm{local})}\\) as we change \\(n\\)?\n\n\n\nVariance\n\nBasis: variance decreases as \\(n\\) increases\nLocal: variance decreases as \\(n\\) increases\nBut at what rate?\n\n\n\n\nBias\n\nBasis: bias is fixed\nAssuming \\(k\\) is fixed\nLocal: bias depends on choice of bandwidth \\(\\sigma\\)." 
}, { - "objectID": "schedule/slides/12-why-smooth.html#issue-1", - "href": "schedule/slides/12-why-smooth.html#issue-1", + "objectID": "schedule/slides/12-why-smooth.html#risk-decomposition-1", + "href": "schedule/slides/12-why-smooth.html#risk-decomposition-1", "title": "UBC Stat406 2024W", - "section": "Issue 1", - "text": "Issue 1\nFor \\(p=1\\), one can show that for kernels (with the correct bandwidth)\n\\[\\textrm{MSE}(\\hat{f}) = \\frac{C_1}{n^{4/5}} + \\frac{C_2}{n^{4/5}} + \\sigma^2\\]\n\n\n\n\n\n\nImportant\n\n\nyou don’t need to memorize these formulas but you should know the intuition\nthe constants don’t matter for the intuition, but they matter for a particular data set. We don’t know them. So you estimate this." + "section": "Risk Decomposition", + "text": "Risk Decomposition\n\n\nBasis\n\\[\nR_n^{(\\mathrm{basis})} =\n \\underbrace{C_1^{(b)}}_{\\mathrm{bias}^2} +\n \\underbrace{\\frac{C_2^{(b)}}{n}}_{\\mathrm{var}} +\n \\sigma^2\n\\]\nLocal\nWith the optimal bandwidth (\\(\\propto n^{-1/5}\\)), we have\n\\[\nR_n^{(\\mathrm{local})} =\n \\underbrace{\\frac{C_1^{(l)}}{n^{4/5}}}_{\\mathrm{bias}^2} +\n \\underbrace{\\frac{C_2^{(l)}}{n^{4/5}}}_{\\mathrm{var}} +\n \\sigma^2\n\\]\n\n\n\n\n\n\n\n\nImportant\n\n\nyou don’t need to memorize these formulas but you should know the intuition\nThe constants don’t matter for the intuition, but they matter for a particular data set. You have to estimate them.\n\n\n\nWhat do you notice?\n\n\nAs \\(n\\) increases, the optimal bandwidth \\(\\sigma\\) decreases\nAs \\(n \\to \\infty\\), \\(R_n^{(\\mathrm{basis})} \\to C_1^{(b)} + \\sigma^2\\)\nAs \\(n \\to \\infty\\), \\(R_n^{(\\mathrm{local})} \\to \\sigma^2\\)" }, { - "objectID": "schedule/slides/12-why-smooth.html#issue-1-1", - "href": "schedule/slides/12-why-smooth.html#issue-1-1", + "objectID": "schedule/slides/12-why-smooth.html#takeaway", + "href": "schedule/slides/12-why-smooth.html#takeaway", "title": "UBC Stat406 2024W", - "section": "Issue 1", - "text": "Issue 1\nFor \\(p=1\\), one can show that for kernels (with the correct bandwidth)\n\\[\\textrm{MSE}(\\hat{f}) = \\frac{C_1}{n^{4/5}} + \\frac{C_2}{n^{4/5}} + \\sigma^2\\]\nRecall, this decomposition is squared bias + variance + irreducible error\n\nIt depends on the choice of \\(h\\)\n\n\\[\\textrm{MSE}(\\hat{f}) = C_1 h^4 + \\frac{C_2}{nh} + \\sigma^2\\]\n\nUsing \\(h = cn^{-1/5}\\) balances squared bias and variance, leads to the above rate. (That balance minimizes the MSE)" + "section": "Takeaway", + "text": "Takeaway\n\nLocal methods are consistent (bias and variance go to 0 as \\(n \\to \\infty\\))\nFixed basis expansions are biased but have lower variance when \\(n\\) is relatively small.\n\\(\\underbrace{O(1/n)}_{\\text{basis var.}} < \\underbrace{O(1/n^{4/5})}_{\\text{local var.}}\\)" }, { - "objectID": "schedule/slides/12-why-smooth.html#issue-1-2", - "href": "schedule/slides/12-why-smooth.html#issue-1-2", + "objectID": "schedule/slides/12-why-smooth.html#intuitively", + "href": "schedule/slides/12-why-smooth.html#intuitively", "title": "UBC Stat406 2024W", - "section": "Issue 1", - "text": "Issue 1\nFor \\(p=1\\), one can show that for kernels (with the correct bandwidth)\n\\[\\textrm{MSE}(\\hat{f}) = \\frac{C_1}{n^{4/5}} + \\frac{C_2}{n^{4/5}} + \\sigma^2\\]\nIntuition:\nas you collect data, use a smaller bandwidth and the MSE (on future data) decreases" + "section": "Intuitively", + "text": "Intuitively\nParametric multivariate regressors (e.g. basis expansions) require you to specify nonlinear interaction terms\ne.g. 
\\(x^{(1)} x^{(2)}\\), \\(\\cos( x^{(1)} + x^{(2)})\\), etc.\n\nNonparametric multivariate regressors (e.g. KNN, local methods) automatically handle interactions.\nThe distance function (e.g. \\(d(x,x') = \\Vert x - x' \\Vert_2\\)) used by kernels implicitly defines infinitely many interactions!\n\nThis extra complexity (automatically including interactions, as well as other things) comes with a tradeoff." }, { - "objectID": "schedule/slides/12-why-smooth.html#issue-1-3", - "href": "schedule/slides/12-why-smooth.html#issue-1-3", + "objectID": "schedule/slides/12-why-smooth.html#mathematically", + "href": "schedule/slides/12-why-smooth.html#mathematically", "title": "UBC Stat406 2024W", - "section": "Issue 1", - "text": "Issue 1\nFor \\(p=1\\), one can show that for kernels (with the correct bandwidth)\n\\[\\textrm{MSE}(\\hat{f}) = \\frac{C_1}{n^{4/5}} + \\frac{C_2}{n^{4/5}} + \\sigma^2\\]\nHow does this compare to just using a linear model?\nBias\n\nThe bias of using a linear model when the truth nonlinear is a number \\(b > 0\\) which doesn’t depend on \\(n\\).\nThe bias of using kernel regression is \\(C_1/n^{4/5}\\). This goes to 0 as \\(n\\rightarrow\\infty\\).\n\nVariance\n\nThe variance of using a linear model is \\(C/n\\) no matter what\nThe variance of using kernel regression is \\(C_2/n^{4/5}\\)." + "section": "Mathematically", + "text": "Mathematically\nLet’s say \\(x_1, \\ldots, x_n\\) are distributed uniformly over the space \\(\\mathcal B_1(p)\\)\n\\(B_1(p)\\) is the “unit ball,” or the set of all \\(x\\) such that \\(\\Vert x \\Vert_2 \\leq 1\\).\n\n\nWhat is the maximum distance between any two points in \\(\\mathcal B_1(p)\\)?\n\n\n\\(\\Vert x - x' \\Vert_2 \\leq \\Vert x \\Vert_2 + \\Vert x' \\Vert_2 \\leq 1 + 1 = 2.\\)\n\n\n\nWhat about the average distance?" }, { - "objectID": "schedule/slides/12-why-smooth.html#issue-1-4", - "href": "schedule/slides/12-why-smooth.html#issue-1-4", + "objectID": "schedule/slides/12-why-smooth.html#the-average-sq.-distance-between-points-in-mathcal-b_1p", + "href": "schedule/slides/12-why-smooth.html#the-average-sq.-distance-between-points-in-mathcal-b_1p", "title": "UBC Stat406 2024W", - "section": "Issue 1", - "text": "Issue 1\nFor \\(p=1\\), one can show that for kernels (with the correct bandwidth)\n\\[\\textrm{MSE}(\\hat{f}) = \\frac{C_1}{n^{4/5}} + \\frac{C_2}{n^{4/5}} + \\sigma^2\\]\nTo conclude:\n\nbias of kernels goes to zero, bias of lines doesn’t (unless the truth is linear).\nbut variance of lines goes to zero faster than for kernels.\n\nIf the linear model is right, you win.\nBut if it’s wrong, you (eventually) lose as \\(n\\) grows.\nHow do you know if you have enough data?\nCompare of the kernel version with CV-selected tuning parameter with the estimate of the risk for the linear model." + "section": "The average (sq.) distance between points in \\(\\mathcal B_1(p)\\)", + "text": "The average (sq.) 
distance between points in \\(\\mathcal B_1(p)\\)\n\\[\n\\begin{align}\nE\\left[ \\Vert x - x' \\Vert_2^2 \\right]\n&=\nE\\left[ \\textstyle \\sum_{k=1}^p (x_k - x_k')^2 \\right]\n\\\\\n&= \\textstyle{\n E[ \\sum_{k=1}^p x_k^2 ]\n + 2 \\sum_{k=1}^p \\sum_{\\ell=1}^p \\underbrace{E[ x_l x'_k ]}_{=0}\n + E[ \\sum_{k=1}^p x_k^{\\prime 2} ]\n}\n\\\\\n&= 2 E[ \\textstyle{\\sum_{k=1}^p} x_k^2 ]\n= 2 E[ \\Vert x \\Vert_2^2 ]\n\\end{align}\n\\]\n\n\\(2 E[ \\Vert x \\Vert_2^2 ] = 2^{1 - 1/p}.\\)\n\n\n\n\nWhen \\(p=2\\), \\(\\frac{\\text{avg dist}}{\\text{max dist}} = 0.707\\)\nWhen \\(p=5\\), \\(\\frac{\\text{avg dist}}{\\text{max dist}} = 0.871\\)!\nWhen \\(p=10\\), \\(\\frac{\\text{avg dist}}{\\text{max dist}} = 0.933\\)!!\nWhen \\(p=100\\), \\(\\frac{\\text{avg dist}}{\\text{max dist}} = 0.993\\)!!!\n\n\n\n\n\n\nWhy is this problematic?\n\nAll points are maximally far apart from all other points\nCan’t distinguish between “similar” and “different” inputs" }, { - "objectID": "schedule/slides/12-why-smooth.html#issue-2", - "href": "schedule/slides/12-why-smooth.html#issue-2", + "objectID": "schedule/slides/12-why-smooth.html#curse-of-dimensionality", + "href": "schedule/slides/12-why-smooth.html#curse-of-dimensionality", "title": "UBC Stat406 2024W", - "section": "Issue 2", - "text": "Issue 2\nFor \\(p>1\\), there is more trouble.\nFirst, lets look again at \\[\\textrm{MSE}(\\hat{f}) = \\frac{C_1}{n^{4/5}} + \\frac{C_2}{n^{4/5}} + \\sigma^2\\]\nThat is for \\(p=1\\). It’s not that much slower than \\(C/n\\), the variance for linear models.\nIf \\(p>1\\) similar calculations show,\n\\[\\textrm{MSE}(\\hat f) = \\frac{C_1+C_2}{n^{4/(4+p)}} + \\sigma^2 \\hspace{2em} \\textrm{MSE}(\\hat \\beta) = b + \\frac{Cp}{n} + \\sigma^2 .\\]" + "section": "Curse of Dimensionality", + "text": "Curse of Dimensionality\nDistance becomes (exponentially) meaningless in high dimensions.*\n*(Unless our data has “low dimensional structure.”)\n\nRisk decomposition (\\(p > 1\\))\nAssuming optimal bandwidth of \\(n^{-1/(4+p)}\\)…\n\\[\nR_n^{(\\mathrm{basis})} =\n \\underbrace{C_1^{(b)}}_{\\mathrm{bias}^2} +\n \\underbrace{\\tfrac{C_2^{(b)}}{n/p}}_{\\mathrm{var}} +\n \\sigma^2,\n\\qquad\nR_n^{(\\mathrm{local})} =\n \\underbrace{\\tfrac{C_1^{(l)}}{n^{4/(4+p)}}}_{\\mathrm{bias}^2} +\n \\underbrace{\\tfrac{C_2^{(l)}}{n^{4/(4+p)}}}_{\\mathrm{var}} +\n \\sigma^2.\n\\]\n\n\nObservations\n\n\\((C_1 + C_2) / n^{4/(4+p)}\\) is relatively big, but \\(C_2^{(b)} / (n/p)\\) is relatively small.\nSo unless \\(C_1^{(b)}\\) is big, we should use the linear model.*" }, { - "objectID": "schedule/slides/12-why-smooth.html#issue-2-1", - "href": "schedule/slides/12-why-smooth.html#issue-2-1", + "objectID": "schedule/slides/12-why-smooth.html#in-practice", + "href": "schedule/slides/12-why-smooth.html#in-practice", "title": "UBC Stat406 2024W", - "section": "Issue 2", - "text": "Issue 2\n\\[\\textrm{MSE}(\\hat f) = \\frac{C_1+C_2}{n^{4/(4+p)}} + \\sigma^2 \\hspace{2em} \\textrm{MSE}(\\hat \\beta) = b + \\frac{Cp}{n} + \\sigma^2 .\\]\nWhat if \\(p\\) is big (and \\(n\\) is really big)?\n\nThen \\((C_1 + C_2) / n^{4/(4+p)}\\) is still big.\nBut \\(Cp / n\\) is small.\nSo unless \\(b\\) is big, we should use the linear model.\n\nHow do you tell? Do model selection to decide.\nA very, very questionable rule of thumb: if \\(p>\\log(n)\\), don’t do smoothing." 
+ "section": "In practice", + "text": "In practice\nThe previous math assumes that our data are “densely” distributed throughout \\(\\R^p\\).\nHowever, if our data lie on a low-dimensional manifold within \\(\\R^p\\), then local methods can work well!\nWe generally won’t know the “intrinsic dimensinality” of our data though…\n\n\nHow to decide between basis expansions versus local kernel smoothers:\n\nModel selection\nUsing a very, very questionable rule of thumb: if \\(p>\\log(n)\\), don’t do smoothing." }, { "objectID": "schedule/slides/00-intro-to-class.html#section", diff --git a/sitemap.xml b/sitemap.xml index 72bd7a0..00e9b31 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,194 +2,194 @@ https://UBC-STAT.github.io/stat-406/schedule/slides/00-r-review.html - 2024-10-07T17:57:46.545Z + 2024-10-07T20:06:45.724Z https://UBC-STAT.github.io/stat-406/schedule/handouts/keras-nnet.html - 2024-10-07T17:57:46.538Z + 2024-10-07T20:06:45.718Z https://UBC-STAT.github.io/stat-406/schedule/slides/11-kernel-smoothers.html - 2024-10-07T17:57:46.546Z + 2024-10-07T20:06:45.725Z https://UBC-STAT.github.io/stat-406/schedule/slides/09-l1-penalties.html - 2024-10-07T17:57:46.545Z + 2024-10-07T20:06:45.725Z https://UBC-STAT.github.io/stat-406/schedule/slides/18-the-bootstrap.html - 2024-10-07T17:57:46.546Z + 2024-10-07T20:06:45.726Z https://UBC-STAT.github.io/stat-406/schedule/slides/23-nnets-other.html - 2024-10-07T17:57:46.546Z + 2024-10-07T20:06:45.726Z https://UBC-STAT.github.io/stat-406/schedule/slides/05-estimating-test-mse.html - 2024-10-07T17:57:46.545Z + 2024-10-07T20:06:45.725Z https://UBC-STAT.github.io/stat-406/schedule/slides/13-gams-trees.html - 2024-10-07T17:57:46.546Z + 2024-10-07T20:06:45.726Z https://UBC-STAT.github.io/stat-406/schedule/slides/26-pca-v-kpca.html - 2024-10-07T17:57:46.546Z + 2024-10-07T20:06:45.726Z https://UBC-STAT.github.io/stat-406/schedule/slides/00-classification-losses.html - 2024-10-07T17:57:46.544Z + 2024-10-07T20:06:45.724Z https://UBC-STAT.github.io/stat-406/schedule/slides/20-boosting.html - 2024-10-07T17:57:46.546Z + 2024-10-07T20:06:45.726Z https://UBC-STAT.github.io/stat-406/schedule/slides/27-kmeans.html - 2024-10-07T17:57:46.547Z + 2024-10-07T20:06:45.726Z https://UBC-STAT.github.io/stat-406/schedule/slides/14-classification-intro.html - 2024-10-07T17:57:46.546Z + 2024-10-07T20:06:45.726Z https://UBC-STAT.github.io/stat-406/schedule/slides/04-bias-variance.html - 2024-10-07T17:57:46.545Z + 2024-10-07T20:06:45.725Z https://UBC-STAT.github.io/stat-406/schedule/slides/06-information-criteria.html - 2024-10-07T17:57:46.545Z + 2024-10-07T20:06:45.725Z https://UBC-STAT.github.io/stat-406/schedule/slides/03-regression-function.html - 2024-10-07T17:57:46.545Z + 2024-10-07T20:06:45.725Z https://UBC-STAT.github.io/stat-406/schedule/slides/21-nnets-intro.html - 2024-10-07T17:57:46.546Z + 2024-10-07T20:06:45.726Z https://UBC-STAT.github.io/stat-406/schedule/slides/12-why-smooth.html - 2024-10-07T17:57:46.546Z + 2024-10-07T20:06:45.726Z https://UBC-STAT.github.io/stat-406/schedule/slides/00-intro-to-class.html - 2024-10-07T17:57:46.544Z + 2024-10-07T20:06:45.724Z https://UBC-STAT.github.io/stat-406/schedule/handouts/lab00-git.html - 2024-10-07T17:57:46.538Z + 2024-10-07T20:06:45.718Z https://UBC-STAT.github.io/stat-406/course-setup.html - 2024-10-07T17:57:46.521Z + 2024-10-07T20:06:45.701Z https://UBC-STAT.github.io/stat-406/computing/windows.html - 2024-10-07T17:57:46.521Z + 2024-10-07T20:06:45.701Z https://UBC-STAT.github.io/stat-406/computing/mac_x86.html - 
2024-10-07T17:57:46.521Z + 2024-10-07T20:06:45.701Z https://UBC-STAT.github.io/stat-406/computing/index.html - 2024-10-07T17:57:46.521Z + 2024-10-07T20:06:45.701Z https://UBC-STAT.github.io/stat-406/index.html - 2024-10-07T17:57:46.522Z + 2024-10-07T20:06:45.701Z https://UBC-STAT.github.io/stat-406/computing/mac_arm.html - 2024-10-07T17:57:46.521Z + 2024-10-07T20:06:45.701Z https://UBC-STAT.github.io/stat-406/computing/ubuntu.html - 2024-10-07T17:57:46.521Z + 2024-10-07T20:06:45.701Z https://UBC-STAT.github.io/stat-406/syllabus.html - 2024-10-07T17:57:46.568Z + 2024-10-07T20:06:45.748Z https://UBC-STAT.github.io/stat-406/schedule/index.html - 2024-10-07T17:57:46.544Z + 2024-10-07T20:06:45.724Z https://UBC-STAT.github.io/stat-406/schedule/slides/00-course-review.html - 2024-10-07T17:57:46.544Z + 2024-10-07T20:06:45.724Z https://UBC-STAT.github.io/stat-406/schedule/slides/00-version-control.html - 2024-10-07T17:57:46.545Z + 2024-10-07T20:06:45.725Z https://UBC-STAT.github.io/stat-406/faq.html - 2024-10-07T17:57:46.522Z + 2024-10-07T20:06:45.701Z https://UBC-STAT.github.io/stat-406/schedule/slides/01-lm-review.html - 2024-10-07T17:57:46.545Z + 2024-10-07T20:06:45.725Z https://UBC-STAT.github.io/stat-406/schedule/slides/00-cv-for-many-models.html - 2024-10-07T17:57:46.544Z + 2024-10-07T20:06:45.724Z https://UBC-STAT.github.io/stat-406/schedule/slides/19-bagging-and-rf.html - 2024-10-07T17:57:46.546Z + 2024-10-07T20:06:45.726Z https://UBC-STAT.github.io/stat-406/schedule/slides/22-nnets-estimation.html - 2024-10-07T17:57:46.546Z + 2024-10-07T20:06:45.726Z https://UBC-STAT.github.io/stat-406/schedule/slides/16-logistic-regression.html - 2024-10-07T17:57:46.546Z + 2024-10-07T20:06:45.726Z https://UBC-STAT.github.io/stat-406/schedule/slides/08-ridge-regression.html - 2024-10-07T17:57:46.545Z + 2024-10-07T20:06:45.725Z https://UBC-STAT.github.io/stat-406/schedule/slides/00-quiz-0-wrap.html - 2024-10-07T17:57:46.544Z + 2024-10-07T20:06:45.724Z https://UBC-STAT.github.io/stat-406/schedule/slides/15-LDA-and-QDA.html - 2024-10-07T17:57:46.546Z + 2024-10-07T20:06:45.726Z https://UBC-STAT.github.io/stat-406/schedule/slides/24-pca-intro.html - 2024-10-07T17:57:46.546Z + 2024-10-07T20:06:45.726Z https://UBC-STAT.github.io/stat-406/schedule/slides/25-pca-issues.html - 2024-10-07T17:57:46.546Z + 2024-10-07T20:06:45.726Z https://UBC-STAT.github.io/stat-406/schedule/slides/10-basis-expansions.html - 2024-10-07T17:57:46.545Z + 2024-10-07T20:06:45.725Z https://UBC-STAT.github.io/stat-406/schedule/slides/28-hclust.html - 2024-10-07T17:57:46.547Z + 2024-10-07T20:06:45.726Z https://UBC-STAT.github.io/stat-406/schedule/slides/07-greedy-selection.html - 2024-10-07T17:57:46.545Z + 2024-10-07T20:06:45.725Z https://UBC-STAT.github.io/stat-406/schedule/slides/02-lm-example.html - 2024-10-07T17:57:46.545Z + 2024-10-07T20:06:45.725Z https://UBC-STAT.github.io/stat-406/schedule/slides/17-nonlinear-classifiers.html - 2024-10-07T17:57:46.546Z + 2024-10-07T20:06:45.726Z https://UBC-STAT.github.io/stat-406/schedule/slides/00-gradient-descent.html - 2024-10-07T17:57:46.544Z + 2024-10-07T20:06:45.724Z