Kernel smoothers updates
gpleiss committed Oct 7, 2024
1 parent f4189ae commit 5f9251d
Showing 9 changed files with 2,340 additions and 1,217 deletions.

Large diffs are not rendered by default.

106 changes: 79 additions & 27 deletions schedule/slides/11-kernel-smoothers.qmd
@@ -9,11 +9,11 @@ metadata-files:

## 1) The Bias of OLS

In last class's clicker question, we were trying to predict income from
a $p=2$, $n=50000$ dataset. I claimed that the *OLS predictor is high bias* in this example.\

\
But Trevor proved that the *OLS predictor is unbiased* (via the following proof):

A. Assume that $y = x^\top \beta + \epsilon, \quad \epsilon \sim \mathcal N(0, \sigma^2).$

@@ -27,24 +27,24 @@ E. So $E[\hat \beta_\mathrm{ols}] - \beta = E[(\mathbf X^\top \mathbf X)^{-1} \mathbf X^\top \boldsymbol\epsilon] = 0.$

\
[Why did this proof not apply to the clicker question?]{.secondary} \
[(Which step of this proof breaks down?)]{.small}


## 1) The Bias of OLS

[Why did this proof not apply to the clicker question?]{.secondary} \
Which step of the proof breaks down?

A. Assume that $y = x^\top \beta + \epsilon, \quad \epsilon \sim \mathcal N(0, \sigma^2).$

\
This assumption does not hold.

It is (almost certainly) not the case that `Income ~ Age + Education`.

. . .

\
In reality, $y = f(x) + \epsilon$ where $f(x)$ is some potentially nonlinear function. So

$$
\begin{align}
E[\hat \beta_\mathrm{ols}]
&= E[(\mathbf X^\top \mathbf X)^{-1} \mathbf X^\top \mathbf y]
= (\mathbf X^\top \mathbf X)^{-1} \mathbf X^\top f(\mathbf X),
\end{align}
$$

and the resulting prediction $x^\top \hat \beta_\mathrm{ols}$ is a *biased* estimate of $f(x)$ whenever $f$ is not linear.
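
To see this concretely, here is a minimal simulation sketch (not from the original slides; it assumes the `blue` and `orange` colour objects and the ggplot2 setup loaded earlier in the deck). The truth is the nonlinear $f(x) = x^2$, and even after averaging the OLS fit over many training sets, the average prediction does not match $f$.

```{r}
#| code-fold: true
#| fig-width: 6
#| fig-height: 4
# Hypothetical illustration: average the OLS fit over many training sets drawn
# from a nonlinear truth; the averaged prediction (orange) is not f(x) (blue).
set.seed(406)
x_grid <- seq(-2, 2, length.out = 100)
f <- function(x) x^2
avg_ols_pred <- rowMeans(replicate(500, {
  x <- runif(50, -2, 2)
  y <- f(x) + rnorm(50)
  predict(lm(y ~ x), newdata = data.frame(x = x_grid))
}))
ggplot(data.frame(x = x_grid, truth = f(x_grid), ols = avg_ols_pred)) +
  geom_line(aes(x, truth), colour = blue, linewidth = 2) +
  geom_line(aes(x, ols), colour = orange, linewidth = 2)
```
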
@@ -174,15 +174,51 @@ $\widehat{\mathbf{y}} = \mathbf{S} \mathbf{y}$ for some matrix $S$.

You should imagine what $\mathbf{S}$ looks like.

1. What is the degrees of freedom of KNN, and how does it depend on $k$?
2. How does $k$ affect the bias/variance?

. . .

- $\mathrm{df} = \tr{\mathbf S} = n/k$.
- $k = n$ produces a constant predictor (highest bias, lowest variance).
- $k = 1$ produces a low bias but extremely high variance predictor.
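
As a quick numerical check of the $n/k$ claim (a sketch, not from the original slides, using synthetic 1-D data): build the $n \times n$ KNN smoother matrix $\mathbf S$ explicitly and compute its trace.

```{r}
#| code-fold: true
# Hypothetical sketch: each row of S puts weight 1/k on the k nearest training
# points (including the point itself), so diag(S) = 1/k and tr(S) = n/k.
n <- 100; k <- 5
x <- sort(runif(n))
S <- t(sapply(x, function(x0) {
  w <- numeric(n)
  w[order(abs(x - x0))[1:k]] <- 1 / k
  w
}))
c(trace_S = sum(diag(S)), n_over_k = n / k)
```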

---

```{r}
#| code-fold: true
#| fig-width: 12
#| fig-height: 4
set.seed(406406)
## Local averages
plot_knn <- function(k) {
ggplot(arcuate_unif, aes(position, fa)) +
geom_point(colour = blue) +
geom_line(
data = tibble(
position = seq_range(arcuate_unif$position),
fa = FNN::knn.reg(
arcuate_unif$position, matrix(position, ncol = 1),
y = arcuate_unif$fa,
k = k
)$pred
),
colour = orange, linewidth = 2
) + ggtitle(paste("k =", k))
}
g1 <- plot_knn(1)
g2 <- plot_knn(5)
g3 <- plot_knn(length(arcuate_unif$position))
plot_grid(g1, g2, g3, ncol = 3)
```


## Local averages (soft KNN)

KNN averages the neighbours with equal weight.

But some neighbours are "closer" than other neighbours.

Instead of choosing the number of neighbours to average, we can average
any observations within a certain distance.
@@ -196,18 +232,20 @@ ggplot(arcuate_unif, aes(position, fa)) +
geom_rect(aes(xmin = position[25] - 15, xmax = position[25] + 15, ymin = 0, ymax = .1), fill = green)
```

. . .

The boxes have width 30.
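
Here is a minimal sketch of this local average (not from the original slides; it reuses `arcuate_unif`, `blue`, and `orange` from earlier chunks). Base R's `ksmooth()` with a box kernel and bandwidth 30 averages all observations within $\pm 15$ of each point.

```{r}
#| code-fold: true
# Sketch: boxcar local average with a width-30 window via base R's ksmooth().
box_fit <- with(arcuate_unif,
                ksmooth(position, fa, kernel = "box", bandwidth = 30, x.points = position))
ggplot(arcuate_unif, aes(position, fa)) +
  geom_point(colour = blue) +
  geom_line(data = tibble(position = box_fit$x, fa = box_fit$y),
            colour = orange, linewidth = 2)
```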


## What is a "kernel" smoother?

* The mathematics:

> A kernel is any function $K$ such that for any $u$,
>
> - $K(u) \geq 0$,
> - $\int K(u) du = 1$,
> - $\int uK(u) du = 0$.
* The idea: a kernel takes weighted averages. The kernel function gives the weights.

* The previous example is called the [boxcar]{.secondary} kernel.
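
Putting these together (a sketch of the standard construction, not necessarily the notation used later in the deck): with weights $K\left(\frac{x_i - x}{h}\right)$ for some bandwidth $h$, the kernel smoother at $x$ is the weighted average

$$
\hat f(x) = \frac{\sum_{i=1}^n K\left(\frac{x_i - x}{h}\right) y_i}{\sum_{i=1}^n K\left(\frac{x_i - x}{h}\right)},
$$

and the width-30 boxcar from the previous slide corresponds to $K(u) = \frac{1}{2}\,\mathbf{1}(|u| \leq 1)$ with $h = 15$: nonnegative, integrates to 1, and symmetric about 0.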

@@ -231,11 +269,27 @@ This one gives the same non-zero weight to all points within $\pm 15$ range.

## Other kernels

Most of the time, we don't use the boxcar because the weights are weird.\
[Ideally we would like closer points to have more weight.]{.small}

::: flex
::: w-60
A more common kernel: the Gaussian kernel

$$
K(u) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{u^2}{2\sigma^2}\right)
$$

For the plot, I made $\sigma=7.5$.

Now the weights "die away" for points farther from where we're predicting.\
[(but all nonzero!!)]{.small}

:::

::: w-40

```{r fig.width=8, fig.height=5}
#| code-fold: true
gaussian_kernel <- function(x) dnorm(x, mean = arcuate_unif$position[15], sd = 7.5) * 3
ggplot(arcuate_unif, aes(position, fa)) +
@@ -244,10 +298,8 @@ ggplot(arcuate_unif, aes(position, fa)) +
stat_function(fun = gaussian_kernel, geom = "area", fill = orange)
```


:::
:::
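
A small numerical sketch (not from the original slides; it reuses `arcuate_unif` and the $\sigma = 7.5$ choice above): the smoothed value at one position is just a weighted average, with Gaussian weights that decay with distance but never reach exactly zero.

```{r}
#| code-fold: true
# Sketch: Gaussian-weighted average at a single point. The weights decay fast
# with distance but are never exactly zero (in exact arithmetic).
x0 <- arcuate_unif$position[15]
w <- dnorm(arcuate_unif$position, mean = x0, sd = 7.5)
w <- w / sum(w)                  # normalize the weights
c(fhat_x0 = sum(w * arcuate_unif$fa), smallest_weight = min(w))
```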

## Other kernels

@@ -306,12 +358,12 @@ ggplot(arcuate_unif, aes(position, fa)) +

* It is way more important than which kernel you use.

<!-- * The default kernel in `ksmooth()` is something called 'Epanechnikov': -->

<!-- ```{r} -->
<!-- epan <- function(x) 3/4 * (1 - x^2) * (abs(x) < 1) -->
<!-- ggplot(data.frame(x = c(-2, 2)), aes(x)) + stat_function(fun = epan, colour = green, linewidth = 2) -->
<!-- ``` -->
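
To illustrate the point (a sketch, not from the original slides; it reuses `arcuate_unif`, `blue`, `orange`, and `plot_grid()` from earlier chunks): swapping the kernel at a fixed bandwidth barely changes the fit, while changing the bandwidth changes it dramatically.

```{r}
#| code-fold: true
#| fig-width: 12
#| fig-height: 4
# Sketch: same data and code; only the kernel / bandwidth arguments change.
plot_ksmooth <- function(kern, bw) {
  fit <- with(arcuate_unif,
              ksmooth(position, fa, kernel = kern, bandwidth = bw, x.points = position))
  ggplot(arcuate_unif, aes(position, fa)) +
    geom_point(colour = blue) +
    geom_line(data = tibble(position = fit$x, fa = fit$y), colour = orange, linewidth = 2) +
    ggtitle(paste0(kern, " kernel, bandwidth = ", bw))
}
plot_grid(
  plot_ksmooth("box", 30), plot_ksmooth("normal", 30), plot_ksmooth("normal", 150),
  ncol = 3
)
```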


## Choosing the bandwidth
