Kernel smoothers updates
gpleiss committed Oct 7, 2024
1 parent f4189ae commit 5f9251d
Showing 9 changed files with 2,340 additions and 1,217 deletions.

Large diffs are not rendered by default.

106 changes: 79 additions & 27 deletions schedule/slides/11-kernel-smoothers.qmd
@@ -9,11 +9,11 @@ metadata-files:

## 1) The Bias of OLS

In last class's clicker question, we were trying to predict income from
a $p=2$, $n=50000$ dataset. I claimed that the *OLS predictor is high bias* in this example.\

\
But Trevor proved that the *OLS predictor is unbiased* (via the following proof):

A. Assume that $y = x^\top \beta + \epsilon, \quad \epsilon \sim \mathcal N(0, \sigma^2).$

@@ -27,24 +27,24 @@ E. So $E[\hat \beta_\mathrm{ols}] - \beta = E[(\mathbf X^\top \mathbf X)^{-1} \mathbf X^\top \boldsymbol\epsilon] = 0.$

\
[Why did this proof not apply to the clicker question?]{.secondary} \
[(Which step of this proof breaks down?)]{.small}


## 1) The Bias of OLS

[Why did this proof not apply to the clicker question?]{.secondary} \
Which step of the proof breaks down?

A. Assume that $y = x^\top \beta + \epsilon, \quad \epsilon \sim \mathcal N(0, \sigma^2).$

\
This assumption does not hold.

It is (almost certainly) not the case that `Income ~ Age + Education`.

. . .

\
In reality, $y = f(x) + \epsilon$ where $f(x)$ is some potentially nonlinear function. So

$$
\begin{align}
E[\hat \beta_\mathrm{ols}]
&= E[(\mathbf X^\top \mathbf X)^{-1} \mathbf X^\top \mathbf y]
= (\mathbf X^\top \mathbf X)^{-1} \mathbf X^\top f(\mathbf X),
\end{align}
$$

and the resulting prediction $x^\top \hat \beta_\mathrm{ols}$ is a *biased* estimate of $f(x)$ whenever $f$ is not linear.
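
To see this concretely, here is a minimal simulation sketch (not from the original slides; it assumes the `blue` and `orange` colour objects and the ggplot2 setup loaded earlier in the deck). The truth is the nonlinear $f(x) = x^2$, and even after averaging the OLS fit over many training sets, the average prediction does not match $f$.

```{r}
#| code-fold: true
#| fig-width: 6
#| fig-height: 4
# Hypothetical illustration: average the OLS fit over many training sets drawn
# from a nonlinear truth; the averaged prediction (orange) is not f(x) (blue).
set.seed(406)
x_grid <- seq(-2, 2, length.out = 100)
f <- function(x) x^2
avg_ols_pred <- rowMeans(replicate(500, {
  x <- runif(50, -2, 2)
  y <- f(x) + rnorm(50)
  predict(lm(y ~ x), newdata = data.frame(x = x_grid))
}))
ggplot(data.frame(x = x_grid, truth = f(x_grid), ols = avg_ols_pred)) +
  geom_line(aes(x, truth), colour = blue, linewidth = 2) +
  geom_line(aes(x, ols), colour = orange, linewidth = 2)
```
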
@@ -174,15 +174,51 @@ $\widehat{\mathbf{y}} = \mathbf{S} \mathbf{y}$ for some matrix $S$.

You should imagine what $\mathbf{S}$ looks like.

1. What is the degrees of freedom of KNN, and how does it depend on $k$?
2. How does $k$ affect the bias/variance?

. . .

- $\mathrm{df} = \tr{\mathbf S} = n/k$.
- $k = n$ produces a constant predictor (highest bias, lowest variance).
- $k = 1$ produces a low bias but extremely high variance predictor.
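
As a quick numerical check of the $n/k$ claim (a sketch, not from the original slides, using synthetic 1-D data): build the $n \times n$ KNN smoother matrix $\mathbf S$ explicitly and compute its trace.

```{r}
#| code-fold: true
# Hypothetical sketch: each row of S puts weight 1/k on the k nearest training
# points (including the point itself), so diag(S) = 1/k and tr(S) = n/k.
n <- 100; k <- 5
x <- sort(runif(n))
S <- t(sapply(x, function(x0) {
  w <- numeric(n)
  w[order(abs(x - x0))[1:k]] <- 1 / k
  w
}))
c(trace_S = sum(diag(S)), n_over_k = n / k)
```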

---

```{r}
#| code-fold: true
#| fig-width: 12
#| fig-height: 4
set.seed(406406)
## Local averages
plot_knn <- function(k) {
ggplot(arcuate_unif, aes(position, fa)) +
geom_point(colour = blue) +
geom_line(
data = tibble(
position = seq_range(arcuate_unif$position),
fa = FNN::knn.reg(
arcuate_unif$position, matrix(position, ncol = 1),
y = arcuate_unif$fa,
k = k
)$pred
),
colour = orange, linewidth = 2
) + ggtitle(paste("k =", k))
}
g1 <- plot_knn(1)
g2 <- plot_knn(5)
g3 <- plot_knn(length(arcuate_unif$position))
plot_grid(g1, g2, g3, ncol = 3)
```


## Local averages (soft KNN)

KNN averages the neighbours with equal weight.

But some neighbours are "closer" than other neighbours.

Instead of choosing the number of neighbours to average, we can average
any observations within a certain distance.
@@ -196,18 +232,20 @@ ggplot(arcuate_unif, aes(position, fa)) +
geom_rect(aes(xmin = position[25] - 15, xmax = position[25] + 15, ymin = 0, ymax = .1), fill = green)
```

. . .

The boxes have width 30.
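
Here is a minimal sketch of this local average (not from the original slides; it reuses `arcuate_unif`, `blue`, and `orange` from earlier chunks). Base R's `ksmooth()` with a box kernel and bandwidth 30 averages all observations within $\pm 15$ of each point.

```{r}
#| code-fold: true
# Sketch: boxcar local average with a width-30 window via base R's ksmooth().
box_fit <- with(arcuate_unif,
                ksmooth(position, fa, kernel = "box", bandwidth = 30, x.points = position))
ggplot(arcuate_unif, aes(position, fa)) +
  geom_point(colour = blue) +
  geom_line(data = tibble(position = box_fit$x, fa = box_fit$y),
            colour = orange, linewidth = 2)
```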


## What is a "kernel" smoother?

* The mathematics:

> A kernel is any function $K$ such that for any $u$,
>
> - $K(u) \geq 0$,
> - $\int K(u) du = 1$,
> - $\int uK(u) du = 0$.
* The idea: a kernel takes weighted averages. The kernel function gives the weights.

* The previous example is called the [boxcar]{.secondary} kernel.
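
Putting these together (a sketch of the standard construction, not necessarily the notation used later in the deck): with weights $K\left(\frac{x_i - x}{h}\right)$ for some bandwidth $h$, the kernel smoother at $x$ is the weighted average

$$
\hat f(x) = \frac{\sum_{i=1}^n K\left(\frac{x_i - x}{h}\right) y_i}{\sum_{i=1}^n K\left(\frac{x_i - x}{h}\right)},
$$

and the width-30 boxcar from the previous slide corresponds to $K(u) = \frac{1}{2}\,\mathbf{1}(|u| \leq 1)$ with $h = 15$: nonnegative, integrates to 1, and symmetric about 0.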

@@ -231,11 +269,27 @@ This one gives the same non-zero weight to all points within $\pm 15$ range.

## Other kernels

Most of the time, we don't use the boxcar because the weights are weird.\
[Ideally we would like closer points to have more weight.]{.small}

::: flex
::: w-60
A more common kernel: the Gaussian kernel

$$
K(u) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{u^2}{2\sigma^2}\right)
$$

For the plot, I made $\sigma=7.5$.

Now the weights "die away" for points farther from where we're predicting.\
[(but all nonzero!!)]{.small}

:::

::: w-40

```{r fig.width=8, fig.height=5}
#| code-fold: true
gaussian_kernel <- function(x) dnorm(x, mean = arcuate_unif$position[15], sd = 7.5) * 3
ggplot(arcuate_unif, aes(position, fa)) +
@@ -244,10 +298,8 @@ ggplot(arcuate_unif, aes(position, fa)) +
stat_function(fun = gaussian_kernel, geom = "area", fill = orange)
```


:::
:::
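
A small numerical sketch (not from the original slides; it reuses `arcuate_unif` and the $\sigma = 7.5$ choice above): the smoothed value at one position is just a weighted average, with Gaussian weights that decay with distance but never reach exactly zero.

```{r}
#| code-fold: true
# Sketch: Gaussian-weighted average at a single point. The weights decay fast
# with distance but are never exactly zero (in exact arithmetic).
x0 <- arcuate_unif$position[15]
w <- dnorm(arcuate_unif$position, mean = x0, sd = 7.5)
w <- w / sum(w)                  # normalize the weights
c(fhat_x0 = sum(w * arcuate_unif$fa), smallest_weight = min(w))
```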

## Other kernels

@@ -306,12 +358,12 @@ ggplot(arcuate_unif, aes(position, fa)) +

* It is way more important than which kernel you use.

<!-- * The default kernel in `ksmooth()` is something called 'Epanechnikov': -->

<!-- ```{r} -->
<!-- epan <- function(x) 3/4 * (1 - x^2) * (abs(x) < 1) -->
<!-- ggplot(data.frame(x = c(-2, 2)), aes(x)) + stat_function(fun = epan, colour = green, linewidth = 2) -->
<!-- ``` -->
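
To illustrate the point (a sketch, not from the original slides; it reuses `arcuate_unif`, `blue`, `orange`, and `plot_grid()` from earlier chunks): swapping the kernel at a fixed bandwidth barely changes the fit, while changing the bandwidth changes it dramatically.

```{r}
#| code-fold: true
#| fig-width: 12
#| fig-height: 4
# Sketch: same data and code; only the kernel / bandwidth arguments change.
plot_ksmooth <- function(kern, bw) {
  fit <- with(arcuate_unif,
              ksmooth(position, fa, kernel = kern, bandwidth = bw, x.points = position))
  ggplot(arcuate_unif, aes(position, fa)) +
    geom_point(colour = blue) +
    geom_line(data = tibble(position = fit$x, fa = fit$y), colour = orange, linewidth = 2) +
    ggtitle(paste0(kern, " kernel, bandwidth = ", bw))
}
plot_grid(
  plot_ksmooth("box", 30), plot_ksmooth("normal", 30), plot_ksmooth("normal", 150),
  ncol = 3
)
```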


## Choosing the bandwidth
