Damian Anderson, Whitney Anderson, Erika Ibarra, Bryce Lunceford, Paul Smith, and Sebastian Valencia
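Suppose that we generate data according to some process:

$$y_i = f(x_i) + \varepsilon_i,$$

where $f$ is the underlying function and $\varepsilon_i$ is a noise term.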
- Train a neural network with parameters $\theta$ to predict $y_i$ from $x_i$. Call it $\hat{f}_\theta(x_i)$.
- Let $X$ be a random variable that is distributed like the data points $x_i$.
- Compare the distribution of $f(X) - \hat{f}_\theta(X)$ to the distribution of $\varepsilon_i$.
- If the two distributions are similar, this could explain why double descent is observed.
- We will try this with a bunch of different neural network architectures, functions $f$, and distributions over $\varepsilon_i$.
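
The steps above can be sketched in code. The following is a minimal illustration, not the project's actual setup. It assumes an additive-noise process $y_i = f(x_i) + \varepsilon_i$, a hypothetical target $f(x) = \sin(3x)$, uniform inputs on $[-1, 1]$, and Gaussian noise. A tiny one-hidden-layer tanh network written with the standard library stands in for a real architecture, and the two-sample Kolmogorov-Smirnov statistic is just one possible way to measure how similar the two distributions are. All function names are illustrative.

```python
import math
import random
from bisect import bisect_right


def generate_data(n, f, noise_std, rng):
    """Assumed process: x_i ~ Uniform(-1, 1), y_i = f(x_i) + eps_i, eps_i Gaussian."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    eps = [rng.gauss(0.0, noise_std) for _ in range(n)]
    ys = [f(x) + e for x, e in zip(xs, eps)]
    return xs, ys, eps


def init_params(width, rng):
    """Parameters theta of f_hat(x) = c + sum_j v[j] * tanh(w[j] * x + b[j])."""
    return {
        "w": [rng.gauss(0.0, 1.0) for _ in range(width)],
        "b": [rng.gauss(0.0, 1.0) for _ in range(width)],
        "v": [rng.gauss(0.0, 1.0 / math.sqrt(width)) for _ in range(width)],
        "c": 0.0,
    }


def predict(theta, x):
    """Evaluate the network f_hat_theta at a single input x."""
    return theta["c"] + sum(
        v * math.tanh(w * x + b)
        for w, b, v in zip(theta["w"], theta["b"], theta["v"])
    )


def train(theta, xs, ys, epochs=200, lr=0.02):
    """Per-sample gradient descent on the squared error 0.5 * (f_hat(x) - y)^2."""
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            hidden = [math.tanh(w * x + b) for w, b in zip(theta["w"], theta["b"])]
            err = theta["c"] + sum(v * h for v, h in zip(theta["v"], hidden)) - y
            for j, h in enumerate(hidden):
                # Gradient with respect to the pre-activation of hidden unit j.
                grad_pre = err * theta["v"][j] * (1.0 - h * h)
                theta["v"][j] -= lr * err * h
                theta["w"][j] -= lr * grad_pre * x
                theta["b"][j] -= lr * grad_pre
            theta["c"] -= lr * err
    return theta


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, t) / len(a) - bisect_right(b, t) / len(b))
        for t in a + b
    )


def run_experiment(n=30, width=8, noise_std=0.3, n_fresh=1000, seed=0):
    """Train on noisy data, then compare f(X) - f_hat(X) against fresh noise draws."""
    rng = random.Random(seed)
    f = lambda x: math.sin(3.0 * x)  # hypothetical target function (assumption)
    xs, ys, _ = generate_data(n, f, noise_std, rng)
    theta = train(init_params(width, rng), xs, ys)
    fresh_x = [rng.uniform(-1.0, 1.0) for _ in range(n_fresh)]  # X distributed like x_i
    gaps = [f(x) - predict(theta, x) for x in fresh_x]
    noise = [rng.gauss(0.0, noise_std) for _ in range(n_fresh)]
    return ks_statistic(gaps, noise)
```

Calling `run_experiment()` returns a number between 0 and 1, where smaller values mean the empirical distribution of $f(X) - \hat{f}_\theta(X)$ is closer to that of the noise. Varying `width`, `noise_std`, or the target function in this sketch mirrors the plan to try different architectures, functions, and noise distributions.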