From 141c983cb920b03743c8034ba57083b94fccd76d Mon Sep 17 00:00:00 2001
From: Quarto GHA Workflow Runner
Date: Tue, 24 Sep 2024 04:25:55 +0000
Subject: [PATCH] Built site for gh-pages

---
 .nojekyll                                    |  2 +-
 schedule/slides/06-information-criteria.html | 45 +++++----
 search.json                                  | 21 ++---
 sitemap.xml                                  | 96 ++++++++++----------
 4 files changed, 77 insertions(+), 87 deletions(-)

diff --git a/.nojekyll b/.nojekyll
index bb02d9e..a591150 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-87f2f05c
\ No newline at end of file
+57dbfbac
\ No newline at end of file

diff --git a/schedule/slides/06-information-criteria.html b/schedule/slides/06-information-criteria.html
index da4d643..fd61585 100644
--- a/schedule/slides/06-information-criteria.html
+++ b/schedule/slides/06-information-criteria.html
@@ -458,7 +458,7 @@

 LOO-CV: Math to the rescue!
 
 For models where predictions are a linear function of the training responses*,
 
-LOO-CV has a closed-form expression!
+LOO-CV has a closed-form expression! Just need to fit once:

\[\mbox{LOO-CV} \,\, \hat R_n = \frac{1}{n} \sum_{i=1}^n \frac{(Y_i -\widehat{Y}_i)^2}{(1-{\boldsymbol H}_{ii})^2}.\]

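As a sanity check on the identity above (not part of the patch; a plain-NumPy sketch with made-up data and variable names of my choosing), the single-fit shortcut can be verified against brute-force refitting with each observation held out:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design with intercept
Y = X @ rng.normal(size=p + 1) + rng.normal(size=n)

# Hat matrix H = X (X'X)^{-1} X'; fitted values are Y_hat = H Y,
# a linear function of the training responses, as the slide requires.
H = X @ np.linalg.solve(X.T @ X, X.T)
Y_hat = H @ Y

# Shortcut: LOO-CV from a single fit, using the leverages H_ii
loo_shortcut = np.mean(((Y - Y_hat) / (1 - np.diag(H))) ** 2)

# Brute force: refit n times, each time leaving out observation i
errs = []
for i in range(n):
    mask = np.arange(n) != i
    beta = np.linalg.lstsq(X[mask], Y[mask], rcond=None)[0]
    errs.append((Y[i] - X[i] @ beta) ** 2)
loo_brute = np.mean(errs)

assert np.isclose(loo_shortcut, loo_brute)
```

The two agree to numerical precision for OLS, since the leave-one-out residual equals the full-fit residual divided by \(1 - H_{ii}\).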
-Recommendations
-
-When comparing models, choose one criterion: CV / AIC / BIC / Cp / GCV.
-CV is usually easiest to make sense of and doesn’t depend on other unknown parameters.
-But, it requires refitting the model.
-Also, it can be strange in cases with discrete predictors, time series, repeated measurements, graph structures, etc.
-High-level intuition of these:
-  • GCV tends to choose “dense” models.
-  • Theory says AIC chooses the “best predicting model” asymptotically.
-  • Theory says BIC should choose the “true model” asymptotically, tends to select fewer predictors.
-  • In some special cases, AIC = Cp = SURE \(\approx\) LOO-CV
-  • As a technical point, CV (or validation set) is estimating error on new data, unseen \((X_0, Y_0)\), while AIC / Cp are estimating error on new Y at the observed \(x_1,\ldots,x_n\). This is subtle.
+Commentary
+
+  • When comparing models, choose one criterion: CV / AIC / BIC / Cp / GCV.
+      • In some special cases, AIC = Cp = SURE \(\approx\) LOO-CV
+  • CV is generic, easy, and doesn’t depend on unknowns.
+      • But requires refitting, and nontrivial for discrete predictors, time series, etc.
+  • GCV tends to choose “dense” models.
+  • Theory says AIC chooses the “best predicting model” asymptotically.
+  • Theory says BIC chooses the “true model” asymptotically, tends to select fewer predictors.
+  • Technical: CV (or validation set) is estimating error on new data, unseen \((X_0, Y_0)\); AIC / Cp are estimating error on new Y at the observed \(x_1,\ldots,x_n\). This is subtle.