---
title: "INFSCI 2595 Spring 2023 Homework: 08"
subtitle: "Assigned March 21, 2023; Due: March 28, 2023"
author: "Navodita Mathur"
date: "Submission time: March 28, 2023 at 11:00PM EST"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
#### Collaborators
Include the names of your collaborators here.
## Overview
This homework assignment is focused on model complexity and the influence of the prior **regularization** strength. You will fit non-Bayesian and Bayesian linear models, compare them, and make predictions to visualize the trends. You will use multiple prior **strengths** to study the impact on the coefficient posteriors and on the posterior predictive distributions.
You are also introduced to non-Bayesian regularization with Lasso regression via the `glmnet` package. If you do not have `glmnet` installed please download it before starting the assignment.
**IMPORTANT**: The RMarkdown assumes you have downloaded the data set (CSV file) to the same directory you saved the template Rmarkdown file. If you do not have the CSV files in the correct location, the data will not be loaded correctly.
### IMPORTANT!!!
Certain code chunks are created for you. Each code chunk has `eval=FALSE` set in the chunk options. You **MUST** change it to be `eval=TRUE` in order for the code chunks to be evaluated when rendering the document.
You are free to add more code chunks if you would like.
## Load packages
This assignment will use packages from the `tidyverse` suite as well as the `coefplot` package. Those packages are imported for you below.
```{r, load_packages}
library(tidyverse)
library(coefplot)
```
This assignment also uses the `splines` and `MASS` packages. Both are installed with base `R` and so you do not need to download any additional packages to complete the assignment.
The last question in the assignment uses the `glmnet` package. As stated previously, please download and install `glmnet` if you do not currently have it.
## Problem 01
You will fit and compare **6 models** of varying complexity using **non-Bayesian methods**. The unknown parameters will be estimated by finding their Maximum Likelihood Estimates (MLE). You are allowed to use the `lm()` function for this problem.
The data are loaded in the code chunk and a glimpse is shown for you below. There are 2 continuous inputs, `x1` and `x2`, and a continuous response `y`.
```{r, read_data}
hw_file_path <- 'hw08_data.csv'
df <- readr::read_csv(hw_file_path, col_names = TRUE)
df %>% glimpse()
```
### 1a)
**Create a scatter plot between the response, `y`, and each input using `ggplot()`.**
**Based on the visualizations, do you think there are trends between either input and the response?**
#### SOLUTION
```{r, solution_01a}
df %>%
  ggplot(mapping = aes(y = y)) +
  geom_point(mapping = aes(x = x1), color = 'red')
```
```{r, solution_01a_2}
df %>%
  ggplot(mapping = aes(y = y)) +
  geom_point(mapping = aes(x = x2), color = 'green')
```
Based on the visualizations, the trend between each input and the response appears to be parabolic.
### 1b)
You will fit multiple models of varying complexity in this problem. You will start with *linear additive features* which *add* the effect of one input with the other. Your model therefore *controls* for both inputs.
**Fit a model with linear additive features to predict the response, `y`. Use the formula interface and the `lm()` function to fit the model. Assign the result to the `mod01` object.**
**Visualize the coefficient summaries with the `coefplot()` function. Are any of the features statistically significant?**
#### SOLUTION
```{r, solution_01b}
### add more code chunks if you like
mod01 <- lm(y~x1+x2, data=df)
coefplot(mod01)
```
None of the features are considered to be statistically significant.
### 1c)
As discussed in lecture, we can derive features from inputs. We have worked with polynomial features and spline-based features in previous assignments. Features can also be derived as the products between different inputs. A feature calculated as the **product** of multiple inputs is usually referred to as the **interaction** between those inputs.
In the formula interface, a product of two inputs is denoted by the `:`. And so if we want to include just the multiplication of `x1` and `x2` in a model we would type, `x1:x2`. We can then include **main-effect** terms by including the additive features within the formula. Thus, the formula for a model with additive features and the interaction between `x1` and `x2` is:
`y ~ x1 + x2 + x1:x2`
However, the formula interface provides a short-cut to create main effects and interaction features. In the formula interface, the `*` operator will generate all main-effects and all interactions for us.
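As a quick check (using a small, purely hypothetical data frame rather than the homework data), `model.matrix()` confirms that `y ~ x1 * x2` expands into the intercept, both main effects, and the interaction:

```{r}
# Hypothetical toy data, just to inspect the expanded design matrix
d <- data.frame(x1 = c(1, 2, 3), x2 = c(4, 5, 6), y = c(0, 1, 0))

# y ~ x1 * x2 is shorthand for y ~ x1 + x2 + x1:x2
colnames(model.matrix(y ~ x1 * x2, data = d))
# "(Intercept)" "x1" "x2" "x1:x2"
```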
**Fit a model with all main-effect and all-interaction features between `x1` and `x2` using the short-cut `*` operator within the formula interface. Assign the result to the `mod02` object.**
**Visualize the coefficient summaries with the `coefplot()` function. How many features are present in the model? Are any of the features statistically significant?**
#### SOLUTION
```{r, solution_01c}
### add more code chunks if you like
mod02 <- lm(y ~ x1 * x2, data = df)
coefplot(mod02)
```
There are 4 features in the model (including the intercept). `x1` and the `x1:x2` interaction are statistically significant.
### 1d)
The `*` operator will interact more than just inputs. We can interact expressions or groups of features together. To interact one group of features with another group of features, we just need to enclose each group within parentheses, `()`, and separate them by the `*` operator. The line of code below shows how this works with `<expression 1>` and `<expression 2>` as placeholders for any expression we want to use.
`(<expression 1>) * (<expression 2>)`
**Fit a model which interacts linear and quadratic features from `x1` with linear and quadratic features from `x2`. Assign the result to the `mod03` object.**
**Visualize the coefficient summaries with the `coefplot()` function. How many features are present in the model? Are any of the features statistically significant?**
*HINT*: Remember to use the `I()` function when typing polynomials in the formula interface.
#### SOLUTION
```{r, solution_01d}
### add more code chunks if you like
mod03 <- lm(y~(x1+I(x1^2))*(x2+I(x2^2)), data=df)
coefplot(mod03)
```
There are 9 features in the model (including the intercept). The intercept and `I(x2^2)` are statistically significant.
### 1e)
Let's now try a more complicated model.
**Fit a model which interacts linear, quadratic, cubic, and quartic (4th degree) polynomial features from `x1` with linear, quadratic, cubic, and quartic (4th degree) polynomial features from `x2`. Assign the result to the `mod04` object.**
**Visualize the coefficient summaries with the `coefplot()` function. Are any of the features statistically significant?**
#### SOLUTION
```{r, solution_01e}
### add more code chunks if you like
mod04 <- lm(y~(x1+I(x1^2)+I(x1^3)+I(x1^4))*(x2+I(x2^2)+I(x2^3)+I(x2^4)), data=df)
coefplot(mod04)
```
Among all the features, only the intercept is statistically significant.
### 1f)
Let's try using spline based features. We will use a high degree-of-freedom natural spline applied to `x1` and interact those features with polynomial features derived from `x2`.
**Fit a model which interacts a 12 degree-of-freedom (DOF) natural spline from `x1` with linear and quadratic polynomial features from `x2`. Assign the result to `mod05`.**
**Visualize the coefficient summaries with the `coefplot()` function. Are any of the features statistically significant?**
#### SOLUTION
```{r, solution_01f}
### add more code chunks if you like
mod05 <- lm(y~(splines::ns(x1,df = 12))*(x2+I(x2^2)), data=df)
coefplot(mod05)
```
None of the features are considered to be statistically significant.
### 1g)
Let's fit one final model.
**Fit a model which interacts a 12 degree-of-freedom natural spline from `x1` with linear, quadratic, cubic, and quartic (4th degree) polynomial features from `x2`. Assign the result to `mod06`.**
**Visualize the coefficient summaries with the `coefplot()` function. Are any of the features statistically significant?**
#### SOLUTION
```{r, solution_01g}
### add more code chunks if you like
mod06 <- lm(y~(splines::ns(x1, df=12))*(x2+I(x2^2)+I(x2^3)+I(x2^4)), data=df)
coefplot(mod06)
```
None of the features are considered to be statistically significant.
### 1h)
Now that you have fit multiple models of varying complexity, it is time to identify the best performing model.
**Identify the best model considering training set only performance metrics. Which model is best according to R-squared? Which model is best according to AIC? Which model is best according to BIC?**
*HINT*: The `broom::glance()` function can be helpful here. The `broom` package is installed with `tidyverse` and so you should have it already.
#### SOLUTION
```{r, solution_01h}
### add more code chunks if you like
perf_metrics <- function(mod, model_name)
{
broom::glance(mod) %>%
mutate(model_name = model_name)
}
model_list <- list(mod01,mod02,mod03,mod04,mod05,mod06)
model_names <- list("mod01","mod02","mod03","mod04","mod05","mod06")
all_model_metrics <- purrr::map2_dfr(model_list,
model_names,
perf_metrics)
all_model_metrics %>% dplyr::select(model_name,r.squared, adj.r.squared, AIC, BIC)
```
mod06 is the best according to R-squared, and mod06 is also the best according to AIC and BIC.
## Problem 02
Now that you know which model is best, let's visualize the predictive trends from the six models. This will help us better understand their performance and behavior.
### 2a)
You will define a prediction or visualization test grid. This grid will allow you to visualize behavior with respect to `x1` for multiple values of `x2`.
**Create a grid of input values where `x1` consists of 101 evenly spaced points between -3.2 and 3.2 and `x2` is 9 evenly spaced points between -3 and 3. The `expand.grid()` function is started for you and the data type conversion is provided to force the result to be a `tibble`.**
#### SOLUTION
```{r, solution_02a, eval=TRUE}
viz_grid <- expand.grid(x1 = seq(-3.2, 3.2, length.out=101),
x2 = seq(-3, 3, length.out=9),
KEEP.OUT.ATTRS = FALSE,
stringsAsFactors = FALSE) %>%
as.data.frame() %>% tibble::as_tibble()
```
### 2b)
You will make predictions for each of the models and visualize their trends. A function, `tidy_predict()`, is created for you which assembles the predicted mean trend, the confidence interval, and the prediction interval into a `tibble` for you. The result includes the input values to streamline making the visualizations.
```{r, make_tidy_predict_function}
tidy_predict <- function(mod, xnew)
{
pred_df <- predict(mod, xnew, interval = "confidence") %>%
as.data.frame() %>% tibble::as_tibble() %>%
dplyr::select(pred = fit, ci_lwr = lwr, ci_upr = upr) %>%
bind_cols(predict(mod, xnew, interval = 'prediction') %>%
as.data.frame() %>% tibble::as_tibble() %>%
dplyr::select(pred_lwr = lwr, pred_upr = upr))
xnew %>% bind_cols(pred_df)
}
```
The first argument to the `tidy_predict()` function is an `lm()` model object and the second argument is a new or test data frame of inputs. When working with `lm()` and its `predict()` method, the functions will create the test design matrix consistent with the training design basis. They do so via the formula contained within the `lm()` model object. The `lm()` object therefore takes care of the heavy lifting for us!
**Make predictions with each of the six models you fit in Problem 01 using the visualization grid, `viz_grid`. The predictions should be assigned to the variables `pred_lm_01` through `pred_lm_06` where the number is consistent with the model number fit previously.**
#### SOLUTION
```{r, solution_02b, eval=TRUE}
pred_lm_01 <- tidy_predict(mod01, viz_grid)
pred_lm_02 <- tidy_predict(mod02, viz_grid)
pred_lm_03 <- tidy_predict(mod03, viz_grid)
pred_lm_04 <- tidy_predict(mod04, viz_grid)
pred_lm_05 <- tidy_predict(mod05, viz_grid)
pred_lm_06 <- tidy_predict(mod06, viz_grid)
```
### 2c)
You will now visualize the predictive trends and the confidence and prediction intervals for each model. The `pred` column of each `pred_lm_` object is the predictive mean trend. The `ci_lwr` and `ci_upr` columns are the lower and upper bounds of the confidence interval, respectively. The `pred_lwr` and `pred_upr` columns are the lower and upper bounds of the prediction interval, respectively.
You will use `ggplot()` to visualize the predictions. You will use `geom_line()` to visualize the mean trend and `geom_ribbon()` to visualize the uncertainty intervals.
**Visualize the predictions of each model on the visualization grid. Pipe the `pred_lm_` object to `ggplot()` and map the `x1` variable to the x-aesthetic. Add three geometric object layers. The first and second layers are each `geom_ribbon()` and the third layer is `geom_line()`. In the `geom_line()` layer map the `pred` variable to the `y` aesthetic. In the first `geom_ribbon()` layer, map `pred_lwr` and `pred_upr` to the `ymin` and `ymax` aesthetics, respectively. Hard code the `fill` to be orange in the first `geom_ribbon()` layer (outside the `aes()` call). In the second `geom_ribbon()` layer, map `ci_lwr` and `ci_upr` to the `ymin` and `ymax` aesthetics, respectively. Hard code the `fill` to be `grey` in the second `geom_ribbon()` layer (outside the `aes()` call). Include `facet_wrap()` with the facets controlled by the `x2` variable.**
**To help compare the visualizations across models include a `coord_cartesian()` layer with the `ylim` argument set to `c(-7,7)`.**
**Each model's prediction visualization should be created in a separate code chunk.**
#### SOLUTION
Create separate code chunks for each visualization.
```{r}
pred_lm_01 %>%
ggplot(mapping = aes(x = x1)) +
geom_ribbon(mapping = aes(ymin = pred_lwr, ymax = pred_upr), fill = "orange") +
geom_ribbon(mapping = aes(ymin = ci_lwr, ymax = ci_upr), fill = "grey") +
geom_line(mapping = aes(y = pred)) +
coord_cartesian(ylim = c(-7, 7))+
facet_wrap(~x2)
```
```{r}
pred_lm_02%>%
ggplot(mapping = aes(x=x1))+
geom_ribbon(mapping = aes(ymin = pred_lwr, ymax=pred_upr), fill="orange")+
geom_ribbon(mapping = aes(ymin = ci_lwr, ymax=ci_upr), fill="grey")+
geom_line(mapping = aes(y=pred))+
coord_cartesian(ylim=c(-7,7))+
facet_wrap(~x2)
```
```{r}
pred_lm_03%>%
ggplot(mapping = aes(x=x1))+
geom_ribbon(mapping = aes(ymin = pred_lwr, ymax=pred_upr), fill="orange")+
geom_ribbon(mapping = aes(ymin = ci_lwr, ymax=ci_upr), fill="grey")+
geom_line(mapping = aes(y=pred))+
coord_cartesian(ylim=c(-7,7))+
facet_wrap(~x2)
```
```{r}
pred_lm_04%>%
ggplot(mapping = aes(x=x1))+
geom_ribbon(mapping = aes(ymin = pred_lwr, ymax=pred_upr), fill="orange")+
geom_ribbon(mapping = aes(ymin = ci_lwr, ymax=ci_upr), fill="grey")+
geom_line(mapping = aes(y=pred))+
coord_cartesian(ylim=c(-7,7))+
facet_wrap(~x2)
```
```{r}
pred_lm_05%>%
ggplot(mapping = aes(x=x1))+
geom_ribbon(mapping = aes(ymin = pred_lwr, ymax=pred_upr), fill="orange")+
geom_ribbon(mapping = aes(ymin = ci_lwr, ymax=ci_upr), fill="grey")+
geom_line(mapping = aes(y=pred))+
coord_cartesian(ylim=c(-7,7))+
facet_wrap(~x2)
```
```{r}
pred_lm_06%>%
ggplot(mapping = aes(x=x1))+
geom_ribbon(mapping = aes(ymin = pred_lwr, ymax=pred_upr), fill="orange")+
geom_ribbon(mapping = aes(ymin = ci_lwr, ymax=ci_upr), fill="grey")+
geom_line(mapping = aes(y=pred))+
coord_cartesian(ylim=c(-7,7))+
facet_wrap(~x2)
```
### 2d)
**Do you feel the predictions are consistent with the model performance rankings based on AIC/BIC? What is the defining characteristic of the models considered to be the worst by AIC/BIC?**
#### SOLUTION
Yes, the predictions are consistent with the model performance rankings based on AIC/BIC.
The defining characteristic of the models considered to be the worst by AIC/BIC is that they tend to have more parameters than necessary to capture the important relationships between the variables in the model.
## Problem 03
Now that you have fit non-Bayesian linear models with maximum likelihood estimation, it is time to use Bayesian models to understand the influence of the prior on the model behavior.
Regardless of your answers in Problem 02 you will only work with model 3 and model 6 in this problem.
### 3a)
You will perform the Bayesian analysis using the Laplace Approximation just as you did in the previous assignment. You will define the log-posterior function just as you did in the previous assignment and so before doing so you must create the list of required information. This list will include the observed response, the design matrix, and the prior specification. You will use independent Gaussian priors on the regression parameters with a shared prior mean and shared prior standard deviation. You will use an Exponential prior on the unknown likelihood noise (the $\sigma$ parameter).
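Written out, the prior specification described above is:

$$\beta_d \sim \mathcal{N}\left(\mu_{\beta}, \tau_{\beta}^2\right), \quad d = 0, \ldots, D-1, \qquad \sigma \sim \textrm{Exp}\left(\lambda\right)$$

where $\mu_{\beta}$ and $\tau_{\beta}$ are the shared prior mean and prior standard deviation on the regression coefficients, and $\lambda$ is the rate parameter on the noise (the `sigma_rate` field below).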
**Complete the two code chunks below. In the first, create the design matrix following `mod03`'s formula, and assign the object to the `X03` variable. Complete the `info_03_weak` list by assigning the response to `yobs` and the design matrix to `design_matrix`. Specify the shared prior mean, `mu_beta`, to be 0, the shared prior standard deviation, `tau_beta`, as 50, and the rate parameter on the noise, `sigma_rate`, to be 1.**
**Complete the second code chunk with the same prior specification. The second code chunk however requires that you create the design matrix associated with `mod06`'s formula and assign the object to the `X06` variable. Assign `X06` to the `design_matrix` field of the `info_06_weak` list.**
#### SOLUTION
```{r, solution_03a_a, eval=TRUE}
X03 <- model.matrix(y ~ (x1+I(x1^2))*(x2+I(x2^2)), data = df)
info_03_weak <- list(
yobs = df$y,
design_matrix = X03,
mu_beta = 0,
tau_beta = 50,
sigma_rate = 1
)
```
```{r, solution_03a_b, eval=TRUE}
X06 <- model.matrix(y~(splines::ns(x1, df=12))*(x2+I(x2^2)+I(x2^3)+I(x2^4)), data=df)
info_06_weak <- list(
yobs = df$y,
design_matrix = X06,
mu_beta = 0,
tau_beta = 50,
sigma_rate = 1
)
```
### 3b)
You will now define the log-posterior function `lm_logpost()`. You will continue to use the log-transformation on $\sigma$, and so you will actually define the log-posterior in terms of the mean trend $\boldsymbol{\beta}$-parameters and the unbounded noise parameter, $\varphi = \log\left[\sigma\right]$.
The comments in the code chunk below tell you what you need to fill in. The unknown parameters to learn are contained within the first input argument, `unknowns`. You will assume that the unknown $\boldsymbol{\beta}$-parameters are listed before the unknown $\varphi$ parameter in the `unknowns` vector. You must specify the number of $\boldsymbol{\beta}$ parameters programmatically to allow scaling up your function to an arbitrary number of unknowns. You will assume that all variables contained in the `my_info` list (the second argument to `lm_logpost()`) are the same fields in the `info_03_weak` list you defined in Problem 3a).
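Because $\sigma = \exp\left(\varphi\right)$, the transformed density picks up a log-Jacobian (log-derivative) term:

$$\log p\left(\varphi\right) = \log p\left(\sigma\right) + \log \left| \frac{d\sigma}{d\varphi} \right| = \log p\left(\sigma\right) + \varphi$$

which is why the log-posterior adds $\varphi$ on top of the log-likelihood and the log-prior.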
**Define the log-posterior function by completing the code chunk below. You must calculate the mean trend, `mu`, using matrix math between the design matrix and the unknown $\boldsymbol{\beta}$ column vector.**
*HINT*: This function should look very familiar...
#### SOLUTION
```{r, solution_03b, eval=TRUE}
lm_logpost <- function(unknowns, my_info)
{
# specify the number of unknown beta parameters
length_beta <- ncol(my_info$design_matrix)
# extract the beta parameters from the `unknowns` vector
beta_v <- unknowns[1:length_beta]
# extract the unbounded noise parameter, varphi
lik_varphi <- unknowns[length_beta+1]
# back-transform from varphi to sigma
lik_sigma <- exp(lik_varphi)
# extract design matrix
X <- my_info$design_matrix
# calculate the linear predictor
mu <- as.vector(X %*% as.matrix(beta_v))
# evaluate the log-likelihood
log_lik <- sum(dnorm(x= my_info$yobs, mean = as.numeric(mu), sd = lik_sigma, log = TRUE))
# evaluate the log-prior
log_prior_beta <- sum(dnorm(x = beta_v, mean = my_info$mu_beta, sd = my_info$tau_beta, log = TRUE))
log_prior_sigma <- dexp(x= lik_sigma, rate = my_info$sigma_rate, log = TRUE)
# add the mean trend prior and noise prior together
log_prior <- log_prior_beta + log_prior_sigma
# account for the transformation
log_derive_adjust <- lik_varphi
# sum together
return(log_lik+log_prior+log_derive_adjust)
}
```
### 3c)
The `my_laplace()` function is defined for you in the code chunk below. This function executes the Laplace Approximation and returns an object consisting of the posterior mode, posterior covariance matrix, and the log-evidence.
```{r, define_my_laplace_func}
my_laplace <- function(start_guess, logpost_func, ...)
{
  # code adapted from the `LearnBayes` function `laplace()`
fit <- optim(start_guess,
logpost_func,
gr = NULL,
...,
method = "BFGS",
hessian = TRUE,
control = list(fnscale = -1, maxit = 1001))
mode <- fit$par
post_var_matrix <- -solve(fit$hessian)
p <- length(mode)
int <- p/2 * log(2 * pi) + 0.5 * log(det(post_var_matrix)) + logpost_func(mode, ...)
# package all of the results into a list
list(mode = mode,
var_matrix = post_var_matrix,
log_evidence = int,
converge = ifelse(fit$convergence == 0,
"YES",
"NO"),
iter_counts = as.numeric(fit$counts[1]))
}
```
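The `int` term in the function above implements the standard Laplace approximation to the log marginal likelihood (log-evidence). With $P$ unknown parameters, posterior mode $\boldsymbol{\theta}_{mode}$, and posterior covariance matrix $\boldsymbol{\Sigma}$, the approximation is:

$$\log p\left(\mathbf{y}\right) \approx \frac{P}{2} \log\left(2\pi\right) + \frac{1}{2} \log \left| \boldsymbol{\Sigma} \right| + \log p\left(\mathbf{y}, \boldsymbol{\theta}_{mode}\right)$$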
**Execute the Laplace Approximation for the model 3 formulation and the model 6 formulation. Assign the model 3 result to the `laplace_03_weak` object, and assign the model 6 result to the `laplace_06_weak` object. Check that the optimization scheme converged.**
#### SOLUTION
```{r, solution_03c}
### add more code chunks if you like
laplace_03_weak <- my_laplace(rep(1,ncol(info_03_weak$design_matrix) + 1), lm_logpost, info_03_weak)
laplace_03_weak$converge
```
```{r}
laplace_06_weak <- my_laplace(rep(1,ncol(info_06_weak$design_matrix) + 1), lm_logpost, info_06_weak)
laplace_06_weak$converge
```
### 3d)
A function is defined for you in the code chunk below. This function creates a coefficient summary plot in the style of the `coefplot()` function, but uses the Bayesian results from the Laplace Approximation. The first argument is the vector of posterior means, and the second argument is the vector of posterior standard deviations. The third argument is the name of the feature associated with each coefficient.
```{r, make_coef_viz_function}
viz_post_coefs <- function(post_means, post_sds, xnames)
{
tibble::tibble(
mu = post_means,
sd = post_sds,
x = xnames
) %>%
mutate(x = factor(x, levels = xnames)) %>%
ggplot(mapping = aes(x = x)) +
geom_hline(yintercept = 0, color = 'grey', linetype = 'dashed') +
geom_point(mapping = aes(y = mu)) +
geom_linerange(mapping = aes(ymin = mu - 2 * sd,
ymax = mu + 2 * sd,
group = x)) +
labs(x = 'feature', y = 'coefficient value') +
coord_flip() +
theme_bw()
}
```
**Create the posterior summary visualization figure for model 3 and model 6. You must provide the posterior means and standard deviations of the regression coefficients (the $\beta$ parameters). Do NOT include the $\varphi$ parameter. The feature names associated with the coefficients can be extracted from the design matrix using the `colnames()` function.**
#### SOLUTION
```{r, solution_03d_a}
### make the posterior coefficient visualization for model 3
viz_post_coefs(laplace_03_weak$mode[1:9],
               sqrt(diag(laplace_03_weak$var_matrix))[1:9],
               colnames(info_03_weak$design_matrix))
```
```{r, solution_03d_b}
### make the posterior coefficient visualization for model 6
viz_post_coefs(laplace_06_weak$mode[1:65],
               sqrt(diag(laplace_06_weak$var_matrix))[1:65],
               colnames(info_06_weak$design_matrix))
```
### 3e)
**Use the Bayes Factor to identify the better of the models.**
#### SOLUTION
```{r, solution_03e}
### add more code chunks if you like
mod03_evidence <- exp(laplace_03_weak$log_evidence)
mod06_evidence <- exp(laplace_06_weak$log_evidence)
total_evidence <- mod03_evidence + mod06_evidence
mod03_weight <- mod03_evidence / total_evidence
mod06_weight <- mod06_evidence / total_evidence
tibble::tibble(
  w = c(mod03_weight, mod06_weight)
) %>%
  mutate(J = c(3, 6)) %>%
  ggplot(mapping = aes(x = as.factor(J), y = w)) +
  geom_bar(stat = "identity")
```
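One caution: exponentiating the log-evidences directly, as above, can underflow or overflow when the log-evidences are far from zero. A numerically safer sketch (the log-evidence values below are hypothetical, purely for illustration) computes the posterior model weights with the log-sum-exp trick:

```{r}
# Hypothetical log-evidence values for two competing models
log_ev <- c(mod03 = -152.4, mod06 = -148.9)

# Subtract the max before exponentiating so the largest term is exp(0) = 1
m <- max(log_ev)
post_weights <- exp(log_ev - m) / sum(exp(log_ev - m))

post_weights  # the weights sum to 1
```

The Bayes Factor itself is simply the exponentiated difference of the log-evidences, `exp(diff(log_ev))`, which never requires exponentiating either log-evidence on its own.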
### 3f)
You fit the Bayesian models assuming a diffuse or *weak* prior. Let's now try a more informative or *strong* prior by reducing the prior standard deviation on the regression coefficients from 50 to 1. The prior mean will still be zero.
**Complete the first code chunk below, which defines the list of required information for both the model 3 and model 6 formulations using the strong prior on the regression coefficients. All other information, data and the $\sigma$ prior, are the same as before.**
**Run the Laplace Approximation using the strong prior for both the model 3 and model 6 formulations. Assign the results to `laplace_03_strong` and `laplace_06_strong`.**
**Confirm that the optimizations converged for both laplace approximation results.**
#### SOLUTION
Define the lists of required information for the strong prior.
```{r, solution_03f_a, eval=TRUE}
info_03_strong <- list(
yobs = df$y,
design_matrix = X03,
mu_beta = 0,
tau_beta = 1,
sigma_rate = 1
)
info_06_strong <- list(
yobs = df$y,
design_matrix = X06,
mu_beta = 0,
tau_beta = 1,
sigma_rate = 1
)
```
Execute the Laplace Approximation.
```{r, solution_03f_b}
### add more code chunks if you like
laplace_03_strong <- my_laplace(rep(1,ncol(info_03_strong$design_matrix) + 1), lm_logpost, info_03_strong)
laplace_03_strong$converge
```
```{r}
laplace_06_strong <- my_laplace(rep(1,ncol(info_06_strong$design_matrix) + 1), lm_logpost, info_06_strong)
laplace_06_strong$converge
```
### 3g)
**Use the `viz_post_coefs()` function to visualize the posterior coefficient summaries for model 3 and model 6, based on the strong prior specification.**
#### SOLUTION
```{r, solution_03g}
### add more code chunks if you like
viz_post_coefs(laplace_03_strong$mode[1:9],
               sqrt(diag(laplace_03_strong$var_matrix))[1:9],
               colnames(info_03_strong$design_matrix))
```
```{r}
viz_post_coefs(laplace_06_strong$mode[1:65],
               sqrt(diag(laplace_06_strong$var_matrix))[1:65],
               colnames(info_06_strong$design_matrix))
```
### 3h)
You will fit one more set of Bayesian models with a very strong prior on the regression coefficients. The prior standard deviation will be equal to 1/50.
**Complete the first code chunk below, which defines the list of required information for both the model 3 and model 6 formulations using the very strong prior on the regression coefficients. All other information, data and the $\sigma$ prior, are the same as before.**
**Run the Laplace Approximation using the very strong prior for both the model 3 and model 6 formulations. Assign the results to `laplace_03_very_strong` and `laplace_06_very_strong`.**
**Confirm that the optimizations converged for both laplace approximation results.**
#### SOLUTION
```{r, solution_03h_a, eval=TRUE}
info_03_very_strong <- list(
yobs = df$y,
design_matrix = X03,
mu_beta = 0,
tau_beta = 0.02,
sigma_rate = 1
)
info_06_very_strong <- list(
yobs = df$y,
design_matrix = X06,
mu_beta = 0,
tau_beta = 0.02,
sigma_rate = 1
)
```
Execute the Laplace Approximation.
```{r, solution_03h_b}
### add more code chunks if you like
laplace_03_very_strong <- my_laplace(rep(1,ncol(info_03_very_strong$design_matrix) + 1), lm_logpost, info_03_very_strong)
laplace_03_very_strong$converge
```
```{r}
laplace_06_very_strong <- my_laplace(rep(1,ncol(info_06_very_strong$design_matrix) + 1), lm_logpost, info_06_very_strong)
laplace_06_very_strong$converge
```
### 3i)
**Use the `viz_post_coefs()` function to visualize the posterior coefficient summaries for model 3 and model 6, based on the very strong prior specification.**
#### SOLUTION
```{r, solution_03i}
### add more code chunks if you like
viz_post_coefs(laplace_03_very_strong$mode[1:9],
               sqrt(diag(laplace_03_very_strong$var_matrix))[1:9],
               colnames(info_03_very_strong$design_matrix))
```
```{r}
viz_post_coefs(laplace_06_very_strong$mode[1:65],
               sqrt(diag(laplace_06_very_strong$var_matrix))[1:65],
               colnames(info_06_very_strong$design_matrix))
```
### 3j)
**Describe the influence of the regression coefficient prior standard deviation on the coefficient posterior distributions.**
#### SOLUTION
If the regression coefficient prior standard deviation is large, reflecting high uncertainty or weak prior knowledge, the prior is diffuse and the posterior distributions are driven mostly by the data: the posterior means stay close to the maximum likelihood estimates and the posterior uncertainty remains relatively wide.
On the other hand, if the prior standard deviation is small, reflecting strong prior belief that the coefficients are near the prior mean of zero, the posterior distributions are pulled toward the prior: the posterior means shrink toward zero and the posterior standard deviations tighten, so the data have less influence.
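This shrinkage behavior can be illustrated with a minimal conjugate sketch. The example below is hypothetical (a single-parameter normal mean model with synthetic data, not the assignment's regression), but it shows the same precision-weighted compromise between the prior and the data.

```{r}
### hypothetical one-parameter illustration with synthetic data
set.seed(7101)
x_demo <- rnorm(25, mean = 2, sd = 1)
demo_post_mean <- function(tau_beta, mu_beta = 0, sigma = 1)
{
  prior_prec <- 1 / tau_beta^2
  data_prec <- length(x_demo) / sigma^2
  # the posterior mean is a precision-weighted average of the
  # prior mean and the data average
  (prior_prec * mu_beta + data_prec * mean(x_demo)) / (prior_prec + data_prec)
}
# a weak prior (sd = 5) stays near the data average, while the very strong
# prior (sd = 1/50) is pulled almost entirely to the prior mean of zero
c(weak = demo_post_mean(5), strong = demo_post_mean(1), very_strong = demo_post_mean(1/50))
```

With `tau_beta = 1/50` the prior precision is 2500, which dwarfs the data precision of 25, so the posterior mean sits close to zero.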
### 3k)
You previously compared the two models using the Bayes Factor based on the weak prior specification.
**Compare the performance of the two models with Bayes Factors again, but considering the results based on the strong and very strong priors. Does the prior influence which model is considered to be better?**
#### SOLUTION
```{r, solution_03k}
### add more code chunks if you like
mod03_evidence <- exp(laplace_03_strong$log_evidence)
mod06_evidence <- exp(laplace_06_strong$log_evidence)
mod03_weight <- mod03_evidence / (mod03_evidence + mod06_evidence)
mod06_weight <- mod06_evidence / (mod03_evidence + mod06_evidence)
tibble::tibble(
w = c(mod03_weight,mod06_weight)
) %>%
mutate(J = c(3,6)) %>%
ggplot( mapping = aes( x = as.factor(J), y = w)) + geom_bar(stat = "identity")
```
```{r}
mod03_evidence <- exp(laplace_03_very_strong$log_evidence)
mod06_evidence <- exp(laplace_06_very_strong$log_evidence)
mod03_weight <- mod03_evidence / (mod03_evidence + mod06_evidence)
mod06_weight <- mod06_evidence / (mod03_evidence + mod06_evidence)
tibble::tibble(
w = c(mod03_weight,mod06_weight)
) %>%
mutate(J = c(3,6)) %>%
ggplot( mapping = aes( x = as.factor(J), y = w)) + geom_bar(stat = "identity")
```
Yes, the prior does influence which model is considered to be "better". The model evidence integrates the likelihood over the prior, so changing the prior standard deviation changes each model's marginal likelihood, and the Bayes Factor and posterior model weights shift accordingly.
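One caveat worth noting: exponentiating the log-evidences directly can underflow to zero when they are large and negative, which makes the weights `NaN`. A hedged sketch of a stabler computation follows; the `stable_model_weights()` helper is not part of the assignment, it simply applies the usual shift-by-the-maximum trick before exponentiating.

```{r}
### hypothetical helper, not required by the assignment
stable_model_weights <- function(log_evidence_vector)
{
  # subtract the maximum log-evidence before exponentiating so that
  # at least one term equals exp(0) = 1 and the sum cannot underflow
  z <- log_evidence_vector - max(log_evidence_vector)
  exp(z) / sum(exp(z))
}
# yields the same weights as the direct ratio, e.g.:
# stable_model_weights(c(laplace_03_strong$log_evidence,
#                        laplace_06_strong$log_evidence))
```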
## Problem 04
You examined the behavior of the coefficient posterior based on the influence of the prior. Let's now consider the prior's influence by examining the posterior predictive distributions.
### 4a)
You will make posterior predictions following the approach from the previous assignment. Posterior samples are generated and those samples are used to calculate the posterior samples of the mean trend and generate random posterior samples of the response around the mean. In the previous assignment, you made posterior predictions in order to calculate errors. In this assignment, you will not calculate errors, instead you will summarize the posterior predictions of the mean and of the random response.
The `generate_lm_post_samples()` function is defined for you below. It uses the `MASS::mvrnorm()` function to generate posterior samples from the Laplace Approximation's MVN distribution.
```{r, make_lm_post_samples_func}
generate_lm_post_samples <- function(mvn_result, length_beta, num_samples)
{
MASS::mvrnorm(n = num_samples,
mu = mvn_result$mode,
Sigma = mvn_result$var_matrix) %>%
as.data.frame() %>% tibble::as_tibble() %>%
purrr::set_names(c(sprintf("beta_%02d", 0:(length_beta-1)), "varphi")) %>%
mutate(sigma = exp(varphi))
}
```
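As a quick, hedged check of the function's behavior (the `demo_mvn` object below is a made-up stand-in for a Laplace Approximation result, not one of the fitted models): with a near-zero covariance matrix the samples sit essentially at the mode, and the `sigma` column equals `exp(varphi)`.

```{r}
### synthetic stand-in for a Laplace Approximation result
demo_mvn <- list(mode = c(0.5, -1, log(0.5)),
                 var_matrix = diag(1e-8, 3))
generate_lm_post_samples(demo_mvn, length_beta = 2, num_samples = 3)
# columns are beta_00, beta_01, varphi, and sigma, with sigma near 0.5
```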
The code chunk below starts the `post_lm_pred_samples()` function. This function generates posterior mean trend predictions and posterior predictions of the response. The first argument, `Xnew`, is a potentially new or test design matrix that we wish to make predictions at. The second argument, `Bmat`, is a matrix of posterior samples of the $\boldsymbol{\beta}$-parameters, and the third argument, `sigma_vector`, is a vector of posterior samples of the likelihood noise. The `Xnew` matrix has rows equal to the number of prediction points, `M`, and the `Bmat` matrix has rows equal to the number of posterior samples, `S`.
You must complete the function by performing the necessary matrix math to calculate the matrix of posterior mean trend predictions, `Umat`, and the matrix of posterior response predictions, `Ymat`. You must also complete missing arguments to the definition of the `Rmat` and `Zmat` matrices. The `Rmat` matrix replicates the posterior likelihood noise samples the correct number of times. The `Zmat` matrix is the matrix of randomly generated standard normal values. You must correctly specify the required number of rows to the `Rmat` and `Zmat` matrices.
The `post_lm_pred_samples()` returns the `Umat` and `Ymat` matrices contained within a list.
**Perform the necessary matrix math to calculate the matrix of posterior predicted mean trends `Umat` and posterior predicted responses `Ymat`. You must specify the number of required rows to create the `Rmat` and `Zmat` matrices.**
*HINT*: The following code chunk should look familiar...
#### SOLUTION
```{r, solution_04a, eval=TRUE}
post_lm_pred_samples <- function(Xnew, Bmat, sigma_vector)
{
# number of new prediction locations
M <- nrow(Xnew)
# number of posterior samples
S <- nrow(Bmat)
# matrix of linear predictors
Umat <- Xnew %*% t(Bmat)
# assemble the matrix of sigma samples; the number of rows equals
# the number of prediction locations, M
Rmat <- matrix(rep(sigma_vector, M), nrow = M, byrow = TRUE)
# generate standard normal values and assemble into a matrix
# with M rows and S columns
Zmat <- matrix(rnorm(M*S), nrow = M, byrow = TRUE)
# calculate the random observation predictions
Ymat <- Umat + Rmat * Zmat
# package together
list(Umat = Umat, Ymat = Ymat)
}
```
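A small sanity check, using tiny synthetic matrices (the sizes and names below are arbitrary, not the assignment's data): both returned matrices should have `M` rows (prediction locations) and `S` columns (posterior samples).

```{r}
### synthetic dimension check for post_lm_pred_samples()
set.seed(4002)
M_demo <- 4; S_demo <- 3; P_demo <- 2
X_demo <- matrix(rnorm(M_demo * P_demo), nrow = M_demo)
B_demo <- matrix(rnorm(S_demo * P_demo), nrow = S_demo)
pred_demo <- post_lm_pred_samples(X_demo, B_demo, rep(0.5, S_demo))
dim(pred_demo$Umat) # 4 rows, 3 columns
dim(pred_demo$Ymat) # 4 rows, 3 columns
```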
### 4b)
Since this assignment is focused on visualizing the predictions, we will summarize the posterior predictions to focus on the posterior means and the middle 95% uncertainty intervals. The code chunk below is defined for you which serves as a useful wrapper function to call `post_lm_pred_samples()`.
```{r, make_the_lm_pred_func}
make_post_lm_pred <- function(Xnew, post)
{
Bmat <- post %>% select(starts_with("beta_")) %>% as.matrix()
sigma_vector <- post %>% pull(sigma)
post_lm_pred_samples(Xnew, Bmat, sigma_vector)
}
```
The code chunk below defines a function `summarize_lm_pred_from_laplace()` which manages the actions necessary to summarize posterior predictions. The first argument, `mvn_result`, is the Laplace Approximation object. The second object is the test design matrix, `Xtest`, and the third argument, `num_samples`, is the number of posterior samples to make.
You must complete the code chunk below which summarizes the posterior predictions. This function takes care of most of the coding for you. You do not have to worry about the generation of the posterior samples OR calculating the posterior quantiles associated with the middle 95% uncertainty interval. You must calculate the posterior average by deciding on whether you should use `colMeans()` or `rowMeans()` to calculate the average across all posterior samples per prediction location.
**Follow the comments in the code chunk below to complete the definition of the summarize_lm_pred_from_laplace() function. You must calculate the average posterior mean trend and the average posterior response.**
#### SOLUTION
```{r, solution_04b, eval=TRUE}
summarize_lm_pred_from_laplace <- function(mvn_result, Xtest, num_samples)
{
# generate posterior samples of the beta parameters
post <- generate_lm_post_samples(mvn_result, ncol(Xtest), num_samples)
# make posterior predictions on the test set
pred_test <- make_post_lm_pred(Xtest, post)
# calculate summary statistics on the predicted mean and response
# summarize over the posterior samples
# posterior mean, should you summarize along rows (rowMeans) or
# summarize down columns (colMeans) ???
mu_avg <- rowMeans(pred_test$Umat)
y_avg <- rowMeans(pred_test$Ymat)
# posterior quantiles for the middle 95% uncertainty intervals
mu_lwr <- apply(pred_test$Umat, 1, stats::quantile, probs = 0.025)
mu_upr <- apply(pred_test$Umat, 1, stats::quantile, probs = 0.975)
y_lwr <- apply(pred_test$Ymat, 1, stats::quantile, probs = 0.025)
y_upr <- apply(pred_test$Ymat, 1, stats::quantile, probs = 0.975)
# book keeping
tibble::tibble(
mu_avg = mu_avg,
mu_lwr = mu_lwr,
mu_upr = mu_upr,
y_avg = y_avg,
y_lwr = y_lwr,
y_upr = y_upr
) %>%
tibble::rowid_to_column("pred_id")
}
```
### 4c)
When you made predictions in Problem 02, the `lm()` object handled making the test design matrix. However, since we have programmed the Bayesian modeling approach from scratch we need to create the test design matrix manually.
**Create the test design matrix based on the visualization grid, `viz_grid`, using the model 3 formulation. Assign the result to the `X03_test` object.**
**Call the `summarize_lm_pred_from_laplace()` function to summarize the posterior predictions from the model 3 formulation for the weak, strong, and very strong prior specifications. Use 5000 posterior samples for each case. Assign the results from the weak prior to `post_pred_summary_viz_03_weak`, the results from the strong prior to `post_pred_summary_viz_03_strong`, and the results from the very strong prior to `post_pred_summary_viz_03_very_strong`.**
#### SOLUTION
```{r, solution_04c}
### add as many code chunks as you'd like
X03_test <- model.matrix( ~ (x1 + I(x1^2)) * (x2 + I(x2^2)), data = viz_grid)
```
```{r}
post_pred_summary_viz_03_weak <- summarize_lm_pred_from_laplace(laplace_03_weak,X03_test,5000)
```
```{r}
post_pred_summary_viz_03_strong <- summarize_lm_pred_from_laplace(laplace_03_strong,X03_test,5000)
```
```{r}
post_pred_summary_viz_03_very_strong <- summarize_lm_pred_from_laplace(laplace_03_very_strong,X03_test,5000)
```
### 4d)
You will now visualize the posterior predictions from the model 3 Bayesian models associated with the weak, strong, and very strong priors. The `viz_grid` object is joined to the prediction dataframes assuming you have used the correct variable names!
**Visualize the predicted means, confidence intervals, and prediction intervals in the style of those that you created in Problem 02. The confidence interval bounds are `mu_lwr` and `mu_upr` columns and the prediction interval bounds are the `y_lwr` and `y_upr` columns, respectively. The posterior predicted mean of the mean is `mu_avg`.**
**Pipe the result of the joined dataframe into `ggplot()` and make appropriate aesthetics and layers to visualize the predictions with the `x1` variable mapped to the `x` aesthetic and the `x2` variable used as a facet variable.**
#### SOLUTION
```{r, solution_04d_a, eval=TRUE}
post_pred_summary_viz_03_weak %>%
left_join(viz_grid %>% tibble::rowid_to_column("pred_id"),
by = 'pred_id')%>%
ggplot(mapping = aes(x = x1)) +
geom_ribbon(mapping = aes(ymin = y_lwr, ymax = y_upr), fill = "orange") +
geom_ribbon(mapping = aes(ymin = mu_lwr, ymax = mu_upr), fill = "grey") +
geom_line(mapping = aes(y = mu_avg)) +
coord_cartesian(ylim = c(-7, 7))+
facet_wrap(~x2)
```
```{r, solution_04d_b, eval=TRUE}
post_pred_summary_viz_03_strong %>%
left_join(viz_grid %>% tibble::rowid_to_column("pred_id"),
by = 'pred_id')%>%
ggplot(mapping = aes(x = x1)) +
geom_ribbon(mapping = aes(ymin = y_lwr, ymax = y_upr), fill = "orange") +
geom_ribbon(mapping = aes(ymin = mu_lwr, ymax = mu_upr), fill = "grey") +
geom_line(mapping = aes(y = mu_avg)) +
coord_cartesian(ylim = c(-7, 7))+
facet_wrap(~x2)
```
```{r, solution_04d_c, eval=TRUE}
post_pred_summary_viz_03_very_strong %>%
left_join(viz_grid %>% tibble::rowid_to_column("pred_id"),
by = 'pred_id')%>%
ggplot(mapping = aes(x = x1)) +
geom_ribbon(mapping = aes(ymin = y_lwr, ymax = y_upr), fill = "orange") +
geom_ribbon(mapping = aes(ymin = mu_lwr, ymax = mu_upr), fill = "grey") +
geom_line(mapping = aes(y = mu_avg)) +
coord_cartesian(ylim = c(-7, 7))+
facet_wrap(~x2)
```
### 4e)
In order to make posterior predictions for the model 6 formulation you must create a test design matrix consistent with the training set basis. The code chunk below creates a helper function which extracts the interior and boundary knots of a natural spline associated with the training set for you. The first argument, `J`, is the degrees-of-freedom (DOF) of the spline, the second argument, `train_data`, is the training data set. The third argument `xname` is the name of the variable you are applying the spline to. The `xname` argument **must** be provided as a character string.
```{r, make_knots_get_function}
make_splines_training_knots <- function(J, train_data, xname)
{
# extract the input from the training set
x <- train_data %>% select(all_of(xname)) %>% pull()
# create the training basis
train_basis <- splines::ns(x, df = J)
# extract the knots
interior_knots <- as.vector(attributes(train_basis)$knots)
boundary_knots <- as.vector(attributes(train_basis)$Boundary.knots)
# book keeping
list(interior_knots = interior_knots,
boundary_knots = boundary_knots)
}
```
**Create the test design matrix based on the visualization grid, `viz_grid`, using the model 6 formulation. Assign the result to the `X06_test` object. Use the `make_splines_training_knots()` function to get the interior and boundary knots associated with the training set for the `x1` variable to create the test design matrix.**
**Call the `summarize_lm_pred_from_laplace()` function to summarize the posterior predictions from the model 6 formulation for the weak, strong, and very strong prior specifications. Use 5000 posterior samples for each case. Assign the results from the weak prior to `post_pred_summary_viz_06_weak`, the results from the strong prior to `post_pred_summary_viz_06_strong`, and the results from the very strong prior to `post_pred_summary_viz_06_very_strong`.**
*HINT*: The `make_splines_training_knots()` function returns a list! The fields or elements of the list can be accessed via the `$` operator.
#### SOLUTION
```{r, solution_04e}
### add as many code chunks as you'd like
knots <- make_splines_training_knots(12,df,"x1")
# the df argument is unnecessary when the knots are supplied explicitly:
# 11 interior knots imply 12 degrees-of-freedom
X06_test <- model.matrix( ~ splines::ns(x1,
                                        knots = knots$interior_knots,
                                        Boundary.knots = knots$boundary_knots) *
                            (x2 + I(x2^2) + I(x2^3) + I(x2^4)),
                          data = viz_grid)
```
```{r}
post_pred_summary_viz_06_weak <- summarize_lm_pred_from_laplace(laplace_06_weak,X06_test,5000)
```
```{r}
post_pred_summary_viz_06_strong <- summarize_lm_pred_from_laplace(laplace_06_strong,X06_test,5000)
```
```{r}
post_pred_summary_viz_06_very_strong <- summarize_lm_pred_from_laplace(laplace_06_very_strong,X06_test,5000)
```
### 4f)
You will now visualize the posterior predictions from the model 6 Bayesian models associated with the weak, strong, and very strong priors. The `viz_grid` object is joined to the prediction dataframes assuming you have used the correct variable names!
**Visualize the predicted means, confidence intervals, and prediction intervals in the style of those that you created in Problem 02. The confidence interval bounds are `mu_lwr` and `mu_upr` columns and the prediction interval bounds are the `y_lwr` and `y_upr` columns, respectively. The posterior predicted mean of the mean is `mu_avg`.**
**Pipe the result of the joined dataframe into `ggplot()` and make appropriate aesthetics and layers to visualize the predictions with the `x1` variable mapped to the `x` aesthetic and the `x2` variable used as a facet variable.**