-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathfirstOrderVAR.Rmd
791 lines (653 loc) · 24.6 KB
/
firstOrderVAR.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
---
title: "First Order Vector Autoregression"
author: "Benjamin Christoffersen"
date: "`r Sys.Date()`"
output: html_document
bibliography: bibliography.bib
nocite: |
@kingma14
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
echo = TRUE,
fig.height = 4, fig.width = 7, dpi = 128,
cache.path = "cache/firstOrderVAR-cache/", fig.path = "fig/firstOrderVAR-fig/",
error = FALSE)
options(digits = 4, scipen = 10, width = 70)
```
## Introduction
We will simulate and estimate a first order vector auto-regression model in
this example using the particle filter and smoother. For details see
[this vignette](../vignettes/Particle_filtering.pdf) which can also be found by
calling `vignette("Particle_filtering", package = "dynamichazard")`. The models
we are going to simulate from and estimate are of the form
$$
\begin{split}
y_{it} &\sim g(\cdot\vert\eta_{it}) & \\
\vec{\eta}_t &= X_tR^+\vec{\alpha}_t + Z_t\vec{\beta} + \vec{o}_t \\
\vec{\alpha}_t &= F\vec{\alpha}_{t - 1} + R\vec{\epsilon}_t &
\quad \vec{\epsilon}_t \sim N(\vec{0}, Q) \\
& & \quad \vec{\alpha}_0 \sim N(\vec{a}_0, Q_0)
\end{split}, \qquad
\begin{array}{l} i = 1, \dots, n_t \\ t = 1, \dots, d \end{array}
$$
where the $y_{it}$ is individual $i$'s indicator at time $t$ for whether he dies
between time $(t - 1, t]$. The indicators, $y_{it}$,
are binomial distributed with the complementary log-log link function conditional on
knowing the log times at risk, $\vec{o}_1,\cdots,\vec{o}_d$, covariates, $X_t$ and $Z_t$, and latent states, $\vec{\alpha}_1,\dots,\vec{\alpha}_d$.
The total survival time of individual $i$ is $T_i$ which is
piecewise constant exponentially distributed
conditional on knowing the latent states. Further, we set $Z_t = X_t$ so the
states have a non-zero mean. The true values are
$$
F = \begin{pmatrix}0.9 & 0 \\ 0 & 0.9 \end{pmatrix}, \quad
Q = \begin{pmatrix}0.33^2 & 0 \\ 0 & 0.33^2 \end{pmatrix}, \quad
R = I_2, \quad
\vec{\beta} = (-6.5, -2)^\top
$$
```{r assign_kit_info, echo = FALSE}
git_key <- system("git rev-parse --short HEAD", intern = TRUE)
git_bra <- system("git branch", intern = TRUE)
regexp <- "^(\\*\ )(.+)$"
git_bra <- git_bra[grepl(regexp, git_bra)]
git_bra <- gsub(regexp, "\\2", git_bra)
```
where $I_2$ is the two-dimensional identity matrix and
$\vec{a}_0$ and $Q_0$ are given by the invariant distribution. The unknown
parameters to be estimated is everything but $Q_0$ and $R$ (since we fix $Q_0$ doing the estimation and we set $\vec{a}_0 = (0, 0)^\top$). This
example is run on the git branch "`r git_bra`" with ID "`r git_key`". The code
can be found on
[the github site for the package](https://github.com/boennecd/dynamichazard/tree/master/examples).
All functions which assignments are not shown and are not in the
`dynamichazard` package can be found on the github site.
## Simulation
We start by simulating the data. Feel free to skip this part as the specifications
are given above. First we assign the parameters for the simulation
```{r load_dynam}
library(dynamichazard)
```
```{r assign_sim_params, cache = 1}
n_obs <- 1000L
n_periods <- 200L
Fmat <- matrix(c(.9, 0, 0, .9), 2)
Rmat <- diag(1 , 2)
Qmat <- diag(.33^2, 2)
Q_0 <- get_Q_0(Qmat, Fmat)
beta <- c(-6.5, -2)
```
`get_Q_0` is a function to get the covariance matrix for the invariant distribution.
Then we simulate and plot the latent states
```{r sim_plot_latent}
set.seed(54432125)
betas <- matrix(nrow = n_periods + 1, ncol = 2)
betas[1, ] <- rnorm(2) %*% chol(Q_0)
for(i in 1:n_periods + 1)
betas[i, ] <- Fmat %*% betas[i - 1, ] + drop(rnorm(2) %*% chol(Qmat))
betas <- t(t(betas) + beta)
# plot of latent variables
cols <- c("black", "darkblue")
matplot(betas, type = "l", lty = 1, col = cols)
for(i in 1:2)
abline(h = beta[i], lty = 2, col = cols[i])
```
We simulate the observations as follows
```{r sim_obs}
df <- replicate(n_obs, {
# left-censoring
tstart <- max(0L, sample.int((n_periods - 1L) * 2L, 1) - n_periods + 1L)
# covariates
x <- runif(1, -1, 1)
covars <- c(1, x)
# outcome (stop time and event indicator)
y <- FALSE
for(tstop in (tstart + 1L):n_periods){
fail_time <- rexp(1) / exp(covars %*% betas[tstop + 1L, ])
if(fail_time <= 1){
y <- TRUE
tstop <- tstop - 1L + fail_time
break
}
}
c(tstart = tstart, tstop = tstop, x = x, y = y)
})
df <- data.frame(t(df))
head(df, 10)
```
We left-censor the observations since we otherwise may end up with a low number
of observations towards the end.
### Model without latent variables
We can fit a model without the latent variables (i.e., a constant coefficient
model) as follows
```{r survreg_fit}
surv_fit <- survreg(Surv(tstop - tstart, y) ~ x, df, dist = "exponential")
summary(surv_fit) # signs are flipped
logLik(surv_fit)
```
The signs are flipped as stated in `help("survreg")`. They are though close to
$\vec{\beta}$ as expected. Further, we can compare the log-likelihood with this
models with the log-likelihood approximation we get from the particle filter
in the next section.
## Particle filter and smoother
We start off by using the generalized two-filter smoother from @briers09. We
estimate the parameters with an EM algorithm by calling `PF_EM` as
follows
<!--
knitr::opts_knit$set(output.dir = ".")
knitr::load_cache("brier_far_F", "pf_Brier", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r confs, cache = 1}
n_threads <- 6
# you can replace this with e.g.,
# max(parallel::detectCores(logical = FALSE), 2)))
```
```{r brier_far_F, cache = 1, dependson = c("assign_sim_params", "confs")}
set.seed(30520116)
system.time(pf_Brier <- PF_EM(
Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
Q_0 = diag(1, 2), Q = diag(1, 2), Fmat = matrix(c(.1, 0, 0, .1), 2),
by = 1, type = "VAR", model = "exponential", max_T = n_periods,
control = PF_control(
N_fw_n_bw = 750, N_smooth = 1, N_first = 2000, eps = .001,
method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE,
n_max = 100, smoother = "Brier_O_N_square",
# we add some extra variation to the proposal distributions
nu = 8, covar_fac = 1.1, n_threads = n_threads)))
```
`system.time` is used to show the computation time. The estimated parameters are
```{r brier_ests_far_F}
pf_Brier$F
show_covar <- function(Q){
cat("Standard deviations\n")
print(sqrt(diag(Q)))
cat("Lower correlation matrix\n")
tmp <- cov2cor(Q)
print(tmp[-1, -ncol(tmp)])
}
show_covar(pf_Brier$Q)
pf_Brier$fixed_effects
```
The effective sample size in the smoothing step at each time point is
```{r brier_eff_far_F}
plot(pf_Brier$effective_sample_size$smoothed_clouds, type = "h",
ylim = range(0, pf_Brier$effective_sample_size$smoothed_clouds),
ylab = "Effective sample size")
```
```{r plot_clouds, echo = FALSE}
plot_cloud <- function(
pf_fit, type = "smoothed_clouds", start_at_zero = FALSE){
par_old <- par(no.readonly = TRUE)
on.exit(par(par_old))
# plot random effects
jpeg(tempfile()) # avoid output from the plot
out <- plot(pf_fit, type = type, col = c("black", "blue"))
dev.off()
par(par_old)
# then we use it and add the fixed effects
y <- t(t(out$mean) + pf_fit$fixed_effects)
ylim <- range(betas, y)
x <- if(start_at_zero) 1:nrow(out$mean) - 1 else 1:nrow(out$mean)
matplot(x, t(t(out$mean) + pf_fit$fixed_effects), type = "l",
lty = 1, col = c("black", "blue"), ylim = ylim)
# add actual points
matplot(0:n_periods, betas, type = "l", add = TRUE,
lty = 2, col = c("black", "blue"), ylim = ylim)
# show places where we had a hard time sampling
ess <- pf_fit$effective_sample_size[[type]]
idx <- which(ess <= 100)
for(i in idx - start_at_zero)
abline(v = i, lty = 2)
invisible(ess)
}
```
The smoothed estimates of the latent states looks as follows (the vertical dashed lines are points where
we did not sample well)
```{r brier_plot_far_F}
plot_cloud(pf_Brier)
```
The dashed non-vertical lines are the true curve and the continuous is the smoothed estimate. The
approximate log-likelihoods from the particle filter at each EM iteration are
```{r brier_log_like_far_F}
plot(pf_Brier$log_likes)
```
We can compare the output above with the smoother from @fearnhead10
<!--
knitr::opts_knit$set(output.dir = ".")
knitr::load_cache("fear_far_F", "pf_Fear", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r fear_far_F, cache = 1, dependson = c("assign_sim_params", "confs")}
set.seed(30520116)
system.time(pf_Fear <- PF_EM(
Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
Q_0 = diag(1, 2), Q = diag(1, 2), Fmat = matrix(c(.1, 0, 0, .1), 2),
by = 1, type = "VAR", model = "exponential", max_T = n_periods,
control = PF_control(
N_fw_n_bw = 1000, N_smooth = 2000, N_first = 2000, eps = .001,
method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE,
n_max = 100, smoother = "Fearnhead_O_N",
nu = 8, covar_fac = 1.1, n_threads = n_threads)))
```
The estimates are similar
```{r fear_ests_far_F}
pf_Fear$F
show_covar(pf_Fear$Q)
pf_Fear$fixed_effects
```
The effective sample size are better now though
```{r fear_eff_far_F}
plot(pf_Fear$effective_sample_size$smoothed_clouds, type = "h",
ylim = range(0, pf_Fear$effective_sample_size$smoothed_clouds),
ylab = "Effective sample size")
```
However, there are many options to reduce the computation time for the former smoother as e.g,
mentioned in @briers09 but those are not implemented. The smoothed estimates are
also rather close to what we saw before
```{r fear_plot_far_F}
plot_cloud(pf_Fear)
```
The log-likelihoods are almost the same
```{r fear_log_like_far_F}
plot(pf_Fear$log_likes)
```
The log-likelihood where flat toward the end and not monotonically increasing
```{r fear_log_like_far_F_tail}
plot(tail(pf_Fear$log_likes, 20))
```
This will not happen in an EM algorithm but may happen in an MCEM algorithm due
to the Monte Carlo error. We may want to take a few more EM iterations with
more particles. We can do this as follows
<!--
knitr::opts_knit$set(output.dir = ".")
knitr::load_cache("more_parts", "pf_Fear_more", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r more_parts, cache = 1, dependson = "fear_far_F"}
set.seed(30520116)
pf_Fear_more <- PF_EM(
Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
Q_0 = diag(1, 2), Q = pf_Fear$Q, Fmat = pf_Fear$F,
fixed_effects = pf_Fear$fixed_effects, by = 1, type = "VAR",
model = "exponential", max_T = n_periods,
control = PF_control(
N_fw_n_bw = 5000, N_smooth = 10000, N_first = 5000, eps = .001,
method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE,
n_max = 10, smoother = "Fearnhead_O_N",
nu = 8, covar_fac = 1.1, n_threads = n_threads))
```
The log-likelihoods from these iterations are
```{r more_parts_pf_Fear_log_like}
plot(pf_Fear_more$log_likes)
```
and the final estimates are
```{r more_parts_pf_Fear_show_params}
pf_Fear_more$F
show_covar(pf_Fear_more$Q)
pf_Fear_more$fixed_effects
```
### Averaging
Averaging is sometimes argued for in MCEM algorithm. E.g., see @cappe05
section 11.1.2.2. The idea is to take an average the estimates or sufficient
sufficient statistic after some iteration number. As of this writing,
the present implementation does not allow one to change the number of particles
at each iteration as suggested in @cappe05.
<!--
knitr::opts_knit$set(output.dir = ".")
knitr::load_cache("averaging", "pf_Fear_avg", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r averaging, cache = 1, dependson = c("assign_sim_params", "confs")}
set.seed(30520116)
system.time(pf_Fear_avg <- PF_EM(
Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
Q_0 = diag(1, 2), Q = diag(1, 2), Fmat = matrix(c(.1, 0, 0, .1), 2),
by = 1, type = "VAR", model = "exponential", max_T = n_periods,
control = PF_control(
N_fw_n_bw = 200, N_smooth = 400, N_first = 1000, eps = 1e-5,
method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE,
n_max = 1000L, smoother = "Fearnhead_O_N", averaging_start = 150L,
nu = 8, covar_fac = 1.1, n_threads = n_threads)))
```
```{r show_averaging_res}
pf_Fear_avg$F
show_covar(pf_Fear_avg$Q)
pf_Fear_avg$fixed_effects
plot(pf_Fear_avg$log_likes, type = "l")
```
### Stochastic gradient descent
We will run stochastic gradient descent as in @polyak92 which is described
in @cappe05 [page 414].
<!--
knitr::opts_knit$set(output.dir = ".")
knitr::load_cache("sgd", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r assg_pf_start}
#####
# fit to start from
set.seed(30520116)
pf_start <- PF_EM(
Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
Q_0 = diag(1, 2), Q = diag(1, 2), Fmat = matrix(c(.1, 0, 0, .1), 2),
by = 1, type = "VAR", model = "exponential", max_T = n_periods,
control = PF_control(
N_fw_n_bw = 200, N_smooth = 400, N_first = 1000, eps = 1e-5,
method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE,
n_max = 1L, smoother = "Fearnhead_O_N",
nu = 8, covar_fac = 1.1, n_threads = n_threads))
```
```{r sgd, cache=1}
# Function to perform stochastic gradient descent. Uses Adam algorithm
# suggested by Kingma et al. (2015).
#
# Args:
# object: an object of class PF_EM.
# n_runs: number of iterations.
# ns: number of particles at each iteration.
# debug: TRUE if information should be printed during estimation.
# use_O_n_sq: TRUE if O(N^2) method should be used.
# lr: learning rate.
# mp: decay rate for first moment.
# vp: decay rate for second moment.
#
# Returns:
# list with final estimates, log-likelihood approximations at each iteration,
# and estimates at each iteration.
sgd <- function(object, n_runs, ns, debug = FALSE, use_O_n_sq = FALSE,
lr = .001, mp = .9, vp = .999)
{
#####
# get object to perform computation and matrices and vectors for output
comp_obj <- PF_get_score_n_hess(object, use_O_n_sq = use_O_n_sq)
state <- matrix(
NA_real_, n_runs + 1L, length(object$F) + length(object$Q))
# setup matrix to map from gradient to the state (the covariance part is only
# the lower diagonal)
dF <- length(object$F)
dQ <- NCOL(object$Q)
nQ <- (dQ * (dQ + 1L)) / 2L
K <- matrix(0., ncol(state), length(object$F) + nQ)
K[ 1:dF , 1:dF] <- diag(dF)
# can use e.g., matrixcal instead to get the duplication matrix
K[-(1:dF), -(1:dF)] <- dynamichazard:::.get_dup_mat(dQ)
dfix <- length(object$fixed_effects)
obs <- matrix(
NA_real_, n_runs + 1L, dfix)
state[1, ] <- c(object$F, object$Q)
obs [1, ] <- object$fixed_effects
lls <- rep(NA_real_, n_runs)
mv <- vv <- rep(0., ncol(K) + dfix)
for(i in 1:n_runs){
comp_obj$set_n_particles(N_fw = ns[i], N_first = ns[i])
fw <- comp_obj$run_particle_filter()
lls[i] <- logLik(fw)
score <- comp_obj$get_get_score_n_hess(only_score = TRUE)
sc <- score$score
mv <- mp * mv + (1 - mp) * sc
vv <- vp * vv + (1 - vp) * sc^2
muse <- mv / (1 - mp^i)
vuse <- vv / (1 - vp^i)
direc <- muse / (sqrt(vuse) + 1e-8)
# step-half until we have a valid set a of parameters
n_halfs <- 0L
n_halfs_max <- 25L
lri <- lr
while((n_halfs <- n_halfs + 1L) <= n_halfs_max){
state_i <-
state[i + 1L, ] <- state[i, ] + lri * drop(K %*% direc[-(1:dfix)])
obs_i <-
obs [i + 1L, ] <- obs [i, ] + lri * direc[1:dfix]
lri <- lri / 2
# check that system is stationary
Ftmp <- matrix(state_i[1:dQ^2], dQ, dQ)
if(any(Mod(eigen(Ftmp)$values) >= 1))
next
# set parameters
o <- comp_obj$set_parameters(state = state_i, obs = obs_i)
# Q is positive definite
if(all(eigen(o$Q)$values > 0.))
break
}
if(n_halfs > n_halfs_max)
stop("step halvings failed")
if(debug){
msg <- paste0(sprintf("It %4d Log-likelihood %10.2f (max %10.2f)", i,
lls[i], max(lls, na.rm = TRUE)))
cat(msg, "\n", rep("-", nchar(msg)), "\n", sep = "")
cat(sprintf("Gradient norm %14.4f\n", norm(t(sc))))
print(o$Fmat)
print(o$Q)
print(o$fixed_effects)
cat("\n")
}
}
list(o = o, lls = lls, state = state, obs = obs)
}
# number of iterations
n_runs <- 400L
# number of particles to use
ns <- ceiling(exp(seq(log(100), log(1000), length.out = n_runs)))
out <- sgd(n_runs = n_runs, ns = ns, object = pf_start, lr = .005)
lls <- out$lls
o <- out$o
```
The results are illustrated below
```{r show_sgd_res}
# approximate log-likelihood at each iteration
par(mar = c(5, 4, 1, 1))
plot(lls, xlab = "Iteration", ylab = "Log-likelihood", type = "l")
print(max(lls), digits = 5)
# the estimates
o$Fmat
show_covar(o$Q)
o$fixed_effects
```
We can also use the method from @Poyiadjis11. The present implementation
scales poorly in the number of particles but the method has a variance
that may be linear in the number of time periods instead of at least quadratic.
This is important for long time series.
<!--
knitr::opts_knit$set(output.dir = ".")
knitr::load_cache("O_N_sq_sgd", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r O_N_sq_sgd, cache = 1, dependson = "sgd"}
# number of iterations
n_runs <- 400L
# number of particles to use
ns <- ceiling(exp(seq(log(200), log(1000), length.out = n_runs)))
out <- sgd(n_runs = n_runs, ns = ns, use_O_n_sq = TRUE, object = pf_start,
lr = .005)
lls <- out$lls
o <- out$o
```
The results are illustrated below (the log-likelihood approximations from
the forward particle filter are poor approximations due to few particles).
```{r O_N_sq_show_sgd_res}
# approximate log-likelihood at each iteration
par(mar = c(5, 4, 1, 1))
plot(lls, xlab = "Iteration", ylab = "Log-likelihood", type = "l")
print(max(lls), digits = 5)
# the estimates
o$Fmat
show_covar(o$Q)
o$fixed_effects
```
We can approximate standard errors as follows
<button onclick="show_obs_info()" style="
color: #494949 !important;
background: #ffffff;
padding: 5px;
border: 4px solid #494949 !important;
border-radius: 6px;
display: inline-block;
margin: 5px 0;"
>Show/hide code and result</button>
<script>
function show_obs_info() {
var x = document.getElementById("obsInfo");
if (x.style.display === "none") {
x.style.display = "block";
} else {
x.style.display = "none";
}
}
</script>
<div id="obsInfo" style = "display: none;">
<!--
knitr::opts_knit$set(output.dir = ".")
knitr::load_cache("obs_info_sgd", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r obs_info_sgd, cache = 1, dependson = "O_N_sq_sgd"}
set.seed(24680530)
aprxs <- replicate(10, {
# make approximation
sink(tempfile())
comp_obj <- PF_get_score_n_hess(pf_start, use_O_n_sq = TRUE)
sink()
comp_obj$set_parameters(
state = c(o$Fmat, o$Q), obs = o$fixed_effects)
comp_obj$set_n_particles(N_fw = 1000L, N_first = 10000L)
comp_obj$run_particle_filter()
comp_out <- comp_obj$get_get_score_n_hess()
# return approximate observed information matrix, score and standard erors
list(
obs_info = comp_out$obs_info,
score = comp_out$score,
ses = suppressWarnings(rbind(
full = sqrt(diag(solve(comp_out$obs_info))),
# assuming expected gradient is zero
`w/o gradient` =
sqrt(diag(solve(comp_out$obs_info - tcrossprod(comp_out$score)))))))
}, simplify = FALSE)
```
```{r show_obs_info_sgd}
# may be quite noisy...
aperm(simplify2array(lapply(aprxs, "[[", "ses")), c(1, 3, 2))
# the gradient estimates are not exaclty zero in each run (may seem like it
# has not converged from score below and the `sgd` output)
t(sapply(aprxs, "[[", "score"))
# we can consider expanding averages instead
run_means <- apply(sapply(aprxs, "[[", "score"), 1L, function(x)
sapply(seq_along(x), function(i) mean(x[1:i])))
matplot(run_means, type = "l", xlab = "Elements in expanding mean",
ylab = "score", lty = 1, col = "black")
abline(h = 0, lty = 2)
tail(run_means, 1) # final values
run_means_info <- simplify2array(lapply(aprxs, "[[", "obs_info"))
run_means_info <- apply(run_means_info, 1:2, function(x)
sapply(seq_along(x), function(i) mean(x[1:i])))
run_means_info_se <- apply(run_means_info, 1L, function(x)
sqrt(diag(solve(x))))
matplot(t(run_means_info_se), type = "l", xlab = "Elements in expanding mean",
ylab = "Approximate standard errors", lty = 1, col = "black",
ylim = c(0, max(run_means_info_se, na.rm = TRUE)))
run_means_info_se[, ncol(run_means_info_se)] # final values
```
</div>
### Log-likelihood evaluation
We may question what the log-likelihood is at the true parameters. We can use
the `PF_forward_filter` function to perform approximate log-likelihood evaluation
<!--
knitr::opts_knit$set(output.dir = ".")
knitr::load_cache("pF_forward_filter_true", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r pF_forward_filter_true, cache = 1, dependson = c("assign_sim_params", "confs")}
fw_precise <- PF_forward_filter(
Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE),
N_fw = 100000, N_first = 100000, df, type = "VAR",
model = "exponential", max_T = n_periods, by = 1,
control = PF_control(
N_fw_n_bw = 1, N_smooth = 1, N_first = 1,
method = "AUX_normal_approx_w_cloud_mean",
smoother = "Fearnhead_O_N",
n_threads = n_threads, nu = 8, covar_fac = 1.1),
Fmat = Fmat, a_0 = c(0, 0), Q = Qmat, Q_0 = Q_0, R = diag(1, 2),
fixed_effects = beta)
logLik(fw_precise)
```
The log-likelihood from the final model we estimated is
```{r end_log_like_before}
print(tail(pf_Fear_more$log_likes, 1), digits = 7)
print(tail(pf_Brier$log_likes, 1), digits = 7)
```
or we can get a more precise estimate by calling (though the log-likelihood
is evaluated at the final parameter estimates and not the iteration one
step prior as above)
```{r end_log_like_before_precise, cache = 1, dependson = "fear_far_F"}
print(
logLik(PF_forward_filter(pf_Fear_more, N_fw = 20000, N_first = 20000)),
digits = 7)
```
A question is what happens if we start at the true parameter values? We may expect that
we only take a few EM iterations and end up at the MLE or another local
maximum not to far from the true parameters
<!--
knitr::opts_knit$set(output.dir = ".")
tmp <- knitr::load_cache("fear_true", "pf_Fear_close", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r fear_true, cache = 1, dependson = c("assign_sim_params", "confs")}
set.seed(30520116)
pf_Fear_close <- PF_EM(
Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
Fmat = Fmat, a_0 = c(0, 0), Q = Qmat, Q_0 = Q_0, fixed_effects = beta,
by = 1, type = "VAR", model = "exponential", max_T = n_periods,
control = PF_control(
N_fw_n_bw = 500, N_smooth = 2000, N_first = 2000, eps = .001,
method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE,
n_max = 100, smoother = "Fearnhead_O_N",
nu = 8, covar_fac = 1.1, n_threads = n_threads))
```
The final estimates are
```{r fear_true_log_like_ests}
pf_Fear_close$F
show_covar(pf_Fear_close$Q)
pf_Fear_close$fixed_effects
```
The log-likelihoods at the EM iterations look as follows
```{r fear_true_log_like}
plot(pf_Fear_close$log_likes, type = "l")
```
### Restricted model
The model can be written as in terms of 2 parameters for the state model by
writing it as
$$
F = \begin{pmatrix}
\theta & 0 \\
0 & \theta
\end{pmatrix}, \quad
Q = \begin{pmatrix}
\exp(2\psi) & 0 \\
0 & \exp(2\psi)
\end{pmatrix}
$$
We can estimate the restricted model as follows (see the vignette mentioned in
the beginning for details about the arguments to `PF_EM`).
<!--
knitr::opts_knit$set(output.dir = ".")
knitr::load_cache("est_restrict", "pf_Fear_restrict", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r est_restrict, cache = 1}
G <- matrix(0, 2^2, 1)
G[1, 1] <- G[4, 1] <- 1
J <- matrix(1, 2, 1)
K <- matrix(nrow = 2 * (2 - 1) / 2, ncol = 0)
pf_Fear_restrict <- PF_EM(
Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
Q_0 = diag(1, 2), G = G, J = J, K = K, theta = .1, psi = 0, phi = numeric(),
by = 1, type = "VAR", model = "exponential", max_T = n_periods,
control = PF_control(
N_fw_n_bw = 500, N_smooth = 2000, N_first = 2000, eps = .001,
method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE,
n_max = 100, smoother = "Fearnhead_O_N",
nu = 8, covar_fac = 1.1, n_threads = n_threads))
```
The final estimates are
```{r est_restrict_show_ests}
pf_Fear_restrict$F
show_covar(pf_Fear_restrict$Q)
pf_Fear_restrict$fixed_effects
```
and a log-likelihood approximation is
```{r pf_Fear_restrict_ll, cache = 1, dependson = "est_restrict"}
print(
logLik(PF_forward_filter(pf_Fear_restrict, N_fw = 20000, N_first = 20000)),
digits = 7)
```
We have $2^2 + 2(2 + 1) / 2 - 4 = 3$ fewer parameters so the difference in
log-likelihood to the full model seems reasonable.
## References