diff --git a/articles/chapter-10.html b/articles/chapter-10.html
index 16e5828..dec7e55 100644
--- a/articles/chapter-10.html
+++ b/articles/chapter-10.html
@@ -104,55 +104,250 @@
-This chapter is under construction.
 library(alda)
 library(dplyr)
-#> 
-#> Attaching package: 'dplyr'
-#> The following objects are masked from 'package:stats':
-#> 
-#>     filter, lag
-#> The following objects are masked from 'package:base':
-#> 
-#>     intersect, setdiff, setequal, union
-
-library(tidyr)
+library(tidyr)
+library(purrr)
 library(ggplot2)
+library(patchwork)
 library(survival)
 library(broom)
-

-10.1 The Life Table
+10.1 The life table

-Table 10.1, page 327:
+

+In Section 10.1 Singer and Willett (2003) introduce the life
+table—the primary tool for summarizing the sample distribution of event
+occurrence—using a subset of data from Singer (1993), who measured how
+many years 3941 special educators, newly hired in Michigan between 1972
+and 1978, stayed in teaching. Teachers were followed for up to 13 years
+or until they stopped teaching in the state.

+

For this example we return to the teachers data set +introduced in Chapter 9, a person-level data frame with 3941 rows and 3 +columns:

+• id: Teacher ID.
+• years: The number of years between a teacher’s dates of hire and
+  departure from the Michigan public schools.
+• censor: Censoring status.
+teachers
+#> # A tibble: 3,941 × 3
+#>    id    years censor
+#>    <fct> <dbl>  <dbl>
+#>  1 1         1      0
+#>  2 2         2      0
+#>  3 3         1      0
+#>  4 4         1      0
+#>  5 5        12      1
+#>  6 6         1      0
+#>  7 7        12      1
+#>  8 8         1      0
+#>  9 9         2      0
+#> 10 10        2      0
+#> # ℹ 3,931 more rows
+

As Singer and Willett (2003) discuss, a life table tracks the event +histories of a sample of individuals over a series of contiguous +intervals—from the beginning of time through the end of data +collection—by including information on the number of individuals +who:

+• entered each interval (the risk set);
+• experienced the target event during the interval;
+• were censored at the end of the interval.

+We can construct a life table either “by hand”, by first converting
+the person-level data set to a person-period data set and then
+cross-tabulating the time period and
+event-indicator variables; or with a prepackaged
+routine. Because we will construct a life table “by hand” in
+Section 10.5, here we demonstrate the prepackaged approach.

+

Conceptually, the life table is simply the tabular form of a +survival function (see Section 10.2); thus, an easy way +to construct a life table is to first fit a survival function to the +person-level data set, then use the summary of the fit as a starting +point to construct the remainder of the table.

+

+We can fit a survival function using the survfit()
+function from the survival package. The model formula
+for the survfit() function takes the form
+response ~ terms, where the response must be a “survival
+object” created by the Surv() function. For right-censored
+data, the survival object can be created by supplying two unnamed
+arguments to the Surv() function corresponding to
+time and event variables, in that order. Note
+that we can recode a censor variable into an
+event variable by reversing its values. For 0-1 coded data,
+we can write the event status as event = 1 - censor.

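+To make the survival object concrete, here is a small added sketch (not
+from the original text) printing its first few elements; censored
+observations are flagged with a +:
+
+with(teachers, Surv(years, 1 - censor))[1:6]
+#> [1]  1   2   1   1  12+  1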
-# A life table is the tabular form of a survival curve, so begin by fitting a
-# Kaplan-Meir curve to the data.
-teachers_fit <- survfit(Surv(years, 1 - censor) ~ 1, data = teachers)
+teachers_fit <- survfit(Surv(years, 1 - censor) ~ 1, data = teachers)
 
-table_10.1 <- teachers_fit |>
-  # Add a starting time (time 0) for the table.
+summary(teachers_fit)
+#> Call: survfit(formula = Surv(years, 1 - censor) ~ 1, data = teachers)
+#> 
+#>  time n.risk n.event survival std.err lower 95% CI upper 95% CI
+#>     1   3941     456    0.884 0.00510        0.874        0.894
+#>     2   3485     384    0.787 0.00652        0.774        0.800
+#>     3   3101     359    0.696 0.00733        0.682        0.710
+#>     4   2742     295    0.621 0.00773        0.606        0.636
+#>     5   2447     218    0.566 0.00790        0.550        0.581
+#>     6   2229     184    0.519 0.00796        0.504        0.535
+#>     7   2045     123    0.488 0.00796        0.472        0.504
+#>     8   1642      79    0.464 0.00800        0.449        0.480
+#>     9   1256      53    0.445 0.00811        0.429        0.461
+#>    10    948      35    0.428 0.00827        0.412        0.445
+#>    11    648      16    0.418 0.00848        0.401        0.435
+#>    12    391       5    0.412 0.00870        0.396        0.430
+

Next, we’ll collect the summary information from the +survfit object into a tibble using the tidy() +function from the broom package. For now we will +exclude any statistical summaries from the life table, focusing +exclusively on columns related to the event histories of the +teachers data. Note also that the summary information from +the survfit object starts at the time of the first event, +not the “beginning of time”. We can add a “beginning of time” to the +survfit object using the survfit0() function +from the survival package, which (by default) adds a starting time of 0 +to the life table.

+
+teachers_lifetable <- teachers_fit |>
   survfit0() |>
   tidy() |>
-  # The summary of the fit gives most of what we want, but to match Table 10.1
-  # we need to do a little more wrangling.
-  select(-c(std.error:conf.low)) |>
+  select(-c(estimate:conf.low)) |>
+  mutate(interval = paste0("[", time, ", ", time + 1, ")"), .after = time) |>
+  rename(year = time)
+
+teachers_lifetable
+#> # A tibble: 13 × 5
+#>     year interval n.risk n.event n.censor
+#>    <dbl> <chr>     <dbl>   <dbl>    <dbl>
+#>  1     0 [0, 1)     3941       0        0
+#>  2     1 [1, 2)     3941     456        0
+#>  3     2 [2, 3)     3485     384        0
+#>  4     3 [3, 4)     3101     359        0
+#>  5     4 [4, 5)     2742     295        0
+#>  6     5 [5, 6)     2447     218        0
+#>  7     6 [6, 7)     2229     184        0
+#>  8     7 [7, 8)     2045     123      280
+#>  9     8 [8, 9)     1642      79      307
+#> 10     9 [9, 10)    1256      53      255
+#> 11    10 [10, 11)    948      35      265
+#> 12    11 [11, 12)    648      16      241
+#> 13    12 [12, 13)    391       5      386
+

As Singer and Willett (2003) discuss, we interpret the columns of the +life table as follows:

+• year and interval identify each time period [t, t + 1).
+• n.risk counts the individuals at risk of experiencing the event at
+  the start of the interval.
+• n.event counts the individuals who experienced the event during the
+  interval.
+• n.censor counts the individuals who were censored at the end of the
+  interval.

Importantly, notice that once an individual experiences the target +event or is censored during an interval, they drop out of the risk set +in all future intervals; thus, the risk set is inherently +irreversible.

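+As a quick check of this irreversibility (an added sketch using the
+teachers_lifetable computed above), each interval’s risk set should
+equal the previous interval’s risk set less that interval’s events and
+censored cases:
+
+with(
+  teachers_lifetable,
+  all(head(n.risk - n.event - n.censor, -1) == tail(n.risk, -1))
+)
+#> [1] TRUE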
+
+
+

+10.2 A framework for characterizing the distribution of
+discrete-time event occurrence data

+

+In Section 10.2 Singer and Willett (2003) introduce three statistics
+for summarizing the event history information of the life table, each of
+which can be estimated directly from the life table:

+• the discrete-time hazard probability;
+• the survival probability;
+• the median lifetime.
+

+Using the life table to estimate hazard probability, survival
+probability, and median lifetime

+

+First, we estimate the discrete-time hazard function and the survival
+function. Note the use of if-else statements to provide preset values
+for the “beginning of time”, which by definition will always be NA
+for the discrete-time hazard function and 1 for the
+survival function.

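+In symbols, restating what the code below computes:
+
+\[
+\hat h(t_j) = \frac{n \text{ events}_j}{n \text{ at risk}_j},
+\qquad
+\hat S(t_j) = \prod_{i=1}^{j} \big(1 - \hat h(t_i) \big),
+\]
+
+with \(\hat S(t_0) = 1\) at the beginning of time.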
+
+teachers_lifetable <- teachers_lifetable |>
   mutate(
-    interval = paste0("[", time, ", ", time + 1, ")"),
-    haz.estimate = n.event / n.risk
-  ) |>
-  rename(year = time, surv.estimate = estimate) |>
-  relocate(
-    year, interval, n.risk, n.event, n.censor, haz.estimate, surv.estimate
+    haz.estimate = if_else(year != 0, n.event / n.risk, NA),
+    surv.estimate = if_else(year != 0, 1 - haz.estimate, 1),
+    surv.estimate = cumprod(surv.estimate)
   )
 
-table_10.1
+# Table 10.1, page 327:
+teachers_lifetable
 #> # A tibble: 13 × 7
 #>     year interval n.risk n.event n.censor haz.estimate surv.estimate
 #>    <dbl> <chr>     <dbl>   <dbl>    <dbl>        <dbl>         <dbl>
-#>  1     0 [0, 1)     3941       0        0       0              1    
+#>  1     0 [0, 1)     3941       0        0      NA              1    
 #>  2     1 [1, 2)     3941     456        0       0.116          0.884
 #>  3     2 [2, 3)     3485     384        0       0.110          0.787
 #>  4     3 [3, 4)     3101     359        0       0.116          0.696
@@ -165,154 +360,469 @@ 10.1 The Life Table
 #> 11    10 [10, 11)    948      35      265       0.0369         0.428
 #> 12    11 [11, 12)    648      16      241       0.0247         0.418
 #> 13    12 [12, 13)    391       5      386       0.0128         0.412

-10.2 A Framework for Characterizing the Distribution of
-Discrete-Time Event Occurrence Data

-Figure 10.1, page 333:
-ggplot(table_10.1, aes(x = year, y = haz.estimate)) +
-  geom_line() +
-  scale_x_continuous(breaks = 0:13, limits = c(1, 13)) +
-  scale_y_continuous(breaks = c(0, .05, .1, .15), limits = c(0, .15)) +
-  coord_cartesian(xlim = c(0, 13))
-#> Warning: Removed 1 row containing missing values or values outside the scale range
-#> (`geom_line()`).
-

-
-
-# First interpolate median lifetime
-median_lifetime <- table_10.1 |>
-  # Get the row indices for the first survival estimate immediately below and
-  # immediately above 0.5. This will only work correctly if the values are in
-  # descending order, otherwise min() and max() must be swapped. By default, the
-  # survival estimates are in descending order, however, I've added the
-  # redundant step of ensuring they are here for demonstration purposes.
-  arrange(desc(surv.estimate)) |>
-  slice(min(which(surv.estimate <= .5)), max(which(surv.estimate >= .5))) |>
-  select(year, surv.estimate) |>
-  # Linearly interpolate between the two values of the survival estimates that
-  # bracket .5 following Miller's (1981) equation.
+

+Next, we estimate the median lifetime. Here we use the slice()
+function from the dplyr package to select the time
+intervals immediately before and after the median lifetime, then do a
+bit of wrangling to make the median lifetime equation (shown below)
+easier and clearer to apply.

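+In symbols, the interpolation (restated from the code below) is
+
+\[
+\text{Estimated median lifetime} = t_{\text{before}} +
+  \left( \frac{\hat S(t_{\text{before}}) - .5}
+              {\hat S(t_{\text{before}}) - \hat S(t_{\text{after}})} \right)
+  \big( t_{\text{after}} - t_{\text{before}} \big),
+\]
+
+where \(t_{\text{before}}\) and \(t_{\text{after}}\) are the time
+intervals whose survival probabilities bracket .5.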
+
+teachers_median_lifetime <- teachers_lifetable |>
+  slice(max(which(surv.estimate >= .5)), min(which(surv.estimate <= .5))) |>
+  mutate(m = c("before", "after")) |>
+  select(m, year, surv = surv.estimate) |>
+  pivot_wider(names_from = m, values_from = c(year, surv)) |>
   summarise(
-    year =
-      min(year) +
-      ((max(surv.estimate) - .5) /
-       (max(surv.estimate) - min(surv.estimate))) *
-      ((min(year) + 1) - min(year)),
-    surv.estimate = .5
+    surv.estimate = .5,
+    year = year_before
+      + ((surv_before - .5) / (surv_before - surv_after))
+      * (year_after - year_before)
   )
+  
+teachers_median_lifetime
+#> # A tibble: 1 × 2
+#>   surv.estimate  year
+#>           <dbl> <dbl>
+#> 1           0.5  6.61
+

A valuable way of examining these statistics is to plot their +trajectories over time.

+
+teachers_haz <- ggplot(teachers_lifetable, aes(x = year, y = haz.estimate)) +
+  geom_line() +
+  scale_x_continuous(breaks = 0:13) +
+  coord_cartesian(xlim = c(0, 13), ylim = c(0, .15))
 
-ggplot(table_10.1, aes(x = year, y = surv.estimate)) +
+teachers_surv <- ggplot(teachers_lifetable, aes(x = year, y = surv.estimate)) +
   geom_line() +
   geom_segment(
-    aes(xend = year, y = 0, yend = .5), data = median_lifetime, linetype = 2
+    aes(xend = year, y = 0, yend = .5),
+    data = teachers_median_lifetime,
+    linetype = 2
   ) +
   geom_segment(
-    aes(xend = 0, yend = .5), data = median_lifetime, linetype = 2
+    aes(xend = 0, yend = .5),
+    data = teachers_median_lifetime,
+    linetype = 2
   ) +
   scale_x_continuous(breaks = 0:13) +
-  scale_y_continuous(breaks = c(0, .5, 1), limits = c(0, 1)) +
-  coord_cartesian(xlim = c(0, 13))
-

+  scale_y_continuous(breaks = c(0, .5, 1)) +
+  coord_cartesian(xlim = c(0, 13))
+
+# Figure 10.1, page 333:
+teachers_haz + teachers_surv + plot_layout(ncol = 1, axes = "collect")
+

+

When examining plots like these, Singer and Willett (2003) recommend +looking for patterns in and between the trajectories to answer questions +like:

+
+• What is the overall shape of the hazard function?
+• When are the time periods of high and low risk?
+• Are time periods with elevated risk likely to affect large or small
+  numbers of people, given the value of the survivor function?
-

-10.3 Developing Intuition About Hazard Functions, Survivor
-Functions, and Median Lifetimes
+10.3 Developing intuition about hazard functions, survivor
+functions, and median lifetimes

-

Figure 10.2, page 340:

-
-relapse_fit <- survfit(Surv(weeks, 1 - censor) ~ 1, data = cocaine_relapse_1)
-relapse_tidy <- tidy(relapse_fit)
-relapse_summary <- glance(relapse_fit)
+

In Section 10.3 Singer and Willett (2003) examine and describe the +estimated discrete-time hazard functions, survivor functions, and median +lifetimes from four studies that differ by their type of target event, +metric for clocking time, and underlying profile of risk:

+• cocaine_relapse_1: weeks to cocaine relapse after treatment.
+• first_sex: grade of first sexual intercourse.
+• suicide_ideation: age of first suicide ideation.
+• congresswomen: tenure in the U.S. House of Representatives, measured
+  in terms.

We can plot the discrete-time hazard functions, survivor functions, +and median lifetimes from each of these four studies in a single call +using the pmap() function from the purrr +package.

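+If pmap() is unfamiliar: it iterates over several lists in parallel,
+passing one element from each list to the function on every call. A
+minimal sketch, not from the original article:
+
+pmap(
+  list(c(1, 2), c(10, 20)),
+  \(x, y) x + y
+)
+#> [[1]]
+#> [1] 11
+#> 
+#> [[2]]
+#> [1] 22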
+
+study_plots <- pmap(
+  list(
+      list("cocaine_relapse_1", "first_sex", "suicide_ideation", "congresswomen"),
+      list(cocaine_relapse_1, first_sex, suicide_ideation, congresswomen),
+      list("weeks", "grade", "age", "terms"),
+      list(0, 6, 5, 0)
+  ),
+  \(.title, .study, .time, .beginning) {
+    
+    # Get life table statistics.
+    study_fit <- survfit(Surv(.study[[.time]], 1 - censor) ~ 1, data = .study)
+    
+    study_lifetable <- study_fit |>
+      survfit0(start.time = .beginning) |>
+      tidy() |>
+      rename(surv.estimate = estimate) |>
+      mutate(haz.estimate = if_else(time != .beginning, n.event / n.risk, NA))
+    
+    study_median_lifetime <- study_lifetable |>
+      slice(max(which(surv.estimate >= .5)), min(which(surv.estimate <= .5))) |>
+      mutate(m = c("before", "after")) |>
+      select(m, time, surv = surv.estimate) |>
+      pivot_wider(names_from = m, values_from = c(time, surv)) |>
+      summarise(
+        surv.estimate = .5,
+        time = time_before
+          + ((surv_before - .5) / (surv_before - surv_after))
+          * (time_after - time_before)
+      )
+    
+    # Plot discrete-time hazard and survival functions.
+    study_haz <- ggplot(study_lifetable, aes(x = time, y = haz.estimate)) +
+      geom_line() +
+      xlab(.time)
+    
+    study_surv <- ggplot(study_lifetable, aes(x = time, y = surv.estimate)) +
+      geom_line() +
+      geom_segment(
+        aes(xend = time, y = 0, yend = .5),
+        data = study_median_lifetime,
+        linetype = 2
+      ) +
+      geom_segment(
+        aes(xend = .beginning, yend = .5),
+        data = study_median_lifetime,
+        linetype = 2
+      ) +
+      xlab(.time)
+    
+    wrap_elements(panel = (study_haz | study_surv)) + ggtitle(.title)
+  }
+)
+
+# Figure 10.2, page 340:
+wrap_plots(study_plots, ncol = 1)
+

+

+Focusing on the overall shape of the discrete-time hazard functions,
+and contextualizing their shape against their respective survival
+functions, Singer and Willett (2003) make the following observations:

+
-

10.4 Quantifying the Effects of Sampling Variation +

10.4 Quantifying the effects of sampling variation

-

Table 10.2, page 349:

-
-summary(teachers_fit)
-#> Call: survfit(formula = Surv(years, 1 - censor) ~ 1, data = teachers)
-#> 
-#>  time n.risk n.event survival std.err lower 95% CI upper 95% CI
-#>     1   3941     456    0.884 0.00510        0.874        0.894
-#>     2   3485     384    0.787 0.00652        0.774        0.800
-#>     3   3101     359    0.696 0.00733        0.682        0.710
-#>     4   2742     295    0.621 0.00773        0.606        0.636
-#>     5   2447     218    0.566 0.00790        0.550        0.581
-#>     6   2229     184    0.519 0.00796        0.504        0.535
-#>     7   2045     123    0.488 0.00796        0.472        0.504
-#>     8   1642      79    0.464 0.00800        0.449        0.480
-#>     9   1256      53    0.445 0.00811        0.429        0.461
-#>    10    948      35    0.428 0.00827        0.412        0.445
-#>    11    648      16    0.418 0.00848        0.401        0.435
-#>    12    391       5    0.412 0.00870        0.396        0.430
-
-
-teachers_fit |>
-  tidy() |>
+

+In Section 10.4 Singer and Willett (2003) return to the
+teachers data to discuss standard errors for the estimated
+discrete-time hazard probabilities and survival probabilities, which can
+also be estimated directly from the life table:

+
+• Because the estimated discrete-time hazard probability is simply
+  a sample proportion, its standard error in the \(j\)th time period can
+  be estimated using the usual formula for the standard error of a
+  proportion (a worked check follows this list):
+
+  \[
+  se \big(\hat h(t_j) \big) =
+    \sqrt{\frac{\hat h(t_j) \big(1 - \hat h(t_j) \big)}{n \text{ at risk}_j}}.
+  \]
+
+• For risk sets greater than size 20, the standard error of the
+  survival probability in the \(j\)th time period can be estimated using
+  Greenwood’s approximation:
+
+  \[
+  se \big(\hat S(t_j) \big) =
+    \hat S(t_j) \sqrt{
+      \frac{\hat h(t_1)}{n \text{ at risk}_1 \big(1 - \hat h(t_1) \big)} +
+      \frac{\hat h(t_2)}{n \text{ at risk}_2 \big(1 - \hat h(t_2) \big)} +
+      \cdots +
+      \frac{\hat h(t_j)}{n \text{ at risk}_j \big(1 - \hat h(t_j) \big)}
+    }.
+  \]
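+As a worked check of the proportion formula (an added sketch using the
+year 1 values from Table 10.1):
+
+# Year 1: 456 events among 3941 teachers at risk. Returns approximately
+# 0.0051, matching haz.std.error for year 1 in the table below.
+sqrt((456 / 3941) * (1 - 456 / 3941) / 3941)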

We estimate these standard errors here using the +teachers_lifetable from Section 10.2.

+
+# Table 10.2, page 349:
+teachers_lifetable |>
+  filter(year != 0) |>
   mutate(
-    # The tidy() method for survfit objects returns the standard error for the
-    # cumulative hazard instead of the survival probability. Multiplying the
-    # survival estimate with the cumulative hazard's standard error will return
-    # the standard error for the survival probability. Note that it is unlikely
-    # the tidy() method will ever change to return the the standard error for
-    # the survival probability instead. See:
-    # - https://github.com/tidymodels/broom/pull/1162
-    # Other transformations of the survival probability can be found here:
-    # - https://stat.ethz.ch/pipermail/r-help/2014-June/376247.html
-    surv.std.error = estimate * std.error,
-    haz.estimate = n.event / n.risk,
     haz.std.error = sqrt(haz.estimate * (1 - haz.estimate) / n.risk),
-    sqrt = (std.error)^2 / (estimate)^2
+    surv.std.error = surv.estimate * sqrt(
+      cumsum(haz.estimate / (n.risk * (1 - haz.estimate)))
+    )
   ) |>
-  select(
-    year = time,
-    n.risk,
-    haz.estimate,
-    haz.std.error,
-    surv.estimate = estimate,
-    sqrt,
-    surv.std.error
-  )
-#> # A tibble: 12 × 7
-#>     year n.risk haz.estimate haz.std.error surv.estimate     sqrt surv.std.error
-#>    <dbl>  <dbl>        <dbl>         <dbl>         <dbl>    <dbl>          <dbl>
-#>  1     1   3941       0.116        0.00510         0.884  4.25e-5        0.00510
-#>  2     2   3485       0.110        0.00530         0.787  1.11e-4        0.00652
-#>  3     3   3101       0.116        0.00575         0.696  2.29e-4        0.00733
-#>  4     4   2742       0.108        0.00592         0.621  4.02e-4        0.00773
-#>  5     5   2447       0.0891       0.00576         0.566  6.09e-4        0.00790
-#>  6     6   2229       0.0825       0.00583         0.519  8.74e-4        0.00796
-#>  7     7   2045       0.0601       0.00526         0.488  1.12e-3        0.00796
-#>  8     8   1642       0.0481       0.00528         0.464  1.38e-3        0.00800
-#>  9     9   1256       0.0422       0.00567         0.445  1.68e-3        0.00811
-#> 10    10    948       0.0369       0.00612         0.428  2.03e-3        0.00827
-#> 11    11    648       0.0247       0.00610         0.418  2.36e-3        0.00848
-#> 12    12    391       0.0128       0.00568         0.412  2.62e-3        0.00870
+  select(year, n.risk, starts_with("haz"), starts_with("surv"))
+#> # A tibble: 12 × 6
+#>     year n.risk haz.estimate haz.std.error surv.estimate surv.std.error
+#>    <dbl>  <dbl>        <dbl>         <dbl>         <dbl>          <dbl>
+#>  1     1   3941       0.116        0.00510         0.884        0.00510
+#>  2     2   3485       0.110        0.00530         0.787        0.00652
+#>  3     3   3101       0.116        0.00575         0.696        0.00733
+#>  4     4   2742       0.108        0.00592         0.621        0.00773
+#>  5     5   2447       0.0891       0.00576         0.566        0.00790
+#>  6     6   2229       0.0825       0.00583         0.519        0.00796
+#>  7     7   2045       0.0601       0.00526         0.488        0.00796
+#>  8     8   1642       0.0481       0.00528         0.464        0.00800
+#>  9     9   1256       0.0422       0.00567         0.445        0.00811
+#> 10    10    948       0.0369       0.00612         0.428        0.00827
+#> 11    11    648       0.0247       0.00610         0.418        0.00848
+#> 12    12    391       0.0128       0.00568         0.412        0.00870
-

-10.5 A Simple and Useful Strategy for Constructing the Life
-Table
+10.5 A simple and useful strategy for constructing the life
+table

-

Figure 10.4, page 353:

-
-filter(teachers, id %in% c(20, 126, 129))
+

In Section 10.5 Singer and Willett (2003) introduce the +person-period format for event occurrence data, +demonstrating how it can be used to construct the life table “by hand” +using the person-level teachers data +set.

+
+

+The person-level data set

+

+In the person-level format for event occurrence data, each person has
+only one row of data, with columns for their event time and censorship
+status, and (optionally) a participant identifier variable or any other
+variables of interest. This is demonstrated in the teachers
+data set, a person-level data frame with 3941 rows and 3 columns:

+
+• id: Teacher ID.
+• years: The number of years between a teacher’s dates of hire and
+  departure from the Michigan public schools.
+• censor: Censoring status.
+
+teachers
+#> # A tibble: 3,941 × 3
+#>    id    years censor
+#>    <fct> <dbl>  <dbl>
+#>  1 1         1      0
+#>  2 2         2      0
+#>  3 3         1      0
+#>  4 4         1      0
+#>  5 5        12      1
+#>  6 6         1      0
+#>  7 7        12      1
+#>  8 8         1      0
+#>  9 9         2      0
+#> 10 10        2      0
+#> # ℹ 3,931 more rows
+

+Note that unlike when modelling change, the person-level data set
+does not contain a separate column for each time period; thus, as we
+will demonstrate below, a new strategy is needed to convert a
+person-level data set into a person-period data set. Additionally, and
+also unlike when modelling change, the person-level data set is often
+useful for analyzing event occurrence—as we have demonstrated through
+several examples in the current and previous chapter.

+
+
+

+The person-period data set

+

In the person-period format for event occurrence data, each person +has one row of data for each time period when they were +at risk, with a participant identifier variable for +each person, and an event-indicator variable for each +time period.

+

We can use the reframe() function from the +dplyr package to convert a person-level data set into a +person-period data set. The reframe() function works +similarly to dplyr’s summarise() function, except that it +can return an arbitrary number of rows per group. We take advantage of +this property to add rows for each time period when individuals were at +risk, then use the information stored in these new rows and the +person-level data set to identify whether an event occurred in each +individual’s last period, given their censorship status.

+
+teachers_pp <- teachers |>
+  group_by(id) |>
+  reframe(
+    year = 1:years,
+    event = if_else(year == years & censor == 0, true = 1, false = 0)
+  )
+
+teachers_pp
+#> # A tibble: 24,875 × 3
+#>    id     year event
+#>    <fct> <int> <dbl>
+#>  1 1         1     1
+#>  2 2         1     0
+#>  3 2         2     1
+#>  4 3         1     1
+#>  5 4         1     1
+#>  6 5         1     0
+#>  7 5         2     0
+#>  8 5         3     0
+#>  9 5         4     0
+#> 10 5         5     0
+#> # ℹ 24,865 more rows
+

+Following similar logic, we can use the summarise()
+function from the dplyr package to convert a person-period data set back
+to a person-level data set.

+
+teachers_pl <- teachers_pp |>
+  group_by(id) |>
+  summarise(
+    years = max(year),
+    censor = if_else(all(event == 0), true = 1, false = 0)
+  )
+
+teachers_pl
+#> # A tibble: 3,941 × 3
+#>    id    years censor
+#>    <fct> <int>  <dbl>
+#>  1 1         1      0
+#>  2 2         2      0
+#>  3 3         1      0
+#>  4 4         1      0
+#>  5 5        12      1
+#>  6 6         1      0
+#>  7 7        12      1
+#>  8 8         1      0
+#>  9 9         2      0
+#> 10 10        2      0
+#> # ℹ 3,931 more rows
+

+The difference between the person-level and person-period formats is
+best seen by examining the data from a subset of individuals with
+different event times and censoring statuses.

+
+# Figure 10.4, page 353:
+filter(teachers_pl, id %in% c(20, 126, 129))
 #> # A tibble: 3 × 3
 #>   id    years censor
-#>   <fct> <dbl>  <dbl>
+#>   <fct> <int>  <dbl>
 #> 1 20        3      0
 #> 2 126      12      0
 #> 3 129      12      1
-
+
 
-teachers_pp <- teachers |>
-  reframe(
-    year = 1:max(years),
-    event = if_else(year == years & censor == 0, 1, 0),
-    .by = id
-  )
-
 teachers_pp |>
   filter(id %in% c(20, 126, 129)) |>
   print(n = 27)
@@ -346,32 +856,45 @@ 10.5 A Sim
 #> 25 129      10     0
 #> 26 129      11     0
 #> 27 129      12     0

-

Table 10.3, page 355:

-
-teachers_pp |>
+
+
+

+Using the person-period data set to construct the life table

+

+The life table can be constructed from the person-period data set
+through cross-tabulation of the time period and event-indicator
+variables. This can be accomplished using a standard
+df |> group_by(...) |> summarise(...) statement with
+the dplyr package, where we count, for each time period, the number of
+individuals who were at risk and the number who did and did not
+experience the target event. After this, statistics for summarizing the
+event history information of the life table can be estimated using the
+methods demonstrated in Section 10.2.

+
+# Table 10.3, page 355:
+teachers_pp |>
   group_by(year) |>
-  count(event) |>
-  pivot_wider(names_from = event, names_prefix = "event_", values_from = n) |>
-  mutate(
-    total = event_0 + event_1,
-    p.event_1 = event_1 / total
+  summarise(
+    n.risk = n(),
+    n.event = sum(event == 1),
+    # Note: event == 0 counts everyone who did not experience the event
+    # in a given period, whether they went on teaching or were censored,
+    # so it is not the same as the life table's n.censor column.
+    n.event_0 = sum(event == 0),
+    haz.estimate = n.event / n.risk
   )
 #> # A tibble: 12 × 5
-#> # Groups:   year [12]
-#>     year event_0 event_1 total p.event_1
-#>    <int>   <int>   <int> <int>     <dbl>
-#>  1     1    3485     456  3941    0.116 
-#>  2     2    3101     384  3485    0.110 
-#>  3     3    2742     359  3101    0.116 
-#>  4     4    2447     295  2742    0.108 
-#>  5     5    2229     218  2447    0.0891
-#>  6     6    2045     184  2229    0.0825
-#>  7     7    1922     123  2045    0.0601
-#>  8     8    1563      79  1642    0.0481
-#>  9     9    1203      53  1256    0.0422
-#> 10    10     913      35   948    0.0369
-#> 11    11     632      16   648    0.0247
-#> 12    12     386       5   391    0.0128
+#>     year n.risk n.event n.event_0 haz.estimate
+#>    <int>  <int>   <int>     <int>        <dbl>
+#>  1     1   3941     456      3485       0.116 
+#>  2     2   3485     384      3101       0.110 
+#>  3     3   3101     359      2742       0.116 
+#>  4     4   2742     295      2447       0.108 
+#>  5     5   2447     218      2229       0.0891
+#>  6     6   2229     184      2045       0.0825
+#>  7     7   2045     123      1922       0.0601
+#>  8     8   1642      79      1563       0.0481
+#>  9     9   1256      53      1203       0.0422
+#> 10    10    948      35       913       0.0369
+#> 11    11    648      16       632       0.0247
+#> 12    12    391       5       386       0.0128
+
diff --git a/articles/chapter-10_files/figure-html/unnamed-chunk-12-1.png b/articles/chapter-10_files/figure-html/unnamed-chunk-12-1.png
new file mode 100644
index 0000000..81e6ae8
Binary files /dev/null and b/articles/chapter-10_files/figure-html/unnamed-chunk-12-1.png differ
diff --git a/articles/chapter-10_files/figure-html/unnamed-chunk-7-1.png b/articles/chapter-10_files/figure-html/unnamed-chunk-7-1.png
new file mode 100644
index 0000000..8f0354a
Binary files /dev/null and b/articles/chapter-10_files/figure-html/unnamed-chunk-7-1.png differ
diff --git a/articles/chapter-5.html b/articles/chapter-5.html
index 7bc2774..c372b06 100644
--- a/articles/chapter-5.html
+++ b/articles/chapter-5.html
@@ -1578,8 +1578,8 @@

5 Fixing rates of change, where the model is simplified by removing the varying slope change. -

For this example we a subset of the dropout_wages data -purposefully constructed to be severely unbalanced.

+

For this example we use a subset of the dropout_wages +data purposefully constructed to be severely unbalanced.

 dropout_wages_subset
 #> # A tibble: 257 × 5
diff --git a/articles/chapter-9.html b/articles/chapter-9.html
index d75cbae..0d81e99 100644
--- a/articles/chapter-9.html
+++ b/articles/chapter-9.html
@@ -175,18 +175,18 @@ 

9.1 Sh
 suicide_ideation
 #> # A tibble: 391 × 4
-#>    id     time censor   age
-#>    <fct> <dbl>  <dbl> <dbl>
-#>  1 1        16      0    18
-#>  2 2        10      0    19
-#>  3 3        16      0    19
-#>  4 4        20      0    22
-#>  5 6        15      0    22
-#>  6 7        10      0    19
-#>  7 8        22      1    22
-#>  8 9        22      1    22
-#>  9 10       15      0    20
-#> 10 11       10      0    19
+#>    id      age censor age_now
+#>    <fct> <dbl>  <dbl>   <dbl>
+#>  1 1        16      0      18
+#>  2 2        10      0      19
+#>  3 3        16      0      19
+#>  4 4        20      0      22
+#>  5 6        15      0      22
+#>  6 7        10      0      19
+#>  7 8        22      1      22
+#>  8 9        22      1      22
+#>  9 10       15      0      20
+#> 10 11       10      0      19
 #> # ℹ 381 more rows
diff --git a/pkgdown.yml b/pkgdown.yml
index 4b0f9d9..691b700 100644
--- a/pkgdown.yml
+++ b/pkgdown.yml
@@ -17,7 +17,7 @@ articles:
   chapter-8: chapter-8.html
   chapter-9: chapter-9.html
   longitudinal-data-organization: longitudinal-data-organization.html
-last_built: 2024-05-23T20:12Z
+last_built: 2024-05-27T04:16Z
 urls:
   reference: https://mccarthy-m-g.github.io/alda/reference
   article: https://mccarthy-m-g.github.io/alda/articles
diff --git a/reference/cocaine_relapse_1.html b/reference/cocaine_relapse_1.html
index 2a5dee3..2034b09 100644
--- a/reference/cocaine_relapse_1.html
+++ b/reference/cocaine_relapse_1.html
@@ -1,8 +1,8 @@
-Weeks to cocaine relapse after treatment — cocaine_relapse_1 • alda
+Weeks to cocaine relapse after treatment — cocaine_relapse_1 • alda
diff --git a/reference/first_sex.html b/reference/first_sex.html
-Age of first sexual intercourse — first_sex • alda
+Age of first sexual intercourse — first_sex • alda
 Survival analysis

-

-A subset of data from Capaldi, Crosby, and Stoolmiller's (1996) measuring the
+A subset of data from Capaldi, Crosby, and Stoolmiller (1996) measuring the
 grade year of first sexual intercourse in a sample of 180 at-risk heterosexual
 adolescent males. Adolescent males were followed from Grade 7 up to Grade 12
 or until they reported having had sexual intercourse for the first time.
diff --git a/reference/suicide_ideation.html b/reference/suicide_ideation.html
index 8e3a92e..15ade4c 100644
--- a/reference/suicide_ideation.html
+++ b/reference/suicide_ideation.html
@@ -103,13 +103,13 @@

Format
 id
   Participant ID.
-time
+age
   Reported age of first suicide ideation.
 censor
   Censoring status.
-age
+age_now
   Participant age at the time of the survey.

diff --git a/search.json b/search.json index 0742832..b89392d 100644 --- a/search.json +++ b/search.json @@ -1 +1 @@ -[{"path":[]},{"path":"https://mccarthy-m-g.github.io/alda/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"CC0 1.0 Universal","title":"CC0 1.0 Universal","text":"CREATIVE COMMONS CORPORATION LAW FIRM PROVIDE LEGAL SERVICES. DISTRIBUTION DOCUMENT CREATE ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES INFORMATION “-” BASIS. CREATIVE COMMONS MAKES WARRANTIES REGARDING USE DOCUMENT INFORMATION WORKS PROVIDED HEREUNDER, DISCLAIMS LIABILITY DAMAGES RESULTING USE DOCUMENT INFORMATION WORKS PROVIDED HEREUNDER.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/LICENSE.html","id":"statement-of-purpose","dir":"","previous_headings":"","what":"Statement of Purpose","title":"CC0 1.0 Universal","text":"laws jurisdictions throughout world automatically confer exclusive Copyright Related Rights (defined ) upon creator subsequent owner(s) (, “owner”) original work authorship /database (, “Work”). Certain owners wish permanently relinquish rights Work purpose contributing commons creative, cultural scientific works (“Commons”) public can reliably without fear later claims infringement build upon, modify, incorporate works, reuse redistribute freely possible form whatsoever purposes, including without limitation commercial purposes. owners may contribute Commons promote ideal free culture production creative, cultural scientific works, gain reputation greater distribution Work part use efforts others. /purposes motivations, without expectation additional consideration compensation, person associating CC0 Work (“Affirmer”), extent owner Copyright Related Rights Work, voluntarily elects apply CC0 Work publicly distribute Work terms, knowledge Copyright Related Rights Work meaning intended legal effect CC0 rights. Copyright Related Rights. Work made available CC0 may protected copyright related neighboring rights (“Copyright Related Rights”). Copyright Related Rights include, limited , following: right reproduce, adapt, distribute, perform, display, communicate, translate Work; moral rights retained original author(s) /performer(s); publicity privacy rights pertaining person’s image likeness depicted Work; rights protecting unfair competition regards Work, subject limitations paragraph 4(), ; rights protecting extraction, dissemination, use reuse data Work; database rights (arising Directive 96/9/EC European Parliament Council 11 March 1996 legal protection databases, national implementation thereof, including amended successor version directive); similar, equivalent corresponding rights throughout world based applicable law treaty, national implementations thereof. Waiver. greatest extent permitted , contravention , applicable law, Affirmer hereby overtly, fully, permanently, irrevocably unconditionally waives, abandons, surrenders Affirmer’s Copyright Related Rights associated claims causes action, whether now known unknown (including existing well future claims causes action), Work () territories worldwide, (ii) maximum duration provided applicable law treaty (including future time extensions), (iii) current future medium number copies, (iv) purpose whatsoever, including without limitation commercial, advertising promotional purposes (“Waiver”). 
Affirmer makes Waiver benefit member public large detriment Affirmer’s heirs successors, fully intending Waiver shall subject revocation, rescission, cancellation, termination, legal equitable action disrupt quiet enjoyment Work public contemplated Affirmer’s express Statement Purpose. Public License Fallback. part Waiver reason judged legally invalid ineffective applicable law, Waiver shall preserved maximum extent permitted taking account Affirmer’s express Statement Purpose. addition, extent Waiver judged Affirmer hereby grants affected person royalty-free, non transferable, non sublicensable, non exclusive, irrevocable unconditional license exercise Affirmer’s Copyright Related Rights Work () territories worldwide, (ii) maximum duration provided applicable law treaty (including future time extensions), (iii) current future medium number copies, (iv) purpose whatsoever, including without limitation commercial, advertising promotional purposes (“License”). License shall deemed effective date CC0 applied Affirmer Work. part License reason judged legally invalid ineffective applicable law, partial invalidity ineffectiveness shall invalidate remainder License, case Affirmer hereby affirms () exercise remaining Copyright Related Rights Work (ii) assert associated claims causes action respect Work, either case contrary Affirmer’s express Statement Purpose. Limitations Disclaimers. trademark patent rights held Affirmer waived, abandoned, surrendered, licensed otherwise affected document. Affirmer offers Work -makes representations warranties kind concerning Work, express, implied, statutory otherwise, including without limitation warranties title, merchantability, fitness particular purpose, non infringement, absence latent defects, accuracy, present absence errors, whether discoverable, greatest extent permissible applicable law. Affirmer disclaims responsibility clearing rights persons may apply Work use thereof, including without limitation person’s Copyright Related Rights Work. , Affirmer disclaims responsibility obtaining necessary consents, permissions rights required use Work. Affirmer understands acknowledges Creative Commons party document duty obligation respect CC0 use Work.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-10.html","id":"the-life-table","dir":"Articles","previous_headings":"","what":"10.1 The Life Table","title":"Chapter 10: Describing discrete-time event occurrence data","text":"Table 10.1, page 327:","code":"# A life table is the tabular form of a survival curve, so begin by fitting a # Kaplan-Meir curve to the data. teachers_fit <- survfit(Surv(years, 1 - censor) ~ 1, data = teachers) table_10.1 <- teachers_fit |> # Add a starting time (time 0) for the table. survfit0() |> tidy() |> # The summary of the fit gives most of what we want, but to match Table 10.1 # we need to do a little more wrangling. 
select(-c(std.error:conf.low)) |> mutate( interval = paste0(\"[\", time, \", \", time + 1, \")\"), haz.estimate = n.event / n.risk ) |> rename(year = time, surv.estimate = estimate) |> relocate( year, interval, n.risk, n.event, n.censor, haz.estimate, surv.estimate ) table_10.1 #> # A tibble: 13 × 7 #> year interval n.risk n.event n.censor haz.estimate surv.estimate #> #> 1 0 [0, 1) 3941 0 0 0 1 #> 2 1 [1, 2) 3941 456 0 0.116 0.884 #> 3 2 [2, 3) 3485 384 0 0.110 0.787 #> 4 3 [3, 4) 3101 359 0 0.116 0.696 #> 5 4 [4, 5) 2742 295 0 0.108 0.621 #> 6 5 [5, 6) 2447 218 0 0.0891 0.566 #> 7 6 [6, 7) 2229 184 0 0.0825 0.519 #> 8 7 [7, 8) 2045 123 280 0.0601 0.488 #> 9 8 [8, 9) 1642 79 307 0.0481 0.464 #> 10 9 [9, 10) 1256 53 255 0.0422 0.445 #> 11 10 [10, 11) 948 35 265 0.0369 0.428 #> 12 11 [11, 12) 648 16 241 0.0247 0.418 #> 13 12 [12, 13) 391 5 386 0.0128 0.412"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-10.html","id":"a-framework-for-characterizing-the-distribution-of-discrete-time-event-occurrence-data","dir":"Articles","previous_headings":"","what":"10.2 A Framework for Characterizing the Distribution of Discrete-Time Event Occurrence Data","title":"Chapter 10: Describing discrete-time event occurrence data","text":"Figure 10.1, page 333:","code":"ggplot(table_10.1, aes(x = year, y = haz.estimate)) + geom_line() + scale_x_continuous(breaks = 0:13, limits = c(1, 13)) + scale_y_continuous(breaks = c(0, .05, .1, .15), limits = c(0, .15)) + coord_cartesian(xlim = c(0, 13)) #> Warning: Removed 1 row containing missing values or values outside the scale range #> (`geom_line()`). # First interpolate median lifetime median_lifetime <- table_10.1 |> # Get the row indices for the first survival estimate immediately below and # immediately above 0.5. This will only work correctly if the values are in # descending order, otherwise min() and max() must be swapped. By default, the # survival estimates are in descending order, however, I've added the # redundant step of ensuring they are here for demonstration purposes. arrange(desc(surv.estimate)) |> slice(min(which(surv.estimate <= .5)), max(which(surv.estimate >= .5))) |> select(year, surv.estimate) |> # Linearly interpolate between the two values of the survival estimates that # bracket .5 following Miller's (1981) equation. 
summarise( year = min(year) + ((max(surv.estimate) - .5) / (max(surv.estimate) - min(surv.estimate))) * ((min(year) + 1) - min(year)), surv.estimate = .5 ) ggplot(table_10.1, aes(x = year, y = surv.estimate)) + geom_line() + geom_segment( aes(xend = year, y = 0, yend = .5), data = median_lifetime, linetype = 2 ) + geom_segment( aes(xend = 0, yend = .5), data = median_lifetime, linetype = 2 ) + scale_x_continuous(breaks = 0:13) + scale_y_continuous(breaks = c(0, .5, 1), limits = c(0, 1)) + coord_cartesian(xlim = c(0, 13))"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-10.html","id":"developing-intuition-about-hazard-functions-survivor-functions-and-median-lifetimes","dir":"Articles","previous_headings":"","what":"10.3 Developing Intuition About Hazard Functions, Survivor Functions, and Median Lifetimes","title":"Chapter 10: Describing discrete-time event occurrence data","text":"Figure 10.2, page 340:","code":"relapse_fit <- survfit(Surv(weeks, 1 - censor) ~ 1, data = cocaine_relapse_1) relapse_tidy <- tidy(relapse_fit) relapse_summary <- glance(relapse_fit)"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-10.html","id":"quantifying-the-effects-of-sampling-variation","dir":"Articles","previous_headings":"","what":"10.4 Quantifying the Effects of Sampling Variation","title":"Chapter 10: Describing discrete-time event occurrence data","text":"Table 10.2, page 349:","code":"summary(teachers_fit) #> Call: survfit(formula = Surv(years, 1 - censor) ~ 1, data = teachers) #> #> time n.risk n.event survival std.err lower 95% CI upper 95% CI #> 1 3941 456 0.884 0.00510 0.874 0.894 #> 2 3485 384 0.787 0.00652 0.774 0.800 #> 3 3101 359 0.696 0.00733 0.682 0.710 #> 4 2742 295 0.621 0.00773 0.606 0.636 #> 5 2447 218 0.566 0.00790 0.550 0.581 #> 6 2229 184 0.519 0.00796 0.504 0.535 #> 7 2045 123 0.488 0.00796 0.472 0.504 #> 8 1642 79 0.464 0.00800 0.449 0.480 #> 9 1256 53 0.445 0.00811 0.429 0.461 #> 10 948 35 0.428 0.00827 0.412 0.445 #> 11 648 16 0.418 0.00848 0.401 0.435 #> 12 391 5 0.412 0.00870 0.396 0.430 teachers_fit |> tidy() |> mutate( # The tidy() method for survfit objects returns the standard error for the # cumulative hazard instead of the survival probability. Multiplying the # survival estimate with the cumulative hazard's standard error will return # the standard error for the survival probability. Note that it is unlikely # the tidy() method will ever change to return the the standard error for # the survival probability instead. 
See: # - https://github.com/tidymodels/broom/pull/1162 # Other transformations of the survival probability can be found here: # - https://stat.ethz.ch/pipermail/r-help/2014-June/376247.html surv.std.error = estimate * std.error, haz.estimate = n.event / n.risk, haz.std.error = sqrt(haz.estimate * (1 - haz.estimate) / n.risk), sqrt = (std.error)^2 / (estimate)^2 ) |> select( year = time, n.risk, haz.estimate, haz.std.error, surv.estimate = estimate, sqrt, surv.std.error ) #> # A tibble: 12 × 7 #> year n.risk haz.estimate haz.std.error surv.estimate sqrt surv.std.error #> #> 1 1 3941 0.116 0.00510 0.884 4.25e-5 0.00510 #> 2 2 3485 0.110 0.00530 0.787 1.11e-4 0.00652 #> 3 3 3101 0.116 0.00575 0.696 2.29e-4 0.00733 #> 4 4 2742 0.108 0.00592 0.621 4.02e-4 0.00773 #> 5 5 2447 0.0891 0.00576 0.566 6.09e-4 0.00790 #> 6 6 2229 0.0825 0.00583 0.519 8.74e-4 0.00796 #> 7 7 2045 0.0601 0.00526 0.488 1.12e-3 0.00796 #> 8 8 1642 0.0481 0.00528 0.464 1.38e-3 0.00800 #> 9 9 1256 0.0422 0.00567 0.445 1.68e-3 0.00811 #> 10 10 948 0.0369 0.00612 0.428 2.03e-3 0.00827 #> 11 11 648 0.0247 0.00610 0.418 2.36e-3 0.00848 #> 12 12 391 0.0128 0.00568 0.412 2.62e-3 0.00870"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-10.html","id":"a-simple-and-useful-strategy-for-constructing-the-life-table","dir":"Articles","previous_headings":"","what":"10.5 A Simple and Useful Strategy for Constructing the Life Table","title":"Chapter 10: Describing discrete-time event occurrence data","text":"Figure 10.4, page 353: Table 10.3, page 355:","code":"filter(teachers, id %in% c(20, 126, 129)) #> # A tibble: 3 × 3 #> id years censor #> #> 1 20 3 0 #> 2 126 12 0 #> 3 129 12 1 teachers_pp <- teachers |> reframe( year = 1:max(years), event = if_else(year == years & censor == 0, 1, 0), .by = id ) teachers_pp |> filter(id %in% c(20, 126, 129)) |> print(n = 27) #> # A tibble: 27 × 3 #> id year event #> #> 1 20 1 0 #> 2 20 2 0 #> 3 20 3 1 #> 4 126 1 0 #> 5 126 2 0 #> 6 126 3 0 #> 7 126 4 0 #> 8 126 5 0 #> 9 126 6 0 #> 10 126 7 0 #> 11 126 8 0 #> 12 126 9 0 #> 13 126 10 0 #> 14 126 11 0 #> 15 126 12 1 #> 16 129 1 0 #> 17 129 2 0 #> 18 129 3 0 #> 19 129 4 0 #> 20 129 5 0 #> 21 129 6 0 #> 22 129 7 0 #> 23 129 8 0 #> 24 129 9 0 #> 25 129 10 0 #> 26 129 11 0 #> 27 129 12 0 teachers_pp |> group_by(year) |> count(event) |> pivot_wider(names_from = event, names_prefix = \"event_\", values_from = n) |> mutate( total = event_0 + event_1, p.event_1 = event_1 / total ) #> # A tibble: 12 × 5 #> # Groups: year [12] #> year event_0 event_1 total p.event_1 #> #> 1 1 3485 456 3941 0.116 #> 2 2 3101 384 3485 0.110 #> 3 3 2742 359 3101 0.116 #> 4 4 2447 295 2742 0.108 #> 5 5 2229 218 2447 0.0891 #> 6 6 2045 184 2229 0.0825 #> 7 7 1922 123 2045 0.0601 #> 8 8 1563 79 1642 0.0481 #> 9 9 1203 53 1256 0.0422 #> 10 10 913 35 948 0.0369 #> 11 11 632 16 648 0.0247 #> 12 12 386 5 391 0.0128"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-11.html","id":"toward-a-statistical-model-for-discretetime-hazard","dir":"Articles","previous_headings":"","what":"11.1 Toward a Statistical Model for DiscreteTime Hazard","title":"Chapter 11: Fitting basic discrete-time hazard models","text":"Several examples chapter rely following: Figure 11.1, page 359: Table 11.1, page 360: Figure 11.2, page 363: Figure 11.3, page 366:","code":"first_sex_fit <- survfit(Surv(grade, 1 - censor) ~ 1, data = first_sex) first_sex_pt <- c(0, 1) |> map_dfr( \\(.x) { first_sex_fit_subset <- update( first_sex_fit, subset = (parental_transition == .x) ) first_sex_fit_subset 
|> survfit0(start.time = 6) |> tidy() |> rename(survival_probability = estimate) |> mutate( hazard_probability = n.event / n.risk, odds = hazard_probability / (1 - hazard_probability), log_odds = log(odds) ) |> select(-starts_with(\"conf\"), -std.error) |> rename(grade = time) |> pivot_longer( cols = c(survival_probability, hazard_probability, odds, log_odds), values_to = \"estimate\" ) |> # The figure doesn't include data for grade 6 in the hazard function. filter( !(name %in% c(\"hazard_probability\", \"odds\", \"log_odds\") & grade == 6) ) }, .id = \"parental_transition\" ) first_sex_pt |> filter(name %in% c(\"survival_probability\", \"hazard_probability\")) |> ggplot(aes(x = grade, y = estimate, colour = parental_transition)) + geom_hline( aes(yintercept = .5), data = tibble(name = \"survival_probability\"), alpha = .25, linetype = 2 ) + geom_line() + scale_x_continuous(breaks = 6:12) + coord_cartesian(xlim = c(6, 12)) + facet_wrap(vars(name), ncol = 1, scales = \"free_y\") + ggh4x::facetted_pos_scales( y = list( name == \"hazard_probability\" ~ scale_y_continuous(limits = c(0, .5)), name == \"survival_probability\" ~ scale_y_continuous(breaks = c(0, .5, 1), limits = c(0, 1)) ) ) # First two sections of the table first_sex_pt |> filter(grade != 6, !(name %in% c(\"odds\", \"log_odds\"))) |> pivot_wider(names_from = name, values_from = estimate) |> select(everything(), -n.censor, hazard_probability, survival_probability) #> # A tibble: 12 × 6 #> parental_transition grade n.risk n.event survival_probability #> #> 1 1 7 72 2 0.972 #> 2 1 8 70 2 0.944 #> 3 1 9 68 8 0.833 #> 4 1 10 60 8 0.722 #> 5 1 11 52 10 0.583 #> 6 1 12 42 8 0.472 #> 7 2 7 108 13 0.880 #> 8 2 8 95 5 0.833 #> 9 2 9 90 16 0.685 #> 10 2 10 74 21 0.491 #> 11 2 11 53 15 0.352 #> 12 2 12 38 18 0.185 #> # ℹ 1 more variable: hazard_probability # Last section first_sex_fit |> tidy() |> rename(survival_probability = estimate) |> mutate( hazard_probability = n.event / n.risk, .before = survival_probability ) |> select(-starts_with(\"conf\"), -std.error, -n.censor) |> rename(grade = time) #> # A tibble: 6 × 5 #> grade n.risk n.event hazard_probability survival_probability #> #> 1 7 180 15 0.0833 0.917 #> 2 8 165 7 0.0424 0.878 #> 3 9 158 24 0.152 0.744 #> 4 10 134 29 0.216 0.583 #> 5 11 105 25 0.238 0.444 #> 6 12 80 26 0.325 0.3 first_sex_pt |> filter(name %in% c(\"hazard_probability\", \"odds\", \"log_odds\")) |> mutate( name = factor(name, levels = c(\"hazard_probability\", \"odds\", \"log_odds\")) ) |> ggplot(aes(x = grade, y = estimate, colour = parental_transition)) + geom_line() + scale_x_continuous(breaks = 6:12) + coord_cartesian(xlim = c(6, 12)) + facet_wrap(vars(name), ncol = 1, scales = \"free_y\") + ggh4x::facetted_pos_scales( y = list( name %in% c(\"hazard_probability\", \"odds\") ~ scale_y_continuous(limits = c(0, 1)), name == \"log_odds\" ~ scale_y_continuous(limits = c(-4, 0)) ) ) # Transform to person-period format. first_sex_pp <- first_sex |> rename(grades = grade) |> reframe( grade = 7:max(grades), event = if_else(grade == grades & censor == 0, 1, 0), parental_transition, parental_antisociality, .by = id ) # Fit models for each panel. first_sex_fit_11.3a <- glm( event ~ parental_transition, family = \"binomial\", data = first_sex_pp ) first_sex_fit_11.3b <- update(first_sex_fit_11.3a, . ~ . + grade) first_sex_fit_11.3c <- update(first_sex_fit_11.3a, . ~ . 
+ factor(grade)) # Plot: map_df( list(a = first_sex_fit_11.3a, b = first_sex_fit_11.3b, c = first_sex_fit_11.3c), \\(.x) augment(.x, newdata = first_sex_pp), .id = \"model\" ) |> ggplot(aes(x = grade, y = .fitted, colour = factor(parental_transition))) + geom_line() + geom_point( aes(y = estimate), data = first_sex_pt |> mutate(parental_transition = as.numeric(parental_transition) - 1) |> filter(name == \"log_odds\") ) + coord_cartesian(ylim = c(-4, 0)) + facet_wrap(vars(model), ncol = 1, labeller = label_both) + labs( y = \"logit(hazard)\", colour = \"parental_transition\" )"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-11.html","id":"a-formal-representation-of-the-population-discrete-time-hazard-model","dir":"Articles","previous_headings":"","what":"11.2 A Formal Representation of the Population Discrete-Time Hazard Model","title":"Chapter 11: Fitting basic discrete-time hazard models","text":"Figure 11.4, page 374:","code":"# Panel A: first_sex_fit_11.3c |> augment(newdata = first_sex_pp) |> ggplot(aes(x = grade, y = .fitted, colour = factor(parental_transition))) + geom_line() + coord_cartesian(ylim = c(-4, 0)) # Panel B: first_sex_fit_11.4b <- update( first_sex_fit_11.3c, . ~ . + parental_transition * factor(grade) ) first_sex_fit_11.4b |> augment(newdata = first_sex_pp) |> ggplot(aes(x = grade, y = exp(.fitted), colour = factor(parental_transition))) + geom_line() + coord_cartesian(ylim = c(0, 1)) # Panel C: first_sex_fit_11.4b |> augment(newdata = first_sex_pp, type.predict = \"response\") |> ggplot(aes(x = grade, y = .fitted, colour = factor(parental_transition))) + geom_line()"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-11.html","id":"fitting-a-discrete-time-hazard-model-to-data","dir":"Articles","previous_headings":"","what":"11.3 Fitting a Discrete-Time Hazard Model to Data","title":"Chapter 11: Fitting basic discrete-time hazard models","text":"Figure 11.5: Table 11.3, page 386:","code":"model_A <- glm( event ~ factor(grade) - 1, family = \"binomial\", data = first_sex_pp ) model_B <- update(model_A, . ~ . + parental_transition) model_C <- update(model_A, . ~ . + parental_antisociality) model_D <- update(model_B, . ~ . + parental_antisociality) anova(model_B) #> Analysis of Deviance Table #> #> Model: binomial, link: logit #> #> Response: event #> #> Terms added sequentially (first to last) #> #> #> Df Deviance Resid. Df Resid. Dev Pr(>Chi) #> NULL 822 1139.53 #> factor(grade) 6 487.58 816 651.96 < 2.2e-16 *** #> parental_transition 1 17.29 815 634.66 3.203e-05 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 anova(model_C) #> Analysis of Deviance Table #> #> Model: binomial, link: logit #> #> Response: event #> #> Terms added sequentially (first to last) #> #> #> Df Deviance Resid. Df Resid. Dev Pr(>Chi) #> NULL 822 1139.53 #> factor(grade) 6 487.58 816 651.96 < 2.2e-16 *** #> parental_antisociality 1 14.79 815 637.17 0.0001204 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 # Deviance tests are sequential so the order of terms matters. To test # parental_transition and parental_antisociality, the model needs to be # fit twice, once with each as the last term. anova(update(model_C, . ~ . + parental_transition)) #> Analysis of Deviance Table #> #> Model: binomial, link: logit #> #> Response: event #> #> Terms added sequentially (first to last) #> #> #> Df Deviance Resid. Df Resid. 
Dev Pr(>Chi) #> NULL 822 1139.53 #> factor(grade) 6 487.58 816 651.96 < 2.2e-16 *** #> parental_antisociality 1 14.79 815 637.17 0.0001204 *** #> parental_transition 1 8.02 814 629.15 0.0046222 ** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 anova(model_D) #> Analysis of Deviance Table #> #> Model: binomial, link: logit #> #> Response: event #> #> Terms added sequentially (first to last) #> #> #> Df Deviance Resid. Df Resid. Dev Pr(>Chi) #> NULL 822 1139.53 #> factor(grade) 6 487.58 816 651.96 < 2.2e-16 *** #> parental_transition 1 17.29 815 634.66 3.203e-05 *** #> parental_antisociality 1 5.51 814 629.15 0.01886 * #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-11.html","id":"interpreting-parameter-estimates","dir":"Articles","previous_headings":"","what":"11.4 Interpreting Parameter Estimates","title":"Chapter 11: Fitting basic discrete-time hazard models","text":"Table 11.4, page 388:","code":"model_A |> tidy() |> select(term, estimate) |> mutate( odds = exp(estimate), hazard = 1 / (1 + exp(-estimate)) ) #> # A tibble: 6 × 4 #> term estimate odds hazard #> #> 1 factor(grade)7 -2.40 0.0909 0.0833 #> 2 factor(grade)8 -3.12 0.0443 0.0424 #> 3 factor(grade)9 -1.72 0.179 0.152 #> 4 factor(grade)10 -1.29 0.276 0.216 #> 5 factor(grade)11 -1.16 0.313 0.238 #> 6 factor(grade)12 -0.731 0.481 0.325"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-11.html","id":"displaying-fitted-hazard-and-survivor-functions","dir":"Articles","previous_headings":"","what":"11.5 Displaying Fitted Hazard and Survivor Functions","title":"Chapter 11: Fitting basic discrete-time hazard models","text":"Table 11.5, page 392: Figure 11.6, page 393: Figure 11.7, page 395:","code":"model_B_tidy <- model_B |> augment( newdata = expand_grid(grade = 7:12, parental_transition = 0:1) ) |> mutate( hazard = 1 / (1 + exp(-.fitted)), survival = cumprod(1 - hazard), .by = parental_transition ) model_B_tidy #> # A tibble: 12 × 5 #> grade parental_transition .fitted hazard survival #> #> 1 7 0 -2.99 0.0477 0.952 #> 2 7 1 -2.12 0.107 0.893 #> 3 8 0 -3.70 0.0241 0.929 #> 4 8 1 -2.83 0.0559 0.843 #> 5 9 0 -2.28 0.0927 0.843 #> 6 9 1 -1.41 0.197 0.677 #> 7 10 0 -1.82 0.139 0.726 #> 8 10 1 -0.949 0.279 0.488 #> 9 11 0 -1.65 0.161 0.609 #> 10 11 1 -0.781 0.314 0.335 #> 11 12 0 -1.18 0.235 0.466 #> 12 12 1 -0.305 0.424 0.193 # FIXME: should use survfit0() for the survival panel so time starts at 6. 
model_B_tidy |> pivot_longer(cols = .fitted:survival) |> ggplot(aes(x = grade, y = value, colour = factor(parental_transition))) + geom_line() + facet_wrap(vars(name), ncol = 1, scales = \"free_y\") prototypical_males <- tibble( id = rep(1:6, times = length(7:12)), expand_grid( grade = 7:12, parental_transition = c(0, 1), parental_antisociality = -1:1 ) ) prototypical_first_sex <- tibble( log_odds = predict( model_D, prototypical_males ), hazard = 1 / (1 + exp(-log_odds)) ) grade_six <- tibble( id = 1:6, grade = 6, expand_grid( parental_transition = c(0, 1), parental_antisociality = -1:1 ), log_odds = NA, hazard = NA, survival = 1 ) prototypical_males |> bind_cols(prototypical_first_sex) |> mutate(survival = cumprod(1 - hazard), .by = id) |> add_row(grade_six) |> pivot_longer(cols = c(hazard, survival)) |> ggplot(aes(x = grade, y = value, group = id)) + geom_line( aes( colour = factor(parental_antisociality), linetype = factor(parental_transition) ) ) + scale_colour_grey(start = 0, end = 0.75) + facet_wrap( vars(name), ncol = 1, scales = \"free_y\" ) #> Warning: Removed 6 rows containing missing values or values outside the scale range #> (`geom_line()`)."},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-12.html","id":"alternative-specifications-for-the-main-effect-of-time","dir":"Articles","previous_headings":"","what":"12.1 Alternative Specifications for the “Main Effect of TIME”","title":"Chapter 12: Extending the discrete-time hazard model","text":"Table 12.2, page 413: Figure 12.1, page 414:","code":"# Convert to person-period format tenure_pp <- tenure |> reframe( year = 1:max(years), event = if_else(year == years & censor == 0, 1, 0), .by = id ) |> mutate( temp_year = year, temp_dummy = 1 ) |> pivot_wider( names_from = temp_year, names_prefix = \"year_\", values_from = temp_dummy, values_fill = 0 ) # Fit models tenure_fit_general <- glm( event ~ factor(year), family = \"binomial\", data = tenure_pp ) tenure_fit_constant <- glm( event ~ 1, family = \"binomial\", data = tenure_pp ) tenure_fit_linear <- update(tenure_fit_constant, . ~ year) tenure_fit_quadratic <- update(tenure_fit_linear, . ~ . + I(year^2)) tenure_fit_cubic <- update(tenure_fit_quadratic, . ~ . + I(year^3)) tenure_fit_order_4 <- update(tenure_fit_cubic, . ~ . + I(year^4)) tenure_fit_order_5 <- update(tenure_fit_order_4, . ~ . + I(year^5)) # Compare anova( tenure_fit_constant, tenure_fit_linear, tenure_fit_quadratic, tenure_fit_cubic, tenure_fit_order_4, tenure_fit_order_5 ) #> Analysis of Deviance Table #> #> Model 1: event ~ 1 #> Model 2: event ~ year #> Model 3: event ~ year + I(year^2) #> Model 4: event ~ year + I(year^2) + I(year^3) #> Model 5: event ~ year + I(year^2) + I(year^3) + I(year^4) #> Model 6: event ~ year + I(year^2) + I(year^3) + I(year^4) + I(year^5) #> Resid. Df Resid. Dev Df Deviance Pr(>Chi) #> 1 1473 1037.57 #> 2 1472 867.46 1 170.103 < 2.2e-16 *** #> 3 1471 836.30 1 31.158 2.379e-08 *** #> 4 1470 833.17 1 3.132 0.07679 . #> 5 1469 832.74 1 0.430 0.51208 #> 6 1468 832.73 1 0.011 0.91831 #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 tenure_fit_trajectories <- map_df( list( constant = tenure_fit_constant, linear = tenure_fit_linear, quadratic = tenure_fit_quadratic, cubic = tenure_fit_cubic, general = tenure_fit_general ), \\(.x) { augment(.x, newdata = tibble(year = 1:9)) }, .id = \"model\" ) tenure_fit_trajectories |> mutate( model = factor( model, levels = c(\"constant\", \"linear\", \"quadratic\", \"cubic\", \"general\") ), hazard = if_else( model %in% c(\"quadratic\", \"general\"), 1 / (1 + exp(-.fitted)), NA ), survival = if_else( model %in% c(\"quadratic\", \"general\"), cumprod(1 - hazard), NA ), .by = model ) |> rename(logit_hazard = .fitted) |> pivot_longer(cols = logit_hazard:survival, names_to = \"estimate\") |> mutate(estimate = factor( estimate, levels = c(\"logit_hazard\", \"hazard\", \"survival\")) ) |> ggplot(aes(x = year, y = value, colour = model)) + geom_line() + scale_color_brewer(type = \"qual\", palette = \"Dark2\") + scale_x_continuous(breaks = 1:9) + facet_wrap(vars(estimate), scales = \"free_y\", labeller = label_both) #> Warning: Removed 54 rows containing missing values or values outside the scale range #> (`geom_line()`)."},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-12.html","id":"using-the-complementary-log-log-link-to-specify-a-discrete-time-hazard-model","dir":"Articles","previous_headings":"","what":"12.2 Using the Complementary Log-Log Link to Specify a Discrete-Time Hazard Model","title":"Chapter 12: Extending the discrete-time hazard model","text":"Figure 12.2: Figure 12.3, page 423: Table 12.3, page 424:","code":"first_sex_pp <- first_sex |> rename(grades = grade) |> reframe( grade = 7:max(grades), event = if_else(grade == grades & censor == 0, 1, 0), parental_transition, parental_antisociality, .by = id ) # The nested map_() is used here so we can get an ID column for both the # link function and the subset. 
map_dfr( list(logit = \"logit\", cloglog = \"cloglog\"), \\(.x) { map_dfr( list(`0` = 0, `1` = 1), \\(.y) { first_sex_fit <- glm( event ~ factor(grade), family = binomial(link = .x), data = first_sex_pp, subset = c(parental_transition == .y) ) augment(first_sex_fit, newdata = tibble(grade = 7:12)) }, .id = \"parental_transition\" ) }, .id = \"link\" ) |> ggplot( aes(x = grade, y = .fitted, colour = parental_transition, linetype = link) ) + geom_line() map_dfr( list(cloglog = \"cloglog\", logit = \"logit\"), \\(.x) { first_sex_fit <- glm( event ~ -1 + factor(grade) + parental_transition, family = binomial(link = .x), data = first_sex_pp ) first_sex_fit |> tidy() |> select(term, estimate) |> mutate( base_hazard = case_when( .x == \"logit\" & term != \"parental_transition\" ~ 1 / (1 + exp(-estimate)), .x == \"cloglog\" & term != \"parental_transition\" ~ 1 - exp(-exp(estimate)) ) ) }, .id = \"link\" ) |> pivot_wider(names_from = link, values_from = c(estimate, base_hazard)) #> # A tibble: 7 × 5 #> term estimate_cloglog estimate_logit base_hazard_cloglog base_hazard_logit #> #> 1 factor(… -2.97 -2.99 0.0498 0.0477 #> 2 factor(… -3.66 -3.70 0.0254 0.0241 #> 3 factor(… -2.32 -2.28 0.0940 0.0927 #> 4 factor(… -1.90 -1.82 0.139 0.139 #> 5 factor(… -1.76 -1.65 0.158 0.161 #> 6 factor(… -1.34 -1.18 0.230 0.235 #> 7 parenta… 0.785 0.874 NA NA"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-12.html","id":"time-varying-predictors","dir":"Articles","previous_headings":"","what":"12.3 Time-Varying Predictors","title":"Chapter 12: Extending the discrete-time hazard model","text":"Figure 12.4, page 432: Figure 12.5, page 437:","code":"first_depression_fit <- glm( depressive_episode ~ poly(I(period - 18), 3, raw = TRUE) + parental_divorce, family = binomial(link = \"logit\"), data = first_depression_1 ) # When a predictor enters the model as part of a matrix of covariates, such as # with stats::poly(), it is represented in augment() as a matrix column. A simple # workaround to get the predictor on its original scale as a vector is to pass # the original data to augment(). first_depression_predictions <- first_depression_fit |> augment(data = first_depression_1) |> mutate(hazard = 1 / (1 + exp(-.fitted))) # Proportions of the risk set at each age who experienced an initial depressive # episode at that age, as function of their parental divorce status at that age. first_depression_proportions <- first_depression_1 |> group_by(period, parental_divorce) |> summarise( total = n(), event = sum(depressive_episode), proportion = event / total, proportion = if_else(proportion == 0, NA, proportion), logit = log(proportion / (1 - proportion)) ) #> `summarise()` has grouped output by 'period'. You can override using the #> `.groups` argument. # Top plot ggplot(mapping = aes(x = period, colour = factor(parental_divorce))) + geom_line( aes(y = hazard), data = first_depression_predictions ) + geom_point( aes(y = proportion), data = first_depression_proportions ) + scale_x_continuous(breaks = seq(0, 40, by = 5), limits = c(0, 40)) + scale_y_continuous(limits = c(0, 0.06)) #> Warning: Removed 14 rows containing missing values or values outside the scale range #> (`geom_point()`). 
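# Note on scales: for a binomial GLM, augment() returns .fitted on the link
# (logit) scale by default (type.predict = "link"), which is why the top plot
# back-transformed it with hazard = 1 / (1 + exp(-.fitted)), while the bottom
# plot below can use .fitted and the sample logits directly.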
# Bottom plot ggplot(mapping = aes(x = period, colour = factor(parental_divorce))) + geom_line( aes(y = .fitted), data = first_depression_predictions ) + geom_point( aes(y = logit), data = first_depression_proportions ) + scale_x_continuous(breaks = seq(0, 40, by = 5), limits = c(0, 40)) + scale_y_continuous(breaks = seq(-8, -2, by = 1), limits = c(-8, -2)) #> Warning: Removed 14 rows containing missing values or values outside the scale range #> (`geom_point()`). first_depression_fit_2 <- update(first_depression_fit, . ~ . + female) first_depression_fit_2 |> augment( newdata = expand_grid( period = 4:39, parental_divorce = c(0, 1), female = c(0, 1) ) ) |> mutate( female = factor(female), parental_divorce = factor(parental_divorce), hazard = 1 / (1 + exp(-.fitted)), survival = cumprod(1 - hazard), .by = c(female, parental_divorce) ) |> pivot_longer(cols = c(hazard, survival), names_to = \"estimate\") |> ggplot(aes(x = period, y = value, linetype = female, colour = parental_divorce)) + geom_line() + facet_wrap(vars(estimate), ncol = 1, scales = \"free_y\") + scale_x_continuous(breaks = seq(0, 40, by = 5), limits = c(0, 40)) + ggh4x::facetted_pos_scales( y = list( estimate == \"hazard\" ~ scale_y_continuous(limits = c(0, .04)), estimate == \"survival\" ~ scale_y_continuous(limits = c(0, 1)) ) )"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-12.html","id":"the-linear-additivity-assumption-uncovering-violations-and-simple-solutions","dir":"Articles","previous_headings":"","what":"12.4 The Linear Additivity Assumption: Uncovering Violations and Simple Solutions","title":"Chapter 12: Extending the discrete-time hazard model","text":"Figure 12.6, page 445: Table 12.4, page 449:","code":"# Raw first_arrest |> group_by(period, abused, black) |> summarise( total = n(), event = sum(event), proportion = event / total, proportion = if_else(proportion == 0, NA, proportion), logit = log(proportion / (1 - proportion)) ) |> ungroup() |> mutate(across(c(abused, black), factor)) |> na.omit() |> ggplot(aes(x = period, y = logit, colour = abused, group = abused)) + geom_line() + scale_x_continuous(breaks = 7:19, limits = c(7, 19)) + scale_y_continuous(limits = c(-7, -2)) + facet_wrap(vars(black), labeller = label_both) #> `summarise()` has grouped output by 'period', 'abused'. You can override using #> the `.groups` argument. # Model first_arrest_fit <- glm( event ~ factor(period) + abused + black + abused:black, family = binomial(link = \"logit\"), data = first_arrest ) first_arrest_fit |> augment( newdata = expand_grid(period = 8:18, abused = c(0, 1), black = c(0, 1)) ) |> ggplot( aes( x = period, y = .fitted, colour = factor(abused), linetype = factor(black) ) ) + geom_line() + scale_x_continuous(breaks = 7:19, limits = c(7, 19)) + scale_y_continuous(limits = c(-8, -2)) model_A <- update(first_depression_fit_2, . ~ . + siblings) model_B <- update( first_depression_fit_2, . ~ . + between(siblings, 1, 2) + between(siblings, 3, 4) + between(siblings, 5, 6) + between(siblings, 7, 8) + between(siblings, 9, Inf) ) model_C <- update(first_depression_fit_2, . ~ . 
+ bigfamily) tidy(model_A) #> # A tibble: 7 × 5 #> term estimate std.error statistic p.value #> #> 1 (Intercept) -4.36 0.122 -35.8 2.23e-281 #> 2 poly(I(period - 18), 3, raw = TRUE)1 0.0611 0.0117 5.24 1.64e- 7 #> 3 poly(I(period - 18), 3, raw = TRUE)2 -0.00731 0.00122 -5.97 2.34e- 9 #> 4 poly(I(period - 18), 3, raw = TRUE)3 0.000182 0.0000790 2.30 2.14e- 2 #> 5 parental_divorce 0.373 0.162 2.29 2.18e- 2 #> 6 female 0.559 0.109 5.10 3.34e- 7 #> 7 siblings -0.0814 0.0223 -3.66 2.57e- 4 tidy(model_B) #> # A tibble: 11 × 5 #> term estimate std.error statistic p.value #> #> 1 (Intercept) -4.50 0.207 -21.8 4.22e-105 #> 2 poly(I(period - 18), 3, raw = TRUE)1 0.0615 0.0117 5.27 1.37e- 7 #> 3 poly(I(period - 18), 3, raw = TRUE)2 -0.00729 0.00122 -5.96 2.56e- 9 #> 4 poly(I(period - 18), 3, raw = TRUE)3 0.000181 0.0000790 2.30 2.17e- 2 #> 5 parental_divorce 0.373 0.162 2.29 2.18e- 2 #> 6 female 0.560 0.110 5.11 3.24e- 7 #> 7 between(siblings, 1, 2)TRUE 0.0209 0.198 0.106 9.16e- 1 #> 8 between(siblings, 3, 4)TRUE 0.0108 0.210 0.0512 9.59e- 1 #> 9 between(siblings, 5, 6)TRUE -0.494 0.255 -1.94 5.22e- 2 #> 10 between(siblings, 7, 8)TRUE -0.775 0.344 -2.26 2.41e- 2 #> 11 between(siblings, 9, Inf)TRUE -0.658 0.344 -1.91 5.56e- 2 tidy(model_C) #> # A tibble: 7 × 5 #> term estimate std.error statistic p.value #> #> 1 (Intercept) -4.48 0.109 -41.2 0 #> 2 poly(I(period - 18), 3, raw = TRUE)1 0.0614 0.0117 5.27 1.40e-7 #> 3 poly(I(period - 18), 3, raw = TRUE)2 -0.00729 0.00122 -5.96 2.54e-9 #> 4 poly(I(period - 18), 3, raw = TRUE)3 0.000182 0.0000790 2.30 2.15e-2 #> 5 parental_divorce 0.371 0.162 2.29 2.22e-2 #> 6 female 0.558 0.109 5.10 3.44e-7 #> 7 bigfamily -0.611 0.145 -4.22 2.39e-5"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-12.html","id":"the-proportionality-assumption-uncovering-violations-and-simple-solutions","dir":"Articles","previous_headings":"","what":"12.5 The proportionality assumption: Uncovering violations and simple solutions","title":"Chapter 12: Extending the discrete-time hazard model","text":"Figure 12.8, page 458: Table 12.5, page 459:","code":"# Raw math_dropout |> group_by(term, woman) |> summarise( total = n(), event = sum(event), proportion = event / total, proportion = if_else(proportion == 0, NA, proportion), logit = log(proportion / (1 - proportion)) ) |> ungroup() |> mutate(across(c(woman), factor)) |> na.omit() |> ggplot(aes(x = term, y = logit, colour = woman)) + geom_line() #> `summarise()` has grouped output by 'term'. You can override using the #> `.groups` argument. # Models model_A <- glm( event ~ -1 + factor(term) + woman, family = binomial(link = \"logit\"), data = math_dropout ) model_B <- glm( event ~ -1 + factor(term) + factor(term):woman, family = binomial(link = \"logit\"), data = math_dropout ) model_C <- update(model_A, . ~ . 
+ woman:I(term - 1)) map_df( list(model_A = model_A, model_B = model_B, model_C = model_C), \\(.x) { .x |> augment(newdata = expand_grid(term = 1:5, woman = c(0, 1))) |> mutate(hazard = 1 / (1 + exp(-.fitted))) }, .id = \"model\" ) |> ggplot(aes(x = term, y = hazard, colour = factor(woman))) + geom_line() + facet_wrap(vars(model)) tidy(model_A) #> # A tibble: 6 × 5 #> term estimate std.error statistic p.value #> #> 1 factor(term)1 -2.13 0.0567 -37.6 0 #> 2 factor(term)2 -0.942 0.0479 -19.7 3.14e- 86 #> 3 factor(term)3 -1.45 0.0634 -22.8 1.66e-115 #> 4 factor(term)4 -0.618 0.0757 -8.16 3.42e- 16 #> 5 factor(term)5 -0.772 0.143 -5.40 6.54e- 8 #> 6 woman 0.379 0.0501 7.55 4.33e- 14 tidy(model_B) #> # A tibble: 10 × 5 #> term estimate std.error statistic p.value #> #> 1 factor(term)1 -2.01 0.0715 -28.1 1.40e-173 #> 2 factor(term)2 -0.964 0.0585 -16.5 5.98e- 61 #> 3 factor(term)3 -1.48 0.0847 -17.5 1.45e- 68 #> 4 factor(term)4 -0.710 0.101 -7.05 1.81e- 12 #> 5 factor(term)5 -0.869 0.191 -4.56 5.23e- 6 #> 6 factor(term)1:woman 0.157 0.0978 1.60 1.09e- 1 #> 7 factor(term)2:woman 0.419 0.0792 5.28 1.27e- 7 #> 8 factor(term)3:woman 0.441 0.116 3.81 1.42e- 4 #> 9 factor(term)4:woman 0.571 0.145 3.95 7.86e- 5 #> 10 factor(term)5:woman 0.601 0.286 2.10 3.55e- 2 tidy(model_C) #> # A tibble: 7 × 5 #> term estimate std.error statistic p.value #> #> 1 factor(term)1 -2.05 0.0646 -31.6 7.80e-220 #> 2 factor(term)2 -0.926 0.0482 -19.2 3.96e- 82 #> 3 factor(term)3 -1.50 0.0665 -22.5 3.54e-112 #> 4 factor(term)4 -0.718 0.0861 -8.34 7.34e- 17 #> 5 factor(term)5 -0.917 0.156 -5.89 3.94e- 9 #> 6 woman 0.227 0.0774 2.94 3.31e- 3 #> 7 woman:I(term - 1) 0.120 0.0470 2.55 1.08e- 2"},{"path":[]},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-12.html","id":"residual-analysis","dir":"Articles","previous_headings":"","what":"12.7 Residual Analysis","title":"Chapter 12: Extending the discrete-time hazard model","text":"Table 12.6, page 465: Figure 12.8, page 467:","code":"first_sex_fit <- glm( event ~ -1 + factor(grade) + parental_transition + parental_antisociality, family = binomial(link = \"logit\"), data = first_sex_pp ) first_sex_fit |> augment(data = first_sex_pp, type.residuals = \"deviance\") |> select(id:parental_antisociality, .resid) |> filter(id %in% c(22, 112, 166, 89, 102, 87, 67, 212)) |> pivot_wider( id_cols = id, names_from = grade, names_prefix = \"grade_\", values_from = .resid ) #> # A tibble: 8 × 7 #> id grade_7 grade_8 grade_9 grade_10 grade_11 grade_12 #> #> 1 22 -0.412 -0.294 -0.584 -0.718 -0.775 1.41 #> 2 67 -0.618 -0.448 -0.856 -1.03 -1.10 1.04 #> 3 87 1.82 NA NA NA NA NA #> 4 89 -0.325 -0.231 -0.464 -0.575 1.86 NA #> 5 102 -0.491 2.37 NA NA NA NA #> 6 112 -0.411 -0.294 -0.583 -0.717 -0.774 -0.956 #> 7 166 -0.661 -0.481 -0.911 -1.09 1.19 NA #> 8 212 -0.286 -0.203 -0.410 -0.509 -0.552 -0.696 first_sex_fit |> augment(data = first_sex_pp, type.residuals = \"deviance\") |> ggplot(aes(x = id, y = .resid)) + geom_point() + geom_hline(yintercept = 0) first_sex_fit |> augment(data = first_sex_pp, type.residuals = \"deviance\") |> group_by(id) |> summarise(ss.deviance = sum(.resid^2)) |> ggplot(aes(x = id, y = ss.deviance)) + geom_point()"},{"path":[]},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-13.html","id":"grouped-methods-for-estimating-continuous-time-survivor-and-hazard-functions","dir":"Articles","previous_headings":"","what":"13.2 Grouped Methods for Estimating Continuous-Time Survivor and Hazard Functions","title":"Chapter 13: Describing continuous-time event 
occurrence data","text":"Table 13.2, page 477: Figure 13.1, page 479:","code":"# Adding discrete-time intervals honking_discrete <- honking |> mutate( event = if_else(censor == 0, 1, 0), time_interval = cut(seconds, breaks = c(1:8, 18), right = FALSE), time_start = str_extract(time_interval, \"[[:digit:]]+(?=,)\"), time_end = str_extract(time_interval, \"(?<=,)[[:digit:]]+\"), across(c(time_start, time_end), as.numeric) ) # Grouped life table honking_grouped <- honking_discrete |> group_by(time_interval, time_start, time_end) |> summarise( total = n(), n_event = sum(event), n_censor = sum(censor), # All grouping needs to be dropped in order to calculate the number at risk # correctly. .groups = \"drop\" ) |> mutate(n_risk = sum(total) - lag(cumsum(total), default = 0)) # The conditional probability can be estimated using the same discrete-time methods # from the previous chapter, using the grouped data. honking_grouped_fit <- glm( cbind(n_event, n_risk - n_event) ~ 0 + time_interval, family = binomial(link = \"logit\"), data = honking_grouped ) honking_grouped_fit |> # .fitted is the conditional probability broom::augment(newdata = honking_grouped, type.predict = \"response\") |> mutate( survival = cumprod(1 - .fitted), hazard = .fitted / (time_end - time_start) ) #> # A tibble: 8 × 10 #> time_interval time_start time_end total n_event n_censor n_risk .fitted #> #> 1 [1,2) 1 2 6 5 1 57 0.0877 #> 2 [2,3) 2 3 17 14 3 51 0.275 #> 3 [3,4) 3 4 11 9 2 34 0.265 #> 4 [4,5) 4 5 10 6 4 23 0.261 #> 5 [5,6) 5 6 4 2 2 13 0.154 #> 6 [6,7) 6 7 4 2 2 9 0.222 #> 7 [7,8) 7 8 1 1 0 5 0.2 #> 8 [8,18) 8 18 4 3 1 4 0.75 #> # ℹ 2 more variables: survival , hazard # Estimates by hand honking_discrete_fit <- honking_grouped |> mutate( conditional_probability = n_event / n_risk, discrete.s = cumprod(1 - conditional_probability), discrete.h = conditional_probability / (time_end - time_start), # The actuarial method redefines the number of individuals to be at risk of # event occurrence for both the survival and hazard functions, and thus has # different conditional probabilities from the discrete method. 
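# In other words (restating the code below): the actuarial survivor estimate
# treats the n_censor cases censored in an interval as at risk for only half
# of it, n_risk.s = n_risk - (n_censor / 2), and the actuarial hazard
# estimate additionally treats the n_event events as occurring at the
# interval midpoint, n_risk.h = n_risk.s - (n_event / 2).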
n_risk.s = n_risk - (n_censor / 2), conditional_probability.s = n_event / n_risk.s, actuarial.s = cumprod(1 - conditional_probability.s), n_risk.h = n_risk.s - (n_event / 2), conditional_probability.h = n_event / n_risk.h, actuarial.h = conditional_probability.h / (time_end - time_start) ) |> select( -c(conditional_probability.s, conditional_probability.h, n_risk.s, n_risk.h) ) |> add_row(time_end = 0:1, discrete.s = 1, actuarial.s = 1) honking_discrete_fit #> # A tibble: 10 × 12 #> time_interval time_start time_end total n_event n_censor n_risk #> #> 1 [1,2) 1 2 6 5 1 57 #> 2 [2,3) 2 3 17 14 3 51 #> 3 [3,4) 3 4 11 9 2 34 #> 4 [4,5) 4 5 10 6 4 23 #> 5 [5,6) 5 6 4 2 2 13 #> 6 [6,7) 6 7 4 2 2 9 #> 7 [7,8) 7 8 1 1 0 5 #> 8 [8,18) 8 18 4 3 1 4 #> 9 NA NA 0 NA NA NA NA #> 10 NA NA 1 NA NA NA NA #> # ℹ 5 more variables: conditional_probability , discrete.s , #> # discrete.h , actuarial.s , actuarial.h honking_discrete_fit |> pivot_longer( cols = c(discrete.h, discrete.s, actuarial.h, actuarial.s), names_to = \"estimate\" ) |> mutate( estimate = factor( estimate, levels = c(\"discrete.s\", \"actuarial.s\", \"discrete.h\", \"actuarial.h\") ) ) |> ggplot(aes(x = time_end, y = value)) + geom_line(data = \\(x) filter(x, str_detect(estimate, \"discrete\"))) + geom_step( data = \\(x) filter(x, str_detect(estimate, \"actuarial\")), direction = \"vh\" ) + scale_x_continuous(limits = c(0, 20)) + facet_wrap(vars(estimate), scales = \"free_y\") + ggh4x::facetted_pos_scales( y = list( str_detect(estimate, \"s$\") ~ scale_y_continuous(limits = c(0, 1)), str_detect(estimate, \"h$\") ~ scale_y_continuous(limits = c(0, .35), breaks = seq(0, .35, by = .05)) ) )"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-13.html","id":"the-kaplan-meier-method-of-estimating-the-continuous-time-survivor-function","dir":"Articles","previous_headings":"","what":"13.3 The Kaplan-Meier Method of Estimating the Continuous-Time Survivor Function","title":"Chapter 13: Describing continuous-time event occurrence data","text":"Table 13.3, page 484: Figure 13.2, page 485:","code":"honking_continuous_fit <- survfit(Surv(seconds, 1 - censor) ~ 1, data = honking) honking_continuous_fit_tidy <- honking_continuous_fit |> survfit0() |> tidy() |> select(-starts_with(\"conf\")) |> mutate( # tidy() returns the standard error for the cumulative hazard, so we need to # transform it into the standard error for the survival. std.error = estimate * std.error, conditional_probability = n.event / n.risk, time_interval = 1:n(), time_end = lead(time, default = Inf), width = time_end - time, hazard = conditional_probability / width ) |> relocate( time_interval, time_start = time, time_end, n.risk:n.censor, conditional_probability, survival = estimate ) honking_continuous_fit_tidy #> # A tibble: 57 × 11 #> time_interval time_start time_end n.risk n.event n.censor #> #> 1 1 0 1.41 57 0 0 #> 2 2 1.41 1.51 57 1 1 #> 3 3 1.51 1.67 55 1 0 #> 4 4 1.67 1.68 54 1 0 #> 5 5 1.68 1.86 53 1 0 #> 6 6 1.86 2.12 52 1 0 #> 7 7 2.12 2.19 51 1 0 #> 8 8 2.19 2.36 50 1 0 #> 9 9 2.36 2.48 49 0 1 #> 10 10 2.48 2.5 48 1 0 #> # ℹ 47 more rows #> # ℹ 5 more variables: conditional_probability , survival , #> # std.error , width , hazard honking_continuous_fit_tidy |> add_row(time_end = 0:1, survival = 1) |> # The largest event time was censored, so we extend the last step out to that # largest censored value rather than going to infinity. 
mutate(time_end = if_else(time_end == Inf, time_start, time_end)) |> ggplot() + geom_step( aes(x = time_end, y = survival, linetype = \"1: Kaplan Meier\"), direction = \"vh\" ) + geom_line( aes(x = time_end, y = discrete.s, linetype = \"2: Discrete-time\"), data = honking_discrete_fit ) + geom_step( aes(x = time_end, y = actuarial.s, linetype = \"3: Actuarial\"), data = honking_discrete_fit, direction = \"vh\" ) + scale_x_continuous(limits = c(0, 20)) + labs(x = \"time\")"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-13.html","id":"the-cumulative-hazard-function","dir":"Articles","previous_headings":"","what":"13.4 The Cumulative Hazard Function","title":"Chapter 13: Describing continuous-time event occurrence data","text":"Figure 13.4, page 493:","code":"honking_continuous_fit_tidy |> mutate(time_end = if_else(time_end == Inf, time_start, time_end)) |> ggplot(aes(x = time_end)) + geom_step( aes(y = -log(survival), linetype = \"Negative log\"), direction = \"vh\" ) + geom_step( aes(y = cumsum(hazard * width), linetype = \"Nelson-Aalen\"), direction = \"vh\" ) #> Warning: Removed 1 row containing missing values or values outside the scale range #> (`geom_step()`)."},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-13.html","id":"kernel-smoothed-estimates-of-the-hazard-function","dir":"Articles","previous_headings":"","what":"13.5 Kernel-Smoothed Estimates of the Hazard Function","title":"Chapter 13: Describing continuous-time event occurrence data","text":"Figure 13.5, page 496:","code":"kernel_smoothed_hazards <- map_df( set_names(1:3), \\(bandwidth) { # muhaz() estimates the hazard function from right-censored data using # kernel-based methods, using the vector of survival and event times. kernel_smoothed_hazard <- muhaz( honking$seconds, 1 - honking$censor, # Narrow the temporal region the smoothed function describes, given the # bandwidth and the minimum and maximum observed event times. min.time = min(honking$seconds[honking$censor == 0]) + bandwidth, max.time = max(honking$seconds[honking$censor == 0]) - bandwidth, bw.grid = bandwidth, bw.method = \"global\", b.cor = \"none\", kern = \"epanechnikov\" ) tidy(kernel_smoothed_hazard) }, .id = \"bandwidth\" ) #> Warning in muhaz(honking$seconds, 1 - honking$censor, min.time = min(honking$seconds[honking$censor == : minimum time > minimum Survival Time #> Warning in muhaz(honking$seconds, 1 - honking$censor, min.time = min(honking$seconds[honking$censor == : minimum time > minimum Survival Time #> Warning in muhaz(honking$seconds, 1 - honking$censor, min.time = min(honking$seconds[honking$censor == : minimum time > minimum Survival Time ggplot(kernel_smoothed_hazards, aes(x = time, y = estimate)) + geom_line() + scale_x_continuous(limits = c(0, 20)) + facet_wrap(vars(bandwidth), ncol = 1, labeller = label_both)"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-13.html","id":"developing-an-intuition-about-continuous-time-survivor-cumulative-hazard-and-kernel-smoothed-hazard-functions","dir":"Articles","previous_headings":"","what":"13.6 Developing an Intuition about Continuous-Time Survivor, Cumulative Hazard, and Kernel-Smoothed Hazard Functions","title":"Chapter 13: Describing continuous-time event occurrence data","text":"Figure 13.6, page 499:","code":"# TODO: Check that models are correct, then tidy up code. 
# Fit survival models alcohol_relapse_fit <- survfit( Surv(weeks, 1 - censor) ~ 1, data = alcohol_relapse ) judges_fit <- survfit( Surv(tenure, dead) ~ 1, data = judges ) first_depression_fit <- survfit( Surv(age, 1 - censor) ~ 1, data = first_depression_2 ) health_workers_fit <- survfit( Surv(weeks, 1 - censor) ~ 1, data = health_workers ) # Tidy survival models survival_models <- list( alcohol_relapse = alcohol_relapse_fit, judges = judges_fit, first_depression = first_depression_fit, health_workers = health_workers_fit ) survival_models_tidy <- map( survival_models, \\(.x) { .x |> survfit0() |> tidy() |> mutate(cumulative_hazard = -log(estimate)) |> select(time, survival = estimate, cumulative_hazard) |> pivot_longer( cols = c(survival, cumulative_hazard), names_to = \"statistic\", values_to = \"estimate\" ) } ) # Estimate and tidy smoothed hazards kernel_smoothed_hazards_tidy <- pmap( list( list( alcohol_relapse = alcohol_relapse$weeks, judges = judges$tenure, first_depression = first_depression_2$age, health_workers = health_workers$weeks ), list( 1 - alcohol_relapse$censor, judges$dead, 1 - first_depression_2$censor, 1 - health_workers$censor ), list(12, 5, 7, 7) ), \\(survival_time, event, bandwidth) { kernel_smoothed_hazard <- muhaz( survival_time, event, min.time = min(survival_time[1 - event == 0]) + bandwidth, max.time = max(survival_time[1 - event == 0]) - bandwidth, bw.grid = bandwidth, bw.method = \"global\", b.cor = \"none\", kern = \"epanechnikov\" ) kernel_smoothed_hazard |> tidy() |> mutate(statistic = \"hazard\") } ) #> Warning in muhaz(survival_time, event, min.time = min(survival_time[1 - : minimum time > minimum Survival Time #> Warning in muhaz(survival_time, event, min.time = min(survival_time[1 - : minimum time > minimum Survival Time #> Warning in muhaz(survival_time, event, min.time = min(survival_time[1 - : minimum time > minimum Survival Time #> Warning in muhaz(survival_time, event, min.time = min(survival_time[1 - : minimum time > minimum Survival Time # Combine estimates estimates_tidy <- map2( survival_models_tidy, kernel_smoothed_hazards_tidy, \\(.x, .y) { bind_rows(.x, .y) |> mutate(statistic = factor( statistic, levels = c(\"survival\", \"cumulative_hazard\", \"hazard\")) ) } ) plots <- map2( estimates_tidy, names(estimates_tidy), \\(.x, .y) { ggplot(.x, aes(x = time, y = estimate)) + geom_step(data = \\(.x) filter(.x, statistic != \"hazard\")) + geom_line(data = \\(.x) filter(.x, statistic == \"hazard\")) + facet_wrap(vars(statistic), ncol = 1, scales = \"free_y\") + labs(title = .y) } ) patchwork::wrap_plots(plots, ncol = 4)"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-14.html","id":"toward-a-statistical-model-for-continuous-time-hazard","dir":"Articles","previous_headings":"","what":"14.1 Toward a Statistical Model for Continuous-Time Hazard","title":"Chapter 14: Fitting the Cox regression model","text":"Figure 14.1, page 505: Figure 14.2, page 508:","code":"# Fit survival models rearrest_fit <- survfit( Surv(months, abs(censor - 1)) ~ 1, data = rearrest ) person_crime_0_fit <- update(rearrest_fit, subset = person_crime == 0) person_crime_1_fit <- update(rearrest_fit, subset = person_crime == 1) # Tidy survival models survival_models <- list( person_crime_0 = person_crime_0_fit, person_crime_1 = person_crime_1_fit ) survival_models_tidy <- map( survival_models, \\(.x) { .x |> survfit0() |> tidy() |> mutate(cumulative_hazard = -log(estimate)) |> select(time, survival = estimate, cumulative_hazard) |> pivot_longer( cols = c(survival, 
cumulative_hazard), names_to = \"statistic\", values_to = \"estimate\" ) } ) # Estimate and tidy smoothed hazards kernel_smoothed_hazards_tidy <- map2( list( person_crime_0 = filter(rearrest, person_crime == 0)$months, person_crime_1 = filter(rearrest, person_crime == 1)$months ), list( abs(filter(rearrest, person_crime == 0)$censor - 1), abs(filter(rearrest, person_crime == 1)$censor - 1) ), \\(survival_time, event) { kernel_smoothed_hazard <- muhaz( survival_time, event, min.time = min(survival_time[1 - event == 0]) + 8, max.time = max(survival_time[1 - event == 0]) - 8, bw.grid = 8, bw.method = \"global\", b.cor = \"none\", kern = \"epanechnikov\" ) kernel_smoothed_hazard |> tidy() |> mutate(statistic = \"hazard\") } ) #> Warning in muhaz(survival_time, event, min.time = min(survival_time[1 - : minimum time > minimum Survival Time #> Warning in muhaz(survival_time, event, min.time = min(survival_time[1 - : minimum time > minimum Survival Time # Combine estimates estimates_tidy <- map2( survival_models_tidy, kernel_smoothed_hazards_tidy, \\(.x, .y) { bind_rows(.x, .y) |> mutate(statistic = factor( statistic, levels = c(\"survival\", \"cumulative_hazard\", \"hazard\")) ) } ) |> list_rbind(names_to = \"person_crime\") # Plot ggplot(estimates_tidy, aes(x = time, y = estimate, linetype = person_crime)) + geom_step(data = \\(.x) filter(.x, statistic != \"hazard\")) + geom_line(data = \\(.x) filter(.x, statistic == \"hazard\")) + facet_wrap(vars(statistic), ncol = 1, scales = \"free_y\") # Top plot log_cumulative_hazards <- estimates_tidy |> filter(statistic == \"cumulative_hazard\") |> mutate(estimate = log(estimate)) |> filter(!is.infinite(estimate)) ggplot( log_cumulative_hazards, aes(x = time, y = estimate, linetype = person_crime) ) + geom_hline(yintercept = 0) + geom_step() + coord_cartesian(xlim = c(0, 30), ylim = c(-6, 1)) # Middle and bottom plots ---- rearrest_fit_2 <- coxph( Surv(months, abs(censor - 1)) ~ person_crime, data = rearrest, method = \"efron\" ) rearrest_fit_2_curves <- map_df( list(person_crime_0 = 0, person_crime_1 = 1), \\(.x) { rearrest_fit_2 |> survfit( newdata = data.frame(person_crime = .x), type = \"kaplan-meier\" ) |> tidy() |> mutate( cumulative_hazard = -log(estimate), log_cumulative_hazard = log(cumulative_hazard) ) }, .id = \"person_crime\" ) # Middle plot rearrest_fit_2 |> augment(data = rearrest, type.predict = \"survival\") |> mutate( cumulative_hazard = -log(.fitted), log_cumulative_hazard = log(cumulative_hazard) ) |> ggplot(mapping = aes(x = months, y = log_cumulative_hazard)) + geom_step(aes(x = time, linetype = person_crime), data = rearrest_fit_2_curves)+ geom_point( aes(shape = person_crime, x = time, y = estimate), data = log_cumulative_hazards ) + scale_y_continuous(breaks = -6:1) + coord_cartesian(xlim = c(0, 30), ylim = c(-6, 1)) # Bottom plot rearrest_fit_2 |> augment(data = rearrest, type.predict = \"survival\") |> mutate( cumulative_hazard = -log(.fitted), log_cumulative_hazard = log(cumulative_hazard) ) |> ggplot(mapping = aes(x = months, y = cumulative_hazard)) + geom_step(aes(x = time, linetype = person_crime), data = rearrest_fit_2_curves) + geom_point( aes(shape = person_crime, x = time, y = estimate), data = filter(estimates_tidy, statistic == \"cumulative_hazard\") ) + coord_cartesian(xlim = c(0, 30))"},{"path":[]},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-14.html","id":"interpreting-the-results-of-fitting-the-cox-regression-model-to-data","dir":"Articles","previous_headings":"","what":"14.3 Interpreting the 
Results of Fitting the Cox Regression Model to Data","title":"Chapter 14: Fitting the Cox regression model","text":"Table 14.1, page 525: Table 14.2, page 533:","code":"# TODO: Make table model_A <- coxph(Surv(months, abs(censor - 1)) ~ person_crime, data = rearrest) model_B <- coxph(Surv(months, abs(censor - 1)) ~ property_crime, data = rearrest) model_C <- coxph(Surv(months, abs(censor - 1)) ~ age, data = rearrest) model_D <- coxph( Surv(months, abs(censor - 1)) ~ person_crime + property_crime + age, data = rearrest ) # TODO"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-14.html","id":"nonparametric-strategies-for-displaying-the-results-of-model-fitting","dir":"Articles","previous_headings":"","what":"14.4 Nonparametric Strategies for Displaying the Results of Model Fitting","title":"Chapter 14: Fitting the Cox regression model","text":"Figure 14.4, page 538: Figure 14.5, page 541:","code":"pmap( list( list(baseline = 0, average = mean(rearrest$person_crime)), list(0, mean(rearrest$property_crime)), list(0, mean(rearrest$age)) ), \\(.person_crime, .property_crime, .age) { model_D_baseline <- model_D |> survfit( newdata = tibble( person_crime = .person_crime, property_crime = .property_crime, age = .age) ) |> survfit0() |> tidy() survival <- ggplot(model_D_baseline, aes(x = time, y = estimate)) + geom_line() + geom_point() + scale_x_continuous(limits = c(0, 29)) + coord_cartesian(xlim = c(0, 36), ylim = c(0, 1)) cumulative_hazard <- ggplot( model_D_baseline, aes(x = time, y = -log(estimate)) ) + geom_line() + geom_point() + scale_x_continuous(limits = c(0, 29)) + coord_cartesian(xlim = c(0, 36), ylim = c(0, 1.5)) #TODO: Not sure if muhaz can deal with this situation with newdata hazard <- muhaz( model_D_baseline$time, 1 - model_D_baseline$n.censor, min.time = min(rearrest$months[rearrest$censor == 0]) + 8, max.time = max(rearrest$months[rearrest$censor == 0]) - 8, bw.grid = 8, bw.method = \"global\", b.cor = \"none\", kern = \"epanechnikov\" ) |> tidy() |> ggplot(aes(x = time, y = estimate)) + geom_line() + coord_cartesian(xlim = c(0, 36), ylim = c(0, 0.08)) survival + cumulative_hazard + hazard + plot_layout(ncol = 1) } ) |> patchwork::wrap_plots() #> Warning in muhaz(model_D_baseline$time, 1 - model_D_baseline$n.censor, min.time = min(rearrest$months[rearrest$censor == : minimum time > minimum Survival Time #> Warning in muhaz(model_D_baseline$time, 1 - model_D_baseline$n.censor, min.time = min(rearrest$months[rearrest$censor == : minimum time > minimum Survival Time #> Warning: Removed 6 rows containing missing values or values outside the scale range #> (`geom_line()`). #> Warning: Removed 6 rows containing missing values or values outside the scale range #> (`geom_point()`). #> Warning: Removed 6 rows containing missing values or values outside the scale range #> (`geom_line()`). #> Warning: Removed 6 rows containing missing values or values outside the scale range #> (`geom_point()`). #> Warning: Removed 6 rows containing missing values or values outside the scale range #> (`geom_line()`). #> Warning: Removed 6 rows containing missing values or values outside the scale range #> (`geom_point()`). #> Warning: Removed 6 rows containing missing values or values outside the scale range #> (`geom_line()`). #> Warning: Removed 6 rows containing missing values or values outside the scale range #> (`geom_point()`). 
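# A possible sketch for the "# TODO: Make table" in section 14.3 above (an
# assumption about the intended layout, not the book's own code): stack the
# tidied coefficients from Models A-D for a Table 14.1-style comparison.
# list(A = model_A, B = model_B, C = model_C, D = model_D) |>
#   map_df(tidy, .id = "model") |>
#   select(model, term, estimate, std.error, statistic, p.value)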
#TODO: Not sure if muhaz can deal with this situation with newdata, not sure if the # estimates can be modified after fitting to get the desired values hazard_fit <- muhaz( rearrest$months, 1 - rearrest$censor, min.time = min(rearrest$months[rearrest$censor == 0]) + 8, max.time = max(rearrest$months[rearrest$censor == 0]) - 8, bw.grid = 8, bw.method = \"global\", b.cor = \"none\", kern = \"epanechnikov\" ) #> Warning in muhaz(rearrest$months, 1 - rearrest$censor, min.time = min(rearrest$months[rearrest$censor == : minimum time > minimum Survival Time hazard_fit |> str() #> List of 7 #> $ pin :List of 13 #> ..$ times : num [1:194] 0.0657 0.1314 0.23 0.2957 0.2957 ... #> ..$ delta : num [1:194] 1 1 1 1 1 1 1 0 1 1 ... #> ..$ nobs : int 194 #> ..$ min.time : num 8.07 #> ..$ max.time : num 21 #> ..$ n.min.grid : num 51 #> ..$ min.grid : num [1:51] 8.07 8.32 8.58 8.84 9.1 ... #> ..$ n.est.grid : num 101 #> ..$ bw.pilot : num 1.03 #> ..$ bw.smooth : num 5.16 #> ..$ method : int 1 #> ..$ b.cor : num 0 #> ..$ kernel.type: num 1 #> $ est.grid : num [1:101] 8.07 8.19 8.32 8.45 8.58 ... #> $ haz.est : num [1:101] 0.0418 0.0419 0.042 0.0421 0.0421 ... #> $ imse.opt : num 0 #> $ bw.glob : num 8 #> $ glob.imse: num 0 #> $ bw.grid : num 8 #> - attr(*, \"class\")= chr \"muhaz\" prototypical_individuals <- map2_df( # .person_crime list(neither = 0, personal_only = 1, property_only = 0, both = 1), # .property_crime list(0, 0, 1, 1), \\(.person_crime, .property_crime) { model_D |> survfit( newdata = tibble( person_crime = .person_crime, property_crime = .property_crime, age = mean(rearrest$age) ) ) |> survfit0() |> tidy() }, .id = \"prototypical_individual\" ) prototypical_individuals_survival <- ggplot( prototypical_individuals, aes(x = time, y = estimate, colour = prototypical_individual)) + geom_line() + scale_x_continuous(limits = c(0, 29)) + coord_cartesian(xlim = c(0, 36), ylim = c(0, 1)) + labs( y = \"Survival\" ) prototypical_individuals_cumhaz <- ggplot( prototypical_individuals, aes(x = time, y = -log(estimate), colour = prototypical_individual)) + geom_line() + scale_x_continuous(limits = c(0, 29)) + coord_cartesian(xlim = c(0, 36), ylim = c(0, 2)) + labs( y = \"Cumulative hazard\" ) prototypical_individuals_logcumhaz <- ggplot( filter(prototypical_individuals, time != 0), aes(x = time, y = log(-log(estimate)), colour = prototypical_individual)) + geom_line() + scale_x_continuous(limits = c(0, 29)) + scale_y_continuous(breaks = -7:1) + coord_cartesian(xlim = c(0, 36), ylim = c(-7, 1)) + labs( y = \"log Cumulative hazard\" ) prototypical_individuals_survival + prototypical_individuals_cumhaz + prototypical_individuals_logcumhaz + plot_layout(ncol = 1, guides = \"collect\") #> Warning: Removed 24 rows containing missing values or values outside the scale range #> (`geom_line()`). #> Removed 24 rows containing missing values or values outside the scale range #> (`geom_line()`). 
#> Removed 24 rows containing missing values or values outside the scale range #> (`geom_line()`)."},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-15.html","id":"time-varying-predictors","dir":"Articles","previous_headings":"","what":"15.1 Time-Varying Predictors","title":"Chapter 15: Extending the Cox regression model","text":"Table 15.1, page 548:","code":"# TODO: Clean up code and make table model_A <- coxph( Surv(used_cocaine_age, 1 - censor) ~ birthyr + early_marijuana_use + early_drug_use, data = first_cocaine ) # Model B ---- first_cocaine_pp <- first_cocaine |> group_by(id) |> reframe( # {survival} uses the counting process method for time-varying predictors, # so we need to construct intervals for the ages at which different events # occurred. These intervals are left-censored, so we start with the end # time; we also only require unique intervals, so duplicate ages should be # removed. age_end = sort(unique(c(used_cocaine_age, used_marijuana_age, used_drugs_age))), age_start = lag(age_end, default = 0), # Time-varying predictors should be lagged so that they describe an individual's # status in the immediately prior year. used_cocaine = if_else( age_end == used_cocaine_age & censor == 0, true = 1, false = 0, missing = 0 ), used_marijuana = if_else( age_end > used_marijuana_age, true = 1, false = 0, missing = 0 ), used_drugs = if_else( age_end > used_drugs_age, true = 1, false = 0, missing = 0 ), # Keep time-invariant predictors from the person-level data birthyr ) |> relocate(age_start, .before = age_end) model_B <- coxph( Surv(age_start, age_end, used_cocaine) ~ birthyr + used_marijuana + used_drugs, data = first_cocaine_pp, ties = \"efron\" ) ## This method with tmerge() also works tmerge( first_cocaine, first_cocaine, id = id, used_cocaine = event(used_cocaine_age, 1 - censor), used_marijuana = tdc(used_marijuana_age), used_drugs = tdc(used_drugs_age), options = list( tstartname = \"age_start\", tstopname = \"age_end\" ) ) |> as_tibble() |> arrange(id) #> Warning: Unknown or uninitialised column: `tstop`. #> Warning: Unknown or uninitialised column: `tstart`. #> Warning in tmerge(first_cocaine, first_cocaine, id = id, used_cocaine = #> event(used_cocaine_age, : replacement of variable 'used_marijuana' #> Warning in tmerge(first_cocaine, first_cocaine, id = id, used_cocaine = #> event(used_cocaine_age, : replacement of variable 'used_drugs' #> # A tibble: 3,086 × 18 #> id used_cocaine_age censor birthyr early_marijuana_use early_drug_use #> #> 1 5 41 1 0 0 0 #> 2 5 41 1 0 0 0 #> 3 5 41 1 0 0 0 #> 4 8 32 1 10 0 0 #> 5 9 36 1 5 0 0 #> 6 9 36 1 5 0 0 #> 7 11 41 1 0 0 0 #> 8 12 32 0 4 0 0 #> 9 12 32 0 4 0 0 #> 10 13 39 1 3 0 0 #> # ℹ 3,076 more rows #> # ℹ 12 more variables: used_marijuana , used_marijuana_age , #> # sold_marijuana , sold_marijuana_age , used_drugs , #> # used_drugs_age , sold_drugs , sold_drugs_age , rural , #> # age_start , age_end , used_cocaine coxph( Surv(age_start, age_end, used_cocaine) ~ birthyr + used_marijuana + used_drugs, data = tmerge( first_cocaine, first_cocaine, id = id, used_cocaine = event(used_cocaine_age, 1 - censor), used_marijuana = tdc(used_marijuana_age), used_drugs = tdc(used_drugs_age), options = list( tstartname = \"age_start\", tstopname = \"age_end\" ) ), ties = \"efron\" ) |> summary() #> Warning: Unknown or uninitialised column: `tstop`. #> Warning: Unknown or uninitialised column: `tstart`. 
#> Warning in tmerge(first_cocaine, first_cocaine, id = id, used_cocaine = #> event(used_cocaine_age, : replacement of variable 'used_marijuana' #> Warning in tmerge(first_cocaine, first_cocaine, id = id, used_cocaine = #> event(used_cocaine_age, : replacement of variable 'used_drugs' #> Warning: Unknown or uninitialised column: `tstop`. #> Warning: Unknown or uninitialised column: `tstart`. #> Warning in tmerge(first_cocaine, first_cocaine, id = id, used_cocaine = #> event(used_cocaine_age, : replacement of variable 'used_marijuana' #> Warning in tmerge(first_cocaine, first_cocaine, id = id, used_cocaine = #> event(used_cocaine_age, : replacement of variable 'used_drugs' #> Call: #> coxph(formula = Surv(age_start, age_end, used_cocaine) ~ birthyr + #> used_marijuana + used_drugs, data = tmerge(first_cocaine, #> first_cocaine, id = id, used_cocaine = event(used_cocaine_age, #> 1 - censor), used_marijuana = tdc(used_marijuana_age), #> used_drugs = tdc(used_drugs_age), options = list(tstartname = \"age_start\", #> tstopname = \"age_end\")), ties = \"efron\") #> #> n= 3086, number of events= 382 #> #> coef exp(coef) se(coef) z Pr(>|z|) #> birthyr 0.10741 1.11340 0.02145 5.008 5.5e-07 *** #> used_marijuana 2.55176 12.82972 0.28095 9.082 < 2e-16 *** #> used_drugs 1.85387 6.38446 0.12921 14.347 < 2e-16 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> #> exp(coef) exp(-coef) lower .95 upper .95 #> birthyr 1.113 0.89815 1.068 1.161 #> used_marijuana 12.830 0.07794 7.397 22.252 #> used_drugs 6.384 0.15663 4.956 8.225 #> #> Concordance= 0.876 (se = 0.008 ) #> Likelihood ratio test= 856 on 3 df, p=<2e-16 #> Wald test = 451.1 on 3 df, p=<2e-16 #> Score (logrank) test = 1039 on 3 df, p=<2e-16 # Model C and D ---- first_cocaine_pp_C <- first_cocaine |> group_by(id) |> reframe( age_end = sort( unique( c( used_cocaine_age, used_marijuana_age, used_drugs_age, sold_marijuana_age, sold_drugs_age ) ) ), age_start = lag(age_end, default = 0), # Time-varying predictors should be lagged so that they describe an individual's # status in the immediately prior year. used_cocaine = if_else( age_end == used_cocaine_age & censor == 0, true = 1, false = 0, missing = 0 ), used_marijuana = if_else( age_end > used_marijuana_age, true = 1, false = 0, missing = 0 ), used_drugs = if_else( age_end > used_drugs_age, true = 1, false = 0, missing = 0 ), sold_marijuana = if_else( age_end > sold_marijuana_age, true = 1, false = 0, missing = 0 ), sold_drugs = if_else( age_end > sold_drugs_age, true = 1, false = 0, missing = 0 ), # Keep time-invariant predictors from the person-level data birthyr, early_marijuana_use, early_drug_use, rural ) |> relocate(age_start, .before = age_end) first_cocaine_model_C <- coxph( Surv(age_start, age_end, used_cocaine) ~ birthyr + used_marijuana + used_drugs + sold_marijuana + sold_drugs, data = first_cocaine_pp_C, ties = \"efron\" ) model_D <- update(first_cocaine_model_C, . ~ . 
+ early_marijuana_use + early_drug_use)"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-15.html","id":"imputation-strategies-for-time-varying-predictors","dir":"Articles","previous_headings":"15.1 Time-Varying Predictors","what":"15.1.3 Imputation Strategies for Time-Varying Predictors","title":"Chapter 15: Extending the Cox regression model","text":"In Section 15.1.3 Singer and Willett (2003) discuss imputation strategies for time-varying predictors using a subset of unpublished data from Hall, Havassy, and Wasserman (1990), who measured the relation between the number of days until relapse to cocaine use and several predictors that might be associated with relapse in a sample of 104 newly abstinent cocaine users who had recently completed an abstinence-oriented treatment program. Former cocaine users were followed for up to 12 weeks post-treatment or until they used cocaine for 7 consecutive days. Self-reported abstinence was confirmed at each interview by the absence of cocaine in urine specimens. For this example we use the cocaine_relapse_2 data set, a person-period data frame with 1248 rows and 7 columns: id: Participant ID. days: Number of days until relapse to cocaine use or censoring. Relapse was defined as 4 or more days of cocaine use during the week preceding an interview. Study dropouts and lost participants were coded as having relapsed to cocaine use, with the number of days until relapse coded as occurring the week after the last follow-up interview they attended. censor: Censoring status (0 = relapsed, 1 = censored). needle: Binary indicator for whether cocaine was ever used intravenously. base_mood: Total score on the positive mood subscales (Activity and Happiness) of the Mood Questionnaire (Ryman, Biersner, & LaRocco, 1974), taken at the intake interview during the last week of treatment. Each item used a five point Likert score (ranging from 0 = not at all, to 4 = extremely). followup: Week of the follow-up interview. mood: Total score on the positive mood subscales (Activity and Happiness) of the Mood Questionnaire (Ryman, Biersner, & LaRocco, 1974), taken at the follow-up interviews each week post-treatment. Each item used a five point Likert score (ranging from 0 = not at all, to 4 = extremely). Because time until relapse was measured in days but follow-up interviews were conducted weekly, the cocaine relapse data in its current form fails to meet the data requirement for time-varying predictors: at each unique event time in days, we do not know the time-varying mood scores for everyone still at risk at those moments. Thus, in order to meet the data requirement we must generate predictor histories that provide near-daily mood scores for each participant. In the following steps we develop three Cox regression models fitted to the cocaine relapse data to illustrate and compare different imputation strategies for time-varying predictors. Each model includes the number of days until relapse to cocaine use as the outcome variable; the time-invariant predictor needle; and a different time-varying variable representing the predictor of interest, the total score on the positive mood subscales of the Mood Questionnaire (Ryman, Biersner, & LaRocco, 1974), to explore the following popular imputation strategies suggested by Singer and Willett (2003): Carry each mood score forward until the next one is available. Interpolate between adjacent mood scores.
Compute a moving average based on the most recent and several past mood scores.","code":"glimpse(cocaine_relapse_2) #> Rows: 1,248 #> Columns: 7 #> $ id 550, 550, 550, 550, 550, 550, 550, 550, 550, 550, 550, 550, … #> $ censor 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, … #> $ days 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, … #> $ needle 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, … #> $ base_mood 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 25, 25, 25, … #> $ followup 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, … #> $ mood 23, 27, 28, 31, 29, 32, 33, 28, 36, 33, 33, 24, 31, 19, 29, …"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-15.html","id":"exploratory-data-analysis","dir":"Articles","previous_headings":"15.1 Time-Varying Predictors > 15.1.3 Imputation Strategies for Time-Varying Predictors","what":"Exploratory Data Analysis","title":"Chapter 15: Extending the Cox regression model","text":"We begin by exploring the time-invariant variables in the cocaine_relapse_2 data. This is convenient to do using a person-level version of the cocaine_relapse_2 data, which we will also need for one of the Cox regression models fitted later on. A total of 62 newly abstinent cocaine users (59.6%) relapsed to cocaine use within 12 weeks of completing the abstinence-oriented treatment program. Most users relapsed early in the follow-up period. Across the sample there were 38 unique event times. A total of 69 participants (66.3%) reported having previously used cocaine intravenously.","code":"cocaine_relapse_2_pl <- cocaine_relapse_2 |> pivot_wider( names_from = followup, names_prefix = \"mood_\", values_from = mood ) glimpse(cocaine_relapse_2_pl) #> Rows: 104 #> Columns: 17 #> $ id 550, 604, 608, 631, 513, 531, 533, 536, 599, 542, 564, 573, … #> $ censor 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, … #> $ days 83, 83, 83, 83, 82, 82, 82, 82, 82, 81, 81, 81, 81, 81, 81, … #> $ needle 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, … #> $ base_mood 29, 25, 37, 39, 33, 27, 10, 27, 28, 19, 35, 32, 32, 31, 31, … #> $ mood_1 23, 31, 40, 42, 43, 14, 16, 22, 29, 31, 30, 27, 27, 41, 40, … #> $ mood_2 27, 19, 37, 22, 25, 11, 26, 21, 28, 25, 33, 26, 27, NA, NA, … #> $ mood_3 28, 29, 36, 38, 34, 2, 37, 24, 25, 28, 33, 24, 23, 38, 31, 4… #> $ mood_4 31, 24, 36, 41, 42, 8, 17, 24, NA, 20, 33, 29, 22, 37, NA, N… #> $ mood_5 29, 22, 32, 41, 46, 3, 30, NA, NA, 23, 26, 21, 26, 24, 28, N… #> $ mood_6 32, 22, 35, 42, 42, 3, 15, 22, 16, 29, 35, 28, 28, 27, 28, 2… #> $ mood_7 33, NA, 34, 42, 46, 5, 16, NA, 22, 27, 35, 22, 24, 27, 31, 2… #> $ mood_8 28, 20, 35, 42, 46, 3, 15, 23, 23, NA, 33, 28, 17, 28, 22, 3… #> $ mood_9 36, 31, 29, 46, 47, 2, 21, 19, 24, NA, 29, 25, 14, 31, 24, 3… #> $ mood_10 33, 33, 36, NA, NA, 2, 14, 21, 16, 31, NA, NA, NA, 37, 33, N… #> $ mood_11 33, 30, 30, 47, 28, 0, 20, 15, 18, 23, 26, 25, 17, 38, 29, 2… #> $ mood_12 24, NA, 36, 43, 44, 0, 16, 18, 16, 21, NA, 25, 19, 34, 30, 2… cocaine_relapse_2_pl |> group_by(relapsed = 1 - censor) |> summarise(count = n()) |> mutate(proportion = count / sum(count)) #> # A tibble: 2 × 3 #> relapsed count proportion #> #> 1 0 42 0.404 #> 2 1 62 0.596 ggplot(cocaine_relapse_2_pl, aes(x = days)) + geom_histogram(binwidth = 7) + scale_x_continuous(breaks = c(0, 1:12 * 7)) + facet_wrap(vars(relapsed = 1 - censor), labeller = label_both) # We will use these event times later on during the imputation procedure for # Model B. It is important they are sorted in ascending order for this # procedure, so we do so here for convenience while creating the object.
event_times <- cocaine_relapse_2_pl |> filter(1 - censor == 1) |> pull(days) |> unique() |> sort() censor_times <- cocaine_relapse_2_pl |> filter(censor == 1) |> pull(days) |> unique() event_times |> discard(\(.x) .x %in% censor_times) |> length() #> [1] 38 cocaine_relapse_2_pl |> group_by(needle) |> summarise(count = n()) |> mutate(proportion = count / sum(count)) #> # A tibble: 2 × 3 #> needle count proportion #> #> 1 0 35 0.337 #> 2 1 69 0.663"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-15.html","id":"model-a-time-invariant-baseline","dir":"Articles","previous_headings":"15.1 Time-Varying Predictors > 15.1.3 Imputation Strategies for Time-Varying Predictors","what":"Model A: Time-Invariant Baseline","title":"Chapter 15: Extending the Cox regression model","text":"Model A uses a time-invariant predictor assessing each respondent’s mood score just after release from treatment.","code":"model_A <- coxph( Surv(days, 1 - censor) ~ needle + base_mood, data = cocaine_relapse_2_pl, ties = \"efron\" ) summary(model_A) #> Call: #> coxph(formula = Surv(days, 1 - censor) ~ needle + base_mood, #> data = cocaine_relapse_2_pl, ties = \"efron\") #> #> n= 104, number of events= 62 #> #> coef exp(coef) se(coef) z Pr(>|z|) #> needle 1.020734 2.775232 0.314068 3.250 0.00115 ** #> base_mood -0.003748 0.996259 0.014709 -0.255 0.79886 #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> #> exp(coef) exp(-coef) lower .95 upper .95 #> needle 2.7752 0.3603 1.4996 5.136 #> base_mood 0.9963 1.0038 0.9679 1.025 #> #> Concordance= 0.63 (se = 0.036 ) #> Likelihood ratio test= 12.51 on 2 df, p=0.002 #> Wald test = 10.6 on 2 df, p=0.005 #> Score (logrank) test = 11.51 on 2 df, p=0.003"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-15.html","id":"model-b","dir":"Articles","previous_headings":"15.1 Time-Varying Predictors > 15.1.3 Imputation Strategies for Time-Varying Predictors","what":"Model B:","title":"Chapter 15: Extending the Cox regression model","text":"For Model B we return to the person-period version of the cocaine_relapse_2 data to explore the first imputation strategy suggested by Singer and Willett (2003): Carrying each mood score forward until the next one is available. As part of this procedure we also lag the mood score predictor by one week, associating, for example, the first followup with the baseline mood scores, the second followup with the first followup’s mood scores, and so forth. Next, to prepare the cocaine_relapse_2_prevweek data for modelling, we want to transform the data set from its number of days until relapse format into counting process (start, stop) format, a type of person-period format where: Each row of the transformed data set represents an “at-risk” time interval (day_start, day_end], which is open on the left and closed on the right. The event variable for a given row is 1 if the time interval ends with an event and 0 otherwise. Variable values for a given row are the values that apply over that time interval. The start and end points of each time interval are determined by the vector of unique event_times we defined earlier. For censored data, the end point of the final time interval is determined by the time of censorship, which may not be included in the vector of unique event times and thus needs to be handled separately. Transforming the cocaine_relapse_2_prevweek data into counting process format is a two-step process. First we create the counting process structure, with columns for participant ID, start time, stop time, and event status for each record. We also add a week variable indicating the week each record occurred in, which is important for the second step of the process. Note that this step can be done using either the person-period or person-level versions of the cocaine_relapse_2 data; however, for readability we use the person-level data here. The same result can be obtained using the person-period data by wrapping the calls to days and censor in unique().
Second, we join the cocaine_relapse_2_prevweek data to the counting process structure by id and week, giving us counting process formatted data with the time-varying predictor’s values occurring in the appropriate time interval for each participant. Finally, to match the text, we rename the mood score variable to week_mood. The survival package also comes with two utility functions, survSplit() and tmerge(), which can be used to transform data into counting process format. For discussion, see vignette(\"timedep\", package=\"survival\"). Now we can fit Model B.","code":"cocaine_relapse_2_prevweek <- cocaine_relapse_2 |> group_by(id) |> mutate( mood_previous_week = lag(mood, default = unique(base_mood)), mood_previous_week_fill = vec_fill_missing( mood_previous_week, direction = \"down\" ) ) cocaine_relapse_2_prevweek #> # A tibble: 1,248 × 9 #> # Groups: id [104] #> id censor days needle base_mood followup mood mood_previous_week #> #> 1 550 1 83 1 29 1 23 29 #> 2 550 1 83 1 29 2 27 23 #> 3 550 1 83 1 29 3 28 27 #> 4 550 1 83 1 29 4 31 28 #> 5 550 1 83 1 29 5 29 31 #> 6 550 1 83 1 29 6 32 29 #> 7 550 1 83 1 29 7 33 32 #> 8 550 1 83 1 29 8 28 33 #> 9 550 1 83 1 29 9 36 28 #> 10 550 1 83 1 29 10 33 36 #> # ℹ 1,238 more rows #> # ℹ 1 more variable: mood_previous_week_fill cocaine_relapse_2_prevweek_cp <- cocaine_relapse_2_pl |> group_by(id) |> reframe( # For censored data the final day should be a participant's days value, so # we need to concatenate their days to the vector of event times. The call # to unique() around the vector removes the duplicate for uncensored data in # the final time interval. day_end = unique(c(event_times[event_times <= days], days)), day_start = lag(day_end, default = 0), event = if_else(day_end == days & censor == 0, true = 1, false = 0), week = floor(day_end / 7) + 1 ) |> relocate(day_start, .after = id) cocaine_relapse_2_prevweek_cp #> # A tibble: 2,805 × 5 #> id day_start day_end event week #> #> 1 501 0 1 0 1 #> 2 501 1 2 0 1 #> 3 501 2 3 0 1 #> 4 501 3 4 0 1 #> 5 501 4 6 0 1 #> 6 501 6 7 0 2 #> 7 501 7 8 0 2 #> 8 501 8 9 0 2 #> 9 501 9 10 0 2 #> 10 501 10 11 0 2 #> # ℹ 2,795 more rows cocaine_relapse_2_prevweek_cp <- cocaine_relapse_2_prevweek_cp |> left_join( cocaine_relapse_2_prevweek, by = join_by(id == id, week == followup) ) |> rename(week_mood = mood_previous_week_fill) cocaine_relapse_2_prevweek_cp #> # A tibble: 2,805 × 12 #> id day_start day_end event week censor days needle base_mood mood #> #> 1 501 0 1 0 1 0 12 1 29 34 #> 2 501 1 2 0 1 0 12 1 29 34 #> 3 501 2 3 0 1 0 12 1 29 34 #> 4 501 3 4 0 1 0 12 1 29 34 #> 5 501 4 6 0 1 0 12 1 29 34 #> 6 501 6 7 0 2 0 12 1 29 19 #> 7 501 7 8 0 2 0 12 1 29 19 #> 8 501 8 9 0 2 0 12 1 29 19 #> 9 501 9 10 0 2 0 12 1 29 19 #> 10 501 10 11 0 2 0 12 1 29 19 #> # ℹ 2,795 more rows #> # ℹ 2 more variables: mood_previous_week , week_mood model_B <- coxph( Surv(day_start, day_end, event) ~ needle + week_mood, data = cocaine_relapse_2_prevweek_cp, ties = \"efron\" ) summary(model_B) #> Call: #> coxph(formula = Surv(day_start, day_end, event) ~ needle + week_mood, #> data = cocaine_relapse_2_prevweek_cp, ties = \"efron\") #> #> n= 2805, number of events= 62 #> #> coef exp(coef) se(coef) z Pr(>|z|) #> needle 1.07959 2.94348 0.31574 3.419 0.000628 *** #> week_mood -0.03490 0.96570 0.01387 -2.517 0.011832 * #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
0.1 ' ' 1 #> #> exp(coef) exp(-coef) lower .95 upper .95 #> needle 2.9435 0.3397 1.5853 5.4654 #> week_mood 0.9657 1.0355 0.9398 0.9923 #> #> Concordance= 0.662 (se = 0.037 ) #> Likelihood ratio test= 18.61 on 2 df, p=9e-05 #> Wald test = 16.62 on 2 df, p=2e-04 #> Score (logrank) test = 17.49 on 2 df, p=2e-04"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-15.html","id":"model-c","dir":"Articles","previous_headings":"15.1 Time-Varying Predictors > 15.1.3 Imputation Strategies for Time-Varying Predictors","what":"Model C:","title":"Chapter 15: Extending the Cox regression model","text":"For Model C we also start with the lagged weekly mood scores; however, we use a different imputation strategy: Interpolating between adjacent mood scores. Although Singer and Willett (2003) suggest “resisting the temptation to design sophisticated imputation algorithms,” their approach to interpolating between adjacent mood scores is somewhat complex. Consequently, we need to create a function to suit our purpose, rather than using existing functions like zoo::na.approx() or imputeTS::na_ma(). Singer and Willett (2003) do not describe their approach in the text, but the algorithm appears to be based on the following rules: Trailing NAs are imputed consecutively using the most recent non-missing mood score. Internal NAs are imputed using the mean of the adjacent non-missing mood scores. For consecutive internal NAs, following the first imputed mood score in the sequence, every NA thereafter is imputed using the mean of the previous NA value’s imputed mood score and the next non-missing mood score. Imputed mood scores are rounded to the nearest integer. Now we can impute the lagged weekly mood scores using the na_adjacent() function. Next, to prepare the cocaine_relapse_2_adjacent data for modelling, we again transform it into counting process format; however, for Model C each “at-risk” time interval is one day long. Following Singer and Willett (2003), we construct the day_mood variable by linearly interpolating between adjacent weekly values to yield daily values, then assigning to a given day the mood value imputed for the immediately prior day. Now we can fit Model C. Table 15.2, page 555:","code":"# na_adjacent() relies on slider::pslide_dbl() and vctrs::vec_fill_missing(). na_adjacent <- function(x) { # The while loop is used here to allow us to carry forward imputed mood scores # for consecutive internal NAs. x_avg <- x while (any(is.na(x_avg[2:length(x)]))) { x_avg <- pslide_dbl( list( x_avg, vec_fill_missing(x_avg, direction = \"down\"), vec_fill_missing(x_avg, direction = \"up\") ), \(.x, .x_fill_down, .x_fill_up) { case_when( # Rule 1: all(is.na(.x[3:length(.x)])) ~ .x_fill_down[2], # Rule 2: !is.na(.x[1]) & is.na(.x[2]) ~ mean(c(.x_fill_up[1], .x_fill_up[2])), TRUE ~ .x[2] ) }, .before = 1, .after = Inf, .complete = TRUE ) # Rule 3. We are not using round() here because it goes to the even digit when # rounding off a 5, rather than always going upward. x_avg <- if_else(x_avg %% 1 < .5, floor(x_avg), ceiling(x_avg)) x_avg[1] <- x[1] } x_avg } cocaine_relapse_2_adjacent <- cocaine_relapse_2_prevweek |> group_by(id) |> mutate( # It's important to include the final follow-up when imputing between # adjacent mood scores, otherwise cases where the second last score is an # internal NA will fill down instead of using the mean between adjacent mood # scores. However, afterwards the final follow-up can be dropped. mood_adjacent_lag = na_adjacent(c(mood_previous_week, last(mood)))[-13], # We also want the non-lagged mood scores for later, which we impute using # similar logic.
mood_adjacent = na_adjacent(c(first(mood_previous_week), mood))[-1] ) # Here is a small preview of the difference between the imputation strategies # for Models B and C: cocaine_relapse_2_adjacent |> filter(id == 544) |> select(id, followup, mood_previous_week:mood_adjacent_lag) #> # A tibble: 12 × 5 #> # Groups: id [1] #> id followup mood_previous_week mood_previous_week_fill mood_adjacent_lag #> #> 1 544 1 40 40 40 #> 2 544 2 40 40 40 #> 3 544 3 38 38 38 #> 4 544 4 27 27 27 #> 5 544 5 NA 27 25 #> 6 544 6 22 22 22 #> 7 544 7 NA 22 21 #> 8 544 8 NA 22 21 #> 9 544 9 20 20 20 #> 10 544 10 NA 20 25 #> 11 544 11 30 30 30 #> 12 544 12 28 28 28 cocaine_relapse_2_adjacent_cp <- cocaine_relapse_2_adjacent |> group_by(id, followup) |> reframe( day_end = (followup - 1) * 7 + 1:7, day_start = day_end - 1, days = unique(days), censor = unique(censor), event = if_else( day_end == days & censor == 0, true = 1, false = 0 ), needle = unique(needle), mood_day = approx(c(mood_adjacent_lag, mood_adjacent), n = 8)[[2]][1:7], ) |> relocate(day_start, day_end, days, .after = id) |> filter(day_end <= days) cocaine_relapse_2_adjacent_cp #> # A tibble: 4,948 × 9 #> id day_start day_end days followup censor event needle mood_day #> #> 1 501 0 1 12 1 0 0 1 29 #> 2 501 1 2 12 1 0 0 1 29.7 #> 3 501 2 3 12 1 0 0 1 30.4 #> 4 501 3 4 12 1 0 0 1 31.1 #> 5 501 4 5 12 1 0 0 1 31.9 #> 6 501 5 6 12 1 0 0 1 32.6 #> 7 501 6 7 12 1 0 0 1 33.3 #> 8 501 7 8 12 2 0 0 1 34 #> 9 501 8 9 12 2 0 0 1 31.9 #> 10 501 9 10 12 2 0 0 1 29.7 #> # ℹ 4,938 more rows model_C <- coxph( Surv(day_start, day_end, event) ~ needle + mood_day, data = cocaine_relapse_2_adjacent_cp, ties = \"efron\" ) summary(model_C) #> Call: #> coxph(formula = Surv(day_start, day_end, event) ~ needle + mood_day, #> data = cocaine_relapse_2_adjacent_cp, ties = \"efron\") #> #> n= 4948, number of events= 62 #> #> coef exp(coef) se(coef) z Pr(>|z|) #> needle 1.12077 3.06720 0.31700 3.536 0.000407 *** #> mood_day -0.05438 0.94707 0.01489 -3.651 0.000261 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> #> exp(coef) exp(-coef) lower .95 upper .95 #> needle 3.0672 0.326 1.6478 5.7091 #> mood_day 0.9471 1.056 0.9198 0.9751 #> #> Concordance= 0.695 (se = 0.036 ) #> Likelihood ratio test= 25.52 on 2 df, p=3e-06 #> Wald test = 23.1 on 2 df, p=1e-05 #> Score (logrank) test = 24.04 on 2 df, p=6e-06 # TODO"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-15.html","id":"nonproportional-hazards-models-via-stratification","dir":"Articles","previous_headings":"","what":"15.2 Nonproportional Hazards Models via Stratification","title":"Chapter 15: Extending the Cox regression model","text":"Figure 15.2, page 559: Table 15.3, page 560:","code":"# FIXME: The upper limit of the data doesn't match the textbook. 
survfit(Surv(used_cocaine_age, 1 - censor) ~ rural, data = first_cocaine) |> tidy() |> mutate( strata = stringr::str_remove(strata, \"rural=\"), cumulative_hazard = -log(estimate), log_cumulative_hazard = log(cumulative_hazard) ) |> rename(rural = strata) |> ggplot(aes(x = time, y = log_cumulative_hazard, linetype = rural)) + geom_line() + coord_cartesian(ylim = c(-6, -1)) # first_cocaine_model_C from earlier is the first model first_cocaine_model_C #> Call: #> coxph(formula = Surv(age_start, age_end, used_cocaine) ~ birthyr + #> used_marijuana + used_drugs + sold_marijuana + sold_drugs, #> data = first_cocaine_pp_C, ties = \"efron\") #> #> coef exp(coef) se(coef) z p #> birthyr 0.08493 1.08864 0.02183 3.890 1e-04 #> used_marijuana 2.45920 11.69542 0.28357 8.672 < 2e-16 #> used_drugs 1.25110 3.49419 0.15656 7.991 1.34e-15 #> sold_marijuana 0.68989 1.99349 0.12263 5.626 1.84e-08 #> sold_drugs 0.76037 2.13908 0.13066 5.819 5.91e-09 #> #> Likelihood ratio test=944.5 on 5 df, p=< 2.2e-16 #> n= 3312, number of events= 382 first_cocaine_model_stratified <- update( first_cocaine_model_C, . ~ . + strata(rural) ) first_cocaine_model_nonrural <- update( first_cocaine_model_C, subset = rural == 0 ) first_cocaine_model_rural <- update( first_cocaine_model_C, subset = rural == 1 ) # TODO: Make table."},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-15.html","id":"nonproportional-hazards-models-via-interactions-with-time","dir":"Articles","previous_headings":"","what":"15.3 Nonproportional Hazards Models via Interactions with Time","title":"Chapter 15: Extending the Cox regression model","text":"Table 15.4 page 566: Figure 15.3 page 567:","code":"# TODO psychiatric_discharge #> # A tibble: 174 × 4 #> id days censor treatment_plan #> #> 1 2 3 0 1 #> 2 8 46 0 0 #> 3 73 30 0 0 #> 4 76 45 0 0 #> 5 78 22 0 0 #> 6 79 50 0 0 #> 7 81 59 0 0 #> 8 83 44 0 0 #> 9 95 44 0 1 #> 10 117 22 0 0 #> # ℹ 164 more rows # FIXME: The upper limit of the data doesn't match the textbook. survfit(Surv(days, 1 - censor) ~ treatment_plan, data = psychiatric_discharge) |> tidy() |> mutate( strata = stringr::str_remove(strata, \"treatment_plan=\"), cumulative_hazard = -log(estimate), log_cumulative_hazard = log(cumulative_hazard) ) |> rename(treatment_plan = strata) |> ggplot(aes(x = time, y = log_cumulative_hazard, linetype = treatment_plan)) + geom_hline(yintercept = 0, linewidth = .25, linetype = 3) + geom_line() + coord_cartesian(xlim = c(0, 77), ylim = c(-4, 2)) # TODO: Bottom panel"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-15.html","id":"regression-diagnostics","dir":"Articles","previous_headings":"","what":"15.4 Regression Diagnostics","title":"Chapter 15: Extending the Cox regression model","text":"Figure 15.4 page 573: Figure 15.5 page 577: Figure 15.6 page 580: Figure 15.7 page 583:","code":"rearrest <- rearrest |> mutate(rank_time = rank(months, ties.method = \"average\"), .after = \"months\") rearrest_null_model <- coxph(Surv(months, 1 - censor) ~ 1, data = rearrest) rearrest_full_model <- update( rearrest_null_model, . ~ . 
+ person_crime + property_crime + age ) rearrest_models <- list( null = rearrest_null_model, full = rearrest_full_model ) rearrest_fits <- rearrest_models |> map( \\(.x) { map_df( list(martingale = \"martingale\", deviance = \"deviance\"), \\(.y) augment( .x, data = rearrest, type.predict = \"lp\", type.residuals = .y ), .id = \".resid_type\" ) } ) |> list_rbind(names_to = \"model\") |> mutate( model = factor(model, levels = c(\"null\", \"full\")), censored = as.logical(censor) ) rearrest_fits |> filter(.resid_type == \"martingale\") |> ggplot(aes(x = age, y = .resid)) + geom_hline(yintercept = 0, linewidth = .25, linetype = 3) + geom_point(aes(shape = censored)) + scale_shape_manual(values = c(16, 3)) + geom_smooth(se = FALSE) + facet_wrap(vars(model), ncol = 1, labeller = label_both) + coord_cartesian(ylim = c(-3, 1)) #> `geom_smooth()` using method = 'loess' and formula = 'y ~ x' stem(resid(rearrest_full_model, type = \"deviance\"), scale = 2) #> #> The decimal point is 1 digit(s) to the left of the | #> #> -22 | 09 #> -20 | 21 #> -18 | 6491 #> -16 | 969321 #> -14 | 54216 #> -12 | 87654741 #> -10 | 208776542110 #> -8 | 99852176544 #> -6 | 876322098874400 #> -4 | 444443332877644210 #> -2 | 85009955411 #> -0 | 997597551 #> 0 | 268979 #> 2 | 0318 #> 4 | 133678802337 #> 6 | 2688919 #> 8 | 1136690012233889999 #> 10 | 122334724789 #> 12 | 0115 #> 14 | 0228055578 #> 16 | 03795 #> 18 | 2336 #> 20 | 0735 #> 22 | 6 #> 24 | 00 #> 26 | 7 rearrest_fits |> filter(model == \"full\" & .resid_type == \"deviance\") |> ggplot(aes(x = .fitted, y = .resid)) + geom_hline(yintercept = 0, linewidth = .25, linetype = 3) + geom_point(aes(shape = censored)) + scale_shape_manual(values = c(16, 3)) + scale_x_continuous(breaks = -3:3) + scale_y_continuous(breaks = -3:3) + coord_cartesian(xlim = c(-3, 3), ylim = c(-3, 3)) # augment.coxph is bugged and won't return the .resid column when using # `newdata`, likely related to this issue: https://github.com/tidymodels/broom/issues/937 # So this code doesn't work: # augment( # rearrest_full_model, # newdata = filter(rearrest, censor == 0), # type.predict = \"lp\", # type.residuals = \"schoenfeld\" # ) # Likewise, `data` can't be used because it expects the full dataset; thus, it # will error out even when using the filtered data. # However, updating the model first does work: # Schoenfeld residuals only pertain to those who experience the event, so we need # to update the model before retrieving them, and only use a subset of the data # when getting predictions. 
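# Aside (not part of the textbook's workflow): before inspecting Schoenfeld
# residuals by hand, the proportional hazards assumption can also be checked
# with survival's cox.zph(), which tests scaled Schoenfeld residuals against
# time. A minimal sketch using the model fit above:
rearrest_zph <- cox.zph(rearrest_full_model)
rearrest_zph       # per-predictor and global chi-square tests
plot(rearrest_zph) # scaled Schoenfeld residuals against time, one panel each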
rearrest_full_model |> update(subset = censor == 0) |> augment( data = filter(rearrest, censor == 0), type.predict = \"lp\", type.residuals = \"schoenfeld\" ) |> mutate(.resid = as.data.frame(.resid)) |> unnest_wider(col = .resid, names_sep = \"_\") |> pivot_longer( cols = starts_with(\".resid\"), names_to = \"predictor\", values_to = \".resid\" ) |> mutate( predictor = stringr::str_remove(predictor, \".resid_\"), predictor = factor( predictor, levels = c(\"person_crime\", \"property_crime\", \"age\") ) ) |> ggplot(aes(x = rank_time, y = .resid)) + geom_hline(yintercept = 0, linewidth = .25, linetype = 3) + geom_point() + scale_shape_manual(values = c(16, 3)) + geom_smooth(se = FALSE, span = 1) + facet_wrap( vars(predictor), ncol = 1, scales = \"free_y\", labeller = label_both ) + scale_x_continuous(n.breaks = 8) + ggh4x::facetted_pos_scales( y = list( predictor == \"person_crime\" ~ scale_y_continuous(limits = c(-.5, 1)), predictor == \"property_crime\" ~ scale_y_continuous( n.breaks = 7, limits = c(-1, .2) ), predictor == \"age\" ~ scale_y_continuous(limits = c(-10, 20)) ) ) + coord_cartesian(xlim = c(0, 175)) #> `geom_smooth()` using method = 'loess' and formula = 'y ~ x' # TODO: set y-axis scales to match textbook. rearrest_full_model |> augment( data = rearrest, type.predict = \"lp\", type.residuals = \"score\" ) |> mutate(.resid = as.data.frame(.resid)) |> unnest_wider(col = .resid, names_sep = \"_\") |> pivot_longer( cols = starts_with(\".resid\"), names_to = \"predictor\", values_to = \".resid\" ) |> mutate( predictor = stringr::str_remove(predictor, \".resid_\"), predictor = factor( predictor, levels = c(\"person_crime\", \"property_crime\", \"age\") ), censored = as.logical(censor) ) |> ggplot(aes(x = rank_time, y = .resid)) + geom_hline(yintercept = 0, linewidth = .25, linetype = 3) + geom_point(aes(shape = censored)) + scale_shape_manual(values = c(16, 3)) + facet_wrap( vars(predictor), ncol = 1, scales = \"free_y\", labeller = label_both )"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-15.html","id":"competing-risks","dir":"Articles","previous_headings":"","what":"15.5 Competing Risks","title":"Chapter 15: Extending the Cox regression model","text":"Figure 15.8 page 589: Table 15.7 page 592:","code":"judges_null_models <- list( dead = survfit(Surv(tenure, dead) ~ 1, data = judges), retired = survfit(Surv(tenure, retired) ~ 1, data = judges) ) judges_null_models_tidy <- map( judges_null_models, \\(.x) { .x |> survfit0() |> tidy() |> mutate(cumulative_hazard = -log(estimate)) |> select(time, survival = estimate, cumulative_hazard) |> pivot_longer( cols = c(survival, cumulative_hazard), names_to = \"statistic\", values_to = \"estimate\" ) } ) # Estimate and tidy smoothed hazards judges_kernel_smoothed_hazards_tidy <- map( list( judges_dead = judges$dead, judges_retired = judges$retired ), \\(event) { kernel_smoothed_hazard <- muhaz( judges$tenure, event, min.time = min(judges$tenure[event == 0]) + 6, max.time = max(judges$tenure[event == 0]) - 6, bw.grid = 6, bw.method = \"global\", b.cor = \"none\", kern = \"epanechnikov\" ) kernel_smoothed_hazard |> tidy() |> mutate(statistic = \"hazard\") } ) #> Warning in muhaz(judges$tenure, event, min.time = min(judges$tenure[event == : minimum time > minimum Survival Time #> Warning in muhaz(judges$tenure, event, min.time = min(judges$tenure[event == : minimum time > minimum Survival Time # Combine estimates estimates_tidy <- map2_df( judges_null_models_tidy, judges_kernel_smoothed_hazards_tidy, \\(.x, .y) { bind_rows(.x, 
.y) |> mutate(statistic = factor( statistic, levels = c(\"survival\", \"cumulative_hazard\", \"hazard\")) ) }, .id = \"event\" ) ggplot(estimates_tidy, aes(x = time, y = estimate, linetype = event)) + geom_step(data = \\(.x) filter(.x, statistic != \"hazard\")) + geom_line(data = \\(.x) filter(.x, statistic == \"hazard\")) + facet_wrap(vars(statistic), ncol = 1, scales = \"free_y\") judges_model_A <- coxph( Surv(tenure, dead) ~ appointment_age + appointment_year, data = judges ) judges_model_B <- coxph( Surv(tenure, retired) ~ appointment_age + appointment_year, data = judges ) judges_model_C <- coxph( Surv(tenure, left_appointment) ~ appointment_age + appointment_year, data = judges ) # TODO: Make table."},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-15.html","id":"late-entry-into-the-risk-set","dir":"Articles","previous_headings":"","what":"15.6 Late Entry into the Risk Set","title":"Chapter 15: Extending the Cox regression model","text":"Table 15.8 page 601:","code":"# Model A ---- # First we need to transform to a counting process format. physicians_event_times_A <- physicians |> filter(1 - censor == 1) |> pull(exit) |> unique() |> sort() # We'll use survSplit() this time around. physicians_cp_A <- physicians |> mutate(event = 1 - censor) |> survSplit( Surv(entry, exit, event) ~ ., data = _, cut = physicians_event_times_A, end = \"exit\" ) |> as_tibble() # The warning message here can be ignored. physicians_model_A <- coxph( Surv(entry, exit, event) ~ part_time + age + age:exit, data = physicians_cp_A ) #> Warning in coxph(Surv(entry, exit, event) ~ part_time + age + age:exit, : a #> variable appears on both the left and right sides of the formula # Model B ---- physicians_event_times_B <- physicians |> filter(1 - censor == 1 & entry == 0) |> pull(exit) |> unique() |> sort() physicians_cp_B <- physicians |> filter(entry == 0) |> mutate(event = 1 - censor) |> survSplit( Surv(entry, exit, event) ~ ., data = _, cut = physicians_event_times_B, end = \"exit\" ) |> as_tibble() physicians_model_B <- coxph( Surv(entry, exit, event) ~ part_time + age + age:exit, data = physicians_cp_B ) #> Warning in coxph(Surv(entry, exit, event) ~ part_time + age + age:exit, : a #> variable appears on both the left and right sides of the formula # Model C ---- physicians_cp_C <- physicians |> mutate( event = 1 - censor, entry = 0 ) |> survSplit( Surv(entry, exit, event) ~ ., data = _, cut = physicians_event_times_A, end = \"exit\" ) |> as_tibble() physicians_model_C <- coxph( Surv(entry, exit, event) ~ part_time + age + age:exit, data = physicians_cp_C ) #> Warning in coxph(Surv(entry, exit, event) ~ part_time + age + age:exit, : a #> variable appears on both the left and right sides of the formula # TODO: Make table and clean up code."},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-15.html","id":"using-late-entrants-to-introduce-alternative-metrics-for-clocking-time","dir":"Articles","previous_headings":"15.6 Late Entry into the Risk Set","what":"15.6.2 Using Late Entrants to Introduce Alternative Metrics for Clocking Time","title":"Chapter 15: Extending the Cox regression model","text":"Table 15.9 page 604:","code":"monkeys_model_A <- coxph( Surv(sessions, 1 - censor) ~ initial_age + birth_weight + female, data = monkeys ) monkeys_model_B <- update(monkeys_model_A, Surv(end_age, 1 - censor) ~ .) # The warning message here can be ignored. monkeys_model_C <- update( monkeys_model_A, Surv(initial_age, end_age, 1 - censor) ~ . 
) #> Warning in coxph(formula = Surv(initial_age, end_age, 1 - censor) ~ initial_age #> + : a variable appears on both the left and right sides of the formula # TODO: Make table."},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-2.html","id":"creating-a-longitudinal-data-set","dir":"Articles","previous_headings":"","what":"2.1 Creating a longitudinal data set","title":"Chapter 2: Exploring longitudinal data on change","text":"In Section 2.1 Singer and Willett (2003) introduce two distinct formats of data organization for longitudinal data—the person-level format and the person-period format—using a subset of data from the National Youth Survey (NYS) measuring the development of tolerance towards deviant behaviour in adolescents over time, in relation to self-reported sex and exposure to deviant peers (Raudenbush & Chan, 1992). Adolescents' tolerance towards deviant behaviour was measured with a 9-item scale assessing attitudes tolerant of deviant behaviour. The scale was administered each year from age 11 to 15, making it a time-varying variable. However, adolescents' self-reported sex and exposure to deviant peers were recorded only at the beginning of the study period, making them time-invariant variables. In this example we illustrate the difference between the two formats using the deviant_tolerance_pl and deviant_tolerance_pp data sets, which correspond to the adolescent tolerance of deviant behaviour data organized in the person-level and person-period formats, respectively.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-2.html","id":"the-person-level-data-set","dir":"Articles","previous_headings":"2.1 Creating a longitudinal data set","what":"The Person-Level Data Set","title":"Chapter 2: Exploring longitudinal data on change","text":"In the person-level format (also known as wide or multivariate format), each person has only one row of data, with multiple columns containing the data for each measurement occasion of the time-varying variables. This format is demonstrated in the deviant_tolerance_pl data set, a person-level data frame with 16 rows and 8 columns: id: Participant ID. tolerance_11, tolerance_12, tolerance_13, tolerance_14, tolerance_15: Average score across a 9-item scale assessing attitudes favourable towards deviant behaviour at ages 11, 12, 13, 14, and 15. Each item used a four point scale (1 = very wrong, 2 = wrong, 3 = a little bit wrong, 4 = not wrong at all). male: Binary indicator for whether the adolescent was male. exposure: Average score across a 9-item scale assessing level of exposure to deviant peers. Each item used a five point Likert score (ranging from 0 = none, to 4 = all). Although the person-level format is common in cross-sectional research, it has four disadvantages that make it ill-suited for longitudinal data analysis: It restricts data analysis to examining rank order wave-to-wave relationships, leading to non-informative summaries that tell us nothing about how each person changes over time, nor even the direction of change. It omits an explicit time-indicator variable, rendering time unavailable for data analysis. It requires adding an additional variable to the data set for each unique measurement occasion, making it inefficient or useless when the number and spacing of measurement occasions varies across individuals. It requires adding an additional set of columns for each time-varying predictor (one column per measurement occasion), rendering it unable to easily handle the presence of time-varying predictors. Singer and Willett (2003) exemplify the first two of these disadvantages by postulating how one might analyze the person-level tolerance towards deviant behaviour data set. A natural approach would be to summarize the wave-to-wave relationships among tolerance_11 through tolerance_15 using bivariate correlations and bivariate plots; however, these tell us nothing about how adolescent tolerance towards deviant behaviour changed over time for either individuals or groups. 
Rather, the weak positive correlations between measurement occasions merely tell us that the rank order of tolerance towards deviant behaviour remained relatively stable across occasions—that is, adolescents who were more tolerant towards deviant behaviour at one measurement occasion tended to also be more tolerant at the next. The first disadvantage is also apparent when examining bivariate plots between measurement occasions: there is no way to tell how adolescent tolerance towards deviant behaviour changed over time for either individuals or groups. Moreover, given the lack of an explicit time-indicator variable, it isn't possible to plot the person-level data set in a meaningful way, such as a time series plot organized by id. Considered together, these disadvantages make the person-level format ill-suited for most longitudinal data analyses. Fortunately, the disadvantages of the person-level format can be addressed with a simple conversion to the person-period format.","code":"deviant_tolerance_pl #> # A tibble: 16 × 8 #> id tolerance_11 tolerance_12 tolerance_13 tolerance_14 tolerance_15 male #> #> 1 9 2.23 1.79 1.9 2.12 2.66 0 #> 2 45 1.12 1.45 1.45 1.45 1.99 1 #> 3 268 1.45 1.34 1.99 1.79 1.34 1 #> 4 314 1.22 1.22 1.55 1.12 1.12 0 #> 5 442 1.45 1.99 1.45 1.67 1.9 0 #> 6 514 1.34 1.67 2.23 2.12 2.44 1 #> 7 569 1.79 1.9 1.9 1.99 1.99 0 #> 8 624 1.12 1.12 1.22 1.12 1.22 1 #> 9 723 1.22 1.34 1.12 1 1.12 0 #> 10 918 1 1 1.22 1.99 1.22 0 #> 11 949 1.99 1.55 1.12 1.45 1.55 1 #> 12 978 1.22 1.34 2.12 3.46 3.32 1 #> 13 1105 1.34 1.9 1.99 1.9 2.12 1 #> 14 1542 1.22 1.22 1.99 1.79 2.12 0 #> 15 1552 1 1.12 2.23 1.55 1.55 0 #> 16 1653 1.11 1.11 1.34 1.55 2.12 0 #> # ℹ 1 more variable: exposure # Table 2.1, page 20: deviant_tolerance_pl |> select(starts_with(\"tolerance\")) |> correlate(diagonal = 1) |> shave() |> fashion() #> term tolerance_11 tolerance_12 tolerance_13 tolerance_14 tolerance_15 #> 1 tolerance_11 1.00 #> 2 tolerance_12 .66 1.00 #> 3 tolerance_13 .06 .25 1.00 #> 4 tolerance_14 .14 .21 .59 1.00 #> 5 tolerance_15 .26 .39 .57 .83 1.00 deviant_tolerance_pl |> select(starts_with(\"tolerance\")) |> pairs()"}
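As a quick aside (a small sketch using the person-level data above), about the closest the wide format comes to a group-level change summary is a set of per-wave means—and even these say nothing about individual trajectories:

deviant_tolerance_pl |>
  summarise(across(starts_with("tolerance_"), mean)) # one mean per wave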
,{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-2.html","id":"the-person-period-data-set","dir":"Articles","previous_headings":"2.1 Creating a longitudinal data set","what":"The Person-Period Data Set","title":"Chapter 2: Exploring longitudinal data on change","text":"In the person-period format (also known as long or univariate format), each person has one row of data for each measurement occasion, with a participant identifier variable for each person and a time-indicator variable for each measurement occasion. In this format, time-invariant variables have identical values across each measurement occasion, whereas time-varying variables have potentially differing values. This format is demonstrated in the deviant_tolerance_pp data set, a person-period data frame with 80 rows and 5 columns: id: Participant ID. age: Adolescent age in years. tolerance: Average score across a 9-item scale assessing attitudes favourable towards deviant behaviour. Each item used a four point scale (1 = very wrong, 2 = wrong, 3 = a little bit wrong, 4 = not wrong at all). male: Binary indicator for whether the adolescent was male. exposure: Average score across a 9-item scale assessing level of exposure to deviant peers. Each item used a five point Likert score (ranging from 0 = none, to 4 = all). Although the person-period data set contains the same information as the person-level data set, its format of data organization makes it more amenable to longitudinal data analysis, specifically: It includes an explicit participant identifier variable, enabling the data to be sorted into person-specific subsets. It includes an explicit time-indicator variable, rendering time available for data analysis, and accommodating research designs where the number and spacing of measurement occasions varies across individuals. It needs only a single column for each variable in the data set—whether time-varying or time-invariant, outcome or predictor—making it trivial to handle any number of variables. Indeed, most R functions are designed to work with data in the person-period format—which falls under the larger umbrella of the tidy data format—due to R's vectorized nature. As Wickham, Çetinkaya-Rundel, and Grolemund (2023) explain, there are three interrelated rules that make a data set tidy: Each variable must have its own column. Each observation must have its own row. Each value must have its own cell. Thus, the person-period format is simply a special case of the tidy data format, distinguished by its longitudinal nature and its requirements of explicit participant identifier and time-indicator variables.","code":"deviant_tolerance_pp #> # A tibble: 80 × 5 #> id age tolerance male exposure #> #> 1 9 11 2.23 0 1.54 #> 2 9 12 1.79 0 1.54 #> 3 9 13 1.9 0 1.54 #> 4 9 14 2.12 0 1.54 #> 5 9 15 2.66 0 1.54 #> 6 45 11 1.12 1 1.16 #> 7 45 12 1.45 1 1.16 #> 8 45 13 1.45 1 1.16 #> 9 45 14 1.45 1 1.16 #> 10 45 15 1.99 1 1.16 #> # ℹ 70 more rows"}
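To make those advantages concrete, here is a small sketch (using only functions already loaded above) of a per-person summary that the person-period format makes trivial—each adolescent's average annual change in tolerance:

deviant_tolerance_pp |>
  group_by(id) |>
  summarise(
    # rise over run between the first and last measurement occasions
    change_per_year = (last(tolerance) - first(tolerance)) / (last(age) - first(age))
  )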
,{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-2.html","id":"converting-between-person-level-and-person-period-data-sets","dir":"Articles","previous_headings":"2.1 Creating a longitudinal data set","what":"Converting Between Person-Level and Person-Period Data Sets","title":"Chapter 2: Exploring longitudinal data on change","text":"Unfortunately, longitudinal data is often initially stored as a person-level data set, meaning that most real analyses will require at least a little tidying to get the data into the person-period format. The likely reasons for this are that: Many people aren't familiar with the principles of tidy data—or special cases like the person-level and person-period formats—and it's hard to derive them without spending a lot of time working with longitudinal data. The person-level format more closely resembles the familiar cross-sectional data-set format, making it a seemingly sensible default for inexperienced analysts. Data is often organized to facilitate non-analytical goals, such as data entry, rather than data analysis. Thus, an essential skill for the aspiring longitudinal data analyst is to be able to convert a person-level data set into a person-period data set. The tidyr package provides two functions that can easily convert a longitudinal data set from one format to the other: pivot_longer() and pivot_wider(). To convert a person-level data set into a person-period data set we use pivot_longer(): For person-level data, there are five key arguments: cols specifies which columns need to be pivoted into longer format—for longitudinal data, these will always be the columns corresponding to the time-varying variables. This argument uses tidy selection, a small data science language for selecting columns in a data frame (?tidyr_tidy_select), making it simple to select each column of a time-varying variable based on its naming pattern. names_to names the new column (or columns) to create from the information stored in the column names specified by cols. We named the new column age. names_prefix removes matching text from the start of each column name—for longitudinal data, this will always be the prefix of the time-varying variables separating the variable name from the measurement occasion. This argument uses a regular expression to select the matching text. names_transform applies a function to the new column (or columns). We converted the new column age from type character to type integer. values_to names the new column (or columns) to create from the data stored in the cell values. We named the new column tolerance. Note that \"age\" and \"tolerance\" are quoted in the call to pivot_longer() because they represent the column names of new variables we're creating, rather than already-existing variables in the data. Although most longitudinal data analyses begin by getting the data into the person-period format, it can occasionally be useful to go in the opposite direction. Not all computations are made easier using a person-period data set, and not all functions and analyses expect a person-period data set; therefore, it's helpful to know how to untidy, transform, and re-tidy your data as needed. To convert a person-period data set into a person-level data set we use tidyr::pivot_wider(): For person-period data, there are three key arguments: names_from specifies which column (or columns) to get the name of the output columns from—for longitudinal data, these will always be the columns corresponding to the time-indicator variables. names_prefix adds the specified string to the start of each output column name—for longitudinal data, this will always be the prefix of the time-varying variables separating the variable name from the measurement occasion. values_from specifies which column (or columns) to get the cell values from—for longitudinal data, these will always be the columns corresponding to the time-varying variables. To learn more about the principles of tidy data and how pivoting works, see the Data Tidying chapter of R for Data Science.","code":"# Figure 2.1, page 18: pivot_longer( deviant_tolerance_pl, cols = starts_with(\"tolerance_\"), names_to = \"age\", names_prefix = \"tolerance_\", names_transform = as.integer, values_to = \"tolerance\" ) #> # A tibble: 80 × 5 #> id male exposure age tolerance #> #> 1 9 0 1.54 11 2.23 #> 2 9 0 1.54 12 1.79 #> 3 9 0 1.54 13 1.9 #> 4 9 0 1.54 14 2.12 #> 5 9 0 1.54 15 2.66 #> 6 45 1 1.16 11 1.12 #> 7 45 1 1.16 12 1.45 #> 8 45 1 1.16 13 1.45 #> 9 45 1 1.16 14 1.45 #> 10 45 1 1.16 15 1.99 #> # ℹ 70 more rows pivot_wider( deviant_tolerance_pp, names_from = age, names_prefix = \"tolerance_\", values_from = tolerance ) #> # A tibble: 16 × 8 #> id male exposure tolerance_11 tolerance_12 tolerance_13 tolerance_14 #> #> 1 9 0 1.54 2.23 1.79 1.9 2.12 #> 2 45 1 1.16 1.12 1.45 1.45 1.45 #> 3 268 1 0.9 1.45 1.34 1.99 1.79 #> 4 314 0 0.81 1.22 1.22 1.55 1.12 #> 5 442 0 1.13 1.45 1.99 1.45 1.67 #> 6 514 1 0.9 1.34 1.67 2.23 2.12 #> 7 569 0 1.99 1.79 1.9 1.9 1.99 #> 8 624 1 0.98 1.12 1.12 1.22 1.12 #> 9 723 0 0.81 1.22 1.34 1.12 1 #> 10 918 0 1.21 1 1 1.22 1.99 #> 11 949 1 0.93 1.99 1.55 1.12 1.45 #> 12 978 1 1.59 1.22 1.34 2.12 3.46 #> 13 1105 1 1.38 1.34 1.9 1.99 1.9 #> 14 1542 0 1.44 1.22 1.22 1.99 1.79 #> 15 1552 0 1.04 1 1.12 2.23 1.55 #> 16 1653 0 1.25 1.11 1.11 1.34 1.55 #> # ℹ 1 more variable: tolerance_15 "}
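As a quick check that the two pivots are inverses of one another here (a sketch using the data sets above; column order differs after the round trip, so we reorder before comparing):

deviant_tolerance_pl |>
  pivot_longer(
    cols = starts_with("tolerance_"),
    names_to = "age",
    names_prefix = "tolerance_",
    names_transform = as.integer,
    values_to = "tolerance"
  ) |>
  pivot_wider(
    names_from = age,
    names_prefix = "tolerance_",
    values_from = tolerance
  ) |>
  select(all_of(names(deviant_tolerance_pl))) |>
  all.equal(deviant_tolerance_pl) # should report no meaningful differences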
data set large, Singer Willett (2003) suggest constructing empirical growth plots randomly selected subsample individuals—perhaps stratified groups defined values important predictors—rather using entire sample. task can easily accomplished using filter() function dplyr package prior plotting. example, sample four random adolescents. Note use set.seed() function prior sampling, sets state R’s random number generator: results random sample reproducible. approach can also extended randomly select subsample individuals within different strata combining group_split() function dplyr package split data list different groups, map() function purrr package apply filter() call previous example group. example, sample two random adolescent males two random adolescent females, combine filtered data frames list back together using list_rbind() function purrr package.","code":"# Figure 2.2, page 25: deviant_tolerance_empgrowth <- deviant_tolerance_pp |> ggplot(aes(x = age, y = tolerance)) + geom_point() + coord_cartesian(ylim = c(0, 4)) + facet_wrap(vars(id), labeller = label_both) deviant_tolerance_empgrowth set.seed(345) deviant_tolerance_pp |> filter(id %in% sample(unique(id), size = 4)) #> # A tibble: 20 × 5 #> id age tolerance male exposure #> #> 1 268 11 1.45 1 0.9 #> 2 268 12 1.34 1 0.9 #> 3 268 13 1.99 1 0.9 #> 4 268 14 1.79 1 0.9 #> 5 268 15 1.34 1 0.9 #> 6 442 11 1.45 0 1.13 #> 7 442 12 1.99 0 1.13 #> 8 442 13 1.45 0 1.13 #> 9 442 14 1.67 0 1.13 #> 10 442 15 1.9 0 1.13 #> 11 569 11 1.79 0 1.99 #> 12 569 12 1.9 0 1.99 #> 13 569 13 1.9 0 1.99 #> 14 569 14 1.99 0 1.99 #> 15 569 15 1.99 0 1.99 #> 16 1105 11 1.34 1 1.38 #> 17 1105 12 1.9 1 1.38 #> 18 1105 13 1.99 1 1.38 #> 19 1105 14 1.9 1 1.38 #> 20 1105 15 2.12 1 1.38 set.seed(123) deviant_tolerance_pp |> group_split(male) |> map(\\(.group) filter(.group, id %in% sample(unique(id), size = 2))) |> list_rbind() #> # A tibble: 20 × 5 #> id age tolerance male exposure #> #> 1 442 11 1.45 0 1.13 #> 2 442 12 1.99 0 1.13 #> 3 442 13 1.45 0 1.13 #> 4 442 14 1.67 0 1.13 #> 5 442 15 1.9 0 1.13 #> 6 918 11 1 0 1.21 #> 7 918 12 1 0 1.21 #> 8 918 13 1.22 0 1.21 #> 9 918 14 1.99 0 1.21 #> 10 918 15 1.22 0 1.21 #> 11 268 11 1.45 1 0.9 #> 12 268 12 1.34 1 0.9 #> 13 268 13 1.99 1 0.9 #> 14 268 14 1.79 1 0.9 #> 15 268 15 1.34 1 0.9 #> 16 514 11 1.34 1 0.9 #> 17 514 12 1.67 1 0.9 #> 18 514 13 2.23 1 0.9 #> 19 514 14 2.12 1 0.9 #> 20 514 15 2.44 1 0.9"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-2.html","id":"using-a-trajectory-to-summarize-each-persons-empirical-growth-record","dir":"Articles","previous_headings":"2.2 Descriptive analysis of individual change over time","what":"Using a Trajectory to Summarize Each Person’s Empirical Growth Record","title":"Chapter 2: Exploring longitudinal data on change","text":"person’s empirical growth record can summarized applying two standardized approaches: nonparametric approach uses nonparametric smooths summarize person’s pattern change time graphically without imposing specific functional form. primary advantage nonparametric approach requires assumptions. parametric approach uses separate parametric models fit person’s data summarize pattern change time. model uses common functional form trajectories (e.g., straight line, quadratic curve, etc.). primary advantage parametric approach provides numeric summaries trajectories can used exploration. 
 Singer and Willett (2003) recommend using both approaches—beginning with the nonparametric approach—since examining the smoothed trajectories can help select a common functional form for the trajectories in the parametric approach.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-2.html","id":"the-nonparametric-approach","dir":"Articles","previous_headings":"2.2 Descriptive analysis of individual change over time > Using a Trajectory to Summarize Each Person's Empirical Growth Record","what":"The Nonparametric Approach","title":"Chapter 2: Exploring longitudinal data on change","text":"The stat_smooth() function can be used to add a nonparametric smooth layer to the empirical growth record plot. The choice of a particular smoothing algorithm is primarily a matter of convenience, so here we use the default loess smoother. The span argument controls the amount of smoothing for the default loess smoother—with smaller numbers producing wigglier lines and larger numbers producing smoother lines; we chose a value that creates a smooth similar to the textbook figure. Singer and Willett (2003) recommend focusing on the elevation, shape, and tilt of the smoothed trajectories while answering questions like: Do the scores hover at the low, medium, or high end of the scale? Does everyone change over time, or do some people remain the same? Do the trajectories have an inflection point or plateau? Is the rate of change steep or shallow? What is the overall functional form of the trajectory at the group level? Is it linear or curvilinear? Smooth or step-like? Answering the last question is particularly important, as it will help select a common functional form for the trajectories in the parametric approach.","code":"# Figure 2.3, page 27: deviant_tolerance_empgrowth + stat_smooth(method = \"loess\", se = FALSE, span = .9)"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-2.html","id":"the-parametric-approach","dir":"Articles","previous_headings":"2.2 Descriptive analysis of individual change over time > Using a Trajectory to Summarize Each Person's Empirical Growth Record","what":"The Parametric Approach","title":"Chapter 2: Exploring longitudinal data on change","text":"For the parametric approach, Singer and Willett (2003) suggest using the following three-step process: Estimate a within-person linear model for each person in the data set. Collect the summary statistics from each within-person linear model into a single data set. Add each person's fitted trajectory to the empirical growth record plot. To begin, we use the lmList() function from the lme4 package to fit a common linear model to each adolescent in the data set. The model formula for the lmList() function takes the form response ~ terms | group. Because we selected a straight line as the common functional form for the trajectories, age is centred at age 11. Next we collect the summary statistics from each within-person linear model into a single data set using the tidy() function from the broom package. However, because lmList() returns a list of models, we need to apply the tidy() call to each model prior to collecting the summary statistics into a single data set. Ironically, we also need to tidy the result of tidy() to prepare the data for plotting. Finally, we can add each person's fitted trajectory to the empirical growth record plot using the geom_abline() function. However, because we centred age in the linear model, we need to transform the scale of the x-axis in the empirical growth plot to be centred as well—otherwise ggplot2 will not be able to align the fitted trajectories correctly. To do so, we must create a custom transformation object using the new_transform() function from the scales package, which defines the transformation, its inverse, and methods for generating breaks and labels. Alternatively, if we only plan to examine the parametric trajectories graphically, the three-step process suggested by Singer and Willett (2003) can be skipped altogether by using the stat_smooth() function with the \"lm\" method.
 This approach also fits a within-person linear model to each person in the data set; its drawback is that it makes it awkward (though not impossible) to access the summary statistics of each model.","code":"deviant_tolerance_fit <- lmList( tolerance ~ I(age - 11) | id, pool = FALSE, data = deviant_tolerance_pp ) # Table 2.2, page 30: summary(deviant_tolerance_fit) #> Call: #> Model: tolerance ~ I(age - 11) | NULL #> Data: deviant_tolerance_pp #> #> Coefficients: #> (Intercept) #> Estimate Std. Error t value Pr(>|t|) #> 9 1.902 0.25194841 7.549165 4.819462e-03 #> 45 1.144 0.13335666 8.578499 3.329579e-03 #> 268 1.536 0.26038049 5.899059 9.725771e-03 #> 314 1.306 0.15265648 8.555156 3.356044e-03 #> 442 1.576 0.20786534 7.581832 4.759898e-03 #> 514 1.430 0.13794927 10.366130 1.915399e-03 #> 569 1.816 0.02572936 70.580844 6.267530e-06 #> 624 1.120 0.04000000 28.000000 1.000014e-04 #> 723 1.268 0.08442748 15.018806 6.407318e-04 #> 918 1.000 0.30444376 3.284679 4.626268e-02 #> 949 1.728 0.24118043 7.164760 5.600382e-03 #> 978 1.028 0.31995000 3.213002 4.884420e-02 #> 1105 1.538 0.15115555 10.174949 2.022903e-03 #> 1542 1.194 0.18032748 6.621287 7.015905e-03 #> 1552 1.184 0.37355321 3.169562 5.049772e-02 #> 1653 0.954 0.13925516 6.850734 6.366647e-03 #> I(age - 11) #> Estimate Std. Error t value Pr(>|t|) #> 9 0.119 0.10285751 1.1569404 0.33105320 #> 45 0.174 0.05444263 3.1960249 0.04948216 #> 268 0.023 0.10629989 0.2163690 0.84257784 #> 314 -0.030 0.06232175 -0.4813729 0.66318168 #> 442 0.058 0.08486067 0.6834733 0.54336337 #> 514 0.265 0.05631755 4.7054602 0.01816360 #> 569 0.049 0.01050397 4.6649040 0.01859462 #> 624 0.020 0.01632993 1.2247449 0.30806801 #> 723 -0.054 0.03446738 -1.5666989 0.21516994 #> 918 0.143 0.12428864 1.1505476 0.33330784 #> 949 -0.098 0.09846150 -0.9953129 0.39294486 #> 978 0.632 0.13061904 4.8384983 0.01683776 #> 1105 0.156 0.06170899 2.5279945 0.08557441 #> 1542 0.237 0.07361839 3.2193045 0.04861002 #> 1552 0.153 0.15250246 1.0032625 0.38965538 #> 1653 0.246 0.05685068 4.3271249 0.02275586 deviant_tolerance_tidy <- deviant_tolerance_fit |> map(tidy) |> list_rbind(names_to = \"id\") |> mutate( id = as.factor(id), term = case_when( term == \"(Intercept)\" ~ \"intercept\", term == \"I(age - 11)\" ~ \"slope\" ) ) deviant_tolerance_abline <- deviant_tolerance_tidy |> select(id:estimate) |> pivot_wider(names_from = term, values_from = estimate) deviant_tolerance_abline #> # A tibble: 16 × 3 #> id intercept slope #> #> 1 9 1.90 0.119 #> 2 45 1.14 0.174 #> 3 268 1.54 0.0230 #> 4 314 1.31 -0.0300 #> 5 442 1.58 0.0580 #> 6 514 1.43 0.265 #> 7 569 1.82 0.0490 #> 8 624 1.12 0.0200 #> 9 723 1.27 -0.0540 #> 10 918 1 0.143 #> 11 949 1.73 -0.0980 #> 12 978 1.03 0.632 #> 13 1105 1.54 0.156 #> 14 1542 1.19 0.237 #> 15 1552 1.18 0.153 #> 16 1653 0.954 0.246 transform_centre <- function(subtract) { new_transform( \"centre\", transform = \\(x) x - subtract, inverse = \\(x) x + subtract ) } # Figure 2.5, page 32: deviant_tolerance_empgrowth + geom_abline( aes(intercept = intercept, slope = slope), data = deviant_tolerance_abline ) + scale_x_continuous(transform = transform_centre(11)) deviant_tolerance_empgrowth + stat_smooth(method = \"lm\", se = FALSE)"}
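As a concrete illustration of why the lmList() route is handier for numeric summaries, per-model fit statistics can be collected in the same style as the tidy() workflow above (a sketch using broom::glance() and the objects already defined):

deviant_tolerance_fit |>
  map(glance) |>               # one row of fit statistics (e.g., R-squared) per person
  list_rbind(names_to = "id")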
,{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-2.html","id":"exploring-differences-in-change-across-people","dir":"Articles","previous_headings":"","what":"2.3 Exploring differences in change across people","title":"Chapter 2: Exploring longitudinal data on change","text":"Having explored how individuals change over time, in Section 2.3 Singer and Willett (2003) continue with the deviant_tolerance_pp data set to demonstrate three strategies for exploring interindividual differences in change: Plotting the entire set of individual trajectories together, along with an average change trajectory for the entire group. The individual trajectories can either be compared with one another to examine similarities and differences in changes across people, or with the average change trajectory to compare individual change with group change. Conducting descriptive analyses of key model parameters, such as the estimated intercepts and slopes of each individual change trajectory model. Exploring the relationship between change and time-invariant predictors. This relationship can be explored through plots and statistical modelling.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-2.html","id":"plotting-the-entire-set-of-trajectories-together","dir":"Articles","previous_headings":"2.3 Exploring differences in change across people","what":"Plotting the entire set of trajectories together","title":"Chapter 2: Exploring longitudinal data on change","text":"The purpose of the first strategy is to answer generic questions about change, such as: Is the direction and rate of change similar or different across people? How does individual change compare to the group-averaged change trajectory? For this strategy, Singer and Willett (2003) suggest using both the nonparametric and parametric approaches, as certain patterns in the data may be somewhat easier to interpret using one approach or the other.","code":"deviant_tolerance_grptraj <- map( list(\"loess\", \"lm\"), \\(.method) { deviant_tolerance_pp |> mutate(method = .method) |> ggplot(mapping = aes(x = age, y = tolerance)) + stat_smooth( aes(linewidth = \"individual\", group = id), method = .method, se = FALSE, span = .9 ) + stat_smooth( aes(linewidth = \"average\"), method = .method, se = FALSE, span = .9 ) + scale_linewidth_manual(values = c(2, .25)) + coord_cartesian(ylim = c(0, 4)) + facet_wrap(vars(method), labeller = label_both) + labs(linewidth = \"trajectory\") } ) # Figure 2.6, page 34: wrap_plots(deviant_tolerance_grptraj) + plot_layout(guides = \"collect\", axes = \"collect\")"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-2.html","id":"conducting-descriptive-analyses-of-key-model-parameters","dir":"Articles","previous_headings":"2.3 Exploring differences in change across people","what":"Conducting descriptive analyses of key model parameters","title":"Chapter 2: Exploring longitudinal data on change","text":"The purpose of the second strategy is to answer specific questions about the behaviour of key parameters in the individual change trajectory models, such as: What is the average initial status and the average annual rate of change? What is the observed variability in initial status and annual rate of change? What is the relationship between initial status and annual rate of change? For this strategy, Singer and Willett (2003) suggest examining the estimated intercepts and slopes from the fitted linear models with the following three summary statistics: The sample mean, which summarizes the average initial status (intercept) and annual rate of change (slope) across the sample. The sample variance and standard deviation, which summarize the amount of observed interindividual heterogeneity in initial status and annual rate of change. The sample correlation, which summarizes the strength and direction of the relationship between initial status and annual rate of change. The sample mean, variance, and standard deviation can be computed together from the tidied model fits we saved earlier using a combination of the group_by() and summarise() functions from the dplyr package. The sample correlation needs to be computed in a separate step, as it requires additional transformations of the tidied model fits.
 Here we use the correlate() function from the corrr package—part of the tidymodels universe of packages—whose API is designed with data pipelines in mind.","code":"# Table 2.3, page 37: deviant_tolerance_tidy |> group_by(term) |> summarise( mean = mean(estimate), var = var(estimate), sd = sd(estimate) ) #> # A tibble: 2 × 4 #> term mean var sd #> #> 1 intercept 1.36 0.0887 0.298 #> 2 slope 0.131 0.0297 0.172 deviant_tolerance_tidy |> select(id, term, estimate) |> pivot_wider(names_from = term, values_from = estimate) |> select(-id) |> correlate() |> stretch(na.rm = TRUE, remove.dups = TRUE) #> # A tibble: 1 × 3 #> x y r #> #> 1 intercept slope -0.448"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-2.html","id":"exploring-the-relationship-between-change-and-time-invariant-predictors","dir":"Articles","previous_headings":"2.3 Exploring differences in change across people","what":"Exploring the relationship between change and time-invariant predictors","title":"Chapter 2: Exploring longitudinal data on change","text":"The purpose of the final strategy is to answer questions about systematic interindividual differences in change, such as: Does the observed (average) initial status and (average) annual rate of change differ across the levels or values of time-invariant predictors? What is the relationship between initial status and annual rate of change with time-invariant predictors? For this strategy, Singer and Willett (2003) suggest using two approaches: Plotting (smoothed) individual growth trajectories, displayed separately for groups distinguished by important values of time-invariant predictors. For categorical predictors, each level of the predictor can be used. For continuous predictors, values can be temporarily categorized for the purpose of display. Conducting exploratory analyses of the relationship between change and time-invariant predictors, investigating whether the estimated intercepts and slopes of the individual change trajectory models vary systematically with different predictors.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-2.html","id":"the-plotting-approach","dir":"Articles","previous_headings":"2.3 Exploring differences in change across people > Exploring the relationship between change and time-invariant predictors","what":"The plotting approach","title":"Chapter 2: Exploring longitudinal data on change","text":"For the plotting approach, we can adapt the code used earlier to plot the entire set of trajectories together, simply changing the variable we facet by. Here we facet by the categorical predictor male, and by the continuous predictor exposure, which we split at its median for the purposes of display. When examining plots like these, Singer and Willett (2003) recommend looking for systematic patterns in the trajectories to answer questions like: Do the observed trajectories differ across groups? Do observed differences appear more in the intercepts or in the slopes? Are the observed trajectories of some groups more heterogeneous than others? If we also wished to conduct descriptive analyses of the key model parameters for these groups, we can use the update() function to update and refit the common linear model to different subsets of the data. We store the model fits for each subgroup in a list so they are easier to iterate upon together.
 Here, for example, is a descriptive analysis of the intercepts and slopes for males and females.","code":"deviant_tolerance_grptraj_by <- map( list(male = \"male\", exposure = \"exposure\"), \\(.by) { deviant_tolerance_pp |> mutate( exposure = if_else(exposure < median(exposure), \"low\", \"high\"), exposure = factor(exposure, levels = c(\"low\", \"high\")) ) |> ggplot(aes(x = age, y = tolerance)) + stat_smooth( aes(linewidth = \"individual\", group = id), method = \"lm\", se = FALSE, span = .9 ) + stat_smooth( aes(linewidth = \"average\"), method = \"lm\", se = FALSE, span = .9 ) + scale_linewidth_manual(values = c(2, .25)) + coord_cartesian(ylim = c(0, 4)) + facet_wrap(.by, labeller = label_both) + labs(linewidth = \"trajectory\") } ) # Figure 2.7, page 38: wrap_plots(deviant_tolerance_grptraj_by, ncol = 1, guides = \"collect\") tolerance_fit_sex <- list( male = update(deviant_tolerance_fit, subset = male == 1), female = update(deviant_tolerance_fit, subset = male == 0) ) tolerance_fit_exposure <- list( low = update(deviant_tolerance_fit, subset = exposure < 1.145), high = update(deviant_tolerance_fit, subset = exposure >= 1.145) ) tolerance_fit_sex |> map( \\(.fit_sex) { .fit_sex |> map(tidy) |> list_rbind(names_to = \"id\") |> group_by(term) |> summarise( mean = mean(estimate), sd = sd(estimate) ) } ) |> list_rbind(names_to = \"sex\") #> # A tibble: 4 × 4 #> sex term mean sd #> #> 1 male (Intercept) 1.36 0.264 #> 2 male I(age - 11) 0.167 0.238 #> 3 female (Intercept) 1.36 0.338 #> 4 female I(age - 11) 0.102 0.106"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-2.html","id":"the-exploratory-analysis-approach","dir":"Articles","previous_headings":"2.3 Exploring differences in change across people > Exploring the relationship between change and time-invariant predictors","what":"The exploratory analysis approach","title":"Chapter 2: Exploring longitudinal data on change","text":"For the exploratory analysis approach, Singer and Willett (2003) recommend restricting ourselves to the simplest of approaches for examining the relationship between change and time-invariant predictors—bivariate scatter plots and sample correlations. Their reasoning for this restriction is twofold: The statistical models presented in this chapter are intended for descriptive and exploratory purposes only; their estimates have known biases that make them imperfect measures of each person's true initial status and true rate of change. These models will soon be replaced by the multilevel model for change in Chapter 3, which is better-suited for modelling longitudinal data. For both the plotting and the computations, we first need to add each adolescent's male and exposure values to the deviant_tolerance_tidy data frame. This is easily done using the left_join() function from the dplyr package, which performs a mutating join to add columns from one data frame to another, matching observations based on keys. Here we join a selection of columns from the person-level deviant_tolerance_pl data set: specifically, the id column, which exists in both data frames and is thus used for joining, and the two time-invariant predictors male and exposure. We also create a new sex variable, which we can use instead of male when plotting. Now we can create the bivariate scatter plots. Note the use of the .data pronoun inside the call to aes()—the .data pronoun is a special construct from the tidyverse that allows us to treat a character vector of variable names as environment variables, so they work the expected way in arguments that use non-standard evaluation. To learn more about the .data pronoun, see the dplyr package's Programming with dplyr vignette. Finally, we can compute the correlations between the intercepts and slopes of the individual change trajectory models and the time-invariant predictors male and exposure.
 Here we use the cor() function rather than corrr::correlate() since we just want to return the correlation values, not a correlation data frame.","code":"deviant_tolerance_tidy_2 <- deviant_tolerance_tidy |> left_join(select(deviant_tolerance_pl, id, male, exposure)) |> mutate(sex = if_else(male == 0, \"female\", \"male\")) deviant_tolerance_tidy_2 #> # A tibble: 32 × 9 #> id term estimate std.error statistic p.value male exposure sex #> #> 1 9 intercept 1.90 0.252 7.55 0.00482 0 1.54 female #> 2 9 slope 0.119 0.103 1.16 0.331 0 1.54 female #> 3 45 intercept 1.14 0.133 8.58 0.00333 1 1.16 male #> 4 45 slope 0.174 0.0544 3.20 0.0495 1 1.16 male #> 5 268 intercept 1.54 0.260 5.90 0.00973 1 0.9 male #> 6 268 slope 0.0230 0.106 0.216 0.843 1 0.9 male #> 7 314 intercept 1.31 0.153 8.56 0.00336 0 0.81 female #> 8 314 slope -0.0300 0.0623 -0.481 0.663 0 0.81 female #> 9 442 intercept 1.58 0.208 7.58 0.00476 0 1.13 female #> 10 442 slope 0.0580 0.0849 0.683 0.543 0 1.13 female #> # ℹ 22 more rows deviant_tolerance_biplot <- map( list(sex = \"sex\", exposure = \"exposure\"), \\(.x) { ggplot(deviant_tolerance_tidy_2, aes(x = .data[[.x]], y = estimate)) + geom_point() + facet_wrap(vars(term), ncol = 1, scales = \"free_y\") } ) # Figure 2.8, page 40: wrap_plots(deviant_tolerance_biplot) + plot_layout(axes = \"collect\") # Correlation values shown in Figure 2.8, page 40: deviant_tolerance_tidy_2 |> group_by(term) |> summarise( male_cor = cor(estimate, male), exposure_cor = cor(estimate, exposure) ) #> # A tibble: 2 × 3 #> term male_cor exposure_cor #> #> 1 intercept 0.00863 0.232 #> 2 slope 0.194 0.442"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-3.html","id":"what-is-the-purpose-of-the-multilevel-model-for-change","dir":"Articles","previous_headings":"","what":"3.1 What Is the Purpose of the Multilevel Model for Change?","title":"Chapter 3: Introducing the multilevel model for change","text":"In Chapter 3 Singer and Willett (2003) develop and explain the multilevel model for change using a subset of data from Burchinal, Campbell, Bryant, Wasik, and Ramey (1997), who measured the effect of an early educational intervention on cognitive performance in a sample of African-American children at ages 12, 18, and 24 months (i.e., 1.0, 1.5, and 2.0 years). For this example we use the early_intervention data set, a person-period data frame with 309 rows and 4 columns: id: Child ID. age: Age in years at time of measurement. treatment: Treatment condition (control = 0, intervention = 1). cognitive_score: Cognitive performance score on one of two standardized intelligence tests: the Bayley Scales of Infant Development (Bayley, 1969) at 12 and 18 months, and the Stanford Binet (Terman & Merrill, 1972) at 24 months. Note that for reasons of participant privacy the early_intervention data set uses simulated data rather than the real data used by Singer and Willett (2003), so the examples presented in this article will have similar but not identical results to those presented in the text. To motivate the need for the multilevel model for change, we begin with a basic exploration and description of the early_intervention data. Starting with the age variable, we can see that each child's cognitive performance was measured on three occasions at ages 1.0, 1.5, and 2.0 years; thus the early_intervention data uses a time-structured design. Next we look at the time-invariant treatment variable. Because we are summarizing a time-invariant predictor, we transform the data to person-level format with the pivot_wider() function from the tidyr package before summarizing. A total of 58 children (56.3%) were assigned to participate in the early educational intervention, with the remaining 45 children (43.7%) participating in the control group. 
As Singer and Willett (2003) discuss, the kind of statistical model needed to represent change processes in longitudinal data like this must include components at two levels: A level-1 submodel that describes how individuals change over time, which can address questions about within-person change. A level-2 submodel that describes how these changes vary across individuals, which can address questions about between-person differences in change. Together, these two components form a multilevel model (also known as a linear mixed-effects model or mixed model) for change.","code":"# Table 3.1, page 48: early_intervention #> # A tibble: 309 × 4 #> id age treatment cognitive_score #> #> 1 1 1 1 106. #> 2 1 1.5 1 91.7 #> 3 1 2 1 74.2 #> 4 2 1 1 112. #> 5 2 1.5 1 114. #> 6 2 2 1 119. #> 7 3 1 0 90.4 #> 8 3 1.5 0 94.7 #> 9 3 2 0 80.4 #> 10 4 1 1 103. #> # ℹ 299 more rows measurement_occasions <- unique(early_intervention$age) measurement_occasions #> [1] 1.0 1.5 2.0 early_intervention |> group_by(id) |> summarise(all_occasions = identical(age, measurement_occasions)) |> pull(all_occasions) |> unique() #> [1] TRUE early_intervention_pl <- pivot_wider( early_intervention, names_from = age, names_prefix = \"age_\", values_from = cognitive_score ) early_intervention_pl #> # A tibble: 103 × 5 #> id treatment age_1 age_1.5 age_2 #> #> 1 1 1 106. 91.7 74.2 #> 2 2 1 112. 114. 119. #> 3 3 0 90.4 94.7 80.4 #> 4 4 1 103. 101. 93.9 #> 5 5 1 103. 75.0 71.7 #> 6 6 0 106. 96.8 93.5 #> 7 7 1 136. 117. 119. #> 8 8 0 79.8 69.3 67.5 #> 9 9 1 113. 105. 108. #> 10 10 1 88.2 87.5 85.3 #> # ℹ 93 more rows early_intervention_pl |> group_by(treatment) |> summarise(count = n()) |> mutate(proportion = count / sum(count)) #> # A tibble: 2 × 3 #> treatment count proportion #> #> 1 0 45 0.437 #> 2 1 58 0.563"}
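In the book's notation, the general skeleton of these two components for a linear individual growth model can be sketched as follows (the specific submodels for the early_intervention data are developed in the sections below):

\\[ \\text{level 1: } Y_{ij} = \\pi_{0i} + \\pi_{1i} \\text{TIME}_{ij} + \\epsilon_{ij} \\qquad \\text{level 2: } \\pi_{0i} = \\gamma_{00} + \\zeta_{0i}, \\quad \\pi_{1i} = \\gamma_{10} + \\zeta_{1i} \\]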
,{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-3.html","id":"the-level-1-submodel-for-individual-change","dir":"Articles","previous_headings":"","what":"3.2 The Level-1 Submodel for Individual Change","title":"Chapter 3: Introducing the multilevel model for change","text":"In Section 3.2 Singer and Willett (2003) introduce the level-1 component of the multilevel model for change: a submodel for individual change—also known as the individual growth model—which represents the individual change in the outcome variable that we expect to occur during the time period under study. Because the individual growth model specifies a common functional form for the individual trajectories, Singer and Willett (2003) suggest preceding level-1 submodel specification with visual inspection of the empirical growth plots, in order to select the most parsimonious functional form the observed data could reasonably have come from. Singer and Willett (2003) identify several questions that are helpful to answer when examining the empirical growth plots to aid model specification: What type of population individual growth model could have generated the observed data? Should the population individual growth model be linear or curvilinear with time? Smooth or jagged? Continuous or disjoint? Are the nonlinearities in the observed data consistent across individuals? Might they be due to measurement error or random error? How many measurement occasions are there? How many are needed? Because the early_intervention data has only three measurement occasions per individual, and the nonlinearities in the observed data are likely due to measurement error or random error, Singer and Willett (2003) specify a simple linear model for the level-1 submodel: \\[ \\text{cognitive_score}_{ij} = \\pi_{0i} + \\pi_{1i} (\\text{age}_{ij} - 1) + \\epsilon_{ij}, \\] which asserts that \\(\\text{cognitive_score}_{ij}\\)—the true value of cognitive_score for the \\(i\\)th child at the \\(j\\)th time—is a linear function of their age at each measurement occasion, \\(\\text{age}_{ij}\\), and that any deviations from linearity in the observed data over time are the result of random error, \\(\\epsilon_{ij}\\). Because \\(\\text{age}\\) is centred as \\((\\text{age} - 1)\\), the model intercept, \\(\\pi_{0i}\\), represents the \\(i\\)th child's true initial status—that is, their true \\(\\text{cognitive_score}\\) value at age 1. This is preferable to using \\(\\text{age}\\) without centring, where the intercept of the model would instead represent the \\(i\\)th child's true value of cognitive_score at age 0, meaning that: (1) with uncentred \\(\\text{age}\\) the model must predict beyond the temporal limits of the early_intervention data; and (2) we must assume the individual trajectories extend back in time to birth and change linearly with age. Finally, the model slope, \\(\\pi_{1i}\\), represents the true rate of change in the \\(i\\)th child's true \\(\\text{cognitive_score}\\) over time—in this case, their true annual rate of change.","code":"set.seed(567) # Figure 3.1, page 50: early_intervention |> filter(id %in% sample(unique(id), size = 8)) |> ggplot(aes(x = age, y = cognitive_score)) + geom_point() + stat_smooth(method = \"lm\", se = FALSE) + scale_x_continuous(breaks = c(1, 1.5, 2)) + coord_cartesian(ylim = c(50, 150)) + facet_wrap(vars(id), ncol = 4, labeller = label_both)"}
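One detail worth making explicit before moving on (standard in the book's treatment of the level-1 submodel): the random errors are assumed to be independently drawn from a normal distribution with constant variance,

\\[ \\epsilon_{ij} \\sim N(0, \\sigma^2_\\epsilon). \\]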
To do so, we'll first use the augment() function from the broom and broom.mixed packages to get predicted values for each child's fitted trajectory from the complete pooling, no pooling, and partial pooling models. Note that augment() is a generic function, whose methods for linear and multilevel models are available in the broom and broom.mixed packages, respectively. Next, we'll tidy and sample the predicted values data. Finally, we can plot the empirical growth plots of the randomly sampled children, along with the fitted trajectory from the complete pooling, no pooling, and partial pooling models.

Examining these plots, the differences between complete pooling, no pooling, and partial pooling become apparent:

- The complete pooling model estimates a single average trajectory, ignoring variation between individuals, so every child is predicted to have the exact same trajectory.
- The no pooling model estimates a unique trajectory for each individual that closely follows their observed data.
- The partial pooling model estimates a unique trajectory for each individual that sometimes closely follows their observed data, and other times lies somewhere in between the complete pooling and no pooling trajectories.

Conceptually, the partial pooling model is similar to the no pooling model: both approaches model the individual change in the outcome variable we expect to occur during each time period under study, allowing individuals to vary in initial status and rate of change. However, because the partial pooling model takes into account the relative amount of information from each individual and the average across all individuals, trajectories are pulled towards the overall average—with a stronger pull the more extreme an individual's observed data. In the case of the early_intervention data, most children's cognitive performance declined over time, as is apparent from the average trajectory of the complete pooling model, and from the entire set of trajectories of the no pooling model viewed together. Although the rate of decline varied across children, few showed improvement. Given this, cases that show improvement are given less weight by the partial pooling model, and their estimates are pulled more strongly towards the group average. The full extent of this effect can be seen by plotting the entire set of fitted trajectories from the partial pooling model together, along with the model's population average trajectory.

early_intervention_fit_cp <- lm(
  cognitive_score ~ I(age - 1),
  data = early_intervention
)

early_intervention_fit_np <- lmList(
  cognitive_score ~ I(age - 1) | id,
  pool = FALSE,
  data = early_intervention
)

early_intervention_fit_1 <- lmer(
  cognitive_score ~ I(age - 1) + (1 + I(age - 1) | id),
  data = early_intervention,
  REML = FALSE
)

summary(early_intervention_fit_1)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: cognitive_score ~ I(age - 1) + (1 + I(age - 1) | id)
#>    Data: early_intervention
#>
#>      AIC      BIC   logLik deviance df.resid
#>   2412.7   2435.1  -1200.4   2400.7      303
#>
#> Scaled residuals:
#>      Min       1Q   Median       3Q      Max
#> -2.08234 -0.50220  0.04103  0.53133  2.40849
#>
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr
#>  id       (Intercept) 137.02   11.706
#>           I(age - 1)   53.60    7.321   -0.46
#>  Residual              69.24    8.321
#> Number of obs: 309, groups:  id, 103
#>
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)  109.881      1.375   79.92
#> I(age - 1)   -16.913      1.366  -12.38
#>
#> Correlation of Fixed Effects:
#>            (Intr)
#> I(age - 1) -0.561
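Before building the prediction data, here is a minimal sketch of the shrinkage at work (assuming lmList objects support list-style extraction, as the map() calls below rely on): compare one child's no-pooling OLS coefficients with their shrunken, partially pooled coefficients from the multilevel model.

# Sketch: partial pooling pulls person-specific estimates toward the average.
coef(early_intervention_fit_np[["1"]])   # no pooling: child 1's own OLS fit
coef(early_intervention_fit_1)$id["1", ] # partial pooling: shrunken estimates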
# Because the complete pooling model does not have a group indicator for
# individuals, we need to manually add the IDs to the predicted values.
early_intervention_pred_cp <- early_intervention_fit_cp |>
  augment() |>
  mutate(model = "complete_pooling", id = early_intervention$id)

early_intervention_pred_cp
#> # A tibble: 309 × 10
#>    cognitive_score `I(age - 1)` .fitted .resid    .hat .sigma .cooksd .std.resid
#>
#>  1           106.           0     110.   -4.04 0.00809   13.8 3.52e-4     -0.294
#>  2            91.7          0.5   101.   -9.74 0.00324   13.8 8.11e-4     -0.707
#>  3            74.2          1      93.0 -18.7  0.00809   13.8 7.58e-3     -1.36
#>  4           112.           0     110.    2.33 0.00809   13.8 1.17e-4      0.169
#>  5           114.           0.5   101.   12.5  0.00324   13.8 1.33e-3      0.905
#>  6           119.           1      93.0  26.3  0.00809   13.7 1.49e-2      1.91
#>  7            90.4          0     110.  -19.5  0.00809   13.8 8.19e-3     -1.42
#>  8            94.7          0.5   101.   -6.69 0.00324   13.8 3.82e-4     -0.485
#>  9            80.4          1      93.0 -12.6  0.00809   13.8 3.42e-3     -0.916
#> 10           103.           0     110.   -6.77 0.00809   13.8 9.88e-4     -0.492
#> # ℹ 299 more rows
#> # ℹ 2 more variables: model, id

# Because the no pooling models are separate linear models stored in a list, we
# need to apply the augment() call to each model, then bind the predicted values
# from each model into a single data set. Here the individual ID for each model
# is stored in the name of the list entry, which we add to the data frame using
# the `names_to` argument of list_rbind().
early_intervention_pred_np <- early_intervention_fit_np |>
  map(augment) |>
  list_rbind(names_to = "id") |>
  mutate(model = "no_pooling")

early_intervention_pred_np
#> # A tibble: 309 × 10
#>    id    cognitive_score `I(age - 1)` .fitted .resid  .hat .sigma .cooksd
#>
#>  1 1               106.           0     106.  -0.551 0.833    NaN    2.50
#>  2 1                91.7          0.5    90.6  1.10  0.333    NaN    0.25
#>  3 1                74.2          1      74.8 -0.551 0.833    Inf    2.50
#>  4 2               112.           0     112.   0.610 0.833    NaN    2.50
#>  5 2               114.           0.5   115.  -1.22  0.333    NaN    0.25
#>  6 2               119.           1     119.   0.610 0.833    Inf    2.50
#>  7 3                90.4          0      93.5 -3.12  0.833    NaN    2.50
#>  8 3                94.7          0.5    88.5  6.23  0.333    NaN    0.25
#>  9 3                80.4          1      83.5 -3.12  0.833    Inf    2.50
#> 10 4               103.           0     104.  -0.686 0.833    NaN    2.50
#> # ℹ 299 more rows
#> # ℹ 2 more variables: .std.resid, model

# Nothing special needs to be done for the partial pooling model, aside from
# having the broom.mixed package loaded.
early_intervention_pred_pp <- early_intervention_fit_1 |>
  augment() |>
  mutate(model = "partial_pooling")

early_intervention_pred_pp
#> # A tibble: 309 × 15
#>    cognitive_score `I(age - 1)` id    .fitted .resid  .hat .cooksd .fixed   .mu
#>
#>  1           106.           0   1       103.   3.11  0.439 0.0974    110. 103.
#>  2            91.7          0.5 1        92.6 -0.939 0.276 0.00336   101.  92.6
#>  3            74.2          1   1        82.5 -8.29  0.395 0.535      93.0  82.5
#>  4           112.           0   2       118.  -5.90  0.439 0.352     110. 118.
#>  5           114.           0.5 2       112.   1.42  0.276 0.00765   101. 112.
#>  6           119.           1   2       107.  12.4   0.395 1.20       93.0 107.
#>  7            90.4          0   3        97.7 -7.34  0.439 0.543     110.  97.7
#>  8            94.7          0.5 3        90.7  4.07  0.276 0.0632    101.  90.7
#>  9            80.4          1   3        83.6 -3.21  0.395 0.0803     93.0  83.6
#> 10           103.           0   4       107.  -3.71  0.439 0.139     110. 107.
#> # ℹ 299 more rows
#> # ℹ 6 more variables: .offset, .sqrtXwt, .sqrtrwt, .weights, .wtres, model

# Finally, we can bind the predicted values from the models into a single data
# frame.
early_intervention_preds <- bind_rows(
  early_intervention_pred_cp,
  early_intervention_pred_np,
  early_intervention_pred_pp
)

set.seed(333)

early_intervention_preds_tidy <- early_intervention_preds |>
  select(model, id, cognitive_score, age = `I(age - 1)`, .fitted) |>
  mutate(
    id = factor(id, levels = unique(id)),
    age = as.numeric(age + 1)
  ) |>
  filter(id %in% sample(unique(id), size = 8))

early_intervention_preds_tidy
#> # A tibble: 72 × 5
#>    model            id    cognitive_score   age .fitted
#>
#>  1 complete_pooling 2               112.    1      110.
#>  2 complete_pooling 2               114.    1.5    101.
#>  3 complete_pooling 2               119.    2       93.0
#>  4 complete_pooling 6               106.    1      110.
#>  5 complete_pooling 6                96.8   1.5    101.
#>  6 complete_pooling 6                93.5   2       93.0
#>  7 complete_pooling 14               88.7   1      110.
#>  8 complete_pooling 14               97.6   1.5    101.
#>  9 complete_pooling 14               81.3   2       93.0
#> 10 complete_pooling 39              111.    1      110.
#> # ℹ 62 more rows
ggplot(early_intervention_preds_tidy, aes(x = age, group = id)) +
  geom_point(aes(y = cognitive_score)) +
  geom_line(aes(y = .fitted, colour = model, group = model), linewidth = .75) +
  scale_x_continuous(breaks = c(1, 1.5, 2)) +
  scale_colour_brewer(palette = "Dark2") +
  coord_cartesian(ylim = c(50, 150)) +
  facet_wrap(vars(id), nrow = 2, labeller = label_both)

# Figure 3.3, page 57:
early_intervention |>
  ggplot(mapping = aes(x = age, y = cognitive_score)) +
    stat_smooth(
      aes(linewidth = "no_pooling", group = id),
      method = "lm", se = FALSE, span = .9
    ) +
    stat_smooth(
      aes(linewidth = "complete_pooling"),
      method = "lm", se = FALSE, span = .9
    ) +
    scale_linewidth_manual(values = c(2, .25)) +
    scale_x_continuous(breaks = c(1, 1.5, 2)) +
    coord_cartesian(ylim = c(50, 150)) +
    labs(linewidth = "model")

early_intervention_fit_1 |>
  augment() |>
  select(-cognitive_score) |>
  rename(cognitive_score = .fitted, age = `I(age - 1)`) |>
  mutate(age = as.numeric(age + 1)) |>
  ggplot(aes(x = age, y = cognitive_score)) +
    geom_line(aes(linewidth = "individual", group = id), colour = "#3366FF") +
    # We'll use predict() rather than augment() to get the population-level
    # predictions, due to some currently bad behaviour in augment() when making
    # predictions on new data: https://github.com/bbolker/broom.mixed/issues/141
    geom_line(
      aes(linewidth = "average"),
      data = tibble(
        age = measurement_occasions,
        cognitive_score = predict(
          early_intervention_fit_1, tibble(age = measurement_occasions), re.form = NA
        )
      ),
      colour = "#3366FF"
    ) +
    scale_linewidth_manual(values = c(2, .25)) +
    scale_x_continuous(breaks = c(1, 1.5, 2)) +
    coord_cartesian(ylim = c(50, 150)) +
    labs(linewidth = "trajectory")
3.3 The Level-2 Submodel for Systematic Interindividual Differences in Change

In Section 3.3 Singer and Willett (2003) introduce the level-2 component of the multilevel model for change—the submodel for systematic interindividual differences in change—which is defined by four specific features:

- Its outcomes must be the individual growth parameters from the level-1 submodel, \(\pi_{0i}\) and \(\pi_{1i}\).
- There must be one level-2 submodel for each level-1 individual growth parameter, and each must be written as a separate part.
- Each level-2 submodel must specify a relationship between an individual growth parameter and the level-2 time-invariant predictors.
- Each level-2 submodel must allow individuals who share common predictor values to vary in their individual change trajectories.

Because the level-2 submodel must simultaneously account for between-group differences in the individual growth parameters and within-group differences in change, Singer and Willett (2003) suggest preceding level-2 submodel specification with visual inspection of the individual growth trajectories stratified by levels of the time-invariant predictor(s), in order to identify what kind of population model could give rise to the observed patterns. Because the early_intervention data have a single time-invariant predictor, Singer and Willett (2003) specify the following level-2 submodel:

\[
\begin{align}
\pi_{0i} &= \gamma_{00} + \gamma_{01} \text{treatment}_i + \zeta_{0i} \\
\pi_{1i} &= \gamma_{10} + \gamma_{11} \text{treatment}_i + \zeta_{1i},
\end{align}
\]

which asserts that the individual growth parameters from the level-1 submodel, \(\pi_{0i}\) and \(\pi_{1i}\), are treated as level-2 outcomes that are a linear function of the \(i\)th child's treatment status, \(\text{treatment}_i\). The parameters \(\gamma_{00}\) and \(\gamma_{10}\) are the level-2 intercepts; the parameters \(\gamma_{01}\) and \(\gamma_{11}\) are the level-2 slopes. Collectively, these four level-2 parameters are known as the fixed effects. Finally, the parameters \(\zeta_{0i}\) and \(\zeta_{1i}\) are the level-2 residuals, which allow the value of each individual's growth parameters to be scattered around their respective population averages. Collectively, these two level-2 parameters are known as the random effects, which we assume to be bivariate normally distributed with mean 0, unknown variances, \(\sigma_0^2\) and \(\sigma_1^2\), and unknown covariance, \(\sigma_{01}\):

\[
\begin{align}
\begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix} & \sim
\operatorname{N}
\begin{pmatrix}
\begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{bmatrix}
\end{pmatrix}.
\end{align}
\]

# Figure 3.4, page 59:
ggplot(early_intervention, mapping = aes(x = age, y = cognitive_score)) +
  stat_smooth(
    aes(linewidth = "individual", group = id),
    method = "lm", se = FALSE, span = .9
  ) +
  stat_smooth(aes(linewidth = "average"), method = "lm", se = FALSE, span = .9) +
  scale_linewidth_manual(values = c(2, .25)) +
  scale_x_continuous(breaks = c(1, 1.5, 2)) +
  coord_cartesian(ylim = c(50, 150)) +
  facet_wrap(vars(treatment), labeller = label_both) +
  labs(linewidth = "trajectory")

# TODO: Decide whether or not to do the bottom plots.
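As a quick aside, the estimated counterparts of \(\sigma_0^2\), \(\sigma_1^2\), and \(\sigma_{01}\) can be pulled from the level-1 fit already in hand; a minimal sketch using lme4's VarCorr():

# Sketch: the estimated level-2 variance-covariance components of the random
# effects from the unconditional individual growth model fit earlier.
VarCorr(early_intervention_fit_1)                # standard deviations and correlation
as.data.frame(VarCorr(early_intervention_fit_1)) # variances and covariance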
3.4 Fitting the Multilevel Model for Change to Data

Putting the level-1 and level-2 submodels together, the multilevel model for change for the early_intervention data looks like this:

\[
\begin{alignat}{3}
& \text{Level 1:} \qquad & \text{cognitive_score}_{ij} &= \pi_{0i} + \pi_{1i} (\text{age}_{ij} - 1) + \epsilon_{ij} \\
& \text{Level 2:} \qquad & \pi_{0i} &= \gamma_{00} + \gamma_{01} \text{treatment}_i + \zeta_{0i} \\
& & \pi_{1i} &= \gamma_{10} + \gamma_{11} \text{treatment}_i + \zeta_{1i}.
\end{alignat}
\]

Before fitting this model, we find it helpful to substitute the level-2 equations into the level-1 equation, yielding a single reduced equation. Although this mixed model equation is mathematically identical to the multilevel model equation above, substituting the equations is helpful in practice for two reasons:

- It makes the parameters of the model easier to identify, particularly in the case of interactions between level-1 and level-2 predictors.
- It is the format that mixed-effects modelling packages in R use to specify the model formula, so it clarifies the statistical model actually being fit to the data by the software.

\[
\begin{align}
\text{cog}_{ij} &= \pi_{0i} + \pi_{1i} (\text{age}_{ij} - 1) + \epsilon_{ij} \\
&= \gamma_{00} + \gamma_{01} \text{trt}_i + \pi_{1i} (\text{age}_{ij} - 1) + \epsilon_{ij} + \zeta_{0i} \\
&= \gamma_{00} + \gamma_{01} \text{trt}_i + (\gamma_{10} + \gamma_{11} \text{trt}_i + \zeta_{1i})(\text{age}_{ij} - 1) + \epsilon_{ij} + \zeta_{0i} \\
&= \underbrace{
  \gamma_{00} + \gamma_{01} \text{trt}_i + \gamma_{10}(\text{age}_{ij} - 1) + \gamma_{11} \text{trt}_i(\text{age}_{ij} - 1)
}_{\text{Fixed Effects}}
+ \underbrace{
  \epsilon_{ij} + \zeta_{0i} + \zeta_{1i}(\text{age}_{ij} - 1).
}_{\text{Random Effects}}
\end{align}
\]

We can now fit the multilevel model for change to the early_intervention data. Based on the equation above, we need to update the level-1 submodel to include predictors for treatment, and for the interaction between treatment and age.

update(
  early_intervention_fit_1,
  . ~ . + treatment + treatment:I(age - 1)
)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: cognitive_score ~ I(age - 1) + (1 + I(age - 1) | id) + treatment +
#>     I(age - 1):treatment
#>    Data: early_intervention
#>       AIC       BIC    logLik  deviance  df.resid
#>  2402.540  2432.407 -1193.270  2386.540       301
#> Random effects:
#>  Groups   Name        Std.Dev. Corr
#>  id       (Intercept) 11.564
#>           I(age - 1)   6.754   -0.57
#>  Residual              8.321
#> Number of obs: 309, groups:  id, 103
#> Fixed Effects:
#>          (Intercept)            I(age - 1)             treatment
#>              107.822               -20.123                 3.657
#> I(age - 1):treatment
#>                5.702

Alternatively, to start from scratch, we can specify the final multilevel model for change like so.

early_intervention_fit <- lmer(
  cognitive_score ~ I(age - 1) * treatment + (1 + I(age - 1) | id),
  data = early_intervention,
  REML = FALSE
)

# Table 3.3, page 69:
summary(early_intervention_fit)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: cognitive_score ~ I(age - 1) * treatment + (1 + I(age - 1) | id)
#>    Data: early_intervention
#>
#>      AIC      BIC   logLik deviance df.resid
#>   2402.5   2432.4  -1193.3   2386.5      301
#>
#> Scaled residuals:
#>      Min       1Q   Median       3Q      Max
#> -2.04567 -0.48714  0.04639  0.53367  2.32828
#>
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr
#>  id       (Intercept) 133.74   11.564
#>           I(age - 1)   45.61    6.754   -0.57
#>  Residual              69.24    8.321
#> Number of obs: 309, groups:  id, 103
#>
#> Fixed effects:
#>                      Estimate Std. Error t value
#> (Intercept)           107.822      2.063  52.276
#> I(age - 1)            -20.123      2.023  -9.949
#> treatment               3.657      2.749   1.330
#> I(age - 1):treatment    5.702      2.695   2.116
#>
#> Correlation of Fixed Effects:
#>             (Intr) I(g-1) trtmnt
#> I(age - 1)  -0.605
#> treatment   -0.750  0.454
#> I(g-1):trtm  0.454 -0.750 -0.605
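As a minimal sanity-check sketch (ours, not from the text), the two routes above should produce the same fit, which we can confirm by comparing their fixed-effects estimates:

# Sketch: the update() route and the from-scratch route give the same model.
fit_updated <- update(early_intervention_fit_1, . ~ . + treatment + treatment:I(age - 1))
all.equal(fixef(fit_updated), fixef(early_intervention_fit))
# Expected to return TRUE (up to numerical convergence tolerance).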
3.5 Examining Estimated Fixed Effects

In Section 3.5 Singer and Willett (2003) explain two ways to interpret the fixed effects estimates of the multilevel model for change:

- Interpreting the fixed effects coefficients directly.
- Plotting fitted change trajectories for prototypical individuals.

Interpreting fixed effects coefficients directly

The fixed effects estimates of the multilevel model for change can be interpreted directly, the same way as any regression coefficient. Thus, for the multilevel model for change for the early_intervention data, the fixed effects estimates can be interpreted as follows:

- \(\gamma_{00}\): The model intercept. This parameter estimates the population average true initial status for children in the control group.
- \(\gamma_{01}\): The coefficient for treatment. This parameter estimates the difference in population average true initial status between children in the treatment group and children in the control group.
- \(\gamma_{10}\): The coefficient for age. This parameter estimates the population average annual rate of true change for children in the control group.
- \(\gamma_{11}\): The coefficient for the interaction between age and treatment. This parameter estimates the difference in the population average annual rate of true change between children in the treatment group and children in the control group.

For our model, we can view these estimates with the summary() function. We can also access the estimates programmatically using either the generic fixef() function to return a vector of estimates, or the tidy() function from the broom.mixed package to return a tibble.

summary(early_intervention_fit)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: cognitive_score ~ I(age - 1) * treatment + (1 + I(age - 1) | id)
#>    Data: early_intervention
#>
#>      AIC      BIC   logLik deviance df.resid
#>   2402.5   2432.4  -1193.3   2386.5      301
#>
#> Scaled residuals:
#>      Min       1Q   Median       3Q      Max
#> -2.04567 -0.48714  0.04639  0.53367  2.32828
#>
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr
#>  id       (Intercept) 133.74   11.564
#>           I(age - 1)   45.61    6.754   -0.57
#>  Residual              69.24    8.321
#> Number of obs: 309, groups:  id, 103
#>
#> Fixed effects:
#>                      Estimate Std. Error t value
#> (Intercept)           107.822      2.063  52.276
#> I(age - 1)            -20.123      2.023  -9.949
#> treatment               3.657      2.749   1.330
#> I(age - 1):treatment    5.702      2.695   2.116
#>
#> Correlation of Fixed Effects:
#>             (Intr) I(g-1) trtmnt
#> I(age - 1)  -0.605
#> treatment   -0.750  0.454
#> I(g-1):trtm  0.454 -0.750 -0.605

fixef(early_intervention_fit)
#>          (Intercept)           I(age - 1)            treatment
#>           107.821683           -20.123396             3.656602
#> I(age - 1):treatment
#>             5.702077

tidy(early_intervention_fit, effects = "fixed")
#> # A tibble: 4 × 5
#>   effect term                 estimate std.error statistic
#>
#> 1 fixed  (Intercept)            108.        2.06     52.3
#> 2 fixed  I(age - 1)             -20.1       2.02     -9.95
#> 3 fixed  treatment                3.66      2.75      1.33
#> 4 fixed  I(age - 1):treatment     5.70      2.70      2.12
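If interval estimates are also wanted, a minimal sketch: broom.mixed's tidy() can return Wald confidence intervals for the fixed effects alongside the point estimates.

# Sketch: Wald confidence intervals for the fixed effects via broom.mixed.
tidy(early_intervention_fit, effects = "fixed", conf.int = TRUE)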
Plotting fitted change trajectories for prototypical individuals

Another way of interpreting the fixed effects is to plot fitted change trajectories for prototypical individuals, using the fixed effects estimates to make predictions. We can do this using the following three-step process:

1. Construct a data set of prototypical individuals.
2. Predict fitted change trajectories for the prototypical individuals using the fixed effects estimates.
3. Plot the fitted change trajectories.

Depending on the complexity of the multilevel model for change, and the number of prototypical individuals we wish to examine, there are a number of ways to construct the data set of prototypical individuals. The simplest way is to construct the data set by hand using, for example, the tibble() or tribble() functions from the tibble package. However, because prototypical individuals often simply represent unique combinations of different predictor values, it is often more convenient to construct the data set using the expand_grid() or crossing() functions from the tidyr package, which expand a data frame to include all possible combinations of values. The difference between these functions is that crossing() is a wrapper around expand_grid() that de-duplicates and sorts its inputs. For the early_intervention multilevel model for change, only two prototypical individuals are possible: a child in the treatment group (treatment = 1) and a child in the control group (treatment = 0).

To make predictions using only the fixed effects estimates, we set the re.form argument of the predict() function to NA. As noted earlier in a code comment, we use predict() rather than augment() to get these predictions, due to currently bad behaviour in augment.merMod() when making predictions on new data for some models (although for this specific example the augment() approach would have worked well). Finally, we can plot the fitted change trajectories as usual.

prototypical_children <- crossing(treatment = c(0, 1), age = c(1, 1.5, 2))

prototypical_children
#> # A tibble: 6 × 2
#>   treatment   age
#>
#> 1         0   1
#> 2         0   1.5
#> 3         0   2
#> 4         1   1
#> 5         1   1.5
#> 6         1   2

prototypical_children <- prototypical_children |>
  mutate(
    treatment = factor(treatment),
    cognitive_score = predict(
      early_intervention_fit, newdata = prototypical_children, re.form = NA
    )
  )

prototypical_children
#> # A tibble: 6 × 3
#>   treatment   age cognitive_score
#>
#> 1 0           1             108.
#> 2 0           1.5            97.8
#> 3 0           2              87.7
#> 4 1           1             111.
#> 5 1           1.5           104.
#> 6 1           2              97.1

# Figure 3.5, page 71:
ggplot(prototypical_children, aes(x = age, y = cognitive_score)) +
  geom_line(aes(linetype = treatment, group = treatment)) +
  scale_x_continuous(breaks = c(1, 1.5, 2)) +
  coord_cartesian(ylim = c(50, 150))
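Because only the fixed effects are involved, the same predictions can be computed by hand from the composite equation; a minimal sketch (the gammas object is ours, for illustration):

# Sketch: prototypical predictions by hand from the fixed effects,
# gamma_00 + gamma_01*treatment + (gamma_10 + gamma_11*treatment)*(age - 1).
gammas <- fixef(early_intervention_fit)
with(
  crossing(treatment = c(0, 1), age = c(1, 1.5, 2)),
  gammas[1] + gammas[3] * treatment + (gammas[2] + gammas[4] * treatment) * (age - 1)
)
# These values should match the predict() output above.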
Chapter 4: Doing data analysis with the multilevel model for change

4.1 Example: Changes in adolescent alcohol use

In Chapter 4 Singer and Willett (2003) delve deeper into the specification, estimation, and interpretation of the multilevel model for change, using a subset of data from Curran, Stice, and Chassin (1997), who measured the relation between changes in alcohol use and changes in peer alcohol use over a 3-year period in a community-based sample of Hispanic and Caucasian adolescents.

For this example we use the alcohol_use_1 data set, a person-period data frame with 246 rows and 6 columns:

- id: Adolescent ID.
- age: Age in years at time of measurement.
- child_of_alcoholic: Binary indicator for whether the adolescent is a child of an alcoholic parent.
- male: Binary indicator for whether the adolescent is male.
- alcohol_use: Square root of the summed scores of four eight-point items measuring frequency of alcohol use.
- peer_alcohol_use: Square root of the summed scores of two six-point items measuring frequency of peer alcohol use.

To inform specification of the multilevel model for change fit in subsequent sections, we begin with a basic exploration and description of the alcohol_use_1 data. Starting with the age variable, we can see that each adolescent was measured on three occasions, at ages 14, 15, and 16 years. Next we'll look at the time-invariant male and child_of_alcoholic variables. Because we're summarizing time-invariant predictors, we'll transform the data to person-level format with the pivot_wider() function from the tidyr package before summarizing. A total of 42 adolescents (51.2%) were male and 40 (48.8%) were female, and a total of 37 adolescents (45.1%) were children of an alcoholic parent while 45 (54.9%) were not.

To inform specification of the level-1 submodel, we can look at empirical growth plots for a random sample of adolescents as usual. Finally, to inform specification of the level-2 submodel, we can look at coincident growth trajectories—simply the usual individual growth trajectories, summarized by the number of individuals who share each trajectory—displayed separately for groups distinguished by important values of the time-invariant predictors. Here we look at two time-invariant predictors: child_of_alcoholic and peer_alcohol_use. Because peer_alcohol_use is a continuous variable, we split it at the sample mean for the purpose of display.

To plot the coincident growth trajectories, we first need to summarize, for each predictor, the number of individuals sharing each trajectory. The easiest way to count the number of groups per trajectory pattern at each level of the time-invariant predictors is to use the person-level data, tidying the coincident trajectory summary back to person-period format afterwards. We can then plot as usual, with the addition of a linewidth aesthetic for the coincident trajectory counts. Note that this plot differs slightly from the text: unlike Singer and Willett (2003), we use the entire sample instead of a random sample.

Based on these exploratory analyses, Singer and Willett (2003) posited the following multilevel model for change for the alcohol_use_1 data:

\[
\begin{alignat}{3}
& \text{Level 1:} \qquad & \text{alcohol_use}_{ij} &= \pi_{0i} + \pi_{1i} (\text{age}_{ij} - 14) + \epsilon_{ij} \\
& \text{Level 2:} \qquad & \pi_{0i} &= \gamma_{00} + \gamma_{01} \text{child_of_alcoholic}_i + \zeta_{0i} \\
& & \pi_{1i} &= \gamma_{10} + \gamma_{11} \text{child_of_alcoholic}_i + \zeta_{1i},
\end{alignat}
\]

where the model parameters follow the definitions and interpretations discussed in Chapter 3.

alcohol_use_1
#> # A tibble: 246 × 6
#>    id      age child_of_alcoholic  male alcohol_use peer_alcohol_use
#>
#>  1 1        14                  1     0        1.73            1.26
#>  2 1        15                  1     0        2               1.26
#>  3 1        16                  1     0        2               1.26
#>  4 2        14                  1     1        0               0.894
#>  5 2        15                  1     1        0               0.894
#>  6 2        16                  1     1        1               0.894
#>  7 3        14                  1     1        1               0.894
#>  8 3        15                  1     1        2               0.894
#>  9 3        16                  1     1        3.32            0.894
#> 10 4        14                  1     1        0               1.79
#> # ℹ 236 more rows

measurement_occasions <- unique(alcohol_use_1$age)
measurement_occasions
#> [1] 14 15 16

alcohol_use_1 |>
  group_by(id) |>
  summarise(all_occasions = identical(age, measurement_occasions)) |>
  pull(all_occasions) |>
  unique()
#> [1] TRUE

alcohol_use_1_pl <- pivot_wider(
  alcohol_use_1,
  names_from = age,
  names_prefix = "alcohol_use_",
  values_from = alcohol_use
)

alcohol_use_1_pl
#> # A tibble: 82 × 7
#>    id    child_of_alcoholic  male peer_alcohol_use alcohol_use_14 alcohol_use_15
#>
#>  1 1                      1     0            1.26            1.73           2
#>  2 2                      1     1            0.894           0              0
#>  3 3                      1     1            0.894           1              2
#>  4 4                      1     1            1.79            0              2
#>  5 5                      1     0            0.894           0              0
#>  6 6                      1     1            1.55            3              3
#>  7 7                      1     0            1.55            1.73           2.45
#>  8 8                      1     1            0               0              0
#>  9 9                      1     1            0               0              1
#> 10 10                     1     0            2               1              1
#> # ℹ 72 more rows
#> # ℹ 1 more variable: alcohol_use_16

map(
  list(male = "male", child_of_alcoholic = "child_of_alcoholic"),
  \(.x) {
    alcohol_use_1_pl |>
      group_by(.data[[.x]]) |>
      summarise(count = n()) |>
      mutate(proportion = count / sum(count))
  }
)
#> $male
#> # A tibble: 2 × 3
#>    male count proportion
#>
#> 1     0    40      0.488
#> 2     1    42      0.512
#>
#> $child_of_alcoholic
#> # A tibble: 2 × 3
#>   child_of_alcoholic count proportion
#>
#> 1                  0    45      0.549
#> 2                  1    37      0.451

# Figure 4.1, page 77:
alcohol_use_1 |>
  filter(id %in% c(4, 14, 23, 32, 41, 56, 65, 82)) |>
  ggplot(aes(x = age, y = alcohol_use)) +
    stat_smooth(method = "lm", se = FALSE) +
    geom_point() +
    coord_cartesian(xlim = c(13, 17), ylim = c(0, 4)) +
    facet_wrap(vars(id), ncol = 4, labeller = label_both)

alcohol_use_1_pl <- alcohol_use_1_pl |>
  mutate(
    peer_alcohol_use_split = if_else(
      peer_alcohol_use < mean(peer_alcohol_use),
      true = "low",
      false = "high"
    ),
    peer_alcohol_use_split = factor(peer_alcohol_use_split, levels = c("low", "high"))
  )

alcohol_use_1_pl
#> # A tibble: 82 × 8
#>    id    child_of_alcoholic  male peer_alcohol_use alcohol_use_14 alcohol_use_15
#>
#>  1 1                      1     0            1.26            1.73           2
#>  2 2                      1     1            0.894           0              0
#>  3 3                      1     1            0.894           1              2
#>  4 4                      1     1            1.79            0              2
#>  5 5                      1     0            0.894           0              0
#>  6 6                      1     1            1.55            3              3
#>  7 7                      1     0            1.55            1.73           2.45
#>  8 8                      1     1            0               0              0
#>  9 9                      1     1            0               0              1
#> 10 10                     1     0            2               1              1
#> # ℹ 72 more rows
#> # ℹ 2 more variables: alcohol_use_16, peer_alcohol_use_split
alcohol_use_1_cotraj <- map(
  list("child_of_alcoholic", "peer_alcohol_use_split"),
  \(.x) {
    # Wrangle
    .coincident_trajectories <- alcohol_use_1_pl |>
      group_by(.data[[.x]], pick(starts_with("alcohol_use"))) |>
      summarise(coincident_trajectories = n(), .groups = "drop") |>
      mutate(trajectory_id = 1:n(), .before = everything()) |>
      pivot_longer(
        cols = starts_with("alcohol_use"),
        names_to = "age",
        names_prefix = "alcohol_use_",
        names_transform = as.integer,
        values_to = "alcohol_use"
      )

    # Plot
    ggplot(.coincident_trajectories, aes(x = age, y = alcohol_use)) +
      stat_smooth(
        aes(group = trajectory_id, linewidth = coincident_trajectories),
        method = "lm", se = FALSE
      ) +
      scale_linewidth_continuous(limits = c(1, 22), range = c(.25, 4)) +
      coord_cartesian(xlim = c(13, 17), ylim = c(0, 4)) +
      facet_wrap(vars(.data[[.x]]), labeller = label_both)
  }
)

# Figure 4.2, page 79:
wrap_plots(alcohol_use_1_cotraj, ncol = 1, guides = "collect")

4.2 The composite specification of the multilevel model for change

In Section 4.2 Singer and Willett (2003) introduce what they call the composite multilevel model for change, which we prematurely introduced in Chapter 3's examples as the mixed model specification. After substituting the level-2 equations into the level-1 equation, the composite multilevel model for change for the alcohol_use_1 data looks like this:

\[
\text{alcohol_use}_{ij} = \underbrace{
  \gamma_{00} + \gamma_{01} \text{coa}_i + \gamma_{10}(\text{age}_{ij} - 14) + \gamma_{11} \text{coa}_i(\text{age}_{ij} - 14)
}_{\text{Fixed Effects}}
+ \underbrace{
  \epsilon_{ij} + \zeta_{0i} + \zeta_{1i}(\text{age}_{ij} - 14).
}_{\text{Random Effects}}
\]
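The composite form maps directly onto lmer() syntax; as a minimal sketch (this is equivalent to Model C, which is fit via update() later in the chapter, and is shown here only to illustrate the mapping):

# Sketch: the composite specification written as an lmer() formula.
# The interaction I(age - 14) * child_of_alcoholic expands to the four
# gamma terms; (I(age - 14) | id) supplies zeta_0i, zeta_1i, and their
# covariance; epsilon_ij is the residual.
lmer(
  alcohol_use ~ I(age - 14) * child_of_alcoholic + (I(age - 14) | id),
  data = alcohol_use_1,
  REML = FALSE
)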
4.3 Methods of Estimation, Revisited

In Section 4.3 Singer and Willett (2003) discuss the two methods of estimation available for frequentist multilevel models, one of which must be selected when fitting the model:

- Generalized least squares (GLS), an extension of ordinary least-squares estimation that allows the residuals to be autocorrelated and heteroscedastic. GLS estimates are obtained by minimizing a weighted function of the residuals. The gls() function from the nlme package can be used to fit the multilevel model for change using GLS.
- Maximum likelihood (ML), a general approach not limited to linear regression models. ML estimates are obtained by maximizing the likelihood function so that, under the assumed statistical model, the observed data are most probable. As previously demonstrated, the lmer() function from the lme4 package can be used to fit the multilevel model for change using ML.

Because generalized least squares and maximum likelihood estimation use different procedures to fit the multilevel model for change, their estimates may differ when fitting the same model to the same data; however, when the normal distribution assumptions required for maximum likelihood estimation hold, the estimates are equivalent.

Additionally, maximum likelihood estimation can be distinguished into two types: full and restricted. As Singer and Willett (2003) explain, with full maximum likelihood (FML) the likelihood of the sample data is maximized, and goodness-of-fit statistics refer to the fit of the entire model (fixed and random effects); with restricted maximum likelihood (REML) the likelihood of the sample residuals is maximized, and goodness-of-fit statistics refer to the fit of only the random effects. Consequently, statistical tests comparing goodness-of-fit statistics of FML models can be used to test hypotheses about either fixed or random effect parameters, whereas those of REML models can only be used to test hypotheses about random effect parameters.
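For readers curious about the GLS route mentioned above, here is a minimal sketch, under our own assumptions: nlme must be installed, and the compound-symmetry correlation structure is an illustrative choice of ours, not a specification from the text.

# Sketch (assumption-laden): a GLS fit with nlme, using ML estimation and a
# compound-symmetry within-person correlation structure chosen for illustration.
library(nlme)
gls_fit <- gls(
  alcohol_use ~ I(age - 14) * child_of_alcoholic,
  data = alcohol_use_1,
  correlation = corCompSymm(form = ~ 1 | id),
  method = "ML"
)
summary(gls_fit)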
4.4 First Steps: Fitting Two Unconditional Multilevel Models for Change

In Section 4.4 Singer and Willett (2003) introduce a new model-building workflow for the multilevel model for change, which begins by fitting two unconditional multilevel models for change that do not include any substantive predictors:

- The unconditional means model, which partitions and quantifies the total variation in the outcome variable across individuals, without regard to time. We fit this model first to determine whether the variance components for \(\epsilon_{ij}\) and \(\zeta_{0i}\) show sufficient variation within individuals (level 1) and between individuals (level 2), respectively, to warrant linking outcome variation at each level to predictors.
- The unconditional growth model, which partitions and quantifies the variation in the outcome variable across individuals over time. We fit this model second to determine whether interindividual differences in change are due to outcome variation in true initial status, \(\zeta_{0i}\), or true rate of change, \(\zeta_{1i}\).

Together, these two models (1) provide a valuable baseline against which we can evaluate and compare subsequent models that include substantive predictors, and (2) help establish whether there is systematic variation in the outcome variable worth exploring, and where that variation resides.

The unconditional means model

The unconditional means model is an intercept-only model that allows the intercept to vary across individuals:

\[
\begin{alignat}{3}
& \text{Level 1:} \qquad & \text{alcohol_use}_{ij} &= \pi_{0i} + \epsilon_{ij} \\
& \text{Level 2:} \qquad & \pi_{0i} &= \gamma_{00} + \zeta_{0i},
\end{alignat}
\]

which postulates that the observed value of alcohol_use for the \(i\)th adolescent at the \(j\)th time is composed of within-person deviations, \(\epsilon_{ij}\), from a person-specific true mean, \(\pi_{0i}\), which is in turn composed of a between-person deviation, \(\zeta_{0i}\), from the population average true mean, \(\gamma_{00}\). Note that because the unconditional means model lacks temporal predictors, it stipulates that the true change trajectory of each individual is completely flat over time, sitting at the person-specific mean (\(\pi_{0i}\)), and that the true population change trajectory is also flat, sitting at the grand mean (\(\gamma_{00}\)).

# Table 4.1, Model A, page 94-95:
model_A <- lmer(
  alcohol_use ~ 1 + (1 | id),
  data = alcohol_use_1,
  REML = FALSE
)

summary(model_A)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: alcohol_use ~ 1 + (1 | id)
#>    Data: alcohol_use_1
#>
#>      AIC      BIC   logLik deviance df.resid
#>    676.2    686.7   -335.1    670.2      243
#>
#> Scaled residuals:
#>     Min      1Q  Median      3Q     Max
#> -1.8865 -0.3076 -0.3067  0.6137  2.8567
#>
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  id       (Intercept) 0.5639   0.7509
#>  Residual             0.5617   0.7495
#> Number of obs: 246, groups:  id, 82
#>
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)  0.92195    0.09571   9.633

model_A |>
  augment(data = alcohol_use_1) |>
  ggplot(aes(x = age, y = .fitted)) +
    geom_line(aes(linewidth = "individual", group = id), alpha = .35) +
    geom_line(
      aes(linewidth = "average"),
      data = tibble(.fitted = fixef(model_A), age = measurement_occasions)
    ) +
    scale_linewidth_manual(values = c(2, .25)) +
    coord_cartesian(xlim = c(13, 17), ylim = c(0, 4)) +
    labs(y = "alcohol_use", linewidth = "trajectory")
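A quick aside not shown in the code above: the unconditional means model's variance components can be combined into an intraclass correlation, which describes the proportion of total outcome variation that lies between individuals. A minimal sketch using the components printed above:

# Sketch: intraclass correlation from Model A,
# rho = sigma_0^2 / (sigma_0^2 + sigma_epsilon^2).
vc <- as.data.frame(VarCorr(model_A))
vc$vcov[1] / sum(vc$vcov)
# Roughly 0.5639 / (0.5639 + 0.5617), i.e. about 0.50.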
The unconditional growth model

The unconditional growth model introduces a time-indicator predictor into the model, and allows the rate of change to vary across individuals:

\[
\begin{alignat}{3}
& \text{Level 1:} \qquad & \text{alcohol_use}_{ij} &= \pi_{0i} + \pi_{1i}(\text{age}_{ij} - 14) + \epsilon_{ij} \\
& \text{Level 2:} \qquad & \pi_{0i} &= \gamma_{00} + \zeta_{0i} \\
& & \pi_{1i} &= \gamma_{10} + \zeta_{1i},
\end{alignat}
\]

which postulates that the observed value of alcohol_use for the \(i\)th adolescent at the \(j\)th time is composed of within-person deviations, \(\epsilon_{ij}\), from a true linear change trajectory—a linear function of their true initial status, \(\pi_{0i}\), and true rate of change, \(\pi_{1i}\)—which are in turn composed of between-person deviations, \(\zeta_{0i}\) and \(\zeta_{1i}\), from the population average true initial status, \(\gamma_{00}\), and the population average true rate of change, \(\gamma_{10}\), respectively. Because this model is just Chapter 3's individual growth model—given a new name to emphasize that the model includes no substantive predictors—we can plot its trajectories as usual.

model_B <- lmer(
  alcohol_use ~ I(age - 14) + (I(age - 14) | id),
  data = alcohol_use_1,
  REML = FALSE
)

summary(model_B)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: alcohol_use ~ I(age - 14) + (I(age - 14) | id)
#>    Data: alcohol_use_1
#>
#>      AIC      BIC   logLik deviance df.resid
#>    648.6    669.6   -318.3    636.6      240
#>
#> Scaled residuals:
#>      Min       1Q   Median       3Q      Max
#> -2.47999 -0.38401 -0.07553  0.39001  2.50685
#>
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr
#>  id       (Intercept) 0.6244   0.7902
#>           I(age - 14) 0.1512   0.3888   -0.22
#>  Residual             0.3373   0.5808
#> Number of obs: 246, groups:  id, 82
#>
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)  0.65130    0.10508   6.198
#> I(age - 14)  0.27065    0.06245   4.334
#>
#> Correlation of Fixed Effects:
#>             (Intr)
#> I(age - 14) -0.441

model_B |>
  augment() |>
  select(-alcohol_use) |>
  rename(alcohol_use = .fitted, age = `I(age - 14)`) |>
  mutate(age = as.numeric(age + 14)) |>
  ggplot(aes(x = age, y = alcohol_use)) +
    geom_line(aes(linewidth = "individual", group = id), colour = "#3366FF") +
    geom_line(
      aes(linewidth = "average"),
      data = tibble(
        age = measurement_occasions,
        alcohol_use = predict(
          model_B, tibble(age = measurement_occasions), re.form = NA
        )
      ),
      colour = "#3366FF"
    ) +
    scale_linewidth_manual(values = c(2, .25)) +
    scale_x_continuous(breaks = 13:17) +
    coord_cartesian(xlim = c(13, 17), ylim = c(0, 4)) +
    labs(linewidth = "trajectory")
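As a minimal worked sketch, the population-average trajectory plotted above is just a line through the fixed effects, which we can compute by hand:

# Sketch: Model B's population-average trajectory,
# alcohol_use-hat = 0.651 + 0.271 * (age - 14), at ages 14-16.
fixef(model_B)[1] + fixef(model_B)[2] * (14:16 - 14)
# Approximately 0.65, 0.92, 1.19.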
4.5 Practical Data Analytic Strategies for Model Building

In Section 4.5 Singer and Willett (2003) present a data analytic strategy for model building, which focuses on building a systematic sequence of models that, as a set, address the research questions in a meaningful way. They refer to this sequence as a taxonomy of statistical models, wherein:

- Each model in the taxonomy extends a prior model in some sensible way.
- Decisions to enter, retain, and remove predictors are based on a combination of logic, theory, and prior research, supplemented by hypothesis testing and comparisons of model fit.
- The taxonomy progresses toward a "final" model whose interpretation addresses the research questions.

Here we present their strategy as one potential analytic path through the alcohol_use_1 data, with a research question focused on the relationship between changes in adolescent alcohol use and being the child of an alcoholic parent.

The first substantive model, Model C, updates the unconditional growth model to include child_of_alcoholic as a predictor of both initial status and rate of change. Singer and Willett (2003) added these terms as a logical first step, given the research question.

Model D builds upon Model C, controlling for the effects of peer_alcohol_use on initial status and rate of change. Singer and Willett (2003) added these terms to see whether they might explain the conditional residual variation in initial status and rate of change left by Model C.

Model E reduces Model D, removing child_of_alcoholic as a predictor of rate of change. Singer and Willett (2003) removed this term based on the results of Models C and D, where the estimated difference in rate of change in alcohol_use between children of alcoholic and nonalcoholic parents was practically zero.

Model F serves as an alternative to Model E, where peer_alcohol_use is centred on its sample mean of 1.018 (computed from the person-level data set). Singer and Willett (2003) centred peer_alcohol_use so that the level-2 intercepts, \(\gamma_{00}\) and \(\gamma_{10}\), represent a child of non-alcoholic parents with an average value of peer_alcohol_use (peer_alcohol_use = 1.018 and child_of_alcoholic = 0), rather than a child of non-alcoholic parents whose peers at age 14 were totally abstinent (peer_alcohol_use = 0 and child_of_alcoholic = 0).

Finally, Model G serves as an alternative to Model F, where child_of_alcoholic is also centred on its sample mean of 0.451 (computed from the person-level data set). Singer and Willett (2003) also centred child_of_alcoholic so that the level-2 intercepts, \(\gamma_{00}\) and \(\gamma_{10}\), represent an adolescent with average values of peer_alcohol_use and child_of_alcoholic (peer_alcohol_use = 1.018 and child_of_alcoholic = 0.451), making them numerically identical to the corresponding level-2 intercepts of the unconditional growth model.

To make this taxonomy of statistical models easier to work with in subsequent sections, we finish by storing the models in a list.

model_C <- update(
  model_B,
  . ~ . + child_of_alcoholic + I(age - 14):child_of_alcoholic
)

summary(model_C)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: alcohol_use ~ I(age - 14) + (I(age - 14) | id) + child_of_alcoholic +
#>     I(age - 14):child_of_alcoholic
#>    Data: alcohol_use_1
#>
#>      AIC      BIC   logLik deviance df.resid
#>    637.2    665.2   -310.6    621.2      238
#>
#> Scaled residuals:
#>     Min      1Q  Median      3Q     Max
#> -2.5480 -0.3880 -0.1058  0.3602  2.3961
#>
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr
#>  id       (Intercept) 0.4876   0.6983
#>           I(age - 14) 0.1506   0.3881   -0.22
#>  Residual             0.3373   0.5808
#> Number of obs: 246, groups:  id, 82
#>
#> Fixed effects:
#>                                Estimate Std. Error t value
#> (Intercept)                     0.31595    0.13070   2.417
#> I(age - 14)                     0.29296    0.08423   3.478
#> child_of_alcoholic              0.74321    0.19457   3.820
#> I(age - 14):child_of_alcoholic -0.04943    0.12539  -0.394
#>
#> Correlation of Fixed Effects:
#>             (Intr) I(g-14) chld__
#> I(age - 14) -0.460
#> chld_f_lchl -0.672  0.309
#> I(-14):ch__  0.309 -0.672  -0.460

model_D <- update(
  model_C,
  . ~ . + peer_alcohol_use + I(age - 14):peer_alcohol_use
)

summary(model_D)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: alcohol_use ~ I(age - 14) + (I(age - 14) | id) + child_of_alcoholic +
#>     peer_alcohol_use + I(age - 14):child_of_alcoholic + I(age -
#>     14):peer_alcohol_use
#>    Data: alcohol_use_1
#>
#>      AIC      BIC   logLik deviance df.resid
#>    608.7    643.7   -294.3    588.7      236
#>
#> Scaled residuals:
#>      Min       1Q   Median       3Q      Max
#> -2.59554 -0.40005 -0.07769  0.46003  2.29373
#>
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr
#>  id       (Intercept) 0.2409   0.4908
#>           I(age - 14) 0.1391   0.3730   -0.03
#>  Residual             0.3373   0.5808
#> Number of obs: 246, groups:  id, 82
#>
#> Fixed effects:
#>                                Estimate Std. Error t value
#> (Intercept)                    -0.31651    0.14806  -2.138
#> I(age - 14)                     0.42943    0.11369   3.777
#> child_of_alcoholic              0.57917    0.16249   3.564
#> peer_alcohol_use                0.69430    0.11153   6.225
#> I(age - 14):child_of_alcoholic -0.01403    0.12477  -0.112
#> I(age - 14):peer_alcohol_use   -0.14982    0.08564  -1.749
#>
#> Correlation of Fixed Effects:
#>             (Intr) I(g-14) chld__ pr_lc_ I(-14):c__
#> I(age - 14) -0.436
#> chld_f_lchl -0.371  0.162
#> peer_lchl_s -0.686  0.299  -0.162
#> I(-14):ch__  0.162 -0.371  -0.436  0.071
#> I(-14):pr__  0.299 -0.686   0.071 -0.436 -0.162

model_E <- update(
  model_D,
  . ~ . - I(age - 14):child_of_alcoholic
)

summary(model_E)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: alcohol_use ~ I(age - 14) + (I(age - 14) | id) + child_of_alcoholic +
#>     peer_alcohol_use + I(age - 14):peer_alcohol_use
#>    Data: alcohol_use_1
#>
#>      AIC      BIC   logLik deviance df.resid
#>    606.7    638.3   -294.4    588.7      237
#>
#> Scaled residuals:
#>      Min       1Q   Median       3Q      Max
#> -2.59554 -0.40414 -0.08352  0.45550  2.29975
#>
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr
#>  id       (Intercept) 0.2409   0.4908
#>           I(age - 14) 0.1392   0.3730   -0.03
#>  Residual             0.3373   0.5808
#> Number of obs: 246, groups:  id, 82
#>
#> Fixed effects:
#>                              Estimate Std. Error t value
#> (Intercept)                  -0.31382    0.14611  -2.148
#> I(age - 14)                   0.42469    0.10559   4.022
#> child_of_alcoholic            0.57120    0.14623   3.906
#> peer_alcohol_use              0.69518    0.11126   6.249
#> I(age - 14):peer_alcohol_use -0.15138    0.08451  -1.791
#>
#> Correlation of Fixed Effects:
#>             (Intr) I(g-14) chld__ pr_lc_
#> I(age - 14) -0.410
#> chld_f_lchl -0.338  0.000
#> peer_lchl_s -0.709  0.351  -0.146
#> I(-14):pr__  0.334 -0.814   0.000 -0.431
model_F <- update(
  model_E,
  data = mutate(alcohol_use_1, peer_alcohol_use = peer_alcohol_use - 1.018)
)

summary(model_F)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: alcohol_use ~ I(age - 14) + (I(age - 14) | id) + child_of_alcoholic +
#>     peer_alcohol_use + I(age - 14):peer_alcohol_use
#>    Data: mutate(alcohol_use_1, peer_alcohol_use = peer_alcohol_use - 1.018)
#>
#>      AIC      BIC   logLik deviance df.resid
#>    606.7    638.3   -294.4    588.7      237
#>
#> Scaled residuals:
#>      Min       1Q   Median       3Q      Max
#> -2.59554 -0.40414 -0.08352  0.45550  2.29975
#>
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr
#>  id       (Intercept) 0.2409   0.4908
#>           I(age - 14) 0.1392   0.3730   -0.03
#>  Residual             0.3373   0.5808
#> Number of obs: 246, groups:  id, 82
#>
#> Fixed effects:
#>                              Estimate Std. Error t value
#> (Intercept)                   0.39387    0.10354   3.804
#> I(age - 14)                   0.27058    0.06127   4.416
#> child_of_alcoholic            0.57120    0.14623   3.906
#> peer_alcohol_use              0.69518    0.11126   6.249
#> I(age - 14):peer_alcohol_use -0.15138    0.08451  -1.791
#>
#> Correlation of Fixed Effects:
#>             (Intr) I(g-14) chld__ pr_lc_
#> I(age - 14) -0.336
#> chld_f_lchl -0.637  0.000
#> peer_lchl_s  0.094  0.000  -0.146
#> I(-14):pr__  0.000  0.001   0.000 -0.431

model_G <- update(
  model_F,
  data = mutate(
    alcohol_use_1,
    peer_alcohol_use = peer_alcohol_use - 1.018,
    child_of_alcoholic = child_of_alcoholic - 0.451
  )
)

summary(model_G)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: alcohol_use ~ I(age - 14) + (I(age - 14) | id) + child_of_alcoholic +
#>     peer_alcohol_use + I(age - 14):peer_alcohol_use
#>    Data: mutate(alcohol_use_1, peer_alcohol_use = peer_alcohol_use - 1.018,
#>     child_of_alcoholic = child_of_alcoholic - 0.451)
#>
#>      AIC      BIC   logLik deviance df.resid
#>    606.7    638.3   -294.4    588.7      237
#>
#> Scaled residuals:
#>      Min       1Q   Median       3Q      Max
#> -2.59554 -0.40414 -0.08352  0.45550  2.29975
#>
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr
#>  id       (Intercept) 0.2409   0.4908
#>           I(age - 14) 0.1392   0.3730   -0.03
#>  Residual             0.3373   0.5808
#> Number of obs: 246, groups:  id, 82
#>
#> Fixed effects:
#>                              Estimate Std. Error t value
#> (Intercept)                   0.65148    0.07979   8.165
#> I(age - 14)                   0.27058    0.06127   4.416
#> child_of_alcoholic            0.57120    0.14623   3.906
#> peer_alcohol_use              0.69518    0.11126   6.249
#> I(age - 14):peer_alcohol_use -0.15138    0.08451  -1.791
#>
#> Correlation of Fixed Effects:
#>             (Intr) I(g-14) chld__ pr_lc_
#> I(age - 14) -0.436
#> chld_f_lchl  0.000  0.000
#> peer_lchl_s  0.001  0.000  -0.146
#> I(-14):pr__  0.000  0.001   0.000 -0.431

alcohol_use_1_fits <- list(
  `Model A` = model_A,
  `Model B` = model_B,
  `Model C` = model_C,
  `Model D` = model_D,
  `Model E` = model_E,
  `Model F` = model_F,
  `Model G` = model_G
)
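With the taxonomy stored in a named list, quick whole-taxonomy summaries become one-liners; a minimal sketch using purrr:

# Sketch: deviances for every model in the taxonomy at once.
# For these FML fits, deviance() returns -2 times the log-likelihood.
map_dbl(alcohol_use_1_fits, deviance)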
Inspecting model summary and goodness-of-fit statistics

In addition to the output of summary(), we can return a one-row tibble of model summary and goodness-of-fit statistics using the glance() function from the broom.mixed package. Individual statistics can also be returned using the generic functions of the corresponding names (e.g., AIC(), BIC(), deviance(), etc.).

Singer and Willett (2003) also introduce three pseudo-\(R^2\) statistics for the multilevel model for change, which can be used—cautiously—to quantify how much outcome variation is "explained" by a model's predictors:

- The first statistic, \(R^2_{y \hat y}\), assesses the proportion of total outcome variation "explained" by the model's specific combination of predictors, based on the squared sample correlation between the observed and predicted values.
- The second statistic, \(R^2_{\epsilon}\), assesses the proportion of within-person variation "explained" by time, based on the proportional decrease in the within-person residual variance between the unconditional means model and subsequent models. Note that because the only way of further reducing this variance component is to add time-varying predictors to the level-1 submodel, this statistic is the same for all subsequent models fit to the alcohol_use_1 data.
- The final statistic, \(R^2_{\zeta}\), assesses the proportion of between-person variation "explained" by one or more level-2 predictors, based on the proportional decrease in a level-2 residual variance between the unconditional growth model and subsequent models containing that level-2 residual variance component.

Because we will be adding these statistics to a table in the next section, we join them together here.

glance(model_A)
#> # A tibble: 1 × 7
#>    nobs sigma logLik   AIC   BIC deviance df.residual
#>
#> 1   246 0.749  -335.  676.  687.     670.         243
r2_yy <- alcohol_use_1_fits |>
  map(
    \(.fit) {
      .fit |>
        augment() |>
        summarise(
          r2_yy = cor(alcohol_use, .fixed)^2
        )
    }
  ) |>
  list_rbind(names_to = "model")

r2_e <- alcohol_use_1_fits[2:7] |>
  map(
    \(.fit) {
      .fit |>
        augment() |>
        summarise(
          r2_e = (sigma(model_A)^2 - sigma(.fit)^2) / sigma(model_A)^2
        )
    }
  ) |>
  list_rbind(names_to = "model")

r2_z <- alcohol_use_1_fits[3:7] |>
  map(
    \(.fit) {
      zeta <- map(
        list(x = model_B, y = .fit),
        \(.fit2) {
          .fit2 |>
            tidy(effects = "ran_pars", scales = "vcov") |>
            filter(group != "Residual" & stringr::str_detect(term, "^var"))
        }
      )
      zeta$x |>
        left_join(zeta$y, by = c("effect", "group", "term")) |>
        mutate(
          r2 = (estimate.x - estimate.y) / estimate.x,
          name = c("r2_z1", "r2_z2")
        ) |>
        select(name, r2) |>
        pivot_wider(names_from = name, values_from = r2)
    }
  ) |>
  list_rbind(names_to = "model")

alcohol_use_1_fits_r2 <- r2_yy |>
  left_join(r2_e) |>
  left_join(r2_z)

alcohol_use_1_fits_r2
#> # A tibble: 7 × 5
#>   model    r2_yy   r2_e r2_z1   r2_z2
#>
#> 1 Model A NA     NA     NA    NA
#> 2 Model B  0.0434  0.400 NA    NA
#> 3 Model C  0.150   0.400 0.219  0.00401
#> 4 Model D  0.291   0.400 0.614  0.0799
#> 5 Model E  0.291   0.400 0.614  0.0797
#> 6 Model F  0.291   0.400 0.614  0.0797
#> 7 Model G  0.291   0.400 0.614  0.0797

Interpreting Fitted Models

To systematically compare fitted models—describing what happens as predictors are added and removed—Singer and Willett (2003) suggest placing them side-by-side in a table, which allows us to easily inspect and compare estimated fixed effects, variance components, and goodness-of-fit statistics from one model to the next. We can construct this table using the modelsummary() function from the modelsummary package. To better match the table in the text, we set the table output to "gt" so we can post-process it using the gt package.

# This option needs to be set in order to make all the desired goodness-of-fit
# statistics available to modelsummary.
options(modelsummary_get = "all")

# Table 4.1, page 94-95:
alcohol_use_1_fits |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA), # argument from broom.mixed::tidy()
    coef_map = c(
      "(Intercept)",
      "child_of_alcoholic",
      "peer_alcohol_use",
      "I(age - 14)",
      "I(age - 14):child_of_alcoholic",
      "I(age - 14):peer_alcohol_use",
      "var__Observation",
      "var__(Intercept)",
      "var__I(age - 14)",
      "cov__(Intercept).I(age - 14)"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 2
    ),
    # The R2s need to be transposed to be added to the table columns. Their
    # position in the table is set by the `position` attribute.
    add_rows = alcohol_use_1_fits_r2 |>
      pivot_longer(-model, names_to = "estimate") |>
      pivot_wider(names_from = model) |>
      mutate(effect = "", .after = estimate) |>
      structure(position = 17:21),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 17:23) |>
  tab_row_group(label = "Variance Components", rows = 13:16) |>
  tab_row_group(label = "Fixed Effects", rows = 1:12) |>
  cols_hide(effect)
Displaying Prototypical Change Trajectories

In addition to numerical summaries, Singer and Willett (2003) suggest plotting the fitted change trajectories of prototypical individuals to describe the results of model fitting, where prototypical values of the predictors are selected using one or more of the following strategies:

- Choosing substantively interesting values, for categorical predictors and continuous predictors with well-known values.
- Using a range of percentiles, for continuous predictors without well-known values.
- Using standard deviations around the sample mean, for continuous predictors without well-known values.
- Using the sample mean, for categorical and continuous predictors we want to control for.

After selecting prototypical values for the predictors, prototypical change trajectories can be derived for combinations of these values using the usual model predictions. For convenience, we can use the estimate_prediction() function from the modelbased package to make predictions, and the map2() function from the purrr package to iterate over the desired combinations of models and prototypical values. Here we look at prototypical change trajectories for Models B, C, and E.

To systematically compare the prototypical change trajectories, it can be helpful to plot them side-by-side. However, because certain predictors are present in some models but not others, we need to supply the na.value argument to the scale_*() functions to ensure each trajectory appears in all panels, even for models where a given predictor is not present. Note that depending on the number of predictors across the different models, it may be preferable to instead create separate plots (later added together using the patchwork package).
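Before making the predictions, here is a minimal sketch of the percentile strategy from the list above (the choice of the 25th and 75th percentiles is ours, for illustration):

# Sketch: percentile-based prototypical values for a continuous predictor.
quantile(alcohol_use_1_pl$peer_alcohol_use, probs = c(.25, .75))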
prototypical_alcohol_use <- alcohol_use_1_fits |>
  keep_at(paste0("Model ", c("B", "C", "E"))) |>
  map2(
    list(
      tibble(age = 14:16),
      crossing(age = 14:16, child_of_alcoholic = 0:1),
      crossing(
        age = 14:16,
        child_of_alcoholic = 0:1,
        peer_alcohol_use = c(0.655, 1.381)
      )
    ),
    \(.fit, .data) {
      .fit |>
        estimate_prediction(data = .data) |>
        rename(alcohol_use = Predicted) |>
        as_tibble()
    }
  ) |>
  list_rbind(names_to = "model") |>
  mutate(
    child_of_alcoholic = factor(child_of_alcoholic),
    peer_alcohol_use = factor(peer_alcohol_use, labels = c("low", "high"))
  )

prototypical_alcohol_use
#> # A tibble: 21 × 8
#>    model     age alcohol_use    SE CI_low CI_high child_of_alcoholic
#>
#>  1 Model B    14       0.651 0.590 -0.511    1.81 NA
#>  2 Model B    15       0.922 0.589 -0.238    2.08 NA
#>  3 Model B    16       1.19  0.594  0.0233   2.36 NA
#>  4 Model C    14       0.316 0.595 -0.857    1.49 0
#>  5 Model C    14       1.06  0.598 -0.120    2.24 1
#>  6 Model C    15       0.609 0.593 -0.559    1.78 0
#>  7 Model C    15       1.30  0.595  0.130    2.48 1
#>  8 Model C    16       0.902 0.602 -0.284    2.09 0
#>  9 Model C    16       1.55  0.607  0.351    2.74 1
#> 10 Model E    14       0.142 0.591 -1.02     1.31 0
#> # ℹ 11 more rows
#> # ℹ 1 more variable: peer_alcohol_use

# Figure 4.3, page 99:
prototypical_alcohol_use |>
  ggplot(aes(x = age, y = alcohol_use)) +
    geom_line(aes(linetype = child_of_alcoholic, colour = peer_alcohol_use)) +
    scale_linetype_manual(values = c(2, 6), na.value = 1) +
    scale_color_viridis_d(
      option = "G", begin = .4, end = .7, na.value = "black"
    ) +
    scale_x_continuous(breaks = 13:17) +
    coord_cartesian(xlim = c(13, 17), ylim = c(0, 2)) +
    facet_wrap(vars(model))

4.6 Comparing Models Using Deviance Statistics

In Section 4.6 Singer and Willett (2003) introduce the deviance statistic, which quantifies how much worse the current model fits in comparison to a saturated model that fits the observed data perfectly, by comparing the log-likelihood statistics of the two models:

\[
\text{Deviance} = -2 (LL_\text{current model} - LL_\text{saturated model}).
\]

Note that for the multilevel model for change this equation reduces to:

\[
\text{Deviance} = -2LL_\text{current model},
\]

because the log-likelihood statistic of the saturated model is always zero.

Deviance statistics of nested models estimated using identical data can be compared using the anova() function, which computes analysis of deviance tables for one or more fitted models. Unfortunately anova() doesn't accept list input, so here we use a little meta-programming to work with our list of models rather than typing each model out by hand. Note that by default the anova() function will refit objects of class merMod with FML before comparing models estimated with REML, to prevent the common mistake of inappropriately comparing REML-fitted models with different fixed effects, whose likelihoods are not directly comparable. For REML-fitted models with identical fixed effects but different random effects, the refit argument can be set to FALSE to compare the REML-fitted models directly.
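Before calling anova(), a minimal sketch of the arithmetic it performs for one comparison (Models A and B), using the deviance formula above:

# Sketch: likelihood-ratio test of Model B against Model A by hand. The
# difference in deviances is compared to a chi-square distribution with
# df equal to the difference in number of parameters (6 - 3 = 3).
dev_diff <- deviance(model_A) - deviance(model_B)
pchisq(dev_diff, df = 3, lower.tail = FALSE)
# Should match the Model B row of the anova() table below (Chisq ~ 33.54).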
with(
  alcohol_use_1_fits[1:5],
  do.call(anova, map(names(alcohol_use_1_fits[1:5]), as.name))
)
#> Data: alcohol_use_1
#> Models:
#> Model A: alcohol_use ~ 1 + (1 | id)
#> Model B: alcohol_use ~ I(age - 14) + (I(age - 14) | id)
#> Model C: alcohol_use ~ I(age - 14) + (I(age - 14) | id) + child_of_alcoholic + I(age - 14):child_of_alcoholic
#> Model E: alcohol_use ~ I(age - 14) + (I(age - 14) | id) + child_of_alcoholic + peer_alcohol_use + I(age - 14):peer_alcohol_use
#> Model D: alcohol_use ~ I(age - 14) + (I(age - 14) | id) + child_of_alcoholic + peer_alcohol_use + I(age - 14):child_of_alcoholic + I(age - 14):peer_alcohol_use
#>         npar    AIC    BIC  logLik deviance   Chisq Df Pr(>Chisq)
#> Model A    3 676.16 686.67 -335.08   670.16
#> Model B    6 648.61 669.64 -318.31   636.61 33.5449  3  2.472e-07 ***
#> Model C    8 637.20 665.25 -310.60   621.20 15.4085  2  0.0004509 ***
#> Model E    9 606.70 638.25 -294.35   588.70 32.4993  1  1.192e-08 ***
#> Model D   10 608.69 643.74 -294.35   588.69  0.0126  1  0.9104569
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

4.7 Using Wald Statistics to Test Composite Hypotheses About Fixed Effects

This section is intentionally left blank.

4.8 Evaluating the Tenability of a Model's Assumptions

In Section 4.8 Singer and Willett (2003) offer strategies for checking the following assumptions of the multilevel model for change:

- The linear (or nonlinear) functional form hypothesized for the individual change trajectory seems reasonable for the observed data—there do not appear to be systematic deviations from linearity (or nonlinearity) across participants.
- The level-1 and level-2 residuals are normally distributed.
- The level-1 and level-2 residuals have equal variances at every level of every predictor.

Checking Functional Form

The functional form assumption of the multilevel model for change can be assessed by inspecting "outcome versus predictors" plots at each level. At level-1, empirical growth plots with superimposed individual change trajectories should support the suitability of the specified functional form. The empirical growth plots should be examined for each individual (or for several subsamples of individuals), looking for systematic deviations that would disconfirm the suitability of the hypothesized individual change trajectory. At level-2, the OLS-estimated individual growth parameters should be plotted against each substantive predictor to confirm the suitability of the specified level-2 relationships.
4.7 Using Wald Statistics to Test Composite Hypotheses About Fixed Effects

This section is intentionally left blank.

4.8 Evaluating the Tenability of a Model's Assumptions

In Section 4.8 Singer and Willett (2003) offer strategies for checking the following assumptions of the multilevel model for change:

- The linear (or nonlinear) functional form of the hypothesized individual change trajectory seems reasonable for the observed data—there do not appear to be systematic deviations from linearity (or nonlinearity) across participants.
- The level-1 and level-2 residuals are normally distributed.
- The level-1 and level-2 residuals have equal variances at every level of every predictor.

Checking Functional Form

The functional form assumption of the multilevel model for change can be assessed by inspecting "outcome versus predictors" plots at each level. At level-1, empirical growth plots with superimposed individual change trajectories can support the suitability of the specified functional form. Empirical growth plots should be examined for each individual (or for several subsamples of individuals), looking for systematic deviations that would disconfirm the suitability of the hypothesized individual change trajectory. At level-2, the OLS-estimated individual growth parameters can be plotted against each substantive predictor to confirm the suitability of the specified level-2 relationships. As with linear models, only continuous predictors need to be assessed, because categorical predictors are always linear.

set.seed(333)
alcohol_use_1 |>
  filter(id %in% sample(unique(id), size = 16)) |>
  ggplot(aes(x = age, y = alcohol_use)) +
  stat_smooth(method = "lm", se = FALSE) +
  geom_point() +
  coord_cartesian(xlim = c(13, 17), ylim = c(0, 4)) +
  facet_wrap(vars(id), ncol = 4, labeller = label_both)

alcohol_use_1_fit_np <- lmList(
  alcohol_use ~ I(age - 14) | id,
  pool = FALSE,
  data = alcohol_use_1
)

alcohol_use_1_est_np <- alcohol_use_1_fit_np |>
  map(tidy) |>
  list_rbind(names_to = "id") |>
  select(id:estimate, alcohol_use = estimate) |>
  left_join(alcohol_use_1_pl) |>
  mutate(child_of_alcoholic = factor(child_of_alcoholic))

alcohol_use_1_ovp <- map(
  list("child_of_alcoholic", "peer_alcohol_use"),
  \(.x) {
    ggplot(alcohol_use_1_est_np, aes(x = .data[[.x]], y = alcohol_use)) +
      geom_hline(yintercept = 0, alpha = .25) +
      geom_point() +
      facet_wrap(vars(term), ncol = 1, scales = "free_y")
  }
)

# Figure 4.4:
wrap_plots(alcohol_use_1_ovp) + plot_layout(axes = "collect")

Checking Normality

The normality assumption of the multilevel model for change can be assessed by inspecting Q-Q plots of the level-1 and level-2 residuals, and also (optionally) with statistical tests of normality. The check_normality() function from the performance package can perform both tasks. The plot() method for check_normality() can be used to return Q-Q plots of the level-1 and level-2 residuals.

model_F_normality <- map(
  set_names(c("fixed", "random")),
  \(.x) check_normality(model_F, effects = .x)
)

model_F_normality
#> $fixed
#> Warning: Non-normality of residuals detected (p = 0.011).
#>
#> $random
#> OK: Random effects 'id: (Intercept)' appear as normally distributed (p = 0.270).
#> Warning: Non-normality for random effects 'id: I(age - 14)' detected (p < .001).

# Figure 4.5 (left panels), page 131:
plot(model_F_normality$fixed, detrend = FALSE) +
  plot(model_F_normality$random) +
  plot_layout(widths = c(1/3, 2/3)) &
  theme_bw() &
  theme(panel.grid = element_blank())

Checking Homoscedasticity

The homoscedasticity assumption of the multilevel model for change can be assessed by inspecting "residual versus predictors" plots at each level, to see whether residual variability is approximately equal at every predictor value. The level-1 residuals are plotted against the level-1 predictor, and the level-2 residuals are plotted against the level-2 predictor(s).

# Figure 4.6 (top panel), page 133:
model_F |>
  augment(re.form = NA) |>
  rename(age = `I(age - 14)`) |>
  mutate(age = as.numeric(age + 14)) |>
  ggplot(aes(x = age, y = .resid)) +
  geom_hline(yintercept = 0, alpha = .25) +
  geom_point() +
  coord_cartesian(xlim = c(13, 17), ylim = c(-2, 2))

alcohol_use_1_ranef <- model_F |>
  ranef() |>
  augment() |>
  as_tibble() |>
  rename(term = variable, id = level, .resid = estimate) |>
  left_join(alcohol_use_1_pl) |>
  mutate(child_of_alcoholic = factor(child_of_alcoholic))

alcohol_use_1_rvp <- map(
  list("child_of_alcoholic", "peer_alcohol_use"),
  \(.x) {
    ggplot(alcohol_use_1_ranef, aes(x = .data[[.x]], y = .resid)) +
      geom_hline(yintercept = 0, alpha = .25) +
      geom_point() +
      facet_wrap(vars(term), ncol = 1, scales = "free_y") +
      coord_cartesian(ylim = c(-1, 1))
  }
)

# Figure 4.6 (bottom panels), page 133:
wrap_plots(alcohol_use_1_rvp) + plot_layout(axes = "collect")
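If you prefer to construct the Q-Q plots directly rather than through the performance package, the same checks can be sketched with base R graphics. A minimal sketch (illustrative, not the article's own code), assuming model_F from above:

# Level-1 (within-person) residuals.
qqnorm(residuals(model_F), main = "Level-1 residuals")
qqline(residuals(model_F))

# Level-2 residuals (random effects), one plot per term.
model_F_ranef <- ranef(model_F)$id
for (term in names(model_F_ranef)) {
  qqnorm(model_F_ranef[[term]], main = term)
  qqline(model_F_ranef[[term]])
}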
4.9 Model-Based Estimates of the Individual Growth Parameters

In Section 4.9 Singer and Willett (2003) discuss the use of model-based estimates to display individual growth trajectories, which are simply the partial pooling trajectories previously discussed in Chapter 3.

We begin by predicting three types of growth trajectory for each individual: (1) no pooling trajectories, estimated from separate linear models; (2) population average trajectories, estimated from the multilevel model for change without conditioning on the random effects; and (3) model-based trajectories, estimated from the multilevel model for change while conditioning on the random effects.

Now we can plot the three trajectories. Similar to the partial pooling example from Chapter 3, notice that:

- The population average trajectories are the most stable, varying the least across individuals.
- The no pooling trajectories are the least stable, varying the most across individuals.
- The model-based trajectories fall somewhere between the population average and no pooling trajectories, due to the effects of partial pooling.

Although we typically prefer the model-based trajectories from the multilevel model for change, Singer and Willett (2003) conclude by cautioning that if the model is flawed the model-based trajectories are flawed as well—their quality depends heavily on the quality of the model fit and the soundness of the model's assumptions (given the data).

alcohol_use_1_np_pred <- alcohol_use_1_fit_np |>
  map(augment) |>
  list_rbind(names_to = "id") |>
  mutate(trajectory = "no_pooling")

alcohol_use_1_pp_pred <- list(population_average = NA, model_based = NULL) |>
  map(\(.x) augment(model_F, re.form = .x)) |>
  list_rbind(names_to = "trajectory")

# For display purposes we will tidy up the prediction data frames and only use
# a subset of participants.
alcohol_use_1_preds <- alcohol_use_1_np_pred |>
  bind_rows(alcohol_use_1_pp_pred) |>
  filter(id %in% c(4, 14, 23, 32, 41, 56, 65, 82)) |>
  rename(age = `I(age - 14)`) |>
  mutate(
    trajectory = factor(
      trajectory,
      levels = c("population_average", "no_pooling", "model_based")
    ),
    id = factor(id, levels = sort(as.numeric(unique(id)))),
    age = as.numeric(age + 14),
  ) |>
  select(trajectory, id, age, alcohol_use, .fitted)

alcohol_use_1_preds
#> # A tibble: 72 × 5
#>    trajectory id      age alcohol_use .fitted
#>  1 no_pooling 4        14        0      0.378
#>  2 no_pooling 4        15        2      1.24
#>  3 no_pooling 4        16        1.73   2.11
#>  4 no_pooling 14       14        2.83   3.09
#>  5 no_pooling 14       15        3.61   3.09
#>  6 no_pooling 14       16        2.83   3.09
#>  7 no_pooling 23       14        1      0.878
#>  8 no_pooling 23       15        1      1.24
#>  9 no_pooling 23       16        1.73   1.61
#> 10 no_pooling 32       14        1.73   1.75
#> # ℹ 62 more rows

# Figure 4.7:
ggplot(alcohol_use_1_preds, aes(x = age)) +
  geom_point(aes(y = alcohol_use)) +
  geom_line(aes(y = .fitted, colour = trajectory)) +
  scale_colour_brewer(palette = "Dark2") +
  scale_y_continuous(breaks = 0:4) +
  coord_cartesian(xlim = c(13, 17), ylim = c(-1, 4)) +
  facet_wrap(vars(id), nrow = 2, labeller = label_both)

Chapter 5: Treating time more flexibly

5.1 Variably Spaced Measurement Occasions

In Section 5.1 Singer and Willett (2003) demonstrate how to fit the multilevel model for change to data with variably spaced measurement occasions, using a subset of data from the Children of the National Longitudinal Study of Youth (US Bureau of Labor Statistics), which measured changes on the reading subtest of the Peabody Individual Achievement Test (PIAT) in a sample of 89 African-American children across three waves around the ages of 6, 8, and 10.

For this example we use the reading_scores data set, a person-period data frame with 267 rows and 5 columns:

- id: Child ID.
- wave: Wave of measurement.
- age_group: Expected age at each measurement occasion.
- age: Age in years at the time of measurement.
- reading_score: Reading score on the reading subtest of the Peabody Individual Achievement Test (PIAT).

Note that the structure of the reading_scores data is identical to the person-period data sets shown in previous chapters, except that it has three time-indicator variables:

- The values of wave reflect the study's design; they are time-structured across children, but have little substantive meaning.
- The values of age_group reflect each child's expected age at each measurement occasion; they are time-structured across children and have substantive meaning.
- The values of age reflect each child's actual age at the time of measurement; they are variably spaced across children and have substantive meaning.

This demonstrates a distinctive feature of time-unstructured data sets—the possibility of multiple representations of time. Thus, from the perspective of the age_group variable the reading_scores data appears to be time-structured, whereas from the perspective of the age variable the reading_scores data appears to be variably spaced. However, as Singer and Willett (2003) discuss, the specification, estimation, and interpretation of the multilevel model for change proceeds in the exact same way regardless of which temporal representation we use; thus, it is generally preferable to use the more accurate unstructured temporal representation rather than forcing the data into a time-structured design.

Here we fit the unconditional growth model using both the structured and unstructured temporal representations to demonstrate why the latter is generally preferable. As usual, we begin by inspecting empirical growth plots to help select a functional form for the level-1 submodel. A linear change individual growth model seems parsimonious for both temporal representations.

Following Singer and Willett (2003), we centre both age_group and age on age 6.5 (the average child's age at wave 1) so that the parameters of both models have identical interpretations, and we label the time variable in each model with a generic time variable. Comparing the models, we see that the age model fits the data better than the age_group model—it has less unexplained variation in initial status and rates of change, and smaller AIC and BIC statistics.

# Table 5.1, page 141:
reading_scores
#> # A tibble: 267 × 5
#>    id     wave age_group   age reading_score
#>  1 1         1       6.5  6               18
#>  2 1         2       8.5  8.33            35
#>  3 1         3      10.5 10.3             59
#>  4 2         1       6.5  6               18
#>  5 2         2       8.5  8.5             25
#>  6 2         3      10.5 10.6             28
#>  7 3         1       6.5  6.08            18
#>  8 3         2       8.5  8.42            23
#>  9 3         3      10.5 10.4             32
#> 10 4         1       6.5  6               18
#> # ℹ 257 more rows

select(reading_scores, id, age_group, reading_score)
#> # A tibble: 267 × 3
#>    id    age_group reading_score
#>  1 1           6.5            18
#>  2 1           8.5            35
#>  3 1          10.5            59
#>  4 2           6.5            18
#>  5 2           8.5            25
#>  6 2          10.5            28
#>  7 3           6.5            18
#>  8 3           8.5            23
#>  9 3          10.5            32
#> 10 4           6.5            18
#> # ℹ 257 more rows

select(reading_scores, id, age, reading_score)
#> # A tibble: 267 × 3
#>    id      age reading_score
#>  1 1      6               18
#>  2 1      8.33            35
#>  3 1     10.3             59
#>  4 2      6               18
#>  5 2      8.5             25
#>  6 2     10.6             28
#>  7 3      6.08            18
#>  8 3      8.42            23
#>  9 3     10.4             32
#> 10 4      6               18
#> # ℹ 257 more rows

# Figure 5.1, page 143:
reading_scores |>
  filter(id %in% c(4, 27, 31, 33, 41, 49, 69, 77, 87)) |>
  pivot_longer(
    starts_with("age"),
    names_to = "time_indicator",
    values_to = "age"
  ) |>
  ggplot(aes(x = age, y = reading_score, colour = time_indicator)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE, linewidth = .5) +
  scale_x_continuous(breaks = 5:12) +
  scale_color_brewer(palette = "Dark2") +
  coord_cartesian(xlim = c(5, 12), ylim = c(0, 80)) +
  facet_wrap(vars(id), labeller = label_both)

reading_scores_fits <- map(
  list(age_group = "age_group", age = "age"),
  \(.time) {
    lmer(
      reading_score ~ I(time - 6.5) + (1 + I(time - 6.5) | id),
      data = mutate(reading_scores, time = .data[[.time]]),
      REML = FALSE
    )
  }
)

options(modelsummary_get = "all")

# Table 5.2, page 145:
reading_scores_fits |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)",
      "I(time - 6.5)",
      "var__Observation",
      "var__(Intercept)",
      "var__I(time - 6.5)"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 8:10) |>
  tab_row_group(label = "Variance Components", rows = 5:7) |>
  tab_row_group(label = "Fixed Effects", rows = 1:4) |>
  cols_hide(effect)
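To get a sense of how far the actual measurement ages deviate from the time-structured design, the discrepancy between the two temporal representations can be summarized directly. A minimal sketch (not part of the original analysis):

# Summarize how much each child's actual age deviates from the expected
# age at each measurement occasion.
reading_scores |>
  mutate(discrepancy = age - age_group) |>
  summarise(
    mean = mean(discrepancy),
    sd = sd(discrepancy),
    min = min(discrepancy),
    max = max(discrepancy)
  )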
5.2 Varying Numbers of Measurement Occasions

In Section 5.2 Singer and Willett (2003) demonstrate how to fit the multilevel model for change to data with varying numbers of measurement occasions (i.e., unbalanced data), using a subset of data from the National Longitudinal Study of Youth tracking the labour market experiences of male high school dropouts (Murnane, Boudett, & Willett, 1999).

For this example we use the dropout_wages data set, a person-period data frame with 6402 rows and 9 columns:

- id: Participant ID.
- log_wages: Natural logarithm of wages.
- experience: Labour force experience in years, tracked from dropouts' first day of work.
- ged: Binary indicator for whether the dropout obtained a GED.
- postsecondary_education: Binary indicator for whether the dropout obtained post-secondary education.
- black: Binary indicator for whether the dropout is black.
- hispanic: Binary indicator for whether the dropout is hispanic.
- highest_grade: Highest grade completed.
- unemployment_rate: Unemployment rate in the local geographic area.

In the dropout_wages data, the number of measurement occasions varies widely across individuals, from 1 to 13 waves. Indeed, examining the data from a subset of individuals, we can see that the dropout_wages data varies in both the number and spacing of measurement occasions. Yet, as Singer and Willett (2003) discuss, a major advantage of the multilevel model for change is that it can easily be fit to unbalanced data like this—so long as the person-period data set includes enough people with enough waves of data for the model to converge, the analyses can proceed as usual.

We fit three models to the dropout_wages data: an unconditional growth model (Model A), and two models that include predictors for race and highest grade completed (Models B and C).

Likewise, even with data with varying numbers of measurement occasions, prototypical change trajectories can be derived from the model as usual.

dropout_wages
#> # A tibble: 6,402 × 9
#>    id    log_wages experience   ged postsecondary_education black hispanic
#>  1 31         1.49      0.015     1                   0.015     0        1
#>  2 31         1.43      0.715     1                   0.715     0        1
#>  3 31         1.47      1.73      1                   1.73      0        1
#>  4 31         1.75      2.77      1                   2.77      0        1
#>  5 31         1.93      3.93      1                   3.93      0        1
#>  6 31         1.71      4.95      1                   4.95      0        1
#>  7 31         2.09      5.96      1                   5.96      0        1
#>  8 31         2.13      6.98      1                   6.98      0        1
#>  9 36         1.98      0.315     1                   0.315     0        0
#> 10 36         1.80      0.983     1                   0.983     0        0
#> # ℹ 6,392 more rows
#> # ℹ 2 more variables: highest_grade, unemployment_rate

dropout_wages |>
  group_by(id) |>
  summarise(waves = n()) |>
  count(waves, name = "count")
#> # A tibble: 13 × 2
#>    waves count
#>  1     1    38
#>  2     2    39
#>  3     3    47
#>  4     4    35
#>  5     5    74
#>  6     6    92
#>  7     7   103
#>  8     8   123
#>  9     9   127
#> 10    10   113
#> 11    11    65
#> 12    12    26
#> 13    13     6

# Table 5.3, page 147:
dropout_wages |>
  filter(id %in% c(206, 332, 1028)) |>
  select(id, experience, log_wages, black, highest_grade, unemployment_rate)
#> # A tibble: 20 × 6
#>    id    experience log_wages black highest_grade unemployment_rate
#>  1 206        1.87      2.03      0            10              9.2
#>  2 206        2.81      2.30      0            10             11
#>  3 206        4.31      2.48      0            10              6.30
#>  4 332        0.125     1.63      0             8              7.1
#>  5 332        1.62      1.48      0             8              9.6
#>  6 332        2.41      1.80      0             8              7.2
#>  7 332        3.39      1.44      0             8              6.20
#>  8 332        4.47      1.75      0             8              5.60
#>  9 332        5.18      1.53      0             8              4.60
#> 10 332        6.08      2.04      0             8              4.30
#> 11 332        7.04      2.18      0             8              3.40
#> 12 332        8.20      2.19      0             8              4.39
#> 13 332        9.09      4.04      0             8              6.70
#> 14 1028       0.004     0.872     1             8              9.3
#> 15 1028       0.035     0.903     1             8              7.4
#> 16 1028       0.515     1.39      1             8              7.3
#> 17 1028       1.48      2.32      1             8              7.4
#> 18 1028       2.14      1.48      1             8              6.30
#> 19 1028       3.16      1.70      1             8              5.90
#> 20 1028       4.10      2.34      1             8              6.9

# Fit models ------------------------------------------------------------------

dropout_wages_fit_A <- lmer(
  log_wages ~ experience + (1 + experience | id),
  data = dropout_wages,
  REML = FALSE
)

dropout_wages_fit_B <- update(
  dropout_wages_fit_A,
  . ~ . + experience * I(highest_grade - 9) + experience * black
)

# The model fails to converge with the default optimizer (although the estimates
# are fine). Changing the optimizer achieves convergence.
dropout_wages_fit_C <- update(
  dropout_wages_fit_B,
  . ~ . - experience:I(highest_grade - 9) - black,
  control = lmerControl(optimizer = "bobyqa")
)

dropout_wages_fits <- list(
  `Model A` = dropout_wages_fit_A,
  `Model B` = dropout_wages_fit_B,
  `Model C` = dropout_wages_fit_C
)

# Make table ------------------------------------------------------------------

# Table 5.4, page 149:
dropout_wages_fits |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)",
      "I(highest_grade - 9)",
      "black",
      "experience",
      "experience:I(highest_grade - 9)",
      "experience:black",
      "var__Observation",
      "var__(Intercept)",
      "var__experience"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 16:18) |>
  tab_row_group(label = "Variance Components", rows = 13:15) |>
  tab_row_group(label = "Fixed Effects", rows = 1:12) |>
  cols_hide(effect)

prototypical_dropout_wages <- dropout_wages_fit_C |>
  estimate_prediction(
    data = crossing(
      experience = c(0, 12),
      highest_grade = c(0, 3) + 9,
      black = c(FALSE, TRUE)
    )
  ) |>
  rename(log_wages = Predicted) |>
  mutate(highest_grade = factor(highest_grade)) |>
  as_tibble()

# Figure 5.2, page 150:
ggplot(prototypical_dropout_wages, aes(x = experience, y = log_wages)) +
  geom_line(aes(colour = highest_grade, linetype = black)) +
  scale_x_continuous(breaks = seq(0, 12, by = 2)) +
  scale_color_brewer(palette = "Dark2") +
  scale_linetype_manual(values = c(2, 1)) +
  coord_cartesian(ylim = c(1.6, 2.4))
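Here the prediction work is done by the estimate_prediction() helper from the easystats ecosystem; the same prototypical trajectories can also be sketched with lme4's own predict() method, setting re.form = NA so that predictions condition only on the fixed effects. A minimal sketch under the same assumptions as Figure 5.2 (illustrative, not the article's own code):

# Prototypical individuals for Model C; re.form = NA gives population-level
# predictions from the fixed effects alone.
prototypical_data <- crossing(
  experience = c(0, 12),
  highest_grade = c(9, 12),
  black = c(FALSE, TRUE)
)
prototypical_data$log_wages <- predict(
  dropout_wages_fit_C, newdata = prototypical_data, re.form = NA
)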
5.2.2 Practical Problems That May Arise When Analyzing Unbalanced Data Sets

The multilevel model may fail to converge, or be unable to estimate one or more variance components, in data sets that are severely unbalanced, where few people have enough waves of data. In Section 5.2.2 Singer and Willett (2003) discuss two strategies for addressing these problems:

- Removing boundary constraints, so that the software is permitted to obtain negative variance components.
- Fixing rates of change, where the model is simplified by removing the varying slope for change.

For this example we use a subset of the dropout_wages data purposefully constructed to be severely unbalanced.

First we refit Model C to the dropout_wages_subset data. Note that the estimated variance component for experience is practically zero, and the model summary has the following message at the bottom: "boundary (singular) fit: see help('isSingular')".

The first strategy Singer and Willett (2003) suggest is to remove the software's boundary constraints; however, the lme4 package does not support the removal of boundary constraints to allow negative variance components, so this strategy cannot be replicated here (Model B).

The second strategy is to simplify the model by fixing the rates of change, removing the varying slope for experience. This model fits without issue.

Comparing Models A and C, note that the deviance statistics are identical, but the AIC and BIC statistics are smaller for Model C, suggesting that: (1) Model C is an improvement over Model A; and (2) we could not effectively model systematic interindividual differences in rates of change in this data set.

dropout_wages_subset
#> # A tibble: 257 × 5
#>    id    log_wages experience black highest_grade
#>  1 206        2.03      1.87      0            10
#>  2 206        2.30      2.81      0            10
#>  3 206        2.48      4.31      0            10
#>  4 266        1.81      0.322     0             9
#>  5 304        1.84      0.58      0             8
#>  6 329        1.42      0.016     0             8
#>  7 329        1.31      0.716     0             8
#>  8 329        1.88      1.76      0             8
#>  9 336        1.89      1.91      1             8
#> 10 336        1.28      2.51      1             8
#> # ℹ 247 more rows

dropout_wages_fit_A_subset <- update(
  dropout_wages_fit_C,
  data = dropout_wages_subset
)

summary(dropout_wages_fit_A_subset)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: log_wages ~ experience + (1 + experience | id) + I(highest_grade -
#>     9) + experience:black
#>    Data: dropout_wages_subset
#> Control: lmerControl(optimizer = "bobyqa")
#>
#>      AIC      BIC   logLik deviance df.resid
#>    299.9    328.3   -141.9    283.9      249
#>
#> Scaled residuals:
#>     Min      1Q  Median      3Q     Max
#> -2.4109 -0.4754 -0.0290  0.4243  4.2842
#>
#> Random effects:
#>  Groups   Name        Variance  Std.Dev. Corr
#>  id       (Intercept) 8.215e-02 0.286615
#>           experience  3.526e-06 0.001878 1.00
#>  Residual             1.150e-01 0.339068
#> Number of obs: 257, groups:  id, 124
#>
#> Fixed effects:
#>                      Estimate Std. Error t value
#> (Intercept)           1.73734    0.04760  36.499
#> experience            0.05161    0.02108   2.449
#> I(highest_grade - 9)  0.04610    0.02447   1.884
#> experience:black     -0.05968    0.03477  -1.716
#>
#> Correlation of Fixed Effects:
#>             (Intr) exprnc I(_-9)
#> experience  -0.612
#> I(hghst_-9)  0.051 -0.133
#> exprnc:blck -0.129 -0.297  0.023
#> optimizer (bobyqa) convergence code: 0 (OK)
#> boundary (singular) fit: see help('isSingular')

dropout_wages_fit_C_subset <- update(
  dropout_wages_fit_A_subset,
  . ~ . - (1 + experience | id) + (1 | id)
)

summary(dropout_wages_fit_C_subset)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula:
#> log_wages ~ experience + I(highest_grade - 9) + (1 | id) + experience:black
#>    Data: dropout_wages_subset
#> Control: lmerControl(optimizer = "bobyqa")
#>
#>      AIC      BIC   logLik deviance df.resid
#>    295.9    317.2   -141.9    283.9      251
#>
#> Scaled residuals:
#>     Min      1Q  Median      3Q     Max
#> -2.4202 -0.4722 -0.0290  0.4197  4.2439
#>
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  id       (Intercept) 0.08425  0.2903
#>  Residual             0.11480  0.3388
#> Number of obs: 257, groups:  id, 124
#>
#> Fixed effects:
#>                      Estimate Std. Error t value
#> (Intercept)           1.73734    0.04775  36.383
#> experience            0.05178    0.02093   2.474
#> I(highest_grade - 9)  0.04576    0.02450   1.868
#> experience:black     -0.06007    0.03458  -1.737
#>
#> Correlation of Fixed Effects:
#>             (Intr) exprnc I(_-9)
#> experience  -0.614
#> I(hghst_-9)  0.051 -0.135
#> exprnc:blck -0.130 -0.294  0.024

dropout_wages_fits_subset <- list(
  `Model A` = dropout_wages_fit_A_subset,
  `Model C` = dropout_wages_fit_C_subset
)

# Table 5.5, page 154:
dropout_wages_fits_subset |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)",
      "I(highest_grade - 9)",
      "black",
      "experience",
      "experience:I(highest_grade - 9)",
      "experience:black",
      "var__Observation",
      "var__(Intercept)",
      "var__experience"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 12:14) |>
  tab_row_group(label = "Variance Components", rows = 9:11) |>
  tab_row_group(label = "Fixed Effects", rows = 1:8) |>
  cols_hide(effect)
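Boundary fits like the one above can also be detected programmatically rather than by reading the summary output. A minimal sketch (not part of the original analysis), assuming the fits from this section:

# TRUE when the fit lies on the boundary of the parameter space
# (e.g., a variance component estimated at or near zero).
isSingular(dropout_wages_fit_A_subset)

# Inspect the estimated variance components directly.
as.data.frame(VarCorr(dropout_wages_fit_A_subset))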
5.3 Time-Varying Predictors

In Section 5.3 Singer and Willett (2003) demonstrate how to fit the multilevel model for change to data with time-varying predictors, using a subset of data from Ginexi, Howe, and Caplan (2000), who measured changes in depressive symptoms after job loss in a sample of 254 recently unemployed men and women. Interviews were conducted in three waves around 1, 5, and 12 months after job loss.

For this example we use the depression_unemployment data set, a person-period data frame with 674 rows and 5 columns:

- id: Participant ID.
- interview: Time of interview.
- months: Months since job loss.
- depression: Total score on the Center for Epidemiologic Studies' Depression (CES-D) scale (Radloff, 1977).
- unemployed: Binary indicator for whether the participant was unemployed at the time of interview.

Note that all participants were unemployed at the first interview, and changes in unemployment status were gathered at the second and third interviews.

In the depression_unemployment data, the number and spacing of measurement occasions varies across individuals. A total of 193 participants (76%) had three interviews, 34 participants (13.4%) had two interviews, and 27 participants (10.6%) had only one interview. The average time from job loss to the first interview was 27.6 days (SD = 10.7; range = 2-61), 151 days for the second interview (SD = 18.3; range = 111-220), and 359 days for the third interview (SD = 19.1; range = 319-458).

Additionally, examining the data from a subset of individuals, we can see that the unemployed variable is a time-varying predictor with several unique patterns of change across participants. Considering only participants with complete data, 78 were unemployed at every interview (pattern 1-1-1), 55 were employed at every interview after the first (pattern 1-0-0), 41 were still unemployed at the second interview but employed at the third (pattern 1-1-0), and 19 were employed at the second interview but unemployed again at the third (pattern 1-0-1).

As in previous examples, no special strategies are needed to fit the multilevel model for change with time-varying predictors. However, as Singer and Willett (2003) discuss, the inclusion of time-varying predictors in the model implies the existence of multiple continuous or discontinuous change trajectories—one for each possible pattern of the time-varying predictors.

We fit four models to the depression_unemployment data: an unconditional growth model (Model A), a model that includes a main effect of the time-varying predictor (Model B), a model that includes an interaction effect with the time-varying predictor (Model C), and a model that allows the time-varying predictor to have both fixed and random effects (Model D).

Note that for Model D, Singer and Willett (2003) fit the model using SAS and report no issues with the model given the data; however, in other programs (R, MPlus, SPSS, STATA) there are convergence/singularity problems, and it is not possible to get results that match the textbook. Because these other programs all react differently to this situation, it is reasonable to conclude that the problem is not with the software, but that the model is too complex given the data.

depression_unemployment
#> # A tibble: 674 × 5
#>    id    interview months depression unemployed
#>  1 103           1  1.15          25          1
#>  2 103           2  5.95          16          1
#>  3 103           3 12.9           33          1
#>  4 641           1  0.789         27          1
#>  5 641           2  4.86           7          0
#>  6 641           3 11.8           25          0
#>  7 741           1  1.05          40          1
#>  8 846           1  0.624          2          1
#>  9 846           2  4.93          22          1
#> 10 846           3 11.8            0          0
#> # ℹ 664 more rows

depression_unemployment |>
  group_by(id) |>
  summarise(waves = n()) |>
  count(waves, name = "count") |>
  mutate(proportion = count / sum(count))
#> # A tibble: 3 × 3
#>   waves count proportion
#> 1     1    27      0.106
#> 2     2    34      0.134
#> 3     3   193      0.760

depression_unemployment |>
  group_by(interview) |>
  mutate(days = months * 30.4167) |>
  summarise(
    mean = mean(days),
    sd = sd(days),
    min = min(days),
    max = max(days)
  )
#> # A tibble: 3 × 5
#>   interview  mean    sd   min   max
#> 1         1  27.6  10.7  2.00  61.0
#> 2         2 151.   18.4 111.  220.
#> 3         3 359.   19.1 319.  458.

# Table 5.6, page 161:
filter(depression_unemployment, id %in% c(7589, 55697, 67641, 65441, 53782))
#> # A tibble: 14 × 5
#>    id    interview months depression unemployed
#>  1 7589          1  1.31          36          1
#>  2 7589          2  5.09          40          1
#>  3 7589          3 11.8           39          1
#>  4 53782         1  0.427         22          1
#>  5 53782         2  4.24          15          0
#>  6 53782         3 11.1           21          1
#>  7 55697         1  1.35           7          1
#>  8 55697         2  5.78           4          1
#>  9 65441         1  1.08          27          1
#> 10 65441         2  4.70          15          1
#> 11 65441         3 11.3            7          0
#> 12 67641         1  0.329         32          1
#> 13 67641         2  4.11           9          0
#> 14 67641         3 10.9           10          0

unemployed_patterns <- depression_unemployment |>
  group_by(id) |>
  filter(n() == 3) |>
  summarise(unemployed_pattern = paste(unemployed, collapse = "-")) |>
  count(unemployed_pattern, name = "count")

unemployed_patterns
#> # A tibble: 4 × 2
#>   unemployed_pattern count
#> 1 1-0-0                 55
#> 2 1-0-1                 19
#> 3 1-1-0                 41
#> 4 1-1-1                 78

# Fit models ------------------------------------------------------------------

depression_unemployment_fit_A <- lmer(
  depression ~ months + (1 + months | id),
  data = depression_unemployment,
  REML = FALSE
)

# The model fails to converge with the default optimizer (although the
# estimates are fine). Changing the optimizer achieves convergence.
depression_unemployment_fit_B <- update(
  depression_unemployment_fit_A,
  . ~ . + unemployed,
  control = lmerControl(optimizer = "bobyqa")
)

depression_unemployment_fit_C <- update(
  depression_unemployment_fit_B,
  . ~ . + months:unemployed
)

# The number of observations is less than the number of random effects levels
# for each term, which makes the random effects variances (probably)
# unidentifiable in this model and throws an error. In order to fit the model
# we need to ignore this check.
depression_unemployment_fit_D <- lmer(
  depression ~ unemployed + unemployed:months +
    (1 + unemployed + months:unemployed | id),
  data = depression_unemployment,
  REML = FALSE,
  control = lmerControl(check.nobs.vs.nRE = "ignore")
)

depression_unemployment_fits <- list(
  `Model A` = depression_unemployment_fit_A,
  `Model B` = depression_unemployment_fit_B,
  `Model C` = depression_unemployment_fit_C,
  `Model D` = depression_unemployment_fit_D
)

# Make table ------------------------------------------------------------------

# Table 5.7, page 163:
depression_unemployment_fits |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)" = "(Intercept)",
      "months" = "months",
      "black" = "black",
      "unemployed" = "unemployed",
      "months:unemployed" = "months:unemployed",
      "unemployed:months" = "months:unemployed",
      "var__Observation" = "var__Observation",
      "var__(Intercept)" = "var__(Intercept)",
      "var__months" = "var__months",
      "var__unemployed" = "var__unemployed",
      "var__unemployed:months" = "var__unemployed:months"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 14:16) |>
  tab_row_group(label = "Variance Components", rows = 9:13) |>
  tab_row_group(label = "Fixed Effects", rows = 1:8) |>
  cols_hide(effect)
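As an aside, the unconditional growth model's variance components can be pulled out programmatically, for example to summarize what share of the outcome variation at the start of the study lies between persons. A minimal sketch (an illustrative aside, not part of the textbook's presentation), assuming depression_unemployment_fit_A from above:

vc <- as.data.frame(VarCorr(depression_unemployment_fit_A))

# Intercept variance as a share of intercept-plus-residual variance: an
# ICC-like summary of between-person variation at months = 0.
var_intercept <- vc$vcov[which(vc$var1 == "(Intercept)" & is.na(vc$var2))]
var_residual <- vc$vcov[which(vc$grp == "Residual")]
var_intercept / (var_intercept + var_residual)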
Plotting discontinuous change trajectories

Unlike previous examples, the addition of a time-varying predictor to the model implies that a given change trajectory may be composed of either one continuous segment or multiple discontinuous segments. Because of this, new strategies are required to construct a data set of prototypical individuals and plot their fitted change trajectories, with start and end times for each segment of the predictions. This data set can be in either wide or long format, however: for wide formats, each segment must be plotted using the geom_segment() function from the ggplot2 package; whereas for long formats, each segment must have a grouping ID, but can otherwise be plotted using the geom_line() function as usual. We demonstrate both formats by constructing prototypical change trajectories for Model B.

A convenient way to construct a data set of prototypical individuals in wide format is with the reframe() function from the dplyr package, which works similarly to the summarise() function, except that it can return an arbitrary number of rows per group. Here we use it to (1) expand the unemployed_pattern string into a numeric vector using the str_extract_all() function from the stringr package; and (2) add start and stop times for each segment. Prediction then proceeds as usual, except that we use dplyr's across() function to avoid writing the same predict() code twice.

Although we will plot the prototypical trajectories using the wide format data, note that a convenient way to create a grouping ID for long format data is with the consecutive_id() function from the dplyr package, which generates a unique identifier that increments every time a variable changes. The resulting variable can be passed to ggplot2's group aesthetic to ensure the correct cases are connected together.

Now we can plot the four trajectories.

An alternative strategy for plotting discontinuous change trajectories suggested by Singer and Willett (2003) is to represent the wide variety of transition times using just two continuous trajectories that encompass the most extreme contrasts possible: here, someone who is consistently unemployed, and someone who is consistently employed. With this approach, prototypical change trajectories can be predicted and plotted using the same strategies used for models with time-invariant predictors, while conveying (almost) as much information as the full set of discontinuous trajectories would. We demonstrate this alternative strategy with Models B, C, and D. Because of the depression_unemployment study's design, we start the fitted trajectory for the consistently employed individual at 3.5 months—the earliest time a participant had a second interview.

When examining plots like these, Singer and Willett (2003) suggest thinking of the two extreme trajectories as an envelope representing the complete set of prototypical individuals implied by the model:

- Because all participants were unemployed at the first interview (by design), each individual starts on the unemployed trajectory.
- If at the second interview—regardless of transition time—they become employed, they move to the employed trajectory; if they don't, they stay on the unemployed trajectory.
- If at the third interview—regardless of transition time—they become unemployed again, they move back to the unemployed trajectory; if they don't, they stay on the employed trajectory.

prototypical_depression_B <- unemployed_patterns |>
  select(-count) |>
  group_by(unemployed_pattern) |>
  reframe(
    unemployed = str_extract_all(unemployed_pattern, "[:digit:]", simplify = TRUE),
    unemployed = as.numeric(unemployed),
    months_start = c(0, 5, 10),
    months_end = c(5, 10, 15),
  ) |>
  mutate(
    across(
      starts_with("months"),
      \(.time) {
        predict(
          depression_unemployment_fit_B,
          tibble(unemployed, months = .time),
          re.form = NA
        )
      },
      .names = "depression_{.col}"
    ),
    unemployed_pattern = factor(
      unemployed_pattern,
      levels = c("1-1-1", "1-0-0", "1-1-0", "1-0-1")
    )
  ) |>
  rename_with(
    \(.x) str_remove(.x, "months_"),
    .cols = starts_with("depression")
  )

prototypical_depression_B
#> # A tibble: 12 × 6
#>    unemployed_pattern unemployed months_start months_end depression_start
#>  1 1-0-0                       1            0          5             17.8
#>  2 1-0-0                       0            5         10             11.7
#>  3 1-0-0                       0           10         15             10.6
#>  4 1-0-1                       1            0          5             17.8
#>  5 1-0-1                       0            5         10             11.7
#>  6 1-0-1                       1           10         15             15.8
#>  7 1-1-0                       1            0          5             17.8
#>  8 1-1-0                       1            5         10             16.8
#>  9 1-1-0                       0           10         15             10.6
#> 10 1-1-1                       1            0          5             17.8
#> 11 1-1-1                       1            5         10             16.8
#> 12 1-1-1                       1           10         15             15.8
#> # ℹ 1 more variable: depression_end

prototypical_depression_B |>
  pivot_longer(
    cols = c(starts_with("months"), starts_with("depression")),
    names_to = c(".value"),
    names_pattern = "(^.*(?=_))"
  ) |>
  group_by(unemployed_pattern) |>
  mutate(cid = consecutive_id(unemployed), .after = unemployed_pattern)
#> # A tibble: 24 × 5
#> # Groups:   unemployed_pattern [4]
#>    unemployed_pattern   cid unemployed months depression
#>  1 1-0-0                  1          1      0      17.8
#>  2 1-0-0                  1          1      5      16.8
#>  3 1-0-0                  2          0      5      11.7
#>  4 1-0-0                  2          0     10      10.6
#>  5 1-0-0                  2          0     10      10.6
#>  6 1-0-0                  2          0     15       9.64
#>  7 1-0-1                  1          1      0      17.8
#>  8 1-0-1                  1          1      5      16.8
#>  9 1-0-1                  2          0      5      11.7
#> 10 1-0-1                  2          0     10      10.6
#> # ℹ 14 more rows

# Figure 5.3:
ggplot(prototypical_depression_B, aes(x = months_start, y = depression_start)) +
  geom_segment(aes(xend = months_end, yend = depression_end)) +
  coord_cartesian(ylim = c(5, 20)) +
  facet_wrap(vars(unemployed_pattern), labeller = label_both) +
  labs(x = "months", y = "depression")

prototypical_depression <- depression_unemployment_fits[-1] |>
  map(
    \(.fit) {
      .fit |>
        estimate_prediction(
          data = tibble(months = c(0, 14, 3.5, 14), unemployed = c(1, 1, 0, 0))
        ) |>
        rename(depression = Predicted) |>
        mutate(unemployed = as.logical(unemployed)) |>
        as_tibble()
    }
  ) |>
  list_rbind(names_to = "model")

# Figure 5.4, page 167:
ggplot(prototypical_depression, aes(x = months, y = depression)) +
  geom_line(aes(colour = unemployed)) +
  scale_x_continuous(breaks = seq(0, 14, by = 2)) +
  scale_color_brewer(palette = "Dark2") +
  coord_cartesian(xlim = c(0, 14), ylim = c(5, 20)) +
  facet_wrap(vars(model))

5.3.3 Recentring time-varying predictors

In Section 5.3.3 Singer and Willett (2003) return to the dropout_wages data to discuss three strategies for centring time-varying predictors:

- Constant centring: Centre the predictor around a single substantively meaningful constant for all observations.
- Within-person centring: Decompose the time-varying predictor into two constituent predictors where, for each individual, the first predictor is their within-person mean, and the second predictor is each measurement occasion's deviation from their within-person mean.
- Time-one centring: Decompose the time-varying predictor into two constituent predictors where, for each individual, the first predictor is their value at the first measurement occasion, and the second predictor is each measurement occasion's deviation from that first value.

We demonstrate these strategies by updating Model C, dropout_wages_fit_C, to include a main effect of the time-varying predictor unemployment_rate, fitting one model that uses constant centring (Model A2), one that uses within-person centring (Model B2), and one that uses time-one centring (Model C2).

# Fit models ------------------------------------------------------------------

dropout_wages_fit_A2 <- update(
  dropout_wages_fit_C,
  . ~ . + I(unemployment_rate - 7)
)

dropout_wages_fit_B2 <- update(
  dropout_wages_fit_C,
  . ~ . + unemployment_rate_mean + unemployment_rate_dev,
  data = mutate(
    dropout_wages,
    unemployment_rate_mean = mean(unemployment_rate),
    unemployment_rate_dev = unemployment_rate - unemployment_rate_mean,
    .by = id
  )
)

dropout_wages_fit_C2 <- update(
  dropout_wages_fit_C,
  . ~ . + unemployment_rate_first + unemployment_rate_dev,
  data = mutate(
    dropout_wages,
    unemployment_rate_first = first(unemployment_rate),
    unemployment_rate_dev = unemployment_rate - unemployment_rate_first,
    .by = id
  )
)

dropout_wages_fits_2 <- list(
  `Model A2` = dropout_wages_fit_A2,
  `Model B2` = dropout_wages_fit_B2,
  `Model C2` = dropout_wages_fit_C2
)

# Make table ------------------------------------------------------------------

# Table 5.8:
dropout_wages_fits_2 |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)" = "(Intercept)",
      "I(highest_grade - 9)" = "I(highest_grade - 9)",
      "I(unemployment_rate - 7)" = "unemployment_rate",
      "unemployment_rate_mean" = "unemployment_rate",
      "unemployment_rate_first" = "unemployment_rate",
      "unemployment_rate_dev" = "unemployment_rate_dev",
      "black" = "black",
      "experience" = "experience",
      "experience:I(highest_grade - 9)" = "experience:I(highest_grade - 9)",
      "experience:black" = "experience:black",
      "var__Observation" = "var__Observation",
      "var__(Intercept)" = "var__(Intercept)",
      "var__experience" = "var__experience"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    fmt = 4,
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 16:18) |>
  tab_row_group(label = "Variance Components", rows = 13:15) |>
  tab_row_group(label = "Fixed Effects", rows = 1:12) |>
  cols_hide(effect)
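The three centring schemes are easiest to compare side by side on a toy example. A minimal, self-contained sketch (the data and names here are illustrative, not from the original analysis), assuming the tidyverse packages loaded above:

# A toy time-varying predictor x, measured three times for two people.
toy <- tibble(
  id = rep(c("a", "b"), each = 3),
  x  = c(6, 8, 10, 4, 4, 7)
)

toy |>
  mutate(
    x_constant  = x - 7,         # constant centring (around, e.g., 7)
    x_mean      = mean(x),       # within-person mean ...
    x_mean_dev  = x - mean(x),   # ... and deviations from it
    x_first     = first(x),      # time-one value ...
    x_first_dev = x - first(x),  # ... and deviations from it
    .by = id
  )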
5.4 Recentring the effect of time

In Section 5.4 Singer and Willett (2003) discuss strategies for centring time-indicator variables, using a subset of data from Tomarken, Shelton, Elkins, and Anderson (1997), who measured the relation between changes in positive mood and supplemental antidepressant medication over the course of a week in a sample of 73 men and women already receiving nonpharmacological therapy for depression.

For this example we use the antidepressants data set, a person-period data frame with 1242 rows and 6 columns:

- id: Participant ID.
- wave: Wave of measurement.
- day: Day of measurement.
- reading: Time of day a reading was taken.
- positive_mood: Positive mood score.
- treatment: Treatment condition (placebo pills = 0, antidepressant pills = 1).

Note that the antidepressants data has three time-indicator variables, each providing a different representation of time:

- The values of wave reflect the study's design, but have little substantive meaning due to the conceptual difficulty of dividing one week into 21 components.
- The values of day reflect the study's design in a meaningful way, but fail to distinguish between morning, afternoon, and evening readings.
- The values of reading also reflect the study's design in a meaningful way—capturing the time of day each reading was taken—but fail to distinguish between days, and are difficult to analyze due to being a character vector.

To facilitate model fitting, we can create new time-indicator variables that are both meaningful and easier to analyze. Here we create two new time-indicator variables:

- time_of_day: Time of day a reading was taken, expressed numerically (0 for morning readings; 0.33 for afternoon readings; 0.67 for evening readings).
- time: Time of measurement, expressed as a combination of day and time_of_day.

The advantage of the time variable is that it captures both aspects of time in the antidepressants data in a single variable, making it easy to centre on different time points in the study. Following Singer and Willett (2003), we centre time on three different points in the study:

- time: centred on initial status.
- time_3.33: centred on the study's midpoint.
- time_6.67: centred on the study's final wave.

We fit three models to the antidepressants data to demonstrate how centring time affects the parameter estimates and their interpretation: a model with time centred on initial status (Model A), a model with time centred on the study's midpoint (Model B), and a model with time centred on the study's final wave (Model C).

Notice that the parameters related to the slope are identical across Models A, B, and C, but those related to the intercept are different. As Singer and Willett (2003) explain, centring a time-indicator variable changes the location where the fitted trajectory's anchors are placed around a given point in time. We can visualize this anchoring effect by plotting prototypical change trajectories for the models fit to the antidepressants data: as the dashed vertical lines highlight, centring the time-indicator variable changes the location of the focal comparison between the control and treatment groups in each model, causing the resultant estimates to describe the trajectories' behaviours at that specific point in time. Note that because Models A, B, and C are structurally identical, it does not matter which model is used to make the predictions—they all imply the same prototypical change trajectories.

antidepressants
#> # A tibble: 1,242 × 6
#>    id     wave   day reading positive_mood treatment
#>  1 1         1     0 8 AM             107.         1
#>  2 1         2     0 3 PM             100          1
#>  3 1         3     0 10 PM            100          1
#>  4 1         4     1 8 AM             100          1
#>  5 1         5     1 3 PM             100          1
#>  6 1         6     1 10 PM            100          1
#>  7 1         7     2 8 AM             100          1
#>  8 1         8     2 3 PM             100          1
#>  9 1         9     2 10 PM            100          1
#> 10 1        10     3 8 AM             107.         1
#> # ℹ 1,232 more rows

antidepressants <- antidepressants |>
  mutate(
    time_of_day = case_when(
      reading == "8 AM" ~ 0,
      reading == "3 PM" ~ 1/3,
      reading == "10 PM" ~ 2/3
    ),
    time = day + time_of_day,
    .after = reading
  )

antidepressants
#> # A tibble: 1,242 × 8
#>    id     wave   day reading time_of_day  time positive_mood treatment
#>  1 1         1     0 8 AM          0     0              107.         1
#>  2 1         2     0 3 PM          0.333 0.333          100          1
#>  3 1         3     0 10 PM         0.667 0.667          100          1
#>  4 1         4     1 8 AM          0     1              100          1
#>  5 1         5     1 3 PM          0.333 1.33           100          1
#>  6 1         6     1 10 PM         0.667 1.67           100          1
#>  7 1         7     2 8 AM          0     2              100          1
#>  8 1         8     2 3 PM          0.333 2.33           100          1
#>  9 1         9     2 10 PM         0.667 2.67           100          1
#> 10 1        10     3 8 AM          0     3              107.         1
#> # ℹ 1,232 more rows

# Table 5.9, page 182:
antidepressants |>
  select(-c(id, positive_mood, treatment)) |>
  mutate(time_3.33 = time - 3.33, time_6.67 = time - 6.67)
#> # A tibble: 1,242 × 7
#>     wave   day reading time_of_day  time time_3.33 time_6.67
#>  1     1     0 8 AM          0     0        -3.33      -6.67
#>  2     2     0 3 PM          0.333 0.333    -3.00      -6.34
#>  3     3     0 10 PM         0.667 0.667    -2.66      -6.00
#>  4     4     1 8 AM          0     1        -2.33      -5.67
#>  5     5     1 3 PM          0.333 1.33     -2.00      -5.34
#>  6     6     1 10 PM         0.667 1.67     -1.66      -5.00
#>  7     7     2 8 AM          0     2        -1.33      -4.67
#>  8     8     2 3 PM          0.333 2.33     -0.997     -4.34
#>  9     9     2 10 PM         0.667 2.67     -0.663     -4.00
#> 10    10     3 8 AM          0     3        -0.33      -3.67
#> # ℹ 1,232 more rows

# Fit models ------------------------------------------------------------------

antidepressants_fit_A <- lmer(
  positive_mood ~ treatment * time + (1 + time | id),
  data = antidepressants,
  REML = FALSE
)

antidepressants_fit_B <- update(
  antidepressants_fit_A,
  data = mutate(antidepressants, time = time - 3.33),
  control = lmerControl(optimizer = "bobyqa")
)

antidepressants_fit_C <- update(
  antidepressants_fit_A,
  data = mutate(antidepressants, time = time - 6.67)
)

antidepressants_fits <- list(
  `Model A` = antidepressants_fit_A,
  `Model B` = antidepressants_fit_B,
  `Model C` = antidepressants_fit_C
)

# Make table ------------------------------------------------------------------

# Table 5.10, page 184:
antidepressants_fits |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)",
      "treatment",
      "time",
      "treatment:time",
      "var__Observation",
      "var__(Intercept)",
      "var__time",
      "cov__(Intercept).time"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 13:15) |>
  tab_row_group(label = "Variance Components", rows = 9:12) |>
  tab_row_group(label = "Fixed Effects", rows = 1:8) |>
  cols_hide(effect)

protoypical_mood <- antidepressants_fit_A |>
  estimate_prediction(
    data = tibble(
      treatment = c(0, 0, 0, 1, 1, 1),
      time = c(0, 3.33, 6.67, 0, 3.33, 6.67)
    )
  ) |>
  rename(positive_mood = Predicted) |>
  mutate(treatment = as.logical(treatment))

# Figure 5.5, page 185:
ggplot(protoypical_mood, aes(x = time, y = positive_mood)) +
  geom_line(aes(colour = treatment)) +
  geom_line(aes(group = time), linetype = 2) +
  scale_x_continuous(breaks = seq(0, 7, by = 1)) +
  scale_color_brewer(palette = "Dark2") +
  coord_cartesian(ylim = c(140, 190))
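The claim that only intercept-related parameters shift under recentring can be verified directly by lining up the fixed effects across the three fits. A minimal sketch (not part of the original analysis), assuming antidepressants_fits from above:

# The time and treatment:time estimates should agree across all three
# models; the (Intercept) and treatment estimates should differ.
map(antidepressants_fits, \(.fit) round(fixef(.fit), 2))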
Chapter 6: Modeling Discontinuous and Nonlinear Change

6.1 Discontinuous Individual Change

In Section 6.1, Singer and Willett (2003) discuss strategies for fitting discontinuous individual change trajectories, using a subset of data from the National Longitudinal Study of Youth tracking the labour market experiences of male high school dropouts (Murnane, Boudett, & Willett, 1999).

For this example we return to the dropout_wages data set introduced in Chapter 5, a person-period data frame with 6402 rows and 9 columns:

- id: Participant ID.
- log_wages: Natural logarithm of wages.
- experience: Labour force experience in years, tracked from dropouts' first day of work.
- ged: Binary indicator for whether the dropout obtained a GED.
- postsecondary_education: Binary indicator for whether the dropout obtained post-secondary education.
- black: Binary indicator for whether the dropout is black.
- hispanic: Binary indicator for whether the dropout is hispanic.
- highest_grade: Highest grade completed.
- unemployment_rate: Unemployment rate in the local geographic area.

As demonstrated in Section 5.3, the inclusion of one (or more) time-varying predictor(s) in the level-1 individual growth model can be used to model discontinuous individual change trajectories. As Singer and Willett (2003) discuss, the dropout_wages data contains several time-varying predictors that can be used to model different forms of discontinuous change. We can see the behaviour of these predictors by examining the data from a subset of individuals:

- ged: an immediate shift in elevation, but not slope.
- postsecondary_education: an immediate shift in slope, but not elevation.
- ged_x_experience: immediate shifts in both elevation and slope.

We fit ten models to the dropout_wages data:

- Model A: a baseline model.
- Model B: a model that adds a discontinuity in elevation, but not slope, by including fixed and random effects for ged.
- Model C: a model that excludes the variance/covariance components associated with ged from Model B.
- Model D: a model that adds a discontinuity in slope, but not elevation, by including fixed and random effects for postsecondary_education.
- Model E: a model that excludes the variance/covariance components associated with postsecondary_education from Model D.
- Model F: a model that adds discontinuities in both elevation and slope by including fixed and random effects for both ged and postsecondary_education.
- Model G: a model that excludes the variance/covariance components associated with postsecondary_education from Model F.
- Model H: a model that excludes the variance/covariance components associated with ged from Model F.
- Model I: a model that adds discontinuities in both elevation and slope by including fixed and random effects for ged and for the interaction between ged and experience.
- Model J: a model that excludes the variance/covariance components associated with the interaction between ged and experience from Model I.

We can visualize the different forms of discontinuous change hypothesized by these models by plotting the fitted change trajectory of a single prototypical individual for each model.

We can select a "final" model from this taxonomy of models by comparing deviance statistics for nested models, and AIC/BIC statistics for non-nested models (in addition to using a combination of logic, theory, and prior research). Following Singer and Willett (2003), we choose Model F as our "final" model.

Finally, we can plot prototypical change trajectories from this model. Note the additional information captured by these discontinuous trajectories compared to the continuous trajectories for the dropout_wages data presented in Chapter 5 (Figure 5.2).

dropout_wages
#> # A tibble: 6,402 × 9
#>    id    log_wages experience   ged postsecondary_education black hispanic
#>  1 31         1.49      0.015     1                   0.015     0        1
#>  2 31         1.43      0.715     1                   0.715     0        1
#>  3 31         1.47      1.73      1                   1.73      0        1
#>  4 31         1.75      2.77      1                   2.77      0        1
#>  5 31         1.93      3.93      1                   3.93      0        1
#>  6 31         1.71      4.95      1                   4.95      0        1
#>  7 31         2.09      5.96      1                   5.96      0        1
#>  8 31         2.13      6.98      1                   6.98      0        1
#>  9 36         1.98      0.315     1                   0.315     0        0
#> 10 36         1.80      0.983     1                   0.983     0        0
#> # ℹ 6,392 more rows
#> # ℹ 2 more variables: highest_grade, unemployment_rate

# Table 6.1, page 192:
dropout_wages |>
  filter(id %in% c(206, 2365, 4384)) |>
  select(id, log_wages, experience, ged, postsecondary_education) |>
  mutate(ged_x_experience = ged * experience) |>
  print(n = 22)
#> # A tibble: 22 × 6
#>    id    log_wages experience   ged postsecondary_education ged_x_experience
#>  1 206        2.03      1.87      0                    0                0
#>  2 206        2.30      2.81      0                    0                0
#>  3 206        2.48      4.31      0                    0                0
#>  4 2365       1.78      0.66      0                    0                0
#>  5 2365       1.76      1.68      0                    0                0
#>  6 2365       1.71      2.74      0                    0                0
#>  7 2365       1.74      3.68      0                    0                0
#>  8 2365       2.19      4.68      1                    0                4.68
#>  9 2365       2.04      5.72      1                    1.04             5.72
#> 10 2365       2.32      6.72      1                    2.04             6.72
#> 11 2365       2.66      7.87      1                    3.19             7.87
#> 12 2365       2.42      9.08      1                    4.40             9.08
#> 13 2365       2.39     10.0       1                    5.36            10.0
#> 14 2365       2.48     11.1       1                    6.44            11.1
#> 15 2365       2.44     12.0       1                    7.36            12.0
#> 16 4384       2.86      0.096     0                    0                0
#> 17 4384       1.53      1.04      0                    0                0
#> 18 4384       1.59      1.73      1                    0                1.73
#> 19 4384       1.97      3.13      1                    1.40             3.13
#> 20 4384       1.68      4.28      1                    2.56             4.28
#> 21 4384       2.62      5.72      1                    4.00             5.72
#> 22 4384       2.58      6.02      1                    4.30             6.02

# Fit models ------------------------------------------------------------------

dropout_wages_fit_A <- lmer(
  log_wages ~ experience + I(highest_grade - 9) + experience:black +
    I(unemployment_rate - 7) + (1 + experience | id),
  data = dropout_wages,
  REML = FALSE
)

dropout_wages_fit_B <- update(
  dropout_wages_fit_A,
  . ~ . - (1 + experience | id) + ged + (1 + experience + ged | id),
  control = lmerControl(optimizer = "bobyqa")
)

dropout_wages_fit_C <- update(
  dropout_wages_fit_A,
  . ~ . + ged,
  control = lmerControl(optimizer = "bobyqa")
)

dropout_wages_fit_D <- update(
  dropout_wages_fit_A,
  . ~ . - (1 + experience | id) + postsecondary_education +
    (1 + experience + postsecondary_education | id),
  control = lmerControl(optimizer = "bobyqa")
)

dropout_wages_fit_E <- update(
  dropout_wages_fit_A,
  . ~ . + postsecondary_education
)

dropout_wages_fit_F <- update(
  dropout_wages_fit_A,
  . ~ . - (1 + experience | id) + ged + postsecondary_education +
    (1 + experience + ged + postsecondary_education | id),
  control = lmerControl(optimizer = "bobyqa")
)

dropout_wages_fit_G <- update(
  dropout_wages_fit_F,
  . ~ . - (1 + experience + ged + postsecondary_education | id) +
    (1 + experience + ged | id)
)

dropout_wages_fit_H <- update(
  dropout_wages_fit_F,
  . ~ . - (1 + experience + ged + postsecondary_education | id) +
    (1 + experience + postsecondary_education | id)
)

dropout_wages_fit_I <- update(
  dropout_wages_fit_A,
  . ~ . - (1 + experience | id) + ged + experience:ged +
    (1 + experience + ged + experience:ged | id)
)

dropout_wages_fit_J <- update(
  dropout_wages_fit_I,
  . ~ . - (1 + experience + ged + experience:ged | id) +
    (1 + experience + ged | id)
)

dropout_wages_fits <- list(
  `Model A` = dropout_wages_fit_A,
  `Model B` = dropout_wages_fit_B,
  `Model C` = dropout_wages_fit_C,
  `Model D` = dropout_wages_fit_D,
  `Model E` = dropout_wages_fit_E,
  `Model F` = dropout_wages_fit_F,
  `Model G` = dropout_wages_fit_G,
  `Model H` = dropout_wages_fit_H,
  `Model I` = dropout_wages_fit_I,
  `Model J` = dropout_wages_fit_J
)

# Make table ------------------------------------------------------------------

options(modelsummary_get = "all")

dropout_wages_fits |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)",
      "I(highest_grade - 9)",
      "I(unemployment_rate - 7)",
      "black",
      "experience",
      "experience:I(highest_grade - 9)",
      "experience:black",
      "ged",
      "postsecondary_education",
      "experience:ged",
      "var__Observation",
      "var__(Intercept)",
      "var__experience",
      "var__ged",
      "var__postsecondary_education",
      "var__experience:ged"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 23:25) |>
  tab_row_group(label = "Variance Components", rows = 17:22) |>
  tab_row_group(label = "Fixed Effects", rows = 1:16) |>
  cols_hide(effect)

prototypical_dropout_wages <- dropout_wages_fits |>
  keep_at(paste0("Model ", c("B", "D", "F", "I"))) |>
  map(
    \(.fit) {
      prototypical_dropout <- tibble(
        experience = c(0, 3, 3, 10),
        ged = c(0, 0, 1, 1),
        postsecondary_education = c(0, 0, 0, 7),
        highest_grade = 9,
        black = 1,
        unemployment_rate = 7,
        cid = c(1, 1, 2, 2)
      )

      prototypical_dropout |>
        mutate(log_wages = predict(.fit, prototypical_dropout, re.form = NA))
    }
  ) |>
  list_rbind(names_to = "model")

# Similar to Figure 6.2:
ggplot(prototypical_dropout_wages, aes(x = experience, y = log_wages)) +
  geom_line(aes(group = cid)) +
  geom_line(aes(group = experience), alpha = .25) +
  facet_wrap(vars(model))

dropout_wages_anovas <- list(
  "Model B" = anova(dropout_wages_fit_B, dropout_wages_fit_A),
  "Model C" = anova(dropout_wages_fit_C, dropout_wages_fit_B),
  "Model D" = anova(dropout_wages_fit_D, dropout_wages_fit_A),
  "Model E" = anova(dropout_wages_fit_E, dropout_wages_fit_D),
  "Model F" = anova(dropout_wages_fit_F, dropout_wages_fit_B),
  "Model F" = anova(dropout_wages_fit_F, dropout_wages_fit_D),
  "Model G" = anova(dropout_wages_fit_G, dropout_wages_fit_F),
  "Model H" = anova(dropout_wages_fit_H, dropout_wages_fit_F),
  "Model I" = anova(dropout_wages_fit_I, dropout_wages_fit_B),
  "Model J" = anova(dropout_wages_fit_J, dropout_wages_fit_I)
)

# Table 6.2, page 203:
dropout_wages_anovas |>
  map(tidy) |>
  list_rbind(names_to = "model") |>
  select(model, term, npar, deviance, statistic, df, p.value) |>
  mutate(
    term = stringr::str_remove(term, "dropout_wages_fit_"),
    across(c(npar, df), as.integer)
  ) |>
  group_by(model) |>
  gt() |>
  cols_label(term = "comparison") |>
  fmt_number(columns = where(is.double), decimals = 2) |>
  sub_missing(missing_text = "")

# Table 6.3, page 205:
dropout_wages_fit_F |>
  list() |>
  set_names("Estimate") |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)",
      "I(highest_grade - 9)",
      "I(unemployment_rate - 7)",
      "experience",
      "experience:black",
      "ged",
      "postsecondary_education",
      "var__Observation",
      "var__(Intercept)",
      "var__experience",
      "var__ged",
      "var__postsecondary_education"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 20:22) |>
  tab_row_group(label = "Variance Components", rows = 15:19) |>
  tab_row_group(label = "Fixed Effects", rows = 1:14) |>
  cols_hide(effect)

prototypical_dropout_wages_F <- dropout_wages_fit_F |>
  estimate_prediction(
    data = tibble(
      experience = rep(c(0, 3, 3, 10), times = 4),
      highest_grade = rep(c(9, 12), each = 4, times = 2),
      black = rep(c(FALSE, TRUE), each = 8),
      ged = rep(c(0, 0, 1, 1), times = 4),
      unemployment_rate = 7,
      postsecondary_education = rep(c(0, 0, 0, 7), times = 4)
    )
  ) |>
  rename(log_wages = Predicted) |>
  mutate(
    highest_grade = factor(highest_grade),
    black = as.logical(black),
    cid = consecutive_id(ged)
  )

# Figure 6.3:
ggplot(prototypical_dropout_wages_F, aes(x = experience, y = log_wages)) +
  geom_line(aes(group = cid, colour = black)) +
  geom_line(
    aes(group = interaction(experience, black), colour = black),
    alpha = .25
  ) +
  scale_x_continuous(breaks = seq(0, 10, by = 2)) +
  scale_color_brewer(palette = "Dark2") +
  coord_cartesian(xlim = c(0, 10), ylim = c(1.6, 2.4)) +
  facet_wrap(vars(highest_grade), labeller = label_both)
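For the non-nested comparisons in this taxonomy, a compact way to line up the information criteria is to map AIC() and BIC() over the list of fits. A minimal sketch (not part of the original analysis), assuming dropout_wages_fits from above:

# One row per model, for quick side-by-side comparison of non-nested fits.
tibble(
  model = names(dropout_wages_fits),
  AIC = map_dbl(dropout_wages_fits, AIC),
  BIC = map_dbl(dropout_wages_fits, BIC)
)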
alcohol_use_1
#> # A tibble: 246 × 6
#> id age child_of_alcoholic male alcohol_use peer_alcohol_use
#>
#> 1 1 14 1 0 1.73 1.26
#> 2 1 15 1 0 2 1.26
#> 3 1 16 1 0 2 1.26
#> 4 2 14 1 1 0 0.894
#> 5 2 15 1 1 0 0.894
#> 6 2 16 1 1 1 0.894
#> 7 3 14 1 1 1 0.894
#> 8 3 15 1 1 2 0.894
#> 9 3 16 1 1 3.32 0.894
#> 10 4 14 1 1 0 1.79
#> # ℹ 236 more rows

alcohol_use_1 <- alcohol_use_1 |>
  mutate(alcohol_use_raw = alcohol_use^2, .before = alcohol_use) |>
  rename(alcohol_use_sqrt = alcohol_use)

alcohol_use_1
#> # A tibble: 246 × 7
#> id age child_of_alcoholic male alcohol_use_raw alcohol_use_sqrt
#>
#> 1 1 14 1 0 3.00 1.73
#> 2 1 15 1 0 4 2
#> 3 1 16 1 0 4 2
#> 4 2 14 1 1 0 0
#> 5 2 15 1 1 0 0
#> 6 2 16 1 1 1 1
#> 7 3 14 1 1 1 1
#> 8 3 15 1 1 4 2
#> 9 3 16 1 1 11.0 3.32
#> 10 4 14 1 1 0 0
#> # ℹ 236 more rows
#> # ℹ 1 more variable: peer_alcohol_use

alcohol_use_1_empgrowth <- map(
  list(original = "alcohol_use_raw", sqrt = "alcohol_use_sqrt"),
  \(.y) {
    set.seed(333)
    alcohol_use_1 |>
      filter(id %in% sample(id, size = 8)) |>
      ggplot(aes(x = age, y = .data[[.y]])) +
      geom_point() +
      coord_cartesian(xlim = c(13, 17), ylim = c(-1, 15)) +
      facet_wrap(vars(id), ncol = 4, labeller = label_both)
  }
)

alcohol_use_1_empgrowth$original
alcohol_use_1_empgrowth$sqrt

alcohol_use_1_fit <- lmer(
  alcohol_use_sqrt ~ I(age - 14) * peer_alcohol_use + child_of_alcoholic +
    (1 + I(age - 14) | id),
  data = alcohol_use_1,
  REML = FALSE
)

summary(alcohol_use_1_fit)
#> Linear mixed model fit by maximum likelihood ['lmerMod']
#> Formula:
#> alcohol_use_sqrt ~ I(age - 14) * peer_alcohol_use + child_of_alcoholic +
#> (1 + I(age - 14) | id)
#> Data: alcohol_use_1
#>
#> AIC BIC logLik deviance df.resid
#> 606.7 638.3 -294.4 588.7 237
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -2.59554 -0.40414 -0.08352 0.45550 2.29975
#>
#> Random effects:
#> Groups Name Variance Std.Dev. Corr
#> id (Intercept) 0.2409 0.4908
#> I(age - 14) 0.1392 0.3730 -0.03
#> Residual 0.3373 0.5808
#> Number of obs: 246, groups: id, 82
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) -0.31382 0.14611 -2.148
#> I(age - 14) 0.42469 0.10559 4.022
#> peer_alcohol_use 0.69518 0.11126 6.249
#> child_of_alcoholic 0.57120 0.14623 3.906
#> I(age - 14):peer_alcohol_use -0.15138 0.08451 -1.791
#>
#> Correlation of Fixed Effects:
#> (Intr) I(g-14) pr_lc_ chld__
#> I(age - 14) -0.410
#> peer_lchl_s -0.709 0.351
#> chld_f_lchl -0.338 0.000 -0.146
#> I(-14):pr__ 0.334 -0.814 -0.431 0.000
prototypical_alcohol_use <- alcohol_use_1_fit |>
  estimate_prediction(
    data = crossing(
      age = seq(14, 16, by = .25),
      child_of_alcoholic = 0:1,
      peer_alcohol_use = c(0.655, 1.381)
    )
  ) |>
  rename(alcohol_use = Predicted) |>
  mutate(
    alcohol_use = alcohol_use^2,
    child_of_alcoholic = factor(child_of_alcoholic),
    peer_alcohol_use = factor(peer_alcohol_use, labels = c("low", "high"))
  )

# Figure 6.4, page 209:
ggplot(prototypical_alcohol_use, aes(x = age, y = alcohol_use)) +
  geom_line(aes(linetype = child_of_alcoholic, colour = peer_alcohol_use)) +
  scale_color_viridis_d(option = "G", begin = .4, end = .7) +
  scale_x_continuous(breaks = 13:17) +
  coord_cartesian(xlim = c(13, 17), ylim = c(0, 3))

Selecting a suitable transformation

Singer and Willett (2003) suggest examining empirical growth plots for each participant (or a random subset of them) under several different transformations before selecting one for analysis, keeping in mind that:

- In order to analyze the data, the same transformation must be used for the entire sample.
- Different transformations may be more or less successful for different participants.
- The transformation selected for analysis should work well for most of the sample, so some compromise is to be expected.

We illustrate this process using the berkeley data set, which contains a subset of data from the Berkeley Growth Study measuring changes in IQ for a single girl followed from childhood into older adulthood (Bayley, 1935). Following Singer and Willett (2003), we try transforming both the outcome and the time-indicator variable to see which is more successful at removing the nonlinearity. Note that although the two transformations in this example are simply inversions of one another (raising iq to the power 2.3, versus taking the 2.3th root of age), they do not produce identical reductions in nonlinearity.
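To make this trial-and-error process concrete, here is a small sketch with a toy accelerating trajectory (the candidate transformations and the correlation-with-time index are illustrative choices, not the textbook's procedure):

# Toy accelerating trajectory.
d <- data.frame(time = 1:10, y = (1:10)^2)

# Candidate transformations of the outcome.
candidates <- list(identity = identity, sqrt = sqrt, log = log)

# Correlation with time as a rough index of straightness (closer to 1 is
# more linear); sqrt wins here because y was generated as time^2.
sapply(candidates, function(f) cor(d$time, f(d$y)))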
berkeley
#> # A tibble: 18 × 2
#> age iq
#>
#> 1 5 37
#> 2 7 65
#> 3 9 85
#> 4 10 88
#> 5 11 95
#> 6 12 101
#> 7 13 103
#> 8 14 107
#> 9 15 113
#> 10 18 121
#> 11 21 148
#> 12 24 161
#> 13 27 165
#> 14 36 187
#> 15 42 205
#> 16 48 218
#> 17 54 218
#> 18 60 228

berkeley_transforms <- list("original", "iq^(2.3)", "age^(1/2.3)") |>
  set_names() |>
  map(
    \(.transform) {
      mutate(
        berkeley,
        transform = .transform,
        age = if_else(transform == "age^(1/2.3)", age^(1/2.3), age),
        iq = if_else(transform == "iq^(2.3)", iq^(2.3), iq)
      )
    }
  ) |>
  list_rbind(names_to = "metric") |>
  mutate(metric = factor(metric, levels = unique(metric)))

# Figure 6.6, page 212:
ggplot(berkeley_transforms, aes(x = age, y = iq)) +
  geom_point() +
  facet_wrap(vars(metric), scales = "free", labeller = label_both)

6.3 Representing individual change using a polynomial function of time

In Section 6.3, Singer and Willett (2003) discuss strategies for fitting polynomial individual change trajectories using a subset of data from Keiley, Bates, Dodge, and Pettit (2000), who measured changes in externalizing behaviour in a sample of 45 children tracked from first through sixth grade. For this example we use the externalizing_behaviour data set, a person-period data frame with 270 rows and 5 columns:

- id: Child ID.
- time: Time of measurement.
- externalizing_behaviour: Sum of scores on Achenbach's (1991) Child Behavior Checklist. Scores range from 0 to 68.
- female: Binary indicator for whether the child is female.
- grade: Grade year.

Singer and Willett (2003) recommend using two approaches for selecting among competing polynomial forms for the level-1 individual growth model:

- Examining empirical growth plots to identify the highest order polynomial change trajectory suggested by the data.
- Comparing goodness-of-fit statistics across a series of polynomial level-1 models.

externalizing_behaviour
#> # A tibble: 270 × 5
#> id externalizing_behaviour female time grade
#>
#> 1 1 50 0 0 1
#> 2 1 57 0 1 2
#> 3 1 51 0 2 3
#> 4 1 48 0 3 4
#> 5 1 43 0 4 5
#> 6 1 19 0 5 6
#> 7 2 4 0 0 1
#> 8 2 6 0 1 2
#> 9 2 3 0 2 3
#> 10 2 3 0 3 4
#> # ℹ 260 more rows

Using a polynomial trajectory to summarize each person's empirical growth record

We begin by examining empirical growth plots for a subset of children whose trajectories span a wide array of individual change patterns in the externalizing_behaviour data. Unlike previous examples, here we (1) match the order of the IDs in the input vector with the order of the ids in the resultant data frame, using the map() function from the purrr package to filter the data in a specific order; and (2) assign each participant in the subset a consecutive alphabetical identifier using the consecutive_id() function from the dplyr package. Now we can examine the empirical growth plots for the externalizing_behaviour_subset data.
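As a quick sketch of the consecutive_id() step mentioned above (toy input; dplyr is assumed loaded, as elsewhere in these examples): the function returns a new integer each time its input changes, so indexing LETTERS with its result yields identifiers in order of appearance.

x <- c(40, 40, 26, 26, 26)
consecutive_id(x)
#> [1] 1 1 2 2 2
LETTERS[consecutive_id(x)]
#> [1] "A" "A" "B" "B" "B"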
When faced with so many different individual change patterns, Singer and Willett (2003) suggest beginning with the following exploratory approach:

- First, identify the highest order polynomial needed to summarize change for each participant by fitting separate person-specific polynomial trajectories.
- Second, identify the highest order polynomial needed to summarize change for every participant by fitting a common polynomial trajectory across participants.

Similar to previous examples, the geom_smooth() function with the "lm" method can be used to add each person's fitted polynomial trajectory to their empirical growth record plot; we just need to specify the functional form of the trajectories in the formula argument. There are two ways to specify polynomial trajectories using R's formula syntax:

- I(): The I() function can be used to construct a series of polynomial predictors by hand.
- poly(): The poly() function can be used to construct a series of (by default, orthogonal) polynomial predictors from degree 1 up to a specified degree.

We begin with the person-specific polynomial trajectories. Note that aesthetic mappings cannot be used in geom_smooth() formulas to specify the degree of each child's polynomial trajectory, so instead we need to add a separate smooth geom to the empirical growth plots for each child's data. Next, the common polynomial trajectory. Following Singer and Willett (2003), we select a quartic (degree 4) trajectory, because no child appears to need a higher order polynomial.

# Note that participant 26 is last in the input vector and resultant data frame.
externalizing_behaviour_subset <- c(1, 6, 11, 25, 34, 36, 40, 26) |>
  map(\(.id) filter(externalizing_behaviour, id == .id)) |>
  list_rbind() |>
  mutate(child = LETTERS[consecutive_id(id)])

tail(externalizing_behaviour_subset, n = 12)
#> # A tibble: 12 × 6
#> id externalizing_behaviour female time grade child
#>
#> 1 40 40 1 0 1 G
#> 2 40 23 1 1 2 G
#> 3 40 7 1 2 3 G
#> 4 40 28 1 3 4 G
#> 5 40 35 1 4 5 G
#> 6 40 56 1 5 6 G
#> 7 26 19 0 0 1 H
#> 8 26 32 0 1 2 H
#> 9 26 25 0 2 3 H
#> 10 26 40 0 3 4 H
#> 11 26 20 0 4 5 H
#> 12 26 23 0 5 6 H

externalizing_behaviour_empgrowth <- externalizing_behaviour_subset |>
  ggplot(aes(x = grade, y = externalizing_behaviour)) +
  geom_point() +
  scale_x_continuous(breaks = 0:7) +
  coord_cartesian(xlim = c(0, 7), ylim = c(0, 60)) +
  facet_wrap(vars(child), ncol = 4, labeller = label_both)

externalizing_behaviour_empgrowth

externalizing_behaviour_empgrowth <- externalizing_behaviour_empgrowth +
  map2(
    group_split(externalizing_behaviour_subset, child),
    list(2, 2, 1, 1, 3, 4, 2, 4),
    \(.child, .degree) {
      geom_smooth(
        aes(linetype = polynomial, colour = degree),
        data = mutate(
          .child,
          polynomial = factor("person-specific"),
          degree = factor(.degree, levels = 1:4)
        ),
        method = "lm",
        formula = y ~ poly(x, degree = .degree),
        se = FALSE
      )
    }
  ) +
  scale_colour_brewer(palette = "Dark2", drop = FALSE) +
  guides(linetype = guide_legend(override.aes = list(colour = "black")))

externalizing_behaviour_empgrowth

# Figure 6.7, page 218:
externalizing_behaviour_empgrowth +
  geom_smooth(
    aes(linetype = "common (quartic)"),
    method = "lm",
    formula = y ~ poly(x, degree = 4),
    se = FALSE,
    colour = "black",
    linewidth = .5
  )

Testing Higher Order Terms in a Polynomial Level-1 Model

To select a "final" polynomial trajectory, Singer and Willett (2003) suggest fitting a series of level-1 individual growth models of increasing polynomial complexity, stopping when the goodness-of-fit statistics suggest there is no need to add further polynomial predictors to the model. We fit four models to the externalizing_behaviour data: an unconditional means model (Model A), an unconditional growth model (Model B), and two models of increasing polynomial order (Models C and D). Note that we set the raw argument of the poly() function to TRUE in order to use raw, not orthogonal, polynomials. As usual, we can inspect analysis of deviance tables to compare the nested models.
# Fit models ------------------------------------------------------------------
externalizing_behaviour_fit_A <- lmer(
  externalizing_behaviour ~ 1 + (1 | id),
  data = externalizing_behaviour,
  REML = FALSE
)

externalizing_behaviour_poly_fits <- map(
  set_names(1:3),
  \(.degree) {
    lmer(
      externalizing_behaviour ~
        poly(time, .degree, raw = TRUE) + (poly(time, .degree, raw = TRUE) | id),
      data = externalizing_behaviour,
      REML = FALSE
    )
  }
)

externalizing_behaviour_fits <- list(
  "Model A" = externalizing_behaviour_fit_A,
  "Model B" = externalizing_behaviour_poly_fits[["1"]],
  "Model C" = externalizing_behaviour_poly_fits[["2"]],
  "Model D" = externalizing_behaviour_poly_fits[["3"]]
)

# Make table ------------------------------------------------------------------
# Table 6.5, page 221:
externalizing_behaviour_fits |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    statistic = NULL,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)" = "(Intercept)",
      "poly(time, .degree, raw = TRUE)" = "time",
      "poly(time, .degree, raw = TRUE)1" = "time",
      "poly(time, .degree, raw = TRUE)2" = "time^2",
      "poly(time, .degree, raw = TRUE)3" = "time^3",
      "var__Observation" = "var__Observation",
      "var__(Intercept)" = "var__(Intercept)",
      "var__poly(time, .degree, raw = TRUE)" = "var__time",
      "var__poly(time, .degree, raw = TRUE)1" = "var__time",
      "cov__(Intercept).poly(time, .degree, raw = TRUE)" = "cov__(Intercept).time",
      "cov__(Intercept).poly(time, .degree, raw = TRUE)1" = "cov__(Intercept).time",
      "var__poly(time, .degree, raw = TRUE)2" = "var__time^2",
      "cov__(Intercept).poly(time, .degree, raw = TRUE)2" = "cov__(Intercept).time^2",
      "cov__poly(time, .degree, raw = TRUE)1.poly(time, .degree, raw = TRUE)2" = "cov__time.time^2",
      "var__poly(time, .degree, raw = TRUE)3" = "var__time^3",
      "cov__(Intercept).poly(time, .degree, raw = TRUE)3" = "cov__(Intercept).time^3",
      "cov__poly(time, .degree, raw = TRUE)1.poly(time, .degree, raw = TRUE)3" = "cov__time.time^3",
      "cov__poly(time, .degree, raw = TRUE)2.poly(time, .degree, raw = TRUE)3" = "cov__time^2.time^3"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 16:18) |>
  tab_row_group(label = "Variance Components", rows = 5:15) |>
  tab_row_group(label = "Fixed Effects", rows = 1:4) |>
  cols_hide(effect)

externalizing_behaviour_fits |>
  with(do.call(anova, map(names(externalizing_behaviour_fits), as.name))) |>
  tidy()
#> # A tibble: 4 × 9
#> term npar AIC BIC logLik deviance statistic df p.value
#>
#> 1 Model A 3 2016. 2027. -1005. 2010. NA NA NA
#> 2 Model B 6 2004. 2025. -996. 1992. 18.5 3 0.000345
#> 3 Model C 10 1996. 2032. -988. 1976. 15.9 4 0.00315
#> 4 Model D 15 1997. 2051. -984. 1967. 8.48 5 0.132
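As an aside on the raw argument used above, here is a minimal sketch with toy data (names hypothetical) showing that raw and orthogonal polynomials are different parameterizations of the same fit: the coefficients differ, but the fitted values agree.

set.seed(1)
d <- data.frame(x = 1:10)
d$y <- 2 + 0.5 * d$x + 0.3 * d$x^2 + rnorm(10)

fit_raw  <- lm(y ~ poly(x, 2, raw = TRUE), data = d) # raw powers: x, x^2
fit_orth <- lm(y ~ poly(x, 2), data = d)             # orthogonal polynomials

# Identical fitted values despite different coefficient scales.
all.equal(fitted(fit_raw), fitted(fit_orth))
#> [1] TRUE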
8.48 5 0.132"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-6.html","id":"truly-nonlinear-trajectories","dir":"Articles","previous_headings":"","what":"6.4 Truly Nonlinear Trajectories","title":"Chapter 6: Modeling Discontinuous and Nonlinear Change","text":"Section 6.4, Singer Willett (2003) discuss strategies fitting truly nonlinear change trajectories using data Tivnan (1980), measured changes cognitive growth three-week period sample 17 first second-graders. Childrens’ cognitive growth based improvement number moves completed two-person checkerboard game, Fox n’ Geese, making catastrophic error. example use cognitive_growth data set, person-period data frame 445 rows 4 columns: id: Child ID. game: Game number. child played maximum 27 games. nmoves: number moves completed making catastrophic error. reading_score: Score unnamed standardized reading test. inform specification nonlinear multilevel model change fit example, begin examining empirical growth plots subset children cognitive_growth data. Singer Willett (2003) discuss, knowledge Fox n’ Geese game inspection plots suggests (generalized) logistic trajectory level-1 individual growth model consisting three features: fixed lower asymptote: child’s trajectory rises fixed lower asymptote 1 players must make least one move. fixed upper asymptote: child’s trajectory approaches upper asymptote can make finite number moves making catastrophic error. Based examining empirical growth plots, fixed upper asymptote 20 appears reasonable. smooth curve joining lower upper asymptotes: Learning theory suggests child’s true trajectory smoothly traverse region lower upper asymptotes—accelerating away lower asymptote child initially deduces winning strategy, decelerating toward upper asymptote child finds increasingly difficult refine winning strategy . write level-1 logistic trajectory : \\[ \\text{nmoves}_{ij} = a_{1} + \\frac{(a_{2} - a_{1})}{1 + \\pi_{0i} e^{-(\\pi_{1i} \\text{game}_{ij})}} + \\epsilon_{ij}, \\] asserts \\(\\text{nmoves}_{ij}\\)—true value nmoves \\(\\)th child \\(j\\)th game—nonlinear function logistic growth parameters \\(\\pi_{0i}\\) \\(\\pi_{1i}\\). parameters \\(a_{1}\\) \\(a_{2}\\) represent lower upper asymptotes, fix values 1 20, respectively. develop intuition level-1 logistic trajectory models relationship nmoves game, can plot true trajectories different children using specific combinations values nonlinear parameters, \\(\\pi_{0i}\\) \\(\\pi_{1i}\\). can first writing function deriv() representing level-1 logistic trajectory (note also use function fitting nonlinear multilevel model change). using geom_function() geom plot true trajectories different combinations nonlinear parameters. fit two models cognitive_growth data using postulated level-1 logistic trajectory: unconditional logistic growth model (Model ); logistic growth model includes time-invariant predictor children’s reading skill, reading_score, centred sample mean 1.95625 (Model B). 
We fit two models to the cognitive_growth data using the postulated level-1 logistic trajectory: an unconditional logistic growth model (Model A); and a logistic growth model that includes a time-invariant predictor of children's reading skill, reading_score, centred on its sample mean of 1.95625 (Model B). Following Singer and Willett (2003), we specify the level-2 submodels for the nonlinear parameters of Models A and B as:

\[
\begin{align}
\text{Model A:} \qquad \pi_{0i} &= \gamma_{00} + \zeta_{0i} \\
\pi_{1i} &= \gamma_{10} + \zeta_{1i} \\
\\
\text{Model B:} \qquad \pi_{0i} &= \gamma_{00} + \gamma_{01}(\text{reading_score}_i - \overline{\text{reading_score}}) + \zeta_{0i} \\
\pi_{1i} &= \gamma_{10} + \gamma_{11}(\text{reading_score}_i - \overline{\text{reading_score}}) + \zeta_{1i},
\end{align}
\]

where

\[
\begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix} \sim
\begin{pmatrix}
N
\begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} \sigma^2_0 & \ \sigma_{10} \\ \ \sigma_{10} & \sigma^2_1 \end{bmatrix}
\end{pmatrix}.
\]

We can fit the logistic growth models using the nlme() function from the nlme package. The model formula for the nlme() function takes the form response ~ nonlinear_formula, where the nonlinear formula can either be represented using a function or written directly. For this example we use the logistic_function() created with deriv() above. There are several important differences between linear and nonlinear models to keep in mind when fitting these models:

- The nonlinear parameters must be declared explicitly in the nonlinear formula. For the nlme() function, linear models for these parameters are specified in the fixed and random arguments; multiple parameters that share the same model can be written as a single formula instead of a list of single-parameter formulas.
- Starting estimates for the parameters must be provided, unless a self-starting function is used to calculate initial parameter estimates; the final estimates can also be quite sensitive to the starting values. For this example, we chose starting estimates close to the parameter estimates reported in the text (Table 6.6). Strategies for choosing reasonable starting estimates are covered in Bates and Watts (1988), and Pinheiro and Bates (2000).
- No intercept is assumed by default, so an intercept must be included in the nonlinear model formula if desired.

Note that although the models fit here match those described by the textbook's equations, they are not exactly the models Singer and Willett (2003) secretly fit for Table 6.6 and Figure 6.10 (for discussion, see https://github.com/mccarthy-m-g/alda/issues/3).

We can plot the individual and prototypical change trajectories as usual. First the prototypical trajectories. Then the individual trajectories, which we can also plot together to get a sense of the interindividual variation. Finally, we
can add child’s fitted trajectory empirical growth plot get sense well model fits data.","code":"cognitive_growth #> # A tibble: 445 × 4 #> id game nmoves reading_score #> #> 1 1 1 4 1.4 #> 2 1 2 7 1.4 #> 3 1 3 8 1.4 #> 4 1 4 3 1.4 #> 5 1 5 3 1.4 #> 6 1 6 3 1.4 #> 7 1 7 7 1.4 #> 8 1 8 6 1.4 #> 9 1 9 3 1.4 #> 10 1 10 7 1.4 #> # ℹ 435 more rows # Figure 6.8, page 227: cognitive_growth_empgrowth <- cognitive_growth |> filter(id %in% c(1, 4, 6, 7, 8, 11, 12, 15)) |> ggplot(aes(x = game, y = nmoves)) + geom_point() + coord_cartesian(xlim = c(0, 30), ylim = c(0, 25)) + facet_wrap(vars(id), ncol = 4, labeller = label_both) cognitive_growth_empgrowth logistic_function <- deriv( ~ 1 + (19.0 / (1.0 + pi0 * exp(-pi1 * time))), namevec = c(\"pi0\", \"pi1\"), function.arg = c(\"time\", \"pi0\", \"pi1\") ) # Figure 6.9: ggplot() + pmap( arrange_all(crossing(.pi0 = c(150, 15, 1.5), .pi1 = c(.5, .3, .1)), desc), \\(.pi0, .pi1) { geom_function( aes(colour = pi1), data = tibble(pi0 = factor(.pi0), pi1 = factor(.pi1)), fun = \\(.game) { logistic_function(.game, pi0 = .pi0, pi1 = .pi1) }, n = 30 ) } ) + scale_x_continuous(limits = c(0, 30)) + scale_color_brewer(palette = \"Dark2\") + coord_cartesian(ylim = c(0, 25)) + facet_wrap(vars(pi0), labeller = label_both) + labs( x = \"game\", y = \"nmoves\" ) # Fit models ------------------------------------------------------------------ cognitive_growth_fit_A <- nlme( nmoves ~ logistic_function(game, pi0, pi1), fixed = pi0 + pi1 ~ 1, random = pi0 + pi1 ~ 1, groups = ~ id, start = list(fixed = c(pi0 = 13, pi0 = .12)), data = cognitive_growth ) cognitive_growth_fit_B <- update( cognitive_growth_fit_A, fixed = pi0 + pi1 ~ 1 + I(reading_score - 1.95625), start = list(fixed = c(13, -.4, .12, .04)) ) cognitive_growth_fits <- list( \"Model A\" = cognitive_growth_fit_A, \"Model B\" = cognitive_growth_fit_B ) # Make table ------------------------------------------------------------------ # broom.mixed and easystats don't have methods to extract random effects from # objects of class nlme, so we have to construct this part of the table manually. 
cognitive_growth_fits_ranef <- cognitive_growth_fits |> map( \\(.fit) { .fit |> VarCorr() |> as.data.frame(order = \"cov.last\") |> mutate( across( c(var1, var2), \\(.x) if_else(str_ends(.x, \"[:digit:]\"), paste0(.x, \".(Intercept)\"), .x) ), effect = \"random\", var1 = case_when( grp == \"Residual\" ~ \"sd__Observation\", grp == \"id\" & is.na(var2) ~ paste0(\"sd__\", var1), grp == \"id\" & !is.na(var2) ~ paste0(\"cor__pi0\", var1, \".\", var2) ) ) |> arrange(grp) |> select(term = var1, estimate = sdcor) } ) |> list_rbind(names_to = \"model\") |> pivot_wider(names_from = model, values_from = estimate) |> structure(position = 5:8) # Table 6.6, page 231: cognitive_growth_fits |> modelsummary( statistic = NULL, coef_map = c( \"pi0\" = \"pi0.(Intercept)\", \"pi0.(Intercept)\" = \"pi0.(Intercept)\", \"pi0.I(reading_score - 1.95625)\" = \"pi0.I(reading_score - 1.95625)\", \"pi1\" = \"pi1.(Intercept)\", \"pi1.(Intercept)\" = \"pi1.(Intercept)\", \"pi1.I(reading_score - 1.95625)\" = \"pi1.I(reading_score - 1.95625)\" ), gof_map = tibble( raw = c(\"deviance\", \"AIC\", \"BIC\"), clean = c(\"Deviance\", \"AIC\", \"BIC\"), fmt = 1 ), add_rows = cognitive_growth_fits_ranef, output = \"gt\" ) |> tab_row_group(label = \"Goodness-of-Fit\", rows = 9:11) |> tab_row_group(label = \"Variance Components\", rows = 5:8) |> tab_row_group(label = \"Fixed Effects\", rows = 1:4) prototypical_cognitive_growth <- cognitive_growth_fits |> map2( list( tibble(game = seq(from = 0, to = 30, by = 0.1)), crossing(game = seq(from = 0, to = 30, by = 0.1), reading_score = c(1, 4)) ), \\(.fit, .df) { .df |> mutate(nmoves = predict(.fit, newdata = .df, level = 0)) } ) |> list_rbind(names_to = \"model\") |> mutate(reading_score = factor(reading_score, labels = c(\"low\", \"high\"))) # Similar to Figure 6.10, page 232: ggplot(prototypical_cognitive_growth, aes(x = game, y = nmoves)) + geom_line(aes(colour = reading_score)) + scale_color_viridis_d( option = \"G\", begin = .4, end = .7, na.value = \"black\" ) + coord_cartesian(ylim = c(0, 25)) + facet_wrap(vars(model)) cognitive_growth_fits |> map(\\(.fit) augment(.fit, data = cognitive_growth)) |> list_rbind(names_to = \"model\") |> select(-nmoves) |> rename(nmoves = .fitted) |> mutate(reading_score = if_else(model == \"Model A\", NA, reading_score)) |> ggplot(aes(x = game, y = nmoves)) + geom_line(aes(group = id, colour = reading_score)) + scale_colour_viridis_b( option = \"G\", begin = .4, end = .8, na.value = \"black\" ) + coord_cartesian(ylim = c(0, 25)) + facet_wrap(vars(model)) cognitive_growth_empgrowth + geom_line( aes(y = .fitted, group = id, colour = reading_score), data = filter( augment(cognitive_growth_fit_B, data = cognitive_growth), id %in% c(1, 4, 6, 7, 8, 11, 12, 15) ) ) + scale_colour_viridis_b(breaks = 1:4, option = \"G\", begin = .4, end = .8)"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-7.html","id":"the-standard-specification-of-the-multilevel-model-for-change","dir":"Articles","previous_headings":"","what":"7.1 The “standard” specification of the multilevel model for change","title":"Chapter 7: Examining the Multilevel Model's Error Covariance Structure","text":"Chapter 7 Singer Willett (2003) examine generalized least squares approach modelling change using artificial data created Willett (1988), simulated changes performance hypothetical “opposites naming” task four week period sample 35 people. example use opposites_naming data set, person-period data frame 140 rows 5 columns: id: Participant ID. wave: Wave measurement. 
- time: Wave of measurement centred at time 0.
- opposites_naming_score: Score on the "opposites naming" task.
- baseline_cognitive_score: Baseline score on a standardized instrument assessing general cognitive skill.

As the person-level version of the opposites_naming data shows, this is a time-structured data set with four measurements per participant, and a time-invariant predictor reflecting each participant's cognitive skill at baseline.

We begin by fitting the "standard" multilevel model for change to the opposites_naming data, which will serve as a point of comparison for alternative models with different error covariance structures than the "standard" model. The "standard" model for the opposites_naming data takes the familiar form:

\[
\begin{alignat}{2}
\text{Level 1:} \\
&\text{opposites_naming_score}_{ij} = \pi_{0i} + \pi_{1i} \text{time}_{ij} + \epsilon_{ij} \\
\text{Level 2:} \\
&\pi_{0i} = \gamma_{00} + \gamma_{01} (\text{baseline_cognitive_score}_i - 113.4571) + \zeta_{0i} \\
&\pi_{1i} = \gamma_{10} + \gamma_{11} (\text{baseline_cognitive_score}_i - 113.4571) + \zeta_{1i},
\end{alignat}
\]

where

\[
\epsilon_{ij} \stackrel{iid}{\sim} \operatorname{Normal}(0, \sigma_\epsilon),
\]

and

\[
\begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix} \stackrel{iid}{\sim}
\begin{pmatrix}
N
\begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} \sigma^2_0 & \ \sigma_{10} \\ \ \sigma_{10} & \sigma^2_1 \end{bmatrix}
\end{pmatrix}.
\]

As Singer and Willett (2003) discuss, because the focus of this chapter is on the error covariance structure of the multilevel model for change, we fit this model using restricted maximum likelihood so that the goodness-of-fit statistics reflect only the stochastic portion of the model's fit. Additionally, we use the lme() function from the nlme package instead of lme4's lmer() function to fit the model, because the former has methods that make examining the fitted model's error covariance structure easier. The API of the lme() function is similar to the lmer() function, except that fixed and random effects are specified in separate formulas rather than a single formula.

opposites_naming
#> # A tibble: 140 × 5
#> id wave time opposites_naming_score baseline_cognitive_score
#>
#> 1 1 1 0 205 137
#> 2 1 2 1 217 137
#> 3 1 3 2 268 137
#> 4 1 4 3 302 137
#> 5 2 1 0 219 123
#> 6 2 2 1 243 123
#> 7 2 3 2 279 123
#> 8 2 4 3 302 123
#> 9 3 1 0 142 129
#> 10 3 2 1 212 129
#> # ℹ 130 more rows

opposites_naming_pl <- opposites_naming |>
  select(-time) |>
  pivot_wider(
    names_from = wave,
    values_from = opposites_naming_score,
    names_prefix = "opp_"
  ) |>
  relocate(baseline_cognitive_score, .after = everything())

# Table 7.1:
head(opposites_naming_pl, 10)
#> # A tibble: 10 × 6
#> id opp_1 opp_2 opp_3 opp_4 baseline_cognitive_score
#>
#> 1 1 205 217 268 302 137
#> 2 2 219 243 279 302 123
#> 3 3 142 212 250 289 129
#> 4 4 206 230 248 273 125
#> 5 5 190 220 229 220 81
#> 6 6 165 205 207 263 110
#> 7 7 170 182 214 268 99
#> 8 8 96 131 159 213 113
#> 9 9 138 156 197 200 104
#> 10 10 216 252 274 298 96

# Fit model -------------------------------------------------------------------
opposites_naming_fit_standard <- lme(
  opposites_naming_score ~ time * I(baseline_cognitive_score - 113.4571),
  random = ~ time | id,
  data = opposites_naming,
  method = "REML"
)
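For comparison with the lmer() interface, here is a sketch of how the same model would be specified in lme4 (shown for illustration only; we continue with the lme() fit below):

library(lme4)
# Equivalent specification in lme4: fixed and random effects in one formula.
opposites_naming_fit_lme4 <- lmer(
  opposites_naming_score ~ time * I(baseline_cognitive_score - 113.4571) +
    (time | id),
  data = opposites_naming,
  REML = TRUE
)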
\"var_(Intercept)\", \"var_time\", \"cov_time.(Intercept)\" ), gof_map = list( list( raw = \"logLik\", clean = \"Deviance\", fmt = \\(.x) vec_fmt_number( -2*as.numeric(.x), decimals = 1, sep_mark = \"\" ) ), list( raw = \"AIC\", clean = \"AIC\", fmt = fmt_decimal(1) ), list( raw = \"BIC\", clean = \"BIC\", fmt = fmt_decimal(1) ) ), output = \"gt\" ) |> tab_row_group(label = \"Goodness-of-Fit\", rows = 9:11) |> tab_row_group(label = \"Variance Components\", rows = 5:8) |> tab_row_group(label = \"Fixed Effects\", rows = 1:4)"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-7.html","id":"using-the-composite-model-to-understand-assumptions-about-the-error-covariance-matrix","dir":"Articles","previous_headings":"","what":"7.2 Using the composite model to understand assumptions about the error covariance matrix","title":"Chapter 7: Examining the Multilevel Model's Error Covariance Structure","text":"Section 7.2 Singer Willett (2003) examine error covariance structure implied “standard” multilevel model change, given random effects specification. , begin substituting level-2 equations level-1, yielding composite representation “standard” model: \\[ \\text{opp}_{ij} = \\gamma_{00} + \\gamma_{10}\\text{time}_{ij} + \\gamma_{01}(\\text{cog}_i - 113.4571) + \\gamma_{11}\\text{time}_{ij}(\\text{cog}_i - 113.4571) + r_{ij}, \\] composite residual, \\(r_{ij}\\), represents weighted linear combination model’s original three random effects: \\[ r_{ij} = \\epsilon_{ij} + \\zeta_{0i} + \\zeta_{1i} \\text{time}_{ij}. \\] Notice composite model now looks like typical multiple regression model—usual error term, \\(\\epsilon_i\\), replaced composite residual, \\(r_{ij}\\). Following observation, can reexpress distributional assumptions residuals “standard” multilevel model change one grand statement based composite residual: \\[ r \\sim N \\begin{pmatrix} \\mathbf 0, \\begin{bmatrix} \\mathbf{\\Sigma}_r & \\mathbf 0 & \\mathbf 0 & \\dots & \\mathbf 0 \\\\ \\mathbf 0 & \\mathbf{\\Sigma}_r & \\mathbf 0 & \\dots & \\mathbf 0 \\\\ \\mathbf 0 & \\mathbf 0 & \\mathbf{\\Sigma}_r & \\dots & \\mathbf 0 \\\\ \\vdots & \\vdots & \\vdots & \\ddots & \\mathbf 0 \\\\ \\mathbf 0 & \\mathbf 0 & \\mathbf 0 & \\mathbf 0 & \\mathbf{\\Sigma}_r \\end{bmatrix} \\end{pmatrix}, \\] \\(\\mathbf{\\Sigma}_r\\) represents block diagonal error covariance sub-matrix whose dimensions reflect design opposites_naming data, given : $$ \\[\\begin{align} \\mathbf{\\Sigma}_r & = \\begin{bmatrix} \\sigma_{r_1}^2 & \\sigma_{r_1 r_2} & \\sigma_{r_1 r_3} & \\sigma_{r_1 r_4} \\\\ \\sigma_{r_2 r_1} & \\sigma_{r_2}^2 & \\sigma_{r_2 r_3} & \\sigma_{r_2 r_4} \\\\ \\sigma_{r_3 r_1} & \\sigma_{r_3 r_2} & \\sigma_{r_3}^2 & \\sigma_{r_3 r_4} \\\\ \\sigma_{r_4 r_1} & \\sigma_{r_4 r_2} & \\sigma_{r_4 r_3} & \\sigma_{r_4}^2 \\end{bmatrix}, \\end{align}\\] $$ occasion-specific composite residual variances \\[ \\begin{align} \\sigma_{r_j}^2 &= \\operatorname{Var} \\left( \\epsilon_{ij} + \\zeta_{0i} + \\zeta_{1i} \\text{time}_j \\right) \\\\ &= \\sigma_\\epsilon^2 + \\sigma_0^2 + 2 \\sigma_{01} \\text{time}_j + \\sigma_1^2 \\text{time}_j^2, \\end{align} \\] occasion-specific composite residual covariances \\[ \\sigma_{r_j, r_{j'}} = \\sigma_0^2 + \\sigma_{01} (\\text{time}_j + \\text{time}_{j'}) + \\sigma_1^2 \\text{time}_j \\text{time}_{j'}, \\] terms usual meanings. can retrieve error covariance sub-matrix, \\(\\mathbf{\\Sigma}_r\\), opposites_naming_fit_standard fit using getVarCov() function nlme package. 
To emphasize that \(\mathbf{\Sigma}_r\) is identical for every participant, here we retrieve it for the first and last participants in the data. For descriptive purposes, we can also convert \(\mathbf{\Sigma}_r\) into a correlation matrix using the cov2cor() function, and examine the residual autocorrelation between measurement occasions.

As Singer and Willett (2003) discuss, examining the equations and outputs above reveals two important properties of the occasion-specific residuals of the "standard" multilevel model for change:

- They can be both heteroscedastic and autocorrelated within participants (but remember that across participants they are identically heteroscedastic and autocorrelated because of the homogeneity assumption).
- They have a powerful dependence on time. Specifically, the residual variances, \(\sigma_{r_j}^2\), have a quadratic dependence on time, with a minimum at \(\text{time} = -(\sigma_{01} / \sigma_1^2)\), and increase parabolically and symmetrically with time on either side of this minimum; and the residual covariances, \(\sigma_{r_j, r_{j'}}\), have an (imperfect) band diagonal structure wherein the overall magnitude of the residual covariances tends to decline in diagonal "bands" away from the main diagonal.

The first of these properties, allowing for heteroscedasticity and autocorrelation among the composite residuals, is a necessity given the anticipated demands of longitudinal data. Because most longitudinal data sets are heteroscedastic and autocorrelated, any credible model for change should allow for potential heteroscedasticity and autocorrelation. One advantage of the "standard" multilevel model for change is that, although its composite residuals have a powerful dependence on time, it is also capable of adapting relatively smoothly to many common empirical situations, accommodating certain kinds of complex error structure automatically. Nonetheless, Singer and Willett (2003) conclude by questioning whether the hypothesized structure of the error covariance matrix implied by the "standard" model can be applied ubiquitously, as there may be empirical situations where directly modelling alternative error covariance structures is preferable.

opposites_naming_varcov_standard <- opposites_naming_fit_standard |>
  getVarCov(type = "marginal", individuals = c(1, 35))

opposites_naming_varcov_standard
#> id 1
#> Marginal variance covariance matrix
#> 1 2 3 4
#> 1 1395.90 1058.20 879.95 701.71
#> 2 1058.20 1146.70 916.21 845.22
#> 3 879.95 916.21 1111.90 988.73
#> 4 701.71 845.22 988.73 1291.70
#> Standard Deviations: 37.362 33.863 33.346 35.94
#> id 35
#> Marginal variance covariance matrix
#> 1 2 3 4
#> 1 1395.90 1058.20 879.95 701.71
#> 2 1058.20 1146.70 916.21 845.22
#> 3 879.95 916.21 1111.90 988.73
#> 4 701.71 845.22 988.73 1291.70
#> Standard Deviations: 37.362 33.863 33.346 35.94

cov2cor(opposites_naming_varcov_standard[[1]])
#> 1 2 3 4
#> 1 1.0000000 0.8364012 0.7062988 0.5225751
#> 2 0.8364012 1.0000000 0.8113955 0.6944913
#> 3 0.7062988 0.8113955 1.0000000 0.8249967
#> 4 0.5225751 0.6944913 0.8249967 1.0000000

7.3 Postulating an alternative error covariance structure

In Section 7.3 Singer and Willett (2003) discuss alternative error covariance structures that can be modelled directly using the extended linear model for change with heteroscedastic, correlated errors, fitted by generalized least squares regression. See Chapter 5 of Pinheiro and Bates (2010) for discussion of the extended linear model. We can fit the extended linear model for change with the gls() function from the nlme package, which allows us to model within-group heteroscedasticity and correlation structures via the weights and correlation arguments, respectively.
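As a minimal runnable sketch of how these two arguments combine (simulated toy data; all names hypothetical), here pairing an AR(1) correlation structure with wave-specific variances:

library(nlme)

set.seed(1)
toy <- data.frame(
  id   = factor(rep(1:20, each = 4)),
  wave = rep(1:4, times = 20)
)
toy$y <- rep(rnorm(20), each = 4) + rnorm(80)  # within-person correlation

toy_fit <- gls(
  y ~ wave,
  correlation = corAR1(form = ~ 1 | id),  # autocorrelated errors within id
  weights = varIdent(form = ~ 1 | wave),  # separate residual variance per wave
  data = toy
)
toy_fit$modelStruct$corStruct  # estimated autocorrelation parameter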
We fit six models to the opposites_naming data with the following error covariance structures: unstructured, compound symmetric, heterogeneous compound symmetric, autoregressive, heterogeneous autoregressive, and Toeplitz. Notice that unlike the multilevel model for change, the extended linear model for change has no random effects. We don't need the equations outside of the table (there's nothing else interesting to look at in them), hence this section is collapsed. Each hypothesized error covariance structure is shown below, followed by its fitted estimate for the opposites_naming data.

Unstructured:

$$\mathbf{\Sigma}_r = \begin{bmatrix}\sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 & \sigma_{34} \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_4^2 \end{bmatrix}$$

$$\begin{bmatrix}1345.1&1005.8&946.2&583.2\\1005.8&1150.5&1028.5&846.6\\946.2&1028.5&1235.8&969.3\\583.2&846.6&969.3&1206\end{bmatrix}$$

Compound symmetric:

$$\mathbf{\Sigma}_r = \begin{bmatrix} \sigma^2 + \sigma_1^2 & \sigma_1^2 & \sigma_1^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma^2 + \sigma_1^2 & \sigma_1^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2 & \sigma^2 + \sigma_1^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2 & \sigma_1^2 & \sigma^2 + \sigma_1^2 \end{bmatrix}$$

$$\begin{bmatrix}1231.4&900.1&900.1&900.1\\900.1&1231.4&900.1&900.1\\900.1&900.1&1231.4&900.1\\900.1&900.1&900.1&1231.4\end{bmatrix}$$

Heterogeneous compound symmetric:

$$\mathbf{\Sigma}_r = \begin{bmatrix} \sigma_1^2 & \sigma_1 \sigma_2 \rho & \sigma_1 \sigma_3 \rho & \sigma_1 \sigma_4 \rho \\ \sigma_2 \sigma_1 \rho & \sigma_2^2 & \sigma_2 \sigma_3 \rho & \sigma_2 \sigma_4 \rho \\ \sigma_3 \sigma_1 \rho & \sigma_3 \sigma_2 \rho & \sigma_3^2 & \sigma_3 \sigma_4 \rho \\ \sigma_4 \sigma_1 \rho & \sigma_4 \sigma_2 \rho & \sigma_4 \sigma_3 \rho & \sigma_4^2 \end{bmatrix}$$

$$\begin{bmatrix}1438.1&912.9&946.6&1009.5\\912.9&1067.8&815.7&869.9\\946.6&815.7&1148&902\\1009.5&869.9&902&1305.7\end{bmatrix}$$

Autoregressive:

$$\mathbf{\Sigma}_r = \begin{bmatrix} \sigma^2 & \sigma^2 \rho & \sigma^2 \rho^2 & \sigma^2 \rho^3 \\ \sigma^2 \rho & \sigma^2 & \sigma^2 \rho & \sigma^2 \rho^2 \\ \sigma^2 \rho^2 & \sigma^2 \rho & \sigma^2 & \sigma^2 \rho \\ \sigma^2 \rho^3 & \sigma^2 \rho^2 & \sigma^2 \rho & \sigma^2 \end{bmatrix}$$

$$\begin{bmatrix}1256.7&1037.2&856&706.5\\1037.2&1256.7&1037.2&856\\856&1037.2&1256.7&1037.2\\706.5&856&1037.2&1256.7\end{bmatrix}$$
opposites_naming data. Finally, can see gained lost modelling error covariance structure directly (instead indirectly random effects) comparing fixed effect estimates goodness--fit statistics unstructured Toeplitz models “standard” multilevel model change. Singer Willett (2003) observe opposites_naming data: Toeplitz model fits slightly better “standard” model accounts, enough reject “standard” model. unstructured model fits best focus exclusively deviance statistic, cost losing degrees freedom error covariance structure considered . fixed effects estimates similar “standard”, Toeplitz, unstructured models (except (baseline_cognitive_score - 113.4571)), precision estimates slightly better Toeplitz unstructured models, better represent error covariance structure data. Thus, data, conclude much gained replacing “standard” multilevel model change extended linear models change explored . However, data sets, magnitude difference modelling approaches may greater (depending study design, statistical model, choice error covariance structure, nature phenomenon study), may lead us prefer extended linear model change—inferential goal exclusively involves population-averaged interpretations fixed effects, interested addressing questions individuals via random effects (discussion, see McNeish, Stapleton, & Silverman, 2017; Muff, Held, & Keller, 2016).","code":"hypothesized_varcov <- list( unstructured = r\"($$\\mathbf{\\Sigma}_r = \\begin{bmatrix}\\sigma_1^2 & \\sigma_{12} & \\sigma_{13} & \\sigma_{14} \\\\ \\sigma_{21} & \\sigma_2^2 & \\sigma_{23} & \\sigma_{24} \\\\ \\sigma_{31} & \\sigma_{32} & \\sigma_3^2 & \\sigma_{34} \\\\ \\sigma_{41} & \\sigma_{42} & \\sigma_{43} & \\sigma_4^2 \\end{bmatrix}$$)\", compsymm = r\"($$\\mathbf{\\Sigma}_r = \\begin{bmatrix} \\sigma^2 + \\sigma_1^2 & \\sigma_1^2 & \\sigma_1^2 & \\sigma_1^2 \\\\ \\sigma_1^2 & \\sigma^2 + \\sigma_1^2 & \\sigma_1^2 & \\sigma_1^2 \\\\ \\sigma_1^2 & \\sigma_1^2 & \\sigma^2 + \\sigma_1^2 & \\sigma_1^2 \\\\ \\sigma_1^2 & \\sigma_1^2 & \\sigma_1^2 & \\sigma^2 + \\sigma_1^2 \\end{bmatrix}$$)\", hetcompsymm = r\"($$\\mathbf{\\Sigma}_r = \\begin{bmatrix} \\sigma_1^2 & \\sigma_1 \\sigma_2 \\rho & \\sigma_1 \\sigma_3 \\rho & \\sigma_1 \\sigma_4 \\rho \\\\ \\sigma_2 \\sigma_1 \\rho & \\sigma_1^2 & \\sigma_2 \\sigma_3 \\rho & \\sigma_2 \\sigma_4 \\rho \\\\ \\sigma_3 \\sigma_1 \\rho & \\sigma_3 \\sigma_2 \\rho & \\sigma_3^2 & \\sigma_3 \\sigma_4 \\rho \\\\ \\sigma_4 \\sigma_1 \\rho & \\sigma_4 \\sigma_2 \\rho & \\sigma_4 \\sigma_3 \\rho & \\sigma_4^2 \\end{bmatrix}$$)\", ar = r\"($$\\mathbf{\\Sigma}_r = \\begin{bmatrix} \\sigma^2 & \\sigma^2 \\rho & \\sigma^2 \\rho^2 & \\sigma^2 \\rho^3 \\\\ \\sigma^2 \\rho & \\sigma^2 & \\sigma^2 \\rho & \\sigma^2 \\rho^2 \\\\ \\sigma^2 \\rho^2 & \\sigma^2 \\rho & \\sigma^2 & \\sigma^2 \\rho \\\\ \\sigma^2 \\rho^3 & \\sigma^2 \\rho^2 & \\sigma^2 \\rho & \\sigma^2 \\end{bmatrix}$$)\", hetar = r\"($$\\mathbf{\\Sigma}_r = \\begin{bmatrix} \\sigma_1^2 & \\sigma_1 \\sigma_2 \\rho & \\sigma_1 \\sigma_3 \\rho^2 & \\sigma_1 \\sigma_4 \\rho^3 \\\\ \\sigma_2 \\sigma_1 \\rho & \\sigma_2^2 & \\sigma_2 \\sigma_3 \\rho & \\sigma_2 \\sigma_4 \\rho^2 \\\\ \\sigma_3 \\sigma_1 \\rho^2 & \\sigma_3 \\sigma_2 \\rho & \\sigma_3^2 & \\sigma_3 \\sigma_4 \\rho \\\\ \\sigma_4 \\sigma_1 \\rho^3 & \\sigma_4 \\sigma_2 \\rho^2 & \\sigma_4 \\sigma_3 \\rho & \\sigma_4^2 \\end{bmatrix} $$)\", toeplitz = r\"($$\\mathbf{\\Sigma}_r = \\begin{bmatrix}\\sigma^2 & \\sigma_1 & \\sigma_2 & \\sigma_3 \\\\ \\sigma_1 & \\sigma^2 & \\sigma_1 & \\sigma_2 \\\\ 
\\sigma_2 & \\sigma_1 & \\sigma^2 & \\sigma_1 \\\\ \\sigma_3 & \\sigma_2 & \\sigma_1 & \\sigma^2 \\end{bmatrix} $$)\" ) # Fit models ------------------------------------------------------------------ # Start with a base model we can update with the alternative error covariance # structures. Note that we won't display this model in the table. opposites_naming_fit <- gls( opposites_naming_score ~ time * I(baseline_cognitive_score - 113.4571), method = \"REML\", data = opposites_naming ) # Unstructured: opposites_naming_fit_unstructured <- update( opposites_naming_fit, correlation = corSymm(form = ~ 1 | id), weights = varIdent(form = ~ 1 | wave) ) # Compound symmetry: opposites_naming_fit_compsymm <- update( opposites_naming_fit, correlation = corCompSymm(form = ~ 1 | id) ) # Heterogeneous compound symmetry: opposites_naming_fit_hetcompsymm <- update( opposites_naming_fit_compsymm, weights = varIdent(form = ~ 1 | wave) ) # Autoregressive: opposites_naming_fit_ar <- update( opposites_naming_fit, correlation = corAR1(form = ~ 1 | id) ) # Heterogeneous autoregressive: opposites_naming_fit_hetar <- update( opposites_naming_fit_ar, weights = varIdent(form = ~ 1 | wave) ) # Toeplitz: opposites_naming_fit_toeplitz <- update( opposites_naming_fit, correlation = corARMA(form = ~ 1 | id, p = 3,q = 0) ) opposites_naming_fits <- list( \"Unstructured\" = opposites_naming_fit_unstructured, \"Compound symmetry\" = opposites_naming_fit_compsymm, \"Heterogeneous compound symmetry\" = opposites_naming_fit_hetcompsymm, \"Autoregressive\" = opposites_naming_fit_ar, \"Heterogeneous autoregressive\" = opposites_naming_fit_hetar, \"Toeplitz\" = opposites_naming_fit_toeplitz ) # Make table ------------------------------------------------------------------ # Table 7.3, page 258-259: opposites_naming_fits |> map2( # Note that this list was made in the collapsed code chunk above. It just # contains the equations corresponding to each error covariance structure. hypothesized_varcov, \\(.fit, .hypothesized_varcov) { format_varcov <- function(x) { x <- round(getVarCov(x), digits = 1) begin <- \"$$\\\\begin{bmatrix}\" body <- apply(x, 1, \\(.x) paste0(paste(.x, collapse = \"&\"), \"\\\\\\\\\")) end <- \"\\\\end{bmatrix}$$\" paste0(c(begin, body, end), collapse = \"\") } gof <- .fit |> glance() |> mutate( hypothesized_varcov = .hypothesized_varcov, \"-2LL\" = as.numeric(-2 * logLik), varcov = format_varcov(.fit), across(where(is.numeric), \\(.x) round(.x, digits = 1)) ) |> select(hypothesized_varcov, \"-2LL\", AIC, BIC, varcov) } ) |> list_rbind(names_to = \"structure\") |> gt() |> # Note: Math formatting in HTML currently requires gt version 0.10.1.9000 # (development version). 
fmt_markdown(columns = c(hypothesized_varcov, varcov)) # Table 7.4, page 265: opposites_naming_fit_standard |> list() |> set_names(\"Standard\") |> c(keep_at(opposites_naming_fits, c(\"Toeplitz\", \"Unstructured\"))) |> (\\(.x) .x[c(\"Standard\", \"Toeplitz\", \"Unstructured\")])() |> modelsummary( fmt = fmt_statistic(estimate = 2, statistic = 3), gof_map = list( list( raw = \"logLik\", clean = \"Deviance\", fmt = \\(.x) vec_fmt_number( -2*as.numeric(.x), decimals = 1, sep_mark = \"\" ) ), list( raw = \"AIC\", clean = \"AIC\", fmt = fmt_decimal(1) ), list( raw = \"BIC\", clean = \"BIC\", fmt = fmt_decimal(1) ) ), output = \"gt\" ) |> tab_row_group(label = \"Goodness-of-Fit\", rows = 13:15) |> tab_row_group(label = \"Variance Components\", rows = 9:12) |> tab_row_group(label = \"Fixed Effects\", rows = 1:8)"},{"path":[]},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-8.html","id":"the-basics-of-latent-growth-modeling","dir":"Articles","previous_headings":"","what":"8.2 The Basics of Latent Growth Modeling","title":"Chapter 8: Modeling change using covariance structure analysis","text":"Table 8.1, page 282: Table 8.2, page 289, Model : Figure 8.2, Model : Table 8.2, page 289, Model B: Comparison baseline model Model B: Figure 8.2, Model B: Table 8.2, page 289, Model C: Figure 8.2, Model C (Model B slope ~ 0*female): Table 8.2, page 289, Model D: Comparison baseline model Model D: Figure 8.2, Model D:","code":"alcohol_use_2_wide <- alcohol_use_2 |> pivot_wider(names_from = time, values_from = c(alcohol_use, peer_pressure)) alcohol_use_2_wide #> # A tibble: 1,122 × 8 #> id female alcohol_use_0 alcohol_use_1 alcohol_use_2 peer_pressure_0 #> #> 1 1 0 0.693 0.288 0.511 0 #> 2 2 0 0 0 0 0 #> 3 3 0 0 0 0 0 #> 4 4 0 0 0.511 0.511 1.10 #> 5 5 0 0.288 0 0.847 0 #> 6 6 0 0 0 0 0 #> 7 7 0 0.288 0.288 0 0 #> 8 8 0 0 0 0 0 #> 9 9 0 0 0.511 0 0 #> 10 10 0 0.511 0.693 1.30 0 #> # ℹ 1,112 more rows #> # ℹ 2 more variables: peer_pressure_1 , peer_pressure_2 # Means alcohol_use_2_wide |> summarise(across(female:peer_pressure_2, mean)) |> glimpse() #> Rows: 1 #> Columns: 7 #> $ female 0.6122995 #> $ alcohol_use_0 0.2250666 #> $ alcohol_use_1 0.2541351 #> $ alcohol_use_2 0.287923 #> $ peer_pressure_0 0.1771944 #> $ peer_pressure_1 0.2904569 #> $ peer_pressure_2 0.3470381 # Covariances cov(select(alcohol_use_2_wide, -c(id, female))) #> alcohol_use_0 alcohol_use_1 alcohol_use_2 peer_pressure_0 #> alcohol_use_0 0.13558718 0.07775260 0.06526470 0.06586967 #> alcohol_use_1 0.07775260 0.15528121 0.08186386 0.04479710 #> alcohol_use_2 0.06526470 0.08186386 0.18075945 0.03988182 #> peer_pressure_0 0.06586967 0.04479710 0.03988182 0.17399159 #> peer_pressure_1 0.06404875 0.09647876 0.06580980 0.07158186 #> peer_pressure_2 0.06008199 0.07433086 0.13197010 0.07071309 #> peer_pressure_1 peer_pressure_2 #> alcohol_use_0 0.06404875 0.06008199 #> alcohol_use_1 0.09647876 0.07433086 #> alcohol_use_2 0.06580980 0.13197010 #> peer_pressure_0 0.07158186 0.07071309 #> peer_pressure_1 0.26190160 0.11180554 #> peer_pressure_2 0.11180554 0.28901177 # Model A: Unconditional model model_A <- (\" # Intercept and slope with fixed coefficients intercept =~ 1*alcohol_use_0 + 1*alcohol_use_1 + 1*alcohol_use_2 slope =~ 0*alcohol_use_0 + .75*alcohol_use_1 + 1.75*alcohol_use_2 \") model_A_fit <- growth( model_A, data = alcohol_use_2_wide, estimator = \"ml\", mimic = \"Mplus\" ) summary(model_A_fit) #> lavaan 0.6.17 ended normally after 32 iterations #> #> Estimator ML #> Optimization method NLMINB #> Number of model parameters 8 
#> #> Number of observations 1122 #> Number of missing patterns 1 #> #> Model Test User Model: #> #> Test statistic 0.048 #> Degrees of freedom 1 #> P-value (Chi-square) 0.826 #> #> Parameter Estimates: #> #> Standard errors Standard #> Information Observed #> Observed information based on Hessian #> #> Latent Variables: #> Estimate Std.Err z-value P(>|z|) #> intercept =~ #> alcohol_use_0 1.000 #> alcohol_use_1 1.000 #> alcohol_use_2 1.000 #> slope =~ #> alcohol_use_0 0.000 #> alcohol_use_1 0.750 #> alcohol_use_2 1.750 #> #> Covariances: #> Estimate Std.Err z-value P(>|z|) #> intercept ~~ #> slope -0.012 0.005 -2.727 0.006 #> #> Intercepts: #> Estimate Std.Err z-value P(>|z|) #> intercept 0.226 0.011 21.106 0.000 #> slope 0.036 0.007 4.898 0.000 #> #> Variances: #> Estimate Std.Err z-value P(>|z|) #> .alcohol_use_0 0.048 0.006 7.550 0.000 #> .alcohol_use_1 0.076 0.004 17.051 0.000 #> .alcohol_use_2 0.077 0.010 7.756 0.000 #> intercept 0.087 0.007 12.253 0.000 #> slope 0.020 0.005 3.795 0.000 fitMeasures(model_A_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\")) #> chisq df pvalue cfi rmsea #> 0.048 1.000 0.826 1.000 0.000 lay <- get_layout( NA, \"intercept\", NA, \"slope\", NA, \"alcohol_use_0\", NA, \"alcohol_use_1\", NA, \"alcohol_use_2\", rows = 2 ) graph_sem(model_A_fit, layout = lay) # Model B: Adding female as a time-invariant predictor model_B <- (\" # Intercept and slope with fixed coefficients intercept =~ 1*alcohol_use_0 + 1*alcohol_use_1 + 1*alcohol_use_2 slope =~ 0*alcohol_use_0 + .75*alcohol_use_1 + 1.75*alcohol_use_2 # Regressions intercept ~ female slope ~ female \") model_B_fit <- growth( model_B, data = alcohol_use_2_wide, estimator = \"ml\", mimic = \"Mplus\" ) summary(model_B_fit) #> lavaan 0.6.17 ended normally after 33 iterations #> #> Estimator ML #> Optimization method NLMINB #> Number of model parameters 10 #> #> Number of observations 1122 #> Number of missing patterns 1 #> #> Model Test User Model: #> #> Test statistic 1.545 #> Degrees of freedom 2 #> P-value (Chi-square) 0.462 #> #> Parameter Estimates: #> #> Standard errors Standard #> Information Observed #> Observed information based on Hessian #> #> Latent Variables: #> Estimate Std.Err z-value P(>|z|) #> intercept =~ #> alcohol_use_0 1.000 #> alcohol_use_1 1.000 #> alcohol_use_2 1.000 #> slope =~ #> alcohol_use_0 0.000 #> alcohol_use_1 0.750 #> alcohol_use_2 1.750 #> #> Regressions: #> Estimate Std.Err z-value P(>|z|) #> intercept ~ #> female -0.042 0.022 -1.912 0.056 #> slope ~ #> female 0.008 0.015 0.522 0.602 #> #> Covariances: #> Estimate Std.Err z-value P(>|z|) #> .intercept ~~ #> .slope -0.012 0.005 -2.661 0.008 #> #> Intercepts: #> Estimate Std.Err z-value P(>|z|) #> .intercept 0.251 0.017 14.653 0.000 #> .slope 0.031 0.012 2.640 0.008 #> #> Variances: #> Estimate Std.Err z-value P(>|z|) #> .alcohol_use_0 0.049 0.006 7.616 0.000 #> .alcohol_use_1 0.075 0.004 17.036 0.000 #> .alcohol_use_2 0.077 0.010 7.789 0.000 #> .intercept 0.086 0.007 12.191 0.000 #> .slope 0.019 0.005 3.740 0.000 fitMeasures(model_B_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\")) #> chisq df pvalue cfi rmsea #> 1.545 2.000 0.462 1.000 0.000 # Baseline for Model B (not shown in table) model_B_baseline <- (\" # Intercept and slope with fixed coefficients intercept =~ 1*alcohol_use_0 + 1*alcohol_use_1 + 1*alcohol_use_2 slope =~ 0*alcohol_use_0 + .75*alcohol_use_1 + 1.75*alcohol_use_2 # Regressions intercept ~ 0*female slope ~ 0*female alcohol_use_0 ~ 0*1 alcohol_use_1 ~ 0*1 alcohol_use_2 ~ 0*1 \") 
model_B_baseline_fit <- growth( model_B_baseline, data = alcohol_use_2_wide, estimator = \"ml\", mimic = \"Mplus\" ) anova(model_B_baseline_fit, model_B_fit) #> #> Chi-Squared Difference Test #> #> Df AIC BIC Chisq Chisq diff RMSEA Df diff #> model_B_fit 2 2577.9 2628.1 1.5447 #> model_B_baseline_fit 4 2577.7 2617.9 5.3665 3.8218 0.028493 2 #> Pr(>Chisq) #> model_B_fit #> model_B_baseline_fit 0.1479 lay <- get_layout( NA, NA, \"female\", NA, NA, NA, \"intercept\", NA, \"slope\", NA, \"alcohol_use_0\", NA, \"alcohol_use_1\", NA, \"alcohol_use_2\", rows = 3 ) graph_sem(model_B_fit, layout = lay) # Model C: Model B but with slope fixed to zero model_C <- (\" # Intercept and slope with fixed coefficients intercept =~ 1*alcohol_use_0 + 1*alcohol_use_1 + 1*alcohol_use_2 slope =~ 0*alcohol_use_0 + .75*alcohol_use_1 + 1.75*alcohol_use_2 # Regressions intercept ~ female slope ~ 0*female \") model_C_fit <- growth( model_C, data = alcohol_use_2_wide, estimator = \"ml\", mimic = \"Mplus\" ) summary(model_C_fit) #> lavaan 0.6.17 ended normally after 32 iterations #> #> Estimator ML #> Optimization method NLMINB #> Number of model parameters 9 #> #> Number of observations 1122 #> Number of missing patterns 1 #> #> Model Test User Model: #> #> Test statistic 1.817 #> Degrees of freedom 3 #> P-value (Chi-square) 0.611 #> #> Parameter Estimates: #> #> Standard errors Standard #> Information Observed #> Observed information based on Hessian #> #> Latent Variables: #> Estimate Std.Err z-value P(>|z|) #> intercept =~ #> alcohol_use_0 1.000 #> alcohol_use_1 1.000 #> alcohol_use_2 1.000 #> slope =~ #> alcohol_use_0 0.000 #> alcohol_use_1 0.750 #> alcohol_use_2 1.750 #> #> Regressions: #> Estimate Std.Err z-value P(>|z|) #> intercept ~ #> female -0.037 0.019 -1.885 0.059 #> slope ~ #> female 0.000 #> #> Covariances: #> Estimate Std.Err z-value P(>|z|) #> .intercept ~~ #> .slope -0.012 0.005 -2.667 0.008 #> #> Intercepts: #> Estimate Std.Err z-value P(>|z|) #> .intercept 0.248 0.016 15.525 0.000 #> .slope 0.036 0.007 4.898 0.000 #> #> Variances: #> Estimate Std.Err z-value P(>|z|) #> .alcohol_use_0 0.049 0.006 7.609 0.000 #> .alcohol_use_1 0.075 0.004 17.036 0.000 #> .alcohol_use_2 0.077 0.010 7.801 0.000 #> .intercept 0.086 0.007 12.194 0.000 #> .slope 0.019 0.005 3.739 0.000 fitMeasures(model_C_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\")) #> chisq df pvalue cfi rmsea #> 1.817 3.000 0.611 1.000 0.000 graph_sem(model_C_fit, layout = lay) # Model D: Adding peer_pressure as a time-varying predictor model_D <- (\" # Intercept and slope with fixed coefficients alc_intercept =~ 1*alcohol_use_0 + 1*alcohol_use_1 + 1*alcohol_use_2 alc_slope =~ 0*alcohol_use_0 + .75*alcohol_use_1 + 1.75*alcohol_use_2 peer_intercept =~ 1*peer_pressure_0 + 1*peer_pressure_1 + 1*peer_pressure_2 peer_slope =~ 0*peer_pressure_0 + .75*peer_pressure_1 + 1.75*peer_pressure_2 # Regressions alc_intercept ~ start(.8)*peer_intercept + start(.08)*peer_slope alc_slope ~ start(-.1)*peer_intercept + start(.6)*peer_slope # Time-varying covariances alcohol_use_0 ~~ peer_pressure_0 alcohol_use_1 ~~ peer_pressure_1 alcohol_use_2 ~~ peer_pressure_2 # Fix intercepts to zero alcohol_use_0 ~ 0*1 alcohol_use_1 ~ 0*1 alcohol_use_2 ~ 0*1 peer_pressure_0 ~ 0*1 peer_pressure_1 ~ 0*1 peer_pressure_2 ~ 0*1 \") model_D_fit <- growth( model_D, data = alcohol_use_2_wide, estimator = \"ml\", mimic = \"Mplus\" ) summary(model_D_fit) #> lavaan 0.6.17 ended normally after 72 iterations #> #> Estimator ML #> Optimization method NLMINB #> Number of model 
parameters 23 #> #> Number of observations 1122 #> Number of missing patterns 1 #> #> Model Test User Model: #> #> Test statistic 11.557 #> Degrees of freedom 4 #> P-value (Chi-square) 0.021 #> #> Parameter Estimates: #> #> Standard errors Standard #> Information Observed #> Observed information based on Hessian #> #> Latent Variables: #> Estimate Std.Err z-value P(>|z|) #> alc_intercept =~ #> alcohol_use_0 1.000 #> alcohol_use_1 1.000 #> alcohol_use_2 1.000 #> alc_slope =~ #> alcohol_use_0 0.000 #> alcohol_use_1 0.750 #> alcohol_use_2 1.750 #> peer_intercept =~ #> peer_pressur_0 1.000 #> peer_pressur_1 1.000 #> peer_pressur_2 1.000 #> peer_slope =~ #> peer_pressur_0 0.000 #> peer_pressur_1 0.750 #> peer_pressur_2 1.750 #> #> Regressions: #> Estimate Std.Err z-value P(>|z|) #> alc_intercept ~ #> peer_intercept 0.799 0.103 7.781 0.000 #> peer_slope 0.080 0.184 0.438 0.661 #> alc_slope ~ #> peer_intercept -0.143 0.076 -1.884 0.060 #> peer_slope 0.577 0.193 2.990 0.003 #> #> Covariances: #> Estimate Std.Err z-value P(>|z|) #> .alcohol_use_0 ~~ #> .peer_pressur_0 0.011 0.006 1.773 0.076 #> .alcohol_use_1 ~~ #> .peer_pressur_1 0.034 0.005 7.324 0.000 #> .alcohol_use_2 ~~ #> .peer_pressur_2 0.037 0.010 3.663 0.000 #> peer_intercept ~~ #> peer_slope 0.001 0.007 0.166 0.868 #> .alc_intercept ~~ #> .alc_slope -0.006 0.005 -1.249 0.212 #> #> Intercepts: #> Estimate Std.Err z-value P(>|z|) #> .alcohol_use_0 0.000 #> .alcohol_use_1 0.000 #> .alcohol_use_2 0.000 #> .peer_pressur_0 0.000 #> .peer_pressur_1 0.000 #> .peer_pressur_2 0.000 #> .alc_intercept 0.067 0.016 4.252 0.000 #> .alc_slope 0.008 0.015 0.564 0.573 #> peer_intercept 0.188 0.012 15.743 0.000 #> peer_slope 0.096 0.010 9.922 0.000 #> #> Variances: #> Estimate Std.Err z-value P(>|z|) #> .alcohol_use_0 0.048 0.006 7.553 0.000 #> .alcohol_use_1 0.076 0.004 17.165 0.000 #> .alcohol_use_2 0.076 0.010 7.819 0.000 #> .peer_pressur_0 0.106 0.011 9.790 0.000 #> .peer_pressur_1 0.171 0.009 19.713 0.000 #> .peer_pressur_2 0.129 0.018 7.325 0.000 #> .alc_intercept 0.042 0.007 5.649 0.000 #> .alc_slope 0.009 0.005 1.697 0.090 #> peer_intercept 0.070 0.010 6.729 0.000 #> peer_slope 0.028 0.009 3.214 0.001 fitMeasures(model_D_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\")) #> chisq df pvalue cfi rmsea #> 11.557 4.000 0.021 0.996 0.041 # Baseline for Model D (not shown in table) model_D_baseline <- (\" # Intercepts and slopes with fixed coefficients alc_intercept =~ 1*alcohol_use_0 + 1*alcohol_use_1 + 1*alcohol_use_2 alc_slope =~ 0*alcohol_use_0 + .75*alcohol_use_1 + 1.75*alcohol_use_2 peer_intercept =~ 1*peer_pressure_0 + 1*peer_pressure_1 + 1*peer_pressure_2 peer_slope =~ 0*peer_pressure_0 + .75*peer_pressure_1 + 1.75*peer_pressure_2 # Regressions alc_intercept ~ 0*peer_intercept + 0*peer_slope alc_slope ~ 0*peer_intercept + 0*peer_slope # Time-varying covariances alcohol_use_0 ~~ peer_pressure_0 alcohol_use_1 ~~ peer_pressure_1 alcohol_use_2 ~~ peer_pressure_2 alcohol_use_0 ~ 0*1 alcohol_use_1 ~ 0*1 alcohol_use_2 ~ 0*1 peer_pressure_0 ~ 0*1 peer_pressure_1 ~ 0*1 peer_pressure_2 ~ 0*1 \") model_D_baseline_fit <- growth( model_D_baseline, data = alcohol_use_2_wide, estimator = \"ml\", mimic = \"Mplus\" ) anova(model_D_baseline_fit, model_D_fit) #> #> Chi-Squared Difference Test #> #> Df AIC BIC Chisq Chisq diff RMSEA Df diff #> model_D_fit 4 6120.5 6236.1 11.557 #> model_D_baseline_fit 8 6443.6 6539.1 342.648 331.09 0.26997 4 #> Pr(>Chisq) #> model_D_fit #> model_D_baseline_fit < 2.2e-16 *** #> --- #> Signif. 
codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 lay <- get_layout( \"peer_pressure_0\", NA, \"peer_pressure_1\", NA, \"peer_pressure_2\", NA, \"peer_intercept\", NA, \"peer_slope\", NA, NA, \"alc_intercept\", NA, \"alc_slope\", NA, \"alcohol_use_0\", NA, \"alcohol_use_1\", NA, \"alcohol_use_2\", rows = 4 ) graph_sem(model_D_fit, layout = lay)"},
{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-9.html","id":"should-you-conduct-a-survival-analysis-the-whether-and-when-test","dir":"Articles","previous_headings":"","what":"9.1 Should you conduct a survival analysis? The "whether" and "when" test","title":"Chapter 9: A framework for investigating event occurrence","text":"In Section 9.1 Singer and Willett (2003) introduce a simple mnemonic they refer to as the "whether" and "when" test to determine whether a research question may call for survival analysis: if a research question includes the words "whether" or "when", you likely need to use survival methods. To illustrate the range of research questions for which survival methods are suitable, they introduce three studies that pass the "whether" and "when" test: alcohol_relapse: A person-level data frame with 89 rows and 3 columns containing a subset of data from Cooney and colleagues (1991), who measured whether (and when) 89 recently treated alcoholics first relapsed to alcohol use. teachers: A person-level data frame with 3941 rows and 3 columns containing a subset of data from Singer (1993), who measured whether (and when) 3941 newly hired special educators in Michigan first stopped teaching in the state. suicide_ideation: A person-level data frame with 391 rows and 4 columns containing a subset of data from Bolger and colleagues (1989), who measured whether (and when) 391 undergraduate students first experienced suicide ideation. In later chapters, we return to these data sets to explore different survival methods.","code":"alcohol_relapse #> # A tibble: 89 × 3 #> id weeks censor #> #> 1 1 0.714 0 #> 2 2 0.714 0 #> 3 3 1.14 0 #> 4 4 1.43 0 #> 5 5 1.71 0 #> 6 6 1.71 0 #> 7 7 2.14 0 #> 8 8 2.71 0 #> 9 9 3.86 0 #> 10 10 4.14 0 #> # ℹ 79 more rows teachers #> # A tibble: 3,941 × 3 #> id years censor #> #> 1 1 1 0 #> 2 2 2 0 #> 3 3 1 0 #> 4 4 1 0 #> 5 5 12 1 #> 6 6 1 0 #> 7 7 12 1 #> 8 8 1 0 #> 9 9 2 0 #> 10 10 2 0 #> # ℹ 3,931 more rows suicide_ideation #> # A tibble: 391 × 4 #> id time censor age #> #> 1 1 16 0 18 #> 2 2 10 0 19 #> 3 3 16 0 19 #> 4 4 20 0 22 #> 5 6 15 0 22 #> 6 7 10 0 19 #> 7 8 22 1 22 #> 8 9 22 1 22 #> 9 10 15 0 20 #> 10 11 10 0 19 #> # ℹ 381 more rows"},
{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-9.html","id":"framing-a-research-question-about-event-occurrence","dir":"Articles","previous_headings":"","what":"9.2 Framing a research question about event occurrence","title":"Chapter 9: A framework for investigating event occurrence","text":"In Section 9.2 Singer and Willett (2003) discuss three methodological features that make a study suitable for survival analysis: A target event, whose occurrence represents an individual's transition from one state to another state, out of a set of states that are precisely defined, mutually exclusive, and jointly exhaustive. A beginning of time, when everyone in the population is (at least theoretically) at risk of experiencing the target event, and all individuals occupy one of the possible non-event states. The temporal distance from the beginning of time to event occurrence is referred to as the event time. A metric for clocking time, which provides a meaningful temporal scale on which to record event occurrence—the smallest possible units relevant to the process under study. For analytical reasons, we distinguish between discrete time and continuous time, depending on whether time is measured in discrete or continuous intervals. All three example studies introduced above possess these features.","code":""},
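The difference between a discrete and a continuous time metric is easy to see directly in these data sets. As a minimal sketch (not part of the original article; it only assumes the alda data sets shown above), counting the distinct recorded event times shows that teachers clocks time in whole years while alcohol_relapse clocks time in fractional weeks:

library(dplyr)

# Discrete metric: event times take only a handful of whole-year values.
n_distinct(teachers$years)

# Continuous metric: nearly every fractional-week value is unique.
n_distinct(alcohol_relapse$weeks)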
{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-9.html","id":"censoring-how-complete-are-the-data-on-event-occurrence","dir":"Articles","previous_headings":"","what":"9.3 Censoring: How complete are the data on event occurrence?","title":"Chapter 9: A framework for investigating event occurrence","text":"In Section 9.3 Singer and Willett (2003) introduce the concept of censoring and censored observations, which occur for sample members with unknown event times—preventing us from knowing whether (and when) the target event occurs for that subset of the sample. Censoring is the hallmark feature of event occurrence data that makes new statistical methods necessary; it arises in different ways and at different rates, and takes several different forms: Censoring occurs for two primary reasons: (1) some individuals will never experience the target event; (2) some individuals experience the target event outside the study's data collection period. The amount of censoring in a study is related to two factors: (1) the rate at which the target event occurs in the population; (2) the length of the data collection period. There are two mechanisms behind censoring: (1) a noninformative mechanism, where censoring occurs for reasons independent of event occurrence and the risk of event occurrence; (2) an informative mechanism, where censoring occurs for reasons related to event occurrence or the risk of event occurrence. There are two types of censoring: (1) right-censoring arises when an event time is unknown because event occurrence was not observed; (2) left-censoring arises when an event time is unknown because the beginning of time was not observed. The three example studies have different rates of censoring: 22.5% of the former alcoholics remained abstinent, 44.0% of the newly hired teachers were still teaching in Michigan, and 29.7% of the undergraduates did not experience suicide ideation. As Singer and Willett (2003) discuss, the toll of censoring can be seen by plotting the event times and censored event times of the teachers data. Notice the discrepancy between the sample distributions of known event times and censored event times—this is typical of event occurrence data, and it makes it difficult to summarize time-to-event occurrence adequately with traditional descriptive methods (e.g., measures of central tendency and dispersion). In the remaining chapters, we explore several different methods of survival analysis: an alternative statistical approach that incorporates censored observations based on the information they provide about event nonoccurrence, allowing us to summarize time-to-event occurrence adequately while dealing evenhandedly with both known and censored event times.","code":"map( list( alcohol_relapse = alcohol_relapse, teachers = teachers, suicide_ideation = suicide_ideation ), \(.x) .x |> count(censor, name = \"count\") |> mutate(proportion = count / sum(count)) ) #> $alcohol_relapse #> # A tibble: 2 × 3 #> censor count proportion #> #> 1 0 69 0.775 #> 2 1 20 0.225 #> #> $teachers #> # A tibble: 2 × 3 #> censor count proportion #> #> 1 0 2207 0.560 #> 2 1 1734 0.440 #> #> $suicide_ideation #> # A tibble: 2 × 3 #> censor count proportion #> #> 1 0 275 0.703 #> 2 1 116 0.297 # Figure 9.1, page 321: ggplot(teachers, aes(x = years)) + geom_bar() + geom_text(aes(label = after_stat(count)), stat = \"count\", vjust = -.5) + scale_x_continuous(breaks = 1:12) + coord_cartesian(ylim = c(0, 550)) + facet_wrap(vars(censor), nrow = 2, labeller = label_both)"},
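A small illustration (not from the original article) of why those traditional methods fail: two naive medians for the teachers data, each biased in a different direction.

library(dplyr)

teachers |>
  summarise(
    # Treats censored times as if those teachers quit at their last
    # observed year, understating the true median time to leaving.
    median_all_times = median(years),
    # Drops the censored teachers entirely, ignoring the longest careers.
    median_known_events = median(years[censor == 0])
  )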
{"path":"https://mccarthy-m-g.github.io/alda/articles/longitudinal-data-organization.html","id":"longitudinal-data-formats","dir":"Articles","previous_headings":"","what":"Longitudinal data formats","title":"Longitudinal data organization","text":"Longitudinal data can be organized into two distinct formats: A person-level, wide, or multivariate, format where each person has one row of data and multiple columns containing data from each measurement occasion. A person-period, long, or univariate, format where each person has one row of data for each measurement occasion. Most R functions expect data to be in the person-period format for visualization and analysis, and it's easy to convert a longitudinal data set from one format to the other.","code":"glimpse(deviant_tolerance_pl) #> Rows: 16 #> Columns: 8 #> $ id 9, 45, 268, 314, 442, 514, 569, 624, 723, 918, 949, 978, … #> $ tolerance_11 2.23, 1.12, 1.45, 1.22, 1.45, 1.34, 1.79, 1.12, 1.22, 1.0… #> $ tolerance_12 1.79, 1.45, 1.34, 1.22, 1.99, 1.67, 1.90, 1.12, 1.34, 1.0… #> $ tolerance_13 1.90, 1.45, 1.99, 1.55, 1.45, 2.23, 1.90, 1.22, 1.12, 1.2… #> $ tolerance_14 2.12, 1.45, 1.79, 1.12, 1.67, 2.12, 1.99, 1.12, 1.00, 1.9… #> $ tolerance_15 2.66, 1.99, 1.34, 1.12, 1.90, 2.44, 1.99, 1.22, 1.12, 1.2… #> $ male 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0 #> $ exposure 1.54, 1.16, 0.90, 0.81, 1.13, 0.90, 1.99, 0.98, 0.81, 1.2… glimpse(deviant_tolerance_pp) #> Rows: 80 #> Columns: 5 #> $ id 9, 9, 9, 9, 9, 45, 45, 45, 45, 45, 268, 268, 268, 268, 268, … #> $ age 11, 12, 13, 14, 15, 11, 12, 13, 14, 15, 11, 12, 13, 14, 15, … #> $ tolerance 2.23, 1.79, 1.90, 2.12, 2.66, 1.12, 1.45, 1.45, 1.45, 1.99, … #> $ male 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, … #> $ exposure 1.54, 1.54, 1.54, 1.54, 1.54, 1.16, 1.16, 1.16, 1.16, 1.16, …"},
{"path":"https://mccarthy-m-g.github.io/alda/articles/longitudinal-data-organization.html","id":"converting-between-formats","dir":"Articles","previous_headings":"Longitudinal data formats","what":"Converting between formats","title":"Longitudinal data organization","text":"To convert a person-level data set to the person-period format use tidyr::pivot_longer(); to convert a person-period data set to the person-level format use tidyr::pivot_wider():","code":"pivot_longer( deviant_tolerance_pl, cols = starts_with(\"tolerance_\"), names_to = \"age\", names_pattern = \"([[:digit:]]+)\", names_transform = as.integer, values_to = \"tolerance\" ) #> # A tibble: 80 × 5 #> id male exposure age tolerance #> #> 1 9 0 1.54 11 2.23 #> 2 9 0 1.54 12 1.79 #> 3 9 0 1.54 13 1.9 #> 4 9 0 1.54 14 2.12 #> 5 9 0 1.54 15 2.66 #> 6 45 1 1.16 11 1.12 #> 7 45 1 1.16 12 1.45 #> 8 45 1 1.16 13 1.45 #> 9 45 1 1.16 14 1.45 #> 10 45 1 1.16 15 1.99 #> # ℹ 70 more rows pivot_wider( deviant_tolerance_pp, names_from = age, names_prefix = \"tolerance_\", values_from = tolerance ) #> # A tibble: 16 × 8 #> id male exposure tolerance_11 tolerance_12 tolerance_13 tolerance_14 #> #> 1 9 0 1.54 2.23 1.79 1.9 2.12 #> 2 45 1 1.16 1.12 1.45 1.45 1.45 #> 3 268 1 0.9 1.45 1.34 1.99 1.79 #> 4 314 0 0.81 1.22 1.22 1.55 1.12 #> 5 442 0 1.13 1.45 1.99 1.45 1.67 #> 6 514 1 0.9 1.34 1.67 2.23 2.12 #> 7 569 0 1.99 1.79 1.9 1.9 1.99 #> 8 624 1 0.98 1.12 1.12 1.22 1.12 #> 9 723 0 0.81 1.22 1.34 1.12 1 #> 10 918 0 1.21 1 1 1.22 1.99 #> 11 949 1 0.93 1.99 1.55 1.12 1.45 #> 12 978 1 1.59 1.22 1.34 2.12 3.46 #> 13 1105 1 1.38 1.34 1.9 1.99 1.9 #> 14 1542 0 1.44 1.22 1.22 1.99 1.79 #> 15 1552 0 1.04 1 1.12 2.23 1.55 #> 16 1653 0 1.25 1.11 1.11 1.34 1.55 #> # ℹ 1 more variable: tolerance_15 "},
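As a quick sanity check (illustrative only, not part of the original vignette), a round trip through both conversions should recover the original person-level data, up to column order:

library(dplyr)
library(tidyr)

deviant_tolerance_pl |>
  pivot_longer(
    cols = starts_with("tolerance_"),
    names_to = "age",
    names_pattern = "([[:digit:]]+)",
    names_transform = as.integer,
    values_to = "tolerance"
  ) |>
  pivot_wider(
    names_from = age,
    names_prefix = "tolerance_",
    values_from = tolerance
  ) |>
  # Restore the original column order before comparing.
  select(all_of(names(deviant_tolerance_pl))) |>
  all.equal(deviant_tolerance_pl)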
{"path":"https://mccarthy-m-g.github.io/alda/articles/longitudinal-data-organization.html","id":"adding-discrete-time-indicators-to-person-period-data","dir":"Articles","previous_headings":"","what":"Adding discrete time indicators to person-period data","title":"Longitudinal data organization","text":"To add discrete time indicators to a person-period data set, first create a temporary copy of the time variable and a column of ones, then use tidyr::pivot_wider():","code":"deviant_tolerance_pp |> mutate( temp_age = age, temp_dummy = 1 ) |> pivot_wider( names_from = temp_age, names_prefix = \"age_\", values_from = temp_dummy, values_fill = 0 ) #> # A tibble: 80 × 10 #> id age tolerance male exposure age_11 age_12 age_13 age_14 age_15 #> #> 1 9 11 2.23 0 1.54 1 0 0 0 0 #> 2 9 12 1.79 0 1.54 0 1 0 0 0 #> 3 9 13 1.9 0 1.54 0 0 1 0 0 #> 4 9 14 2.12 0 1.54 0 0 0 1 0 #> 5 9 15 2.66 0 1.54 0 0 0 0 1 #> 6 45 11 1.12 1 1.16 1 0 0 0 0 #> 7 45 12 1.45 1 1.16 0 1 0 0 0 #> 8 45 13 1.45 1 1.16 0 0 1 0 0 #> 9 45 14 1.45 1 1.16 0 0 0 1 0 #> 10 45 15 1.99 1 1.16 0 0 0 0 1 #> # ℹ 70 more rows"},
{"path":"https://mccarthy-m-g.github.io/alda/articles/longitudinal-data-organization.html","id":"adding-contiguous-periods-to-person-level-survival-data","dir":"Articles","previous_headings":"","what":"Adding contiguous periods to person-level survival data","title":"Longitudinal data organization","text":"To add contiguous periods to person-level data use dplyr::reframe():","code":"first_sex |> # In order to add the event indicator, the time variable needs a different # name in the person-level data from the name we want to use in `reframe()`. # This is a temporary variable so it doesn't matter what the name is. rename(grades = grade) |> group_by(id) |> reframe( grade = 1:max(grades), event = if_else(grade == grades & censor == 0, 1, 0), # To keep predictors from the person-level data, simply list them. If there # are many predictors it might be more convenient to use # `dplyr::left_join()` after `reframe()`. parental_transition, parental_antisociality ) #> # A tibble: 1,902 × 5 #> id grade event parental_transition parental_antisociality #> #> 1 1 1 0 0 1.98 #> 2 1 2 0 0 1.98 #> 3 1 3 0 0 1.98 #> 4 1 4 0 0 1.98 #> 5 1 5 0 0 1.98 #> 6 1 6 0 0 1.98 #> 7 1 7 0 0 1.98 #> 8 1 8 0 0 1.98 #> 9 1 9 1 0 1.98 #> 10 2 1 0 1 -0.545 #> # ℹ 1,892 more rows"},
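When the person-level data contain many predictors, listing each one inside reframe() becomes tedious. A sketch of the dplyr::left_join() alternative mentioned in the comments above (illustrative, using the same first_sex columns):

library(dplyr)

first_sex |>
  rename(grades = grade) |>
  group_by(id) |>
  reframe(
    grade = 1:max(grades),
    event = if_else(grade == grades & censor == 0, 1, 0)
  ) |>
  # Reattach every person-level predictor in a single step.
  left_join(
    select(first_sex, id, parental_transition, parental_antisociality),
    by = "id"
  )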
{"path":"https://mccarthy-m-g.github.io/alda/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Michael McCarthy. Author, maintainer.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"McCarthy M (2024). alda: Data for Applied longitudinal data analysis: Modeling change and event occurrence. R package version 0.0.0.9000, https://github.com/mccarthy-m-g/alda, https://mccarthy-m-g.github.io/alda/.","code":"@Manual{, title = {alda: Data for Applied longitudinal data analysis: Modeling change and event occurrence}, author = {Michael McCarthy}, year = {2024}, note = {R package version 0.0.0.9000, https://github.com/mccarthy-m-g/alda}, url = {https://mccarthy-m-g.github.io/alda/}, }"},
{"path":"https://mccarthy-m-g.github.io/alda/index.html","id":"alda","dir":"","previous_headings":"","what":"Data for Applied longitudinal data analysis: Modeling change and event occurrence","title":"Data for Applied longitudinal data analysis: Modeling change and event occurrence","text":"This package contains the 31 data sets provided by Singer and Willett (2003) in their book, Applied longitudinal data analysis: Modeling change and event occurrence, suitable for longitudinal mixed effects modelling, longitudinal structural equation modelling, and survival analysis. All the data sets in this package are real data from real studies; however, they were modified by Singer and Willett (2003) for the illustration of statistical methods, and may not match the results of the original studies. There are eleven data sets for longitudinal mixed effects modelling: ?deviant_tolerance: Adolescent tolerance of deviant behaviour (Chapter 2) ?early_intervention: Early educational interventions and cognitive performance (Chapter 3) ?alcohol_use_1: Adolescent and peer alcohol use (Chapters 4 and 6) ?reading_scores: Peabody Individual Achievement Test reading scores (Chapter 5) ?dropout_wages: High school dropout labour market experiences (Chapters 5 and 6) ?depression_unemployment: Unemployment and depression (Chapter 5) ?antidepressants: Antidepressant medication and positive mood (Chapter 5) ?berkeley: Berkeley Growth Study (Chapter 6) ?externalizing_behaviour: Externalizing behaviour in children (Chapter 6) ?cognitive_growth: Cognitive growth in children (Chapter 6) ?opposites_naming: Opposites naming task (Chapter 7) There is one data set for longitudinal structural equation modelling: ?alcohol_use_2: Adolescent alcohol consumption and peer pressure (Chapter 8) And twenty data sets for survival analysis: ?teachers: Years to special education teacher turnover (Chapters 9 and 10) ?cocaine_relapse_1: Weeks to cocaine relapse after treatment (Chapter 10) ?first_sex: Age of first sexual intercourse (Chapters 10 and 11) ?suicide_ideation: Age of first suicide ideation (Chapter 10) ?congresswomen: House of Representatives tenure (Chapter 10) ?tenure: Years to academic tenure (Chapter 12) ?first_depression_1: Age of first depression (Chapter 12) ?first_arrest: Age of first juvenile arrest (Chapter 12) ?math_dropout: Math course history (Chapter 12) ?honking: Time to horn honking (Chapter 13) ?alcohol_relapse: Weeks to alcohol relapse after treatment (Chapter 13) ?judges: Supreme Court justice tenure (Chapters 13 and 15) ?first_depression_2: Age of first depression (Chapter 13) ?health_workers: Length of health worker employment (Chapter 13) ?rearrest: Days to inmate recidivism (Chapters 14 and 15) ?first_cocaine: Age of first cocaine use (Chapter 15) ?cocaine_relapse_2: Days to cocaine relapse after abstinence (Chapter 15) ?psychiatric_discharge: Days to psychiatric hospital discharge (Chapter 15) ?physicians: Physician career history (Chapter 15) ?monkeys: Piagetian monkeys (Chapter 15)","code":""},
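All of the data sets are available immediately after attaching the package, and each has a help page; for example (illustrative):

library(alda)

# Print a data set and open its documentation.
teachers
?teachers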
help(package = \"alda\").","code":""},{"path":"https://mccarthy-m-g.github.io/alda/index.html","id":"references","dir":"","previous_headings":"","what":"References","title":"Data for Applied longitudinal data analysis: Modeling change and event occurrence","text":"Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change event occurrence. Oxford University Press, USA. https://doi.org/10.1093/acprof:oso/9780195152968.001.0001","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/alcohol_relapse.html","id":null,"dir":"Reference","previous_headings":"","what":"Weeks to alcohol relapse after treatment — alcohol_relapse","title":"Weeks to alcohol relapse after treatment — alcohol_relapse","text":"subset data Cooney colleagues (1991) measuring weeks first \"heavy drinking\" day sample 89 recently treated alcoholics. Individuals followed two years (around 104.286 weeks) relapsed.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/alcohol_relapse.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Weeks to alcohol relapse after treatment — alcohol_relapse","text":"","code":"alcohol_relapse"},{"path":"https://mccarthy-m-g.github.io/alda/reference/alcohol_relapse.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Weeks to alcohol relapse after treatment — alcohol_relapse","text":"person-level data frame 89 rows 3 columns: id Participant ID. weeks Number weeks first \"heavy drinking\" day censor Censoring status.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/alcohol_relapse.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Weeks to alcohol relapse after treatment — alcohol_relapse","text":"Cooney, N. L., Kadden, R. M., Litt, M. D., & Getter, H. (1991). Matching alcoholics coping skills interactional therapies: Two-year follow-results. Journal Consulting Clinical Psychology, 59, 598–601. https://doi.org/10.1037/0022-006X.59.4.598","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/alcohol_use_1.html","id":null,"dir":"Reference","previous_headings":"","what":"Adolescent and peer alcohol use — alcohol_use_1","title":"Adolescent and peer alcohol use — alcohol_use_1","text":"subset data Curran, Stice, Chassin (1997) measuring relation changes alcohol use changes peer alcohol use 3-year period community-based sample Hispanic Caucasian adolescents.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/alcohol_use_1.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Adolescent and peer alcohol use — alcohol_use_1","text":"","code":"alcohol_use_1"},{"path":"https://mccarthy-m-g.github.io/alda/reference/alcohol_use_1.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Adolescent and peer alcohol use — alcohol_use_1","text":"person-period data frame 246 rows 6 columns: id Participant ID. age years. child_of_alcoholic Binary indicator whether adolescent child alcoholic parent. male Binary indicator whether adolescent male. alcohol_use Square root summed scores four eight-point items measuring frequency alcohol use. 
{"path":"https://mccarthy-m-g.github.io/alda/reference/alcohol_use_1.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Adolescent and peer alcohol use — alcohol_use_1","text":"A person-period data frame with 246 rows and 6 columns: id Participant ID. age Age in years. child_of_alcoholic Binary indicator for whether the adolescent is a child of an alcoholic parent. male Binary indicator for whether the adolescent is male. alcohol_use Square root of the summed scores of four eight-point items measuring frequency of alcohol use. peer_alcohol_use Square root of the summed scores of two six-point items measuring frequency of peer alcohol use.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/alcohol_use_1.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Adolescent and peer alcohol use — alcohol_use_1","text":"Curran, P. J., Stice, E., & Chassin, L. (1997). The relation between adolescent and peer alcohol use: A longitudinal random coefficients model. Journal of Consulting and Clinical Psychology, 65, 130–140. https://doi.org/10.1037//0022-006x.65.1.130","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/alcohol_use_2.html","id":null,"dir":"Reference","previous_headings":"","what":"Adolescent alcohol consumption and peer pressure — alcohol_use_2","title":"Adolescent alcohol consumption and peer pressure — alcohol_use_2","text":"Data from Barnes, Farrell, and Banerjee (1994) measuring the relation between changes in alcohol use and changes in peer pressure to use alcohol in a sample of 1122 Black and White adolescents tracked from the beginning of seventh grade to the end of eighth grade.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/alcohol_use_2.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Adolescent alcohol consumption and peer pressure — alcohol_use_2","text":"","code":"alcohol_use_2"},
{"path":"https://mccarthy-m-g.github.io/alda/reference/alcohol_use_2.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Adolescent alcohol consumption and peer pressure — alcohol_use_2","text":"A person-period data frame with 3366 rows and 5 columns: id Participant ID. time Time of measurement. female Binary indicator for whether the adolescent is female. alcohol_use Natural logarithm of the averaged scores of three six-point items measuring frequency of beer, wine, and liquor consumption, respectively. peer_pressure Natural logarithm of a six-point item measuring the frequency with which friends offered alcoholic drinks in the past month.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/alcohol_use_2.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Adolescent alcohol consumption and peer pressure — alcohol_use_2","text":"Barnes, G. M., Farrell, M. P., & Banerjee, S. (1994). Family influences on alcohol abuse and other problem behaviors among black and white adolescents in a general population sample. Journal of Research on Adolescence, 4, 183–201. https://doi.org/10.1207/s15327795jra0402_2","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/alcohol_use_2.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Adolescent alcohol consumption and peer pressure — alcohol_use_2","text":"Barnes, Farrell, and Banerjee (1994) report a sample of 699 adolescents; however, they note that theirs was an ongoing longitudinal study, which likely explains the sample size discrepancy with the data used by Singer and Willett (2003).","code":""},
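The Chapter 8 examples fit latent growth models to a wide version of this data set (named alcohol_use_2_wide there). A sketch of how such a person-level version can be derived from the person-period data with tidyr, assuming time is coded 0, 1, 2 as in the variable names used by those models:

library(tidyr)

# Produces columns alcohol_use_0, alcohol_use_1, alcohol_use_2,
# peer_pressure_0, ..., with one row per adolescent.
alcohol_use_2_wide <- pivot_wider(
  alcohol_use_2,
  names_from = time,
  values_from = c(alcohol_use, peer_pressure)
)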
{"path":"https://mccarthy-m-g.github.io/alda/reference/antidepressants.html","id":null,"dir":"Reference","previous_headings":"","what":"Antidepressant medication and positive mood — antidepressants","title":"Antidepressant medication and positive mood — antidepressants","text":"A subset of data from Tomarken, Shelton, Elkins, and Anderson (1997) measuring the relation between changes in positive mood and supplemental antidepressant medication over the course of a week in a sample of 73 men and women already receiving nonpharmacological therapy for depression.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/antidepressants.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Antidepressant medication and positive mood — antidepressants","text":"","code":"antidepressants"},
{"path":"https://mccarthy-m-g.github.io/alda/reference/antidepressants.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Antidepressant medication and positive mood — antidepressants","text":"A person-period data frame with 1242 rows and 6 columns: id Participant ID. wave Wave of measurement. day Day of measurement. reading Time of day of the measurement. positive_mood Positive mood score. treatment Treatment condition (placebo pills = 0, antidepressant pills = 1).","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/antidepressants.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Antidepressant medication and positive mood — antidepressants","text":"Tomarken, A. J., Shelton, R. C., Elkins, L., & Anderson, T. (1997). Sleep deprivation and anti-depressant medication: Unique effects on positive and negative affect. Poster session presented at the 9th annual meeting of the American Psychological Society, Washington, DC.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/berkeley.html","id":null,"dir":"Reference","previous_headings":"","what":"Berkeley Growth Study — berkeley","title":"Berkeley Growth Study — berkeley","text":"A subset of data from the Berkeley Growth Study measuring changes in the IQ of a single girl followed from childhood into older adulthood (Bayley, 1935).","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/berkeley.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Berkeley Growth Study — berkeley","text":"","code":"berkeley"},
{"path":"https://mccarthy-m-g.github.io/alda/reference/berkeley.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Berkeley Growth Study — berkeley","text":"A person-period data frame with 18 rows and 2 columns: age Age of the girl in years. iq IQ score.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/berkeley.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Berkeley Growth Study — berkeley","text":"Bayley, N. (1935). The development of motor abilities during the first three years. Monographs of the Society for Research in Child Development, 1.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/cocaine_relapse_1.html","id":null,"dir":"Reference","previous_headings":"","what":"Weeks to cocaine relapse after treatment — cocaine_relapse_1","title":"Weeks to cocaine relapse after treatment — cocaine_relapse_1","text":"A subset of data from Hall, Havassy, and Wasserman (1990) measuring the number of weeks to relapse to cocaine use in a sample of 104 former addicts released from an in-patient treatment program. In-patients were followed for 12 weeks or until they used cocaine for 7 consecutive days.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/cocaine_relapse_1.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Weeks to cocaine relapse after treatment — cocaine_relapse_1","text":"","code":"cocaine_relapse_1"},
{"path":"https://mccarthy-m-g.github.io/alda/reference/cocaine_relapse_1.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Weeks to cocaine relapse after treatment — cocaine_relapse_1","text":"A person-level data frame with 104 rows and 4 columns: id In-patient ID. weeks The number of weeks from the in-patient's release to relapse to cocaine use. censor Censoring status. needle Binary indicator for whether cocaine was ever used intravenously.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/cocaine_relapse_1.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Weeks to cocaine relapse after treatment — cocaine_relapse_1","text":"Hall, S. M., Havassy, B. E., & Wasserman, D. A. (1990). Commitment to abstinence and acute stress in relapse to alcohol, opiates, and nicotine. Journal of Consulting and Clinical Psychology, 58, 175–181. https://doi.org/10.1037//0022-006x.58.2.175","code":""},
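Since censor is 0-1 coded with 1 = censored, reversing it gives the event indicator needed for a survival object; an illustrative summary (not part of the package documentation) of relapse by intravenous use:

library(survival)

survfit(Surv(weeks, 1 - censor) ~ needle, data = cocaine_relapse_1)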
{"path":"https://mccarthy-m-g.github.io/alda/reference/cocaine_relapse_2.html","id":null,"dir":"Reference","previous_headings":"","what":"Days to cocaine relapse after abstinence — cocaine_relapse_2","title":"Days to cocaine relapse after abstinence — cocaine_relapse_2","text":"A subset of unpublished data from Hall, Havassy, and Wasserman (1990) measuring the relation between the number of days to relapse to cocaine use and several predictors that might be associated with relapse in a sample of 104 newly abstinent cocaine users who recently completed an abstinence-oriented treatment program. Former cocaine users were followed for 12 weeks post-treatment or until they used cocaine for 7 consecutive days. Self-reported abstinence was confirmed by interview and by the absence of cocaine in urine specimens.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/cocaine_relapse_2.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Days to cocaine relapse after abstinence — cocaine_relapse_2","text":"","code":"cocaine_relapse_2"},
{"path":"https://mccarthy-m-g.github.io/alda/reference/cocaine_relapse_2.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Days to cocaine relapse after abstinence — cocaine_relapse_2","text":"A person-period data frame with 1248 rows and 7 columns: id Participant ID. days Number of days until relapse to cocaine use or censoring. Relapse was defined as 4 or more days of cocaine use during the week preceding an interview. Study dropouts and lost participants were coded as relapsing to cocaine use, with the number of days until relapse coded as occurring the week after the last follow-up interview attended. censor Censoring status (0 = relapsed, 1 = censored). needle Binary indicator for whether cocaine was ever used intravenously. base_mood Total score on the positive mood subscales (Activity and Happiness) of the Mood Questionnaire (Ryman, Biersner, & LaRocco, 1974), taken at an intake interview during the last week of treatment. Each item used a five point Likert score (ranging from 0 = not at all, to 4 = extremely). followup Week of the follow-up interview. mood Total score on the positive mood subscales (Activity and Happiness) of the Mood Questionnaire (Ryman, Biersner, & LaRocco, 1974), taken at follow-up interviews each week post-treatment. Each item used a five point Likert score (ranging from 0 = not at all, to 4 = extremely).","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/cocaine_relapse_2.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Days to cocaine relapse after abstinence — cocaine_relapse_2","text":"Hall, S. M., Havassy, B. E., & Wasserman, D. A. (1990). Commitment to abstinence and acute stress in relapse to alcohol, opiates, and nicotine. Journal of Consulting and Clinical Psychology, 58, 175–181. https://doi.org/10.1037//0022-006x.58.2.175","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/cocaine_relapse_2.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Days to cocaine relapse after abstinence — cocaine_relapse_2","text":"Hall, Havassy, and Wasserman (1990) measured time to relapse in weeks, not days; however, to use these data to illustrate imputation strategies, Singer and Willett (2003) converted the weekly relapse information into days, then jittered the event times, effectively converting them from discrete-time to continuous-time. Additionally, Hall, Havassy, and Wasserman (1990) do not report following cocaine users in their study; thus, this appears to be unpublished data.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/cocaine_relapse_2.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Days to cocaine relapse after abstinence — cocaine_relapse_2","text":"Ryman, D. H., Biersner, R. J., & La Rocco, J. M. (1974). Reliabilities and validities of the Mood Questionnaire. Psychological Reports, 35, 479–484. https://doi.org/10.2466/pr0.1974.35.1.479","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/cognitive_growth.html","id":null,"dir":"Reference","previous_headings":"","what":"Cognitive growth in children — cognitive_growth","title":"Cognitive growth in children — cognitive_growth","text":"Data from Tivnan (1980) measuring changes in cognitive growth over a three-week period in a sample of 17 first and second-graders.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/cognitive_growth.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Cognitive growth in children — cognitive_growth","text":"","code":"cognitive_growth"},
{"path":"https://mccarthy-m-g.github.io/alda/reference/cognitive_growth.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Cognitive growth in children — cognitive_growth","text":"A person-period data frame with 445 rows and 4 columns: id Child ID. game Game number. Each child played a maximum of 27 games. nmoves The number of moves completed before making a catastrophic error. read Score on an unnamed standardized reading test.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/cognitive_growth.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Cognitive growth in children — cognitive_growth","text":"Tivnan, T. (1980). Improvements in performance of cognitive tasks: The acquisition of new skills by elementary school children. Unpublished doctoral dissertation. Harvard University, Graduate School of Education.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/congresswomen.html","id":null,"dir":"Reference","previous_headings":"","what":"House of Representatives tenure — congresswomen","title":"House of Representatives tenure — congresswomen","text":"Data measuring how long the 168 women elected to the U.S. House of Representatives between 1919 and 1996 remained in office. Representatives were followed for up to eight terms or until 1998.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/congresswomen.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"House of Representatives tenure — congresswomen","text":"","code":"congresswomen"},
{"path":"https://mccarthy-m-g.github.io/alda/reference/congresswomen.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"House of Representatives tenure — congresswomen","text":"A person-level data frame with 168 rows and 5 columns: id Participant ID. name Representative name. time Number of terms in office. censor Censoring status. democrat Party affiliation.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/depression_unemployment.html","id":null,"dir":"Reference","previous_headings":"","what":"Unemployment and depression — depression_unemployment","title":"Unemployment and depression — depression_unemployment","text":"A subset of data from Ginexi and colleagues (2000) measuring changes in depressive symptoms after job loss in a sample of 254 recently unemployed men and women. Interviews were conducted in three waves at around 1, 5, and 12 months after job loss.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/depression_unemployment.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Unemployment and depression — depression_unemployment","text":"","code":"depression_unemployment"},
{"path":"https://mccarthy-m-g.github.io/alda/reference/depression_unemployment.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Unemployment and depression — depression_unemployment","text":"A person-period data frame with 674 rows and 5 columns: id Participant ID. interview Time of interview. months Months since job loss. depression Center for Epidemiologic Studies' Depression (CES-D) scale score (Radloff, 1977). unemployed Binary indicator for whether the participant was unemployed at the time of interview.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/depression_unemployment.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Unemployment and depression — depression_unemployment","text":"Ginexi, E. M., Howe, G. W., & Caplan, R. D. (2000). Depression and control beliefs in relation to reemployment: What are the directions of effect? Journal of Occupational Health Psychology, 5, 323–336. https://doi.org/10.1037//1076-8998.5.3.323","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/depression_unemployment.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Unemployment and depression — depression_unemployment","text":"Radloff, L. S. (1977). The CES-D scale: A self report major depressive disorder scale for research in the general population. Applied Psychological Measurement, 1, 385–401. https://doi.org/10.1177/014662167700100306","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/deviant_tolerance.html","id":null,"dir":"Reference","previous_headings":"","what":"Adolescent tolerance of deviant behaviour — deviant_tolerance_pp","title":"Adolescent tolerance of deviant behaviour — deviant_tolerance_pp","text":"A subset of data from the National Youth Survey (NYS) measuring tolerance of deviant behaviour in adolescents over time (Raudenbush & Chan, 1992).","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/deviant_tolerance.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Adolescent tolerance of deviant behaviour — deviant_tolerance_pp","text":"","code":"deviant_tolerance_pp deviant_tolerance_pl"},
{"path":"https://mccarthy-m-g.github.io/alda/reference/deviant_tolerance.html","id":"deviant-tolerance-pp","dir":"Reference","previous_headings":"","what":"deviant_tolerance_pp","title":"Adolescent tolerance of deviant behaviour — deviant_tolerance_pp","text":"A person-period data frame with 80 rows and 5 columns: id Participant ID. age Adolescent age in years. tolerance Average score across a 9-item scale assessing attitudes favourable to deviant behaviour. Each item used a four point scale (1 = very wrong, 2 = wrong, 3 = a little bit wrong, 4 = not wrong at all). male Binary indicator for whether the adolescent is male. exposure Average score across a 9-item scale assessing level of exposure to deviant peers. Each item used a five point Likert score (ranging from 0 = none, to 4 = all).","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/deviant_tolerance.html","id":"deviant-tolerance-pl","dir":"Reference","previous_headings":"","what":"deviant_tolerance_pl","title":"Adolescent tolerance of deviant behaviour — deviant_tolerance_pp","text":"A person-level data frame with 16 rows and 8 columns: id Participant ID. tolerance_11, tolerance_12, tolerance_13, tolerance_14, tolerance_15 Average score across a 9-item scale assessing attitudes favourable to deviant behaviour at ages 11, 12, 13, 14, and 15. Each item used a four point scale (1 = very wrong, 2 = wrong, 3 = a little bit wrong, 4 = not wrong at all). male Binary indicator for whether the adolescent is male. exposure Average score across a 9-item scale assessing level of exposure to deviant peers. Each item used a five point Likert score (ranging from 0 = none, to 4 = all).","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/deviant_tolerance.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Adolescent tolerance of deviant behaviour — deviant_tolerance_pp","text":"Raudenbush, S. W., & Chan, W. S. (1992). Growth curve analysis in accelerated longitudinal designs. Journal of Research in Crime and Delinquency, 29, 387–411. https://doi.org/10.1177/0022427892029004001","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/deviant_tolerance.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Adolescent tolerance of deviant behaviour — deviant_tolerance_pp","text":"Raudenbush and Chan (1992) comment that exposure was a time-varying predictor in the original study; however, Singer and Willett (2003) provide exposure as a time-invariant predictor.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/dropout_wages.html","id":null,"dir":"Reference","previous_headings":"","what":"High school dropout labour market experiences — dropout_wages","title":"High school dropout labour market experiences — dropout_wages","text":"A subset of data from the National Longitudinal Study of Youth tracking the labour market experiences of male high school dropouts (Murnane, Boudett, & Willett, 1999).","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/dropout_wages.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"High school dropout labour market experiences — dropout_wages","text":"","code":"dropout_wages dropout_wages_subset"},
{"path":"https://mccarthy-m-g.github.io/alda/reference/dropout_wages.html","id":"dropout-wages","dir":"Reference","previous_headings":"","what":"dropout_wages","title":"High school dropout labour market experiences — dropout_wages","text":"A person-period data frame with 6402 rows and 9 columns: id Participant ID. log_wages Natural logarithm of wages. experience Labour force experience in years, tracked from each dropout's first day of work. ged Binary indicator for whether the dropout obtained a GED. postsecondary_education Binary indicator for whether the dropout obtained post-secondary education. black Binary indicator for whether the dropout is black. hispanic Binary indicator for whether the dropout is hispanic. highest_grade Highest grade completed. unemployment_rate Unemployment rate in the local geographic area.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/dropout_wages.html","id":"dropout-wages-subset","dir":"Reference","previous_headings":"","what":"dropout_wages_subset","title":"High school dropout labour market experiences — dropout_wages","text":"A person-period data frame with 257 rows and 5 columns: id Participant ID. log_wages Natural logarithm of wages. experience Labour force experience in years, tracked from each dropout's first day of work. black Binary indicator for whether the dropout is black. highest_grade Highest grade completed.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/dropout_wages.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"High school dropout labour market experiences — dropout_wages","text":"Murnane, R. J., Boudett, K. P., & Willett, J. B. (1999). Do male dropouts benefit from obtaining a GED, postsecondary education, and training? Evaluation Review, 23, 475–502. https://doi.org/10.1177/0193841x9902300501","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/early_intervention.html","id":null,"dir":"Reference","previous_headings":"","what":"Early educational intervention and cognitive performance — early_intervention","title":"Early educational intervention and cognitive performance — early_intervention","text":"Simulated data based on Burchinal, Campbell, Bryant, Wasik, and Ramey (1997) measuring the effect of an early educational intervention on cognitive performance in a sample of African-American children at ages 12, 18, and 24 months.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/early_intervention.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Early educational intervention and cognitive performance — early_intervention","text":"","code":"early_intervention"},
{"path":"https://mccarthy-m-g.github.io/alda/reference/early_intervention.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Early educational intervention and cognitive performance — early_intervention","text":"A person-period data frame with 309 rows and 4 columns: id Child ID. age Age in years at time of measurement. treatment Treatment condition (control = 0, intervention = 1). cognitive_score Cognitive performance score on one of two standardized intelligence tests: the Bayley Scales of Infant Development (Bayley, 1969) at 12 and 18 months, and the Stanford Binet (Terman & Merrill, 1972) at 24 months.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/early_intervention.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Early educational intervention and cognitive performance — early_intervention","text":"Burchinal, M. R., Campbell, F. A., Bryant, D. M., Wasik, B. H., & Ramey, C. T. (1997). Early intervention and mediating processes in cognitive performance of children of low income African American families. Child Development, 68, 935–954. https://doi.org/10.2307/1132043","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/early_intervention.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Early educational intervention and cognitive performance — early_intervention","text":"At the request of the researchers, Singer and Willett (2003) do not provide the data from Burchinal, Campbell, Bryant, Wasik, and Ramey's (1997) study in order to ensure the privacy of the study's participants. However, the data they provided were simulated to have statistical properties similar to those of the study in order to match the estimates and figures presented in the text as best as possible.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/early_intervention.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Early educational intervention and cognitive performance — early_intervention","text":"Bayley, N. (1969). Bayley Scales of Infant Development. New York: Psychological Corp. Terman, L. M., & Merrill, N. Q. (1972). Stanford-Binet Intelligence Scale: 1972 Norms Edition. Boston: Houghton Mifflin.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/externalizing_behaviour.html","id":null,"dir":"Reference","previous_headings":"","what":"Externalizing behaviour in children — externalizing_behaviour","title":"Externalizing behaviour in children — externalizing_behaviour","text":"A subset of data from Keiley, Bates, Dodge, and Pettit (2000) measuring changes in externalizing behaviour in a sample of 45 children tracked from first grade to sixth grade.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/externalizing_behaviour.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Externalizing behaviour in children — externalizing_behaviour","text":"","code":"externalizing_behaviour"},
{"path":"https://mccarthy-m-g.github.io/alda/reference/externalizing_behaviour.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Externalizing behaviour in children — externalizing_behaviour","text":"A person-period data frame with 270 rows and 5 columns: id Child ID. time Time of measurement. externalizing_behaviour Sum of scores on Achenbach's (1991) Child Behavior Checklist. Scores range from 0 to 68. female Binary indicator for whether the child is female. grade Grade year.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/externalizing_behaviour.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Externalizing behaviour in children — externalizing_behaviour","text":"Keiley, M. K., Bates, J. E., Dodge, K. A., & Pettit, G. S. (2000). A cross-domain growth analysis: Externalizing and internalizing behavior during 8 years of childhood. Journal of Abnormal Child Psychology, 28, 161–179. https://doi.org/10.1023%2Fa%3A1005122814723","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/externalizing_behaviour.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Externalizing behaviour in children — externalizing_behaviour","text":"Achenbach, T. M. (1991). Manual for the Child Behavior Checklist 4–18 and 1991 Profile. Burlington, VT: University of Vermont Press.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_arrest.html","id":null,"dir":"Reference","previous_headings":"","what":"Age of first juvenile arrest — first_arrest","title":"Age of first juvenile arrest — first_arrest","text":"Data from Keiley and Martin (2002) measuring the effect of child abuse on the risk of first juvenile arrest in a sample of 1553 adolescents aged 8 to 18. Adolescents were followed until age 18 or until they were arrested.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_arrest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Age of first juvenile arrest — first_arrest","text":"","code":"first_arrest"},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_arrest.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Age of first juvenile arrest — first_arrest","text":"A person-period data frame with 15834 rows and 7 columns: id Participant ID. time Age at first juvenile arrest. censor Censoring status. abused Binary indicator for whether the adolescent was abused. black Binary indicator for whether the adolescent is black. period Age each record corresponds to. event Binary indicator for whether the adolescent was arrested.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_arrest.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Age of first juvenile arrest — first_arrest","text":"Keiley, M. K., & Martin, N. C. (2002). Child abuse, neglect, and juvenile delinquency: How "new" statistical approaches can inform our understanding of "old" questions—a reanalysis of Widom, 1989. Manuscript submitted for publication.","code":""},
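Because these are person-period data with period and event columns, the sample hazard function can be computed directly; a minimal sketch (not part of the package documentation):

library(dplyr)

first_arrest |>
  group_by(period) |>
  summarise(
    n_at_risk = n(),      # adolescents still at risk at this age
    hazard = mean(event)  # proportion arrested among those at risk
  )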
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_cocaine.html","id":null,"dir":"Reference","previous_headings":"","what":"Age of first cocaine use — first_cocaine","title":"Age of first cocaine use — first_cocaine","text":"Data from Burton and colleagues (1996) measuring the relation between age of first cocaine use and drug-use history in a random sample of 1658 white American men. Age of first cocaine use and drug-use history were determined during two interviews held eleven years apart (in 1974 and 1985).","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_cocaine.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Age of first cocaine use — first_cocaine","text":"","code":"first_cocaine"},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_cocaine.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Age of first cocaine use — first_cocaine","text":"A person-level data frame with 1658 rows and 15 columns: id Participant ID. used_cocaine_age Age of first cocaine use. censor Censoring status. birth_year Year of birth. early_marijuana_use Binary indicator for whether marijuana was used before age 17. used_marijuana Binary indicator for whether the participant used marijuana during the study period. used_marijuana_age Age the participant first used marijuana. sold_marijuana Binary indicator for whether the participant sold marijuana during the study period. sold_marijuana_age Age the participant first sold marijuana. early_drug_use Binary indicator for whether drugs were used before age 17. used_drugs Binary indicator for whether the participant used drugs during the study period. used_drugs_age Age the participant first used drugs. sold_drugs Binary indicator for whether the participant sold drugs during the study period. sold_drugs_age Age the participant first sold drugs. rural Binary indicator for whether the participant lived in a rural area.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_cocaine.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Age of first cocaine use — first_cocaine","text":"Burton, R. P. D., Johnson, R. J., Ritter, C., & Clayton, R. R. (1996). The effects of role socialization on the initiation of cocaine use: An event history analysis from adolescence into middle adulthood. Journal of Health and Social Behavior, 37, 75–90. https://doi.org/10.2307/2137232","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_depression_1.html","id":null,"dir":"Reference","previous_headings":"","what":"Age of first depression — first_depression_1","title":"Age of first depression — first_depression_1","text":"A subset of data from Wheaton, Roszell, and Hall (1997) measuring the relation between age of first depressive episode and several childhood and adult traumatic stressors in a random sample of 1393 adults living in metropolitan Toronto, Ontario. Age of first depressive episode and traumatic stressors were determined during a structured interview.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_depression_1.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Age of first depression — first_depression_1","text":"","code":"first_depression_1"},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_depression_1.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Age of first depression — first_depression_1","text":"A person-period data frame with 36997 rows and 11 columns: id Participant ID. onset Age of first depressive episode. censor Censoring status. interview_age Age at time of interview. female Binary indicator for whether the adult is female. siblings Number of siblings. bigfamily Binary indicator for whether the adult had five or more siblings. period Age each record corresponds to. depressive_episode Binary indicator for whether the adult experienced a depressive episode. parental_divorce Binary indicator for whether the adult's parents divorced at a previous age. parental_divorce_now Binary indicator for whether the adult's parents divorced in the current period.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_depression_1.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Age of first depression — first_depression_1","text":"Wheaton, B., Roszell, P., & Hall, K. (1997). The impact of twenty childhood and adult traumatic stressors on the risk of psychiatric disorder. In I. H. Gotlib & B. Wheaton (Eds.), Stress and adversity over the life course: Trajectories and turning points (pp. 50–72). New York: Cambridge University Press. https://doi.org/10.1017/CBO9780511527623.003","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_depression_2.html","id":null,"dir":"Reference","previous_headings":"","what":"Age of first depression — first_depression_2","title":"Age of first depression — first_depression_2","text":"Data from Sorenson, Rutter, and Aneshensel (1991) measuring age of first depressive episode in a sample of 2974 adults. Age of first depressive episode was measured by asking respondents whether, and if so, at what age they first experienced a depressive episode.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_depression_2.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Age of first depression — first_depression_2","text":"","code":"first_depression_2"},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_depression_2.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Age of first depression — first_depression_2","text":"A person-level data frame with 2974 rows and 3 columns: id Participant ID. age Age in years. censor Censoring status.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_depression_2.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Age of first depression — first_depression_2","text":"Sorenson, S. B., Rutter, C. M., & Aneshensel, C. S. (1991). Depression in the community: An investigation into age of onset. Journal of Consulting and Clinical Psychology, 59, 541–546. https://doi.org/10.1037/0022-006X.59.4.541","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_sex.html","id":null,"dir":"Reference","previous_headings":"","what":"Age of first sexual intercourse — first_sex","title":"Age of first sexual intercourse — first_sex","text":"A subset of data from Capaldi, Crosby, and Stoolmiller (1996) measuring the grade year of first sexual intercourse in a sample of 180 at-risk heterosexual adolescent males. Adolescent males were followed from Grade 7 to Grade 12 or until they reported having sexual intercourse for the first time.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_sex.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Age of first sexual intercourse — first_sex","text":"","code":"first_sex"},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_sex.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Age of first sexual intercourse — first_sex","text":"A person-level data frame with 180 rows and 5 columns: id Participant ID. grade Grade year of first sexual intercourse. censor Censoring status. parental_transition Binary indicator for whether the adolescent experienced a parental transition (where their parents separated or repartnered). parental_antisociality Composite score across four indicators measuring the parents' level of antisocial behaviour during the child's formative years.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_sex.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Age of first sexual intercourse — first_sex","text":"Capaldi, D. M., Crosby, L., & Stoolmiller, M. (1996). Predicting the timing of first sexual intercourse for at-risk adolescent males. Child Development, 67, 344–359. https://doi.org/10.2307/1131818","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/first_sex.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Age of first sexual intercourse — first_sex","text":"Capaldi, Crosby, and Stoolmiller's (1996) original sample consisted of 182 adolescent males after applying the exclusion criteria for their analysis; Singer and Willett (2003) excluded an additional two males from the data who reported anal intercourse with another male.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/health_workers.html","id":null,"dir":"Reference","previous_headings":"","what":"Length of health worker employment — health_workers","title":"Length of health worker employment — health_workers","text":"A subset of data from Singer and colleagues (1998) measuring the length of employment in a sample of 2074 health care workers hired by community and migrant health centres. Health care workers were followed for up to 33 months or until termination of employment.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/health_workers.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Length of health worker employment — health_workers","text":"","code":"health_workers"},
{"path":"https://mccarthy-m-g.github.io/alda/reference/health_workers.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Length of health worker employment — health_workers","text":"A person-level data frame with 2074 rows and 3 columns: id Participant ID. weeks Number of weeks until termination of employment. censor Censoring status.","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/health_workers.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Length of health worker employment — health_workers","text":"Singer, J. D., Davidson, S., Graham, S., & Davidson, H. S. (1998). Physician retention in community and migrant health centers: Who stays and for how long? Medical Care, 38, 1198–1213. https://doi.org/10.1097/00005650-199808000-00008","code":""},
{"path":"https://mccarthy-m-g.github.io/alda/reference/honking.html","id":null,"dir":"Reference","previous_headings":"","what":"Time to horn honking — honking","title":"Time to horn honking — honking","text":"A subset of data from Diekmann and colleagues (1996) measuring time to horn honking in a sample of 57 motorists who were purposefully blocked at a green light by a Volkswagen Jetta at a busy intersection near the centre of Munich, West Germany on two busy afternoons (a Sunday and a Monday) in 1986. Motorists were followed until they honked their horns or took an alternative action (beaming or changing lanes).","code":""},
Motorists followed honked horns took alternative action (beaming changing lanes).","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/honking.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Time to horn honking — honking","text":"","code":"honking"},{"path":"https://mccarthy-m-g.github.io/alda/reference/honking.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Time to horn honking — honking","text":"person-level data frame 57 rows 3 columns: id Participant ID. seconds Number seconds horn honking alternative action. censor Censoring status.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/honking.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Time to horn honking — honking","text":"Diekmann, ., Jungbauer-Gans, M., Krassnig, H., & Lorenz, S. (1996). Social status aggression: field study analyzed survival analysis. Journal Social Psychology, 136, 761–768. https://doi.org/10.1080/00224545.1996.9712252","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/judges.html","id":null,"dir":"Reference","previous_headings":"","what":"Supreme Court justice tenure — judges","title":"Supreme Court justice tenure — judges","text":"Data Zorn Van Winkle (2000) long 107 justices appointed U.S. Supreme Court 1789 1980 remained position.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/judges.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Supreme Court justice tenure — judges","text":"","code":"judges"},{"path":"https://mccarthy-m-g.github.io/alda/reference/judges.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Supreme Court justice tenure — judges","text":"person-level data frame 109 rows 7 columns: id Justice ID. tenure Time retirement death years. dead Binary indicator whether justice died. retired Binary indicator whether justice retired. left_appointment Binary indicator whether justice left appointment. appointment_age Age time appointment. appointment_year Year appointment.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/judges.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Supreme Court justice tenure — judges","text":"Zorn, C. J., & van Winkle, S. R. (2000). competing risks model Supreme Court vacancies, 1780–1992. Political Behavior, 22, 145–166.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/math_dropout.html","id":null,"dir":"Reference","previous_headings":"","what":"Math course History — math_dropout","title":"Math course History — math_dropout","text":"Data Graham (1997) measuring relation mathematics course-taking gender identity sample 3790 tenth grade high school students. Students followed 5 terms (eleventh grade, twelfth grade, first three semesters college) stopped enrolling mathematics courses.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/math_dropout.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Math course History — math_dropout","text":"","code":"math_dropout"},{"path":"https://mccarthy-m-g.github.io/alda/reference/math_dropout.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Math course History — math_dropout","text":"person-period data frame 9558 rows 6 columns: id Participant ID. last_term term student stopped enrolling mathematics courses. 
woman Binary indicators whether student identified woman. censor Censoring status. term Term record corresponds . event Binary indicator whether student stopped enrolling mathematics courses given term.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/math_dropout.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Math course History — math_dropout","text":"Graham, S. E. (1997). exodus mathematics: ? Unpublished doctoral dissertation. Harvard University, Graduate School Education.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/monkeys.html","id":null,"dir":"Reference","previous_headings":"","what":"Piagetian monkeys — monkeys","title":"Piagetian monkeys — monkeys","text":"Data Ha, Kimpo, Sackett (1997) measuring age first demonstration object recognition sample 123 pigtailed macaques. Monkeys followed 37 days demonstrated classic Piagetian stage development known object recognition.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/monkeys.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Piagetian monkeys — monkeys","text":"","code":"monkeys"},{"path":"https://mccarthy-m-g.github.io/alda/reference/monkeys.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Piagetian monkeys — monkeys","text":"person-level data frame 123 rows 7 columns: id Monkey ID. sessions Number sessions monkey completed demonstrating object recognition. initial_age Age initial testing days. end_age Age end testing days. censor Censoring status. birth_weight Decile equivalent monkey's birth weight comparison colony-wide sex-specific standards. female Binary indicator whether adolescent female.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/monkeys.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Piagetian monkeys — monkeys","text":"Ha, J. C., Kimpo, C. L., & Sackett, G. P. (1997). Multiple-spell, discrete-time survival analysis developmental data: Object concept pigtailed macaques. Developmental Psychology, 33, 1054–1059. https://doi.org/10.1037//0012-1649.33.6.1054","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/opposites_naming.html","id":null,"dir":"Reference","previous_headings":"","what":"Opposites naming Task — opposites_naming","title":"Opposites naming Task — opposites_naming","text":"Artificial data created Willett (1988) measuring changes performance hypothetical \"opposites naming\" task four week period sample 35 people.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/opposites_naming.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Opposites naming Task — opposites_naming","text":"","code":"opposites_naming"},{"path":"https://mccarthy-m-g.github.io/alda/reference/opposites_naming.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Opposites naming Task — opposites_naming","text":"person-period data frame 140 rows 5 columns: id Participant ID. wave Wave measurement. time Wave measurement centred time 0. opposites_naming_score Score \"opposites naming\" task. baseline_cognitive_score Baseline score standardized instrument assessing general cognitive skill.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/opposites_naming.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Opposites naming Task — opposites_naming","text":"Willett, J. 
B. (1988). Questions answers measurement change. E. Rothkopf (Ed.), Review research education (1988–89) (pp. 345–422). Washington, DC: American Educational Research Association. https://doi.org/10.3102/0091732X015001345","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/physicians.html","id":null,"dir":"Reference","previous_headings":"","what":"Physician career history — physicians","title":"Physician career history — physicians","text":"subset data Singer colleagues (1998) measuring length employment sample 812 physicians hired community migrant health centres. Physicians followed 33 months termination employment. measurement window began January 1, 1990, ended September 30, 1992.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/physicians.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Physician career history — physicians","text":"","code":"physicians"},{"path":"https://mccarthy-m-g.github.io/alda/reference/physicians.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Physician career history — physicians","text":"person-level data frame 812 rows 8 columns: id Participant ID. start_date Date hire. end_date Date departure. entry Number years since hire physician worked entering measurement window. exit Number years physician worked departure. censor Censoring status. part_time Binary indicator whether physician worked part time. age Age time hire.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/physicians.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Physician career history — physicians","text":"Singer, J. D., Davidson, S., Graham, S., & Davidson, H. S. (1998). Physician retention community migrant health centers: stays long? Medical Care, 38, 11981213. https://doi.org/10.1097/00005650-199808000-00008","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/psychiatric_discharge.html","id":null,"dir":"Reference","previous_headings":"","what":"Days to psychiatric hospital discharge — psychiatric_discharge","title":"Days to psychiatric hospital discharge — psychiatric_discharge","text":"subset data Foster (2000) measuring relation number days discharge psychiatric hospital type treatment plan sample 174 adolescents emotional behavioural problems.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/psychiatric_discharge.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Days to psychiatric hospital discharge — psychiatric_discharge","text":"","code":"psychiatric_discharge"},{"path":"https://mccarthy-m-g.github.io/alda/reference/psychiatric_discharge.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Days to psychiatric hospital discharge — psychiatric_discharge","text":"person-level data frame 174 rows 4 columns: id Participant ID. days Number days discharge. censor Censoring status. treatment_plan Binary indicator whether patient traditional coverage plan (0) innovative coverage plan (1).","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/psychiatric_discharge.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Days to psychiatric hospital discharge — psychiatric_discharge","text":"Foster, E. M. (2000). continuum care reduce inpatient length stay? Evaluation Program Planning, 23, 53–65. 
https://doi.org/10.1016/S0149-7189(99)00037-3","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/reading_scores.html","id":null,"dir":"Reference","previous_headings":"","what":"Peabody Individual Achievement Test reading scores — reading_scores","title":"Peabody Individual Achievement Test reading scores — reading_scores","text":"subset data Children National Longitudinal Study Youth measuring changes reading subtest Peabody Individual Achievement Test (PIAT) sample 89 African-American children across three waves around ages 6, 8, 10.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/reading_scores.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Peabody Individual Achievement Test reading scores — reading_scores","text":"","code":"reading_scores"},{"path":"https://mccarthy-m-g.github.io/alda/reference/reading_scores.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Peabody Individual Achievement Test reading scores — reading_scores","text":"person-period data frame 267 rows 5 columns: id Participant ID. wave Wave measurement. age_group Expected age measurement occasion. age Age years time measurement. reading_score Reading score reading subtest Peabody Individual Achievement Test (PIAT).","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/reading_scores.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Peabody Individual Achievement Test reading scores — reading_scores","text":"US Bureau Labor Statistics. National Longitudinal Survey Youth (Children NLSY). https://www.bls.gov/nls/nlsy79-children.htm","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/rearrest.html","id":null,"dir":"Reference","previous_headings":"","what":"Days to inmate recidivism — rearrest","title":"Days to inmate recidivism — rearrest","text":"Data Henning Frueh (1996) measuring measuring days rearrest sample 194 inmates recently released medium security prison. Inmates followed three years rearrested.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/rearrest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Days to inmate recidivism — rearrest","text":"","code":"rearrest"},{"path":"https://mccarthy-m-g.github.io/alda/reference/rearrest.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Days to inmate recidivism — rearrest","text":"person-level data frame 194 rows 7 columns: id Participant ID. days Number days rearrest. months Number months rearrest, scale \"average\" month (30.4375 days). censor Censoring status. personal Committed person-related crime property Binary indicator whether inmate committed property crime. age Centred age time release.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/rearrest.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Days to inmate recidivism — rearrest","text":"Henning, K. R., & Frueh, B. C. (1996). Cognitive-behavioral treatment incarcerated offenders: evaluation Vermont Department Corrections' cognitive self-change program. Criminal Justice Behavior, 23, 523–541. 
https://doi.org/10.1177/0093854896023004001","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/suicide_ideation.html","id":null,"dir":"Reference","previous_headings":"","what":"Age of first suicide ideation — suicide_ideation","title":"Age of first suicide ideation — suicide_ideation","text":"subset data Bolger colleagues (1989) measuring age first suicide ideation sample 391 undergraduate students aged 16 22. Age first suicide ideation measured two-item survey asking respondents \"ever thought committing suicide?\" , \"age thought first occur ?\"","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/suicide_ideation.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Age of first suicide ideation — suicide_ideation","text":"","code":"suicide_ideation"},{"path":"https://mccarthy-m-g.github.io/alda/reference/suicide_ideation.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Age of first suicide ideation — suicide_ideation","text":"person-level data frame 391 rows 4 columns: id Participant ID. time Reported age first suicide ideation. censor Censoring status. age Participant age time survey.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/suicide_ideation.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Age of first suicide ideation — suicide_ideation","text":"Bolger, N., Downey, G., Walker, E., & Steininger, P. (1989). onset suicide ideation childhood adolescence. Journal Youth Adolescence, 18, 175–189. https://doi.org/10.1007/BF02138799","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/teachers.html","id":null,"dir":"Reference","previous_headings":"","what":"Years to special education teacher turnover — teachers","title":"Years to special education teacher turnover — teachers","text":"subset data Singer (1993) measuring many years 3941 newly hired special educators Michigan stayed teaching 1972 1978. Teachers followed 13 years stopped teaching state.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/teachers.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Years to special education teacher turnover — teachers","text":"","code":"teachers"},{"path":"https://mccarthy-m-g.github.io/alda/reference/teachers.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Years to special education teacher turnover — teachers","text":"person-level data frame 3941 rows 3 columns: id Teacher ID. years number years teacher's dates hire departure Michigan public schools. censor Censoring status.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/teachers.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Years to special education teacher turnover — teachers","text":"Singer, J. D. (1992). special educators' careers special? Results 13-Year Longitudinal Study. Exceptional Children, 59, 262–279. https://doi.org/10.1177/001440299305900309","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/tenure.html","id":null,"dir":"Reference","previous_headings":"","what":"Years to academic tenure — tenure","title":"Years to academic tenure — tenure","text":"Data Gamse Conger (1997) measuring number years receiving tenure sample 260 semifinalists fellowship recipients National Academy Education–Spencer Foundation PostDoctoral Fellowship Program took academic job earning doctorate. 
Academics followed nine years received tenure.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/tenure.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Years to academic tenure — tenure","text":"","code":"tenure"},{"path":"https://mccarthy-m-g.github.io/alda/reference/tenure.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Years to academic tenure — tenure","text":"person-level data frame 260 rows 3 columns: id Participant ID. years Number years receiving tenure. censor Censoring status.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/tenure.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Years to academic tenure — tenure","text":"Gamse, B. C., & Conger, D. (1997). evaluation Spencer post-doctoral dissertation program. Cambridge, MA: Abt Associates.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/news/index.html","id":"alda-0009000","dir":"Changelog","previous_headings":"","what":"alda 0.0.0.9000","title":"alda 0.0.0.9000","text":"Added NEWS.md file track changes package.","code":""}] +[{"path":[]},{"path":"https://mccarthy-m-g.github.io/alda/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"CC0 1.0 Universal","title":"CC0 1.0 Universal","text":"CREATIVE COMMONS CORPORATION LAW FIRM PROVIDE LEGAL SERVICES. DISTRIBUTION DOCUMENT CREATE ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES INFORMATION “-” BASIS. CREATIVE COMMONS MAKES WARRANTIES REGARDING USE DOCUMENT INFORMATION WORKS PROVIDED HEREUNDER, DISCLAIMS LIABILITY DAMAGES RESULTING USE DOCUMENT INFORMATION WORKS PROVIDED HEREUNDER.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/LICENSE.html","id":"statement-of-purpose","dir":"","previous_headings":"","what":"Statement of Purpose","title":"CC0 1.0 Universal","text":"laws jurisdictions throughout world automatically confer exclusive Copyright Related Rights (defined ) upon creator subsequent owner(s) (, “owner”) original work authorship /database (, “Work”). Certain owners wish permanently relinquish rights Work purpose contributing commons creative, cultural scientific works (“Commons”) public can reliably without fear later claims infringement build upon, modify, incorporate works, reuse redistribute freely possible form whatsoever purposes, including without limitation commercial purposes. owners may contribute Commons promote ideal free culture production creative, cultural scientific works, gain reputation greater distribution Work part use efforts others. /purposes motivations, without expectation additional consideration compensation, person associating CC0 Work (“Affirmer”), extent owner Copyright Related Rights Work, voluntarily elects apply CC0 Work publicly distribute Work terms, knowledge Copyright Related Rights Work meaning intended legal effect CC0 rights. Copyright Related Rights. Work made available CC0 may protected copyright related neighboring rights (“Copyright Related Rights”). 
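All of the data sets above ship with the alda package, so each can be listed and previewed directly. A quick illustration (not part of the reference entries themselves; any data set name works in place of judges):

```r
library(alda)

# List every data set bundled with the package, with titles.
data(package = "alda")

# Preview the first rows of one person-level data set.
head(judges)
```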
# Chapter 10: Describing discrete-time event occurrence data

Next, we collect the summary information from the survfit object into a tibble using the tidy() function from the broom package. For now we exclude the statistical summaries from the life table, focusing exclusively on the columns related to the event histories of the teachers data. Note also that the summary information for the survfit object starts at the time of the first event rather than the "beginning of time"; we can add the "beginning of time" to the survfit object using the survfit0() function from the survival package, which (by default) adds a starting time of 0 to the life table.

```r
teachers_lifetable <- teachers_fit |>
  survfit0() |>
  tidy() |>
  select(-c(estimate:conf.low)) |>
  mutate(interval = paste0("[", time, ", ", time + 1, ")"), .after = time) |>
  rename(year = time)

teachers_lifetable
#> # A tibble: 13 × 5
#>     year interval n.risk n.event n.censor
#>    <dbl> <chr>     <dbl>   <dbl>    <dbl>
#>  1     0 [0, 1)     3941       0        0
#>  2     1 [1, 2)     3941     456        0
#>  3     2 [2, 3)     3485     384        0
#>  4     3 [3, 4)     3101     359        0
#>  5     4 [4, 5)     2742     295        0
#>  6     5 [5, 6)     2447     218        0
#>  7     6 [6, 7)     2229     184        0
#>  8     7 [7, 8)     2045     123      280
#>  9     8 [8, 9)     1642      79      307
#> 10     9 [9, 10)    1256      53      255
#> 11    10 [10, 11)    948      35      265
#> 12    11 [11, 12)    648      16      241
#> 13    12 [12, 13)    391       5      386
```

As Singer and Willett (2003) discuss, the columns of the life table can be interpreted as follows:

- year: Defines each time interval using ordinal numbers.
- interval: Defines precisely which event times appear in each interval using interval notation, [start, end), where each interval includes its starting time and excludes its ending time.
- n.risk: Defines the risk set for each interval; that is, the number of (remaining) individuals who were eligible to experience the target event in that interval.
- n.event: Defines the number of individuals who experienced the target event in each interval.
- n.censor: Defines the number of individuals who were censored in each interval.
Importantly, notice that once an individual experiences the target event or is censored in a given interval, they drop out of the risk set in all future intervals; thus, the risk set is inherently irreversible.

## 10.2 A framework for characterizing the distribution of discrete-time event occurrence data

In Section 10.2 Singer and Willett (2003) introduce three statistics for summarizing the event history information of the life table, each of which can be estimated directly from the life table.

Hazard is the fundamental quantity used to assess the risk of event occurrence in each discrete time period. The discrete-time hazard function is the conditional probability that the ith individual will experience the target event in the jth interval, given that they did not experience it in any prior interval:

\[
h(t_{ij}) = \Pr[T_i = j \mid T_i \geq j],
\]

whose maximum likelihood estimates are given by the proportion of each interval's risk set that experiences the event during that interval:

\[
\hat h(t_j) = \frac{n \text{ events}_j}{n \text{ at risk}_j}.
\]

The survival function is the cumulative probability that the ith individual will not experience the target event through the jth interval:

\[
S(t_{ij}) = \Pr[T_i > j],
\]

whose maximum likelihood estimates are given by the cumulative product of the complements of the estimated hazard probabilities across the current and all previous intervals:

\[
\hat S(t_j) = [1 - \hat h(t_j)] [1 - \hat h(t_{j-1})] [1 - \hat h(t_{j-2})] \cdots [1 - \hat h(t_1)].
\]
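To make the estimators concrete, here is a quick hand-check against the life table above and Table 10.1 below (plain arithmetic with the rounded values as printed; not part of the original code):

```r
# Hazard for year 1: the proportion of the year-1 risk set that left teaching.
round(456 / 3941, 3)
#> [1] 0.116

# Survival through year 2: the cumulative product of the complements of the
# year-1 and year-2 hazard estimates (rounded values 0.116 and 0.110).
round((1 - 0.116) * (1 - 0.110), 3)
#> [1] 0.787
```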
The median lifetime is a measure of central tendency identifying the point in time by which we estimate that half of the sample has experienced the target event and half has not, given by:

\[
\text{Estimated median lifetime} = m + \left[ \frac{\hat S(t_m) - .5}{\hat S(t_m) - \hat S(t_{m + 1})} \right] \big( (m + 1) - m \big),
\]

where \(m\) is the time interval immediately before the median lifetime is reached, \(\hat S(t_m)\) is the value of the survivor function in the \(m\)th interval, and \(\hat S(t_{m + 1})\) is the value of the survivor function in the next interval.

### Using the life table to estimate hazard probability, survival probability, and median lifetime

First, the discrete-time hazard function and survival function. Note the use of if_else() statements to provide preset values at the "beginning of time", which by definition are always NA for the discrete-time hazard function and 1 for the survival function.

```r
teachers_lifetable <- teachers_lifetable |>
  mutate(
    haz.estimate = if_else(year != 0, n.event / n.risk, NA),
    surv.estimate = if_else(year != 0, 1 - haz.estimate, 1),
    surv.estimate = cumprod(surv.estimate)
  )

# Table 10.1, page 327:
teachers_lifetable
#> # A tibble: 13 × 7
#>     year interval n.risk n.event n.censor haz.estimate surv.estimate
#>    <dbl> <chr>     <dbl>   <dbl>    <dbl>        <dbl>         <dbl>
#>  1     0 [0, 1)     3941       0        0      NA              1
#>  2     1 [1, 2)     3941     456        0       0.116          0.884
#>  3     2 [2, 3)     3485     384        0       0.110          0.787
#>  4     3 [3, 4)     3101     359        0       0.116          0.696
#>  5     4 [4, 5)     2742     295        0       0.108          0.621
#>  6     5 [5, 6)     2447     218        0       0.0891         0.566
#>  7     6 [6, 7)     2229     184        0       0.0825         0.519
#>  8     7 [7, 8)     2045     123      280       0.0601         0.488
#>  9     8 [8, 9)     1642      79      307       0.0481         0.464
#> 10     9 [9, 10)    1256      53      255       0.0422         0.445
#> 11    10 [10, 11)    948      35      265       0.0369         0.428
#> 12    11 [11, 12)    648      16      241       0.0247         0.418
#> 13    12 [12, 13)    391       5      386       0.0128         0.412
```

Next, the median lifetime. Here we use the slice() function from the dplyr package to select the time intervals immediately before and after the median lifetime, with a bit of wrangling to make applying the median lifetime equation easier and clearer.

```r
teachers_median_lifetime <- teachers_lifetable |>
  slice(max(which(surv.estimate >= .5)), min(which(surv.estimate <= .5))) |>
  mutate(m = c("before", "after")) |>
  select(m, year, surv = surv.estimate) |>
  pivot_wider(names_from = m, values_from = c(year, surv)) |>
  summarise(
    surv.estimate = .5,
    year = year_before +
      ((surv_before - .5) / (surv_before - surv_after)) * (year_after - year_before)
  )

teachers_median_lifetime
#> # A tibble: 1 × 2
#>   surv.estimate  year
#>           <dbl> <dbl>
#> 1           0.5  6.61
```
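The interpolation is easy to verify by hand (again with the rounded values from Table 10.1; this check is not in the original code). The survivor function is last above .5 in year 6, where it equals 0.519, and first below .5 in year 7, where it equals 0.488, so m = 6:

```r
# Estimated median lifetime, plugging the rounded survival probabilities into
# the interpolation equation with m = 6.
round(6 + ((0.519 - 0.5) / (0.519 - 0.488)) * (7 - 6), 2)
#> [1] 6.61
```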
A valuable way of examining these statistics is to plot their trajectories over time. When examining plots like these, Singer and Willett (2003) recommend looking for patterns in the trajectories to answer questions like:

- What is the overall shape of the hazard function?
- In which time periods is risk high, and in which is it low?
- Are the time periods of elevated risk likely to affect large or small numbers of people, given the value of the survivor function?

```r
teachers_haz <- ggplot(teachers_lifetable, aes(x = year, y = haz.estimate)) +
  geom_line() +
  scale_x_continuous(breaks = 0:13) +
  coord_cartesian(xlim = c(0, 13), ylim = c(0, .15))

teachers_surv <- ggplot(teachers_lifetable, aes(x = year, y = surv.estimate)) +
  geom_line() +
  geom_segment(
    aes(xend = year, y = 0, yend = .5),
    data = teachers_median_lifetime,
    linetype = 2
  ) +
  geom_segment(
    aes(xend = 0, yend = .5),
    data = teachers_median_lifetime,
    linetype = 2
  ) +
  scale_x_continuous(breaks = 0:13) +
  scale_y_continuous(breaks = c(0, .5, 1)) +
  coord_cartesian(xlim = c(0, 13))

# Figure 10.1, page 333:
teachers_haz + teachers_surv + plot_layout(ncol = 1, axes = "collect")
```

## 10.3 Developing intuition about hazard functions, survivor functions, and median lifetimes

In Section 10.3 Singer and Willett (2003) examine and describe the estimated discrete-time hazard functions, survivor functions, and median lifetimes from four studies that differ in the type of target event, the metric used for clocking time, and the underlying profile of risk:

- cocaine_relapse_1: A person-level data frame with 104 rows and 4 columns, containing a subset of data from Hall, Havassy, and Wasserman (1990), who measured the number of weeks to relapse to cocaine use in a sample of 104 former addicts released from an in-patient treatment program. The former in-patients were followed for up to 12 weeks or until they used cocaine for 7 consecutive days.
- first_sex: A person-level data frame with 180 rows and 5 columns, containing a subset of data from Capaldi, Crosby, and Stoolmiller (1996), who measured the grade year of first sexual intercourse in a sample of 180 at-risk heterosexual adolescent males. The adolescent males were followed from Grade 7 through Grade 12 or until they reported having had sexual intercourse for the first time.
- suicide_ideation: A person-level data frame with 391 rows and 4 columns, containing a subset of data from Bolger and colleagues (1989) measuring the age of first suicide ideation in a sample of 391 undergraduate students aged 16 to 22. Age of first suicide ideation was measured with a two-item survey asking respondents "Have you ever thought of committing suicide?" and, if so, "At what age did the thought first occur to you?"
- congresswomen: A person-level data frame with 168 rows and 5 columns, containing data measuring how long the 168 women elected to the U.S. House of Representatives between 1919 and 1996 remained in office. Representatives were followed for up to eight terms or through 1998.

We can plot the discrete-time hazard functions, survivor functions, and median lifetimes for all four studies in a single call using the pmap() function from the purrr package.

Focusing on the overall shape of each discrete-time hazard function, and contextualizing that shape with the respective survival function, Singer and Willett (2003) make the following observations:

- cocaine_relapse_1: The discrete-time hazard function follows a monotonically decreasing pattern, peaking immediately after the "beginning of time" and decreasing thereafter, which is common when studying target events related to recurrence or relapse. This is reflected in the survival function, which drops rapidly in the early time periods and more slowly over time as the hazard decreases, indicating that the prevalence of relapse to cocaine use was greatest shortly after leaving treatment.
- first_sex: The discrete-time hazard function follows a monotonically increasing pattern, starting low immediately after the "beginning of time" and increasing thereafter, which is common when studying target events that are ultimately inevitable or near universal. This is reflected in the survival function, which drops slowly in the early time periods and more rapidly over time as the hazard increases, indicating that the prevalence of first sexual intercourse among those still at risk progressively increased as time passed.
- suicide_ideation: The discrete-time hazard function follows a nonmonotonic pattern with multiple distinctive peaks and troughs, which generally arises in studies of long duration whose data collection period is of sufficient length to capture reversals in otherwise seemingly monotonic trends. This is reflected in the survival function, which has multiple periods of slow and rapid decline, indicating that the prevalence of suicide ideation was low during childhood, peaked during adolescence, and declined to near early-childhood levels in late adolescence among those still at risk.
- congresswomen: The discrete-time hazard function follows a U-shaped pattern, with periods of high risk immediately after the "beginning of time" and again at the end of time, which is common when studying target events that have different causes near the beginning and end of time. This is reflected in the survival function, which drops rapidly in the early and late time periods and more slowly in between, indicating that the prevalence of leaving office was greatest shortly after the first election and again among those who had served for a long period of time and were still at risk.
```r
cocaine_relapse_1
#> # A tibble: 104 × 4
#>    id    weeks censor needle
#>  1 501       2      0      1
#>  2 502      12      1      0
#>  3 503       1      0      1
#>  4 505       9      0      1
#>  5 507       3      0      1
#>  6 508       2      0      1
#>  7 509      12      1      0
#>  8 510      12      1      1
#>  9 511       1      0      1
#> 10 512       2      0      1
#> # ℹ 94 more rows

first_sex
#> # A tibble: 180 × 5
#>    id    grade censor parental_transition parental_antisociality
#>  1 1         9      0                   0                  1.98
#>  2 2        12      1                   1                 -0.545
#>  3 3        12      1                   0                 -1.40
#>  4 5        12      0                   1                  0.974
#>  5 6        11      0                   0                 -0.636
#>  6 7         9      0                   1                 -0.243
#>  7 9        12      1                   0                 -0.869
#>  8 10       11      0                   0                  0.454
#>  9 11       12      1                   1                  0.802
#> 10 12       11      0                   1                 -0.746
#> # ℹ 170 more rows

suicide_ideation
#> # A tibble: 391 × 4
#>    id      age censor age_now
#>  1 1        16      0      18
#>  2 2        10      0      19
#>  3 3        16      0      19
#>  4 4        20      0      22
#>  5 6        15      0      22
#>  6 7        10      0      19
#>  7 8        22      1      22
#>  8 9        22      1      22
#>  9 10       15      0      20
#> 10 11       10      0      19
#> # ℹ 381 more rows

congresswomen
#> # A tibble: 168 × 5
#>    id    name                   terms censor democrat
#>  1 1     Abzug, Bella               3      0        1
#>  2 2     Andrews, Elizabeth         1      0        1
#>  3 3     Ashbrook, Jean             1      0        0
#>  4 4     Baker, Irene               1      0        0
#>  5 5     Bentley, Helen             5      0        0
#>  6 6     Blitch, Iris               4      0        1
#>  7 7     Boggs, Corinne             8      1        1
#>  8 8     Boland, Veronica Grace     1      0        1
#>  9 9     Bolton, Frances            8      1        0
#> 10 10    Bosone, Reva               2      0        1
#> # ℹ 158 more rows

study_plots <- pmap(
  list(
    list("cocaine_relapse_1", "first_sex", "suicide_ideation", "congresswomen"),
    list(cocaine_relapse_1, first_sex, suicide_ideation, congresswomen),
    list("weeks", "grade", "age", "terms"),
    list(0, 6, 5, 0)
  ),
  \(.title, .study, .time, .beginning) {
    # Get life table statistics.
    study_fit <- survfit(Surv(.study[[.time]], 1 - censor) ~ 1, data = .study)

    study_lifetable <- study_fit |>
      survfit0(start.time = .beginning) |>
      tidy() |>
      rename(surv.estimate = estimate) |>
      mutate(haz.estimate = if_else(time != .beginning, n.event / n.risk, NA))

    study_median_lifetime <- study_lifetable |>
      slice(max(which(surv.estimate >= .5)), min(which(surv.estimate <= .5))) |>
      mutate(m = c("before", "after")) |>
      select(m, time, surv = surv.estimate) |>
      pivot_wider(names_from = m, values_from = c(time, surv)) |>
      summarise(
        surv.estimate = .5,
        time = time_before +
          ((surv_before - .5) / (surv_before - surv_after)) * (time_after - time_before)
      )

    # Plot discrete-time hazard and survival functions.
    study_haz <- ggplot(study_lifetable, aes(x = time, y = haz.estimate)) +
      geom_line() +
      xlab(.time)

    study_surv <- ggplot(study_lifetable, aes(x = time, y = surv.estimate)) +
      geom_line() +
      geom_segment(
        aes(xend = time, y = 0, yend = .5),
        data = study_median_lifetime,
        linetype = 2
      ) +
      geom_segment(
        aes(xend = .beginning, yend = .5),
        data = study_median_lifetime,
        linetype = 2
      ) +
      xlab(.time)

    wrap_elements(panel = (study_haz | study_surv)) + ggtitle(.title)
  }
)

# Figure 10.2, page 340:
wrap_plots(study_plots, ncol = 1)
```
## 10.4 Quantifying the effects of sampling variation

In Section 10.4 Singer and Willett (2003) return to the teachers data to discuss standard errors for the estimated discrete-time hazard probabilities and survival probabilities, which can also be estimated directly from the life table.

Because each estimated discrete-time hazard probability is simply a sample proportion, its standard error in the jth time period can be estimated using the usual formula for the standard error of a proportion:

\[
se \big( \hat h(t_j) \big) = \sqrt{\frac{\hat h(t_j) \big( 1 - \hat h(t_j) \big)}{n \text{ at risk}_j}}.
\]

For risk sets greater than size 20, the standard error of the survival probability in the jth time period can be estimated using Greenwood's approximation:

\[
se \big( \hat S(t_j) \big) = \hat S(t_j) \sqrt{
  \frac{\hat h(t_1)}{n \text{ at risk}_1 \big( 1 - \hat h(t_1) \big)} +
  \frac{\hat h(t_2)}{n \text{ at risk}_2 \big( 1 - \hat h(t_2) \big)} + \cdots +
  \frac{\hat h(t_j)}{n \text{ at risk}_j \big( 1 - \hat h(t_j) \big)}
}.
\]

We can estimate these standard errors using the teachers_lifetable from Section 10.2.

```r
# Table 10.2, page 349:
teachers_lifetable |>
  filter(year != 0) |>
  mutate(
    haz.std.error = sqrt(haz.estimate * (1 - haz.estimate) / n.risk),
    surv.std.error = surv.estimate * sqrt(
      cumsum(haz.estimate / (n.risk * (1 - haz.estimate)))
    )
  ) |>
  select(year, n.risk, starts_with("haz"), starts_with("surv"))
#> # A tibble: 12 × 6
#>     year n.risk haz.estimate haz.std.error surv.estimate surv.std.error
#>    <dbl>  <dbl>        <dbl>         <dbl>         <dbl>          <dbl>
#>  1     1   3941       0.116        0.00510         0.884        0.00510
#>  2     2   3485       0.110        0.00530         0.787        0.00652
#>  3     3   3101       0.116        0.00575         0.696        0.00733
#>  4     4   2742       0.108        0.00592         0.621        0.00773
#>  5     5   2447       0.0891       0.00576         0.566        0.00790
#>  6     6   2229       0.0825       0.00583         0.519        0.00796
#>  7     7   2045       0.0601       0.00526         0.488        0.00796
#>  8     8   1642       0.0481       0.00528         0.464        0.00800
#>  9     9   1256       0.0422       0.00567         0.445        0.00811
#> 10    10    948       0.0369       0.00612         0.428        0.00827
#> 11    11    648       0.0247       0.00610         0.418        0.00848
#> 12    12    391       0.0128       0.00568         0.412        0.00870
```
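Both standard error formulas are simple enough to check by hand against the first row of Table 10.2 (plain arithmetic with the rounded year 1 values; not part of the original code):

```r
# Standard error of the year-1 hazard estimate, the usual formula for the
# standard error of a proportion.
round(sqrt(0.116 * (1 - 0.116) / 3941), 5)
#> [1] 0.0051

# Greenwood's approximation for the year-1 survival probability; in the first
# time period the sum under the square root has a single term.
round(0.884 * sqrt(0.116 / (3941 * (1 - 0.116))), 5)
#> [1] 0.0051
```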
## 10.5 A simple and useful strategy for constructing the life table

In Section 10.5 Singer and Willett (2003) introduce the person-period format for event occurrence data, demonstrating how it can be used to construct the life table "by hand" from the person-level teachers data set.

### The person-level data set

In the person-level format for event occurrence data, each person has one row of data, with columns for their event time and censorship status and, optionally, a participant identifier variable and any other variables of interest. This is demonstrated by the teachers data set used throughout this chapter: a person-level data frame with 3941 rows and 3 columns, where id is the teacher ID, years is the number of years between a teacher's dates of hire and departure from the Michigan public schools, and censor is the censoring status.

Note that, unlike when modelling change, the person-level data set does not contain multiple columns for each time period; thus, as we demonstrate below, a new strategy is needed to convert the person-level data set into a person-period data set. Additionally, and also unlike when modelling change, the person-level data set is often useful when analyzing event occurrence, as demonstrated by several examples in the current and previous chapters.

### The person-period data set

In the person-period format for event occurrence data, each person has one row of data for every time period they were at risk, with a participant identifier variable for each person and an event-indicator variable for each time period. We can use the reframe() function from the dplyr package to convert the person-level data set into a person-period data set. The reframe() function works similarly to dplyr's summarise() function, except that it can return an arbitrary number of rows per group. Here we take advantage of this property to add rows for each time period an individual was at risk, using the information stored in the person-level data set to identify whether the event occurred in an individual's last period, given their censorship status.

Following similar logic, we can use the summarise() function from the dplyr package to convert the person-period data set back into a person-level data set.
The difference between the person-level and person-period formats is best seen by examining the data from a subset of individuals with different (censored) event times.

```r
teachers_pp <- teachers |>
  group_by(id) |>
  reframe(
    year = 1:years,
    event = if_else(year == years & censor == 0, true = 1, false = 0)
  )

teachers_pp
#> # A tibble: 24,875 × 3
#>    id     year event
#>    <fct> <int> <dbl>
#>  1 1         1     1
#>  2 2         1     0
#>  3 2         2     1
#>  4 3         1     1
#>  5 4         1     1
#>  6 5         1     0
#>  7 5         2     0
#>  8 5         3     0
#>  9 5         4     0
#> 10 5         5     0
#> # ℹ 24,865 more rows

teachers_pl <- teachers_pp |>
  group_by(id) |>
  summarise(
    years = max(year),
    censor = if_else(all(event == 0), true = 1, false = 0)
  )

teachers_pl
#> # A tibble: 3,941 × 3
#>    id    years censor
#>    <fct> <int>  <dbl>
#>  1 1         1      0
#>  2 2         2      0
#>  3 3         1      0
#>  4 4         1      0
#>  5 5        12      1
#>  6 6         1      0
#>  7 7        12      1
#>  8 8         1      0
#>  9 9         2      0
#> 10 10        2      0
#> # ℹ 3,931 more rows

# Figure 10.4, page 353:
filter(teachers_pl, id %in% c(20, 126, 129))
#> # A tibble: 3 × 3
#>   id    years censor
#>   <fct> <int>  <dbl>
#> 1 20        3      0
#> 2 126      12      0
#> 3 129      12      1

teachers_pp |>
  filter(id %in% c(20, 126, 129)) |>
  print(n = 27)
#> # A tibble: 27 × 3
#>    id     year event
#>    <fct> <int> <dbl>
#>  1 20        1     0
#>  2 20        2     0
#>  3 20        3     1
#>  4 126       1     0
#>  5 126       2     0
#>  6 126       3     0
#>  7 126       4     0
#>  8 126       5     0
#>  9 126       6     0
#> 10 126       7     0
#> 11 126       8     0
#> 12 126       9     0
#> 13 126      10     0
#> 14 126      11     0
#> 15 126      12     1
#> 16 129       1     0
#> 17 129       2     0
#> 18 129       3     0
#> 19 129       4     0
#> 20 129       5     0
#> 21 129       6     0
#> 22 129       7     0
#> 23 129       8     0
#> 24 129       9     0
#> 25 129      10     0
#> 26 129      11     0
#> 27 129      12     0
```
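Incidentally, the same person-level to person-period conversion can be written with tidyr's uncount() function, which repeats each person-level row once per period at risk. This is just an alternative sketch of the reframe() approach above, not the strategy used by the original code:

```r
# Repeat each person-level row `years` times; `.id` numbers the copies
# 1, 2, ..., years, giving the year variable directly.
teachers_pp_2 <- teachers |>
  uncount(years, .remove = FALSE, .id = "year") |>
  mutate(event = if_else(year == years & censor == 0, 1, 0)) |>
  select(id, year, event)
# teachers_pp_2 contains the same person-period records as teachers_pp.
```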
### Using the person-period data set to construct the life table

The life table can be constructed from the person-period data set through a cross-tabulation of the time period and event-indicator variables. This can be accomplished with a standard df |> group_by(...) |> summarise(...) statement from the dplyr package, counting the number of individuals who were at risk, who experienced the target event, and who did not (those censored or still remaining at risk) in each time period. Then the statistics summarizing the event history information of the life table can be estimated using the methods demonstrated in Section 10.2.

```r
# Table 10.3, page 355:
teachers_pp |>
  group_by(year) |>
  summarise(
    n.risk = n(),
    n.event = sum(event == 1),
    # Note: this counts all records with event == 0 in the period, so for all
    # but the final year it includes those who simply remained at risk.
    n.censor = sum(event == 0),
    haz.estimate = n.event / n.risk
  )
#> # A tibble: 12 × 5
#>     year n.risk n.event n.censor haz.estimate
#>    <int>  <int>   <int>    <int>        <dbl>
#>  1     1   3941     456     3485       0.116
#>  2     2   3485     384     3101       0.110
#>  3     3   3101     359     2742       0.116
#>  4     4   2742     295     2447       0.108
#>  5     5   2447     218     2229       0.0891
#>  6     6   2229     184     2045       0.0825
#>  7     7   2045     123     1922       0.0601
#>  8     8   1642      79     1563       0.0481
#>  9     9   1256      53     1203       0.0422
#> 10    10    948      35      913       0.0369
#> 11    11    648      16      632       0.0247
#> 12    12    391       5      386       0.0128
```
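To close the loop with Section 10.2, here is a small check that is not in the original code: within each year every record in the person-period data set is at risk, so mean(event) reproduces the hazard estimates, and the cumulative product of their complements reproduces the survival estimates.

```r
# Rebuild the hazard and survivor functions from the person-period data set;
# the resulting columns match haz.estimate and surv.estimate in Table 10.1.
teachers_pp |>
  group_by(year) |>
  summarise(haz.estimate = mean(event)) |>
  mutate(surv.estimate = cumprod(1 - haz.estimate))
```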
# Chapter 11: Fitting basic discrete-time hazard models

## 11.1 Toward a Statistical Model for Discrete-Time Hazard

Several examples in this chapter rely on the following:

```r
first_sex_fit <- survfit(Surv(grade, 1 - censor) ~ 1, data = first_sex)

first_sex_pt <- c(0, 1) |>
  map_dfr(
    \(.x) {
      first_sex_fit_subset <- update(
        first_sex_fit, subset = (parental_transition == .x)
      )

      first_sex_fit_subset |>
        survfit0(start.time = 6) |>
        tidy() |>
        rename(survival_probability = estimate) |>
        mutate(
          hazard_probability = n.event / n.risk,
          odds = hazard_probability / (1 - hazard_probability),
          log_odds = log(odds)
        ) |>
        select(-starts_with("conf"), -std.error) |>
        rename(grade = time) |>
        pivot_longer(
          cols = c(survival_probability, hazard_probability, odds, log_odds),
          values_to = "estimate"
        ) |>
        # The figure doesn't include data for grade 6 in the hazard function.
        filter(
          !(name %in% c("hazard_probability", "odds", "log_odds") & grade == 6)
        )
    },
    .id = "parental_transition"
  )
```

Figure 11.1, page 359:

```r
first_sex_pt |>
  filter(name %in% c("survival_probability", "hazard_probability")) |>
  ggplot(aes(x = grade, y = estimate, colour = parental_transition)) +
  geom_hline(
    aes(yintercept = .5),
    data = tibble(name = "survival_probability"),
    alpha = .25,
    linetype = 2
  ) +
  geom_line() +
  scale_x_continuous(breaks = 6:12) +
  coord_cartesian(xlim = c(6, 12)) +
  facet_wrap(vars(name), ncol = 1, scales = "free_y") +
  ggh4x::facetted_pos_scales(
    y = list(
      name == "hazard_probability" ~ scale_y_continuous(limits = c(0, .5)),
      name == "survival_probability" ~
        scale_y_continuous(breaks = c(0, .5, 1), limits = c(0, 1))
    )
  )
```

Table 11.1, page 360:

```r
# First two sections of the table:
first_sex_pt |>
  filter(grade != 6, !(name %in% c("odds", "log_odds"))) |>
  pivot_wider(names_from = name, values_from = estimate) |>
  select(everything(), -n.censor, hazard_probability, survival_probability)
#> # A tibble: 12 × 6
#>    parental_transition grade n.risk n.event survival_probability
#>    <chr>               <dbl>  <dbl>   <dbl>                <dbl>
#>  1 1                       7     72       2                0.972
#>  2 1                       8     70       2                0.944
#>  3 1                       9     68       8                0.833
#>  4 1                      10     60       8                0.722
#>  5 1                      11     52      10                0.583
#>  6 1                      12     42       8                0.472
#>  7 2                       7    108      13                0.880
#>  8 2                       8     95       5                0.833
#>  9 2                       9     90      16                0.685
#> 10 2                      10     74      21                0.491
#> 11 2                      11     53      15                0.352
#> 12 2                      12     38      18                0.185
#> # ℹ 1 more variable: hazard_probability <dbl>

# Last section:
first_sex_fit |>
  tidy() |>
  rename(survival_probability = estimate) |>
  mutate(
    hazard_probability = n.event / n.risk,
    .before = survival_probability
  ) |>
  select(-starts_with("conf"), -std.error, -n.censor) |>
  rename(grade = time)
#> # A tibble: 6 × 5
#>   grade n.risk n.event hazard_probability survival_probability
#>   <dbl>  <dbl>   <dbl>              <dbl>                <dbl>
#> 1     7    180      15             0.0833                0.917
#> 2     8    165       7             0.0424                0.878
#> 3     9    158      24             0.152                 0.744
#> 4    10    134      29             0.216                 0.583
#> 5    11    105      25             0.238                 0.444
#> 6    12     80      26             0.325                 0.3
```

Figure 11.2, page 363:

```r
first_sex_pt |>
  filter(name %in% c("hazard_probability", "odds", "log_odds")) |>
  mutate(
    name = factor(name, levels = c("hazard_probability", "odds", "log_odds"))
  ) |>
  ggplot(aes(x = grade, y = estimate, colour = parental_transition)) +
  geom_line() +
  scale_x_continuous(breaks = 6:12) +
  coord_cartesian(xlim = c(6, 12)) +
  facet_wrap(vars(name), ncol = 1, scales = "free_y") +
  ggh4x::facetted_pos_scales(
    y = list(
      name %in% c("hazard_probability", "odds") ~
        scale_y_continuous(limits = c(0, 1)),
      name == "log_odds" ~ scale_y_continuous(limits = c(-4, 0))
    )
  )
```

Figure 11.3, page 366:

```r
# Transform to person-period format.
first_sex_pp <- first_sex |>
  rename(grades = grade) |>
  reframe(
    grade = 7:max(grades),
    event = if_else(grade == grades & censor == 0, 1, 0),
    parental_transition,
    parental_antisociality,
    .by = id
  )

# Fit models for each panel.
first_sex_fit_11.3a <- glm(
  event ~ parental_transition, family = "binomial", data = first_sex_pp
)
first_sex_fit_11.3b <- update(first_sex_fit_11.3a, . ~ . + grade)
first_sex_fit_11.3c <- update(first_sex_fit_11.3a, . ~ . + factor(grade))
```
```r
# Plot:
map_df(
  list(
    a = first_sex_fit_11.3a,
    b = first_sex_fit_11.3b,
    c = first_sex_fit_11.3c
  ),
  \(.x) augment(.x, newdata = first_sex_pp),
  .id = "model"
) |>
  ggplot(aes(x = grade, y = .fitted, colour = factor(parental_transition))) +
  geom_line() +
  geom_point(
    aes(y = estimate),
    data = first_sex_pt |>
      mutate(parental_transition = as.numeric(parental_transition) - 1) |>
      filter(name == "log_odds")
  ) +
  coord_cartesian(ylim = c(-4, 0)) +
  facet_wrap(vars(model), ncol = 1, labeller = label_both) +
  labs(y = "logit(hazard)", colour = "parental_transition")
```

## 11.2 A Formal Representation of the Population Discrete-Time Hazard Model

Figure 11.4, page 374:

```r
# Panel A:
first_sex_fit_11.3c |>
  augment(newdata = first_sex_pp) |>
  ggplot(aes(x = grade, y = .fitted, colour = factor(parental_transition))) +
  geom_line() +
  coord_cartesian(ylim = c(-4, 0))

# Panel B:
first_sex_fit_11.4b <- update(
  first_sex_fit_11.3c, . ~ . + parental_transition * factor(grade)
)

first_sex_fit_11.4b |>
  augment(newdata = first_sex_pp) |>
  ggplot(aes(x = grade, y = exp(.fitted), colour = factor(parental_transition))) +
  geom_line() +
  coord_cartesian(ylim = c(0, 1))

# Panel C:
first_sex_fit_11.4b |>
  augment(newdata = first_sex_pp, type.predict = "response") |>
  ggplot(aes(x = grade, y = .fitted, colour = factor(parental_transition))) +
  geom_line()
```

## 11.3 Fitting a Discrete-Time Hazard Model to Data

Figure 11.5; Table 11.3, page 386:

```r
model_A <- glm(
  event ~ factor(grade) - 1, family = "binomial", data = first_sex_pp
)
model_B <- update(model_A, . ~ . + parental_transition)
model_C <- update(model_A, . ~ . + parental_antisociality)
model_D <- update(model_B, . ~ . + parental_antisociality)

anova(model_B)
#> Analysis of Deviance Table
#> 
#> Model: binomial, link: logit
#> 
#> Response: event
#> 
#> Terms added sequentially (first to last)
#> 
#> 
#>                     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
#> NULL                                  822    1139.53              
#> factor(grade)        6   487.58       816     651.96 < 2.2e-16 ***
#> parental_transition  1    17.29       815     634.66 3.203e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(model_C)
#> Analysis of Deviance Table
#> 
#> Model: binomial, link: logit
#> 
#> Response: event
#> 
#> Terms added sequentially (first to last)
#> 
#> 
#>                        Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
#> NULL                                     822    1139.53              
#> factor(grade)           6   487.58       816     651.96 < 2.2e-16 ***
#> parental_antisociality  1    14.79       815     637.17 0.0001204 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Deviance tests are sequential so the order of terms matters. To test both
# parental_transition and parental_antisociality, the model needs to be fit
# twice, once with each as the last term.
anova(update(model_C, . ~ . + parental_transition))
#> Analysis of Deviance Table
#> 
#> Model: binomial, link: logit
#> 
#> Response: event
#> 
#> Terms added sequentially (first to last)
#> 
#> 
#>                        Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
#> NULL                                     822    1139.53              
#> factor(grade)           6   487.58       816     651.96 < 2.2e-16 ***
#> parental_antisociality  1    14.79       815     637.17 0.0001204 ***
#> parental_transition     1     8.02       814     629.15 0.0046222 ** 
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
```r
anova(model_D)
#> Analysis of Deviance Table
#> 
#> Model: binomial, link: logit
#> 
#> Response: event
#> 
#> Terms added sequentially (first to last)
#> 
#> 
#>                        Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
#> NULL                                     822    1139.53              
#> factor(grade)           6   487.58       816     651.96 < 2.2e-16 ***
#> parental_transition     1    17.29       815     634.66 3.203e-05 ***
#> parental_antisociality  1     5.51       814     629.15   0.01886 *  
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

## 11.4 Interpreting Parameter Estimates

Table 11.4, page 388:

```r
model_A |>
  tidy() |>
  select(term, estimate) |>
  mutate(
    odds = exp(estimate),
    hazard = 1 / (1 + exp(-estimate))
  )
#> # A tibble: 6 × 4
#>   term            estimate   odds hazard
#>   <chr>              <dbl>  <dbl>  <dbl>
#> 1 factor(grade)7    -2.40  0.0909 0.0833
#> 2 factor(grade)8    -3.12  0.0443 0.0424
#> 3 factor(grade)9    -1.72  0.179  0.152 
#> 4 factor(grade)10   -1.29  0.276  0.216 
#> 5 factor(grade)11   -1.16  0.313  0.238 
#> 6 factor(grade)12   -0.731 0.481  0.325
```
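The estimate-to-odds-to-hazard conversions in Table 11.4 are easy to verify by hand (plain arithmetic on the rounded grade 12 estimate; not part of the original code):

```r
# Odds of first intercourse in grade 12: exponentiate the logit estimate.
round(exp(-0.731), 3)
#> [1] 0.481

# Hazard probability in grade 12: inverse-logit of the same estimate.
round(1 / (1 + exp(0.731)), 3)
#> [1] 0.325
```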
model_B_tidy |> pivot_longer(cols = .fitted:survival) |> ggplot(aes(x = grade, y = value, colour = factor(parental_transition))) + geom_line() + facet_wrap(vars(name), ncol = 1, scales = \"free_y\") prototypical_males <- tibble( id = rep(1:6, times = length(7:12)), expand_grid( grade = 7:12, parental_transition = c(0, 1), parental_antisociality = -1:1 ) ) prototypical_first_sex <- tibble( log_odds = predict( model_D, prototypical_males ), hazard = 1 / (1 + exp(-log_odds)) ) grade_six <- tibble( id = 1:6, grade = 6, expand_grid( parental_transition = c(0, 1), parental_antisociality = -1:1 ), log_odds = NA, hazard = NA, survival = 1 ) prototypical_males |> bind_cols(prototypical_first_sex) |> mutate(survival = cumprod(1 - hazard), .by = id) |> add_row(grade_six) |> pivot_longer(cols = c(hazard, survival)) |> ggplot(aes(x = grade, y = value, group = id)) + geom_line( aes( colour = factor(parental_antisociality), linetype = factor(parental_transition) ) ) + scale_colour_grey(start = 0, end = 0.75) + facet_wrap( vars(name), ncol = 1, scales = \"free_y\" ) #> Warning: Removed 6 rows containing missing values or values outside the scale range #> (`geom_line()`)."},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-12.html","id":"alternative-specifications-for-the-main-effect-of-time","dir":"Articles","previous_headings":"","what":"12.1 Alternative Specifications for the “Main Effect of TIME”","title":"Chapter 12: Extending the discrete-time hazard model","text":"Table 12.2, page 413: Figure 12.1, page 414:","code":"# Convert to person-period format tenure_pp <- tenure |> reframe( year = 1:max(years), event = if_else(year == years & censor == 0, 1, 0), .by = id ) |> mutate( temp_year = year, temp_dummy = 1 ) |> pivot_wider( names_from = temp_year, names_prefix = \"year_\", values_from = temp_dummy, values_fill = 0 ) # Fit models tenure_fit_general <- glm( event ~ factor(year), family = \"binomial\", data = tenure_pp ) tenure_fit_constant <- glm( event ~ 1, family = \"binomial\", data = tenure_pp ) tenure_fit_linear <- update(tenure_fit_constant, . ~ year) tenure_fit_quadratic <- update(tenure_fit_linear, . ~ . + I(year^2)) tenure_fit_cubic <- update(tenure_fit_quadratic, . ~ . + I(year^3)) tenure_fit_order_4 <- update(tenure_fit_cubic, . ~ . + I(year^4)) tenure_fit_order_5 <- update(tenure_fit_order_4, . ~ . + I(year^5)) # Compare anova( tenure_fit_constant, tenure_fit_linear, tenure_fit_quadratic, tenure_fit_cubic, tenure_fit_order_4, tenure_fit_order_5 ) #> Analysis of Deviance Table #> #> Model 1: event ~ 1 #> Model 2: event ~ year #> Model 3: event ~ year + I(year^2) #> Model 4: event ~ year + I(year^2) + I(year^3) #> Model 5: event ~ year + I(year^2) + I(year^3) + I(year^4) #> Model 6: event ~ year + I(year^2) + I(year^3) + I(year^4) + I(year^5) #> Resid. Df Resid. Dev Df Deviance Pr(>Chi) #> 1 1473 1037.57 #> 2 1472 867.46 1 170.103 < 2.2e-16 *** #> 3 1471 836.30 1 31.158 2.379e-08 *** #> 4 1470 833.17 1 3.132 0.07679 . #> 5 1469 832.74 1 0.430 0.51208 #> 6 1468 832.73 1 0.011 0.91831 #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 tenure_fit_trajectories <- map_df( list( constant = tenure_fit_constant, linear = tenure_fit_linear, quadratic = tenure_fit_quadratic, cubic = tenure_fit_cubic, general = tenure_fit_general ), \\(.x) { augment(.x, newdata = tibble(year = 1:9)) }, .id = \"model\" ) tenure_fit_trajectories |> mutate( model = factor( model, levels = c(\"constant\", \"linear\", \"quadratic\", \"cubic\", \"general\") ), hazard = if_else( model %in% c(\"quadratic\", \"general\"), 1 / (1 + exp(-.fitted)), NA ), survival = if_else( model %in% c(\"quadratic\", \"general\"), cumprod(1 - hazard), NA ), .by = model ) |> rename(logit_hazard = .fitted) |> pivot_longer(cols = logit_hazard:survival, names_to = \"estimate\") |> mutate(estimate = factor( estimate, levels = c(\"logit_hazard\", \"hazard\", \"survival\")) ) |> ggplot(aes(x = year, y = value, colour = model)) + geom_line() + scale_color_brewer(type = \"qual\", palette = \"Dark2\") + scale_x_continuous(breaks = 1:9) + facet_wrap(vars(estimate), scales = \"free_y\", labeller = label_both) #> Warning: Removed 54 rows containing missing values or values outside the scale range #> (`geom_line()`)."},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-12.html","id":"using-the-complementary-log-log-link-to-specify-a-discrete-time-hazard-model","dir":"Articles","previous_headings":"","what":"12.2 Using the Complementary Log-Log Link to Specify a Discrete-Time Hazard Model","title":"Chapter 12: Extending the discrete-time hazard model","text":"Figure 12.2: Figure 12.3, page 423: Table 12.3, page 424:","code":"first_sex_pp <- first_sex |> rename(grades = grade) |> reframe( grade = 7:max(grades), event = if_else(grade == grades & censor == 0, 1, 0), parental_transition, parental_antisociality, .by = id ) # The nested map_() is used here so we can get an ID column for both the # link function and the subset. 
map_dfr( list(logit = \"logit\", cloglog = \"cloglog\"), \\(.x) { map_dfr( list(`0` = 0, `1` = 1), \\(.y) { first_sex_fit <- glm( event ~ factor(grade), family = binomial(link = .x), data = first_sex_pp, subset = c(parental_transition == .y) ) augment(first_sex_fit, newdata = tibble(grade = 7:12)) }, .id = \"parental_transition\" ) }, .id = \"link\" ) |> ggplot( aes(x = grade, y = .fitted, colour = parental_transition, linetype = link) ) + geom_line() map_dfr( list(cloglog = \"cloglog\", logit = \"logit\"), \\(.x) { first_sex_fit <- glm( event ~ -1 + factor(grade) + parental_transition, family = binomial(link = .x), data = first_sex_pp ) first_sex_fit |> tidy() |> select(term, estimate) |> mutate( base_hazard = case_when( .x == \"logit\" & term != \"parental_transition\" ~ 1 / (1 + exp(-estimate)), .x == \"cloglog\" & term != \"parental_transition\" ~ 1 - exp(-exp(estimate)) ) ) }, .id = \"link\" ) |> pivot_wider(names_from = link, values_from = c(estimate, base_hazard)) #> # A tibble: 7 × 5 #> term estimate_cloglog estimate_logit base_hazard_cloglog base_hazard_logit #> #> 1 factor(… -2.97 -2.99 0.0498 0.0477 #> 2 factor(… -3.66 -3.70 0.0254 0.0241 #> 3 factor(… -2.32 -2.28 0.0940 0.0927 #> 4 factor(… -1.90 -1.82 0.139 0.139 #> 5 factor(… -1.76 -1.65 0.158 0.161 #> 6 factor(… -1.34 -1.18 0.230 0.235 #> 7 parenta… 0.785 0.874 NA NA"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-12.html","id":"time-varying-predictors","dir":"Articles","previous_headings":"","what":"12.3 Time-Varying Predictors","title":"Chapter 12: Extending the discrete-time hazard model","text":"Figure 12.4, page 432: Figure 12.5, page 437:","code":"first_depression_fit <- glm( depressive_episode ~ poly(I(period - 18), 3, raw = TRUE) + parental_divorce, family = binomial(link = \"logit\"), data = first_depression_1 ) # When a predictor enters the model as part of a matrix of covariates, such as # with stats::poly(), it is represented in augment() as a matrix column. A simple # workaround to get the predictor on its original scale as a vector is to pass # the original data to augment(). first_depression_predictions <- first_depression_fit |> augment(data = first_depression_1) |> mutate(hazard = 1 / (1 + exp(-.fitted))) # Proportions of the risk set at each age who experienced an initial depressive # episode at that age, as function of their parental divorce status at that age. first_depression_proportions <- first_depression_1 |> group_by(period, parental_divorce) |> summarise( total = n(), event = sum(depressive_episode), proportion = event / total, proportion = if_else(proportion == 0, NA, proportion), logit = log(proportion / (1 - proportion)) ) #> `summarise()` has grouped output by 'period'. You can override using the #> `.groups` argument. # Top plot ggplot(mapping = aes(x = period, colour = factor(parental_divorce))) + geom_line( aes(y = hazard), data = first_depression_predictions ) + geom_point( aes(y = proportion), data = first_depression_proportions ) + scale_x_continuous(breaks = seq(0, 40, by = 5), limits = c(0, 40)) + scale_y_continuous(limits = c(0, 0.06)) #> Warning: Removed 14 rows containing missing values or values outside the scale range #> (`geom_point()`). 
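# Added aside (not in the original article): the inverse-logit transform
# 1 / (1 + exp(-x)) used above to convert fitted log-odds into hazard
# probabilities is also available in base R as plogis(), e.g.:
all.equal(plogis(-2.5), 1 / (1 + exp(-2.5)))
#> [1] TRUE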
# Bottom plot ggplot(mapping = aes(x = period, colour = factor(parental_divorce))) + geom_line( aes(y = .fitted), data = first_depression_predictions ) + geom_point( aes(y = logit), data = first_depression_proportions ) + scale_x_continuous(breaks = seq(0, 40, by = 5), limits = c(0, 40)) + scale_y_continuous(breaks = seq(-8, -2, by = 1), limits = c(-8, -2)) #> Warning: Removed 14 rows containing missing values or values outside the scale range #> (`geom_point()`). first_depression_fit_2 <- update(first_depression_fit, . ~ . + female) first_depression_fit_2 |> augment( newdata = expand_grid( period = 4:39, parental_divorce = c(0, 1), female = c(0, 1) ) ) |> mutate( female = factor(female), parental_divorce = factor(parental_divorce), hazard = 1 / (1 + exp(-.fitted)), survival = cumprod(1 - hazard), .by = c(female, parental_divorce) ) |> pivot_longer(cols = c(hazard, survival), names_to = \"estimate\") |> ggplot(aes(x = period, y = value, linetype = female, colour = parental_divorce)) + geom_line() + facet_wrap(vars(estimate), ncol = 1, scales = \"free_y\") + scale_x_continuous(breaks = seq(0, 40, by = 5), limits = c(0, 40)) + ggh4x::facetted_pos_scales( y = list( estimate == \"hazard\" ~ scale_y_continuous(limits = c(0, .04)), estimate == \"survival\" ~ scale_y_continuous(limits = c(0, 1)) ) )"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-12.html","id":"the-linear-additivity-assumption-uncovering-violations-and-simple-solutions","dir":"Articles","previous_headings":"","what":"12.4 The Linear Additivity Assumption: Uncovering Violations and Simple Solutions","title":"Chapter 12: Extending the discrete-time hazard model","text":"Figure 12.6, page 445: Table 12.4, page 449:","code":"# Raw first_arrest |> group_by(period, abused, black) |> summarise( total = n(), event = sum(event), proportion = event / total, proportion = if_else(proportion == 0, NA, proportion), logit = log(proportion / (1 - proportion)) ) |> ungroup() |> mutate(across(c(abused, black), factor)) |> na.omit() |> ggplot(aes(x = period, y = logit, colour = abused, group = abused)) + geom_line() + scale_x_continuous(breaks = 7:19, limits = c(7, 19)) + scale_y_continuous(limits = c(-7, -2)) + facet_wrap(vars(black), labeller = label_both) #> `summarise()` has grouped output by 'period', 'abused'. You can override using #> the `.groups` argument. # Model first_arrest_fit <- glm( event ~ factor(period) + abused + black + abused:black, family = binomial(link = \"logit\"), data = first_arrest ) first_arrest_fit |> augment( newdata = expand_grid(period = 8:18, abused = c(0, 1), black = c(0, 1)) ) |> ggplot( aes( x = period, y = .fitted, colour = factor(abused), linetype = factor(black) ) ) + geom_line() + scale_x_continuous(breaks = 7:19, limits = c(7, 19)) + scale_y_continuous(limits = c(-8, -2)) model_A <- update(first_depression_fit_2, . ~ . + siblings) model_B <- update( first_depression_fit_2, . ~ . + between(siblings, 1, 2) + between(siblings, 3, 4) + between(siblings, 5, 6) + between(siblings, 7, 8) + between(siblings, 9, Inf) ) model_C <- update(first_depression_fit_2, . ~ . 
+ bigfamily) tidy(model_A) #> # A tibble: 7 × 5 #> term estimate std.error statistic p.value #> #> 1 (Intercept) -4.36 0.122 -35.8 2.23e-281 #> 2 poly(I(period - 18), 3, raw = TRUE)1 0.0611 0.0117 5.24 1.64e- 7 #> 3 poly(I(period - 18), 3, raw = TRUE)2 -0.00731 0.00122 -5.97 2.34e- 9 #> 4 poly(I(period - 18), 3, raw = TRUE)3 0.000182 0.0000790 2.30 2.14e- 2 #> 5 parental_divorce 0.373 0.162 2.29 2.18e- 2 #> 6 female 0.559 0.109 5.10 3.34e- 7 #> 7 siblings -0.0814 0.0223 -3.66 2.57e- 4 tidy(model_B) #> # A tibble: 11 × 5 #> term estimate std.error statistic p.value #> #> 1 (Intercept) -4.50 0.207 -21.8 4.22e-105 #> 2 poly(I(period - 18), 3, raw = TRUE)1 0.0615 0.0117 5.27 1.37e- 7 #> 3 poly(I(period - 18), 3, raw = TRUE)2 -0.00729 0.00122 -5.96 2.56e- 9 #> 4 poly(I(period - 18), 3, raw = TRUE)3 0.000181 0.0000790 2.30 2.17e- 2 #> 5 parental_divorce 0.373 0.162 2.29 2.18e- 2 #> 6 female 0.560 0.110 5.11 3.24e- 7 #> 7 between(siblings, 1, 2)TRUE 0.0209 0.198 0.106 9.16e- 1 #> 8 between(siblings, 3, 4)TRUE 0.0108 0.210 0.0512 9.59e- 1 #> 9 between(siblings, 5, 6)TRUE -0.494 0.255 -1.94 5.22e- 2 #> 10 between(siblings, 7, 8)TRUE -0.775 0.344 -2.26 2.41e- 2 #> 11 between(siblings, 9, Inf)TRUE -0.658 0.344 -1.91 5.56e- 2 tidy(model_C) #> # A tibble: 7 × 5 #> term estimate std.error statistic p.value #> #> 1 (Intercept) -4.48 0.109 -41.2 0 #> 2 poly(I(period - 18), 3, raw = TRUE)1 0.0614 0.0117 5.27 1.40e-7 #> 3 poly(I(period - 18), 3, raw = TRUE)2 -0.00729 0.00122 -5.96 2.54e-9 #> 4 poly(I(period - 18), 3, raw = TRUE)3 0.000182 0.0000790 2.30 2.15e-2 #> 5 parental_divorce 0.371 0.162 2.29 2.22e-2 #> 6 female 0.558 0.109 5.10 3.44e-7 #> 7 bigfamily -0.611 0.145 -4.22 2.39e-5"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-12.html","id":"the-proportionality-assumption-uncovering-violations-and-simple-solutions","dir":"Articles","previous_headings":"","what":"12.5 The proportionality assumption: Uncovering violations and simple solutions","title":"Chapter 12: Extending the discrete-time hazard model","text":"Figure 12.8, page 458: Table 12.5, page 459:","code":"# Raw math_dropout |> group_by(term, woman) |> summarise( total = n(), event = sum(event), proportion = event / total, proportion = if_else(proportion == 0, NA, proportion), logit = log(proportion / (1 - proportion)) ) |> ungroup() |> mutate(across(c(woman), factor)) |> na.omit() |> ggplot(aes(x = term, y = logit, colour = woman)) + geom_line() #> `summarise()` has grouped output by 'term'. You can override using the #> `.groups` argument. # Models model_A <- glm( event ~ -1 + factor(term) + woman, family = binomial(link = \"logit\"), data = math_dropout ) model_B <- glm( event ~ -1 + factor(term) + factor(term):woman, family = binomial(link = \"logit\"), data = math_dropout ) model_C <- update(model_A, . ~ . 
+ woman:I(term - 1)) map_df( list(model_A = model_A, model_B = model_B, model_C = model_C), \\(.x) { .x |> augment(newdata = expand_grid(term = 1:5, woman = c(0, 1))) |> mutate(hazard = 1 / (1 + exp(-.fitted))) }, .id = \"model\" ) |> ggplot(aes(x = term, y = hazard, colour = factor(woman))) + geom_line() + facet_wrap(vars(model)) tidy(model_A) #> # A tibble: 6 × 5 #> term estimate std.error statistic p.value #> #> 1 factor(term)1 -2.13 0.0567 -37.6 0 #> 2 factor(term)2 -0.942 0.0479 -19.7 3.14e- 86 #> 3 factor(term)3 -1.45 0.0634 -22.8 1.66e-115 #> 4 factor(term)4 -0.618 0.0757 -8.16 3.42e- 16 #> 5 factor(term)5 -0.772 0.143 -5.40 6.54e- 8 #> 6 woman 0.379 0.0501 7.55 4.33e- 14 tidy(model_B) #> # A tibble: 10 × 5 #> term estimate std.error statistic p.value #> #> 1 factor(term)1 -2.01 0.0715 -28.1 1.40e-173 #> 2 factor(term)2 -0.964 0.0585 -16.5 5.98e- 61 #> 3 factor(term)3 -1.48 0.0847 -17.5 1.45e- 68 #> 4 factor(term)4 -0.710 0.101 -7.05 1.81e- 12 #> 5 factor(term)5 -0.869 0.191 -4.56 5.23e- 6 #> 6 factor(term)1:woman 0.157 0.0978 1.60 1.09e- 1 #> 7 factor(term)2:woman 0.419 0.0792 5.28 1.27e- 7 #> 8 factor(term)3:woman 0.441 0.116 3.81 1.42e- 4 #> 9 factor(term)4:woman 0.571 0.145 3.95 7.86e- 5 #> 10 factor(term)5:woman 0.601 0.286 2.10 3.55e- 2 tidy(model_C) #> # A tibble: 7 × 5 #> term estimate std.error statistic p.value #> #> 1 factor(term)1 -2.05 0.0646 -31.6 7.80e-220 #> 2 factor(term)2 -0.926 0.0482 -19.2 3.96e- 82 #> 3 factor(term)3 -1.50 0.0665 -22.5 3.54e-112 #> 4 factor(term)4 -0.718 0.0861 -8.34 7.34e- 17 #> 5 factor(term)5 -0.917 0.156 -5.89 3.94e- 9 #> 6 woman 0.227 0.0774 2.94 3.31e- 3 #> 7 woman:I(term - 1) 0.120 0.0470 2.55 1.08e- 2"},{"path":[]},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-12.html","id":"residual-analysis","dir":"Articles","previous_headings":"","what":"12.7 Residual Analysis","title":"Chapter 12: Extending the discrete-time hazard model","text":"Table 12.6, page 465: Figure 12.8, page 467:","code":"first_sex_fit <- glm( event ~ -1 + factor(grade) + parental_transition + parental_antisociality, family = binomial(link = \"logit\"), data = first_sex_pp ) first_sex_fit |> augment(data = first_sex_pp, type.residuals = \"deviance\") |> select(id:parental_antisociality, .resid) |> filter(id %in% c(22, 112, 166, 89, 102, 87, 67, 212)) |> pivot_wider( id_cols = id, names_from = grade, names_prefix = \"grade_\", values_from = .resid ) #> # A tibble: 8 × 7 #> id grade_7 grade_8 grade_9 grade_10 grade_11 grade_12 #> #> 1 22 -0.412 -0.294 -0.584 -0.718 -0.775 1.41 #> 2 67 -0.618 -0.448 -0.856 -1.03 -1.10 1.04 #> 3 87 1.82 NA NA NA NA NA #> 4 89 -0.325 -0.231 -0.464 -0.575 1.86 NA #> 5 102 -0.491 2.37 NA NA NA NA #> 6 112 -0.411 -0.294 -0.583 -0.717 -0.774 -0.956 #> 7 166 -0.661 -0.481 -0.911 -1.09 1.19 NA #> 8 212 -0.286 -0.203 -0.410 -0.509 -0.552 -0.696 first_sex_fit |> augment(data = first_sex_pp, type.residuals = \"deviance\") |> ggplot(aes(x = id, y = .resid)) + geom_point() + geom_hline(yintercept = 0) first_sex_fit |> augment(data = first_sex_pp, type.residuals = \"deviance\") |> group_by(id) |> summarise(ss.deviance = sum(.resid^2)) |> ggplot(aes(x = id, y = ss.deviance)) + geom_point()"},{"path":[]},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-13.html","id":"grouped-methods-for-estimating-continuous-time-survivor-and-hazard-functions","dir":"Articles","previous_headings":"","what":"13.2 Grouped Methods for Estimating Continuous-Time Survivor and Hazard Functions","title":"Chapter 13: Describing continuous-time event 
occurrence data","text":"Table 13.2, page 477: Figure 13.1, page 479:","code":"# Adding discrete-time intervals honking_discrete <- honking |> mutate( event = if_else(censor == 0, 1, 0), time_interval = cut(seconds, breaks = c(1:8, 18), right = FALSE), time_start = str_extract(time_interval, \"[[:digit:]]+(?=,)\"), time_end = str_extract(time_interval, \"(?<=,)[[:digit:]]+\"), across(c(time_start, time_end), as.numeric) ) # Grouped life table honking_grouped <- honking_discrete |> group_by(time_interval, time_start, time_end) |> summarise( total = n(), n_event = sum(event), n_censor = sum(censor), # All grouping needs to be dropped in order to calculate the number at risk # correctly. .groups = \"drop\" ) |> mutate(n_risk = sum(total) - lag(cumsum(total), default = 0)) # The conditional probability can be estimated using the same discrete-time methods # from the previous chapter, using the grouped data. honking_grouped_fit <- glm( cbind(n_event, n_risk - n_event) ~ 0 + time_interval, family = binomial(link = \"logit\"), data = honking_grouped ) honking_grouped_fit |> # .fitted is the conditional probability broom::augment(newdata = honking_grouped, type.predict = \"response\") |> mutate( survival = cumprod(1 - .fitted), hazard = .fitted / (time_end - time_start) ) #> # A tibble: 8 × 10 #> time_interval time_start time_end total n_event n_censor n_risk .fitted #> #> 1 [1,2) 1 2 6 5 1 57 0.0877 #> 2 [2,3) 2 3 17 14 3 51 0.275 #> 3 [3,4) 3 4 11 9 2 34 0.265 #> 4 [4,5) 4 5 10 6 4 23 0.261 #> 5 [5,6) 5 6 4 2 2 13 0.154 #> 6 [6,7) 6 7 4 2 2 9 0.222 #> 7 [7,8) 7 8 1 1 0 5 0.2 #> 8 [8,18) 8 18 4 3 1 4 0.75 #> # ℹ 2 more variables: survival , hazard # Estimates by hand honking_discrete_fit <- honking_grouped |> mutate( conditional_probability = n_event / n_risk, discrete.s = cumprod(1 - conditional_probability), discrete.h = conditional_probability / (time_end - time_start), # The actuarial method redefines the number of individuals to be at risk of # event occurrence for both the survival and hazard functions, and thus has # different conditional probabilities from the discrete method. 
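# (Added clarification: the actuarial method treats the n_censor censored
# cases as at risk for only half of each interval, so the survival risk set
# below is n_risk - n_censor / 2; for the hazard it additionally treats the
# n_event events as at risk for half the interval, giving n_risk.s - n_event / 2.)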
n_risk.s = n_risk - (n_censor / 2), conditional_probability.s = n_event / n_risk.s, actuarial.s = cumprod(1 - conditional_probability.s), n_risk.h = n_risk.s - (n_event / 2), conditional_probability.h = n_event / n_risk.h, actuarial.h = conditional_probability.h / (time_end - time_start) ) |> select( -c(conditional_probability.s, conditional_probability.h, n_risk.s, n_risk.h) ) |> add_row(time_end = 0:1, discrete.s = 1, actuarial.s = 1) honking_discrete_fit #> # A tibble: 10 × 12 #> time_interval time_start time_end total n_event n_censor n_risk #> #> 1 [1,2) 1 2 6 5 1 57 #> 2 [2,3) 2 3 17 14 3 51 #> 3 [3,4) 3 4 11 9 2 34 #> 4 [4,5) 4 5 10 6 4 23 #> 5 [5,6) 5 6 4 2 2 13 #> 6 [6,7) 6 7 4 2 2 9 #> 7 [7,8) 7 8 1 1 0 5 #> 8 [8,18) 8 18 4 3 1 4 #> 9 NA NA 0 NA NA NA NA #> 10 NA NA 1 NA NA NA NA #> # ℹ 5 more variables: conditional_probability , discrete.s , #> # discrete.h , actuarial.s , actuarial.h honking_discrete_fit |> pivot_longer( cols = c(discrete.h, discrete.s, actuarial.h, actuarial.s), names_to = \"estimate\" ) |> mutate( estimate = factor( estimate, levels = c(\"discrete.s\", \"actuarial.s\", \"discrete.h\", \"actuarial.h\") ) ) |> ggplot(aes(x = time_end, y = value)) + geom_line(data = \\(x) filter(x, str_detect(estimate, \"discrete\"))) + geom_step( data = \\(x) filter(x, str_detect(estimate, \"actuarial\")), direction = \"vh\" ) + scale_x_continuous(limits = c(0, 20)) + facet_wrap(vars(estimate), scales = \"free_y\") + ggh4x::facetted_pos_scales( y = list( str_detect(estimate, \"s$\") ~ scale_y_continuous(limits = c(0, 1)), str_detect(estimate, \"h$\") ~ scale_y_continuous(limits = c(0, .35), breaks = seq(0, .35, by = .05)) ) )"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-13.html","id":"the-kaplan-meier-method-of-estimating-the-continuous-time-survivor-function","dir":"Articles","previous_headings":"","what":"13.3 The Kaplan-Meier Method of Estimating the Continuous-Time Survivor Function","title":"Chapter 13: Describing continuous-time event occurrence data","text":"Table 13.3, page 484: Figure 13.2, page 485:","code":"honking_continuous_fit <- survfit(Surv(seconds, 1 - censor) ~ 1, data = honking) honking_continuous_fit_tidy <- honking_continuous_fit |> survfit0() |> tidy() |> select(-starts_with(\"conf\")) |> mutate( # tidy() returns the standard error for the cumulative hazard, so we need to # transform it into the standard error for the survival. std.error = estimate * std.error, conditional_probability = n.event / n.risk, time_interval = 1:n(), time_end = lead(time, default = Inf), width = time_end - time, hazard = conditional_probability / width ) |> relocate( time_interval, time_start = time, time_end, n.risk:n.censor, conditional_probability, survival = estimate ) honking_continuous_fit_tidy #> # A tibble: 57 × 11 #> time_interval time_start time_end n.risk n.event n.censor #> #> 1 1 0 1.41 57 0 0 #> 2 2 1.41 1.51 57 1 1 #> 3 3 1.51 1.67 55 1 0 #> 4 4 1.67 1.68 54 1 0 #> 5 5 1.68 1.86 53 1 0 #> 6 6 1.86 2.12 52 1 0 #> 7 7 2.12 2.19 51 1 0 #> 8 8 2.19 2.36 50 1 0 #> 9 9 2.36 2.48 49 0 1 #> 10 10 2.48 2.5 48 1 0 #> # ℹ 47 more rows #> # ℹ 5 more variables: conditional_probability , survival , #> # std.error , width , hazard honking_continuous_fit_tidy |> add_row(time_end = 0:1, survival = 1) |> # The largest event time was censored, so we extend the last step out to that # largest censored value rather than going to infinity. 
mutate(time_end = if_else(time_end == Inf, time_start, time_end)) |> ggplot() + geom_step( aes(x = time_end, y = survival, linetype = \"1: Kaplan Meier\"), direction = \"vh\" ) + geom_line( aes(x = time_end, y = discrete.s, linetype = \"2: Discrete-time\"), data = honking_discrete_fit ) + geom_step( aes(x = time_end, y = actuarial.s, linetype = \"3: Actuarial\"), data = honking_discrete_fit, direction = \"vh\" ) + scale_x_continuous(limits = c(0, 20)) + labs(x = \"time\")"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-13.html","id":"the-cumulative-hazard-function","dir":"Articles","previous_headings":"","what":"13.4 The Cumulative Hazard Function","title":"Chapter 13: Describing continuous-time event occurrence data","text":"Figure 13.4, page 493:","code":"honking_continuous_fit_tidy |> mutate(time_end = if_else(time_end == Inf, time_start, time_end)) |> ggplot(aes(x = time_end)) + geom_step( aes(y = -log(survival), linetype = \"Negative log\"), direction = \"vh\" ) + geom_step( aes(y = cumsum(hazard * width), linetype = \"Nelson-Aalen\"), direction = \"vh\" ) #> Warning: Removed 1 row containing missing values or values outside the scale range #> (`geom_step()`)."},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-13.html","id":"kernel-smoothed-estimates-of-the-hazard-function","dir":"Articles","previous_headings":"","what":"13.5 Kernel-Smoothed Estimates of the Hazard Function","title":"Chapter 13: Describing continuous-time event occurrence data","text":"Figure 13.5, page 496:","code":"kernel_smoothed_hazards <- map_df( set_names(1:3), \\(bandwidth) { # muhaz() estimates the hazard function from right-censored data using # kernel-based methods, using the vector of survival and event times. kernel_smoothed_hazard <- muhaz( honking$seconds, 1 - honking$censor, # Narrow the temporal region the smoothed function describes, given the # bandwidth and the minimum and maximum observed event times. min.time = min(honking$seconds[honking$censor == 0]) + bandwidth, max.time = max(honking$seconds[honking$censor == 0]) - bandwidth, bw.grid = bandwidth, bw.method = \"global\", b.cor = \"none\", kern = \"epanechnikov\" ) tidy(kernel_smoothed_hazard) }, .id = \"bandwidth\" ) #> Warning in muhaz(honking$seconds, 1 - honking$censor, min.time = min(honking$seconds[honking$censor == : minimum time > minimum Survival Time #> Warning in muhaz(honking$seconds, 1 - honking$censor, min.time = min(honking$seconds[honking$censor == : minimum time > minimum Survival Time #> Warning in muhaz(honking$seconds, 1 - honking$censor, min.time = min(honking$seconds[honking$censor == : minimum time > minimum Survival Time ggplot(kernel_smoothed_hazards, aes(x = time, y = estimate)) + geom_line() + scale_x_continuous(limits = c(0, 20)) + facet_wrap(vars(bandwidth), ncol = 1, labeller = label_both)"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-13.html","id":"developing-an-intuition-about-continuous-time-survivor-cumulative-hazard-and-kernel-smoothed-hazard-functions","dir":"Articles","previous_headings":"","what":"13.6 Developing an Intuition about Continuous-Time Survivor, Cumulative Hazard, and Kernel-Smoothed Hazard Functions","title":"Chapter 13: Describing continuous-time event occurrence data","text":"Figure 13.6, page 499:","code":"# TODO: Check that models are correct, then tidy up code. 
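# Added sketch (illustrative values only, not from these data): the cumulative
# hazard functions computed below follow from the survivor estimates via
# H(t) = -log(S(t)), e.g.:
s <- c(1, 0.9, 0.75, 0.5)
-log(s)
#> [1] 0.0000000 0.1053605 0.2876821 0.6931472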
# Fit survival models alcohol_relapse_fit <- survfit( Surv(weeks, 1 - censor) ~ 1, data = alcohol_relapse ) judges_fit <- survfit( Surv(tenure, dead) ~ 1, data = judges ) first_depression_fit <- survfit( Surv(age, 1 - censor) ~ 1, data = first_depression_2 ) health_workers_fit <- survfit( Surv(weeks, 1 - censor) ~ 1, data = health_workers ) # Tidy survival models survival_models <- list( alcohol_relapse = alcohol_relapse_fit, judges = judges_fit, first_depression = first_depression_fit, health_workers = health_workers_fit ) survival_models_tidy <- map( survival_models, \\(.x) { .x |> survfit0() |> tidy() |> mutate(cumulative_hazard = -log(estimate)) |> select(time, survival = estimate, cumulative_hazard) |> pivot_longer( cols = c(survival, cumulative_hazard), names_to = \"statistic\", values_to = \"estimate\" ) } ) # Estimate and tidy smoothed hazards kernel_smoothed_hazards_tidy <- pmap( list( list( alcohol_relapse = alcohol_relapse$weeks, judges = judges$tenure, first_depression = first_depression_2$age, health_workers = health_workers$weeks ), list( 1 - alcohol_relapse$censor, judges$dead, 1 - first_depression_2$censor, 1 - health_workers$censor ), list(12, 5, 7, 7) ), \\(survival_time, event, bandwidth) { kernel_smoothed_hazard <- muhaz( survival_time, event, min.time = min(survival_time[1 - event == 0]) + bandwidth, max.time = max(survival_time[1 - event == 0]) - bandwidth, bw.grid = bandwidth, bw.method = \"global\", b.cor = \"none\", kern = \"epanechnikov\" ) kernel_smoothed_hazard |> tidy() |> mutate(statistic = \"hazard\") } ) #> Warning in muhaz(survival_time, event, min.time = min(survival_time[1 - : minimum time > minimum Survival Time #> Warning in muhaz(survival_time, event, min.time = min(survival_time[1 - : minimum time > minimum Survival Time #> Warning in muhaz(survival_time, event, min.time = min(survival_time[1 - : minimum time > minimum Survival Time #> Warning in muhaz(survival_time, event, min.time = min(survival_time[1 - : minimum time > minimum Survival Time # Combine estimates estimates_tidy <- map2( survival_models_tidy, kernel_smoothed_hazards_tidy, \\(.x, .y) { bind_rows(.x, .y) |> mutate(statistic = factor( statistic, levels = c(\"survival\", \"cumulative_hazard\", \"hazard\")) ) } ) plots <- map2( estimates_tidy, names(estimates_tidy), \\(.x, .y) { ggplot(.x, aes(x = time, y = estimate)) + geom_step(data = \\(.x) filter(.x, statistic != \"hazard\")) + geom_line(data = \\(.x) filter(.x, statistic == \"hazard\")) + facet_wrap(vars(statistic), ncol = 1, scales = \"free_y\") + labs(title = .y) } ) patchwork::wrap_plots(plots, ncol = 4)"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-14.html","id":"toward-a-statistical-model-for-continuous-time-hazard","dir":"Articles","previous_headings":"","what":"14.1 Toward a Statistical Model for Continuous-Time Hazard","title":"Chapter 14: Fitting the Cox regression model","text":"Figure 14.1, page 505: Figure 14.2, page 508:","code":"# Fit survival models rearrest_fit <- survfit( Surv(months, abs(censor - 1)) ~ 1, data = rearrest ) person_crime_0_fit <- update(rearrest_fit, subset = person_crime == 0) person_crime_1_fit <- update(rearrest_fit, subset = person_crime == 1) # Tidy survival models survival_models <- list( person_crime_0 = person_crime_0_fit, person_crime_1 = person_crime_1_fit ) survival_models_tidy <- map( survival_models, \\(.x) { .x |> survfit0() |> tidy() |> mutate(cumulative_hazard = -log(estimate)) |> select(time, survival = estimate, cumulative_hazard) |> pivot_longer( cols = c(survival, 
cumulative_hazard), names_to = \"statistic\", values_to = \"estimate\" ) } ) # Estimate and tidy smoothed hazards kernel_smoothed_hazards_tidy <- map2( list( person_crime_0 = filter(rearrest, person_crime == 0)$months, person_crime_1 = filter(rearrest, person_crime == 1)$months ), list( abs(filter(rearrest, person_crime == 0)$censor - 1), abs(filter(rearrest, person_crime == 1)$censor - 1) ), \\(survival_time, event) { kernel_smoothed_hazard <- muhaz( survival_time, event, min.time = min(survival_time[1 - event == 0]) + 8, max.time = max(survival_time[1 - event == 0]) - 8, bw.grid = 8, bw.method = \"global\", b.cor = \"none\", kern = \"epanechnikov\" ) kernel_smoothed_hazard |> tidy() |> mutate(statistic = \"hazard\") } ) #> Warning in muhaz(survival_time, event, min.time = min(survival_time[1 - : minimum time > minimum Survival Time #> Warning in muhaz(survival_time, event, min.time = min(survival_time[1 - : minimum time > minimum Survival Time # Combine estimates estimates_tidy <- map2( survival_models_tidy, kernel_smoothed_hazards_tidy, \\(.x, .y) { bind_rows(.x, .y) |> mutate(statistic = factor( statistic, levels = c(\"survival\", \"cumulative_hazard\", \"hazard\")) ) } ) |> list_rbind(names_to = \"person_crime\") # Plot ggplot(estimates_tidy, aes(x = time, y = estimate, linetype = person_crime)) + geom_step(data = \\(.x) filter(.x, statistic != \"hazard\")) + geom_line(data = \\(.x) filter(.x, statistic == \"hazard\")) + facet_wrap(vars(statistic), ncol = 1, scales = \"free_y\") # Top plot log_cumulative_hazards <- estimates_tidy |> filter(statistic == \"cumulative_hazard\") |> mutate(estimate = log(estimate)) |> filter(!is.infinite(estimate)) ggplot( log_cumulative_hazards, aes(x = time, y = estimate, linetype = person_crime) ) + geom_hline(yintercept = 0) + geom_step() + coord_cartesian(xlim = c(0, 30), ylim = c(-6, 1)) # Middle and bottom plots ---- rearrest_fit_2 <- coxph( Surv(months, abs(censor - 1)) ~ person_crime, data = rearrest, method = \"efron\" ) rearrest_fit_2_curves <- map_df( list(person_crime_0 = 0, person_crime_1 = 1), \\(.x) { rearrest_fit_2 |> survfit( newdata = data.frame(person_crime = .x), type = \"kaplan-meier\" ) |> tidy() |> mutate( cumulative_hazard = -log(estimate), log_cumulative_hazard = log(cumulative_hazard) ) }, .id = \"person_crime\" ) # Middle plot rearrest_fit_2 |> augment(data = rearrest, type.predict = \"survival\") |> mutate( cumulative_hazard = -log(.fitted), log_cumulative_hazard = log(cumulative_hazard) ) |> ggplot(mapping = aes(x = months, y = log_cumulative_hazard)) + geom_step(aes(x = time, linetype = person_crime), data = rearrest_fit_2_curves)+ geom_point( aes(shape = person_crime, x = time, y = estimate), data = log_cumulative_hazards ) + scale_y_continuous(breaks = -6:1) + coord_cartesian(xlim = c(0, 30), ylim = c(-6, 1)) # Bottom plot rearrest_fit_2 |> augment(data = rearrest, type.predict = \"survival\") |> mutate( cumulative_hazard = -log(.fitted), log_cumulative_hazard = log(cumulative_hazard) ) |> ggplot(mapping = aes(x = months, y = cumulative_hazard)) + geom_step(aes(x = time, linetype = person_crime), data = rearrest_fit_2_curves) + geom_point( aes(shape = person_crime, x = time, y = estimate), data = filter(estimates_tidy, statistic == \"cumulative_hazard\") ) + coord_cartesian(xlim = c(0, 30))"},{"path":[]},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-14.html","id":"interpreting-the-results-of-fitting-the-cox-regression-model-to-data","dir":"Articles","previous_headings":"","what":"14.3 Interpreting the 
Results of Fitting the Cox Regression Model to Data","title":"Chapter 14: Fitting the Cox regression model","text":"Table 14.1, page 525: Table 14.2, page 533:","code":"# TODO: Make table model_A <- coxph(Surv(months, abs(censor - 1)) ~ person_crime, data = rearrest) model_B <- coxph(Surv(months, abs(censor - 1)) ~ property_crime, data = rearrest) model_C <- coxph(Surv(months, abs(censor - 1)) ~ age, data = rearrest) model_D <- coxph( Surv(months, abs(censor - 1)) ~ person_crime + property_crime + age, data = rearrest ) # TODO"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-14.html","id":"nonparametric-strategies-for-displaying-the-results-of-model-fitting","dir":"Articles","previous_headings":"","what":"14.4 Nonparametric Strategies for Displaying the Results of Model Fitting","title":"Chapter 14: Fitting the Cox regression model","text":"Figure 14.4, page 538: Figure 14.5, page 541:","code":"pmap( list( list(baseline = 0, average = mean(rearrest$person_crime)), list(0, mean(rearrest$property_crime)), list(0, mean(rearrest$age)) ), \\(.person_crime, .property_crime, .age) { model_D_baseline <- model_D |> survfit( newdata = tibble( person_crime = .person_crime, property_crime = .property_crime, age = .age) ) |> survfit0() |> tidy() survival <- ggplot(model_D_baseline, aes(x = time, y = estimate)) + geom_line() + geom_point() + scale_x_continuous(limits = c(0, 29)) + coord_cartesian(xlim = c(0, 36), ylim = c(0, 1)) cumulative_hazard <- ggplot( model_D_baseline, aes(x = time, y = -log(estimate)) ) + geom_line() + geom_point() + scale_x_continuous(limits = c(0, 29)) + coord_cartesian(xlim = c(0, 36), ylim = c(0, 1.5)) #TODO: Not sure if muhaz can deal with this situation with newdata hazard <- muhaz( model_D_baseline$time, 1 - model_D_baseline$n.censor, min.time = min(rearrest$months[rearrest$censor == 0]) + 8, max.time = max(rearrest$months[rearrest$censor == 0]) - 8, bw.grid = 8, bw.method = \"global\", b.cor = \"none\", kern = \"epanechnikov\" ) |> tidy() |> ggplot(aes(x = time, y = estimate)) + geom_line() + coord_cartesian(xlim = c(0, 36), ylim = c(0, 0.08)) survival + cumulative_hazard + hazard + plot_layout(ncol = 1) } ) |> patchwork::wrap_plots() #> Warning in muhaz(model_D_baseline$time, 1 - model_D_baseline$n.censor, min.time = min(rearrest$months[rearrest$censor == : minimum time > minimum Survival Time #> Warning in muhaz(model_D_baseline$time, 1 - model_D_baseline$n.censor, min.time = min(rearrest$months[rearrest$censor == : minimum time > minimum Survival Time #> Warning: Removed 6 rows containing missing values or values outside the scale range #> (`geom_line()`). #> Warning: Removed 6 rows containing missing values or values outside the scale range #> (`geom_point()`). #> Warning: Removed 6 rows containing missing values or values outside the scale range #> (`geom_line()`). #> Warning: Removed 6 rows containing missing values or values outside the scale range #> (`geom_point()`). #> Warning: Removed 6 rows containing missing values or values outside the scale range #> (`geom_line()`). #> Warning: Removed 6 rows containing missing values or values outside the scale range #> (`geom_point()`). #> Warning: Removed 6 rows containing missing values or values outside the scale range #> (`geom_line()`). #> Warning: Removed 6 rows containing missing values or values outside the scale range #> (`geom_point()`). 
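# Added aside (not in the original article): for a fitted coxph model such as
# model_D, the per-person relative risk exp(linear predictor) underlying these
# displays can be extracted directly:
head(predict(model_D, type = \"risk\"))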
#TODO: Not sure if muhaz can deal with this situation with newdata, not sure if the # estimates can be modified after fitting to get the desired values hazard_fit <- muhaz( rearrest$months, 1 - rearrest$censor, min.time = min(rearrest$months[rearrest$censor == 0]) + 8, max.time = max(rearrest$months[rearrest$censor == 0]) - 8, bw.grid = 8, bw.method = \"global\", b.cor = \"none\", kern = \"epanechnikov\" ) #> Warning in muhaz(rearrest$months, 1 - rearrest$censor, min.time = min(rearrest$months[rearrest$censor == : minimum time > minimum Survival Time hazard_fit |> str() #> List of 7 #> $ pin :List of 13 #> ..$ times : num [1:194] 0.0657 0.1314 0.23 0.2957 0.2957 ... #> ..$ delta : num [1:194] 1 1 1 1 1 1 1 0 1 1 ... #> ..$ nobs : int 194 #> ..$ min.time : num 8.07 #> ..$ max.time : num 21 #> ..$ n.min.grid : num 51 #> ..$ min.grid : num [1:51] 8.07 8.32 8.58 8.84 9.1 ... #> ..$ n.est.grid : num 101 #> ..$ bw.pilot : num 1.03 #> ..$ bw.smooth : num 5.16 #> ..$ method : int 1 #> ..$ b.cor : num 0 #> ..$ kernel.type: num 1 #> $ est.grid : num [1:101] 8.07 8.19 8.32 8.45 8.58 ... #> $ haz.est : num [1:101] 0.0418 0.0419 0.042 0.0421 0.0421 ... #> $ imse.opt : num 0 #> $ bw.glob : num 8 #> $ glob.imse: num 0 #> $ bw.grid : num 8 #> - attr(*, \"class\")= chr \"muhaz\" prototypical_individuals <- map2_df( # .person_crime list(neither = 0, personal_only = 1, property_only = 0, both = 1), # .property_crime list(0, 0, 1, 1), \\(.person_crime, .property_crime) { model_D |> survfit( newdata = tibble( person_crime = .person_crime, property_crime = .property_crime, age = mean(rearrest$age) ) ) |> survfit0() |> tidy() }, .id = \"prototypical_individual\" ) prototypical_individuals_survival <- ggplot( prototypical_individuals, aes(x = time, y = estimate, colour = prototypical_individual)) + geom_line() + scale_x_continuous(limits = c(0, 29)) + coord_cartesian(xlim = c(0, 36), ylim = c(0, 1)) + labs( y = \"Survival\" ) prototypical_individuals_cumhaz <- ggplot( prototypical_individuals, aes(x = time, y = -log(estimate), colour = prototypical_individual)) + geom_line() + scale_x_continuous(limits = c(0, 29)) + coord_cartesian(xlim = c(0, 36), ylim = c(0, 2)) + labs( y = \"Cumulative hazard\" ) prototypical_individuals_logcumhaz <- ggplot( filter(prototypical_individuals, time != 0), aes(x = time, y = log(-log(estimate)), colour = prototypical_individual)) + geom_line() + scale_x_continuous(limits = c(0, 29)) + scale_y_continuous(breaks = -7:1) + coord_cartesian(xlim = c(0, 36), ylim = c(-7, 1)) + labs( y = \"log Cumulative hazard\" ) prototypical_individuals_survival + prototypical_individuals_cumhaz + prototypical_individuals_logcumhaz + plot_layout(ncol = 1, guides = \"collect\") #> Warning: Removed 24 rows containing missing values or values outside the scale range #> (`geom_line()`). #> Removed 24 rows containing missing values or values outside the scale range #> (`geom_line()`). 
#> Removed 24 rows containing missing values or values outside the scale range #> (`geom_line()`)."},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-15.html","id":"time-varying-predictors","dir":"Articles","previous_headings":"","what":"15.1 Time-Varying Predictors","title":"Chapter 15: Extending the Cox regression model","text":"Table 15.1, page 548:","code":"# TODO: Clean up code and make table model_A <- coxph( Surv(used_cocaine_age, 1 - censor) ~ birthyr + early_marijuana_use + early_drug_use, data = first_cocaine ) # Model B ---- first_cocaine_pp <- first_cocaine |> group_by(id) |> reframe( # {survival} uses the counting process method for time-varying predictors, # so we need to construct intervals for the ages at which different events # occurred. These intervals are left-censored, so we start with the end # time; we also only require unique intervals, so duplicate ages should be # removed. age_end = sort(unique(c(used_cocaine_age, used_marijuana_age, used_drugs_age))), age_start = lag(age_end, default = 0), # Time-varying predictors should be lagged so that they describe an individual's # status in the immediately prior year. used_cocaine = if_else( age_end == used_cocaine_age & censor == 0, true = 1, false = 0, missing = 0 ), used_marijuana = if_else( age_end > used_marijuana_age, true = 1, false = 0, missing = 0 ), used_drugs = if_else( age_end > used_drugs_age, true = 1, false = 0, missing = 0 ), # Keep time-invariant predictors from the person-level data birthyr ) |> relocate(age_start, .before = age_end) model_B <- coxph( Surv(age_start, age_end, used_cocaine) ~ birthyr + used_marijuana + used_drugs, data = first_cocaine_pp, ties = \"efron\" ) ## This method with tmerge() also works tmerge( first_cocaine, first_cocaine, id = id, used_cocaine = event(used_cocaine_age, 1 - censor), used_marijuana = tdc(used_marijuana_age), used_drugs = tdc(used_drugs_age), options = list( tstartname = \"age_start\", tstopname = \"age_end\" ) ) |> as_tibble() |> arrange(id) #> Warning: Unknown or uninitialised column: `tstop`. #> Warning: Unknown or uninitialised column: `tstart`. #> Warning in tmerge(first_cocaine, first_cocaine, id = id, used_cocaine = #> event(used_cocaine_age, : replacement of variable 'used_marijuana' #> Warning in tmerge(first_cocaine, first_cocaine, id = id, used_cocaine = #> event(used_cocaine_age, : replacement of variable 'used_drugs' #> # A tibble: 3,086 × 18 #> id used_cocaine_age censor birthyr early_marijuana_use early_drug_use #> #> 1 5 41 1 0 0 0 #> 2 5 41 1 0 0 0 #> 3 5 41 1 0 0 0 #> 4 8 32 1 10 0 0 #> 5 9 36 1 5 0 0 #> 6 9 36 1 5 0 0 #> 7 11 41 1 0 0 0 #> 8 12 32 0 4 0 0 #> 9 12 32 0 4 0 0 #> 10 13 39 1 3 0 0 #> # ℹ 3,076 more rows #> # ℹ 12 more variables: used_marijuana , used_marijuana_age , #> # sold_marijuana , sold_marijuana_age , used_drugs , #> # used_drugs_age , sold_drugs , sold_drugs_age , rural , #> # age_start , age_end , used_cocaine coxph( Surv(age_start, age_end, used_cocaine) ~ birthyr + used_marijuana + used_drugs, data = tmerge( first_cocaine, first_cocaine, id = id, used_cocaine = event(used_cocaine_age, 1 - censor), used_marijuana = tdc(used_marijuana_age), used_drugs = tdc(used_drugs_age), options = list( tstartname = \"age_start\", tstopname = \"age_end\" ) ), ties = \"efron\" ) |> summary() #> Warning: Unknown or uninitialised column: `tstop`. #> Warning: Unknown or uninitialised column: `tstart`. 
#> Warning in tmerge(first_cocaine, first_cocaine, id = id, used_cocaine = #> event(used_cocaine_age, : replacement of variable 'used_marijuana' #> Warning in tmerge(first_cocaine, first_cocaine, id = id, used_cocaine = #> event(used_cocaine_age, : replacement of variable 'used_drugs' #> Warning: Unknown or uninitialised column: `tstop`. #> Warning: Unknown or uninitialised column: `tstart`. #> Warning in tmerge(first_cocaine, first_cocaine, id = id, used_cocaine = #> event(used_cocaine_age, : replacement of variable 'used_marijuana' #> Warning in tmerge(first_cocaine, first_cocaine, id = id, used_cocaine = #> event(used_cocaine_age, : replacement of variable 'used_drugs' #> Call: #> coxph(formula = Surv(age_start, age_end, used_cocaine) ~ birthyr + #> used_marijuana + used_drugs, data = tmerge(first_cocaine, #> first_cocaine, id = id, used_cocaine = event(used_cocaine_age, #> 1 - censor), used_marijuana = tdc(used_marijuana_age), #> used_drugs = tdc(used_drugs_age), options = list(tstartname = \"age_start\", #> tstopname = \"age_end\")), ties = \"efron\") #> #> n= 3086, number of events= 382 #> #> coef exp(coef) se(coef) z Pr(>|z|) #> birthyr 0.10741 1.11340 0.02145 5.008 5.5e-07 *** #> used_marijuana 2.55176 12.82972 0.28095 9.082 < 2e-16 *** #> used_drugs 1.85387 6.38446 0.12921 14.347 < 2e-16 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> #> exp(coef) exp(-coef) lower .95 upper .95 #> birthyr 1.113 0.89815 1.068 1.161 #> used_marijuana 12.830 0.07794 7.397 22.252 #> used_drugs 6.384 0.15663 4.956 8.225 #> #> Concordance= 0.876 (se = 0.008 ) #> Likelihood ratio test= 856 on 3 df, p=<2e-16 #> Wald test = 451.1 on 3 df, p=<2e-16 #> Score (logrank) test = 1039 on 3 df, p=<2e-16 # Model C and D ---- first_cocaine_pp_C <- first_cocaine |> group_by(id) |> reframe( age_end = sort( unique( c( used_cocaine_age, used_marijuana_age, used_drugs_age, sold_marijuana_age, sold_drugs_age ) ) ), age_start = lag(age_end, default = 0), # Time-varying predictors should be lagged so that they describe an individual's # status in the immediately prior year. used_cocaine = if_else( age_end == used_cocaine_age & censor == 0, true = 1, false = 0, missing = 0 ), used_marijuana = if_else( age_end > used_marijuana_age, true = 1, false = 0, missing = 0 ), used_drugs = if_else( age_end > used_drugs_age, true = 1, false = 0, missing = 0 ), sold_marijuana = if_else( age_end > sold_marijuana_age, true = 1, false = 0, missing = 0 ), sold_drugs = if_else( age_end > sold_drugs_age, true = 1, false = 0, missing = 0 ), # Keep time-invariant predictors from the person-level data birthyr, early_marijuana_use, early_drug_use, rural ) |> relocate(age_start, .before = age_end) first_cocaine_model_C <- coxph( Surv(age_start, age_end, used_cocaine) ~ birthyr + used_marijuana + used_drugs + sold_marijuana + sold_drugs, data = first_cocaine_pp_C, ties = \"efron\" ) model_D <- update(first_cocaine_model_C, . ~ . 
+ early_marijuana_use + early_drug_use)"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-15.html","id":"imputation-strategies-for-time-varying-predictors","dir":"Articles","previous_headings":"15.1 Time-Varying Predictors","what":"15.1.3 Imputation Strategies for Time-Varying Predictors","title":"Chapter 15: Extending the Cox regression model","text":"In Section 15.1.3 Singer and Willett (2003) discuss imputation strategies for time-varying predictors using a subset of unpublished data from Hall, Havassy, and Wasserman (1990), who measured the relation between the number of days to relapse to cocaine use and several predictors that might be associated with relapse in a sample of 104 newly abstinent cocaine users who recently completed an abstinence-oriented treatment program. Former cocaine users were followed for 12 weeks post-treatment or until they used cocaine for 7 consecutive days. Self-reported abstinence was confirmed at each interview by the absence of cocaine in urine specimens. For this example we use the cocaine_relapse_2 data set, a person-period data frame with 1248 rows and 7 columns: id: Participant ID. days: Number of days to relapse to cocaine use or censoring. Relapse was defined as 4 or more days of cocaine use during the week preceding an interview. Study dropouts and lost participants were coded as relapsing to cocaine use, with the number of days to relapse coded as occurring the week after the last follow-up interview attended. censor: Censoring status (0 = relapsed, 1 = censored). needle: Binary indicator for whether cocaine was ever used intravenously. base_mood: Total score on the positive mood subscales (Activity and Happiness) of the Mood Questionnaire (Ryman, Biersner, & LaRocco, 1974), taken at an intake interview during the last week of treatment. Each item used a five point Likert score (ranging from 0 = not at all, to 4 = extremely). followup: Week of follow-up interview. mood: Total score on the positive mood subscales (Activity and Happiness) of the Mood Questionnaire (Ryman, Biersner, & LaRocco, 1974), taken at follow-up interviews each week post-treatment. Each item used a five point Likert score (ranging from 0 = not at all, to 4 = extremely). Because time to relapse was measured in days but follow-up interviews were conducted weekly, the cocaine relapse data in their current form fail to meet the data requirement for time-varying predictors: at each unique event time in days we must know the time-varying mood scores for everyone still at risk at those moments. Thus, in order to meet the data requirement we must generate predictor histories that provide near-daily mood scores for each participant. In the steps that follow we develop three Cox regression models fitted to the cocaine relapse data to illustrate and compare different imputation strategies for time-varying predictors. Each model includes the number of days to relapse to cocaine use as the outcome variable, the time-invariant predictor needle, and a different time-varying variable representing the predictor of interest: the total score on the positive mood subscales of the Mood Questionnaire (Ryman, Biersner, & LaRocco, 1974). We explore the following popular imputation strategies suggested by Singer and Willett (2003): Carry each mood score forward until the next one is available. Interpolate between adjacent mood scores. Compute a moving average based on the most recent and several past mood scores.","code":"glimpse(cocaine_relapse_2) #> Rows: 1,248 #> Columns: 7 #> $ id 550, 550, 550, 550, 550, 550, 550, 550, 550, 550, 550, 550, … #> $ censor 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, … #> $ days 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, … #> $ needle 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, … #> $ base_mood 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 25, 25, 25, … #> $ followup 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, … #> $ mood 23, 27, 28, 31, 29, 32, 33, 28, 36, 33, 33, 24, 31, 19, 29, …"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-15.html","id":"exploratory-data-analysis","dir":"Articles","previous_headings":"15.1 Time-Varying Predictors > 15.1.3 Imputation Strategies for Time-Varying Predictors","what":"Exploratory Data Analysis","title":"Chapter 15: Extending the Cox regression model","text":"We begin by exploring the time-invariant variables in the cocaine_relapse_2 data. It will also be convenient for one of the Cox regression models fitted later on, so we do this using a person-level version of the cocaine_relapse_2 data. A total of 62 newly abstinent cocaine users (59.6%) relapsed to cocaine use within 12 weeks of completing the abstinence-oriented treatment program. Most users relapsed early in the follow-up period. Across the sample there were 38 unique event times. A total of 69 participants (66.3%) reported having previously used cocaine intravenously.","code":"cocaine_relapse_2_pl <- cocaine_relapse_2 |> pivot_wider( names_from = followup, names_prefix = \"mood_\", values_from = mood ) glimpse(cocaine_relapse_2_pl) #> Rows: 104 #> Columns: 17 #> $ id 550, 604, 608, 631, 513, 531, 533, 536, 599, 542, 564, 573, … #> $ censor 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, … #> $ days 83, 83, 83, 83, 82, 82, 82, 82, 82, 81, 81, 81, 81, 81, 81, … #> $ needle 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, … #> $ base_mood 29, 25, 37, 39, 33, 27, 10, 27, 28, 19, 35, 32, 32, 31, 31, … #> $ mood_1 23, 31, 40, 42, 43, 14, 16, 22, 29, 31, 30, 27, 27, 41, 40, … #> $ mood_2 27, 19, 37, 22, 25, 11, 26, 21, 28, 25, 33, 26, 27, NA, NA, … #> $ mood_3 28, 29, 36, 38, 34, 2, 37, 24, 25, 28, 33, 24, 23, 38, 31, 4… #> $ mood_4 31, 24, 36, 41, 42, 8, 17, 24, NA, 20, 33, 29, 22, 37, NA, N… #> $ mood_5 29, 22, 32, 41, 46, 3, 30, NA, NA, 23, 26, 21, 26, 24, 28, N… #> $ mood_6 32, 22, 35, 42, 42, 3, 15, 22, 16, 29, 35, 28, 28, 27, 28, 2… #> $ mood_7 33, NA, 34, 42, 46, 5, 16, NA, 22, 27, 35, 22, 24, 27, 31, 2… #> $ mood_8 28, 20, 35, 42, 46, 3, 15, 23, 23, NA, 33, 28, 17, 28, 22, 3… #> $ mood_9 36, 31, 29, 46, 47, 2, 21, 19, 24, NA, 29, 25, 14, 31, 24, 3… #> $ mood_10 33, 33, 36, NA, NA, 2, 14, 21, 16, 31, NA, NA, NA, 37, 33, N… #> $ mood_11 33, 30, 30, 47, 28, 0, 20, 15, 18, 23, 26, 25, 17, 38, 29, 2… #> $ mood_12 24, NA, 36, 43, 44, 0, 16, 18, 16, 21, NA, 25, 19, 34, 30, 2… cocaine_relapse_2_pl |> group_by(relapsed = 1 - censor) |> summarise(count = n()) |> mutate(proportion = count / sum(count)) #> # A tibble: 2 × 3 #> relapsed count proportion #> #> 1 0 42 0.404 #> 2 1 62 0.596 ggplot(cocaine_relapse_2_pl, aes(x = days)) + geom_histogram(binwidth = 7) + scale_x_continuous(breaks = c(0, 1:12 * 7)) + facet_wrap(vars(relapsed = 1 - censor), labeller = label_both) # We will use these event times later on during the imputation procedure for # Model B. It is important they are sorted in ascending order for this # procedure, so we do so here for convenience while creating the object.
event_times <- cocaine_relapse_2_pl |> filter(1 - censor == 1) |> pull(days) |> unique() |> sort() censor_times <- cocaine_relapse_2_pl |> filter(censor == 1) |> pull(days) |> unique() event_times |> discard(\\(.x) .x %in% censor_times) |> length() #> [1] 38 cocaine_relapse_2_pl |> group_by(needle) |> summarise(count = n()) |> mutate(proportion = count / sum(count)) #> # A tibble: 2 × 3 #> needle count proportion #> #> 1 0 35 0.337 #> 2 1 69 0.663"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-15.html","id":"model-a-time-invariant-baseline","dir":"Articles","previous_headings":"15.1 Time-Varying Predictors > 15.1.3 Imputation Strategies for Time-Varying Predictors","what":"Model A: Time-Invariant Baseline","title":"Chapter 15: Extending the Cox regression model","text":"Model A uses a time-invariant predictor assessing each respondent’s mood score just after release from treatment.","code":"model_A <- coxph( Surv(days, 1 - censor) ~ needle + base_mood, data = cocaine_relapse_2_pl, ties = \"efron\" ) summary(model_A) #> Call: #> coxph(formula = Surv(days, 1 - censor) ~ needle + base_mood, #> data = cocaine_relapse_2_pl, ties = \"efron\") #> #> n= 104, number of events= 62 #> #> coef exp(coef) se(coef) z Pr(>|z|) #> needle 1.020734 2.775232 0.314068 3.250 0.00115 ** #> base_mood -0.003748 0.996259 0.014709 -0.255 0.79886 #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> #> exp(coef) exp(-coef) lower .95 upper .95 #> needle 2.7752 0.3603 1.4996 5.136 #> base_mood 0.9963 1.0038 0.9679 1.025 #> #> Concordance= 0.63 (se = 0.036 ) #> Likelihood ratio test= 12.51 on 2 df, p=0.002 #> Wald test = 10.6 on 2 df, p=0.005 #> Score (logrank) test = 11.51 on 2 df, p=0.003"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-15.html","id":"model-b","dir":"Articles","previous_headings":"15.1 Time-Varying Predictors > 15.1.3 Imputation Strategies for Time-Varying Predictors","what":"Model B:","title":"Chapter 15: Extending the Cox regression model","text":"For Model B we return to the person-period version of the cocaine_relapse_2 data to explore the first imputation strategy suggested by Singer and Willett (2003): Carrying each mood score forward until the next one is available. For this procedure we also lag the mood score predictor by one week, associating, for example, the first followup with the baseline mood scores, the second followup with the first followup’s mood scores, and so forth. Next, to prepare the cocaine_relapse_2_prevweek data for modelling, we want to transform the data set from its number of days to relapse format into counting process (start, stop) format, a person-period format where: Each row of the transformed data set represents an \"at-risk\" time interval (day_start, day_end], open on the left and closed on the right. The event variable for each row is 1 if the time interval ends with an event and 0 otherwise. Variable values for each row are the values that apply over that time interval. The start and end points of each time interval are determined by the vector of unique event_times we defined earlier. For censored data, the end point of the final time interval is determined by the time of censorship, which is not included in the vector of unique event times, so it needs to be handled separately. Transforming the cocaine_relapse_2_prevweek data into counting process format is a two-step process. First we create the counting process structure, with columns for participant ID, start time, stop time, and event status for each record. We also add a week variable indicating the week each record occurred in, which is important for the second step of the process. Note that this step can be done using either the person-period or person-level versions of the cocaine_relapse_2 data; however, for readability we use the person-level data here. The same result can be obtained using the person-period data by wrapping the calls to days and censor in unique(). Second, we join the cocaine_relapse_2_prevweek data to the counting process structure by id and week, giving us counting process formatted data with the time-varying predictor's values occurring at the appropriate time interval for each participant. Finally, to match the text, we rename the mood score variable to week_mood. The survival package also comes with two utility functions, survSplit() and tmerge(), which can be used to transform data into counting process format. For discussion, see vignette(\"timedep\", package=\"survival\"). Now we can fit Model B.","code":"cocaine_relapse_2_prevweek <- cocaine_relapse_2 |> group_by(id) |> mutate( mood_previous_week = lag(mood, default = unique(base_mood)), mood_previous_week_fill = vec_fill_missing( mood_previous_week, direction = \"down\" ) ) cocaine_relapse_2_prevweek #> # A tibble: 1,248 × 9 #> # Groups: id [104] #> id censor days needle base_mood followup mood mood_previous_week #> #> 1 550 1 83 1 29 1 23 29 #> 2 550 1 83 1 29 2 27 23 #> 3 550 1 83 1 29 3 28 27 #> 4 550 1 83 1 29 4 31 28 #> 5 550 1 83 1 29 5 29 31 #> 6 550 1 83 1 29 6 32 29 #> 7 550 1 83 1 29 7 33 32 #> 8 550 1 83 1 29 8 28 33 #> 9 550 1 83 1 29 9 36 28 #> 10 550 1 83 1 29 10 33 36 #> # ℹ 1,238 more rows #> # ℹ 1 more variable: mood_previous_week_fill cocaine_relapse_2_prevweek_cp <- cocaine_relapse_2_pl |> group_by(id) |> reframe( # For censored data the final day should be a participant's days value, so # we need to concatenate their days to the vector of event times. The call # to unique() around the vector removes the duplicate for uncensored data in # the final time interval. day_end = unique(c(event_times[event_times <= days], days)), day_start = lag(day_end, default = 0), event = if_else(day_end == days & censor == 0, true = 1, false = 0), week = floor(day_end / 7) + 1 ) |> relocate(day_start, .after = id) cocaine_relapse_2_prevweek_cp #> # A tibble: 2,805 × 5 #> id day_start day_end event week #> #> 1 501 0 1 0 1 #> 2 501 1 2 0 1 #> 3 501 2 3 0 1 #> 4 501 3 4 0 1 #> 5 501 4 6 0 1 #> 6 501 6 7 0 2 #> 7 501 7 8 0 2 #> 8 501 8 9 0 2 #> 9 501 9 10 0 2 #> 10 501 10 11 0 2 #> # ℹ 2,795 more rows cocaine_relapse_2_prevweek_cp <- cocaine_relapse_2_prevweek_cp |> left_join( cocaine_relapse_2_prevweek, by = join_by(id == id, week == followup) ) |> rename(week_mood = mood_previous_week_fill) cocaine_relapse_2_prevweek_cp #> # A tibble: 2,805 × 12 #> id day_start day_end event week censor days needle base_mood mood #> #> 1 501 0 1 0 1 0 12 1 29 34 #> 2 501 1 2 0 1 0 12 1 29 34 #> 3 501 2 3 0 1 0 12 1 29 34 #> 4 501 3 4 0 1 0 12 1 29 34 #> 5 501 4 6 0 1 0 12 1 29 34 #> 6 501 6 7 0 2 0 12 1 29 19 #> 7 501 7 8 0 2 0 12 1 29 19 #> 8 501 8 9 0 2 0 12 1 29 19 #> 9 501 9 10 0 2 0 12 1 29 19 #> 10 501 10 11 0 2 0 12 1 29 19 #> # ℹ 2,795 more rows #> # ℹ 2 more variables: mood_previous_week , week_mood model_B <- coxph( Surv(day_start, day_end, event) ~ needle + week_mood, data = cocaine_relapse_2_prevweek_cp, ties = \"efron\" ) summary(model_B) #> Call: #> coxph(formula = Surv(day_start, day_end, event) ~ needle + week_mood, #> data = cocaine_relapse_2_prevweek_cp, ties = \"efron\") #> #> n= 2805, number of events= 62 #> #> coef exp(coef) se(coef) z Pr(>|z|) #> needle 1.07959 2.94348 0.31574 3.419 0.000628 *** #> week_mood -0.03490 0.96570 0.01387 -2.517 0.011832 * #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
Now we can fit Model B.

model_B <- coxph(
  Surv(day_start, day_end, event) ~ needle + week_mood,
  data = cocaine_relapse_2_prevweek_cp,
  ties = "efron"
)

summary(model_B)
#> Call:
#> coxph(formula = Surv(day_start, day_end, event) ~ needle + week_mood,
#>     data = cocaine_relapse_2_prevweek_cp, ties = "efron")
#>
#>   n= 2805, number of events= 62
#>
#>               coef exp(coef) se(coef)      z Pr(>|z|)
#> needle     1.07959   2.94348  0.31574  3.419 0.000628 ***
#> week_mood -0.03490   0.96570  0.01387 -2.517 0.011832 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>           exp(coef) exp(-coef) lower .95 upper .95
#> needle       2.9435     0.3397    1.5853    5.4654
#> week_mood    0.9657     1.0355    0.9398    0.9923
#>
#> Concordance= 0.662  (se = 0.037 )
#> Likelihood ratio test= 18.61  on 2 df,   p=9e-05
#> Wald test            = 16.62  on 2 df,   p=2e-04
#> Score (logrank) test = 17.49  on 2 df,   p=2e-04

Model C

For Model C we again start with the lagged weekly mood scores; however, we use a different imputation strategy: interpolating between adjacent mood scores. Although Singer and Willett (2003) suggest "resisting the temptation to design sophisticated imputation algorithms," their approach to interpolating between adjacent mood scores is somewhat complex. Consequently, we need to create a function to suit this purpose, rather than using existing functions like zoo::na.approx() or imputeTS::na_ma(). Singer and Willett (2003) do not fully describe their approach in the text, but the algorithm appears to be based on the following rules:

- Trailing NAs are imputed consecutively using the most recent non-missing mood score.
- Internal NAs are imputed using the mean of the adjacent non-missing mood scores. For consecutive internal NAs, following the first imputed mood score in the sequence, every NA thereafter is imputed using the mean of the previous NA value's imputed mood score and the next non-missing mood score.
- Imputed mood scores are rounded to the nearest integer.

na_adjacent <- function(x) {
  # The while loop is used here to allow us to carry forward imputed mood scores
  # for consecutive internal NAs.
  x_avg <- x
  while (any(is.na(x_avg[2:length(x)]))) {
    x_avg <- pslide_dbl(
      list(
        x_avg,
        vec_fill_missing(x_avg, direction = "down"),
        vec_fill_missing(x_avg, direction = "up")
      ),
      \(.x, .x_fill_down, .x_fill_up) {
        case_when(
          # Rule 1:
          all(is.na(.x[3:length(.x)])) ~ .x_fill_down[2],
          # Rule 2:
          !is.na(.x[1]) & is.na(.x[2]) ~ mean(c(.x_fill_up[1], .x_fill_up[2])),
          TRUE ~ .x[2]
        )
      },
      .before = 1,
      .after = Inf,
      .complete = TRUE
    )
    # Rule 3. We are not using round() here because it goes to the even digit when
    # rounding off a 5, rather than always going upward.
    x_avg <- if_else(x_avg %% 1 < .5, floor(x_avg), ceiling(x_avg))
    x_avg[1] <- x[1]
  }
  x_avg
}

Now we can impute the lagged weekly mood scores using the na_adjacent() function.
cocaine_relapse_2_adjacent <- cocaine_relapse_2_prevweek |>
  group_by(id) |>
  mutate(
    # It's important to include the final follow-up when imputing between
    # adjacent mood scores, otherwise cases where the second last score is an
    # internal NA will fill down instead of using the mean between adjacent mood
    # scores. However, afterwards the final follow-up can be dropped.
    mood_adjacent_lag = na_adjacent(c(mood_previous_week, last(mood)))[-13],
    # We also want the non-lagged mood scores for later, which we impute using
    # similar logic.
    mood_adjacent = na_adjacent(c(first(mood_previous_week), mood))[-1]
  )

# Here is a small preview of the difference between the imputation strategies
# for Models B and C:
cocaine_relapse_2_adjacent |>
  filter(id == 544) |>
  select(id, followup, mood_previous_week:mood_adjacent_lag)
#> # A tibble: 12 × 5
#> # Groups:   id [1]
#>    id    followup mood_previous_week mood_previous_week_fill mood_adjacent_lag
#>    <fct>    <dbl>              <dbl>                   <dbl>             <dbl>
#>  1 544          1                 40                      40                40
#>  2 544          2                 40                      40                40
#>  3 544          3                 38                      38                38
#>  4 544          4                 27                      27                27
#>  5 544          5                 NA                      27                25
#>  6 544          6                 22                      22                22
#>  7 544          7                 NA                      22                21
#>  8 544          8                 NA                      22                21
#>  9 544          9                 20                      20                20
#> 10 544         10                 NA                      20                25
#> 11 544         11                 30                      30                30
#> 12 544         12                 28                      28                28

Next, to prepare the cocaine_relapse_2_adjacent data for modelling, we again transform it to counting process format; however, for Model C each "at-risk" time interval is one day long. Following Singer and Willett (2003), we construct the mood_day variable by linearly interpolating between the adjacent weekly values to yield daily values, assigning a given day the mood value imputed for the immediately prior day.

cocaine_relapse_2_adjacent_cp <- cocaine_relapse_2_adjacent |>
  group_by(id, followup) |>
  reframe(
    day_end = (followup - 1) * 7 + 1:7,
    day_start = day_end - 1,
    days = unique(days),
    censor = unique(censor),
    event = if_else(
      day_end == days & censor == 0, true = 1, false = 0
    ),
    needle = unique(needle),
    # approx() linearly interpolates eight evenly spaced values between the
    # lagged and non-lagged weekly mood scores; the first seven are assigned to
    # the days of the week.
    mood_day = approx(c(mood_adjacent_lag, mood_adjacent), n = 8)[[2]][1:7],
  ) |>
  relocate(day_start, day_end, days, .after = id) |>
  filter(day_end <= days)

cocaine_relapse_2_adjacent_cp
#> # A tibble: 4,948 × 9
#>    id    day_start day_end  days followup censor event needle mood_day
#>    <fct>     <dbl>   <dbl> <dbl>    <dbl>  <dbl> <dbl>  <dbl>    <dbl>
#>  1 501           0       1    12        1      0     0      1     29
#>  2 501           1       2    12        1      0     0      1     29.7
#>  3 501           2       3    12        1      0     0      1     30.4
#>  4 501           3       4    12        1      0     0      1     31.1
#>  5 501           4       5    12        1      0     0      1     31.9
#>  6 501           5       6    12        1      0     0      1     32.6
#>  7 501           6       7    12        1      0     0      1     33.3
#>  8 501           7       8    12        2      0     0      1     34
#>  9 501           8       9    12        2      0     0      1     31.9
#> 10 501           9      10    12        2      0     0      1     29.7
#> # ℹ 4,938 more rows

Now we can fit Model C.

model_C <- coxph(
  Surv(day_start, day_end, event) ~ needle + mood_day,
  data = cocaine_relapse_2_adjacent_cp,
  ties = "efron"
)

summary(model_C)
#> Call:
#> coxph(formula = Surv(day_start, day_end, event) ~ needle + mood_day,
#>     data = cocaine_relapse_2_adjacent_cp, ties = "efron")
#>
#>   n= 4948, number of events= 62
#>
#>              coef exp(coef) se(coef)      z Pr(>|z|)
#> needle    1.12077   3.06720  0.31700  3.536 0.000407 ***
#> mood_day -0.05438   0.94707  0.01489 -3.651 0.000261 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>          exp(coef) exp(-coef) lower .95 upper .95
#> needle      3.0672      0.326    1.6478    5.7091
#> mood_day    0.9471      1.056    0.9198    0.9751
#>
#> Concordance= 0.695  (se = 0.036 )
#> Likelihood ratio test= 25.52  on 2 df,   p=3e-06
#> Wald test            = 23.1  on 2 df,   p=1e-05
#> Score (logrank) test = 24.04  on 2 df,   p=6e-06

Table 15.2, page 555:

# TODO
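Table 15.2 is still marked TODO here. As a stopgap, a sketch along the following lines (our own, using the same broom and purrr conventions as the rest of these articles) collects the coefficients from Models A, B, and C side by side; the model list and column selection are our choices, not the text's.

list(A = model_A, B = model_B, C = model_C) |>
  map(tidy) |>
  list_rbind(names_to = "model") |>
  select(model, term, estimate, std.error, statistic, p.value)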
15.2 Nonproportional Hazards Models via Stratification

Figure 15.2, page 559:

# FIXME: The upper limit of the data doesn't match the textbook.
survfit(Surv(used_cocaine_age, 1 - censor) ~ rural, data = first_cocaine) |>
  tidy() |>
  mutate(
    strata = stringr::str_remove(strata, "rural="),
    cumulative_hazard = -log(estimate),
    log_cumulative_hazard = log(cumulative_hazard)
  ) |>
  rename(rural = strata) |>
  ggplot(aes(x = time, y = log_cumulative_hazard, linetype = rural)) +
  geom_line() +
  coord_cartesian(ylim = c(-6, -1))

Table 15.3, page 560:

# first_cocaine_model_C from earlier is the first model
first_cocaine_model_C
#> Call:
#> coxph(formula = Surv(age_start, age_end, used_cocaine) ~ birthyr +
#>     used_marijuana + used_drugs + sold_marijuana + sold_drugs,
#>     data = first_cocaine_pp_C, ties = "efron")
#>
#>                    coef exp(coef) se(coef)     z        p
#> birthyr         0.08493   1.08864  0.02183 3.890    1e-04
#> used_marijuana  2.45920  11.69542  0.28357 8.672  < 2e-16
#> used_drugs      1.25110   3.49419  0.15656 7.991 1.34e-15
#> sold_marijuana  0.68989   1.99349  0.12263 5.626 1.84e-08
#> sold_drugs      0.76037   2.13908  0.13066 5.819 5.91e-09
#>
#> Likelihood ratio test=944.5  on 5 df, p=< 2.2e-16
#> n= 3312, number of events= 382

first_cocaine_model_stratified <- update(
  first_cocaine_model_C, . ~ . + strata(rural)
)

first_cocaine_model_nonrural <- update(
  first_cocaine_model_C, subset = rural == 0
)

first_cocaine_model_rural <- update(
  first_cocaine_model_C, subset = rural == 1
)

# TODO: Make table.

15.3 Nonproportional Hazards Models via Interactions with Time

Table 15.4, page 566:

# TODO
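The models for Table 15.4 are still marked TODO here. As a placeholder, one common way to code an interaction with time in coxph() is the tt argument; the sketch below is our own, under our own choice of a log-time specification, and is not necessarily the exact model fit in the text.

coxph(
  Surv(days, 1 - censor) ~ treatment_plan + tt(treatment_plan),
  data = psychiatric_discharge,
  # tt() receives the covariate values x and the event times t, so this codes
  # a treatment_plan-by-log(time) interaction.
  tt = function(x, t, ...) x * log(t)
)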
Figure 15.3, page 567:

psychiatric_discharge
#> # A tibble: 174 × 4
#>    id     days censor treatment_plan
#>    <fct> <dbl>  <dbl>          <dbl>
#>  1 2         3      0              1
#>  2 8        46      0              0
#>  3 73       30      0              0
#>  4 76       45      0              0
#>  5 78       22      0              0
#>  6 79       50      0              0
#>  7 81       59      0              0
#>  8 83       44      0              0
#>  9 95       44      0              1
#> 10 117      22      0              0
#> # ℹ 164 more rows

# FIXME: The upper limit of the data doesn't match the textbook.
survfit(Surv(days, 1 - censor) ~ treatment_plan, data = psychiatric_discharge) |>
  tidy() |>
  mutate(
    strata = stringr::str_remove(strata, "treatment_plan="),
    cumulative_hazard = -log(estimate),
    log_cumulative_hazard = log(cumulative_hazard)
  ) |>
  rename(treatment_plan = strata) |>
  ggplot(aes(x = time, y = log_cumulative_hazard, linetype = treatment_plan)) +
  geom_hline(yintercept = 0, linewidth = .25, linetype = 3) +
  geom_line() +
  coord_cartesian(xlim = c(0, 77), ylim = c(-4, 2))

# TODO: Bottom panel

15.4 Regression Diagnostics

Figure 15.4, page 573:

rearrest <- rearrest |>
  mutate(rank_time = rank(months, ties.method = "average"), .after = "months")

rearrest_null_model <- coxph(Surv(months, 1 - censor) ~ 1, data = rearrest)
rearrest_full_model <- update(
  rearrest_null_model, . ~ . + person_crime + property_crime + age
)

rearrest_models <- list(
  null = rearrest_null_model,
  full = rearrest_full_model
)

rearrest_fits <- rearrest_models |>
  map(
    \(.x) {
      map_df(
        list(martingale = "martingale", deviance = "deviance"),
        \(.y) augment(
          .x, data = rearrest, type.predict = "lp", type.residuals = .y
        ),
        .id = ".resid_type"
      )
    }
  ) |>
  list_rbind(names_to = "model") |>
  mutate(
    model = factor(model, levels = c("null", "full")),
    censored = as.logical(censor)
  )

rearrest_fits |>
  filter(.resid_type == "martingale") |>
  ggplot(aes(x = age, y = .resid)) +
  geom_hline(yintercept = 0, linewidth = .25, linetype = 3) +
  geom_point(aes(shape = censored)) +
  scale_shape_manual(values = c(16, 3)) +
  geom_smooth(se = FALSE) +
  facet_wrap(vars(model), ncol = 1, labeller = label_both) +
  coord_cartesian(ylim = c(-3, 1))
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Figure 15.5, page 577:

stem(resid(rearrest_full_model, type = "deviance"), scale = 2)
#>
#>   The decimal point is 1 digit(s) to the left of the |
#>
#>   -22 | 09
#>   -20 | 21
#>   -18 | 6491
#>   -16 | 969321
#>   -14 | 54216
#>   -12 | 87654741
#>   -10 | 208776542110
#>    -8 | 99852176544
#>    -6 | 876322098874400
#>    -4 | 444443332877644210
#>    -2 | 85009955411
#>    -0 | 997597551
#>     0 | 268979
#>     2 | 0318
#>     4 | 133678802337
#>     6 | 2688919
#>     8 | 1136690012233889999
#>    10 | 122334724789
#>    12 | 0115
#>    14 | 0228055578
#>    16 | 03795
#>    18 | 2336
#>    20 | 0735
#>    22 | 6
#>    24 | 00
#>    26 | 7

rearrest_fits |>
  filter(model == "full" & .resid_type == "deviance") |>
  ggplot(aes(x = .fitted, y = .resid)) +
  geom_hline(yintercept = 0, linewidth = .25, linetype = 3) +
  geom_point(aes(shape = censored)) +
  scale_shape_manual(values = c(16, 3)) +
  scale_x_continuous(breaks = -3:3) +
  scale_y_continuous(breaks = -3:3) +
  coord_cartesian(xlim = c(-3, 3), ylim = c(-3, 3))

# augment.coxph is bugged and won't return the .resid column when using
# `newdata`, likely related to this issue:
# https://github.com/tidymodels/broom/issues/937
# So this code doesn't work:
# augment(
#   rearrest_full_model,
#   newdata = filter(rearrest, censor == 0),
#   type.predict = "lp",
#   type.residuals = "schoenfeld"
# )

# Likewise, `data` can't be used because it expects the full dataset; thus, it
# will error out even when using the filtered data.

# However, updating the model first does work:

# Schoenfeld residuals only pertain to those who experience the event, so we need
# to update the model before retrieving them, and only use a subset of the data
# when getting predictions.
Figure 15.6, page 580:

rearrest_full_model |>
  update(subset = censor == 0) |>
  augment(
    data = filter(rearrest, censor == 0),
    type.predict = "lp",
    type.residuals = "schoenfeld"
  ) |>
  mutate(.resid = as.data.frame(.resid)) |>
  unnest_wider(col = .resid, names_sep = "_") |>
  pivot_longer(
    cols = starts_with(".resid"),
    names_to = "predictor",
    values_to = ".resid"
  ) |>
  mutate(
    predictor = stringr::str_remove(predictor, ".resid_"),
    predictor = factor(
      predictor,
      levels = c("person_crime", "property_crime", "age")
    )
  ) |>
  ggplot(aes(x = rank_time, y = .resid)) +
  geom_hline(yintercept = 0, linewidth = .25, linetype = 3) +
  geom_point() +
  scale_shape_manual(values = c(16, 3)) +
  geom_smooth(se = FALSE, span = 1) +
  facet_wrap(
    vars(predictor), ncol = 1, scales = "free_y", labeller = label_both
  ) +
  scale_x_continuous(n.breaks = 8) +
  ggh4x::facetted_pos_scales(
    y = list(
      predictor == "person_crime" ~ scale_y_continuous(limits = c(-.5, 1)),
      predictor == "property_crime" ~ scale_y_continuous(
        n.breaks = 7, limits = c(-1, .2)
      ),
      predictor == "age" ~ scale_y_continuous(limits = c(-10, 20))
    )
  ) +
  coord_cartesian(xlim = c(0, 175))
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

# TODO: set y-axis scales to match textbook.

Figure 15.7, page 583:

rearrest_full_model |>
  augment(
    data = rearrest,
    type.predict = "lp",
    type.residuals = "score"
  ) |>
  mutate(.resid = as.data.frame(.resid)) |>
  unnest_wider(col = .resid, names_sep = "_") |>
  pivot_longer(
    cols = starts_with(".resid"),
    names_to = "predictor",
    values_to = ".resid"
  ) |>
  mutate(
    predictor = stringr::str_remove(predictor, ".resid_"),
    predictor = factor(
      predictor,
      levels = c("person_crime", "property_crime", "age")
    ),
    censored = as.logical(censor)
  ) |>
  ggplot(aes(x = rank_time, y = .resid)) +
  geom_hline(yintercept = 0, linewidth = .25, linetype = 3) +
  geom_point(aes(shape = censored)) +
  scale_shape_manual(values = c(16, 3)) +
  facet_wrap(
    vars(predictor), ncol = 1, scales = "free_y", labeller = label_both
  )

15.5 Competing Risks

Figure 15.8, page 589:

judges_null_models <- list(
  dead = survfit(Surv(tenure, dead) ~ 1, data = judges),
  retired = survfit(Surv(tenure, retired) ~ 1, data = judges)
)

judges_null_models_tidy <- map(
  judges_null_models,
  \(.x) {
    .x |>
      survfit0() |>
      tidy() |>
      mutate(cumulative_hazard = -log(estimate)) |>
      select(time, survival = estimate, cumulative_hazard) |>
      pivot_longer(
        cols = c(survival, cumulative_hazard),
        names_to = "statistic",
        values_to = "estimate"
      )
  }
)

# Estimate and tidy smoothed hazards
judges_kernel_smoothed_hazards_tidy <- map(
  list(
    judges_dead = judges$dead,
    judges_retired = judges$retired
  ),
  \(event) {
    kernel_smoothed_hazard <- muhaz(
      judges$tenure,
      event,
      min.time = min(judges$tenure[event == 0]) + 6,
      max.time = max(judges$tenure[event == 0]) - 6,
      bw.grid = 6,
      bw.method = "global",
      b.cor = "none",
      kern = "epanechnikov"
    )
    kernel_smoothed_hazard |>
      tidy() |>
      mutate(statistic = "hazard")
  }
)
#> Warning in muhaz(judges$tenure, event, min.time = min(judges$tenure[event == :
#> minimum time > minimum Survival Time
#> Warning in muhaz(judges$tenure, event, min.time = min(judges$tenure[event == :
#> minimum time > minimum Survival Time
# Combine estimates
estimates_tidy <- map2_df(
  judges_null_models_tidy,
  judges_kernel_smoothed_hazards_tidy,
  \(.x, .y) {
    bind_rows(.x, .y) |>
      mutate(statistic = factor(
        statistic,
        levels = c("survival", "cumulative_hazard", "hazard"))
      )
  },
  .id = "event"
)

ggplot(estimates_tidy, aes(x = time, y = estimate, linetype = event)) +
  geom_step(data = \(.x) filter(.x, statistic != "hazard")) +
  geom_line(data = \(.x) filter(.x, statistic == "hazard")) +
  facet_wrap(vars(statistic), ncol = 1, scales = "free_y")

Table 15.7, page 592:

judges_model_A <- coxph(
  Surv(tenure, dead) ~ appointment_age + appointment_year,
  data = judges
)

judges_model_B <- coxph(
  Surv(tenure, retired) ~ appointment_age + appointment_year,
  data = judges
)

judges_model_C <- coxph(
  Surv(tenure, left_appointment) ~ appointment_age + appointment_year,
  data = judges
)

# TODO: Make table.

15.6 Late Entry into the Risk Set

Table 15.8, page 601:

# Model A ----

# First we need to transform to a counting process format.
physicians_event_times_A <- physicians |>
  filter(1 - censor == 1) |>
  pull(exit) |>
  unique() |>
  sort()

# We'll use survSplit() this time around.
physicians_cp_A <- physicians |>
  mutate(event = 1 - censor) |>
  survSplit(
    Surv(entry, exit, event) ~ .,
    data = _,
    cut = physicians_event_times_A,
    end = "exit"
  ) |>
  as_tibble()

# The warning message here can be ignored.
physicians_model_A <- coxph(
  Surv(entry, exit, event) ~ part_time + age + age:exit,
  data = physicians_cp_A
)
#> Warning in coxph(Surv(entry, exit, event) ~ part_time + age + age:exit, : a
#> variable appears on both the left and right sides of the formula

# Model B ----

physicians_event_times_B <- physicians |>
  filter(1 - censor == 1 & entry == 0) |>
  pull(exit) |>
  unique() |>
  sort()

physicians_cp_B <- physicians |>
  filter(entry == 0) |>
  mutate(event = 1 - censor) |>
  survSplit(
    Surv(entry, exit, event) ~ .,
    data = _,
    cut = physicians_event_times_B,
    end = "exit"
  ) |>
  as_tibble()

physicians_model_B <- coxph(
  Surv(entry, exit, event) ~ part_time + age + age:exit,
  data = physicians_cp_B
)
#> Warning in coxph(Surv(entry, exit, event) ~ part_time + age + age:exit, : a
#> variable appears on both the left and right sides of the formula

# Model C ----

physicians_cp_C <- physicians |>
  mutate(
    event = 1 - censor,
    entry = 0
  ) |>
  survSplit(
    Surv(entry, exit, event) ~ .,
    data = _,
    cut = physicians_event_times_A,
    end = "exit"
  ) |>
  as_tibble()

physicians_model_C <- coxph(
  Surv(entry, exit, event) ~ part_time + age + age:exit,
  data = physicians_cp_C
)
#> Warning in coxph(Surv(entry, exit, event) ~ part_time + age + age:exit, : a
#> variable appears on both the left and right sides of the formula

# TODO: Make table and clean up code.
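While the table itself remains TODO, a sketch of the model-fit side of it could use broom's glance() method for coxph models; this is our own addition, and the column selection is our choice.

list(A = physicians_model_A, B = physicians_model_B, C = physicians_model_C) |>
  map(glance) |>
  list_rbind(names_to = "model") |>
  # Sample sizes, event counts, and fit statistics for each risk-set definition
  select(model, n, nevent, logLik, AIC)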
15.6.2 Using Late Entrants to Introduce Alternative Metrics for Clocking Time

Table 15.9, page 604:

monkeys_model_A <- coxph(
  Surv(sessions, 1 - censor) ~ initial_age + birth_weight + female,
  data = monkeys
)

monkeys_model_B <- update(monkeys_model_A, Surv(end_age, 1 - censor) ~ .)

# The warning message here can be ignored.
monkeys_model_C <- update(
  monkeys_model_A, Surv(initial_age, end_age, 1 - censor) ~ .
)
#> Warning in coxph(formula = Surv(initial_age, end_age, 1 - censor) ~ initial_age
#> + : a variable appears on both the left and right sides of the formula

# TODO: Make table.

Chapter 2: Exploring longitudinal data on change

2.1 Creating a longitudinal data set

In Section 2.1 Singer and Willett (2003) introduce two distinct formats of data organization for longitudinal data—the person-level format and the person-period format—using a subset of data from the National Youth Survey (NYS) measuring the development of tolerance towards deviant behaviour in adolescents over time, in relation to self-reported sex and exposure to deviant peers (Raudenbush & Chan, 1992).

Adolescents' tolerance towards deviant behaviour was measured with a 9-item scale assessing attitudes tolerant of deviant behaviour. The scale was administered each year from age 11 to 15, and is a time-varying variable. However, adolescents' self-reported sex and exposure to deviant peers were only recorded at the beginning of the study period, and are time-invariant variables.

In this example we illustrate the difference between the two formats using the deviant_tolerance_pl and deviant_tolerance_pp data sets, which correspond to the adolescent tolerance of deviant behaviour data organized in the person-level and person-period formats, respectively.

The Person-Level Data Set

In the person-level format (also known as the wide or multivariate format), each person has only one row of data, with multiple columns containing data from each measurement occasion for any time-varying variables. As demonstrated by the deviant_tolerance_pl data set, a person-level data frame has 16 rows and 8 columns:

- id: Participant ID.
- tolerance_11, tolerance_12, tolerance_13, tolerance_14, tolerance_15: Average score across a 9-item scale assessing attitudes favourable to deviant behaviour at ages 11, 12, 13, 14, and 15. Each item used a four point scale (1 = very wrong, 2 = wrong, 3 = a little bit wrong, 4 = not wrong at all).
- male: Binary indicator for whether the adolescent was male.
- exposure: Average score across a 9-item scale assessing level of exposure to deviant peers. Each item used a five point Likert scale (ranging from 0 = none to 4 = all).

Although the person-level format is common in cross-sectional research, it has four disadvantages that make it ill-suited for longitudinal data analysis:

- It restricts data analysis to examining rank order wave-to-wave relationships, leading to non-informative summaries that tell us nothing about how each person changes over time, or even the direction of change.
- It omits an explicit time-indicator variable, rendering time unavailable for data analysis.
- It requires adding an additional variable to the data set for each unique measurement occasion, making it inefficient or useless when the number and spacing of measurement occasions varies across individuals.
- It requires adding an additional set of columns for each time-varying predictor (one column per measurement occasion), rendering it unable to easily handle the presence of time-varying predictors.

Singer and Willett (2003) exemplify the first of these disadvantages by postulating how one might analyze the person-level tolerance towards deviant behaviour data set. A natural approach is to summarize the wave-to-wave relationships among tolerance_11 through tolerance_15 using bivariate correlations and bivariate plots; however, these tell us nothing about how adolescent tolerance towards deviant behaviour changed over time, for either individuals or groups.
Rather, the weak positive correlations between measurement occasions merely tell us that the rank order of tolerance towards deviant behaviour remained relatively stable across occasions—that is, adolescents who were more tolerant towards deviant behaviour at one measurement occasion tended to also be more tolerant at the next.

The first disadvantage is also apparent when examining bivariate plots between measurement occasions: there is no way to tell how adolescent tolerance towards deviant behaviour changed over time, for either individuals or groups. Moreover, given the lack of an explicit time-indicator variable, it isn't possible to plot the person-level data set in a meaningful way, such as a time series plot organized by id.

Considered together, these disadvantages make the person-level format ill-suited for longitudinal data analyses. Fortunately, the disadvantages of the person-level format can be addressed with a simple conversion to the person-period format.

deviant_tolerance_pl
#> # A tibble: 16 × 8
#>    id    tolerance_11 tolerance_12 tolerance_13 tolerance_14 tolerance_15  male
#>    <fct>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl> <dbl>
#>  1 9             2.23         1.79         1.9          2.12         2.66     0
#>  2 45            1.12         1.45         1.45         1.45         1.99     1
#>  3 268           1.45         1.34         1.99         1.79         1.34     1
#>  4 314           1.22         1.22         1.55         1.12         1.12     0
#>  5 442           1.45         1.99         1.45         1.67         1.9      0
#>  6 514           1.34         1.67         2.23         2.12         2.44     1
#>  7 569           1.79         1.9          1.9          1.99         1.99     0
#>  8 624           1.12         1.12         1.22         1.12         1.22     1
#>  9 723           1.22         1.34         1.12         1            1.12     0
#> 10 918           1            1            1.22         1.99         1.22     0
#> 11 949           1.99         1.55         1.12         1.45         1.55     1
#> 12 978           1.22         1.34         2.12         3.46         3.32     1
#> 13 1105          1.34         1.9          1.99         1.9          2.12     1
#> 14 1542          1.22         1.22         1.99         1.79         2.12     0
#> 15 1552          1            1.12         2.23         1.55         1.55     0
#> 16 1653          1.11         1.11         1.34         1.55         2.12     0
#> # ℹ 1 more variable: exposure <dbl>

# Table 2.1, page 20:
deviant_tolerance_pl |>
  select(starts_with("tolerance")) |>
  correlate(diagonal = 1) |>
  shave() |>
  fashion()
#>           term tolerance_11 tolerance_12 tolerance_13 tolerance_14 tolerance_15
#> 1 tolerance_11         1.00
#> 2 tolerance_12          .66         1.00
#> 3 tolerance_13          .06          .25         1.00
#> 4 tolerance_14          .14          .21          .59         1.00
#> 5 tolerance_15          .26          .39          .57          .83         1.00

deviant_tolerance_pl |>
  select(starts_with("tolerance")) |>
  pairs()

The Person-Period Data Set

In the person-period format (also known as the long or univariate format), each person has one row of data for each measurement occasion, with a participant identifier variable for each person and a time-indicator variable for each measurement occasion. In this format, time-invariant variables have identical values across each measurement occasion, whereas time-varying variables have potentially differing values.

As demonstrated by the deviant_tolerance_pp data set, a person-period data frame has 80 rows and 5 columns:

- id: Participant ID.
- age: Adolescent age in years.
- tolerance: Average score across a 9-item scale assessing attitudes favourable to deviant behaviour. Each item used a four point scale (1 = very wrong, 2 = wrong, 3 = a little bit wrong, 4 = not wrong at all).
- male: Binary indicator for whether the adolescent was male.
- exposure: Average score across a 9-item scale assessing level of exposure to deviant peers. Each item used a five point Likert scale (ranging from 0 = none to 4 = all).

Although the person-period data set contains the same information as the person-level data set, its format of data organization makes it more amenable to longitudinal data analysis. Specifically:

- It includes an explicit participant identifier variable, enabling the data to be sorted into person-specific subsets.
- It includes an explicit time-indicator variable, rendering time available for data analysis, and accommodating research designs where the number and spacing of measurement occasions varies across individuals.
- It needs only a single column for each variable in the data set—whether time-varying or time-invariant, outcome or predictor—making it trivial to handle any number of variables.

Indeed, most R functions are designed to work with data in the person-period format—which falls under the larger umbrella of the tidy data format—due to R's vectorized nature. As Wickham, Çetinkaya-Rundel, and Grolemund (2023) explain, there are three interrelated rules that make a data set tidy:

- Each variable must have its own column.
- Each observation must have its own row.
- Each value must have its own cell.

Thus, the person-period format is simply a special case of the tidy data format, distinguished by its longitudinal nature and its requirements of explicit participant identifier and time-indicator variables.

deviant_tolerance_pp
#> # A tibble: 80 × 5
#>    id      age tolerance  male exposure
#>    <fct> <dbl>     <dbl> <dbl>    <dbl>
#>  1 9        11      2.23     0     1.54
#>  2 9        12      1.79     0     1.54
#>  3 9        13      1.9      0     1.54
#>  4 9        14      2.12     0     1.54
#>  5 9        15      2.66     0     1.54
#>  6 45       11      1.12     1     1.16
#>  7 45       12      1.45     1     1.16
#>  8 45       13      1.45     1     1.16
#>  9 45       14      1.45     1     1.16
#> 10 45       15      1.99     1     1.16
#> # ℹ 70 more rows

Converting Between Person-Level and Person-Period Data Sets

Unfortunately, longitudinal data is often initially stored in a person-level data set, meaning that most real analyses will require at least a little tidying to get the data into a person-period format. The reasons for this are that:

- Many people aren't familiar with the principles of tidy data—or their special cases like the person-level and person-period formats—and these are hard to derive without spending a lot of time working with longitudinal data.
- The person-level format closely resembles the familiar cross-sectional data-set format, making it a seemingly sensible default for inexperienced analysts.
- Data is often organized to facilitate non-analytical goals, such as data entry, rather than data analysis.

Thus, an essential skill for the aspiring longitudinal data analyst is to be able to convert a person-level data set into a person-period data set. The tidyr package provides two functions that can easily convert a longitudinal data set from one format to the other: pivot_longer() and pivot_wider().

To convert a person-level data set into a person-period data set we use pivot_longer():

# Figure 2.1, page 18:
pivot_longer(
  deviant_tolerance_pl,
  cols = starts_with("tolerance_"),
  names_to = "age",
  names_prefix = "tolerance_",
  names_transform = as.integer,
  values_to = "tolerance"
)
#> # A tibble: 80 × 5
#>    id     male exposure   age tolerance
#>    <fct> <dbl>    <dbl> <int>     <dbl>
#>  1 9         0     1.54    11      2.23
#>  2 9         0     1.54    12      1.79
#>  3 9         0     1.54    13      1.9
#>  4 9         0     1.54    14      2.12
#>  5 9         0     1.54    15      2.66
#>  6 45        1     1.16    11      1.12
#>  7 45        1     1.16    12      1.45
#>  8 45        1     1.16    13      1.45
#>  9 45        1     1.16    14      1.45
#> 10 45        1     1.16    15      1.99
#> # ℹ 70 more rows

For person-level data, there are five key arguments:

- cols specifies which columns need to be pivoted into longer format—for longitudinal data, these will always be the columns corresponding to time-varying variables. This argument uses tidy selection, a small data science language for selecting columns in a data frame (?tidyr_tidy_select), making it simple to select each column of a time-varying variable based on its naming pattern.
- names_to names the new column (or columns) to create from the information stored in the column names specified by cols. We named the new column age.
- names_prefix removes matching text from the start of each column name—for longitudinal data, this will always be the prefix of the time-varying variables separating the variable name from the measurement occasion. This argument uses a regular expression to select the matching text.
- names_transform applies a function to the new column (or columns). We converted the new column age from type character to type integer.
- values_to names the new column (or columns) to create from the data stored in the cell values. We named the new column tolerance.

Note that "age" and "tolerance" are quoted in the call to pivot_longer() because they represent the column names of the new variables we're creating, rather than already-existing variables in the data.

Although most longitudinal data analyses begin by getting the data into a person-period format, it can occasionally be useful to go in the opposite direction: some computations can be made easier using a person-level data set, and certain functions and analyses expect a person-level data set. It's therefore helpful to know how to untidy, transform, and re-tidy the data as needed.
To convert a person-period data set into a person-level data set we use tidyr::pivot_wider():

pivot_wider(
  deviant_tolerance_pp,
  names_from = age,
  names_prefix = "tolerance_",
  values_from = tolerance
)
#> # A tibble: 16 × 8
#>    id     male exposure tolerance_11 tolerance_12 tolerance_13 tolerance_14
#>    <fct> <dbl>    <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
#>  1 9         0     1.54         2.23         1.79         1.9          2.12
#>  2 45        1     1.16         1.12         1.45         1.45         1.45
#>  3 268       1     0.9          1.45         1.34         1.99         1.79
#>  4 314       0     0.81         1.22         1.22         1.55         1.12
#>  5 442       0     1.13         1.45         1.99         1.45         1.67
#>  6 514       1     0.9          1.34         1.67         2.23         2.12
#>  7 569       0     1.99         1.79         1.9          1.9          1.99
#>  8 624       1     0.98         1.12         1.12         1.22         1.12
#>  9 723       0     0.81         1.22         1.34         1.12         1
#> 10 918       0     1.21         1            1            1.22         1.99
#> 11 949       1     0.93         1.99         1.55         1.12         1.45
#> 12 978       1     1.59         1.22         1.34         2.12         3.46
#> 13 1105      1     1.38         1.34         1.9          1.99         1.9
#> 14 1542      0     1.44         1.22         1.22         1.99         1.79
#> 15 1552      0     1.04         1            1.12         2.23         1.55
#> 16 1653      0     1.25         1.11         1.11         1.34         1.55
#> # ℹ 1 more variable: tolerance_15 <dbl>

For person-period data, there are three key arguments:

- names_from specifies which column (or columns) to get the name of the output columns from—for longitudinal data, this will always be the columns corresponding to time-indicator variables.
- names_prefix adds the specified string to the start of each output column name—for longitudinal data, this will always be the prefix of the time-varying variables separating the variable name from the measurement occasion.
- values_from specifies which column (or columns) to get the cell values from—for longitudinal data, this will always be the columns corresponding to time-varying variables.

To learn more about the principles of tidy data and how pivoting works, see the Data Tidying chapter of R for Data Science.
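Since we now have both pivots, a quick check can confirm that they are inverses of one another. The following snippet is our own addition, not from the text: it round-trips the person-level data through person-period format and back, using relocate() to restore the original column order before comparing.

deviant_tolerance_pl |>
  pivot_longer(
    cols = starts_with("tolerance_"),
    names_to = "age",
    names_prefix = "tolerance_",
    names_transform = as.integer,
    values_to = "tolerance"
  ) |>
  pivot_wider(
    names_from = age,
    names_prefix = "tolerance_",
    values_from = tolerance
  ) |>
  # pivot_wider() appends the widened columns at the end, so restore the
  # original column order before comparing.
  relocate(starts_with("tolerance_"), .after = id) |>
  all.equal(deviant_tolerance_pl)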
2.2 Descriptive analysis of individual change over time

In Section 2.2 Singer and Willett (2003) use the deviant_tolerance_pp data set to demonstrate how the person-period format facilitates exploratory analyses that describe how the individuals in the data set change over time, revealing the nature and idiosyncrasies of each person's temporal pattern of change.

Empirical Growth Plots

Empirical growth plots show, for each individual, the sequence of change in a time-varying variable over time. This change can be evaluated either in absolute terms against the scale of the variable of interest, or in relative terms in comparison to other sample members. Singer and Willett (2003) identify several questions that are helpful to answer when examining empirical growth plots:

- Who is increasing? Who is decreasing?
- Who is increasing the most? The least?
- Who is decreasing the most? The least?
- Does anyone increase and then decrease?
- Does anyone decrease and then increase?

To construct an empirical growth plot with the ggplot2 package, we put the time-indicator on the x-axis, the time-varying variable on the y-axis, and facet each individual into a separate panel.

# Figure 2.2, page 25:
deviant_tolerance_empgrowth <- deviant_tolerance_pp |>
  ggplot(aes(x = age, y = tolerance)) +
  geom_point() +
  coord_cartesian(ylim = c(0, 4)) +
  facet_wrap(vars(id), labeller = label_both)

deviant_tolerance_empgrowth

If the data set is large, Singer and Willett (2003) suggest constructing empirical growth plots for a randomly selected subsample of individuals—perhaps stratified into groups defined by the values of important predictors—rather than using the entire sample. This task can be easily accomplished using the filter() function from the dplyr package prior to plotting. For this example, we sample four random adolescents. Note our use of the set.seed() function prior to sampling, which sets the state of R's random number generator so that the results of the random sample are reproducible.

set.seed(345)

deviant_tolerance_pp |>
  filter(id %in% sample(unique(id), size = 4))
#> # A tibble: 20 × 5
#>    id      age tolerance  male exposure
#>    <fct> <dbl>     <dbl> <dbl>    <dbl>
#>  1 268      11      1.45     1     0.9
#>  2 268      12      1.34     1     0.9
#>  3 268      13      1.99     1     0.9
#>  4 268      14      1.79     1     0.9
#>  5 268      15      1.34     1     0.9
#>  6 442      11      1.45     0     1.13
#>  7 442      12      1.99     0     1.13
#>  8 442      13      1.45     0     1.13
#>  9 442      14      1.67     0     1.13
#> 10 442      15      1.9      0     1.13
#> 11 569      11      1.79     0     1.99
#> 12 569      12      1.9      0     1.99
#> 13 569      13      1.9      0     1.99
#> 14 569      14      1.99     0     1.99
#> 15 569      15      1.99     0     1.99
#> 16 1105     11      1.34     1     1.38
#> 17 1105     12      1.9      1     1.38
#> 18 1105     13      1.99     1     1.38
#> 19 1105     14      1.9      1     1.38
#> 20 1105     15      2.12     1     1.38

This approach can also be extended to randomly select a subsample of individuals within different strata by combining the group_split() function from the dplyr package, to split the data into a list of different groups, with the map() function from the purrr package, to apply the filter() call from the previous example to each group. For this example, we sample two random adolescent males and two random adolescent females, then combine the filtered data frames in the list back together using the list_rbind() function from the purrr package.

set.seed(123)

deviant_tolerance_pp |>
  group_split(male) |>
  map(\(.group) filter(.group, id %in% sample(unique(id), size = 2))) |>
  list_rbind()
#> # A tibble: 20 × 5
#>    id      age tolerance  male exposure
#>    <fct> <dbl>     <dbl> <dbl>    <dbl>
#>  1 442      11      1.45     0     1.13
#>  2 442      12      1.99     0     1.13
#>  3 442      13      1.45     0     1.13
#>  4 442      14      1.67     0     1.13
#>  5 442      15      1.9      0     1.13
#>  6 918      11      1        0     1.21
#>  7 918      12      1        0     1.21
#>  8 918      13      1.22     0     1.21
#>  9 918      14      1.99     0     1.21
#> 10 918      15      1.22     0     1.21
#> 11 268      11      1.45     1     0.9
#> 12 268      12      1.34     1     0.9
#> 13 268      13      1.99     1     0.9
#> 14 268      14      1.79     1     0.9
#> 15 268      15      1.34     1     0.9
#> 16 514      11      1.34     1     0.9
#> 17 514      12      1.67     1     0.9
#> 18 514      13      2.23     1     0.9
#> 19 514      14      2.12     1     0.9
#> 20 514      15      2.44     1     0.9

Using a Trajectory to Summarize Each Person's Empirical Growth Record

Each person's empirical growth record can be summarized by applying one of two standardized approaches:

- The nonparametric approach uses nonparametric smooths to summarize each person's pattern of change over time graphically, without imposing a specific functional form. The primary advantage of the nonparametric approach is that it requires no assumptions.
- The parametric approach uses separate parametric models fit to each person's data to summarize their pattern of change over time. Each model uses a common functional form for the trajectories (e.g., a straight line, a quadratic curve, etc.). The primary advantage of the parametric approach is that it provides numeric summaries of the trajectories that can be used for further exploration.
Singer and Willett (2003) recommend using both approaches—beginning with the nonparametric approach—as examining the smoothed trajectories can help with selecting a common functional form for the trajectories in the parametric approach.

The Nonparametric Approach

The stat_smooth() function can be used to add a nonparametric smooth layer to the empirical growth record plot. The choice of a particular smoothing algorithm is primarily a matter of convenience, so we'll use the default loess smoother. The span argument controls the amount of smoothing for the default loess smoother—with smaller numbers producing wigglier lines and larger numbers producing smoother lines; here we choose a value that creates a smooth similar to the textbook figure.

# Figure 2.3, page 27:
deviant_tolerance_empgrowth +
  stat_smooth(method = "loess", se = FALSE, span = .9)

Singer and Willett (2003) recommend focusing on the elevation, shape, and tilt of the smoothed trajectories, answering questions like:

- Do the scores hover at the low, medium, or high end of the scale?
- Does everyone change over time, or do some people remain the same?
- Do the trajectories have an inflection point or plateau?
- Is the rate of change steep or shallow?
- What is the overall functional form of the trajectories at the group level? Is it linear or curvilinear? Smooth or step-like?

Answering the last question is particularly important, as it will help with selecting a common functional form for the trajectories in the parametric approach.

The Parametric Approach

For the parametric approach, Singer and Willett (2003) suggest using the following three-step process:

1. Estimate a within-person linear model for each person in the data set.
2. Collect the summary statistics from each within-person linear model into a single data set.
3. Add each person's fitted trajectory to the empirical growth record plot.

To begin, we'll use the lmList() function from the lme4 package to fit a common linear model to each adolescent in the data set. The model formula for the lmList() function takes the form response ~ terms | group. We select a straight line as the common functional form for the trajectories, with age centred at age 11.

Next, we'll collect the summary statistics from each within-person linear model into a single data set using the tidy() function from the broom package. However, because lmList() returns a list of models, we need to apply the tidy() call to each model prior to collecting the summary statistics into a single data set. Ironically, we also need to tidy the result of tidy() to prepare the data for plotting.

Finally, we can add each person's fitted trajectory to the empirical growth record plot using the geom_abline() function. However, because we centred age in the linear model, we need to transform the scale of the x-axis in the empirical growth plot to be centred as well—otherwise ggplot2 will not be able to align the fitted trajectories correctly. To do so, we must create a custom transformation object using the new_transform() function from the scales package, which defines the transformation, its inverse, and methods for generating breaks and labels.

Alternatively, if we only plan to examine the parametric trajectories graphically, the three-step process suggested by Singer and Willett (2003) can be skipped altogether by using the stat_smooth() function with the "lm" method.
This approach also fits a within-person linear model for each person in the data set; its drawback is that it makes it awkward (though not impossible) to access the summary statistics of each model.

deviant_tolerance_fit <- lmList(
  tolerance ~ I(age - 11) | id,
  pool = FALSE,
  data = deviant_tolerance_pp
)

# Table 2.2, page 30:
summary(deviant_tolerance_fit)
#> Call:
#>   Model: tolerance ~ I(age - 11) | NULL
#>    Data: deviant_tolerance_pp
#>
#> Coefficients:
#>    (Intercept)
#>      Estimate Std. Error   t value     Pr(>|t|)
#> 9       1.902 0.25194841  7.549165 4.819462e-03
#> 45      1.144 0.13335666  8.578499 3.329579e-03
#> 268     1.536 0.26038049  5.899059 9.725771e-03
#> 314     1.306 0.15265648  8.555156 3.356044e-03
#> 442     1.576 0.20786534  7.581832 4.759898e-03
#> 514     1.430 0.13794927 10.366130 1.915399e-03
#> 569     1.816 0.02572936 70.580844 6.267530e-06
#> 624     1.120 0.04000000 28.000000 1.000014e-04
#> 723     1.268 0.08442748 15.018806 6.407318e-04
#> 918     1.000 0.30444376  3.284679 4.626268e-02
#> 949     1.728 0.24118043  7.164760 5.600382e-03
#> 978     1.028 0.31995000  3.213002 4.884420e-02
#> 1105    1.538 0.15115555 10.174949 2.022903e-03
#> 1542    1.194 0.18032748  6.621287 7.015905e-03
#> 1552    1.184 0.37355321  3.169562 5.049772e-02
#> 1653    0.954 0.13925516  6.850734 6.366647e-03
#>    I(age - 11)
#>      Estimate Std. Error    t value   Pr(>|t|)
#> 9       0.119 0.10285751  1.1569404 0.33105320
#> 45      0.174 0.05444263  3.1960249 0.04948216
#> 268     0.023 0.10629989  0.2163690 0.84257784
#> 314    -0.030 0.06232175 -0.4813729 0.66318168
#> 442     0.058 0.08486067  0.6834733 0.54336337
#> 514     0.265 0.05631755  4.7054602 0.01816360
#> 569     0.049 0.01050397  4.6649040 0.01859462
#> 624     0.020 0.01632993  1.2247449 0.30806801
#> 723    -0.054 0.03446738 -1.5666989 0.21516994
#> 918     0.143 0.12428864  1.1505476 0.33330784
#> 949    -0.098 0.09846150 -0.9953129 0.39294486
#> 978     0.632 0.13061904  4.8384983 0.01683776
#> 1105    0.156 0.06170899  2.5279945 0.08557441
#> 1542    0.237 0.07361839  3.2193045 0.04861002
#> 1552    0.153 0.15250246  1.0032625 0.38965538
#> 1653    0.246 0.05685068  4.3271249 0.02275586

deviant_tolerance_tidy <- deviant_tolerance_fit |>
  map(tidy) |>
  list_rbind(names_to = "id") |>
  mutate(
    id = as.factor(id),
    term = case_when(
      term == "(Intercept)" ~ "intercept",
      term == "I(age - 11)" ~ "slope"
    )
  )

deviant_tolerance_abline <- deviant_tolerance_tidy |>
  select(id:estimate) |>
  pivot_wider(names_from = term, values_from = estimate)

deviant_tolerance_abline
#> # A tibble: 16 × 3
#>    id    intercept   slope
#>    <fct>     <dbl>   <dbl>
#>  1 9         1.90   0.119
#>  2 45        1.14   0.174
#>  3 268       1.54   0.0230
#>  4 314       1.31  -0.0300
#>  5 442       1.58   0.0580
#>  6 514       1.43   0.265
#>  7 569       1.82   0.0490
#>  8 624       1.12   0.0200
#>  9 723       1.27  -0.0540
#> 10 918       1      0.143
#> 11 949       1.73  -0.0980
#> 12 978       1.03   0.632
#> 13 1105      1.54   0.156
#> 14 1542      1.19   0.237
#> 15 1552      1.18   0.153
#> 16 1653      0.954  0.246

transform_centre <- function(subtract) {
  new_transform(
    "centre",
    transform = \(x) x - subtract,
    inverse = \(x) x + subtract
  )
}

# Figure 2.5, page 32:
deviant_tolerance_empgrowth +
  geom_abline(
    aes(intercept = intercept, slope = slope),
    data = deviant_tolerance_abline
  ) +
  scale_x_continuous(transform = transform_centre(11))

deviant_tolerance_empgrowth +
  stat_smooth(method = "lm", se = FALSE)

2.3 Exploring differences in change across people

Having explored individual changes over time, in Section 2.3 Singer and Willett (2003) continue with the deviant_tolerance_pp data set
to demonstrate three strategies for exploring interindividual differences in change:

- Plotting the entire set of individual trajectories together, along with an average change trajectory for the entire group. The individual trajectories can either be compared with one another to examine similarities and differences in changes across people, or with the average change trajectory to compare individual change with group change.
- Conducting descriptive analyses of key model parameters, using the estimated intercepts and slopes of each individual change trajectory model.
- Exploring the relationship between change and time-invariant predictors. This relationship can be explored through both plots and statistical modelling.

Plotting the entire set of trajectories together

The purpose of the first strategy is to answer generic questions about change, such as:

- Is the direction and rate of change similar or different across people?
- How does individual change compare to the group-averaged change trajectory?

For this strategy, Singer and Willett (2003) suggest using both the nonparametric and parametric approaches, as certain patterns in the data may be somewhat easier to interpret with one approach or the other.

deviant_tolerance_grptraj <- map(
  list("loess", "lm"),
  \(.method) {
    deviant_tolerance_pp |>
      mutate(method = .method) |>
      ggplot(mapping = aes(x = age, y = tolerance)) +
      stat_smooth(
        aes(linewidth = "individual", group = id),
        method = .method, se = FALSE, span = .9
      ) +
      stat_smooth(
        aes(linewidth = "average"),
        method = .method, se = FALSE, span = .9
      ) +
      scale_linewidth_manual(values = c(2, .25)) +
      coord_cartesian(ylim = c(0, 4)) +
      facet_wrap(vars(method), labeller = label_both) +
      labs(linewidth = "trajectory")
  }
)

# Figure 2.6, page 34:
wrap_plots(deviant_tolerance_grptraj) +
  plot_layout(guides = "collect", axes = "collect")

Conducting descriptive analyses of key model parameters

The purpose of the second strategy is to answer specific questions about the behaviour of key parameters in the individual change trajectory models, such as:

- What is the average initial status and the average annual rate of change?
- What is the observed variability in initial status and annual rate of change?
- What is the relationship between initial status and annual rate of change?

For this strategy, Singer and Willett (2003) suggest examining the estimated intercepts and slopes of the fitted linear models with the following three summary statistics:

- The sample mean, which summarizes the average initial status (intercept) and annual rate of change (slope) across the sample.
- The sample variance and standard deviation, which summarize the amount of observed interindividual heterogeneity in initial status and annual rate of change.
- The sample correlation, which summarizes the strength and direction of the relationship between initial status and annual rate of change.

The sample mean, variance, and standard deviation can be computed together from the tidied model fits we saved earlier, using a combination of the group_by() and summarise() functions from the dplyr package. The sample correlation needs to be computed in a separate step, as it requires additional transformations of the tidied model fits.
Here we use the correlate() function from the corrr package—part of the tidymodels universe of packages—whose API is designed with data pipelines in mind.

# Table 2.3, page 37:
deviant_tolerance_tidy |>
  group_by(term) |>
  summarise(
    mean = mean(estimate),
    var = var(estimate),
    sd = sd(estimate)
  )
#> # A tibble: 2 × 4
#>   term        mean    var    sd
#>   <chr>      <dbl>  <dbl> <dbl>
#> 1 intercept  1.36  0.0887 0.298
#> 2 slope      0.131 0.0297 0.172

deviant_tolerance_tidy |>
  select(id, term, estimate) |>
  pivot_wider(names_from = term, values_from = estimate) |>
  select(-id) |>
  correlate() |>
  stretch(na.rm = TRUE, remove.dups = TRUE)
#> # A tibble: 1 × 3
#>   x         y          r
#>   <chr>     <chr>  <dbl>
#> 1 intercept slope -0.448

Exploring the relationship between change and time-invariant predictors

The purpose of the final strategy is to answer questions about systematic interindividual differences in change, such as:

- Does the observed (average) initial status and (average) annual rate of change differ across the levels or values of time-invariant predictors?
- What is the relationship between initial status, annual rate of change, and time-invariant predictors?

For this strategy, Singer and Willett (2003) suggest using two approaches:

- Plotting (smoothed) individual growth trajectories, displayed separately for groups distinguished by important values of time-invariant predictors. For categorical predictors, each level of the predictor can be used. For continuous predictors, values can be temporarily categorized for the purpose of display.
- Conducting exploratory analyses of the relationship between change and time-invariant predictors, investigating whether the estimated intercepts and slopes of the individual change trajectory models vary systematically with different predictors.

The plotting approach

For the plotting approach, we can adapt the code we used earlier to plot the entire set of trajectories together, simply changing the variable we'll facet on. Here we facet on the categorical predictor male and the continuous predictor exposure, which we split at its median for the purposes of display.

When examining plots like these, Singer and Willett (2003) recommend looking for systematic patterns in the trajectories, to answer questions like:

- Do the observed trajectories differ across groups?
- Do observed differences appear more in the intercepts or in the slopes?
- Are the observed trajectories of some groups more heterogeneous than others?

If we also wished to conduct descriptive analyses of key model parameters for these groups, we could use the update() function to update and refit the common linear model to different subsets of the data. Here we store the model fits for each subgroup in a list, so they're easier to iterate upon together.
For example, here is a descriptive analysis of the intercepts and slopes for males and females.

deviant_tolerance_grptraj_by <- map(
  list(male = "male", exposure = "exposure"),
  \(.by) {
    deviant_tolerance_pp |>
      mutate(
        exposure = if_else(exposure < median(exposure), "low", "high"),
        exposure = factor(exposure, levels = c("low", "high"))
      ) |>
      ggplot(aes(x = age, y = tolerance)) +
      stat_smooth(
        aes(linewidth = "individual", group = id),
        method = "lm", se = FALSE, span = .9
      ) +
      stat_smooth(
        aes(linewidth = "average"),
        method = "lm", se = FALSE, span = .9
      ) +
      scale_linewidth_manual(values = c(2, .25)) +
      coord_cartesian(ylim = c(0, 4)) +
      facet_wrap(.by, labeller = label_both) +
      labs(linewidth = "trajectory")
  }
)

# Figure 2.7, page 38:
wrap_plots(deviant_tolerance_grptraj_by, ncol = 1, guides = "collect")

tolerance_fit_sex <- list(
  male = update(deviant_tolerance_fit, subset = male == 1),
  female = update(deviant_tolerance_fit, subset = male == 0)
)

tolerance_fit_exposure <- list(
  low = update(deviant_tolerance_fit, subset = exposure < 1.145),
  high = update(deviant_tolerance_fit, subset = exposure >= 1.145)
)

tolerance_fit_sex |>
  map(
    \(.fit_sex) {
      .fit_sex |>
        map(tidy) |>
        list_rbind(names_to = "id") |>
        group_by(term) |>
        summarise(
          mean = mean(estimate),
          sd = sd(estimate)
        )
    }
  ) |>
  list_rbind(names_to = "sex")
#> # A tibble: 4 × 4
#>   sex    term         mean    sd
#>   <chr>  <chr>       <dbl> <dbl>
#> 1 male   (Intercept) 1.36  0.264
#> 2 male   I(age - 11) 0.167 0.238
#> 3 female (Intercept) 1.36  0.338
#> 4 female I(age - 11) 0.102 0.106

The exploratory analysis approach

For the exploratory analysis approach, Singer and Willett (2003) recommend restricting ourselves to the simplest of approaches for examining the relationship between change and time-invariant predictors—bivariate scatter plots and sample correlations. Their reasoning for this restriction is twofold: the statistical models presented in this chapter are intended for descriptive and exploratory purposes only, and their estimates have known biases that make them imperfect measures of each person's true initial status and true rate of change. These models will soon be replaced by the multilevel model for change in Chapter 3, which is better-suited to modelling longitudinal data.

For both the plotting and the computations, we first need to add each adolescent's male and exposure values to the deviant_tolerance_tidy data frame. This is easily done using the left_join() function from the dplyr package, which performs a mutating join to add columns from one data frame to another, matching observations based on keys. Here we join with a selection of columns from the person-level deviant_tolerance_pl data set: specifically, the id column, which exists in both data frames and is thus used for joining, and the two time-invariant predictors male and exposure. We'll also create a new sex variable, which we can use instead of male for plotting.

Now we can create the bivariate scatter plots. Note the use of the .data pronoun inside the call to aes()—the .data pronoun is a special construct from the tidyverse that allows us to treat a character vector of variable names as environment variables, so that they work in the expected way in arguments that use non-standard evaluation. To learn more about the .data pronoun, see the dplyr package's Programming with dplyr vignette.

Finally, we can compute the correlations between the intercepts and slopes of the individual change trajectory models and the time-invariant predictors male and exposure.
Here we use the cor() function rather than corrr::correlate(), since we just want to return correlation values rather than a correlation data frame.

deviant_tolerance_tidy_2 <- deviant_tolerance_tidy |>
  left_join(select(deviant_tolerance_pl, id, male, exposure)) |>
  mutate(sex = if_else(male == 0, "female", "male"))

deviant_tolerance_tidy_2
#> # A tibble: 32 × 9
#>    id    term      estimate std.error statistic p.value  male exposure sex
#>    <fct> <chr>        <dbl>     <dbl>     <dbl>   <dbl> <dbl>    <dbl> <chr>
#>  1 9     intercept   1.90      0.252      7.55  0.00482     0     1.54 female
#>  2 9     slope       0.119     0.103      1.16  0.331       0     1.54 female
#>  3 45    intercept   1.14      0.133      8.58  0.00333     1     1.16 male
#>  4 45    slope       0.174     0.0544     3.20  0.0495      1     1.16 male
#>  5 268   intercept   1.54      0.260      5.90  0.00973     1     0.9  male
#>  6 268   slope       0.0230    0.106      0.216 0.843       1     0.9  male
#>  7 314   intercept   1.31      0.153      8.56  0.00336     0     0.81 female
#>  8 314   slope      -0.0300    0.0623    -0.481 0.663       0     0.81 female
#>  9 442   intercept   1.58      0.208      7.58  0.00476     0     1.13 female
#> 10 442   slope       0.0580    0.0849     0.683 0.543       0     1.13 female
#> # ℹ 22 more rows

deviant_tolerance_biplot <- map(
  list(sex = "sex", exposure = "exposure"),
  \(.x) {
    ggplot(deviant_tolerance_tidy_2, aes(x = .data[[.x]], y = estimate)) +
      geom_point() +
      facet_wrap(vars(term), ncol = 1, scales = "free_y")
  }
)

# Figure 2.8, page 40:
wrap_plots(deviant_tolerance_biplot) +
  plot_layout(axes = "collect")

# Correlation values shown in Figure 2.8, page 40:
deviant_tolerance_tidy_2 |>
  group_by(term) |>
  summarise(
    male_cor = cor(estimate, male),
    exposure_cor = cor(estimate, exposure)
  )
#> # A tibble: 2 × 3
#>   term      male_cor exposure_cor
#>   <chr>        <dbl>        <dbl>
#> 1 intercept  0.00863        0.232
#> 2 slope      0.194          0.442

Chapter 3: Introducing the multilevel model for change

3.1 What Is the Purpose of the Multilevel Model for Change?

In Chapter 3 Singer and Willett (2003) develop and explain the multilevel model for change using a subset of data from Burchinal, Campbell, Bryant, Wasik, and Ramey (1997), who measured the effect of an early educational intervention on cognitive performance in a sample of African-American children at ages 12, 18, and 24 months (i.e., 1.0, 1.5, and 2.0 years).

For this example we use the early_intervention data set, a person-period data frame with 309 rows and 4 columns:

- id: Child ID.
- age: Age in years at time of measurement.
- treatment: Treatment condition (control = 0, intervention = 1).
- cognitive_score: Cognitive performance score on one of two standardized intelligence tests: the Bayley Scales of Infant Development (Bayley, 1969) at 12 and 18 months, and the Stanford Binet (Terman & Merrill, 1972) at 24 months.

Note that for reasons of participant privacy the early_intervention data set uses simulated data rather than the real data used by Singer and Willett (2003), so the examples presented in this article will have similar but not identical results to those presented in the text.

To motivate the need for the multilevel model for change, we begin with a basic exploration and description of the early_intervention data.

# Table 3.1, page 48:
early_intervention
#> # A tibble: 309 × 4
#>    id      age treatment cognitive_score
#>    <fct> <dbl>     <dbl>           <dbl>
#>  1 1       1           1           106.
#>  2 1       1.5         1            91.7
#>  3 1       2           1            74.2
#>  4 2       1           1           112.
#>  5 2       1.5         1           114.
#>  6 2       2           1           119.
#>  7 3       1           0            90.4
#>  8 3       1.5         0            94.7
#>  9 3       2           0            80.4
#> 10 4       1           1           103.
#> # ℹ 299 more rows

Starting with the age variable, we can see that each child's cognitive performance was measured on three occasions at ages 1.0, 1.5, and 2.0 years; thus, the early_intervention data uses a time-structured design.

measurement_occasions <- unique(early_intervention$age)
measurement_occasions
#> [1] 1.0 1.5 2.0

early_intervention |>
  group_by(id) |>
  summarise(all_occasions = identical(age, measurement_occasions)) |>
  pull(all_occasions) |>
  unique()
#> [1] TRUE

Next we'll look at the time-invariant treatment variable. Because we're summarizing a time-invariant predictor, we'll transform the data to person-level format with the pivot_wider() function from the tidyr package before summarizing.

early_intervention_pl <- pivot_wider(
  early_intervention,
  names_from = age,
  names_prefix = "age_",
  values_from = cognitive_score
)

early_intervention_pl
#> # A tibble: 103 × 5
#>    id    treatment age_1 age_1.5 age_2
#>    <fct>     <dbl> <dbl>   <dbl> <dbl>
#>  1 1             1 106.     91.7  74.2
#>  2 2             1 112.    114.  119.
#>  3 3             0  90.4    94.7  80.4
#>  4 4             1 103.    101.   93.9
#>  5 5             1 103.     75.0  71.7
#>  6 6             0 106.     96.8  93.5
#>  7 7             1 136.    117.  119.
#>  8 8             0  79.8    69.3  67.5
#>  9 9             1 113.    105.  108.
#> 10 10            1  88.2    87.5  85.3
#> # ℹ 93 more rows

early_intervention_pl |>
  group_by(treatment) |>
  summarise(count = n()) |>
  mutate(proportion = count / sum(count))
#> # A tibble: 2 × 3
#>   treatment count proportion
#>       <dbl> <int>      <dbl>
#> 1         0    45      0.437
#> 2         1    58      0.563

A total of 58 children (56.3%) were assigned to participate in the early educational intervention, with the remaining 45 children (43.7%) participating in a control group.
As Singer and Willett (2003) discuss, the kind of statistical model needed to represent change processes in longitudinal data like this must include components at two levels:

- A level-1 submodel that describes how individuals change over time, which can address questions of within-person change.
- A level-2 submodel that describes how these changes vary across individuals, which can address questions of between-person differences in change.

Together, these two components form the multilevel model (also known as a linear mixed-effects model or mixed model) for change.

3.2 The Level-1 Submodel for Individual Change

In Section 3.2 Singer and Willett (2003) introduce the level-1 component of the multilevel model for change: the submodel for individual change—also known as the individual growth model—which represents the individual change in the outcome variable that we expect to occur during each time period under study.

The individual growth model specifies a common functional form for the individual trajectories, and Singer and Willett (2003) suggest preceding level-1 submodel specification with a visual inspection of the empirical growth plots, in order to select a parsimonious functional form that the observed data could reasonably have come from. Singer and Willett (2003) identify several questions that are helpful to answer when examining empirical growth plots to aid model specification:

- What type of population individual growth model might have generated the observed data?
- Should the population individual growth model be linear or curvilinear with time? Smooth or jagged? Continuous or disjoint?
- Are any nonlinearities in the observed data consistent across individuals? Might they be due to measurement error or random error?
- Are there few measurement occasions or many?

Because the early_intervention data has only three measurement occasions per individual, and any nonlinearities in the observed data are likely due to measurement error or random error, Singer and Willett (2003) specify a simple linear model for the level-1 submodel:

\[
\text{cognitive_score}_{ij} = \pi_{0i} + \pi_{1i} (\text{age}_{ij} - 1) + \epsilon_{ij},
\]

which asserts that \(\text{cognitive_score}_{ij}\)—the true value of cognitive_score for the \(i\)th child at the \(j\)th time—is a linear function of their age at each measurement occasion, \(\text{age}_{ij}\), and that any deviations from linearity in the observed data across time are the result of random error, \(\epsilon_{ij}\).
Because \(\text{age}\) is centred here as \((\text{age} - 1)\), the model intercept, \(\pi_{0i}\), represents the \(i\)th child's true initial status, that is, their true \(\text{cognitive_score}\) value at age 1. This is preferable to using \(\text{age}\) without centring, where the intercept of the model would instead represent the \(i\)th child's true value of cognitive_score at age 0, meaning that: (1) with uncentred \(\text{age}\) the model must predict beyond the temporal limits of the early_intervention data; and (2) we must assume that the individual trajectories extend back in time to birth, linearly with age. Finally, the model slope, \(\pi_{1i}\), represents the true rate of change in the \(i\)th child's true \(\text{cognitive_score}\) over time; in this case, their true annual rate of change.

set.seed(567)

# Figure 3.1, page 50:
early_intervention |>
  filter(id %in% sample(unique(id), size = 8)) |>
  ggplot(aes(x = age, y = cognitive_score)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  scale_x_continuous(breaks = c(1, 1.5, 2)) +
  coord_cartesian(ylim = c(50, 150)) +
  facet_wrap(vars(id), ncol = 4, labeller = label_both)

Relating the Level-1 Submodel to the Exploratory Methods of Chapter 2

Before fitting the model, we find it helpful to introduce Gelman and Hill's (2006) concepts of complete pooling, no pooling, and partial pooling in relation to the exploratory linear models fit in Chapter 2 to summarize average and individual patterns of change over time.

In the complete pooling model, no group indicators for individuals are included in the model, and a single average trajectory using information from all individuals is fitted. By definition, the complete pooling model ignores variation between individuals, making it unsuitable for analyzing individual change over time. It corresponds to the average trajectory in the group trajectory plot of Section 2.3.

In the no pooling model, separate within-person models are fit to each person's data, resulting in an individual trajectory for each individual. However, because each individual's fitted trajectory ignores information from all other individuals, no pooling models tend to overfit the data within each individual, potentially overstating the variation between individuals and making interindividual differences in initial status and rate of change look different than they actually are. It corresponds to the individual trajectories in the group trajectory plot of Section 2.3.

The partial pooling model represents a compromise between the two extremes of complete pooling and no pooling, wherein we fit a single model, the individual growth model, that takes into account both the information from each individual, on the one hand, and the average across all individuals, on the other, to determine each individual's fitted trajectory. In doing so, the partial pooling approach regularizes each individual's fitted trajectory, pulling extreme initial statuses and rates of change towards the overall average.

We can fit the level-1 submodel for individual change using the lmer() function from the lme4 package. The model formula for the lmer() function takes the form response ~ fixed_effects + random_effects. The random effects term takes the form (varying_effects | group), with the left side of the vertical bar defining the variable(s) allowed to vary across groups, and the right side defining the grouping variable. Note that here, in order to match the maximum likelihood method used by Singer and Willett (2003) in this chapter, we also set the REML argument to FALSE so the model is fit using full maximum likelihood (FML) estimation rather than restricted maximum likelihood (REML).

We can visualize the differences between complete pooling, no pooling, and partial pooling by adding the fitted trajectory from each model to the empirical growth plots of a random sample of children from the early_intervention data.
To do so, we'll first use the augment() function from the broom and broom.mixed packages to get predicted values for each child's fitted trajectory from the complete pooling, no pooling, and partial pooling models. Note that augment() is a generic function, whose methods for linear and multilevel models are available in the broom and broom.mixed packages, respectively. Next, we'll tidy and sample the predicted values data. Finally, we can plot the empirical growth plots of the randomly sampled children, along with the fitted trajectory from the complete pooling, no pooling, and partial pooling models.

Examining these plots, the differences between complete pooling, no pooling, and partial pooling become apparent:

- The complete pooling model estimates a single average trajectory; because it ignores variation between individuals, each child is predicted to have the exact same trajectory.
- The no pooling model estimates a unique trajectory for each individual that closely follows their observed data.
- The partial pooling model estimates a unique trajectory for each individual that sometimes closely follows their observed data, and at other times lies somewhere in between the complete pooling and no pooling trajectories.

Conceptually, the partial pooling model is most similar to the no pooling model: both approaches model the individual change in the outcome variable we expect to occur during the time period under study, allowing individuals to vary in initial status and rate of change. However, because the partial pooling model takes into account the relative amount of information from each individual and the average across all individuals, trajectories are pulled towards the overall average, with a stronger pull the more extreme an individual's observed data.

In the case of the early_intervention data, most children's cognitive performance declined over time, as is apparent from the average trajectory of the complete pooling model and from the entire set of trajectories of the no pooling model viewed together. Although the rate of decline varied across children, few showed improvement. Given this, cases that do show improvement are given less weight in the partial pooling model, and their estimates are pulled more strongly towards the group average. The full extent of this effect can be seen by plotting the entire set of fitted trajectories from the partial pooling model together, along with the model's population average trajectory.

early_intervention_fit_cp <- lm(
  cognitive_score ~ I(age - 1),
  data = early_intervention
)

early_intervention_fit_np <- lmList(
  cognitive_score ~ I(age - 1) | id,
  pool = FALSE,
  data = early_intervention
)

early_intervention_fit_1 <- lmer(
  cognitive_score ~ I(age - 1) + (1 + I(age - 1) | id),
  data = early_intervention,
  REML = FALSE
)

summary(early_intervention_fit_1)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: cognitive_score ~ I(age - 1) + (1 + I(age - 1) | id)
#>    Data: early_intervention
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>   2412.7   2435.1  -1200.4   2400.7      303 
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -2.08234 -0.50220  0.04103  0.53133  2.40849 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr 
#>  id       (Intercept) 137.02   11.706        
#>           I(age - 1)   53.60    7.321   -0.46
#>  Residual              69.24    8.321        
#> Number of obs: 309, groups:  id, 103
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)  109.881      1.375   79.92
#> I(age - 1)   -16.913      1.366  -12.38
#> 
#> Correlation of Fixed Effects:
#>            (Intr)
#> I(age - 1) -0.561

# Because the complete pooling model does not have a group indicator for
# individuals, we need to manually add the IDs to the predicted values.
early_intervention_pred_cp <- early_intervention_fit_cp |>
  augment() |>
  mutate(model = "complete_pooling", id = early_intervention$id)

early_intervention_pred_cp
#> # A tibble: 309 × 10
#>    cognitive_score `I(age - 1)` .fitted .resid    .hat .sigma .cooksd .std.resid
#>  1           106.           0     110.   -4.04 0.00809   13.8 3.52e-4     -0.294
#>  2            91.7          0.5   101.   -9.74 0.00324   13.8 8.11e-4     -0.707
#>  3            74.2          1      93.0 -18.7  0.00809   13.8 7.58e-3     -1.36 
#>  4           112.           0     110.    2.33 0.00809   13.8 1.17e-4      0.169
#>  5           114.           0.5   101.   12.5  0.00324   13.8 1.33e-3      0.905
#>  6           119.           1      93.0  26.3  0.00809   13.7 1.49e-2      1.91 
#>  7            90.4          0     110.  -19.5  0.00809   13.8 8.19e-3     -1.42 
#>  8            94.7          0.5   101.   -6.69 0.00324   13.8 3.82e-4     -0.485
#>  9            80.4          1      93.0 -12.6  0.00809   13.8 3.42e-3     -0.916
#> 10           103.           0     110.   -6.77 0.00809   13.8 9.88e-4     -0.492
#> # ℹ 299 more rows
#> # ℹ 2 more variables: model, id

# Because the no pooling models are separate linear models stored in a list, we
# need to apply the augment() call to each model, then bind the predicted values
# from each model into a single data set. Here the individual ID for each model
# is stored in the name of the list entry, which we add to the data frame using
# the `names_to` argument of list_rbind().
early_intervention_pred_np <- early_intervention_fit_np |>
  map(augment) |>
  list_rbind(names_to = "id") |>
  mutate(model = "no_pooling")

early_intervention_pred_np
#> # A tibble: 309 × 10
#>    id    cognitive_score `I(age - 1)` .fitted .resid  .hat .sigma .cooksd
#>  1 1               106.           0     106.  -0.551 0.833    NaN    2.50
#>  2 1                91.7          0.5    90.6  1.10  0.333    NaN    0.25
#>  3 1                74.2          1      74.8 -0.551 0.833    Inf    2.50
#>  4 2               112.           0     112.   0.610 0.833    NaN    2.50
#>  5 2               114.           0.5   115.  -1.22  0.333    NaN    0.25
#>  6 2               119.           1     119.   0.610 0.833    Inf    2.50
#>  7 3                90.4          0      93.5 -3.12  0.833    NaN    2.50
#>  8 3                94.7          0.5    88.5  6.23  0.333    NaN    0.25
#>  9 3                80.4          1      83.5 -3.12  0.833    Inf    2.50
#> 10 4               103.           0     104.  -0.686 0.833    NaN    2.50
#> # ℹ 299 more rows
#> # ℹ 2 more variables: .std.resid, model

# Nothing special needs to be done for the partial pooling model, aside from
# having the broom.mixed package loaded.
early_intervention_pred_pp <- early_intervention_fit_1 |>
  augment() |>
  mutate(model = "partial_pooling")

early_intervention_pred_pp
#> # A tibble: 309 × 15
#>    cognitive_score `I(age - 1)` id    .fitted .resid  .hat .cooksd .fixed   .mu
#>  1           106.           0   1       103.   3.11  0.439  0.0974   110.  103.
#>  2            91.7          0.5 1        92.6 -0.939 0.276  0.00336  101.   92.6
#>  3            74.2          1   1        82.5 -8.29  0.395  0.535     93.0  82.5
#>  4           112.           0   2       118.  -5.90  0.439  0.352    110.  118. 
#>  5           114.           0.5 2       112.   1.42  0.276  0.00765  101.  112. 
#>  6           119.           1   2       107.  12.4   0.395  1.20      93.0 107. 
#>  7            90.4          0   3        97.7 -7.34  0.439  0.543    110.   97.7
#>  8            94.7          0.5 3        90.7  4.07  0.276  0.0632   101.   90.7
#>  9            80.4          1   3        83.6 -3.21  0.395  0.0803    93.0  83.6
#> 10           103.           0   4       107.  -3.71  0.439  0.139    110.  107. 
#> # ℹ 299 more rows
#> # ℹ 6 more variables: .offset, .sqrtXwt, .sqrtrwt, .weights, .wtres, model

# Finally, we can bind the predicted values from the models into a single data
# frame.
early_intervention_preds <- bind_rows(
  early_intervention_pred_cp,
  early_intervention_pred_np,
  early_intervention_pred_pp
)

set.seed(333)

early_intervention_preds_tidy <- early_intervention_preds |>
  select(model, id, cognitive_score, age = `I(age - 1)`, .fitted) |>
  mutate(
    id = factor(id, levels = unique(id)),
    age = as.numeric(age + 1)
  ) |>
  filter(id %in% sample(unique(id), size = 8))

early_intervention_preds_tidy
#> # A tibble: 72 × 5
#>    model            id    cognitive_score   age .fitted
#>  1 complete_pooling 2               112.    1     110. 
#>  2 complete_pooling 2               114.    1.5   101. 
#>  3 complete_pooling 2               119.    2      93.0
#>  4 complete_pooling 6               106.    1     110. 
#>  5 complete_pooling 6                96.8   1.5   101. 
#>  6 complete_pooling 6                93.5   2      93.0
#>  7 complete_pooling 14               88.7   1     110. 
#>  8 complete_pooling 14               97.6   1.5   101. 
#>  9 complete_pooling 14               81.3   2      93.0
#> 10 complete_pooling 39              111.    1     110. 
#> # ℹ 62 more rows

ggplot(early_intervention_preds_tidy, aes(x = age, group = id)) +
  geom_point(aes(y = cognitive_score)) +
  geom_line(aes(y = .fitted, colour = model, group = model), linewidth = .75) +
  scale_x_continuous(breaks = c(1, 1.5, 2)) +
  scale_colour_brewer(palette = "Dark2") +
  coord_cartesian(ylim = c(50, 150)) +
  facet_wrap(vars(id), nrow = 2, labeller = label_both)

# Figure 3.3, page 57:
early_intervention |>
  ggplot(mapping = aes(x = age, y = cognitive_score)) +
  stat_smooth(
    aes(linewidth = "no_pooling", group = id),
    method = "lm", se = FALSE, span = .9
  ) +
  stat_smooth(
    aes(linewidth = "complete_pooling"),
    method = "lm", se = FALSE, span = .9
  ) +
  scale_linewidth_manual(values = c(2, .25)) +
  scale_x_continuous(breaks = c(1, 1.5, 2)) +
  coord_cartesian(ylim = c(50, 150)) +
  labs(linewidth = "model")

early_intervention_fit_1 |>
  augment() |>
  select(-cognitive_score) |>
  rename(cognitive_score = .fitted, age = `I(age - 1)`) |>
  mutate(age = as.numeric(age + 1)) |>
  ggplot(aes(x = age, y = cognitive_score)) +
  geom_line(aes(linewidth = "individual", group = id), colour = "#3366FF") +
  # We'll use predict() rather than augment() to get the population-level
  # predictions, due to some currently bad behaviour in augment() when making
  # predictions on new data: https://github.com/bbolker/broom.mixed/issues/141
  geom_line(
    aes(linewidth = "average"),
    data = tibble(
      age = measurement_occasions,
      cognitive_score = predict(
        early_intervention_fit_1,
        tibble(age = measurement_occasions),
        re.form = NA
      )
    ),
    colour = "#3366FF"
  ) +
  scale_linewidth_manual(values = c(2, .25)) +
  scale_x_continuous(breaks = c(1, 1.5, 2)) +
  coord_cartesian(ylim = c(50, 150)) +
  labs(linewidth = "trajectory")

3.3 The Level-2 Submodel for Systematic Interindividual Differences in Change

In Section 3.3 Singer and Willett (2003) introduce the level-2 component of the multilevel model for change, the submodel for systematic interindividual differences in change, which is defined by four specific features:

- Its outcomes must be the individual growth parameters of the level-1 submodel, \(\pi_{0i}\) and \(\pi_{1i}\).
- There must be one level-2 submodel for each level-1 individual growth parameter, and each must be written as a separate part.
- Each level-2 submodel must specify a relationship between an individual growth parameter and the level-2 time-invariant predictors.
- Each level-2 submodel must allow individuals who share common predictor values to vary in their individual change trajectories.

Because the level-2 submodel must simultaneously account for between-group differences in the individual growth parameters and within-group differences in change, Singer and Willett (2003) suggest preceding level-2 submodel specification with visual inspection of the individual growth trajectories stratified by levels of the time-invariant predictor(s), in order to identify what kind of population model could give rise to the observed patterns. For the early_intervention data, which has a single time-invariant predictor, Singer and Willett (2003) specify the following level-2 submodel:

\[
\begin{align}
\pi_{0i} &= \gamma_{00} + \gamma_{01} \text{treatment}_i + \zeta_{0i} \\
\pi_{1i} &= \gamma_{10} + \gamma_{11} \text{treatment}_i + \zeta_{1i},
\end{align}
\]

which asserts that the individual growth parameters of the level-1 submodel, \(\pi_{0i}\) and \(\pi_{1i}\), are treated as level-2 outcomes that are a linear function of the \(i\)th child's treatment status, \(\text{treatment}_i\).
The parameters \(\gamma_{00}\) and \(\gamma_{10}\) are the level-2 intercepts; the parameters \(\gamma_{01}\) and \(\gamma_{11}\) are the level-2 slopes. Collectively, these four level-2 parameters are known as the fixed effects. Finally, the parameters \(\zeta_{0i}\) and \(\zeta_{1i}\) are level-2 residuals that allow the value of each individual's growth parameters to be scattered around their respective population averages. Collectively, these two level-2 parameters are known as the random effects, which we assume are bivariate normally distributed with mean 0, unknown variances, \(\sigma_0^2\) and \(\sigma_1^2\), and unknown covariance, \(\sigma_{01}\):

\[
\begin{align}
\begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix}
& \sim \operatorname{N}
\begin{pmatrix}
\begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{bmatrix}
\end{pmatrix}.
\end{align}
\]

# Figure 3.4, page 59:
ggplot(early_intervention, mapping = aes(x = age, y = cognitive_score)) +
  stat_smooth(
    aes(linewidth = "individual", group = id),
    method = "lm", se = FALSE, span = .9
  ) +
  stat_smooth(
    aes(linewidth = "average"),
    method = "lm", se = FALSE, span = .9
  ) +
  scale_linewidth_manual(values = c(2, .25)) +
  scale_x_continuous(breaks = c(1, 1.5, 2)) +
  coord_cartesian(ylim = c(50, 150)) +
  facet_wrap(vars(treatment), labeller = label_both) +
  labs(linewidth = "trajectory")

# TODO: Decide whether or not to do the bottom plots.
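Although we have not yet added treatment to the model, the random-effect structure above is already present in the level-1 fit from Section 3.2. As a minimal sketch of our own (not from the text), we can inspect its estimated variance components with lme4's VarCorr() function:

# A sketch (not in the original article): extract the estimated random-effect
# parameters from the level-1 fit of Section 3.2. VarCorr() reports standard
# deviations and correlations; as.data.frame() converts them to variances and
# covariances, the scale of sigma_0^2, sigma_1^2, and sigma_01 above.
VarCorr(early_intervention_fit_1)

as.data.frame(VarCorr(early_intervention_fit_1))
# Per the summary() output shown earlier, these should be approximately
# 137.02 (intercept variance) and 53.60 (slope variance), with a correlation
# of -0.46, plus a residual (level-1) variance of 69.24.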
3.4 Fitting the Multilevel Model for Change to Data

Putting the level-1 and level-2 submodels together, the multilevel model for change for the early_intervention data looks like:

\[
\begin{alignat}{3}
& \text{Level 1:} \qquad & \text{cognitive_score}_{ij} &= \pi_{0i} + \pi_{1i} (\text{age}_{ij} - 1) + \epsilon_{ij} \\
& \text{Level 2:} \qquad & \pi_{0i} &= \gamma_{00} + \gamma_{01} \text{treatment}_i + \zeta_{0i} \\
& & \pi_{1i} &= \gamma_{10} + \gamma_{11} \text{treatment}_i + \zeta_{1i}.
\end{alignat}
\]

Before fitting the model, we find it helpful to substitute the level-2 equations into the level-1 equation, yielding a single reduced equation. Although this mixed model equation is mathematically identical to the multilevel model equation above, substituting the equations is a helpful practice for two reasons: (1) it makes the parameters of the model easier to identify, particularly in the case of interactions between level-1 and level-2 predictors; and (2) it is the format that mixed-effects modelling packages in R use to specify the model formula, so it clarifies the statistical model actually being fit to the data by the software.

\[
\begin{align}
\text{cog}_{ij} &= \pi_{0i} + \pi_{1i} (\text{age}_{ij} - 1) + \epsilon_{ij} \\
&= \gamma_{00} + \gamma_{01} \text{trt}_i + \pi_{1i} (\text{age}_{ij} - 1) + \epsilon_{ij} + \zeta_{0i} \\
&= \gamma_{00} + \gamma_{01} \text{trt}_i + (\gamma_{10} + \gamma_{11} \text{trt}_i + \zeta_{1i})(\text{age}_{ij} - 1) + \epsilon_{ij} + \zeta_{0i} \\
&= \underbrace{
  \gamma_{00} + \gamma_{01} \text{trt}_i + \gamma_{10}(\text{age}_{ij} - 1) + \gamma_{11} \text{trt}_i(\text{age}_{ij} - 1)
}_{\text{Fixed Effects}}
+ \underbrace{
  \epsilon_{ij} + \zeta_{0i} + \zeta_{1i}(\text{age}_{ij} - 1)
}_{\text{Random Effects}}.
\end{align}
\]

We can now fit the multilevel model for change to the early_intervention data. Based on the equation above, we need to update the level-1 submodel to include predictors for treatment and the interaction between treatment and age. Alternatively, starting from scratch, we can specify the final multilevel model for change like so.

update(
  early_intervention_fit_1, . ~ . + treatment + treatment:I(age - 1)
)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: cognitive_score ~ I(age - 1) + (1 + I(age - 1) | id) + treatment +
#>     I(age - 1):treatment
#>    Data: early_intervention
#>       AIC       BIC    logLik  deviance  df.resid 
#>  2402.540  2432.407 -1193.270  2386.540       301 
#> Random effects:
#>  Groups   Name        Std.Dev. Corr 
#>  id       (Intercept) 11.564        
#>           I(age - 1)   6.754   -0.57
#>  Residual              8.321        
#> Number of obs: 309, groups:  id, 103
#> Fixed Effects:
#>          (Intercept)            I(age - 1)             treatment  
#>              107.822               -20.123                 3.657  
#> I(age - 1):treatment  
#>                5.702

early_intervention_fit <- lmer(
  cognitive_score ~ I(age - 1) * treatment + (1 + I(age - 1) | id),
  data = early_intervention,
  REML = FALSE
)

# Table 3.3, page 69:
summary(early_intervention_fit)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: cognitive_score ~ I(age - 1) * treatment + (1 + I(age - 1) | id)
#>    Data: early_intervention
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>   2402.5   2432.4  -1193.3   2386.5      301 
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -2.04567 -0.48714  0.04639  0.53367  2.32828 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr 
#>  id       (Intercept) 133.74   11.564        
#>           I(age - 1)   45.61    6.754   -0.57
#>  Residual              69.24    8.321        
#> Number of obs: 309, groups:  id, 103
#> 
#> Fixed effects:
#>                      Estimate Std. Error t value
#> (Intercept)           107.822      2.063  52.276
#> I(age - 1)            -20.123      2.023  -9.949
#> treatment               3.657      2.749   1.330
#> I(age - 1):treatment    5.702      2.695   2.116
#> 
#> Correlation of Fixed Effects:
#>             (Intr) I(g-1) trtmnt
#> I(age - 1)  -0.605              
#> treatment   -0.750  0.454       
#> I(g-1):trtm  0.454 -0.750 -0.605

3.5 Examining Estimated Fixed Effects

In Section 3.5 Singer and Willett (2003) explain two ways to interpret the fixed effects estimates of the multilevel model for change:

- Interpreting the fixed effects coefficients directly.
- Plotting fitted change trajectories for prototypical individuals.

Interpreting fixed effects coefficients directly

The fixed effects estimates of the multilevel model for change can be interpreted directly, in the same way as any regression coefficient. Thus, for the multilevel model for change for the early_intervention data, the fixed effects estimates are interpreted as follows:

- \(\gamma_{00}\): The model intercept. This parameter estimates the population average true initial status for children in the control group.
- \(\gamma_{01}\): The coefficient for treatment. This parameter estimates the difference in population average true initial status between children in the treatment group and children in the control group.
- \(\gamma_{10}\): The coefficient for age. This parameter estimates the population average annual rate of true change for children in the control group.
- \(\gamma_{11}\): The coefficient for the interaction between age and treatment. This parameter estimates the difference in the population average annual rate of true change between children in the treatment group and children in the control group.

For our model, we can view the estimates with the summary() function.
We can also access the estimates programmatically, using either the generic fixef() function to return a vector of estimates, or the tidy() function from the broom.mixed package to return a tibble.

summary(early_intervention_fit)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: cognitive_score ~ I(age - 1) * treatment + (1 + I(age - 1) | id)
#>    Data: early_intervention
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>   2402.5   2432.4  -1193.3   2386.5      301 
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -2.04567 -0.48714  0.04639  0.53367  2.32828 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr 
#>  id       (Intercept) 133.74   11.564        
#>           I(age - 1)   45.61    6.754   -0.57
#>  Residual              69.24    8.321        
#> Number of obs: 309, groups:  id, 103
#> 
#> Fixed effects:
#>                      Estimate Std. Error t value
#> (Intercept)           107.822      2.063  52.276
#> I(age - 1)            -20.123      2.023  -9.949
#> treatment               3.657      2.749   1.330
#> I(age - 1):treatment    5.702      2.695   2.116
#> 
#> Correlation of Fixed Effects:
#>             (Intr) I(g-1) trtmnt
#> I(age - 1)  -0.605              
#> treatment   -0.750  0.454       
#> I(g-1):trtm  0.454 -0.750 -0.605

fixef(early_intervention_fit)
#>          (Intercept)           I(age - 1)            treatment 
#>           107.821683           -20.123396             3.656602 
#> I(age - 1):treatment 
#>             5.702077

tidy(early_intervention_fit, effects = "fixed")
#> # A tibble: 4 × 5
#>   effect term                 estimate std.error statistic
#> 1 fixed  (Intercept)            108.        2.06     52.3 
#> 2 fixed  I(age - 1)             -20.1       2.02     -9.95
#> 3 fixed  treatment                3.66      2.75      1.33
#> 4 fixed  I(age - 1):treatment     5.70      2.70      2.12
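To see how these coefficients combine, here is a small sketch of our own (not from the text) that reconstructs each group's fitted intercept and slope directly from fixef():

# A sketch (not in the original article): combine the fixed effects by hand
# to recover each group's fitted initial status and annual rate of change.
gammas <- fixef(early_intervention_fit)

# Control group (treatment = 0):
gammas[["(Intercept)"]]  # initial status, ~107.8
gammas[["I(age - 1)"]]   # annual rate of change, ~-20.1

# Treatment group (treatment = 1): add the treatment contrasts.
gammas[["(Intercept)"]] + gammas[["treatment"]]             # ~111.5
gammas[["I(age - 1)"]] + gammas[["I(age - 1):treatment"]]   # ~-14.4

These hand-computed values match the endpoints of the prototypical trajectories predicted in the next section.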
Plotting fitted change trajectories for prototypical individuals

Another way of interpreting the fixed effects is to plot fitted change trajectories for prototypical individuals, using the fixed effects estimates to make predictions. We can do this using the following three-step process:

- Construct a data set of prototypical individuals.
- Predict fitted change trajectories for the prototypical individuals using the fixed effects estimates.
- Plot the fitted change trajectories.

Depending on the complexity of the multilevel model for change, and the number of prototypical individuals we wish to examine, there are a number of ways to construct the data set of prototypical individuals. The simplest way is to construct the data set by hand using, for example, the tibble() or tribble() functions from the tibble package. However, because prototypical individuals often simply represent unique combinations of different predictor values, it is often more convenient to construct the data set using the expand_grid() or crossing() functions from the tidyr package, which expand a data frame to include all possible combinations of values. The difference between these functions is that crossing() is a wrapper around expand_grid() that de-duplicates and sorts its inputs. For the early_intervention multilevel model for change, only two prototypical individuals are possible: a child in the treatment group (treatment = 1) and a child in the control group (treatment = 0).

To make predictions using only the fixed effects estimates, we set the re.form argument of the predict() function to NA. As noted in an earlier code comment, we use predict() rather than augment() to get these predictions, due to some currently bad behaviour in augment.merMod() when making predictions on new data for certain models (although for this specific example the augment() approach would have worked fine). Finally, we can plot the fitted change trajectories as usual.

prototypical_children <- crossing(treatment = c(0, 1), age = c(1, 1.5, 2))

prototypical_children
#> # A tibble: 6 × 2
#>   treatment   age
#> 1         0   1  
#> 2         0   1.5
#> 3         0   2  
#> 4         1   1  
#> 5         1   1.5
#> 6         1   2  

prototypical_children <- prototypical_children |>
  mutate(
    treatment = factor(treatment),
    cognitive_score = predict(
      early_intervention_fit,
      newdata = prototypical_children,
      re.form = NA
    )
  )

prototypical_children
#> # A tibble: 6 × 3
#>   treatment   age cognitive_score
#> 1 0           1             108. 
#> 2 0           1.5            97.8
#> 3 0           2              87.7
#> 4 1           1             111. 
#> 5 1           1.5           104. 
#> 6 1           2              97.1

# Figure 3.5, page 71:
ggplot(prototypical_children, aes(x = age, y = cognitive_score)) +
  geom_line(aes(linetype = treatment, group = treatment)) +
  scale_x_continuous(breaks = c(1, 1.5, 2)) +
  coord_cartesian(ylim = c(50, 150))

Chapter 4: Doing data analysis with the multilevel model for change

4.1 Example: Changes in adolescent alcohol use

In Chapter 4 Singer and Willett (2003) delve deeper into the specification, estimation, and interpretation of the multilevel model for change using a subset of data from Curran, Stice, and Chassin (1997), who measured the relation between changes in alcohol use and changes in peer alcohol use over a 3-year period in a community-based sample of Hispanic and Caucasian adolescents.

For this example we use the alcohol_use_1 data set, a person-period data frame with 246 rows and 6 columns:

- id: Adolescent ID.
- age: Age in years at the time of measurement.
- child_of_alcoholic: Binary indicator for whether the adolescent is a child of an alcoholic parent.
- male: Binary indicator for whether the adolescent is male.
- alcohol_use: Square root of the summed scores of four eight-point items measuring frequency of alcohol use.
- peer_alcohol_use: Square root of the summed scores of two six-point items measuring frequency of peer alcohol use.

To inform specification of the multilevel models for change fit in subsequent sections, we begin with a basic exploration and description of the alcohol_use_1 data. Starting with the age variable, we can see that each adolescent was measured on three occasions at ages 14, 15, and 16 years. Next we'll look at the time-invariant male and child_of_alcoholic variables. Because we're summarizing time-invariant predictors, we'll transform the data to a person-level format with the pivot_wider() function from the tidyr package before summarizing. A total of 42 adolescents (51.2%) are male and 40 (48.8%) are female; and a total of 37 adolescents (45.1%) are children of an alcoholic parent, while 45 (54.9%) are not.

To inform specification of the level-1 submodel, we can look at empirical growth plots of a random sample of adolescents as usual. Finally, to inform specification of the level-2 submodel, we can look at coincident growth trajectories, which are simply the usual individual growth trajectories summarized by the number of individuals sharing each trajectory, displayed separately for groups distinguished by important values of time-invariant predictors. Here we look at two time-invariant predictors: child_of_alcoholic and peer_alcohol_use. Because peer_alcohol_use is a continuous variable, we split it at the sample mean for the purpose of display.

To plot the coincident growth trajectories, we first need to summarize, for each predictor, the number of individuals sharing each trajectory. The easiest way to do this is to count the number of groups for each trajectory pattern at each level of the time-invariant predictors using the person-level data, before tidying the coincident trajectory summary back to person-period format. Afterwards we can plot as usual, with the addition of a linewidth aesthetic for the coincident trajectory counts. Note that this plot differs slightly from the text, since unlike Singer and Willett (2003) we use the entire sample instead of a random sample.
Based on these exploratory analyses, Singer and Willett (2003) posited the following multilevel model for change for the alcohol_use_1 data:

\[
\begin{alignat}{3}
& \text{Level 1:} \qquad & \text{alcohol_use}_{ij} &= \pi_{0i} + \pi_{1i} (\text{age}_{ij} - 14) + \epsilon_{ij} \\
& \text{Level 2:} \qquad & \pi_{0i} &= \gamma_{00} + \gamma_{01} \text{child_of_alcoholic}_i + \zeta_{0i} \\
& & \pi_{1i} &= \gamma_{10} + \gamma_{11} \text{child_of_alcoholic}_i + \zeta_{1i},
\end{alignat}
\]

where the model parameters follow the definitions and interpretations discussed in Chapter 3.

alcohol_use_1
#> # A tibble: 246 × 6
#>    id      age child_of_alcoholic  male alcohol_use peer_alcohol_use
#>  1 1        14                  1     0        1.73            1.26 
#>  2 1        15                  1     0        2               1.26 
#>  3 1        16                  1     0        2               1.26 
#>  4 2        14                  1     1        0               0.894
#>  5 2        15                  1     1        0               0.894
#>  6 2        16                  1     1        1               0.894
#>  7 3        14                  1     1        1               0.894
#>  8 3        15                  1     1        2               0.894
#>  9 3        16                  1     1        3.32            0.894
#> 10 4        14                  1     1        0               1.79 
#> # ℹ 236 more rows

measurement_occasions <- unique(alcohol_use_1$age)

measurement_occasions
#> [1] 14 15 16

alcohol_use_1 |>
  group_by(id) |>
  summarise(all_occasions = identical(age, measurement_occasions)) |>
  pull(all_occasions) |>
  unique()
#> [1] TRUE

alcohol_use_1_pl <- pivot_wider(
  alcohol_use_1,
  names_from = age,
  names_prefix = "alcohol_use_",
  values_from = alcohol_use
)

alcohol_use_1_pl
#> # A tibble: 82 × 7
#>    id    child_of_alcoholic  male peer_alcohol_use alcohol_use_14 alcohol_use_15
#>  1 1                      1     0            1.26            1.73           2   
#>  2 2                      1     1            0.894           0              0   
#>  3 3                      1     1            0.894           1              2   
#>  4 4                      1     1            1.79            0              2   
#>  5 5                      1     0            0.894           0              0   
#>  6 6                      1     1            1.55            3              3   
#>  7 7                      1     0            1.55            1.73           2.45
#>  8 8                      1     1            0               0              0   
#>  9 9                      1     1            0               0              1   
#> 10 10                     1     0            2               1              1   
#> # ℹ 72 more rows
#> # ℹ 1 more variable: alcohol_use_16

map(
  list(male = "male", child_of_alcoholic = "child_of_alcoholic"),
  \(.x) {
    alcohol_use_1_pl |>
      group_by(.data[[.x]]) |>
      summarise(count = n()) |>
      mutate(proportion = count / sum(count))
  }
)
#> $male
#> # A tibble: 2 × 3
#>    male count proportion
#> 1     0    40      0.488
#> 2     1    42      0.512
#> 
#> $child_of_alcoholic
#> # A tibble: 2 × 3
#>   child_of_alcoholic count proportion
#> 1                  0    45      0.549
#> 2                  1    37      0.451

# Figure 4.1, page 77:
alcohol_use_1 |>
  filter(id %in% c(4, 14, 23, 32, 41, 56, 65, 82)) |>
  ggplot(aes(x = age, y = alcohol_use)) +
  stat_smooth(method = "lm", se = FALSE) +
  geom_point() +
  coord_cartesian(xlim = c(13, 17), ylim = c(0, 4)) +
  facet_wrap(vars(id), ncol = 4, labeller = label_both)

alcohol_use_1_pl <- alcohol_use_1_pl |>
  mutate(
    peer_alcohol_use_split = if_else(
      peer_alcohol_use < mean(peer_alcohol_use),
      true = "low",
      false = "high"
    ),
    peer_alcohol_use_split = factor(
      peer_alcohol_use_split, levels = c("low", "high")
    )
  )

alcohol_use_1_pl
#> # A tibble: 82 × 8
#>    id    child_of_alcoholic  male peer_alcohol_use alcohol_use_14 alcohol_use_15
#>  1 1                      1     0            1.26            1.73           2   
#>  2 2                      1     1            0.894           0              0   
#>  3 3                      1     1            0.894           1              2   
#>  4 4                      1     1            1.79            0              2   
#>  5 5                      1     0            0.894           0              0   
#>  6 6                      1     1            1.55            3              3   
#>  7 7                      1     0            1.55            1.73           2.45
#>  8 8                      1     1            0               0              0   
#>  9 9                      1     1            0               0              1   
#> 10 10                     1     0            2               1              1   
#> # ℹ 72 more rows
#> # ℹ 2 more variables: alcohol_use_16, peer_alcohol_use_split

alcohol_use_1_cotraj <- map(
  list("child_of_alcoholic", "peer_alcohol_use_split"),
  \(.x) {
    # Wrangle
    .coincident_trajectories <- alcohol_use_1_pl |>
      group_by(.data[[.x]], pick(starts_with("alcohol_use"))) |>
      summarise(coincident_trajectories = n(), .groups = "drop") |>
      mutate(trajectory_id = 1:n(), .before = everything()) |>
      pivot_longer(
        cols = starts_with("alcohol_use"),
        names_to = "age",
        names_prefix = "alcohol_use_",
        names_transform = as.integer,
        values_to = "alcohol_use"
      )
    
    # Plot
    ggplot(.coincident_trajectories, aes(x = age, y = alcohol_use)) +
      stat_smooth(
        aes(group = trajectory_id, linewidth = coincident_trajectories),
        method = "lm", se = FALSE
      ) +
      scale_linewidth_continuous(limits = c(1, 22), range = c(.25, 4)) +
      coord_cartesian(xlim = c(13, 17), ylim = c(0, 4)) +
      facet_wrap(vars(.data[[.x]]), labeller = label_both)
  }
)

# Figure 4.2, page 79:
wrap_plots(alcohol_use_1_cotraj, ncol = 1, guides = "collect")

4.2 The composite specification of the multilevel model for change

In Section 4.2 Singer and Willett (2003) introduce what they call the composite multilevel model for change, which we prematurely introduced in the Chapter 3 examples as the mixed model specification. By substituting the level-2 equations into the level-1 equation, the composite multilevel model for change for the alcohol_use_1 data looks like:

\[
\text{alcohol_use}_{ij} =
\underbrace{
  \gamma_{00} + \gamma_{01} \text{coa}_i + \gamma_{10}(\text{age}_{ij} - 14) + \gamma_{11} \text{coa}_i(\text{age}_{ij} - 14)
}_{\text{Fixed Effects}}
+ \underbrace{
  \epsilon_{ij} + \zeta_{0i} + \zeta_{1i}(\text{age}_{ij} - 14)
}_{\text{Random Effects}}.
\]

4.3 Methods of Estimation, Revisited

In Section 4.3 Singer and Willett (2003) discuss two methods of estimation available for frequentist multilevel models, which must be chosen between when fitting the model:

- Generalized least squares (GLS), an extension of ordinary least-squares estimation that allows the residuals to be autocorrelated and heteroscedastic. GLS estimates are obtained by minimizing a weighted function of the residuals. The gls() function from the nlme package can be used to fit the multilevel model for change using GLS.
- Maximum likelihood (ML), a general approach not limited to linear regression models. ML estimates are obtained by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. As previously demonstrated, the lmer() function from the lme4 package can be used to fit the multilevel model for change using ML.

Because generalized least squares and maximum likelihood estimation use different procedures to fit the multilevel model for change, their estimates may differ when fitting the same model to the same data; however, when the normal distribution assumptions required for maximum likelihood estimation hold, the estimates are equivalent.

Additionally, maximum likelihood estimation can be distinguished into two types: full and restricted. As Singer and Willett (2003) explain, in full maximum likelihood (FML) the likelihood of the sample data is maximized, and goodness-of-fit statistics refer to the fit of the entire model (fixed and random effects); in restricted maximum likelihood (REML) the likelihood of the sample residuals is maximized, and goodness-of-fit statistics refer to the fit of the random effects only. Consequently, statistical tests comparing goodness-of-fit statistics between FML models can be used to test hypotheses about either fixed or random effect parameters, whereas those between REML models can only be used to test hypotheses about random effect parameters.
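To make the FML/REML distinction concrete, here is a minimal sketch of our own (not from the text) that refits the Chapter 3 model under both estimation methods; the object name early_intervention_fit_reml is ours:

# A sketch (not in the original article): refit the Chapter 3 model with
# restricted maximum likelihood and compare it against the FML fit.
early_intervention_fit_reml <- update(early_intervention_fit, REML = TRUE)

# The two fits maximize different likelihoods (sample data versus sample
# residuals), so their log-likelihoods and information criteria are not
# directly comparable.
logLik(early_intervention_fit)       # FML; -1193.3 per the summary above
logLik(early_intervention_fit_reml)  # REML; will differ

# Fixed effects estimates are typically similar under either method.
cbind(
  FML = fixef(early_intervention_fit),
  REML = fixef(early_intervention_fit_reml)
)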
4.4 First Steps: Fitting Two Unconditional Multilevel Models for Change

In Section 4.4 Singer and Willett (2003) introduce a new model building workflow for the multilevel model for change, which begins by fitting two unconditional multilevel models for change that do not include any substantive predictors:

- The unconditional means model, which partitions and quantifies the total variation in the outcome variable across individuals, without regard to time. We fit this model first to determine whether the variance components \(\epsilon_{ij}\) and \(\zeta_{0i}\) show sufficient variation within individuals (level 1) and between individuals (level 2), respectively, to warrant linking outcome variation at either level to predictors.
- The unconditional growth model, which partitions and quantifies the variation in the outcome variable across individuals over time. We fit this model second to determine whether interindividual differences in change are due to outcome variation in true initial status, \(\zeta_{0i}\), or in true rate of change, \(\zeta_{1i}\).

Together, these two models (1) provide a valuable baseline against which we can evaluate and compare subsequent models that include substantive predictors, and (2) help establish whether there is systematic variation in the outcome variable worth exploring, and where that variation resides.

The unconditional means model

The unconditional means model is an intercept-only model that allows the intercept to vary across individuals:

\[
\begin{alignat}{3}
& \text{Level 1:} \qquad & \text{alcohol_use}_{ij} &= \pi_{0i} + \epsilon_{ij} \\
& \text{Level 2:} \qquad & \pi_{0i} &= \gamma_{00} + \zeta_{0i},
\end{alignat}
\]

which postulates that the observed value of alcohol_use for the \(i\)th adolescent at the \(j\)th time is composed of within-person deviations, \(\epsilon_{ij}\), from a person-specific true mean, \(\pi_{0i}\), which in turn is composed of a between-person deviation, \(\zeta_{0i}\), from the population average true mean, \(\gamma_{00}\). Note that because the unconditional means model lacks temporal predictors, it stipulates that the true change trajectory of each individual is completely flat over time, sitting at the person-specific mean (\(\pi_{0i}\)); and that the true population change trajectory is also flat, sitting at the grand mean (\(\gamma_{00}\)).

# Table 4.1, Model A, page 94-95:
model_A <- lmer(
  alcohol_use ~ 1 + (1 | id),
  data = alcohol_use_1,
  REML = FALSE
)

summary(model_A)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: alcohol_use ~ 1 + (1 | id)
#>    Data: alcohol_use_1
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>    676.2    686.7   -335.1    670.2      243 
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -1.8865 -0.3076 -0.3067  0.6137  2.8567 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  id       (Intercept) 0.5639   0.7509  
#>  Residual             0.5617   0.7495  
#> Number of obs: 246, groups:  id, 82
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)  0.92195    0.09571   9.633
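From these variance components we can also compute the intraclass correlation coefficient, which describes the proportion of total outcome variation that lies between individuals. This is a small arithmetic sketch of our own, using the estimates printed in the summary above; it is not part of the original code:

# A sketch (not in the original article): the intraclass correlation
# coefficient from Model A, sigma_0^2 / (sigma_0^2 + sigma_epsilon^2).
sigma2_zeta <- 0.5639  # between-person variance, from summary(model_A)
sigma2_eps  <- 0.5617  # within-person variance, from summary(model_A)

sigma2_zeta / (sigma2_zeta + sigma2_eps)
# About 0.501: roughly half of the total variation in alcohol_use lies
# between adolescents, so there is variation worth exploring at both levels.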
model_A |>
  augment(data = alcohol_use_1) |>
  ggplot(aes(x = age, y = .fitted)) +
  geom_line(aes(linewidth = "individual", group = id), alpha = .35) +
  geom_line(
    aes(linewidth = "average"),
    data = tibble(.fitted = fixef(model_A), age = measurement_occasions)
  ) +
  scale_linewidth_manual(values = c(2, .25)) +
  coord_cartesian(xlim = c(13, 17), ylim = c(0, 4)) +
  labs(y = "alcohol_use", linewidth = "trajectory")

The unconditional growth model

The unconditional growth model introduces a time-indicator predictor into the model, and allows the rate of change to vary across individuals:

\[
\begin{alignat}{3}
& \text{Level 1:} \qquad & \text{alcohol_use}_{ij} &= \pi_{0i} + \pi_{1i}(\text{age}_{ij} - 14) + \epsilon_{ij} \\
& \text{Level 2:} \qquad & \pi_{0i} &= \gamma_{00} + \zeta_{0i} \\
& & \pi_{1i} &= \gamma_{10} + \zeta_{1i},
\end{alignat}
\]

which postulates that the observed value of alcohol_use for the \(i\)th adolescent at the \(j\)th time is composed of within-person deviations, \(\epsilon_{ij}\), from a true linear change trajectory (a linear function of their true initial status, \(\pi_{0i}\), and true rate of change, \(\pi_{1i}\)), which in turn are composed of between-person deviations, \(\zeta_{0i}\) and \(\zeta_{1i}\), from the population average true initial status, \(\gamma_{00}\), and the population average true rate of change, \(\gamma_{10}\), respectively. This model is Chapter 3's individual growth model under a new name, used to emphasize that it does not include any substantive predictors, and we can plot its trajectories as usual.

model_B <- lmer(
  alcohol_use ~ I(age - 14) + (I(age - 14) | id),
  data = alcohol_use_1,
  REML = FALSE
)

summary(model_B)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: alcohol_use ~ I(age - 14) + (I(age - 14) | id)
#>    Data: alcohol_use_1
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>    648.6    669.6   -318.3    636.6      240 
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -2.47999 -0.38401 -0.07553  0.39001  2.50685 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr 
#>  id       (Intercept) 0.6244   0.7902        
#>           I(age - 14) 0.1512   0.3888   -0.22
#>  Residual             0.3373   0.5808        
#> Number of obs: 246, groups:  id, 82
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)  0.65130    0.10508   6.198
#> I(age - 14)  0.27065    0.06245   4.334
#> 
#> Correlation of Fixed Effects:
#>             (Intr)
#> I(age - 14) -0.441
model_B |>
  augment() |>
  select(-alcohol_use) |>
  rename(alcohol_use = .fitted, age = `I(age - 14)`) |>
  mutate(age = as.numeric(age + 14)) |>
  ggplot(aes(x = age, y = alcohol_use)) +
  geom_line(aes(linewidth = "individual", group = id), colour = "#3366FF") +
  geom_line(
    aes(linewidth = "average"),
    data = tibble(
      age = measurement_occasions,
      alcohol_use = predict(
        model_B,
        tibble(age = measurement_occasions),
        re.form = NA
      )
    ),
    colour = "#3366FF"
  ) +
  scale_linewidth_manual(values = c(2, .25)) +
  scale_x_continuous(breaks = 13:17) +
  coord_cartesian(xlim = c(13, 17), ylim = c(0, 4)) +
  labs(linewidth = "trajectory")

4.5 Practical Data Analytic Strategies for Model Building

In Section 4.5 Singer and Willett (2003) present a data analytic strategy for model building, which focuses on building a systematic sequence of models that, as a set, address the research questions in a meaningful way. They refer to this sequence as a taxonomy of statistical models, wherein:

- Each model in the taxonomy extends a prior model in some sensible way.
- Decisions to enter, retain, and remove predictors are based on a combination of logic, theory, and prior research, supplemented by hypothesis testing and comparisons of model fit.
- The taxonomy progresses toward a "final" model whose interpretation addresses the research questions.

Here we present their strategy as one potential analytic path through the alcohol_use_1 data, with a research question focused on the relationship between changes in adolescent alcohol use and being the child of an alcoholic parent.

The first substantive model, Model C, updates the unconditional growth model to include child_of_alcoholic as a predictor of both initial status and rate of change. Singer and Willett (2003) added these terms as a logical first step, given the research question.

Model D builds upon Model C, controlling for the effects of peer_alcohol_use on initial status and rate of change. Singer and Willett (2003) added these terms to see whether they might explain the conditional residual variation in initial status and rate of change left by Model C.

Model E reduces Model D, removing child_of_alcoholic as a predictor of rate of change. Singer and Willett (2003) removed this term based on the results of Models C and D, where the estimated difference in the rate of change in alcohol_use between children of alcoholic and nonalcoholic parents was practically zero.

Model F serves as an alternative to Model E, where peer_alcohol_use is centred on its sample mean of 1.018 (computed from the person-level data set). Singer and Willett (2003) centred peer_alcohol_use so that the level-2 intercepts, \(\gamma_{00}\) and \(\gamma_{10}\), represent a child of non-alcoholic parents with an average value of peer_alcohol_use (peer_alcohol_use = 1.018 and child_of_alcoholic = 0), rather than a child of non-alcoholic parents whose peers at age 14 were totally abstinent (peer_alcohol_use = 0 and child_of_alcoholic = 0).

Finally, Model G serves as an alternative to Model F, where child_of_alcoholic is also centred on its sample mean of 0.451 (computed from the person-level data set). Singer and Willett (2003) also centred child_of_alcoholic so that the level-2 intercepts, \(\gamma_{00}\) and \(\gamma_{10}\), represent an adolescent with average values of both peer_alcohol_use and child_of_alcoholic (peer_alcohol_use = 1.018 and child_of_alcoholic = 0.451), which makes them numerically identical to the corresponding level-2 intercepts of the unconditional growth model.

To make this taxonomy of statistical models easier to work with in subsequent sections, we also store the models in a list.
model_C <- update(
  model_B,
  . ~ . + child_of_alcoholic + I(age - 14):child_of_alcoholic
)

summary(model_C)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: alcohol_use ~ I(age - 14) + (I(age - 14) | id) + child_of_alcoholic +
#>     I(age - 14):child_of_alcoholic
#>    Data: alcohol_use_1
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>    637.2    665.2   -310.6    621.2      238 
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -2.5480 -0.3880 -0.1058  0.3602  2.3961 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr 
#>  id       (Intercept) 0.4876   0.6983        
#>           I(age - 14) 0.1506   0.3881   -0.22
#>  Residual             0.3373   0.5808        
#> Number of obs: 246, groups:  id, 82
#> 
#> Fixed effects:
#>                                Estimate Std. Error t value
#> (Intercept)                     0.31595    0.13070   2.417
#> I(age - 14)                     0.29296    0.08423   3.478
#> child_of_alcoholic              0.74321    0.19457   3.820
#> I(age - 14):child_of_alcoholic -0.04943    0.12539  -0.394
#> 
#> Correlation of Fixed Effects:
#>             (Intr) I(g-14) chld__
#> I(age - 14) -0.460               
#> chld_f_lchl -0.672  0.309        
#> I(-14):ch__  0.309 -0.672  -0.460

model_D <- update(
  model_C,
  . ~ . + peer_alcohol_use + I(age - 14):peer_alcohol_use
)

summary(model_D)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: alcohol_use ~ I(age - 14) + (I(age - 14) | id) + child_of_alcoholic +
#>     peer_alcohol_use + I(age - 14):child_of_alcoholic + I(age -
#>     14):peer_alcohol_use
#>    Data: alcohol_use_1
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>    608.7    643.7   -294.3    588.7      236 
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -2.59554 -0.40005 -0.07769  0.46003  2.29373 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr 
#>  id       (Intercept) 0.2409   0.4908        
#>           I(age - 14) 0.1391   0.3730   -0.03
#>  Residual             0.3373   0.5808        
#> Number of obs: 246, groups:  id, 82
#> 
#> Fixed effects:
#>                                Estimate Std. Error t value
#> (Intercept)                    -0.31651    0.14806  -2.138
#> I(age - 14)                     0.42943    0.11369   3.777
#> child_of_alcoholic              0.57917    0.16249   3.564
#> peer_alcohol_use                0.69430    0.11153   6.225
#> I(age - 14):child_of_alcoholic -0.01403    0.12477  -0.112
#> I(age - 14):peer_alcohol_use   -0.14982    0.08564  -1.749
#> 
#> Correlation of Fixed Effects:
#>             (Intr) I(g-14) chld__ pr_lc_ I(-14):c__
#> I(age - 14) -0.436                                 
#> chld_f_lchl -0.371  0.162                          
#> peer_lchl_s -0.686  0.299  -0.162                  
#> I(-14):ch__  0.162 -0.371  -0.436  0.071           
#> I(-14):pr__  0.299 -0.686   0.071 -0.436 -0.162

model_E <- update(
  model_D,
  . ~ . - I(age - 14):child_of_alcoholic
)

summary(model_E)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: alcohol_use ~ I(age - 14) + (I(age - 14) | id) + child_of_alcoholic +
#>     peer_alcohol_use + I(age - 14):peer_alcohol_use
#>    Data: alcohol_use_1
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>    606.7    638.3   -294.4    588.7      237 
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -2.59554 -0.40414 -0.08352  0.45550  2.29975 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr 
#>  id       (Intercept) 0.2409   0.4908        
#>           I(age - 14) 0.1392   0.3730   -0.03
#>  Residual             0.3373   0.5808        
#> Number of obs: 246, groups:  id, 82
#> 
#> Fixed effects:
#>                              Estimate Std. Error t value
#> (Intercept)                  -0.31382    0.14611  -2.148
#> I(age - 14)                   0.42469    0.10559   4.022
#> child_of_alcoholic            0.57120    0.14623   3.906
#> peer_alcohol_use              0.69518    0.11126   6.249
#> I(age - 14):peer_alcohol_use -0.15138    0.08451  -1.791
#> 
#> Correlation of Fixed Effects:
#>             (Intr) I(g-14) chld__ pr_lc_
#> I(age - 14) -0.410                      
#> chld_f_lchl -0.338  0.000               
#> peer_lchl_s -0.709  0.351  -0.146       
#> I(-14):pr__  0.334 -0.814   0.000 -0.431

model_F <- update(
  model_E,
  data = mutate(alcohol_use_1, peer_alcohol_use = peer_alcohol_use - 1.018)
)

summary(model_F)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: alcohol_use ~ I(age - 14) + (I(age - 14) | id) + child_of_alcoholic +
#>     peer_alcohol_use + I(age - 14):peer_alcohol_use
#>    Data: mutate(alcohol_use_1, peer_alcohol_use = peer_alcohol_use - 1.018)
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>    606.7    638.3   -294.4    588.7      237 
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -2.59554 -0.40414 -0.08352  0.45550  2.29975 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr 
#>  id       (Intercept) 0.2409   0.4908        
#>           I(age - 14) 0.1392   0.3730   -0.03
#>  Residual             0.3373   0.5808        
#> Number of obs: 246, groups:  id, 82
#> 
#> Fixed effects:
#>                              Estimate Std. Error t value
#> (Intercept)                   0.39387    0.10354   3.804
#> I(age - 14)                   0.27058    0.06127   4.416
#> child_of_alcoholic            0.57120    0.14623   3.906
#> peer_alcohol_use              0.69518    0.11126   6.249
#> I(age - 14):peer_alcohol_use -0.15138    0.08451  -1.791
#> 
#> Correlation of Fixed Effects:
#>             (Intr) I(g-14) chld__ pr_lc_
#> I(age - 14) -0.336                      
#> chld_f_lchl -0.637  0.000               
#> peer_lchl_s  0.094  0.000  -0.146       
#> I(-14):pr__  0.000  0.001   0.000 -0.431

model_G <- update(
  model_F,
  data = mutate(
    alcohol_use_1,
    peer_alcohol_use = peer_alcohol_use - 1.018,
    child_of_alcoholic = child_of_alcoholic - 0.451
  )
)

summary(model_G)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: alcohol_use ~ I(age - 14) + (I(age - 14) | id) + child_of_alcoholic +
#>     peer_alcohol_use + I(age - 14):peer_alcohol_use
#>    Data: mutate(alcohol_use_1, peer_alcohol_use = peer_alcohol_use - 1.018,
#>     child_of_alcoholic = child_of_alcoholic - 0.451)
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>    606.7    638.3   -294.4    588.7      237 
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -2.59554 -0.40414 -0.08352  0.45550  2.29975 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr 
#>  id       (Intercept) 0.2409   0.4908        
#>           I(age - 14) 0.1392   0.3730   -0.03
#>  Residual             0.3373   0.5808        
#> Number of obs: 246, groups:  id, 82
#> 
#> Fixed effects:
#>                              Estimate Std. Error t value
#> (Intercept)                   0.65148    0.07979   8.165
#> I(age - 14)                   0.27058    0.06127   4.416
#> child_of_alcoholic            0.57120    0.14623   3.906
#> peer_alcohol_use              0.69518    0.11126   6.249
#> I(age - 14):peer_alcohol_use -0.15138    0.08451  -1.791
#> 
#> Correlation of Fixed Effects:
#>             (Intr) I(g-14) chld__ pr_lc_
#> I(age - 14) -0.436                      
#> chld_f_lchl  0.000  0.000               
#> peer_lchl_s  0.001  0.000  -0.146       
#> I(-14):pr__  0.000  0.001   0.000 -0.431

alcohol_use_1_fits <- list(
  `Model A` = model_A,
  `Model B` = model_B,
  `Model C` = model_C,
  `Model D` = model_D,
  `Model E` = model_E,
  `Model F` = model_F,
  `Model G` = model_G
)

Inspecting model summary and goodness-of-fit statistics

In addition to the output from summary(), we can return a one-row tibble of model summary and goodness-of-fit statistics using the glance() function from the broom.mixed package. Individual statistics can also be returned using the generic functions of the corresponding names (e.g., AIC(), BIC(), deviance(), etc.).

Singer and Willett (2003) also introduce three pseudo-\(R^2\) statistics for the multilevel model for change, which can be used, cautiously, to quantify how much outcome variation is "explained" by a model's predictors:

- The first statistic, \(R^2_{y \hat y}\), assesses the proportion of total outcome variation "explained" by the model's specific combination of predictors, based on the squared sample correlation between observed and predicted values.
- The second statistic, \(R^2_{\epsilon}\), assesses the proportion of within-person variation "explained" by time, based on the proportional decrease in the within-person residual variance between the unconditional means model and subsequent models. Note that because the only way of reducing this variance component is to add time-varying predictors to the level-1 submodel, this statistic is the same for all the subsequent models fit to the alcohol_use_1 data.
- The final statistic, \(R^2_{\zeta}\), assesses the proportion of between-person variation "explained" by one or more level-2 predictors, based on the proportional decrease in a level-2 residual variance between the unconditional growth model and subsequent models with the same level-2 residual variance component.

Because we will be adding these statistics to a table in the next section, we also join them together here.

glance(model_A)
#> # A tibble: 1 × 7
#>    nobs sigma logLik   AIC   BIC deviance df.residual
#> 1   246 0.749  -335.  676.  687.     670.         243
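As mentioned above, the same statistics are also available piecemeal through base R's generic extractor functions. A quick sketch of our own, not in the original code:

# A sketch (not in the original article): individual goodness-of-fit
# statistics via the generic extractor functions.
AIC(model_A)       # 676.2, matching glance() above
BIC(model_A)       # 686.7
deviance(model_A)  # 670.2
logLik(model_A)    # -335.1 (df = 3)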
r2_yy <- alcohol_use_1_fits |>
  map(
    \(.fit) {
      .fit |>
        augment() |>
        summarise(
          r2_yy = cor(alcohol_use, .fixed)^2
        )
    }
  ) |>
  list_rbind(names_to = "model")

r2_e <- alcohol_use_1_fits[2:7] |>
  map(
    \(.fit) {
      .fit |>
        augment() |>
        summarise(
          r2_e = (sigma(model_A)^2 - sigma(.fit)^2) / sigma(model_A)^2
        )
    }
  ) |>
  list_rbind(names_to = "model")

r2_z <- alcohol_use_1_fits[3:7] |>
  map(
    \(.fit) {
      zeta <- map(
        list(x = model_B, y = .fit),
        \(.fit2) {
          .fit2 |>
            tidy(effects = "ran_pars", scales = "vcov") |>
            filter(group != "Residual" & stringr::str_detect(term, "^var"))
        }
      )
      zeta$x |>
        left_join(zeta$y, by = c("effect", "group", "term")) |>
        mutate(
          r2 = (estimate.x - estimate.y) / estimate.x,
          name = c("r2_z1", "r2_z2")
        ) |>
        select(name, r2) |>
        pivot_wider(names_from = name, values_from = r2)
    }
  ) |>
  list_rbind(names_to = "model")

alcohol_use_1_fits_r2 <- r2_yy |>
  left_join(r2_e) |>
  left_join(r2_z)

alcohol_use_1_fits_r2
#> # A tibble: 7 × 5
#>   model    r2_yy   r2_e  r2_z1   r2_z2
#> 1 Model A NA     NA     NA     NA     
#> 2 Model B  0.0434 0.400 NA     NA     
#> 3 Model C  0.150  0.400  0.219  0.00401
#> 4 Model D  0.291  0.400  0.614  0.0799 
#> 5 Model E  0.291  0.400  0.614  0.0797 
#> 6 Model F  0.291  0.400  0.614  0.0797 
#> 7 Model G  0.291  0.400  0.614  0.0797

Interpreting Fitted Models

To systematically compare the fitted models, describing what happens as predictors are added and removed, Singer and Willett (2003) suggest placing them side-by-side in a table, which allows us to easily inspect and compare the estimated fixed effects, variance components, and goodness-of-fit statistics from one model to the next. We can construct such a table using the modelsummary() function from the modelsummary package. To better match the table from the text, we set the table output to "gt" so we can post-process it using the gt package.

# This option needs to be set in order to make all the desired goodness-of-fit
# statistics available to modelsummary.
options(modelsummary_get = "all")

# Table 4.1, page 94-95:
alcohol_use_1_fits |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA), # argument from broom.mixed::tidy()
    coef_map = c(
      "(Intercept)",
      "child_of_alcoholic",
      "peer_alcohol_use",
      "I(age - 14)",
      "I(age - 14):child_of_alcoholic",
      "I(age - 14):peer_alcohol_use",
      "var__Observation",
      "var__(Intercept)",
      "var__I(age - 14)",
      "cov__(Intercept).I(age - 14)"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 2
    ),
    # The R2s need to be transposed to be added to the table columns. Their
    # position in the table is set by the `position` attribute.
    add_rows = alcohol_use_1_fits_r2 |>
      pivot_longer(-model, names_to = "estimate") |>
      pivot_wider(names_from = model) |>
      mutate(effect = "", .after = estimate) |>
      structure(position = 17:21),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 17:23) |>
  tab_row_group(label = "Variance Components", rows = 13:16) |>
  tab_row_group(label = "Fixed Effects", rows = 1:12) |>
  cols_hide(effect)

Displaying Prototypical Change Trajectories

In addition to numerical summaries, Singer and Willett (2003) suggest plotting the fitted trajectories of prototypical individuals to describe the results of model fitting, with the prototypical values of predictors selected using one or more of the following strategies:

- Choosing substantively interesting values, for categorical and continuous predictors with well-known values.
- Using a range of percentiles, for continuous predictors without well-known values.
- Using standard deviations around the sample mean, for continuous predictors without well-known values.
- Using the sample mean, for categorical and continuous predictors we simply want to control for.

After selecting prototypical values for each predictor, prototypical change trajectories can be derived for combinations of those values using the usual model predictions. For convenience, we can use the estimate_prediction() function from the modelbased package to make predictions, and the map2() function from the purrr package to iterate over the desired model and prototypical value combinations. Here we look at prototypical change trajectories from Models B, C, and E.

To systematically compare the prototypical change trajectories, it can be helpful to plot them side-by-side. However, because certain predictors are present in some models but not others, we need to supply an na.value to the scale_() functions to ensure a trajectory appears in all panels, regardless of which predictors are present.
Note that, depending on the number of predictors across the different models, it may be preferable to instead create separate plots (which can later be added together using the patchwork package).

prototypical_alcohol_use <- alcohol_use_1_fits |>
  keep_at(paste0("Model ", c("B", "C", "E"))) |>
  map2(
    list(
      tibble(age = 14:16),
      crossing(age = 14:16, child_of_alcoholic = 0:1),
      crossing(
        age = 14:16,
        child_of_alcoholic = 0:1,
        peer_alcohol_use = c(0.655, 1.381)
      )
    ),
    \(.fit, .data) {
      .fit |>
        estimate_prediction(data = .data) |>
        rename(alcohol_use = Predicted) |>
        as_tibble()
    }
  ) |>
  list_rbind(names_to = "model") |>
  mutate(
    child_of_alcoholic = factor(child_of_alcoholic),
    peer_alcohol_use = factor(peer_alcohol_use, labels = c("low", "high"))
  )

prototypical_alcohol_use
#> # A tibble: 21 × 8
#>    model     age alcohol_use    SE CI_low CI_high child_of_alcoholic
#>  1 Model B    14       0.651 0.590 -0.511    1.81 NA                
#>  2 Model B    15       0.922 0.589 -0.238    2.08 NA                
#>  3 Model B    16       1.19  0.594  0.0233   2.36 NA                
#>  4 Model C    14       0.316 0.595 -0.857    1.49 0                 
#>  5 Model C    14       1.06  0.598 -0.120    2.24 1                 
#>  6 Model C    15       0.609 0.593 -0.559    1.78 0                 
#>  7 Model C    15       1.30  0.595  0.130    2.48 1                 
#>  8 Model C    16       0.902 0.602 -0.284    2.09 0                 
#>  9 Model C    16       1.55  0.607  0.351    2.74 1                 
#> 10 Model E    14       0.142 0.591 -1.02     1.31 0                 
#> # ℹ 11 more rows
#> # ℹ 1 more variable: peer_alcohol_use

# Figure 4.3, page 99:
prototypical_alcohol_use |>
  ggplot(aes(x = age, y = alcohol_use)) +
  geom_line(aes(linetype = child_of_alcoholic, colour = peer_alcohol_use)) +
  scale_linetype_manual(values = c(2, 6), na.value = 1) +
  scale_color_viridis_d(
    option = "G", begin = .4, end = .7, na.value = "black"
  ) +
  scale_x_continuous(breaks = 13:17) +
  coord_cartesian(xlim = c(13, 17), ylim = c(0, 2)) +
  facet_wrap(vars(model))

4.6 Comparing Models Using Deviance Statistics

In Section 4.6 Singer and Willett (2003) introduce the deviance statistic, which quantifies how much worse the current model fits in comparison to a saturated model that fits the observed data perfectly, by comparing the log-likelihood statistics of the two models:

\[
\text{Deviance} = -2 (LL_\text{current model} - LL_\text{saturated model}).
\]

Note that for the multilevel model for change this equation reduces to:

\[
\text{Deviance} = -2LL_\text{current model},
\]

because the log-likelihood statistic for the saturated model is always zero.

Deviance statistics for nested models estimated using identical data can be compared using the anova() function, which computes analysis of deviance tables for one or more fitted models. Unfortunately anova() doesn't accept a list as input, so we use a bit of meta-programming to work with our list of models without having to type each model by hand.

Note that by default the anova() function refits objects of class merMod with FML before comparing models estimated with REML, to prevent the common mistake of inappropriately comparing REML-fitted models with different fixed effects, whose likelihoods are not directly comparable.
4.6 Comparing Models Using Deviance Statistics

In Section 4.6 Singer and Willett (2003) introduce the deviance statistic, which quantifies how much worse the current model fits in comparison to a saturated model that fits the observed data perfectly, by comparing the log-likelihood statistics of the two models:

\[
\text{Deviance} = -2 (LL_\text{current model} - LL_\text{saturated model}).
\]

For the multilevel model for change this equation reduces to

\[
\text{Deviance} = -2 LL_\text{current model},
\]

because the log-likelihood statistic of the saturated model is always zero.

Deviance statistics for nested models estimated on identical data can be compared using the anova() function, which computes analysis of deviance tables for one or more fitted models. Unfortunately anova() doesn't accept list input, so here we use a little meta-programming to work with our list of models rather than typing each model out by hand.

Note that by default the anova() function refits objects of class merMod with full maximum likelihood (FML) before comparing models estimated with REML, to prevent the common mistake of inappropriately comparing REML-fitted models with different fixed effects, whose likelihoods are not directly comparable. For REML-fitted models with identical fixed effects but different random effects, the refit argument can be set to FALSE to compare the REML-fitted models directly.

```r
with(
  alcohol_use_1_fits[1:5],
  do.call(anova, map(names(alcohol_use_1_fits[1:5]), as.name))
)
#> Data: alcohol_use_1
#> Models:
#> Model A: alcohol_use ~ 1 + (1 | id)
#> Model B: alcohol_use ~ I(age - 14) + (I(age - 14) | id)
#> Model C: alcohol_use ~ I(age - 14) + (I(age - 14) | id) + child_of_alcoholic + I(age - 14):child_of_alcoholic
#> Model E: alcohol_use ~ I(age - 14) + (I(age - 14) | id) + child_of_alcoholic + peer_alcohol_use + I(age - 14):peer_alcohol_use
#> Model D: alcohol_use ~ I(age - 14) + (I(age - 14) | id) + child_of_alcoholic + peer_alcohol_use + I(age - 14):child_of_alcoholic + I(age - 14):peer_alcohol_use
#>         npar    AIC    BIC  logLik deviance   Chisq Df Pr(>Chisq)
#> Model A    3 676.16 686.67 -335.08   670.16
#> Model B    6 648.61 669.64 -318.31   636.61 33.5449  3  2.472e-07 ***
#> Model C    8 637.20 665.25 -310.60   621.20 15.4085  2  0.0004509 ***
#> Model E    9 606.70 638.25 -294.35   588.70 32.4993  1  1.192e-08 ***
#> Model D   10 608.69 643.74 -294.35   588.69  0.0126  1  0.9104569
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
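Because the saturated model's log-likelihood is zero here, the deviance column above can be reproduced directly from each model's log-likelihood. A minimal sketch, assuming the fitted lmer models in alcohol_use_1_fits are available:

```r
library(purrr)

# Deviance = -2 * log-likelihood of the current model; these values should
# match the deviance column in the analysis of deviance table above.
map_dbl(alcohol_use_1_fits[1:5], \(.fit) -2 * as.numeric(logLik(.fit)))

# For two REML fits that differ only in their random effects (hypothetical
# objects fit_reml_1 and fit_reml_2), skip the ML refit and compare directly:
# anova(fit_reml_1, fit_reml_2, refit = FALSE)
```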
4.7 Using Wald Statistics to Test Composite Hypotheses About Fixed Effects

This section is intentionally left blank.

4.8 Evaluating the Tenability of a Model's Assumptions

In Section 4.8 Singer and Willett (2003) offer strategies for checking the following assumptions of the multilevel model for change:

- The linear (or nonlinear) functional form hypothesized for the individual change trajectory seems reasonable for the observed data—there appear to be no systematic deviations from linearity (or nonlinearity) across participants.
- The level-1 and level-2 residuals are normally distributed.
- The level-1 and level-2 residuals have equal variances at every level of every predictor.

Checking Functional Form

The functional form assumption of the multilevel model for change can be assessed by inspecting "outcome versus predictors" plots at each level. At level-1, empirical growth plots are superimposed with the individual change trajectories to support the suitability of the specified functional form. The empirical growth plots should be examined for each individual (or for several subsamples of individuals), looking for systematic deviations that would disconfirm the suitability of the hypothesized individual change trajectory. At level-2, the OLS-estimated individual growth parameters are plotted against each substantive predictor to confirm the suitability of the specified level-2 relationships. As with linear models, only continuous predictors need to be assessed, since categorical predictors are always linear.

```r
set.seed(333)

alcohol_use_1 |>
  filter(id %in% sample(unique(id), size = 16)) |>
  ggplot(aes(x = age, y = alcohol_use)) +
  stat_smooth(method = "lm", se = FALSE) +
  geom_point() +
  coord_cartesian(xlim = c(13, 17), ylim = c(0, 4)) +
  facet_wrap(vars(id), ncol = 4, labeller = label_both)

alcohol_use_1_fit_np <- lmList(
  alcohol_use ~ I(age - 14) | id, pool = FALSE, data = alcohol_use_1
)

alcohol_use_1_est_np <- alcohol_use_1_fit_np |>
  map(tidy) |>
  list_rbind(names_to = "id") |>
  select(id:estimate, alcohol_use = estimate) |>
  left_join(alcohol_use_1_pl) |>
  mutate(child_of_alcoholic = factor(child_of_alcoholic))

alcohol_use_1_ovp <- map(
  list("child_of_alcoholic", "peer_alcohol_use"),
  \(.x) {
    ggplot(alcohol_use_1_est_np, aes(x = .data[[.x]], y = alcohol_use)) +
      geom_hline(yintercept = 0, alpha = .25) +
      geom_point() +
      facet_wrap(vars(term), ncol = 1, scales = "free_y")
  }
)

# Figure 4.4:
wrap_plots(alcohol_use_1_ovp) + plot_layout(axes = "collect")
```
Checking Normality

The normality assumption of the multilevel model for change can be assessed by inspecting Q-Q plots of the level-1 and level-2 residuals, and also (optionally) with statistical tests of normality. The check_normality() function from the performance package can perform both tasks, and its plot() method can be used to return Q-Q plots of the level-1 and level-2 residuals.

```r
model_F_normality <- map(
  set_names(c("fixed", "random")),
  \(.x) check_normality(model_F, effects = .x)
)

model_F_normality
#> $fixed
#> Warning: Non-normality of residuals detected (p = 0.011).
#>
#> $random
#> OK: Random effects 'id: (Intercept)' appear as normally distributed (p = 0.270).
#> Warning: Non-normality for random effects 'id: I(age - 14)' detected (p < .001).

# Figure 4.5 (left panels), page 131:
plot(model_F_normality$fixed, detrend = FALSE) +
  plot(model_F_normality$random) +
  plot_layout(widths = c(1/3, 2/3)) &
  theme_bw() &
  theme(panel.grid = element_blank())
```

Checking Homoscedasticity

The homoscedasticity assumption of the multilevel model for change can be assessed by inspecting "residual versus predictors" plots at each level to see whether residual variability is approximately equal at every predictor value. The level-1 residuals are plotted against the level-1 predictor; the level-2 residuals are plotted against the level-2 predictor(s).

```r
# Figure 4.6 (top panel), page 133:
model_F |>
  augment(re.form = NA) |>
  rename(age = `I(age - 14)`) |>
  mutate(age = as.numeric(age + 14)) |>
  ggplot(aes(x = age, y = .resid)) +
  geom_hline(yintercept = 0, alpha = .25) +
  geom_point() +
  coord_cartesian(xlim = c(13, 17), ylim = c(-2, 2))

alcohol_use_1_ranef <- model_F |>
  ranef() |>
  augment() |>
  as_tibble() |>
  rename(term = variable, id = level, .resid = estimate) |>
  left_join(alcohol_use_1_pl) |>
  mutate(child_of_alcoholic = factor(child_of_alcoholic))

alcohol_use_1_rvp <- map(
  list("child_of_alcoholic", "peer_alcohol_use"),
  \(.x) {
    ggplot(alcohol_use_1_ranef, aes(x = .data[[.x]], y = .resid)) +
      geom_hline(yintercept = 0, alpha = .25) +
      geom_point() +
      facet_wrap(vars(term), ncol = 1, scales = "free_y") +
      coord_cartesian(ylim = c(-1, 1))
  }
)

# Figure 4.6 (bottom panels), page 133:
wrap_plots(alcohol_use_1_rvp) + plot_layout(axes = "collect")
```
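As with normality, the performance package also offers a statistical complement to these plots. A minimal sketch, assuming model_F is the lmer fit used above; treat the test as a supplement to, not a replacement for, the residual-versus-predictor plots:

```r
library(performance)

# Test for non-constant error variance in the level-1 residuals.
check_heteroscedasticity(model_F)
```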
4.9 Model-Based Estimates of the Individual Growth Parameters

In Section 4.9 Singer and Willett (2003) discuss the use of model-based estimates to display individual growth trajectories, which are simply the partial pooling trajectories previously discussed in Chapter 3. We begin by predicting three types of growth trajectory for each individual: (1) no pooling trajectories, estimated by separate linear models; (2) population average trajectories, estimated by the multilevel model for change without conditioning on the random effects; and (3) model-based trajectories, estimated by the multilevel model for change conditioning on the random effects.

```r
alcohol_use_1_np_pred <- alcohol_use_1_fit_np |>
  map(augment) |>
  list_rbind(names_to = "id") |>
  mutate(trajectory = "no_pooling")

alcohol_use_1_pp_pred <- list(population_average = NA, model_based = NULL) |>
  map(\(.x) augment(model_F, re.form = .x)) |>
  list_rbind(names_to = "trajectory")

# For display purposes we will tidy up the prediction data frames and only use
# a subset of participants.
alcohol_use_1_preds <- alcohol_use_1_np_pred |>
  bind_rows(alcohol_use_1_pp_pred) |>
  filter(id %in% c(4, 14, 23, 32, 41, 56, 65, 82)) |>
  rename(age = `I(age - 14)`) |>
  mutate(
    trajectory = factor(
      trajectory,
      levels = c("population_average", "no_pooling", "model_based")
    ),
    id = factor(id, levels = sort(as.numeric(unique(id)))),
    age = as.numeric(age + 14),
  ) |>
  select(trajectory, id, age, alcohol_use, .fitted)
```

Now we can plot the three trajectories. Similar to the partial pooling example from Chapter 3, notice that the population average trajectories are the most stable, varying the least across individuals; the no pooling trajectories are the least stable, varying the most across individuals; and the model-based trajectories fall somewhere between the population average and no pooling trajectories, due to the effects of partial pooling.

```r
# Figure 4.7:
ggplot(alcohol_use_1_preds, aes(x = age)) +
  geom_point(aes(y = alcohol_use)) +
  geom_line(aes(y = .fitted, colour = trajectory)) +
  scale_colour_brewer(palette = "Dark2") +
  scale_y_continuous(breaks = 0:4) +
  coord_cartesian(xlim = c(13, 17), ylim = c(-1, 4)) +
  facet_wrap(vars(id), nrow = 2, labeller = label_both)
```

Although we would typically prefer the model-based trajectories of the multilevel model for change, Singer and Willett (2003) conclude by cautioning that the model-based trajectories of a flawed model are flawed as well—their quality depends heavily on the quality of the model fit and the soundness of the model's assumptions (given the data).
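The two partial pooling prediction types above differ only in the re.form argument shared by lme4's predict() and broom.mixed's augment(). A minimal sketch of the contrast on the fitted data, assuming model_F from the earlier sections:

```r
# re.form = NA ignores all random effects: population average predictions.
pred_pop <- predict(model_F, re.form = NA)

# re.form = NULL conditions on all random effects: model-based predictions.
pred_mb <- predict(model_F, re.form = NULL)

# The two sets of predictions diverge by each individual's random effects.
head(cbind(population_average = pred_pop, model_based = pred_mb))
```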
Chapter 5: Treating time more flexibly

5.1 Variably Spaced Measurement Occasions

In Section 5.1 Singer and Willett (2003) demonstrate how to fit the multilevel model for change to data with variably spaced measurement occasions, using a subset of data from the Children of the National Longitudinal Study of Youth (US Bureau of Labor Statistics), which measured changes in the reading subtest of the Peabody Individual Achievement Test (PIAT) in a sample of 89 African-American children across three waves at around ages 6, 8, and 10.

For this example we use the reading_scores data set, a person-period data frame with 267 rows and 5 columns:

- id: Child ID.
- wave: Wave of measurement.
- age_group: Expected age at each measurement occasion.
- age: Age in years at the time of measurement.
- reading_score: Reading score on the reading subtest of the Peabody Individual Achievement Test (PIAT).

Note that the structure of the reading_scores data is identical to the person-period data sets shown in previous chapters, except that it has three time-indicator variables:

- The values of wave reflect the study's design; they are time-structured across children, but have little substantive meaning.
- The values of age_group reflect each child's expected age at each measurement occasion; they are time-structured across children and have substantive meaning.
- The values of age reflect each child's actual age at each measurement occasion; they are variably spaced across children and have substantive meaning.

This demonstrates a distinctive feature of time-unstructured data sets—the possibility of multiple representations of time. From the perspective of the age_group variable the reading_scores data appears time-structured, whereas from the perspective of the age variable it appears variably spaced. However, as Singer and Willett (2003) discuss, the specification, estimation, and interpretation of the multilevel model for change proceeds in the exact same way regardless of which temporal representation we use; thus, it is generally preferable to use the most accurate unstructured temporal representation rather than forcing the data into a time-structured design.

Here we fit the unconditional growth model using both the structured and unstructured temporal representations to demonstrate why the latter is generally preferable. As usual, we begin by inspecting empirical growth plots to help select a functional form for the level-1 submodel; a linear change individual growth model seems parsimonious for both temporal representations. Following Singer and Willett (2003), we centre both age_group and age at age 6.5 (the average child's age at wave 1) so that the parameters of both models have identical interpretations, and we label the time variable in each model with a generic time variable. Comparing the models, we see that the age model fits the data better than the age_group model—it has less unexplained variation in initial status and rates of change, and smaller AIC and BIC statistics.

```r
# Table 5.1, page 141:
reading_scores

select(reading_scores, id, age_group, reading_score)
select(reading_scores, id, age, reading_score)

# Figure 5.1, page 143:
reading_scores |>
  filter(id %in% c(4, 27, 31, 33, 41, 49, 69, 77, 87)) |>
  pivot_longer(
    starts_with("age"), names_to = "time_indicator", values_to = "age"
  ) |>
  ggplot(aes(x = age, y = reading_score, colour = time_indicator)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE, linewidth = .5) +
  scale_x_continuous(breaks = 5:12) +
  scale_color_brewer(palette = "Dark2") +
  coord_cartesian(xlim = c(5, 12), ylim = c(0, 80)) +
  facet_wrap(vars(id), labeller = label_both)

reading_scores_fits <- map(
  list(age_group = "age_group", age = "age"),
  \(.time) {
    lmer(
      reading_score ~ I(time - 6.5) + (1 + I(time - 6.5) | id),
      data = mutate(reading_scores, time = .data[[.time]]),
      REML = FALSE
    )
  }
)

options(modelsummary_get = "all")

# Table 5.2, page 145:
reading_scores_fits |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)",
      "I(time - 6.5)",
      "var__Observation",
      "var__(Intercept)",
      "var__I(time - 6.5)"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 8:10) |>
  tab_row_group(label = "Variance Components", rows = 5:7) |>
  tab_row_group(label = "Fixed Effects", rows = 1:4) |>
  cols_hide(effect)
```
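The fit comparison described in the text can also be read directly off the two models. A minimal sketch, assuming the reading_scores_fits list created above:

```r
library(purrr)

# Smaller AIC/BIC for the `age` model indicates the unstructured
# representation of time fits better than the time-structured one.
map_dbl(reading_scores_fits, AIC)
map_dbl(reading_scores_fits, BIC)
```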
5.2 Varying Numbers of Measurement Occasions

In Section 5.2 Singer and Willett (2003) demonstrate how to fit the multilevel model for change to data with varying numbers of measurement occasions (i.e., unbalanced data), using a subset of data from the National Longitudinal Study of Youth tracking the labour market experiences of male high school dropouts (Murnane, Boudett, & Willett, 1999).

For this example we use the dropout_wages data set, a person-period data frame with 6402 rows and 9 columns:

- id: Participant ID.
- log_wages: Natural logarithm of wages.
- experience: Labour force experience in years, tracked from dropouts' first day of work.
- ged: Binary indicator for whether the dropout obtained a GED.
- postsecondary_education: Binary indicator for whether the dropout obtained post-secondary education.
- black: Binary indicator for whether the dropout is black.
- hispanic: Binary indicator for whether the dropout is hispanic.
- highest_grade: Highest grade completed.
- unemployment_rate: Unemployment rate in the local geographic area.

In the dropout_wages data, the number of measurement occasions varies widely across individuals, from 1 to 13 waves. Indeed, examining the data from a subset of individuals, we can see that the dropout_wages data varies in both the number and spacing of measurement occasions. Yet, as Singer and Willett (2003) discuss, a major advantage of the multilevel model for change is that it can easily be fit to unbalanced data like this—as long as the person-period data set includes enough people with enough waves of data for the model to converge, the analyses can proceed as usual.

```r
dropout_wages

dropout_wages |>
  group_by(id) |>
  summarise(waves = n()) |>
  count(waves, name = "count")

# Table 5.3, page 147:
dropout_wages |>
  filter(id %in% c(206, 332, 1028)) |>
  select(id, experience, log_wages, black, highest_grade, unemployment_rate)
```
We fit three models to the dropout_wages data: an unconditional growth model (Model A), and two models that include the predictors race and highest grade completed (Models B and C). Likewise, even with varying numbers of measurement occasions, prototypical change trajectories can be derived from the model as usual.

```r
# Fit models ------------------------------------------------------------------
dropout_wages_fit_A <- lmer(
  log_wages ~ experience + (1 + experience | id),
  data = dropout_wages,
  REML = FALSE
)

dropout_wages_fit_B <- update(
  dropout_wages_fit_A,
  . ~ . + experience * I(highest_grade - 9) + experience * black
)

# The model fails to converge with the default optimizer (although the
# estimates are fine). Changing the optimizer achieves convergence.
dropout_wages_fit_C <- update(
  dropout_wages_fit_B,
  . ~ . - experience:I(highest_grade - 9) - black,
  control = lmerControl(optimizer = "bobyqa")
)

dropout_wages_fits <- list(
  `Model A` = dropout_wages_fit_A,
  `Model B` = dropout_wages_fit_B,
  `Model C` = dropout_wages_fit_C
)

# Make table ------------------------------------------------------------------
# Table 5.4, page 149:
dropout_wages_fits |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)",
      "I(highest_grade - 9)",
      "black",
      "experience",
      "experience:I(highest_grade - 9)",
      "experience:black",
      "var__Observation",
      "var__(Intercept)",
      "var__experience"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 15:18) |>
  tab_row_group(label = "Variance Components", rows = 13:15) |>
  tab_row_group(label = "Fixed Effects", rows = 1:12) |>
  cols_hide(effect)

prototypical_dropout_wages <- dropout_wages_fit_C |>
  estimate_prediction(
    data = crossing(
      experience = c(0, 12),
      highest_grade = c(0, 3) + 9,
      black = c(FALSE, TRUE)
    )
  ) |>
  rename(log_wages = Predicted) |>
  mutate(highest_grade = factor(highest_grade)) |>
  as_tibble()

# Figure 5.2, page 150:
ggplot(prototypical_dropout_wages, aes(x = experience, y = log_wages)) +
  geom_line(aes(colour = highest_grade, linetype = black)) +
  scale_x_continuous(breaks = seq(0, 12, by = 2)) +
  scale_color_brewer(palette = "Dark2") +
  scale_linetype_manual(values = c(2, 1)) +
  coord_cartesian(ylim = c(1.6, 2.4))
```
5.2.2 Practical Problems That May Arise When Analyzing Unbalanced Data Sets

The multilevel model for change may fail to converge, or be unable to estimate one or more variance components, in data sets that are severely unbalanced, or in which too few people have enough waves of data. In Section 5.2.2 Singer and Willett (2003) discuss two strategies for addressing these problems:

- Removing boundary constraints, so that the software is permitted to obtain negative variance components.
- Fixing rates of change, so that the model is simplified by removing the varying slope for change.

For this example we use a subset of the dropout_wages data that was purposefully constructed to be severely unbalanced.

First we refit Model C to the dropout_wages_subset data. Note the estimated variance component for experience, which is practically zero, and the following message at the bottom of the model summary: "boundary (singular) fit: see help('isSingular')".

```r
dropout_wages_subset

dropout_wages_fit_A_subset <- update(
  dropout_wages_fit_C,
  data = dropout_wages_subset
)

summary(dropout_wages_fit_A_subset)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: log_wages ~ experience + (1 + experience | id) + I(highest_grade -
#>     9) + experience:black
#>    Data: dropout_wages_subset
#> Control: lmerControl(optimizer = "bobyqa")
#>
#>      AIC      BIC   logLik deviance df.resid
#>    299.9    328.3   -141.9    283.9      249
#>
#> Scaled residuals:
#>     Min      1Q  Median      3Q     Max
#> -2.4109 -0.4754 -0.0290  0.4243  4.2842
#>
#> Random effects:
#>  Groups   Name        Variance  Std.Dev. Corr
#>  id       (Intercept) 8.215e-02 0.286615
#>           experience  3.526e-06 0.001878 1.00
#>  Residual             1.150e-01 0.339068
#> Number of obs: 257, groups:  id, 124
#>
#> Fixed effects:
#>                      Estimate Std. Error t value
#> (Intercept)           1.73734    0.04760  36.499
#> experience            0.05161    0.02108   2.449
#> I(highest_grade - 9)  0.04610    0.02447   1.884
#> experience:black     -0.05968    0.03477  -1.716
#>
#> Correlation of Fixed Effects:
#>             (Intr) exprnc I(_-9)
#> experience  -0.612
#> I(hghst_-9)  0.051 -0.133
#> exprnc:blck -0.129 -0.297  0.023
#> optimizer (bobyqa) convergence code: 0 (OK)
#> boundary (singular) fit: see help('isSingular')
```

The first strategy Singer and Willett (2003) suggest is to remove the software's boundary constraints; however, the lme4 package does not support removing boundary constraints to allow negative variance components, so this strategy cannot be replicated here (Model B). The second strategy is to simplify the model by fixing the rates of change, removing the varying slope for experience. This model fits without issue.

```r
dropout_wages_fit_C_subset <- update(
  dropout_wages_fit_A_subset,
  . ~ . - (1 + experience | id) + (1 | id)
)

summary(dropout_wages_fit_C_subset)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: log_wages ~ experience + I(highest_grade - 9) + (1 | id) + experience:black
#>    Data: dropout_wages_subset
#> Control: lmerControl(optimizer = "bobyqa")
#>
#>      AIC      BIC   logLik deviance df.resid
#>    295.9    317.2   -141.9    283.9      251
#>
#> Scaled residuals:
#>     Min      1Q  Median      3Q     Max
#> -2.4202 -0.4722 -0.0290  0.4197  4.2439
#>
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  id       (Intercept) 0.08425  0.2903
#>  Residual             0.11480  0.3388
#> Number of obs: 257, groups:  id, 124
#>
#> Fixed effects:
#>                      Estimate Std. Error t value
#> (Intercept)           1.73734    0.04775  36.383
#> experience            0.05178    0.02093   2.474
#> I(highest_grade - 9)  0.04576    0.02450   1.868
#> experience:black     -0.06007    0.03458  -1.737
#>
#> Correlation of Fixed Effects:
#>             (Intr) exprnc I(_-9)
#> experience  -0.614
#> I(hghst_-9)  0.051 -0.135
#> exprnc:blck -0.130 -0.294  0.024
```
Comparing Models A and C, note that the deviance statistics are identical, but the AIC and BIC statistics are smaller for Model C, suggesting that: (1) Model C is an improvement over Model A; and (2) we cannot effectively model systematic interindividual differences in rates of change in this data set.

```r
dropout_wages_fits_subset <- list(
  `Model A` = dropout_wages_fit_A_subset,
  `Model C` = dropout_wages_fit_C_subset
)

# Table 5.5, page 154:
dropout_wages_fits_subset |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)",
      "I(highest_grade - 9)",
      "black",
      "experience",
      "experience:I(highest_grade - 9)",
      "experience:black",
      "var__Observation",
      "var__(Intercept)",
      "var__experience"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 12:14) |>
  tab_row_group(label = "Variance Components", rows = 9:11) |>
  tab_row_group(label = "Fixed Effects", rows = 1:8) |>
  cols_hide(effect)
```
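lme4 also exposes the singular-fit diagnostic programmatically, which is convenient when refitting many candidate models. A minimal sketch, assuming the two subset fits above:

```r
library(lme4)

# TRUE when the fit sits on the boundary of the parameter space,
# e.g. a variance component estimated at (practically) zero.
isSingular(dropout_wages_fit_A_subset)

# The simplified model without the varying slope should no longer be singular.
isSingular(dropout_wages_fit_C_subset)
```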
5.3 Time-Varying Predictors

In Section 5.3 Singer and Willett (2003) demonstrate how to fit the multilevel model for change to data with time-varying predictors, using a subset of data from Ginexi, Howe, and Caplan (2000), who measured changes in depressive symptoms after job loss in a sample of 254 recently unemployed men and women. Interviews were conducted in three waves at around 1, 5, and 12 months after job loss.

For this example we use the depression_unemployment data set, a person-period data frame with 674 rows and 5 columns:

- id: Participant ID.
- interview: Time of interview.
- months: Months since job loss.
- depression: Total score on the Center for Epidemiologic Studies' Depression (CES-D) scale (Radloff, 1977).
- unemployed: Binary indicator for whether the participant was unemployed at the time of interview.

Note that all participants were unemployed at the first interview, so changes in unemployment status were only gathered at the second and third interviews.

In the depression_unemployment data, both the number and spacing of measurement occasions vary across individuals. A total of 193 participants (76%) had three interviews, 34 participants (13.4%) had two interviews, and 27 participants (10.6%) had only one interview. The average time from job loss to the first interview was 27.6 days (SD = 10.7; range = 2-61); to the second interview, 151 days (SD = 18.3; range = 111-220); and to the third interview, 359 days (SD = 19.1; range = 319-458). Additionally, examining the data from a subset of individuals, we can see that the unemployed variable is a time-varying predictor with several unique patterns of change across participants. Considering only participants with complete data, 78 were unemployed at every interview (pattern 1-1-1), 55 were employed at every interview after the first (pattern 1-0-0), 41 were still unemployed at the second interview but employed by the third (pattern 1-1-0), and 19 were employed at the second interview but unemployed again at the third (pattern 1-0-1).

As in previous examples, no special strategies are needed to fit the multilevel model for change with time-varying predictors. However, as Singer and Willett (2003) discuss, the inclusion of time-varying predictors in the model implies the existence of multiple continuous or discontinuous change trajectories—one for each possible pattern of the time-varying predictors.

We fit four models to the depression_unemployment data: an unconditional growth model (Model A), a model that includes a main effect of the time-varying predictor (Model B), a model that includes an interaction effect with the time-varying predictor (Model C), and a model that allows the time-varying predictor to have both fixed and random effects (Model D). Note that for Model D, Singer and Willett (2003) fit the model using SAS and do not report any issues with the model given the data; however, in other programs (R, MPlus, SPSS, STATA) there are convergence/singularity problems, and it is not possible to get results that match the textbook. Because each of these programs reacts differently to the situation, it is reasonable to conclude that the problem is not with the software, but that the model is too complex, given the data.
```r
depression_unemployment

depression_unemployment |>
  group_by(id) |>
  summarise(waves = n()) |>
  count(waves, name = "count") |>
  mutate(proportion = count / sum(count))

depression_unemployment |>
  group_by(interview) |>
  mutate(days = months * 30.4167) |>
  summarise(
    mean = mean(days), sd = sd(days), min = min(days), max = max(days)
  )

# Table 5.6, page 161:
filter(depression_unemployment, id %in% c(7589, 55697, 67641, 65441, 53782))

unemployed_patterns <- depression_unemployment |>
  group_by(id) |>
  filter(n() == 3) |>
  summarise(unemployed_pattern = paste(unemployed, collapse = "-")) |>
  count(unemployed_pattern, name = "count")

unemployed_patterns

# Fit models ------------------------------------------------------------------
depression_unemployment_fit_A <- lmer(
  depression ~ months + (1 + months | id),
  data = depression_unemployment,
  REML = FALSE
)

# The model fails to converge with the default optimizer (although the
# estimates are fine). Changing the optimizer achieves convergence.
depression_unemployment_fit_B <- update(
  depression_unemployment_fit_A,
  . ~ . + unemployed,
  control = lmerControl(optimizer = "bobyqa")
)

depression_unemployment_fit_C <- update(
  depression_unemployment_fit_B,
  . ~ . + months:unemployed
)

# The number of observations is less than the number of random effects levels
# for each term, which makes the random effects variances (probably)
# unidentifiable in this model and throws an error. In order to fit the model
# we need to ignore this check.
depression_unemployment_fit_D <- lmer(
  depression ~ unemployed + unemployed:months +
    (1 + unemployed + months:unemployed | id),
  data = depression_unemployment,
  REML = FALSE,
  control = lmerControl(check.nobs.vs.nRE = "ignore")
)

depression_unemployment_fits <- list(
  `Model A` = depression_unemployment_fit_A,
  `Model B` = depression_unemployment_fit_B,
  `Model C` = depression_unemployment_fit_C,
  `Model D` = depression_unemployment_fit_D
)

# Make table ------------------------------------------------------------------
# Table 5.7, page 163:
depression_unemployment_fits |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)" = "(Intercept)",
      "months" = "months",
      "black" = "black",
      "unemployed" = "unemployed",
      "months:unemployed" = "months:unemployed",
      "unemployed:months" = "months:unemployed",
      "var__Observation" = "var__Observation",
      "var__(Intercept)" = "var__(Intercept)",
      "var__months" = "var__months",
      "var__unemployed" = "var__unemployed",
      "var__unemployed:months" = "var__unemployed:months"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 14:16) |>
  tab_row_group(label = "Variance Components", rows = 9:13) |>
  tab_row_group(label = "Fixed Effects", rows = 1:8) |>
  cols_hide(effect)
```
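Since Models A through C are nested and fit with full maximum likelihood, the deviance-based comparisons from Section 4.6 apply here too. A minimal sketch, assuming the fits above:

```r
# Does adding the time-varying predictor improve fit (A vs. B), and does
# its interaction with time improve fit further (B vs. C)?
anova(
  depression_unemployment_fit_A,
  depression_unemployment_fit_B,
  depression_unemployment_fit_C
)
```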
Plotting discontinuous change trajectories

Unlike previous examples, the addition of a time-varying predictor to the model implies that a given change trajectory may be composed of either one continuous segment or multiple discontinuous segments. Because of this, new strategies are required to construct a data set of prototypical individuals and to plot their fitted change trajectories, with segment start and end times for the predictions. This data set can be in either wide or long format; however, for wide formats each segment must be plotted using the geom_segment() function from the ggplot2 package, whereas for long formats each segment must have a grouping ID, but can otherwise be plotted using the geom_line() function as usual. We demonstrate both formats by constructing prototypical change trajectories for Model B.

A convenient way to construct the data set of prototypical individuals in wide format is with the reframe() function from the dplyr package, which works similarly to the summarise() function but can return an arbitrary number of rows per group. Here we use it to (1) expand the unemployed_pattern string into a numeric vector using the str_extract_all() function from the stringr package, and (2) add start and stop times for each segment. Prediction then proceeds as usual, except that we use dplyr's across() function to avoid writing the predict() code twice.

```r
prototypical_depression_B <- unemployed_patterns |>
  select(-count) |>
  group_by(unemployed_pattern) |>
  reframe(
    unemployed = str_extract_all(
      unemployed_pattern, "[:digit:]", simplify = TRUE
    ),
    unemployed = as.numeric(unemployed),
    months_start = c(0, 5, 10),
    months_end = c(5, 10, 15),
  ) |>
  mutate(
    across(
      starts_with("months"),
      \(.time) {
        predict(
          depression_unemployment_fit_B,
          tibble(unemployed, months = .time),
          re.form = NA
        )
      },
      .names = "depression_{.col}"
    ),
    unemployed_pattern = factor(
      unemployed_pattern,
      levels = c("1-1-1", "1-0-0", "1-1-0", "1-0-1")
    )
  ) |>
  rename_with(
    \(.x) str_remove(.x, "months_"),
    .cols = starts_with("depression")
  )

prototypical_depression_B
```

Although we plot the prototypical trajectories using the wide format data here, note that a convenient way to create grouping IDs for long format data is the consecutive_id() function from the dplyr package, which generates a unique identifier that increments every time a variable changes. The resulting variable can be passed to ggplot2's group aesthetic to ensure the correct cases are connected together.

```r
prototypical_depression_B |>
  pivot_longer(
    cols = c(starts_with("months"), starts_with("depression")),
    names_to = c(".value"),
    names_pattern = "(^.*(?=_))"
  ) |>
  group_by(unemployed_pattern) |>
  mutate(cid = consecutive_id(unemployed), .after = unemployed_pattern)
```

Now we can plot the four trajectories.

```r
# Figure 5.3:
ggplot(prototypical_depression_B, aes(x = months_start, y = depression_start)) +
  geom_segment(aes(xend = months_end, yend = depression_end)) +
  coord_cartesian(ylim = c(5, 20)) +
  facet_wrap(vars(unemployed_pattern), labeller = label_both) +
  labs(x = "months", y = "depression")
```

An alternative strategy for plotting discontinuous change trajectories suggested by Singer and Willett (2003) is to represent the wide variety of transition times using just two continuous trajectories that encompass the most extreme contrasts possible: here, someone who is consistently unemployed and someone who is consistently employed. With this approach, prototypical change trajectories can be predicted and plotted using the same strategies used for models with time-invariant predictors, while conveying (most of) the information in the full set of discontinuous trajectories. We demonstrate this alternative strategy for Models B, C, and D. Because of the depression_unemployment study's design, we start the fitted trajectory for the consistently employed individual at 3.5 months—the earliest time any participant had their second interview.

```r
prototypical_depression <- depression_unemployment_fits[-1] |>
  map(
    \(.fit) {
      .fit |>
        estimate_prediction(
          data = tibble(months = c(0, 14, 3.5, 14), unemployed = c(1, 1, 0, 0))
        ) |>
        rename(depression = Predicted) |>
        mutate(unemployed = as.logical(unemployed)) |>
        as_tibble()
    }
  ) |>
  list_rbind(names_to = "model")

# Figure 5.4, page 167:
ggplot(prototypical_depression, aes(x = months, y = depression)) +
  geom_line(aes(colour = unemployed)) +
  scale_x_continuous(breaks = seq(0, 14, by = 2)) +
  scale_color_brewer(palette = "Dark2") +
  coord_cartesian(xlim = c(0, 14), ylim = c(5, 20)) +
  facet_wrap(vars(model))
```

When examining plots like these, Singer and Willett (2003) suggest thinking of the two extreme trajectories as an envelope representing the complete set of prototypical individuals implied by the model:

- Because all participants were unemployed at the first interview (by design), each individual starts on the unemployed trajectory.
- If at the second interview—regardless of transition time—they become employed, they move to the employed trajectory; if they don't, they stay on the unemployed trajectory.
- If at the third interview—regardless of transition time—they become unemployed again, they move back to the unemployed trajectory; if they don't, they stay on the employed trajectory.
facet_wrap(vars(model))"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-5.html","id":"recentring-time-varying-predictors","dir":"Articles","previous_headings":"5.3 Time-Varying Predictors","what":"5.3.3 Recentring time-varying predictors","title":"Chapter 5: Treating time more flexibly","text":"Section 5.3.3 Singer Willett (2003) return dropout_wages data discuss three strategies centring time-varying predictors: Constant centring: Centre around single substantively meaningful constant observations. Within-person centring: Decompose time-varying predictor two constituent predictors , individual, first predictor within-person mean; second predictor measurement occasion’s deviation within-person mean. Time-one centring: Decompose time-varying predictor two constituent predictors , individual, first predictor value first measurement occasion; second predictor measurement occasion’s deviation first measurement occasion. demonstrate strategies updating Model C, dropout_wages_fit_C, include main effect time-varying predictor unemployment_rate, fitting model uses constant centring (Model A2), within-person centring (Model B2), time-one centring (Model C2).","code":"# Fit models ------------------------------------------------------------------ dropout_wages_fit_A2 <- update( dropout_wages_fit_C, . ~ . + I(unemployment_rate - 7) ) dropout_wages_fit_B2 <- update( dropout_wages_fit_C, . ~ . + unemployment_rate_mean + unemployment_rate_dev, data = mutate( dropout_wages, unemployment_rate_mean = mean(unemployment_rate), unemployment_rate_dev = unemployment_rate - unemployment_rate_mean, .by = id ) ) dropout_wages_fit_C2 <- update( dropout_wages_fit_C, . ~ . + unemployment_rate_first + unemployment_rate_dev, data = mutate( dropout_wages, unemployment_rate_first = first(unemployment_rate), unemployment_rate_dev = unemployment_rate - unemployment_rate_first, .by = id ) ) dropout_wages_fits_2 <- list( `Model A2` = dropout_wages_fit_A2, `Model B2` = dropout_wages_fit_B2, `Model C2` = dropout_wages_fit_C2 ) # Make table ------------------------------------------------------------------ # Table 5.8: dropout_wages_fits_2 |> modelsummary( shape = term + effect + statistic ~ model, scales = c(\"vcov\", NA), coef_map = c( \"(Intercept)\" = \"(Intercept)\", \"I(highest_grade - 9)\" = \"I(highest_grade - 9)\", \"I(unemployment_rate - 7)\" = \"unemployment_rate\", \"unemployment_rate_mean\" = \"unemployment_rate\", \"unemployment_rate_first\" = \"unemployment_rate\", \"unemployment_rate_dev\" = \"unemployment_rate_dev\", \"black\" = \"black\", \"experience\" = \"experience\", \"experience:I(highest_grade - 9)\" = \"experience:I(highest_grade - 9)\", \"experience:black\" = \"experience:black\", \"var__Observation\" = \"var__Observation\", \"var__(Intercept)\" = \"var__(Intercept)\", \"var__experience\" = \"var__experience\" ), gof_map = tibble( raw = c(\"deviance\", \"AIC\", \"BIC\"), clean = c(\"Deviance\", \"AIC\", \"BIC\"), fmt = 1 ), fmt = 4, output = \"gt\" ) |> tab_row_group(label = \"Goodness-of-Fit\", rows = 16:18) |> tab_row_group(label = \"Variance Components\", rows = 13:15) |> tab_row_group(label = \"Fixed Effects\", rows = 1:12) |> cols_hide(effect)"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-5.html","id":"recentring-the-effect-of-time","dir":"Articles","previous_headings":"","what":"5.4 Recentring the effect of time","title":"Chapter 5: Treating time more flexibly","text":"Section 5.4 Singer Willett (2003) discuss strategies centring time-indicator variables using 
5.4 Recentring the effect of time

In Section 5.4 Singer and Willett (2003) discuss strategies for centring time-indicator variables, using a subset of data from Tomarken, Shelton, Elkins, and Anderson (1997), who measured the relation between changes in positive mood and supplemental antidepressant medication over the course of a week in a sample of 73 men and women already receiving nonpharmacological therapy for depression.

For this example we use the antidepressants data set, a person-period data frame with 1242 rows and 6 columns:

- id: Participant ID.
- wave: Wave of measurement.
- day: Day of measurement.
- reading: Time of day the reading was taken.
- positive_mood: Positive mood score.
- treatment: Treatment condition (placebo pills = 0, antidepressant pills = 1).

Note that the antidepressants data has three time-indicator variables, each providing a different representation of time:

- The values of wave reflect the study's design, but have little substantive meaning due to the conceptual difficulty of dividing one week into 21 components.
- The values of day reflect the study's design in a meaningful way, but fail to distinguish between morning, afternoon, and evening readings.
- The values of reading also reflect the study's design in a meaningful way—capturing the time of day each reading was taken—but fail to distinguish between days, and are difficult to analyze due to being a character vector.

To facilitate model fitting, we can create new time-indicator variables that are both meaningful and easier to analyze. Here we create two:

- time_of_day: Time of day the reading was taken, expressed numerically (0 for morning readings; 0.33 for afternoon readings; 0.67 for evening readings).
- time: Time of measurement, expressed as a combination of day and time_of_day.

The advantage of the time variable is that it captures both aspects of time in the antidepressants data in a single variable, making it easy to centre at different points in the study. Following Singer and Willett (2003), we centre time at three different points in the study:

- time: centred at initial status.
- time_3.33: centred at the study's midpoint.
- time_6.67: centred at the study's final wave.

```r
antidepressants

antidepressants <- antidepressants |>
  mutate(
    time_of_day = case_when(
      reading == "8 AM" ~ 0,
      reading == "3 PM" ~ 1/3,
      reading == "10 PM" ~ 2/3
    ),
    time = day + time_of_day,
    .after = reading
  )

antidepressants

# Table 5.9, page 182:
antidepressants |>
  select(-c(id, positive_mood, treatment)) |>
  mutate(time_3.33 = time - 3.33, time_6.67 = time - 6.67)
```
We fit three models to the antidepressants data to demonstrate how centring time affects the parameter estimates and their interpretation: a model with time centred at initial status (Model A), a model with time centred at the study's midpoint (Model B), and a model with time centred at the study's final wave (Model C).

```r
# Fit models ------------------------------------------------------------------
antidepressants_fit_A <- lmer(
  positive_mood ~ treatment * time + (1 + time | id),
  data = antidepressants,
  REML = FALSE
)

antidepressants_fit_B <- update(
  antidepressants_fit_A,
  data = mutate(antidepressants, time = time - 3.33),
  control = lmerControl(optimizer = "bobyqa")
)

antidepressants_fit_C <- update(
  antidepressants_fit_A,
  data = mutate(antidepressants, time = time - 6.67)
)

antidepressants_fits <- list(
  `Model A` = antidepressants_fit_A,
  `Model B` = antidepressants_fit_B,
  `Model C` = antidepressants_fit_C
)

# Make table ------------------------------------------------------------------
# Table 5.10, page 184:
antidepressants_fits |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)",
      "treatment",
      "time",
      "treatment:time",
      "var__Observation",
      "var__(Intercept)",
      "var__time",
      "cov__(Intercept).time"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 13:15) |>
  tab_row_group(label = "Variance Components", rows = 9:12) |>
  tab_row_group(label = "Fixed Effects", rows = 1:8) |>
  cols_hide(effect)
```

Notice that the parameters related to the slope are identical across Models A, B, and C, but those related to the intercept differ. As Singer and Willett (2003) explain, centring the time-indicator variable changes the location where the fitted trajectory is anchored around a given point in time. We can visualize this anchoring effect by plotting prototypical change trajectories for the models fit to the antidepressants data.

```r
protoypical_mood <- antidepressants_fit_A |>
  estimate_prediction(
    data = tibble(
      treatment = c(0, 0, 0, 1, 1, 1),
      time = c(0, 3.33, 6.67, 0, 3.33, 6.67)
    )
  ) |>
  rename(positive_mood = Predicted) |>
  mutate(treatment = as.logical(treatment))

# Figure 5.5, page 185:
ggplot(protoypical_mood, aes(x = time, y = positive_mood)) +
  geom_line(aes(colour = treatment)) +
  geom_line(aes(group = time), linetype = 2) +
  scale_x_continuous(breaks = seq(0, 7, by = 1)) +
  scale_color_brewer(palette = "Dark2") +
  coord_cartesian(ylim = c(140, 190))
```

As the dashed vertical lines highlight, centring the time-indicator variable changes the location of the focal comparison between the control and treatment groups in each model, causing the resultant estimates to describe the trajectories' behaviours at that specific point in time. Note that because Models A, B, and C are structurally identical, it does not matter which model is used to make the predictions—they all imply the same prototypical change trajectories.
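That structural identity is easy to verify: the three fits should produce the same fitted values even though their intercept-related estimates differ. A minimal sketch, assuming the antidepressants fits above:

```r
# Fitted values from the three centrings should agree up to numerical
# tolerance, since recentring time only relabels where the intercept sits.
all.equal(fitted(antidepressants_fit_A), fitted(antidepressants_fit_B))
all.equal(fitted(antidepressants_fit_A), fitted(antidepressants_fit_C))
```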
Chapter 6: Modeling Discontinuous and Nonlinear Change

6.1 Discontinuous Individual Change

In Section 6.1 Singer and Willett (2003) discuss strategies for fitting discontinuous individual change trajectories, using a subset of data from the National Longitudinal Study of Youth tracking the labour market experiences of male high school dropouts (Murnane, Boudett, & Willett, 1999).

For this example we return to the dropout_wages data set introduced in Chapter 5, a person-period data frame with 6402 rows and 9 columns:

- id: Participant ID.
- log_wages: Natural logarithm of wages.
- experience: Labour force experience in years, tracked from dropouts' first day of work.
- ged: Binary indicator for whether the dropout obtained a GED.
- postsecondary_education: Binary indicator for whether the dropout obtained post-secondary education.
- black: Binary indicator for whether the dropout is black.
- hispanic: Binary indicator for whether the dropout is hispanic.
- highest_grade: Highest grade completed.
- unemployment_rate: Unemployment rate in the local geographic area.

As demonstrated in Section 5.3, the inclusion of one (or more) time-varying predictor(s) in the level-1 individual growth model can be used to model discontinuous individual change trajectories. As Singer and Willett (2003) discuss, the dropout_wages data contains several time-varying predictors that can be used to model different forms of discontinuous change. We can see the behaviour of these predictors by examining data from a subset of individuals:

- ged: an immediate shift in elevation, but not in slope.
- postsecondary_education: an immediate shift in slope, but not in elevation.
- ged_x_experience: immediate shifts in both elevation and slope.

We fit ten models to the dropout_wages data:

- Model A: a baseline model.
- Model B: a model that adds a discontinuity in elevation, but not slope, by including fixed and random effects for ged.
- Model C: a model that excludes the variance/covariance components associated with ged from Model B.
- Model D: a model that adds a discontinuity in slope, but not elevation, by including fixed and random effects for postsecondary_education.
- Model E: a model that excludes the variance/covariance components associated with postsecondary_education from Model D.
- Model F: a model that adds discontinuities in both elevation and slope by including fixed and random effects for both ged and postsecondary_education.
- Model G: a model that excludes the variance/covariance components associated with postsecondary_education from Model F.
- Model H: a model that excludes the variance/covariance components associated with ged from Model F.
- Model I: a model that adds discontinuities in both elevation and slope by including fixed and random effects for ged and the interaction between ged and experience.
- Model J: a model that excludes the variance/covariance components associated with the interaction between ged and experience from Model I.

We can visualize the different forms of discontinuous change hypothesized by these models by plotting the fitted change trajectory of a single prototypical individual. We can then select a "final" model from this taxonomy by comparing deviance statistics for nested models, and AIC/BIC statistics for non-nested models (in addition to using a combination of logic, theory, and prior research). Following Singer and Willett (2003), we choose Model F as our "final" model. Finally, we can plot prototypical change trajectories for this model. Note the additional information captured by these discontinuous trajectories compared with the continuous trajectories for the dropout_wages data presented in Chapter 5 (Figure 5.2).
```r
dropout_wages

# Table 6.1, page 192:
dropout_wages |>
  filter(id %in% c(206, 2365, 4384)) |>
  select(id, log_wages, experience, ged, postsecondary_education) |>
  mutate(ged_x_experience = ged * experience) |>
  print(n = 22)

# Fit models ------------------------------------------------------------------
dropout_wages_fit_A <- lmer(
  log_wages ~ experience + I(highest_grade - 9) + experience:black +
    I(unemployment_rate - 7) + (1 + experience | id),
  data = dropout_wages,
  REML = FALSE
)

dropout_wages_fit_B <- update(
  dropout_wages_fit_A,
  . ~ . - (1 + experience | id) + ged + (1 + experience + ged | id),
  control = lmerControl(optimizer = "bobyqa")
)

dropout_wages_fit_C <- update(
  dropout_wages_fit_A,
  . ~ . + ged,
  control = lmerControl(optimizer = "bobyqa")
)

dropout_wages_fit_D <- update(
  dropout_wages_fit_A,
  . ~ . - (1 + experience | id) + postsecondary_education +
    (1 + experience + postsecondary_education | id),
  control = lmerControl(optimizer = "bobyqa")
)

dropout_wages_fit_E <- update(
  dropout_wages_fit_A,
  . ~ . + postsecondary_education
)

dropout_wages_fit_F <- update(
  dropout_wages_fit_A,
  . ~ . - (1 + experience | id) + ged + postsecondary_education +
    (1 + experience + ged + postsecondary_education | id),
  control = lmerControl(optimizer = "bobyqa")
)

dropout_wages_fit_G <- update(
  dropout_wages_fit_F,
  . ~ . - (1 + experience + ged + postsecondary_education | id) +
    (1 + experience + ged | id)
)

dropout_wages_fit_H <- update(
  dropout_wages_fit_F,
  . ~ . - (1 + experience + ged + postsecondary_education | id) +
    (1 + experience + postsecondary_education | id)
)

dropout_wages_fit_I <- update(
  dropout_wages_fit_A,
  . ~ . - (1 + experience | id) + ged + experience:ged +
    (1 + experience + ged + experience:ged | id)
)
```
```r
dropout_wages_fit_J <- update(
  dropout_wages_fit_I,
  . ~ . - (1 + experience + ged + experience:ged | id) +
    (1 + experience + ged | id)
)

dropout_wages_fits <- list(
  `Model A` = dropout_wages_fit_A,
  `Model B` = dropout_wages_fit_B,
  `Model C` = dropout_wages_fit_C,
  `Model D` = dropout_wages_fit_D,
  `Model E` = dropout_wages_fit_E,
  `Model F` = dropout_wages_fit_F,
  `Model G` = dropout_wages_fit_G,
  `Model H` = dropout_wages_fit_H,
  `Model I` = dropout_wages_fit_I,
  `Model J` = dropout_wages_fit_J
)

# Make table ------------------------------------------------------------------
options(modelsummary_get = "all")

dropout_wages_fits |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)",
      "I(highest_grade - 9)",
      "I(unemployment_rate - 7)",
      "black",
      "experience",
      "experience:I(highest_grade - 9)",
      "experience:black",
      "ged",
      "postsecondary_education",
      "experience:ged",
      "var__Observation",
      "var__(Intercept)",
      "var__experience",
      "var__ged",
      "var__postsecondary_education",
      "var__experience:ged"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 23:25) |>
  tab_row_group(label = "Variance Components", rows = 17:22) |>
  tab_row_group(label = "Fixed Effects", rows = 1:16) |>
  cols_hide(effect)

prototypical_dropout_wages <- dropout_wages_fits |>
  keep_at(paste0("Model ", c("B", "D", "F", "I"))) |>
  map(
    \(.fit) {
      prototypical_dropout <- tibble(
        experience = c(0, 3, 3, 10),
        ged = c(0, 0, 1, 1),
        postsecondary_education = c(0, 0, 0, 7),
        highest_grade = 9,
        black = 1,
        unemployment_rate = 7,
        cid = c(1, 1, 2, 2)
      )
      prototypical_dropout |>
        mutate(log_wages = predict(.fit, prototypical_dropout, re.form = NA))
    }
  ) |>
  list_rbind(names_to = "model")

# Similar to Figure 6.2:
ggplot(prototypical_dropout_wages, aes(x = experience, y = log_wages)) +
  geom_line(aes(group = cid)) +
  geom_line(aes(group = experience), alpha = .25) +
  facet_wrap(vars(model))

dropout_wages_anovas <- list(
  "Model B" = anova(dropout_wages_fit_B, dropout_wages_fit_A),
  "Model C" = anova(dropout_wages_fit_C, dropout_wages_fit_B),
  "Model D" = anova(dropout_wages_fit_D, dropout_wages_fit_A),
  "Model E" = anova(dropout_wages_fit_E, dropout_wages_fit_D),
  "Model F" = anova(dropout_wages_fit_F, dropout_wages_fit_B),
  "Model F" = anova(dropout_wages_fit_F, dropout_wages_fit_D),
  "Model G" = anova(dropout_wages_fit_G, dropout_wages_fit_F),
  "Model H" = anova(dropout_wages_fit_H, dropout_wages_fit_F),
  "Model I" = anova(dropout_wages_fit_I, dropout_wages_fit_B),
  "Model J" = anova(dropout_wages_fit_J, dropout_wages_fit_I)
)

# Table 6.2, page 203:
dropout_wages_anovas |>
  map(tidy) |>
  list_rbind(names_to = "model") |>
  select(model, term, npar, deviance, statistic, df, p.value) |>
  mutate(
    term = stringr::str_remove(term, "dropout_wages_fit_"),
    across(c(npar, df), as.integer)
  ) |>
  group_by(model) |>
  gt() |>
  cols_label(term = "comparison") |>
  fmt_number(columns = where(is.double), decimals = 2) |>
  sub_missing(missing_text = "")
```
```r
# Table 6.3, page 205:
dropout_wages_fit_F |>
  list() |>
  set_names("Estimate") |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)",
      "I(highest_grade - 9)",
      "I(unemployment_rate - 7)",
      "experience",
      "experience:black",
      "ged",
      "postsecondary_education",
      "var__Observation",
      "var__(Intercept)",
      "var__experience",
      "var__ged",
      "var__postsecondary_education"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 20:22) |>
  tab_row_group(label = "Variance Components", rows = 15:19) |>
  tab_row_group(label = "Fixed Effects", rows = 1:14) |>
  cols_hide(effect)

prototypical_dropout_wages_F <- dropout_wages_fit_F |>
  estimate_prediction(
    data = tibble(
      experience = rep(c(0, 3, 3, 10), times = 4),
      highest_grade = rep(c(9, 12), each = 4, times = 2),
      black = rep(c(FALSE, TRUE), each = 8),
      ged = rep(c(0, 0, 1, 1), times = 4),
      unemployment_rate = 7,
      postsecondary_education = rep(c(0, 0, 0, 7), times = 4)
    )
  ) |>
  rename(log_wages = Predicted) |>
  mutate(
    highest_grade = factor(highest_grade),
    black = as.logical(black),
    cid = consecutive_id(ged)
  )

# Figure 6.3:
ggplot(prototypical_dropout_wages_F, aes(x = experience, y = log_wages)) +
  geom_line(aes(group = cid, colour = black)) +
  geom_line(
    aes(group = interaction(experience, black), colour = black),
    alpha = .25
  ) +
  scale_x_continuous(breaks = seq(0, 10, by = 2)) +
  scale_color_brewer(palette = "Dark2") +
  coord_cartesian(xlim = c(0, 10), ylim = c(1.6, 2.4)) +
  facet_wrap(vars(highest_grade), labeller = label_both)
```
However, instead plotting prototypical change trajectories transformed metric, now back-transform present original metric. Note similarities differences nonlinear trajectories compared linear trajectories alcohol_use_1 data presented Chapter 4 (Figure 4.3).","code":"alcohol_use_1 #> # A tibble: 246 × 6 #> id age child_of_alcoholic male alcohol_use peer_alcohol_use #> #> 1 1 14 1 0 1.73 1.26 #> 2 1 15 1 0 2 1.26 #> 3 1 16 1 0 2 1.26 #> 4 2 14 1 1 0 0.894 #> 5 2 15 1 1 0 0.894 #> 6 2 16 1 1 1 0.894 #> 7 3 14 1 1 1 0.894 #> 8 3 15 1 1 2 0.894 #> 9 3 16 1 1 3.32 0.894 #> 10 4 14 1 1 0 1.79 #> # ℹ 236 more rows alcohol_use_1 <- alcohol_use_1 |> mutate(alcohol_use_raw = alcohol_use^2, .before = alcohol_use) |> rename(alcohol_use_sqrt = alcohol_use) alcohol_use_1 #> # A tibble: 246 × 7 #> id age child_of_alcoholic male alcohol_use_raw alcohol_use_sqrt #> #> 1 1 14 1 0 3.00 1.73 #> 2 1 15 1 0 4 2 #> 3 1 16 1 0 4 2 #> 4 2 14 1 1 0 0 #> 5 2 15 1 1 0 0 #> 6 2 16 1 1 1 1 #> 7 3 14 1 1 1 1 #> 8 3 15 1 1 4 2 #> 9 3 16 1 1 11.0 3.32 #> 10 4 14 1 1 0 0 #> # ℹ 236 more rows #> # ℹ 1 more variable: peer_alcohol_use alcohol_use_1_empgrowth <- map( list(original = \"alcohol_use_raw\", sqrt = \"alcohol_use_sqrt\"), \\(.y) { set.seed(333) alcohol_use_1 |> filter(id %in% sample(id, size = 8)) |> ggplot(aes(x = age, y = .data[[.y]])) + geom_point() + coord_cartesian(xlim = c(13, 17), ylim = c(-1, 15)) + facet_wrap(vars(id), ncol = 4, labeller = label_both) } ) alcohol_use_1_empgrowth$original alcohol_use_1_empgrowth$sqrt alcohol_use_1_fit <- lmer( alcohol_use_sqrt ~ I(age - 14) * peer_alcohol_use + child_of_alcoholic + (1 + I(age - 14) | id), data = alcohol_use_1, REML = FALSE ) summary(alcohol_use_1_fit) #> Linear mixed model fit by maximum likelihood ['lmerMod'] #> Formula: #> alcohol_use_sqrt ~ I(age - 14) * peer_alcohol_use + child_of_alcoholic + #> (1 + I(age - 14) | id) #> Data: alcohol_use_1 #> #> AIC BIC logLik deviance df.resid #> 606.7 638.3 -294.4 588.7 237 #> #> Scaled residuals: #> Min 1Q Median 3Q Max #> -2.59554 -0.40414 -0.08352 0.45550 2.29975 #> #> Random effects: #> Groups Name Variance Std.Dev. Corr #> id (Intercept) 0.2409 0.4908 #> I(age - 14) 0.1392 0.3730 -0.03 #> Residual 0.3373 0.5808 #> Number of obs: 246, groups: id, 82 #> #> Fixed effects: #> Estimate Std. 
Error t value #> (Intercept) -0.31382 0.14611 -2.148 #> I(age - 14) 0.42469 0.10559 4.022 #> peer_alcohol_use 0.69518 0.11126 6.249 #> child_of_alcoholic 0.57120 0.14623 3.906 #> I(age - 14):peer_alcohol_use -0.15138 0.08451 -1.791 #> #> Correlation of Fixed Effects: #> (Intr) I(g-14) pr_lc_ chld__ #> I(age - 14) -0.410 #> peer_lchl_s -0.709 0.351 #> chld_f_lchl -0.338 0.000 -0.146 #> I(-14):pr__ 0.334 -0.814 -0.431 0.000 prototypical_alcohol_use <- alcohol_use_1_fit |> estimate_prediction( data = crossing( age = seq(14, 16, by = .25), child_of_alcoholic = 0:1, peer_alcohol_use = c(0.655, 1.381) ) ) |> rename(alcohol_use = Predicted) |> mutate( alcohol_use = alcohol_use^2, child_of_alcoholic = factor(child_of_alcoholic), peer_alcohol_use = factor(peer_alcohol_use, labels = c(\"low\", \"high\")) ) # Figure 6.4, page 209: ggplot(prototypical_alcohol_use, aes(x = age, y = alcohol_use)) + geom_line(aes(linetype = child_of_alcoholic, colour = peer_alcohol_use)) + scale_color_viridis_d(option = \"G\", begin = .4, end = .7) + scale_x_continuous(breaks = 13:17) + coord_cartesian(xlim = c(13, 17), ylim = c(0, 3))"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-6.html","id":"selecting-a-suitable-transformation","dir":"Articles","previous_headings":"6.2 Using Transformations to Model Nonlinear Individual Change","what":"Selecting a suitable transformation","title":"Chapter 6: Modeling Discontinuous and Nonlinear Change","text":"Singer Willett (2003) suggest examining empirical growth plots participant (random subset) several different transformations selecting one analysis, keeping mind : order analyze data, transformation must used entire sample. Different transformations may less successful different participants. transformation selected analysis work well sample, compromise expected. illustrate process using berkeley data set, contains subset data Berkeley Growth Study measuring changes IQ single girl followed childhood older adulthood (Bayley, 1935). Following Singer Willett (2003), try transforming outcome time-indicator variable see successful removing nonlinearity. 
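A quick way to audition several candidate transformations is to loop over them and inspect the resulting scatterplots; a minimal sketch using the berkeley data, with an illustrative (not authoritative) ladder of powers:

library(purrr)
library(ggplot2)

# Plot iq against age under several candidate power transformations of iq;
# the same idea works for transforming the time-indicator variable instead.
map(c(0.5, 1, 1.5, 2, 2.3, 3), \(.p) {
  ggplot(berkeley, aes(x = age, y = iq^.p)) +
    geom_point() +
    labs(title = paste0("iq^", .p))
})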
Note although two transformations example simply inversions (raising iq power 2.3, taking 2.3th root age), produce identical reductions nonlinearity.","code":"berkeley #> # A tibble: 18 × 2 #> age iq #> #> 1 5 37 #> 2 7 65 #> 3 9 85 #> 4 10 88 #> 5 11 95 #> 6 12 101 #> 7 13 103 #> 8 14 107 #> 9 15 113 #> 10 18 121 #> 11 21 148 #> 12 24 161 #> 13 27 165 #> 14 36 187 #> 15 42 205 #> 16 48 218 #> 17 54 218 #> 18 60 228 berkeley_transforms <- list(\"original\", \"iq^(2.3)\", \"age^(1/2.3)\") |> set_names() |> map( \\(.transform) { mutate( berkeley, transform = .transform, age = if_else(transform == \"age^(1/2.3)\", age^(1/2.3), age), iq = if_else(transform == \"iq^(2.3)\", iq^(2.3), iq) ) } ) |> list_rbind(names_to = \"metric\") |> mutate(metric = factor(metric, levels = unique(metric))) # Figure 6.6, page 212: ggplot(berkeley_transforms, aes(x = age, y = iq)) + geom_point() + facet_wrap(vars(metric), scales = \"free\", labeller = label_both)"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-6.html","id":"representing-individual-change-using-a-polynomial-function-of-time","dir":"Articles","previous_headings":"","what":"6.3 Representing individual change using a polynomial function of time","title":"Chapter 6: Modeling Discontinuous and Nonlinear Change","text":"Section 6.3, Singer Willett (2003) discuss strategies fitting polynomial individual change trajectories using subset data Keiley, Bates, Dodge, Pettit (2000), measured changes externalizing behaviour sample 45 children tracked first sixth grade. example use externalizing_behaviour data set, person-period data frame 270 rows 5 columns: id: Child ID. time: Time measurement. externalizing_behaviour: Sum scores Achenbach’s (1991) Child Behavior Checklist. Scores range 0 68 female: Binary indicator whether adolescent female. grade: Grade year. Singer Willett (2003) recommend using two approaches selecting among competing polynomial forms level-1 individual growth model: Examining empirical growth plots identify highest order polynomial change trajectory suggested data. Comparing goodness--fit statistics across series polynomial level-1 models.","code":"externalizing_behaviour #> # A tibble: 270 × 5 #> id externalizing_behaviour female time grade #> #> 1 1 50 0 0 1 #> 2 1 57 0 1 2 #> 3 1 51 0 2 3 #> 4 1 48 0 3 4 #> 5 1 43 0 4 5 #> 6 1 19 0 5 6 #> 7 2 4 0 0 1 #> 8 2 6 0 1 2 #> 9 2 3 0 2 3 #> 10 2 3 0 3 4 #> # ℹ 260 more rows"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-6.html","id":"using-a-polynomial-trajectory-to-summarize-each-persons-empirical-growth-record","dir":"Articles","previous_headings":"6.3 Representing individual change using a polynomial function of time","what":"Using a polynomial trajectory to summarize each person’s empirical growth record","title":"Chapter 6: Modeling Discontinuous and Nonlinear Change","text":"begin examining empirical growth plots subset children whose trajectories span wide array individual change patterns externalizing_behaviour data. Unlike previous examples, (1) match order IDs input vector order ids resultant data frame using map() function purrr package filter data specific order; (2) assign participant subset consecutive alphabetical identifier using consecutive_id() function dplyr package. Now can examine empirical growth plots externalizing_behaviour_subset data. 
faced many different individual change patterns, Singer Willett (2003) suggest beginning following exploratory approach: First, identify highest order polynomial needed summarize change participant fitting separate person-specific polynomial trajectories. Second, identify highest order polynomial needed summarize change participant fitting common polynomial trajectory across participants. Similar previous examples, geom_smooth() function \"lm\" method can used add person’s fitted polynomial trajectory empirical growth record plot—need specify functional form trajectories formula argument. two ways specify polynomial trajectories using R’s formula syntax: (): () function can used construct series polynomial predictors hand. poly(): poly() function can used construct series (default, orthogonal) polynomial predictors degree 1 specified degree. begin person-specific polynomial trajectories. Note aesthetic mappings geom_smooth() formulas can used specify degree child’s polynomial trajectory, instead need add separate smooth geom empirical growth plots child’s data. Next common polynomial trajectories. Following Singer Willett (2003), select quartic (degree 4) trajectory child appears need higher order polynomial.","code":"# Note that participant 26 is last in the input vector and resultant data frame. externalizing_behaviour_subset <- c(1, 6, 11, 25, 34, 36, 40, 26) |> map(\\(.id) filter(externalizing_behaviour, id == .id)) |> list_rbind() |> mutate(child = LETTERS[consecutive_id(id)]) tail(externalizing_behaviour_subset, n = 12) #> # A tibble: 12 × 6 #> id externalizing_behaviour female time grade child #> #> 1 40 40 1 0 1 G #> 2 40 23 1 1 2 G #> 3 40 7 1 2 3 G #> 4 40 28 1 3 4 G #> 5 40 35 1 4 5 G #> 6 40 56 1 5 6 G #> 7 26 19 0 0 1 H #> 8 26 32 0 1 2 H #> 9 26 25 0 2 3 H #> 10 26 40 0 3 4 H #> 11 26 20 0 4 5 H #> 12 26 23 0 5 6 H externalizing_behaviour_empgrowth <- externalizing_behaviour_subset |> ggplot(aes(x = grade, y = externalizing_behaviour)) + geom_point() + scale_x_continuous(breaks = 0:7) + coord_cartesian(xlim = c(0, 7), ylim = c(0, 60)) + facet_wrap(vars(child), ncol = 4, labeller = label_both) externalizing_behaviour_empgrowth externalizing_behaviour_empgrowth <- externalizing_behaviour_empgrowth + map2( group_split(externalizing_behaviour_subset, child), list(2, 2, 1, 1, 3, 4, 2, 4), \\(.child, .degree) { geom_smooth( aes(linetype = polynomial, colour = degree), data = mutate( .child, polynomial = factor(\"person-specific\"), degree = factor(.degree, levels = 1:4) ), method = \"lm\", formula = y ~ poly(x, degree = .degree), se = FALSE ) } ) + scale_colour_brewer(palette = \"Dark2\", drop = FALSE) + guides(linetype = guide_legend(override.aes = list(colour = \"black\"))) externalizing_behaviour_empgrowth # Figure 6.7, page 218: externalizing_behaviour_empgrowth + geom_smooth( aes(linetype = \"common (quartic)\"), method = \"lm\", formula = y ~ poly(x, degree = 4), se = FALSE, colour = \"black\", linewidth = .5 )"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-6.html","id":"testing-higher-order-terms-in-a-polynomial-level-1-model","dir":"Articles","previous_headings":"6.3 Representing individual change using a polynomial function of time","what":"Testing Higher Order Terms in a Polynomial Level-1 Model","title":"Chapter 6: Modeling Discontinuous and Nonlinear Change","text":"select “final” polynomial trajectory, Singer Willett (2003) suggest fitting series level-1 individual growth models increasing polynomial complexity, stopping goodness--fit statistics 
suggest need add polynomial predictors model. fit four models externalizing_behaviour data: unconditional means model (Model ), unconditional growth model (Model B), two models increasing polynomial order (Models C D). Note set raw argument poly() function TRUE order use raw, orthogonal, polynomials. usual, can inspect analysis deviance tables compare nested models.","code":"# Fit models ------------------------------------------------------------------ externalizing_behaviour_fit_A <- lmer( externalizing_behaviour ~ 1 + (1 | id), data = externalizing_behaviour, REML = FALSE ) externalizing_behaviour_poly_fits <- map( set_names(1:3), \\(.degree) { lmer( externalizing_behaviour ~ poly(time, .degree, raw = TRUE) + (poly(time, .degree, raw = TRUE) | id), data = externalizing_behaviour, REML = FALSE ) } ) externalizing_behaviour_fits <- list( \"Model A\" = externalizing_behaviour_fit_A, \"Model B\" = externalizing_behaviour_poly_fits[[\"1\"]], \"Model C\" = externalizing_behaviour_poly_fits[[\"2\"]], \"Model D\" = externalizing_behaviour_poly_fits[[\"3\"]] ) # Make table ------------------------------------------------------------------ # Table 6.5, page 221: externalizing_behaviour_fits |> modelsummary( shape = term + effect + statistic ~ model, statistic = NULL, scales = c(\"vcov\", NA), coef_map = c( \"(Intercept)\" = \"(Intercept)\", \"poly(time, .degree, raw = TRUE)\" = \"time\", \"poly(time, .degree, raw = TRUE)1\" = \"time\", \"poly(time, .degree, raw = TRUE)2\" = \"time^2\", \"poly(time, .degree, raw = TRUE)3\" = \"time^3\", \"var__Observation\" = \"var__Observation\", \"var__(Intercept)\" = \"var__(Intercept)\", \"var__poly(time, .degree, raw = TRUE)\" = \"var__time\", \"var__poly(time, .degree, raw = TRUE)1\" = \"var__time\", \"cov__(Intercept).poly(time, .degree, raw = TRUE)\" = \"cov__(Intercept).time\", \"cov__(Intercept).poly(time, .degree, raw = TRUE)1\" = \"cov__(Intercept).time\", \"var__poly(time, .degree, raw = TRUE)2\" = \"var__time^2\", \"cov__(Intercept).poly(time, .degree, raw = TRUE)2\" = \"cov__(Intercept).time^2\", \"cov__poly(time, .degree, raw = TRUE)1.poly(time, .degree, raw = TRUE)2\" = \"cov__time.time^2\", \"var__poly(time, .degree, raw = TRUE)3\" = \"var__time^3\", \"cov__(Intercept).poly(time, .degree, raw = TRUE)3\" = \"cov__(Intercept).time^3\", \"cov__poly(time, .degree, raw = TRUE)1.poly(time, .degree, raw = TRUE)3\" = \"cov__time.time^3\", \"cov__poly(time, .degree, raw = TRUE)2.poly(time, .degree, raw = TRUE)3\" = \"cov__time^2.time^3\" ), gof_map = tibble( raw = c(\"deviance\", \"AIC\", \"BIC\"), clean = c(\"Deviance\", \"AIC\", \"BIC\"), fmt = 1 ), output = \"gt\" ) |> tab_row_group(label = \"Goodness-of-Fit\", rows = 16:18) |> tab_row_group(label = \"Variance Components\", rows = 5:15) |> tab_row_group(label = \"Fixed Effects\", rows = 1:4) |> cols_hide(effect) externalizing_behaviour_fits |> with(do.call(anova, map(names(externalizing_behaviour_fits), as.name))) |> tidy() #> # A tibble: 4 × 9 #> term npar AIC BIC logLik deviance statistic df p.value #> #> 1 Model A 3 2016. 2027. -1005. 2010. NA NA NA #> 2 Model B 6 2004. 2025. -996. 1992. 18.5 3 0.000345 #> 3 Model C 10 1996. 2032. -988. 1976. 15.9 4 0.00315 #> 4 Model D 15 1997. 2051. -984. 1967. 
8.48 5 0.132

6.4 Truly Nonlinear Trajectories

In Section 6.4, Singer and Willett (2003) discuss strategies for fitting truly nonlinear change trajectories using data from Tivnan (1980), who measured changes in cognitive growth over a three-week period in a sample of 17 first and second-graders. Children's cognitive growth was based on their improvement in the number of moves completed in a two-person checkerboard game, Fox n' Geese, before making a catastrophic error. For this example we use the cognitive_growth data set, a person-period data frame with 445 rows and 4 columns:

- id: Child ID.
- game: Game number. Each child played a maximum of 27 games.
- nmoves: The number of moves completed before making a catastrophic error.
- reading_score: Score on an unnamed standardized reading test.

To inform the specification of the nonlinear multilevel model for change we fit in this example, we begin by examining empirical growth plots for a subset of children in the cognitive_growth data. As Singer and Willett (2003) discuss, knowledge of the Fox n' Geese game and inspection of these plots suggests a (generalized) logistic trajectory for the level-1 individual growth model, consisting of three features:

- A fixed lower asymptote: Each child's trajectory rises from a fixed lower asymptote of 1 because players must make at least one move.
- A fixed upper asymptote: Each child's trajectory approaches an upper asymptote because they can only make a finite number of moves before making a catastrophic error. Based on examining the empirical growth plots, a fixed upper asymptote of 20 appears reasonable.
- A smooth curve joining the lower and upper asymptotes: Learning theory suggests that each child's true trajectory will smoothly traverse the region between the lower and upper asymptotes, accelerating away from the lower asymptote as the child initially deduces a winning strategy, and decelerating toward the upper asymptote as the child finds it increasingly difficult to refine that winning strategy.

We can write this level-1 logistic trajectory as:

\[
\text{nmoves}_{ij} = a_{1} + \frac{(a_{2} - a_{1})}{1 + \pi_{0i} e^{-(\pi_{1i} \text{game}_{ij})}} + \epsilon_{ij},
\]

which asserts that \(\text{nmoves}_{ij}\), the true value of nmoves for the \(i\)th child in the \(j\)th game, is a nonlinear function of the logistic growth parameters \(\pi_{0i}\) and \(\pi_{1i}\). The parameters \(a_{1}\) and \(a_{2}\) represent the lower and upper asymptotes, which we fix to values of 1 and 20, respectively.

To develop intuition for how the level-1 logistic trajectory models the relationship between nmoves and game, we can plot true trajectories for different children using specific combinations of values for the nonlinear parameters, \(\pi_{0i}\) and \(\pi_{1i}\). We can do so by first writing a function with deriv() representing the level-1 logistic trajectory (note that we will also use this function when fitting the nonlinear multilevel model for change), then using the geom_function() geom to plot true trajectories for different combinations of the nonlinear parameters.

We fit two models to the cognitive_growth data using the postulated level-1 logistic trajectory: an unconditional logistic growth model (Model A); and a logistic growth model that includes a time-invariant predictor of children's reading skill, reading_score, centred on the sample mean of 1.95625 (Model B).
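Before specifying the level-2 submodels, it can help to evaluate this function at a few illustrative parameter values. The values of \(\pi_{0i}\) and \(\pi_{1i}\) here are taken from the grid plotted in Figure 6.9 below, not from the fitted models:

# The level-1 logistic trajectory with the asymptotes fixed at 1 and 20, as in
# the equation above (19 = 20 - 1).
logistic_nmoves <- function(game, pi0, pi1) {
  1 + 19 / (1 + pi0 * exp(-pi1 * game))
}

logistic_nmoves(game = c(1, 5, 15, 27), pi0 = 15, pi1 = .3)
# Approximately 2.57, 5.37, 17.29, and 19.91: the trajectory accelerates away
# from the lower asymptote and decelerates toward the upper asymptote.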
Following Singer and Willett (2003), we specify the level-2 submodels for the nonlinear parameters of Models A and B as:

\[
\begin{align}
\text{Model A:} \qquad \pi_{0i} &= \gamma_{00} + \zeta_{0i} \\
\pi_{1i} &= \gamma_{10} + \zeta_{1i} \\ \\
\text{Model B:} \qquad \pi_{0i} &= \gamma_{00} + \gamma_{01}(\text{reading_score}_i - \overline{\text{reading_score}}) + \zeta_{0i} \\
\pi_{1i} &= \gamma_{10} + \gamma_{11}(\text{reading_score}_i - \overline{\text{reading_score}}) + \zeta_{1i}
\end{align}
\]

where

\[
\begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix} \sim N
\begin{pmatrix}
\begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} \sigma^2_0 & \sigma_{10} \\ \sigma_{10} & \sigma^2_1 \end{bmatrix}
\end{pmatrix}.
\]

We can fit these logistic growth models using the nlme() function from the nlme package. The model formula for the nlme() function takes the form response ~ nonlinear_formula, where the nonlinear formula can either be represented using a function or written directly. For this example we use the logistic_function() created above. There are several important differences between linear and nonlinear models to keep in mind when fitting these models:

- The nonlinear parameters must be declared explicitly in the nonlinear formula. In the nlme() function, linear models for these parameters are specified in the fixed and random arguments; multiple parameters that share the same model can be written as a single formula instead of a list of single-parameter formulas.
- Starting estimates for the parameters must be provided, unless a self-starting function is used to calculate initial parameter estimates; the final estimates can also be quite sensitive to the starting values. For this example, we chose starting estimates close to the parameter estimates reported in the text (Table 6.6). Strategies for choosing reasonable starting estimates are covered in Bates and Watts (1988), and Pinheiro and Bates (2000).
- No intercept is assumed by default, so it must be included in the nonlinear model formula if desired.

Note that although the models we fit here match those described in the textbook's equations, they are not exactly the models Singer and Willett (2003) secretly fit for Table 6.6 and Figure 6.10 (for discussion, see https://github.com/mccarthy-m-g/alda/issues/3).

We can plot individual and prototypical change trajectories as usual. First the prototypical trajectories. Finally, the individual trajectories, which we can plot together to get a sense of the interindividual variation.
can add child’s fitted trajectory empirical growth plot get sense well model fits data.","code":"cognitive_growth #> # A tibble: 445 × 4 #> id game nmoves reading_score #> #> 1 1 1 4 1.4 #> 2 1 2 7 1.4 #> 3 1 3 8 1.4 #> 4 1 4 3 1.4 #> 5 1 5 3 1.4 #> 6 1 6 3 1.4 #> 7 1 7 7 1.4 #> 8 1 8 6 1.4 #> 9 1 9 3 1.4 #> 10 1 10 7 1.4 #> # ℹ 435 more rows # Figure 6.8, page 227: cognitive_growth_empgrowth <- cognitive_growth |> filter(id %in% c(1, 4, 6, 7, 8, 11, 12, 15)) |> ggplot(aes(x = game, y = nmoves)) + geom_point() + coord_cartesian(xlim = c(0, 30), ylim = c(0, 25)) + facet_wrap(vars(id), ncol = 4, labeller = label_both) cognitive_growth_empgrowth logistic_function <- deriv( ~ 1 + (19.0 / (1.0 + pi0 * exp(-pi1 * time))), namevec = c(\"pi0\", \"pi1\"), function.arg = c(\"time\", \"pi0\", \"pi1\") ) # Figure 6.9: ggplot() + pmap( arrange_all(crossing(.pi0 = c(150, 15, 1.5), .pi1 = c(.5, .3, .1)), desc), \\(.pi0, .pi1) { geom_function( aes(colour = pi1), data = tibble(pi0 = factor(.pi0), pi1 = factor(.pi1)), fun = \\(.game) { logistic_function(.game, pi0 = .pi0, pi1 = .pi1) }, n = 30 ) } ) + scale_x_continuous(limits = c(0, 30)) + scale_color_brewer(palette = \"Dark2\") + coord_cartesian(ylim = c(0, 25)) + facet_wrap(vars(pi0), labeller = label_both) + labs( x = \"game\", y = \"nmoves\" ) # Fit models ------------------------------------------------------------------ cognitive_growth_fit_A <- nlme( nmoves ~ logistic_function(game, pi0, pi1), fixed = pi0 + pi1 ~ 1, random = pi0 + pi1 ~ 1, groups = ~ id, start = list(fixed = c(pi0 = 13, pi0 = .12)), data = cognitive_growth ) cognitive_growth_fit_B <- update( cognitive_growth_fit_A, fixed = pi0 + pi1 ~ 1 + I(reading_score - 1.95625), start = list(fixed = c(13, -.4, .12, .04)) ) cognitive_growth_fits <- list( \"Model A\" = cognitive_growth_fit_A, \"Model B\" = cognitive_growth_fit_B ) # Make table ------------------------------------------------------------------ # broom.mixed and easystats don't have methods to extract random effects from # objects of class nlme, so we have to construct this part of the table manually. 
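# An added note: VarCorr() returns the estimated variance components of each
# fit, which the code below reshapes into the data frame format expected by
# modelsummary()'s add_rows argument.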
cognitive_growth_fits_ranef <- cognitive_growth_fits |> map( \\(.fit) { .fit |> VarCorr() |> as.data.frame(order = \"cov.last\") |> mutate( across( c(var1, var2), \\(.x) if_else(str_ends(.x, \"[:digit:]\"), paste0(.x, \".(Intercept)\"), .x) ), effect = \"random\", var1 = case_when( grp == \"Residual\" ~ \"sd__Observation\", grp == \"id\" & is.na(var2) ~ paste0(\"sd__\", var1), grp == \"id\" & !is.na(var2) ~ paste0(\"cor__pi0\", var1, \".\", var2) ) ) |> arrange(grp) |> select(term = var1, estimate = sdcor) } ) |> list_rbind(names_to = \"model\") |> pivot_wider(names_from = model, values_from = estimate) |> structure(position = 5:8) # Table 6.6, page 231: cognitive_growth_fits |> modelsummary( statistic = NULL, coef_map = c( \"pi0\" = \"pi0.(Intercept)\", \"pi0.(Intercept)\" = \"pi0.(Intercept)\", \"pi0.I(reading_score - 1.95625)\" = \"pi0.I(reading_score - 1.95625)\", \"pi1\" = \"pi1.(Intercept)\", \"pi1.(Intercept)\" = \"pi1.(Intercept)\", \"pi1.I(reading_score - 1.95625)\" = \"pi1.I(reading_score - 1.95625)\" ), gof_map = tibble( raw = c(\"deviance\", \"AIC\", \"BIC\"), clean = c(\"Deviance\", \"AIC\", \"BIC\"), fmt = 1 ), add_rows = cognitive_growth_fits_ranef, output = \"gt\" ) |> tab_row_group(label = \"Goodness-of-Fit\", rows = 9:11) |> tab_row_group(label = \"Variance Components\", rows = 5:8) |> tab_row_group(label = \"Fixed Effects\", rows = 1:4) prototypical_cognitive_growth <- cognitive_growth_fits |> map2( list( tibble(game = seq(from = 0, to = 30, by = 0.1)), crossing(game = seq(from = 0, to = 30, by = 0.1), reading_score = c(1, 4)) ), \\(.fit, .df) { .df |> mutate(nmoves = predict(.fit, newdata = .df, level = 0)) } ) |> list_rbind(names_to = \"model\") |> mutate(reading_score = factor(reading_score, labels = c(\"low\", \"high\"))) # Similar to Figure 6.10, page 232: ggplot(prototypical_cognitive_growth, aes(x = game, y = nmoves)) + geom_line(aes(colour = reading_score)) + scale_color_viridis_d( option = \"G\", begin = .4, end = .7, na.value = \"black\" ) + coord_cartesian(ylim = c(0, 25)) + facet_wrap(vars(model)) cognitive_growth_fits |> map(\\(.fit) augment(.fit, data = cognitive_growth)) |> list_rbind(names_to = \"model\") |> select(-nmoves) |> rename(nmoves = .fitted) |> mutate(reading_score = if_else(model == \"Model A\", NA, reading_score)) |> ggplot(aes(x = game, y = nmoves)) + geom_line(aes(group = id, colour = reading_score)) + scale_colour_viridis_b( option = \"G\", begin = .4, end = .8, na.value = \"black\" ) + coord_cartesian(ylim = c(0, 25)) + facet_wrap(vars(model)) cognitive_growth_empgrowth + geom_line( aes(y = .fitted, group = id, colour = reading_score), data = filter( augment(cognitive_growth_fit_B, data = cognitive_growth), id %in% c(1, 4, 6, 7, 8, 11, 12, 15) ) ) + scale_colour_viridis_b(breaks = 1:4, option = \"G\", begin = .4, end = .8)"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-7.html","id":"the-standard-specification-of-the-multilevel-model-for-change","dir":"Articles","previous_headings":"","what":"7.1 The “standard” specification of the multilevel model for change","title":"Chapter 7: Examining the Multilevel Model's Error Covariance Structure","text":"Chapter 7 Singer Willett (2003) examine generalized least squares approach modelling change using artificial data created Willett (1988), simulated changes performance hypothetical “opposites naming” task four week period sample 35 people. example use opposites_naming data set, person-period data frame 140 rows 5 columns: id: Participant ID. wave: Wave measurement. 
time: Wave of measurement centred at time 0. opposites_naming_score: Score on the "opposites naming" task. baseline_cognitive_score: Baseline score on a standardized instrument assessing general cognitive skill.

As the person-level version of the opposites_naming data shows, this is a time-structured data set with four measurements per participant, and a time-invariant predictor reflecting each participant's cognitive skill at baseline.

We begin by fitting the "standard" multilevel model for change to the opposites_naming data, which will serve as a point of comparison for alternative models with different error covariance structures. The "standard" model for the opposites_naming data takes the familiar form:

\[
\begin{alignat}{2}
&\text{Level 1:} \\
&\quad \text{opposites_naming_score}_{ij} = \pi_{0i} + \pi_{1i} \text{time}_{ij} + \epsilon_{ij} \\
&\text{Level 2:} \\
&\quad \pi_{0i} = \gamma_{00} + \gamma_{01} (\text{baseline_cognitive_score}_i - 113.4571) + \zeta_{0i} \\
&\quad \pi_{1i} = \gamma_{10} + \gamma_{11} (\text{baseline_cognitive_score}_i - 113.4571) + \zeta_{1i},
\end{alignat}
\]

where

\[
\epsilon_{ij} \stackrel{iid}{\sim} \operatorname{Normal}(0, \sigma_\epsilon),
\]

and

\[
\begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix} \stackrel{iid}{\sim} N
\begin{pmatrix}
\begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} \sigma^2_0 & \sigma_{10} \\ \sigma_{10} & \sigma^2_1 \end{bmatrix}
\end{pmatrix}.
\]

As Singer and Willett (2003) discuss, because the focus of this chapter is the error covariance structure of the multilevel model for change, we fit this model using restricted maximum likelihood so that the goodness-of-fit statistics reflect only the stochastic portion of the model's fit. Additionally, we use the lme() function from the nlme package instead of lme4's lmer() function to fit the model, because the former has methods that make examining the fitted model's error covariance structure easier. The API of the lme() function is similar to the lmer() function, except that fixed and random effects are specified in separate formulas rather than a single formula.

opposites_naming
#> # A tibble: 140 × 5
#>    id     wave  time opposites_naming_score baseline_cognitive_score
#>
#>  1 1         1     0                    205                      137
#>  2 1         2     1                    217                      137
#>  3 1         3     2                    268                      137
#>  4 1         4     3                    302                      137
#>  5 2         1     0                    219                      123
#>  6 2         2     1                    243                      123
#>  7 2         3     2                    279                      123
#>  8 2         4     3                    302                      123
#>  9 3         1     0                    142                      129
#> 10 3         2     1                    212                      129
#> # ℹ 130 more rows

opposites_naming_pl <- opposites_naming |>
  select(-time) |>
  pivot_wider(
    names_from = wave,
    values_from = opposites_naming_score,
    names_prefix = "opp_"
  ) |>
  relocate(baseline_cognitive_score, .after = everything())

# Table 7.1:
head(opposites_naming_pl, 10)
#> # A tibble: 10 × 6
#>    id    opp_1 opp_2 opp_3 opp_4 baseline_cognitive_score
#>
#>  1 1       205   217   268   302                      137
#>  2 2       219   243   279   302                      123
#>  3 3       142   212   250   289                      129
#>  4 4       206   230   248   273                      125
#>  5 5       190   220   229   220                       81
#>  6 6       165   205   207   263                      110
#>  7 7       170   182   214   268                       99
#>  8 8        96   131   159   213                      113
#>  9 9       138   156   197   200                      104
#> 10 10      216   252   274   298                       96

# Fit model -------------------------------------------------------------------
opposites_naming_fit_standard <- lme(
  opposites_naming_score ~ time * I(baseline_cognitive_score - 113.4571),
  random = ~ time | id,
  data = opposites_naming,
  method = "REML"
)

# Make table ------------------------------------------------------------------
options(modelsummary_get = "all")

# Table 7.2, page 246:
opposites_naming_fit_standard |>
  list() |>
  set_names("Estimate") |>
  modelsummary(
    fmt = 2,
    statistic = NULL,
    effects = c("var_model", "ran_pars", "fixed"),
    scales = c("vcov", "vcov", NA),
    coef_map = c(
      "(Intercept)",
      "I(baseline_cognitive_score - 113.4571)",
      "time",
      "time:I(baseline_cognitive_score - 113.4571)",
      "var_Observation",
      "var_(Intercept)",
      "var_time",
      "cov_time.(Intercept)"
    ),
    gof_map = list(
      list(
        raw = "logLik",
        clean = "Deviance",
        fmt = \(.x) vec_fmt_number(-2 * as.numeric(.x), decimals = 1, sep_mark = "")
      ),
      list(raw = "AIC", clean = "AIC", fmt = fmt_decimal(1)),
      list(raw = "BIC", clean = "BIC", fmt = fmt_decimal(1))
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 9:11) |>
  tab_row_group(label = "Variance Components", rows = 5:8) |>
  tab_row_group(label = "Fixed Effects", rows = 1:4)

7.2 Using the composite model to understand assumptions about the error covariance matrix

In Section 7.2 Singer and Willett (2003) examine the error covariance structure implied by the "standard" multilevel model for change, given its random effects specification. To do so, they begin by substituting the level-2 equations into the level-1 equation, yielding the composite representation of the "standard" model:

\[
\text{opp}_{ij} = \gamma_{00} + \gamma_{10}\text{time}_{ij} + \gamma_{01}(\text{cog}_i - 113.4571) + \gamma_{11}\text{time}_{ij}(\text{cog}_i - 113.4571) + r_{ij},
\]

where the composite residual, \(r_{ij}\), represents a weighted linear combination of the model's original three random effects:

\[
r_{ij} = \epsilon_{ij} + \zeta_{0i} + \zeta_{1i} \text{time}_{ij}.
\]

Notice that the composite model now looks like a typical multiple regression model, with the usual error term, \(\epsilon_i\), replaced by the composite residual, \(r_{ij}\). Following this observation, we can reexpress the distributional assumptions on the residuals of the "standard" multilevel model for change in one grand statement based on the composite residual:

\[
r \sim N
\begin{pmatrix}
\mathbf 0,
\begin{bmatrix}
\mathbf{\Sigma}_r & \mathbf 0 & \mathbf 0 & \dots & \mathbf 0 \\
\mathbf 0 & \mathbf{\Sigma}_r & \mathbf 0 & \dots & \mathbf 0 \\
\mathbf 0 & \mathbf 0 & \mathbf{\Sigma}_r & \dots & \mathbf 0 \\
\vdots & \vdots & \vdots & \ddots & \mathbf 0 \\
\mathbf 0 & \mathbf 0 & \mathbf 0 & \mathbf 0 & \mathbf{\Sigma}_r
\end{bmatrix}
\end{pmatrix},
\]

where \(\mathbf{\Sigma}_r\) represents a block diagonal error covariance sub-matrix whose dimensions reflect the design of the opposites_naming data, given by:

\[
\mathbf{\Sigma}_r =
\begin{bmatrix}
\sigma_{r_1}^2 & \sigma_{r_1 r_2} & \sigma_{r_1 r_3} & \sigma_{r_1 r_4} \\
\sigma_{r_2 r_1} & \sigma_{r_2}^2 & \sigma_{r_2 r_3} & \sigma_{r_2 r_4} \\
\sigma_{r_3 r_1} & \sigma_{r_3 r_2} & \sigma_{r_3}^2 & \sigma_{r_3 r_4} \\
\sigma_{r_4 r_1} & \sigma_{r_4 r_2} & \sigma_{r_4 r_3} & \sigma_{r_4}^2
\end{bmatrix},
\]

with occasion-specific composite residual variances

\[
\begin{align}
\sigma_{r_j}^2 &= \operatorname{Var} \left( \epsilon_{ij} + \zeta_{0i} + \zeta_{1i} \text{time}_j \right) \\
&= \sigma_\epsilon^2 + \sigma_0^2 + 2 \sigma_{01} \text{time}_j + \sigma_1^2 \text{time}_j^2,
\end{align}
\]

and occasion-specific composite residual covariances

\[
\sigma_{r_j, r_{j'}} = \sigma_0^2 + \sigma_{01} (\text{time}_j + \text{time}_{j'}) + \sigma_1^2 \text{time}_j \text{time}_{j'},
\]

where all terms have their usual meanings.

We can retrieve the error covariance sub-matrix, \(\mathbf{\Sigma}_r\), from the opposites_naming_fit_standard fit using the getVarCov() function from the nlme package.
emphasize \\(\\mathbf{\\Sigma}_r\\) participant, retrieve first last participants data. descriptive purposes, can also convert \\(\\mathbf{\\Sigma}_r\\) correlation matrix using cov2cor() function, examine residual autocorrelation measurement occasions. Singer Willett (2003) discuss, examining equations outputs reveals two important properties occasion-specific residuals “standard” multilevel model change: can heteroscedastic autocorrelated within participants (remember across participants, identically heteroscedastic autocorrelated homogeneity assumption). powerful dependence time. Specifically, residual variances, \\(\\sigma_{r_j}^2\\), quadratic dependence time minimum time \\(\\text{time} = -(\\sigma_{01} / \\sigma_1^2)\\) increase parabolically symmetrically time either side minimum; residual covariances, \\(\\sigma_{r_j, r_{j'}}\\), (imperfect) band diagonal structure wherein overall magnitude residual covariances tends decline diagonal “bands” main diagonal. first properties—allowing heteroscedasticity autocorrelation among composite residuals—necessity given anticipated demands longitudinal data. longitudinal data sets heteroscedastic autocorrelated, credible model change allow potential heteroscedasticity autocorrelation. One advantage “standard” multilevel model change —although composite residuals powerful dependence time—also capable adapting relatively smoothly many common empirical situations, accommodating automatically certain kinds complex error structure. Nonetheless, Singer Willett (2003) conclude questioning whether hypothesized structure error covariance matrix implied “standard” model can applied ubiquitously, may empirical situations directly modelling alternative error covariance structures may preferable.","code":"opposites_naming_varcov_standard <- opposites_naming_fit_standard |> getVarCov(type = \"marginal\", individuals = c(1, 35)) opposites_naming_varcov_standard #> id 1 #> Marginal variance covariance matrix #> 1 2 3 4 #> 1 1395.90 1058.20 879.95 701.71 #> 2 1058.20 1146.70 916.21 845.22 #> 3 879.95 916.21 1111.90 988.73 #> 4 701.71 845.22 988.73 1291.70 #> Standard Deviations: 37.362 33.863 33.346 35.94 #> id 35 #> Marginal variance covariance matrix #> 1 2 3 4 #> 1 1395.90 1058.20 879.95 701.71 #> 2 1058.20 1146.70 916.21 845.22 #> 3 879.95 916.21 1111.90 988.73 #> 4 701.71 845.22 988.73 1291.70 #> Standard Deviations: 37.362 33.863 33.346 35.94 cov2cor(opposites_naming_varcov_standard[[1]]) #> 1 2 3 4 #> 1 1.0000000 0.8364012 0.7062988 0.5225751 #> 2 0.8364012 1.0000000 0.8113955 0.6944913 #> 3 0.7062988 0.8113955 1.0000000 0.8249967 #> 4 0.5225751 0.6944913 0.8249967 1.0000000"},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-7.html","id":"postulating-an-alternative-error-covariance-structure","dir":"Articles","previous_headings":"","what":"7.3 Postulating an alternative error covariance structure","title":"Chapter 7: Examining the Multilevel Model's Error Covariance Structure","text":"section 7.3 Singer Willett (2003) discuss alternative error covariance structures can modelled directly using extended linear model change heteroscedastic, correlated errors fitted generalized least squares regression. See Chapter 5 Pinheiro Bates (2010) discussion extended linear model. can fit extended linear model change gls() function nlme package, allows us model within-group heteroscedasticity correlation structures via weights correlation arguments, respectively. 
We fit six models to the opposites_naming data with the following error covariance structures: unstructured, compound symmetric, heterogeneous compound symmetric, autoregressive, heterogeneous autoregressive, and Toeplitz. Notice that unlike the multilevel model for change, the extended linear model for change has no random effects. The equations below are only needed for the table, and there is nothing else interesting to look at, hence this section is collapsed. Each hypothesized error covariance structure is shown alongside the estimates from the corresponding fitted model.

Unstructured:

$$\mathbf{\Sigma}_r = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 & \sigma_{34} \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_4^2 \end{bmatrix}$$

$$\begin{bmatrix} 1345.1 & 1005.8 & 946.2 & 583.2 \\ 1005.8 & 1150.5 & 1028.5 & 846.6 \\ 946.2 & 1028.5 & 1235.8 & 969.3 \\ 583.2 & 846.6 & 969.3 & 1206 \end{bmatrix}$$

Compound symmetric:

$$\mathbf{\Sigma}_r = \begin{bmatrix} \sigma^2 + \sigma_1^2 & \sigma_1^2 & \sigma_1^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma^2 + \sigma_1^2 & \sigma_1^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2 & \sigma^2 + \sigma_1^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2 & \sigma_1^2 & \sigma^2 + \sigma_1^2 \end{bmatrix}$$

$$\begin{bmatrix} 1231.4 & 900.1 & 900.1 & 900.1 \\ 900.1 & 1231.4 & 900.1 & 900.1 \\ 900.1 & 900.1 & 1231.4 & 900.1 \\ 900.1 & 900.1 & 900.1 & 1231.4 \end{bmatrix}$$

Heterogeneous compound symmetric:

$$\mathbf{\Sigma}_r = \begin{bmatrix} \sigma_1^2 & \sigma_1 \sigma_2 \rho & \sigma_1 \sigma_3 \rho & \sigma_1 \sigma_4 \rho \\ \sigma_2 \sigma_1 \rho & \sigma_2^2 & \sigma_2 \sigma_3 \rho & \sigma_2 \sigma_4 \rho \\ \sigma_3 \sigma_1 \rho & \sigma_3 \sigma_2 \rho & \sigma_3^2 & \sigma_3 \sigma_4 \rho \\ \sigma_4 \sigma_1 \rho & \sigma_4 \sigma_2 \rho & \sigma_4 \sigma_3 \rho & \sigma_4^2 \end{bmatrix}$$

$$\begin{bmatrix} 1438.1 & 912.9 & 946.6 & 1009.5 \\ 912.9 & 1067.8 & 815.7 & 869.9 \\ 946.6 & 815.7 & 1148 & 902 \\ 1009.5 & 869.9 & 902 & 1305.7 \end{bmatrix}$$

Autoregressive:

$$\mathbf{\Sigma}_r = \begin{bmatrix} \sigma^2 & \sigma^2 \rho & \sigma^2 \rho^2 & \sigma^2 \rho^3 \\ \sigma^2 \rho & \sigma^2 & \sigma^2 \rho & \sigma^2 \rho^2 \\ \sigma^2 \rho^2 & \sigma^2 \rho & \sigma^2 & \sigma^2 \rho \\ \sigma^2 \rho^3 & \sigma^2 \rho^2 & \sigma^2 \rho & \sigma^2 \end{bmatrix}$$

$$\begin{bmatrix} 1256.7 & 1037.2 & 856 & 706.5 \\ 1037.2 & 1256.7 & 1037.2 & 856 \\ 856 & 1037.2 & 1256.7 & 1037.2 \\ 706.5 & 856 & 1037.2 & 1256.7 \end{bmatrix}$$

Heterogeneous autoregressive:

$$\mathbf{\Sigma}_r = \begin{bmatrix} \sigma_1^2 & \sigma_1 \sigma_2 \rho & \sigma_1 \sigma_3 \rho^2 & \sigma_1 \sigma_4 \rho^3 \\ \sigma_2 \sigma_1 \rho & \sigma_2^2 & \sigma_2 \sigma_3 \rho & \sigma_2 \sigma_4 \rho^2 \\ \sigma_3 \sigma_1 \rho^2 & \sigma_3 \sigma_2 \rho & \sigma_3^2 & \sigma_3 \sigma_4 \rho \\ \sigma_4 \sigma_1 \rho^3 & \sigma_4 \sigma_2 \rho^2 & \sigma_4 \sigma_3 \rho & \sigma_4^2 \end{bmatrix}$$

$$\begin{bmatrix} 1340.7 & 1000.7 & 857.3 & 708.9 \\ 1000.7 & 1111.2 & 952 & 787.1 \\ 857.3 & 952 & 1213.3 & 1003.2 \\ 708.9 & 787.1 & 1003.2 & 1234 \end{bmatrix}$$

Toeplitz:

$$\mathbf{\Sigma}_r = \begin{bmatrix} \sigma^2 & \sigma_1 & \sigma_2 & \sigma_3 \\ \sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 \\ \sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 \\ \sigma_3 & \sigma_2 & \sigma_1 & \sigma^2 \end{bmatrix}$$

$$\begin{bmatrix} 1246.9 & 1029.3 & 896.6 & 624 \\ 1029.3 & 1246.9 & 1029.3 & 896.6 \\ 896.6 & 1029.3 & 1246.9 & 1029.3 \\ 624 & 896.6 & 1029.3 & 1246.9 \end{bmatrix}$$

Comparing the deviance (-2LL), AIC, and BIC statistics of the alternative error covariance structures, we find that the unstructured and Toeplitz structures lead to the best-fitting models for the
opposites_naming data. Finally, can see gained lost modelling error covariance structure directly (instead indirectly random effects) comparing fixed effect estimates goodness--fit statistics unstructured Toeplitz models “standard” multilevel model change. Singer Willett (2003) observe opposites_naming data: Toeplitz model fits slightly better “standard” model accounts, enough reject “standard” model. unstructured model fits best focus exclusively deviance statistic, cost losing degrees freedom error covariance structure considered . fixed effects estimates similar “standard”, Toeplitz, unstructured models (except (baseline_cognitive_score - 113.4571)), precision estimates slightly better Toeplitz unstructured models, better represent error covariance structure data. Thus, data, conclude much gained replacing “standard” multilevel model change extended linear models change explored . However, data sets, magnitude difference modelling approaches may greater (depending study design, statistical model, choice error covariance structure, nature phenomenon study), may lead us prefer extended linear model change—inferential goal exclusively involves population-averaged interpretations fixed effects, interested addressing questions individuals via random effects (discussion, see McNeish, Stapleton, & Silverman, 2017; Muff, Held, & Keller, 2016).","code":"hypothesized_varcov <- list( unstructured = r\"($$\\mathbf{\\Sigma}_r = \\begin{bmatrix}\\sigma_1^2 & \\sigma_{12} & \\sigma_{13} & \\sigma_{14} \\\\ \\sigma_{21} & \\sigma_2^2 & \\sigma_{23} & \\sigma_{24} \\\\ \\sigma_{31} & \\sigma_{32} & \\sigma_3^2 & \\sigma_{34} \\\\ \\sigma_{41} & \\sigma_{42} & \\sigma_{43} & \\sigma_4^2 \\end{bmatrix}$$)\", compsymm = r\"($$\\mathbf{\\Sigma}_r = \\begin{bmatrix} \\sigma^2 + \\sigma_1^2 & \\sigma_1^2 & \\sigma_1^2 & \\sigma_1^2 \\\\ \\sigma_1^2 & \\sigma^2 + \\sigma_1^2 & \\sigma_1^2 & \\sigma_1^2 \\\\ \\sigma_1^2 & \\sigma_1^2 & \\sigma^2 + \\sigma_1^2 & \\sigma_1^2 \\\\ \\sigma_1^2 & \\sigma_1^2 & \\sigma_1^2 & \\sigma^2 + \\sigma_1^2 \\end{bmatrix}$$)\", hetcompsymm = r\"($$\\mathbf{\\Sigma}_r = \\begin{bmatrix} \\sigma_1^2 & \\sigma_1 \\sigma_2 \\rho & \\sigma_1 \\sigma_3 \\rho & \\sigma_1 \\sigma_4 \\rho \\\\ \\sigma_2 \\sigma_1 \\rho & \\sigma_1^2 & \\sigma_2 \\sigma_3 \\rho & \\sigma_2 \\sigma_4 \\rho \\\\ \\sigma_3 \\sigma_1 \\rho & \\sigma_3 \\sigma_2 \\rho & \\sigma_3^2 & \\sigma_3 \\sigma_4 \\rho \\\\ \\sigma_4 \\sigma_1 \\rho & \\sigma_4 \\sigma_2 \\rho & \\sigma_4 \\sigma_3 \\rho & \\sigma_4^2 \\end{bmatrix}$$)\", ar = r\"($$\\mathbf{\\Sigma}_r = \\begin{bmatrix} \\sigma^2 & \\sigma^2 \\rho & \\sigma^2 \\rho^2 & \\sigma^2 \\rho^3 \\\\ \\sigma^2 \\rho & \\sigma^2 & \\sigma^2 \\rho & \\sigma^2 \\rho^2 \\\\ \\sigma^2 \\rho^2 & \\sigma^2 \\rho & \\sigma^2 & \\sigma^2 \\rho \\\\ \\sigma^2 \\rho^3 & \\sigma^2 \\rho^2 & \\sigma^2 \\rho & \\sigma^2 \\end{bmatrix}$$)\", hetar = r\"($$\\mathbf{\\Sigma}_r = \\begin{bmatrix} \\sigma_1^2 & \\sigma_1 \\sigma_2 \\rho & \\sigma_1 \\sigma_3 \\rho^2 & \\sigma_1 \\sigma_4 \\rho^3 \\\\ \\sigma_2 \\sigma_1 \\rho & \\sigma_2^2 & \\sigma_2 \\sigma_3 \\rho & \\sigma_2 \\sigma_4 \\rho^2 \\\\ \\sigma_3 \\sigma_1 \\rho^2 & \\sigma_3 \\sigma_2 \\rho & \\sigma_3^2 & \\sigma_3 \\sigma_4 \\rho \\\\ \\sigma_4 \\sigma_1 \\rho^3 & \\sigma_4 \\sigma_2 \\rho^2 & \\sigma_4 \\sigma_3 \\rho & \\sigma_4^2 \\end{bmatrix} $$)\", toeplitz = r\"($$\\mathbf{\\Sigma}_r = \\begin{bmatrix}\\sigma^2 & \\sigma_1 & \\sigma_2 & \\sigma_3 \\\\ \\sigma_1 & \\sigma^2 & \\sigma_1 & \\sigma_2 \\\\ 
\\sigma_2 & \\sigma_1 & \\sigma^2 & \\sigma_1 \\\\ \\sigma_3 & \\sigma_2 & \\sigma_1 & \\sigma^2 \\end{bmatrix} $$)\" ) # Fit models ------------------------------------------------------------------ # Start with a base model we can update with the alternative error covariance # structures. Note that we won't display this model in the table. opposites_naming_fit <- gls( opposites_naming_score ~ time * I(baseline_cognitive_score - 113.4571), method = \"REML\", data = opposites_naming ) # Unstructured: opposites_naming_fit_unstructured <- update( opposites_naming_fit, correlation = corSymm(form = ~ 1 | id), weights = varIdent(form = ~ 1 | wave) ) # Compound symmetry: opposites_naming_fit_compsymm <- update( opposites_naming_fit, correlation = corCompSymm(form = ~ 1 | id) ) # Heterogeneous compound symmetry: opposites_naming_fit_hetcompsymm <- update( opposites_naming_fit_compsymm, weights = varIdent(form = ~ 1 | wave) ) # Autoregressive: opposites_naming_fit_ar <- update( opposites_naming_fit, correlation = corAR1(form = ~ 1 | id) ) # Heterogeneous autoregressive: opposites_naming_fit_hetar <- update( opposites_naming_fit_ar, weights = varIdent(form = ~ 1 | wave) ) # Toeplitz: opposites_naming_fit_toeplitz <- update( opposites_naming_fit, correlation = corARMA(form = ~ 1 | id, p = 3,q = 0) ) opposites_naming_fits <- list( \"Unstructured\" = opposites_naming_fit_unstructured, \"Compound symmetry\" = opposites_naming_fit_compsymm, \"Heterogeneous compound symmetry\" = opposites_naming_fit_hetcompsymm, \"Autoregressive\" = opposites_naming_fit_ar, \"Heterogeneous autoregressive\" = opposites_naming_fit_hetar, \"Toeplitz\" = opposites_naming_fit_toeplitz ) # Make table ------------------------------------------------------------------ # Table 7.3, page 258-259: opposites_naming_fits |> map2( # Note that this list was made in the collapsed code chunk above. It just # contains the equations corresponding to each error covariance structure. hypothesized_varcov, \\(.fit, .hypothesized_varcov) { format_varcov <- function(x) { x <- round(getVarCov(x), digits = 1) begin <- \"$$\\\\begin{bmatrix}\" body <- apply(x, 1, \\(.x) paste0(paste(.x, collapse = \"&\"), \"\\\\\\\\\")) end <- \"\\\\end{bmatrix}$$\" paste0(c(begin, body, end), collapse = \"\") } gof <- .fit |> glance() |> mutate( hypothesized_varcov = .hypothesized_varcov, \"-2LL\" = as.numeric(-2 * logLik), varcov = format_varcov(.fit), across(where(is.numeric), \\(.x) round(.x, digits = 1)) ) |> select(hypothesized_varcov, \"-2LL\", AIC, BIC, varcov) } ) |> list_rbind(names_to = \"structure\") |> gt() |> # Note: Math formatting in HTML currently requires gt version 0.10.1.9000 # (development version). 
fmt_markdown(columns = c(hypothesized_varcov, varcov)) # Table 7.4, page 265: opposites_naming_fit_standard |> list() |> set_names(\"Standard\") |> c(keep_at(opposites_naming_fits, c(\"Toeplitz\", \"Unstructured\"))) |> (\\(.x) .x[c(\"Standard\", \"Toeplitz\", \"Unstructured\")])() |> modelsummary( fmt = fmt_statistic(estimate = 2, statistic = 3), gof_map = list( list( raw = \"logLik\", clean = \"Deviance\", fmt = \\(.x) vec_fmt_number( -2*as.numeric(.x), decimals = 1, sep_mark = \"\" ) ), list( raw = \"AIC\", clean = \"AIC\", fmt = fmt_decimal(1) ), list( raw = \"BIC\", clean = \"BIC\", fmt = fmt_decimal(1) ) ), output = \"gt\" ) |> tab_row_group(label = \"Goodness-of-Fit\", rows = 13:15) |> tab_row_group(label = \"Variance Components\", rows = 9:12) |> tab_row_group(label = \"Fixed Effects\", rows = 1:8)"},{"path":[]},{"path":"https://mccarthy-m-g.github.io/alda/articles/chapter-8.html","id":"the-basics-of-latent-growth-modeling","dir":"Articles","previous_headings":"","what":"8.2 The Basics of Latent Growth Modeling","title":"Chapter 8: Modeling change using covariance structure analysis","text":"Table 8.1, page 282: Table 8.2, page 289, Model : Figure 8.2, Model : Table 8.2, page 289, Model B: Comparison baseline model Model B: Figure 8.2, Model B: Table 8.2, page 289, Model C: Figure 8.2, Model C (Model B slope ~ 0*female): Table 8.2, page 289, Model D: Comparison baseline model Model D: Figure 8.2, Model D:","code":"alcohol_use_2_wide <- alcohol_use_2 |> pivot_wider(names_from = time, values_from = c(alcohol_use, peer_pressure)) alcohol_use_2_wide #> # A tibble: 1,122 × 8 #> id female alcohol_use_0 alcohol_use_1 alcohol_use_2 peer_pressure_0 #> #> 1 1 0 0.693 0.288 0.511 0 #> 2 2 0 0 0 0 0 #> 3 3 0 0 0 0 0 #> 4 4 0 0 0.511 0.511 1.10 #> 5 5 0 0.288 0 0.847 0 #> 6 6 0 0 0 0 0 #> 7 7 0 0.288 0.288 0 0 #> 8 8 0 0 0 0 0 #> 9 9 0 0 0.511 0 0 #> 10 10 0 0.511 0.693 1.30 0 #> # ℹ 1,112 more rows #> # ℹ 2 more variables: peer_pressure_1 , peer_pressure_2 # Means alcohol_use_2_wide |> summarise(across(female:peer_pressure_2, mean)) |> glimpse() #> Rows: 1 #> Columns: 7 #> $ female 0.6122995 #> $ alcohol_use_0 0.2250666 #> $ alcohol_use_1 0.2541351 #> $ alcohol_use_2 0.287923 #> $ peer_pressure_0 0.1771944 #> $ peer_pressure_1 0.2904569 #> $ peer_pressure_2 0.3470381 # Covariances cov(select(alcohol_use_2_wide, -c(id, female))) #> alcohol_use_0 alcohol_use_1 alcohol_use_2 peer_pressure_0 #> alcohol_use_0 0.13558718 0.07775260 0.06526470 0.06586967 #> alcohol_use_1 0.07775260 0.15528121 0.08186386 0.04479710 #> alcohol_use_2 0.06526470 0.08186386 0.18075945 0.03988182 #> peer_pressure_0 0.06586967 0.04479710 0.03988182 0.17399159 #> peer_pressure_1 0.06404875 0.09647876 0.06580980 0.07158186 #> peer_pressure_2 0.06008199 0.07433086 0.13197010 0.07071309 #> peer_pressure_1 peer_pressure_2 #> alcohol_use_0 0.06404875 0.06008199 #> alcohol_use_1 0.09647876 0.07433086 #> alcohol_use_2 0.06580980 0.13197010 #> peer_pressure_0 0.07158186 0.07071309 #> peer_pressure_1 0.26190160 0.11180554 #> peer_pressure_2 0.11180554 0.28901177 # Model A: Unconditional model model_A <- (\" # Intercept and slope with fixed coefficients intercept =~ 1*alcohol_use_0 + 1*alcohol_use_1 + 1*alcohol_use_2 slope =~ 0*alcohol_use_0 + .75*alcohol_use_1 + 1.75*alcohol_use_2 \") model_A_fit <- growth( model_A, data = alcohol_use_2_wide, estimator = \"ml\", mimic = \"Mplus\" ) summary(model_A_fit) #> lavaan 0.6.17 ended normally after 32 iterations #> #> Estimator ML #> Optimization method NLMINB #> Number of model parameters 8 
#> #> Number of observations 1122 #> Number of missing patterns 1 #> #> Model Test User Model: #> #> Test statistic 0.048 #> Degrees of freedom 1 #> P-value (Chi-square) 0.826 #> #> Parameter Estimates: #> #> Standard errors Standard #> Information Observed #> Observed information based on Hessian #> #> Latent Variables: #> Estimate Std.Err z-value P(>|z|) #> intercept =~ #> alcohol_use_0 1.000 #> alcohol_use_1 1.000 #> alcohol_use_2 1.000 #> slope =~ #> alcohol_use_0 0.000 #> alcohol_use_1 0.750 #> alcohol_use_2 1.750 #> #> Covariances: #> Estimate Std.Err z-value P(>|z|) #> intercept ~~ #> slope -0.012 0.005 -2.727 0.006 #> #> Intercepts: #> Estimate Std.Err z-value P(>|z|) #> intercept 0.226 0.011 21.106 0.000 #> slope 0.036 0.007 4.898 0.000 #> #> Variances: #> Estimate Std.Err z-value P(>|z|) #> .alcohol_use_0 0.048 0.006 7.550 0.000 #> .alcohol_use_1 0.076 0.004 17.051 0.000 #> .alcohol_use_2 0.077 0.010 7.756 0.000 #> intercept 0.087 0.007 12.253 0.000 #> slope 0.020 0.005 3.795 0.000 fitMeasures(model_A_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\")) #> chisq df pvalue cfi rmsea #> 0.048 1.000 0.826 1.000 0.000 lay <- get_layout( NA, \"intercept\", NA, \"slope\", NA, \"alcohol_use_0\", NA, \"alcohol_use_1\", NA, \"alcohol_use_2\", rows = 2 ) graph_sem(model_A_fit, layout = lay) # Model B: Adding female as a time-invariant predictor model_B <- (\" # Intercept and slope with fixed coefficients intercept =~ 1*alcohol_use_0 + 1*alcohol_use_1 + 1*alcohol_use_2 slope =~ 0*alcohol_use_0 + .75*alcohol_use_1 + 1.75*alcohol_use_2 # Regressions intercept ~ female slope ~ female \") model_B_fit <- growth( model_B, data = alcohol_use_2_wide, estimator = \"ml\", mimic = \"Mplus\" ) summary(model_B_fit) #> lavaan 0.6.17 ended normally after 33 iterations #> #> Estimator ML #> Optimization method NLMINB #> Number of model parameters 10 #> #> Number of observations 1122 #> Number of missing patterns 1 #> #> Model Test User Model: #> #> Test statistic 1.545 #> Degrees of freedom 2 #> P-value (Chi-square) 0.462 #> #> Parameter Estimates: #> #> Standard errors Standard #> Information Observed #> Observed information based on Hessian #> #> Latent Variables: #> Estimate Std.Err z-value P(>|z|) #> intercept =~ #> alcohol_use_0 1.000 #> alcohol_use_1 1.000 #> alcohol_use_2 1.000 #> slope =~ #> alcohol_use_0 0.000 #> alcohol_use_1 0.750 #> alcohol_use_2 1.750 #> #> Regressions: #> Estimate Std.Err z-value P(>|z|) #> intercept ~ #> female -0.042 0.022 -1.912 0.056 #> slope ~ #> female 0.008 0.015 0.522 0.602 #> #> Covariances: #> Estimate Std.Err z-value P(>|z|) #> .intercept ~~ #> .slope -0.012 0.005 -2.661 0.008 #> #> Intercepts: #> Estimate Std.Err z-value P(>|z|) #> .intercept 0.251 0.017 14.653 0.000 #> .slope 0.031 0.012 2.640 0.008 #> #> Variances: #> Estimate Std.Err z-value P(>|z|) #> .alcohol_use_0 0.049 0.006 7.616 0.000 #> .alcohol_use_1 0.075 0.004 17.036 0.000 #> .alcohol_use_2 0.077 0.010 7.789 0.000 #> .intercept 0.086 0.007 12.191 0.000 #> .slope 0.019 0.005 3.740 0.000 fitMeasures(model_B_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\")) #> chisq df pvalue cfi rmsea #> 1.545 2.000 0.462 1.000 0.000 # Baseline for Model B (not shown in table) model_B_baseline <- (\" # Intercept and slope with fixed coefficients intercept =~ 1*alcohol_use_0 + 1*alcohol_use_1 + 1*alcohol_use_2 slope =~ 0*alcohol_use_0 + .75*alcohol_use_1 + 1.75*alcohol_use_2 # Regressions intercept ~ 0*female slope ~ 0*female alcohol_use_0 ~ 0*1 alcohol_use_1 ~ 0*1 alcohol_use_2 ~ 0*1 \") 
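# An added note: because this baseline model fixes the female regressions and
# the observed-variable intercepts to zero, it is nested in Model B, so the
# anova() call below yields a chi-square difference test between the two.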
model_B_baseline_fit <- growth( model_B_baseline, data = alcohol_use_2_wide, estimator = \"ml\", mimic = \"Mplus\" ) anova(model_B_baseline_fit, model_B_fit) #> #> Chi-Squared Difference Test #> #> Df AIC BIC Chisq Chisq diff RMSEA Df diff #> model_B_fit 2 2577.9 2628.1 1.5447 #> model_B_baseline_fit 4 2577.7 2617.9 5.3665 3.8218 0.028493 2 #> Pr(>Chisq) #> model_B_fit #> model_B_baseline_fit 0.1479 lay <- get_layout( NA, NA, \"female\", NA, NA, NA, \"intercept\", NA, \"slope\", NA, \"alcohol_use_0\", NA, \"alcohol_use_1\", NA, \"alcohol_use_2\", rows = 3 ) graph_sem(model_B_fit, layout = lay) # Model C: Model B but with slope fixed to zero model_C <- (\" # Intercept and slope with fixed coefficients intercept =~ 1*alcohol_use_0 + 1*alcohol_use_1 + 1*alcohol_use_2 slope =~ 0*alcohol_use_0 + .75*alcohol_use_1 + 1.75*alcohol_use_2 # Regressions intercept ~ female slope ~ 0*female \") model_C_fit <- growth( model_C, data = alcohol_use_2_wide, estimator = \"ml\", mimic = \"Mplus\" ) summary(model_C_fit) #> lavaan 0.6.17 ended normally after 32 iterations #> #> Estimator ML #> Optimization method NLMINB #> Number of model parameters 9 #> #> Number of observations 1122 #> Number of missing patterns 1 #> #> Model Test User Model: #> #> Test statistic 1.817 #> Degrees of freedom 3 #> P-value (Chi-square) 0.611 #> #> Parameter Estimates: #> #> Standard errors Standard #> Information Observed #> Observed information based on Hessian #> #> Latent Variables: #> Estimate Std.Err z-value P(>|z|) #> intercept =~ #> alcohol_use_0 1.000 #> alcohol_use_1 1.000 #> alcohol_use_2 1.000 #> slope =~ #> alcohol_use_0 0.000 #> alcohol_use_1 0.750 #> alcohol_use_2 1.750 #> #> Regressions: #> Estimate Std.Err z-value P(>|z|) #> intercept ~ #> female -0.037 0.019 -1.885 0.059 #> slope ~ #> female 0.000 #> #> Covariances: #> Estimate Std.Err z-value P(>|z|) #> .intercept ~~ #> .slope -0.012 0.005 -2.667 0.008 #> #> Intercepts: #> Estimate Std.Err z-value P(>|z|) #> .intercept 0.248 0.016 15.525 0.000 #> .slope 0.036 0.007 4.898 0.000 #> #> Variances: #> Estimate Std.Err z-value P(>|z|) #> .alcohol_use_0 0.049 0.006 7.609 0.000 #> .alcohol_use_1 0.075 0.004 17.036 0.000 #> .alcohol_use_2 0.077 0.010 7.801 0.000 #> .intercept 0.086 0.007 12.194 0.000 #> .slope 0.019 0.005 3.739 0.000 fitMeasures(model_C_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\")) #> chisq df pvalue cfi rmsea #> 1.817 3.000 0.611 1.000 0.000 graph_sem(model_C_fit, layout = lay) # Model D: Adding peer_pressure as a time-varying predictor model_D <- (\" # Intercept and slope with fixed coefficients alc_intercept =~ 1*alcohol_use_0 + 1*alcohol_use_1 + 1*alcohol_use_2 alc_slope =~ 0*alcohol_use_0 + .75*alcohol_use_1 + 1.75*alcohol_use_2 peer_intercept =~ 1*peer_pressure_0 + 1*peer_pressure_1 + 1*peer_pressure_2 peer_slope =~ 0*peer_pressure_0 + .75*peer_pressure_1 + 1.75*peer_pressure_2 # Regressions alc_intercept ~ start(.8)*peer_intercept + start(.08)*peer_slope alc_slope ~ start(-.1)*peer_intercept + start(.6)*peer_slope # Time-varying covariances alcohol_use_0 ~~ peer_pressure_0 alcohol_use_1 ~~ peer_pressure_1 alcohol_use_2 ~~ peer_pressure_2 # Fix intercepts to zero alcohol_use_0 ~ 0*1 alcohol_use_1 ~ 0*1 alcohol_use_2 ~ 0*1 peer_pressure_0 ~ 0*1 peer_pressure_1 ~ 0*1 peer_pressure_2 ~ 0*1 \") model_D_fit <- growth( model_D, data = alcohol_use_2_wide, estimator = \"ml\", mimic = \"Mplus\" ) summary(model_D_fit) #> lavaan 0.6.17 ended normally after 72 iterations #> #> Estimator ML #> Optimization method NLMINB #> Number of model 
parameters 23 #> #> Number of observations 1122 #> Number of missing patterns 1 #> #> Model Test User Model: #> #> Test statistic 11.557 #> Degrees of freedom 4 #> P-value (Chi-square) 0.021 #> #> Parameter Estimates: #> #> Standard errors Standard #> Information Observed #> Observed information based on Hessian #> #> Latent Variables: #> Estimate Std.Err z-value P(>|z|) #> alc_intercept =~ #> alcohol_use_0 1.000 #> alcohol_use_1 1.000 #> alcohol_use_2 1.000 #> alc_slope =~ #> alcohol_use_0 0.000 #> alcohol_use_1 0.750 #> alcohol_use_2 1.750 #> peer_intercept =~ #> peer_pressur_0 1.000 #> peer_pressur_1 1.000 #> peer_pressur_2 1.000 #> peer_slope =~ #> peer_pressur_0 0.000 #> peer_pressur_1 0.750 #> peer_pressur_2 1.750 #> #> Regressions: #> Estimate Std.Err z-value P(>|z|) #> alc_intercept ~ #> peer_intercept 0.799 0.103 7.781 0.000 #> peer_slope 0.080 0.184 0.438 0.661 #> alc_slope ~ #> peer_intercept -0.143 0.076 -1.884 0.060 #> peer_slope 0.577 0.193 2.990 0.003 #> #> Covariances: #> Estimate Std.Err z-value P(>|z|) #> .alcohol_use_0 ~~ #> .peer_pressur_0 0.011 0.006 1.773 0.076 #> .alcohol_use_1 ~~ #> .peer_pressur_1 0.034 0.005 7.324 0.000 #> .alcohol_use_2 ~~ #> .peer_pressur_2 0.037 0.010 3.663 0.000 #> peer_intercept ~~ #> peer_slope 0.001 0.007 0.166 0.868 #> .alc_intercept ~~ #> .alc_slope -0.006 0.005 -1.249 0.212 #> #> Intercepts: #> Estimate Std.Err z-value P(>|z|) #> .alcohol_use_0 0.000 #> .alcohol_use_1 0.000 #> .alcohol_use_2 0.000 #> .peer_pressur_0 0.000 #> .peer_pressur_1 0.000 #> .peer_pressur_2 0.000 #> .alc_intercept 0.067 0.016 4.252 0.000 #> .alc_slope 0.008 0.015 0.564 0.573 #> peer_intercept 0.188 0.012 15.743 0.000 #> peer_slope 0.096 0.010 9.922 0.000 #> #> Variances: #> Estimate Std.Err z-value P(>|z|) #> .alcohol_use_0 0.048 0.006 7.553 0.000 #> .alcohol_use_1 0.076 0.004 17.165 0.000 #> .alcohol_use_2 0.076 0.010 7.819 0.000 #> .peer_pressur_0 0.106 0.011 9.790 0.000 #> .peer_pressur_1 0.171 0.009 19.713 0.000 #> .peer_pressur_2 0.129 0.018 7.325 0.000 #> .alc_intercept 0.042 0.007 5.649 0.000 #> .alc_slope 0.009 0.005 1.697 0.090 #> peer_intercept 0.070 0.010 6.729 0.000 #> peer_slope 0.028 0.009 3.214 0.001 fitMeasures(model_D_fit, c(\"chisq\", \"df\", \"pvalue\", \"cfi\", \"rmsea\")) #> chisq df pvalue cfi rmsea #> 11.557 4.000 0.021 0.996 0.041 # Baseline for Model D (not shown in table) model_D_baseline <- (\" # Intercepts and slopes with fixed coefficients alc_intercept =~ 1*alcohol_use_0 + 1*alcohol_use_1 + 1*alcohol_use_2 alc_slope =~ 0*alcohol_use_0 + .75*alcohol_use_1 + 1.75*alcohol_use_2 peer_intercept =~ 1*peer_pressure_0 + 1*peer_pressure_1 + 1*peer_pressure_2 peer_slope =~ 0*peer_pressure_0 + .75*peer_pressure_1 + 1.75*peer_pressure_2 # Regressions alc_intercept ~ 0*peer_intercept + 0*peer_slope alc_slope ~ 0*peer_intercept + 0*peer_slope # Time-varying covariances alcohol_use_0 ~~ peer_pressure_0 alcohol_use_1 ~~ peer_pressure_1 alcohol_use_2 ~~ peer_pressure_2 alcohol_use_0 ~ 0*1 alcohol_use_1 ~ 0*1 alcohol_use_2 ~ 0*1 peer_pressure_0 ~ 0*1 peer_pressure_1 ~ 0*1 peer_pressure_2 ~ 0*1 \") model_D_baseline_fit <- growth( model_D_baseline, data = alcohol_use_2_wide, estimator = \"ml\", mimic = \"Mplus\" ) anova(model_D_baseline_fit, model_D_fit) #> #> Chi-Squared Difference Test #> #> Df AIC BIC Chisq Chisq diff RMSEA Df diff #> model_D_fit 4 6120.5 6236.1 11.557 #> model_D_baseline_fit 8 6443.6 6539.1 342.648 331.09 0.26997 4 #> Pr(>Chisq) #> model_D_fit #> model_D_baseline_fit < 2.2e-16 *** #> --- #> Signif. 
Chapter 9: A framework for investigating event occurrence

9.1 Should you conduct a survival analysis? The “whether” and “when” test

In Section 9.1 Singer and Willett (2003) introduce a simple mnemonic, which they refer to as the “whether” and “when” test, to determine whether a research question may call for survival analysis: if a research question includes the words “whether” or “when”, you likely need to use survival methods. To illustrate the range of research questions that survival methods are suitable for, they introduce three studies that pass the “whether” and “when” test:

- alcohol_relapse: a person-level data frame with 89 rows and 3 columns containing a subset of data from Cooney and colleagues (1991), who measured whether (and when) 89 recently treated alcoholics first relapsed to alcohol use.
- teachers: a person-level data frame with 3941 rows and 3 columns containing a subset of data from Singer (1993), who measured whether (and when) 3941 newly hired special educators in Michigan first stopped teaching in the state.
- suicide_ideation: a person-level data frame with 391 rows and 4 columns containing a subset of data from Bolger and colleagues (1989), who measured whether (and when) 391 undergraduate students first experienced suicide ideation.

In later chapters, we return to these data sets to explore different survival methods.

alcohol_relapse
#> # A tibble: 89 × 3
#>    id    weeks censor
#>    <fct> <dbl>  <dbl>
#>  1 1     0.714      0
#>  2 2     0.714      0
#>  3 3     1.14       0
#>  4 4     1.43       0
#>  5 5     1.71       0
#>  6 6     1.71       0
#>  7 7     2.14       0
#>  8 8     2.71       0
#>  9 9     3.86       0
#> 10 10    4.14       0
#> # ℹ 79 more rows

teachers
#> # A tibble: 3,941 × 3
#>    id    years censor
#>    <fct> <dbl>  <dbl>
#>  1 1         1      0
#>  2 2         2      0
#>  3 3         1      0
#>  4 4         1      0
#>  5 5        12      1
#>  6 6         1      0
#>  7 7        12      1
#>  8 8         1      0
#>  9 9         2      0
#> 10 10        2      0
#> # ℹ 3,931 more rows

suicide_ideation
#> # A tibble: 391 × 4
#>    id      age censor age_now
#>    <fct> <dbl>  <dbl>   <dbl>
#>  1 1        16      0      18
#>  2 2        10      0      19
#>  3 3        16      0      19
#>  4 4        20      0      22
#>  5 6        15      0      22
#>  6 7        10      0      19
#>  7 8        22      1      22
#>  8 9        22      1      22
#>  9 10       15      0      20
#> 10 11       10      0      19
#> # ℹ 381 more rows

9.2 Framing a research question about event occurrence

In Section 9.2 Singer and Willett (2003) discuss three methodological features that make a study suitable for survival analysis:

- A target event, whose occurrence represents an individual's transition from one state to another state, from a set of states that are precisely defined, mutually exclusive, and jointly exhaustive.
- A beginning of time, at which everyone in the population is (at least theoretically) at risk of experiencing the target event, and all individuals occupy one of the possible non-event states. The temporal distance from the beginning of time to event occurrence is referred to as the event time.
- A metric for clocking time, which provides a meaningful temporal scale in which to record event occurrence—the smallest possible units relevant to the process under study. For analytical reasons, we also distinguish between discrete time and continuous time, depending on whether time is measured in discrete or continuous intervals.

Each of the three example studies introduced above possesses these features.
9.3 Censoring: How complete are the data on event occurrence?

In Section 9.3 Singer and Willett (2003) introduce the concepts of censoring and censored observations, which occur for sample members with unknown event times—preventing us from knowing whether (and when) the target event occurs for a subset of the sample. Censoring is the hallmark feature of event occurrence data that makes new statistical methods necessary; it arises in different ways and at different rates, and it takes several different forms:

- Censoring occurs for two primary reasons: (1) some individuals will never experience the target event; (2) some individuals will experience the target event, but outside the study's data collection period.
- The amount of censoring in a study is related to two factors: (1) the rate at which the target event occurs in the population; (2) the length of the data collection period.
- There are two mechanisms behind censoring: (1) a noninformative mechanism, where censoring occurs for reasons independent of event occurrence and the risk of event occurrence; (2) an informative mechanism, where censoring occurs for reasons related to event occurrence and the risk of event occurrence.
- There are two types of censoring: (1) right-censoring arises when an event time is unknown because event occurrence was not observed; (2) left-censoring arises when an event time is unknown because the beginning of time was not observed.

The three example studies have different rates of censoring: 22.5% of the former alcoholics remained abstinent, 44.0% of the newly hired teachers were still teaching in Michigan, and 29.7% of the undergraduates had not experienced suicide ideation.

map(
  list(
    alcohol_relapse = alcohol_relapse,
    teachers = teachers,
    suicide_ideation = suicide_ideation
  ),
  \(.x) .x |>
    count(censor, name = "count") |>
    mutate(proportion = count / sum(count))
)
#> $alcohol_relapse
#> # A tibble: 2 × 3
#>   censor count proportion
#>    <dbl> <int>      <dbl>
#> 1      0    69      0.775
#> 2      1    20      0.225
#> 
#> $teachers
#> # A tibble: 2 × 3
#>   censor count proportion
#>    <dbl> <int>      <dbl>
#> 1      0  2207      0.560
#> 2      1  1734      0.440
#> 
#> $suicide_ideation
#> # A tibble: 2 × 3
#>   censor count proportion
#>    <dbl> <int>      <dbl>
#> 1      0   275      0.703
#> 2      1   116      0.297

As Singer and Willett (2003) discuss, the toll of censoring can be seen by plotting the event times and censored event times of the teachers data. Notice the discrepancy between the sample distributions of known event times and censored event times—this is typical of event occurrence data, and it makes summarizing time-to-event occurrence difficult to do adequately with traditional descriptive methods (e.g., measures of central tendency and dispersion). In the remaining chapters, we explore several different methods of survival analysis: an alternative statistical approach that incorporates censored observations based on the information they provide about event nonoccurrence, allowing us to adequately summarize time-to-event occurrence while dealing evenhandedly with known and censored event times.

# Figure 9.1, page 321:
ggplot(teachers, aes(x = years)) +
  geom_bar() +
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = -.5) +
  scale_x_continuous(breaks = 1:12) +
  coord_cartesian(ylim = c(0, 550)) +
  facet_wrap(vars(censor), nrow = 2, labeller = label_both)
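This inadequacy is easy to see in miniature. As a sketch not taken from the text, summarizing the known and censored event times of the teachers data separately with dplyr gives two very different pictures of how long teaching careers last, and neither group's summary alone estimates the distribution we actually care about:

# Not from the text: naive group-wise summaries treat censored careers as
# if they ended at the moment of censoring, which they did not.
teachers |>
  group_by(censor) |>
  summarize(n = n(), median_years = median(years))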
Longitudinal data organization

Longitudinal data formats

Longitudinal data can be organized into two distinct formats:

- A person-level, wide, or multivariate, format where each person has only one row of data, with multiple columns containing the data from each measurement occasion.
- A person-period, long, or univariate, format where each person has one row of data for each measurement occasion.

Most R functions expect data to be in the person-period format for visualization and analysis, but it's easy to convert a longitudinal data set from one format to the other.

glimpse(deviant_tolerance_pl)
#> Rows: 16
#> Columns: 8
#> $ id           <dbl> 9, 45, 268, 314, 442, 514, 569, 624, 723, 918, 949, 978, …
#> $ tolerance_11 <dbl> 2.23, 1.12, 1.45, 1.22, 1.45, 1.34, 1.79, 1.12, 1.22, 1.0…
#> $ tolerance_12 <dbl> 1.79, 1.45, 1.34, 1.22, 1.99, 1.67, 1.90, 1.12, 1.34, 1.0…
#> $ tolerance_13 <dbl> 1.90, 1.45, 1.99, 1.55, 1.45, 2.23, 1.90, 1.22, 1.12, 1.2…
#> $ tolerance_14 <dbl> 2.12, 1.45, 1.79, 1.12, 1.67, 2.12, 1.99, 1.12, 1.00, 1.9…
#> $ tolerance_15 <dbl> 2.66, 1.99, 1.34, 1.12, 1.90, 2.44, 1.99, 1.22, 1.12, 1.2…
#> $ male         <dbl> 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0
#> $ exposure     <dbl> 1.54, 1.16, 0.90, 0.81, 1.13, 0.90, 1.99, 0.98, 0.81, 1.2…

glimpse(deviant_tolerance_pp)
#> Rows: 80
#> Columns: 5
#> $ id        <dbl> 9, 9, 9, 9, 9, 45, 45, 45, 45, 45, 268, 268, 268, 268, 268, …
#> $ age       <dbl> 11, 12, 13, 14, 15, 11, 12, 13, 14, 15, 11, 12, 13, 14, 15, …
#> $ tolerance <dbl> 2.23, 1.79, 1.90, 2.12, 2.66, 1.12, 1.45, 1.45, 1.45, 1.99, …
#> $ male      <dbl> 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, …
#> $ exposure  <dbl> 1.54, 1.54, 1.54, 1.54, 1.54, 1.16, 1.16, 1.16, 1.16, 1.16, …

Converting between formats

To convert a person-level data set to the person-period format we can use tidyr::pivot_longer():

pivot_longer(
  deviant_tolerance_pl,
  cols = starts_with("tolerance_"),
  names_to = "age",
  names_pattern = "([[:digit:]]+)",
  names_transform = as.integer,
  values_to = "tolerance"
)
#> # A tibble: 80 × 5
#>       id  male exposure   age tolerance
#>    <dbl> <dbl>    <dbl> <int>     <dbl>
#>  1     9     0     1.54    11      2.23
#>  2     9     0     1.54    12      1.79
#>  3     9     0     1.54    13      1.9 
#>  4     9     0     1.54    14      2.12
#>  5     9     0     1.54    15      2.66
#>  6    45     1     1.16    11      1.12
#>  7    45     1     1.16    12      1.45
#>  8    45     1     1.16    13      1.45
#>  9    45     1     1.16    14      1.45
#> 10    45     1     1.16    15      1.99
#> # ℹ 70 more rows

To convert a person-period data set to the person-level format we can use tidyr::pivot_wider():

pivot_wider(
  deviant_tolerance_pp,
  names_from = age,
  names_prefix = "tolerance_",
  values_from = tolerance
)
#> # A tibble: 16 × 8
#>       id  male exposure tolerance_11 tolerance_12 tolerance_13 tolerance_14
#>    <dbl> <dbl>    <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
#>  1     9     0     1.54         2.23         1.79         1.9          2.12
#>  2    45     1     1.16         1.12         1.45         1.45         1.45
#>  3   268     1     0.9          1.45         1.34         1.99         1.79
#>  4   314     0     0.81         1.22         1.22         1.55         1.12
#>  5   442     0     1.13         1.45         1.99         1.45         1.67
#>  6   514     1     0.9          1.34         1.67         2.23         2.12
#>  7   569     0     1.99         1.79         1.9          1.9          1.99
#>  8   624     1     0.98         1.12         1.12         1.22         1.12
#>  9   723     0     0.81         1.22         1.34         1.12         1   
#> 10   918     0     1.21         1            1            1.22         1.99
#> 11   949     1     0.93         1.99         1.55         1.12         1.45
#> 12   978     1     1.59         1.22         1.34         2.12         3.46
#> 13  1105     1     1.38         1.34         1.9          1.99         1.9 
#> 14  1542     0     1.44         1.22         1.22         1.99         1.79
#> 15  1552     0     1.04         1            1.12         2.23         1.55
#> 16  1653     0     1.25         1.11         1.11         1.34         1.55
#> # ℹ 1 more variable: tolerance_15 <dbl>
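A quick round-trip check, not part of the vignette, can confirm that the two formats carry the same information: converting the person-period data back to person-level form should recover the original person-level data, up to column order. The comparison below is a sketch; all.equal() should report TRUE if the conversion is lossless, and describes any mismatches otherwise:

pl_roundtrip <- pivot_wider(
  deviant_tolerance_pp,
  names_from = age,
  names_prefix = "tolerance_",
  values_from = tolerance
)

# Reorder the round-tripped columns to match the original before comparing.
all.equal(deviant_tolerance_pl, pl_roundtrip[names(deviant_tolerance_pl)])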
Adding discrete time indicators to person-period data

To add discrete time indicators to a person-period data set, first create a temporary copy of the time variable and a column of ones, then use tidyr::pivot_wider():

deviant_tolerance_pp |>
  mutate(
    temp_age = age,
    temp_dummy = 1
  ) |>
  pivot_wider(
    names_from = temp_age,
    names_prefix = "age_",
    values_from = temp_dummy,
    values_fill = 0
  )
#> # A tibble: 80 × 10
#>       id   age tolerance  male exposure age_11 age_12 age_13 age_14 age_15
#>    <dbl> <dbl>     <dbl> <dbl>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#>  1     9    11      2.23     0     1.54      1      0      0      0      0
#>  2     9    12      1.79     0     1.54      0      1      0      0      0
#>  3     9    13      1.9      0     1.54      0      0      1      0      0
#>  4     9    14      2.12     0     1.54      0      0      0      1      0
#>  5     9    15      2.66     0     1.54      0      0      0      0      1
#>  6    45    11      1.12     1     1.16      1      0      0      0      0
#>  7    45    12      1.45     1     1.16      0      1      0      0      0
#>  8    45    13      1.45     1     1.16      0      0      1      0      0
#>  9    45    14      1.45     1     1.16      0      0      0      1      0
#> 10    45    15      1.99     1     1.16      0      0      0      0      1
#> # ℹ 70 more rows

Adding contiguous periods to person-level survival data

To add contiguous periods to person-level survival data we can use dplyr::reframe():

first_sex |>
  # In order to add the event indicator, the time variable needs a different
  # name in the person-level data from the name we want to use in `reframe()`.
  # This is a temporary variable so it doesn't matter what the name is.
  rename(grades = grade) |>
  group_by(id) |>
  reframe(
    grade = 1:max(grades),
    event = if_else(grade == grades & censor == 0, 1, 0),
    # To keep predictors from the person-level data, simply list them. If there
    # are many predictors it might be more convenient to use
    # `dplyr::left_join()` after `reframe()`.
    parental_transition,
    parental_antisociality
  )
#> # A tibble: 1,902 × 5
#>    id    grade event parental_transition parental_antisociality
#>    <fct> <int> <dbl>               <dbl>                  <dbl>
#>  1 1         1     0                   0                   1.98
#>  2 1         2     0                   0                   1.98
#>  3 1         3     0                   0                   1.98
#>  4 1         4     0                   0                   1.98
#>  5 1         5     0                   0                   1.98
#>  6 1         6     0                   0                   1.98
#>  7 1         7     0                   0                   1.98
#>  8 1         8     0                   0                   1.98
#>  9 1         9     1                   0                   1.98
#> 10 2         1     0                   1                  -0.545
#> # ℹ 1,892 more rows
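The comment in the block above mentions dplyr::left_join() as an alternative for carrying along many predictors. A sketch of that variant, which should produce the same result for these time-invariant predictors:

# Sketch of the left_join() variant mentioned in the comment above:
# build the person-period skeleton first, then join the predictors back on.
first_sex |>
  rename(grades = grade) |>
  group_by(id) |>
  reframe(
    grade = 1:max(grades),
    event = if_else(grade == grades & censor == 0, 1, 0)
  ) |>
  left_join(
    select(first_sex, id, parental_transition, parental_antisociality),
    by = "id"
  )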
Authors and Citation

Michael McCarthy. Author, maintainer.

Citation

McCarthy M (2024). alda: Data for Applied longitudinal data analysis: Modeling change and event occurrence. R package version 0.0.0.9000, https://github.com/mccarthy-m-g/alda, https://mccarthy-m-g.github.io/alda/.

@Manual{,
  title = {alda: Data for Applied longitudinal data analysis: Modeling change and event occurrence},
  author = {Michael McCarthy},
  year = {2024},
  note = {R package version 0.0.0.9000, https://github.com/mccarthy-m-g/alda},
  url = {https://mccarthy-m-g.github.io/alda/},
}

alda: Data for Applied longitudinal data analysis: Modeling change and event occurrence

This package contains the 31 data sets provided with Singer and Willett's (2003) book, Applied longitudinal data analysis: Modeling change and event occurrence, suitable for longitudinal mixed effects modelling, longitudinal structural equation modelling, and survival analysis. Most data sets in this package are real data from real studies; however, some were modified by Singer and Willett (2003) for the illustration of statistical methods, and may not match the results of the original studies.

There are eleven data sets for longitudinal mixed effects modelling:

- ?deviant_tolerance: Adolescent tolerance of deviant behaviour (Chapter 2)
- ?early_intervention: Early educational interventions and cognitive performance (Chapter 3)
- ?alcohol_use_1: Adolescent and peer alcohol use (Chapters 4 and 6)
- ?reading_scores: Peabody Individual Achievement Test reading scores (Chapter 5)
- ?dropout_wages: High school dropout labour market experiences (Chapters 5 and 6)
- ?depression_unemployment: Unemployment and depression (Chapter 5)
- ?antidepressants: Antidepressant medication and positive mood (Chapter 5)
- ?berkeley: Berkeley Growth Study (Chapter 6)
- ?externalizing_behaviour: Externalizing behaviour in children (Chapter 6)
- ?cognitive_growth: Cognitive growth in children (Chapter 6)
- ?opposites_naming: Opposites naming task (Chapter 7)

There is one data set for longitudinal structural equation modelling:

- ?alcohol_use_2: Adolescent alcohol consumption and peer pressure (Chapter 8)

There are twenty data sets for survival analysis:

- ?teachers: Years to special education teacher turnover (Chapters 9 and 10)
- ?cocaine_relapse_1: Weeks to cocaine relapse after treatment (Chapter 10)
- ?first_sex: Age of first sexual intercourse (Chapters 10 and 11)
- ?suicide_ideation: Age of first suicide ideation (Chapter 10)
- ?congresswomen: House of Representatives tenure (Chapter 10)
- ?tenure: Years to academic tenure (Chapter 12)
- ?first_depression_1: Age of first depression (Chapter 12)
- ?first_arrest: Age of first juvenile arrest (Chapter 12)
- ?math_dropout: Math course history (Chapter 12)
- ?honking: Time to horn honking (Chapter 13)
- ?alcohol_relapse: Weeks to alcohol relapse after treatment (Chapter 13)
- ?judges: Supreme Court justice tenure (Chapters 13 and 15)
- ?first_depression_2: Age of first depression (Chapter 13)
- ?health_workers: Length of health worker employment (Chapter 13)
- ?rearrest: Days to inmate recidivism (Chapters 14 and 15)
- ?first_cocaine: Age of first cocaine use (Chapter 15)
- ?cocaine_relapse_2: Days to cocaine relapse after abstinence (Chapter 15)
- ?psychiatric_discharge: Days to psychiatric hospital discharge (Chapter 15)
- ?physicians: Physician career history (Chapter 15)
- ?monkeys: Piagetian monkeys (Chapter 15)

Vignettes and articles

There is one vignette with tips and tricks for working with longitudinal data: vignette("longitudinal-data-organization").

There are fourteen articles on the package documentation website demonstrating how to recreate the examples from the textbook in R:

- Chapter 2: Exploring longitudinal data on change
- Chapter 3: Introducing the multilevel model for change
- Chapter 4: Doing data analysis with the multilevel model for change
- Chapter 5: Treating time more flexibly
- Chapter 6: Modeling discontinuous and nonlinear change
- Chapter 7: Examining the multilevel model's error covariance structure
- Chapter 8: Modeling change using covariance structure analysis
- Chapter 9: A framework for investigating event occurrence
- Chapter 10: Describing discrete-time event occurrence data
- Chapter 11: Fitting basic discrete-time hazard models
- Chapter 12: Extending the discrete-time hazard model
- Chapter 13: Describing continuous-time event occurrence data
- Chapter 14: Fitting the Cox regression model
- Chapter 15: Extending the Cox regression model

Documentation

See https://mccarthy-m-g.github.io/alda/ and also in the installed package:
help(package = "alda")

References

Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. Oxford University Press, USA. https://doi.org/10.1093/acprof:oso/9780195152968.001.0001

Weeks to alcohol relapse after treatment — alcohol_relapse

A subset of data from Cooney and colleagues (1991) measuring the number of weeks until the first "heavy drinking" day in a sample of 89 recently treated alcoholics. Individuals were followed for two years (around 104.286 weeks) or until they relapsed.

Usage: alcohol_relapse

Format: A person-level data frame with 89 rows and 3 columns:
- id: Participant ID.
- weeks: Number of weeks until the first "heavy drinking" day.
- censor: Censoring status.

Source: Cooney, N. L., Kadden, R. M., Litt, M. D., & Getter, H. (1991). Matching alcoholics to coping skills or interactional therapies: Two-year follow-up results. Journal of Consulting and Clinical Psychology, 59, 598–601. https://doi.org/10.1037/0022-006X.59.4.598

Adolescent and peer alcohol use — alcohol_use_1

A subset of data from Curran, Stice, and Chassin (1997) measuring the relation between changes in alcohol use and changes in peer alcohol use over a 3-year period in a community-based sample of Hispanic and Caucasian adolescents.

Usage: alcohol_use_1

Format: A person-period data frame with 246 rows and 6 columns:
- id: Participant ID.
- age: Age in years.
- child_of_alcoholic: Binary indicator for whether the adolescent is a child of an alcoholic parent.
- male: Binary indicator for whether the adolescent is a male.
- alcohol_use: Square root of the summed scores of four eight-point items measuring frequency of alcohol use.
- peer_alcohol_use: Square root of the summed scores of two six-point items measuring frequency of peer alcohol use.

Source: Curran, P. J., Stice, E., & Chassin, L. (1997). The relation between adolescent alcohol use and peer alcohol use: A longitudinal random coefficients model. Journal of Consulting and Clinical Psychology, 65, 130–140. https://doi.org/10.1037//0022-006x.65.1.130
Adolescent alcohol consumption and peer pressure — alcohol_use_2

Data from Barnes, Farrell, and Banerjee (1994) measuring the relation between changes in alcohol use and changes in peer pressure to use alcohol in a sample of 1122 Black and White adolescents tracked from the beginning of seventh grade to the end of eighth grade.

Usage: alcohol_use_2

Format: A person-period data frame with 3366 rows and 5 columns:
- id: Participant ID.
- time: Time of measurement.
- female: Binary indicator for whether the adolescent is a female.
- alcohol_use: Natural logarithm of the averaged scores of three six-point items measuring frequency of beer, wine, and liquor consumption, respectively.
- peer_pressure: Natural logarithm of a six-point item measuring the frequency with which friends offered alcoholic drinks during the past month.

Source: Barnes, G. M., Farrell, M. P., & Banerjee, S. (1994). Family influences on alcohol abuse and other problem behaviors among black and white adolescents in a general population sample. Journal of Research on Adolescence, 4, 183–201. https://doi.org/10.1207/s15327795jra0402_2

Note: Barnes, Farrell, and Banerjee (1994) report a sample of 699 adolescents; however, they note that theirs was an ongoing longitudinal study, which likely explains the sample size discrepancy with the data used by Singer and Willett (2003).
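The chapter 8 models above were fit to a wide version of this data set (alcohol_use_2_wide), which is not itself a packaged object. A sketch of how one might construct it with tidyr, assuming time is coded 0, 1, and 2 (so the default wide names become alcohol_use_0 through peer_pressure_2):

library(tidyr)

# Hypothetical construction of the wide data used in the chapter 8 models,
# assuming `time` takes the values 0, 1, and 2.
alcohol_use_2_wide <- pivot_wider(
  alcohol_use_2,
  names_from = time,
  values_from = c(alcohol_use, peer_pressure)
)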
Antidepressant medication and positive mood — antidepressants

A subset of data from Tomarken, Shelton, Elkins, and Anderson (1997) measuring the relation between changes in positive mood and supplemental antidepressant medication over the course of a week in a sample of 73 men and women already receiving nonpharmacological therapy for depression.

Usage: antidepressants

Format: A person-period data frame with 1242 rows and 6 columns:
- id: Participant ID.
- wave: Wave of measurement.
- day: Day of measurement.
- reading: Time of day of the measurement.
- positive_mood: Positive mood score.
- treatment: Treatment condition (placebo pills = 0, antidepressant pills = 1).

Source: Tomarken, A. J., Shelton, R. C., Elkins, L., & Anderson, T. (1997). Sleep deprivation and anti-depressant medication: Unique effects on positive and negative affect. Poster session presented at the 9th annual meeting of the American Psychological Society, Washington, DC.

Berkeley Growth Study — berkeley

A subset of data from the Berkeley Growth Study measuring changes in the IQ of a single girl followed from childhood into older adulthood (Bayley, 1935).

Usage: berkeley

Format: A person-period data frame with 18 rows and 2 columns:
- age: Age of the girl in years.
- iq: IQ score.

Source: Bayley, N. (1935). The development of motor abilities during the first three years. Monographs of the Society for Research in Child Development, 1.
Weeks to cocaine relapse after treatment — cocaine_relapse_1

A subset of data from Hall, Havassy, and Wasserman (1990) measuring the number of weeks until relapse to cocaine use in a sample of 104 former addicts released from an in-patient treatment program. In-patients were followed for 12 weeks or until they used cocaine for 7 consecutive days.

Usage: cocaine_relapse_1

Format: A person-level data frame with 104 rows and 4 columns:
- id: In-patient ID.
- weeks: The number of weeks from the in-patient's release until relapse to cocaine use.
- censor: Censoring status.
- needle: Binary indicator for whether cocaine was ever used intravenously.

Source: Hall, S. M., Havassy, B. E., & Wasserman, D. A. (1990). Commitment to abstinence and acute stress in relapse to alcohol, opiates, and nicotine. Journal of Consulting and Clinical Psychology, 58, 175–181. https://doi.org/10.1037//0022-006x.58.2.175

Days to cocaine relapse after abstinence — cocaine_relapse_2

A subset of unpublished data from Hall, Havassy, and Wasserman (1990) measuring the relation between the number of days until relapse to cocaine use and several predictors that might be associated with relapse, in a sample of 104 newly abstinent cocaine users who recently completed an abstinence-oriented treatment program. Former cocaine users were followed for 12 weeks post-treatment or until they used cocaine for 7 consecutive days. Self-reported abstinence was confirmed by interview and by the absence of cocaine in urine specimens.

Usage: cocaine_relapse_2

Format: A person-period data frame with 1248 rows and 7 columns:
- id: Participant ID.
- days: Number of days until relapse to cocaine use or censoring. Relapse was defined as 4 or more days of cocaine use during the week preceding an interview. Study dropouts and lost participants were coded as relapsing to cocaine use, with the number of days until relapse coded as occurring the week after the last follow-up interview they attended.
- censor: Censoring status (0 = relapsed, 1 = censored).
- needle: Binary indicator for whether cocaine was ever used intravenously.
- base_mood: Total score of the positive mood subscales (Activity and Happiness) of the Mood Questionnaire (Ryman, Biersner, & LaRocco, 1974), taken at an intake interview during the last week of treatment. Each item used a five point Likert score (ranging from 0 = not at all, to 4 = extremely).
- followup: Week of the follow-up interview.
- mood: Total score of the positive mood subscales (Activity and Happiness) of the Mood Questionnaire (Ryman, Biersner, & LaRocco, 1974), taken at follow-up interviews each week post-treatment. Each item used a five point Likert score (ranging from 0 = not at all, to 4 = extremely).

Source: Hall, S. M., Havassy, B. E., & Wasserman, D. A. (1990). Commitment to abstinence and acute stress in relapse to alcohol, opiates, and nicotine. Journal of Consulting and Clinical Psychology, 58, 175–181. https://doi.org/10.1037//0022-006x.58.2.175

Note: Hall, Havassy, and Wasserman (1990) measured time to relapse in weeks, not days; however, to use this data to illustrate imputation strategies, Singer and Willett (2003) converted the weekly relapse information into days and jittered the event times, effectively converting discrete-time data into continuous-time data. Additionally, Hall, Havassy, and Wasserman (1990) do not report following cocaine users in their study; thus, this appears to be unpublished data.

References: Ryman, D. H., Biersner, R. J., & La Rocco, J. M. (1974). Reliabilities and validities of the Mood Questionnaire. Psychological Reports, 35, 479–484. https://doi.org/10.2466/pr0.1974.35.1.479

Cognitive growth in children — cognitive_growth

Data from Tivnan (1980) measuring changes in cognitive growth over a three-week period in a sample of 17 first and second-graders.

Usage: cognitive_growth

Format: A person-period data frame with 445 rows and 4 columns:
- id: Child ID.
- game: Game number. Each child played a maximum of 27 games.
- nmoves: The number of moves completed before making a catastrophic error.
- read: Score on an unnamed standardized reading test.

Source: Tivnan, T. (1980). Improvements in performance on cognitive tasks: The acquisition of new skills by elementary school children. Unpublished doctoral dissertation. Harvard University, Graduate School of Education.

House of Representatives tenure — congresswomen

Data measuring how long the 168 women elected to the U.S. House of Representatives between 1919 and 1996 remained in office. Representatives were followed for up to eight terms or until 1998.

Usage: congresswomen

Format: A person-level data frame with 168 rows and 5 columns:
- id: Participant ID.
- name: Representative name.
- terms: Number of terms in office.
- censor: Censoring status.
- democrat: Party affiliation.

Unemployment and depression — depression_unemployment

A subset of data from Ginexi and colleagues (2000) measuring changes in depressive symptoms after job loss in a sample of 254 recently unemployed men and women. Interviews were conducted in three waves, at around 1, 5, and 12 months after job loss.

Usage: depression_unemployment

Format: A person-period data frame with 674 rows and 5 columns:
- id: Participant ID.
- interview: Time of interview.
- months: Months since job loss.
- depression: Center for Epidemiologic Studies' Depression (CES-D) scale score (Radloff, 1977).
- unemployed: Binary indicator for whether the participant was unemployed at the time of interview.

Source: Ginexi, E. M., Howe, G. W., & Caplan, R. D. (2000). Depression and control beliefs in relation to reemployment: What are the directions of effect? Journal of Occupational Health Psychology, 5, 323–336. https://doi.org/10.1037//1076-8998.5.3.323

References: Radloff, L. S. (1977). The CES-D scale: A self report major depressive disorder scale for research in the general population. Applied Psychological Measurement, 1, 385–401. https://doi.org/10.1177/014662167700100306
Adolescent tolerance of deviant behaviour — deviant_tolerance_pp, deviant_tolerance_pl

A subset of data from the National Youth Survey (NYS) measuring tolerance of deviant behaviour in adolescents over time (Raudenbush & Chan, 1992).

Usage: deviant_tolerance_pp; deviant_tolerance_pl

Format:

deviant_tolerance_pp: A person-period data frame with 80 rows and 5 columns:
- id: Participant ID.
- age: Adolescent age in years.
- tolerance: Average score across a 9-item scale assessing attitudes favourable to deviant behaviour. Each item used a four point scale (1 = very wrong, 2 = wrong, 3 = a little bit wrong, 4 = not wrong at all).
- male: Binary indicator for whether the adolescent is a male.
- exposure: Average score across a 9-item scale assessing level of exposure to deviant peers. Each item used a five point Likert score (ranging from 0 = none, to 4 = all).

deviant_tolerance_pl: A person-level data frame with 16 rows and 8 columns:
- id: Participant ID.
- tolerance_11, tolerance_12, tolerance_13, tolerance_14, tolerance_15: Average score across a 9-item scale assessing attitudes favourable to deviant behaviour at ages 11, 12, 13, 14, and 15. Each item used a four point scale (1 = very wrong, 2 = wrong, 3 = a little bit wrong, 4 = not wrong at all).
- male: Binary indicator for whether the adolescent is a male.
- exposure: Average score across a 9-item scale assessing level of exposure to deviant peers. Each item used a five point Likert score (ranging from 0 = none, to 4 = all).

Source: Raudenbush, S. W., & Chan, W. S. (1992). Growth curve analysis in accelerated longitudinal designs. Journal of Research in Crime and Delinquency, 29, 387–411. https://doi.org/10.1177/0022427892029004001

Note: Raudenbush and Chan (1992) comment that exposure was a time-varying predictor in the original study; however, Singer and Willett (2003) provide exposure as a time-invariant predictor.

High school dropout labour market experiences — dropout_wages, dropout_wages_subset

A subset of data from the National Longitudinal Study of Youth tracking the labour market experiences of male high school dropouts (Murnane, Boudett, & Willett, 1999).

Usage: dropout_wages; dropout_wages_subset

Format:

dropout_wages: A person-period data frame with 6402 rows and 9 columns:
- id: Participant ID.
- log_wages: Natural logarithm of wages.
- experience: Labour force experience in years, tracked from dropouts' first day of work.
- ged: Binary indicator for whether the dropout obtained a GED.
- postsecondary_education: Binary indicator for whether the dropout obtained post-secondary education.
- black: Binary indicator for whether the dropout is black.
- hispanic: Binary indicator for whether the dropout is hispanic.
- highest_grade: Highest grade completed.
- unemployment_rate: Unemployment rate in the local geographic area.

dropout_wages_subset: A person-period data frame with 257 rows and 5 columns:
- id: Participant ID.
- log_wages: Natural logarithm of wages.
- experience: Labour force experience in years, tracked from dropouts' first day of work.
- black: Binary indicator for whether the dropout is black.
- highest_grade: Highest grade completed.

Source: Murnane, R. J., Boudett, K. P., & Willett, J. B. (1999). Do male dropouts benefit from obtaining a GED, postsecondary education, and training? Evaluation Review, 23, 475–502. https://doi.org/10.1177/0193841x9902300501

Early educational intervention and cognitive performance — early_intervention

Simulated data based on Burchinal, Campbell, Bryant, Wasik, and Ramey (1997) measuring the effect of early educational intervention on cognitive performance in a sample of African-American children at ages 12, 18, and 24 months.

Usage: early_intervention

Format: A person-period data frame with 309 rows and 4 columns:
- id: Child ID.
- age: Age in years at the time of measurement.
- treatment: Treatment condition (control = 0, intervention = 1).
- cognitive_score: Cognitive performance score from one of two standardized intelligence tests: the Bayley Scales of Infant Development (Bayley, 1969) at 12 and 18 months, and the Stanford Binet (Terman & Merrill, 1972) at 24 months.

Source: Burchinal, M. R., Campbell, F. A., Bryant, D. M., Wasik, B. H., & Ramey, C. T. (1997). Early intervention and mediating processes in cognitive performance of children of low income African American families. Child Development, 68, 935–954. https://doi.org/10.2307/1132043

Note: At the request of the researchers, Singer and Willett (2003) did not provide the data from Burchinal, Campbell, Bryant, Wasik, and Ramey's (1997) study, in order to ensure the privacy of the study's participants. However, the data provided here was simulated to have similar statistical properties to that study, in order to match the estimates and figures presented in the text as best as possible.

References: Bayley, N. (1969). Bayley Scales of Infant Development. New York: Psychological Corp. Terman, L. M., & Merrill, N. Q. (1972). Stanford-Binet Intelligence Scale: 1972 Norms Editions. Boston: Houghton Mifflin.

Externalizing behaviour in children — externalizing_behaviour

A subset of data from Keiley, Bates, Dodge, and Pettit (2000) measuring changes in externalizing behaviour in a sample of 45 children tracked from first through sixth grade.

Usage: externalizing_behaviour

Format: A person-period data frame with 270 rows and 5 columns:
- id: Child ID.
- time: Time of measurement.
- externalizing_behaviour: Sum of scores on Achenbach's (1991) Child Behavior Checklist. Scores range from 0 to 68.
- female: Binary indicator for whether the child is a female.
- grade: Grade year.

Source: Keiley, M. K., Bates, J. E., Dodge, K. A., & Pettit, G. S. (2000). A cross-domain growth analysis: Externalizing and internalizing behavior during 8 years of childhood. Journal of Abnormal Child Psychology, 28, 161–179. https://doi.org/10.1023%2Fa%3A1005122814723

References: Achenbach, T. M. (1991). Manual for the Child Behavior Checklist 4–18 and 1991 Profile. Burlington, VT: University of Vermont Press.

Age of first juvenile arrest — first_arrest

Data from Keiley and Martin (2002) measuring the effect of child abuse on the risk of first juvenile arrest in a sample of 1553 adolescents aged 8 to 18. Adolescents were followed until age 18 or until they were arrested.

Usage: first_arrest

Format: A person-period data frame with 15834 rows and 7 columns:
- id: Participant ID.
- time: Age of first juvenile arrest.
- censor: Censoring status.
- abused: Binary indicator for whether the adolescent was abused.
- black: Binary indicator for whether the adolescent is black.
- period: Age each record corresponds to.
- event: Binary indicator for whether the adolescent was arrested.

Source: Keiley, M. K., & Martin, N. C. (2002). Child abuse, neglect, and juvenile delinquency: How "new" statistical approaches can inform our understanding of "old" questions—a reanalysis of Widom, 1989. Manuscript submitted for publication.
Child abuse, neglect, juvenile delinquency: “new” statistical approaches can inform understanding “old” questions—reanalysis Widon, 1989. Manuscript submitted publication.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/first_cocaine.html","id":null,"dir":"Reference","previous_headings":"","what":"Age of first cocaine use — first_cocaine","title":"Age of first cocaine use — first_cocaine","text":"Data Burton colleagues (1996) measuring relation age first cocaine use drug-use history random sample 1658 white American men. Age first cocaine use drug-use history determined two interviews eleven years apart (1974 1985).","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/first_cocaine.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Age of first cocaine use — first_cocaine","text":"","code":"first_cocaine"},{"path":"https://mccarthy-m-g.github.io/alda/reference/first_cocaine.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Age of first cocaine use — first_cocaine","text":"person-level data frame 1658 rows 15 columns: id Participant ID. used_cocaine_age Age first cocaine use. censor Censoring status. birth_year early_marijuana_use Binary indicator whether marijuana used age 17. used_marijuana Binary indicator whether participant used marijuana study period. used_marijuana_age Age participant first used marijuana. sold_marijuana Binary indicator whether participant sold marijuana study period. sold_marijuana_age Age participant first sold marijuana. early_drug_use Binary indicator whether drugs used age 17. used_drugs Binary indicator whether participant used drugs study period. used_drugs_age Age participant first used drugs. sold_drugs Binary indicator whether participant sold drugs study period. sold_drugs_age Age participant first sold drugs. rural Binary indicator whether participant lived rural area.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/first_cocaine.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Age of first cocaine use — first_cocaine","text":"Burton, R. P. D., Johnson, R. J., Ritter, C., & Clayton. R. R. (1996). effects role socialization initiation cocaine use: event history analysis adolescence middle adulthood. Journal Health Social Behavior, 37, 75–90. https://doi.org/10.2307/2137232","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/first_depression_1.html","id":null,"dir":"Reference","previous_headings":"","what":"Age of first depression — first_depression_1","title":"Age of first depression — first_depression_1","text":"subset data Wheaton, Rozell, Hall (1997) measuring relation age first depressive episode several childhood adult traumatic stressors random sample 1393 adults living metropolitan Toronto, Ontario. Age first depressive episode traumatic stressors determined structured interview.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/first_depression_1.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Age of first depression — first_depression_1","text":"","code":"first_depression_1"},{"path":"https://mccarthy-m-g.github.io/alda/reference/first_depression_1.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Age of first depression — first_depression_1","text":"person-period data frame 36997 rows 11 columns: id Participant ID. onset Age first depressive episode. censor Censoring status. 
interview_age Age time interview. female Binary indicator whether adult female. siblings Number siblings. bigfamily Binary indicator whether adult five siblings. period Age record corresponds . depressive_episode Binary indicator whether adult experienced depressive episode. parental_divorce Binary indicator whether adult's parents divorced previous age. parental_divorce_now Binary indicator whether adult's parents divorced current period.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/first_depression_1.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Age of first depression — first_depression_1","text":"Wheaton, B., Roszell, P., & Hall, K. (1997). impact twenty childhood adult traumatic stressors risk psychiatric disorder. . H. Gotlib & B. Wheaton (Eds.), Stress adversity life course: Trajectories turning points (pp. 50–72). New York: Cambridge University Press. https://doi.org/10.1017/CBO9780511527623.003","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/first_depression_2.html","id":null,"dir":"Reference","previous_headings":"","what":"Age of first depression — first_depression_2","title":"Age of first depression — first_depression_2","text":"Data Sorenson, Rutter, Aneshensel (1991) measuring age first depressive episode sample 2974 adults. Age first depressive episode measured asking respondents whether , , age first experienced depressive episode.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/first_depression_2.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Age of first depression — first_depression_2","text":"","code":"first_depression_2"},{"path":"https://mccarthy-m-g.github.io/alda/reference/first_depression_2.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Age of first depression — first_depression_2","text":"person-level data frame 2974 rows 3 columns: id Participant ID. age years. censor Censoring status.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/first_depression_2.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Age of first depression — first_depression_2","text":"Sorenson, S. B., Rutter, C. M., & Aneshensel, C. S. (1991). Depression community: investigation age onset. Journal Consulting Clinical Psychology, 59, 541546. https://doi.org/10.1037/0022-006X.59.4.541","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/first_sex.html","id":null,"dir":"Reference","previous_headings":"","what":"Age of first sexual intercourse — first_sex","title":"Age of first sexual intercourse — first_sex","text":"subset data Capaldi, Crosby, Stoolmiller (1996) measuring grade year first sexual intercourse sample 180 -risk heterosexual adolescent males. Adolescent males followed Grade 7 Grade 12 reported sexual intercourse first time.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/first_sex.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Age of first sexual intercourse — first_sex","text":"","code":"first_sex"},{"path":"https://mccarthy-m-g.github.io/alda/reference/first_sex.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Age of first sexual intercourse — first_sex","text":"person-level data frame 180 rows 5 columns: id Participant ID. grade Grade year first sexual intercourse. censor Censoring status. 
parental_transition Binary indicator whether adolescent experienced parental transition (parents separated repartnered). parental_antisociality Composite score across four indicators measuring parents' level antisocial behaviour child's formative years.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/first_sex.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Age of first sexual intercourse — first_sex","text":"Capaldi, D. M., Crosby, L., & Stoolmiller, M. (1996). Predicting timing first sexual intercourse -risk adolescent males. Child Development, 67, 344–359. https://doi.org/10.2307/1131818","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/first_sex.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Age of first sexual intercourse — first_sex","text":"Capaldi, Crosby, Stoolmiller's (1996) original sample consisted 182 adolescent males applying exclusion criteria analysis; Singer Willett (2003) excluded additional two males data reported anal intercourse another male.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/health_workers.html","id":null,"dir":"Reference","previous_headings":"","what":"Length of health worker employment — health_workers","title":"Length of health worker employment — health_workers","text":"subset data Singer colleagues (1998) measuring length employment sample 2074 health care workers hired community migrant health centres. Health care workers followed 33 months termination employment.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/health_workers.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Length of health worker employment — health_workers","text":"","code":"health_workers"},{"path":"https://mccarthy-m-g.github.io/alda/reference/health_workers.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Length of health worker employment — health_workers","text":"person-level data frame 2074 rows 3 columns: id Participant ID. weeks Number weeks termination employment. censor Censoring status.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/health_workers.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Length of health worker employment — health_workers","text":"Singer, J. D., Davidson, S., Graham, S., & Davidson, H. S. (1998). Physician retention community migrant health centers: stays long? Medical Care, 38, 11981213. https://doi.org/10.1097/00005650-199808000-00008","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/honking.html","id":null,"dir":"Reference","previous_headings":"","what":"Time to horn honking — honking","title":"Time to horn honking — honking","text":"subset data Diekmann colleagues (1996) measuring time horn honking sample 57 motorists purposefully blocked green light Volkswagen Jetta busy intersection near centre Munich, West Germany two busy afternoons (Sunday Monday) 1998. 
Motorists were followed until they honked their horns or took an alternative action (beaming or changing lanes).","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/honking.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Time to horn honking — honking","text":"","code":"honking"},{"path":"https://mccarthy-m-g.github.io/alda/reference/honking.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Time to horn honking — honking","text":"A person-level data frame with 57 rows and 3 columns: id Participant ID. seconds Number of seconds to horn honking or alternative action. censor Censoring status.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/honking.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Time to horn honking — honking","text":"Diekmann, A., Jungbauer-Gans, M., Krassnig, H., & Lorenz, S. (1996). Social status and aggression: A field study analyzed by survival analysis. Journal of Social Psychology, 136, 761–768. https://doi.org/10.1080/00224545.1996.9712252","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/judges.html","id":null,"dir":"Reference","previous_headings":"","what":"Supreme Court justice tenure — judges","title":"Supreme Court justice tenure — judges","text":"Data from Zorn and Van Winkle (2000) measuring how long the 107 justices appointed to the U.S. Supreme Court between 1789 and 1980 remained in their positions.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/judges.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Supreme Court justice tenure — judges","text":"","code":"judges"},{"path":"https://mccarthy-m-g.github.io/alda/reference/judges.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Supreme Court justice tenure — judges","text":"A person-level data frame with 109 rows and 7 columns: id Justice ID. tenure Time to retirement or death in years. dead Binary indicator for whether the justice died. retired Binary indicator for whether the justice retired. left_appointment Binary indicator for whether the justice left their appointment. appointment_age Age at time of appointment. appointment_year Year of appointment.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/judges.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Supreme Court justice tenure — judges","text":"Zorn, C. J., & van Winkle, S. R. (2000). A competing risks model of Supreme Court vacancies, 1789–1992. Political Behavior, 22, 145–166.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/math_dropout.html","id":null,"dir":"Reference","previous_headings":"","what":"Math course history — math_dropout","title":"Math course history — math_dropout","text":"Data from Graham (1997) measuring the relation between mathematics course-taking and gender identity in a sample of 3790 tenth grade high school students. Students were followed for up to 5 terms (eleventh grade, twelfth grade, and the first three semesters of college) or until they stopped enrolling in mathematics courses.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/math_dropout.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Math course history — math_dropout","text":"","code":"math_dropout"},{"path":"https://mccarthy-m-g.github.io/alda/reference/math_dropout.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Math course history — math_dropout","text":"A person-period data frame with 9558 rows and 6 columns: id Participant ID. last_term The term in which the student stopped enrolling in mathematics courses. 
woman Binary indicator for whether the student identified as a woman. censor Censoring status. term Term each record corresponds to. event Binary indicator for whether the student stopped enrolling in mathematics courses in the given term.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/math_dropout.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Math course history — math_dropout","text":"Graham, S. E. (1997). The exodus from mathematics: When and why? Unpublished doctoral dissertation. Harvard University, Graduate School of Education.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/monkeys.html","id":null,"dir":"Reference","previous_headings":"","what":"Piagetian monkeys — monkeys","title":"Piagetian monkeys — monkeys","text":"Data from Ha, Kimpo, and Sackett (1997) measuring the age of first demonstration of object recognition in a sample of 123 pigtailed macaques. Monkeys were followed for up to 37 days or until they demonstrated the classic Piagetian stage of development known as object recognition.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/monkeys.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Piagetian monkeys — monkeys","text":"","code":"monkeys"},{"path":"https://mccarthy-m-g.github.io/alda/reference/monkeys.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Piagetian monkeys — monkeys","text":"A person-level data frame with 123 rows and 7 columns: id Monkey ID. sessions Number of sessions the monkey completed before demonstrating object recognition. initial_age Age at initial testing in days. end_age Age at the end of testing in days. censor Censoring status. birth_weight Decile equivalent of the monkey's birth weight in comparison to colony-wide, sex-specific standards. female Binary indicator for whether the monkey was female.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/monkeys.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Piagetian monkeys — monkeys","text":"Ha, J. C., Kimpo, C. L., & Sackett, G. P. (1997). Multiple-spell, discrete-time survival analysis of developmental data: Object concept in pigtailed macaques. Developmental Psychology, 33, 1054–1059. https://doi.org/10.1037//0012-1649.33.6.1054","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/opposites_naming.html","id":null,"dir":"Reference","previous_headings":"","what":"Opposites naming task — opposites_naming","title":"Opposites naming task — opposites_naming","text":"Artificial data created by Willett (1988) measuring changes in performance on a hypothetical \"opposites naming\" task over a four week period in a sample of 35 people.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/opposites_naming.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Opposites naming task — opposites_naming","text":"","code":"opposites_naming"},{"path":"https://mccarthy-m-g.github.io/alda/reference/opposites_naming.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Opposites naming task — opposites_naming","text":"A person-period data frame with 140 rows and 5 columns: id Participant ID. wave Wave of measurement. time Wave of measurement centred at time 0. opposites_naming_score Score on the \"opposites naming\" task. baseline_cognitive_score Baseline score on a standardized instrument assessing general cognitive skill.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/opposites_naming.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Opposites naming task — opposites_naming","text":"Willett, J. 
B. (1988). Questions and answers in the measurement of change. In E. Rothkopf (Ed.), Review of research in education (1988–89) (pp. 345–422). Washington, DC: American Educational Research Association. https://doi.org/10.3102/0091732X015001345","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/physicians.html","id":null,"dir":"Reference","previous_headings":"","what":"Physician career history — physicians","title":"Physician career history — physicians","text":"A subset of data from Singer and colleagues (1998) measuring length of employment in a sample of 812 physicians hired by community and migrant health centres. Physicians were followed for up to 33 months or until termination of employment. The measurement window began January 1, 1990, and ended September 30, 1992.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/physicians.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Physician career history — physicians","text":"","code":"physicians"},{"path":"https://mccarthy-m-g.github.io/alda/reference/physicians.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Physician career history — physicians","text":"A person-level data frame with 812 rows and 8 columns: id Participant ID. start_date Date of hire. end_date Date of departure. entry Number of years since hire the physician had worked upon entering the measurement window. exit Number of years the physician had worked at departure. censor Censoring status. part_time Binary indicator for whether the physician worked part time. age Age at time of hire.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/physicians.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Physician career history — physicians","text":"Singer, J. D., Davidson, S., Graham, S., & Davidson, H. S. (1998). Physician retention in community and migrant health centers: Who stays and for how long? Medical Care, 36, 1198–1213. https://doi.org/10.1097/00005650-199808000-00008","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/psychiatric_discharge.html","id":null,"dir":"Reference","previous_headings":"","what":"Days to psychiatric hospital discharge — psychiatric_discharge","title":"Days to psychiatric hospital discharge — psychiatric_discharge","text":"A subset of data from Foster (2000) measuring the relation between the number of days to discharge from a psychiatric hospital and type of treatment plan in a sample of 174 adolescents with emotional and behavioural problems.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/psychiatric_discharge.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Days to psychiatric hospital discharge — psychiatric_discharge","text":"","code":"psychiatric_discharge"},{"path":"https://mccarthy-m-g.github.io/alda/reference/psychiatric_discharge.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Days to psychiatric hospital discharge — psychiatric_discharge","text":"A person-level data frame with 174 rows and 4 columns: id Participant ID. days Number of days to discharge. censor Censoring status. treatment_plan Binary indicator for whether the patient had a traditional coverage plan (0) or an innovative coverage plan (1).","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/psychiatric_discharge.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Days to psychiatric hospital discharge — psychiatric_discharge","text":"Foster, E. M. (2000). Does the continuum of care reduce inpatient length of stay? Evaluation and Program Planning, 23, 53–65. 
https://doi.org/10.1016/S0149-7189(99)00037-3","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/reading_scores.html","id":null,"dir":"Reference","previous_headings":"","what":"Peabody Individual Achievement Test reading scores — reading_scores","title":"Peabody Individual Achievement Test reading scores — reading_scores","text":"A subset of data from the Children of the National Longitudinal Study of Youth measuring changes on the reading subtest of the Peabody Individual Achievement Test (PIAT) in a sample of 89 African-American children across three waves around ages 6, 8, and 10.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/reading_scores.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Peabody Individual Achievement Test reading scores — reading_scores","text":"","code":"reading_scores"},{"path":"https://mccarthy-m-g.github.io/alda/reference/reading_scores.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Peabody Individual Achievement Test reading scores — reading_scores","text":"A person-period data frame with 267 rows and 5 columns: id Participant ID. wave Wave of measurement. age_group Expected age at each measurement occasion. age Age in years at time of measurement. reading_score Reading score on the reading subtest of the Peabody Individual Achievement Test (PIAT).","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/reading_scores.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Peabody Individual Achievement Test reading scores — reading_scores","text":"US Bureau of Labor Statistics. National Longitudinal Survey of Youth (Children of the NLSY). https://www.bls.gov/nls/nlsy79-children.htm","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/rearrest.html","id":null,"dir":"Reference","previous_headings":"","what":"Days to inmate recidivism — rearrest","title":"Days to inmate recidivism — rearrest","text":"Data from Henning and Frueh (1996) measuring days to rearrest in a sample of 194 inmates recently released from a medium security prison. Inmates were followed for up to three years or until they were rearrested.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/rearrest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Days to inmate recidivism — rearrest","text":"","code":"rearrest"},{"path":"https://mccarthy-m-g.github.io/alda/reference/rearrest.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Days to inmate recidivism — rearrest","text":"A person-level data frame with 194 rows and 7 columns: id Participant ID. days Number of days to rearrest. months Number of months to rearrest, on the scale of an \"average\" month (30.4375 days). censor Censoring status. personal Binary indicator for whether the inmate committed a person-related crime. property Binary indicator for whether the inmate committed a property crime. age Centred age at time of release.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/rearrest.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Days to inmate recidivism — rearrest","text":"Henning, K. R., & Frueh, B. C. (1996). Cognitive-behavioral treatment of incarcerated offenders: An evaluation of the Vermont Department of Corrections' cognitive self-change program. Criminal Justice and Behavior, 23, 523–541. 
https://doi.org/10.1177/0093854896023004001","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/suicide_ideation.html","id":null,"dir":"Reference","previous_headings":"","what":"Age of first suicide ideation — suicide_ideation","title":"Age of first suicide ideation — suicide_ideation","text":"A subset of data from Bolger and colleagues (1989) measuring the age of first suicide ideation in a sample of 391 undergraduate students aged 16 to 22. Age of first suicide ideation was measured with a two-item survey asking respondents \"Have you ever thought of committing suicide?\" and, if so, \"At what age did the thought first occur to you?\"","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/suicide_ideation.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Age of first suicide ideation — suicide_ideation","text":"","code":"suicide_ideation"},{"path":"https://mccarthy-m-g.github.io/alda/reference/suicide_ideation.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Age of first suicide ideation — suicide_ideation","text":"A person-level data frame with 391 rows and 4 columns: id Participant ID. age Reported age of first suicide ideation. censor Censoring status. age_now Participant's age at the time of the survey.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/suicide_ideation.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Age of first suicide ideation — suicide_ideation","text":"Bolger, N., Downey, G., Walker, E., & Steininger, P. (1989). The onset of suicide ideation in childhood and adolescence. Journal of Youth and Adolescence, 18, 175–189. https://doi.org/10.1007/BF02138799","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/teachers.html","id":null,"dir":"Reference","previous_headings":"","what":"Years to special education teacher turnover — teachers","title":"Years to special education teacher turnover — teachers","text":"A subset of data from Singer (1993) measuring how many years 3941 newly hired special educators in Michigan stayed in teaching between 1972 and 1978. Teachers were followed for up to 13 years or until they stopped teaching in the state.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/teachers.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Years to special education teacher turnover — teachers","text":"","code":"teachers"},{"path":"https://mccarthy-m-g.github.io/alda/reference/teachers.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Years to special education teacher turnover — teachers","text":"A person-level data frame with 3941 rows and 3 columns: id Teacher ID. years The number of years between the teacher's dates of hire and departure from the Michigan public schools. censor Censoring status.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/teachers.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Years to special education teacher turnover — teachers","text":"Singer, J. D. (1993). Are special educators' careers special? Results from a 13-year longitudinal study. Exceptional Children, 59, 262–279. https://doi.org/10.1177/001440299305900309","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/tenure.html","id":null,"dir":"Reference","previous_headings":"","what":"Years to academic tenure — tenure","title":"Years to academic tenure — tenure","text":"Data from Gamse and Conger (1997) measuring the number of years to receiving tenure in a sample of 260 semifinalists and fellowship recipients of the National Academy of Education–Spencer Foundation Postdoctoral Fellowship Program who took an academic job after earning their doctorate. 
Academics were followed for up to nine years or until they received tenure.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/tenure.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Years to academic tenure — tenure","text":"","code":"tenure"},{"path":"https://mccarthy-m-g.github.io/alda/reference/tenure.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Years to academic tenure — tenure","text":"A person-level data frame with 260 rows and 3 columns: id Participant ID. years Number of years to receiving tenure. censor Censoring status.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/reference/tenure.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Years to academic tenure — tenure","text":"Gamse, B. C., & Conger, D. (1997). An evaluation of the Spencer post-doctoral dissertation program. Cambridge, MA: Abt Associates.","code":""},{"path":"https://mccarthy-m-g.github.io/alda/news/index.html","id":"alda-0009000","dir":"Changelog","previous_headings":"","what":"alda 0.0.0.9000","title":"alda 0.0.0.9000","text":"Added a NEWS.md file to track changes to the package.","code":""}]
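
Many of the person-level data sets catalogued above share the same core layout: an ID, a time-to-event variable, and a 0-1 censor variable. Any of them can therefore be summarized with the same survival-function recipe. As a minimal sketch, assuming only the columns documented above and the survival package's Surv() and survfit() functions, a fit to the tenure data looks like this:

library(alda)
library(survival)

# For right-censored data, Surv() takes the time variable and an event
# indicator; reversing the 0-1 censor variable yields the event indicator.
tenure_fit <- survfit(Surv(years, 1 - censor) ~ 1, data = tenure)

# The summary of the fit is the usual starting point for a life table.
summary(tenure_fit)

Swapping in another person-level data set, such as health_workers or honking, only requires changing the time variable (weeks or seconds, respectively).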
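
Similarly, the months variable in the rearrest data is defined on the scale of an "average" month of 30.4375 days (365.25 / 12), so it can be recomputed from days with a single division. A small illustrative check, assuming only the id, days, and months columns documented above:

library(alda)
library(dplyr)

# An "average" month is 365.25 / 12 = 30.4375 days, so months is just
# days rescaled by that constant; months_check should match months.
rearrest |>
  mutate(months_check = days / 30.4375) |>
  select(id, days, months, months_check)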