Advanced mixed-models workshop: Session 2

Setup

Formatting

Prepare the data

dat <- read.csv("misc/LecXX-Datasets/KeysarEtAl2000.csv")
saveRDS(dat, "kbbb.rds")

Single factor design

Determining Maximal Random Effects

Random intercept is needed whenever there are multiple observations per unit
Any within-unit factor gets a random slope, unless there is only one observation per level per unit
Between-unit factors do not get a random slope
For each interaction, include a slope for the highest order combination of within-subject factors subsumed by the interaction
For time-series data, include random slopes for time predictors if you have more than one time series per unit

Keysar, Barr, Balin, & Brauner (2000)

Keysar, B., Barr, D. J., Balin, J. A., & Brauner, J. S. (2000). Taking perspective in conversation: The role of mutual knowledge in comprehension. Psychological Science, 11, 32–38.

When interpreting expressions e.g. the small candle, do listeners experience interference?

Keysar, Barr, Balin, & Brauner (2000)

20 subjects, 12 items for each subject (N=240)
one factor: condition (competitor vs. noncompetitor)
data recorded using a 60 Hz eyetracker
DV: latency to fixate the target object, measured from onset of the critical word

Field	Description
`SubjID`	Subject identifier (N=20)
`cond`	Condition (E=competitor, C=noncompetitor)
`crit`	Moment of onset of critical word (frames)
`targfix`	Moment the hand touched the target (frames)
`object`	Name of the experimental item

Keysar, Barr, Balin, & Brauner (2000)

load the data from KeysarEtAl2000.rds into dataframe dat
calculate tfix, the latency for touching the target in milliseconds, store this in the dataframe dat
make histogram of tfix
create “truncated” versions of tfix, tfixTrunc, truncating values higher than the 97.5th percentile
calculate means in each condition

Linear mixed-model analysis

Now do the analysis using model comparison in a linear mixed effects model, with maximal random effects
- tip: use deviation coding for condition
Derive \(p\)-values using:
- Wald \(z\) statistic (“t-as-z”)
- Likelihood ratio tests
Would you say that subjects or items introduce more overall variation in grand mean target latencies?
Do subjects or items vary more in terms of the effect of condition (competitor)?
Look at the BLUPS.
- Which items show the effect most strongly?
- Which subjects?
- Do all subjects show the effect?
- Do all items show the effect?

Load and preprocess

dat <- readRDS("kbbb.rds")
# calculate latencies
dat$tfix <- 1000*((dat$targfix - dat$crit) / 60)

hist(dat$tfix)

Clean up the latency data

# truncate outliers at 97.5th percentile of distribution
cutoff.tfix <- quantile(dat$tfix, probs=.975, na.rm=TRUE)
dat$tfixTrunc <- ifelse(dat$tfix>cutoff.tfix, cutoff.tfix, dat$tfix)

hist(dat$tfixTrunc)

Descriptive stats

# aggregate
aggregate(tfixTrunc ~ cond, dat, mean)

Run t-test (aggregating first by subject)

# re-create data: t-test
dat.subj <- aggregate(tfixTrunc ~ SubjID + cond, dat, mean)
dat.subj <- dat.subj[order(dat.subj$SubjID, dat.subj$cond), ]

dat.t <- t.test(subset(dat.subj, cond=="C")$tfixTrunc,
                subset(dat.subj, cond=="E")$tfixTrunc, paired=TRUE)
print(dat.t)

Run linear mixed model

# linear mixed model
# create deviation-coded predictor
dat$D <- dat$cond == "E"
dat$C2 <- dat$D - mean(dat$D)

library("lme4")
mod1 <- lmer(tfixTrunc ~ C2 +
                 (1 + C2 | SubjID) + 
                 (1 + C2 | object),
             data=dat,
             subset = complete.cases(dat),
             REML=FALSE)

View results

Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: tfixTrunc ~ C2 + (1 + C2 | SubjID) + (1 + C2 | object)
   Data: dat
 Subset: complete.cases(dat)

     AIC      BIC   logLik deviance df.resid 
  4421.9   4453.0  -2201.9   4403.9      226 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 SubjID   (Intercept)  417282   645.97      
          C2           758341   870.83  1.00
 object   (Intercept)  616982   785.48      
          C2             6765    82.25  1.00
 Residual             7236631  2690.10      
Number of obs: 235, groups:  SubjID, 20; object, 12

Fixed effects:
            Estimate Std. Error t value
(Intercept)   3306.4      321.1  10.296
C2            1439.6      402.2   3.579

Wald \(z\) statistics

paramest <- fixef(mod1)
stderrs <- sqrt(diag(vcov(mod1)))
tstats <- paramest / stderrs
pvals <- 2 * (1 - pnorm(abs(tstats)))

data.frame(b = paramest, se = stderrs, t = tstats, p = pvals)

Likelihood ratio tests

mod2 <- update(mod1, . ~ . -C2)

anova(mod1, mod2)

chi2 <- deviance(mod2) - deviance(mod1)
pchi <- pchisq(chi2, 1, lower.tail = FALSE)

c(chisq = chi2, p = pchi)

Blups

blups <- ranef(mod1)
blups$SubjID$C2 + fixef(mod1)[2] # every subject shows effect
blups$object$C2 + fixef(mod1)[2] # every item shows effect

Additional stats

library("pbkrtest")

mod_kr <- KRmodcomp(mod1, mod2)

summary(mod_kr)

#+RESULTS[a21201dd2704288fcddacb9041b2d895bd722e63]:

F-test with Kenward-Roger approximation; computing time: 1.14 sec.
large : tfixTrunc ~ C2 + (1 + C2 | SubjID) + (1 + C2 | object)
small : tfixTrunc ~ (1 + C2 | SubjID) + (1 + C2 | object)
          stat     ndf     ddf F.scaling  p.value   
Ftest  12.4550  1.0000  8.9771         1 0.006448 **
FtestU 12.4550  1.0000  8.9771           0.006448 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Parametric bootstrap

mod_pb <- PBmodcomp(mod1, mod2)

summary(mod_pb)

#+RESULTS[3013bd824fc4a56fb33c20466b86fd7e9944571c]:

There were 50 or more warnings (use warnings() to see the first 50)
Parametric bootstrap test; time: 156.28 sec; samples: 1000 extremes: 0;
Requested samples: 1000 Used samples: 996 Extremes: 0
large : tfixTrunc ~ C2 + (1 + C2 | SubjID) + (1 + C2 | object)
small : tfixTrunc ~ (1 + C2 | SubjID) + (1 + C2 | object)
           stat     df     ddf   p.value    
PBtest   10.539                0.0010030 ** 
Gamma    10.539                0.0005716 ***
Bartlett 11.161  1.000         0.0008356 ***
F        10.539  1.000 -33.912              
LRT      10.539  1.000         0.0011689 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Warning message:
In pf(Fobs, df1 = ndf, df2 = ddf) : NaNs produced

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

session_02.org

session_02.org

Advanced mixed-models workshop: Session 2

Setup

Formatting

Prepare the data

Single factor design

Determining Maximal Random Effects

Keysar, Barr, Balin, & Brauner (2000)

Keysar, Barr, Balin, & Brauner (2000)

Keysar, Barr, Balin, & Brauner (2000)

Linear mixed-model analysis

Load and preprocess

Clean up the latency data

Descriptive stats

Run t-test (aggregating first by subject)

Run linear mixed model

View results

Wald \(z\) statistics

Likelihood ratio tests

Blups

Additional stats

Parametric bootstrap

Files

session_02.org

Latest commit

History

session_02.org

File metadata and controls

Advanced mixed-models workshop: Session 2

Setup

Formatting

Prepare the data

Single factor design

Determining Maximal Random Effects

Keysar, Barr, Balin, & Brauner (2000)

Keysar, Barr, Balin, & Brauner (2000)

Keysar, Barr, Balin, & Brauner (2000)

Linear mixed-model analysis

Load and preprocess

Clean up the latency data

Descriptive stats

Run t-test (aggregating first by subject)

Run linear mixed model

View results

Wald \(z\) statistics

Likelihood ratio tests

Blups

Additional stats

Parametric bootstrap