Skip to content

Latest commit

 

History

History
332 lines (256 loc) · 8.98 KB

session_01.org

File metadata and controls

332 lines (256 loc) · 8.98 KB

Advanced mixed-models workshop: Session 1

Setup

Overview and background

Tentative schedule

Today

StartEndActivity
109:0010:30Review / Overview
211:0012:30Dataset 1 (one factor)
313:3015:00Dataset 2 (multifactor)
415:3017:00Slack time / Q&A / BYOD\(^*\)

back

Tomorrow

StartEndActivity
109:0010:30Dataset 3 (GLMM)
211:0012:30Dataset 4 (multifactor GLMM)
313:0015:00Slack time / Q&A / BYOD\(^*\)

back again

\(^*\) Bring Your Own Data

Repository for this workshop

If you have git installed, use:

git clone https://github.com/dalejbarr/bremen.git

or download full archive from:

General information on LMEMs

  • Baayen (2008), Analyzing Linguistic Data
  • Baayen, Davidson, & Bates (2008), JML
  • Barr, Levy, Scheepers, Tily (2013), JML
  • Barr (2013), Frontiers in Psychology (interactions)
  • Bates et al. http://arxiv.org/pdf/1406.5823.pdf (technical)
  • Bolker et al. (2009), Trends in Ecology & Evolution
  • Westfall, Kenny, and Judd (2014), JEP: General (power)
  • see also r-lang and r-sig-mixed-models mailing lists
  • r-sig-mixed-models FAQ http://glmm.wikidot.com/faq
  • add-on packages afex, pbkrcomp, lmerTest

Generating multilevel data

Simulated data

  • single-factor within subject / between items
  • IV: type of word, DV = lexical decision times

\(Y_{si} = β_0 + S_{0s} + I_{0i} + (β_1 + S_{1s})X_{i} + e_{si}\)

Subjects

\(<S_{0i},S_{1i}> ∼ N(<0,0>, Σ)\)

\(Σ = \left(\begin{array}{cc}{τ_{00}}^2 & ρ_{I}τ_{00}τ_{11}
ρ_{I}τ_{00}τ_{11} & {τ_{11}}^2 \ \end{array}\right) \)

file:img/bivariatedist.jpg

Items

\(I_{0i} ∼ N(0, ω_{00}^2)\)

Define the data structures

library("MASS") # needed for mvrnorm

set.seed(11709)

nsubj <- 100
nitem <- 50 # must be an even number

########################################
## create the data structures
subj <- data.frame(subject_id = 1:nsubj)

item <- data.frame(item_id = 1:nitem,
                   cond = rep(1:2, each = nitem / 2))

trial <- expand.grid(subject_id = 1:nsubj,
                      item_id = 1:nitem)

Define parameters for data generation

########################################
## define parameters for data generation
## first, fixed effects
mu <- 800 # grand mean
eff <- 80 # 80 ms difference
effc <- c(-.5, .5) # deviation codes
res_sd <- 200 # residual (standard deviation)
iri <- 80 # by-item random intercept sd
sri <- 100 # by-subject random intercept sd
srs <- 40 # by-subject random slope sd
rcor <- .2 # correlation between intercept and slope

By-item random effects

## define item random effects variance
## and sample items
item$iri <- rnorm(nitem, mean = 0, sd = iri)

## view the expected mean for each item
## for a typical subject (random effs = 0)
head(cbind(mu, effc[item$cond] * eff, item$iri,
           mu + effc[item$cond] * eff + item$iri), 3)

By-subject random effects

## define subject random effects variance
## variance co-variance matrix
svcov <- matrix(c(sri^2, 
                  rcor * sri * srs,
                  rcor * sri * srs,
                  srs^2), nrow = 2)

## sample subjects
srfx <- mvrnorm(nsubj, mu = c(0, 0), Sigma = svcov)

subj$sri <- srfx[, 1]
subj$srs <- srfx[, 2]

head(subj, 3)

Pull it all together

dat <- merge(merge(subj, trial), item)
dat <- dat[order(dat$subject_id, dat$item_id), ] # sort
dat$err <- rnorm(nrow(dat), sd = res_sd) # trial-level noise
dat$Y <- with(dat,
    mu + sri + iri + (eff + srs) * effc[cond] + err)

head(dat)

Decomposition matrix

\(Y_{si} = β_0 + S_{0s} + I_{0i} + (β_1 + S_{1s})X_{i} + e_{si}\)

 Source: local data frame [64 x 11]
Groups: sid, c

   sid iid c        Y  mu        sri        iri eff       srs    x        err
1    1   1 1 866.6973 800 -127.33939   51.14385  80 -15.07568 -0.5  175.35504
2    1   2 1 520.0347 800 -127.33939  -87.15511  80 -15.07568 -0.5  -33.00861
3    1  13 2 921.6896 800 -127.33939 -104.37196  80 -15.07568  0.5  320.93881
4    1  14 2 711.8382 800 -127.33939 -124.47164  80 -15.07568  0.5  131.18702
5    2   1 1 700.6971 800   38.40241   51.14385  80 -20.38240 -0.5 -159.04036
6    2   2 1 560.1265 800   38.40241  -87.15511  80 -20.38240 -0.5 -161.31203
7    2  13 2 280.5352 800   38.40241 -104.37196  80 -20.38240  0.5 -483.30409
8    2  14 2 479.3219 800   38.40241 -124.47164  80 -20.38240  0.5 -264.41768
9    3   1 1 695.3036 800  -12.51718   51.14385  80  14.50942 -0.5  -96.06837
10   3   2 1 773.5395 800  -12.51718  -87.15511  80  14.50942 -0.5  120.46647
.. ... ... .      ... ...        ...        ... ...       ...  ...        ...

Fitting the model

library("lme4")
dat$c <- dat$cond - mean(dat$cond)
mod <- lmer(Y ~ c + (1 + c | subject_id) + (1 | item_id),
            dat, REML = FALSE)

Viewing results

Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: Y ~ c + (1 + c | subject_id) + (1 | item_id)
   Data: dat

     AIC      BIC   logLik deviance df.resid 
  5179.5   5207.2  -2582.8   5165.5      377 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.7280 -0.5511 -0.0343  0.6103  3.3860 

Random effects:
 Groups     Name        Variance Std.Dev. Corr 
 item_id    (Intercept)  8792.4   93.768       
 subject_id (Intercept) 11434.8  106.934       
            c              95.5    9.772  -1.00
 Residual               33590.6  183.278       
Number of obs: 384, groups:  item_id, 24; subject_id, 16

Fixed effects:
            Estimate Std. Error t value
(Intercept)   779.44      34.18  22.802
c              46.36      42.68   1.086

Correlation of Fixed Effects:
  (Intr)
c -0.045