Replicating the Monte Carlo split-half #13

Spiritspeak · 2022-01-04T11:05:59Z

I've just read your paper on split-half reliability and wanted to try out the Monte Carlo approach you mention in it. It's not clear to me how to implement it correctly, and I could not find the relevant code in the splithalfr package source. I tried implementing two interpretations of the descriptions given by you and by Williams & Kaufmann, but both give incorrect reliabilities. What do you suggest?

The interpretations I've tried are as follows:

1. Randomly sample (with replacement) two sets from the same dataset, the two sets being of equal size to the original dataset, and it is thus permitted that a single row in the data can end up in both sets. This gives correlations that are too large.
2. Randomly split the data in two, then sample (randomly, with replacement) from each of the two halves a set that is as big as the original dataset. This gives correlations that are too small.

Kind regards.

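# Interpretation 1: draw two resamples, each the size of the original dataset,
# with replacement and with overlap between the two sets allowed
library(foreach)
library(doParallel)  # also attaches parallel (makeCluster, stopCluster)
library(AATtools)    # aat_compute(), cormean()

cluster <- makeCluster(6)
registerDoParallel(cluster)
mcrels <- foreach(i = 1:100, .packages = c("AATtools", "magrittr", "dplyr")) %dopar% {
  # Resample trials within each participant-by-condition cell
  h0 <- ds %>% group_by(subj, is_pull, is_target) %>% slice_sample(prop = 1, replace = TRUE)
  h1 <- ds %>% group_by(subj, is_pull, is_target) %>% slice_sample(prop = 1, replace = TRUE)

  sc0 <- aat_compute(h0, "subj", "is_pull", "is_target", "rt")
  sc1 <- aat_compute(h1, "subj", "is_pull", "is_target", "rt")

  # Correlate the bias scores of the two resampled sets
  merge(sc0, sc1, by = "subj") %$% cor(ab.x, ab.y)
}
stopCluster(cluster)
# Average the 100 correlations
cormean(unlist(mcrels), rep(length(unique(ds$subj)), 100), type = "OPK")
# These correlations are too large.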


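# Interpretation 2: split the data in two, then resample each half (with replacement)
# up to the size of the original data
library(foreach)
library(doParallel)  # also attaches parallel (makeCluster, stopCluster)
library(AATtools)    # aat_compute(), cormean()

cluster <- makeCluster(6)
registerDoParallel(cluster)
mcrels <- foreach(i = 1:100, .packages = c("AATtools", "magrittr", "dplyr")) %dopar% {
  # Randomly assign the trials within each cell to half 0 or half 1
  iterset <- ds %>% group_by(subj, is_pull, is_target) %>% mutate(key = sample((1:n()) %% 2))
  # prop = 2 doubles each half, bringing it back to roughly the original cell size
  h0 <- iterset %>% filter(key == 0) %>% group_by(subj, is_pull, is_target) %>% slice_sample(prop = 2, replace = TRUE)
  h1 <- iterset %>% filter(key == 1) %>% group_by(subj, is_pull, is_target) %>% slice_sample(prop = 2, replace = TRUE)

  sc0 <- aat_compute(h0, "subj", "is_pull", "is_target", "rt")
  sc1 <- aat_compute(h1, "subj", "is_pull", "is_target", "rt")
  # Correlate the bias scores of the two resampled halves
  merge(sc0, sc1, by = "subj") %$% cor(ab.x, ab.y)
}
stopCluster(cluster)
# Average the 100 correlations
cormean(unlist(mcrels), rep(length(unique(ds$subj)), 100), type = "OPK")
# These correlations are too small.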

Answered by tpronk

tpronk (Maintainer) · 2022-01-04T17:13:20Z

Hi @Spiritspeak,

Your first interpretation of the splitting method is correct, i.e. "randomly sample (with replacement) two sets from the same dataset, the two sets being of equal size to the original dataset, and it is thus permitted that a single row in the data can end up in both sets". In a vignette included in the package, I provide a concrete example of this approach.

With regard to your concerns about Monte Carlo estimates being too high, I've got two suggestions:

1. It could be that your data just shows strange patterns across splitting methods, similar to the AAT in the compendium paper. You could try a whole bunch of splitting methods via the splithalfr package (also first-second, odd-even, different levels of stratification) and see how this pans out.
2. It could be that you're right. In the pre-print of my latest paper, I report some findings that chime with your concerns, based on a sub-sampling method that shows how permutated and Monte Carlo estimates develop as a function of trial count. See Figure 1, page 19.
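
To make the second suggestion concrete, here is a rough base-R sketch of how one might track both estimates as a function of trial count. It is not the procedure from the pre-print and does not use splithalfr or AATtools. The simulated data, the trial counts, and all names (`scores`, `permuted_once`, `monte_carlo_once`, `results`) are assumptions for illustration, and it assumes the Spearman-Brown correction is applied to the permuted halves only, since the Monte Carlo parts are already full-size.

```r
set.seed(42)
n_subj <- 100
max_trials <- 80

# Hypothetical data: one true score per participant plus trial-level noise
true_score <- rnorm(n_subj, sd = 0.5)
scores <- true_score + matrix(rnorm(n_subj * max_trials), nrow = n_subj)

spearman_brown <- function(r) 2 * r / (1 + r)

# One permuted split: two non-overlapping random halves, Spearman-Brown corrected
permuted_once <- function(x) {
  k <- ncol(x)
  halves <- t(apply(x, 1, function(r) {
    idx <- sample(k)
    c(mean(r[idx[1:(k %/% 2)]]), mean(r[idx[(k %/% 2 + 1):k]]))
  }))
  spearman_brown(cor(halves[, 1], halves[, 2]))
}

# One Monte Carlo split: two full-size resamples with replacement, overlap allowed
monte_carlo_once <- function(x) {
  a <- apply(x, 1, function(r) mean(sample(r, replace = TRUE)))
  b <- apply(x, 1, function(r) mean(sample(r, replace = TRUE)))
  cor(a, b)
}

# Sub-sample k trials per participant and average each estimate over 30 splits
trial_counts <- c(10, 20, 40, 80)
results <- t(sapply(trial_counts, function(k) {
  sub <- scores[, sample(max_trials, k), drop = FALSE]
  c(trials = k,
    permuted = mean(replicate(30, permuted_once(sub))),
    monte_carlo = mean(replicate(30, monte_carlo_once(sub))))
}))
print(round(results, 2))
```

Each row of `results` holds one trial count, so the two estimate columns can be compared or plotted against `trials`. The plain mean of the correlations is used here only to keep the sketch short.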

Cheers, T

1 reply

tpronk Jan 4, 2022
Maintainer

Cross-linking this thread with another thread about Monte Carlo splitting

tpronk (Maintainer) · 2022-01-10T09:32:01Z

Given that there hasn't been any activity in this thread for about a week, I'll close this thread as "answered". However, @Spiritspeak, feel free to reopen it if you've got further questions or comments.

tpronk (Maintainer) · 2022-12-03T08:43:45Z

Hey @Spiritspeak, I spotted a preprint making a strong case against using the Monte Carlo method because it overestimates reliability. The authors based their argument on simulated data. This lines up nicely with our preprint (now published here), which makes the same argument, but based on empirical data.
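
For anyone who wants to see the overestimation in a toy setting, here is a minimal base-R sketch. It is not taken from either paper. The data are pure noise, so every participant has the same true score and the true reliability is 0, and all names (`rt`, `mc_cor`, `perm_cor`) are hypothetical.

```r
set.seed(1)
n_subj <- 200
n_trials <- 40

# Pure noise: no true individual differences, so true reliability is 0
rt <- matrix(rnorm(n_subj * n_trials), nrow = n_subj)

# Monte Carlo: two full-size resamples with replacement, overlap allowed
mc_cor <- function(x) {
  a <- apply(x, 1, function(r) mean(sample(r, replace = TRUE)))
  b <- apply(x, 1, function(r) mean(sample(r, replace = TRUE)))
  cor(a, b)
}

# Permuted: random split without replacement into two non-overlapping halves
perm_cor <- function(x) {
  k <- ncol(x)
  halves <- t(apply(x, 1, function(r) {
    idx <- sample(k)
    c(mean(r[idx[1:(k %/% 2)]]), mean(r[idx[(k %/% 2 + 1):k]]))
  }))
  cor(halves[, 1], halves[, 2])
}

# Average each correlation over 50 random splits
mc <- mean(replicate(50, mc_cor(rt)))
perm <- mean(replicate(50, perm_cor(rt)))
print(c(monte_carlo = mc, permuted = perm))
```

With no true individual differences, the non-overlapping halves should correlate near zero. The two overlapping resamples are both drawn from each participant's same observed trials, so they share that participant's sample mean and correlate clearly above zero.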
