-
Hi~ The first question is about missing data (dropped trials) and Cronbach's alpha. As you mentioned, it may not be appropriate to apply Cronbach's alpha to trial responses when the scoring algorithm drops trials, for instance those with RTs < 200 ms. Do you have any suggestions for calculating Cronbach's alpha in this case? I noticed that SPSS deletes items with missing data entirely. The second question is about calculating the correlations after Monte Carlo splitting. I did it in three steps. First, I used Monte Carlo splitting to resample the trials of each condition with replacement via the by_split function. The third question is about the correlation methods. In your scripts (VPT - Difference of Means), you listed Spearman-Brown adjusted Pearson correlations (…
-
Hey @ShawGoGo,

Thanks for your questions and nice to see you're enjoying the splithalfr! Below I (hopefully accurately) summarize each question and try to give a useful response.

(Q1) How do I calculate Cronbach's alpha if I drop trials?
I could imagine a couple of approaches, using different definitions of alpha. For instance, coefficient alpha can be expressed via inter-item correlations, or (I suspect) as a SEM model. Calculate the parameters of the model with a method that handles missing data, and you've got a nice case for the equivalence of your coefficient. Via splithalfr, I approached alpha using the Flanagan-Rulon coefficient; see this simulation. You could extend this method by dropping trials before calculating the mean; a sketch follows below this reply.

(Q2) How do I calculate reliability when splitting Monte Carlo?
The split seems OK. However, when you split Monte Carlo there is no need to apply a Spearman-Brown adjustment to the correlations, like you need to do for first-second, odd-even, or permutated splits. That's because Monte Carlo splitting already produces two parts that are each just as long as the task you're splitting.

(Q3) Which coefficient is the best option?
I haven't examined this in depth (the whole splitting thing turned out to be a project in itself), but I can speculate a bit. If you randomize and repeat trials, the three CTT coefficients (Spearman-Brown, Flanagan-Rulon, Angoff-Feldt) tend to have the same values. I think that's because the three coefficients use models with increasing numbers of trial-level parameters, but a random sequence of trials does not offer any information for estimating those parameters. See Warrens (2015). I know of a single paper that sorted trials of a cognitive task in such a way that it made sense to fit a model with trial-level parameters (Green et al., 2016). ICC is different; there are over six versions of it. Those ICC ideas about consistency and agreement have, as far as I know, not been applied to split-half methods, but it could be really interesting :)

Best, Thomas
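To make the Q1 suggestion concrete, here is a minimal sketch (hypothetical data; the 200 ms cutoff, the trial counts, and the number of replications are arbitrary choices). It drops fast trials inside the scoring function, applies permutated splitting via by_split, and computes a Flanagan-Rulon coefficient per replication, written out here as 4 * cov(X1, X2) / var(X1 + X2):

library(splithalfr)

# Hypothetical example data: 50 participants, 20 trials each
set.seed(42)
example_data = data.frame(
  participant_id = rep(1 : 50, each = 20),
  rt = rnorm(50 * 20, mean = 600, sd = 150)
)

# Scoring function that drops trials with RTs < 200 ms before averaging
score_with_drops = function(ds) {
  ds = ds[ds$rt >= 200, ]
  return (mean(ds$rt))
}

# Permutated splits: two non-overlapping halves, drawn without replacement
split_scores = by_split(
  data = example_data,
  participants = example_data$participant_id,
  fn_score = score_with_drops,
  replications = 100,
  method = "random",
  replace = FALSE,
  split_p = 0.5,
  ncores = 1
)

# Flanagan-Rulon coefficient, written out explicitly
flanagan_rulon_coef = function(score_1, score_2) {
  4 * cov(score_1, score_2) / var(score_1 + score_2)
}

# One coefficient per replication; average for a point estimate
mean(split_coefs(split_scores, flanagan_rulon_coef))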
-
Yes, that is what I did for the data set. Thank you very much for the example. Cheers!
xz
On Tue, Aug 10, 2021 at 9:17 PM tpronk wrote:
Two-part coefficients, like Spearman-Brown, Flanagan-Rulon, Angoff-Feldt, and ICC, are calculated on aggregated scores, not on individual items/trials. So it's handiest to have a dataset with 50 rows, and columns for participant, replication, and scores for each participant on each of the two parts.
Here is an example
library(splithalfr)
# Example data
example_data = data.frame(
participant_id = rep(1 : 50, each = 20),
trial_id = rep(1 : 20, 50),
rt = rnorm(50 * 20)
)
# Example scoring function; receives (split) data from one participant and
# returns a score; for example, the mean.
# When we are splitting, the function is called twice; once per split part
example_score = function(ds) {
return (mean(ds$rt))
}
# One Monte Carlo replication
split_scores = by_split(
data = example_data,
participants = example_data$participant_id,
fn_score = example_score,
replications = 1,
method = "random",
replace = TRUE,
split_p = 1,
ncores = 1
)
# split_scores has 50 rows of data; one row per participant and replication
# split_scores has columns score_1 and score_2; these are the scores returned
# by example_score for each of the two parts
# Now we can calculate a coefficient for one replication. Since we're splitting
# Monte Carlo, each part is just as long as the original dataset, so no
# Spearman-Brown adjustment needed
cor(split_scores$score_1, split_scores$score_2)
# Same result as above, but if we had multiple replications, split_coefs
# would return a vector of correlations; one per replication
split_coefs(split_scores, cor)
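As a minimal sketch of that multi-replication case (the 1000 replications are an arbitrary choice; data and scoring function as above):

split_scores = by_split(
  data = example_data,
  participants = example_data$participant_id,
  fn_score = example_score,
  replications = 1000,
  method = "random",
  replace = TRUE,
  split_p = 1,
  ncores = 1
)
# split_coefs now returns a vector of 1000 correlations; one per replication
replication_cors = split_coefs(split_scores, cor)
# Summarize, for instance, via the mean
mean(replication_cors)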
-
@ShawGoGo, a question (and concern) about Monte Carlo splitting came up in this thread. Perhaps it is of interest to you?