
Effect sizes not calculated from F (and t?) values #68

Open · lottiegasp opened this issue Dec 16, 2020 · 5 comments
Labels: bug (Something isn't working)

lottiegasp (Collaborator) commented Dec 16, 2020

It looks like rows with effect sizes only show up on the website if the effect sizes were calculated from means and SDs, not from F or t values. (I have checked that this is the case in both Fleur's and my datasets.)

I'm not sure whether this is a problem with the script itself or with its deployment, so I'm assigning @christinabergmann and @erikriverson. Let me know if I can help.

erikriverson (Collaborator) commented Dec 16, 2020

@lottiegasp can you provide a screenshot or more description of the issue? I'm not yet very familiar with the effect size calculation, but if I can see an example of what's not working, I may be able to assist. Thanks!

lottiegasp (Collaborator, Author) commented Dec 16, 2020

For example, NaturalSpeechSegmentation_Neuro should have 26 effect sizes, but only 6 show up on the MetaLab website. These are the six where means and standard deviations are included (rows 17-22, columns BB to BE); another 20 should be calculable from the F values (column BI). (There are also 8 rows where no effect size can be calculated because none of the means, SDs, t, or F values are available; this is expected.)

The effect size calculation code is meant to include an if-then structure such that effect sizes are calculated from means and SDs if they are available, and otherwise from the reported t or F values (see the code below; this is the code I used in my meta-analysis, which was based on the code from a previous MetaLab study). But there must be an error somewhere, because the effect sizes are not being calculated from t or F values.

for (line in 1:length(db$n_1)) {
  if (db[line,]$participant_design == "within_two") {
    if (complete.cases(db[line,]$x_1, db[line,]$x_2, db[line,]$SD_1, db[line,]$SD_2)) {
      # Lipsey & Wilson, 3.14
      pooled_SD <- sqrt((db[line,]$SD_1 ^ 2 + db[line,]$SD_2 ^ 2) / 2)
      db[line,]$d_calc <- (db[line,]$x_2 - db[line,]$x_1) / pooled_SD
      db[line,]$es_method <- "group_means_two"
    } else if (complete.cases(db[line,]$t)) {
      # Dunlap et al., 1996, p. 171
      wc <- sqrt(2 * (1 - db[line,]$corr))
      db[line,]$d_calc <- (db[line,]$t / sqrt(db[line,]$n_1)) * wc
      db[line,]$es_method <- "t_two"
    } else if (complete.cases(db[line,]$F)) {
      wc <- sqrt(2 * (1 - db[line,]$corr))
      db[line,]$d_calc <- sqrt(abs(db[line,]$F) / db[line,]$n_1) * wc
      db[line,]$es_method <- "F_two"
    }
    # Next step: effect size variance (needed for weighting the effect sizes)
    # Lipsey & Wilson (2001)
    if (complete.cases(db[line,]$n_1, db[line,]$d_calc)) {
      db[line,]$d_var_calc <- (2 * (1 - db[line,]$corr) / db[line,]$n_1) +
        (db[line,]$d_calc ^ 2 / (2 * db[line,]$n_1))
    }
  } else if (db[line,]$participant_design == "between") {
    if (complete.cases(db[line,]$x_1, db[line,]$x_2, db[line,]$SD_1, db[line,]$SD_2)) {
      # Lipsey & Wilson, 3.14
      pooled_SD <- sqrt(((db[line,]$n_1 - 1) * db[line,]$SD_1 ^ 2 +
                         (db[line,]$n_2 - 1) * db[line,]$SD_2 ^ 2) /
                        (db[line,]$n_1 + db[line,]$n_2 - 2))
      db[line,]$d_calc <- (db[line,]$x_2 - db[line,]$x_1) / pooled_SD
    } else if (complete.cases(db[line,]$t)) {
      # Lipsey & Wilson (2001)
      db[line,]$d_calc <- db[line,]$t * sqrt((db[line,]$n_1 + db[line,]$n_2) /
                                             (db[line,]$n_1 * db[line,]$n_2))
    } else if (complete.cases(db[line,]$F)) {
      # Lipsey & Wilson (2001)
      db[line,]$d_calc <- sqrt(abs(db[line,]$F) * (db[line,]$n_1 + db[line,]$n_2) /
                               (db[line,]$n_1 * db[line,]$n_2))
    }
    if (complete.cases(db[line,]$n_1, db[line,]$n_2, db[line,]$d_calc)) {
      # now that the effect size is calculated, its variance is calculated
      # (the "+" must end the line so R parses the two terms as one expression)
      db[line,]$d_var_calc <- ((db[line,]$n_1 + db[line,]$n_2) /
                               (db[line,]$n_1 * db[line,]$n_2)) +
        (db[line,]$d_calc ^ 2 / (2 * (db[line,]$n_1 + db[line,]$n_2)))
    }
  } else if (db[line,]$participant_design == "within_one") {
    if (complete.cases(db[line,]$x_1, db[line,]$x_2, db[line,]$SD_1)) {
      db[line,]$d_calc <- (db[line,]$x_2 - db[line,]$x_1) / db[line,]$SD_1
      db[line,]$es_method <- "group_means_one"
    } else if (complete.cases(db[line,]$t)) {
      db[line,]$d_calc <- db[line,]$t / sqrt(db[line,]$n_1)
      db[line,]$es_method <- "t_one"
    }
    if (complete.cases(db[line,]$n_1, db[line,]$d_calc)) {
      db[line,]$d_var_calc <- (2 / db[line,]$n_1) +
        (db[line,]$d_calc ^ 2 / (2 * db[line,]$n_1))
    }
  }
}
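For concreteness, a small worked example of the F_two branch above, using made-up values for F, n_1, and corr (these numbers are for illustration only, not from the dataset):

F_val <- 9.6   # hypothetical F statistic
n_1   <- 24    # hypothetical sample size
corr  <- 0.5   # hypothetical correlation between the two measures

wc     <- sqrt(2 * (1 - corr))          # Dunlap et al. (1996) correction: 1 here
d_calc <- sqrt(abs(F_val) / n_1) * wc   # sqrt(9.6 / 24) * 1 = 0.632
d_var  <- (2 * (1 - corr) / n_1) +
  (d_calc ^ 2 / (2 * n_1))              # 0.0417 + 0.0083 = 0.05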

lottiegasp (Collaborator, Author) commented Jan 7, 2021

@christinabergmann and @shotsuji, we figured out that in the case of Fleur's neuro segmentation dataset, effect sizes aren't being calculated because there are no corr values in the spreadsheet. The code imputes new corr values based on existing ones, but if no corr values exist in the first place, there is no fallback imputation method; see the code below.

From tidy_dataset, line 21:

# Impute values for missing correlations
set.seed(111)
# First we replace corr values outside the range (.01, .99) with NA
dataset_data = dataset_data %>%
  mutate(corr = abs(corr)) %>%
  mutate(corr = ifelse(corr > .99 | corr < .01, NA, corr))
# Then impute NA values
if (all(is.na(dataset_data$corr))) {
  dataset_data$corr_imputed <- NA
  ## corr_imputed = NA if no original corr values are in the dataset
} else {
  dataset_data$corr_imputed <- dataset_data$corr %>%
    Hmisc::impute(fun = "random") %>%
    as.numeric()
}
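For reference, Hmisc::impute(fun = "random") replaces each NA with a random draw from the non-missing values of the vector. A toy example, not MetaLab data:

library(Hmisc)
set.seed(111)
x <- c(0.3, NA, 0.5, NA, 0.7)
as.numeric(impute(x, fun = "random"))  # NAs filled with draws from {0.3, 0.5, 0.7}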

Then from compute_es, line 40:

} else if (participant_design == "within_two") {
  if (is.na(corr) & complete(x_1, x_2, SD_1, SD_2, t)) {
    # Use raw means, SDs, and t-values to calculate correlations
    corr = (SD_1^2 + SD_2^2 - (n_1 * (x_1 - x_2)^2 / t^2)) / (2 * SD_1 * SD_2)
  }
  if (is.na(corr) | corr > .99 | corr < .01) {
    # if the correlation between the two measures is not reported, use an imputed value
    # we also account for the observation that some re-calculated values are impossible, and replace those
    corr <- corr_imputed  ## if no original corr values are in the dataset (as above), then corr = NA
  }

And from line 72:

  } else if (complete(t)) {
    wc <- sqrt(2 * (1 - corr))      ## corr = NA, so wc = NA, d_calc = NA, and no effect size is calculated
    d_calc <- (t / sqrt(n_1)) * wc  # Dunlap et al., 1996, p. 171
    es_method <- "t_two"
  } else if (complete(f)) {
    wc <- sqrt(2 * (1 - corr))      ## corr = NA, so wc = NA, d_calc = NA, and no effect size is calculated
    d_calc <- sqrt(f / n_1) * wc
    es_method <- "f_two"
  }
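The mechanics here are plain NA propagation in R: any arithmetic involving NA returns NA. A minimal illustration, with made-up t and n_1 values:

corr <- NA
wc <- sqrt(2 * (1 - corr))       # NA
d_calc <- (2.5 / sqrt(20)) * wc  # NA, so no effect size is ever filled in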

Should we perhaps adjust the code so that, if no corr values exist in the dataset, corr values from across all the MetaLab datasets are used for imputation?

Relatedly, I noticed that the MetaLab script's current method for imputing missing corr values is to randomly sample from the other existing values in the dataset (the Hmisc::impute(fun = "random") call above). For my study I imputed from a normal distribution using the median and variance of the existing corr values, then adjusted the range to [-1, 1]. Rabagliati et al. (2018) imputed the mean weighted by sample size. Are you happy with the current method for imputing corr, or should we discuss what is currently considered best practice?
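To make the comparison concrete, here is a rough sketch of the three approaches, assuming a numeric corr vector containing NAs and a parallel n_1 vector of sample sizes (the variable names are mine, not from the MetaLab scripts):

library(Hmisc)
set.seed(111)

obs <- corr[!is.na(corr)]  # observed (non-missing) correlations

# (a) Current MetaLab method: replace each NA with a random draw
#     from the observed values
corr_a <- as.numeric(Hmisc::impute(corr, fun = "random"))

# (b) My method: draw from a normal distribution parameterised by the
#     median and SD of the observed values, then clamp to [-1, 1]
corr_b <- corr
corr_b[is.na(corr)] <- pmin(pmax(
  rnorm(sum(is.na(corr)), mean = median(obs), sd = sd(obs)), -1), 1)

# (c) Rabagliati et al. (2018): a single sample-size-weighted mean
corr_c <- corr
corr_c[is.na(corr)] <- weighted.mean(obs, w = n_1[!is.na(corr)])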

christinabergmann (Collaborator) commented:

Oh, good catch.

How about the following:

  1. If there are no corr values in a dataset, take MetaLab values where possible from a matched sample with a similar age (so as not to take values from 4-month-olds for 4-year-olds, and vice versa) and the same method? Maybe method alone is enough, too.
  2. Work on a short report comparing the methods (mean and SD vs. sampling, Rabagliati, method-matching, age-matching) for the site (and maybe a paper in the long run)? In that context, maybe also review the literature for other options?
  3. After 2., decide which method fits best?

lottiegasp (Collaborator, Author) commented:

Thanks @christinabergmann

@erikriverson, as a quick solution, are you able to work on some code that, for each row of data missing a corr value, extracts corr values from all MetaLab datasets with the same method and a similar age (maybe where mean_age is within +/- 15 days of the mean_age of the row in question?) into an object (named CORRS_FROM_ALL_DATASETS below) and then imputes a corr value in the same way?

# Impute values for missing correlations
set.seed(111)
# First we replace corr values outside the range (.01, .99) with NA
dataset_data = dataset_data %>%
  mutate(corr = abs(corr)) %>%
  mutate(corr = ifelse(corr > .99 | corr < .01, NA, corr))
# Then impute NA values
if (all(is.na(dataset_data$corr))) {
  dataset_data$corr_imputed <- CORRS_FROM_ALL_DATASETS %>%
    Hmisc::impute(fun = "random") %>%  ## same imputation method as below, when there are corr values in the dataset
    as.numeric()
} else {
  dataset_data$corr_imputed <- dataset_data$corr %>%
    Hmisc::impute(fun = "random") %>%
    as.numeric()
}
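For what it's worth, a minimal sketch of how CORRS_FROM_ALL_DATASETS might be assembled, assuming a combined data frame all_data with corr, method, and mean_age columns (the object and column names are my guesses, not the actual MetaLab schema):

library(dplyr)

# Pool corr values from rows in any MetaLab dataset with the same method
# and a mean_age within +/- 15 days of the target row
get_matched_corrs <- function(target_row, all_data, age_window = 15) {
  all_data %>%
    filter(method == target_row$method,
           abs(mean_age - target_row$mean_age) <= age_window,
           !is.na(corr), corr >= .01, corr <= .99) %>%
    pull(corr)
}

# Imputation would then be a random draw from the pooled values, e.g.:
# corr_imputed <- sample(get_matched_corrs(dataset_data[i, ], all_data), 1)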

Let me know if this makes sense.

Then at a later time I can work on points 2 and 3 as suggested by Christina. This will likely be in February, after the conference.

lottiegasp added the bug (Something isn't working) label on Feb 16, 2023