---
title: "Replication prediction intervals"
author: "Matthew Kay"
date: "12/7/2021"
output: github_document
---
## Setup

Libraries needed:

```{r setup, message=FALSE}
library(ggplot2)
library(purrr)
library(dplyr)
library(tidyr)
library(forcats)
library(broom)
library(ggdist)
```
## Data generation

We'll generate data from a bunch of hypothetical original experiments ("orig") and
replications ("rep"), then calculate three replication criteria:

- Is the replication effect in the 95% CI of the original?
- Is the original effect in the 95% CI of the replication?
- Is the replication effect in the 95% prediction interval of the original?

We'll do this at several original sample sizes, always assuming the replication has
a sample size of 100 (for simplicity):

```{r}
set.seed(1234)
exp_per_sample_size = 10000
sample_size = round(seq(20, 100, length.out = 10))
rep_sample_size = 100
results = expand.grid(
  exp = 1:exp_per_sample_size,
  sample_size = sample_size
) %>%
  # convert to tibble so that nested data frames don't complain
  as_tibble() %>%
  # generate data
  mutate(
    orig = map_dfr(sample_size, \(n) {
      x = rnorm(n)
      result = tidy(t.test(x))
      # add on replication prediction intervals
      # (Spence and Stanley 2016, https://doi.org/10.1371/journal.pone.0162874)
      var = var(x)
      pi_plusminus = qt(.975, n - 1) * sqrt(var/n + var/rep_sample_size)
      result %>% mutate(
        pi.low = estimate - pi_plusminus,
        pi.high = estimate + pi_plusminus
      )
    }),
    rep = map_dfr(sample_size, \(n) tidy(t.test(rnorm(rep_sample_size))))
  ) %>%
  # calculate the three criteria
  mutate(
    rep_in_orig = orig$conf.low <= rep$estimate & rep$estimate <= orig$conf.high,
    orig_in_rep = rep$conf.low <= orig$estimate & orig$estimate <= rep$conf.high,
    rep_in_pi = orig$pi.low <= rep$estimate & rep$estimate <= orig$pi.high
  )

results_summary = results %>%
  group_by(sample_size) %>%
  summarise(across(rep_in_orig:rep_in_pi, mean))

results_summary
```
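
To make the prediction-interval formula concrete, here is how it applies to a single hypothetical original/replication pair, outside the simulation pipeline. The study sizes and effect (n = 30 original, n = 100 replication, true mean 0.3) are illustrative choices, not values from the analysis above:

```{r}
# one hypothetical original study and its replication
orig_x = rnorm(30, mean = 0.3)
rep_x = rnorm(100, mean = 0.3)

orig_mean = mean(orig_x)
orig_var = var(orig_x)

# 95% prediction interval for the replication estimate
# (Spence and Stanley 2016): estimate +/- t_{.975, n-1} * sqrt(var/n_orig + var/n_rep)
pi_plusminus = qt(.975, 30 - 1) * sqrt(orig_var/30 + orig_var/100)
c(orig_mean - pi_plusminus, orig_mean + pi_plusminus)

# does the replication estimate fall inside it?
mean(rep_x) >= orig_mean - pi_plusminus && mean(rep_x) <= orig_mean + pi_plusminus
```

Note the interval is wider than the original 95% CI: it accounts for sampling error in *both* studies (`var/n_orig + var/n_rep`), which is why it stays calibrated where the CI-overlap criteria do not.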
Plotting these, a well-calibrated criterion should sit at 95% and should not be
sensitive to the original sample size:
```{r, fig.width = 7.1, fig.height = 4}
results_summary %>%
  pivot_longer(rep_in_orig:rep_in_pi, names_to = "Criterion", values_to = "proportion") %>%
  mutate(Criterion = fct_rev(fct_recode(Criterion,
    "Original in rep 95% CI" = "orig_in_rep",
    "Rep in original 95% CI" = "rep_in_orig",
    "Rep in 95% prediction interval" = "rep_in_pi"
  ))) %>%
  ggplot(aes(x = sample_size, y = proportion, color = Criterion)) +
  geom_line(size = 1) +
  xlab("Original sample size") +
  ylab("Proportion of\nreplications\nmatching each\ncriterion") +
  ggtitle(
    'Judging replication "success"',
    'Original and replication samples drawn from the same Gaussian\ndistribution with a replication sample size of 100'
  ) +
  scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, by = 0.1)) +
  theme_ggdist() +
  axis_titles_bottom_left() +
  theme(
    axis.title.y = element_text(hjust = 1),
    panel.grid.major = element_line(color = "gray95"),
    legend.justification = c(0, 1)
  ) +
  coord_cartesian(expand = FALSE, clip = "off") +
  scale_color_brewer(palette = "Set2")
```
We can see that the only criterion that consistently judges replication "success"
at the nominal 95% rate, regardless of the original sample size, is whether the
replication effect falls in the 95% prediction interval of the original.