-
Notifications
You must be signed in to change notification settings - Fork 1
/
minimization_randomization_comparison.Rmd
649 lines (507 loc) · 30.6 KB
/
minimization_randomization_comparison.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
---
title: "Benchmarking randomization methods"
author: "Aleksandra Duda^[Tranistion Technologies Science], Jagoda Głowacka-Walas^[Tranistion Technologies Science], Michał Seweryn^[Uniwersytet Łódzki]"
date: "`r Sys.Date()`"
output:
html_document:
toc: yes
toc_float: yes
bibliography: references.bib
link-citations: true
---
<style>
p {
text-align: justify
}
</style>
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE
)
```
# Introduction
Randomization in clinical trials is the gold standard and is widely considered the best design for evaluating the effectiveness of new treatments compared to alternative treatments (standard of care) or placebo. Indeed, the selection of an appropriate randomisation is as important as the selection of an appropriate statistical analysis for the study and the analysis strategy, whether based on randomisation or on a population model (@berger2021roadmap).
One of the primary advantages of randomization, particularly simple randomization (usually using flipping a coin method), is its ability to balance confounding variables across treatment groups. This is especially effective in large sample sizes (n \> 200), where the random allocation of participants helps to ensure that both known and unknown confounders are evenly distributed between the study arms. This balanced distribution contributes significantly to the internal validity of the study, as it minimizes the risk of selection bias and confounding influencing the results (@lim2019randomization).
It's important to note, however, that while simple randomization is powerful in large trials, it may not always guarantee an even distribution of confounding factors in trials with smaller sample sizes (n \< 100). In such cases, the random allocation might result in imbalances in baseline characteristics between groups, which can affect the interpretation of the treatment's effectiveness. This potential limitation sets the stage for considering additional methods, such as stratified randomization, or dynamic minimization algorithms to address these challenges in smaller trials (@kang2008issues).
This document provides a summary of the comparison of three randomization methods: simple randomization, block randomization, and adaptive randomization. Simple randomization and adaptive randomization (minimization method) are tools available in the `unbiased` package as `randomize_simple` and `randomize_minimisation_pocock` functions (@unbiased). The comparison aims to demonstrate the superiority of adaptive randomization (minimization method) over other methods in assessing the least imbalance of accompanying variables between therapeutic groups. Monte Carlo simulations were used to generate data, utilizing the `simstudy` package (@goldfeld2020simstudy). Parameters for the binary distribution of variables were based on data from the publication by @mrozikiewicz2023allogenic and information from researchers.
The document structure is as follows: first, based on the defined parameters, data will be simulated using the Monte Carlo method for a single simulation; then, for the generated patient data, appropriate groups will be assigned to them using three randomization methods; these data will be summarized in the form of descriptive statistics along with the relevant statistical test; next, data prepared in .Rds format generated for 1000 simulations will be loaded., the results based on the standardised mean difference (SMD) test will be discussed in visual form (boxplot, violin plot) and as a percentage of success achieved in each method for the given precision (tabular summary)
```{r setup, warning = FALSE, message=FALSE}
# load packages
library(unbiased)
library(dplyr)
library(simstudy)
library(tableone)
library(ggplot2)
library(gt)
library(gtsummary)
library(truncnorm)
library(tidyr)
library(randomizeR)
```
# The randomization methods considered for comparison
In the process of comparing the balance of covariates among randomization methods, three randomization methods have been selected for evaluation:
- **simple randomization** - simple coin toss, algorithm that gives participants equal chances of being assigned to a particular arm. The method's advantage lies in its simplicity and the elimination of predictability. However, due to its complete randomness, it may lead to imbalance in sample sizes between arms and imbalances between prognostic factors. For a large sample size (n \> 200), simple randomisation gives a similar number of generated participants in each group. For a small sample size (n \< 100), it results in an imbalance (@kang2008issues).
- **block randomization** - a randomization method that takes into account defined covariates for patients. The method involves assigning patients to therapeutic arms in blocks of a fixed size, with the recommendation that the blocks have different sizes. This, to some extent, reduces the risk of researchers predicting future arm assignments. In contrast to simple randomization, the block method aims to balance the number of patients within the block, hence reducing the overall imbalance between arms (@rosenberger2015randomization).
- **adaptive randomization using minimization method** based on @pocock1975sequential algorithm - - this randomization approach aims to balance prognostic factors across treatment arms within a clinical study. It functions by evaluating the total imbalance of these factors each time a new patient is considered for the study. The minimization method computes the overall imbalance for each potential arm assignment of the new patient, considering factors like variance or other specified criteria. The patient is then assigned to the arm where their addition results in the smallest total imbalance. This assignment is not deterministic but is made with a predetermined probability, ensuring some level of randomness in arm allocation. This method is particularly useful in trials with multiple prognostic factors or in smaller studies where traditional randomization might fail to achieve balance.
# Assessment of covariate balance
In the proposed approach to the assessment of randomization methods, the primary objective is to evaluate each method in terms of achieving balance in the specified covariates. The assessment of balance aims to determine whether the distributions of covariates are similarly balanced in each therapeutic group. Based on the literature, standardized mean differences (SMD) have been employed for assessing balance (@berger2021roadmap).
The SMD method is one of the most commonly used statistics for assessing the balance of covariates, regardless of the unit of measurement. It is a statistical measure for comparing differences between two groups. The covariates in the examined case are expressed as binary variables. In the case of categorical variables, SMD is calculated using the following formula (@zhang2019balance):
$$ SMD = \frac{{p_1 - p_2}}{{\sqrt{\frac{{p_1 \cdot (1 - p_1) + p_2 \cdot (1 - p_2)}}{2}}}} $$,
where:
- $p_1$ is the proportion in the first arm,
- $p_2$ is the proportion in the second arm.
# Definied number of patients
In this simulation, we are using a real use case - the planned FootCell study - non-commercial clinical research in the area of civilisation diseases - to guide our data generation process. For the FootCell study, it is anticipated that a total of 105 patients will be randomized into the trial. These patients will be equally divided among three research groups - Group A, Group B, and Group C - with each group comprising 35 patients.
```{r, define-parameters}
# defined number of patients
n <- 105
```
# Defining parameters for Monte-Carlo simulation
The distribution of parameters for individual covariates, which will subsequently be used to validate randomization methods, has been defined using the publication @mrozikiewicz2023allogenic on allogenic interventions..
The publication describes the effectiveness of comparing therapy using ADSC (Adipose-Derived Stem Cells) gel versus standard therapy with fibrin gel for patients in diabetic foot ulcer treatment. The FootCell study also aims to assess the safety of advanced therapy involving live ASCs (Adipose-Derived Stem Cells) in the treatment of diabetic foot syndrome, considering two groups treated with ADSCs (one or two administrations) compared to fibrin gel. Therefore, appropriate population data have been extracted from the publication to determine distributions that can be maintained when designing the FootCell study.
In the process of defining the study for randomization, the following covariates have been selected:
- **gender** [male/female],
- **diabetes type** [type I/type II],
- **HbA1c** [up to 9/9 to 11] [%],
- **tpo2** [up to 50/above 50] [mmHg],
- **age** [up to 55/above 55] [years],
- **wound size** [up to 2/above 2] [cm$^2$].
In the case of the variables gender and diabetes type in the publication @mrozikiewicz2023allogenic, they were expressed in the form of frequencies. The remaining variables were presented in terms of measures of central tendency along with an indication of variability, as well as minimum and maximum values. To determine the parameters for the binary distribution, the truncated normal distribution available in the `truncnorm` package was utilized. The truncated normal distribution is often used in statistics and probability modeling when dealing with data that is constrained to a certain range. It is particularly useful when you want to model a random variable that cannot take values beyond certain limits (@burkardt2014truncated).
To generate the necessary information for the remaining covariates, a function `simulate_proportions_trunc` was written, utilizing the `rtruncnorm function` (@truncnorm). The parameters `mean`, `sd`, `lower`, `upper` were taken from the publication and based on expertise regarding the ranges for the parameters.
The results are presented in a table, assuming that the outcome refers to the first category of each parameter.
```{r, simulate-proportions-function}
# simulate parameters using truncated normal distribution
simulate_proportions_trunc <-
function(n, lower, upper, mean, sd, threshold) {
simulate_data <-
rtruncnorm(
n = n,
a = lower,
b = upper,
mean = mean,
sd = sd
) <= threshold
sum(simulate_data == TRUE) / n
}
```
```{r, parameters-result-table, tab.cap = "Summary of literature verification about strata selected parameters (Mrozikiewicz-Rakowska et. al., 2023)"}
set.seed(123)
data.frame(
hba1c = simulate_proportions_trunc(1000, 0, 11, 7.41, 1.33, 9),
tpo2 = simulate_proportions_trunc(1000, 30, 100, 53.4, 18.4, 50),
age = simulate_proportions_trunc(1000, 0, 100, 59.2, 9.7, 55),
wound_size = simulate_proportions_trunc(1000, 0, 20, 2.7, 2.28, 2)
) |>
rename("wound size" = wound_size) |>
pivot_longer(
cols = everything(),
names_to = "parametr",
values_to = "proportions"
) |>
mutate("first catogory of strata" = c("<=9", "<=50", "<=55", "<=2")) |>
gt()
```
# Generate data using Monte-Carlo simulations
Monte-Carlo simulations were used to accumulate the data. This method is designed to model variables based on defined parameters. Variables were defined using the `simstudy` package, utilizing the `defData` function (@goldfeld2020simstudy). As all variables specify proportions, `dist = 'binary'` was used to define the variables. Due to the likely association between the type of diabetes and age -- meaning that the older the patient, the higher the probability of having type II diabetes -- a relationship with diabetes was established when defining the `age` variable using a logit function `link = "logit"`. The proportions for gender and diabetes were defined by the researchers and were consistent with the literature @mrozikiewicz2023allogenic.
Using `genData` function from `simstudy` package, a data frame (**data**) was generated with an artificially adopted variable `arm`, which will be filled in by subsequent randomization methods in the arm allocation process for all `n` patients.
```{r, defdata}
# defining variables
# male - 0.9
def <- simstudy::defData(varname = "sex", formula = "0.9", dist = "binary")
# type I - 0.15
def <- simstudy::defData(def, varname = "diabetes_type", formula = "0.15", dist = "binary")
# <= 9 - 0.888
def <- simstudy::defData(def, varname = "hba1c", formula = "0.888", dist = "binary")
# <= 50 - 0.354
def <- simstudy::defData(def, varname = "tpo2", formula = "0.354", dist = "binary")
# correlation with diabetes type
def <- simstudy::defData(
def,
varname = "age", formula = "(diabetes_type == 0) * (-0.95)", link = "logit", dist = "binary"
)
# <= 2 - 0.302
def <- simstudy::defData(def, varname = "wound_size", formula = "0.302", dist = "binary")
```
```{r, create-data}
# generate data using genData()
data <-
genData(n, def) |>
mutate(
sex = as.character(sex),
age = as.character(age),
diabetes_type = as.character(diabetes_type),
hba1c = as.character(hba1c),
tpo2 = as.character(tpo2),
wound_size = as.character(wound_size)
) |>
as_tibble()
```
```{r, data-generate}
# add arm to tibble
data <-
data |>
tibble::add_column(arm = "")
```
```{r, data-show}
# first 5 rows of the data
head(data, 5) |>
gt()
```
# Minimization randomization
To generate appropriate research arms, a function called `minimize_results` was written, utilizing the `randomize_minimisation_pocock` function available within the `unbiased` package (@unbiased). The probability parameter was set at the level defined within the function (p = 0.85). In the case of minimization randomization, to verify which type of minimization (with equal weights or unequal weights) was used, three calls to the minimize_results function were prepared:
- **minimize_equal_weights** - each covariate weight takes a value equal to 1 divided by the number of covariates. In this case, the weight is 1/6,
- **minimize_unequal_weights** - following the expert assessment by physicians, parameters with potentially significant impact on treatment outcomes (hba1c, tpo2, wound size) have been assigned a weight of 2. The remaining covariates have been assigned a weight of 1.
- **minimize_unequal_weights_3** - following the expert assessment by physicians, parameters with potentially significant impact on treatment outcomes (hba1c, tpo2, wound size) have been assigned a weight of 3. The remaining covariates have been assigned a weight of 1.
The tables present information about allocations for the first 5 patients.
```{r, minimize-results}
# drawing an arm for each patient
minimize_results <-
function(current_data, arms, weights) {
for (n in seq_len(nrow(current_data))) {
current_state <- current_data[1:n, 2:ncol(current_data)]
current_data$arm[n] <-
randomize_minimisation_pocock(
arms = arms,
current_state = current_state,
weights = weights
)
}
return(current_data)
}
```
```{r, minimize-equal}
set.seed(123)
# eqal weights - 1/6
minimize_equal_weights <-
minimize_results(
current_data = data,
arms = c("armA", "armB", "armC")
)
head(minimize_equal_weights, 5) |>
gt()
```
```{r, minimize-unequal-1}
set.seed(123)
# double weights where the covariant is of high clinical significance
minimize_unequal_weights <-
minimize_results(
current_data = data,
arms = c("armA", "armB", "armC"),
weights = c(
"sex" = 1,
"diabetes_type" = 1,
"hba1c" = 2,
"tpo2" = 2,
"age" = 1,
"wound_size" = 2
)
)
head(minimize_unequal_weights, 5) |>
gt()
```
```{r, minimize-unequal-2}
set.seed(123)
# triple weights where the covariant is of high clinical significance
minimize_unequal_weights_3 <-
minimize_results(
current_data = data,
arms = c("armA", "armB", "armC"),
weights = c(
"sex" = 1,
"diabetes_type" = 1,
"hba1c" = 3,
"tpo2" = 3,
"age" = 1,
"wound_size" = 3
)
)
head(minimize_unequal_weights_3, 5) |>
gt()
```
The `statistic_table` function was developed to provide information on: the distribution of the number of patients across research arms, and the distribution of covariates across research arms, along with p-value information for statistical analyses used to compare proportions - chi\^2, and the exact Fisher's test, typically used for small samples.
The function relies on the use of the `tbl_summary` function available in the `gtsummary` package (@gtsummary).
```{r, statistics-table}
# generation of frequency and chi^2 statistic values or fisher exact test
statistics_table <-
function(data) {
data |>
mutate(
sex = ifelse(sex == "1", "men", "women"),
diabetes_type = ifelse(diabetes_type == "1", "type1", "type2"),
hba1c = ifelse(hba1c == "1", "<=9", "(9,11>"),
tpo2 = ifelse(tpo2 == "1", "<=50", ">50"),
age = ifelse(age == "1", "<=55", ">55"),
wound_size = ifelse(wound_size == "1", "<=2", ">2")
) |>
tbl_summary(
include = c(sex, diabetes_type, hba1c, tpo2, age, wound_size),
by = arm
) |>
modify_header(label = "") |>
modify_header(all_stat_cols() ~ "**{level}**, N = {n}") |>
bold_labels() |>
add_p()
}
```
The table presents a statistical summary of results for the first iteration for:
- **Minimization with all weights equal to 1/6**.
```{r, chi2-1, tab.cap = "Summary of proportion test for minimization randomization with equal weights"}
statistics_table(minimize_equal_weights)
```
- **Minimization with weights 2:1**.
```{r, chi2-2, tab.cap = "Summary of proportion test for minimization randomization with equal weights"}
statistics_table(minimize_unequal_weights)
```
- **Minimization with weights 3:1**.
```{r, chi2-3, tab.cap = "Summary of proportion test for minimization randomization with equal weights"}
statistics_table(minimize_unequal_weights_3)
```
# Simple randomization
In the next step, appropriate arms were generated for patients using simple randomization, available through the `unbiased` package - the `randomize_simple` function (@unbiased). The `simple_results` function was called within `simple_data`, considering the initial assumption of assigning patients to three arms in a 1:1:1 ratio.
Since this is simple randomization, it does not take into account the initial covariates, and treatment assignment occurs randomly (flip coin method). The tables illustrate an example of data output and summary statistics including a summary of the statistical tests.
```{r, simple-result}
# simple randomization
simple_results <-
function(current_data, arms, ratio) {
for (n in seq_len(nrow(current_data))) {
current_data$arm[n] <-
randomize_simple(arms, ratio)
}
return(current_data)
}
```
```{r, simple-data}
set.seed(123)
simple_data <-
simple_results(
current_data = data,
arms = c("armA", "armB", "armC"),
ratio = c("armB" = 1L, "armA" = 1L, "armC" = 1L)
)
head(simple_data, 5) |>
gt()
```
```{r, chi2-4, tab.cap = "Summary of proportion test for simple randomization"}
statistics_table(simple_data)
```
# Block randomization
Block randomization, as opposed to minimization and simple randomization methods, was developed based on the `rbprPar` function available in the `randomizeR` package (@randomizeR). Using this, the `block_rand` function was created, which, based on the defined number of patients, arms, and a list of stratifying factors, generates a randomization list with a length equal to the number of patients multiplied by the product of categories in each covariate. In the case of the specified data in the document, for one iteration, it amounts to **105 \* 2\^6 = 6720 rows**. This ensures that there is an appropriate number of randomisation codes for each opportunity. In the case of equal characteristics, it is certain that there are the right number of codes for the defined `n` patients.
Based on the `block_rand` function, it is possible to generate a randomisation list, based on which patients will be allocated, with characteristics from the output `data` frame. Due to the 3 arms and the need to blind the allocation of consecutive patients, block sizes 3,6 and 9 were used for the calculations.
In the next step, patients were assigned to research groups using the `block_results` function (based on the list generated by the function `block_rand`). A first available code from the randomization list that meets specific conditions is selected, and then it is removed from the list of available codes. Based on this, research arms are generated to ensure the appropriate number of patients in each group (based on the assumed ratio of 1:1:1).
The tables show the assignment of patients to groups using block randomisation and summary statistics including a summary of the statistical tests.
```{r, block-rand}
# Function to generate a randomisation list
block_rand <-
function(n, block, n_groups, strata, arms = LETTERS[1:n_groups]) {
strata_grid <- expand.grid(strata)
strata_n <- nrow(strata_grid)
ratio <- rep(1, n_groups)
gen_seq_list <- lapply(seq_len(strata_n), function(i) {
rand <- rpbrPar(
N = n,
rb = block,
K = n_groups,
ratio = ratio,
groups = arms,
filledBlock = FALSE
)
getRandList(genSeq(rand))[1, ]
})
df_list <- tibble::tibble()
for (i in seq_len(strata_n)) {
local_df <- strata_grid |>
dplyr::slice(i) |>
dplyr::mutate(count_n = n) |>
tidyr::uncount(count_n) |>
tibble::add_column(rand_arm = gen_seq_list[[i]])
df_list <- rbind(local_df, df_list)
}
return(df_list)
}
```
```{r, block-results}
# Generate a research arm for patients in each iteration
block_results <- function(current_data) {
simulation_result <-
block_rand(
n = n,
block = c(3, 6, 9),
n_groups = 3,
strata = list(
sex = c("0", "1"),
diabetes_type = c("0", "1"),
hba1c = c("0", "1"),
tpo2 = c("0", "1"),
age = c("0", "1"),
wound_size = c("0", "1")
),
arms = c("armA", "armB", "armC")
)
for (n in seq_len(nrow(current_data))) {
# "-1" is for "arm" column
current_state <- current_data[n, 2:(ncol(current_data) - 1)]
matching_rows <- which(apply(
simulation_result[, -ncol(simulation_result)], 1,
function(row) all(row == current_state)
))
if (length(matching_rows) > 0) {
current_data$arm[n] <-
simulation_result[matching_rows[1], "rand_arm"]
# Delete row from randomization list
simulation_result <- simulation_result[-matching_rows[1], , drop = FALSE]
}
}
return(current_data)
}
```
```{r, block-data-show}
set.seed(123)
block_data <-
block_results(data)
head(block_data, 5) |>
gt()
```
```{r, chi2-5, tab.cap = "Summary of proportion test for simple randomization"}
statistics_table(block_data)
```
# Generate 1000 simulations
We have performed 1000 iterations of data generation with parameters defined above. The number of iterations indicates the number of iterations included in the Monte-Carlo simulations to accumulate data for the given parameters. This allowed for the generation of data 1000 times for 105 patients to more efficiently assess the effect of randomization methods in the context of covariate balance.
These data were assigned to the variable `sim_data` based on the data stored in the .Rds file `1000_sim_data.Rds`, available within the vignette information on the GitHub repository of the `unbiased` package.
```{r, simulations}
# define number of iterations
# no_of_iterations <- 1000 # nolint
# define number of cores
# no_of_cores <- 20 # nolint
# perform simulations (run carefully!)
# source("~/unbiased/vignettes/helpers/run_parallel.R") # nolint
# read data from file
sim_data <- readRDS("1000_sim_data.Rds")
```
# Check balance using smd test
In order to select the test and define the precision at a specified level, above which we assume no imbalance, a literature analysis was conducted based on publications such as @lee2021estimating, @austin2009balance, @doah2021impact, @brown2020novel, @nguyen2017double, @sanchez2003effect, @lee2022propensity, @berger2021roadmap.
To assess the balance for covariates between the research groups A, B, C, the Standardized Mean Difference (SMD) test was employed, which compares two groups. Since there are three groups in the example, the SMD test is computed for each pair of comparisons: A vs B, A vs C, and B vs C. The average SMD test for a given covariate is then calculated based on these comparisons.
In the literature analysis, the precision level ranged between 0.1-0.2. For small samples, it was expected that the SMD test would exceed 0.2 (@austin2009balance). Additionally, according to the publication by @sanchez2003effect, there is no golden standard that dictates a specific threshold for the SMD test to be considered balanced. Generally, the smaller the SMD test, the smaller the difference in covariate imbalance.
In the analyzed example, due to the sample size of 105 patients, a threshold of 0.2 for the SMD test was adopted.
A function called `smd_covariants_data` was written to generate frames that produce the SMD test for each covariate in each iteration, utilizing the `CreateTableOne` function available in the `tableone` package (@tableone). In cases where the test result is \<0.001, a value of 0 was assigned.
The results for each randomization method were stored in the `cov_balance_data`.
```{r, define-strata-vars}
# definied covariants
vars <- c("sex", "age", "diabetes_type", "wound_size", "tpo2", "hba1c")
```
```{r, smd-covariants-data}
smd_covariants_data <-
function(data, vars, strata) {
result_table <-
lapply(unique(data$simnr), function(i) {
current_data <- data[data$simnr == i, ]
arms_to_check <- setdiff(names(current_data), c(vars, "id", "simnr"))
# check SMD for any covariants
lapply(arms_to_check, function(arm) {
tab <-
CreateTableOne(
vars = vars,
data = current_data,
strata = arm
)
results_smd <-
ExtractSmd(tab) |>
as.data.frame() |>
tibble::rownames_to_column("covariants") |>
select(covariants, results = average) |>
mutate(results = round(as.numeric(results), 3))
results <-
bind_cols(
simnr = i,
strata = arm,
results_smd
)
return(results)
}) |>
bind_rows()
}) |>
bind_rows()
return(result_table)
}
```
```{r, cov-balance-data, echo = TRUE, results='hide'}
cov_balance_data <-
smd_covariants_data(
data = sim_data,
vars = vars
) |>
mutate(method = case_when(
strata == "minimize_equal_weights_arms" ~ "minimize equal",
strata == "minimize_unequal_weights_arms" ~ "minimize unequal 2:1",
strata == "minimize_unequal_weights_triple_arms" ~ "minimize unequal 3:1",
strata == "simple_data_arms" ~ "simple randomization",
strata == "block_data_arms" ~ "block randomization"
)) |>
select(-strata)
```
Below are the results of the SMD test presented in the form of boxplot and violin plot, depicting the outcomes for each randomization method. The red dashed line indicates the adopted precision threshold.
- **Boxplot of the combined results**
```{r, boxplot, fig.cap= "Summary average smd in each randomization methods", warning=FALSE, fig.width=9, fig.height=6}
# boxplot
cov_balance_data |>
select(simnr, results, method) |>
group_by(simnr, method) |>
mutate(results = mean(results)) |>
distinct() |>
ggplot(aes(x = method, y = results, fill = method)) +
geom_boxplot() +
geom_hline(yintercept = 0.2, linetype = "dashed", color = "red") +
theme_bw()
```
- **Violin plot**
```{r, violinplot, fig.cap= "Summary smd in each randomization methods in each covariants", warning = FALSE, fig.width=9, fig.height=6}
# violin plot
cov_balance_data |>
ggplot(aes(x = method, y = results, fill = method)) +
geom_violin() +
geom_hline(
yintercept = 0.2,
linetype = "dashed",
color = "red"
) +
facet_wrap(~covariants, ncol = 3) +
theme_bw() +
theme(axis.text = element_text(angle = 45, vjust = 0.5, hjust = 1))
```
- **Summary table of success**
Based on the specified precision threshold of 0.2, a function defining randomization success, named `success_power`, was developed. If the SMD test value for each covariate in a given iteration is above 0.2, the function defines the analysis data as 'failure' - 0; otherwise, it is defined as 'success' - 1.
The final success power is calculated as the sum of successes in each iteration divided by the total number of specified iterations.
The results are summarized in a table as the percentage of success for each randomization method.
```{r, success-power}
# function defining success of randomisation
success_power <-
function(cov_data) {
result_table <-
lapply(unique(cov_data$simnr), function(i) {
current_data <- cov_data[cov_data$simnr == i, ]
current_data |>
group_by(method) |>
summarise(success = ifelse(any(results > 0.2), 0, 1)) |>
tibble::add_column(simnr = i, .before = 1)
}) |>
bind_rows()
success <-
result_table |>
group_by(method) |>
summarise(results_power = sum(success) / n() * 100)
return(success)
}
```
```{r, success-result-data, tab.cap = "Summary of percent success in each randomization methods"}
success_power(cov_balance_data) |>
as.data.frame() |>
rename(`power results [%]` = results_power) |>
gt()
```
# Conclusion
Considering all three randomization methods: minimization, block randomization, and simple randomization, minimization performs the best in terms of covariate balance. Simple randomization has a significant drawback, as patient allocation to arms occurs randomly with equal probability. This leads to an imbalance in both the number of patients and covariate balance, which is also random. This is particularly the case with small samples. Balancing the number of patients is possible for larger samples for n \> 200.
On the other hand, block randomization performs very well in balancing the number of patients in groups in a specified allocation ratio. However, compared to adaptive randomisation using the minimisation method, block randomisation has a lower probability in terms of balancing the co-variables.
Minimization method, provides the highest success power by ensuring balance across covariates between groups. This is made possible by an appropriate algorithm implemented as part of minimisation randomisation. When assigning the next patient to a group, the method examines the total imbalance and then assigns the patient to the appropriate study group with a specified probability to balance the sample in terms of size, and covariates.
# References
---
nocite: '@*'
---