---
title: "Visualizing expectations and variation through simulation"
author: "P. Adames"
date: "`r Sys.Date()`"
output:
  html_document:
    df_print: paged
  word_document: default
  pdf_document: default
bibliography: biblio.bib
description: "Visualizing expectations and variance via simulation"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Expected values and variance
I am not a fan of gambling, but I find it interesting to think that life itself is a gamble.
Trying to avoid daily random events seems futile:
traffic volume and weather are but two of the more prominent ones in our daily lives.
Oftentimes forecasts and averages are rendered useless by the
seemingly wild variations found in the spatial and temporal oddities inherent to natural phenomena.
After all, statistics, in general, is well known for failing to predict unusual events.
## Wagers
To experience the impact of expected values and variability, let's follow a slightly modified
example of wagers from [@RodMen2018, p.19].
For wager 3: you pay $300 to enter and I roll a die. If it comes up 1, 2, or 3, then
I keep your money. If it comes up 4, 5, or 6, then I give you back $600 (your
original bet plus a $300 profit).
For wager 4: you pay $300 to enter and I roll a die. If it comes up 1 or 2 then I get
to keep your money. If it comes up 3, 4, 5, or 6 then I give you back $450 (your original $300 plus a profit of $150).
## Comparing the wagers
Assuming a fair die, wager 3 feels balanced at 50% chance of winning or losing the same amount.
Wager 4 feels somewhat unbalanced because one can make 50% profit 2/3 of the time.
We need to compare the two wagers to assess their relative benefit.
The expected profit from each wager in statistical terms is a
straightforward way of doing so.
### Expected Profit from the wagers
The profit of a wager is defined as the payout minus the cost of entry.
The expected profit of each wager is:
```{=latex}
\begin{align}
E(X) &= [x_1 P(X=x_1) + \cdots + x_n P(X=x_n)] - C \tag{1}\\
&= (x_1 - C) P(X=x_1) + \cdots + (x_n - C) P(X=x_n) \tag{2}
\end{align}
```
```{=latex}
\begin{align}
E(W_3) &= [(0) (1/2) + 600 (1/2)] - 300 \\
&= [(-300) (1/2) + (300) (1/2)] \\
&= 0
\end{align}
```
```{=latex}
\begin{align}
E(W_4) &= [(0) (1/3) + 450 (2/3)] - 300 \\
&= [(-300) (1/3) + 150 (2/3)] \\
&= 0
\end{align}
```
Eq. (2) is the more convenient form for computing the variance, and thus it is used in Eq. (3) below.
Note: I modified the values of the original wagers from the reference in an effort to make the dollar amounts more realistic.
This apparently takes away from what is considered a fair bet; however, we will defer
that subject for another day and recommend that the curious reader go to the source of the wagers for this discussion [@RodMen2018, Chapter 2].
Let's use R to compute those values.
```{r "expected profit from wager 3", include=TRUE, cache=FALSE}
cost_of_entry <- 300
probability_of_die_eq_or_less_than_three <- 0.5
payout_of_die_eq_or_less_than_three <- 0
payout_of_die_greater_than_three <- 300 + 300
# Expected payout minus the cost of entry, as in Eq. (1)
expected_value_wager_3 <- (payout_of_die_eq_or_less_than_three *
    probability_of_die_eq_or_less_than_three +
  payout_of_die_greater_than_three *
    (1 - probability_of_die_eq_or_less_than_three)) -
  cost_of_entry
# all.equal() returns a character vector on mismatch, so wrap it in isTRUE()
# before branching; values within tolerance of zero are reported as 0
cat("Expected net profit from wager 3:",
    if (isTRUE(all.equal(expected_value_wager_3, 0))) 0 else expected_value_wager_3,
    "dollars\n")
```
In a similar fashion the expected statistical profit for wager 4 is:
```{r "expected profit from wager 4", include=TRUE, cache=FALSE}
cost_of_entry <- 300
probability_of_die_eq_or_less_than_two <- 1 / 3
payout_of_die_eq_or_less_than_two <- 0
payout_of_die_greater_than_two <- 300 + 150
# Expected payout minus the cost of entry, as in Eq. (1)
expected_value_wager_4 <- (payout_of_die_eq_or_less_than_two *
    probability_of_die_eq_or_less_than_two +
  payout_of_die_greater_than_two *
    (1 - probability_of_die_eq_or_less_than_two)) -
  cost_of_entry
# 1 / 3 is not exactly representable, so compare with a tolerance;
# report wager 4's own value when it is not numerically zero
cat("Expected net profit from wager 4:",
    if (isTRUE(all.equal(expected_value_wager_4, 0))) 0 else expected_value_wager_4,
    "dollars\n")
```
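The two chunks above repeat the same arithmetic, so it can also be wrapped in a small function. The following is only a sketch: `expected_profit()` and its argument names are hypothetical and not part of the original analysis. It applies Eq. (1) to a vector of payouts and their probabilities.

```r
# Hypothetical helper (an assumption for illustration): expected net profit
# of a wager given its payouts, their probabilities, and the cost of entry.
expected_profit <- function(payouts, probabilities, cost_of_entry) {
  stopifnot(length(payouts) == length(probabilities),
            isTRUE(all.equal(sum(probabilities), 1)))
  # Eq. (1): probability-weighted payout minus the cost of entry
  sum(payouts * probabilities) - cost_of_entry
}

wager_3 <- expected_profit(c(0, 600), c(1 / 2, 1 / 2), cost_of_entry = 300)
wager_4 <- expected_profit(c(0, 450), c(1 / 3, 2 / 3), cost_of_entry = 300)

# Compare with a tolerance, since 2 / 3 is not exactly representable
isTRUE(all.equal(wager_3, 0))
isTRUE(all.equal(wager_4, 0))
```

Passing the outcomes as vectors keeps the probabilities next to their payouts, which makes it harder to pair a payout with the wrong probability.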
All that effort to confirm that neither wager will make or lose you money in the long run.
So, where does that leave us in terms of the relative merits or
drawbacks of each bet?
We need other tools to analyze the outcomes from engaging in either
wager.
The first thing we will look at is a verification that the expected value does manifest
in reality, albeit after a few repetitions; after all, practice makes perfect, right?
### Simulation
In the following animated GIF you will see 30 sets of 2,000 bets using wager 3 in blue
and the same number of bets using wager 4 in red.
Each of the 30 sets starts with a different random number seed before sampling the die with replacement 2,000 times.
Because R is a vectorized language, this is done in a single line:
`die_results = sample(x = 1:6, size = 2000, replace = TRUE)`

The running profit is obtained by dividing the cumulative sum of the net profit by the number of bets played so far.
That explains why, if you always win the bet, the maximum running profit is the maximum net profit per bet: `$300` in the case of wager 3 and `$150` for wager 4. However, if you always lose, both wagers would produce a `$300` loss.
Please remember that this does not make wager 3 any more desirable by itself, because both converge to the same expected value in the long term: zero profit.
```{r "comparing wagers via simulation", echo=FALSE, results='asis'}
knitr::include_graphics("animations/wager_comp_sim.gif", dpi = 86)
```
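The code that produced the animation is not shown here, so the following is only a minimal sketch of how one set of 2,000 bets and its running profit could be simulated. The function name `simulate_running_profit()`, its arguments, and the seed are assumptions made for illustration.

```r
# Minimal sketch of one simulated set of bets; the function name, the seed,
# and the argument names are assumptions, not the original simulation code.
simulate_running_profit <- function(win_faces, profit_if_win,
                                    cost_of_entry = 300,
                                    n_bets = 2000, seed = 1) {
  set.seed(seed)
  die_results <- sample(x = 1:6, size = n_bets, replace = TRUE)
  # Net profit of each bet: the profit on a winning face, minus the stake otherwise
  net_profit <- ifelse(die_results %in% win_faces, profit_if_win, -cost_of_entry)
  # Running profit: cumulative net profit divided by the number of bets so far
  cumsum(net_profit) / seq_len(n_bets)
}

running_wager_3 <- simulate_running_profit(win_faces = 4:6, profit_if_win = 300)
running_wager_4 <- simulate_running_profit(win_faces = 3:6, profit_if_win = 150)

# The last elements are the average profit per bet after 2,000 bets
tail(running_wager_3, 1)
tail(running_wager_4, 1)
```

Calling the function with a different `seed` for each set would give a collection of runs comparable to the 30 sets described above.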
## Variance
So, how else can we qualify the different wagers? Are they different in any practical way?
The statistical variance might give us a clue. It is calculated as follows:
```{=latex}
\begin{align}
V(X) &= E\{[X - E(X)]^2\} \tag{3}\\
&= E\{X^2 - 2 X E(X) + [E(X)]^2\}\\
&= E(X^2) - 2 E(X) E(X) + [E(X)]^2\\
&= E(X^2) - [E(X)]^2\\
&= x_1^2 P(X = x_1) + \cdots + x_n^2 P(X = x_n) - [x_1 P(X = x_1) + \cdots + x_n P(X = x_n)]^2 \tag{4}
\end{align}
```
Substituting for each wager:
```{=latex}
\begin{align}
V(W_3) &= [(-300)^2 \times (1/2) + (300)^2 \times (1/2)] - [(-300) \times (1/2) + 300 \times (1/2)]^2\\
&= 300^2 - 0^2 \\
&= 90{,}000
\end{align}
```
```{=latex}
\begin{align}
V(W_4) &= [(-300)^2 \times (1/3) + (150)^2 \times (2/3)] - [(-300) \times (1/3) + (150) \times (2/3)]^2\\
&= 30{,}000 + 50 \times 150 \times 2 - 0^2\\
&= 30{,}000 + 15{,}000 \\
&= 45{,}000
\end{align}
```
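The two substitutions above can be checked numerically. This is a small sketch in which `profit_variance()` is a hypothetical helper, not part of the original analysis. It takes the net profits (payout minus cost of entry) and their probabilities and applies the last line of the variance derivation.

```r
# Hypothetical helper (an assumption for illustration): variance of a
# discrete net profit computed as E(X^2) - [E(X)]^2.
profit_variance <- function(profits, probabilities) {
  expected <- sum(profits * probabilities)
  sum(profits^2 * probabilities) - expected^2
}

variance_wager_3 <- profit_variance(c(-300, 300), c(1 / 2, 1 / 2))
variance_wager_4 <- profit_variance(c(-300, 150), c(1 / 3, 2 / 3))

# Standard deviations are in dollars, which makes them easier to interpret
sqrt(variance_wager_3)
sqrt(variance_wager_4)
```

The variances are in squared dollars, so the square roots are the more natural quantities to compare against the `$300` stake.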
This means that the variance of the profit with wager 3 is twice that of wager 4, although both have zero expected profit in the long run.
What does that say about the gambling budget necessary to sustain these wagers?
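
One rough way to approach that question is to look at the spread of the cumulative profit. The sketch below assumes the bets are independent, so that their variances add. The symbol $S_n$ for the cumulative profit after $n$ bets is introduced here only for illustration.

```{=latex}
\begin{align}
V(S_n) &= n V(W), \qquad \sigma(S_n) = \sqrt{n V(W)} \\
\sigma(S_{2000}) &= \sqrt{2000 \times 90{,}000} \approx 13{,}416 \quad \text{for wager 3} \\
\sigma(S_{2000}) &= \sqrt{2000 \times 45{,}000} \approx 9{,}487 \quad \text{for wager 4}
\end{align}
```

Under that assumption, the typical swing in cumulative profit after 2,000 bets is about $\sqrt{2} \approx 1.41$ times larger for wager 3 than for wager 4.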
## Experiencing variability via simulation
I believe a realistic way of thinking about the variability of the
outcomes of these two wagers is in terms of the cash flow necessary to
sustain 2,000 bets, the same number we used for the previous simulations of the
expected profit.
The following animated image shows 60 such sets of 2,000 bets, one for each wager.
The idea is to glance at all of them to get a sense for the wager that gives the most
extreme variations in the positive or negative cash flow necessary to sustain
the 2,000 bets.
```{r "cash flows via simulation", echo=FALSE, results='asis'}
knitr::include_graphics("animations/wager_cash_flow_comparison.gif", dpi = 86)
```
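The code behind this animation is not shown either. As a sketch only, the cash flow can be taken to be the cumulative net profit over a set of bets, and the lowest point it reaches indicates how much money a player would need on hand. The function name `simulate_cash_flow()`, the seeds, and this definition of cash flow are assumptions made for illustration.

```r
# Sketch under assumptions: cash flow is taken to be the cumulative net profit,
# and the function name and seeds are illustrative only.
simulate_cash_flow <- function(win_faces, profit_if_win, cost_of_entry = 300,
                               n_bets = 2000, seed = 1) {
  set.seed(seed)
  die_results <- sample(x = 1:6, size = n_bets, replace = TRUE)
  net_profit <- ifelse(die_results %in% win_faces, profit_if_win, -cost_of_entry)
  cumsum(net_profit)
}

# Lowest point reached by the cash flow in each of 30 simulated sets
seeds <- 1:30
lowest_wager_3 <- sapply(seeds, function(s) {
  min(simulate_cash_flow(win_faces = 4:6, profit_if_win = 300, seed = s))
})
lowest_wager_4 <- sapply(seeds, function(s) {
  min(simulate_cash_flow(win_faces = 3:6, profit_if_win = 150, seed = s))
})

summary(lowest_wager_3)
summary(lowest_wager_4)
```

Comparing the two summaries gives a numerical counterpart to glancing at the animation: the more negative the lowest points, the larger the budget needed to keep playing.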
Overall, it seems the red histograms span larger areas than the blue ones.
This reinforces the results from the calculation of variance showing that wager 3 has
more variability than wager 4.
Note that wager 3 may require larger amounts of cash to sustain the 2,000 bets
when the cash flow is negative. At this point it is worth reminding the reader that both
wagers have a neutral expected profit.
## References