---
title: "Visualizing expectations and variation through simulation"
author: "P. Adames"
date: "`r Sys.Date()`"
output:
  html_document:
    df_print: paged
  word_document: default
  pdf_document: default
bibliography: biblio.bib
description: "Visualizing expectations and variance via simulation"
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Expected values and variance

I am not a fan of gambling, but I find it interesting to think that life itself is a gamble.
Trying to escape daily random events seems futile:
traffic volume and the weather are just the two most prominent ones in our daily lives.

Forecasts and averages are often rendered useless by the
seemingly wild variations, in both space and time, that are inherent to natural phenomena.
After all, statistics is well known for struggling to predict unusual events.

## Wagers 

To experience the impact of expected values and variability, let's follow a slightly modified
example of wagers from [@RodMen2018, p. 19].

For wager 3:  you pay $300 to enter and I roll a die. If it comes up 1, 2, or 3, then
I keep your money. If it comes up 4, 5, or 6, then I give you back $600 (your
original bet plus a $300 profit).

For wager 4:  you pay $300 to enter and I roll a die. If it comes up 1 or 2 then I get
to keep your money. If it comes up 3, 4, 5, or 6 then I give you back $450 (your original $300 plus a profit of $150).


## Comparing the wagers
Assuming a fair die, wager 3 feels balanced: there is a 50% chance of winning or losing the same amount.
Wager 4 feels somewhat unbalanced because one makes a 50% profit 2/3 of the time but loses the whole stake the remaining 1/3 of the time.


We need to compare the two wagers to assess their relative benefit.
A straightforward way of doing so is to compute the expected profit of each wager.

### Expected Profit from the wagers

The profit of a wager is defined as the payout minus the cost of entry.
With possible payouts $x_1, \dots, x_n$ and cost of entry $C$, the expected profit of each wager is:
```{=latex}
\begin{align}
    E(X)   &= [x_1 P(X=x_1) + \cdots + x_n P(X=x_n)] - C \tag{1}\\
           &= [(x_1 - C) P(X=x_1) + \cdots + (x_n - C) P(X=x_n)] \tag{2}
\end{align}
```
```{=latex}
\begin{align}
    E(W_3) &= [(0)  (1/2) + 600 (1/2)] - 300 \\
           &= [(-300) (1/2) + (300) (1/2)] \\
           &= 0
\end{align}
```
```{=latex}
\begin{align}
    E(W_4) &= [(0)  (1/3) + 450 (2/3)] - 300 \\
           &= [(-300) (1/3) + 150  (2/3)] \\
           &= 0
\end{align}
```

Eq. (2), which works directly with the net profits, is the more convenient form for computing the variance, and thus it is the one used in Eq. (3) below.
 
Note: I modified the values of the original wagers from the reference in an effort to make the dollar amounts more realistic.
This apparently takes away from what is considered a fair bet; however, we will defer
that subject for another day and definitely recommend that the curious reader go to the source of the wagers for this discussion [@RodMen2018, Chapter 2].


Let's use R to compute those values.

```{r "expected profit from wager 3", include=TRUE, cache=FALSE}
cost_of_entry <- 300
probability_of_die_eq_or_less_than_three <- 0.5
payout_of_die_eq_or_less_than_three <- 0
payout_of_die_greater_than_three <- 300 + 300  # stake plus the profit
# Eq. (1): expected payout minus the cost of entry
expected_value_wager_3 <- (payout_of_die_eq_or_less_than_three *
                             probability_of_die_eq_or_less_than_three +
                             payout_of_die_greater_than_three *
                             (1 - probability_of_die_eq_or_less_than_three)) -
  cost_of_entry
# all.equal() returns TRUE or a character string, so it is wrapped in isTRUE();
# this snaps any floating-point residue to an exact zero before printing
if (isTRUE(all.equal(expected_value_wager_3, 0))) expected_value_wager_3 <- 0
cat("Expected net profit from wager 3:", expected_value_wager_3, "dollars\n")
```

In a similar fashion the expected statistical profit for wager 4 is:

```{r "expected profit from wager 4", include=TRUE, cache=FALSE}
cost_of_entry <- 300
probability_of_die_eq_or_less_than_two <- 1 / 3
payout_of_die_eq_or_less_than_two <- 0
payout_of_die_greater_than_two <- 300 + 150  # stake plus the profit
# Eq. (1): expected payout minus the cost of entry
expected_value_wager_4 <- (payout_of_die_eq_or_less_than_two *
                             probability_of_die_eq_or_less_than_two +
                             payout_of_die_greater_than_two *
                             (1 - probability_of_die_eq_or_less_than_two)) -
  cost_of_entry
# 1/3 is not exactly representable, so snap tiny residues to an exact zero
if (isTRUE(all.equal(expected_value_wager_4, 0))) expected_value_wager_4 <- 0
cat("Expected net profit from wager 4:", expected_value_wager_4, "dollars\n")
```
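
The two calculations above can also be written compactly with vectors. The following is only a sketch: the helper `expected_profit()` and its argument names are hypothetical and not part of the original wagers. It evaluates Eq. (1) for both wagers and checks one of them against Eq. (2).

```{r "expected profit vectorized sketch"}
# Hypothetical helper (an assumption for illustration): expected profit of a
# wager given its payouts, their probabilities, and the cost of entry.
expected_profit <- function(payouts, probabilities, cost) {
  stopifnot(length(payouts) == length(probabilities),
            isTRUE(all.equal(sum(probabilities), 1)))
  sum(payouts * probabilities) - cost  # Eq. (1)
}

wager_3 <- expected_profit(payouts = c(0, 600), probabilities = c(1/2, 1/2), cost = 300)
wager_4 <- expected_profit(payouts = c(0, 450), probabilities = c(1/3, 2/3), cost = 300)

# Eq. (2): average the net profits directly; it should agree with Eq. (1)
wager_4_eq2 <- sum((c(0, 450) - 300) * c(1/3, 2/3))

# round() hides floating-point residue caused by 1/3 not being exact
round(c(wager_3 = wager_3, wager_4 = wager_4, wager_4_eq2 = wager_4_eq2), 10)
```

Passing the payouts and probabilities as vectors keeps the code close to the notation of the equations and extends naturally to wagers with more than two outcomes.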

All that effort just to confirm that neither wager will make or lose you money in the long run.
So, where does that leave us in terms of the relative merits or
drawbacks of each bet?
We need other tools to analyze the outcomes of engaging in either
wager.

The first thing we will look at is a verification that the expected value does manifest
in reality, albeit only after many repetitions. After all, practice makes perfect, right?

### Simulation

In the following animated GIF you will see 30 sets of 2,000 bets using wager 3 in blue
and the same number of bets using wager 4 in red.
Each of the 30 sets starts with a different random number seed before sampling the die with replacement 2,000 times.
Because R is a vectorized language, this is done in a single line:

    die_results <- sample(x = 1:6, size = 2000, replace = TRUE)
    
The running profit is obtained by dividing the cumulative sum of the net profits by the number of bets played so far.
That explains why, if you always win the bet, the maximum running profit is the maximum net profit: `$300` in the case of wager 3 and `$150` for wager 4. However, if you always lose, both wagers would produce a `$300` loss.

Please remember that this does not by itself make wager 3 any more desirable, because both converge to the same expected value in the long term: zero profit.
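
As a rough sketch of how one such set could be produced, the chunk below maps a single sequence of die rolls to net profits and computes the running profit of both wagers. The seed value, the variable names, and the choice to reuse the same rolls for both wagers are assumptions made for illustration; the code behind the animation may differ.

```{r "running profit sketch"}
set.seed(2018)  # arbitrary seed, chosen only for reproducibility
n_bets <- 2000
die_results <- sample(x = 1:6, size = n_bets, replace = TRUE)

# Net profit of every bet, using the same die rolls for both wagers
profit_wager_3 <- ifelse(die_results >= 4, 300, -300)
profit_wager_4 <- ifelse(die_results >= 3, 150, -300)

# Running profit: cumulative sum divided by the number of bets played so far
running_profit_3 <- cumsum(profit_wager_3) / seq_along(profit_wager_3)
running_profit_4 <- cumsum(profit_wager_4) / seq_along(profit_wager_4)

# Running profit after the last bet; both should be small relative to the stake
c(wager_3 = running_profit_3[n_bets], wager_4 = running_profit_4[n_bets])
```

Plotting `running_profit_3` and `running_profit_4` against the bet number would give one frame of a comparison like the one in the animation.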


```{r "comparing wagers via simulation", echo=FALSE, results='asis'}
knitr::include_graphics("animations/wager_comp_sim.gif", dpi = 86)
```


## Variance

So, how else can we characterize the two wagers? Are they different in any practical way?

The statistical variance might give us a clue. It is calculated as follows:

```{=latex}
\begin{align}
    V(X) &= E\left\{[X - E(X)]^2\right\} \tag{3}\\
         &= E\left\{X^2 - 2 X E(X) + [E(X)]^2\right\}\\
         &= E(X^2) - 2 E(X) E(X) + [E(X)]^2\\
         &= E(X^2) - [E(X)]^2\\
         &= x_1^2 P(X = x_1) + \cdots + x_n^2 P(X = x_n) - \left[x_1 P(X = x_1) + \cdots + x_n P(X = x_n)\right]^2 \tag{4}
\end{align}
```

Substituting for each wager:

```{=latex}
\begin{align}
    V(W_3) &= \left[(-300)^2 \times (1/2) + (300)^2 \times (1/2)\right] - \left[(-300) \times (1/2) + 300 \times (1/2)\right]^2\\
           &= 300^2 - 0^2 \\
           &= 90{,}000
\end{align}
```

```{=latex}
\begin{align}
    V(W_4) &= \left[(-300)^2 \times (1/3) + (150)^2 \times (2/3)\right] - \left[(-300) \times (1/3) + 150 \times (2/3)\right]^2\\
           &= 30{,}000 + 50 \times 150 \times 2 - 0^2\\
           &= 30{,}000 + 15{,}000 \\
           &= 45{,}000
\end{align}
```
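
The same substitution can be checked numerically. The sketch below uses a hypothetical helper, `discrete_variance()`, which is not part of the original text; it applies Eq. (4) to the net profits of each wager and should reproduce the two variances up to floating-point rounding.

```{r "variance sketch"}
# Hypothetical helper (an assumption for illustration): variance of a discrete
# random variable via V(X) = E(X^2) - [E(X)]^2, as in Eq. (4).
discrete_variance <- function(values, probabilities) {
  stopifnot(length(values) == length(probabilities))
  sum(values^2 * probabilities) - sum(values * probabilities)^2
}

# Net profits and their probabilities for each wager
variance_wager_3 <- discrete_variance(c(-300, 300), c(1/2, 1/2))
variance_wager_4 <- discrete_variance(c(-300, 150), c(1/3, 2/3))

c(variance_wager_3 = variance_wager_3, variance_wager_4 = variance_wager_4)

# Standard deviations, in dollars, are easier to relate to the bets themselves
sqrt(c(sd_wager_3 = variance_wager_3, sd_wager_4 = variance_wager_4))
```

The standard deviation is in the same units as the profit, which makes it a more intuitive measure of spread than the variance.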

This means that the profit from wager 3 varies substantially more than the profit from wager 4: its variance is twice as large, although both have zero expected profit in the long run.
What does that say about the gambling budget necessary to sustain these wagers?


## Experiencing variability via simulation

I believe a realistic way of thinking about the variability of the
outcomes of these two wagers is in terms of the cash flow necessary to
sustain 2,000 bets, the same number we used for the previous simulations of the
expected profit.

The following animated image shows 60 such sets of 2,000 bets, with one set for each wager.
The idea is to glance at all of them to get a sense of which wager gives the most
extreme variations in the positive or negative cash flow necessary to sustain
the 2,000 bets.


```{r "cash flows via simulation", echo=FALSE, results='asis'}
knitr::include_graphics("animations/wager_cash_flow_comparison.gif", dpi = 86)
```

Overall, the histograms for wager 3 seem to span wider ranges than those for wager 4.
This reinforces the results from the calculation of the variance, which show that wager 3 has
more variability than wager 4.


Note that wager 3 may require larger amounts of cash to sustain the 2,000 bets
when the cash flow is negative. At this point it is worth reminding the reader that both
wagers have a neutral expected profit.
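
One way to put a number on that cash requirement is to record the lowest point reached by the cumulative profit in each set of bets. The chunk below is a minimal sketch under stated assumptions: the seed, the helper `lowest_cash_flow()`, and its arguments are hypothetical, and the simulation behind the animation may be organized differently.

```{r "cash flow sketch"}
set.seed(42)  # arbitrary seed, for reproducibility only
n_sets <- 60
n_bets <- 2000

# Hypothetical helper: lowest cumulative profit reached in one set of bets.
# A roll of win_from or higher wins win_profit; any other roll loses the 300 stake.
lowest_cash_flow <- function(win_from, win_profit) {
  die_results <- sample(x = 1:6, size = n_bets, replace = TRUE)
  min(cumsum(ifelse(die_results >= win_from, win_profit, -300)))
}

lowest_wager_3 <- replicate(n_sets, lowest_cash_flow(win_from = 4, win_profit = 300))
lowest_wager_4 <- replicate(n_sets, lowest_cash_flow(win_from = 3, win_profit = 150))

# Quantiles of the lowest points; more negative means a larger budget is needed
rbind(wager_3 = quantile(lowest_wager_3), wager_4 = quantile(lowest_wager_4))
```

Given its larger variance, one would expect wager 3 to reach deeper lows on average, although any individual set may show the opposite.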


## References