<html><head><title>niplav</title>
<link href="./favicon.png" rel="shortcut icon" type="image/png"/>
<link href="main.css" rel="stylesheet" type="text/css"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<!DOCTYPE HTML>
<style type="text/css">
code.has-jax {font: inherit; font-size: 100%; background: inherit; border: inherit;}
</style>
<script async="" src="./mathjax/latest.js?config=TeX-MML-AM_CHTML" type="text/javascript">
</script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
extensions: ["tex2jax.js"],
jax: ["input/TeX", "output/HTML-CSS"],
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
processEscapes: true,
skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
},
"HTML-CSS": { availableFonts: ["TeX"] }
});
</script>
<script>
document.addEventListener('DOMContentLoaded', function () {
// Change the title to the h1 header
var title = document.querySelector('h1')
if(title) {
var title_elem = document.querySelector('title')
title_elem.textContent=title.textContent + " – niplav"
}
});
</script>
</head><body><h2 id="home"><a href="./index.html">home</a></h2>
<p><em>author: niplav, created: 2023-11-18, modified: 2024-09-30, language: english, status: in progress, importance: 2, confidence: likely</em></p>
<blockquote>
<p><strong>I finally make use of my <a href="./data.html#Daygame">daygame data</a> by
writing some code that implements a multi-armed bandit with Thompson
sampling on beta-distributed estimates of what proportion of approaches
in a particular location yield contact information.</strong></p>
</blockquote><div class="toc"><div class="toc-title">Contents</div><ul><li><a href="#Beyond_the_Bandit">Beyond the Bandit</a><ul></ul></li><li><a href="#See_Also">See Also</a><ul></ul></li></ul></div>
<h1 id="Using_A_MultiArmed_Bandit_to_Select_Daygame_Locations"><a class="hanchor" href="#Using_A_MultiArmed_Bandit_to_Select_Daygame_Locations">Using A Multi-Armed Bandit to Select Daygame Locations</a></h1>
<blockquote>
<p>Glattes Eis<br/>
Ein Paradeis<br/>
Für den, der gut zu tanzen weiß.</p>
</blockquote>
<p><em>—Friedrich Nietzsche, “Für Tänzer”, 1882</em></p>
<p>Given the <a href="./data.html#Daygame">data of my daygame approaches</a>, I've
wondered for quite a while how I could use that data to improve
my game. I don't think I've found anything solid yet, so instead I'm
going to try to use that data to estimate where I should do my next
daygame session. Beliefs are for action, after all.</p>
<p>For this, I trick ChatGPT into writing code for a <a href="https://en.wikipedia.org/wiki/Multi-armed_bandit">multi-armed
bandit</a> using
<a href="https://en.wikipedia.org/wiki/Thompson_sampling">Thompson sampling</a> of
<a href="https://en.wikipedia.org/wiki/Beta-distribution">beta-distributed</a> values
in <a href="https://en.wikipedia.org/wiki/Julia_(programming_language)">Julia</a>,
where getting contact information counts as a reward of 1 and not getting
any contact information counts as a reward of 0.</p>
<p>(I know that this is a super impoverished view of what makes a good
daygame approach, but this is an exploratory exercise. I might add more and
different factors later.)</p>
<p>Of course, I can't tell ChatGPT that I am doing pickup, so I instead
say that I'm looking to optimize the quality of the ice cream I'm eating by
selecting different ice cream shops. (Title of conversation: "Bayesian
Icecream Bandit").</p>
<p>The resulting code is wholly confused and <em>bad</em>, with multiple subtle
and not so subtle bugs, and inelegant too. I reckon there's just not
enough Julia training data to make it capable enough, but I haven't
checked with the most recent models.</p>
<p>So after more than a year of procrastination, I decide to rewrite
the code; the result is <a href="./code/bandit/location.jl">here</a>.</p>
<p>It first loads the data, counts the successes (got contact
info) and failures (didn't get contact info) per location, builds the
corresponding Beta distribution (with α = successes + 1 and β = failures + 1,
i.e. starting from a uniform prior) and past success ratio, throws it all
into the DataFrame <code>bandit</code> and then draws one sample from each
distribution. (The Beta distribution is useful here because the more
approaches a location has, the smaller its variance. This is exactly what we
want: samples for less-explored locations are spread widely and so sometimes
come out on top, which means those locations keep getting tried until the
data settles the question.)</p>
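<p>A minimal sketch of those steps, not the linked script: the approach log
and the names <code>approaches</code>, <code>counts</code>, <code>posterior</code>
and <code>thompson_pick</code> are made up for illustration, and it assumes
the Distributions.jl package is installed.</p>
<pre><code>using Distributions, Random

# Hypothetical approach log: (location id, got contact info?).
approaches = [(709269, true), (709269, false), (709269, false),
              (449256, false), (449256, false), (76108, true)]

# Count (successes, failures) per location.
counts = Dict{Int, Tuple{Int, Int}}()
for (loc, success) in approaches
    s, f = get(counts, loc, (0, 0))
    counts[loc] = success ? (s + 1, f) : (s, f + 1)
end

# Uniform Beta(1, 1) prior, so the posterior is Beta(successes + 1, failures + 1).
posterior(s, f) = Beta(s + 1, f + 1)

# Thompson sampling: draw one sample per location and pick the largest.
function thompson_pick(counts; rng=Random.default_rng())
    locs = collect(keys(counts))
    samples = [rand(rng, posterior(counts[loc]...)) for loc in locs]
    return locs[argmax(samples)]
end

println("next location: ", thompson_pick(counts))

# More data means a narrower posterior: 1/1 versus 22/87 successes/failures.
println(var(posterior(1, 1)), " vs ", var(posterior(22, 87)))
</code></pre>
<p>Calling <code>thompson_pick(counts)</code> repeatedly will not always return
the same location, which is the point: a location with little data has a wide
posterior and gets picked now and then, while a well-explored one is picked
roughly in proportion to how likely it is to really be the best.</p>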
<p>So the output of the script can look something like this, where the most
preferred option is at the bottom:</p>
<pre><code>julia> sort(bandit, :sample)
32×7 DataFrame
Row │ location successes failures success_prob dist sample name
│ Float64 Int64 Int64 Float64 Beta… Float64 String
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 571077.0 0 1 0.0 Beta{Float64}(α=1.0, β=2.0) 0.0202956 [REDACTED]
2 │ 371851.0 1 10 0.0909091 Beta{Float64}(α=2.0, β=11.0) 0.0311173 [REDACTED]
3 │ 449256.0 3 37 0.075 Beta{Float64}(α=4.0, β=38.0) 0.0320887 [REDACTED]
4 │ 785084.0 2 35 0.0540541 Beta{Float64}(α=3.0, β=36.0) 0.0338077 [REDACTED]
5 │ 98955.0 1 7 0.125 Beta{Float64}(α=2.0, β=8.0) 0.0493673 [REDACTED]
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ [REDACTED]
29 │ 817198.0 1 1 0.5 Beta{Float64}(α=2.0, β=2.0) 0.619935 [REDACTED]
30 │ 276017.0 3 5 0.375 Beta{Float64}(α=4.0, β=6.0) 0.787144 [REDACTED]
31 │ 692404.0 0 1 0.0 Beta{Float64}(α=1.0, β=2.0) 0.826964 [REDACTED]
32 │ 295748.0 1 0 1.0 Beta{Float64}(α=2.0, β=1.0) 0.982625 [REDACTED]
23 rows omitted
</code></pre>
<p>The top option (namely 702595) is, unfortunately, in another city hundreds
of kilometers from where I live.</p>
<p>To filter out irrelevant locations, I create sets of locations that are
amenable to weekday/weekend and good/bad weather daygame:</p>
<pre><code>weekday_good_weather=[709269, 449256, 76108, 449052, 175735, 276017, 796877, 835159, 823073, 696163, 843941, 132388, 496077, 32441, 399686, 793915]
weekday_bad_weather=[709269, 449256, 76108, 449052]
weekend_good_weather=[692404, 10939, 709269, 157691, 175735, 276017, 702595, 449256, 76108, 793915, 796877, 835159, 823073, 696163, 531828, 781627, 843941, 132388, 496077, 371851, 32441, 399686, 449052]
weekend_bad_weather=[709269, 449256, 76108, 449052, 702595, 531828]
</code></pre>
<p>Then, on a weekday with good weather (as it often is, at the time of
writing), I can filter for locations in my current city that suit those
conditions:</p>
<pre><code>julia> filter(x->x[:location] in weekday_good_weather, sort(bandit, :sample))
14×7 DataFrame
Row │ location successes failures success_prob dist sample name
│ Float64 Int64 Int64 Float64 Beta… Float64 String
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 449256.0 3 37 0.075 Beta{Float64}(α=4.0, β=38.0) 0.0320887 [REDACTED]
2 │ 132388.0 9 71 0.1125 Beta{Float64}(α=10.0, β=72.0) 0.0543564 [REDACTED]
3 │ 175735.0 0 7 0.0 Beta{Float64}(α=1.0, β=8.0) 0.0923418 [REDACTED]
4 │ 823073.0 2 13 0.133333 Beta{Float64}(α=3.0, β=14.0) 0.11243 [REDACTED]
5 │ 449052.0 1 10 0.0909091 Beta{Float64}(α=2.0, β=11.0) 0.130813 [REDACTED]
6 │ 796877.0 0 3 0.0 Beta{Float64}(α=1.0, β=4.0) 0.153522 [REDACTED]
7 │ 696163.0 1 3 0.25 Beta{Float64}(α=2.0, β=4.0) 0.207392 [REDACTED]
8 │ 399686.0 2 5 0.285714 Beta{Float64}(α=3.0, β=6.0) 0.221436 [REDACTED]
9 │ 709269.0 22 87 0.201835 Beta{Float64}(α=23.0, β=88.0) 0.249274 [REDACTED]
10 │ 835159.0 5 16 0.238095 Beta{Float64}(α=6.0, β=17.0) 0.256881 [REDACTED]
11 │ 843941.0 0 1 0.0 Beta{Float64}(α=1.0, β=2.0) 0.272473 [REDACTED]
12 │ 496077.0 12 28 0.3 Beta{Float64}(α=13.0, β=29.0) 0.313311 [REDACTED]
13 │ 76108.0 1 1 0.5 Beta{Float64}(α=2.0, β=2.0) 0.617513 [REDACTED]
14 │ 276017.0 3 5 0.375 Beta{Float64}(α=4.0, β=6.0) 0.787144 [REDACTED]
</code></pre>
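<p>The filtering and picking could be wrapped up roughly as follows. This is
a sketch under assumptions: the helper <code>next_location</code> and the
<code>location_sets</code> lookup are hypothetical names, the small
<code>bandit</code> table only reuses a few rows from the output above, and
DataFrames.jl and Distributions.jl are assumed to be installed.</p>
<pre><code>using DataFrames, Distributions, Random

# Location sets copied from the lists above, keyed by (weekend, good_weather).
location_sets = Dict(
    (false, true)  => [709269, 449256, 76108, 449052, 175735, 276017, 796877, 835159, 823073, 696163, 843941, 132388, 496077, 32441, 399686, 793915],
    (false, false) => [709269, 449256, 76108, 449052],
    (true, true)   => [692404, 10939, 709269, 157691, 175735, 276017, 702595, 449256, 76108, 793915, 796877, 835159, 823073, 696163, 531828, 781627, 843941, 132388, 496077, 371851, 32441, 399686, 449052],
    (true, false)  => [709269, 449256, 76108, 449052, 702595, 531828],
)

# A few (location, successes, failures) rows copied from the table above.
bandit = DataFrame(location  = [449256, 132388, 709269, 496077, 76108, 276017],
                   successes = [3, 9, 22, 12, 1, 3],
                   failures  = [37, 71, 87, 28, 1, 5])

# Hypothetical helper: keep the eligible locations, draw a fresh sample for
# each of them, and return the location with the largest sample.
function next_location(bandit, weekend, good_weather; rng=Random.default_rng())
    allowed = location_sets[(weekend, good_weather)]
    eligible = filter(row -> row.location in allowed, bandit)
    samples = [rand(rng, Beta(s + 1, f + 1))
               for (s, f) in zip(eligible.successes, eligible.failures)]
    return eligible.location[argmax(samples)]
end

println(next_location(bandit, false, true))   # weekday, good weather
</code></pre>
<p>Filtering before sampling (rather than after) gives the same kind of answer
here, since each location's sample is drawn independently of the others.</p>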
<p>The approach of using a multi-armed bandit here is nice because, if I
follow it, I avoid undervaluing really great opportunities (because
they're so overgamed nobody goes there anymore), and I can notice when
locations <em>do</em> get worse. I had, for example, thought that 449256 was
a great location, but the statistics definitely say otherwise, and
similarly for 449052.</p>
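<p>One way to put numbers on "the statistics say otherwise" is to look at the
posterior mean and a credible interval for each location. The sketch below
uses the counts from the filtered table; the function name
<code>summarize</code> and the choice of a 90% interval are my assumptions,
and Distributions.jl is assumed to be installed.</p>
<pre><code>using Distributions

# Posterior summary for one location, using the same Beta(successes + 1, failures + 1).
function summarize(successes, failures)
    post = Beta(successes + 1, failures + 1)
    return (mean = mean(post), lo = quantile(post, 0.05), hi = quantile(post, 0.95))
end

# Counts copied from the filtered table above.
for (loc, s, f) in [(449256, 3, 37), (449052, 1, 10), (709269, 22, 87)]
    r = summarize(s, f)
    println(loc, ": mean ", round(r.mean, digits=3),
            ", 90% interval ", round(r.lo, digits=3), " to ", round(r.hi, digits=3))
end
</code></pre>
<p>For 449256 the posterior mean is 4/42, a bit under 10%, and with 40
approaches behind it that estimate is no longer easy to explain away as bad
luck. 449052 has only 11 approaches, so its interval is wider.</p>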
<p>Additional variables I could take into account would be my enjoyment
of the approach, the attractiveness of the woman I'm speaking to, the
amount of time I'm spending between approaches, …</p>
<p>I will, however, exercise my judgement: I'll <em>probably</em> take a closer
look at 76108, even if I don't feel very enthusiastic about it.</p>
<!--
### Adding Unexplored Locations
I have some locations I haven't looked at in my city.
-->
<h3 id="Beyond_the_Bandit"><a class="hanchor" href="#Beyond_the_Bandit">Beyond the Bandit</a></h3>
<p>And if I wanted to be really fancy, I could use a 2-dimensional
<a href="https://en.wikipedia.org/wiki/Gaussian_Process">Gaussian process</a>, in
<a href="https://en.wikipedia.org/wiki/Kriging">kriging</a> fashion, to interpolate
geographical data and find the best daygame locations that way. <em>Probably</em>
overkill.</p>
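<p>To give a rough idea of what that would involve, here is a toy sketch of
the posterior mean of a Gaussian process with a squared-exponential kernel
(simple kriging around the average rate). Everything in it is an assumption
for illustration: the coordinates and rates are invented, the length scale and
noise level are arbitrary rather than fitted, and the names <code>rbf</code>
and <code>gp_mean</code> are hypothetical.</p>
<pre><code>using LinearAlgebra

# Hypothetical data: 2-D coordinates (in km) of four locations and their success rates.
coords = [0.0 0.0; 1.0 0.0; 0.0 1.0; 2.0 2.0]
rates  = [0.20, 0.11, 0.30, 0.08]

# Squared-exponential kernel; the length scale is an arbitrary assumption.
rbf(a, b; scale=1.0) = exp(-sum(abs2, a .- b) / (2 * scale^2))

# GP posterior mean at a new point, with the average rate as the constant prior mean.
function gp_mean(coords, rates, newpoint; noise=0.01)
    n = size(coords, 1)
    K = [rbf(coords[i, :], coords[j, :]) for i in 1:n, j in 1:n] + noise * I
    k = [rbf(coords[i, :], newpoint) for i in 1:n]
    m = sum(rates) / n
    return m + dot(k, K \ (rates .- m))
end

println(gp_mean(coords, rates, [0.5, 0.5]))
</code></pre>
<p>Far away from every known location the prediction falls back to the average
rate. A real version would also need the posterior variance, so that
unexplored areas could be sampled Thompson-style instead of just ranked by
their mean.</p>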
<!--TODO: Fatebook predictions-->
<h3 id="See_Also"><a class="hanchor" href="#See_Also">See Also</a></h3>
<ul>
<li><a href="https://lilianweng.github.io/posts/2018-01-23-multi-armed-bandit/">The Multi-Armed Bandit Problem and Its Solutions (Lilian Weng, 2018)</a></li>
<li><a href="https://sebastiancallh.github.io/post/multi-armed-bandit-and-penguins/">A penguin fish-recommender systems using multi-armed bandit pt. 1 (Sebastian Callh, 2020)</a></li>
</ul>
</body></html>