<html><head><title>niplav</title>
<link href="./favicon.png" rel="shortcut icon" type="image/png"/>
<link href="main.css" rel="stylesheet" type="text/css"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<!DOCTYPE HTML>
<style type="text/css">
code.has-jax {font: inherit; font-size: 100%; background: inherit; border: inherit;}
</style>
<script async="" src="./mathjax/latest.js?config=TeX-MML-AM_CHTML" type="text/javascript">
</script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
extensions: ["tex2jax.js"],
jax: ["input/TeX", "output/HTML-CSS"],
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
processEscapes: true,
skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
},
"HTML-CSS": { availableFonts: ["TeX"] }
});
</script>
<script>
document.addEventListener('DOMContentLoaded', function () {
// Change the title to the h1 header
var title = document.querySelector('h1')
if(title) {
var title_elem = document.querySelector('title')
title_elem.textContent=title.textContent + " – niplav"
}
});
</script>
</head><body><h2 id="home"><a href="./index.html">home</a></h2>
<p><em>author: niplav, created: 2024-04-22, modified: 2024-11-04, language: english, status: in progress, importance: 5, confidence: highly likely</em></p>
<blockquote>
<p><strong>In which to compare how similarly programs compute their outputs,
<a href="#A_Nave_Formula">naïvely</a> and <a href="#A_Less_Nave_Formula">less naïvely</a>.</strong></p>
</blockquote><div class="toc"><div class="toc-title">Contents</div><ul><li><a href="#A_Nave_Formula">A Naïve Formula</a><ul><li><a href="#Setup">Setup</a><ul></ul></li><li><a href="#Possible_Desiderata">Possible Desiderata</a><ul></ul></li><li><a href="#Formal_Definition">Formal Definition</a><ul><li><a href="#Explanation">Explanation</a><ul></ul></li><li><a href="#Different_Trace_Lengths">Different Trace Lengths</a><ul></ul></li></ul></li><li><a href="#Desiderata_Fulfilled">Desiderata Fulfilled?</a><ul><li><a href="#None">Proving $合(p, p)=0$</a><ul></ul></li><li><a href="#Proving_Positivity">Proving Positivity</a><ul><li><a href="#Only_A_Pseudometric">Only A Pseudometric</a><ul></ul></li></ul></li></ul></li></ul></li><li><a href="#A_Less_Nave_Formula">A Less Naïve Formula</a><ul></ul></li><li><a href="#See_Also">See Also</a><ul></ul></li></ul></div>
<h1 id="Logical_Correlation"><a class="hanchor" href="#Logical_Correlation">Logical Correlation</a></h1>
<p>Attention conservation notice: Premature formalization,
<a href="https://en.wiktionary.org/wiki/ab-">ab-</a><a href="https://www.lesswrong.com/posts/GhFoAxG49RXFzze5Y/what-s-so-bad-about-ad-hoc-mathematical-definitions">hoc mathematical
definition</a>.</p>
<p>In the <a href="https://www.lesswrong.com/tag/psychological-twin-prisoner-s-dilemma">twin prisoner's
dilemma</a>,
I cooperate with my twin because we're implementing the same algorithm. If
we modify the twin slightly, for example to have a slightly longer right
index fingernail, I would still cooperate, even though we're now different
algorithms, since little enough has been changed about our algorithms
that the internal states and the output are basically the same.</p>
<p>It could be that I'm in a prisoner's dilemma with some program
<code>$p^{\star}$</code> that, given some inputs, returns the same outputs as I do,
but for completely different "reasons"—that is, the internal states
are very different, and a slight change in input would cause the output
to be radically different. Intuitively, my similarity to <code>$p^{\star}$</code>
is pretty small, because even though it gives the same output, it gives
that output for very different reasons, so I don't have much control
over its outputs by controlling my own computations.</p>
<p>Let's call this similarity of two algorithms the
<strong>logical correlation</strong> between the two algorithms (<a href="https://forum.effectivealtruism.org/posts/JGazpLa3Gvvter4JW/cooperating-with-aliens-and-agis-an-ecl-explainer#fnref5yum06b0ld8">alternative
terms</a>
"include “logical influence,” “logical
correlation,” “correlation,” “quasi-causation,”
“metacausation,” […] “entanglement”[,] “acausal
influence”"). I take this term from <a href="./cs/ai/alignment/agent_foundations/embedded_agency_demski_garrabrant_2020.pdf">Demski & Garrabrant
2020</a>:</p>
<blockquote>
<p>One idea is that exact copies should be treated as 100% under
your “logical control”. For approximate models of you, or merely
similar agents, control should drop off sharply as logical correlation
decreases. But how does this work?</p>
</blockquote>
<p><em>—Abram Demski & Scott Garrabrant, <a href="./cs/ai/alignment/agent_foundations/embedded_agency_demski_garrabrant_2020.pdf">“Embedded Agency”</a> p. 12, 2020</em></p>
<p>Similarly:</p>
<blockquote>
<p>The reasoning behind cooperation does not involve a common cause of
all collaborators' decisions. Instead, the correlation may be viewed
as logical (Garrabrant et al., 2016): if I cooperate, then this implies
that all other implementations of my decision algorithm also cooperate.</p>
</blockquote>
<p><em>—Caspar Oesterheld, “Multiverse-wide Cooperation via Correlated Decision Making” p. 18, 2018</em></p>
<p>There isn't yet an established way of estimating the logical correlation
between different decision algorithms.</p>
<h2 id="A_Nave_Formula"><a class="hanchor" href="#A_Nave_Formula">A Naïve Formula</a></h2>
<p>Thus: Consider some naïve formula (which we'll designate by
<code>$合$</code><sup id="fnref1"><a href="#fn1" rel="footnote">1</a></sup>) for logical correlation<sup id="fnref2"><a href="#fn2" rel="footnote">2</a></sup>: something that takes in two
programs and returns a number that quantifies how similarly the two
programs compute what they compute.</p>
<h3 id="Setup"><a class="hanchor" href="#Setup">Setup</a></h3>
<p>Let a program <code>$p$</code> be a tuple of code for a <a href="https://en.wikipedia.org/wiki/Turing_Machine">Turing
machine</a> and intermediate
tape states after each command execution. We'll treat the final tape
state as the output, all in binary.</p>
<p>That is <code>$p=(c, t)$</code>, with <code>$c \in \{0, 1\}^+$</code> and <code>$t \in (\{0,
1\}^+)^+$</code>. Let <code>$l=|t|$</code> be the number of steps that <code>$p$</code> takes to halt.</p>
<p>For simplicity's sake, let's give <code>$t_l$</code> (the tape state upon halting)
the name <code>$o$</code>, the output.</p>
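<p>As a concrete reference point, here's how one might represent such a program in code, a minimal sketch in Python (the class and field names are made up for illustration, and tape states are stored as binary strings):</p>
<pre><code>from dataclasses import dataclass

@dataclass
class Program:
    """A program p = (c, t): Turing machine code plus the trace of tape
    states after each command execution; the last tape state is the output o."""
    code: str          # c, the machine's code as a binary string
    trace: list        # t, the tape states (binary strings) after each step

    @property
    def output(self):
        return self.trace[-1]    # o = t_l, the tape state upon halting

    @property
    def steps(self):
        return len(self.trace)   # l = |t|, the number of steps until halting
</code></pre>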
<h3 id="Possible_Desiderata"><a class="hanchor" href="#Possible_Desiderata">Possible Desiderata</a></h3>
<ol>
<li>If possible, we would want our formula for logical correlation to be a <a href="https://en.wikipedia.org/wiki/Metric_space">metric</a> or a <a href="https://en.wikipedia.org/wiki/Pseudometric_space">pseudometric</a> on the space of programs:
<ol>
<li><code>$合(p, p)=0$</code>.</li>
<li><a href="https://en.wikipedia.org/wiki/Symmetric_function">Symmetry</a>: <code>$合(p_1, p_2)=合(p_2, p_1)$</code>.</li>
<li>If <code>$p_1 \not=p_2$</code>, then <code>$合(p_1, p_2)>0$</code>. This condition is dropped if we're fine with <code>$合$</code> being a pseudometric.</li>
<li>The <a href="https://en.wikipedia.org/wiki/Triangle_Inequality">triangle inequality</a>: <code>$合(p_1, p_3) \le 合(p_1, p_2)+合(p_2, p_3)$</code>.</li>
</ol></li>
<li>If <code>$p_1$</code> and <code>$p_2$</code> have very similar outputs, and <code>$p_3$</code> has a very different output, then <code>$合(p_1, p_2)<合(p_1, p_3)$</code> (and <code>$合(p_1, p_2)<合(p_2, p_3)$</code>).
<ol>
<li>I'm not <em>so sure</em> about this one: Let's say there's <code>$p$</code>, which outputs a binary string <code>$o \in \{0, 1\}^+$</code>, and <code>$p^{\not \sim}$</code>, which computes <code>$o$</code> in a completely different way, as well as <code>$p^{\lnot}$</code>, which first runs <code>$p$</code>, and then flips every bit on the tape, finally returning the negation of <code>$o$</code>. In this case, it seems that if <code>$p$</code> is a decision algorithm, it has far more "control" over the output of <code>$p^{\lnot}$</code> than over <code>$p^{\not \sim}$</code>.</li>
<li>For the time being, I'm going to accept this, though ideally there'd be some way of handling the tradeoff between "computed the same output in a different way" and "computed a different output in a similar way".</li>
</ol></li>
</ol>
<h3 id="Formal_Definition"><a class="hanchor" href="#Formal_Definition">Formal Definition</a></h3>
<p>Let <code>$p_1=(c_1, t_1), p_2=(c_2, t_2)$</code> be two halting programs,
<code>$l_1, l_2$</code> are the number of steps it takes <code>$p_1, p_2$</code> to halt,
and <code>$o_1=t_{l_1}, o_2=t_{l_2}$</code> the last tape states (outputs) of the
two programs.</p>
<p>Then a formula for the logical correlation <code>$合$</code> of <code>$p_1, p_2$</code>,
a tape-state discount factor <code>$γ$</code><sup id="fnref3"><a href="#fn3" rel="footnote">3</a></sup>, and a <a href="https://en.wikipedia.org/wiki/String_similarity_metric">string-distance
metric</a> <code>$d:
\{0, 1\}^+ \times \{0, 1\}^+ \rightarrow ℕ$</code> could be</p>
<div>
$$合(p_1, p_2, γ)=d(o_1, o_2)+0.5-\frac{1}{2+\sum_{k=1}^{\min(l_1, l_2)} γ^k \cdot d(t_1(l_1-k), t_2(l_2-k))}$$
</div>
<p>The lower <code>$合$</code>, the higher the logical correlation between <code>$p_1$</code>
and <code>$p_2$</code>.</p>
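<p>A minimal sketch of this formula in Python, building on the <code>Program</code> sketch from the Setup section. Two assumptions that aren't fixed by the text: <code>$d$</code> is taken to be the Hamming distance (padding the shorter state with zeros), and the <code>$k=\min(l_1, l_2)$</code> term is skipped when it would reference a tape state before <code>$t(1)$</code>, since the setup only defines <code>$t(1), …, t(l)$</code>:</p>
<pre><code>def hamming(a, b):
    """Hamming distance between two binary tape states, padding the
    shorter one with zeros (one simple choice of string metric d)."""
    n = max(len(a), len(b))
    a, b = a.ljust(n, "0"), b.ljust(n, "0")
    return sum(x != y for x, y in zip(a, b))

def logical_correlation(p1, p2, gamma):
    """The naive formula 合(p1, p2, γ): output distance plus one half,
    minus the inverse of the discounted sum of backwards trace distances."""
    l1, l2 = p1.steps, p2.steps
    total = 0.0
    for k in range(1, min(l1, l2) + 1):
        if l1 - k >= 1 and l2 - k >= 1:
            # t_i(l_i - k) is trace[l_i - k - 1] with 0-based indexing
            total += gamma**k * hamming(p1.trace[l1 - k - 1],
                                        p2.trace[l2 - k - 1])
    return hamming(p1.output, p2.output) + 0.5 - 1.0 / (2.0 + total)
</code></pre>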
<h4 id="Explanation"><a class="hanchor" href="#Explanation">Explanation</a></h4>
<p>Let's take a look at the equation again, but this time with some color
highlighting:</p>
<div>
$$合(p_1, p_2, γ)=\color{red}{d(o_1, o_2)}+0.5\color{orange}{-}\frac{1}{2+\color{green}{\sum_{k=1}^{\min(l_1, l_2)}} \color{purple}{γ^k \cdot} \color{blue}{d(t_1(l_1-k), t_2(l_2-k))}}$$
</div>
<p>The fundamental idea is that we first <span style="color:red">compute
the distance of the two outputs</span>. We then go <em>backward</em> through
the traces of the two programs, <span style="color:green">adding up the
pairwise </span> <span style="color:blue">differences of the traces at
each timestep</span>, potentially <span style="color:purple">discounting
the differences the farther they lie in the "past" of the
output, i.e. the further they are towards the start of the computation</span>.</p>
<p><img alt="" src="./img/logical/naive.png"/></p>
<p>Finally, we <span style="color:orange"><em>subtract</em></span> the inverse of
this (discounted) sum of trace differences from the output difference<sup id="fnref4"><a href="#fn4" rel="footnote">4</a></sup>.</p>
<p>The fraction can be at most ½ (since we're
adding a non-negative number to the two in the <a href="./mathematics_notation_convention.html#Basics">lower
number</a> of the fraction). Thus, since
we're subtracting a number ≤½ from <code>$d(o_1, o_2)+0.5$</code>, the resulting
logical correlation must satisfy <code>$d(o_1, o_2)≤合(p_1, p_2, γ)≤d(o_1,
o_2)+0.5$</code>. That implies that for programs with the same output, their
pairwise logical correlations all lie in that range. It also means that
if <code>$d(o_1, o_2)<d(o_1, o_3)$</code>, then <code>$合(p_1, p_2,
γ)<合(p_1, p_3, γ)$</code>: since <code>$d$</code> takes values in <code>$ℕ$</code>,
we have <code>$d(o_1, o_3)≥d(o_1, o_2)+1>d(o_1, o_2)+0.5$</code>, so the two ranges can't overlap.</p>
<p>Or, in even simpler terms: "Output similarity dominates trace similarity."</p>
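<p>A toy illustration of that dominance, using the sketch from above (the traces are made up): two programs with the same output but completely different intermediate tape states still end up closer to each other than to a program that shares its intermediate states with one of them but differs in a single output bit:</p>
<pre><code># Same output "11", very different intermediate tape states:
p1 = Program(code="...", trace=["00", "10", "11"])
p2 = Program(code="...", trace=["11", "01", "11"])
# Same intermediate states as p1, but the output differs in one bit:
p3 = Program(code="...", trace=["00", "10", "10"])

gamma = 0.9
print(logical_correlation(p1, p2, gamma))  # ≈ 0.32, inside [0, 0.5)
print(logical_correlation(p1, p3, gamma))  # = 1.0, at least d(o_1, o_3) = 1
</code></pre>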
<h4 id="Different_Trace_Lengths"><a class="hanchor" href="#Different_Trace_Lengths">Different Trace Lengths</a></h4>
<p>One might also want to be able to deal with the fact that programs have
different trace lengths, and penalize that, e.g. amending the formula:</p>
<div>
$$合'(p_1, p_2, γ)=合(p_1, p_2, γ)+2^{|l_1-l_2|}$$
</div>
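<p>As a small amendment to the sketch above (again just an illustration):</p>
<pre><code>def logical_correlation_prime(p1, p2, gamma):
    """合'(p1, p2, γ): the naive formula plus a penalty of 2^|l1 - l2|
    for differing trace lengths."""
    return logical_correlation(p1, p2, gamma) + 2 ** abs(p1.steps - p2.steps)
</code></pre>
<p>Note that, as written, the penalty term is <code>$2^0=1$</code> even when the trace lengths are equal, so <code>$合'(p, p)=0$</code> no longer holds; if one cares about that, subtracting 1 from the penalty would restore it.</p>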
<h3 id="Desiderata_Fulfilled"><a class="hanchor" href="#Desiderata_Fulfilled">Desiderata Fulfilled?</a></h3>
<p>Does this fulfill our desiderata from earlier? I'll assume that the
string distance <code>$d$</code> is a metric, in the mathematical sense.</p>
<h4 id="None"><a class="hanchor" href="#None">Proving <code>$合(p, p)=0$</code></a></h4>
<p>Proof:</p>
<div>
$$d(o, o)+0.5-\frac{1}{2+\sum_{k=1}^{\min(l, l)} γ^k \cdot d(t(l-k), t(l-k))}= \\
0+0.5-\frac{1}{2+\sum_{k=1}^{l} γ^k \cdot 0}= \\
0.5-\frac{1}{2+0}= \\
0.5-0.5=0$$
</div>
<p>Since <code>$d$</code> is a metric, <code>$d(o, o)=0$</code>.</p>
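<p>The same fact, checked numerically with the sketch from above (any program compared against itself comes out at exactly 0):</p>
<pre><code>p = Program(code="...", trace=["00", "01", "11"])
assert logical_correlation(p, p, 0.9) == 0.0
</code></pre>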
<h4 id="Proving_Positivity"><a class="hanchor" href="#Proving_Positivity">Proving Positivity</a></h4>
<p>The minimal logical correlation is 0.</p>
<div>
$$合(p_1, p_2, γ)≥0 \Leftrightarrow \\
d(o_1, o_2)+0.5-\frac{1}{2+\sum_{k=1}^{\min(l_1, l_2)} γ^k \cdot d(t_1(l_1-k), t_2(l_2-k))}≥0 \Leftrightarrow \\
d(o_1, o_2)+0.5 ≥\frac{1}{2+\sum_{k=1}^{\min(l_1, l_2)} γ^k \cdot d(t_1(l_1-k), t_2(l_2-k))} \Leftrightarrow \\
2 \cdot d(o_1, o_2)+1+d(o_1, o_2) \cdot \sum_{k=1}^{\min(l_1, l_2)} γ^k \cdot d(t_1(l_1-k), t_2(l_2-k))+0.5 \cdot \sum_{k=1}^{\min(l_1, l_2)} γ^k \cdot d(t_1(l_1-k), t_2(l_2-k))≥1 \Leftrightarrow \\
2 \cdot d(o_1, o_2)+1+(d(o_1, o_2)+0.5) \cdot \sum_{k=1}^{\min(l_1, l_2)} γ^k \cdot d(t_1(l_1-k), t_2(l_2-k))≥1 \Leftrightarrow \\
2 \cdot d(o_1, o_2)+(d(o_1, o_2)+0.5) \cdot \sum_{k=1}^{\min(l_1, l_2)} γ^k \cdot d(t_1(l_1-k), t_2(l_2-k))≥0 $$
</div>
<p>This is true, because:</p>
<ol>
<li><code>$d(o_1, o_2)≥0$</code>.</li>
<li><code>$d(t_1(l_1-k), t_2(l_2-k))≥0$</code> for every <code>$k$</code>.</li>
<li><code>$γ^k≥0$</code> for every <code>$k$</code>.</li>
</ol>
<p>Thus we have a sum of products of non-negative quantities, which is in turn
non-negative itself.</p>
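<p>A quick empirical sanity check of non-negativity (and, while we're at it, of symmetry, desideratum 1.2) with the sketch from above, on randomly generated traces:</p>
<pre><code>import random

def random_program(steps, width):
    """A Program with a random tape-state trace (the code field is a dummy)."""
    trace = ["".join(random.choice("01") for _ in range(width))
             for _ in range(steps)]
    return Program(code="...", trace=trace)

for _ in range(1000):
    a, b = random_program(5, 8), random_program(7, 8)
    assert logical_correlation(a, b, 0.9) >= 0.0
    assert logical_correlation(a, b, 0.9) == logical_correlation(b, a, 0.9)
</code></pre>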
<h5 id="Only_A_Pseudometric"><a class="hanchor" href="#Only_A_Pseudometric">Only A Pseudometric</a></h5>
<p>But, unfortunately, it isn't the case that if <code>$p_1≠p_2$</code>, then
<code>$合(p_1, p_2, γ)>0$</code>. Thus <code>$合$</code> is only a pseudometric.</p>
<p>Consider, for example, two programs that both write a <code>$1$</code> to the
starting position on the tape and then halt, but with the difference that
<code>$p_1$</code> moves left and then right in the first two steps, and <code>$p_2$</code>
moves right and then left in the first two steps. Both programs have
the same tape-state trace, but are not "equal" in the strict sense as
they have different source codes.</p>
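<p>In terms of the sketch from above (simplifying the example to a tiny two-step trace, with placeholder codes): two programs with different code but identical tape-state traces get a logical correlation of exactly 0, even though they aren't the same program:</p>
<pre><code>p1 = Program(code="0110", trace=["1", "1"])
p2 = Program(code="1001", trace=["1", "1"])
assert p1 != p2                                   # different source code
assert logical_correlation(p1, p2, 0.9) == 0.0    # yet indistinguishable to 合
</code></pre>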
<p>You might now complain that this is vacuous, since the two programs have
no relevant functional difference. That's true, but I suspect there are some
trickier edge cases here where randomly initialized tapes can have very
different (or in other cases equal) tape-state traces. If you find an
<a href="https://en.wikipedia.org/wiki/Equivalence_class">equivalence class</a>
of programs that are just vacuously different, I'd be interested in
hearing about it.</p>
<!--
### Implementation
TODO: implement in Rust using Brainfuck-->
<h2 id="A_Less_Nave_Formula"><a class="hanchor" href="#A_Less_Nave_Formula">A Less Naïve Formula</a></h2>
<p>I think that <a href="#A_Nave_Formula">the naïve formula</a> is <em>too</em> naïve.
Reasons:</p>
<ol>
<li>Consider a program <code>$p$</code> and a program <code>$p^-$</code> which is just <code>$p$</code> but with the tape reversed (so that whenever <code>$p$</code> makes a step left, <code>$p^-$</code> makes a step right, and vice versa). Intuitively, <code>$p$</code> and <code>$p^-$</code> should have a very high logical correlation, but <code>$合$</code> would tell us that they very much don't.</li>
<li><code>$合$</code> doesn't <em>really</em> make a statement about which states of the program influence which other states, it just compares them.</li>
<li>I'm a bit unhappy that the code doesn't factor into <code>$合$</code>, and ideally one would want to be able to compute the logical correlation without having to run the program.</li>
</ol>
<p>I think one can create a better (though not perfect)
way of determining logical correlations based on <a href="https://en.wikipedia.org/wiki/Shapley_value">Shapley
Values</a> and possible
tape-permutations.</p>
<h2 id="See_Also"><a class="hanchor" href="#See_Also">See Also</a></h2>
<ul>
<li>How does this relate to <a href="https://wiki.c2.com/?DataAndCodeAreTheSameThing">data=code</a>?</li>
<li><a href="https://www.lesswrong.com/posts/Xd9FLs4geRAWxkQPE/writing-causal-models-like-we-write-programs">Writing Causal Models Like We Write Programs (johnswentworth, 2020)</a></li>
</ul>
<!--TODO: check with brainfuck-->
<!--TODO: prove or disprove that this is a metric (or maybe a
pseudometric?—seems like there could the different programs with the
exactly same tape states…)-->
<div class="footnotes">
<hr/>
<ol>
<li id="fn1">
<p>Suggested by GPT-4. Stands for <a href="https://en.wiktionary.org/wiki/%E5%90%88#Definitions">joining, combining, uniting</a>. Also "to suit; to fit", "to have sexual intercourse", "to fight, to have a confrontation with", or "to be equivalent to, to add up". Maybe I could've used one of the <a href="https://en.wikipedia.org/wiki/Ghost_characters">ghost characters</a>. <a href="#fnref1" rev="footnote">↩</a></p>
</li>
<li id="fn2">
<p>Actually not explained in detail anywhere, as far as I can tell. <a href="#fnref2" rev="footnote">↩</a></p>
</li>
<li id="fn3">
<p>Which is needed because tape states close to the output are more important than tape states early on. <a href="#fnref3" rev="footnote">↩</a></p>
</li>
<li id="fn4">
<p>Together with two constants to avoid division by zero and to avoid identical logical correlations for programs with different output distances. <a href="#fnref4" rev="footnote">↩</a></p>
</li>
</ol>
</div>
</body></html>