<html><head><title>niplav</title>
<link href="./favicon.png" rel="shortcut icon" type="image/png"/>
<link href="main.css" rel="stylesheet" type="text/css"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<!DOCTYPE HTML>
<style type="text/css">
code.has-jax {font: inherit; font-size: 100%; background: inherit; border: inherit;}
</style>
<script async="" src="./mathjax/latest.js?config=TeX-MML-AM_CHTML" type="text/javascript">
</script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
extensions: ["tex2jax.js"],
jax: ["input/TeX", "output/HTML-CSS"],
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
processEscapes: true,
skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
},
"HTML-CSS": { availableFonts: ["TeX"] }
});
</script>
<script>
document.addEventListener('DOMContentLoaded', function () {
// Change the title to the h1 header
var title = document.querySelector('h1')
if(title) {
var title_elem = document.querySelector('title')
title_elem.textContent=title.textContent + " – niplav"
}
});
</script>
</head><body><h2 id="home"><a href="./index.html">home</a></h2>
<p><em>author: niplav, created: 2024-04-22, modified: 2024-11-04, language: english, status: in progress, importance: 5, confidence: highly likely</em></p>
<blockquote>
<p><strong>In which to compare how similarly programs compute their outputs,
<a href="#A_Nave_Formula">naïvely</a> and <a href="#A_Less_Nave_Formula">less naïvely</a>.</strong></p>
</blockquote><div class="toc"><div class="toc-title">Contents</div><ul><li><a href="#A_Nave_Formula">A Naïve Formula</a><ul><li><a href="#Setup">Setup</a><ul></ul></li><li><a href="#Possible_Desiderata">Possible Desiderata</a><ul></ul></li><li><a href="#Formal_Definition">Formal Definition</a><ul><li><a href="#Explanation">Explanation</a><ul></ul></li><li><a href="#Different_Trace_Lengths">Different Trace Lengths</a><ul></ul></li></ul></li><li><a href="#Desiderata_Fulfilled">Desiderata Fulfilled?</a><ul><li><a href="#None">Proving $合(p, p)=0$</a><ul></ul></li><li><a href="#Proving_Positivity">Proving Positivity</a><ul><li><a href="#Only_A_Pseudometric">Only A Pseudometric</a><ul></ul></li></ul></li></ul></li></ul></li><li><a href="#A_Less_Nave_Formula">A Less Naïve Formula</a><ul></ul></li><li><a href="#See_Also">See Also</a><ul></ul></li></ul></div>
<h1 id="Logical_Correlation"><a class="hanchor" href="#Logical_Correlation">Logical Correlation</a></h1>
<p>Attention conservation notice: Premature formalization,
<a href="https://en.wiktionary.org/wiki/ab-">ab-</a><a href="https://www.lesswrong.com/posts/GhFoAxG49RXFzze5Y/what-s-so-bad-about-ad-hoc-mathematical-definitions">hoc mathematical
definition</a>.</p>
<p>In the <a href="https://www.lesswrong.com/tag/psychological-twin-prisoner-s-dilemma">twin prisoner's
dilemma</a>,
I cooperate with my twin because we're implementing the same algorithm. If
we modify the twin slightly, for example to have a slightly longer right
index fingernail, I would still cooperate, even though we're now different
algorithms, since little enough has been changed about our algorithms
that the internal states and the output are basically the same.</p>
<p>It could be that I'm in a prisoner's dilemma with some program
<code>$p^{\star}$</code> that, given some inputs, returns the same outputs as I do,
but for completely different "reasons"—that is, the internal states
are very different, and a slight change in input would cause the output
to be radically different. Intuitively, my similarity to <code>$p^{\star}$</code>
is pretty small, because even though it gives the same output, it gives
that output for very different reasons, so I don't have much control
over its outputs by controlling my own computations.</p>
<p>Let's call this similarity of two algorithms the
<strong>logical correlation</strong> between the two algorithms (<a href="https://forum.effectivealtruism.org/posts/JGazpLa3Gvvter4JW/cooperating-with-aliens-and-agis-an-ecl-explainer#fnref5yum06b0ld8">alternative
terms</a>
"include “logical influence,” “logical
correlation,” “correlation,” “quasi-causation,”
“metacausation,” […] “entanglement”[,] “acausal
influence”"). I take this term from <a href="./cs/ai/alignment/agent_foundations/embedded_agency_demski_garrabrant_2020.pdf">Demski & Garrabrant
2020</a>:</p>
<blockquote>
<p>One idea is that exact copies should be treated as 100% under
your “logical control”. For approximate models of you, or merely
similar agents, control should drop off sharply as logical correlation
decreases. But how does this work?</p>
</blockquote>
<p><em>—Abram Demski & Scott Garrabrant, <a href="./cs/ai/alignment/agent_foundations/embedded_agency_demski_garrabrant_2020.pdf">“Embedded Agency”</a> p. 12, 2020</em></p>
<p>Similarly:</p>
<blockquote>
<p>The reasoning behind cooperation does not involve a common cause of
all collaborators' decisions. Instead, the correlation may be viewed
as logical (Garrabrant et al., 2016): if I cooperate, then this implies
that all other implementations of my decision algorithm also cooperate.</p>
</blockquote>
<p><em>—Caspar Oesterheld, “Multiverse-wide Cooperation via Correlated Decision Making” p. 18, 2018</em></p>
<p>There isn't yet an established way of estimating the logical correlation
between different decision algorithms.</p>
<h2 id="A_Nave_Formula"><a class="hanchor" href="#A_Nave_Formula">A Naïve Formula</a></h2>
<p>Thus: Consider some naïve formula (which we'll designate by
<code>$合$</code><sup id="fnref1"><a href="#fn1" rel="footnote">1</a></sup>) for logical correlation<sup id="fnref2"><a href="#fn2" rel="footnote">2</a></sup>: something that takes in two
programs and returns a number that quantifies how similarly the two
programs compute what they compute.</p>
<h3 id="Setup"><a class="hanchor" href="#Setup">Setup</a></h3>
<p>Let a program <code>$p$</code> be a tuple of code for a <a href="https://en.wikipedia.org/wiki/Turing_Machine">Turing
machine</a> and intermediate
tape states after each command execution. We'll treat the final tape
state as the output, all in binary.</p>
<p>That is <code>$p=(c, t)$</code>, with <code>$c \in \{0, 1\}^+$</code> and <code>$t \in (\{0,
1\}^+)^+$</code>. Let <code>$l=|t|$</code> be the number of steps that <code>$p$</code> takes to halt.</p>
<p>For simplicity's sake, let's give <code>$t_l$</code> (the tape state upon halting)
the name <code>$o$</code>, the output.</p>
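<p>As a concrete reference point, here's how one might represent such a program in code, a minimal sketch in Python (the class and field names are made up for illustration, and tape states are stored as binary strings):</p>
<pre><code>from dataclasses import dataclass

@dataclass
class Program:
    """A program p = (c, t): Turing machine code plus the trace of tape
    states after each command execution; the last tape state is the output o."""
    code: str          # c, the machine's code as a binary string
    trace: list        # t, the tape states (binary strings) after each step

    @property
    def output(self):
        return self.trace[-1]    # o = t_l, the tape state upon halting

    @property
    def steps(self):
        return len(self.trace)   # l = |t|, the number of steps until halting
</code></pre>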
<h3 id="Possible_Desiderata"><a class="hanchor" href="#Possible_Desiderata">Possible Desiderata</a></h3>
<ol>
<li>If possible, we would want our formula for logical correlation to be a <a href="https://en.wikipedia.org/wiki/Metric_space">metric</a> or a <a href="https://en.wikipedia.org/wiki/Pseudometric_space">pseudometric</a> on the space of programs:
<ol>
<li><code>$合(p, p)=0$</code>.</li>
<li><a href="https://en.wikipedia.org/wiki/Symmetric_function">Symmetry</a>: <code>$合(p_1, p_2)=合(p_2, p_1)$</code>.</li>
<li>If <code>$p_1 \not=p_2$</code>, then <code>$合(p_1, p_2)>0$</code>. This condition is dropped if we're fine with <code>$合$</code> being a pseudometric.</li>
<li>The <a href="https://en.wikipedia.org/wiki/Triangle_Inequality">triangle inequality</a>: <code>$合(p_1, p_3) \le 合(p_1, p_2)+合(p_2, p_3)$</code>.</li>
</ol></li>
<li>If <code>$p_1$</code> and <code>$p_2$</code> have very similar outputs, and <code>$p_3$</code> has a very different output, then <code>$合(p_1, p_2)<合(p_1, p_3)$</code> (and <code>$合(p_1, p_2)<合(p_2, p_3)$</code>).
<ol>
<li>I'm not <em>so sure</em> about this one: Let's say there's <code>$p$</code>, which outputs a binary string <code>$o \in \{0, 1\}^+$</code>, and <code>$p^{\not \sim}$</code>, which computes <code>$o$</code> in a completely different way, as well as <code>$p^{\lnot}$</code>, which first runs <code>$p$</code>, and then flips every bit on the tape, finally returning the negation of <code>$o$</code>. In this case, it seems that if <code>$p$</code> is a decision algorithm, it has far more "control" over the output of <code>$p^{\lnot}$</code> than over <code>$p^{\not \sim}$</code>.</li>
<li>For the time being, I'm going to accept this, though ideally there'd be some way of handling the tradeoff between "computed the same output in a different way" and "computed a different output in a similar way".</li>
</ol></li>
</ol>
<h3 id="Formal_Definition"><a class="hanchor" href="#Formal_Definition">Formal Definition</a></h3>
<p>Let <code>$p_1=(c_1, t_1), p_2=(c_2, t_2)$</code> be two halting programs,
<code>$l_1, l_2$</code> are the number of steps it takes <code>$p_1, p_2$</code> to halt,
and <code>$o_1=t_{l_1}, o_2=t_{l_2}$</code> the last tape states (outputs) of the
two programs.</p>
<p>Then a formula for the logical correlation <code>$合$</code> of <code>$p_1, p_2$</code>,
a tape-state discount factor <code>$γ$</code><sup id="fnref3"><a href="#fn3" rel="footnote">3</a></sup>, and a <a href="https://en.wikipedia.org/wiki/String_similarity_metric">string-distance
metric</a> <code>$d:
\{0, 1\}^+ \times \{0, 1\}^+ \rightarrow ℕ$</code> could be</p>
<div>
$$合(p_1, p_2, γ)=d(o_1, o_2)+0.5-\frac{1}{2+\sum_{k=1}^{\min(l_1, l_2)} γ^k \cdot d(t_1(l_1-k), t_2(l_2-k))}$$
</div>
<p>The lower <code>$合$</code>, the higher the logical correlation between <code>$p_1$</code>
and <code>$p_2$</code>.</p>
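<p>A minimal sketch of this formula in Python, building on the <code>Program</code> sketch from the Setup section. Two assumptions that aren't fixed by the text: <code>$d$</code> is taken to be the Hamming distance (padding the shorter state with zeros), and the <code>$k=\min(l_1, l_2)$</code> term is skipped when it would reference a tape state before <code>$t(1)$</code>, since the setup only defines <code>$t(1), …, t(l)$</code>:</p>
<pre><code>def hamming(a, b):
    """Hamming distance between two binary tape states, padding the
    shorter one with zeros (one simple choice of string metric d)."""
    n = max(len(a), len(b))
    a, b = a.ljust(n, "0"), b.ljust(n, "0")
    return sum(x != y for x, y in zip(a, b))

def logical_correlation(p1, p2, gamma):
    """The naive formula 合(p1, p2, γ): output distance plus one half,
    minus the inverse of the discounted sum of backwards trace distances."""
    l1, l2 = p1.steps, p2.steps
    total = 0.0
    for k in range(1, min(l1, l2) + 1):
        if l1 - k >= 1 and l2 - k >= 1:
            # t_i(l_i - k) is trace[l_i - k - 1] with 0-based indexing
            total += gamma**k * hamming(p1.trace[l1 - k - 1],
                                        p2.trace[l2 - k - 1])
    return hamming(p1.output, p2.output) + 0.5 - 1.0 / (2.0 + total)
</code></pre>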
<h4 id="Explanation"><a class="hanchor" href="#Explanation">Explanation</a></h4>
<p>Let's take a look at the equation again, but this time with some color
highlighting:</p>
<div>
$$合(p_1, p_2, γ)=\color{red}{d(o_1, o_2)}+0.5\color{orange}{-}\frac{1}{2+\color{green}{\sum_{k=1}^{\min(l_1, l_2)}} \color{purple}{γ^k \cdot} \color{blue}{d(t_1(l_1-k), t_2(l_2-k))}}$$
</div>
<p>The fundamental idea is that we first <span style="color:red">compute
the distance of the two outputs</span>. We then go <em>backward</em> through
the traces of the two programs, <span style="color:green">adding up the
pairwise </span> <span style="color:blue">differences of the traces at
each timestep</span>, potentially <span style="color:purple">discounting
the differences the farther they lie in the "past" of the
output, i.e. the further they are towards the start of the computation</span>.</p>
<p><img alt="" src="./img/logical/naive.png"/></p>
<p>Finally, we <span style="color:orange"><em>subtract</em></span> the inverse of
this (discounted) sum of trace differences from the output difference<sup id="fnref4"><a href="#fn4" rel="footnote">4</a></sup>.</p>
<p>The fraction can be at most ½ (since we're
adding a non-negative number to the two in the <a href="./mathematics_notation_convention.html#Basics">lower
number</a> of the fraction). Thus, since
we're subtracting a number ≤½ from <code>$d(o_1, o_2)+0.5$</code>, the resulting
logical correlation must satisfy <code>$d(o_1, o_2)≤合(p_1, p_2, γ)≤d(o_1,
o_2)+0.5$</code>. That implies that for programs with the same output, their
pairwise logical correlations all lie in that range. It also means that
if <code>$d(o_1, o_2)<d(o_1, o_3)$</code>, then <code>$合(p_1, p_2,
γ)<合(p_1, p_3, γ)$</code>: since <code>$d$</code> takes values in <code>$ℕ$</code>,
we have <code>$d(o_1, o_3)≥d(o_1, o_2)+1>d(o_1, o_2)+0.5$</code>, so the two ranges can't overlap.</p>
<p>Or, in even simpler terms: "Output similarity dominates trace similarity."</p>
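<p>A toy illustration of that dominance, using the sketch from above (the traces are made up): two programs with the same output but completely different intermediate tape states still end up closer to each other than to a program that shares its intermediate states with one of them but differs in a single output bit:</p>
<pre><code># Same output "11", very different intermediate tape states:
p1 = Program(code="...", trace=["00", "10", "11"])
p2 = Program(code="...", trace=["11", "01", "11"])
# Same intermediate states as p1, but the output differs in one bit:
p3 = Program(code="...", trace=["00", "10", "10"])

gamma = 0.9
print(logical_correlation(p1, p2, gamma))  # ≈ 0.32, inside [0, 0.5)
print(logical_correlation(p1, p3, gamma))  # = 1.0, at least d(o_1, o_3) = 1
</code></pre>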
<h4 id="Different_Trace_Lengths"><a class="hanchor" href="#Different_Trace_Lengths">Different Trace Lengths</a></h4>
<p>One might also want to be able to deal with the fact that programs have
different trace lengths, and penalize that, e.g. amending the formula:</p>
<div>
$$合'(p_1, p_2, γ)=合(p_1, p_2, γ)+2^{|l_1-l_2|}$$
</div>
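<p>As a small amendment to the sketch above (again just an illustration):</p>
<pre><code>def logical_correlation_prime(p1, p2, gamma):
    """合'(p1, p2, γ): the naive formula plus a penalty of 2^|l1 - l2|
    for differing trace lengths."""
    return logical_correlation(p1, p2, gamma) + 2 ** abs(p1.steps - p2.steps)
</code></pre>
<p>Note that, as written, the penalty term is <code>$2^0=1$</code> even when the trace lengths are equal, so <code>$合'(p, p)=0$</code> no longer holds; if one cares about that, subtracting 1 from the penalty would restore it.</p>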
<h3 id="Desiderata_Fulfilled"><a class="hanchor" href="#Desiderata_Fulfilled">Desiderata Fulfilled?</a></h3>
<p>Does this fulfill our desiderata from earlier? I'll assume that the
string distance <code>$d$</code> is a metric, in the mathematical sense.</p>
<h4 id="None"><a class="hanchor" href="#None">Proving <code>$合(p, p)=0$</code></a></h4>
<p>Proof:</p>
<div>
$$d(o, o)+0.5-\frac{1}{2+\sum_{k=1}^{\min(l, l)} γ^k \cdot d(t(l-k), t(l-k))}= \\
0+0.5-\frac{1}{2+\sum_{k=1}^{l} γ^k \cdot 0}= \\
0.5-\frac{1}{2+0}= \\
0.5-0.5=0$$
</div>
<p>Since <code>$d$</code> is a metric, <code>$d(o, o)=0$</code>.</p>
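<p>The same fact, checked numerically with the sketch from above (any program compared against itself comes out at exactly 0):</p>
<pre><code>p = Program(code="...", trace=["00", "01", "11"])
assert logical_correlation(p, p, 0.9) == 0.0
</code></pre>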
<h4 id="Proving_Positivity"><a class="hanchor" href="#Proving_Positivity">Proving Positivity</a></h4>
<p>The minimal logical correlation is 0.</p>
<div>
$$合(p_1, p_2, γ)≥0 \Leftrightarrow \\
d(o_1, o_2)+0.5-\frac{1}{2+\sum_{k=1}^{\min(l_1, l_2)} γ^k \cdot d(t_1(l_1-k), t_2(l_2-k))}≥0 \Leftrightarrow \\
d(o_1, o_2)+0.5 ≥\frac{1}{2+\sum_{k=1}^{\min(l_1, l_2)} γ^k \cdot d(t_1(l_1-k), t_2(l_2-k))} \Leftrightarrow \\
2 \cdot d(o_1, o_2)+1+d(o_1, o_2) \cdot \sum_{k=1}^{\min(l_1, l_2)} γ^k \cdot d(t_1(l_1-k), t_2(l_2-k))+0.5 \cdot \sum_{k=1}^{\min(l_1, l_2)} γ^k \cdot d(t_1(l_1-k), t_2(l_2-k))≥1 \Leftrightarrow \\
2 \cdot d(o_1, o_2)+1+(d(o_1, o_2)+0.5) \cdot \sum_{k=1}^{\min(l_1, l_2)} γ^k \cdot d(t_1(l_1-k), t_2(l_2-k))≥1 \Leftrightarrow \\
2 \cdot d(o_1, o_2)+(d(o_1, o_2)+0.5) \cdot \sum_{k=1}^{\min(l_1, l_2)} γ^k \cdot d(t_1(l_1-k), t_2(l_2-k))≥0 $$
</div>
<p>This is true, because:</p>
<ol>
<li><code>$d(o_1, o_2)≥0$</code>.</li>
<li><code>$d(t_1(l_1-k), t_2(l_2-k))≥0$</code> for every <code>$k$</code>.</li>
<li><code>$γ^k≥0$</code> for every <code>$k$</code>.</li>
</ol>
<p>Thus we have a sum of products of non-negative quantities, which is in turn
non-negative itself.</p>
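<p>A quick empirical sanity check of non-negativity (and, while we're at it, of symmetry, desideratum 1.2) with the sketch from above, on randomly generated traces:</p>
<pre><code>import random

def random_program(steps, width):
    """A Program with a random tape-state trace (the code field is a dummy)."""
    trace = ["".join(random.choice("01") for _ in range(width))
             for _ in range(steps)]
    return Program(code="...", trace=trace)

for _ in range(1000):
    a, b = random_program(5, 8), random_program(7, 8)
    assert logical_correlation(a, b, 0.9) >= 0.0
    assert logical_correlation(a, b, 0.9) == logical_correlation(b, a, 0.9)
</code></pre>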
<h5 id="Only_A_Pseudometric"><a class="hanchor" href="#Only_A_Pseudometric">Only A Pseudometric</a></h5>
<p>But, unfortunately, it isn't the case that if <code>$p_1≠p_2$</code>, then
<code>$合(p_1, p_2, γ)>0$</code>. Thus <code>$合$</code> is only a pseudometric.</p>
<p>Consider, for example, two programs that both write a <code>$1$</code> to the
starting position on the tape and then halt, but with the difference that
<code>$p_1$</code> moves left and then right in the first two steps, and <code>$p_2$</code>
moves right and then left in the first two steps. Both programs have
the same tape-state trace, but are not "equal" in the strict sense as
they have different source codes.</p>
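<p>In terms of the sketch from above (simplifying the example to a tiny two-step trace, with placeholder codes): two programs with different code but identical tape-state traces get a logical correlation of exactly 0, even though they aren't the same program:</p>
<pre><code>p1 = Program(code="0110", trace=["1", "1"])
p2 = Program(code="1001", trace=["1", "1"])
assert p1 != p2                                   # different source code
assert logical_correlation(p1, p2, 0.9) == 0.0    # yet indistinguishable to 合
</code></pre>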
<p>You might now complain that this is vacuous, since the two programs have
no relevant functional difference. That's true, but I suspect there are some
trickier edge cases here where randomly initialized tapes can have very
different (or in other cases equal) tape-state traces. If you find an
<a href="https://en.wikipedia.org/wiki/Equivalence_class">equivalence class</a>
of programs that are just vacuously different, I'd be interested in
hearing about it.</p>
<!--
### Implementation
TODO: implement in Rust using Brainfuck-->
<h2 id="A_Less_Nave_Formula"><a class="hanchor" href="#A_Less_Nave_Formula">A Less Naïve Formula</a></h2>
<p>I think that <a href="#A_Nave_Formula">the naïve formula</a> is <em>too</em> naïve.
Reasons:</p>
<ol>
<li>Consider a program <code>$p$</code> and a program <code>$p^-$</code> which is just <code>$p$</code> but with the tape reversed (so that whenever <code>$p$</code> makes a step left, <code>$p^-$</code> makes a step right, and vice versa). Intuitively, <code>$p$</code> and <code>$p^-$</code> should have a very high logical correlation, but <code>$合$</code> would tell us that they very much don't.</li>
<li><code>$合$</code> doesn't <em>really</em> make a statement about which states of the program influence which other states, it just compares them.</li>
<li>I'm a bit unhappy that the code doesn't factor into <code>$合$</code>, and ideally one would want to be able to compute the logical correlation without having to run the program.</li>
</ol>
<p>I think one can create a better (though not perfect)
way of determining logical correlations based on <a href="https://en.wikipedia.org/wiki/Shapley_value">Shapley
Values</a> and possible
tape-permutations.</p>
<h2 id="See_Also"><a class="hanchor" href="#See_Also">See Also</a></h2>
<ul>
<li>How does this relate to <a href="https://wiki.c2.com/?DataAndCodeAreTheSameThing">data=code</a>?</li>
<li><a href="https://www.lesswrong.com/posts/Xd9FLs4geRAWxkQPE/writing-causal-models-like-we-write-programs">Writing Causal Models Like We Write Programs (johnswentworth, 2020)</a></li>
</ul>
<!--TODO: check with brainfuck-->
<!--TODO: prove or disprove that this is a metric (or maybe a
pseudometric?—seems like there could the different programs with the
exactly same tape states…)-->
<div class="footnotes">
<hr/>
<ol>
<li id="fn1">
<p>Suggested by GPT-4. Stands for <a href="https://en.wiktionary.org/wiki/%E5%90%88#Definitions">joining, combining, uniting</a>. Also "to suit; to fit", "to have sexual intercourse", "to fight, to have a confrontation with", or "to be equivalent to, to add up". Maybe I could've used one of the <a href="https://en.wikipedia.org/wiki/Ghost_characters">ghost characters</a>. <a href="#fnref1" rev="footnote">↩</a></p>
</li>
<li id="fn2">
<p>Actually not explained in detail anywhere, as far as I can tell. <a href="#fnref2" rev="footnote">↩</a></p>
</li>
<li id="fn3">
<p>Which is needed because tape states close to the output are more important than tape states early on. <a href="#fnref3" rev="footnote">↩</a></p>
</li>
<li id="fn4">
<p>Together with two constants to avoid division by zero and to avoid identical logical correlations for programs with different output distances. <a href="#fnref4" rev="footnote">↩</a></p>
</li>
</ol>
</div>
</body></html>