<!DOCTYPE html>
<html><head><title>niplav</title>
<link href="./favicon.png" rel="shortcut icon" type="image/png"/>
<link href="main.css" rel="stylesheet" type="text/css"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<style type="text/css">
code.has-jax {font: inherit; font-size: 100%; background: inherit; border: inherit;}
</style>
<script async="" src="./mathjax/latest.js?config=TeX-MML-AM_CHTML" type="text/javascript">
</script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
extensions: ["tex2jax.js"],
jax: ["input/TeX", "output/HTML-CSS"],
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
processEscapes: true,
skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
},
"HTML-CSS": { availableFonts: ["TeX"] }
});
</script>
<script>
document.addEventListener('DOMContentLoaded', function () {
// Change the title to the h1 header
var title = document.querySelector('h1')
if(title) {
var title_elem = document.querySelector('title')
title_elem.textContent=title.textContent + " – niplav"
}
});
</script>
</head><body><h2 id="home"><a href="./index.html">home</a></h2>
<p><em>author: niplav, created: 2020-07-22, modified: 2024-07-16, language: english, status: maintenance, importance: 4, confidence: possible</em></p>
<blockquote>
<p><strong>In <a href="https://en.wikipedia.org/wiki/AI_control_problem">AI safety</a>,
significant time has been spent on the question of how
the intelligence of AI systems will develop over time, especially during
<a href="https://en.wikipedia.org/wiki/Technological_singularity#Hard_vs._soft_takeoff">takeoff</a>.
An underappreciated argument in the debate has been the idea that the more
intelligent an AI system becomes, the better it can search the space of
possible optimization algorithms. This post proposes a computational model
of this process by creating an n-dimensional search space and then running
a very simple <a href="https://en.wikipedia.org/wiki/Hill_climbing">hill-climbing</a>
algorithm and brute-force search on that space. Possible further
improvements to the model are suggested, and implications are discussed.</strong></p>
</blockquote><div class="toc"><div class="toc-title">Contents</div><ul><li><a href="#Introduction">Introduction</a><ul><li><a href="#Better_Algorithms_Exist">Better Algorithms Exist</a><ul></ul></li><li><a href="#Better_Algorithms_are_Quickly_Reachable">Better Algorithms are Quickly Reachable</a><ul></ul></li></ul></li><li><a href="#The_Argument">The Argument</a><ul></ul></li><li><a href="#The_Model">The Model</a><ul><li><a href="#Generating_the_Search_Space">Generating the Search Space</a><ul></ul></li><li><a href="#Filling_the_Search_Space">Filling the Search Space</a><ul><li><a href="#DiamondSquare">Diamond-Square</a><ul></ul></li></ul></li><li><a href="#Searching_the_Space">Searching the Space</a><ul><li><a href="#Hill_Climbing">Hill Climbing</a><ul></ul></li><li><a href="#Brute_Force_Search">Brute Force Search</a><ul></ul></li></ul></li><li><a href="#External_Intelligence_Improvements">External Intelligence Improvements</a><ul></ul></li></ul></li><li><a href="#Parameters">Parameters</a><ul></ul></li><li><a href="#Results">Results</a><ul></ul></li><li><a href="#Limitations">Limitations</a><ul><li><a href="#Search_Space_Wrong">Search Space Wrong</a><ul></ul></li><li><a href="#Values_for_Intelligence_Wrong">Values for Intelligence Wrong</a><ul></ul></li><li><a href="#BruteForce_Search">Brute-Force Search</a><ul></ul></li><li><a href="#Small_Dataset">Small Dataset</a><ul></ul></li><li><a href="#Small_Search_Spaces">Small Search Spaces</a><ul></ul></li></ul></li><li><a href="#Conclusion">Conclusion</a><ul></ul></li><li><a href="#See_Also">See Also</a><ul></ul></li><li><a href="#Appendix_A_Images_of_All_Runs">Appendix A: Images of All Runs</a><ul></ul></li></ul></div>
<h1 id="On_Discontinuous_and_Fast_Takeoff"><a class="hanchor" href="#On_Discontinuous_and_Fast_Takeoff">On Discontinuous and Fast Takeoff</a></h1>
<blockquote>
<p>Paraphrasing Roache (2008), the state of play is such that nobody
believes the result of a simulation, except the person who performed
the simulation, and everybody believes the result of an experiment,
except the person who ran the experiment.</p>
</blockquote>
<p><em>— Ryan G. McClarren, “Uncertainty Quantification and Predictive Computational Science”, p. 9, 2018</em></p>
<p>(Although the quote apparently goes back to Einstein, see “The
advancement of science, and its burdens”, p. 13, only there with “theory”
instead of “simulation”.)</p>
<blockquote>
<p>And let me just make an aside. There’s a lot of meta-debate that goes
on in the AI safety community which I don’t understand why: it’s not
as if we haven’t got enough real work to do. So now we have meta-debates
about whether you should focus on short-term or long-term, or whether
we should try to reduce the conflict between the short-termers and the
long-termers and it’s like, there doesn’t need to be a conflict.</p>
</blockquote>
<p><em>— <a href="https://en.wikipedia.org/wiki/Stuart_J._Russell">Stuart J. Russell</a>, <a href="https://80000hours.org/podcast/episodes/stuart-russell-human-compatible-ai/#what-most-needs-to-be-done-015014">“The flaws that make today’s AI architecture unsafe and a new approach that could fix it”</a> (Episode 80 of the <a href="https://80000hours.org/podcast/">80,000 hours podcast</a>), 2020</em></p>
<h2 id="Introduction"><a class="hanchor" href="#Introduction">Introduction</a></h2>
<p>Many regard the trajectory of future AI development as
crucial<!--TODO: citation needed-->: when will AGI first be
developed? Will the development be slow, moderate, or fast (in
economic doublings, or in wall-clock time)? Will one AGI system become a
<a href="https://en.wikipedia.org/wiki/Singleton_(global_governance)">singleton</a>,
i.e. come to dominate the whole world (individual vs. collective
takeoff)? Will AGIs <a href="https://intelligence.org/ai-foom-debate/">FOOM</a>, i.e. grow unexpectedly
fast? And: will there be one or more discontinuities when AI systems
recursively self-improve? This text attempts to shed light on the last
question and highlight a (supposedly underappreciated) argument for one or
more discontinuities in AI takeoff, based on a computational model of an AI
searching the space of possible optimization algorithms for stronger ones.</p>
<p>In the model, there are three ways for an AI to improve its
intelligence:</p>
<ul>
<li>By working and investing part of the earnings into more hardware (e.g. from a cloud provider). This should grow roughly exponentially, at a similar speed to the <a href="https://en.wikipedia.org/wiki/Gross_world_product#Recent_growth">Gross World Product</a> (although the model does not consider wall-clock time)</li>
<li>By performing a very simple <a href="https://en.wikipedia.org/wiki/Hill_climbing">hill-climbing</a> algorithm in the space of possible optimization algorithms, such as rewriting parts of its source code in <a href="https://en.wikipedia.org/wiki/Assembly_language">assembly language</a> or making small, trivial performance improvements</li>
<li>By <a href="https://en.wikipedia.org/wiki/Brute-force_search">brute-force searching</a> the space of possible optimization algorithms, possibly in the vicinity of the current algorithm</li>
</ul>
<p>The most fleshed-out model of AI takeoff is <a href="./doc/cs/ai/alignment/takeoff/a_compute_centric_framework_of_takeoff_speeds_davidson_2023.pdf">Davidson
2023</a>,
whose median prediction is a move from 20% automation to 100% automation
in ~3 years (10th percentile: 0.8 years, 90th percentile: 12.5 years).</p>
<p>Along the axes of {fast, slow}×{continuous, discontinuous}, that feels
quite fast to me, even if it isn't very discontinuous.</p>
<p>The reasons below make me move towards "but it might be a lot faster
than that". One is that the Davidson model assumes that the human
brain performs 10¹⁵ FLOP/s, and that AI systems will be at most
as efficient, or slightly less so. So a lot of my disagreement is
around <strong>how far the ceiling of cognitive algorithms is above humans</strong>
(my belief: very high<sub>80%</sub>), and the rest of the disagreement
is <strong>how quickly AI systems can move towards that ceiling</strong> (my belief:
not sure, but potentially within days<sub>40%</sub>).</p>
<h3 id="Better_Algorithms_Exist"><a class="hanchor" href="#Better_Algorithms_Exist">Better Algorithms Exist</a></h3>
<p>One reason is that human brains don't seem like the optimal substrate
for performing cognition: they are warm and wet, have a very low information
transmission speed (signals on neurons are limited to at most 200 m/s, <a href="https://www.lesswrong.com/posts/HhWhaSzQr6xmBki8F/birds-brains-planes-and-ai-against-appeals-to-the-complexity">Kokotajlo
2021</a>),
and need to self-construct and self-repair — and
still brains are incredibly sample-efficient! And I
suspect that, if anything, humans are at best at a <a href="https://www.lesswrong.com/posts/yuP4D4Pz79uyPS9KW">subspace
optimum</a> of cognitive
algorithms.</p>
<p>Then there's the <em>power of error-corrected/discrete/serial computation</em>:
Digital computers can make very long inferences in discrete domains
without problems, and when I introspect, I have the strong intuition
that my system 2 tries to approximate this, especially when trying
to enumerate options in a decision, recursively decompose a plan
into its components (which gets much easier <a href="https://bmk.sh/2020/08/17/Building-AGI-Using-Language-Models/">once you have a world
model</a>),
perform abstraction (while caching which parts of the
abstraction are tight and which are leaky)—but my system 2 only has
<a href="https://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus_or_Minus_Two">7±2</a>
(or maybe <a href="https://en.wikipedia.org/wiki/Working_Memory#Capacity">actually just
4</a>?) usable
slots. And unless the limit is due to <a href="https://en.wikipedia.org/wiki/Combinatorial_explosion">combinatorial
explosion</a>
(which might be handleable by careful pruning, or prioritized search),
AI systems could have larger (perhaps vastly larger?) working memories.</p>
<p>The standard rejoinder here is that evolution has optimized human
brains really really hard, and our current technology is usually 2-6
orders of magnitude worse than what evolution has come up with<!--TODO:
find Christiano investigation into this-->. But if we believe that
error-corrected computation is quite rare in biology, then this opens
up a new niche to make progress in, similar to how there are no plants
in space because they couldn't evolve rocket-like tech and transparent
shells that were resistant enough in vacuum.</p>
<p>This points at an intuition I have: There is a bunch of α left
in combining error-corrected/discrete/serial computation (which
computers are good at) with error-resistant/continuous/parallel
computation (à la neural networks or brains). And especially if
I think about cognition through the lens of algorithms, it feels
like there's a <em>deep mine of algorithms</em>: The space of possible
algorithms is vast, and even in <em>very</em> simple problem domains we have
found surprising innovations (such as going from the <a href="https://en.wikipedia.org/wiki/Karatsuba_algorithm">Karatsuba
algorithm</a>
to the <a href="https://en.wikipedia.org/wiki/Sch%C3%B6nhage-Strassen_algorithm">Schönhage-Strassen
algorithm</a>,
or from the naive algorithm for the <a href="https://en.wikipedia.org/wiki/Maximum_Subarray_problem">maximum subarray
problem</a>
to Kadane's algorithm). My "optimism" here has been dampened
somewhat by evidence on how well <a href="https://www.lesswrong.com/posts/J6gktpSgYoyq5q3Au/benchmarking-an-old-chess-engine-on-new-hardware">old chess algorithms perform on new
hardware</a>,
and by the observation that the surprising algorithms we find are
usually galactic (as in the case of the <a href="https://en.wikipedia.org/wiki/Computational_complexity_of_matrix_multiplication#Matrix_multiplication_exponent">decreasing
rate at which the best-known exponent in the computational complexity of matrix
multiplication shrinks</a>—where
in practice we still only use <a href="https://en.wikipedia.org/wiki/Strassen's_algorithm">Strassen's
algorithm</a>).</p>
<p>Additionally, there are some domains of computation of which we have
made little use, <em>because</em> our minds are limited in a way that makes
it difficult to think about them. As the adage goes, programming is
divided into four levels of difficulty: <code>if</code> statements, <code>while</code>
loops, <a href="https://en.wikipedia.org/wiki/Recursion">recursion</a> and
<a href="https://en.wikipedia.org/wiki/Parallelism_(computing)">parallelism</a>;
but what about domains like <a href="https://en.wikipedia.org/wiki/Self-modifying_code">self-modifying
code</a> (where, except
maybe <a href="https://en.wikipedia.org/wiki/G%C3%B6del_machine">Gödel machines</a>,
there is no respectable theory, and except <a href="https://en.wikipedia.org/wiki/Alexia_Massalin">Alexia
Massalin's</a>
<a href="https://en.wikipedia.org/wiki/Superoptimization">superoptimization</a> there
isn't really any application)? Although, to be fair, <a href="https://en.wikipedia.org/wiki/Neural_architecture_search">neural architecture
search</a> might
be getting there, sometime.</p>
<!--TODO: Additionally, people seem to have *forgotten about thinking*:-->
<p>My view on better algorithms existing is <em>not</em>
informed very much by <a href="https://www.lesswrong.com/posts/hvz9qjWyv8cLX9JJR/evolution-provides-no-evidence-for-the-sharp-left-turn">specific observations about
evolution</a>.</p>
<h3 id="Better_Algorithms_are_Quickly_Reachable"><a class="hanchor" href="#Better_Algorithms_are_Quickly_Reachable">Better Algorithms are Quickly Reachable</a></h3>
<p>As in the section about better algorithms existing, many of my intuitions
here come from algorithm design and/or regular software engineering.</p>
<p>One argument against discontinuous takeoff is a response
to the hypothesis of recursive self-improvement, in
which AI systems start finding improvements to their own
architectures more and more quickly (which I try to model
<a href="./toy_ai_takeoff_model.html">here</a>). The counterargument says that
before there are AI systems that are really good at self-improvement,
there <a href="https://sideways-view.com/2018/02/24/takeoff-speeds/">will be systems that are first crappy and then merely okay at
self-improvement</a>.<!--TODO:
link page that collects examples of these in current ML?--></p>
<p>But usually, with algorithms, having a 99%-finished implementation of the
algorithm doesn't give you 99% of the benefit, nor does it give you 50%
or even 1% of the benefit. It simply doesn't work. And here intuitions
collide: I find it plausible that, in this case, the <a href="https://www.lesswrong.com/posts/xkRtegmqL2iyhtDB3/the-gods-of-straight-lines">Gods of Straight
Lines</a>
do not interfere, and instead something far stranger is afoot; but
the machine-learning intuition tells people that everything in neural
networks is continuous, so why wouldn't there be a continuous path to a
TAI architecture?<!--TODO: link continuity assumption post by Kulveit?--></p>
<h2 id="The_Argument"><a class="hanchor" href="#The_Argument">The Argument</a></h2>
<p>One possible argument has the following premises:</p>
<ul>
<li>There exists a property of algorithms called “<strong>optimization power</strong>”
<ul>
<li>Optimization power describes (among other things) the ability of an algorithm to search a search space quickly and efficiently</li>
</ul></li>
<li>There is a space of algorithms, and each algorithm in that space can be given a number according to its optimization power</li>
<li>The space of algorithms has certain properties
<ul>
<li>There are algorithms that have positive optimization power (>0)</li>
<li>There are algorithms that have vastly more optimization power than others</li>
</ul></li>
</ul>
<p>If these premises are accepted, it is possible, for some search spaces,
to construct an algorithm that attempts to find the best algorithm according
to its optimization power:</p>
<ul>
<li>Start with an algorithm with optimization power >0</li>
<li>Let the algorithm search part of the search space for a better algorithm
<ul>
<li>If it finds one, it modifies itself to become that better algorithm</li>
<li>Else it does nothing</li>
</ul></li>
<li> If it has searched the whole search space (or has proved that no
better algorithm can exist), it stops the search</li>
<li>The algorithm acquires some more resources such as computing power</li>
</ul>
<p>This can be visualized quite nicely when one imagines the search space
to be one-dimensional, with the algorithms (indexed, say, by their <a href="https://en.wikipedia.org/wiki/G%C3%B6del_numbering">Gödel
numbers</a>) on the horizontal axis and their optimization power on the
vertical axis (which is of course not the case in reality):</p>
<p><img alt="A jagged graph with a maximum in the middle, a vertical red line at the minimum at the left, and a small horizontal orange line on top of it, T-shaped" src="./img/toy_ai_takeoff_model/search_step_2.png" title="A jagged graph with a maximum in the middle, a vertical red line at the minimum at the left, and a small horizontal orange line on top of it, T-shaped"/></p>
<p>At the first step in the optimization process, the algorithm is very
weak and can only search a small portion of the space.</p>
<p><img alt="The same graph with the two lines, and a lighter orange vertical line on top of the darker orange one. A wider horizontal yellow line on top of the light orange line" src="./img/toy_ai_takeoff_model/search_step_4.png" title="The same graph with the two lines, and a lighter orange vertical line on top of the darker orange one. A wider horizontal yellow line on top of the light orange line"/></p>
<p>At the second step in the optimization process, the algorithm is slightly
better than before and already defeating a local maximum.</p>
<p><img alt="The same graph with the lines as before, but now an even higher light green vertical, wider dark green line on top, the whole constellation forming a “stair-like” pattern" src="./img/toy_ai_takeoff_model/search_step_6.png" title="The same graph with the lines as before, but now an even higher light green vertical, wider dark green line on top, the whole constellation forming a “stair-like” pattern"/></p>
<p>The algorithm is now much stronger, and exploring a sizeable fraction
of the search space.</p>
<p><img alt="Same image as before, with a dark-green line on top of the highest horizontal one, now reaching the global maximum" src="./img/toy_ai_takeoff_model/search_step_7.png" title="Same image as before, with a dark-green line on top of the highest horizontal one, now reaching the global maximum"/></p>
<p>It has now practically found the global maximum.</p>
<p>The height of the vertical bar indicates the optimization power of
the current optimization process, while the width of the horizontal bar
indicates the portion of the space searched by the current algorithm.
For simplicity, the current algorithm searches in its own vicinity,
which also might be a good heuristic (since similar algorithms might
have similar properties). The width of the horizontal bar increases as
the current optimization algorithm becomes stronger and stronger, which
leads to bigger subspaces being searched and in turn better algorithms
being found.</p>
<p>This argument might fail in many different ways, e.g. if
being more intelligent does not imply being more able to
search the space of optimization algorithms quickly (e.g. by
<a href="https://en.wikipedia.org/wiki/Data_compression">compression</a> and
searching the compressed data).</p>
<p>However, a conceptual argument is not sufficient here. It would be
advantageous to test whether a similarly explosive effect occurs in
higher-dimensional spaces with different characteristics.</p>
<h2 id="The_Model"><a class="hanchor" href="#The_Model">The Model</a></h2>
<p>A computational model can shed some light on whether the argument above
would actually bring about discontinuities in the recursive development
of artificial intelligence, and can also provide more concrete ground
for disagreement (by creating an opportunity for people to modify
the code and show that different search spaces, search algorithms and
external factors generate different conclusions).</p>
<p>On a high level, in
<a href="https://en.wikipedia.org/wiki/Pseudocode">pseudocode</a>, the model executes
code similar to this:</p>
<pre><code>space=gen_space()
fill_space(space)
pos=random_pos()
factor=intelligence=1
for i=0, rounds
print(intelligence)
factor*=1.001
intelligence=max(1, space[pos])*factor
pos=climb(space, pos)
pos=search_subpart(space, pos, intelligence)
</code></pre>
<p>First, the space is generated and filled with values. Then, the AI
repeatedly grows a little bit, does some hill-climbing, and brute-force
searches the neighbouring space.</p>
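<p>Tying the snippets documented below together, a minimal runnable driver
could look like this (a sketch: <code>climb</code> and <code>search_around</code>
are the functions described in the following sections, and the parameter
values are illustrative):</p>
<pre><code>import random

def run(space, size, dim, rounds=2048, growth=1.001):
    # assumes climb(space, pos, size, dim) and
    # search_around(space, pos, size, dim, intelligence) as defined below
    pos=[random.randint(0, size-1) for _ in range(dim)]
    factor=1.0
    trajectory=[]
    for _ in range(rounds):
        factor*=growth                     # external (hardware) growth
        intelligence=max(1, space[tuple(pos)])*factor
        trajectory.append(intelligence)
        pos=climb(space, pos, size, dim)   # cheap local improvement
        pos=search_around(space, pos, size, dim, intelligence)  # brute force
    return trajectory
</code></pre>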
<h3 id="Generating_the_Search_Space"><a class="hanchor" href="#Generating_the_Search_Space">Generating the Search Space</a></h3>
<p>The search space the AI would be searching here is the
<a href="https://en.wikipedia.org/wiki/Space_(mathematics)">space</a>
of all possible algorithms. While I'm not very knowledgeable
about the structure of the space of algorithms, it seems
plausible to me that it would be isomorphic to the perfect <a href="https://en.wikipedia.org/wiki/Binary_tree">binary
tree</a> with infinite depth (for
any given <a href="https://en.wikipedia.org/wiki/Turing_machine">Turing machine</a>
with a binary tape).</p>
<!--TODO At the nth level, the parents describe the preceding bits, and the
node describes whether the nth bit is 0 or 1.-->
<!--Perhaps trinary tree? 0/1/$ for bits and ending-->
<p>However, since I know of no algorithm that would assign
plausible values to nodes in the tree, nor of fitting
search algorithms for it, I decided instead to use a <a href="https://en.wikipedia.org/wiki/Euclidean_space">Euclidean
space</a> to stand in for
the space of all algorithms. Specifically, the metric space was even
further simplified to an n-dimensional array with equal side-lengths:</p>
<pre><code>import numpy as np

dim=5
size=33
space=np.zeros([size]*dim)  # 33**5 ≈ 3.9e7 cells
</code></pre>
<p>However, I might return to more accurate representations of the space
of possible algorithms.</p>
<!--TODO: actually do that-->
<h3 id="Filling_the_Search_Space"><a class="hanchor" href="#Filling_the_Search_Space">Filling the Search Space</a></h3>
<p>The most important decision in this model is how to fill the search space
(that is, what values to give the points in the search space).</p>
<p>Since I am very confused about what a useful approximation of
the search space of intelligent algorithms could look like, I start by
generalizing the Diamond-Square algorithm to n dimensions.</p>
<!--
TODO:
#### Desiderata for the Search Space
* Sparse: Most algorithms don't really do any optimization work at all
* "Spiky": Some optimization algorithms are vastly stronger than other, very similar algorithms
-->
<h4 id="DiamondSquare"><a class="hanchor" href="#DiamondSquare">Diamond-Square</a></h4>
<p>The <a href="https://en.wikipedia.org/wiki/Diamond-square_algorithm">Diamond-Square
algorithm</a>
is a fractal algorithm originally developed for terrain generation.</p>
<p>The generated landscapes often resemble mountain ranges; they show
relatively few abrupt local maxima. An example of a generated landscape:</p>
<p><img alt="Space generated by the algorithm in two dimensions" src="./img/toy_ai_takeoff_model/twodim.png" title="Space generated by the algorithm in two dimensions"/></p>
<p>The side-lengths of the space (height, width, length
etc.) are restricted to <code>$2^{n}+1$</code>.</p>
<pre><code>space=create_space(dim, size, minval, maxval, extrfact)
</code></pre>
<p><code>create_space</code> is described in detail in <a href="./diamond.html">Generalizing the Diamond-Square Algorithm to n Dimensions</a>.</p>
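<p>The full n-dimensional construction is described on the linked page; to
illustrate the family of algorithms, here is a minimal one-dimensional
midpoint-displacement sketch (illustrative only, not the code used for
the runs):</p>
<pre><code>import random

def midpoint_displacement(n, minval=0.0, maxval=256.0, roughness=0.5):
    # 1-D cousin of diamond-square: array of size 2**n+1,
    # halving the interval and damping the random displacement each pass
    size=2**n+1
    heights=[0.0]*size
    heights[0]=random.uniform(minval, maxval)
    heights[-1]=random.uniform(minval, maxval)
    step, disp = size-1, (maxval-minval)/2
    while step>1:
        half=step//2
        for i in range(half, size, step):
            heights[i]=(heights[i-half]+heights[i+half])/2 \
                       + random.uniform(-disp, disp)
        step, disp = half, disp*roughness
    return heights
</code></pre>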
<h3 id="Searching_the_Space"><a class="hanchor" href="#Searching_the_Space">Searching the Space</a></h3>
<p>After generating the search space, it is searched a number of times,
each time increasing the intelligence of the current search process by
a given factor.</p>
<pre><code>for i in range(0, rounds):
    factor*=growth
    intelligence=max(1, space[tuple(pos)])*factor
    f.write(str(space[tuple(pos)]) + "," + str(intelligence) + "\n")
</code></pre>
<p>To avoid algorithms of zero or negative intelligence, the floor of
intelligence is set to 1.</p>
<p>The space is searched in two different ways, starting from a random point:</p>
<pre><code>pos=[random.randint(0, size-1) for i in range(0, dim)]
pos=climb(space, pos, size, dim)
pos=search_around(space, pos, size, dim, intelligence)
</code></pre>
<h4 id="Hill_Climbing"><a class="hanchor" href="#Hill_Climbing">Hill Climbing</a></h4>
<p>First, the algorithm executes a very simple hill-climbing procedure. It
examines the neighbouring position in every dimension (in the case of
two dimensions: in front of the current position, behind it, to its left
and to its right), but not the corners, and chooses the direction with
the highest value. It then returns the position with the highest value,
if that value is greater than that of the current position.</p>
<pre><code>maxpos=np.array(pos)
for i in range(0, dim):
    pos[i]+=1
    if 0<=pos[i]<size:
        if space[tuple(pos)]>space[tuple(maxpos)]:
            maxpos=np.array(pos)
    pos[i]-=2
    if 0<=pos[i]<size:
        if space[tuple(pos)]>space[tuple(maxpos)]:
            maxpos=np.array(pos)
    pos[i]+=1
return maxpos
</code></pre>
<h4 id="Brute_Force_Search"><a class="hanchor" href="#Brute_Force_Search">Brute Force Search</a></h4>
<p>After hill-climbing, the model searches the neighbouring region of
the search space for better algorithms. The neighbouring region,
in this case, is a hypercube of dimension <code>$n$</code> with side-length
<code>$1+2 \cdot \sqrt[n]{i^2}$</code>, the current position being the center
of that cube (<code>$i$</code> is the current intelligence).</p>
<p>The choice of scaling the hypercube so that the number of points
searched grows roughly quadratically in the intelligence is pretty much
arbitrary: for example, in <code>$\mathbb{F}_{33}^{5}$</code>, an intelligence
of 100 gives a step of <code>round(100**(2/5))=6</code>, i.e. a hypercube of
side-length 13. I will test with different approaches,
e.g. making it linear.<!--TODO: test around whether this makes any
difference--></p>
<pre><code>step=round(intelligence**(2/dim))
subpos=[slice(0,0)]*dim
for i in range(0, dim):
    subpos[i]=slice(max(0,pos[i]-step), min(size-1, pos[i]+step))
subsection=space[tuple(subpos)]
</code></pre>
<p>This subsection of the space is then brute-force searched for a maximum,
akin to the agent being able to reason about the region and find nearby
maxima.</p>
<pre><code>mp=np.where(subsection == np.amax(subsection))
pos=np.array([list(mp[i])[0]+subpos[i].start for i in range(0, dim)])
return pos
</code></pre>
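<p>As an aside, the same lookup can be written (arguably more idiomatically)
with <code>np.argmax</code> and <code>np.unravel_index</code>; both return the
first maximum in C order:</p>
<pre><code>mp=np.unravel_index(np.argmax(subsection), subsection.shape)
pos=np.array([mp[i]+subpos[i].start for i in range(0, dim)])
</code></pre>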
<p>The position of the maximum found is then returned (often the current
position). Finding a new maximum is akin to the agent
discovering a more intelligent agent, and modifying itself to become
that agent.</p>
<p>This approach is not as efficient as it could be: If the agent is caught
at a local maximum, this approach leads to it searching parts of the
search space multiple times.</p>
<h3 id="External_Intelligence_Improvements"><a class="hanchor" href="#External_Intelligence_Improvements">External Intelligence Improvements</a></h3>
<p>Under this model, I assume exponential growth as
a backdrop. This growth could be <a href="https://en.wikipedia.org/wiki/Moore%27s_law">Moore's
Law</a> or <a href="https://en.wikipedia.org/wiki/Gross_world_product">Gross World
Product</a> growth,
or another uninterrupted exponential growth mode.</p>
<p>This factor is currently set to 1.001 per timestep, or 0.1% growth.
If the backdrop is Moore's Law, with a doubling time of 18 months (or
540 days), then a timestep would be</p>
<div>
$$\frac{540 \hbox{ days}}{\log_{1.001}(2)} \approx 0.779 \hbox{ days}$$
</div>
<p>If one assumes GWP growth as a backdrop instead, one gets a
doubling every 20 years (…yet. growth mindset) (see <a href="https://www.openphilanthropy.org/blog/modeling-human-trajectory" title="Modeling the Human Trajectory">Roodman
2020</a>), which
works out to</p>
<div>
$$\frac{7300 \hbox{ days}}{\log_{1.001}(2)} \approx 10.53 \hbox{ days}$$
</div>
<p>per timestep.</p>
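<p>As a sanity check, the conversion can be computed directly (a small
stand-alone snippet, not part of the model code):</p>
<pre><code>import math

steps_per_doubling=math.log(2, 1.001)  # ≈ 693.5 timesteps per doubling

print(540/steps_per_doubling)   # Moore's law backdrop: ≈ 0.779 days/timestep
print(7300/steps_per_doubling)  # GWP backdrop: ≈ 10.53 days/timestep
</code></pre>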
<p>Both assumptions seem not unreasonable to me (although I'm not an expert
on such things): a day seems enough time for an AI to design and deploy
an improvement to its own source code, although I acknowledge this might
change with different AI designs (cleaner and more algorithmic
designs might improve faster, while big fuzzy neural nets might take
much longer).</p>
<!--TODO: add analysis with growth mode of current ML models-->
<h2 id="Parameters"><a class="hanchor" href="#Parameters">Parameters</a></h2>
<p>I ran the model several times, varying the size and dimensionality of
the search space. The spaces used were <code>$\mathbb{F}_{8193}^{1}$</code>,
<code>$\mathbb{F}_{16385}^{1}$</code>, <code>$\mathbb{F}_{32769}^{1}$</code>,
<code>$\mathbb{F}_{65537}^{1}$</code>, <code>$\mathbb{F}_{1048577}^{1}$</code>,
<code>$\mathbb{F}_{16777217}^{1}$</code>, <code>$\mathbb{F}_{4097}^{2}$</code>,
<code>$\mathbb{F}_{65}^{3}$</code>, <code>$\mathbb{F}_{129}^{3}$</code>,
<code>$\mathbb{F}_{257}^{3}$</code>, <code>$\mathbb{F}_{65}^{4}$</code>,
<code>$\mathbb{F}_{33}^{5}$</code>, <code>$\mathbb{F}_{17}^{6}$</code> and
<code>$\mathbb{F}_{9}^{8}$</code> (<code>$\mathbb{F}_{a}^{b}$</code> being, in a slight abuse of the notation for a <a href="https://en.wikipedia.org/wiki/Vector_space">vector
space</a> of dimensionality
<code>$b$</code> over the <a href="https://en.wikipedia.org/wiki/Finite_field">finite field</a>
with <code>$a$</code> elements, the <code>$b$</code>-dimensional grid with side-length <code>$a$</code>; most of these side-lengths are not prime powers). The biggest search space contained ~43m elements.</p>
<p>Each iteration ran through 2048 timesteps, with a growth factor of 1.001
per timestep.</p>
<!--TODO: add explanations for minval, maxval & extrfact-->
<pre><code>datagen(1, 8193, 0, 256, 0.5, 2048, 1.001)
datagen(1, 16385, 0, 256, 0.5, 2048, 1.001)
datagen(1, 32769, 0, 256, 0.5, 2048, 1.001)
datagen(1, 65537, 0, 256, 0.5, 2048, 1.001)
datagen(1, 1048577, 0, 256, 0.5, 2048, 1.001)
datagen(1, 16777217, 0, 256, 0.5, 2048, 1.001)
datagen(2, 4097, 0, 256, 0.5, 2048, 1.001) # 16785409
datagen(3, 65, 0, 256, 0.5, 2048, 1.001) # 274625
datagen(3, 129, 0, 256, 0.5, 2048, 1.001) # 2146689
datagen(3, 257, 0, 256, 0.5, 2048, 1.001) # 16974593
datagen(4, 65, 0, 256, 0.5, 2048, 1.001) # 17850625
datagen(5, 33, 0, 256, 0.5, 2048, 1.001) # 39135393
datagen(6, 17, 0, 256, 0.5, 2048, 1.001) # 24137569
datagen(8, 9, 0, 256, 0.5, 2048, 1.001) # 43046721
</code></pre>
<p>I ran the model only once with each set of parameters, since I discovered
that some parts of the model are very slow and take quite some time to
execute on my puny hardware (I left the model running through the night).</p>
<p>I would like to run the model with a bigger search space, and more
often than once, but unless I optimize the code to be faster, I expect
the easiest option would be for me to get access to a computer that
is more capable than my current laptop. If you have access to such
a computer and want to run the code with other parameters on it,
<a href="./about.html#Contact">contact me</a> or modify the code yourself
(<a href="./code/toy_ai_takeoff_model/takeoff.py">relevant file 1</a>, <a href="./code/toy_ai_takeoff_model/ndim_diamond_square.py">relevant
file 2</a>) and send
me the results.</p>
<h2 id="Results"><a class="hanchor" href="#Results">Results</a></h2>
<p>A gzipped tarfile of the run data can be found
<a href="./data/toy_ai_takeoff_runs.tar.gz">here</a>.</p>
<p>The model generated 14 spaces, and ran 1 takeoff scenario for each. 9 of
the scenarios showed discontinuities after the first timestep (4 runs
showed one discontinuity, 4 showed two discontinuities, and 1 run showed
three discontinuities). Late discontinuities, large discontinuities, and
a higher number of discontinuities seemed more likely in bigger search
spaces, and also in higher-dimensional search spaces.</p>
<p>Here are some graphs of the development of the search process. The
blue line indicates the intelligence of the current algorithm at fixed
resources, while the black line shows the intelligence of the current
algorithm with the current resources.</p>
<p><img alt="Largest one-dimensional run" src="./img/toy_ai_takeoff_model/1_16777217_0.5_1.001.png" title="Black graph: an exponential curve, making a small jump at position ~1400/2040. Blue graph: a straight horizontal line, making a small jump at the same point, but staying horizontal."/></p>
<p><em>A run in <code>$\mathbb{F}_{16777217}^{1}$</code>, with one discontinuity</em></p>
<p><img alt="Two-dimensional run" src="./img/toy_ai_takeoff_model/2_4097_0.5_1.001.png" title="Black graph: an exponential curve, making a small jump at position ~80/2040. Blue graph: a straight horizontal line, making a small jump at the same point, but staying horizontal."/></p>
<p><em>A run in <code>$\mathbb{F}_{4097}^{2}$</code>, with one early discontinuity</em></p>
<p><img alt="Two-dimensional run" src="./img/toy_ai_takeoff_model/4_65_0.5_1.001.png" title="Black graph: staying at -1 for ~500 timesteps, then making a very small jump, and growing very slowly afterwards. Another jump at ~1400, making existing growth much faster. Blue graph: a straight horizontal line, making a small jump at ~500, and a much bigger one at ~1400, but staying horizontal."/></p>
<p><em>A run in <code>$\mathbb{F}_{65}^{4}$</code>, with two discontinuities</em></p>
<p>This run illustrates why even jumps in intelligence can be a little
surprising: the second jump makes the system
around 16 times more intelligent (both controlled for resources and
with its current resources). Using the resources the system acquired
before the jump, its growth in real-world intelligence is much faster
than beforehand. If humans were controlling the system before the jump,
doing so has now become much harder (or even impossible).</p>
<p><img alt="Two-dimensional run" src="./img/toy_ai_takeoff_model/8_9_0.5_1.001.png" title="After ~200 timesteps two medium-sized jumps, then no jumps the rest of the time (but still growth)."/></p>
<p><em>A run in <code>$\mathbb{F}_{9}^{8}$</code>, with two early discontinuities</em></p>
<p>This run contains two relatively early jumps, both of medium size. Being
the highest-dimensional run, it undermines the hypothesis that late &
big jumps are more common in higher-dimensional settings.</p>
<p>In some scenarios, one can observe that the model makes a very early
jump to a relatively high level of capabilities. That probably happens
due to hill-climbing to the local maximum.</p>
<p>An example of this is the run in <code>$\mathbb{F}_{65}^{3}$</code>:</p>
<pre><code>11.465043286618974,11.476508329905592
79.33333333333333,79.49207933333331
173.441875,173.9627211240668
246.0,246.98547698424588
246.0,247.23246246123009
246.0,247.4796949236913
246.0,247.72717461861495
246.0,247.97490179323353
246.0,248.22287669502674
246.0,248.47109957172174
</code></pre>
<!--
### Uniform Values
### Normal Values
### Lognormal Values
-->
<h2 id="Limitations"><a class="hanchor" href="#Limitations">Limitations</a></h2>
<p>As suggested in the title, this model is very much exploratory, and is
in many regards highly inadequate for modeling real-world AI takeoff
scenarios.</p>
<h3 id="Search_Space_Wrong"><a class="hanchor" href="#Search_Space_Wrong">Search Space Wrong</a></h3>
<p>The biggest problem with this analysis is that the space of possible
algorithms (or minds) is nothing like the finite discrete Euclidean
space I have used here. I believe that the space of all algorithms is
like an infinite binary tree, with each node representing one program
(the left child of a node being the current program with a 0 appended,
the right child being the current program with a 1 appended, and the root
being the empty program).</p>
<p>The reason why I didn't use this search space was that I had no idea
how to distribute the values for intelligence in the tree, and was
unsure how to implement both the hill climbing and the brute-force
search in that space (I have only tentative ideas).</p>
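<p>Representing the tree itself is straightforward; its nodes
can, for example, be enumerated breadth-first as binary strings (a
sketch, and only the easy part: assigning intelligence values and
searching remain open):</p>
<pre><code>from itertools import count, product

def programs():
    # enumerate all finite binary strings in shortlex order:
    # "", "0", "1", "00", "01", "10", "11", ...
    yield ""
    for n in count(1):
        for bits in product("01", repeat=n):
            yield "".join(bits)
</code></pre>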
<h3 id="Values_for_Intelligence_Wrong"><a class="hanchor" href="#Values_for_Intelligence_Wrong">Values for Intelligence Wrong</a></h3>
<p>A separate concern I have is about using the diamond-square algorithm
to assign values for intelligence to points in the search space.</p>
<p>Diamond-square was devised for producing convincing landscapes, and
therefore has the following problems:</p>
<ol>
<li>The values it assigns are very close to each other. I expect that
algorithms differ vastly in their general intelligence, with very few
algorithms being extremely intelligent, and most others ranking very low.</li>
<li>Related to the previous point, diamond-square assigns 0 as the value
of a point very rarely, although most algorithms would have intelligence
0 under a reasonable metric.</li>
</ol>
<p>I will be experimenting with ways to address both of these points. The
first could be improved by drawing the random number assigned to the
current point in the space not from a uniform distribution, but perhaps
from a lognormal one. The second could be improved by decreasing the
assigned numbers and then replacing negative values with 0 (or leaving
them, if it turns out that including algorithms of inverse intelligence
makes sense).</p>
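<p>A minimal sketch of what these two changes might look like (illustrative
only; the distribution parameters are arbitrary):</p>
<pre><code>import numpy as np

rng=np.random.default_rng(0)

# heavy-tailed displacements instead of uniform ones:
# most values small, a few very large
displacements=rng.lognormal(mean=0.0, sigma=2.0, size=1000)

# shift down and clip at 0, so that most points get intelligence 0
intelligence=np.clip(displacements-5.0, 0.0, None)
</code></pre>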
<h3 id="BruteForce_Search"><a class="hanchor" href="#BruteForce_Search">Brute-Force Search</a></h3>
<p>The brute-force search in the space around the algorithm is also far from
perfect. Apart from <a href="https://arbital.com/p/Vingean_uncertainty/">Vingean</a>
considerations, it seems that an AI would search the space much more
systematically, potentially ruling out large portions of the space by
proving that they can't contain more intelligent algorithms.</p>
<p>Furthermore, it would probably not search the space around the current
algorithm repeatedly: This leads to many repeated searches, which could
be avoided by just saving the positions of already observed points in
the search space.</p>
<p>Also, searching the space around the current algorithm makes hill-climbing
obsolete, unless the algorithm is so weak that the brute-force search
has a radius of less than 1: all hill-climbing steps are already included
in the portion of the search space searched by brute force.</p>
<!--
### Hill Climbing
Or something stronger, such as gradient descent?
-->
<h3 id="Small_Dataset"><a class="hanchor" href="#Small_Dataset">Small Dataset</a></h3>
<p>The dataset I base my conclusions on is relatively small: only 14 runs,
each with a different space and different parameters. This is mostly
because I am doing this on my laptop and didn't want the experiment
to run for several days (while it is running, I can't use my browser,
because the model and my browser together use more RAM than I have
available, so the model is terminated by the OS).</p>
<p>Generating the search space is much more expensive than doing several runs
in the same search space, so I will focus on implementing multiple runs
per generated space first.</p>
<h3 id="Small_Search_Spaces"><a class="hanchor" href="#Small_Search_Spaces">Small Search Spaces</a></h3>
<p>The search spaces used in the model are relatively small, with the
biggest containing ~43m points, and the rest containing around 25m points.</p>
<p>This makes inferences about how AI systems will optimize in much bigger
(perhaps even infinite) search spaces harder.</p>
<!--
### Additional Factors
What else? Problems with continuity?
-->
<h2 id="Conclusion"><a class="hanchor" href="#Conclusion">Conclusion</a></h2>
<p>This text provides a toy model for AI takeoff scenarios using
high-dimensional spaces filled by an n-dimensional variant of the
diamond-square algorithm.</p>
<p>Running the model with limited computing power, I observe that
discontinuities indeed occur, and I hypothesize that in larger search
spaces discontinuities occur later, and are larger.</p>
<!--TODO: this makes sense: $n^t*c<n^{t+k}*c$, i.e. later jumps of the
same factor are larger-->
<p>However, due to multiple limitations, these conclusions are very
haphazard.</p>
<p>If people arguing in favour of discontinuous takeoffs agree that this
model demonstrates one of their arguments, its main advantage could be
that it offers a more concrete target: skeptics can point to specific
implementation details or assumptions they disagree with, as well as
modify the model and demonstrate under which conditions no
discontinuities occur.</p>
<h2 id="See_Also"><a class="hanchor" href="#See_Also">See Also</a></h2>
<ul>
<li><a href="https://sideways-view.com/2018/02/24/takeoff-speeds/">Takeoff speeds (Paul Christiano, 2018)</a></li>
<li><a href="https://www.lesswrong.com/rationality/optimization-and-the-intelligence-explosion">Optimization and the Intelligence Explosion (Eliezer Yudkowsky, 2015)</a></li>
<li>Discussions
<ul>
<li><a href="https://www.lesswrong.com/posts/dBBqHxZqvXKW3ZsGa/an-exploratory-toy-ai-takeoff-model">LessWrong</a></li>
</ul></li>
</ul>
<h2 id="Appendix_A_Images_of_All_Runs"><a class="hanchor" href="#Appendix_A_Images_of_All_Runs">Appendix A: Images of All Runs</a></h2>
<p><img alt="Run in 8193^1" src="./img/toy_ai_takeoff_model/1_8193_0.5_1.001.png" title="Run in 8193^1, simple exponential."/></p>
<p><em>A run in <code>$\mathbb{F}_{8193}^{1}$</code>, with no discontinuities</em></p>
<p><img alt="Run in 16385^1" src="./img/toy_ai_takeoff_model/1_16385_0.5_1.001.png" title="Run in 16385^1, simple exponential."/></p>
<p><em>A run in <code>$\mathbb{F}_{16385}^{1}$</code>, with no discontinuities</em></p>
<p><img alt="Run in 32769^1" src="./img/toy_ai_takeoff_model/1_32769_0.5_1.001.png" title="Run in 32769^1, simple exponential."/></p>
<p><em>A run in <code>$\mathbb{F}_{32769}^{1}$</code>, with no discontinuities</em></p>
<p><img alt="Run in 65537^1" src="./img/toy_ai_takeoff_model/1_65537_0.5_1.001.png" title="Run in 65537^1, simple exponential."/></p>
<p><em>A run in <code>$\mathbb{F}_{65537}^{1}$</code>, with no discontinuities</em></p>
<p><img alt="Run in 1048577^1" src="./img/toy_ai_takeoff_model/1_1048577_0.5_1.001.png" title="Run in 1048577^1, two discontinuities."/></p>
<p><em>A run in <code>$\mathbb{F}_{1048577}^{1}$</code>, with two discontinuities</em></p>
<p><img alt="Largest one-dimensional run" src="./img/toy_ai_takeoff_model/1_16777217_0.5_1.001.png" title="Black graph: an exponential curve, making a small jump at position ~1400/2040. Blue graph: a straight horizontal line, making a small jump at the same point, but staying horizontal."/></p>
<p><em>A run in <code>$\mathbb{F}_{16777217}^{1}$</code>, with one discontinuity</em></p>
<p><img alt="Two-dimensional run" src="./img/toy_ai_takeoff_model/2_4097_0.5_1.001.png" title="Black graph: an exponential curve, making a small jump at position ~80/2040. Blue graph: a straight horizontal line, making a small jump at the same point, but staying horizontal."/></p>
<p><em>A run in <code>$\mathbb{F}_{4097}^{2}$</code>, with one early discontinuity</em></p>
<p><img alt="Run in 65^3" src="./img/toy_ai_takeoff_model/3_65_0.5_1.001.png" title="Run in 65^3, no discontinuities."/></p>
<p><em>A run in <code>$\mathbb{F}_{65}^{3}$</code>, with no discontinuities</em></p>
<p><img alt="Run in 129^3" src="./img/toy_ai_takeoff_model/3_129_0.5_1.001.png" title="Run in 129^3, one discontinuity."/></p>
<p><em>A run in <code>$\mathbb{F}_{129}^{3}$</code>, with one discontinuity</em></p>
<p><img alt="Run in 257^3" src="./img/toy_ai_takeoff_model/3_257_0.5_1.001.png" title="Run in 257^3, two discontinuities."/></p>
<p><em>A run in <code>$\mathbb{F}_{257}^{3}$</code>, with two discontinuities</em></p>
<p><img alt="Two-dimensional run" src="./img/toy_ai_takeoff_model/4_65_0.5_1.001.png" title="Black graph: staying at -1 for ~500 timesteps, then making a very small jump, and growing very slowly afterwards. Another jump at ~1400, making existing growth much faster. Blue graph: a straight horizontal line, making a small jump at ~500, and a much bigger one at ~1400, but staying horizontal."/></p>
<p><em>A run in <code>$\mathbb{F}_{65}^{4}$</code>, with two discontinuities</em></p>
<p><img alt="Run in 33^5" src="./img/toy_ai_takeoff_model/5_33_0.5_1.001.png" title="Run in 33^5, with three discontinuities."/></p>
<p><em>A run in <code>$\mathbb{F}_{33}^{5}$</code>, with three discontinuities</em></p>
<p><img alt="Run in 17^6" src="./img/toy_ai_takeoff_model/6_17_0.5_1.001.png" title="Run in 17^6, with one discontinuity."/></p>
<p><em>A run in <code>$\mathbb{F}_{17}^{6}$</code>, with one discontinuity</em></p>
<p><img alt="Two-dimensional run" src="./img/toy_ai_takeoff_model/8_9_0.5_1.001.png" title="After ~200 timesteps two medium-sized jumps, then no jumps the rest of the time (but still growth)."/></p>
<p><em>A run in <code>$\mathbb{F}_{9}^{8}$</code>, with two early discontinuities</em></p>
</body></html>