diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 00000000..e69de29b diff --git a/2017/03/12/min-of-three.html b/2017/03/12/min-of-three.html new file mode 100644 index 00000000..6c139106 --- /dev/null +++ b/2017/03/12/min-of-three.html @@ -0,0 +1,381 @@ + + + + + + + Min of Three + + + + + + + + + + + + +
+ +
+ +
+
+ +

Min of Three

+

How do you find the minimum of three double numbers? It may be surprising to you (it +certainly was to me), but there is more than one way to do it, and with a big +difference in performance as well. It is possible to make this simple +calculation significantly faster by utilizing +CPU-level parallelism.

+

The phenomenon described in this blog post was observed in this +thread of the Rust forum. I am not the one who found out what is +going on, I am just writing it down :)

+

We will be using Rust, but the language is not important: the original program +was in Java. What will turn out to be important is the CPU architecture. The laptop +on which the measurements are done has an i7-3612QM.

+
+ +

+ Test subject +

+

We will be measuring the dynamic time warping algorithm. This algorithm +calculates a distance between two real number sequences, xs and ys. It is +very similar to edit distance or Needleman–Wunsch, +because it uses the same dynamic programming structure.

+

The main equation is

+ +
+ + +
dtw[i, j] =
+    min(dtw[i-1, j-1], dtw[i, j-1], dtw[i-1, j]) + (xs[i] - ys[j])^2
+ +
+

That is, we calculate the distance between each pair of prefixes of xs and +ys using the distances from three smaller pairs. This calculation can be +represented as a table where each cell depends on three others:

+ +
+ +Dynamic programming 2D table +
+
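The recurrence above can be sketched directly with an explicit table. This is only a minimal illustration: the name `dtw_full_table` and the use of `f64::INFINITY` for the border cells are assumptions made here, not part of the original program.

```rust
/// Dynamic time warping distance computed with the full
/// (n + 1) x (m + 1) table. Hypothetical helper for illustration.
fn dtw_full_table(xs: &[f64], ys: &[f64]) -> f64 {
    let (n, m) = (xs.len(), ys.len());
    // dtw[i][j] is the distance between the first i elements of xs
    // and the first j elements of ys. Border cells are unreachable,
    // except for the empty-vs-empty cell.
    let mut dtw = vec![vec![f64::INFINITY; m + 1]; n + 1];
    dtw[0][0] = 0.0;
    for i in 1..=n {
        for j in 1..=m {
            // Each cell depends on the diagonal, left and upper neighbours.
            let best = dtw[i - 1][j - 1].min(dtw[i][j - 1]).min(dtw[i - 1][j]);
            let t = xs[i - 1] - ys[j - 1];
            dtw[i][j] = best + t * t;
        }
    }
    dtw[n][m]
}
```

Identical sequences have distance zero, and repeating an element does not add cost, because one element of `xs` may be matched against several elements of `ys`.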

It is possible to avoid storing the whole table explicitly. Each row depends +only on the previous one, so we need to store only two rows at a time.

+ +
+ +Dynamic programming 2 rows +
+

Here is the Rust code for this version:

+ +
+ + +
fn dtw(xs: &[f64], ys: &[f64]) -> f64 {
+    // assume equal lengths for simplicity
+    assert_eq!(xs.len(), ys.len());
+    let n = xs.len();
+    let mut prev = vec![0f64; n + 1];
+    let mut curr = vec![std::f64::MAX; n + 1];
+    curr[0] = 0.0;
+
+    for ix in 1..(n + 1) {
+        std::mem::swap(&mut curr, &mut prev);
+        curr[0] = std::f64::MAX;
+        for iy in 1..(n + 1) {
+            let d11 = prev[iy - 1];
+            let d01 = curr[iy - 1];
+            let d10 = prev[iy];
+
+            // Find the minimum of d11, d01, d10
+            // by enumerating all the cases.
+            let d = if d11 < d01 {
+                if d11 < d10 { d11 } else { d10 }
+            } else {
+                if d01 < d10 { d01 } else { d10 }
+            };
+
+            let cost = {
+                let t = xs[ix - 1] - ys[iy - 1];
+                t * t
+            };
+
+            curr[iy] = d + cost;
+        }
+    }
+    curr[n]
+}
+ +
+

Code on Rust playground

+
+
+ +

+ Profile first +

+

Is it fast? If we compile it in --release mode with

+ +
+ + +
[build]
+rustflags = "-C target-cpu=native"
+ +
+

in ~/.cargo/config, it takes 435 milliseconds for two +random sequences of length 10000.
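For readers who want to reproduce a measurement like this, here is a rough sketch of a timing harness. The helpers `pseudo_random` and `time_it` are hypothetical names introduced for this sketch; the post does not say how its inputs were generated or how the time was measured.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: deterministic pseudo-random values in [0, 1),
/// produced by a simple linear congruential generator.
fn pseudo_random(n: usize, seed: u64) -> Vec<f64> {
    let mut state = seed;
    (0..n)
        .map(|_| {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            // Keep the top 53 bits so the value fits an f64 mantissa.
            (state >> 11) as f64 / (1u64 << 53) as f64
        })
        .collect()
}

/// Runs `f` once and returns its result together with the elapsed time.
fn time_it<F: FnOnce() -> f64>(f: F) -> (f64, Duration) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed())
}
```

With the `dtw` function above in scope, a call such as `time_it(|| dtw(&pseudo_random(10000, 1), &pseudo_random(10000, 2)))` gives a single-run timing. A single run is noisy, so repeating it a few times is advisable.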

+

What is the bottleneck? Let's look at the instruction-level profile of the main +loop using the perf annotate command:

+ +
+ + +
   // Find the minimum of three numbers.
+    0.00 :       vmovsd -0x8(%rax,%rsi,8),%xmm1
+    0.00 :       vmovsd (%rax,%rsi,8),%xmm2
+    0.06 :       vminsd %xmm2,%xmm1,%xmm3
+    9.04 :       vminsd %xmm2,%xmm0,%xmm2
+    0.00 :       vcmpltsd %xmm0,%xmm1,%xmm0
+   22.70 :       vblendvpd %xmm0,%xmm3,%xmm2,%xmm0
+
+   // Calculate the squared error penalty.
+    0.00 :       vmovsd -0x8(%r12,%r10,8),%xmm1
+    0.00 :       vsubsd -0x8(%r13,%rsi,8),%xmm1,%xmm1
+   11.01 :       vmulsd %xmm1,%xmm1,%xmm1
+
+   // Store the result in the `curr` array.
+   // Note how xmm0 is used on the next iteration.
+   22.81 :       vaddsd %xmm1,%xmm0,%xmm0
+   10.67 :       vmovsd %xmm0,(%rdi,%rsi,8)
+ +
+

perf annotate uses AT&T assembly syntax, which means that the destination +register is on the right.

+

The xmm0 register holds the value of curr[iy], which was calculated on the +previous iteration. Values of prev[iy - 1] and prev[iy] are fetched into +xmm1 and xmm2. Note that although the original code contained three if +expressions, the assembly does not have any jumps and instead uses two min and +one blend instruction to select the minimum. Nevertheless, a significant +amount of time, according to perf, is spent calculating the minimum.

+
+
+ +

+ Optimization +

+

Can we do better? Let's use a min2 function to calculate the minimum of three +elements recursively:

+ +
+ + +
fn min2(x: f64, y: f64) -> f64 {
+    if x < y { x } else { y }
+}
+
+fn dtw(xs: &[f64], ys: &[f64]) -> f64 {
+    // ...
+            let d = min2(min2(d11, d01), d10);
+    // ...
+}
+ +
+

Code on Rust playground

+

This version completes in 430 milliseconds, which is a nice win of 5 +milliseconds over the first version, but is not that impressive. The assembly +looks cleaner though:

+ +
+ + +
    0.00 :       vmovsd -0x8(%rax,%rsi,8),%xmm1
+    0.28 :       vminsd %xmm0,%xmm1,%xmm0
+   31.14 :       vminsd (%rax,%rsi,8),%xmm0,%xmm0
+
+    0.06 :       vmovsd -0x8(%r12,%r10,8),%xmm1
+    0.28 :       vsubsd -0x8(%r13,%rsi,8),%xmm1,%xmm1
+   10.61 :       vmulsd %xmm1,%xmm1,%xmm1
+
+   23.29 :       vaddsd %xmm1,%xmm0,%xmm0
+   11.11 :       vmovsd %xmm0,(%rdi,%rsi,8)
+ +
+

Up to this point it was a rather boring blog post about Rust with some assembly +thrown in. But let's tweak the last variant just a little bit:

+ +
+ + +
fn dtw(xs: &[f64], ys: &[f64]) -> f64 {
+    // ...
+            // Swap d10 and d01.
+            let d = min2(min2(d11, d10), d01);
+    // ...
+}
+ +
+

Code on Rust playground

+

This version takes only 287 milliseconds to run, which is roughly 1.5 times +faster than the previous one! However, the assembly looks almost the same:

+ +
+ + +
    0.08 :       vmovsd -0x8(%rax,%rsi,8),%xmm1
+    0.17 :       vminsd (%rax,%rsi,8),%xmm1,%xmm1
+   16.40 :       vminsd %xmm0,%xmm1,%xmm0
+
+    0.00 :       vmovsd -0x8(%r12,%r10,8),%xmm1
+    0.17 :       vsubsd -0x8(%r13,%rsi,8),%xmm1,%xmm1
+   18.24 :       vmulsd %xmm1,%xmm1,%xmm1
+
+   17.15 :       vaddsd %xmm1,%xmm0,%xmm0
+   15.82 :       vmovsd %xmm0,(%rdi,%rsi,8)
+ +
+

The only difference is that two vminsd instructions are swapped. +But it is definitely much faster.

+
+
+ +

+ A possible explanation +

+

A possible explanation is a synergy of CPU level parallelism and speculative +execution. It was proposed by @krdln and @vitalyd. I don't know how to +falsify it, but it at least looks plausible to me!

+

Imagine for a second that instead of the vminsd %xmm0,%xmm1,%xmm0 instruction +in the preceding assembly there is just vmovsd %xmm1,%xmm0. That is, we don't +use xmm0 from the previous iteration at all! This corresponds to the following +update rule:

+ +
+ +Parallel update +
+

The important property of this update rule is that CPU can calculate two cells +simultaneously in parallel, because there is no data dependency between +curr[i] and curr[i + 1].
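To make the missing dependency concrete, here is a sketch of one row of this simplified rule. It is not DTW and not code from the original thread; the name `simplified_row` and its signature are assumptions made for illustration.

```rust
/// One row of the *simplified* update rule, in which a cell ignores
/// its left neighbour: curr[iy] = min(prev[iy - 1], prev[iy]) + cost.
/// This is not DTW; it only illustrates the missing dependency.
fn simplified_row(prev: &[f64], x: f64, ys: &[f64]) -> Vec<f64> {
    assert_eq!(prev.len(), ys.len() + 1);
    let mut curr = vec![f64::MAX; prev.len()];
    for iy in 1..prev.len() {
        let d = if prev[iy - 1] < prev[iy] { prev[iy - 1] } else { prev[iy] };
        let t = x - ys[iy - 1];
        // Only `prev` is read here, never `curr[iy - 1]`, so no
        // iteration has to wait for the result of the previous one.
        curr[iy] = d + t * t;
    }
    curr
}
```

Because the loop body reads nothing written by an earlier iteration, the iterations could in principle be executed in any order, or several at once.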

+

We do have vminsd %xmm0,%xmm1,%xmm0, but it is equivalent to vmovsd +%xmm1,%xmm0 if xmm1 is smaller than xmm0. And this is often the case: +xmm1 holds the minimum of the upper and diagonal cells, so it is likely to be less +than the single cell to the left. Also, the diagonal path is taken slightly more +often than the two alternatives, which adds to the bias.

+

So it looks like the CPU is able to speculatively execute vminsd and +parallelise the following computation based on this speculation! Isn't that +awesome?

+
+
+ +

+ Further directions +

+

It's interesting that we can make the computation truly parallel if we update +the cells diagonally:

+ +
+ +Diagonal update +
+

This is explored in the second part of this post.

+
+
+ +

+ Conclusion +

+

Despite the fact that Rust is a high-level language, there is a strong +correlation between the source code and the generated assembly. Small tweaks to +the source result in small changes to the assembly with potentially big +implications for performance. Also, perf is great!

+

That's all :)

+
+
+
+ + + + + diff --git a/2017/03/18/min-of-three-part-2.html b/2017/03/18/min-of-three-part-2.html new file mode 100644 index 00000000..66e06776 --- /dev/null +++ b/2017/03/18/min-of-three-part-2.html @@ -0,0 +1,542 @@ + + + + + + + Min of Three Part 2 + + + + + + + + + + + + +
+ +
+ +
+
+ +

Min of Three Part 2

+

This is the continuation of the previous post about optimizing a 2D +grid-based dynamic programming algorithm for CPU-level parallelism.

+
+ +

+ In The Previous Episode +

+

This is the code we are trying to make faster:

+ +
+ + +
fn dtw(xs: &[f64], ys: &[f64]) -> f64 {
+    // assume equal lengths for simplicity
+    assert_eq!(xs.len(), ys.len());
+    let n = xs.len();
+    let mut prev = vec![0f64; n + 1];
+    let mut curr = vec![std::f64::MAX; n + 1];
+    curr[0] = 0.0;
+
+    for ix in 1..(n + 1) {
+        ::std::mem::swap(&mut curr, &mut prev);
+        curr[0] = std::f64::MAX;
+        for iy in 1..(n + 1) {
+            let d11 = prev[iy - 1];
+            let d01 = curr[iy - 1];
+            let d10 = prev[iy];
+
+            // Find the minimum of d11, d01, d10
+            // by enumerating all the cases.
+            let d = if d11 < d01 {
+                if d11 < d10 { d11 } else { d10 }
+            } else {
+                if d01 < d10 { d01 } else { d10 }
+            };
+
+            let cost = {
+                let t = xs[ix - 1] - ys[iy - 1];
+                t * t
+            };
+
+            curr[iy] = d + cost;
+        }
+    }
+    curr[n]
+}
+ +
+

Code on Rust playground (293 ms)

+

It calculates dynamic time warping distance between two double +vectors using an update rule which is structured like this:

+ +
+ +Dynamic programming 2D table +
+

This code takes 293 milliseconds to run on a particular +input. The speedup from 435 milliseconds stated in the previous post is +due to Moore's law: I've upgraded the CPU :)

+

We can bring run time down by tweaking how we calculate the minimum of +three elements.

+ +
+ + +
fn min2(x: f64, y: f64) -> f64 {
+    if x < y { x } else { y }
+}
+
+fn dtw(xs: &[f64], ys: &[f64]) -> f64 {
+    // ...
+            let d = min2(min2(d11, d10), d01);
+    // ...
+}
+ +
+

Code on Rust playground (210 ms)

+

This version takes only 210 milliseconds, presumably because the +minimum of two elements in the previous row can be calculated without +waiting for the preceding element in the current row to be computed.

+

The assembly for the main loop looks like this (AT&T syntax, +destination register on the right)

+ +
+ + +
   18.32    vmovsd -0x8(%rax,%rsi,8),%xmm1
+    0.00    vminsd (%rax,%rsi,8),%xmm1,%xmm1
+    6.72    vminsd %xmm0,%xmm1,%xmm0
+    4.64    vmovsd -0x8(%r12,%r10,8),%xmm1
+    0.00    vsubsd -0x8(%r13,%rsi,8),%xmm1,%xmm1
+    7.69    vmulsd %xmm1,%xmm1,%xmm1
+   36.14    vaddsd %xmm1,%xmm0,%xmm0
+   14.16    vmovsd %xmm0,(%rdi,%rsi,8)
+ +
+

Check the previous post for more details!

+
+
+ +

+ The parallel plan +

+

Can we loosen dependencies between cells even more to benefit from instruction +level parallelism? What if, instead of filling the table row by row, we do it by +diagonals?

+ +
+ +Diagonal update +
+

We'd need to remember two previous diagonals instead of one previous +row, but all the cells on the next diagonal would be independent! In +theory, the compiler should be able to use SIMD instructions to make the +computation truly parallel.

+
+
+ +

+ Implementation Plan +

+

Coding up this diagonal traversal is a bit tricky, because you need to +map linear vector indices to diagonal indices.

+

The original indexing worked like this:

+ +
+ + +
        iy
+       ---->
+    | . . . .
+ ix | . . . .
+    | . . . .
+    V . . . .
+ +
+
    +
  • +ix and iy are indices in the input vectors. +
  • +
  • +The outer loop is over ix. +
  • +
  • +On each iteration, we remember two rows (curr and prev in the +code). +
  • +
+

For our grand plan, we need to fit a rhombus peg in a square hole:

+ +
+ + +
   id
+  ---->
+ . . . .        |
+   . . . .      | ix
+     . . . .    |
+       . . . .  V
+ +
+
    +
  • +id is the index of the diagonal. There are twice as many diagonals +as rows. +
  • +
  • +The outer loop is over id. +
  • +
  • +On each iteration we remember three columns (d1, d2, d3 in the +code). +
  • +
  • +There is a phase transition once we've crossed the main diagonal. +
  • +
  • +We can derive iy from the fact that ix + iy = id. +
  • +
+
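The index mapping described in the list above can be checked in isolation with a small sketch. The helper `diagonal_cells` is a hypothetical name introduced here; it uses 1-based cell indices and the `ix + iy = id` relation from the list.

```rust
/// Returns the (ix, iy) cells on diagonal `id` of an n x n table,
/// using 1-based indices and the invariant ix + iy == id.
/// Hypothetical helper for illustration only.
fn diagonal_cells(n: usize, id: usize) -> Vec<(usize, usize)> {
    // Up to the main diagonal the range of ix grows with id;
    // after crossing it, the range shrinks from the left.
    let ix_range = if id <= n { 1..id } else { (id - n)..(n + 1) };
    ix_range.map(|ix| (ix, id - ix)).collect()
}
```

For `n = 3`, diagonal 4 is the longest one and contains `(1, 3)`, `(2, 2)` and `(3, 1)`. Iterating `id` over `1..=2 * n` visits each of the nine cells exactly once.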
+
+ +

+ Code +

+

The actual code looks like this:

+ +
+ + +
fn dtw(xs: &[f64], ys: &[f64]) -> f64 {
+    assert_eq!(xs.len(), ys.len());
+    let n = xs.len();
+    let mut d1 = vec![0f64; n + 1];
+    let mut d2 = vec![0f64; n + 1];
+    let mut d3 = vec![0f64; n + 1];
+    d2[0] = ::std::f64::MAX;
+
+    for id in 1..(2 * n + 1) {
+        ::std::mem::swap(&mut d1, &mut d2);
+        ::std::mem::swap(&mut d2, &mut d3);
+
+        let ix_range = if id <= n {
+            d3[0] = ::std::f64::MAX;
+            d3[id] = ::std::f64::MAX;
+            1..id
+        } else {
+            (id - n..n + 1)
+        };
+
+        for ix in ix_range {
+            let iy = id - ix;
+            let d = min2(min2(d2[ix - 1], d2[ix]), d1[ix - 1]);
+            let cost = {
+                let t = xs[ix - 1] - ys[iy - 1];
+                t * t
+            };
+            d3[ix] = d + cost;
+        };
+    }
+
+    d3[n]
+}
+ +
+

Code on Rust playground (185 ms)

+

It takes 185 milliseconds to run. The assembly for the main loop is +quite interesting:

+ +
+ + +
    1.67    cmp    %rax,%rdx
+    0.00    jbe    6d95
+    1.95    lea    0x1(%rax),%rbx
+    8.09    cmp    %rbx,%rdx
+    0.98    jbe    6da4
+    1.12    cmp    %rax,%r8
+    0.00    jbe    6db3
+    3.49    cmp    %r12,%rax
+    0.00    jae    6de9
+    9.07    cmp    %r12,%rcx
+    0.00    jae    6dc5
+    0.56    cmp    %rbx,%r9
+    0.00    jbe    6dd7
+    2.23    vmovsd (%r15,%rax,8),%xmm0
+   11.72    vminsd 0x8(%r15,%rax,8),%xmm0,%xmm0
+    2.09    vminsd (%r11,%rax,8),%xmm0,%xmm0
+    2.51    vmovsd (%r14,%rax,8),%xmm1
+    7.95    mov    -0x88(%rbp),%rdi
+    3.07    vsubsd (%rdi,%rcx,8),%xmm1,%xmm1
+    3.91    vmulsd %xmm1,%xmm1,%xmm1
+   15.90    vaddsd %xmm1,%xmm0,%xmm0
+    8.37    vmovsd %xmm0,0x8(%r13,%rax,8)
+ +
+

First of all, we don't see any vectorized instructions: the code does +roughly the same operations as in the previous version. Also, there is +a whole bunch of extra branching instructions at the top. These are +bounds checks which were not eliminated this time. And this is great: +if I added up all the off-by-one errors I've made implementing diagonal +indexing, I would get an integer overflow! Nevertheless, we've got +some speedup.

+

Can we go further and get SIMD instructions here? At the moment, +Rust does not have a stable way to explicitly emit SIMD +(it's going to change some day) (UPDATE: we have SIMD on stable now!), so the only choice we +have is to tweak the source code until LLVM sees an opportunity for +vectorization.

+
+
+ +

+ SIMD +

+

Although bounds checks themselves don't slow down the code that much, +they can prevent LLVM from vectorizing. So let's dip our toes into +unsafe:

+ +
+ + +
unsafe {
+    let d = min2(
+        min2(*d2.get_unchecked(ix - 1), *d2.get_unchecked(ix)),
+        *d1.get_unchecked(ix - 1),
+    );
+    let cost = {
+        let t =
+            xs.get_unchecked(ix - 1) - ys.get_unchecked(iy - 1);
+        t * t
+    };
+    *d3.get_unchecked_mut(ix) = d + cost;
+}
+ +
+

Code on Rust playground (52 ms)

+

The code is as fast as it is ugly: it finishes in a whopping 52 +milliseconds! And of course we see SIMD in the assembly:

+ +
+ + +
    5.74    vmovupd -0x8(%r8,%rcx,8),%ymm0
+    1.44    vminpd (%r8,%rcx,8),%ymm0,%ymm0
+    7.66    vminpd -0x8(%r11,%rcx,8),%ymm0,%ymm0
+    5.26    vmovupd -0x8(%rbx,%rcx,8),%ymm1
+    7.66    vpermpd $0x1b,0x20(%r12),%ymm2
+    5.26    vsubpd %ymm2,%ymm1,%ymm1
+    7.66    vmulpd %ymm1,%ymm1,%ymm1
+    8.61    vaddpd %ymm1,%ymm0,%ymm0
+    2.39    vmovupd %ymm0,(%rdx,%rcx,8)
+    2.39    vmovupd 0x18(%r8,%rcx,8),%ymm0
+    5.74    vminpd 0x20(%r8,%rcx,8),%ymm0,%ymm0
+    9.09    vminpd 0x18(%r11,%rcx,8),%ymm0,%ymm0
+    0.96    vmovupd 0x18(%rbx,%rcx,8),%ymm1
+    4.78    vpermpd $0x1b,(%r12),%ymm2
+    3.83    vsubpd %ymm2,%ymm1,%ymm1
+    3.83    vmulpd %ymm1,%ymm1,%ymm1
+   10.53    vaddpd %ymm1,%ymm0,%ymm0
+    4.78    vmovupd %ymm0,0x20(%rdx,%rcx,8)
+ +
+
+
+ +

+ Safe SIMD +

+

How can we get the same results with safe Rust? One possible way is to +use iterators, but in this case the resulting code would be rather +ugly, because you'll need a lot of nested .zips. So let's try a +simple trick of hoisting the bounds checks out of the loop. The idea is to +transform this:

+ +
+ + +
for i in 0..n {
+    assert i < xs.len();
+    xs.get_unchecked(i);
+}
+ +
+

into this:

+ +
+ + +
assert n <= xs.len();
+for i in 0..n {
+    xs.get_unchecked(i);
+}
+ +
+
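As a standalone illustration of this transformation in real Rust, taking a sub-slice performs the single up-front check. The function `dot_prefix` is a hypothetical example, not code from the post, and whether the per-element checks are actually removed depends on the compiler version and optimization level.

```rust
/// Sums xs[i] * ys[i] over the first `n` elements.
/// Hypothetical example of hoisting bounds checks out of a loop.
fn dot_prefix(xs: &[f64], ys: &[f64], n: usize) -> f64 {
    // Slicing panics right here if either input is shorter than `n`;
    // this plays the role of the hoisted assertion.
    let xs = &xs[..n];
    let ys = &ys[..n];
    let mut sum = 0.0;
    for i in 0..n {
        // Both slices have length exactly `n`, so the optimizer has a
        // chance to prove that these accesses are always in bounds.
        sum += xs[i] * ys[i];
    }
    sum
}
```

The code stays entirely safe: if the lengths are wrong, the program panics at the slicing step instead of inside the loop.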

In Rust, this is possible by explicitly slicing the buffer before the loop:

+ +
+ + +
let ix_range = if id <= n {
+    d3[0] = ::std::f64::MAX;
+    d3[id] = ::std::f64::MAX;
+    1..id
+} else {
+    (id - n..n + 1)
+};
+
+let ix_range_1 = ix_range.start - 1..ix_range.end - 1;
+let dn = ix_range.end - ix_range.start;
+
+let d1 = &d1[ix_range_1.clone()];
+let d2_0 = &d2[ix_range.clone()];
+let d2_1 = &d2[ix_range_1.clone()];
+let d3 = &mut d3[ix_range.clone()];
+let xs = &xs[ix_range_1.clone()];
+let ys = &ys[id - ix_range.end..id - ix_range.start];
+
+// All the buffers we access inside the loop
+// will have the same length
+assert!(
+    d1.len() == dn && d2_0.len() == dn && d2_1.len() == dn
+    && d3.len() == dn && xs.len() == dn && ys.len() == dn
+);
+
+for i in 0..dn { // so hopefully LLVM can eliminate bounds checks.
+    let d = min2(min2(d2_0[i], d2_1[i]), d1[i]);
+    let cost = {
+        let t = xs[i] - ys[ys.len() - i - 1];
+        t * t
+    };
+    d3[i] = d + cost;
+};
+ +
+

Code on Rust playground (107 ms)

+

This is definitely an improvement over the best safe version, but is +still twice as slow as the unsafe variant. Looks like some bounds +checks are still there! It is possible to find them by selectively +using unsafe to replace some indexing operations.

+

And it turns out that only ys is still checked!

+ +
+ + +
let t = xs[i] - unsafe { ys.get_unchecked(ys.len() - i - 1) };
+ +
+

Code on Rust playground (52 ms)

+

If we use unsafe only for ys, we regain all the performance.

+

LLVM is having trouble iterating ys in reverse, but the fix is easy: +just reverse it once at the beginning of the function:

+ +
+ + +
let ys_rev: Vec<f64> = ys.iter().cloned().rev().collect();
+ +
+

Code on Rust playground (50 ms)
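The effect of reversing once up front can be seen in a reduced sketch. The two functions below are hypothetical and much simpler than the diagonal loop in the post (there the slices change on every diagonal); they only show that indexing from the back of `ys` and indexing forward into a reversed copy produce the same values.

```rust
/// Squared differences with `ys` read back to front.
fn sq_diffs_backward(xs: &[f64], ys: &[f64]) -> Vec<f64> {
    assert_eq!(xs.len(), ys.len());
    (0..xs.len())
        .map(|i| {
            let t = xs[i] - ys[ys.len() - i - 1];
            t * t
        })
        .collect()
}

/// The same values, computed from a copy of `ys` that was reversed
/// once, so that both inputs are traversed in the forward direction.
fn sq_diffs_forward(xs: &[f64], ys_rev: &[f64]) -> Vec<f64> {
    xs.iter()
        .zip(ys_rev)
        .map(|(x, y)| {
            let t = x - y;
            t * t
        })
        .collect()
}
```

The reversed copy costs one extra allocation of `n` elements, which is small next to the roughly `n * n` cell updates of the main loop.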

+
+
+ +

+ Conclusions +

+

We've gone from almost 300 milliseconds to only 50 in safe Rust. That +is quite impressive! However, the resulting code is rather brittle and +even small changes can prevent vectorization from triggering.

+

It's also important to understand that to allow for SIMD, we had to +change the underlying algorithm. This is not something even a very +smart compiler could do!

+
+
+
+ + + + + diff --git a/2017/03/25/nixos-notes.html b/2017/03/25/nixos-notes.html new file mode 100644 index 00000000..d52f3a83 --- /dev/null +++ b/2017/03/25/nixos-notes.html @@ -0,0 +1,203 @@ + + + + + + + NixOS Notes + + + + + + + + + + + + +
+ +
+ +
+
+ +

NixOS Notes

+

I bought a new laptop recently, which was a perfect opportunity to take a +fresh look at my NixOS setup.

+

As usual, there are some hacks and non-obvious things which I would like to +document just in case :)

+
+ +

+ If it does not work, update +

+

I tried installing the stable 16.09 version first, but the live CD didn't manage to +start the X server properly. This was easy to fix by switching to the then-beta +17.03.

+
+
+ +

+ UEFI +

+

It is my first system which uses UEFI instead of BIOS, and I was +pleasantly surprised by how everything just worked. The documentation contains only +a short paragraph about UEFI, but it's everything you need. The only hiccup on +my side happened when I enabled GRUB together with systemd-boot: you don't +need GRUB at all, systemd-boot is a bootloader which handles everything.

+
+
+ +

+ If it does not work, fix the obvious problem +

+

After I installed everything, I was presented with a blank screen +instead of my desktop environment (with the live CD everything +worked). It took me ages to debug the issue, while the fix was super +trivial: add videoDrivers = [ "intel" ]; to the xserver config and +"nouveau" to blacklistedKernelModules.

+
+
+ +

+ Rust +

+

While nix is the best way to manage a Linux desktop I am aware of, +rustup is the most convenient way of managing Rust toolchains. +Unfortunately, it's not easy to make rustup play nicely with NixOS (UPDATE: +rustup is now packaged in nixpkgs and just works). Rustup downloads binaries of +the compiler and Cargo, but it is impossible to launch unmodified binaries on +NixOS because it lacks a conventional loader.

+

The fix I came up with is a horrible hack which goes against +everything in NixOS. Here it is:

+ +
+ + +
environment.extraInit = let loader = "ld-linux-x86-64.so.2"; in ''
+  export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/run/current-system/sw/lib:${pkgs.stdenv.cc.cc.lib}/lib"
+  ln -fs ${pkgs.stdenv.cc.libc.out}/lib/${loader} /lib64/${loader}
+'';
+ +
+

It makes the loader and shared libraries (rustup needs zlib) visible +to binaries compiled for x64 Linux.

+
+
+ +

+ Idea +

+

Another piece of software which I wish to update somewhat more frequently than +other packages is IntelliJ IDEA (I write a fair amount of Kotlin and +Rust). NixOS has a super convenient mechanism to do this: +packageOverrides. Here is my ~/nixpkgs/config.nix:

+ +
+ + +
{
+  packageOverrides = pkgs: rec {
+    idea-community = let
+      version = "2017.1";
+      sha256 = "750b517742157475bb690c1cc8f21ac151a754a38fec5c99a4bb473efd71da5d";
+    in
+      pkgs.idea.idea-community.overrideDerivation (attrs: rec {
+        inherit version;
+        name = "idea-community-${version}";
+        src = pkgs.fetchurl {
+          inherit sha256;
+          url = "https://download.jetbrains.com/idea/ideaIC-${version}.tar.gz";
+        };
+      });
+  };
+}
+ +
+

It lets me use the most recent IDEA with the stable NixOS channel.

+
+
+
+ + + + + diff --git a/2017/10/21/lldb-dynamic-type.html b/2017/10/21/lldb-dynamic-type.html new file mode 100644 index 00000000..39335f26 --- /dev/null +++ b/2017/10/21/lldb-dynamic-type.html @@ -0,0 +1,168 @@ + + + + + + + Dynamic types in LLDB + + + + + + + + + + + + +
+ +
+ +
+
+ +

Dynamic types in LLDB

+

If you are wondering how debuggers work, I suggest reading Eli Bendersky's +eli-on-debuggers. However, after having read these notes myself, I still +had one question unanswered. Namely, how can a debugger show the fields of a class if +the type of the class is known only at runtime?

+
+ +

+ Example +

+

Consider this situation: you have a pointer of type A*, which at runtime holds +a value of some subtype of A. Could the debugger display the fields of the +actual type? Turns out, it can handle cases like the one below just fine!

+ +
+ + +
struct Base { ... };
+
+struct Derived: Base { ... };
+
+void foo(Base& x) {
+    // `x` can be `Derived` or `Base` here.
+    // How can debugger show fields of `Derived` then?
+}
+ +
+
+
+ +

+ DWARF +

+

Could it be possible that information about dynamic types is present in DWARF? +If we look at the DWARF, we'll see that there's layout information for both +Base and Derived types, as well as an entry for the x parameter, which says that +it has type Base. And this makes sense: we don't know that x is Derived +until runtime! So the debugger must somehow figure out the type of the variable +dynamically.

+
+
+ +

+ No Magic +

+

As usual, there's no magic. For example, LLDB has hard-coded knowledge of the C++ +programming language, which allows the debugger to inspect types at runtime. +Specifically, this is handled by the LanguageRuntime LLDB plugin, which has a +curious function GetDynamicTypeAndAddress, whose job is to poke the +representation of a value to get its real type and adjust the pointer, if necessary +(remember, with multiple inheritance, casts may change the value of the +pointer).

+

The implementation of this function for the C++ language lives in +ItaniumABILanguageRuntime.cpp. Although, unlike C, C++ lacks a +standardized ABI, almost all compilers on all non-Windows platforms use a +specific ABI, confusingly called Itanium (after a now effectively dead +64-bit CPU architecture).

+
+
+
+ + + + + diff --git a/2018/01/03/make-your-own-make.html b/2018/01/03/make-your-own-make.html new file mode 100644 index 00000000..1f7a0ab3 --- /dev/null +++ b/2018/01/03/make-your-own-make.html @@ -0,0 +1,258 @@ + + + + + + + Make your own make + + + + + + + + + + + + +
+ +
+ +
+
+ +

Make your own make

+
+ +

+ Introduction +

+

One of my favorite features of Cargo is that it is not a general +purpose build tool. This allows Cargo to really excel at the task of building +Rust code, without the usual Turing tarpit of build configuration files. I have yet +to see a complicated Cargo.toml file!

+

However, once a software project grows, it's almost inevitable that it will +require some tasks besides building Rust code. For example, you might need to +integrate several languages together, or to set up some elaborate testing for +non-code aspects of your project, like checking the licenses, or to establish an +elaborate release procedure.

+

For such use-cases, a general purpose task automation solution is needed. In +this blog post I want to describe one possible approach, which leans heavily on +Cargo's built-in functionality.

+ +
+
+ +

+ Existing Solutions +

+

The simplest way to automate something is to write a shell script. However, there +are few experts in the arcane art of shell scripting, and shell scripts are +inherently platform dependent.

+

The same goes for make, with its many annoyingly similar flavors.

+

Two tools which significantly improve on the ease of use and ergonomics are +just and cargo make. Alas, they still mostly rely on the +shell to actually execute the tasks.

+
+
+ +

+ Reinventing the Wheel +

+

Obligatory XKCD 927:

+ +
+ +xkcd 927 +
+

An obvious idea is to use Rust for task automation. Originally, I proposed +creating a special Cargo subcommand to execute build tasks, implemented as Rust +programs, in this +thread. +However, since then I have realized that there are built-in tools in Cargo which +allow one to get a pretty ergonomic solution. Namely, the combination of +workspaces, aliases and the ability to define binaries seems to do the trick.

+
+
+ +

+ Elements of the Solution +

+

If you just want a working example, see this +commit.

+

A typical Rust project looks like this

+ +
+ + +
frobnicator/
+  Cargo.toml
+  src/
+    lib.rs
+ +
+

Suppose that we want to add a couple of tasks, like generating some code from +some specification in the RON format, or +grepping the source code for TODO marks.

+

First, create a special tools package:

+ +
+ + +
frobnicator/
+  Cargo.toml
+  src/
+    lib.rs
+  tools/
+    Cargo.toml
+    src/bin/
+      gen.rs
+      todo.rs
+ +
+

The tools/Cargo.toml might look like this:

+ +
+ + +
# file: frobnicator/tools/Cargo.toml
+
+[package]
+name = "tools"
+version = "0.1.0"
+authors = []
+# We never publish our tasks
+publish = false
+
+[dependencies]
+# These dependencies are isolated from the main crate.
+serde = "1.0.26"
+serde_derive = "1.0.26"
+file = "1.1.1"
+ron = "0.1.5"
+ +
+

Then, we add a +[workspace] +to the parent package:

+ +
+ + +
# file: frobnicator/Cargo.toml
+
+[workspace]
+members = ["tools"]
+ +
+

We need this section because tools is not a dependency of frobnicator, so it +won't be picked up automatically.

+

Then, we write code to accomplish the tasks in tools/src/bin/gen.rs and +tools/src/bin/todo.rs.
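As an illustration of what such a task could look like, here is a sketch of the TODO-grepping logic that might live in tools/src/bin/todo.rs. It uses only the standard library; the function names and the choice to scan `.rs` files are assumptions made here, not the contents of the linked commit.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns (1-based line number, line) for every line mentioning TODO.
fn find_todos(text: &str) -> Vec<(usize, &str)> {
    text.lines()
        .enumerate()
        .filter(|(_, line)| line.contains("TODO"))
        .map(|(i, line)| (i + 1, line))
        .collect()
}

/// Recursively visits `dir` and prints the TODO marks found in `.rs` files.
fn report_todos(dir: &Path) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            report_todos(&path)?;
        } else if path.extension().map_or(false, |ext| ext == "rs") {
            let text = fs::read_to_string(&path)?;
            for (line_no, line) in find_todos(&text) {
                println!("{}:{}: {}", path.display(), line_no, line.trim());
            }
        }
    }
    Ok(())
}
```

A `main` function in the binary would then call something like `report_todos(Path::new("src"))` and decide how to report an I/O error. Since this is ordinary Rust, it behaves the same on every platform Cargo supports, which is the point of avoiding the shell.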

+

Finally, we add frobnicator/.cargo/config with the following contents:

+ +
+ + +
# file: frobnicator/.cargo/config
+
+[alias]
+gen  = "run --package tools --bin gen"
+todo = "run --package tools --bin todo"
+ +
+

Voilà! Now, running cargo gen or cargo todo will execute the tasks!

+

Discussion on /r/rust.

+
+
+
+ + + + + diff --git a/2018/03/03/stopping-a-rust-worker.html b/2018/03/03/stopping-a-rust-worker.html new file mode 100644 index 00000000..90658eac --- /dev/null +++ b/2018/03/03/stopping-a-rust-worker.html @@ -0,0 +1,549 @@ + + + + + + + Stopping a Rust Worker + + + + + + + + + + + + +
+ +
+ +
+
+ +

Stopping a Rust Worker

+

This is a small post about a specific pattern for cancellation in the Rust +programming language. The pattern is simple and elegant, but it's rather +difficult to come up with it by yourself.

+
+ +

+ Introducing a worker +

+

To be able to stop a worker, we need to have one in the first place! So, let's +implement a model program.

+

The task is to read the input line by line, sending these lines to another thread +for processing (echoing the line back, with ❤️). +My solution looks like this:

+ +
+ + +
use std::io::BufRead;
+use std::sync::mpsc::{Sender, channel};
+use std::thread;
+
+fn main() {
+    let worker = spawn_worker();
+
+    let stdin = ::std::io::stdin();
+    for line in stdin.lock().lines() {
+        let line = line.unwrap();
+        worker.send(Msg::Echo(line))
+            .unwrap();
+    }
+
+    println!("Bye!");
+}
+
+enum Msg {
+    Echo(String),
+}
+
+fn spawn_worker() -> Sender<Msg> {
+    let (tx, rx) = channel();
+    thread::spawn(move || {
+        loop {
+            let msg = rx.recv().unwrap();
+            match msg {
+                Msg::Echo(msg) => println!("{} ❤️", msg),
+            }
+        }
+    });
+    tx
+}
+ +
+

The program seems to work:

+ +
+ + +
$ cargo r
+    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
+     Running `target/debug/worker`
+hello
+hello ❤️
+world
+world ❤️
+Bye!
+ +
+
+
+ +

+ Stopping the worker, the obvious way +

+

Now that we have a worker, let's add a new requirement.

+

When the user types stop, the worker (but not the program itself) should be halted.

+

How can we do this? The most obvious way is to add a new variant, Stop, to the Msg +enum, and break out of the worker's loop:

+ +
+ + +
use std::io::BufRead;
+use std::sync::mpsc::{Sender, channel};
+use std::thread;
+
+fn main() {
+    let worker = spawn_worker();
+
+    let stdin = ::std::io::stdin();
+    for line in stdin.lock().lines() {
+        let line = line.unwrap();
+        let msg = if line == "stop" {
+            Msg::Stop
+        } else {
+            Msg::Echo(line)
+        };
+
+        worker.send(msg)
+            .unwrap();
+    }
+
+    println!("Bye!");
+}
+
+enum Msg {
+    Echo(String),
+    Stop,
+}
+
+fn spawn_worker() -> Sender<Msg> {
+    let (tx, rx) = channel();
+    thread::spawn(move || {
+        loop {
+            let msg = rx.recv().unwrap();
+            match msg {
+                Msg::Echo(msg) => println!("{} ❤️", msg),
+                Msg::Stop => break,
+            }
+        }
+        println!("The worker has stopped!");
+    });
+    tx
+}
+ +
+

This works, but only partially:

+ +
+ + +
$ cargo r
+    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
+     Running `target/debug/worker`
+hello
+hello ❤️
+stop
+The worker has stopped!
+world
+thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', /checkout/src/libcore/result.rs:916:5
+note: Run with `RUST_BACKTRACE=1` for a backtrace.
+ +
+

We can add more code to fix the panic, but let's stop for a moment and try +to invent a more elegant way to stop the worker. The answer will be below this +beautiful Ukiyo-e print :-)

+ +
+ + +
+
+
+ +

+ Dropping the microphone +

+

The answer is: the cleanest way to cancel something in Rust is to drop it. +For our task, we can stop the worker by dropping the Sender:

+ +
+ + +
use std::io::BufRead;
+use std::sync::mpsc::{Sender, channel};
+use std::thread;
+
+fn main() {
+    let mut worker = Some(spawn_worker());
+
+    let stdin = ::std::io::stdin();
+    for line in stdin.lock().lines() {
+        let line = line.unwrap();
+        if line == "stop" {
+            drop(worker.take());
+            continue
+        };
+
+        if let Some(ref worker) = worker {
+            worker.send(Msg::Echo(line)).unwrap();
+        } else {
+            println!("The worker has been stopped!");
+        };
+    }
+
+    println!("Bye!");
+}
+
+enum Msg {
+    Echo(String),
+}
+
+fn spawn_worker() -> Sender<Msg> {
+    let (tx, rx) = channel();
+    thread::spawn(move || {
+        while let Ok(msg) = rx.recv() {
+            match msg {
+                Msg::Echo(msg) => println!("{} ❤️", msg),
+            }
+        }
+        println!("The worker has stopped!");
+    });
+    tx
+}
+ +
+

Note the interesting parts of the solution:

+
    +
  • +no need to invent an additional message type, +
  • +
  • +the Sender is stored inside an Option, so that we can +drop it with the .take method, +
  • +
  • +the Option forces us to check if the worker is alive +before sending a message. +
  • +
+

More generally, the worker previously had two paths for termination: a normal termination via the Stop message and an abnormal termination after a panic when unwrapping the result of recv (which might happen if the parent thread panics and drops the Sender). Now there is a single code path for both cases. That means we can be more confident that if something somewhere dies with a panic, the shutdown will still proceed in an orderly fashion: it is not a special case anymore.

+

The only thing left to make this ultimately neat is to replace a hand-written while let +with a for loop:

+ +
+ + +
for msg in rx {
+    match msg {
+        Msg::Echo(msg) => println!("{} ❤️", msg),
+    }
+}
+ +
+
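To see the single shutdown path end to end, here is a minimal sketch using only std. It is not part of the original program: spawn_collector and run_and_stop are names made up for illustration, and the worker collects messages instead of printing them so that the result can be inspected after the thread is joined.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical helper: spawns a worker that stores every message it receives.
fn spawn_collector() -> (Sender<String>, JoinHandle<Vec<String>>) {
    let (tx, rx) = channel();
    let handle = thread::spawn(move || {
        let mut seen = Vec::new();
        // The loop ends once every `Sender` has been dropped,
        // whether by an explicit `drop` or by a panic unwinding the owner.
        for msg in rx {
            seen.push(msg);
        }
        seen
    });
    (tx, handle)
}

/// Hypothetical driver: sends two messages, cancels the worker by dropping
/// the `Sender`, and returns what the worker saw.
fn run_and_stop() -> Vec<String> {
    let (tx, handle) = spawn_collector();
    tx.send("hello".to_string()).unwrap();
    tx.send("world".to_string()).unwrap();
    drop(tx); // cancellation is drop
    handle.join().unwrap()
}
```

Returning the JoinHandle is a design choice of this sketch: it lets the caller wait until the worker has actually finished, which the original spawn_worker does not offer.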
+
+ +

+ Am I awaited? +

+

It's interesting to see that the same pattern applies to the async version of the solution as well.

+

Async baseline:

+ +
+ + +
extern crate futures; // [dependencies] futures = "0.1"
+
+use std::io::BufRead;
+use std::thread;
+
+use futures::sync::mpsc::{Sender, channel};
+use futures::{Future, Stream, Sink};
+
+fn main() {
+    let mut worker = spawn_worker();
+
+    let stdin = ::std::io::stdin();
+    for line in stdin.lock().lines() {
+        let line = line.unwrap();
+        worker = worker.send(Msg::Echo(line)).wait().unwrap();
+    }
+
+    println!("Bye!");
+}
+
+enum Msg {
+    Echo(String),
+}
+
+fn spawn_worker() -> Sender<Msg> {
+    let (tx, rx) = channel(1);
+    thread::spawn(move || {
+        rx.for_each(|msg| {
+            match msg {
+                Msg::Echo(msg) => println!("{} ❤️", msg),
+            }
+            Ok(())
+        }).wait().unwrap()
+    });
+    tx
+}
+ +
+

Async with a termination message:

+ +
+ + +
extern crate futures; // [dependencies] futures = "0.1"
+
+use std::io::BufRead;
+use std::thread;
+
+use futures::sync::mpsc::{Sender, channel};
+use futures::{Future, Stream, Sink};
+
+fn main() {
+    let mut worker = spawn_worker();
+
+    let stdin = ::std::io::stdin();
+    for line in stdin.lock().lines() {
+        let line = line.unwrap();
+        let msg = if line == "stop" {
+            Msg::Stop
+        } else {
+            Msg::Echo(line)
+        };
+        worker = worker.send(msg).wait().unwrap();
+    }
+
+    println!("Bye!");
+}
+
+enum Msg {
+    Echo(String),
+    Stop,
+}
+
+fn spawn_worker() -> Sender<Msg> {
+    let (tx, rx) = channel(1);
+    thread::spawn(move || {
+        let _ = rx.for_each(|msg| {
+            match msg {
+                Msg::Echo(msg) => {
+                    println!("{} ❤️", msg);
+                    Ok(())
+                },
+                Msg::Stop => Err(()),
+            }
+        }).then(|result| {
+            println!("The worker has stopped!");
+            result
+        }).wait();
+    });
+    tx
+}
+ +
+ +
+ + +
$ cargo r
+    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
+     Running `target/debug/worker`
+hello
+hello ❤️
+stop
+The worker has stopped!
+world
+thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: SendError("...")', /checkout/src/libcore/result.rs:916:5
+note: Run with `RUST_BACKTRACE=1` for a backtrace.
+ +
+

Async with drop:

+ +
+ + +
extern crate futures; // [dependencies] futures = "0.1"
+
+use std::io::BufRead;
+use std::thread;
+
+use futures::sync::mpsc::{Sender, channel};
+use futures::{Future, Stream, Sink};
+
+fn main() {
+    let mut worker = Some(spawn_worker());
+
+    let stdin = ::std::io::stdin();
+    for line in stdin.lock().lines() {
+        let line = line.unwrap();
+        if line == "stop" {
+            drop(worker.take());
+            continue;
+        };
+
+        if let Some(w) = worker {
+            worker = Some(w.send(Msg::Echo(line)).wait().unwrap())
+        } else {
+            println!("The worker has been stopped!");
+        }
+    }
+
+    println!("Bye!");
+}
+
+enum Msg {
+    Echo(String),
+}
+
+fn spawn_worker() -> Sender<Msg> {
+    let (tx, rx) = channel(1);
+    thread::spawn(move || {
+        rx.for_each(|msg| {
+            match msg {
+                Msg::Echo(msg) => println!("{} ❤️", msg),
+            }
+            Ok(())
+        }).map(|()| {
+            println!("The worker has stopped!");
+        }).wait().unwrap();
+    });
+    tx
+}
+ +
+ +
+ + +
$ cargo r
+    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
+     Running `target/debug/worker`
+hello
+hello ❤️
+stop
+The worker has stopped!
+world
+The worker has been stopped!
+Bye!
+ +
+
+
+ +

+ Conclusion +

+

So, yeah, all of this was written just to say that in Rust, cancellation is drop :-)

+

Discussion on /r/rust.

+
+
+
+ + + + + diff --git a/2018/05/03/effective-pull-requests.html b/2018/05/03/effective-pull-requests.html new file mode 100644 index 00000000..560a2025 --- /dev/null +++ b/2018/05/03/effective-pull-requests.html @@ -0,0 +1,277 @@ + + + + + + + Effective Pull Requests + + + + + + + + + + + + +
+ +
+ +
+
+ +

Effective Pull Requests

+

Recently I've been sending a lot of pull requests to various GitHub-hosted projects. It took a lot of trial and error before I settled on a git workflow which doesn't involve "Nah, I'll just rm -rf this folder and do a fresh git clone" somewhere. This post documents the workflow. In a nutshell, it is

+ +

Note that the hub utility exists to handle these issues automatically. I personally haven't used it, for no real reason; you should definitely check it out!

+
+ +

+ Avoiding the master branch +

+

The natural thing to do, when sending a pull request, is to fork the upstream +repository, git clone your fork locally, make a fix, git commit -am and +git push it to the master branch of your fork and then send a PR.

+

It even seems to work at first, but breaks down in these two cases:

+
    +
  • +

    You want to send a second PR, and now you don't have a clean branch to base your work off.

    +
  • +
  • +

    The upstream was updated, your PR does not merge cleanly anymore, you need to do a rebase, but you don't have a clean branch to rebase onto.

    +
  • +
+

Tip 1: always start with creating a feature branch for PR:

+ +
+ + +
$ git clone git@github.com:matklad/cargo.git && cd cargo
+$ git checkout -b long-and-descriptive-name-of-the-pr-branch
+$ $EDITOR hack-hack-hack
+ +
+

However, it is easy to forget this step, so it is important to be able to move to a separate branch after you have erroneously committed code to master. It is also crucial to reset master to a clean state, otherwise you'll face some bewildering merge conflicts when you try to update your fork several days later.

+

Tip 2: don't forget to reset master after a mix-up:

+ +
+ + +
$ git clone git@github.com:matklad/cargo.git && cd cargo
+$ $EDITOR hack-hack-hack
+$ git commit -am'A very important fix'
+$ echo "urgh, should have done this on a separate branch"
+
+$ git branch pr-branch
+$ git reset --hard origin/master
+$ git checkout pr-branch
+ +
+

Update: I've learned that magit has a dedicated utility for this "create a branch and reset master to a clean state" workflow: git spinoff. My implementation is here.

+
+
+ +

+ Syncing with upstream +

+

If you work regularly on a particular project, you'd want to keep your fork in sync with the upstream repository. One way to do that would be to add the upstream repository as a git remote, and set the local master branch to track the master from upstream:

+

Tip 3: tracking remote repository

+ +
+ + +
$ git clone git@github.com:matklad/cargo.git && cd cargo
+$ git remote add upstream git@github.com:rust-lang/cargo.git
+$ git fetch upstream
+$ git branch --set-upstream-to=upstream/master
+ +
+

With this setup, you can easily update your pull requests if they don't merge cleanly because of upstream changes:

+

Tip 4: updating a PR

+ +
+ + +
$ git checkout master && git pull --rebase
+$ git checkout pr-branch
+$ git rebase master
+ +
+

Update: worth automating as well, here's my git refresh

+
+
+ +

+ Automating +

+

There are several steps to get the repo set up just right, and doing it manually every time would lead to errors and mysterious merge conflicts. It might be useful to define a shell function to do this for you! It could look like this:

+ +
+ + +
# calling `gcf rust-lang/cargo` would clone github.com/matklad/cargo,
+# and setup upstream properly
+function gcf() {
+    local userrepo=$1
+    local repo=`basename $userrepo`
+    git clone git@github.com:matklad/$repo.git
+    pushd $repo
+    git remote add upstream git@github.com:$userrepo.git
+    git fetch upstream
+    git checkout master
+    git branch --set-upstream-to=upstream/master
+    git pull --rebase --force
+    popd
+}
+ +
+
+
+ +

+ Bonus points +

+

Bonus 1: another useful function to have is for reviewing PRs:

+ +
+ + +
# called like `gpr 9262`, this function would checkout
+# GitHub pull request #9262 to `pr-9262` branch
+function gpr() {
+    local pr=$1
+    git fetch upstream pull/$pr/head:pr-$pr
+    git checkout pr-$pr
+}
+ +
+

Bonus 2:
+There are a lot of learning materials about Git out there. However, a lot of these materials are either comprehensive references, or just present a handful of the most useful git commands. I once accidentally stumbled upon Git from the bottom up and I highly recommend reading it: it is a moderately long article which explains the inner mechanics of Git.

+
+
+
+ + + + + diff --git a/2018/05/04/encapsulating-lifetime-of-the-field.html b/2018/05/04/encapsulating-lifetime-of-the-field.html new file mode 100644 index 00000000..d8ee7e61 --- /dev/null +++ b/2018/05/04/encapsulating-lifetime-of-the-field.html @@ -0,0 +1,552 @@ + + + + + + + Encapsulating Lifetime of the Field + + + + + + + + + + + + +
+ +
+ +
+
+ +

Encapsulating Lifetime of the Field

+

This is a post about an annoying Rust pattern and an annoying +workaround, without a good solution :)

+
+ +

+ Problem Statement +

+

Suppose you have some struct which holds some references inside. Now, +you want to store a reference to this structure inside some larger +struct. It could look like this:

+ +
+ + +
struct Foo<'a> {
+    buff: &'a String
+}
+
+struct Context<'f> {
+    foo: &'f Foo
+}
+ +
+

The code, as written, does not compile:

+ +
+ + +
error[E0106]: missing lifetime specifier
+ --> src/main.rs:8:14
+  |
+8 |     foo: &'f Foo
+  |              ^^^ expected lifetime parameter
+ +
+

To fix it, we need to give Context an additional lifetime parameter and pass it to Foo:

+ +
+ + +
struct Foo<'a> {
+    buff: &'a String
+}
+
+struct Context<'f, 'a: 'f> {
+    foo: &'f Foo<'a>
+}
+ +
+

And this is the problem which is the subject of this post. Although Foo is supposed to be an implementation detail, its lifetime, 'a, bleeds into Context's interface, so most of the clients of Context would need to name this lifetime together with the 'a: 'f bound. Note that this effect is transitive: in general, a Rust struct has to name the lifetimes of its contained types, and their contained types, and their contained types, and so on. But let's concentrate on this two-level example!

+

The question is, can we somehow hide this 'a from users of Context? It's interesting that I first distilled this problem about half a year ago in this urlo post, and today, while refactoring some of Cargo internals in #5476 with @dwijnand, I've stumbled upon something which could be called a solution, if you squint hard enough.

+
+
+ +

+ Extended Example +

+

Let's create a somewhat longer example to check that the lifetime setup actually works out in practice.

+ +
+ + +
struct Foo<'a> {
+    buff: &'a String
+}
+
+impl<'a> Foo<'a> {
+    fn len(&self) -> usize {
+        self.buff.len()
+    }
+}
+
+struct Context<'f, 'a: 'f> {
+    foo: &'f Foo<'a>
+}
+
+// Note how we have to repeat ugly `'a: 'f` bound here!
+impl<'f, 'a: 'f> Context<'f, 'a> {
+    fn new(foo: &'f Foo<'a>) -> Self {
+        Context { foo }
+    }
+
+    fn len(&self) -> usize {
+        self.foo.len()
+    }
+}
+
+// Check, that we actually can create a `Context`
+// from `Foo` and call a method.
+fn test<'f, 'a>(foo: &'f Foo<'a>) {
+    let ctx = Context::new(foo);
+    ctx.len();
+}
+ +
+

playground

+
+
+ +

+ First fix +

+

The first natural idea is to try to use the same lifetime, 'f for +both & and Foo: it fits syntactically, so why not give it a try?

+ +
+ + +
struct Foo<'a> {
+    buff: &'a String
+}
+
+impl<'a> Foo<'a> {
+    fn len(&self) -> usize {
+        self.buff.len()
+    }
+}
+
+struct Context<'f> {
+    foo: &'f Foo<'f>
+}
+
+impl<'f> Context<'f> {
+    fn new<'a>(foo: &'f Foo<'a>) -> Self {
+        Context { foo }
+    }
+
+    fn len(&self) -> usize {
+        self.foo.len()
+    }
+}
+
+fn test<'f, 'a>(foo: &'f Foo<'a>) {
+    let ctx = Context::new(foo);
+    ctx.len();
+}
+ +
+

playground

+

Surprisingly, it works! I'll show a case where this approach breaks down in a moment, but let's first understand why this works. The magic happens in the new method, which could be written more explicitly as

+ +
+ + +
fn new<'a: 'f>(foo: &'f Foo<'a>) -> Self {
+    let foo1: &'f Foo<'f> = foo;
+    Context { foo: foo1 }
+}
+ +
+

Here, we assign a &'f Foo<'a> to a variable of a different type, &'f Foo<'f>. Why is this allowed? We use the 'a lifetime in Foo only for a shared reference. That means that Foo is covariant over 'a. And that means that the compiler can use Foo<'a> instead of Foo<'f> if 'a: 'f. In other words, rustc is allowed to shorten the lifetime.
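The same shortening can be observed outside of a constructor. The sketch below is an illustration only: shorten and len_via_shortened are hypothetical names, not part of the original example. It compiles precisely because Foo is covariant over its lifetime parameter.

```rust
struct Foo<'a> {
    buff: &'a String,
}

/// Hypothetical helper: turns `&'f Foo<'a>` into `&'f Foo<'f>`.
/// No conversion code is needed, the compiler accepts `foo` as is
/// because `'a: 'f` and `Foo` is covariant over `'a`.
fn shorten<'f, 'a: 'f>(foo: &'f Foo<'a>) -> &'f Foo<'f> {
    foo
}

/// Hypothetical usage: the string outlives `foo`, so the lifetime inside
/// `Foo` is longer than the borrow of `foo` itself, and gets shortened.
fn len_via_shortened(s: &String) -> usize {
    let foo = Foo { buff: s };
    shorten(&foo).buff.len()
}
```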

+

It's interesting to note that the original new function didn't say that 'a: 'f, although we had to add this bound to the impl block explicitly. For functions, the compiler infers such bounds from parameters.

+

Hopefully, I've mixed up polarity an even number of times in this variance discussion :-)

+
+
+ +

+ Going invariant +

+

Let's throw a wrench in the works by adding some unique references:

+ +
+ + +
struct Foo<'a> {
+    buff: &'a mut String
+}
+
+impl<'a> Foo<'a> {
+    fn push(&mut self, c: char) {
+        self.buff.push(c)
+    }
+}
+
+struct Context<'f, 'a: 'f> {
+    foo: &'f mut  Foo<'a>
+}
+
+impl<'f, 'a: 'f> Context<'f, 'a> {
+    fn new(foo: &'f mut Foo<'a>) -> Self {
+        Context { foo }
+    }
+
+    fn push(&mut self, c: char) {
+        self.foo.push(c)
+    }
+}
+
+fn test<'f, 'a>(foo: &'f mut Foo<'a>) {
+    let mut ctx = Context::new(foo);
+    ctx.push('9');
+}
+ +
+

playground

+

A unique reference &mut T is invariant over T, so &'f mut Foo<'a> cannot be converted to &'f mut Foo<'f>, and the previous solution does not work:

+ +
+ + +
struct Context<'f> {
+    foo: &'f mut  Foo<'f>
+}
+
+impl<'f> Context<'f> {
+    fn new<'a: 'f>(foo: &'f mut Foo<'a>) -> Self {
+        let foo1: &'f mut Foo<'f> = foo;
+        Context { foo: foo1 }
+    }
+
+    fn push(&mut self, c: char) {
+        self.foo.push(c)
+    }
+}
+ +
+ +
+ + +
error[E0308]: mismatched types
+  --> src/main.rs:17:37
+   |
+17 |         let foo1: &'f mut Foo<'f> = foo;
+   |                                     ^^^ lifetime mismatch
+   |
+   = note: expected type `&'f mut Foo<'f>`
+              found type `&'f mut Foo<'a>`
+ +
+

playground

+
+
+ +

+ Unsheathing existentials +

+

Let's look again at the Context type:

+ +
+ + +
struct Context<'f, 'a: 'f> {
+    foo: &'f mut  Foo<'a>
+}
+ +
+

What we want to say is that, inside the Context, there is some +lifetime 'a which the consumers of Context need not care about, +because it outlives 'f anyway. I think that the syntax for that +would be something like

+ +
+ + +
struct Context<'f> {
+    foo: &'f mut for<'a: 'f> Foo<'a>
+}
+ +
+

Alas, for is supported only for traits and function pointers, and there it has the opposite polarity of "for all" instead of "exists", so using it for a struct gives

+ +
+ + +
error[E0404]: expected trait, found struct `Foo`
+  --> src/main.rs:12:30
+   |
+12 |     foo: &'f mut for<'a: 'f> Foo<'a>
+   |                              ^^^^^^^ not a trait
+ +
+
+
+ +

+ A hack +

+

However, and this is what I realized while reading Cargo's source code, we can use a trait here!

+ +
+ + +
struct Foo<'a> {
+    buff: &'a mut String
+}
+
+impl<'a> Foo<'a> {
+    fn push(&mut self, c: char) {
+        self.buff.push(c)
+    }
+}
+
+trait Push {
+    fn push(&mut self, c: char);
+}
+
+impl<'a> Push for Foo<'a> {
+    fn push(&mut self, c: char) {
+        self.push(c)
+    }
+}
+
+struct Context<'f> {
+    foo: &'f mut (Push + 'f)
+}
+
+impl<'f> Context<'f> {
+    fn new<'a>(foo: &'f mut Foo<'a>) -> Self {
+        let foo: &'f mut Push = foo;
+        Context { foo }
+    }
+
+    fn push(&mut self, c: char) {
+        self.foo.push(c)
+    }
+}
+
+fn test<'f, 'a>(foo: &'f mut Foo<'a>) {
+    let mut ctx = Context::new(foo);
+    ctx.push('9');
+}
+ +
+

playground

+

We've added a Push trait, which has the same interface as the Foo struct, but is not parametrized over the lifetime. This is possible because Foo's interface doesn't actually depend on the 'a lifetime. And this allows us to magically write foo: &'f mut (Push + 'f). This + 'f is what hides 'a as "some unknown lifetime, which outlives 'f".
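On current Rust editions trait objects are spelled with dyn, so the same trick would read roughly as follows. This is a sketch rather than the code from Cargo: push_through_context is a hypothetical helper added only to exercise the types, and the coercion to the trait object happens directly in the struct literal.

```rust
struct Foo<'a> {
    buff: &'a mut String,
}

impl<'a> Foo<'a> {
    fn push(&mut self, c: char) {
        self.buff.push(c)
    }
}

trait Push {
    fn push(&mut self, c: char);
}

impl<'a> Push for Foo<'a> {
    fn push(&mut self, c: char) {
        // Resolves to the inherent method, which takes priority over the trait method.
        Foo::push(self, c)
    }
}

struct Context<'f> {
    // `+ 'f` says: the hidden type may borrow data, as long as it outlives `'f`.
    foo: &'f mut (dyn Push + 'f),
}

impl<'f> Context<'f> {
    fn new<'a>(foo: &'f mut Foo<'a>) -> Self {
        // `&'f mut Foo<'a>` is coerced to `&'f mut (dyn Push + 'f)` here.
        Context { foo }
    }

    fn push(&mut self, c: char) {
        self.foo.push(c)
    }
}

/// Hypothetical helper: builds a `Foo` and a `Context` and pushes one char.
fn push_through_context(s: &mut String, c: char) {
    let mut foo = Foo { buff: s };
    let mut ctx = Context::new(&mut foo);
    ctx.push(c);
}
```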

+
+
+ +

+ A hack, refined +

+

There are many problems with the previous solution: it is ugly, complicated and introduces dynamic dispatch. I don't know how to solve those problems, so let's talk about something I know how to deal with :-)

+

The Push trait duplicated the interface of the Foo struct. It wasn't that bad, because Foo had only one method. But what if Foo had a dozen methods? Could we write a more general trait, which gives us access to Foo directly? Looks like it is possible, at least to some extent:

+ +
+ + +
struct Foo<'a> {
+    buff: &'a mut String
+}
+
+impl<'a> Foo<'a> {
+    fn push(&mut self, c: char) {
+        self.buff.push(c)
+    }
+}
+
+trait WithFoo {
+    fn with_foo<'f>(&'f mut self, f: &mut FnMut(&'f mut Foo));
+}
+
+impl<'a> WithFoo for Foo<'a> {
+    fn with_foo<'f>(&'f mut self, f: &mut FnMut(&'f mut Foo)) {
+        f(self)
+    }
+}
+
+struct Context<'f> {
+    foo: &'f mut (WithFoo + 'f)
+}
+
+impl<'f> Context<'f> {
+    fn new<'a>(foo: &'f mut Foo<'a>) -> Self {
+        let foo: &'f mut WithFoo = foo;
+        Context { foo }
+    }
+
+    fn push(&mut self, c: char) {
+        self.foo.with_foo(&mut |foo| foo.push(c))
+    }
+}
+
+fn test<'f, 'a>(foo: &'f mut Foo<'a>) {
+    let mut ctx = Context::new(foo);
+    ctx.push('9');
+}
+ +
+

playground

+

How does this work? Generally, we want to say that there exists some +lifetime 'a, which we know nothing about except that 'a: 'f. Rust +supports similar constructions only for functions, where for<'a> fn +foo(&'a i32) means that a function works for all lifetimes 'a. The +trick is to turn one into another! The desugared type of callback f, +is &mut for<'x> FnMut(&'f mut Foo<'x>). That is, it is a function +which accepts Foo with any lifetime. Given that callback, we are +able to feed our Foo with a particular lifetime to it.
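The "works for all lifetimes" half of the trick can be seen in isolation. In the sketch below (call_with_local is a made-up name, not from the original post) the callee, not the caller, picks the lifetime: it borrows a local variable the caller could never name, and the callback must accept it anyway.

```rust
/// Hypothetical helper: the callback has to work for every lifetime `'x`,
/// so it can be fed a borrow of a variable local to this function.
fn call_with_local(f: &mut dyn for<'x> FnMut(&'x str) -> usize) -> usize {
    let local = String::from("hello");
    f(&local)
}
```

The caller writes call_with_local(&mut |s| s.len()) and never names the lifetime of local, just like Context never names 'a.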

+
+
+ +

+ Conclusion +

+

While the code examples in the post juggled Foos and Bars, the core problem is real and greatly affects the design of Rust code. When you add a lifetime to a struct, you poison it, and all structs which contain it as a member need to declare this lifetime as well. I would love to know a proper solution for this problem: the described trait object workaround is closer to code golf than to a practical approach.

+

Discussion on /r/rust.

+
+
+
+ + + + + diff --git a/2018/05/24/typed-key-pattern.html b/2018/05/24/typed-key-pattern.html new file mode 100644 index 00000000..92d9656f --- /dev/null +++ b/2018/05/24/typed-key-pattern.html @@ -0,0 +1,307 @@ + + + + + + + Typed Key Pattern + + + + + + + + + + + + +
+ +
+ +
+
+ +

Typed Key Pattern

+

In this post, I'll talk about a pattern for extracting values from a weakly typed map. This pattern applies to all statically typed languages, and even to dynamically typed ones, but the post is rather Rust-specific.

+

I've put together a small crate which implements the pattern:
+https://github.com/matklad/typed_key

+

If you want to skip all the +blah-blah-blah, you can dig right into the code & docs :)

+
+ +

+ The problem +

+

You have an untyped Map<String, Object> and you need to get a typed +Foo out of it by the "foo" key. The untyped map is often some kind +of configuration, like a JSON file, but it can be a real map with +type-erased Any objects as well.

+

In the common case of statically known configuration, the awesome solution that Rust offers is serde. You stick derive(Deserialize) in front of the Config struct and read it from JSON, YAML, TOML or even just environment variables!

+ +
+ + +
#[derive(Deserialize)]
+struct Config {
+    foo: Foo
+}
+
+fn parse_config(data: &str) -> Result<Config> {
+    let config = serde_json::from_str(data)?;
+    Ok(config)
+}
+ +
+

However, occasionally you can't use serde. Some of the cases where this might happen are:

+
    +
  • +

    merging configuration from several sources, which requires writing a +non-trivial serde deserializer,

    +
  • +
  • +

    lazy deserialization, when you don't want to care about invalid values until you actually use them,

    +
  • +
  • +

    extensible plugin architecture, where various independent modules +contribute options to a shared global config, and so the shape of +the config is not known upfront.

    +
  • +
  • +

    you are working with Any objects or otherwise don't do serialization per se.

    +
  • +
+
+
+ +

+ Typical solutions +

+

The simplest approach here is to just grab an untyped object using a +string literal and specify its type on the call site:

+ +
+ + +
impl Config {
+    fn get<T: Deserialize>(&self, key: &str) -> Result<T> {
+        let json_value = self.map.get(key)
+            .ok_or_else(|| bail!("key is missing: `{}`", key))?;
+        Ok(T::deserialize(json_value)?)
+    }
+}
+
+...
+
+
+let foo = config.get::<Foo>("foo")?;
+ +
+

I actually think that this is a fine approach as long as such snippets +are confined within a single module.

+

One possible way to make it better is to extract the "foo" literal into a constant:

+ +
+ + +
const FOO: &str = "foo";
+
+...
+
+let foo = config.get::<Foo>(FOO)?;
+ +
+

This does bring certain benefits:

+
    +
  • +

    fewer places to make a typo in,

    +
  • +
  • +

    behavior is moved from the code (.get("foo")) into data (const FOO), which makes it easier to reason about the code (at a glance, you can see all available config options and get an idea why they might be useful),

    +
  • +
  • +

    there's now an obvious place to document keys: write a doc-comment for a constant.

    +
  • +
+

While great in theory, I personally feel that this usually brings little tangible benefit, especially if some constants are used only once. This is the case where the implementation, a literal "foo", is clearer than the abstraction, a constant FOO.

+
+
+ +

+ Adding types +

+

However, the last pattern can become much more powerful and interesting if we associate types with string constants. The idea is to encode that the "foo" key can be used to extract an object of type Foo, and make it impossible to use it for, say, Vec<String>. To do this, we'll need a pinch of PhantomData:

+ +
+ + +
pub struct Key<T> {
+    name: &'static str,
+    marker: PhantomData<T>,
+}
+
+impl<T> Key<T> {
+    pub const fn new(name: &'static str) -> Key<T> {
+        Key { name, marker: PhantomData }
+    }
+
+    pub fn name(&self) -> &'static str {
+        self.name
+    }
+}
+ +
+

Now, we can add type knowledge to the "foo" literal:

+ +
+ + +
const FOO: Key<Foo> = Key::new("foo");
+ +
+

And we can take advantage of this in the get method:

+ +
+ + +
impl Config {
+    fn get<T: Deserialize>(&self, key: Key<T>) -> Result<T> {
+        let json_value = self.map.get(key.name())
+            .ok_or_else(|| bail!("key is missing: `{}`", key.name()))?;
+        Ok(T::deserialize(json_value)?)
+    }
+}
+
+...
+
+
+let foo = config.get(FOO)?;
+ +
+

Note how we were able to get rid of the turbofish at the call-site! Moreover, the understandability aspect of the previous pattern is also enhanced: if you know both the type and the name of the config option, you can pretty reliably predict how it is going to be used.
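The same idea carries over to the "real map with type-erased Any objects" case mentioned earlier. Below is a self-contained sketch under that assumption: AnyMap, PORT and HOSTS are names invented for the example, and the Key type is a trimmed copy of the one above. It does not use serde, the value is simply downcast to the type recorded in the key.

```rust
use std::any::Any;
use std::collections::HashMap;
use std::marker::PhantomData;

pub struct Key<T> {
    name: &'static str,
    marker: PhantomData<T>,
}

impl<T> Key<T> {
    pub const fn new(name: &'static str) -> Key<T> {
        Key { name, marker: PhantomData }
    }
}

/// Hypothetical type-erased map: values are stored as `Box<dyn Any>`.
#[derive(Default)]
pub struct AnyMap {
    map: HashMap<&'static str, Box<dyn Any>>,
}

impl AnyMap {
    /// The key fixes the type of the value that may be stored under it.
    pub fn insert<T: Any>(&mut self, key: &Key<T>, value: T) {
        self.map.insert(key.name, Box::new(value));
    }

    /// Returns `None` if the key is missing or holds a value of another type.
    pub fn get<T: Any>(&self, key: &Key<T>) -> Option<&T> {
        self.map.get(key.name)?.downcast_ref::<T>()
    }
}

// Hypothetical keys: the name and the type live in one place.
const PORT: Key<u16> = Key::new("port");
const HOSTS: Key<Vec<String>> = Key::new("hosts");
```

With these definitions, map.insert(&PORT, 8080) infers u16 from the key, and map.get(&HOSTS) returns an Option<&Vec<String>> without a turbofish.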

+
+
+ +

+ Pattern in the wild +

+

I first encountered this pattern in the IntelliJ code. It uses UserDataHolder, which is basically Map<String, Object>, everywhere. It helps plugin authors to extend built-in objects in crazy ways but is rather hard to reason about, and type-safety improves the situation a lot. I've also changed Exonum's config to employ this pattern in this PR. It was also a case of a plugin-extensible configuration, where an upfront definition of all configuration options is impossible.

+

Finally, I've written a small crate for this: typed_key :)

+

Discussion on /r/rust.

+
+
+
+ + + + + diff --git a/2018/06/04/newtype-index-pattern.html b/2018/06/04/newtype-index-pattern.html new file mode 100644 index 00000000..c7cdc9df --- /dev/null +++ b/2018/06/04/newtype-index-pattern.html @@ -0,0 +1,331 @@ + + + + + + + Newtype Index Pattern + + + + + + + + + + + + +
+ +
+ +
+
+ +

Newtype Index Pattern

+

Similarly to the previous post, we will once again add types to Rust code which works perfectly fine without them. This time, we'll try to improve the pervasive pattern of using indexes to manage cyclic data structures.

+
+ +

+ The problem +

+

Often one wants to work with a data structure which contains a cycle of some form: object foo references bar, which references baz, which references foo again. The textbook example here is a graph of vertices and edges. In practice, however, true graphs are a rare encounter. Instead, you are more likely to see a tree with parent pointers, which contains a lot of trivial cycles. And sometimes cyclic graphs are implicit: an Employee can be the head of a Department, and a Department has a Vec<Employee> of personnel. This is sort of a graph in disguise: in usual graphs, all vertices are of the same type, and here Employee and Department are different types.

+

Working with such data structures is hard in any language. To arrive +at a situation when A points to B which points back to A, some +form of mutability is required. Indeed, either A or B must be +created first, and so it can not point to the other immediately after +construction. You can paper over this mutability with let rec, as in +OCaml, or with laziness, as in Haskell, but it is still there.

+

Rust tends to surface subtle problems in the form of compile-time +errors, so implementing such graphs in Rust is challenging. The three +usual approaches are:

+
    +
  • +reference counting, explanation by nrc, +
  • +
  • +arena and real cyclic references, explanation by +simonsapin (this one is really neat!), +
  • +
  • +arena and integer indices, explanation by nikomatsakis. +
  • +
+

(apparently, rewriting a Haskell monad tutorial in Rust results in a +graphs blog post).

+

I personally like the indexing approach the most. However, it presents an interesting readability challenge. With references, you have a foo of type &Foo, and it is immediately clear what that foo is, and what you can do with it. With indexes, however, you have a foo: usize, and it is not obvious that you somehow can get a Foo. Even worse, if indexes are used for two types of objects, like Foo and Bar, you may end up with thing: usize. While writing the code with usize actually works pretty well (I don't think I've ever used the wrong index type), reading it later is more complicated, because usize is much less suggestive of what you could do.
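To make the readability problem concrete, here is a minimal sketch of the index-based approach for a tree with parent pointers. Tree, Node, add, children and depth are names invented for this example. Note how every signature mentions a bare usize, and nothing but convention says that it indexes into nodes.

```rust
struct Node {
    parent: Option<usize>,
    children: Vec<usize>,
}

#[derive(Default)]
struct Tree {
    nodes: Vec<Node>,
}

impl Tree {
    /// Adds a node under `parent` (or a root if `None`) and returns its index.
    fn add(&mut self, parent: Option<usize>) -> usize {
        let idx = self.nodes.len();
        self.nodes.push(Node { parent, children: Vec::new() });
        if let Some(p) = parent {
            // The child -> parent -> child cycle is expressed with plain integers.
            self.nodes[p].children.push(idx);
        }
        idx
    }

    fn children(&self, node: usize) -> &[usize] {
        &self.nodes[node].children
    }

    /// Number of parent links between `node` and the root.
    fn depth(&self, mut node: usize) -> usize {
        let mut depth = 0;
        while let Some(p) = self.nodes[node].parent {
            depth += 1;
            node = p;
        }
        depth
    }
}
```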

+
+
+ +

+ Newtype trick +

+

One way to ameliorate this problem is to introduce a newtype wrapper +around usize:

+ +
+ + +
struct Foo;
+
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+struct FooIdx(usize);
+
+struct Arena {
+    foos: Vec<Foo>,
+}
+
+impl Arena {
+    fn foo(&self, foo: FooIdx) -> &Foo {
+        &self.foos[foo.0]
+    }
+}
+ +
+

Here, "one should use FooIdx to index into Vec<Foo>" is still just a convention. A cool thing about Rust is that we can turn this convention into a property verified during type checking. By adding an appropriate impl, we should be able to index into Vec<Foo> with FooIdx directly:

+ +
+ + +
fn direct_indexing(foos: Vec<Foo>, idx: FooIdx) {
+    let _foo: &Foo = &foos[idx];
+}
+ +
+

The impl would look like this:

+ +
+ + +
use std::ops;
+
+impl ops::Index<FooIdx> for Vec<Foo> {
+    type Output = Foo;
+
+    fn index(&self, index: FooIdx) -> &Foo {
+        &self[index.0]
+    }
+}
+ +
+
+
+ +

+ Coherence +

+

It's insightful to study why this impl is allowed. In Rust, types, traits and impls are separate. This creates room for a problem: what if there are two impl blocks for a given (trait, type) pair? The obvious choice is to forbid having two impls in the first place, and this is what Rust does.

+

Actually enforcing this restriction is tricky! The simplest rule of “error if a set of crates currently compiled contains duplicate impls” has severe drawbacks. First of all, this is a global check, which requires the knowledge of all compiled crates. This postpones the check until the later stages of compilation. It also plays awfully with dependencies, because two completely unrelated crates might fail the compilation if present simultaneously. What's more, it doesn't actually solve the problem, because the compiler does not necessarily know the set of all crates beforehand. For example, you may load additional code at runtime via dynamic libraries, and silent bad things might happen if your program and the dynamic library have duplicate impls.

+

To be able to combine crates freely, we want a much stronger property: not only the set of crates currently compiled, but all existing and even future crates must not violate the one impl restriction. How on earth is it possible to check this? Should cargo publish look for conflicting impls across all of crates.io?

+

Luckily, and this is stunningly beautiful, it is possible to loosen +this world-global property to a local one. In the simplest form, we +can place a restriction that impl Foo for Bar can appear either in +the crate that defines Foo, or in the one that defines +Bar. Crucially, whichever one defines the impl has to use the other, +which makes it possible to detect the conflict.

+

This is all really nifty, but we've just defined an Index impl for Vec, and both Index and Vec are from the standard library! How is it possible? The trick is that Index has a type parameter: trait Index<Idx: ?Sized>. It is a template for a trait of sorts, and we get a real trait when we substitute the type parameter with a type. Because FooIdx is a local type, the resulting Index<FooIdx> trait is also considered local. The precise rules here are quite tricky, this RFC explains them pretty well.

+
+
+ +

+ More impls +

+

Because Index<FooIdx> and Index<BarIdx> are different traits, one +type can implement both of them. This is convenient for containers +which hold distinct types:

+ +
+ + +
struct Arena {
+    foos: Vec<Foo>,
+    bars: Vec<Bar>,
+}
+
+impl ops::Index<FooIdx> for Arena { ... }
+
+impl ops::Index<BarIdx> for Arena { ... }
+ +
+
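
To make the elided impl bodies concrete, here is a minimal sketch of the same arena. The contents of `Foo` and `Bar` and the tuple-struct layout of the index types are assumptions made for illustration only:

```rust
use std::ops::Index;

// Placeholder payload types; their contents are assumed for this sketch.
#[derive(Debug, PartialEq)]
struct Foo(u32);
#[derive(Debug, PartialEq)]
struct Bar(&'static str);

// Newtyped indices: a `FooIdx` can only be used to look up a `Foo`.
#[derive(Clone, Copy)]
struct FooIdx(usize);
#[derive(Clone, Copy)]
struct BarIdx(usize);

struct Arena {
    foos: Vec<Foo>,
    bars: Vec<Bar>,
}

// `Index<FooIdx>` and `Index<BarIdx>` are distinct traits,
// so `Arena` can implement both without any conflict.
impl Index<FooIdx> for Arena {
    type Output = Foo;
    fn index(&self, idx: FooIdx) -> &Foo {
        &self.foos[idx.0]
    }
}

impl Index<BarIdx> for Arena {
    type Output = Bar;
    fn index(&self, idx: BarIdx) -> &Bar {
        &self.bars[idx.0]
    }
}
```

With this in place, `arena[FooIdx(1)]` and `arena[BarIdx(0)]` both type check, and the index type alone selects which vector is consulted.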

It's also helpful to define arithmetic operations and conversions for +the newtyped indexes. I've put together a +typed_index_derive crate to automate this boilerplate via a +proc macro; the end result looks like this:

+ +
+ + +
#[macro_use]
+extern crate typed_index_derive;
+
+struct Spam(String);
+
+#[derive(
+    // Usual derives for plain old data
+    Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash,
+
+    TypedIndex
+)]
+#[typed_index(Spam)] // index into `&[Spam]`
+struct SpamIdx(usize); // could be `u32` instead of `usize`
+
+fn main() {
+    let spams = vec![Spam("foo".into()), Spam("bar".into()), Spam("baz".into())];
+
+    // Conversions between `usize` and `SpamIdx`
+    let idx: SpamIdx = 1.into();
+    assert_eq!(usize::from(idx), 1);
+
+    // Indexing `Vec<Spam>` with `SpamIdx`, `IndexMut` works as well
+    assert_eq!(&spams[idx].0, "bar");
+
+    // Indexing `Vec<usize>` is rightfully forbidden
+    // vec![1, 2, 3][idx]
+    // error: slice indices are of type `usize` or ranges of `usize`
+
+    // It is possible to add/subtract `usize` from an index
+    assert_eq!(&spams[idx - 1].0, "foo");
+
+    // The difference between two indices is `usize`
+    assert_eq!(idx - idx, 0usize);
+}
+ +
+

Discussion on /r/rust.

+
+
+
+ + + + + diff --git a/2018/06/06/modern-parser-generator.html b/2018/06/06/modern-parser-generator.html new file mode 100644 index 00000000..4272e3f7 --- /dev/null +++ b/2018/06/06/modern-parser-generator.html @@ -0,0 +1,771 @@ + + + + + + + Modern Parser Generator + + + + + + + + + + + + +
+ +
+ +
+
+ +

Modern Parser Generator

+

Hi! During the last couple of years, I've spent a lot of time writing +parsers and parser generators, and I want to write down my thoughts +about this topic. Specifically, I want to describe some properties of +a parser generator that I would enjoy using. Note that this is not an +“introduction to parsing” blog post; some prior knowledge is assumed.

+

Why do I care about this at all? The broad reason is that today a lot +of tools and even most editors use regular expressions to +approximately parse programming languages, and I find this outright +b҉a͡rb̢ari͞c͘. I understand +that in practice parsing is not as easy as it is in theory:

+ +
+

Law: You can't check code you can't parse. Checking code deeply requires +understanding the code's semantics. The most basic requirement is that you parse +it. Parsing is considered a solved problem. Unfortunately, this view is naïve, +rooted in the widely believed myth that programming languages exist.

+
+
A few billion lines of code later
+
+

However, I do believe we could do better if we use better tools!

+

The specific reason is that I care way too much about the Rust +programming language and

+ +

I've used various parser generators, implemented one, +fall, and still haven't met a parser generator +that I love.

+

The post is split into three major chapters:

+ +

I'll be using rather direct and assertive language in the following, +but the fact is I am totally not sure about anything written here, and +would love to know more about alternatives!

+
+ +

+ UX +

+

Although this text is written in Emacs, I strongly believe that +semantic, reliable, and fast support from tooling is a great +boon to learnability and productivity. Great IDE support is a must +for a modern parser generator, and this chapter talks mostly about +IDE-related features.

+

The most important productivity boost of a parser generator is the +ability to fiddle with the grammar interactively. The UI for this might +look like a three-pane view, where the grammar is in the first pane, +example code to parse is in the second pane and the resulting parse +tree is in the third one. Editing the first two panes should reactively +update the last one. This is difficult to implement with most +yacc-like parser generators; I'll talk more about it in the next +section.

+

The second most important feature is inline tests: for complex +grammars it could be really hard to map from a particular rule +specification to actual code that is parsed by the rule. Having a test +written alongside the rule is invaluable! The test should be just a +snippet of code in the target language. The gold value of the parse +tree for the snippet should be saved in the file alongside the grammar +and should be updated automatically when the grammar changes. Having +inline tests lets you fit the three-pane UI from the previous paragraph into +two panes because you can just use the test as your second pane.

+

Here's a video that shows how it works in fall: https://youtu.be/gb1MJnTcvds.

+

Note that even if you write your parser by hand, you still should use such +“inline tests”. To do so, write them as comments with special markers, and write +a small script which extracts such comments and turns them into tests proper. +Here's an +example +from one experimental hand-written parser of mine. Having such examples of “what +does this if parse?” greatly simplifies reading of the parser's code!
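
As a rough illustration of the extraction step, here is a minimal sketch of such a script's core. The marker convention (`// test <name>` followed by `// ` comment lines holding the snippet) is an assumption for this example, not necessarily the one used in the linked parser:

```rust
/// Collects inline tests from parser source code.
///
/// Assumed (hypothetical) convention: a `// test <name>` line starts a test,
/// and the immediately following `// ...` comment lines form the snippet.
/// Returns `(name, snippet)` pairs in source order.
fn extract_inline_tests(source: &str) -> Vec<(String, String)> {
    let mut tests = Vec::new();
    let mut lines = source.lines().map(str::trim).peekable();
    while let Some(line) = lines.next() {
        // Only lines with the marker start a new test.
        let name = match line.strip_prefix("// test ") {
            Some(name) => name.to_string(),
            None => continue,
        };
        // Consume the comment lines that directly follow the marker.
        let mut snippet = String::new();
        while let Some(body) = lines.peek().copied().and_then(|l| l.strip_prefix("// ")) {
            snippet.push_str(body);
            snippet.push('\n');
            lines.next();
        }
        tests.push((name, snippet));
    }
    tests
}
```

Each extracted snippet can then be parsed and compared against a stored parse tree, which is the same gold-value workflow as described for grammar files above.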

+

Here's the list of important misc IDE features, from super important to very +important. They are not specific to parser generators, so, if you are using a +parser generator to implement IDE support for your language, look into these +first!

+
    +
  • +

    Extend selection to the enclosing syntactic structure (and not just +to a braced block). A super simple feature, but this combined with +multiple cursors is arguably more powerful than vim's text objects, +and most definitely easier to use.

    +
  • +
  • +

    Fuzzy search of symbols in the current file/in the project: super +handy for navigation, both more important and easier to implement +than goto definition.

    +
  • +
  • +

    Precise syntax highlighting. Highlighting is not a super-important +feature and actually works ok even with regex approximations, but +if you already have the syntax tree, then why not use it?

    +
  • +
  • +

    Go to definition/find references.

    +
  • +
  • +

    Errors and warnings inline, with fixes if available.

    +
  • +
  • +

    Extract rule refactoring, pairs well with extend selection.

    +
  • +
  • +

    Code formatting.

    +
  • +
  • +

    Smart typing: indenting code on Enter, adding/removing trailing +commas when joining/splitting lines, and in general automagically +fixing punctuation.

    +
  • +
  • +

    Code completion: although for parser generators dumb word-based +completion tends to work OK.

    +
  • +
+

Here's a short demo of some of these features in fall: https://youtu.be/WRWmwfBLf7o.

+

I want to emphasize that most of these features are ridiculously easy to +implement, if you have a parse tree for your language. Take, for example, fuzzy +search of symbols in the project. This is a super awesome feature for +navigation. Basically, it is CTAGS done right: first, you parse each file (in +parallel) and build a list of symbols for it. Then, as the user types, you +incrementally update the changed files. Using fall, I've implemented this +feature for Rust, and it took me three small files:

+
    +
  • +

    find_symbols.rs +to extract symbols from a single file, 21(!) lines.

    +
  • +
  • +

    indxr.rs, +a generic infra to watch files for changes and recompute the index incrementally, 155 lines.

    +
  • +
  • +

    symbol_index.rs +glues the previous two together, and adds +fst by ever-awesome BurntSushi +on top for fuzzy search, 122 lines.

    +
  • +
+

This is actually practical: initial indexing of the rust-lang/rust repo +takes about 30 seconds using a single core and fall's ridiculously +slow parser, and after that everything just works:

+

https://youtu.be/KyUUDcnOvUw

+

A small note on how to pack all this IDE functionality: make a library. That +way, anyone could use it anywhere. For example, as a web-assembly module in the +online version. On top of the library you could implement whatever protocol you +like, Microsoft's LSP, or some custom one. If you go the protocol-first way, +using your code outside of certain editors could be harder.

+
+
+ +

+ API +

+
+ +

+ Parse Tree +

+

Traditionally, parser generators work by allowing the user to specify +custom code for each rule, which is then copy-pasted into the +generated parser. This is typically used to construct an abstract +syntax tree, but could be used, for example, to evaluate arithmetic +expressions during parsing.

+

I don't think this is the right API for the parser generator, for three +reasons.

+

It feels like a layering violation because it allows intermixing parsing with +basically everything else. You can literally do code-generation during parsing. +It makes things like +the lexer hack possible.

+

It would be very hard to implement reactive rendering of the parse +tree if the result of parsing is some user-defined type.

+

Most importantly, I don't think that producing an abstract syntax +tree as a result of parsing is the right choice. The problem with an AST +is that it, by definition, loses information. The most commonly lost +things are whitespace and comments. While they are not important for a +command-line batch compiler, they are crucial for IDEs, which work +very close to the original source code. Another important IDE-specific +aspect is support for incomplete code. If a function is missing a body +and a closing parenthesis on the parameter list, it should still be +recognized as a function. It's difficult to support such missing +pieces in a traditional AST.

+

I am pretty confident that a better API for the generated parser is to +produce a parse tree which losslessly represents both the input text +and associated tree structure. Losslessness is a very important +property: it guarantees that we could implement anything in principle.

+

I've outlined one possible design of such a lossless representation in the +libsyntax2 RFC; the simplified +version looks like this:

+ +
+ + +
struct Kind(u32);
+
+struct Node {
+    kind: Kind,
+    span: (usize, usize),
+    children: Vec<Node>,
+}
+ +
+

That is, the result of parsing is a homogeneous tree, with nodes +having two bits of information besides the children:

+
    +
  • +

    Type of a node: is it a function definition, a parameter, a +comment?

    +
  • +
  • +

    Region of the source text covered by the node.

    +
  • +
+

A cool thing about such representation is that every language uses +the same type of the syntax tree. In fall features like extend +selection are implemented once and work for all languages.
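
To illustrate why a homogeneous tree makes such features language-independent, here is a minimal sketch of extend selection written only against the `Node` shape shown above. The function name and the convention that spans are half-open byte ranges are assumptions of this sketch, not fall's actual API:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Kind(u32);

struct Node {
    kind: Kind,
    span: (usize, usize), // assumed: half-open byte range [start, end)
    children: Vec<Node>,
}

/// Returns the span of the smallest node that covers `selection` and is
/// strictly larger than it, or `None` if no such node exists.
/// Note that it never looks at `kind`, so it works for any language.
fn extend_selection(root: &Node, selection: (usize, usize)) -> Option<(usize, usize)> {
    let contains = |n: &Node| n.span.0 <= selection.0 && selection.1 <= n.span.1;
    if !contains(root) {
        return None;
    }
    let mut node = root;
    // Descend while some child still covers the selection without being equal to it.
    while let Some(child) = node
        .children
        .iter()
        .find(|&c| contains(c) && c.span != selection)
    {
        node = child;
    }
    if node.span == selection {
        None
    } else {
        Some(node.span)
    }
}
```

Calling it repeatedly with its own result walks outward from a cursor position to the enclosing expression, statement, function, and finally the whole file.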

+

If you need it, you can do the conversion to an AST in a separate +pass. Alternatively, it's possible to layer an AST on top of the +homogeneous tree, using newtype wrappers like

+ +
+ + +
// invariant: Node.kind == STRUCT_DEF
+struct StructDef(Node);
+
+// invariant: Node.kind == STRUCT_FIELD
+struct StructField(Node);
+
+impl StructDef {
+    fn fields(&self) -> Vec<StructField> {
+        self.0.children.iter().filter(|c| c.kind == STRUCT_FIELD)
+            .map(StructField)
+            .collect()
+    }
+}
+ +
+

A parser generator should automatically generate such AST wrappers. However, it +shouldn't directly infer them from the grammar: not every node kind needs an AST +wrapper, and method names are important. Better to let the user specify the AST +structure separately, and check that the AST and the parse tree agree. As an example +from fall, here is the +grammar rule for Rust paths, the corresponding +ast definition, and the +generated code.

+
+
+ +

+ Incremental Reparsing +

+

Another important feature for a modern parser generator is support for +incremental reparsing, which is obviously useful for IDEs.

+

One thing that greatly helps here is the split between parser and +lexer phases.

+

It is much simpler (and more efficient) to make lexing +incremental. When lexing, almost any change affects at most a couple +of tokens, so in theory incremental lexing could be pretty +efficient. Beware though that worst-case relexing still has to be +linear, because insertion of an unclosed quote changes all the following +tokens.

+

In contrast, it is much easier to change tree structure significantly +with a small edit, which places an upper bound on incremental reparsing +effectiveness. Besides, making parsing incremental is more complicated +because you have to deal with trees instead of a linear structure.

+

An interesting middle ground here is an incremental lexer combined +with a fast non-incremental parser.

+
+
+ +

+ Lexer +

+

Traditional lex-style lexers struggle with special cases like ML-style +properly nested comments or Rust raw literals which are not even +context-free. +The problem is typically solved by injecting custom code into the lexer, +which maintains some sort of state, like a nesting level of +comments. In my experience, making this work properly is very +frustrating.

+

These two tricks may make writing a lexer simpler.

+

Instead of supporting lexer states and injecting custom code, allow pairing a +regex, which defines a token, with a function which takes a string slice and +outputs usize. If the lexer matches such an external token, it then calls the supplied +function to determine the other end of the token. Here's an example from fall: +external +token, +custom +functions.
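
As an illustration of what such a function could look like, here is a minimal sketch for Rust raw string literals. The name `raw_string_len` and the choice to return `Option<usize>` (with `None` for an unterminated literal) rather than a bare `usize` are assumptions of this sketch and do not mirror fall's actual functions:

```rust
/// Hypothetical external-token callback for raw strings like `r#"..."#`.
/// `rest` is the remaining input, starting at the candidate token.
/// Returns the length of the token in bytes, or `None` if `rest` does not
/// start with a complete raw string literal.
fn raw_string_len(rest: &str) -> Option<usize> {
    let after_r = rest.strip_prefix('r')?;
    // Count the opening hashes; the closing delimiter must repeat them.
    // This counting is exactly what a regular expression cannot express.
    let hashes = after_r.bytes().take_while(|&b| b == b'#').count();
    let body = after_r[hashes..].strip_prefix('"')?;
    let closing = format!("\"{}", "#".repeat(hashes));
    let end = body.find(&closing)?;
    // `r` + hashes + opening quote + contents + closing delimiter
    Some(1 + hashes + 1 + end + closing.len())
}
```

The regex only needs to recognize the start of the token (roughly `r#*"`); the function then decides where the token ends, without the lexer carrying any state between tokens.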

+

Often it is better to use layered languages instead of lexer +states. Parsing string literals is a great example of this. String +literals usually have some notion of a well-formed escape +sequence. The traditional approach to parsing string literals is to +switch to a separate lexer state after ", which handles +escapes. This is bad for error recovery: if there's a typo in an +escape sequence, it should still be possible to recognize the literal +correctly. So an alternative approach is to parse a string literal as, +basically, anything between two quotes, and then use a separate +lexer for escapes specifically later in the compiler pipeline.

+

Another interesting lexing problem which arises in practice is +context-sensitivity: things like contextual keywords or >> can +represent different token types, depending on the surrounding code. To +deal with this case nicely, the parser should support token +remapping. While most of the tokens appear in the final parse tree as +is, the parser should be able to, for example, substitute two > > +tokens with a single >>, so that later stages of compilation need +not handle this special case.

+
+
+ +

+ Parser +

+

A nice trick to make the parser more general and fast is not to construct +the parse tree directly, but to emit a stream of events like “start internal +node”, “eat token”, “finish internal node”. That way, parsing does not +itself allocate and, for example, you can use the stream of events to +patch an existing tree, doing minimal allocations. This also divorces +the parser from a particular tree structure, so it is easier to +plug in different tree backends.
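
Here is a minimal sketch of what such an event stream and one possible tree backend might look like. The `Event` variants and `build_tree` are illustrative names chosen for this example, and the `Node` type follows the simplified shape from the Parse Tree section:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Kind(u32);

/// What the parser emits instead of building a tree itself.
enum Event {
    Start(Kind),        // start an internal node
    Token(Kind, usize), // eat a token of the given length in bytes
    Finish,             // finish the innermost open node
}

#[derive(Debug, PartialEq)]
struct Node {
    kind: Kind,
    span: (usize, usize),
    children: Vec<Node>,
}

/// One possible backend: turns a flat list of events into an owned tree.
/// Returns `None` if a token or `Finish` appears outside of any node, or if
/// no root node is ever finished.
fn build_tree(events: &[Event]) -> Option<Node> {
    let mut stack: Vec<Node> = Vec::new();
    let mut offset = 0;
    let mut root = None;
    for event in events {
        match *event {
            Event::Start(kind) => stack.push(Node {
                kind,
                span: (offset, offset),
                children: Vec::new(),
            }),
            Event::Token(kind, len) => {
                let leaf = Node { kind, span: (offset, offset + len), children: Vec::new() };
                offset += len;
                stack.last_mut()?.children.push(leaf);
            }
            Event::Finish => {
                let mut node = stack.pop()?;
                node.span.1 = offset; // the node ends where its last token ends
                match stack.last_mut() {
                    Some(parent) => parent.children.push(node),
                    None => root = Some(node),
                }
            }
        }
    }
    root
}
```

A different backend could consume the same events to patch an existing tree or to build a typed AST directly, without any change to the parser.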

+

Events also help with reshuffling the tree structure. For example, +during event processing we can turn left-leaning trees to +right-leaning ones or flatten them into lists. Another interesting +form of tree reshuffling is attachment of comments. If a comment +immediately precedes some definition, it should be a part of this +definition. This is not specified by the language, but it is the +result that a human would expect. With events, we can hand only +significant tokens to the parser and deal with attaching comments and +whitespace when reconstructing the tree from a flat list of events.

+
+
+ +

+ Miscellaneous concerns +

+

To properly implement incremental reparsing, we should start with a +data structure for text which is more efficient to update than +String. While we do have quite a few extremely high-quality +implementations of ropes, the ecosystem is critically missing a way to +talk about them generically. That is, there's nothing like +Java's CharSequence in Rust (which needs a much more involved design +in Rust to avoid unnecessary overhead).

+

Luckily, the parse tree needs to remember only the offsets, so we can +avoid hard-coding a particular text representation, and we don't even +need a generic parameter for that.

+

Homogeneous trees make reactive testing of the grammar possible in +theory because you can always produce a text representation of a tree +from them. But in practice reactivity requires that the “read grammar, +compile parser, run it on input” loop is fast. Literally generating +source code of the parser and then compiling it would be too slow, so +some kind of interpreted mode is required. However, this conflicts +with the need to be able to extend the lexer with custom code. I don't +know of a great solution here, but something like this would work:

+
    +
  • +

    require that all lexer extensions are specified in the verbatim +block of the grammar file and don't have external dependencies,

    +
  • +
  • +

    for IDE support, compile the lexer, and only the lexer, in a temp +dir and communicate with it via IPC.

    +
  • +
+

A possible alternative is to use a different, approximate lexer for +interactive testing of the grammar. In my experience this makes such +testing almost useless because you get different results in +interesting cases and interesting cases are what is important for this +feature.

+

In IDEs, a surprisingly complicated problem is managing a list of open +and modified files, synchronizing them with the file system, providing +consistent file-system snapshots and making sure that things like +in-memory buffers are also possible. For parser generators, all this +complexity might be dodged by requiring that all of the grammar needs +to be specified in a single file.

+
+
+
+ +

+ Parsing Techniques +

+

So we want to write a parser generator that produces lossless parse +trees and which has awesome IDE support. How do we actually parse +a text into a tree? Unfortunately, while there are many ways to parse +text, there's no accepted best one. I'll try to do a broad survey of +various options.

+

I'd love to discuss the challenges of the textbook approach of just +using a context-free grammar/BNF notation. However, let's start with a +simpler, solved case: regular expressions.

+

Languages which could be described by regular expressions are called +regular. They are exactly the same languages which could be recognized +by finite state machines. These two definition mechanisms have nice +properties which explain the usefulness of regular languages in real +life:

+
    +
  • +

    Regular expressions map closely to our thinking and are easy for +humans to understand. Note that there are equivalent in power, but +much less natural meta-languages for describing regular +languages: raw finite state machines or regular grammars.

    +
  • +
  • +

    Finite state machines are easy for computers to execute. FSM is +just a program which is guaranteed to use constant amount of +memory.

    +
  • +
+

Regular languages are rather inexpressive, but they work great for +lexers. On the opposite side of the expressivity spectrum are Turing +machines. For them, we also have a number of meta-languages (like +Rust), which work great for humans. It's interesting that a Turing +machine is equivalent to a finite state machine with a pair of stacks: +to get two stacks from a tape, cut the tape in half where the head +is. Moving the head then corresponds to popping from one stack and +pushing to another.

+

And the context-free languages, which are described by CFGs, are +exactly in between languages recognized by finite state machines and +languages recognized by Turing machines. You need a push-down +automaton, or a state machine with one stack, to recognize a +context-free language.

+

CFGs are powerful enough to describe arbitrary nesting structures and +seem to be a good fit for describing programming languages. However, +there are a couple of problems with CFGs. Let's write a grammar for +arithmetic expressions with additions, multiplications, parentheses +and numbers. The obvious answer,

+ +
+ + +
E -> E + E | E * E | (E) | number
+ +
+

has a problem. It is underspecified and does not tell if 1 + 2 * 3 +is (1 + 2) * 3 or 1 + (2 * 3). We need to tweak the grammar to get +rid of this ambiguity:

+ +
+ + +
E -> F | E + F
+F -> T | F * T
+T -> number | (E)
+ +
+

I think the necessity of such transformations is a problem! Humans don't think +like this: it took me three or four courses in formal grammars to really +internalize this transformation. And if we look at language references, we'll +typically see a +precedence +table instead of BNF.

+

Another problem here is that we can't even work around ambiguity by +plainly forbidding it: checking if a CFG is unambiguous is undecidable.

+

So CFGs turn out to be much less practical and simple than regular +expressions. What options do we have then?

+
+ +

+ Abandoning CFG +

+

The first choice is to parse something, not necessarily a context-free +language. A good way to do it is to write a parser by hand. A +hand-written parser is usually called a recursive descent parser, but +in reality it includes two crucial techniques in addition to just +recursive descent. The pure recursive descent works by translating +grammar rules like T -> A B into a set of recursive functions:

+ +
+ + +
fn parse_t() {
+    parse_a();
+    parse_b();
+}
+ +
+

The theoretical problem here is that it can't deal with +left-recursion. That is, rules like Statements -> Statements ';' +OneStatement make a recursive descent parser loop infinitely. In +theory, this problem is solved by rewriting the grammar and +eliminating the left recursion. If you had a formal grammars class, +you probably have done this! In practice, this is a completely +non-existent problem, because we have loops:

+ +
+ + +
fn parse_statements() {
+    loop {
+        parse_one_statement();
+        if !parse_semicolon() {
+            break;
+        }
+    }
+}
+ +
+

The next problem with recursive descent is that parsing expressions with +precedence requires that weird grammar rewriting. Luckily, there's a simpler +technique to deal with expressions. Suppose you want to parse 1 + 2 * 3. One +way to do that would be to parse it with a loop as a list of atoms separated +by operators and then reconstruct a tree separately. If you fuse these two +stages together, you get a loop, which could recursively call itself and nest, +a +Pratt parser. Understanding it for the first time is hard, but you only need to +do it once :)
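
To make the “loop which recursively calls itself” idea concrete, here is a minimal Pratt-style sketch. It is a toy under stated assumptions: tokens are already lexed, only binary operators and numbers are supported, and the result is printed as an S-expression string instead of a real tree. The names `Token`, `binding_power` and `expr` are made up for this example:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Num(i64),
    Op(char),
}

/// The precedence table: a larger number binds tighter.
fn binding_power(op: char) -> Option<u8> {
    match op {
        '+' | '-' => Some(1),
        '*' | '/' => Some(2),
        _ => None,
    }
}

/// Parses an expression starting at `tokens[*pos]`, consuming only operators
/// that bind tighter than `min_bp`. Returns the tree as an S-expression.
fn expr(tokens: &[Token], pos: &mut usize, min_bp: u8) -> Option<String> {
    let mut lhs = match tokens.get(*pos)? {
        Token::Num(n) => n.to_string(),
        Token::Op(_) => return None,
    };
    *pos += 1;
    // The loop walks the flat `atom op atom op ...` list.
    while let Some(&Token::Op(op)) = tokens.get(*pos) {
        let bp = binding_power(op)?;
        if bp <= min_bp {
            break; // this operator belongs to an enclosing call
        }
        *pos += 1;
        // The recursive call grabs everything that binds tighter than `op`.
        let rhs = expr(tokens, pos, bp)?;
        lhs = format!("({} {} {})", op, lhs, rhs);
    }
    Some(lhs)
}
```

Starting with `min_bp = 0`, the tokens for `1 + 2 * 3` produce `(+ 1 (* 2 3))`, and `10 - 2 - 3` produces `(- (- 10 2) 3)`, so both precedence and left associativity fall out of the single `bp <= min_bp` comparison.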

+

The most important feature of hand-written parsers is a great support +for error recovery and partial parses. It boils down to two simple +tricks.

+

If you are parsing a homogeneous sequence of things (i.e., you are inside the +loop), and the current token does not look like it can begin a new element, you +just skip over it and start the next iteration of the loop. Here's an +example +from Kotlin. At +this +line, we'll get null if the current token could not begin a class member +declaration. +Here +we just skip over it.

+

If you are parsing a particular thing T, and you expect token foo, +but see bar, then, roughly:

+
    +
  • +if bar is not in the FOLLOW(T), you skip over it and emit error, +
  • +
  • +if bar is in FOLLOW(T), you emit an error, but don't skip the +token. +
  • +
+

That way, parsing something like

+ +
+ + +
fn foo(
+
+struct S {
+   f: u32
+}
+ +
+

would correctly recognize incomplete function foo (again, it's easier to +represent such incomplete function with homogeneous parse trees than with AST), +and a complete struct S. Here's another +example +from Kotlin.
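
The two recovery tricks can be sketched on a deliberately tiny toy language, where an item is just a keyword followed by a name. Everything here (the `Token` and `Item` types, the `parse_items` function, and the choice of FOLLOW set) is an assumption made for illustration and is much simpler than a real parser:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Fn,
    Struct,
    Ident,
    Junk,
}

#[derive(Debug, PartialEq)]
enum Item {
    Fn { complete: bool },
    Struct { complete: bool },
}

/// Parses a sequence of `fn Ident` / `struct Ident` items.
/// Returns the recognized items (possibly incomplete) and the error count.
fn parse_items(tokens: &[Token]) -> (Vec<Item>, usize) {
    let mut items = Vec::new();
    let mut errors = 0;
    let mut pos = 0;
    while let Some(&token) = tokens.get(pos) {
        // Trick 1: inside the loop, a token that cannot begin an item
        // is reported and skipped.
        if token != Token::Fn && token != Token::Struct {
            errors += 1;
            pos += 1;
            continue;
        }
        pos += 1; // eat the keyword
        // Trick 2: we expect `Ident` here.
        let complete = match tokens.get(pos) {
            Some(Token::Ident) => {
                pos += 1;
                true
            }
            // In FOLLOW(item): report the error, but leave the token alone
            // so that the next item is still recognized.
            Some(Token::Fn) | Some(Token::Struct) | None => {
                errors += 1;
                false
            }
            // Not in FOLLOW(item): report the error and skip the token.
            Some(Token::Junk) => {
                errors += 1;
                pos += 1;
                false
            }
        };
        items.push(if token == Token::Fn {
            Item::Fn { complete }
        } else {
            Item::Struct { complete }
        });
    }
    (items, errors)
}
```

For the token stream `fn struct Ident` this yields an incomplete function followed by a complete struct with a single error, which mirrors the `fn foo(` example above.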

+

Although hand-written parsers are good at producing high-quality error +messages as well, I don't think that this is important. In the IDE +context, for syntax errors it is much more important and beneficial to +get a red squiggly under the error immediately after you've typed +invalid code. Instantaneous feedback and precise location are, in my +personal experience, enough to fix syntax errors. The error message +can be just “Syntax error”, and more elaborate messages often make +things worse because mapping from an error message to what is +actually wrong is harder than just typing and deleting stuff and +checking if it works.

+

It is possible to simplify authoring of this style of parsers by +generating all recursive functions, loop and Pratt parsers from +declarative BNF/PEG style description. This is what Grammar Kit and +fall do.

+
+
+ +

+ Embracing ambiguity +

+

Another choice is to stay within the CFG class but avoid dealing with +ambiguity by producing all possible parse trees for a given +input. This is typically achieved using non-determinism and +memoization, using GLR and GLL style techniques.

+

Here I'd like to call out the +tree-sitter project, which actually +ticks quite a few boxes outlined in this blog post. In particular, it uses +homogeneous trees, is fully incremental and has surprisingly good support for +error recovery (though not quite as good as hand-written style parsers, at least +when I last checked it).

+
+
+ +

+ Abandoning generality +

+

Yet another choice is to give up full generality and restrict the +parser generator to a subset of unambiguous grammars, for which we +actually could verify the absence of ambiguity. This is how traditional +parser generators like yacc, happy, menhir or LALRPOP work.

+

The very important advantage of these parsers is that you get a strong +guarantee that the grammar works and does not have nasty +surprises. The price you have to pay, though, is that sometimes it is +necessary to tweak an already unambiguous grammar to make the stupid +tool understand that there's no ambiguity.

+

I also haven't seen deterministic LR parsers with great support for +error recovery, but it looks like it should be possible in theory? +Recursive descent parsers, which are more or less LL(1), recover from +errors splendidly, and an LR(1) parser has strictly more information than an +LL(1) one.

+

So, what is the best choice for writing a parser/parser generator?

+

It seems to me that the two extremes are the most promising: a +hand-written parser gives you utmost control over everything, which is +important when you need to parse some language, not designed by you, +which is hostile to the usual parsing techniques. On the other hand, +classical LR-style parsers give you a proof that the grammar is +unambiguous, which is very useful if you are creating your own +language. Ultimately, I think that being able to produce lossless +parse trees supporting partial parses is more important than any +particular parsing technique, so perhaps supporting both approaches +with a single API is the right choice?

+
+
+
+ +

+ Conclusion +

+

This turned out to be a quite lengthy post, hope it was interesting! +These are the main points:

+
    +
  • +

    IDE support is important, for the parser generator itself as well as +for the target language.

    +
  • +
  • +

    Lossless parse trees are more general than ASTs and custom action +code, and are a better fit for IDEs.

    +
  • +
  • +

    Interactivity matters! Reactive grammar repl and inline tests rock!

    +
  • +
  • +

    Parsing is an unsolved problem :)

    +
  • +
+

Discussion on +/r/rust.

+
+
+
+ + + + + diff --git a/2018/06/18/a-trick-for-test-maintenance.html b/2018/06/18/a-trick-for-test-maintenance.html new file mode 100644 index 00000000..96eaaa90 --- /dev/null +++ b/2018/06/18/a-trick-for-test-maintenance.html @@ -0,0 +1,243 @@ + + + + + + + A Trick For Test Maintenance + + + + + + + + + + + + +
+ +
+ +
+
+ +

A Trick For Test Maintenance

+

This is a post about an interesting testing technique which feels like it should +be well known. However, I haven't seen it mentioned anywhere. I don't even have +a good name for it; I've semi-discovered it in the wild. If you know what this +thing is called, please leave a comment!

+ +
+ +

+ A long time ago +

+

I was reading Dart analysis server source code, and came across this +line. +Immediately I was struck as if by lightning. Well, not exactly in the same way, +but you get the idea.

+

What does this line do? I actually don't know, but I have a guess. My +explanation is further down (to give you a chance to discover the +trick as well!), but the general idea is that this line helps +tremendously with making tests more maintainable.

+
+
+ +

+ The two mundane problems +

+

Two tasks which programmers typically enjoy less than furiously +cranking out new features are maintaining existing code and writing +tests. And, as an old Russian joke says, maintaining tests is the +worst. Here are some pain points specific to the post:

+

Negative tests. You want to check that something does not +happen. Writing a test in this situation is tricky because the test +might actually pass for a trivial reason instead of the intended +one. The rule of thumb is to verify that the test actually fails if +the specific condition which it covers is commented out. The problem +with this rule of thumb is that it works in a single point in time. As +the code evolves, the test might begin to pass for a trivial reason.

+

Duplicated tests. Test suites are usually append-only and grow +indefinitely. Almost inevitably this leads to a situation where +different tests are testing essentially the same features, or where +one test is a superset of another.

+

Bifurcated suites. Somewhat similar to the previous point, you may +end up in a situation where a single component has two separate +test-suites in different parts of the code base. I'd like to say that +this happens when two developers write tests independently, but +practice says that me and me one month later are enough to create such +a mess :)

+

Tests discoverability. This is a problem a new contributor usually +faces. Finding a piece of code where the bug fix should be applied is +usually comparatively easier than locating the corresponding tests.

+

The underlying issue is that it is non-trivial to answer these two +questions:

+
    +
  • +

    Given a line of code, where is the test for this specific line?

    +
  • +
  • +

    Given a test, where is the code that is being tested?

    +
  • +
+
+
+ +

+ The solution +

+

The beautiful solution to this problem (which I hypothesise the +_coverageMarker() line in Dart does) is to track code coverage on a +test-by-test basis. That is, when running a test, verify that +specific lines of code were covered by this test.

+

I've put together a small Rust library to do this, called +uncover. It provides two macros: +covered_by and covers.

+

The first macro is used in the code under test, like +this:

+ +
+ + +
if !self.keys.is_empty() {
+    covered_by!("table_with_two_names");
+    panic!("table header is already specified, can't reset to {:?}", key)
+}
+ +
+

The second macro is used in the corresponding test:

+ +
+ + +
#[test]
+fn table_with_two_names() {
+    covers!("table_with_two_names");
+    let f = Factory::new();
+    check_panics(|| {
+        f.table()
+            .with_name("foo")
+            .with_name("bar")
+            .build();
+    })
+}
+ +
+

If the block where covers is used does not cause the execution of +the corresponding covered_by line then the error will be raised at +the end of the block.

+

Under the hood, this is implemented as a global HashMap<String, u64> which +counts how many times each line was executed. So covered_by! +increments +the corresponding count, and covers! returns a guard object that +checks +in Drop that the count was incremented. It is possible to disable these checks +at compile time. And yes, the library actually +exposes +a macro which defines macros :)
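
The mechanism can be sketched with plain functions instead of macros. This is a simplified illustration of the idea and not the uncover crate's actual code; the names `covered_by`, `covers`, `hit_count` and `CoverageGuard` as functions and types are assumptions of this sketch:

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// The global table of hit counters, keyed by mark name.
fn counts() -> &'static Mutex<HashMap<String, u64>> {
    static COUNTS: OnceLock<Mutex<HashMap<String, u64>>> = OnceLock::new();
    COUNTS.get_or_init(|| Mutex::new(HashMap::new()))
}

/// How many times the mark has been hit so far.
fn hit_count(mark: &str) -> u64 {
    counts().lock().unwrap().get(mark).copied().unwrap_or(0)
}

/// Called from the code under test: bumps the counter for `mark`.
fn covered_by(mark: &str) {
    *counts().lock().unwrap().entry(mark.to_string()).or_insert(0) += 1;
}

/// Guard returned by `covers`; remembers the counter value at creation.
struct CoverageGuard {
    mark: String,
    count_before: u64,
}

/// Called from the test: the returned guard checks on drop that `mark`
/// was hit at least once while the guard was alive.
fn covers(mark: &str) -> CoverageGuard {
    CoverageGuard { mark: mark.to_string(), count_before: hit_count(mark) }
}

impl Drop for CoverageGuard {
    fn drop(&mut self) {
        // Skip the check if the test is already failing, to avoid a double panic.
        if !std::thread::panicking() {
            assert!(
                hit_count(&self.mark) > self.count_before,
                "mark `{}` was not hit",
                self.mark
            );
        }
    }
}
```

One caveat of this naive version: because the counters are global, a different test running in parallel could hit the same mark and mask a missing hit, so a real implementation has to take test isolation into account.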

+
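The mechanism above can be sketched with plain functions instead of macros. This is not uncover's actual code: `covered_by`, `covers`, `CoversGuard`, `was_hit` and the `parse` function under test are all names made up for this illustration, and the real library may differ in every detail.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Global table: mark name -> number of times the marked line ran.
fn counts() -> &'static Mutex<HashMap<String, u64>> {
    static COUNTS: OnceLock<Mutex<HashMap<String, u64>>> = OnceLock::new();
    COUNTS.get_or_init(|| Mutex::new(HashMap::new()))
}

fn hit_count(name: &str) -> u64 {
    let map = counts().lock().unwrap();
    map.get(name).copied().unwrap_or(0)
}

/// Called from the code under test (hypothetical stand-in for `covered_by!`).
pub fn covered_by(name: &str) {
    *counts().lock().unwrap().entry(name.to_string()).or_insert(0) += 1;
}

/// Guard returned by `covers` (hypothetical stand-in for `covers!`).
pub struct CoversGuard {
    name: String,
    count_at_start: u64,
}

pub fn covers(name: &str) -> CoversGuard {
    CoversGuard { name: name.to_string(), count_at_start: hit_count(name) }
}

impl CoversGuard {
    /// True if the mark was hit since this guard was created.
    pub fn was_hit(&self) -> bool {
        hit_count(&self.name) > self.count_at_start
    }
}

impl Drop for CoversGuard {
    fn drop(&mut self) {
        // Avoid a double panic if the test is already failing.
        if !std::thread::panicking() && !self.was_hit() {
            panic!("mark `{}` was not hit", self.name);
        }
    }
}

/// Hypothetical code under test.
pub fn parse(input: &str) -> Option<i32> {
    if input.is_empty() {
        covered_by("empty_input");
        return None;
    }
    input.parse().ok()
}
```

A test would start with `let _guard = covers("empty_input");` and then call `parse("")`; if the early-return branch stops being exercised, dropping the guard fails the test.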

I haven't had a chance to apply this technique in large projects (and +it is less useful for smaller ones), but it looks very promising.

+

It's now easy to navigate between code and tests: just ripgrep the +string literal (or write an IDE plugin for this). You will be +able to find the test for the specific if-branch! This should be +especially handy for new contributors.

+

If this technique is used pervasively, you also get an idea about the +overall test coverage.

+

During refactorings, you become aware of tests which might be +affected. Moreover, because coverage is actually checked by the tests +themselves, you'll notice if some test stops exercising the code it +was intended to check.

+

Once again, if you know what this thing is called, please do enlighten +me in comments! Discussion on /r/rust.

+
+
+
+ + + + + diff --git a/2018/07/24/exceptions-in-structured-concurrency.html b/2018/07/24/exceptions-in-structured-concurrency.html new file mode 100644 index 00000000..f015acc4 --- /dev/null +++ b/2018/07/24/exceptions-in-structured-concurrency.html @@ -0,0 +1,334 @@ + + + + + + + Exceptions vs Structured Concurrency + + + + + + + + + + + + +
+ +
+ +
+
+ +

Exceptions vs Structured Concurrency

+

This is partially a mild instance of xkcd://386 with +respect to the great don't +panic post by +@vorner (yes, it's 2 am here) and partially a +discussion of error-handling in the framework of structured concurrency, which +was recently popularized by @njsmith.

+
+ +

+ Panics +

+

In the blog post, @vorner argues that unwinding sometimes may do more +harm than good, if it manages to break some unsafe invariants, +cross an FFI boundary or put the application into an impossible state. I +fully agree that all of these are indeed significant dangers of panics.

+

However, I don't think that just disabling unwinding and using panic += "abort" is the proper fix to the problem for the majority of use +cases. A lot of programs work in a series of requests and responses +(often implicit), and I argue that for this pattern it is desirable to +be able to handle bugs in requests gracefully.

+

I've spent quite some time working on an +IDE, and, although it might not +be apparent at first sight, IDEs are also based on requests/responses:

+
    +
  • +the user types a character, the IDE updates its internal data structures +
  • +
  • +the user requests completion, the IDE runs some calculations on the data +and gives results +
  • +
+

As IDEs are large and have a huge number of features, it is inevitable +that some not very important linting inspection will fail due to an index +out of bounds access on this particular macro invocation in this +particular project. Killing the whole IDE process would definitely be +a bad user experience. On the other hand, just showing a non-modal +popup “Something went wrong, would you like to submit a bug report?” is +usually only a minor irritation: errors are more common in the +numerous additional features, while the smaller core tends to be +more correct.

+
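A minimal sketch of handling a buggy request gracefully, using `std::panic::catch_unwind`. The `handle_request` and `serve` functions and the request format are hypothetical, and the approach only works when the program is built with the default `panic = "unwind"` strategy.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical request handler with a bug: it indexes past the end
/// of the list for requests that have fewer than three fields.
fn handle_request(request: &str) -> String {
    let fields: Vec<&str> = request.split(' ').collect();
    format!("{} -> {}", fields[0], fields[2])
}

/// Runs one request; a panic becomes an error message instead of
/// taking the whole process down.
fn serve(request: &str) -> Result<String, String> {
    panic::catch_unwind(AssertUnwindSafe(|| handle_request(request)))
        .map_err(|_| format!("Something went wrong while handling `{request}`"))
}
```

After `serve("lint")` returns an error, the process is still alive and the next request can be served as usual. `AssertUnwindSafe` is a promise by the caller that no broken state leaks out of the failed request, which is exactly the part that needs care in a real application.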

I do think that this pattern of “show error message and chug along” is +applicable to a significant number of applications. Of course, even in +this setting a bug in the code can in theory have dire consequences, +but in practice this is mitigated by the following:

+
    +
  • +

    The majority of requests are read-only and can't corrupt data.

    +
  • +
  • +

    The low-level implementation of write requests usually has +relatively bug-free transactional semantics, so bugs in write +requests which lead to transaction aborts don't corrupt data +either.

    +
  • +
  • +

    Most applications have some kind of backup/undo functionality, and +even if a bug leads to a commit of invalid data, the user can often +restore a good state (of course this works only for relatively +unimportant data).

    +
  • +
+

However, @vorner identifies a very interesting specific problem with +unwinding which I feel we should really try to solve better: if you +have a bunch of threads running, and one of them catches fire, what +happens? It turns out that often nothing particular happens: some more +threads might die from the poisoned mutexes and closed channels, but +other threads might continue, and, as a result, the application will +exist in a half-dead state for an indefinite period of time.

+
+
+ +

+ Structured Concurrency +

+

At this point, some of you might be silently screaming “Erlang!”:

+ +
+ +Destroy one of my processes & I will only grow stronger +
+

Source: http://leftoversalad.com/c/015_programmingpeople/

+

You are right! Erlang and especially OTP behaviors are great for managing errors +at scale. However, a full actor system might be overkill if all you want is +just an OS thread.

+

If you haven't done this already, pack some snacks, prepare lots of coffee/tea +and do read the structured +concurrency +blog post. The crux of the pattern is to avoid “fire and forget” concurrency:

+ +
+ + +
use std::thread;
+
+fn unstructured() {
+    thread::spawn(|| {
+        do_stuff()
+    });
+    // The thread is "leaked" out of `unstructured` function
+}
+ +
+

Instead, each thread should be confined to some lexical scope and +never escape it:

+ +
+ + +
extern crate crossbeam;
+
+fn structured() {
+    crossbeam::scope(|scope| {
+        scope.spawn(|| {
+            do_stuff()
+        })
+    });
+    // The thread is finished and joined at this point.
+}
+ +
+

The benefit of this organization is that all threads form a tree, +which gives you greater control, because you know for sure which parts +are sequential and which are concurrent. Concurrency is explicitly +scoped.

+
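The same shape can be written without an external crate: since Rust 1.63 the standard library provides `std::thread::scope`. A small sketch, where the `structured` signature and the summing workload are just illustrative stand-ins for `do_stuff()`:

```rust
use std::thread;

/// Sums two slices concurrently; both threads borrow data from the
/// caller's stack and cannot outlive the `thread::scope` call.
fn structured(left: &[u64], right: &[u64]) -> (u64, u64) {
    thread::scope(|scope| {
        let l = scope.spawn(|| left.iter().sum::<u64>());
        let r = scope.spawn(|| right.iter().sum::<u64>());
        (l.join().unwrap(), r.join().unwrap())
    })
    // Both threads are finished and joined at this point.
}
```

Any thread that is not joined explicitly is joined automatically when the scope closure returns, so the tree structure holds even if a handle is dropped.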
+
+ +

+ Panics and Structured Concurrency +

+

And we have a really, really interesting API design problem if we +combine structured concurrency and unwinding. What should be the +behavior of the following program?

+ +
+ + +
fn everything_is_terrible() {
+    crossbeam::scope(|scope| {
+        scope.spawn(|| do_work());
+        scope.spawn(|| panic!("this hurts"));
+    });
+}
+ +
+

Now, for crossbeam specifically there's little choice here due to +the boring requirement for memory safety. But let's pretend for now +that this is a garbage collected language.

+

So, we have two concurrent threads in a single scope, one of which is +currently running and another one is, unfortunately, dead.

+

The most obvious choice is to wait for the running thread to finish +(we don't want to let it escape the scope) and then to re-raise the +panic at scope exit. The problem with this approach is that there's a +potentially unbounded window between the instant the panic is created, +and its propagation.

+
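This window is easy to observe with the standard library's `std::thread::scope`, which follows the "wait, then re-raise" strategy. In the sketch below, `observe_delayed_panic` is a hypothetical helper and the sleep stands in for `do_work()`:

```rust
use std::panic::{self, AssertUnwindSafe};
use std::thread;
use std::time::{Duration, Instant};

/// Returns whether the scope panicked and how long the caller had to
/// wait before it could observe that.
fn observe_delayed_panic(other_work: Duration) -> (bool, Duration) {
    let start = Instant::now();
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        thread::scope(|scope| {
            // Stand-in for `do_work()`: keeps running after its sibling dies.
            scope.spawn(move || thread::sleep(other_work));
            scope.spawn(|| -> () { panic!("this hurts") });
        });
    }));
    (result.is_err(), start.elapsed())
}
```

The default panic hook prints the message right away, but the caller of the scope regains control (and can react to the failure) only after the slowest sibling has finished.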

This is not a theoretical concern: some time ago a friend of mine had +a fascinating debugging session with a Python machine learning +application. The program was processing a huge amount of data, so, to +speed things up, it partitioned the data and spawned a thread per +partition (actual processing was in native code, so the GIL was avoided):

+ +
+ + +
from concurrent.futures import ThreadPoolExecutor, as_completed
+
+with ThreadPoolExecutor() as executor:
+    futures = []
+    for task_type, hosts in reversed(tasks):
+        for task_id, _host in enumerate(hosts):
+            futures.append(
+                executor.submit(func, task_type, task_id))
+
+    # Re-raise the exception.
+    for future in as_completed(futures):
+        future.result()
+ +
+

The observed behavior was that a single thread died, but no exception +or stack trace was printed anywhere. This was because the executor +was waiting for all other threads before propagating the +exception. Although technically the exception was not lost, in +practice you'd have to wait for several hours to actually see it!

+

The Trio library uses an +interesting +refinement of this strategy: when one of the tasks in scope fails, all others +are immediately cancelled, and then awaited. I think this should work well +for Trio, because it has first-class support for cancellation; any async +operation is a cancellation point. So all child tasks will be cancelled in a +timely manner, although I wouldn't be surprised if there are some pathological +cases where exception propagation is delayed.

+

Unfortunately, this solution doesn't work for native threads, because +there are just no good cancellation points. And I don't know of any +approach that would work :(

+

One vague idea I have is inspired by the handling of orphaned processes in +Unix: if a thread in a scope dies, the scope is torn down +immediately, and all the running threads are attached to the value +that is thrown. If anyone wants to handle the failure, they must +wait for all attached threads to finish first. This way, the initial +panic and all in-progress threads could be propagated to the top-level +init scope, which then can attempt either a clean exit by waiting +for all children, or do a process::abort.

+

However, this attachment to the parent violates the property that a +thread never leaves its original scope. Because crossbeam relies on +this property for memory safety, this approach is just not applicable +for threads which share stack data.

+

It's already 4 am here, so I really should be wrapping the post up :) +So, a challenge: design a Rust library for scoped concurrency based on +native OS threads that:

+
    +
  • +never loses a thread or a panic, +
  • +
  • +immediately propagates panics, +
  • +
  • +allows (optionally?) sharing stack data between the threads. +
  • +
+

Discussion on r/rust.

+
+
+
+ + + + + diff --git a/2019/05/19/consider-using-asciidoctor-for-your-next-presentation.html b/2019/05/19/consider-using-asciidoctor-for-your-next-presentation.html new file mode 100644 index 00000000..424f5204 --- /dev/null +++ b/2019/05/19/consider-using-asciidoctor-for-your-next-presentation.html @@ -0,0 +1,370 @@ + + + + + + + Consider Using Asciidoctor for Your Next Presentation + + + + + + + + + + + + +
+ +
+ +
+
+ +

Consider Using Asciidoctor for Your Next Presentation

+

I've spent years looking for a good tool to make slides. +I've tried LaTeX Beamer, Google Docs, Slides.com and several reveal.js offshoots, but none of them was satisfactory for me. +Last year, I stumbled upon Asciidoctor.js PDF (which had like three GitHub stars at that moment), and it is perfect.

+

At least, it is perfect for my use case; your requirements might be different. +I make presentations for teaching programming at Computer Science Center, so my slides are full of code, bullet lists, and sometimes have moderately complex layout. +To make reviewing course material easier, slides need to have high information density.

+

If you want to cut straight to the code, see the repository with slides for my Rust course:

+

http://github.com/matklad/rust-course

+

By the way, the sibling post talks about the course in more detail.

+
+ +

+ Requirements +

+

The specific things I want from the slides are:

+
    +
  • +A source markup language: I like to keep my slides on GitHub +
  • +
  • +Ease of styling and layout. +A good test here is two-column layout with code snippet on the left and a bullet list on the right +
  • +
  • +The final output should be a PDF. +I don't use animations, but I need exactly the same look of slides on different computers +
  • +
+

All the tools I've tried don't quite fit the bill.

+

While TeX is good for formatting formulas, LaTeX is a relatively poor language for describing the structure of the document. +Awesome Emacs mode fixes the issue partially, but still, \begin{itemize} is way too complex for a bullet list. +Additionally, quality of implementation is not perfect: unicode support needs opt-in, and the build process is fiddly.

+

Google Docs and Slides.com are pretty solid choices if you want WYSIWYG. +In fact, I primarily used these two tools before Asciidoctor. +However, WYSIWYG and the limited flexibility which comes with it are significant drawbacks.

+

I don't think I've ever made a serious presentation in any of the JavaScript presentation frameworks. +I've definitely tried reveal.js, remark and shower, but turned back to Google Docs in the end. +The two main reasons for this were:

+
    +
  • +Less than ideal source language: +
      +
    • +if it is Markdown, I struggled with creating complex layouts like the two column one; +
    • +
    • +if it is HTML, simple things like bullet lists or emphasis are hard. +
    • +
    +
  • +
  • +Cross browser CSS. +These frameworks pack a lot of JS and CSS, which I don't really need, but which makes tweaking stuff difficult for me, as I am not a professional web developer. +
  • +
+
+
+ +

+ AsciiDoc Language +

+

The killer feature behind Asciidoctor.js PDF is the AsciiDoc markup language. +Like Markdown, it's a lightweight markup language. +When I was translating this blog from .md to .adoc the only significant change in the syntax was for links, from

+ +
+ + +
[some link](http://example.com)
+ +
+

to

+ +
+ + +
http://example.com[some link]
+ +
+

However, unlike Markdown and LaTeX, AsciiDoc has native support for a rich hierarchical document model. +AsciiDoc source is parsed into a tree of nested elements with attributes (historically, AsciiDoc was created as an easier way to author DocBook XML). +This allows expressing complex document structure without ad-hoc syntax extensions. +Additionally, the concrete syntax feels very orthogonal and well-rounded. +We've seen the syntax for links before, and this is how one includes an image:

+ +
+ + +
image::assets/logo.svg[alt text]
+ +
+

Or a snippet from another file:

+ +
+ + +
include::code_samples/worker.rs[]
+ +
+

A couple more examples, just to whet your appetite (Asciidoctor has extensive documentation):

+ +
+
Paragraphs
+ + +
This is a paragraph
+
+[.lead]
+This is a paragraph with an attribute (which translates to CSS class)
+ +
+
+

This is a paragraph

+

This is a paragraph with an attribute (which translates to CSS class)

+
+ +
+
List with nested elements
+ + +
* This is a bullet list
+* Bullet with table (+ joins blocks)
++
+|===
+|Are tables in lists stupid?| Probably!
+|===
+ +
+
+
    +
  • +

    This is a bullet list

    +
  • +
  • +

    Bullet with table (+ joins blocks)

    + + + + + +
    Are tables in lists stupid?Probably!
    +
  • +
+
+ +
+
Code with inline markup
+ + +
[source,rust,subs="+quotes"]
+----
+let x = 1;
+let r: &i32;
+{
+    let y = 2;
+    r = [.hl-error]##&y##;  // borrowed value does not live long enough
+}
+println!("{}", *r);
+----
+ +
+ +
+ + +
+

That is, in addition to the usual syntax highlighting, the &y bit is wrapped into a <span class="hl-error">. +This can be used to call out specific bits of code, or, like in this case, to show compiler errors.

+

Here's an example of a complex slide:

+ +
+ + +
[.two-col] ①
+## References in C++ and Rust
+
+.C++
+- are created implicitly
+- are not first-class objects (`std::reference_wrapper`)
+- are not always valid
+
+.Rust
+- require explicit `&`/[.language-rust]`&mut` and `*` ②
+- are ordinary objects ③
++
+[source,rust]
+----
+let x = 1;
+let y = 2;
+let mut r: &i32 = &x;
+r = &y;
+----
+- are always valid
+ +
+
    +
  1. +.two-col sets the css class for two-column flex layout. +
  2. +
  3. +[.language-rust] sets css class for inline <code> element, so mut gets highlighted. +
  4. +
  5. +This bullet-point contains a longer snippet of code. +
  6. +
  7. +Have you noticed these circled numbered callouts? They are another useful feature of AsciiDoc! +
  8. +
+

The result is the following slide

+ +
+ + +
+
+
+ +

+ HTML Translation +

+

The AsciiDoc markup language is a powerful primitive, but how do we turn it into pixels on the screen? +The hard part of making slides is laying out the contents: breaking paragraphs into lines, aligning images, arranging columns. +As was pointed out by the Asciidoctor maintainer, browsers are extremely powerful layout engines, and HTML + CSS is a decent way to describe the layout.

+

And here's where Asciidoctor.js PDF comes in: it allows one to transform AsciiDoc DOM into HTML, by supplying a functional-style visitor. +This HTML is then rendered to PDF by Chromium (but you can totally use HTML slides directly if you like it more).

+

Here's the visitor which produces the slides for my Rust course:

+

https://github.com/matklad/rust-course/blob/master/lectures/template.js

+

In contrast to reveal.js, I have full control over the resulting HTML and CSS. +As I don't need cross browser support or complex animations, I can write relatively simple, modern CSS, which I myself can understand.

+
+
+ +

+ Bits and Pieces +

+

Note that Asciidoctor.js PDF is a relatively new piece of technology (although the underlying Asciidoctor project is very mature). +For this reason for my slides I just vendor a specific version of the tool.

+

Because the intermediate result is HTML, the development workflow is very smooth. +It's easy to make a live preview with a couple of editor plugins, and you can use the browser's dev tools to debug CSS. +I've also written a tiny bit of JavaScript to enable keyboard navigation for slides during preview. +Syntax highlighting is also a bespoke pile of regexes :-)

+

One thing I am worried about is the depth of the stack of technologies of Asciidoctor.js PDF.

+
    +
  1. +Original AsciiDoc tool was written in Python. +
  2. +
  3. +Asciidoctor is a modern enhanced re-implementation in Ruby. +
  4. +
  5. +Asciidoctor.js PDF runs on Node.js via the Opal Ruby -> JavaScript compiler. +
  6. +
  7. +It is used to produce HTML which is then fed into chromium to produce PDF! +
  8. +
+

Oh, and syntax highlighting on this blog is powered by pygments, so Ruby calls into Python!

+

This is quite a Zoo, but it works reliably for me!

+
+
+
+ + + + + diff --git a/2019/05/19/rust-course-retrospective.html b/2019/05/19/rust-course-retrospective.html new file mode 100644 index 00000000..8aca3875 --- /dev/null +++ b/2019/05/19/rust-course-retrospective.html @@ -0,0 +1,189 @@ + + + + + + + Rust Course Retrospective + + + + + + + + + + + + +
+ +
+ +
+
+ +

Rust Course Retrospective

+

It was the last week of the Rust course at Computer Science Center. +This post is my experience report from teaching this course.

+
+ +

+ Materials +

+

Note that the course is in Russian :-)

+

Course slides are available under CC-BY at https://github.com/matklad/rust-course. +See the sibling post if you want to learn more about how the slides were made +(TL;DR: Asciidoctor is better than beamer, Google Docs, slides.com, reveal.js, remark).

+

High-quality recordings of lectures are available on YouTube:

+

https://www.youtube.com/playlist?list=PLlb7e2G7aSpTfhiECYNI2EZ1uAluUqE_e

+

The homework is not available, but it was based on the Ray Tracing in One Weekend book.

+
+
+ +

+ Good Parts +

+

Teaching is hard, but very rewarding. +Teaching Rust feels especially good because the language is very well designed and the quality of the implementation is great. +Overall, I don't feel like this was a particularly hard course for the students. +In the end most of the folks successfully completed all assignments, which were fairly representative of the typical Rust code.

+
+
+ +

+ Hard Parts +

+

There was one extremely hard topic and one poorly explained topic.

+

The hard one was the module system. +Many students were completely stumped by it. +It's difficult to point out the specific hard aspect of the current (Rust 2018) module system: each student struggled in their own way.

+

Here's a selection of points of confusion:

+
    +
  • +you don't need to wrap contents of foo.rs in mod foo { ... } +
  • +
  • +you don't need to add mod lib; to main.rs +
  • +
  • +child module lives in the parent/child.rs file, unless the parent is lib.rs or main.rs +
  • +
+
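One way to picture the rules above is to write the whole module tree inline in a single file. The sketch below assumes a hypothetical crate with `src/main.rs`, `src/foo.rs` and `src/foo/bar.rs`; the names `foo`, `bar`, `describe` and `answer` are made up for illustration.

```rust
// Inline equivalent of a crate with three files:
//   src/main.rs      containing `mod foo;`
//   src/foo.rs       containing `pub mod bar;` and `describe`
//   src/foo/bar.rs   containing `answer`
mod foo {
    // Everything in this block is what you would put in src/foo.rs,
    // *without* an extra `mod foo { ... }` wrapper around it.
    pub mod bar {
        // Contents of src/foo/bar.rs: the child lives in a directory
        // named after its parent module.
        pub fn answer() -> u32 {
            92
        }
    }

    pub fn describe() -> String {
        format!("answer = {}", bar::answer())
    }
}
```

The `mod foo;` declaration in the parent file is what creates the `mod foo { ... }` block; the file only supplies its contents. Wrapping `foo.rs` in another `mod foo { ... }` would therefore produce `foo::foo`.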

I feel like my explanation of modules was OK: it contained all the relevant details and talked about how things work under the hood. +However, it seems like just explaining the modules is not enough: one really needs to arrange a series of exercises about modules, and make sure that all students successfully pass them.

+

I don't think that modules are the hardest feature of the language: advanced lifetimes and unsafe subtleties are more difficult. +However, you don't really write mem::transmute or HRTB every day, while you face modules pretty early.

+

The poorly explained topic was Send/Sync. +I was like “the compiler infers Send/Sync automatically, and after that your code just fails to compile if it would have had a data race, isn't Rust wonderful?”. +But this misses the crucial point: in generic code (both for impl T and dyn T), you'll need to write : Sync bounds yourself. +Of course the homework was about generic code, and there were a number of solutions with (unsound) unsafe impl<T> Sync for MyThing<T> :-)

+
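A small sketch of what writing the bounds yourself looks like in generic code. `parallel_sum` is a hypothetical helper, not something from the course homework; removing either `Sync` bound makes it fail to compile, because the compiler cannot infer thread-safety for an arbitrary `T` or `F`.

```rust
use std::thread;

/// Sums `score(item)` over `items` using two threads.
fn parallel_sum<T, F>(items: &[T], score: F) -> u64
where
    T: Sync,                 // `&[T]` is sent to the worker threads
    F: Fn(&T) -> u64 + Sync, // `&F` is shared between the worker threads
{
    let (left, right) = items.split_at(items.len() / 2);
    let score = &score;
    thread::scope(|scope| {
        let l = scope.spawn(move || left.iter().map(score).sum::<u64>());
        let r = scope.spawn(move || right.iter().map(score).sum::<u64>());
        l.join().unwrap() + r.join().unwrap()
    })
}
```

The safe fix for a "`T` cannot be shared between threads safely" error is to add the bound and let the caller prove it, not to reach for `unsafe impl Sync`.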
+
+ +

+ Annoying Parts +

+

It's very hard to google Rust documentation at the moment, because google links +you to redirect stubs of the old book, which creates that weird feeling that you +are inside of a science-fiction novel. +I know that the problem is already fixed, and we just need to wait until the new version of the old book is deployed, but I wish we could have fixed it earlier.

+

Editions are a minor annoyance as well. I've completely avoided talking about Rust 2015, hoping that I'll just teach the shiny new thing. +But of course students google for help and get outdated info.

+
    +
  • +many used extern crate syntax +
  • +
  • +dyn in dyn T was sometimes omitted +
  • +
  • +there were a couple of mod.rs files +
  • +
+

Additionally, several students somehow ended up without edition = "2018" in Cargo.toml.

+

Discussion on /r/rust.

+
+
+
+ + + + + diff --git a/2019/06/20/linux-desktop-tips.html b/2019/06/20/linux-desktop-tips.html new file mode 100644 index 00000000..3a919ede --- /dev/null +++ b/2019/06/20/linux-desktop-tips.html @@ -0,0 +1,362 @@ + + + + + + + Linux Desktop Tips + + + + + + + + + + + + +
+ +
+ +
+
+ +

Linux Desktop Tips

+

Over time I have accumulated a number of tricks and hacks that make the Linux desktop more natural for me. +Today I've discovered another one: a way to minimize Firefox on close. +This seems like a good occasion to write about things I've been doing!

+
+ +

+ Window Switching +

+

I've never understood the appeal of multiple desktops, tiling window managers, +or Mac style “full screen window is outside of your desktop”. +They for sure let you neatly organize several applications at once, but I never need an overview of all applications. +What I need most of the time is switching to a specific application, like a browser.

+

Windows has a feature for this that fits this workflow perfectly. +If you pin an application to the taskbar, then win + number will launch or focus that app. +That is, if the app is already running, its window will be raised and focused.

+

For some reason, this is not available out of the box in any of the Linux window +managers I've tried. What is easy is binding launching an application to a +shortcut, but I rarely use more than one instance of Firefox!

+

Luckily, jumpapp is exactly what is needed +to implement this properly.

+

I use Xbindkeys for global +shortcuts, with the following config:

+ +
+
~/.xbindkeysrc
+ + +
"jumpapp -m kitty"
+  F1
+
+"jumpapp -m firefox"
+  F2
+
+"jumpapp -m code"
+  F3
+ +
+

Note that I bind F? keys without any modifiers: these keys are rarely used +by applications and are very convenient for personal use.

+
+ +
+ +

+ Window Tiling +

+

Because switching windows/applications is easy for me, I typically look at a single maximized window. +However, sometimes I like to have two windows side-by-side, for example an editor and a browser with preview. +A full blown tiling window manager would be overkill for this use-case, but another Windows feature comes in handy. +In Windows, Win + ← and Win + → tile the active window to the left and right side of the screen. +Luckily, this is a built in feature in most window managers, including KWin and Openbox (the two I use the most).

+
+
+ +

+ Screen Real Estate +

+

This one is tricky! +On one hand, because I use one maximized window at a time, I feel comfortable with smaller displays. +I was even disappointed with the purchase of an external display for my laptop: turns out, a bigger screen doesn't really help me! +On the other hand, I really like when all pixels I have are utilized fully.

+

I've tried to work in full screen windows, but that wasn't very convenient for two reasons:

+
    +
  • +Tray area is useful for current time, other status information, and notifications. +
  • +
  • +Full screen doesn't play well with jumpapp window switching. +
  • +
+

After some experiments, I've settled on the following setup:

+
    +
  • +

    Use of maximized, but not full screen windows.

    +
  • +
  • +

    When a window is maximized, its borders and title bar are hidden. To do this in KWin, add the following to ~/.config/kwinrc:

    + +
    + + +
    [Windows]
    +BorderlessMaximizedWindows=true
    + +
    +
  • +
  • +

    To still have an ability to close/minimize the window with the mouse, I use Active Window Menu Plasmoid. +What it does is that it packs window title and close/maximize/minimize buttons into the desktop panel, without spending extra pixels:

    + +
    + + +
    +
  • +
+

Another thing I've noticed is that I look to the bottom side of the screen much more often. +For this reason, I move the desktop panel to the top. +You can imagine how inconvenient the Mac's dock is for me: it wastes so many pixels in the most important area of the display :-)

+
+
+ +

+ Keybindings +

+

After several years of using Emacs and a number of short detours into Vim-land, I grew a profound dislike for the arrow keys. +It's not that they make me slower: they distract me because I need to think about moving my hands.

+

For a long time I've tried to banish arrow keys from my life by making every +application understand ctrl+b, ctrl+f and the like. +But that was always a whack-a-mole game without a chance to win.

+

A much better approach is Home Row Computing. +I rebind, on the low level, CapsLock + i/j/k/l to arrow keys. +This works in every app. +It also works with alt and shift modifiers.

+

I use xkbcomp with this config to set this up. +I have no idea how this actually works :-)

+
+
+ +

+ File Organization +

+

I used to pile up everything on the desktop. +But now my desktop is completely empty, and I enjoy an uncluttered view of +The Hunters in the Snow +every time I boot my laptop.

+

The trick is to realize that accreting junk files is totally normal, and +“just don't put garbage on the desktop” is not a solution. +Instead, one can create a dedicated place for hoarding.

+

I have two of those:

+
    +
  • +~/downloads which I remove automatically on every reboot +
  • +
  • +~/tmp which I rm -fr ~/tmp manually once in a while +
  • +
+
+
+ +

+ Shell +

+

I used to use Zsh with a bunch of plugins, hoping that I'll learn bash this way. +I still google “How to if in bash?” every single time though.

+

For this reason, I've switched to fish with mostly default config. +The killer feature for me is autosuggestions: completion of the commands based on the history. +Zsh has something similar, via a plugin, but this crucial feature works in fish out of the box.

+

One slightly non-standard thing I do is a two-line prompt that looks like this:

+ +
+ + +
02:43:39|~/.config
+λ
+ +
+

Two line prompts are great! You can always see a full working directory, and commands are always visually in the same place. +Having current time in the prompt is also useful in case you run a long command and forget to time it.

+
+
+ +

+ Minimizing Firefox +

+

Finally, the direct cause of this post!

+

I don't use a lot of desktop apps, but I keep a browser with at least five tabs for different messaging apps. +By the way, Tree Style Tab is the best tool for taming modern apps!

+

The problem with this is that I automatically Alt+F4 Firefox once I am done with it, but launching it every time is slow. +Ideally, I want to minimize it on close, just like I do with qBittorrent and Telegram. +Unfortunately, there's no built-in feature for this in Firefox.

+

I once tried to build it with Xbindkeys and Xdotool. +The idea was to intercept Alt+F4 and minimize the active window if it is Firefox. +That didn't work too well: to close all other applications, I tried to forward Alt+F4, but that recursed badly :-)

+

Luckily, today I've realized that I can write a KWin script for this! +This turned out to be much harder than anticipated, because the docs are thin and setup is fiddly.

+

This +post was instrumental for me to figure this stuff out. Thanks Chris!

+

I've created two files:

+ +
+
~/.local/share/kwin/scripts/SmartCloseWindow/metadata.desktop
+ + +
[Desktop Entry]
+Name=Smart Close Window
+Comment=
+Icon=preferences-system-windows-script-test
+
+Type=Service
+
+X-Plasma-API=javascript
+X-Plasma-MainScript=code/main.js
+X-KDE-ServiceTypes=KWin/Script
+
+X-KDE-PluginInfo-Name=SmartCloseWindow # Note, the same name as the dir
+X-KDE-PluginInfo-Author=matklad
+X-KDE-PluginInfo-Email=...
+X-KDE-PluginInfo-License=GPL
+X-KDE-PluginInfo-Version=3
+ +
+ +
+
~/.local/share/kwin/scripts/SmartCloseWindow/contents/code/main.js
+ + +
registerShortcut("Smart Close Window.",
+    "Smart Close Window.",
+    "alt+f4",
+    function () {
+        var c = workspace.activeClient;
+        if (c.caption.indexOf("Firefox") == -1) {
+            c.closeWindow();
+        } else {
+            c.minimized = true;
+        }
+    });
+ +
+

After that, I've ticked a box in front of Smart Close Window in System Settings › Window Management › KWin Scripts and +added a shortcut in System Settings › Shortcuts › Global Shortcuts › System Settings. +The last step took a while to figure out: although it looks like we set the shortcut in the script itself, this doesn't actually work for some reason.

+
+
+ +

+ Linux Distribution +

+

Finally, my life has become significantly easier since I've settled on NixOS. +I had mainly used Arch and a bit of Ubuntu before, but NixOS is so much easier to control. +I highly recommend checking it out!

+
+
+ +

+ My Dotfiles +

+

Most of the stuff in this post is codified in my config repo: https://github.com/matklad/config.

+
+
+
+ + + + + diff --git a/2019/07/16/perils-of-constructors.html b/2019/07/16/perils-of-constructors.html new file mode 100644 index 00000000..8001a590 --- /dev/null +++ b/2019/07/16/perils-of-constructors.html @@ -0,0 +1,424 @@ + + + + + + + Perils of Constructors + + + + + + + + + + + + +
+ +
+ +
+
+ +

Perils of Constructors

+

One of my favorite blog posts about Rust is Things Rust Shipped Without by Graydon Hoare. +To me, footguns that don't exist in a language are usually more important than expressiveness. +In this slightly philosophical essay, I want to talk about a missing Rust feature I especially like: constructors.

+
+ +

+ What Is Constructor +

+

Constructors are typically found in Object Oriented languages. +The job of a constructor is to fully initialize an object before the rest of the world sees it. +At first blush, this seems like a really good idea:

+
    +
  1. +You establish invariants in the constructor. +
  2. +
  3. +Each method takes care to maintain invariants. +
  4. +
  5. +Together, these two properties mean that it is possible to reason about the object in terms of coarse-grained invariants, instead of fine-grained internal state. +
  6. +
+

The constructor plays the role of the induction base here, as it is the only way to create a new object.

+

Unfortunately, there's a hole in this reasoning: the constructor itself observes an object in an inconsistent state, and that creates a number of problems.

+
+
+ +

+ Value of this +

+

When the constructor initializes the object, it starts with some dummy state. +But how do you define a dummy state for an arbitrary object?

+

The easiest answer is to set all fields to default values: booleans to false, numbers to 0, and reference types to null. +But this requires that every type has a default value, and forces the infamous null into the language. +This is exactly the path that Java took: at the start of construction, all fields are zero or null.

+

It's really hard to paper over this if you want to get rid of null afterwards. +A good case study here is Kotlin. +Kotlin uses non-nullable types by default, but has to work with pre-existing JVM semantics. +The language-design heroics to hide this fact are really impressive and work well in practice, but are unsound. +That is, with constructors it is possible to circumvent Kotlin's null-checking.

+

Kotlin's main trick is to encourage the use of so-called primary constructors, which simultaneously declare a field and set it before any user code runs:

+ +
+ + +
class Person(
+  val firstName: String,
+  val lastName: String
+) { ... }
+ +
+

Alternatively, if the field is not declared in the constructor, the programmer is encouraged to immediately initialize it:

+ +
+ + +
class Person(val firstName: String, val lastName: String) {
+    val fullName: String = "$firstName $lastName"
+}
+ +
+

Trying to use a field before initialization is forbidden statically on a best-effort basis:

+ +
+ + +
class Person(val firstName: String, val lastName: String) {
+    val fullName: String
+    init {
+        println(fullName) // error: variable must be initialized
+        fullName = "$firstName $lastName"
+    }
+}
+ +
+

But, with some creativity, one can get around these checks. +For example, a method call would do:

+ +
+ + +
class A {
+    val x: Any
+    init {
+        observeNull()
+        x = 92
+    }
+    fun observeNull() = println(x) // prints null
+}
+
+fun main() {
+    A()
+}
+ +
+

So would capturing this in a lambda (spelled { args -> body } in Kotlin):

+ +
+ + +
class B {
+    val x: Any = { y }()
+    val y: Any = x
+}
+
+fun main() {
+    println(B().x) // prints null
+}
+ +
+

Examples like these seem contorted (and they are), but I did hit similar issues +in real code +(Kolmogorov's zero-one law of software engineering: in a sufficiently large code base, every code pattern exists almost surely, unless it is statically rejected by the compiler, in which case it almost surely doesn't exist).

+

The reason why Kotlin can get away with this unsoundness is the same as with Java's covariant arrays: the runtime does null checks anyway. +All in all, I wouldn't want to complicate Kotlin's type system to make the above cases rejected at compile time: +given the existing constraints (JVM semantics), the cost/benefit ratio of a runtime check is much better than that of a static check.

+

What if the language doesn't have a reasonable default for every type? +For example, in C++, where user-defined types are not necessarily references, one cannot just assign nulls to every field and call it a day! +Instead, C++ invents a special kind of syntactic machinery for specifying the initial values of fields: initializer lists.

+ +
+ + +
#include <string>
+#include <utility>
+
+class person {
+  person(std::string first_name, std::string last_name)
+    : first_name(std::move(first_name))
+    , last_name(std::move(last_name))
+  {}
+
+  std::string first_name;
+  std::string last_name;
+};
+ +
+

Because this is a special syntax, the rest of the language doesn't work completely flawlessly with it. +For example, it's hard to fit arbitrary statements into initializer lists, because C++ is not an expression-oriented language (which by itself is OK!). +Working with exceptions from initializer lists needs yet another obscure language feature.

+
+
+ +

+ Calling Methods From Constructor +

+

As the Kotlin examples hinted, all hell breaks loose if one calls a method from a constructor. +Generally, methods expect that the this object is fully constructed and valid (adheres to its invariants). +But, in Java or Kotlin, nothing prevents you from calling a method in a constructor, and that way a semi-alive object can escape. +The constructor promises to establish invariants, but is actually the easiest place to break them!

+

A particularly bizarre thing happens when the base class calls a method overridden in the subclass:

+ +
+ + +
abstract class Base {
+    init {
+        initialize()
+    }
+    abstract fun initialize()
+}
+
+class Derived: Base() {
+    val x: Any = 92
+    override fun initialize() = println(x) // prints null!
+}
+ +
+

Just think about it: code for Derived runs before its constructor! +Doing a similar thing in C++ leads to even curiouser results. +Instead of calling the function from Derived, a function from Base will be called. +This makes some sense, because Derived is not at all initialized (remember, we can't just say that all fields are null). +However, if the function in Base happens to be pure virtual, undefined behavior occurs.

+
+
+ +

+ Constructor's Signature +

+

Breaking invariants isn't the only problem with constructors. +They also have a signature with a fixed name (empty) and a fixed return type (the class itself). +That makes constructor overloads confusing for humans.

+ +

The problem with the return type usually comes up if construction can fail. +You can't return Result<MyClass, io::Error> or null from a constructor!

+

This is often used as an argument that C++ with exceptions disabled is not viable, and that using constructors forces one to use exceptions as well. +I don't think that's a valid argument though: factory functions solve both problems, because they can have arbitrary names and can return arbitrary types. +I actually find this to be an occasionally useful pattern in OO languages:

+
    +
  • +

    Make a single private constructor that accepts all the fields as arguments and just sets them. +That is, this constructor acts almost like a record literal in Rust. +It can also validate any invariants, but it shouldn't do anything else with arguments or fields.

    +
  • +
  • +

    For public API, provide the necessary public factory functions, with +appropriate naming and adjusted return types.

    +
  • +
+

A similar problem with constructors is that, because they are a special kind of thing, it's hard to be generic over them. +In C++, "default constructible" or "copy constructible" can't be expressed more directly than "certain syntax works". +Contrast this with Rust, where these concepts have appropriate signatures:

+ +
+ + +
trait Default {
+    fn default() -> Self;
+}
+
+trait Clone {
+    fn clone(&self) -> Self;
+}
+ +
+
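Because default and clone are ordinary trait methods, generic code can name them in a bound. A minimal sketch, where the helper name filled is made up for illustration:

```rust
/// Builds a vector of `len` copies of `T`'s default value.
/// `filled` is a hypothetical helper, not part of the standard library.
fn filled<T: Default + Clone>(len: usize) -> Vec<T> {
    // "Default constructible" is just a call to a trait function...
    let prototype = T::default();
    // ...and "copy constructible" is just `Clone`, which `vec!` relies on here.
    vec![prototype; len]
}
```

Calling filled::<String>(2) gives two empty strings; no special constructor syntax is involved, only the two trait bounds.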
+
+ +

+ Life Without Constructors +

+

In Rust, there's only one way to create a struct: providing values for all the fields. +Factory functions, like the conventional new, play the role of constructors, but, crucially, don't allow calling any methods until you have at least a basically valid struct instance on hand.

+

A perceived downside of this approach is that any code can create a struct, so there's no single place, like the constructor, to enforce invariants. +In practice, this is easily solved by privacy: if a struct's fields are private, it can only be created inside its declaring module. +Within a single module, it's not at all hard to maintain a convention like "all construction must go via the new method". +One can even imagine a language extension that allows one to mark certain functions with a #[constructor] attribute, with the effect that the record literal syntax is available only in the marked functions. +But, again, additional language machinery seems unnecessary: maintaining local conventions needs little effort.
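Here is a small sketch of the privacy convention, with an invented temperature module and Kelvin type. The record literal is only reachable inside the module, so new is the single place where the invariant is checked, and the factory functions are free to have descriptive names and fallible return types:

```rust
mod temperature {
    /// Hypothetical example type; invariant: `value` is a non-negative number.
    #[derive(Debug, PartialEq)]
    pub struct Kelvin {
        value: f64, // private: code outside this module cannot write `Kelvin { .. }`
    }

    impl Kelvin {
        /// The only place in the module that uses the record literal.
        pub fn new(value: f64) -> Result<Kelvin, String> {
            if value.is_nan() || value < 0.0 {
                return Err(format!("invalid temperature: {value} K"));
            }
            Ok(Kelvin { value })
        }

        /// A second "constructor" with its own name, going through `new`.
        pub fn from_celsius(celsius: f64) -> Result<Kelvin, String> {
            Kelvin::new(celsius + 273.15)
        }

        pub fn value(&self) -> f64 {
            self.value
        }
    }
}
```

Outside of temperature, writing temperature::Kelvin { value: -1.0 } is a compile error, so every Kelvin in the program went through the check.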

+

I personally think that this tradeoff looks the same for first-class contract programming in general. +Contracts like "not null" or "positive" are best encoded in types. +For complex invariants, just writing assert!(self.validate()) in each method manually is not that hard. +Between these two patterns there's little room for language-level or macro-based #[pre] and #[post] conditions.
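To make the two patterns concrete, here is a rough sketch. The function buckets_needed and the type SortedList are invented for this example; std::num::NonZeroU32 is the real standard library type that encodes "positive":

```rust
use std::num::NonZeroU32;

/// "Positive" lives in the type: a zero bucket size cannot be passed in,
/// so the division below can never divide by zero.
fn buckets_needed(items: u32, bucket_size: NonZeroU32) -> u32 {
    let size = bucket_size.get();
    items / size + u32::from(items % size != 0)
}

/// A more complex invariant (the elements stay sorted) is checked by hand.
struct SortedList {
    items: Vec<u32>,
}

impl SortedList {
    fn new() -> SortedList {
        SortedList { items: Vec::new() }
    }

    fn validate(&self) -> bool {
        self.items.windows(2).all(|pair| pair[0] <= pair[1])
    }

    fn insert(&mut self, value: u32) {
        assert!(self.validate()); // manual precondition
        let position = self.items.partition_point(|&item| item <= value);
        self.items.insert(position, value);
        assert!(self.validate()); // manual postcondition
    }
}
```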

+
+
+ +

+ A Case of Swift +

+

An interesting language to look at for its constructor machinery is Swift. +Like Kotlin, Swift is a null-safe language. +Unlike Kotlin, Swift's null-checking needs to be sound, so it employs interesting tricks to mitigate constructor-induced damage.

+

First, Swift embraces named arguments, and that helps quite a bit with the "all constructors have the same name" problem. +In particular, having two constructors with the same types of parameters is not a problem:

+ +
+ + +
Celsius(fromFahrenheit: 212.0)
+Celsius(fromKelvin: 273.15)
+ +
+

Second, to solve the "constructor calls a virtual function of an object's class that hasn't come into existence yet" problem, Swift uses an elaborate two-phase initialization protocol. +Although there's no special syntax for initializer lists, the compiler statically checks that the constructor's body has just the right, safe and sound, form. +For example, calling methods is only allowed after all fields of the class and its ancestors are set.

+

Third, there's special language-level support for failable constructors. +A constructor can be declared nullable, which makes the result of a call to a constructor an option. +A constructor can also have a throws modifier, which works somewhat nicer with Swift's semantic two-phase initialization than with C++'s syntactic initializer lists.

+

Swift manages to plug all of the holes in constructors I am ranting about. +This comes at a price, however: the initialization chapter is one of the longest in the Swift book!

+
+
+ +

+ When Constructors Are Necessary +

+

However, I can think of at least two reasons why constructors can't be easily substituted with Rust-style record literals.

+

First, inheritance more or less forces the language to have constructors. +One can imagine extending the record syntax with support for base classes:

+ +
+ + +
struct Base { ... }
+
+struct Derived: Base { foo: i32 }
+
+impl Derived {
+    fn new() -> Derived {
+        Derived {
+            Base::new()..,
+            foo: 92,
+        }
+    }
+}
+ +
+

But this won't work with the object layout of a typical single-inheritance OO language! +Usually, an object starts with a header and continues with fields of classes, from the base one to the most derived one. +This way, a prefix of an object of a derived class forms a valid object of a base class. +For this layout to work though, the constructor needs to allocate memory for the whole object at once. +It can't allocate just enough space for the base, and then append derived fields afterwards. +But such piece-wise allocation is required if we want a record syntax where we can just specify a value for a base class.

+

Second, unlike records, constructors have a placement-friendly ABI. +A constructor acts on the this pointer, which points to the chunk of memory that the newborn object should occupy. +Crucially, a constructor can easily pass pointers to subobjects' constructors, making it possible to create a complex tree of values in place. +In contrast, in Rust constructing records semantically involves quite a few copies of memory, and we are at the mercy of the optimizer here. +It's not a coincidence that there's still no accepted RFC for placement in Rust!

+

Discussion on /r/rust.

+
+
+
+ + + + + diff --git a/2019/07/25/unsafe-as-a-type-system.html b/2019/07/25/unsafe-as-a-type-system.html new file mode 100644 index 00000000..2989d9a2 --- /dev/null +++ b/2019/07/25/unsafe-as-a-type-system.html @@ -0,0 +1,200 @@ + + + + + + + Unsafe as a Human-Assisted Type System + + + + + + + + + + + + +
+ +
+ +
+
+ +

Unsafe as a Human-Assisted Type System

+

This is a short note about yet another way to look at Rust's unsafe.

+

Today, an interesting bug was found in rustc, which made me aware just how useful unsafe is for making code maintainable. +The story begins a couple of months ago, when I was casually browsing through recent pull requests for rust-lang/rust. +I was probably waiting for my code to compile at that moment :] +Anyway, a pull request caught my attention, and, while I was reading the diff, I noticed a usage of unsafe. +It looked roughly like this:

+ +
+ + +
fn map_in_place<T, F>(t: &mut T, f: F)
+where
+    F: FnOnce(T) -> T,
+{
+    unsafe { std::ptr::write(t, f(std::ptr::read(t))); }
+}
+ +
+

This function applies a T -> T function to a &mut T value, à la the take_mut crate.

+

There is a safe way to do this in Rust, by temporarily replacing the value with something useless (Jones's trick):

+ +
+ + +
fn map_in_place_safe<T, F>(t: &mut T, f: F)
+where
+    F: FnOnce(T) -> T,
+    T: Default,
+{
+    let stolen_t = std::mem::replace(t, T::default());
+    *t = f(stolen_t); // assign through the reference, not to the binding
+}
+ +
+

In map_in_place we don't have a T: Default bound, so the trick is not applicable. +Instead, the function uses (unsafe) ptr::read to get an owned value out of a unique reference, and then uses ptr::write to store the new value back, without calling the destructor.

+

However, the code has a particular unsafe code smell: it calls user-supplied code (f) from within an unsafe block. +This is usually undesirable, because it makes reasoning about invariants harder: arbitrary code can do arbitrary unexpected things.

+ +

And, indeed, this function is unsound: if f panics and unwinds, the t value would be dropped twice! +The solution here (which I know from the take_mut crate) is to just abort the process if the closure panics. +Stern, but effective!
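A minimal sketch of the abort trick follows. The AbortOnDrop guard is a name I made up for illustration, and this is not the actual code from take_mut or rustc. The guard is defused with mem::forget on the happy path; if f unwinds, the guard's destructor runs and aborts before anyone can observe the logically moved-out value:

```rust
use std::{mem, process, ptr};

/// Hypothetical guard: aborts the process if dropped.
struct AbortOnDrop;

impl Drop for AbortOnDrop {
    fn drop(&mut self) {
        process::abort();
    }
}

fn map_in_place<T, F>(t: &mut T, f: F)
where
    F: FnOnce(T) -> T,
{
    let guard = AbortOnDrop;
    let slot: *mut T = t;
    unsafe {
        // `*slot` is logically uninitialized between `read` and `write`.
        let old = ptr::read(slot);
        // If `f` panics, unwinding drops `guard`, which aborts instead of
        // letting the caller drop the moved-out `*t` a second time.
        let new = f(old);
        ptr::write(slot, new);
    }
    // Happy path: defuse the guard without running its destructor.
    mem::forget(guard);
}
```

On the normal path this behaves exactly like the original one-liner; the only change is that a panicking closure turns into a process abort instead of a double drop.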

+

I felt really torn about bringing this issue up: clearly, inside the compiler we know what we are doing, and the error case seems extremely marginal. +Nevertheless, I did leave the comment, and the abort trick was implemented.

+

And guess what? +Today a bug report came in (#62894), demonstrating that the closure does panic in some cases, and rustc aborts. +To be clear, the abort in this case is a good thing! +If rustc didn't abort, it would be a use-after-free.

+

Note how cool this is: a casual code reviewer was able to prevent a memory-safety issue by looking at just a single one-line function. +This was possible for two reasons:

+
    +
  1. +The code was marked unsafe which made it stand out. +
  2. +
  3. +The safety reasoning was purely local: I didn't need to understand the PR (or surrounding code) as a whole to reason about the unsafe block. +
  4. +
+

The last bullet point is especially interesting, because it is what makes type systems [1] in general effective in large-scale software development:

+
    +
  1. +Checking types is a local (per-expression, per-function, per-module, depending on the language) procedure. +Every step is almost trivial: verify that sub-expressions have the right type and work out the result type. +
  2. +
  3. +Together, these local static checks guarantee a highly non-trivial global property: +during runtime, actual types of all the values match inferred static types of variables. +
  4. +
+

Rust's unsafe is similar: if we verify every usage of unsafe (local property!) to be correct, then we guarantee that the program as a whole does not contain undefined behavior.

+

The devil is in the details, however, so the reality is slightly more nuanced.

+

First, unsafe has to be checked by humans, hence "a human-assisted type system". +The problem with humans, however, is that they make mistakes all the time.

+

Second, checking unsafe can involve a rather large chunk of code. +For example, if you implement Vec, you can (safely) write to its length field from anywhere in the defining module. +That means that the correctness of the Deref impl for Vec depends on the whole module. +Common wisdom says that the boundary for unsafe code is a module, but I would love to see a more precise characterization. +For example, in the map_in_place case it's pretty clear that only a single function should be examined. +On the other hand, if Vec's fields are pub(super), the parent module should be scrutinized as well.

+

Third, it's trivial to make all unsafe blocks technically correct by just making every function unsafe. +That wouldn't be a useful thing to do though! +Similarly, if unsafe is used willy-nilly across the ecosystem, its value is decreased, because there would be many incorrect unsafe blocks, and reviewing each additional block would be harder.

+

Fourth, and probably most disturbing, correctness of two unsafe blocks in isolation does not guarantee that they together are correct! +We shouldn't panic though: in practice, realistic usages of unsafe do compose.

+

Discussion on r/rust.

+

Update(2020-08-17): oops, I did it again.

+

[1] unsafe is really an effect system, but the difference is not important here.

+
+
+ + + + + diff --git a/2019/08/23/join-your-threads.html b/2019/08/23/join-your-threads.html new file mode 100644 index 00000000..97d67696 --- /dev/null +++ b/2019/08/23/join-your-threads.html @@ -0,0 +1,319 @@ + + + + + + + Join Your Threads + + + + + + + + + + + + +
+ +
+ +
+
+ +

Join Your Threads

+

This is a note on how to make multithreaded programs more robust. +It's not really specific to Rust, but I get to advertise my new jod-thread micro-crate :)

+

Let's say you've created a fresh new thread with std::thread::spawn, but haven't called JoinHandle::join anywhere in your program. +What can go wrong in this situation? +As a reminder, join blocks until the thread represented by the handle completes, either successfully or with a panic.

+

First, if the main function finishes earlier, some destructors on that other thread's stack might not run. +It's not a big deal if all those destructors do is free memory: the OS cleans up after the process exits anyway. +However, Drop could have been used for something like flushing IO buffers, and that is more problematic.

+

Second, not joining threads can lead to surprising interference between unrelated parts of the program and in general to more chaotic behavior. +Imagine, for example, running a test suite with many tests. +In this situation typical "singleton" threads may accumulate during a test run. +Another scenario is spawning helper threads when processing tasks. +If you don't join these threads, you might end up using more resources than there are concurrent tasks, making it harder to measure the load. +To be clear, if you don't call join, the thread will complete at some point anyway, it won't leak or anything. +But this "some point" is non-deterministic.

+

Third, if a thread panics in a forest, and no one is around to hear it, does it make a sound? +The join method returns a Result, which is an Err if the thread has panicked. +If you don't join the thread, you won't get a chance to react to this event. +So, unless you are looking at stderr at this moment, you might not realize that something is wrong!

+ +

It seems like joining the threads by default is a good idea. +However, just calling JoinHandle::join is not enough:

+ +
+ + +
let thread = std::thread::spawn(|| {
+    /* useful work */
+});
+
+// ...
+
+thread.join().unwrap(); // propagate the panic
+ +
+

The problem is, the code in the // ... part might use ? (or some other form of early return), or it can panic, and in both cases the thread won't be joined. +As usual, the solution is to put the cleanup operation into a Drop impl. +That's exactly what my crate, jod_thread, does! +Note that this is really a micro crate, so consider just rolling your own join on drop. +The value is not in the code, it's in the pattern of never leaving a loose thread behind!
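Rolling your own could look roughly like the sketch below. JoinOnDrop and spawn_joined are names invented here; this illustrates the pattern and is not the source of jod_thread:

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical wrapper: joins the thread when the handle goes out of scope.
pub struct JoinOnDrop<T>(Option<JoinHandle<T>>);

pub fn spawn_joined<F, T>(f: F) -> JoinOnDrop<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    JoinOnDrop(Some(thread::spawn(f)))
}

impl<T> Drop for JoinOnDrop<T> {
    fn drop(&mut self) {
        if let Some(handle) = self.0.take() {
            let result = handle.join();
            // Propagate the child's panic, unless this thread is already
            // unwinding: a second panic during unwinding aborts the process.
            if result.is_err() && !thread::panicking() {
                panic!("child thread panicked");
            }
        }
    }
}
```

With this, an early return via ? or a panic in the parent still runs the destructor, so the thread is joined on every exit path.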

+
+ +

+ A Look At C++ +

+

As usual, it is instructive to contrast and compare Rust and C++.

+

In C++, std::thread has this interesting peculiarity that it terminates the process in its destructor unless you call .join (which works just like in Rust) or .detach (which says "I won't be joining this thread at all"). +In other words, C++ mandates that you explicitly choose between joining and detaching. +Why is that?

+

It's easy to argue that detach by default is the wrong choice for C++: it can easily lead to undefined behavior if the lambda passed to the thread uses values from the parent's stack frame.

+

Or, as Scott Meyers poetically puts it in Item 37 of Effective Modern C++ (which is probably the best book to read if you are into both Rust and C++):

+ +

This also happens to be one of my favorite arguments for "why Rust?" :)

+

The reasoning behind not making join the default is less clear cut. +The book says that join by default would be counterintuitive, but that is somewhat circular: it is surprising precisely because it is not the default.

+

In Rust, unlike C++, implicit detach can't cause undefined behavior (the compiler will just refuse the code if the lambda borrows from the stack). +I suspect this "we can, so why not?" is the reason why Rust detaches by default.

+

However, there's a twist! +The C++ Core Guidelines now recommend always using gsl::joining_thread (which does implicit join) over std::thread in CP.25. +The following CP.26 reinforces the point by advising against the .detach() method. +The reasoning is roughly similar to my post: detached threads make the program more chaotic, as they add superfluous degrees of freedom to the runtime behavior.

+

It's interesting that I learned about these two particular guidelines only today, when refreshing my C++ for this section of the post!

+

So, it seems like both C++ and Rust picked the wrong default for the thread API in this case. But at least C++ has official guidelines recommending the better approach. +And Rust, well, Rust has my blog post now :-)

+
+
+ +

+ A Silver Bullet +

+

Of course there isn't one! +Joining on drop seems to be a better default, but it brings its own problems. +The nastiest one is deadlocks: if you are joining a thread which waits for something else, you might wait forever. +I don't think there's an easy solution here: not joining the thread lets you forget about the deadlock, and may even make it go away (if a child thread is blocked on the parent thread), but you'll get a detached thread on your hands! +The fix is to just arrange the threads in such a way that shutdown is always orderly and clean. +Ideally, shutdown should work the same for both the happy and the panicking path.

+

I want to discuss a specific instructive issue that I've solved in rust-analyzer. +It was about the usual setup with a worker thread that consumes items from a channel, roughly like this:

+ +
+ + +
fn frobnicate() {
+    let (sender, receiver) = channel();
+    let worker = jod_thread::spawn(move || {
+        for item in receiver {
+            do_work(item)
+        }
+    });
+
+    // prepare some work and send it via sender
+}
+ +
+

Here, the worker thread has a simple termination condition: it stops when the channel is closed. +However, here lies the problem: we create the channel before the thread, so the sender is dropped after the worker. +This is a deadlock: frobnicate waits for worker to exit, and worker waits for frobnicate to drop the sender!

+

There's a straightforward fix: drop the sender first!

+ +
+ + +
fn frobnicate() {
+    let (sender, receiver) = channel();
+    let worker = jod_thread::spawn(move || {
+        for item in receiver {
+            do_work(item)
+        }
+    });
+
+    // prepare some work and send it via sender
+
+    drop(sender);
+    drop(worker);
+}
+ +
+

This solution, while obvious, has a pretty serious problem! +The prepare some work ... bit of code can contain early returns due to error handling, or it may panic. +In both cases the result is a deadlock. +Worst of all, the deadlock now happens only on the unhappy path!

+

There is an elegant, but tricky fix for this. Take a minute to think about it! +How to change the above snippet such that the worker thread is guaranteed to be joined, without deadlocks, regardless of the exit condition (normal termination, ?, panic) of frobnicate?

+

The answer will be below these beautiful Ukiyo-e prints :-)

+ +
+
Fine Wind, Clear Morning
+ + +
+ +
+
Rainstorm Beneath the Summit
+ + +
+

First of all, the problem we are seeing here is an instance of a very general setup. +We have a bug which only manifests itself if a rare error condition arises. +In some sense, we have a bug in the (implicit) error handling (just like 92% of critical bugs). +The solutions here are classic:

+
    +
  1. +Artificially trigger unhappy path often (restoring from backup every night). +
  2. +
  3. +Make sure that there aren't different happy and unhappy paths (crash-only software). +
  4. +
+

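We are going to do the second one. +Specifically, we'll arrange the code in such a way that the compiler automatically drops sender first, before worker is joined, without the need for an explicit drop. +This relies on Rust dropping local variables in reverse order of declaration.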

+

Something like this:

+ +
+ + +
let worker = jod_thread::spawn(move || { ... });
+let (sender, receiver) = channel();
+ +
+

The problem here is that we need receiver inside the worker, but moving let (sender, receiver) up brings us back to square one. +Instead, we do this:

+ +
+ + +
let worker;
+let (sender, receiver) = channel();
+worker = jod_thread::spawn(move || { ... });
+ +
+

Beautiful, isn't it? +And super cryptic: the real code has a sizable comment chunk!
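For completeness, here is a self-contained sketch of the whole pattern using only std. JoinOnDrop is a hypothetical stand-in for jod_thread's handle, and the item type and the summing "work" are made up. The point is the declaration order: worker is declared first, so it is dropped (joined) last, after sender has already closed the channel.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a join-on-drop thread handle.
struct JoinOnDrop(Option<JoinHandle<()>>);

impl Drop for JoinOnDrop {
    fn drop(&mut self) {
        if let Some(handle) = self.0.take() {
            let _ = handle.join();
        }
    }
}

fn frobnicate(items: &[usize]) -> usize {
    let total = Arc::new(AtomicUsize::new(0));
    {
        // Declared first, therefore dropped last.
        let _worker;
        let (sender, receiver) = mpsc::channel::<usize>();
        let worker_total = Arc::clone(&total);
        _worker = JoinOnDrop(Some(thread::spawn(move || {
            // The loop ends once every sender is dropped.
            for item in receiver {
                worker_total.fetch_add(item, Ordering::SeqCst);
            }
        })));

        // Prepare some work and send it; an early return or a panic here
        // would still drop `sender` before `_worker`.
        for &item in items {
            sender.send(item).unwrap();
        }
        // Scope exit: `sender` is dropped (channel closes), then `_worker`
        // is dropped, which joins the now-terminating thread.
    }
    total.load(Ordering::SeqCst)
}
```

Since the join happens at the end of the inner block, reading total afterwards sees every item the worker processed.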

+

The second big issue with join by default is that, if you have many threads in the same scope, and one of them errors, you really want to not only wait until the others are finished, but to actually cancel them. +Unfortunately, cancelling a thread is a notoriously thorny problem, which I've explained a bit in another post.

+
+
+ +

+ Wrapping Up +

+

So, yeah, join your threads, but be on guard about deadlocks! +Note that most of the time one shouldn't actually spawn threads manually: instead, tasks should be spawned to a common threadpool. +This way, physical parallelism is nicely separated from logical concurrency. +However, tasks should generally be joined for the same reason threads should be joined. +A nice additional property of tasks is that joining the threadpool itself in the end ensures, in a single place, that no tasks are leaked.

+

A part of the inspiration for this post was the fact that I once forgot to join a thread :( +This rather embarrassingly happened in my other post. +Luckily, my current colleague Stjepan Glavina noticed this. +Thank you, Stjepan!

+

Discussion on r/rust.

+
+
+
+ + + + + diff --git a/2019/11/13/rust-analyzer-blog.html b/2019/11/13/rust-analyzer-blog.html new file mode 100644 index 00000000..56e91cad --- /dev/null +++ b/2019/11/13/rust-analyzer-blog.html @@ -0,0 +1,115 @@ + + + + + + + rust-analyzer Blog + + + + + + + + + + + + +
+ +
+ +
+
+ +

rust-analyzer Blog

+

Hey, I've set up a website for rust-analyzer:

+

https://rust-analyzer.github.io/

+

It has a blog section, and I plan to post rust-analyzer related articles there.

+

The first technical article is Find Usages.

+

If you are finding rust-analyzer useful in your work, consider talking to management about sponsoring rust-analyzer. +We are specifically seeking sponsorship from companies that use Rust!

+

Support rust-analyzer on Open Collective

+
+
+ + + + + diff --git a/2019/11/16/a-better-shell.html b/2019/11/16/a-better-shell.html new file mode 100644 index 00000000..60fe4fda --- /dev/null +++ b/2019/11/16/a-better-shell.html @@ -0,0 +1,288 @@ + + + + + + + A Better Shell + + + + + + + + + + + + +
+ +
+ +
+
+ +

A Better Shell

+

I want a better shell.

+

There are exciting projects to improve the data-processing capabilities of shells, like nushell. +However, I personally don't use this capability of the shell a lot: 90% of the commands I enter are simpler than some cmd | rg pattern.

+

I primarily use shell as a way to use my system, and it is these interactive capabilities that I find lacking. +So I want something closer in spirit to notty.

+
+ +

+ Things I Need +

+

The commands I type most are cd, exa, rm, git ..., cargo .... +I also type mg, which launches a GUI version of Emacs with Magit:

+ +
+ + +
+

These tools make me productive. +Keyboard-only input is fast and composable (I can press up to see previous commands, I can copy-paste paths, etc). +Colored character-box based presentation is very clear and predictable, I can scan it very quickly.

+ +

However, there are serious gaps in the UX:

+
    +
  • +

    ctrl+c doesn't work as it works in every other application.

    +
  • +
  • +

    I launch the GUI version of Emacs: the terminal one changes some keybindings, which is confusing to me. +For example, I have splits inside Emacs, and inside my terminal as well, and I just get confused as to which shortcut I should use.

    +
  • +
  • +

    The output of programs is colored with escape codes, which are horrible, and not flexible enough. +When my Rust program panics and prints that it failed in the my_crate::foo::bar function, I want this to be a hyperlink to the source code of the function. +I want to cat images and PDFs in my terminal (and html, obviously).

    +
  • +
  • +

    My workflow after I've done a bunch of changes is:

    +
      +
    1. +type cargo test to launch tests +
    2. +
    3. +type ctrl+shift+Enter to split the terminal +
    4. +
    5. +type git status or mg in the split to start making a commit in parallel to testing +
    6. +
    +
  • +
+

The last step is crazy!

+

Like, cargo test is being run by my shell (fish), the split is handled by the terminal emulator (kitty), which launches a fresh instance of fish and arranges the working directory to be saved.

+

As a user, I don't care about this terminal/terminal emulator/shell split. +I want to launch a program, and just type commands. +Why does cargo test block my input? +Why can't I type cargo test, Enter, exa -l, Enter and have this program automatically create the split?

+ +
+ + +
$ cargo test
+...
+tons of output in progress
+...
+
+# -- split (healed once `cargo test` finishes) -- #
+
+$ ls
+foo.txt
+bar.rs
+
+$ git ...
+ +
+

Additionally, while magit is awesome, I want an option to use such an interface for all my utilities. +Like, for tar? +And, when I type cargo test --package, I really want completion for the set of packages which are available in the current directory.

+
+
+ +

+ New Shell +

+

What I really want is an extensible application container, à la Emacs or Eclipse, but focused on the shell use-case. +It could look like this:

+
    +
  • +A GUI application (which draws using raw OpenGL: we won't be using native OS GUI widgets). +
  • +
  • +A UI framework for text-based UIs, using magit as a model. ctrl+c, ctrl+v and friends should work as expected. +
  • +
  • +Tiling frame management, again, like the one in Emacs (and golden-ratio should be the default). +
  • +
  • +Some concept of process-let, which can occupy a frame. +
  • +
  • +A prompt, which is always available, and smartly (without blocking, splitting screen if necessary) spawns new processlets. +
  • +
  • +An API to let processlets interact with text UI. +
  • +
  • +A plugin system for in-process processlets (obviously, plugins should be implemented in WASM). +
  • +
  • +A plugin marketplace (versions, dependencies, lockfile, backwards compatibility). +
  • +
  • +A plugin system for out-of-process processlets (JSON over stdio?). +
  • +
  • +A backwards compatibility wrapper to treat usual Unix utilities as processlets. +
  • +
+
+
+ +

+ Emacs? +

+

Isn't it Emacs that I am trying to describe? +Well, sort of. +Emacs is definitely in the same class of application containers, but it has some severe problems, in my opinion:

+
    +
  • +Emacs Lisp is far from the best possible language for writing extensions. +
  • +
  • +Plugin ecosystem is not really dependable. +
  • +
  • +It doesn't define an out-of-process plugin API (things like hyperlinking output). +
  • +
  • +Async support is somewhere between non-existent and awkward. +
  • +
  • +Its main focus is text editing. +
  • +
  • +Its defaults are not really great (fish shell is a great project to learn from here). +
  • +
  • +ctrl+c, ctrl+v do not work by default, M-x is not really remappable. +
  • +
+
+
+ +

+ Random Closing Thoughts +

+

This post contains the best plugin diagram ever:

+

https://www.tedinski.com/2018/01/30/the-one-ring-problem-abstraction-and-power.html

+

This talk echoes similar sentiments:

+

https://www.destroyallsoftware.com/talks/a-whole-new-world

+

If you build something like this, please sign me up!

+
+
+ +

+ Addendum (2020-03-27) +

+

A "terminals are a mess" story from today. +I wanted a "kill other split" shortcut for my terminal, bound to ctrl+k, 1. +Implementing it was easy, as kitty has a nice plugin API. +After that I realized that I need to remap kill_line from ctrl+k to ctrl+shift+k, so that it doesn't conflict with the ctrl+k, 1 chord. +It took me a while to realize that searching for kill_line in kitty is futile: editing is handled by the shell. +Ok, so it looks like I can just remap the key in fish, by bind \cK kill_line, except that, no, ctrl shortcuts do not work with Shift because of some obscure terminal limitation. +So, let's go back to kitty and add a ctrl+shift+k shortcut that sends ^k to fish! +An hour wasted.

+
+
+
+ + + + + diff --git a/2020/01/02/spinlocks-considered-harmful.html b/2020/01/02/spinlocks-considered-harmful.html new file mode 100644 index 00000000..91454c4d --- /dev/null +++ b/2020/01/02/spinlocks-considered-harmful.html @@ -0,0 +1,474 @@ + + + + + + + Spinlocks Considered Harmful + + + + + + + + + + + + +
+ +
+ +
+
+ +

Spinlocks Considered Harmful

+

Happy new year 🎉!

+

In this post, I will be expressing strong opinions about a topic I have relatively little practical experience with, so feel free to roast and educate me in comments (link at the end of the post) :-)

+

Specifically, I'll talk about:

+ +
+ +

+ Context +

+

I maintain the once_cell crate, which is a synchronization primitive. +It uses std blocking facilities under the hood (specifically, std::thread::park), and as such is not compatible with #[no_std]. +A popular request is to add a spin-lock based implementation for use in #[no_std] environments: #61.

+

More generally, this seems to be a common pattern in the Rust ecosystem:

+
    +
  • +A crate uses Mutex or other synchronization mechanism from std +
  • +
  • +Someone asks for #[no_std] support +
  • +
  • +Mutex is swapped for some variation of spinlock. +
  • +
+

For example, the lazy_static crate does this:

+

github.com/rust-lang-nursery/lazy-static.rs/blob/master/src/core_lazy.rs

+

I think this is an anti-pattern, and I am writing this blog post to call it out.

+
+
+ +

+ What Is a Spinlock, Anyway? +

+

A spinlock is the simplest possible implementation of a mutex; its general form looks like this:

+ +
+ + +
static LOCKED: AtomicBool = AtomicBool::new(false);
+
+while LOCKED.compare_and_swap(false, true, Ordering::Acquire) { // (1)
+  std::sync::atomic::spin_loop_hint(); // (4)
+}
+
+/* Critical section */ // (2)
+
+LOCKED.store(false, Ordering::Release); // (3)
+ 
+
    +
  1. +To grab a lock, we repeatedly execute compare-and-swap until it succeeds. The CPU spins in this very short loop. +
  2. +Only one thread at a time can be here. +
  3. +To release the lock, we do a single atomic store. +
  4. +Spinning is wasteful, so we use an intrinsic to instruct the CPU to enter a low-power mode. +
+

Why we need Ordering::Acquire and Ordering::Release is very interesting, but beyond the scope of this article.
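For readers who want something that compiles on a current toolchain, here is a minimal sketch of the same idea. It assumes a recent stable Rust, where `compare_exchange_weak` and `std::hint::spin_loop` are the non-deprecated spellings of the calls above; `SpinLock`, `with_lock` and `spin_count` are hypothetical names made up for this illustration.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::thread;

/// Hypothetical helper type: a bare spinlock that guards no data by itself.
pub struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    pub const fn new() -> Self {
        SpinLock { locked: AtomicBool::new(false) }
    }

    /// Runs `f` while holding the lock.
    pub fn with_lock<R>(&self, f: impl FnOnce() -> R) -> R {
        // Spin until we are the thread that flips `false` to `true`.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            hint::spin_loop();
        }
        let result = f(); // critical section
        self.locked.store(false, Ordering::Release);
        result
    }
}

/// Each of `n_threads` threads increments a shared counter `n_ops` times.
pub fn spin_count(n_threads: u64, n_ops: u64) -> u64 {
    let lock = SpinLock::new();
    let counter = AtomicU64::new(0);
    thread::scope(|s| {
        for _ in 0..n_threads {
            s.spawn(|| {
                for _ in 0..n_ops {
                    // Deliberately a separate load and store: without the
                    // lock, concurrent increments could be lost.
                    lock.with_lock(|| {
                        let v = counter.load(Ordering::Relaxed);
                        counter.store(v + 1, Ordering::Relaxed);
                    });
                }
            });
        }
    });
    counter.load(Ordering::Relaxed)
}
```

Calling `spin_count(4, 1_000)` should return 4000 if the lock really provides mutual exclusion. Note that this sketch never releases the lock if `f` panics, which is fine for a demo but not for real code.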

+

The key take-away here is that a spinlock is implemented entirely in user space: from the OS's point of view, a spinning thread looks exactly like a thread that does a heavy computation.

+

An OS-based mutex, like std::sync::Mutex or parking_lot::Mutex, uses a system call to tell the operating system that a thread needs to be blocked. In pseudo code, an implementation might look like this:

+ +
+ + +
static LOCKED: AtomicBool = AtomicBool::new(false);
+
+while LOCKED.compare_and_swap(false, true, Ordering::Acquire) {
+  park_this_thread(&LOCKED);
+}
+
+/* Critical section */
+
+LOCKED.store(false, Ordering::Release);
+unpark_some_thread(&LOCKED);
+ +
+

The main difference is park_this_thread, a blocking system call. +It instructs the OS to take the current thread off the CPU until it is woken up by an unpark_some_thread call. +The kernel maintains a queue of threads waiting for a mutex. +The park call enqueues the current thread onto this queue, while unpark dequeues some thread. The park system call returns when the thread is dequeued. +In the meantime, the thread waits off the CPU.

+

If there are several different mutexes, the kernel needs to maintain several queues. +The address of a lock can be used as a token to identify a specific queue (this is the futex API).

+

System calls are expensive, so production implementations of Mutex usually spin for several iterations before calling into OS, optimistically hoping that the Mutex will be released soon. +However, the waiting always bottoms out in a syscall.
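The "spin a little, then block" strategy can be sketched on top of `std::sync::Mutex`. This is only an illustration of the shape of the algorithm, not how any particular mutex is implemented internally: `lock_spin_then_block`, `count_with_adaptive_lock` and the spin budget of 40 are all made-up for the example.

```rust
use std::hint;
use std::sync::{Mutex, MutexGuard};
use std::thread;

/// Tries the lock up to `max_spins` times without blocking, then falls
/// back to a blocking `lock()`, which may put the thread to sleep.
pub fn lock_spin_then_block<T>(mutex: &Mutex<T>, max_spins: u32) -> MutexGuard<'_, T> {
    for _ in 0..max_spins {
        // Optimistic phase: the lock is hopefully released soon.
        if let Ok(guard) = mutex.try_lock() {
            return guard;
        }
        hint::spin_loop();
    }
    // Pessimistic phase: let the OS take us off the CPU.
    mutex.lock().unwrap()
}

/// Each of `n_threads` threads increments a shared counter `n_ops` times.
pub fn count_with_adaptive_lock(n_threads: u64, n_ops: u64) -> u64 {
    let counter = Mutex::new(0u64);
    thread::scope(|s| {
        for _ in 0..n_threads {
            s.spawn(|| {
                for _ in 0..n_ops {
                    // 40 is an arbitrary spin budget chosen for this sketch.
                    *lock_spin_then_block(&counter, 40) += 1;
                }
            });
        }
    });
    counter.into_inner().unwrap()
}
```

The important property is the last line of `lock_spin_then_block`: however long the lock is held, a waiter ends up blocked rather than burning its time slice.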

+
+
+ +

+ Spinning Just For a Little Bit, What Can Go Wrong? +

+

Because spin locks are so simple and fast, it seems to be a good idea to use them for short-lived critical sections. +For example, if you only need to increment a couple of integers, should you really bother with complicated syscalls? In the worst case, the other thread will spin just for a couple of iterations...

+

Unfortunately, this logic is flawed! +A thread can be preempted at any time, including during a short critical section. +If it is preempted, that means that all other threads will need to spin until the original thread gets its share of CPU again. +And, because a spinning thread looks like a good, busy thread to the OS, the other threads will spin until they exhaust their quanta, preventing the unlucky thread from getting back on the processor!

+

If this sounds like a series of unfortunate events, don't worry, it gets even worse. Enter Priority Inversion. Suppose our threads have priorities, and the OS tries to schedule high-priority threads over low-priority ones.

+

Now, what happens if the thread that enters a critical section is a low-priority one, but competing threads have high priority? +It will likely get preempted: there are higher priority threads after all. +And, if the number of cores is smaller than the number of high priority threads that try to lock a mutex, it likely won't be able to complete a critical section at all: the OS will be repeatedly scheduling all the other threads!

+
+
+ +

+ No OS, no problem? +

+

"But wait!" you would say, "we only use spin locks in #[no_std] crates, so there's no OS to preempt our threads."

+

First, it's not really true: it's perfectly fine, and often even desirable, to use #[no_std] crates for usual user-space applications. +For example, if you write a Rust replacement for a low-level C library, like zlib or openssl, you will probably make the crate #[no_std], so that non-Rust applications can link to it without pulling in the whole of the Rust runtime.

+

Second, if there's really no OS to speak of, and you are on the bare metal (or in the kernel), it gets even worse than priority inversion.

+

On bare metal, we generally don't worry about thread preemption, but we need to worry about processor interrupts. That is, while the processor is executing some code, it might receive an interrupt from some peripheral device, and temporarily switch to the interrupt handler's code.

+

And here comes the disaster: if the main code is in the middle of the critical section when the interrupt arrives, and if the interrupt handler tries to enter the critical section as well, we get a guaranteed deadlock! +There's no OS to switch threads after a quantum expires. +Here are Linux kernel docs discussing this issue.

+
+
+ +

+ Practical Applications +

+

Let's trigger priority inversion! +Our victim is the getrandom crate. +I don't pick on getrandom specifically here: the pattern is pervasive across the ecosystem.

+

The crate uses spinning in the LazyUsize utility type:

+ +
+ + +
pub struct LazyUsize(AtomicUsize);
+
+impl LazyUsize {
+  // Synchronously runs the init() function. Only one caller
+  // will have their init() function running at a time, and
+  // exactly one successful call will be run. init() returning
+  // UNINIT or ACTIVE will be considered a failure, and future
+  // calls to sync_init will rerun their init() function.
+
+  pub fn sync_init(
+    &self,
+    init: impl FnOnce() -> usize,
+    mut wait: impl FnMut(),
+  ) -> usize {
+    // Common and fast path with no contention.
+    // Don't waste time on CAS.
+    match self.0.load(Relaxed) {
+      Self::UNINIT | Self::ACTIVE => {}
+      val => return val,
+    }
+    // Relaxed ordering is fine,
+    // as we only have a single atomic variable.
+    loop {
+      match self.0.compare_and_swap(
+        Self::UNINIT,
+        Self::ACTIVE,
+        Relaxed,
+      ) {
+        Self::UNINIT => {
+          let val = init();
+          self.0.store(
+            match val {
+              Self::UNINIT | Self::ACTIVE => Self::UNINIT,
+              val => val,
+            },
+            Relaxed,
+          );
+          return val;
+        }
+        Self::ACTIVE => wait(),
+        val => return val,
+      }
+    }
+  }
+}
+ +
+

There's a static instance of LazyUsize which caches the file descriptor for /dev/random:

+

https://github.com/rust-random/getrandom/blob/v0.1.13/src/use_file.rs#L26

+

This descriptor is used when calling getrandom, the only function that is exported by the crate.

+

To trigger priority inversion, we will create 1 + N threads, each of which will call getrandom::getrandom. +We arrange it so that the first thread has a low priority, and the rest are high priority. +We stagger threads a little bit so that the first one does the initialization. +We also make creating the file descriptor slow, so that the first thread gets preempted while in the critical section.

+ +

Here is the implementation of this plan: https://github.com/matklad/spin-of-death.

+

It uses a couple of systems programming hacks to make this disaster scenario easy to reproduce. +To simulate slow /dev/random, we want to intercept the poll syscall getrandom is using to ensure that there's enough entropy. +We can use strace to log system calls issued by a program. +I don't know if strace can be used to make a syscall run slow (now, once I've looked at the website, I see that it can in fact be used to tamper with syscalls, sigh), but we actually don't need to! +getrandom does not use the syscall directly, it uses the poll function from libc. +We can substitute this function by using LD_PRELOAD, but there's an even simpler way! +We can trick the static linker into using a function which we define ourselves:

+ +
+ + +
#[no_mangle]
+pub extern "C" fn poll(
+  _fds: *const u8,
+  _nfds: usize,
+  _timeout: i32,
+) -> u32 {
+  sleep_ms(500);
+  1
+}
+ +
+

The name of the function accidentally ( :) ) clashes with a well-known POSIX function.

+

However, this alone is not enough. +getrandom tries to use the getrandom syscall first, and that code path does not use a spin lock. +We need to fool getrandom into believing that the syscall is not available. +Our extern "C" trick wouldn't have worked if getrandom literally used the syscall instruction. +However, as inline assembly (which you need to issue a syscall manually) is not available on stable Rust, getrandom goes via the syscall function from libc. +That we can override with the same trick.

+

However, there's a wrinkle! +Traditionally, libc API used errno for error reporting. +That is, on a failure the function would return a single specific invalid value, and set the errno thread local variable to the specific error code. syscall follows this pattern.

+

The errno interface is cumbersome to use. +The worst part of errno is that the specification requires it to be a macro, and so you can only really use it from C source code. +Internally, on Linux the macro calls the __errno_location function to get the thread local, but this is an implementation detail (which we will gladly take advantage of, in this land of reckless systems hacking!). The irony is that the ABI of Linux syscalls just returns error codes, so libc has to do some legwork to adapt to the awkward errno interface.

+

So, here's a strong contender for the most cursed function I've written so far:

+ +
+ + +
#[no_mangle]
+pub extern "C" fn syscall(
+  _syscall: u64,
+  _buf: *const u8,
+  _len: usize,
+  _flags: u32,
+) -> isize {
+  extern "C" {
+    fn __errno_location() -> *mut i32;
+  }
+  unsafe {
+    *__errno_location() = 38; // ENOSYS
+  }
+  -1
+}
+ +
+

It makes getrandom believe that there's no getrandom syscall, which causes it to fall back to the /dev/random implementation.

+

To set thread priorities, we use the thread_priority crate, which is a thin wrapper around pthread APIs. +We will be using real-time priorities, which require sudo.

+

And here are the results:

+ +
+ + +
$ cargo build --release
+    Finished release [optimized] target(s) in 0.01s
+$ time sudo ./target/release/spin-of-death
+^CCommand terminated by signal 2
+real 136.54s
+user 96.02s
+sys  940.70s
+rss  6880k
+ +
+

Note that I had to kill the program after two minutes. +Also note the impressive system time, as well as load average

+ +
+ + +
+

If we patch getrandom to use std::sync::Once instead, we get a much better result:

+ +
+ + +
$ cargo build --release --features os-blocking-getrandom
+    Finished release [optimized] target(s) in 0.01s
+$ time sudo ./target/release/spin-of-death
+real 0.51s 
+user 0.01s
+sys  0.04s
+rss  6912k
+ +
+
    +
  1. +Note how real is half a second, but user and sys are small. +That's because we are waiting for 500 milliseconds in our poll. +
  2. +
+

This is because Once uses OS facilities for blocking, and so the OS notices that high-priority threads are actually blocked and gives the low-priority thread a chance to finish its work.
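To make the shape of such a patch concrete, here is a minimal sketch of a `Once`-backed lazy `usize`. It is not the actual patch from the repository: `OnceUsize` and `get_or_init` are hypothetical names, and, unlike `LazyUsize::sync_init`, this version never retries a failed initialization.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

/// Hypothetical replacement for a spinning lazy value: waiters block in
/// `Once::call_once` instead of spinning.
pub struct OnceUsize {
    once: Once,
    value: AtomicUsize,
}

impl OnceUsize {
    pub const fn new() -> Self {
        OnceUsize { once: Once::new(), value: AtomicUsize::new(0) }
    }

    /// Runs `init` at most once; later callers get the cached value.
    pub fn get_or_init(&self, init: impl FnOnce() -> usize) -> usize {
        // Threads that arrive while `init` is running are blocked by the
        // OS here, so a low-priority initializer can still make progress.
        self.once.call_once(|| self.value.store(init(), Ordering::Relaxed));
        // `call_once` already orders the store before this load,
        // so `Relaxed` is enough.
        self.value.load(Ordering::Relaxed)
    }
}
```

With a `static FD: OnceUsize = OnceUsize::new();`, the first `FD.get_or_init(open_device)` call does the work (`open_device` being whatever slow function opens the file), and every other caller either sleeps or returns the cached value.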

+
+
+ +

+ If Not a Spinlock, Then What? +

+

First, if you only use a spin lock because it's faster for small critical sections, just replace it with a mutex from std or parking_lot. +They already do a small amount of spinning before calling into the kernel, so they are as fast as a spinlock in the best case, and infinitely faster in the worst case.

+

Second, it seems like most problematic uses of spinlocks come from one time initialization (which is exactly what my once_cell crate helps with). I think it usually is possible to get away without using spinlocks. For example, instead of storing the state itself, the library may just delegate state storing to the user. For getrandom, it can expose two functions:

+ +
+ + +
fn init() -> Result<RandomState>;
+fn getrandom(state: &RandomState, buf: &mut[u8]) -> Result<usize>;
+ +
+

It then becomes the user's problem to cache RandomState appropriately. +For example, std may continue using a thread local (src) while rand, with the std feature enabled, could use a global variable, protected by Once.

+

Another option, if the state fits into usize and the initializing function is idempotent and relatively quick, is to do a racy initialization:

+ +
+ + +
pub fn get_state() -> usize {
+  static CACHE: AtomicUsize = AtomicUsize::new(0);
+  let mut res = CACHE.load(Ordering::Relaxed);
+  if res == 0 {
+    res = init();
+    CACHE.store(res, Ordering::Relaxed);
+  }
+  res
+}
+
+fn init() -> usize { ... }
+ +
+

Take a second to appreciate the absence of unsafe blocks and cross-core communication in the above example! +At worst, init will be called number of cores times (EDIT: this is wrong, thanks to /u/pcpthm for pointing this out!).
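Here is a runnable variant of the racy initialization that also counts how often `init` runs. The constant 42, the `INIT_CALLS` counter and the `race_get_state` driver are assumptions added purely for the demonstration. The point it illustrates is that several threads may each run `init`, but, because `init` is idempotent, they all end up with the same value.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

static CACHE: AtomicUsize = AtomicUsize::new(0);
// Demo-only bookkeeping: how many times `init` actually ran.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

fn init() -> usize {
    INIT_CALLS.fetch_add(1, Ordering::Relaxed);
    42 // hypothetical state: idempotent and never zero
}

pub fn get_state() -> usize {
    let mut res = CACHE.load(Ordering::Relaxed);
    if res == 0 {
        // Several threads can get here at once; each computes the same
        // value, so it does not matter whose store wins.
        res = init();
        CACHE.store(res, Ordering::Relaxed);
    }
    res
}

/// Calls `get_state` once from each of `n_threads` threads and returns
/// the observed values together with the number of `init` calls.
pub fn race_get_state(n_threads: usize) -> (Vec<usize>, usize) {
    let results: Vec<usize> = thread::scope(|s| {
        let handles: Vec<_> = (0..n_threads).map(|_| s.spawn(get_state)).collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    (results, INIT_CALLS.load(Ordering::Relaxed))
}
```

In this setup `init` runs at least once and at most once per calling thread; how many times exactly depends on scheduling.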

+

There's also a nuclear option: parametrize the library by blocking behavior, and allow the user to supply their own synchronization primitive.

+

Third, sometimes you just know that there's only a single thread in the program, and you might want to use a spinlock just to silence those annoying compiler errors about static mut. +The primary use case here I think is WASM. A solution for this case is to assume that blocking just doesn't happen, and panic otherwise. This is what std does for Mutex on WASM, and what is implemented for once_cell in this PR: #82.
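The "assume blocking doesn't happen, and panic otherwise" approach might look roughly like the sketch below. This is not the code from std or from the once_cell PR; `NoBlockLock`, `try_with_lock` and `with_lock` are hypothetical names used only to show the idea.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical lock for single-threaded targets: it never waits.
pub struct NoBlockLock {
    locked: AtomicBool,
}

impl NoBlockLock {
    pub const fn new() -> Self {
        NoBlockLock { locked: AtomicBool::new(false) }
    }

    /// Runs `f` under the lock, or returns `None` if the lock is already
    /// held (which, with a single thread, means a re-entrant call).
    pub fn try_with_lock<R>(&self, f: impl FnOnce() -> R) -> Option<R> {
        if self.locked.swap(true, Ordering::Acquire) {
            return None;
        }
        let result = f();
        self.locked.store(false, Ordering::Release);
        Some(result)
    }

    /// Same as `try_with_lock`, but treats contention as a bug and panics
    /// instead of spinning or blocking.
    pub fn with_lock<R>(&self, f: impl FnOnce() -> R) -> R {
        self.try_with_lock(f)
            .expect("lock is already held; blocking is not supported here")
    }
}
```

A loud panic on the "impossible" path is much easier to debug than a thread that silently spins forever.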

+

Discussion on /r/rust.

+

EDIT: If you enjoyed this post, you might also like this one:

+

https://probablydance.com/2019/12/30/measuring-mutexes-spinlocks-and-how-bad-the-linux-scheduler-really-is/

+

Looks like we have some contention here!

+

EDIT: there's now a follow-up post, where we actually benchmark spinlocks:

+

https://matklad.github.io/2020/01/04/mutexes-are-faster-than-spinlocks.html

+
+
+
+ + + + + diff --git a/2020/01/04/mutexes-are-faster-than-spinlocks.html b/2020/01/04/mutexes-are-faster-than-spinlocks.html new file mode 100644 index 00000000..3b8a393d --- /dev/null +++ b/2020/01/04/mutexes-are-faster-than-spinlocks.html @@ -0,0 +1,470 @@ + + + + + + + Mutexes Are Faster Than Spinlocks + + + + + + + + + + + + +
+ +
+ +
+
+ +

Mutexes Are Faster Than Spinlocks

+

(at least on commodity desktop Linux with stock settings)

+

This is a follow-up to the previous post about spinlocks. +The gist of the previous post was that spinlocks have some pretty bad worst-case behaviors, and, for that reason, one shouldn't blindly use a spinlock if using a sleeping mutex or avoiding blocking altogether is cumbersome.

+

In the comments, I was pointed to this interesting article, which made me realize that there's another misconception:

+ +

Until today, I haven't benchmarked any mutexes, so I don't know for sure. +However, what I know in theory about mutexes and spinlocks makes me doubt this claim, so let's find out.

+ +
+ +

+ Where Does The Misconception Come From? +

+

I do understand why people might think that way though. +The simplest mutex just makes lock / unlock syscalls when entering and exiting a critical section, offloading all synchronization to the kernel. +However, syscalls are slow and so, if the length of the critical section is smaller than the length of two syscalls, spinning would be faster.

+

It's easy to eliminate the syscall on entry in an uncontended state. +We can try to optimistically CAS the lock to the locked state, and call into the kernel only if we failed and need to sleep. +Eliminating the syscall on exit is tricky, and so I think historically many implementations did at least one syscall in practice. +Thus, mutexes were, in fact, slower than spinlocks in some benchmarks.

+

However, modern mutex implementations avoid all syscalls if there's no contention. +The trick is to make the state of the mutex an enum: unlocked, locked with some waiting threads, locked without waiting threads. +This way, we only need to call into the kernel if there are in fact waiters.
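The three-state trick can be sketched as a tiny state machine. This models only the lock word, under the assumption that the actual sleeping and waking (for example via futex) is done elsewhere; `LockWord` and its methods are hypothetical names, and `unlock` merely reports whether a wake-up syscall would be needed.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

const UNLOCKED: u8 = 0;
const LOCKED: u8 = 1; // locked, nobody is waiting
const LOCKED_CONTENDED: u8 = 2; // locked, some thread may be sleeping

/// Hypothetical model of a mutex state word.
pub struct LockWord(AtomicU8);

impl LockWord {
    pub const fn new() -> Self {
        LockWord(AtomicU8::new(UNLOCKED))
    }

    /// Fast path: a single CAS, no syscall.
    pub fn try_lock(&self) -> bool {
        self.0
            .compare_exchange(UNLOCKED, LOCKED, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    /// Slow path step: announce that there is a waiter. Returns `true` if
    /// the lock happened to be free, in which case the caller now owns it;
    /// `false` means the caller should go to sleep in the kernel.
    pub fn lock_contended(&self) -> bool {
        self.0.swap(LOCKED_CONTENDED, Ordering::Acquire) == UNLOCKED
    }

    /// Releases the lock. Returns `true` only if a waiter was announced,
    /// that is, only if a wake-up syscall is required.
    pub fn unlock(&self) -> bool {
        self.0.swap(UNLOCKED, Ordering::Release) == LOCKED_CONTENDED
    }
}
```

In the uncontended case, `try_lock` followed by `unlock` touches only the atomic and `unlock` returns `false`, so the kernel is never involved.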

+

Another historical benefit of spinlocks is that they are smaller in size. +The state of a spinlock is just a single boolean variable, while for a mutex you also need a queue of waiting threads. But there's a trick to combat this inefficiency as well. +We can use the address of the boolean flag as a token to identify the mutex, and store non-empty queues in a side table. +Note how this also reduces the (worst case) total number of queues from number of mutexes to number of threads!

+

So a modern mutex, like the one in WTF::ParkingLot, is a single boolean, which behaves more or less like a spinlock in an uncontended case but doesn't have the pathological behaviors of the spinlock.

+
+
+ +

+ Benchmark +

+

So, let's check if the theory works in practice! +The source code for the benchmark is here:

+

https://github.com/matklad/lock-bench

+

The interesting bit is reproduced below:

+ +
+ + +
fn run_bench<M: Mutex>(options: &Options) -> time::Duration {
+  let locks = &(0..options.n_locks) // (3)
+      .map(|_| CachePadded::new(M::default()))
+      .collect::<Vec<_>>();
+
+  let start_barrier =
+    &Barrier::new(options.n_threads as usize + 1);
+  let end_barrier =
+    &Barrier::new(options.n_threads as usize + 1);
+
+  scope(|scope| {
+    let thread_seeds = random_numbers(0x6F4A955E)
+      .scan(0x9BA2BF27, |state, n| {
+        *state ^= n;
+        Some(*state)
+      })
+      .take(options.n_threads as usize);
+
+    for thread_seed in thread_seeds {
+      scope.spawn(move |_| {
+        start_barrier.wait();
+        let indexes = random_numbers(thread_seed)
+          .map(|it| it % options.n_locks)
+          .map(|it| it as usize)
+          .take(options.n_ops as usize);
+        for idx in indexes {
+          locks[idx].with_lock(|cnt| *cnt += 1); // (1)
+        }
+        end_barrier.wait();
+      });
+    }
+
+    std::thread::sleep(time::Duration::from_millis(100));
+    start_barrier.wait();
+    let start = time::Instant::now();
+    end_barrier.wait();
+    let elapsed = start.elapsed();
+
+    let mut total = 0;
+    for lock in locks.iter() {
+      lock.with_lock(|cnt| total += *cnt);
+    }
+    assert_eq!(total, options.n_threads * options.n_ops); // (2)
+
+    elapsed
+  })
+  .unwrap()
+}
+
+fn random_numbers(seed: u32) -> impl Iterator<Item = u32> { // (4)
+  let mut random = seed;
+  iter::repeat_with(move || {
+    random ^= random << 13;
+    random ^= random >> 17;
+    random ^= random << 5;
+    random
+  })
+}
+ +
+

Our hypothesis is that mutexes are faster, so we need to pick a workload which favors spinlocks. +That is, we need to pick a very short critical section, and so we will just be incrementing a counter (1).

+

This is better than doing a dummy lock/unlock. +At the end of the benchmark, we will assert that the counter is indeed incremented the correct number of times (2). +This has a number of benefits:

+
    +
  • +This is a nice smoke test which at least makes sure that we haven't made an off-by-one error anywhere. +
  • +
  • +As we will be benchmarking different implementations, it's important to verify that they indeed give the same answer! More than once I've made some piece of code ten times faster by accidentally eliminating some essential logic :D +
  • +
  • +We can be reasonably sure that the compiler won't outsmart us and won't remove empty critical sections. +
  • +
+

Now, we can just make all the threads hammer a single global counter, but that would only test a situation of extreme contention. +We need to structure the benchmark in a way that allows us to vary the contention level.

+

So instead of a single global counter, we will use an array of counters (3). +Each thread will be incrementing random elements of this array. +By varying the size of the array, we will be able to control the level of contention. +To avoid false sharing between neighboring elements of the array we will use crossbeam's CachePadded. +To make the benchmark more reproducible, we will vendor a simple PRNG (4), which we seed manually.
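To show what cache padding buys us, here is a std-only stand-in for CachePadded. It is an illustration, not crossbeam's implementation: the `Padded` wrapper and the 64-byte line size are assumptions (the real type chooses its alignment per target architecture).

```rust
use std::mem::size_of;
use std::sync::atomic::AtomicU32;

/// Hypothetical stand-in for `CachePadded`: forces every value onto its
/// own 64-byte-aligned slot. 64 is an assumed cache line size.
#[repr(align(64))]
pub struct Padded<T>(pub T);

/// Builds the array of counters; neighbors never share a 64-byte line.
pub fn make_counters(n: usize) -> Vec<Padded<AtomicU32>> {
    (0..n).map(|_| Padded(AtomicU32::new(0))).collect()
}

/// Distance in bytes between two neighboring counters in the array.
pub fn slot_stride() -> usize {
    size_of::<Padded<AtomicU32>>()
}
```

A plain `Vec<AtomicU32>` packs sixteen counters into those same 64 bytes, so threads incrementing different counters would still fight over one cache line. With padding, each counter costs 64 bytes instead of 4, which is the price of isolating it.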

+
+
+ +

+ Results +

+

We are testing std::sync::Mutex, parking_lot::Mutex, spin::Mutex and a bespoke spinlock implementation from the probablydance article. +We use 32 threads (on a 4 core/8 hyperthreads CPU), and each thread increments some counter 10 000 times. +We run each benchmark 100 times and compute average, min and max times (we are primarily measuring throughput, so average makes more sense than median this time). +Finally, we run the whole suite twice, to sanity check that the results are reproducible.

+ +
+
Extreme Contention
+ + +
$ cargo run --release 32 2 10000 100
+    Finished release [optimized] target(s) in 0.01s
+     Running `target/release/lock-bench 32 2 10000 100`
+Options {
+    n_threads: 32,
+    n_locks: 2,
+    n_ops: 10000,
+    n_rounds: 100,
+}
+
+std::sync::Mutex     avg  97ms  min 38ms  max 103ms
+parking_lot::Mutex   avg  68ms  min 32ms  max  72ms
+spin::Mutex          avg 142ms  min 69ms  max 217ms
+AmdSpinlock          avg 127ms  min 50ms  max 219ms
+
+std::sync::Mutex     avg  98ms  min 68ms  max 125ms
+parking_lot::Mutex   avg  68ms  min 58ms  max  71ms
+spin::Mutex          avg 139ms  min 54ms  max 193ms
+AmdSpinlock          avg 127ms  min 50ms  max 210ms
+ +
+ +
+
Heavy contention
+ + +
$ cargo run --release 32 64 10000 100
+    Finished release [optimized] target(s) in 0.01s
+     Running `target/release/lock-bench 32 64 10000 100`
+Options {
+    n_threads: 32,
+    n_locks: 64,
+    n_ops: 10000,
+    n_rounds: 100,
+}
+
+std::sync::Mutex     avg 21ms  min 11ms  max  23ms
+parking_lot::Mutex   avg 10ms  min  6ms  max  11ms
+spin::Mutex          avg 55ms  min  7ms  max 161ms
+AmdSpinlock          avg 40ms  min  6ms  max 123ms
+
+std::sync::Mutex     avg 21ms  min 20ms  max  24ms
+parking_lot::Mutex   avg  9ms  min  6ms  max  12ms
+spin::Mutex          avg 48ms  min  7ms  max 138ms
+AmdSpinlock          avg 40ms  min  8ms  max 110ms
+ +
+ +
+
Light contention
+ + +
$ cargo run --release 32 1000 10000 100
+    Finished release [optimized] target(s) in 0.01s
+     Running `target/release/lock-bench 32 1000 10000 100`
+Options {
+    n_threads: 32,
+    n_locks: 1000,
+    n_ops: 10000,
+    n_rounds: 100,
+}
+
+std::sync::Mutex     avg 13ms  min 8ms   max  15ms
+parking_lot::Mutex   avg  6ms  min 3ms   max   8ms
+spin::Mutex          avg 37ms  min 4ms   max 115ms
+AmdSpinlock          avg 39ms  min 2ms   max 127ms
+
+std::sync::Mutex     avg 13ms  min 12ms  max  15ms
+parking_lot::Mutex   avg  6ms  min  5ms  max   8ms
+spin::Mutex          avg 39ms  min  4ms  max 102ms
+AmdSpinlock          avg 37ms  min  5ms  max 103ms
+ +
+ +
+
No contention
+ + +
$ cargo run --release 32 1000000 10000 100
+    Finished release [optimized] target(s) in 0.01s
+     Running `target/release/lock-bench 32 1000000 10000 100`
+Options {
+    n_threads: 32,
+    n_locks: 1000000,
+    n_ops: 10000,
+    n_rounds: 100,
+}
+
+std::sync::Mutex     avg 15ms  min 8ms   max 27ms
+parking_lot::Mutex   avg  7ms  min 4ms   max  9ms
+spin::Mutex          avg  5ms  min 4ms   max  8ms
+AmdSpinlock          avg  6ms  min 5ms   max 10ms
+
+std::sync::Mutex     avg 15ms  min 8ms   max 27ms
+parking_lot::Mutex   avg  6ms  min 4ms   max  9ms
+spin::Mutex          avg  5ms  min 4ms   max  7ms
+AmdSpinlock          avg  6ms  min 5ms   max  7ms
+ +
+
+
+ +

+ Analysis +

+

There are several interesting observations here!

+

First, we reproduce the result that the variance of spinlocks on Linux with default scheduling settings can be huge:

+ +
+ + +
parking_lot::Mutex  min 6ms  max  11ms
+AmdSpinlock         min 6ms  max 123ms
+ +
+

Note that these are extreme results for 100 runs, where each run does 32 * 10_000 lock operations. +That is, individual lock/unlock operations probably have an even higher spread.

+

Second, the uncontended case looks like I expected: mutexes and spinlocks are not that different, because they essentially use the same code:

+ +
+ + +
parking_lot::Mutex   avg 6ms  min 4ms  max 9ms
+spin::Mutex          avg 5ms  min 4ms  max 7ms
+ +
+

Third, under heavy contention mutexes annihilate spinlocks:

+ +
+ + +
parking_lot::Mutex   avg 10ms  max  11ms
+spin::Mutex          avg 55ms  max 161ms
+ +
+

Now, this is the opposite of what I would naively expect. +Even in a heavily contended state, the critical section is still extremely short, so for each thread, the most efficient strategy seems to be to spin for a couple of iterations.

+

But I think I can explain why mutexes are so much better in this case. +One reason is that with spinlocks a thread can get unlucky and be preempted in the critical section. +The other, more important reason is that, at any given moment in time, there are many threads trying to enter the same critical section. +With spinlocks, all cores can be occupied by threads that compete for the same lock. +With mutexes, there is a queue of sleeping threads for each lock, and the kernel generally tries to make sure that only one thread from the group is awake.

+

This is a funny example of a mechanical race to the bottom. Due to the short length of the critical section, each individual thread would spend fewer CPU cycles in total if it were spinning, but it increases the overall cost.

+

EDIT: a simpler and more plausible explanation, from the author of Rust's parking_lot, is that it does exponential backoff when spinning, unlike the two spinlock implementations.
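For illustration, here is what exponential backoff in a spin loop can look like. This is a sketch of the general technique, not parking_lot's actual code: `BackoffLock`, `backoff_count` and the `MAX_SPINS` cap of 64 are all assumptions made for this example.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::thread;

/// Assumed cap: past this many pauses per attempt, we yield instead.
const MAX_SPINS: u32 = 64;

/// Hypothetical spinlock that waits longer after every failed attempt.
pub struct BackoffLock {
    locked: AtomicBool,
}

impl BackoffLock {
    pub const fn new() -> Self {
        BackoffLock { locked: AtomicBool::new(false) }
    }

    pub fn with_lock<R>(&self, f: impl FnOnce() -> R) -> R {
        let mut spins: u32 = 1;
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            if spins <= MAX_SPINS {
                // Back off: pause 1, 2, 4, ... times before retrying, so
                // that waiters hammer the contended cache line less often.
                for _ in 0..spins {
                    hint::spin_loop();
                }
                spins *= 2;
            } else {
                // Give the scheduler a chance to run the lock holder.
                thread::yield_now();
            }
        }
        let result = f(); // critical section
        self.locked.store(false, Ordering::Release);
        result
    }
}

/// Each of `n_threads` threads increments a shared counter `n_ops` times.
pub fn backoff_count(n_threads: u64, n_ops: u64) -> u64 {
    let lock = BackoffLock::new();
    let counter = AtomicU64::new(0);
    thread::scope(|s| {
        for _ in 0..n_threads {
            s.spawn(|| {
                for _ in 0..n_ops {
                    lock.with_lock(|| {
                        let v = counter.load(Ordering::Relaxed);
                        counter.store(v + 1, Ordering::Relaxed);
                    });
                }
            });
        }
    });
    counter.load(Ordering::Relaxed)
}
```

A real mutex would go one step further and park the thread after the yielding phase instead of yielding forever.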

+

Fourth, even under heavy contention spin locks can luck out and finish almost as fast as mutexes:

+ +
+ + +
parking_lot::Mutex   avg 10ms  min 6ms
+spin::Mutex          avg 55ms  min 7ms
+ +
+

This again shows that a good mutex is roughly equivalent to a spinlock in the best case.

+

Fifth, the amount of contention required to disrupt spinlocks seems to be small. Even if 32 threads compete for 1 000 locks, spinlocks still are considerably slower:

+ +
+ + +
parking_lot::Mutex   avg  6ms  min 3ms   max   8ms
+spin::Mutex          avg 37ms  min 4ms   max 115ms
+ +
+

EDIT: someone on Reddit noticed that the number of threads is significantly higher than the number of cores, which is an unfortunate situation for spinlocks. +And, although the number of threads in the benchmark is configurable, it never occurred to me to actually vary it 😅! +Lowering the number of threads to four gives a picture similar to the no contention situation above: spinlocks are slightly, but not massively, faster. +Which makes total sense: as there are more cores than threads, there's no harm in spinning. +And, if you can carefully architect your application such that it runs a small fixed number of threads, ideally pinned to specific CPUs (like in the seastar architecture), using spinlocks might make sense!

+
+
+ +

+ Disclaimer +

+

As usual, each benchmark exercises only a narrow slice from the space of possible configurations, so it would be wrong to draw a sweeping conclusion that mutexes are always faster. +For example, if you are in a situation where preemption is impossible (interrupts are disabled, cooperative multitasking, realtime scheduling, etc), spinlocks might be a better (or even the only!) choice. +And there's also a chance the benchmark doesn't measure what I think it measures :-)

+

But I find this particular benchmark convincing enough to disprove the claim that spinlocks are faster than mutexes for short critical sections. +In particular, I find enlightening the qualitative observation that, under contention, mutexes allow for better scheduling even if critical sections are short and not preempted in the middle.

+
+
+ +

+ Reading List +

+ +

Discussion on /r/rust.

+
+
+
+ + + + + diff --git a/2020/02/14/why-rust-is-loved.html b/2020/02/14/why-rust-is-loved.html new file mode 100644 index 00000000..b2dc7c07 --- /dev/null +++ b/2020/02/14/why-rust-is-loved.html @@ -0,0 +1,450 @@ + + + + + + + Why is Rust the Most Loved Programming Language? + + + + + + + + + + + + +
+ +
+ +
+
+ +

Why is Rust the Most Loved Programming Language?

+

... by me?

+

Rust is my favorite programming language (other languages I enjoy are Kotlin and Python). +In this post I want to explain why I, somewhat irrationally, find this language so compelling. +The post does not try to explain why Rust is the most loved language according to +StackOverflow survey :-)

+

Additionally, this post does not cover the actual good reasons why one might want to use Rust. +Briefly:

+ +

If you'd like to hear more about the above, this post will disappoint you :-)

+
+ +

+ It's All the Small Things! +

+

The reason why I irrationally like Rust is that it, subjectively, gets a lot of small details just right (or at least better than other languages I know). +The rest of the post would be a laundry list of those things, but first I'd love to mention why I think Rust is the way it is.

+

First, it is a relatively young language, so it can have many obviously good things. +For example, I feel like there's a general consensus now that, by default, local variables should not be reassignable. +This probably was much less obvious in the 90s, when today's mainstream languages were designed.

+

Second, it does not try to maintain source/semantic compatibility with any existing language. +Even if we think that const by default is a good idea, we can't employ it in TypeScript, because it needs to stay compatible with JavaScript.

+

Third (and this is pure speculation on my part), I feel that the initial bunch of people who designed the language and its design principles just had excellent taste!

+

So, to the list of adorable things!

+
+
+ +

+ Naming Convention +

+

To set the right mood for the rest of the discussion, let me start with claiming that snake_case is more readable than camelCase :-) +Similarly, XmlRpcRequest is better than XMLRPCRequest.

+

I believe that readability is partially a matter of habit. +But it also seems logical that _ is better at separating words than case change or nothing at all. +And, subjectively, after writing a bunch of camelCase and snake_case, I much prefer _.

+
+
+ +

+ Keyword First Syntax +

+

How would you Ctrl+F the definition of the foo function in a Java file on GitHub? +Probably just foo(, which would give you both the definition and all the calls. +In Rust, you'd search for fn foo. +In general, every construct is introduced by a leading keyword, which makes it much easier for a human to read the code. +When I read C++, I always have a hard time distinguishing field declarations from method declarations: they start the same. +Leading keywords also make it easier to do stupid text searches for things. +If you don't find this argument compelling because one should just use an IDE to look for methods, well, it actually makes implementing an IDE slightly easier as well:

+
    +
  • +Parsing has a nice LL(1) vibe to it, you just dispatch on the current token. +
  • +
  • +Parser resilience is easy, you can synchronize on leading keywords like fn, struct etc. +
  • +
  • +It's easier for the IDE to guess the intention of a user. +If you type fn, the IDE recognizes that you want to add a new function and can, for example, complete function overrides for you. +
  • +
+
+
+ +

+ Type Last Syntax +

+

C-family languages usually use Type name order. +Languages with type inference, including Rust, usually go for name: Type. +Technically, this is more convenient because in a recursive descent parser it's easier to make the second part optional. +It's also more readable, because you put the most important part, the name, first. +Because names are usually more uniform in length than types, groups of fields/local variables align better.

+
+
+ +

+ No Dangling Else +

+

Many languages use if (condition) { then_branch } syntax, where parentheses around condition are mandatory, and braces around then_branch are optional. +Rust does the opposite, which has the following benefits:

+
    +
  • +There's no need for a special rule to associate else with just the right if. Instead, else if is an indivisible unambiguous bit of syntax. +
  • +
  • +The goto fail; bug is impossible; more generally, you don't have to decide whether it is OK to omit the braces. +
  • +
+
+
+ +

+ Everything Is An Expression, Including Blocks +

+

I think "everything is an expression" is generally a good idea, because it makes things composable. +Just the other day I tried to handle null in TypeScript in a Kotlin way, with foo() ?? return false, and failed because return is not an expression.

+

The problem with traditional functional (Haskell/OCaml) approach is that it uses let name = expr in expression for introducing new variables, which just feels bulky. +Specifically, the closing in keyword feels verbose, and also emphasizes the nesting of expression. +The nesting is undoubtedly there, but usually it is very boring, and calling it out is not very helpful.

+

Rust doesn't have a let expression per se; instead it has flat-feeling blocks which can contain many let statements:

+ +
+ + +
let d = {
+    let a = 1;
+    let b = 6;
+    let c = 9;
+    b*b - 4*a*c
+};
+ +
+

This gives, subjectively, a lighter-weight syntax for introducing bindings and side-effecting statements, as well as an ability to nicely scope local variables to sub-blocks!

+
+
+ +

+ Immutable/non-Reassignable by Default +

+

In Rust, reassignable variables are declared with let mut and non-reassignable with let. +Note how the rarer option is more verbose, and how it is expressed as a modifier, and not a separate keyword, like let and const.

+
+
+ +

+ Namespaced Enums +

+

In Rust, enums (sum types, algebraic data types) are namespaced.

+

You declare enums like this:

+ +
+ + +
enum Expr {
+    Int(i32),
+    Bool(bool),
+    Sum { lhs: Box<Expr>, rhs: Box<Expr> },
+}
+ +
+

And use them like Expr::Int, without worrying that it might collide with

+ +
+ + +
enum Type {
+    Int,
+    Bool
+}
+ +
+

No more repetitive data Expr = ExprInt Int | ExprBool Bool | ExprSum Expr Expr!

+

Swift does an even nicer trick here, by using a .VariantName syntax to refer to a namespaced enum (docs). +This makes matching less verbose and completely dodges the sad Rust ambiguity between constants and bindings:

+ +
+ + +
let x: Option<i32> = Some(92);
+match x {
+    None => 1,
+    none => 2,
+}
+ +
+
+
+ +

+ Syntactic Separation of Fields and Methods +

+

Fields and methods are declared in separate blocks (like in Go):

+ +
+ + +
#[derive(Clone, Copy)]
+struct Point {
+    x: f64,
+    y: f64,
+}
+
+impl Point {
+    fn distance_to_origin(self) -> f64 {
+        let Point { x, y } = self;
+        (x*x + y*y).sqrt()
+    }
+    ...
+}
+ +
+

This is a huge improvement to readability: there are usually far fewer fields than methods, but by looking at the fields you can usually understand which set of methods can exist.

+
+
+ +

+ Integer Types +

+

u32 and i64 are shorter and clearer than unsigned int or long. +usize and isize cover the most important use case for an arch-dependent integer type, and also make it clearer at the type level which things are addresses/indices, and which are quantities. +There's also no question of how integer literals of various types look, it's just 1i8 or 92u64.

+

Overflow during arithmetic operations is considered a bug: by default it panics in debug builds and wraps in release builds. +However, there's a plethora of methods like wrapping_add, saturating_sub, etc, so you can exactly specify behavior on overflow in specific cases where it is not a bug. +In general, methods on primitives make it possible to expose a ton of compiler intrinsics in a systematic way, like u64::count_ones.
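To make the overflow methods concrete, here is a small sketch using u8. The function name overflow_demo is made up for illustration; the methods themselves are the standard ones on integer primitives.

```rust
/// Returns the results of several explicit-overflow operations on `u8`,
/// plus a bit-counting intrinsic exposed as a method on `u64`.
fn overflow_demo() -> (u8, u8, Option<u8>, (u8, bool), u32) {
    // 250 + 10 = 260, which wraps around modulo 256.
    let wrapped = 250u8.wrapping_add(10);
    // 5 - 10 would be negative, so the result clamps at 0.
    let saturated = 5u8.saturating_sub(10);
    // The checked flavor reports overflow as `None`.
    let checked = 250u8.checked_add(10);
    // The overflowing flavor returns the wrapped value and a flag.
    let overflowing = 250u8.overflowing_add(10);
    // Number of set bits in 0b1011_0000.
    let ones = 0b1011_0000u64.count_ones();
    (wrapped, saturated, checked, overflowing, ones)
}
```

Each call site states its overflow policy explicitly, so a plain + keeps its meaning of "overflow here is a bug".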

+
+
+ +

+ Definitive Initialization +

+

Rust uses control flow analysis to check that every local variable is assigned before the first use. +This is a much better default than making this UB, or initializing all locals to some default value. +Additionally, Rust has first-class support for diverging control flow (the ! type and the loop {} construct), which protects it from at-a-distance changes like +this example +from Java.

+

Definitive initialization analysis is an interesting example of a language feature which requires relatively high-brow implementation techniques, but whose effects seem very intuitive, almost trivial, to the users of the language.
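A small sketch of what the analysis accepts; the function names describe and first_even are hypothetical and exist only for this illustration. In both functions the local is declared without a value, and the compiler checks that every path reaching the use has assigned it exactly once.

```rust
/// `label` is declared uninitialized; every branch assigns it before use.
fn describe(n: i32) -> &'static str {
    let label;
    if n < 0 {
        label = "negative";
    } else if n == 0 {
        label = "zero";
    } else {
        label = "positive";
    }
    label
}

/// The only way out of the `loop` is the `break` right after the assignment;
/// the `panic!` branch diverges, so it does not need to initialize `found`.
fn first_even(xs: &[i32]) -> i32 {
    let found;
    let mut i = 0;
    loop {
        if i == xs.len() {
            panic!("no even number in the slice");
        }
        if xs[i] % 2 == 0 {
            found = xs[i];
            break;
        }
        i += 1;
    }
    found
}
```

Removing any of the assignments in describe turns the final use into a compile-time error rather than a read of garbage.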

+
+
+ +

+ Crates +

+

The next two things are actually not so small.

+

Rust libraries (crates) don't have names. +More generally, Rust doesn't have any kind of global shared namespace.

+

This is in contrast to languages which have a concept of library path (PYTHONPATH, classpath, -I). +If you have a library path, you are exposed to name/symbol clashes between libraries. +While a name clash between two libraries seems pretty unlikely, there's a special case where collision happens regularly. +One of your dependencies can depend on libfoo v1, and another one on libfoo v2. +Usually this means that you either can't use the two libraries together, or need to implement some pretty horrific workarounds.

+

In Rust, the name you use for a library is a property of the dependency edge between the upstream and downstream crates. +That is, a single crate can be known under different names in different dependent crates or, vice versa, two different crates might be known under equal names in different parts of the crate graph! +This (and semver discipline, which is a social thing) is the reason why Cargo doesn't suffer from dependency hell as much as some other ecosystems.

+
+
+ +

+ Crate Visibility +

+

Related to the previous point, crates are also an important visibility boundary, which allows you to clearly delineate the public API of a library from implementation details. +This is a major improvement over class-level visibility controls.

+

It's interesting though that it took Rust two tries to get first-class "exported from the library" (pub) and "internal to the library" (pub(crate)) visibilities. +That is also the reason why the more restrictive pub(crate) is unfortunately longer to write, I wish we used pub and pub*.

+

Before the 2018 edition, Rust had a simpler and more orthogonal system, where you could only say "visible in the parent", which happens to be exported if the parent is the root or is itself exported. +But the old system is less convenient in practice, because you can't look at the declaration and immediately say if it is a part of the crate's public API or not.

+

The next language should use these library-level visibilities from the start.

+
+
+ +

+ Cross Platform Binaries +

+

Rust programs generally "just work" on Linux, Mac and Windows, and you don't need to install a separate runtime to run them.

+
+
+ +

+ Eq +

+

Equality operator (==) is not polymorphic, comparing things of different types (92 == "the answer") is a type error.

+
+
+ +

+ Ord +

+

The canonical comparison function returns an enum Ordering { Less, Equal, Greater }, you don't need to override all six comparison operators. +Rust also manages this without introducing a separate <=> spaceship operator just for this purpose. +And you can still implement a fast path for == / != checks.
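As an illustration, here is a minimal sketch with a made-up Version type. A single cmp function is written by hand, and all the comparison operators plus helpers like std::cmp::max follow from it.

```rust
use std::cmp::Ordering;

// `Version` is a hypothetical type used only for this example.
#[derive(Debug, PartialEq, Eq)]
struct Version {
    major: u32,
    minor: u32,
}

impl Ord for Version {
    // The one canonical comparison: by major first, then by minor.
    fn cmp(&self, other: &Self) -> Ordering {
        self.major
            .cmp(&other.major)
            .then(self.minor.cmp(&other.minor))
    }
}

impl PartialOrd for Version {
    // `<`, `<=`, `>` and `>=` are all derived from this one method.
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
```

In this sketch == comes from the derived PartialEq, which is where a cheaper equality-only fast path could be written by hand if needed.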

+
+
+ +

+ Debug & Display +

+

Rust defines two ways to turn something into a string: Display, which is intended for user-visible strings, and Debug, which is generally intended for printf debugging. +This is similar to Python's __str__ and __repr__.

+

Unlike Python, the compiler derives Debug for you. +Being able to inspect all data structures is a huge productivity boost. +I hope some day we'll be able to call custom user-provided Debug from a debugger.

+

A nice bonus is that you can debug-print things in two modes:

+
    +
  • +compactly on a single-line +
  • +
  • +verbosely, on multiple lines as an indented tree +
  • +
+
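The two modes correspond to the {:?} and {:#?} format specifiers. A minimal sketch, where the Point struct and the helper names compact and pretty are made up for the example:

```rust
// The fields are only ever read through the derived `Debug` impl.
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

/// Single-line debug output.
fn compact(p: &Point) -> String {
    format!("{:?}", p)
}

/// Multi-line debug output, indented as a tree.
fn pretty(p: &Point) -> String {
    format!("{:#?}", p)
}
```

For Point { x: 1, y: 2 }, the compact form is Point { x: 1, y: 2 }, while the pretty form puts each field on its own line with four-space indentation and a trailing comma.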
+
+ +

+ Trivial Data Types +

+

Creating simple "bag of data" types takes almost no syntax, and you can opt into all kinds of useful extra functionality:

+ +
+ + +
#[derive(
+    Debug,
+    Clone, Copy,
+    PartialEq, Eq,
+    PartialOrd, Ord,
+    Hash,
+    Serialize, Deserialize,
+)]
+struct Point {
+    x: i64,
+    y: i64,
+}
+ +
+
+
+ +

+ Strings +

+

Another obvious in retrospect thing.

+

Strings are represented as UTF-8 byte buffers. +The encoding is fixed, can't be changed, and its validity is enforced. +There's no random access to characters, but you can slice a string with a byte index, provided that it doesn't fall in the middle of a multi-byte character.
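A short sketch of byte-index slicing. The helper name prefix is hypothetical; str::get and str::is_char_boundary are the standard methods. The \u{e9} escape is the letter e with an acute accent, which takes two bytes in UTF-8.

```rust
/// Returns the first `end` bytes of `s` as a string slice,
/// or `None` if `end` is out of range or splits a multi-byte character.
fn prefix(s: &str, end: usize) -> Option<&str> {
    s.get(..end)
}

/// Length in bytes versus length in characters for the same string.
fn lengths(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}
```

For "h\u{e9}llo" the byte length is 6 and the character count is 5. Byte offset 2 falls inside the accented letter, so prefix returns None there, whereas indexing with &s[..2] would panic.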

+
+
+ +

+ assert! +

+

The default assert! macro is always enabled. +The flavor which can be disabled with a compilation flag, debug_assert, is more verbose.

+

Discussion on /r/rust.

+
+
+
+ + + + + diff --git a/2020/03/22/fast-simple-rust-interner.html b/2020/03/22/fast-simple-rust-interner.html new file mode 100644 index 00000000..c97ccd6d --- /dev/null +++ b/2020/03/22/fast-simple-rust-interner.html @@ -0,0 +1,399 @@ + + + + + + + Fast and Simple Rust Interner + + + + + + + + + + + + +
+ +
+ +
+
+ +

Fast and Simple Rust Interner

+

This post describes a simple technique for writing interners in Rust which I haven't seen documented before.

+

String interning is a classical optimization when you have to deal with many equal strings. +The canonical example would be a compiler: most identifiers in a program are repeated several times.

+

Interning works by ensuring that there's only one canonical copy of each distinct string in memory. +It can give the following benefits:

+ +

The simplest possible interner in Rust could look like this:

+ +
+ + +
use std::collections::HashMap;
+
+#[derive(Default)]
+pub struct Interner {
+    map: HashMap<String, u32>,
+    vec: Vec<String>,
+}
+
+impl Interner {
+    pub fn intern(&mut self, name: &str) -> u32 {
+        if let Some(&idx) = self.map.get(name) {
+            return idx;
+        }
+        let idx = self.map.len() as u32;
+        self.map.insert(name.to_owned(), idx);
+        self.vec.push(name.to_owned());
+
+        debug_assert!(self.lookup(idx) == name);
+        debug_assert!(self.intern(name) == idx);
+
+        idx
+    }
+
+    pub fn lookup(&self, idx: u32) -> &str {
+        self.vec[idx as usize].as_str()
+    }
+}
+ +
+

To remove duplicates, we store strings in a HashMap. +To map from an index back to the string, we also store strings in a Vec.

+

I didn't quite like this solution yesterday, for two reasons:

+ +

So I've spent a part of the evening cobbling together a non-allocating trie-based interner. +The result: the trie does indeed asymptotically reduce the number of allocations from O(n) to O(log(n)). +Unfortunately, it is slower, larger and way more complex than the above snippet. +Minimizing allocations is important, but allocators are pretty fast, and that shouldn't be done at the expense of everything else. +Also, Rust's HashMap (implemented by @Amanieu based on Swiss Table) is fast.

+ +
+For the curious, the Trie design I've used +

The trie is built on a per-byte basis (each node has at most 256 children). +Each internal node is marked with a single byte. +Leaf nodes are marked with substrings, so that only the common prefix requires a node per byte.

+

To avoid allocating individual interned strings, we store them in a single long String. +An interned string is represented by a Span (pair of indexes) inside the big buffer.

+

Trie itself is a tree structure, and we can use a standard trick of packing its nodes into array and using indexes to avoid allocating every node separately. +However, nodes themselves can be of varying size, as each node can have different number of children. +We can still array-allocate them, by rolling our own mini-allocator (using a segregated free list)!

+

A node's children are represented as a sorted array of links. +We use binary search for indexing and simple linear shift insertion. +With at most 256 children per node, it shouldn't be that bad. +Additionally, we pre-allocate 256 nodes and use array indexing for the first transition.

+

Links are organized in layers. +The layer n stores a number of [Link] chunks of length 2^n (in a single contiguous array). +Each chunk represents the links for a single node (with possibly some extra capacity). +A node can find its chunk because it knows the number of links (which gives the number of the layer) and the first link in the layer. +A new link for the node is added to the current chunk if there's space. +If the chunk is full, it is copied to a chunk twice as big first. +The old chunk is then added to the list of free chunks for reuse.

+

Here's the whole definition of the data structure:

+ +
+ + +
pub struct Interner {
+    trie: Vec<Node>,
+    links: Vec<Layer>,
+    strs: Vec<Span>,
+    buf: String,
+}
+
+struct Span { start: u32, end: u32 }
+
+struct Node {
+    str: Option<u32>,
+    n_links: u8,
+    first_link: u32,
+//  layer: u32 = first_link.next_power_of_two(),
+}
+
+struct Link { byte: u8, node: u32, }
+
+struct Layer {
+    links: Vec<Link>,
+    free: Vec<u32>,
+}
+ +
+

Isn't it incredibly cool that you can look only at the fields and understand how the thing works, +without even seeing the remaining 150 lines of relatively tricky implementation?

+ +
+

However, implementing a trie made me realize that there's a simple optimization we can apply to our naive interner to get rid of extra allocations. +In the trie, I concatenate all interned strings into one giant String and use (u32, u32) index pairs as an internal representation of a string slice.

+

If we translate this idea to our naive interner, we get:

+ +
+ + +
struct Span { start: u32, end: u32 }
+
+struct Interner {
+    map: HashMap<Span, u32>,
+    vec: Vec<Span>,
+    buf: String,
+}
+
+impl Interner {
+    pub fn intern(&mut self, name: &str) -> u32 { ... }
+
+    pub fn lookup(&self, idx: u32) -> &str {
+        let Span { start, end } = self.vec[idx as usize];
+        &self.buf[start as usize..end as usize]
+    }
+}
+ +
+

The problem here is that we can't actually write implementations of Eq and Hash for Span to make this work. +In theory, this is possible: to compare two Spans, you resolve them to &str via buf, and then compare the strings. +However, the Rust API does not allow expressing this idea. +Moreover, even if HashMap allowed supplying a key closure at construction time, it wouldn't help!

+ +
+ + +
impl HashMap<K, V, KeyFn, Key>
+where
+    KeyFn: Fn(&K) -> Key,
+    Key: Hash + Eq,
+{
+    fn new_with_key_fn(key_fn: KeyFn) -> Self { ... }
+}
+ +
+

Such API would run afoul of the borrow checker. +The key_fn would have to borrow from the same struct. +What would work is supplying a key_fn at call-site for every HashMap operation, but that would hurt ergonomics and ease of use a lot. +This exact problem requires +slightly unusual +design of lazy values in Rust.

+ +

However, with a bit of unsafe, we can make something similar work. +The trick is to add strings to buf in such a way that they are never moved, even if more strings are added on top. +That way, we can just store &str in the HashMap. +To achieve address stability, we use another trick from the typed_arena crate. +If the buf is full (so that adding a new string would invalidate old pointers), we allocate a new buffer, twice as large, +without copying the contents of the old one.

+

Here's the full implementation:

+ +
+ + +
use std::{mem, collections::HashMap};
+
+pub struct Interner {
+    map: HashMap<&'static str, u32>,
+    vec: Vec<&'static str>,
+    buf: String,
+    full: Vec<String>,
+}
+
+impl Interner {
+    pub fn with_capacity(cap: usize) -> Interner {
+        let cap = cap.next_power_of_two();
+        Interner {
+            map: HashMap::default(),
+            vec: Vec::new(),
+            buf: String::with_capacity(cap),
+            full: Vec::new(),
+        }
+    }
+
+    pub fn intern(&mut self, name: &str) -> u32 {
+        if let Some(&id) = self.map.get(name) {
+            return id;
+        }
+        let name = unsafe { self.alloc(name) };
+        let id = self.map.len() as u32;
+        self.map.insert(name, id);
+        self.vec.push(name);
+
+        debug_assert!(self.lookup(id) == name);
+        debug_assert!(self.intern(name) == id);
+
+        id
+    }
+
+    pub fn lookup(&self, id: u32) -> &str {
+        self.vec[id as usize]
+    }
+
+    unsafe fn alloc(&mut self, name: &str) -> &'static str {
+        let cap = self.buf.capacity();
+        if cap < self.buf.len() + name.len() {
+            let new_cap = (cap.max(name.len()) + 1)
+                .next_power_of_two();
+            let new_buf = String::with_capacity(new_cap);
+            let old_buf = mem::replace(&mut self.buf, new_buf);
+            self.full.push(old_buf);
+        }
+
+        let interned = {
+            let start = self.buf.len();
+            self.buf.push_str(name);
+            &self.buf[start..]
+        };
+
+        &*(interned as *const str)
+    }
+}
+ +
+

The precise rule for increasing capacity is slightly more complicated:

+ +
+ + +
let new_cap = (cap.max(name.len()) + 1).next_power_of_two();
+ +
+

Just doubling won't be enough, we also need to make sure that the new string actually fits.
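To see the rule in action, here it is pulled out into a standalone function. The name next_capacity is hypothetical; the expression is the same one used in alloc above.

```rust
/// Capacity for the next buffer: strictly larger than both the current
/// capacity and the incoming string, rounded up to a power of two.
fn next_capacity(cap: usize, name_len: usize) -> usize {
    (cap.max(name_len) + 1).next_power_of_two()
}
```

With a power-of-two capacity of 8 and a 3-byte string this gives 16, which is plain doubling. With the same capacity and a 100-byte string it gives 128, because doubling alone would leave the buffer too small for the string.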

+

We could have used a single bufs: Vec<String> in place of both buf and full. +The benefit of splitting the last buffer into a dedicated field is that we statically guarantee that there's at least one buffer. +That way, we avoid a bounds check and/or .unwrap when accessing the active buffer.

+

We also use &'static str to fake interior references. +Miri (Rust's in-progress UB checker) is not entirely happy about this. +I haven't dug into this yet, it might be another instance of +rust-lang/rust#61114. +To be on the safe side, we can use *const str instead, with a bit of boilerplate to delegate PartialEq and Hash. +Some kind of (hypothetical) 'unsafe lifetime could also be useful here! +The critical detail that makes our use of fake 'static sound here is that the alloc function is private. +The public lookup function shortens the lifetime to that of &self (via lifetime elision).

+

For the real implementation, I would change two things:

+ +

That's all I have to say about fast and simple string interning in Rust! +Discussion on /r/rust.

+
+
+ + + + + diff --git a/2020/04/13/simple-but-powerful-pratt-parsing.html b/2020/04/13/simple-but-powerful-pratt-parsing.html new file mode 100644 index 00000000..65a8fc7b --- /dev/null +++ b/2020/04/13/simple-but-powerful-pratt-parsing.html @@ -0,0 +1,1398 @@ + + + + + + + Simple but Powerful Pratt Parsing + + + + + + + + + + + + +
+ +
+ +
+
+ +

Simple but Powerful Pratt Parsing

+

Welcome to my article about Pratt parsing, the monad tutorial of syntactic analysis. +The number of Pratt parsing articles is so large that there exists a survey post :)

+

The goals of this particular article are:

+ +

This post assumes a fair bit of familiarity with parsing techniques, and, for example, does not explain what a context free grammar is.

+
+ +

+ Introduction +

+

Parsing is the process by which a compiler turns a sequence of tokens into a tree representation:

+ +
+ + +
                            Add
+                 Parser     / \
+ "1 + 2 * 3"    ------->   1  Mul
+                              / \
+                             2   3
+ +
+

There are many approaches to this task, which roughly fall into one of two broad categories:

+
    +
  • +Using a DSL to specify an abstract grammar of the language +
  • +
  • +Hand-writing the parser +
  • +
+

Pratt parsing is one of the most frequently used techniques for hand-written parsing.

+
+
+ +

+ BNF +

+

The pinnacle of syntactic analysis theory is discovering the context free grammar +notation (often using BNF concrete syntax) for decoding linear structures into trees:

+ +
+ + +
Item ::=
+    StructItem
+  | EnumItem
+  | ...
+
+StructItem ::=
+    'struct' Name '{' FieldList '}'
+
+...
+ +
+

I remember being fascinated by this idea, especially by parallels with natural language sentence structure. +However, my optimism quickly waned once we got to describing expressions. +The natural expression grammar indeed allows one to see what is an expression.

+ +
+ + +
Expr ::=
+    Expr '+' Expr
+  | Expr '*' Expr
+  | '(' Expr ')'
+  | 'number'
+ +
+

Although this grammar looks great, it is in fact ambiguous and imprecise, and needs to be rewritten to be amenable to automated parser generation. +Specifically, we need to specify precedence and associativity of operators. +The fixed grammar looks like this:

+ +
+ + +
Expr ::=
+    Factor
+  | Expr '+' Factor
+
+Factor ::=
+    Atom
+  | Factor '*' Atom
+
+Atom ::=
+    'number'
+  | '(' Expr ')'
+ +
+

To me, the shape of expressions feels completely lost in this new formulation. +Moreover, it took me three or four courses in formal languages before I was able to reliably create this grammar myself.

+

And that's why I love Pratt parsing: it is an enhancement of the recursive descent parsing algorithm, which uses the natural terminology of precedence and associativity for parsing expressions, instead of grammar obfuscation techniques.

+
+
+ +

+ Recursive descent and left-recursion +

+

The simplest technique for hand-writing a parser is recursive descent, which +models the grammar as a set of mutually recursive functions. For example, the +above item grammar fragment can look like this:

+ +
+ + +
fn item(p: &mut Parser) {
+    match p.peek() {
+        STRUCT_KEYWORD => struct_item(p),
+        ENUM_KEYWORD   => enum_item(p),
+        ...
+    }
+}
+
+fn struct_item(p: &mut Parser) {
+    p.expect(STRUCT_KEYWORD);
+    name(p);
+    p.expect(L_CURLY);
+    field_list(p);
+    p.expect(R_CURLY);
+}
+
+...
+ +
+

Traditionally, text-books point out left-recursive grammars as the Achilles heel +of this approach, and use this drawback to motivate more advanced LR parsing +techniques. An example of problematic grammar can look like this:

+ +
+ + +
Sum ::=
+    Sum '+' Int
+  | Int
+ +
+

Indeed, if we naively code the sum function, it wouldn't be too useful:

+ +
+ + +
fn sum(p: &mut Parser) {
+    // Try first alternative
+    sum(p); 
+    p.expect(PLUS);
+    int(p);
+
+    // If that fails, try the second one
+    ...
+}
+ +
+
    +
  1. +At this point we immediately loop and overflow the stack +
  2. +
+

A theoretical fix to the problem involves rewriting the grammar to eliminate the left recursion. +However, in practice, for a hand-written parser, the solution is much simpler: break away from the pure recursive paradigm and use a loop:

+ +
+ + +
fn sum(p: &mut Parser) {
+    int(p);
+    while p.eat(PLUS) {
+        int(p);
+    }
+}
+ +
+
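To check that the loop really produces a left-associative result, here is a runnable sketch of the same idea. The Tok and Parser types and the eat_plus and int helpers are assumptions made for this example (the snippets above leave them unspecified), and the result is rendered as an S-expression string instead of a real tree.

```rust
// Hypothetical token and parser types for the Sum grammar only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Int(i64),
    Plus,
}

struct Parser {
    toks: Vec<Tok>,
    pos: usize,
}

impl Parser {
    /// Consumes a `+` if it is the current token.
    fn eat_plus(&mut self) -> bool {
        if self.toks.get(self.pos) == Some(&Tok::Plus) {
            self.pos += 1;
            true
        } else {
            false
        }
    }

    /// Consumes an integer token and renders it as text.
    fn int(&mut self) -> String {
        match self.toks.get(self.pos).copied() {
            Some(Tok::Int(n)) => {
                self.pos += 1;
                n.to_string()
            }
            t => panic!("expected int, got {:?}", t),
        }
    }
}

/// Sum ::= Int ('+' Int)*, folding to the left as we go.
fn sum(p: &mut Parser) -> String {
    let mut lhs = p.int();
    while p.eat_plus() {
        let rhs = p.int();
        lhs = format!("(+ {} {})", lhs, rhs);
    }
    lhs
}
```

On the tokens for 1 + 2 + 3 this yields (+ (+ 1 2) 3), the same shape the left-recursive grammar describes, without any recursion at all.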
+
+ +

+ Pratt parsing, the general shape +

+

Using just loops won't be enough for parsing infix expressions. +Instead, Pratt parsing uses both loops and recursion:

+ +
+ + +
fn parse_expr() {
+    ...
+    loop {
+        ...
+        parse_expr()
+        ...
+    }
+}
+ +
+

Not only does it send your mind into a Möbius-shaped hamster wheel, it also handles associativity and precedence!

+
+
+ +

+ From Precedence to Binding Power +

+

I have a confession to make: I am always confused by "high precedence" and "low precedence". In a + b * c, addition has a lower precedence, but it is at the top of the parse tree.

+

So instead, I find thinking in terms of binding power more intuitive.

+ +
+ + +
expr:   A       +       B       *       C
+power:      3       3       5       5
+ +
+

The * is stronger, it has more power to hold together B and C, and so the expression is parsed as +A + (B * C).

+

What about associativity though? In A + B + C all operators seem to have the same power, and it is unclear which + to fold first. +But this can also be modelled with power, if we make it slightly asymmetric:

+ +
+ + +
expr:      A       +       B       +       C
+power:  0      3      3.1      3      3.1     0
+ +
+

Here, we pumped the right power of + just a little bit, so that it holds the right operand tighter. +We also added zeros at both ends, as there are no operators to bind from the sides. +Here, the first (and only the first) + holds both of its arguments tighter than the neighbors, so we can reduce it:

+ +
+ + +
expr:     (A + B)     +     C
+power:  0          3    3.1    0
+ +
+

Now we can fold the second plus and get (A + B) + C. +Or, in terms of the syntax tree, the second + really likes its right operand more than the left one, so it rushes to get hold of C. +While it does that, the first + captures both A and B, as they are uncontested.

+

What Pratt parsing does is find these badass, stronger-than-their-neighbors operators by processing the string left to right. +We are almost at the point where we finally start writing some code, but let's first look at the other running example. +We will use the function composition operator, . (dot), as a right associative operator with a high binding power. +That is, f . g . h is parsed as f . (g . h), or, in terms of power

+ +
+ + +
  f     .    g     .    h
+0   8.5    8   8.5    8   0
+ +
+
+
+ +

+ Minimal Pratt Parser +

+

We will be parsing expressions where basic atoms are single character numbers and variables, and which use punctuation for operators. +Let's define a simple tokenizer:

+ +
+ + +
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+enum Token {
+    Atom(char),
+    Op(char),
+    Eof,
+}
+
+struct Lexer {
+    tokens: Vec<Token>,
+}
+
+impl Lexer {
+    fn new(input: &str) -> Lexer {
+        let mut tokens = input
+            .chars()
+            .filter(|it| !it.is_ascii_whitespace())
+            .map(|c| match c {
+                '0'..='9' |
+                'a'..='z' | 'A'..='Z' => Token::Atom(c),
+                _ => Token::Op(c),
+            })
+            .collect::<Vec<_>>();
+        tokens.reverse();
+        Lexer { tokens }
+    }
+
+    fn next(&mut self) -> Token {
+        self.tokens.pop().unwrap_or(Token::Eof)
+    }
+    fn peek(&mut self) -> Token {
+        self.tokens.last().copied().unwrap_or(Token::Eof)
+    }
+}
+ +
+

To make sure that we get the precedence (binding power) right, we will be transforming infix expressions into a gold-standard (not so popular in Poland, for whatever reason) unambiguous notation, S-expressions:
+1 + 2 * 3 == (+ 1 (* 2 3)).

+ +
+ + +
use std::fmt;
+
+enum S {
+    Atom(char),
+    Cons(char, Vec<S>),
+}
+
+impl fmt::Display for S {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        match self {
+            S::Atom(i) => write!(f, "{}", i),
+            S::Cons(head, rest) => {
+                write!(f, "({}", head)?;
+                for s in rest {
+                    write!(f, " {}", s)?
+                }
+                write!(f, ")")
+            }
+        }
+    }
+}
+ +
+

And let's start with just this: expressions with atoms and two infix binary operators, + and *:

+ +
+ + +
fn expr(input: &str) -> S {
+    let mut lexer = Lexer::new(input);
+    expr_bp(&mut lexer)
+}
+
+fn expr_bp(lexer: &mut Lexer) -> S {
+    todo!()
+}
+
+#[test]
+fn tests() {
+    let s = expr("1 + 2 * 3");
+    assert_eq!(s.to_string(), "(+ 1 (* 2 3))")
+}
+ +
+

So, the general approach is roughly the one we used to deal with left recursion: start with parsing a first number, and then loop, consuming operators and doing something?

+ +
+ + +
fn expr_bp(lexer: &mut Lexer) -> S {
+    let lhs = match lexer.next() {
+        Token::Atom(it) => S::Atom(it),
+        t => panic!("bad token: {:?}", t),
+    };
+
+    loop {
+        let op = match lexer.next() {
+            Token::Eof => break,
+            Token::Op(op) => op,
+            t => panic!("bad token: {:?}", t),
+        };
+
+        todo!()
+    }
+
+    lhs
+}
+
+#[test]
+fn tests() {
+    let s = expr("1"); 
+    assert_eq!(s.to_string(), "1");
+}
+ +
+
    +
  1. +Note that we already can parse this simple test! +
  2. +
+

We want to use this power idea, so let's compute both left and right powers of the operator. +We'll use u8 to represent power, so, for associativity, we'll add 1. +And we'll reserve the 0 power for the end of input, so the lowest power an operator can have is 1.

+ +
+ + +
fn expr_bp(lexer: &mut Lexer) -> S {
+    let lhs = match lexer.next() {
+        Token::Atom(it) => S::Atom(it),
+        t => panic!("bad token: {:?}", t),
+    };
+
+    loop {
+        let op = match lexer.peek() {
+            Token::Eof => break,
+            Token::Op(op) => op,
+            t => panic!("bad token: {:?}", t),
+        };
+        let (l_bp, r_bp) = infix_binding_power(op);
+
+        todo!()
+    }
+
+    lhs
+}
+
+fn infix_binding_power(op: char) -> (u8, u8) {
+    match op {
+        '+' | '-' => (1, 2),
+        '*' | '/' => (3, 4),
+        _ => panic!("bad op: {:?}", op),
+    }
+}
+ +
+

And now comes the tricky bit, where we introduce recursion into the picture. +Let's think about this example (with powers below):

+ +
+ + +
a   +   b   *   c   *   d   +   e
+  1   2   3   4   3   4   1   2
+ +
+

The cursor is at the first +, we know that the left bp is 1 and the right one is 2. +The lhs stores a. +The next operator after + is *, so we shouldn't add b to a. +The problem is that we haven't yet seen the next operator, we are just past +. +Can we add a lookahead? +Looks like no: we'd have to look past all of b, c and d to find the next operator with lower binding power, which sounds pretty unbounded. +But we are onto something! +Our current right priority is 2, and, to be able to fold the expression, we need to find the next operator with lower priority. +So let's recursively call expr_bp starting at b, but also tell it to stop as soon as bp drops below 2. +This necessitates the addition of the min_bp argument to the main function.

+

And lo, we have a fully functioning minimal Pratt parser:

+ +
+ + +
fn expr(input: &str) -> S {
+    let mut lexer = Lexer::new(input);
+    expr_bp(&mut lexer, 0) 
+}
+
+fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> S { 
+    let mut lhs = match lexer.next() {
+        Token::Atom(it) => S::Atom(it),
+        t => panic!("bad token: {:?}", t),
+    };
+
+    loop {
+        let op = match lexer.peek() {
+            Token::Eof => break,
+            Token::Op(op) => op,
+            t => panic!("bad token: {:?}", t),
+        };
+
+        let (l_bp, r_bp) = infix_binding_power(op);
+        if l_bp < min_bp { 
+            break;
+        }
+
+        lexer.next(); 
+        let rhs = expr_bp(lexer, r_bp);
+
+        lhs = S::Cons(op, vec![lhs, rhs]); 
+    }
+
+    lhs
+}
+
+fn infix_binding_power(op: char) -> (u8, u8) {
+    match op {
+        '+' | '-' => (1, 2),
+        '*' | '/' => (3, 4),
+        _ => panic!("bad op: {:?}", op),
+    }
+}
+
+#[test]
+fn tests() {
+    let s = expr("1");
+    assert_eq!(s.to_string(), "1");
+
+    let s = expr("1 + 2 * 3");
+    assert_eq!(s.to_string(), "(+ 1 (* 2 3))");
+
+    let s = expr("a + b * c * d + e");
+    assert_eq!(s.to_string(), "(+ (+ a (* (* b c) d)) e)");
+}
+ +
+
    +
  1. +The min_bp argument is the crucial addition. expr_bp now parses expressions with relatively high binding power. As soon as it sees something weaker than min_bp, it stops. +
  2. +This is the "it stops" point. +
  3. +And here we bump past the operator itself and make the recursive call. +Note how we use l_bp to check against min_bp, and r_bp as the new min_bp of the recursive call. +So, you can think about min_bp as the binding power of the operator to the left of the current expression. +
  4. +Finally, after parsing the correct right hand side, we assemble the new current expression. +
  5. +To start the recursion, we use binding power of zero. +Remember, at the beginning the binding power of the operator to the left is the lowest possible, zero, as there's no actual operator there. +
+

So, yup, these 40 lines are the Pratt parsing algorithm. +They are tricky, but, if you understand them, everything else is straightforward additions.
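To see min_bp at work, here is a compressed sketch of the same loop that records the min_bp of every call. It is only an illustration under a few assumptions: tokens are single characters, the result is rendered straight into a String instead of S, and the names parse_traced and trace are hypothetical, not part of the parser above.

```rust
// Same shape as expr_bp above, minus the Lexer and S types.
// `trace` (a hypothetical addition) collects the min_bp of each call.
fn expr_bp(tokens: &mut Vec<char>, min_bp: u8, trace: &mut Vec<u8>) -> String {
    trace.push(min_bp);
    // Tokens are stored reversed, so pop() yields the next one.
    let mut lhs = tokens.pop().expect("expected an atom").to_string();

    while let Some(&op) = tokens.last() {
        let (l_bp, r_bp) = match op {
            '+' | '-' => (1, 2),
            '*' | '/' => (3, 4),
            _ => panic!("bad op: {:?}", op),
        };
        if l_bp < min_bp {
            break; // the operator is weaker than our caller allows
        }
        tokens.pop();
        let rhs = expr_bp(tokens, r_bp, trace);
        lhs = format!("({} {} {})", op, lhs, rhs);
    }
    lhs
}

fn parse_traced(input: &str) -> (String, Vec<u8>) {
    let mut tokens: Vec<char> =
        input.chars().filter(|c| !c.is_whitespace()).collect();
    tokens.reverse();
    let mut trace = Vec::new();
    let s = expr_bp(&mut tokens, 0, &mut trace);
    (s, trace)
}
```

For "a + b * c * d + e" the recorded sequence is 0, 2, 4, 4, 2: the top-level call, the call after the first +, one call after each *, and the call after the last +.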

+
+
+ +

+ Bells and Whistles +

+

Now let's add all kinds of weird expressions to show the power and flexibility of the algorithm. +First, let's add a high-priority, right associative function composition operator: .:

+ +
+ + +
fn infix_binding_power(op: char) -> (u8, u8) {
+    match op {
+        '+' | '-' => (1, 2),
+        '*' | '/' => (3, 4),
+        '.' => (6, 5),
+        _ => panic!("bad op: {:?}", op),
+    }
+}
+ +
+

Yup, it's a single line! +Note how the left side of the operator binds tighter, which gives us the desired right associativity:

+ +
+ + +
let s = expr("f . g . h");
+assert_eq!(s.to_string(), "(. f (. g h))");
+
+let s = expr(" 1 + 2 + f . g . h * 3 * 4");
+assert_eq!(s.to_string(), "(+ (+ 1 2) (* (* (. f (. g h)) 3) 4))");
+ +
+
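The effect of the pair's order can be checked in isolation. The sketch below is a hypothetical one-operator parser (parse and parse_with are names made up for this illustration) that takes the binding power pair of . as a parameter, so the same input can be parsed with both orderings.

```rust
// Hypothetical single-operator parser: `bp` is the (left, right) binding
// power of '.', atoms are single characters, output is a plain String.
fn parse(tokens: &mut Vec<char>, min_bp: u8, bp: (u8, u8)) -> String {
    let mut lhs = tokens.pop().expect("expected an atom").to_string();

    while let Some(&op) = tokens.last() {
        assert_eq!(op, '.', "this sketch only knows one operator");
        let (l_bp, r_bp) = bp;
        if l_bp < min_bp {
            break;
        }
        tokens.pop();
        let rhs = parse(tokens, r_bp, bp);
        lhs = format!("({} {} {})", op, lhs, rhs);
    }
    lhs
}

fn parse_with(input: &str, bp: (u8, u8)) -> String {
    let mut tokens: Vec<char> =
        input.chars().filter(|c| !c.is_whitespace()).collect();
    tokens.reverse(); // so that pop() yields tokens left to right
    parse(&mut tokens, 0, bp)
}
```

With (6, 5), "f . g . h" comes out as (. f (. g h)); swapping the pair to (5, 6) gives (. (. f g) h).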

Now, let's add unary -, which binds tighter than binary arithmetic operators, but less tight than composition. +This requires changes to how we start our loop, as we no longer can assume that the first token is an atom, and need to handle minus as well. +But let the types drive us. +First, we start with binding powers. +As this is a unary operator, it really only has a right binding power, so, ahem, let's just code this:

+ +
+ + +
fn prefix_binding_power(op: char) -> ((), u8) { 
+    match op {
+        '+' | '-' => ((), 5),
+        _ => panic!("bad op: {:?}", op),
+    }
+}
+
+fn infix_binding_power(op: char) -> (u8, u8) {
+    match op {
+        '+' | '-' => (1, 2),
+        '*' | '/' => (3, 4),
+        '.' => (8, 7), 
+        _ => panic!("bad op: {:?}", op),
+    }
+}
+ +
+
    +
  1. +Here, we return a dummy () to make it clear that this is a prefix, and not a postfix operator, and thus can only bind things to the right. +
  2. +Note, as we want to add unary - between . and *, we need to shift priorities of . by two. +The general rule is that we use an odd priority as base, and bump it by one for associativity, if the operator is binary. For unary minus it doesn't matter and we could have used either 5 or 6, but sticking to odd is more consistent. +
+
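The odd-base rule can also be written down as code. The helpers below are not part of the parser and their names are hypothetical; they just compute the pairs from a 1-based precedence level, assuming each level gets its own odd base.

```rust
/// Binding powers for a left-associative infix operator at precedence
/// `level` (1 is the loosest; 0 is not a valid level here).
/// The odd base goes on the left, base + 1 on the right.
fn left_assoc(level: u8) -> (u8, u8) {
    let base = 2 * level - 1;
    (base, base + 1)
}

/// Right-associative: the same two numbers, swapped.
fn right_assoc(level: u8) -> (u8, u8) {
    let (l, r) = left_assoc(level);
    (r, l)
}

/// A prefix operator only has a right binding power; we stick to the odd base.
fn prefix(level: u8) -> ((), u8) {
    ((), 2 * level - 1)
}
```

Levels 1 and 2 with left_assoc, level 3 with prefix and level 4 with right_assoc reproduce the table above: (1, 2), (3, 4), ((), 5) and (8, 7).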

Plugging this into expr_bp, we get:

+ +
+ + +
fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> S {
+    let mut lhs = match lexer.next() {
+        Token::Atom(it) => S::Atom(it),
+        Token::Op(op) => {
+            let ((), r_bp) = prefix_binding_power(op);
+            todo!()
+        }
+        t => panic!("bad token: {:?}", t),
+    };
+    ...
+}
+ +
+

Now, we only have r_bp and not l_bp, so let's just copy-paste half of the code from the main loop? +Remember, we use r_bp for recursive calls.

+ +
+ + +
fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> S {
+    let mut lhs = match lexer.next() {
+        Token::Atom(it) => S::Atom(it),
+        Token::Op(op) => {
+            let ((), r_bp) = prefix_binding_power(op);
+            let rhs = expr_bp(lexer, r_bp);
+            S::Cons(op, vec![rhs])
+        }
+        t => panic!("bad token: {:?}", t),
+    };
+
+    loop {
+        let op = match lexer.peek() {
+            Token::Eof => break,
+            Token::Op(op) => op,
+            t => panic!("bad token: {:?}", t),
+        };
+
+        let (l_bp, r_bp) = infix_binding_power(op);
+        if l_bp < min_bp {
+            break;
+        }
+
+        lexer.next();
+        let rhs = expr_bp(lexer, r_bp);
+
+        lhs = S::Cons(op, vec![lhs, rhs]);
+    }
+
+    lhs
+}
+
+#[test]
+fn tests() {
+    ...
+
+    let s = expr("--1 * 2");
+    assert_eq!(s.to_string(), "(* (- (- 1)) 2)");
+
+    let s = expr("--f . g");
+    assert_eq!(s.to_string(), "(- (- (. f g)))");
+}
+ +
+

Amusingly, this purely mechanical, type-driven transformation works. +You can also reason why it works, of course. +The same argument applies; after we've consumed a prefix operator, the operand consists of operators that bind tighter, and we just so conveniently happen to have a function which can parse expressions tighter than the specified power.

+

Ok, this is getting stupid. +If using ((), u8) just worked for prefix operators, can (u8, ()) deal with postfix ones? +Well, let's add ! for factorials. It should bind tighter than -, because -(92!) is obviously more useful than (-92)!. +So, the familiar drill: new priority function, shifting priority of . (this bit is annoying in Pratt parsers), copy-pasting the code:

+ +
+ + +
let (l_bp, ()) = postfix_binding_power(op);
+if l_bp < min_bp {
+    break;
+}
+
+let (l_bp, r_bp) = infix_binding_power(op);
+if l_bp < min_bp {
+    break;
+}
+ +
+

Wait, something's wrong here. +After we've parsed the prefix expression, we can see either a postfix or an infix operator. +But we bail on unrecognized operators, which is not going to work… +So, let's make postfix_binding_power return an option, for the case where the operator is not postfix:

+ +
+ + +
fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> S {
+    let mut lhs = match lexer.next() {
+        Token::Atom(it) => S::Atom(it),
+        Token::Op(op) => {
+            let ((), r_bp) = prefix_binding_power(op);
+            let rhs = expr_bp(lexer, r_bp);
+            S::Cons(op, vec![rhs])
+        }
+        t => panic!("bad token: {:?}", t),
+    };
+
+    loop {
+        let op = match lexer.peek() {
+            Token::Eof => break,
+            Token::Op(op) => op,
+            t => panic!("bad token: {:?}", t),
+        };
+
+        if let Some((l_bp, ())) = postfix_binding_power(op) {
+            if l_bp < min_bp {
+                break;
+            }
+            lexer.next();
+
+            lhs = S::Cons(op, vec![lhs]);
+            continue;
+        }
+
+        let (l_bp, r_bp) = infix_binding_power(op);
+        if l_bp < min_bp {
+            break;
+        }
+
+        lexer.next();
+        let rhs = expr_bp(lexer, r_bp);
+
+        lhs = S::Cons(op, vec![lhs, rhs]);
+    }
+
+    lhs
+}
+
+fn prefix_binding_power(op: char) -> ((), u8) {
+    match op {
+        '+' | '-' => ((), 5),
+        _ => panic!("bad op: {:?}", op),
+    }
+}
+
+fn postfix_binding_power(op: char) -> Option<(u8, ())> {
+    let res = match op {
+        '!' => (7, ()),
+        _ => return None,
+    };
+    Some(res)
+}
+
+fn infix_binding_power(op: char) -> (u8, u8) {
+    match op {
+        '+' | '-' => (1, 2),
+        '*' | '/' => (3, 4),
+        '.' => (10, 9),
+        _ => panic!("bad op: {:?}", op),
+    }
+}
+
+#[test]
+fn tests() {
+    let s = expr("-9!");
+    assert_eq!(s.to_string(), "(- (! 9))");
+
+    let s = expr("f . g !");
+    assert_eq!(s.to_string(), "(! (. f g))");
+}
+ +
+

Amusingly, both the old and the new tests pass.

+

Now, we are ready to add a new kind of expression: parenthesised expression. +It is actually not that hard, and we could have done it from the start, but it makes sense to handle this here, you'll see in a moment why. +Parens are just primary expressions, and are handled similarly to atoms:

+ +
+ + +
let mut lhs = match lexer.next() {
+    Token::Atom(it) => S::Atom(it),
+    Token::Op('(') => {
+        let lhs = expr_bp(lexer, 0);
+        assert_eq!(lexer.next(), Token::Op(')'));
+        lhs
+    }
+    Token::Op(op) => {
+        let ((), r_bp) = prefix_binding_power(op);
+        let rhs = expr_bp(lexer, r_bp);
+        S::Cons(op, vec![rhs])
+    }
+    t => panic!("bad token: {:?}", t),
+};
+ +
+

Unfortunately, the following test fails:

+ +
+ + +
let s = expr("(((0)))");
+assert_eq!(s.to_string(), "0");
+ +
+

The panic comes from the loop below: the only termination condition we have is reaching eof, and ) is definitely not eof. +The easiest way to fix that is to change infix_binding_power to return None on unrecognized operators. +That way, it'll become similar to postfix_binding_power again!

+ +
+ + +
fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> S {
+    let mut lhs = match lexer.next() {
+        Token::Atom(it) => S::Atom(it),
+        Token::Op('(') => {
+            let lhs = expr_bp(lexer, 0);
+            assert_eq!(lexer.next(), Token::Op(')'));
+            lhs
+        }
+        Token::Op(op) => {
+            let ((), r_bp) = prefix_binding_power(op);
+            let rhs = expr_bp(lexer, r_bp);
+            S::Cons(op, vec![rhs])
+        }
+        t => panic!("bad token: {:?}", t),
+    };
+
+    loop {
+        let op = match lexer.peek() {
+            Token::Eof => break,
+            Token::Op(op) => op,
+            t => panic!("bad token: {:?}", t),
+        };
+
+        if let Some((l_bp, ())) = postfix_binding_power(op) {
+            if l_bp < min_bp {
+                break;
+            }
+            lexer.next();
+
+            lhs = S::Cons(op, vec![lhs]);
+            continue;
+        }
+
+        if let Some((l_bp, r_bp)) = infix_binding_power(op) {
+            if l_bp < min_bp {
+                break;
+            }
+
+            lexer.next();
+            let rhs = expr_bp(lexer, r_bp);
+
+            lhs = S::Cons(op, vec![lhs, rhs]);
+            continue;
+        }
+
+        break;
+    }
+
+    lhs
+}
+
+fn prefix_binding_power(op: char) -> ((), u8) {
+    match op {
+        '+' | '-' => ((), 5),
+        _ => panic!("bad op: {:?}", op),
+    }
+}
+
+fn postfix_binding_power(op: char) -> Option<(u8, ())> {
+    let res = match op {
+        '!' => (7, ()),
+        _ => return None,
+    };
+    Some(res)
+}
+
+fn infix_binding_power(op: char) -> Option<(u8, u8)> {
+    let res = match op {
+        '+' | '-' => (1, 2),
+        '*' | '/' => (3, 4),
+        '.' => (10, 9),
+        _ => return None,
+    };
+    Some(res)
+}
+ +
+

And now let's add the array indexing operator: a[i]. +What kind of -fix is it? +Around-fix? +If it were just a[], it would clearly be postfix. +If it were just [i], it would work exactly like parens. +And that is the key: the i part doesn't really participate in the whole power game, as it is unambiguously delimited. So, let's do this:

+ +
+ + +
fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> S {
+    let mut lhs = match lexer.next() {
+        Token::Atom(it) => S::Atom(it),
+        Token::Op('(') => {
+            let lhs = expr_bp(lexer, 0);
+            assert_eq!(lexer.next(), Token::Op(')'));
+            lhs
+        }
+        Token::Op(op) => {
+            let ((), r_bp) = prefix_binding_power(op);
+            let rhs = expr_bp(lexer, r_bp);
+            S::Cons(op, vec![rhs])
+        }
+        t => panic!("bad token: {:?}", t),
+    };
+
+    loop {
+        let op = match lexer.peek() {
+            Token::Eof => break,
+            Token::Op(op) => op,
+            t => panic!("bad token: {:?}", t),
+        };
+
+        if let Some((l_bp, ())) = postfix_binding_power(op) {
+            if l_bp < min_bp {
+                break;
+            }
+            lexer.next();
+
+            lhs = if op == '[' {
+                let rhs = expr_bp(lexer, 0);
+                assert_eq!(lexer.next(), Token::Op(']'));
+                S::Cons(op, vec![lhs, rhs])
+            } else {
+                S::Cons(op, vec![lhs])
+            };
+            continue;
+        }
+
+        if let Some((l_bp, r_bp)) = infix_binding_power(op) {
+            if l_bp < min_bp {
+                break;
+            }
+
+            lexer.next();
+            let rhs = expr_bp(lexer, r_bp);
+
+            lhs = S::Cons(op, vec![lhs, rhs]);
+            continue;
+        }
+
+        break;
+    }
+
+    lhs
+}
+
+fn prefix_binding_power(op: char) -> ((), u8) {
+    match op {
+        '+' | '-' => ((), 5),
+        _ => panic!("bad op: {:?}", op),
+    }
+}
+
+fn postfix_binding_power(op: char) -> Option<(u8, ())> {
+    let res = match op {
+        '!' | '[' => (7, ()), 
+        _ => return None,
+    };
+    Some(res)
+}
+
+fn infix_binding_power(op: char) -> Option<(u8, u8)> {
+    let res = match op {
+        '+' | '-' => (1, 2),
+        '*' | '/' => (3, 4),
+        '.' => (10, 9),
+        _ => return None,
+    };
+    Some(res)
+}
+
+#[test]
+fn tests() {
+    ...
+
+    let s = expr("x[0][1]");
+    assert_eq!(s.to_string(), "([ ([ x 0) 1)");
+}
+ +
+
    +
  1. +Note that we use the same priority for ! as for [. +In general, for the correctness of our algorithm it's pretty important that, when we make decisions, priorities are never equal. +Otherwise, we might end up in a situation like the one before the tiny adjustment for associativity, where there were two equally good candidates for reduction. +However, we only compare right bp with left bp! +So for two postfix operators it's OK to have the same priority: each has only a left bp, so the two are never compared with each other. +
+

Finally, the ultimate boss of all operators, the dreaded ternary:

+ +
+ + +
c ? e1 : e2
+ +
+

Is this an all-over-the-place-fix operator? +Well, let's change the syntax of ternary slightly:

+ +
+ + +
c [ e1 ] e2
+ +
+

And let's recall that a[i] turned out to be a postfix operator + parenthesis… +So, yeah, ? and : are actually a weird pair of parens! +And let's handle it as such! +Now, what about priority and associativity? +What even is associativity in this case?

+ +
+ + +
a ? b : c ? d : e
+ +
+

To figure it out, we just squash the parens part:

+ +
+ + +
a ?: c ?: e
+ +
+

This can be parsed as

+ +
+ + +
(a ?: c) ?: e
+ +
+

or as

+ +
+ + +
a ?: (c ?: e)
+ +
+

What is more useful? +For ?-chains like this:

+ +
+ + +
a ? b :
+c ? d :
+e
+ +
+

the right-associative reading is more useful. +Priority-wise, the ternary is low priority. +In C, only = and , have lower priority. +While we are at it, let's add C-style right associative = as well.
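The two readings really are different expressions, which a small sketch makes visible. Rust has no ?: operator, so if/else stands in for it here; ternary_right and ternary_left are hypothetical names for the two groupings, and all five operands are booleans so that either grouping type-checks.

```rust
/// a ? b : (c ? d : e), the right-associative reading.
/// This is exactly an if / else if / else chain.
fn ternary_right(a: bool, b: bool, c: bool, d: bool, e: bool) -> bool {
    if a {
        b
    } else if c {
        d
    } else {
        e
    }
}

/// (a ? b : c) ? d : e, the left-associative reading.
/// The first ternary is evaluated whole and then used as a condition.
fn ternary_left(a: bool, b: bool, c: bool, d: bool, e: bool) -> bool {
    let cond = if a { b } else { c };
    if cond {
        d
    } else {
        e
    }
}
```

With a = true, b = false, c = true, d = false, e = true, the right-associative version returns b (false), while the left-associative one uses b as a condition and returns e (true).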

+

Here's our most complete and perfect version of a simple Pratt parser:

+ +
+ + +
use std::{fmt, io::BufRead};
+
+enum S {
+    Atom(char),
+    Cons(char, Vec<S>),
+}
+
+impl fmt::Display for S {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        match self {
+            S::Atom(i) => write!(f, "{}", i),
+            S::Cons(head, rest) => {
+                write!(f, "({}", head)?;
+                for s in rest {
+                    write!(f, " {}", s)?
+                }
+                write!(f, ")")
+            }
+        }
+    }
+}
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+enum Token {
+    Atom(char),
+    Op(char),
+    Eof,
+}
+
+struct Lexer {
+    tokens: Vec<Token>,
+}
+
+impl Lexer {
+    fn new(input: &str) -> Lexer {
+        let mut tokens = input
+            .chars()
+            .filter(|it| !it.is_ascii_whitespace())
+            .map(|c| match c {
+                '0'..='9'
+                | 'a'..='z' | 'A'..='Z' => Token::Atom(c),
+                _ => Token::Op(c),
+            })
+            .collect::<Vec<_>>();
+        tokens.reverse();
+        Lexer { tokens }
+    }
+
+    fn next(&mut self) -> Token {
+        self.tokens.pop().unwrap_or(Token::Eof)
+    }
+    fn peek(&mut self) -> Token {
+        self.tokens.last().copied().unwrap_or(Token::Eof)
+    }
+}
+
+fn expr(input: &str) -> S {
+    let mut lexer = Lexer::new(input);
+    expr_bp(&mut lexer, 0)
+}
+
+fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> S {
+    let mut lhs = match lexer.next() {
+        Token::Atom(it) => S::Atom(it),
+        Token::Op('(') => {
+            let lhs = expr_bp(lexer, 0);
+            assert_eq!(lexer.next(), Token::Op(')'));
+            lhs
+        }
+        Token::Op(op) => {
+            let ((), r_bp) = prefix_binding_power(op);
+            let rhs = expr_bp(lexer, r_bp);
+            S::Cons(op, vec![rhs])
+        }
+        t => panic!("bad token: {:?}", t),
+    };
+
+    loop {
+        let op = match lexer.peek() {
+            Token::Eof => break,
+            Token::Op(op) => op,
+            t => panic!("bad token: {:?}", t),
+        };
+
+        if let Some((l_bp, ())) = postfix_binding_power(op) {
+            if l_bp < min_bp {
+                break;
+            }
+            lexer.next();
+
+            lhs = if op == '[' {
+                let rhs = expr_bp(lexer, 0);
+                assert_eq!(lexer.next(), Token::Op(']'));
+                S::Cons(op, vec![lhs, rhs])
+            } else {
+                S::Cons(op, vec![lhs])
+            };
+            continue;
+        }
+
+        if let Some((l_bp, r_bp)) = infix_binding_power(op) {
+            if l_bp < min_bp {
+                break;
+            }
+            lexer.next();
+
+            lhs = if op == '?' {
+                let mhs = expr_bp(lexer, 0);
+                assert_eq!(lexer.next(), Token::Op(':'));
+                let rhs = expr_bp(lexer, r_bp);
+                S::Cons(op, vec![lhs, mhs, rhs])
+            } else {
+                let rhs = expr_bp(lexer, r_bp);
+                S::Cons(op, vec![lhs, rhs])
+            };
+            continue;
+        }
+
+        break;
+    }
+
+    lhs
+}
+
+fn prefix_binding_power(op: char) -> ((), u8) {
+    match op {
+        '+' | '-' => ((), 9),
+        _ => panic!("bad op: {:?}", op),
+    }
+}
+
+fn postfix_binding_power(op: char) -> Option<(u8, ())> {
+    let res = match op {
+        '!' => (11, ()),
+        '[' => (11, ()),
+        _ => return None,
+    };
+    Some(res)
+}
+
+fn infix_binding_power(op: char) -> Option<(u8, u8)> {
+    let res = match op {
+        '=' => (2, 1),
+        '?' => (4, 3),
+        '+' | '-' => (5, 6),
+        '*' | '/' => (7, 8),
+        '.' => (14, 13),
+        _ => return None,
+    };
+    Some(res)
+}
+
+#[test]
+fn tests() {
+    let s = expr("1");
+    assert_eq!(s.to_string(), "1");
+
+    let s = expr("1 + 2 * 3");
+    assert_eq!(s.to_string(), "(+ 1 (* 2 3))");
+
+    let s = expr("a + b * c * d + e");
+    assert_eq!(s.to_string(), "(+ (+ a (* (* b c) d)) e)");
+
+    let s = expr("f . g . h");
+    assert_eq!(s.to_string(), "(. f (. g h))");
+
+    let s = expr(" 1 + 2 + f . g . h * 3 * 4");
+    assert_eq!(
+        s.to_string(),
+        "(+ (+ 1 2) (* (* (. f (. g h)) 3) 4))",
+    );
+
+    let s = expr("--1 * 2");
+    assert_eq!(s.to_string(), "(* (- (- 1)) 2)");
+
+    let s = expr("--f . g");
+    assert_eq!(s.to_string(), "(- (- (. f g)))");
+
+    let s = expr("-9!");
+    assert_eq!(s.to_string(), "(- (! 9))");
+
+    let s = expr("f . g !");
+    assert_eq!(s.to_string(), "(! (. f g))");
+
+    let s = expr("(((0)))");
+    assert_eq!(s.to_string(), "0");
+
+    let s = expr("x[0][1]");
+    assert_eq!(s.to_string(), "([ ([ x 0) 1)");
+
+    let s = expr(
+        "a ? b :
+         c ? d
+         : e",
+    );
+    assert_eq!(s.to_string(), "(? a b (? c d e))");
+
+    let s = expr("a = 0 ? b : c = d");
+    assert_eq!(s.to_string(), "(= a (= (? 0 b c) d))")
+}
+
+fn main() {
+    for line in std::io::stdin().lock().lines() {
+        let line = line.unwrap();
+        let s = expr(&line);
+        println!("{}", s)
+    }
+}
+ +
+

The code is also available in +this repository, Eof :-)

+
+
+
+ + + + + diff --git a/2020/04/15/from-pratt-to-dijkstra.html b/2020/04/15/from-pratt-to-dijkstra.html new file mode 100644 index 00000000..d25f286c --- /dev/null +++ b/2020/04/15/from-pratt-to-dijkstra.html @@ -0,0 +1,1260 @@ + + + + + + + From Pratt to Dijkstra + + + + + + + + + + + + +
+ +
+ +
+
+ +

From Pratt to Dijkstra

+

This is a sequel to the previous post about Pratt parsing. +Here, we'll study the relationship between top-down operator precedence (Pratt parsing) and the more famous shunting yard algorithm. +Spoiler: they are the same algorithm, the difference is in implementation style: recursion (Pratt) versus a manual stack (Dijkstra).

+

Unlike the previous educational post, this one is going to be an excruciatingly boring pile of technicalities: we'll just slowly and mechanically refactor our way to victory. +Specifically,

+
    +
  1. +We start with refactoring the Pratt parser to minimize control flow variations. +
  2. +Then, having arrived at code with only one return and only one recursive call, we replace recursion with an explicit stack. +
  3. +Finally, we streamline control in the iterative version. +
  4. +At this point, we have a bona fide shunting yard algorithm. +
+

To further reveal the connection, we also verify that the original recursive and the iterative formulations produce syntax nodes in the same order.

+

Really, the most exciting bit about this post is the conclusion, and you already know it :)

+
+ +

+ Starting Point +

+

Last time, we ended up with the following code:

+ +
+ + +
enum S {
+    Atom(char),
+    Cons(char, Vec<S>),
+}
+
+impl fmt::Display for S {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        match self {
+            S::Atom(i) => write!(f, "{}", i),
+            S::Cons(head, rest) => {
+                write!(f, "({}", head)?;
+                for s in rest {
+                    write!(f, " {}", s)?
+                }
+                write!(f, ")")
+            }
+        }
+    }
+}
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+enum Token {
+    Atom(char),
+    Op(char),
+    Eof,
+}
+
+struct Lexer {
+    tokens: Vec<Token>,
+}
+
+impl Lexer {
+    fn new(input: &str) -> Lexer {
+        let mut tokens = input
+            .chars()
+            .filter(|it| !it.is_ascii_whitespace())
+            .map(|c| match c {
+                '0'..='9'
+                | 'a'..='z' | 'A'..='Z' => Token::Atom(c),
+                _ => Token::Op(c),
+            })
+            .collect::<Vec<_>>();
+        tokens.reverse();
+        Lexer { tokens }
+    }
+
+    fn next(&mut self) -> Token {
+        self.tokens.pop().unwrap_or(Token::Eof)
+    }
+    fn peek(&mut self) -> Token {
+        self.tokens.last().copied().unwrap_or(Token::Eof)
+    }
+}
+
+fn expr(input: &str) -> S {
+    let mut lexer = Lexer::new(input);
+    expr_bp(&mut lexer, 0)
+}
+
+fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> S {
+    let mut lhs = match lexer.next() {
+        Token::Atom(it) => S::Atom(it),
+        Token::Op('(') => {
+            let lhs = expr_bp(lexer, 0);
+            assert_eq!(lexer.next(), Token::Op(')'));
+            lhs
+        }
+        Token::Op(op) => {
+            let ((), r_bp) = prefix_binding_power(op);
+            let rhs = expr_bp(lexer, r_bp);
+            S::Cons(op, vec![rhs])
+        }
+        t => panic!("bad token: {:?}", t),
+    };
+
+    loop {
+        let op = match lexer.peek() {
+            Token::Eof => break,
+            Token::Op(op) => op,
+            t => panic!("bad token: {:?}", t),
+        };
+
+        if let Some((l_bp, ())) = postfix_binding_power(op) {
+            if l_bp < min_bp {
+                break;
+            }
+            lexer.next();
+
+            lhs = if op == '[' {
+                let rhs = expr_bp(lexer, 0);
+                assert_eq!(lexer.next(), Token::Op(']'));
+                S::Cons(op, vec![lhs, rhs])
+            } else {
+                S::Cons(op, vec![lhs])
+            };
+            continue;
+        }
+
+        if let Some((l_bp, r_bp)) = infix_binding_power(op) {
+            if l_bp < min_bp {
+                break;
+            }
+            lexer.next();
+
+            lhs = if op == '?' {
+                let mhs = expr_bp(lexer, 0);
+                assert_eq!(lexer.next(), Token::Op(':'));
+                let rhs = expr_bp(lexer, r_bp);
+                S::Cons(op, vec![lhs, mhs, rhs])
+            } else {
+                let rhs = expr_bp(lexer, r_bp);
+                S::Cons(op, vec![lhs, rhs])
+            };
+            continue;
+        }
+
+        break;
+    }
+
+    lhs
+}
+
+fn prefix_binding_power(op: char) -> ((), u8) {
+    match op {
+        '+' | '-' => ((), 9),
+        _ => panic!("bad op: {:?}", op),
+    }
+}
+
+fn postfix_binding_power(op: char) -> Option<(u8, ())> {
+    let res = match op {
+        '!' => (11, ()),
+        '[' => (11, ()),
+        _ => return None,
+    };
+    Some(res)
+}
+
+fn infix_binding_power(op: char) -> Option<(u8, u8)> {
+    let res = match op {
+        '=' => (2, 1),
+        '?' => (4, 3),
+        '+' | '-' => (5, 6),
+        '*' | '/' => (7, 8),
+        '.' => (14, 13),
+        _ => return None,
+    };
+    Some(res)
+}
+ +
+

First, to not completely drown in minutiae, we'll simplify it by removing support for the indexing operator [] and the ternary operator ?:. +We will keep parentheses, left and right associative operators, and the unary minus (which is somewhat tricky to handle in shunting yard). +So this is our starting point:

+ +
+ + +
fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> S {
+    let mut lhs = match lexer.next() {
+        Token::Atom(it) => S::Atom(it),
+        Token::Op('(') => {
+            let lhs = expr_bp(lexer, 0);
+            assert_eq!(lexer.next(), Token::Op(')'));
+            lhs
+        }
+        Token::Op(op) => {
+            let ((), r_bp) = prefix_binding_power(op);
+            let rhs = expr_bp(lexer, r_bp);
+            S::Cons(op, vec![rhs])
+        }
+        t => panic!("bad token: {:?}", t),
+    };
+
+    loop {
+        let op = match lexer.peek() {
+            Token::Eof => break,
+            Token::Op(op) => op,
+            t => panic!("bad token: {:?}", t),
+        };
+
+        if let Some((l_bp, ())) = postfix_binding_power(op) {
+            if l_bp < min_bp {
+                break;
+            }
+            lexer.next();
+
+            lhs = S::Cons(op, vec![lhs]);
+            continue;
+        }
+
+        if let Some((l_bp, r_bp)) = infix_binding_power(op) {
+            if l_bp < min_bp {
+                break;
+            }
+            lexer.next();
+
+            let rhs = expr_bp(lexer, r_bp);
+            lhs = S::Cons(op, vec![lhs, rhs]);
+            continue;
+        }
+
+        break;
+    }
+
+    lhs
+}
+ +
+

What I like about this code is how up-front it is about all special cases and control flow. +This is shameless green code! +However, it is clear that we have a bunch of duplication between prefix, infix and postfix operators. +Our first step will be to simplify the control flow to its core.

+
+
+ +

+ Minimization +

+

First, let's merge postfix and infix cases, as they are almost the same. +The idea is to change priorities for ! from (11, ()) to (11, 100), where 100 is a special, very strong priority, which means that the right hand side of a binary operator is empty. +We'll handle this in a pretty crude way right now, but all the hacks will go away once we refactor the rest.

+ +
+ + +
fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> Option<S> {
+    if min_bp == 100 {
+        return None;
+    }
+    let mut lhs = match lexer.next() {
+        Token::Atom(it) => S::Atom(it),
+        Token::Op('(') => {
+            let lhs = expr_bp(lexer, 0).unwrap();
+            assert_eq!(lexer.next(), Token::Op(')'));
+            lhs
+        }
+        Token::Op(op) => {
+            let ((), r_bp) = prefix_binding_power(op);
+            let rhs = expr_bp(lexer, r_bp).unwrap();
+            S::Cons(op, vec![rhs])
+        }
+        t => panic!("bad token: {:?}", t),
+    };
+
+    loop {
+        let op = match lexer.peek() {
+            Token::Eof => break,
+            Token::Op(op) => op,
+            t => panic!("bad token: {:?}", t),
+        };
+
+        if let Some((l_bp, r_bp)) = infix_binding_power(op) {
+            if l_bp < min_bp {
+                break;
+            }
+            lexer.next();
+
+            let rhs = expr_bp(lexer, r_bp);
+            let mut args = Vec::new();
+            args.push(lhs);
+            args.extend(rhs);
+            lhs = S::Cons(op, args);
+            continue;
+        }
+
+        break;
+    }
+
+    Some(lhs)
+}
+ +
+

Yup, we just check for the hard-coded 100 constant and use a bunch of unwraps all over the place. +But the code is already smaller.
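One detail that keeps this version short is args.extend(rhs) with rhs being an Option. Option implements IntoIterator and yields zero or one item, so extend appends the operand only when there is one. A minimal sketch, with strings standing in for S and a hypothetical build_args helper:

```rust
// `build_args` is a made-up helper mirroring how the loop assembles `args`.
fn build_args(lhs: &str, rhs: Option<&str>) -> Vec<String> {
    let mut args = Vec::new();
    args.push(lhs.to_string());
    // An Option is an iterator of zero or one elements,
    // so this appends nothing for None and one item for Some.
    args.extend(rhs.map(|s| s.to_string()));
    args
}
```

A postfix operator such as ! ends up with one argument (rhs is None), an infix operator with two.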

+

Let's apply the same treatment to prefix operators. +We'll need to move their handling into the loop, and we also need to make lhs optional, which is now not a big deal, as the function as a whole returns an Option. +On a happier note, this will allow us to remove the if 100 wart. +What's more problematic is handling priorities: minus has different binding powers depending on whether it is in an infix or a prefix position. +We solve this problem by just adding a prefix: bool argument to the binding_power function.

+ +
+ + +
fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> Option<S> {
+    let mut lhs = match lexer.peek() {
+        Token::Atom(it) => {
+            lexer.next();
+            Some(S::Atom(it))
+        }
+        Token::Op('(') => {
+            lexer.next();
+            let lhs = expr_bp(lexer, 0).unwrap();
+            assert_eq!(lexer.next(), Token::Op(')'));
+            Some(lhs)
+        }
+        _ => None,
+    };
+
+    loop {
+        let op = match lexer.peek() {
+            Token::Eof => break,
+            Token::Op(op) => op,
+            t => panic!("bad token: {:?}", t),
+        };
+
+        if let Some((l_bp, r_bp)) =
+            binding_power(op, lhs.is_none())
+        {
+            if l_bp < min_bp {
+                break;
+            }
+            lexer.next();
+
+            let rhs = expr_bp(lexer, r_bp);
+            let mut args = Vec::new();
+            args.extend(lhs);
+            args.extend(rhs);
+            lhs = Some(S::Cons(op, args));
+            continue;
+        }
+
+        break;
+    }
+
+    lhs
+}
+
+fn binding_power(op: char, prefix: bool) -> Option<(u8, u8)> {
+    let res = match op {
+        '=' => (2, 1),
+        '+' | '-' if prefix => (99, 9),
+        '+' | '-' => (5, 6),
+        '*' | '/' => (7, 8),
+        '!' => (11, 100),
+        '.' => (14, 13),
+        _ => return None,
+    };
+    Some(res)
+}
+ +
+

Keen readers might have noticed that we use 99 and not 100 here for the no-operand case. +This is not important yet, but will be during the next step.

+

We've unified prefix, infix and postfix operators. +The next logical step is to treat atoms as nullary operators! +That is, we'll parse 92 into a (92) S-expression, with None for both lhs and rhs. +We get this by using (99, 100) binding power. +At this stage, we can get rid of the distinction between atom tokens and operator tokens, and make the lexer return underlying chars directly. +We'll also get rid of S::Atom, which gives us this somewhat large change:

+ +
+ + +
enum S {
+    Cons(char, Vec<S>),
+}
+
+impl fmt::Display for S {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        match self {
+            S::Cons(head, rest) => {
+                if rest.is_empty() {
+                    write!(f, "{}", head)
+                } else {
+                    write!(f, "({}", head)?;
+                    for s in rest {
+                        write!(f, " {}", s)?
+                    }
+                    write!(f, ")")
+                }
+            }
+        }
+    }
+}
+
+struct Lexer {
+    tokens: Vec<char>,
+}
+
+impl Lexer {
+    fn new(input: &str) -> Lexer {
+        let mut tokens = input
+            .chars()
+            .filter(|it| !it.is_ascii_whitespace())
+            .collect::<Vec<_>>();
+        tokens.reverse();
+        Lexer { tokens }
+    }
+
+    fn next(&mut self) -> Option<char> {
+        self.tokens.pop()
+    }
+    fn peek(&mut self) -> Option<char> {
+        self.tokens.last().copied()
+    }
+}
+
+fn expr(input: &str) -> S {
+    let mut lexer = Lexer::new(input);
+    expr_bp(&mut lexer, 0).unwrap()
+}
+
+fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> Option<S> {
+    let mut lhs = match lexer.peek() {
+        Some('(') => {
+            lexer.next();
+            let lhs = expr_bp(lexer, 0).unwrap();
+            assert_eq!(lexer.next(), Some(')'));
+            Some(lhs)
+        }
+        _ => None,
+    };
+
+    loop {
+        let token = match lexer.peek() {
+            Some(token) => token,
+            None => break,
+        };
+
+        if let Some((l_bp, r_bp)) =
+            binding_power(token, lhs.is_none())
+        {
+            if l_bp < min_bp {
+                break;
+            }
+            lexer.next();
+
+            let rhs = expr_bp(lexer, r_bp);
+            let mut args = Vec::new();
+            args.extend(lhs);
+            args.extend(rhs);
+            lhs = Some(S::Cons(token, args));
+            continue;
+        }
+
+        break;
+    }
+
+    lhs
+}
+
+fn binding_power(op: char, prefix: bool) -> Option<(u8, u8)> {
+    let res = match op {
+        '0'..='9' | 'a'..='z' | 'A'..='Z' => (99, 100),
+        '=' => (2, 1),
+        '+' | '-' if prefix => (99, 9),
+        '+' | '-' => (5, 6),
+        '*' | '/' => (7, 8),
+        '!' => (11, 100),
+        '.' => (14, 13),
+        _ => return None,
+    };
+    Some(res)
+}
+ +
+

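This is the stage where it becomes important that the fake binding power of unary - is 99 and not 100. After parsing the first constant in 1 - 2, we recurse with r_bp equal to 100, and we need to avoid eating the following minus as a prefix operator. Because 99 < 100, the recursive call stops right away, and the minus is later picked up as an infix operator.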
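The comparison that makes 99 matter can be checked in isolation. Below is a minimal sketch, assuming a trimmed copy of the binding power table (digits and +/- only) and a hypothetical helper `would_consume`, which is not part of the article's code and only mirrors the `l_bp < min_bp` check from the main loop.

```rust
/// Trimmed copy of the table: atoms and `+`/`-` only.
fn binding_power(op: char, prefix: bool) -> Option<(u8, u8)> {
    let res = match op {
        '0'..='9' => (99, 100),
        '+' | '-' if prefix => (99, 9),
        '+' | '-' => (5, 6),
        _ => return None,
    };
    Some(res)
}

/// Hypothetical helper: would the parsing loop consume `op`
/// when called with the given `min_bp`?
fn would_consume(op: char, prefix: bool, min_bp: u8) -> bool {
    matches!(binding_power(op, prefix), Some((l_bp, _)) if l_bp >= min_bp)
}
```

With a `min_bp` of 100 (the right binding power of an atom), `would_consume('-', true, 100)` is false, so the recursive call returns without an operand. Back in the outer loop the same minus is no longer in prefix position, and `would_consume('-', false, 0)` is true.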

+

The only thing left outside the main loop is parentheses. We can deal with them using the (99, 0) priority: after ( we enter a new context where all operators are allowed.

+ +
+ + +
fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> Option<S> {
+    let mut lhs = None;
+
+    loop {
+        let token = match lexer.peek() {
+            Some(token) => token,
+            None => break,
+        };
+
+        if let Some((l_bp, r_bp)) =
+            binding_power(token, lhs.is_none())
+        {
+            if l_bp < min_bp {
+                break;
+            }
+            lexer.next();
+
+            let rhs = expr_bp(lexer, r_bp);
+            if token == '(' {
+                assert_eq!(lexer.next(), Some(')'));
+                lhs = rhs;
+                continue;
+            }
+
+            let mut args = Vec::new();
+            args.extend(lhs);
+            args.extend(rhs);
+            lhs = Some(S::Cons(token, args));
+            continue;
+        }
+
+        break;
+    }
+
+    lhs
+}
+
+fn binding_power(op: char, prefix: bool) -> Option<(u8, u8)> {
+    let res = match op {
+        '0'..='9' | 'a'..='z' | 'A'..='Z' => (99, 100),
+        '(' => (99, 0),
+        '=' => (2, 1),
+        '+' | '-' if prefix => (99, 9),
+        '+' | '-' => (5, 6),
+        '*' | '/' => (7, 8),
+        '!' => (11, 100),
+        '.' => (14, 13),
+        _ => return None,
+    };
+    Some(res)
+}
+ +
+

Or, after some control flow cleanup:

+ +
+ + +
fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> Option<S> {
+    let mut lhs = None;
+
+    loop {
+        let token = match lexer.peek() {
+            Some(token) => token,
+            None => return lhs,
+        };
+
+        let r_bp = match binding_power(token, lhs.is_none()) {
+            Some((l_bp, r_bp)) if min_bp <= l_bp => r_bp,
+            _ => return lhs,
+        };
+
+        lexer.next();
+
+        let rhs = expr_bp(lexer, r_bp);
+        if token == '(' {
+            assert_eq!(lexer.next(), Some(')'));
+            lhs = rhs;
+            continue;
+        }
+
+        let mut args = Vec::new();
+        args.extend(lhs);
+        args.extend(rhs);
+        lhs = Some(S::Cons(token, args));
+    }
+}
+ +
+

This is still recognizably a Pratt parser, with its characteristic shape:

+ +
+ + +
fn parse_expr() {
+    loop {
+        ...
+        parse_expr()
+        ...
+    }
+}
+ +
+

What we'll do next is a mechanical replacement of recursion with a manual stack.

+
+
+ +

+ From Recursion to Stack +

+

This is a general transformation and (I think) it can be done mechanically. The interesting bits during the transformation are the recursive calls themselves and the returns. The underlying goal of the preceding refactorings was to reduce the number of recursive invocations to one. We still have two return statements there, so let's condense that to just one as well:

+ +
+ + +
fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> Option<S> {
+    let mut lhs = None;
+
+    loop {
+        let token = lexer.peek();
+        let (token, r_bp) =
+            match binding_power(token, lhs.is_none()) {
+                Some((t, (l_bp, r_bp))) if min_bp <= l_bp => {
+                    (t, r_bp)
+                }
+                _ => return lhs,
+            };
+
+        lexer.next();
+
+        let rhs = expr_bp(lexer, r_bp);
+        if token == '(' {
+            assert_eq!(lexer.next(), Some(')'));
+            lhs = rhs;
+            continue;
+        }
+
+        let mut args = Vec::new();
+        args.extend(lhs);
+        args.extend(rhs);
+        lhs = Some(S::Cons(token, args));
+    }
+}
+
+fn binding_power(
+    op: Option<char>,
+    prefix: bool,
+) -> Option<(char, (u8, u8))> {
+    let op = op?;
+    let res = match op {
+        '0'..='9' | 'a'..='z' | 'A'..='Z' => (99, 100),
+        '(' => (99, 0),
+        '=' => (2, 1),
+        '+' | '-' if prefix => (99, 9),
+        '+' | '-' => (5, 6),
+        '*' | '/' => (7, 8),
+        '!' => (11, 100),
+        '.' => (14, 13),
+        _ => return None,
+    };
+    Some((op, res))
+}
+ +
+

Next, we should reify the locals which are live across the recursive call into a data structure. If there were more than one recursive call, we'd have to reify control flow as an enum as well, but we've prudently removed all but one recursive invocation.

+

So let's start with introducing a Frame struct, without actually adding a stack just yet.

+ +
+ + +
struct Frame {
+    min_bp: u8,
+    lhs: Option<S>,
+    token: Option<char>,
+}
+
+fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> Option<S> {
+    let mut top = Frame {
+        min_bp,
+        lhs: None,
+        token: None,
+    };
+
+    loop {
+        let token = lexer.peek();
+        let (token, r_bp) =
+            match binding_power(token, top.lhs.is_none()) {
+                Some((t, (l_bp, r_bp))) if top.min_bp <= l_bp => {
+                    (t, r_bp)
+                }
+                _ => return top.lhs,
+            };
+        lexer.next();
+
+        top.token = Some(token);
+        let rhs = expr_bp(lexer, r_bp);
+        if token == '(' {
+            assert_eq!(lexer.next(), Some(')'));
+            top.lhs = rhs;
+            continue;
+        }
+
+        let mut args = Vec::new();
+        args.extend(top.lhs);
+        args.extend(rhs);
+        top.lhs = Some(S::Cons(token, args));
+    }
+}
+ +
+

And now, let's add a stack: Vec<Frame>. This is the point where the magic happens. We'll still keep the top local variable: representing a stack as (T, Vec<T>) and not as just Vec<T> gives us a compile-time guarantee of non-emptiness. We replace the expr_bp(lexer, r_bp) recursive call with pushing to the stack. All operations after the call are moved to the place where we return. The return itself is replaced with popping off the stack.

+ +
+ + +
fn expr_bp(lexer: &mut Lexer) -> Option<S> {
+    let mut top = Frame {
+        min_bp: 0,
+        lhs: None,
+        token: None,
+    };
+    let mut stack = Vec::new();
+
+    loop {
+        let token = lexer.peek();
+        let (token, r_bp) =
+            match binding_power(token, top.lhs.is_none()) {
+                Some((t, (l_bp, r_bp))) if top.min_bp <= l_bp => {
+                    (t, r_bp)
+                }
+                _ => {
+                    let res = top;
+                    top = match stack.pop() {
+                        Some(it) => it,
+                        None => return res.lhs,
+                    };
+
+                    if res.token == Some('(') {
+                        assert_eq!(lexer.next(), Some(')'));
+                        top.lhs = res.lhs;
+                        continue;
+                    }
+
+                    let mut args = Vec::new();
+                    args.extend(top.lhs);
+                    args.extend(res.lhs);
+                    top.lhs =
+                        Some(S::Cons(res.token.unwrap(), args));
+                    continue;
+                }
+            };
+        lexer.next();
+
+        stack.push(top);
+        top = Frame {
+            min_bp: r_bp,
+            lhs: None,
+            token: Some(token),
+        };
+    }
+}
+ +
+
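The non-emptiness guarantee of the (T, Vec<T>) representation can be sketched as a tiny standalone type. `NonEmptyStack` and its methods are hypothetical names used only for illustration; the article's code keeps a plain `top` local next to a `Vec<Frame>` instead of wrapping them.

```rust
/// Hypothetical helper: a stack that always holds at least one element.
struct NonEmptyStack<T> {
    top: T,
    rest: Vec<T>,
}

impl<T> NonEmptyStack<T> {
    fn new(top: T) -> Self {
        NonEmptyStack { top, rest: Vec::new() }
    }

    /// The current top; no `Option` needed, since it always exists.
    fn top(&self) -> &T {
        &self.top
    }

    /// Makes `value` the new top and moves the old top below it.
    fn push(&mut self, value: T) {
        let old = std::mem::replace(&mut self.top, value);
        self.rest.push(old);
    }

    /// Removes and returns the top if another element remains below it.
    /// Returns `None` (and changes nothing) when only one element is left.
    fn pop(&mut self) -> Option<T> {
        let below = self.rest.pop()?;
        Some(std::mem::replace(&mut self.top, below))
    }
}
```

The point of the design is that `top()` cannot fail, so code that inspects the current frame needs no `unwrap`. Only `pop` has to report the "nothing left below" case, which corresponds to the place where the parser returns its final result.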

Tada! No recursion anymore, and it still passes the tests! Let's clean this up further though. First, let's treat ) more like a usual operator. The correct binding powers here are the opposite of (: (0, 100):

+ +
+ + +
fn expr_bp(lexer: &mut Lexer) -> Option<S> {
+    let mut top = Frame {
+        min_bp: 0,
+        lhs: None,
+        token: None,
+    };
+    let mut stack = Vec::new();
+
+    loop {
+        let token = lexer.peek();
+        let (token, r_bp) =
+            match binding_power(token, top.lhs.is_none()) {
+                Some((t, (l_bp, r_bp))) if top.min_bp <= l_bp => {
+                    (t, r_bp)
+                }
+                _ => {
+                    let res = top;
+                    top = match stack.pop() {
+                        Some(it) => it,
+                        None => return res.lhs,
+                    };
+
+                    let mut args = Vec::new();
+                    args.extend(top.lhs);
+                    args.extend(res.lhs);
+                    top.lhs =
+                        Some(S::Cons(res.token.unwrap(), args));
+                    continue;
+                }
+            };
+        lexer.next();
+        if token == ')' {
+            assert_eq!(top.token, Some('('));
+            let res = top;
+            top = stack.pop().unwrap();
+            top.lhs = res.lhs;
+            continue;
+        }
+
+        stack.push(top);
+        top = Frame {
+            min_bp: r_bp,
+            lhs: None,
+            token: Some(token),
+        };
+    }
+}
+
+fn binding_power(
+    op: Option<char>,
+    prefix: bool,
+) -> Option<(char, (u8, u8))> {
+    let op = op?;
+    let res = match op {
+        '0'..='9' | 'a'..='z' | 'A'..='Z' => (99, 100),
+        '(' => (99, 0),
+        ')' => (0, 100),
+        '=' => (2, 1),
+        '+' | '-' if prefix => (99, 9),
+        '+' | '-' => (5, 6),
+        '*' | '/' => (7, 8),
+        '!' => (11, 100),
+        '.' => (14, 13),
+        _ => return None,
+    };
+    Some((op, res))
+}
+ +
+

Finally, let's note that continue inside the match is somewhat wasteful: when we hit it, we'll re-peek the same token again. So let's repeat just the match until we know we can make progress. This also allows replacing the peek() / next() pair with just next().

+ +
+ + +
fn expr_bp(lexer: &mut Lexer) -> Option<S> {
+    let mut top = Frame {
+        min_bp: 0,
+        lhs: None,
+        token: None,
+    };
+    let mut stack = Vec::new();
+
+    loop {
+        let token = lexer.next();
+        let (token, r_bp) = loop {
+            match binding_power(token, top.lhs.is_none()) {
+                Some((t, (l_bp, r_bp))) if top.min_bp <= l_bp => {
+                    break (t, r_bp)
+                }
+                _ => {
+                    let res = top;
+                    top = match stack.pop() {
+                        Some(it) => it,
+                        None => return res.lhs,
+                    };
+
+                    let mut args = Vec::new();
+                    args.extend(top.lhs);
+                    args.extend(res.lhs);
+                    top.lhs =
+                        Some(S::Cons(res.token.unwrap(), args));
+                }
+            };
+        };
+
+        if token == ')' {
+            assert_eq!(top.token, Some('('));
+            let res = top;
+            top = stack.pop().unwrap();
+            top.lhs = res.lhs;
+            continue;
+        }
+
+        stack.push(top);
+        top = Frame {
+            min_bp: r_bp,
+            lhs: None,
+            token: Some(token),
+        };
+    }
+}
+ +
+

And guess what? This is the shunting yard algorithm, with its characteristic shape of

+ +
+ + +
loop {
+    let token = next_token();
+    while stack.top.priority > token.priority {
+        stack.pop()
+    }
+}
+ +
+
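For comparison, here is a minimal sketch of the textbook shunting yard that emits reverse Polish notation directly. It only handles single-character operands, parentheses and binary operators. The `precedence` table is an assumption chosen to mirror the ordering of the binding powers in this article; `to_rpn` is a hypothetical name, and unbalanced parentheses are not reported as errors.

```rust
/// Assumed table: (precedence, is_right_associative) for binary operators.
fn precedence(op: char) -> Option<(u8, bool)> {
    match op {
        '+' | '-' => Some((1, false)),
        '*' | '/' => Some((2, false)),
        '.' => Some((3, true)),
        _ => None,
    }
}

/// Textbook shunting yard: converts an infix string to reverse Polish notation.
fn to_rpn(input: &str) -> String {
    let mut output: Vec<char> = Vec::new();
    let mut stack: Vec<char> = Vec::new();

    for token in input.chars().filter(|c| !c.is_ascii_whitespace()) {
        match token {
            '(' => stack.push(token),
            ')' => {
                // Pop operators until the matching opening parenthesis.
                while let Some(top) = stack.pop() {
                    if top == '(' {
                        break;
                    }
                    output.push(top);
                }
            }
            _ => match precedence(token) {
                Some((prec, right_assoc)) => {
                    // Pop operators that bind tighter, or equally tight
                    // when the incoming operator is left-associative.
                    while let Some(&top) = stack.last() {
                        match precedence(top) {
                            Some((top_prec, _))
                                if top_prec > prec
                                    || (top_prec == prec && !right_assoc) =>
                            {
                                output.push(top);
                                stack.pop();
                            }
                            // `(` or an empty stack stops the popping.
                            _ => break,
                        }
                    }
                    stack.push(token);
                }
                // Anything that is not an operator is treated as an operand.
                None => output.push(token),
            },
        }
    }

    // Flush the remaining operators.
    while let Some(top) = stack.pop() {
        output.push(top);
    }

    output.iter().map(|c| c.to_string()).collect::<Vec<_>>().join(" ")
}
```

The inner `while let` is the same "pop while the stack top binds tighter" loop as in the shape above; the refactored Pratt parser reaches it through the `top.min_bp <= l_bp` check.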

To drive the point home, let's print the tokens we pop off the stack, to verify that we get reverse Polish notation without any kind of additional tree rearrangement, just like in the original algorithm description:

+ +
+ + +
use std::fmt;
+
+enum S {
+    Cons(char, Vec<S>),
+}
+
+impl fmt::Display for S {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        match self {
+            S::Cons(head, rest) => {
+                if rest.is_empty() {
+                    write!(f, "{}", head)
+                } else {
+                    write!(f, "({}", head)?;
+                    for s in rest {
+                        write!(f, " {}", s)?
+                    }
+                    write!(f, ")")
+                }
+            }
+        }
+    }
+}
+
+struct Lexer {
+    tokens: Vec<char>,
+}
+
+impl Lexer {
+    fn new(input: &str) -> Lexer {
+        let mut tokens = input
+            .chars()
+            .filter(|it| !it.is_ascii_whitespace())
+            .collect::<Vec<_>>();
+        tokens.reverse();
+        Lexer { tokens }
+    }
+
+    fn next(&mut self) -> Option<char> {
+        self.tokens.pop()
+    }
+}
+
+fn expr(input: &str) -> S {
+    let mut lexer = Lexer::new(input);
+    eprintln!("{}", input);
+    let res = expr_bp(&mut lexer).unwrap();
+    eprintln!("{}\n", res);
+    res
+}
+
+struct Frame {
+    min_bp: u8,
+    lhs: Option<S>,
+    token: Option<char>,
+}
+
+fn expr_bp(lexer: &mut Lexer) -> Option<S> {
+    let mut top = Frame {
+        min_bp: 0,
+        lhs: None,
+        token: None,
+    };
+    let mut stack = Vec::new();
+
+    loop {
+        let token = lexer.next();
+        let (token, r_bp) = loop {
+            match binding_power(token, top.lhs.is_none()) {
+                Some((t, (l_bp, r_bp))) if top.min_bp <= l_bp => {
+                    break (t, r_bp)
+                }
+                _ => {
+                    let res = top;
+                    top = match stack.pop() {
+                        Some(it) => it,
+                        None => {
+                            eprintln!();
+                            return res.lhs;
+                        }
+                    };
+
+                    let mut args = Vec::new();
+                    args.extend(top.lhs);
+                    args.extend(res.lhs);
+                    let token = res.token.unwrap();
+                    eprint!("{} ", token);
+                    top.lhs = Some(S::Cons(token, args));
+                }
+            };
+        };
+
+        if token == ')' {
+            assert_eq!(top.token, Some('('));
+            let res = top;
+            top = stack.pop().unwrap();
+            top.lhs = res.lhs;
+            continue;
+        }
+
+        stack.push(top);
+        top = Frame {
+            min_bp: r_bp,
+            lhs: None,
+            token: Some(token),
+        };
+    }
+}
+
+fn binding_power(
+    op: Option<char>,
+    prefix: bool,
+) -> Option<(char, (u8, u8))> {
+    let op = op?;
+    let res = match op {
+        '0'..='9' | 'a'..='z' | 'A'..='Z' => (99, 100),
+        '(' => (99, 0),
+        ')' => (0, 100),
+        '=' => (2, 1),
+        '+' | '-' if prefix => (99, 9),
+        '+' | '-' => (5, 6),
+        '*' | '/' => (7, 8),
+        '!' => (11, 100),
+        '.' => (14, 13),
+        _ => return None,
+    };
+    Some((op, res))
+}
+
+#[test]
+fn tests() {
+    let s = expr("1");
+    assert_eq!(s.to_string(), "1");
+
+    let s = expr("1 + 2 * 3");
+    assert_eq!(s.to_string(), "(+ 1 (* 2 3))");
+
+    let s = expr("a + b * c * d + e");
+    assert_eq!(s.to_string(), "(+ (+ a (* (* b c) d)) e)");
+
+    let s = expr("f . g . h");
+    assert_eq!(s.to_string(), "(. f (. g h))");
+
+    let s = expr(" 1 + 2 + f . g . h * 3 * 4");
+    assert_eq!(
+        s.to_string(),
+        "(+ (+ 1 2) (* (* (. f (. g h)) 3) 4))"
+    );
+
+    let s = expr("--1 * 2");
+    assert_eq!(s.to_string(), "(* (- (- 1)) 2)");
+
+    let s = expr("--f . g");
+    assert_eq!(s.to_string(), "(- (- (. f g)))");
+
+    let s = expr("-9!");
+    assert_eq!(s.to_string(), "(- (! 9))");
+
+    let s = expr("f . g !");
+    assert_eq!(s.to_string(), "(! (. f g))");
+
+    let s = expr("(((0)))");
+    assert_eq!(s.to_string(), "0");
+
+    let s = expr("(1 + 2) * 3");
+    assert_eq!(s.to_string(), "(* (+ 1 2) 3)");
+
+    let s = expr("1 + (2 * 3)");
+    assert_eq!(s.to_string(), "(+ 1 (* 2 3))");
+}
+ +
+ +
+ + +
1
+1
+1
+
+1 + 2 * 3
+1 2 3 * +
+(+ 1 (* 2 3))
+
+a + b * c * d + e
+a b c * d * + e +
+(+ (+ a (* (* b c) d)) e)
+
+f . g . h
+f g h . .
+(. f (. g h))
+
+ 1 + 2 + f . g . h * 3 * 4
+1 2 + f g h . . 3 * 4 * +
+(+ (+ 1 2) (* (* (. f (. g h)) 3) 4))
+
+--1 * 2
+1 - - 2 *
+(* (- (- 1)) 2)
+
+--f . g
+f g . - -
+(- (- (. f g)))
+
+-9!
+9 ! -
+(- (! 9))
+
+f . g !
+f g . !
+(! (. f g))
+
+(((0)))
+0
+0
+
+(1 + 2) * 3
+1 2 + 3 *
+(* (+ 1 2) 3)
+
+1 + (2 * 3)
+1 2 3 * +
+(+ 1 (* 2 3))
+ +
+

We actually could have done it with the original recursive formulation as well. Placing print statements at all points where we construct an S node prints the expression in reverse Polish notation, proving that the recursive algorithm does the same steps, and in the same order, as the shunting yard.
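That claim about the recursive formulation can be sketched as follows. This is an illustrative variant, not the article's exact code: instead of building the tree and printing, it records each token into `out` at the point where the recursive parser would construct an S node, and tracks only whether an lhs exists. The names `rpn` and `out` are assumptions introduced for this sketch.

```rust
fn binding_power(op: char, prefix: bool) -> Option<(u8, u8)> {
    let res = match op {
        '0'..='9' | 'a'..='z' | 'A'..='Z' => (99, 100),
        '(' => (99, 0),
        '=' => (2, 1),
        '+' | '-' if prefix => (99, 9),
        '+' | '-' => (5, 6),
        '*' | '/' => (7, 8),
        '!' => (11, 100),
        '.' => (14, 13),
        _ => return None,
    };
    Some(res)
}

/// Recursive Pratt parser that records tokens instead of building a tree.
/// `tokens` is stored in reverse, so `last()` peeks and `pop()` advances.
/// Returns whether an expression was parsed (stands in for `lhs.is_some()`).
fn expr_bp(tokens: &mut Vec<char>, min_bp: u8, out: &mut Vec<char>) -> bool {
    let mut has_lhs = false;
    loop {
        let token = match tokens.last().copied() {
            Some(token) => token,
            None => return has_lhs,
        };
        let r_bp = match binding_power(token, !has_lhs) {
            Some((l_bp, r_bp)) if min_bp <= l_bp => r_bp,
            _ => return has_lhs,
        };
        tokens.pop();

        let has_rhs = expr_bp(tokens, r_bp, out);
        if token == '(' {
            assert_eq!(tokens.pop(), Some(')'));
            has_lhs = has_rhs;
            continue;
        }

        // This is where the tree-building version creates `S::Cons(token, args)`.
        out.push(token);
        has_lhs = true;
    }
}

/// Hypothetical driver: returns the recorded tokens separated by spaces.
fn rpn(input: &str) -> String {
    let mut tokens: Vec<char> =
        input.chars().filter(|c| !c.is_ascii_whitespace()).collect();
    tokens.reverse();
    let mut out = Vec::new();
    expr_bp(&mut tokens, 0, &mut out);
    out.iter().map(|c| c.to_string()).collect::<Vec<_>>().join(" ")
}
```

For example, `rpn("1 + 2 * 3")` yields `1 2 3 * +`, the same sequence the stack-based version prints.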

+

Q.E.D.

+

The code from this and the previous article is available here: https://github.com/matklad/minipratt.

+
+
+
+ + + + + diff --git a/2020/07/15/two-beautiful-programs.html b/2020/07/15/two-beautiful-programs.html new file mode 100644 index 00000000..e0d2d0c7 --- /dev/null +++ b/2020/07/15/two-beautiful-programs.html @@ -0,0 +1,211 @@ + + + + + + + Two Beautiful Rust Programs + + + + + + + + + + + + +
+ +
+ +
+
+ +

Two Beautiful Rust Programs

+

This is a short ad for the Rust programming language, targeting experienced C++ developers. Being an ad, it will only whet your appetite; consult other resources for the fine print.

+
+ +

+ First Program +

+ +
+ + +
fn main() {
+  let mut xs = vec![1, 2, 3];
+  let x: &i32 = &xs[0];
+  xs.push(92);
+  println!("{}", *x);
+}
+ +
+

This program creates a vector of 32-bit integers (std::vector<int32_t>), takes a reference to the first element, x, pushes one more number onto the vector and then uses x. The program is wrong: extending the vector may reallocate its storage and invalidate references to its elements, so *x might dereference a dangling pointer.

+

The beauty of this program is that it doesn't compile:

+ +
+ + +
error[E0502]: cannot borrow xs as mutable
+    because it is also borrowed as immutable
+ --> src/main.rs:4:5
+
+     let x: &i32 = &xs[0];
+                    -- immutable borrow occurs here
+     xs.push(92);
+     ^^^^^^^^^^^ mutable borrow occurs here
+     println!(x);
+              - immutable borrow later used here
+ +
+

The Rust compiler tracks the aliasing status of every piece of data and forbids mutation of potentially aliased data. In this example, x and xs alias the first integer in the vector's storage on the heap.
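One way to satisfy the compiler is to not hold a reference across the mutation at all. The sketch below is an assumption about how one might restructure the program, not something from the original ad; `first_then_push` is a hypothetical name. It copies the element out before pushing, so no borrow of the vector is alive when `push` runs.

```rust
/// Reads the first element by value, then extends the vector.
/// Panics if `xs` is empty, just like `xs[0]` in the original program.
fn first_then_push(mut xs: Vec<i32>, extra: i32) -> (i32, Vec<i32>) {
    // `i32` is `Copy`, so this takes a copy rather than a reference into `xs`.
    let x: i32 = xs[0];
    // No outstanding borrows of `xs` here, so mutation is allowed.
    xs.push(extra);
    (x, xs)
}
```

Calling `first_then_push(vec![1, 2, 3], 92)` returns the copied first element together with the extended vector.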

+

Rust doesn't allow doing stupid things.

+
+
+ +

+ Second Program +

+ +
+ + +
use std::sync::{Mutex, MutexGuard};
+
+fn main() {
+  let mut counter = Mutex::new(0);
+
+  std::thread::scope(|s| {
+    for _ in 0..10 {
+      s.spawn(|| {
+        for _ in 0..10 {
+          let mut guard: MutexGuard<i32> =
+            counter.lock().unwrap();
+          *guard += 1;
+        }
+      });
+    }
+  });
+
+  let total: &mut i32 = counter.get_mut().unwrap();
+  println!("total = {}", *total)
+}
+ +
+

This program creates an integer counter protected by a mutex, spawns 10 threads, increments the counter 10 times from each thread, and prints the total.

+

The counter variable lives on the stack, and a pointer to this stack data is shared with other threads. The threads have to lock the mutex to do the increments. When printing the total, the counter is read bypassing the mutex, without any synchronization.

+

The beauty of this program is that it relies on several bits of subtle reasoning for correctness, each of which is checked by the compiler:

+
    +
  1. Child threads don't escape the main function and so can read counter from its stack.
  2. Child threads only access counter through the mutex.
  3. Child threads will have terminated by the time we read total out of counter without the mutex.
+

If any of these constraints is broken, the compiler rejects the code. There's no need for std::shared_ptr just to defensively make sure that the memory isn't freed under your feet.

+

Rust allows doing dangerous, clever, and fast things without fear of introducing undefined behavior.

+

If you like what you see, here are two books I recommend for diving deeper into Rust:

+ +
+
+
+ + + + + diff --git a/2020/08/11/things-I-have-learned-about-life.html b/2020/08/11/things-I-have-learned-about-life.html new file mode 100644 index 00000000..a0cf5af4 --- /dev/null +++ b/2020/08/11/things-I-have-learned-about-life.html @@ -0,0 +1,392 @@ + + + + + + + Things I Have Learned About Life + + + + + + + + + + + + +
+ +
+ +
+
+ +

Things I Have Learned About Life

+

Hey, unlike all other articles on this blog, this one isn't about programming, it's about my personal life. It's nothing important, just some thoughts that have been on my mind recently. So, if you come here for technical content, feel free to skip this one!

+

I do, however, intentionally post this together with other articles, for two main reasons:

+ +
+ +

+ Background +

+

I think giving some background info about me would be useful. I come from a middle-class Russian family. I was born in 1992, so my earliest years fell onto a rather fun historical period, of which I don't really remember anything. I grew up in Stavropol, a city of circa 400_000 in the southern part of Russia. After finishing school (I was sixteen), I moved to St. Petersburg to study at the state university there. I spent 10 years in that city before moving last year to Berlin, the place where I currently live.

+

In terms of understanding how life works, I became somewhat actively self-conscious at about 14. The set of important beliefs I learned/discovered then didn't change until about 2017 or so. This latter change (which I feel is still very much ongoing) gives the title to the present article.

+
+
+ +

+ Romance & Polyamory +

+

I guess the biggest deal for me is discovering that polyamory 1) exists 2) is something I've been missing a lot in my interpersonal relations. It's the big one because it most directly affected me, and because the other stuff I've learned, I've learned from my poly partners.

+

In a nutshell, polyamory is the idea that it is OK to love several people at the same time. That if you love A, and also love B, it doesn't mean that your love for A is somehow fake or untrue. I find the analogy with kids illuminating: if it's OK to love both your kids, then it should be OK to love both your partners, right? I highly recommend everyone to read More than Two, on the basis that it's a rare book that directly affected my life, and that it probably would have affected it even if polyamory weren't my thing (which is, of course, totally valid as well!).

+

A more general point is that until 2017, I didn't have a real working model of romantic relationships. I am reasonably sure that a lot of people are in a similar situation: it's hard to encounter a reasonable relationship model in society to learn from! (This might be biased by my culture, but I suspect that it might not).

+

We aren't taught how to be with another person (if we are lucky enough, we are at least taught how to practice safe sex), so we have to learn on our own by observing. One model is the relationships of our parents, which are quite often at least somewhat broken (like in my case). The other model is art, and the portrayal of romance in art is (and this is an uncomfortably strong opinion for me) actively harmful garbage.

+

What I now hold as the most important thing in romantic relations is very clear, direct and honest communication. Honest with yourself and honest with your partner. Honesty includes the ability to feel your genuine needs and desires (as opposed to following the model of what you think you should feel).

+

An example that is near and dear to my heart is when you are in a relationship with A, but there's also this other person B whom you find attractive. Honesty is accepting that "attractive" means "my body (and quite probably my consciousness) wants to have sex with this person" and acting on that observation, rather than pretending that it doesn't exist or shaming yourself into thinking it shouldn't exist.

+

Or a more concrete example: one of my favorite dishes (code-named "the dish I find the most yummy") is bananas mixed with sour cream and quark. My partner O and I enjoyed eating this dish in the morning, and I was usually tasked with preparing it. There are two varieties of quark: a hard grainy one and a soft one. O had a preference for the soft one, so, naturally, I made morning meals using the soft one, because "I don't really care", and eating the same thing is oh sooo romantic. This continued until one day O said "Kladov, stop bullshitting yourself and admit that you love the grainy one. Let's buy both varieties and make two portions". O was totally right. And the thing is, I hadn't even noticed my (useless, stupid, and most egregiously, not called for) sacrifice "for the sake of the relationship" until it was called out by my partner. (In the end, O came to the conclusion that the grainy quark is actually yummier, but that's beside the point).

+

And the depiction of love in art is the opposite of this. Which is understandable: the reason why romance (and death) is featured so prominently in art is that a major component of art's success is its capacity for evoking emotions, and there's little so heart-wrenching as romantic drama (and death). And the "speak with words through the mouth" model of relationships is very good at minimizing drama. (Reminder: this is a non-technical post, so if I say here that something is or isn't, it doesn't mean I've performed due diligence to confirm that it is true). My relations with poly partners were more boring than my relations with monogamous partners. This is great for the people participating, but bad for art (unless it is some kind of slow-cinema piece).

+

Recently, I re-read Anna Karenina by Leo Tolstoy. I highly recommend this novel if you can read Russian. (I am not sure if it is translatable to English; a big part of its merit is the exquisite language). There are two romantic lines there: a passionate, forbidden and fatal love between Anna (who is married) and Vronski (who is not the guy Anna is married to), and a homely love/family of Levin and Kity. The second one is portrayed in a favorable light, as a representative of the isomorphism class of happy families. The scene of engagement between Levin and Kity made my blood boil. They are sitting at the table, with a piece of chalk. Levin feels that it's kind of an appropriate moment to ask Kity to marry him. So he takes the chalk and writes:

+ +
+

к, в, м, о, э, н, м, б, з, л, э, н, и, т

+
+ +
+

Which are the initial letters of a phrase

+ +
+

когда вы мне ответили: этого не может быть, значило ли это, что никогда, или тогда? ("When you answered me: this cannot be, did it mean never, or then?")

+
+ +
+

Which asks about Kity's original rejection of Levin several years ago. Kity decodes this message, and answers in a likewise manner. This dialog continues for some time, at the end of which they are happily engaged, and I am enraged. Such implicit, subtle and ellipsis-based communication is exactly how you wreck any relation.

+

The saddest part here is that I wasn't enraged when I read the book for the first time, when I was 15 or so. Granted, I had a full understanding that the book is about the late XIX century, and that the models of relations are questionable. But still, I think I subconsciously binned Levin and Kity's relationship with "the good ones", and this is why I find art harmful in this respect.

+

My smaller quibble is that sex is both a fetishized and a taboo topic. It's hinted at, today not so subtly, but is rarely shown or studied as a subject of art. Von Trier and Gaspar Noe are two great exceptions among the artists I like.

+
+
+ +

+ The Road There +

+

So, how did I go from a default void model of romance, to my current place, where I know what I want and can actively build my relationships as I like, and not as they are supposed to be? +This is the most fascinating thing about this, and one of the primary reasons for me to write this down for other people to read.

+

I think I am a pretty introspective person: I like to think about things, form models and opinions, adjust and tweak them. And I did think about relationships a lot. And, for example, one conclusion was that I don't really understand jealousy, and I don't want to own or otherwise restrict my partner. I was always OK with the fact that a person I love has a relationship with someone else, both in theory and a couple of times in practice.

+

But I didn't make the jump to "it's OK for me to love more than a single person", and I don't really understand why. It feels like a very simple theorem, which you should just prove yourself. Instead, it took me several chance encounters to get to this truth. (To clarify again, I don't claim that polyamory is a universal truth, this is just something that works for me, you are likely different). Once I got it, it turned out to be obvious and self-evident. But to get it, I needed:

+
    +
  1. A relation with a poly person S, who was literally reading More Than Two when we were together.
  2. A relation with an extremely monogamous (as in, expressing a lot of distress due to jealousy) S.
  3. A relation with another poly person A, at which point it finally clicked that if I like 1 & 3, and don't like 2, then maybe it makes sense for me to read that book as well.
+

So, surprise, it's possible to have some hugely important, but not so subtly broken things in life which were carried over from childhood/early adolescence without reconsidering. If they are pointed out, it's clear how to fix them, but noticing them is the tricky bit.

+
+
+ +

+ Mental Health +

+

Speaking of things which are hard to notice… Surprisingly, mental health exists! Up until very recently, my association for mental health was The Cabinet of Dr. Caligari: something which just doesn't happen in real life. Very, very far from the truth. A lot of people seriously struggle with their minds. Major depression or borderline personality disorder (examples I am familiar with) affect the very way you think, and are not that uncommon. And many people struggle with smaller problems, like anxiety, self-loathing, low self-esteem, etc.

+

My own emotional responses are pretty muted. I'd pass a Voight-Kampff test I guess. Maybe. My own self-esteem is adequate, and I love myself.

+

So, it was eye-opening to realize that this might not be the case for other people. +Empathy is also not my strongest point, hehe.

+

Well, it gets even better than this. +I suspect I might be autistic :-) +Thanks M for pointing that out to me:

+

M: I am autistic. + +A: Wait wat? On the contrary, you are the first person I've met who doesn't seem insane. Wait a second

+

( +Actually, S had made a bet that I am an aspie a couple of years before that… +Apparently, just telling me something important about myself never works? +)

+

To clarify, I've never been to a counselor, so I don't know what labels are applicable to me, if any, but I do think that I can be described as a person demonstrating certain unusual/autistic traits. +They don't bother me (on the contrary, having learned a bit about minds of other people, I feel super lucky about the way my brain works), so I don't think I'd get counseling any time soon. +However, if something in your life bothers you (or even if it doesn't), counseling is probably a good idea to try! +Several people I trust highly recommend it. +Keep in mind that a lot of what is called "psychology" oscillates between science and, well, bullshit, so be careful with your choice. +Check that it is indeed a science-based thing (Cognitive Behavioral Therapy being one of the most properly researched approaches).

+

Anyway, I guess it makes sense to share a bit of my experiences, in case someone reads this and thinks "oh shit, that's me" :-) +Hypothetical me from ten years ago would have appreciated this.

+

I think the single most telling thing is that I am Meursault, from Camus's The Stranger. +I read a lot, but characters rarely make sense to me, even less so than people. +Except for Meursault: I can associate myself with him. +Not because he is in a similar situation to mine, but because I understand the motives of his actions in any given situation.

+

After I had formed a hypothesis that I might have some autistic traits, I thought that Meursault feels very similar to me, and after some googling, presto: +https://www.dovepress.com/camuss-letranger-and-the-first-description-of-a-man-with-aspergers-syn-peer-reviewed-fulltext-article-PRBM

+

Apparently, Meursault had a real-life prototype, Camus's best friend, and it looks like that friend had Asperger's before it was named! +Hey, the hypothesis that I am autistic has predictive power!

+

Another thing where I find myself different from other people is that I am introverted. +Well, a lot of folks I know claim I am introverted, but the amount of social life they have gives me chills :-) +Kladov's radius: the minimal degree of introversion such that you are the most introverted person you know, because for any person more introverted than yourself, you two have zero chance to meet.

+

I don't really have a need for social interactions; I think I like being by myself. +Not uttering a single word in a day (or a weekend) is something which happens to me pretty regularly, and I enjoy that. +By the way, did you know that Gandhi had one day in the week when he spoke to no one?

+

What do I do instead of people? +(formerly) Mathematics, programming, watching good movies, reading good books. +Programming is a big one for the past six years or so: I rather easily lose myself in the state of flow (although my overall productivity is super unstable, and sometimes I can't get anything done for the whole day just because). +I also occasionally get mildly annoyed by the work-life balance articles on reddit (I am thinking about a specific one which contrasted "having a life" with "building a career"). +Of course everyone should do what works best for them. +But if someone codes at work, and then codes at home, it doesn't necessarily mean they are optimizing their salary or are trying to get better at coding or something. +They might just really like writing code, and sometimes practice it during working hours as well, because what else would you do between the meetings?

+

Otherwise, I am pretty uninterested in stuff. +I don't like traveling or trying out new things.

+

I don't have any super specific physical or psychological sensitivities. +I don't go outside of my apartment without headphones; music helps me to create a sort of bubble of my space around myself. +I am pretty easily overwhelmed in groups of people (which is different from not enjoying people generally: I might get overwhelmed even among people I like to be around).

+

My interpersonal relations are funny: I always perceive myself as much colder than the other person (and I project much fewer emotions in stressful situations). +Note that "colder" here is a positive thing: I wish other people were more like me, not the other way around.

+

I am awkward and avoidant of casual social contact. +As in, I don't eat alone in cafes and such, as that means interacting with the waiter. +I do that in company though, where I can just observe and repeat what others are doing.

+

In general, I am pretty happy to be at the place where I am. +Well, I guess it would have helped a tiny bit if I could go to the supermarket in the next building, and not to the one three blocks away where I had already been before and where I know how to behave. +But, really, I perceive these as small things which are not worth fixing.

+
+
+ +

+ Mechanisms rule the world +

+

The next discovery (or rather, subtle shift in the world view) is from a slightly earlier era (2014 maybe?). +I don't believe that "people are X". +Or rather, I believe that it's generally unimportant that "this person is X" when explaining their actions. +I weigh circumstances as relatively more important than personalities when explaining events. +In other words, there are no "good" or "bad" people: the same person can display a wide range of behaviors, depending on the current (not necessarily historical) environment. +This is what I've learned from The Lucifer Effect.

+

More generally, I feel that systems, mechanisms and institutions in place define the broad outlook of the world, and, if something is wrong, we should not make it right, but understand what force makes it wrong, and try to build a counter-mechanism.

+

A specific example here is that, if I see a less than polite/constructive/respectful comment on reddit making a point I disagree with, I answer with two comments. +One is a factual comment about the point being discussed, another one is a templated response along the lines of "I read your comment as _, I find this unacceptable, please avoid antagonistic rhetorical constructs like _".

+

That is:

+
    +
  1. +Clarify my subjective interpretation of the comment. +
  2. +
  3. +State that I don't find it appropriate. +
  4. +
  5. +Point out specific ways to improve the comment. +
  6. +
+

The goal here is not to disagree with a single specific comment or to change the behavior of a single specific commenter so that they write better comments in the future. +The goal is to create a culture which I think promotes healthy discussion, so that, when other people read the exchange, they get a strong signal about what is ok and what is not.

+
+
+ +

+ Mechanisms rule me +

+

A more recent development of this idea is that mechanisms rule me as well (thanks to O again for this one!).

+

Specifically, I now separate my mind from myself. +What my mind feels/wants is not necessarily what I want. +I am not my brain.

+

If I feel a craving for a bit of chocolate, that doesn't mean that I actually want sweets! +It only means that some chemistry in my brain decided that I need to experience the feeling of wanting something sweet right now.

+

An interesting aspect of this is that the "desires" part of our brain is older and more primitive than the "proving theorems" part of our brain. +As it is simpler, it is more reliable and powerful. +So, it takes a disproportionately large amount of willpower to override your primitive wanting brain.

+

This flipped me from "If I want to stop doing X, I'd easily do that" to "ok, I should not start wanting X, otherwise getting rid of that would be a pain". +Somehow, I've never tried alcohol, tobacco or drugs before (yes, I voluntarily moved to Berlin). +There wasn't a strong reason for that, I am totally OK with all those things, it's just that (I guess) I am too introverted to land in a company to start. +However, now I think I would deliberately avoid addictive substances, because I value my thinking about complicated stuff. +And when I am dealing with a hard math-y problem, I don't want to think "and don't drink that extra bottle of beer" on top, as that's too hard.

+

I am less successful with the torrent of low-quality superficial info from the internet. +Luckily, I've never had any social network profiles (I guess for the same reason as with alcohol), but I started reading reddit at some point, and that eats into my attention. +/etc/hosts and RSS help a lot here.

+
+
+ +

+ Rationality? +

+

This discussion about mind, cognitive biases, mechanisms etc. sounds a lot like something from the rationalist community. +I am somewhat superficially familiar with it, and it does sound like a good thing. +If I were to optimize my life to better achieve my goals, I would probably dedicate some time to studying https://www.lesswrong.com/. +Perhaps even me not having any particular goals (besides locally optimizing for what I find the most desirable at any given moment) is some form of a bias?

+
+
+ +

+ Ethics +

+

To conclude, a small, but crisp observation. +I often find myself in emotionally non-neutral debates about whether doing X is good. +If there's an actual disagreement, I tend to find myself on the relatively colder/more cynical side, and my interlocutor on the more empathetic one. +Surprisingly to me, many such disagreements can be traced to a single fundamental difference in the decision-making process.

+

When I make a decision (especially an ethical one), I tend to go for what I feel is right in some abstract sense. +I can't explain this any better, this is really a gut feeling (and is not the categorical imperative, at least not consciously).

+

Apparently, another mode for making ethical decisions is common: weighing the consequences of a specific action in a specific context, and making a decision based on that, without taking the properness of the action itself into consideration.

+

With these two different underlying algorithms, it's pretty easy to heatedly disagree about some specific conclusion! +(Tip: to unearth such deep disagreements more efficiently, use the following rule: as soon as anyone notices that a debate is happening, the debate is paused, and each side explains the position the other side is arguing for).

+
+
+ +

+ Conclusion +

+

I guess that's it for now and the nearest future! +If you have comments, suggestions or just want to say hello, feel free to drop me an email (in GitHub profile) or contact me on Telegram (@matklad).

+
+
+
+ + + + + diff --git a/2020/08/12/who-builds-the-builder.html b/2020/08/12/who-builds-the-builder.html new file mode 100644 index 00000000..82464cf7 --- /dev/null +++ b/2020/08/12/who-builds-the-builder.html @@ -0,0 +1,151 @@ + + + + + + + Who Builds the Builder? + + + + + + + + + + + + +
+ +
+ +
+
+ +

Who Builds the Builder?

+

This is a short note on the builder pattern, or, rather, on the builder method pattern.

+

TL;DR: if you have Foo and FooBuilder, consider adding a builder method to Foo:

+ +
+ + +
struct Foo { ... }
+
+#[derive(Default)]
+struct FooBuilder { ... }
+
+impl Foo {
+    fn builder() -> FooBuilder {
+        FooBuilder::default()
+    }
+}
+
+impl FooBuilder {
+    fn build(self) -> Foo { ... }
+}
+ +
+

A more minimal solution is to rely just on FooBuilder::default or FooBuilder::new. +There are two problems with that:

+

First, it is hard to discover. +Nothing in the docs/signature of Foo mentions FooBuilder, you need to look elsewhere to learn how to create a Foo. +I remember being puzzled at how to create a GlobSet for exactly this reason. +In contrast, the builder method is right there on Foo, probably the first one.

+

Second, it is more annoying to use, as you need to import both Foo and FooBuilder. +With the Foo::builder method, often only one import suffices, as you don't need to name the builder type.
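
To make both points concrete, here is a minimal sketch of the pattern with the elided parts filled in. The fields (name, retries) and the setter methods are hypothetical, chosen only for illustration; the point is that the call site in main never names FooBuilder.

```rust
// Hypothetical fields, chosen only for illustration.
#[derive(Debug, PartialEq)]
pub struct Foo {
    name: String,
    retries: u32,
}

#[derive(Default)]
pub struct FooBuilder {
    name: String,
    retries: u32,
}

impl Foo {
    /// The discoverable entry point: it is listed among `Foo`'s own methods.
    pub fn builder() -> FooBuilder {
        FooBuilder::default()
    }
}

impl FooBuilder {
    pub fn name(mut self, name: &str) -> Self {
        self.name = name.to_string();
        self
    }

    pub fn retries(mut self, retries: u32) -> Self {
        self.retries = retries;
        self
    }

    pub fn build(self) -> Foo {
        Foo { name: self.name, retries: self.retries }
    }
}

fn main() {
    // `FooBuilder` is never named here, so only `Foo` would need importing.
    let foo = Foo::builder().name("demo").retries(3).build();
    println!("{} (retries: {})", foo.name, foo.retries);
}
```

If Foo lived in another crate, the caller would only need to bring Foo itself into scope.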

+

Case studies:

+ +

Discussion on /r/rust.

+
+
+ + + + + diff --git a/2020/08/15/concrete-abstraction.html b/2020/08/15/concrete-abstraction.html new file mode 100644 index 00000000..60f3dc67 --- /dev/null +++ b/2020/08/15/concrete-abstraction.html @@ -0,0 +1,271 @@ + + + + + + + Code Smell: Concrete Abstraction + + + + + + + + + + + + +
+ +
+ +
+
+ +

Code Smell: Concrete Abstraction

+

This is a hand-wavy philosophical article about programming, without quantifiable justification, but with some actionable advice and a case study.

+

Suppose that there are two types in the program, Blorb and Gonk. +Suppose also that they both can blag.

+

Does it make sense to add the following trait?

+ +
+ + +
trait Blag {
+    fn blag(&mut self);
+}
+ +
+

I claim that it makes sense only if you have a function like

+ +
+ + +
fn blagyify<T: Blag>(x: T) {
+    ...
+}
+ +
+

That is, if some part of your program is generic over T: Blag.

+

If in every x.blag() the x is either Blorb, or Gonk, but never a T (each usage is concrete), you don't need this abstraction. +"Need" is used in a literal sense here: replace a trait with two inherent methods named blag, and the code will be essentially the same. +Using a trait here doesn't achieve any semantic compression.
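
A small sketch of the trait-free version, assuming made-up fields for Blorb and Gonk (the article does not say what they contain). Every call site is concrete, so the method-call syntax is unchanged and no trait is required.

```rust
// Hypothetical state for the two types; only the shape of the code matters.
struct Blorb {
    blags: u32,
}

struct Gonk {
    log: Vec<String>,
}

impl Blorb {
    // Inherent method: no `Blag` trait involved.
    fn blag(&mut self) {
        self.blags += 1;
    }
}

impl Gonk {
    // Same name, unrelated implementation, still no trait.
    fn blag(&mut self) {
        self.log.push("blag".to_string());
    }
}

fn main() {
    let mut blorb = Blorb { blags: 0 };
    let mut gonk = Gonk { log: Vec::new() };
    // Call sites look exactly as they would with a trait.
    blorb.blag();
    gonk.blag();
    println!("{} {}", blorb.blags, gonk.log.len());
}
```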

+

Given that abstractions have costs, "don't need" can be strengthened to "probably shouldn't".

+ + +

Not going for an abstraction often allows for a more specific interface. +A monad in Haskell is a thing with >>=. +Which isn't telling much. +Languages like Rust and OCaml can't express a general monad, but they still have concrete monads. +The >>= is called and_then for futures and flat_map for lists. +These names are more specific than >>= and are easier to understand. +The >>= is only required if you want to write code generic over the type of monad itself, which happens rarely.
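
As an illustration using only the standard library: Option::and_then stands in here for the futures case (and_then on futures comes from an external crate), and Iterator::flat_map covers lists. The helper names parse_even and neighbours are invented for this sketch.

```rust
/// Chains two fallible steps with `and_then`: parse, then keep only even numbers.
fn parse_even(s: &str) -> Option<i32> {
    s.parse::<i32>()
        .ok()
        .and_then(|n| if n % 2 == 0 { Some(n) } else { None })
}

/// Expands every element into several with `flat_map`, flattening the result.
fn neighbours(xs: &[i32]) -> Vec<i32> {
    xs.iter().flat_map(|&x| vec![x - 1, x + 1]).collect()
}

fn main() {
    println!("{:?}", parse_even("4"));
    println!("{:?}", neighbours(&[1, 10]));
}
```

Both methods have the shape of >>=, but each name says what happens for that particular type.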

+

Another example of an abstraction which is used mostly concretely is collection hierarchies. +In Java or Scala, there's a whole type hierarchy for things which can hold other things. +Rust's type system can't express a Collection trait, so we have to get by with using Vec, HashSet and BTreeSet directly. +And it isn't actually a problem in practice. +Turns out, writing code which is generic over collections (and not just over iterators) is not that useful. +The "but I can change the collection type later" argument also seems overrated: often, there's only a single collection type that makes sense. +Moreover, swapping HashSet for BTreeSet is mostly just a change at the definition site, as the two happen to have almost identical interfaces anyway. +The only case where I miss Java collections is when I return Vec<T>, but mean a generic unordered collection. +In Java, the difference is captured by List<T> vs Collection<T>. +In Rust, there's nothing built-in for this. +It is possible to define a VecSet<T>(Vec<T>), but it doesn't seem worth the effort.
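
For a sense of what that effort would look like, here is a rough sketch of such a wrapper. The method set is an assumption made for this example; the idea is simply to expose a Vec without promising anything about element order or indexing.

```rust
/// Hypothetical newtype: a `Vec` whose ordering callers must not rely on.
pub struct VecSet<T>(Vec<T>);

impl<T> VecSet<T> {
    pub fn new(items: Vec<T>) -> Self {
        VecSet(items)
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }

    pub fn is_empty(&self) -> bool {
        self.0.is_empty()
    }

    /// Iteration is allowed; indexing by position is deliberately not exposed.
    pub fn iter(&self) -> std::slice::Iter<'_, T> {
        self.0.iter()
    }
}

impl<T: PartialEq> VecSet<T> {
    /// Linear scan, since the backing storage is a plain `Vec`.
    pub fn contains(&self, item: &T) -> bool {
        self.0.contains(item)
    }
}

fn main() {
    let set = VecSet::new(vec![3, 1, 2]);
    println!("{} items, contains 1: {}", set.len(), set.contains(&1));
    println!("empty: {}, sum: {}", set.is_empty(), set.iter().sum::<i32>());
}
```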

+

Collections also suffer from the >>= problem: collapsing similar synonyms under a single name. +Java's +Queue +has add, offer, remove, and poll methods, because it needs to be a collection, but also is a special kind of collection. +In C++, you have to spell push_back for vector's push operation, so that it duck-types with deque's front and back.

+ +

Finally, the promised case study! +rust-analyzer needs to convert a bunch of internal types to types suitable for serializing into JSON messages of the Language Server Protocol. +ra::Completion is converted into lsp::Completion; ra::Completion contains ra::TextRange which is converted to lsp::Range, etc.

+

The first implementation started with an abstraction for conversion:

+ +
+ + +
pub trait Conv {
+    type Output;
+    fn conv(self) -> Self::Output;
+}
+ +
+

This abstraction doesn't work for all cases: sometimes the conversion requires additional context. +For example, to convert a rust-analyzer offset (the position of a byte in the file) to an LSP position (a (line, column) pair), a table with positions of newlines is needed. +This is easy to handle:

+ +
+ + +
pub trait ConvWith<CTX> {
+    type Output;
+    fn conv_with(self, ctx: CTX) -> Self::Output;
+}
+ +
+

Naturally, there was an intricate web of delegating impls. +The typical one looked like this:

+ +
+ + +
impl ConvWith<&LineIndex> for TextRange {
+    type Output = Range;
+    fn conv_with(
+        self,
+        line_index: &LineIndex,
+    ) -> lsp_types::Range {
+        Range::new(
+            self.start().conv_with(line_index),
+            self.end().conv_with(line_index),
+        )
+    }
+}
+ +
+

There were a couple of genuinely generic impls for converting iterators of convertible things.

+

The code was hard to understand. +It also was hard to use: if calling .conv didn't work immediately, it took a lot of time to find which specific impl didn't apply. +Finally, there were many accidental (as in "accidental complexity") changes to the shape of code: CTX being passed by value or by reference, switching between generic parameters and associated types, etc.

+

I was really annoyed by how this conceptually simple, pure boilerplate operation got expressed as a clever and fancy abstraction. +Crucially, almost all of the usages of the abstraction (besides those couple of iterator impls) were concrete. +So I replaced the whole edifice with much simpler code, a bunch of functions:

+ +
+ + +
fn range(
+    line_index: &LineIndex,
+    range: TextRange,
+) -> lsp_types::Range {
+    let start = position(line_index, range.start());
+    let end = position(line_index, range.end());
+    lsp_types::Range::new(start, end)
+}
+
+fn position(
+    line_index: &LineIndex,
+    offset: TextSize,
+) -> lsp_types::Position {
+    ...
+}
+ +
+

Simplicity and ease of use went up tremendously. +Now instead of typing x.conv() and trying to figure out why an impl I think should apply doesn't apply, I just auto-complete to_proto::range and let the compiler tell me exactly which types don't line up.
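
To show how little machinery the function-based style needs, here is a self-contained sketch of an offset-to-position conversion. This is not rust-analyzer's code: the LineIndex below is a simplified, hypothetical stand-in that stores the byte offset at which each line starts, offsets are plain usize, and columns are counted in bytes (LSP counts UTF-16 code units by default, which this sketch ignores).

```rust
/// Hypothetical, simplified stand-in for a line index:
/// the byte offset at which each line starts.
struct LineIndex {
    line_starts: Vec<usize>,
}

impl LineIndex {
    fn new(text: &str) -> LineIndex {
        let mut line_starts = vec![0];
        for (i, b) in text.bytes().enumerate() {
            if b == b'\n' {
                // The next line starts right after the newline byte.
                line_starts.push(i + 1);
            }
        }
        LineIndex { line_starts }
    }
}

/// Converts a byte offset into a zero-based (line, column) pair.
fn position(line_index: &LineIndex, offset: usize) -> (usize, usize) {
    // Find the last line start that is <= offset.
    let line = match line_index.line_starts.binary_search(&offset) {
        Ok(line) => line,
        // `next` is at least 1 because `line_starts[0]` is always 0.
        Err(next) => next - 1,
    };
    (line, offset - line_index.line_starts[line])
}

/// A range is just two positions; the context is passed explicitly.
fn range(line_index: &LineIndex, start: usize, end: usize) -> ((usize, usize), (usize, usize)) {
    (position(line_index, start), position(line_index, end))
}

fn main() {
    let index = LineIndex::new("ab\ncd\n\nx");
    println!("{:?}", range(&index, 1, 4));
}
```

When a call does not type-check, the error points at one concrete function signature instead of a set of candidate impls.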

+

I've lost fancy iterator impls, but the +total diff +for the commit was +999,-1123. +There was some genuine code re-use in those impls, but it was not justified by the overall compression, even disregarding the additional complexity tax.

+

To sum up, "is this abstraction used exclusively concretely?" is a meaningful question about the overall shape of code. +If the answer is "Yes!", then the abstraction can be replaced by a number of equivalent non-abstract implementations. +As the latter tend to be simpler, shorter, and more direct, "Concrete Abstraction" can be considered a code smell. +As usual though, any abstract programming advice can be applied only in a concrete context: don't blindly replace abstractions with concretions, check if the provided justifications work for your particular case!

+

Discussion on /r/rust.

+
+
+ + + + + diff --git a/2020/09/12/rust-in-2021.html b/2020/09/12/rust-in-2021.html new file mode 100644 index 00000000..5c651f61 --- /dev/null +++ b/2020/09/12/rust-in-2021.html @@ -0,0 +1,384 @@ + + + + + + + Rust in 2021 + + + + + + + + + + + + +
+ +
+ +
+
+ +

Rust in 2021

+

This is my response to this year's call for blog posts. +I am writing this as a language implementor, not as a language user. +I also don't try to prioritize the problems. +The two things I'll mention are the things that worry me most without reflecting on the overall state of the project. +They are not necessarily the most important things.

+
+ +

+ Funding Teams to Work on Rust +

+

For the past several years, I've been a maintainer of "Sponsored Open Source" projects (rust-analyzer & IntelliJ Rust). +These projects:

+
    +
  • +have a small number of core developers who work full-time at company X, and whose job is to maintain the project, +
  • +
  • +are explicitly engineered for active open source: +
      +
    • +a significant fraction of maintainers' time goes to contribution documentation, issue mentoring, etc, +
    • +
    • +a non-trivial number of features end up being implemented by the community. +
    • +
    +
  • +
+

This experience taught me that there's a great deal of difference between the work done by the community, and the work done during paid hours. +To put it bluntly, a small team of 2-3 people working full-time on a specific project with a long time horizon can do a lot. +Not because paid hours == higher quality work, but because of the cumulative effect of:

+
    +
  • +being able to focus on a single thing, +
  • +
  • +keeping the project in a mental cache and accumulating knowledge, +
  • +
  • +being able to invest into the code and do long-term planning effectively. +
  • +
+

In other words, community gives breadth of contributions, while paid hours give depth. +Both are important, but I feel that Rust could use a lot of the latter at the moment, in two senses.

+

First, the marginal utility of adding a full-time developer to the Rust project will stay high even after quite a few full-time developers have been added.

+

Second, perhaps more worrying, I have a nagging feeling that the imbalance between community and paid hours can affect the quality of the technical artifact, and not just the speed of development. +The two styles of work lend themselves to different kinds of work actually getting done. +Most of the pull requests I merge are about new features, and some are about bug-fixes. +Most of the pull requests I submit are about refactoring existing code. +Community naturally picks the work of incrementally adding new code, while maintainers can refactor and rewrite existing code. +It's easy to see that, in the limit, this could end with an effectively immutable/append-only code base. +I think we are pretty far from the limit today, but I don't exactly like the current dynamics. +I keep coming back to this Rust 2019 post when I think about this issue.

+

The conclusion from this section is that we should find ways to fund teams of people to focus on improving the Rust programming language. +Through luck, hard work of my colleagues at JetBrains and Ferrous Systems, and my own efforts it became possible to move in this direction for both IntelliJ Rust and rust-analyzer. +This was pretty stressful, and, well, I feel that the marginal utility of one more compiler engineer is still huge in the IDE domain at least.

+
+
+ +

+ Compiling the Compiler +

+

And now to something completely different! +I want this:

+ +
+ + +
$ git clone git@github.com:rust-lang/rust.git && cd rust
+$ cargo t
+info: syncing channel updates for 'beta-x86_64-unknown-linux-gnu'
+info: latest update on 2020-09-10, rust version 1.47.0-beta
+info: downloading component 'cargo'
+info: downloading component 'rustc'
+info: installing component 'cargo'
+info: installing component 'rustc'
+Compiling unicode-xid v0.2.1
+Compiling proc-macro2 v1.0.20
+
+...
+
+Finished test [unoptimized] target(s) in 5m 45s
+  Running target/debug/deps/rustc-bf0145d0690d0fbc
+
+running 9001 tests
+
+...
+
+test result: ok. 9001 passed;  in 1m 3s
+ +
+

That is, I want to simplify working on the compiler itself to it being just a crate. +This section of the article expands on the comment I've made on the +irlo +a while ago.

+

Since a couple of months ago, I am slowly pivoting from doing mostly green-field dev in the rust-analyzer code base to refactoring rustc internals towards merging the two. +The process has been underwhelming, and the slow and complicated build process plays a significant part in this: I feel like my own productivity is at least five times greater when I work on rust-analyzer in comparison to rustc.

+

Before I go into details about my vision here, I want to give shout-outs to +@Mark-Simulacrum, @mark-i-m, and @jyn514 +who already did a lot of work on simplifying the build process in the recent several months.

+

Note that I am going to make a slightly deeper than "Rust in 20XX" dive into the topic, so feel free to skip the rest of the post if technical details about the bootstrapping process are not your cup of tea.

+

Finally, I also should warn that I have an intern's advantage here: I have absolutely no idea how Rust's current build process works, so I describe how it should work from a position of ignorance. Without further ado,

+
+ +

+ How Simple Could the Build Process Be? +

+

rustc is a bootstrapping compiler. +This means that, to compile rustc itself, one needs to have a previous version of rustc available. +This could make the compiler's build process peculiar. +My thesis is that this doesn't need to be the case, and that the compiler could be just a crate.

+

Bootstrapping does make this harder to see though, so, as a thought experiment, let's imagine what rustc's build process would look like were it not written in Rust. +Let's imagine the world where rustc is implemented in Go. +How would one build and test this rust compiler?

+

First, we clone the rust-lang/rust repository. +Then we download the latest version of the Go compiler: as we are shipping rustc binaries to the end user, it's OK to require a cutting-edge compiler. +But there's probably some script or gvm config file to make getting the latest Go compiler easier. +After that, go test builds the compiler and runs the unit tests. +Unit tests take a snippet of Rust code as an input and check that the compiler correctly analyses the snippet: that the parse tree is correct, that diagnostics are emitted, that the borrow checker correctly accepts or rejects certain programs.

+

What we cannot check in this way is that the compiler is capable of producing a real binary which we can run (that is, run-pass tests). +The reason for that is slightly subtle: to produce a binary, the compiler needs to link the tested code with the standard library. +But we've only compiled the compiler, we don't have a standard library yet!

+

So, in addition to unit-tests, we also need somewhat ad-hoc integration tests, which assume that the compiler has been built already, use it to compile the standard library, and then compile, link, and run the corpus of the test programs. +Running std's own #[test] tests is also a part of this integration testing.

+

Now, let's see if the above setup has any bottlenecks:

+
    +
  1. +

    Getting the Go compiler is fast and straightforward. +In fact, it's reasonable to assume that the user already has a recent Go compiler installed, and that they are familiar with standard Go workflows.

    +
  2. +
  3. +

    Compiling rustc would take a little while. +On the one hand, Rust is a big language, and you need to spend quite a few lines of code to implement it. +On the other hand, compilers are very straightforward programs, which don't do a lot of IO, don't have to deal with changing business requirements and don't have a lot of dependencies. +Besides, Go is a language known for fast compile times. +So, spending something like five minutes on a quad-core machine for compiling the compiler seems reasonable.

    +
  4. +
  5. +

    After that, running unit-tests is a breeze: unit-tests do not depend on any state external to the test itself; we are testing pure functions.

    +
  6. +
  7. +

    The first integration test is compiling and #[test]ing std. +As std is relatively small, compiling it with our compiler should be relatively fast.

    +
  8. +
  9. +

    Running tens of thousands of full integration tests will be slow. +Each such test would need to do IO to read the source code, write the executable, and run the process. +It is reasonable to assume that most potential failures are covered by the compiler's and std's unit tests. +But it would be foolish to rely solely on those tests: a fully integrated test suite is important to make sure that the compiler indeed does what it is supposed to, and it is vital for comparing several independent implementations. Who knows, maybe one day we'll rewrite rustc from Go to Rust, and re-using the compiler's unit-tests would be much harder in that context.

    +
  10. +
+

So, it seems like, except for the final integration test suite, there are no complexity/performance bottlenecks in our setup for a from-scratch build. +The problem with the integrated suite can be handled by running a subset of smoke tests by default, and only running the full set of integrated tests on CI. +Testing is embarrassingly parallel, so a beefy CI fleet should handle that just fine.

+

What about incremental builds? +Let's say we want to contribute a change to std. +First time around, this requires building the compiler, which is unfortunate. +This is a one-time cost though, and it shouldn't be prohibitive (or we will have trouble with changes to the compiler itself anyway). +We can also cheat here, and just download some version of rustc from the internet to check std. +This will mostly work, except for the bits where std and rustc need to know about each other (lang items and intrinsics). +For those, we can use #[cfg(not(bootstrap))] in the std to compile different code for older versions of the compiler. +This makes std implementation mind-bending though, so a better alternative might be to just make CI publish the artifacts for the compiler built off the master branch. +That is, if you only contribute to std, you download the latest compiler instead of building it yourself. +We have a trade-off between implementation complexity and compile times.

+

If we want to contribute a change to the compiler, then we are golden as long as it can be checked by the unit-tests (which, again, in theory is everything except for run-pass tests). +If we need to run integrated tests with std, then we need to recompile std with the new compiler, after every change to the compiler. +This is pretty unfortunate, but:

+
    +
  • +if you fundamentally need to recompile std (for example, you change lang-items), there's no way around this, +
  • +
  • +if you don't need to recompile std, then you probably can write an std-less unit-test, +
  • +
  • +as an escape hatch, there might be some kind of KEEP_STDLIB env var, which causes integrated tests to re-use existing std, even if the compiler is newer. +
  • +
+
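
The escape hatch from the last bullet could be as small as the following sketch. Everything here is hypothetical: KEEP_STDLIB is only the name floated above, and the rule "any non-empty value means keep" is an assumption made for illustration.

```rust
use std::env;

/// Pure decision logic, kept separate from the environment lookup
/// so that it is trivial to unit-test.
fn must_rebuild_std(compiler_changed: bool, keep_stdlib: bool) -> bool {
    compiler_changed && !keep_stdlib
}

/// Hypothetical convention: any non-empty value of KEEP_STDLIB
/// means "re-use the existing std even if the compiler is newer".
fn keep_stdlib_requested() -> bool {
    env::var("KEEP_STDLIB")
        .map(|value| !value.is_empty())
        .unwrap_or(false)
}

fn main() {
    // Pretend the compiler was just rebuilt.
    let rebuild = must_rebuild_std(true, keep_stdlib_requested());
    println!("rebuild std: {}", rebuild);
}
```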

To sum up, a compiler is just a program which does some text processing. +In the modern world full of distributed highly-available long-running systems, a compiler is actually a pretty simple program. +It also is fairly easy to test. +The hard bit is not the compiler itself, but the standard library: to even start building the standard library, we need to compile the compiler. +However, most of the compiler can be tested without std, and std itself can be tested using a compiler binary built from the master branch by CI.

+
+
+ +

+ Why Today's Build Process is not Simple? +

+

In theory, it should be possible to replace Go from the last section with Rust, and get a similarly simple bootstrapping compiler. +That is, we would use the latest stable/beta Rust to compile rustc, then we'll use this rustc to compile std, and we are done. +We might add a sanity check: using the freshly built compiler & std, recompile the compiler again and check that everything works. +This is optional, and in a sense just a subset of a crater run, where we check one specific crate: the compiler itself.

+

However, today's build is more complicated than that.

+

First, instead of using a standard distribution of the compiler for bootstrapping, x.py downloads a custom beta toolchain. +This could and should be replaced with using rustup by default.

+

Second, master rustc requires master std to build. +This is the bit which makes rustc not a simple crate. +Remember how before the build started with just compiling the compiler as a usual program? +Today, rustc build starts with compiling master std using the beta compiler, then continues with compiling master rustc using master std and the beta compiler. +So, there's a requirement that std builds with both master and beta compilers, and we also have this weird state where versions of compiler and std we are using to compile the code do not match. In other words, while #[cfg(not(bootstrap))] was an optimization in the previous section (which could be replaced with downloading binary rustc from CI), today it is required.

+

Third, there's not much in the way of unit tests in the compiler. +Almost all tests require std, which means that, to test anything, one needs to rebuild everything.

+

Fourth, LLVM & linkers. +A big part of why compilers are easy to test is the fact that they are, in theory, closed systems interacting with the outside world in a limited well-defined way. +In the real world, however, rustc relies on a bunch of external components to work, the biggest one of which is LLVM. +Luckily, these external components are required only for making the final binary. +The bulk of the compiler, analysis phases which reject invalid programs and lower valid ones, does not need them.

+
+
+ +

+ Specific Improvements +

+

With all this in mind, here are specific steps which I believe would make the build process easier:

+
    +
  • +Gear the overall build process and defaults to the hacking on the compiler use case. +
  • +
  • +By default, rely on rust-toolchain file and rustup to get the beta compiler. +
  • +
  • +Switch from x.py to something like cargo-xtask, to remove dependency on Python. +
  • +
  • +Downgrade rustc's libstd requirements to beta. +Note that this refers solely to the std used to build rustc itself. +rustc will use master std for building user's code. +
  • +
  • +Split compiler and std into separate Cargo workspaces. +
  • +
  • +Make sure that, by default, rustc is using system llvm, or llvm downloaded from a CI server. +Building llvm from source should require explicit opt-in. +
  • +
  • +Make sure that cd compiler && cargo test just works. +
  • +
  • +Add the ability to make a build of the compiler which can run check, but doesn't do llvm-dependent codegen. +
  • +
  • +Split the test suite into cross-platform codegen-less check part, and the fully-integrated part. +
  • +
  • +Split the compiler itself into frontend and codegen parts, such that changes in frontend can be tested without linking backend, and changes in backend can be tested without recompiling the frontend. +
  • +
  • +Stop building std with beta compiler and remove all #[cfg(bootstrap)]. +
  • +
  • +Somehow make cargo test just work in std. +This will require some hackery to plug the logic for "build the compiler from source or download it from CI" somewhere. +
  • +
+

At this stage, we have a compiler which is 100% bog standard crate, and std, which is almost a typical crate (it only requires a very recent compiler to build).

+

After this, we can start the standard procedure to optimize compile and test times, just how you would do for any other Rust project (I am planning to write a couple of posts on these topics). +I have a suspicion that there's a lot of low-hanging fruit there: one of the reasons why I am writing this post is that I've noticed that doctests in std are insanely slow, and that nobody complains about that just because everything else is even slower!

+

This post ended up being too technical for the genre, but, to recap, there seem to be two force multipliers we could leverage to develop Rust itself:

+
    +
  • +Creating a space for small teams of people to work full-time on Rust. +
  • +
  • +Simplifying hacking on the compiler to just cargo test. +
  • +
+

Discussion on /r/rust.

+
+
+
+
+ + + + + diff --git a/2020/09/13/your-language-sucks.html b/2020/09/13/your-language-sucks.html new file mode 100644 index 00000000..963e6252 --- /dev/null +++ b/2020/09/13/your-language-sucks.html @@ -0,0 +1,188 @@ + + + + + + + Your Language Sucks, It Doesn't Matter + + + + + + + + + + + + +
+ +
+ +
+
+ +

Your Language Sucks, It Doesn't Matter

+

This post describes my own pet theory of programming language popularity. +My understanding is that no one knows why some languages are popular and others aren't, so there's no harm done if I add my own thoughts to the overall confusion. +Obviously, this is all wild speculation and a just-so story without any kind of data-backed research.

+

The central thesis is that the actual programming language (syntax, semantics, paradigm) doesn't really matter. +What matters is the characteristics of the runtime: roughly, what does the memory of the running process look like?

+

To start, an observation. +A lot of software is written in vimscript and emacs lisp (magit being one example I can't live without). +And these languages are objectively bad. +This happens even with less esoteric technologies, notable examples being PHP and JavaScript. +While JavaScript is great in some aspects (it's the first mainstream language with lambdas!), it surely isn't hard to imagine a trivially better version of it (for example, without two different nulls).

+

This is a general rule: as soon as you have a language which is Turing-complete, and has some capabilities for building abstractions, people will just get the things done with it. +Surely, some languages are more productive, some are less productive, but, overall, FP vs OOP vs static types vs dynamic types doesn't seem super relevant. +It's always possible to overcome the language by spending some more time writing a program.

+

In contrast, overcoming language runtime is not really possible. +If you want to extend vim, you kinda have to use vimscript. +If you want your code to run in the browser, JavaScript is still the best bet. +Need to embed your code anywhere? GC is probably not an option for you.

+

These two observations lead to the following hypothesis:

+ +

Let's see some examples which can be explained by this theory.

+
+
C
+
+

C has a pretty spartan runtime, which is notable for two reasons. +First, it was the first fast enough runtime for a high-level language. +It was possible to write the OS kernel in C, which had been typically done in assembly before that for performance. +Second, C is the language of Unix. +(And yes, I would put C into the "easily improved upon" category of languages. Null-terminated strings are just a bad design).

+
+
JavaScript
+
+

This language has been exclusive in the browsers for quite some time.

+
+
Java
+
+

This case I think is the most interesting for the theory. +A common explanation for Java's popularity is marketing by Sun, and subsequent introduction of Java into university curricula. +This doesn't seem convincing to me. +Let's look at the popular languages of the 90s (I am not sure about percentage and relative ranking here, but the composition seems broadly correct to me):

+ +
+
https://www.youtube.com/watch?v=Og847HVwRSI
+ + +
+

On this list, Java is the only non-dynamic cross-platform memory safe language. +That is, Java is both memory safe (no manual error-prone memory management) and can be implemented reasonably efficiently (field access is a load and not a dictionary lookup). +This seems like a pretty compelling reason to choose Java, irrespective of what the language itself actually looks like.

+
+
Go
+
+

One can argue whether focus on simplicity at the expense of everything else is good or bad, but statically linked zero dependency binaries definitely were a reason for Go's popularity in the devops sphere. +In a sense, Go is an upgrade over the memory safe & reasonably fast Java runtime, where you no longer need to install a JVM separately.

+
+
+

Naturally, there are also some things which are not explained by my hypothesis. +One is scripting languages. +A highly dynamic runtime with eval and ability to easily link C extensions indeed would be a differentiator, so we would expect a popular scripting language. +However, it's unclear why they are Python and PHP, and not Ruby and Perl.

+

Another one is language evolutions: C++ and TypeScript don't innovate runtime-wise, yet they are still major languages.

+

Finally, let's make some bold predictions using the theory.

+

First, I expect Rust to become a major language, naturally :) +This needs some explanation: at first blush, Rust is runtime-equivalent to C and C++, so the theory should predict just the opposite. +But I would argue that memory safety is a runtime property, despite the fact that it is, uniquely to Rust, achieved exclusively via language machinery.

+

Second, I predict Julia to become more popular. +It's pretty unique, runtime-wise, with its stark rejection of Ousterhout's Dichotomy and insisting that, yeah, we'll just JIT a highly dynamic language to suuuper fast numeric code at runtime.

+

Third, I wouldn't be surprised if Dart grows. +On the one hand, it's roughly in the same boat as Go and Java, with memory safe runtime with fixed layout of objects and pervasive dynamic dispatch. +But the quality of implementation of the runtimes is staggering: it has first-class JIT, AOT and JS compilers. +Moreover, it has top-notch hot-reload support. +Nothing here is a breakthrough, but the combination is impressive.

+

Fourth, I predict that Nim, Crystal and Zig (which is very interesting, language design wise) would not become popular.

+

Fifth, I predict that Swift will be pretty popular on Apple hardware due to platform exclusivity, but won't grow much outside of it, despite being very innovative in language design (generics in Swift are the opposite of the generics in Go).

+
+
+ + + + + diff --git a/2020/09/20/why-not-rust.html b/2020/09/20/why-not-rust.html new file mode 100644 index 00000000..c39af749 --- /dev/null +++ b/2020/09/20/why-not-rust.html @@ -0,0 +1,258 @@ + + + + + + + Why Not Rust? + + + + + + + + + + + + +
+ +
+ +
+
+ +

Why Not Rust?

+

I've recently read an article criticizing Rust, and, while it made a bunch of good points, I didn't enjoy it: it was an easy-to-argue-with piece. +In general, I feel that I can't recommend an article criticizing Rust. +This is a shame: confronting drawbacks is important, and debunking low-effort/misinformed attempts at critique sadly inoculates against actually good arguments.

+

So, here's my attempt to argue against Rust:

+
+
Not All Programming is Systems Programming
+
+

Rust is a systems programming language. +It offers precise control over data layout and runtime behavior of the code, granting you maximal performance and flexibility. +Unlike other systems programming languages, it also provides memory safety: buggy programs terminate in a well-defined manner, instead of unleashing (potentially security-sensitive) undefined behavior.

+

However, in many (most) cases, one doesn't need ultimate performance or control over hardware resources. +For these situations, modern managed languages like Kotlin or Go offer decent speed, enviable +time to performance, and are memory safe by virtue of using a garbage collector for dynamic memory management.

+
+
Complexity
+
+

Programmers' time is valuable, and, if you pick Rust, expect to spend some of it on learning the ropes. +The Rust community poured a lot of time into creating high-quality teaching materials, but the Rust language is big. +Even if a Rust implementation would provide value for you, you might not have resources to invest into growing the language expertise.

+

Rust's price for improved control is the curse of choice:

+ +
+ + +
struct Foo     { bar: Bar         }
+struct Foo<'a> { bar: &'a Bar     }
+struct Foo<'a> { bar: &'a mut Bar }
+struct Foo     { bar: Box<Bar>    }
+struct Foo     { bar: Rc<Bar>     }
+struct Foo     { bar: Arc<Bar>    }
+ +
+

In Kotlin, you write class Foo(val bar: Bar), and proceed with solving your business problem. +In Rust, there are choices to be made, some important enough to have dedicated syntax.

+

All this complexity is there for a reason: we don't know how to create a simpler memory safe low-level language. +But not every task requires a low-level language to solve it.

+

See also Why C++ Sails When the Vasa Sank.

+
+
Compile Times
+
+

Compile times are a multiplier for everything. +A program written in a slower to run but faster to compile programming language can be faster to run because the programmer will have more time to optimize!

+

Rust intentionally picked slow compilers in the generics dilemma. +This is not necessarily the end of the world (the resulting runtime performance improvements are real), but it does mean that you'll have to fight tooth and nail for reasonable build times in larger projects.

+

rustc implements what is probably the most advanced incremental compilation algorithm in production compilers, but this feels a bit like fighting with the language's compilation model.

+

Unlike C++, Rust build is not embarrassingly parallel; the amount of parallelism is limited by length of the critical path in the dependency graph. +If you have 40+ cores to compile, this shows.

+

Rust also lacks an analog for the pimpl idiom, which means that changing a crate requires recompiling (and not just relinking) all of its reverse dependencies.

+
+
Maturity
+
+

Five years old, Rust is definitely a young language. +Even though its future looks bright, I will bet more money on "C will be around in ten years" than on "Rust will be around in ten years" +(See Lindy Effect). +If you are writing software to last decades, you should seriously consider risks associated with picking new technologies. +(But keep in mind that picking Java over Cobol for banking software in the 90s retrospectively turned out to be the right choice).

+

There's only one complete implementation of Rust: the rustc compiler. +The most advanced alternative implementation, mrustc, purposefully omits many static safety checks. +rustc at the moment supports only a single production-ready backend: LLVM. +Hence, its support for CPU architectures is narrower than that of C, which has a GCC implementation as well as a number of vendor specific proprietary compilers.

+

Finally, Rust lacks an official specification. +The reference is a work in progress, and does not yet document all the fine implementation details.

+
+
Alternatives
+
+

There are other languages besides Rust in systems programming space, notably, C, C++, and Ada.

+

Modern C++ provides tools and guidelines for improving safety. +There's even a proposal for a Rust-like lifetimes mechanism! +Unlike Rust, using these tools does not guarantee the absence of memory safety issues. +Modern C++ is safer, Rust is safe. +However, if you already maintain a large body of C++ code, it makes sense to check if following best practices and using sanitizers helps with security issues. +This is hard, but clearly is easier than rewriting in another language!

+

If you use C, you can use formal methods to prove the absence of undefined behaviors, or just exhaustively test everything.

+

Ada is memory safe if you don't use dynamic memory (never call free).

+

Rust is an interesting point on the cost/safety curve, but is far from the only one!

+
+
Tooling
+
+

Rust tooling is a bit of a hit and miss. +The baseline tooling, the compiler and the build system +(cargo), are often cited as best in class.

+

But, for example, some runtime-related tools (most notably, heap profiling) are just absent: it's hard to reflect on the runtime of the program if there's no runtime! +Additionally, while IDE support is decent, it is nowhere near the Java-level of reliability. +Automated complex refactors of multi-million line programs are not possible in Rust today.

+
+
Integration
+
+

Whatever the Rust promise is, it's a fact of life that today's systems programming world speaks C, and is inhabited by C and C++. +Rust intentionally doesn't try to mimic these languages: it doesn't use C++-style classes or C ABI.

+

That means that integration between the worlds needs explicit bridges. +These are not seamless. +They are unsafe, not always completely zero-cost and need to be synchronized between the languages. +While the general promise of piece-wise integration holds up and the tooling catches up, there is accidental complexity along the way.

+

One specific gotcha is that Cargo's opinionated world view (which is a blessing for pure Rust projects) might make it harder to integrate with a bigger build system.

+
+
Performance
+
+

Using LLVM is not a universal solution to all performance problems. +While I am not aware of benchmarks comparing performance of C++ and Rust at scale, it's not too hard to come up with a list of cases where Rust leaves some performance on the table relative to C++.

+

The biggest one is probably the fact that Rust's move semantics is based on values (memcpy at the machine code level). +In contrast, C++ semantics uses special references you can steal data from (pointers at the machine code level). +In theory, compiler should be able to see through chain of copies; in practice it often doesn't: #57077. +A related problem is the absence of placement new: Rust sometimes needs to copy bytes to/from the stack, while C++ can construct the thing in place.

+

Somewhat amusingly, Rust's default ABI (which is not stable, to make it as efficient as possible) is sometimes worse than that of C: #26494.

+

Finally, while in theory Rust code should be more efficient due to the significantly richer aliasing information, enabling aliasing-related optimizations triggers LLVM bugs and miscompilations: #54878.

+

But, to reiterate, these are cherry-picked examples; sometimes the field is tilted the other way. +For example, std::unique_ptr has a performance problem which Rust's Box lacks.

+

A potentially bigger issue is that Rust, with its definition time checked generics, is less expressive than C++. +So, some C++ template tricks for high performance are not expressible in Rust using a nice syntax.

+
+
Meaning of Unsafe
+
+

An idea which is even more core to Rust than ownership & borrowing is perhaps that of the unsafe boundary. +The idea is that, by delineating all dangerous operations behind unsafe blocks and functions and insisting on providing a safe higher-level interface to them, it is possible to create a system which is both

+
    +
  1. +sound (non-unsafe code can't cause undefined behavior), +
  2. +
  3. +and modular (different unsafe blocks can be checked separately). +
  4. +
+

It's pretty clear that the promise works out in practice: fuzzing Rust code unearths panics, not buffer overruns.

+

But the theoretical outlook is not as rosy.

+

First, there's no definition of the Rust memory model, so it is impossible to formally check if a given unsafe block is valid or not. +There's an informal definition of things rustc does or might rely on and an in-progress runtime verifier, but the actual model is in flux. +So there might be some unsafe code somewhere which works OK in practice today, might be declared invalid tomorrow, and broken by a new compiler optimization next year.

+

Second, there's also an observation that unsafe blocks are not, in fact, modular. +Sufficiently powerful unsafe blocks can, in effect, extend the language. +Two such extensions might be fine in isolation, but lead to undefined behavior if used simultaneously: +Observational equivalence and unsafe code.

+

Finally, there are outright bugs in the compiler.

+
+
+
+

Here are some things I have deliberately omitted from the list:

+ +

Discussion on /r/rust.

+
+
+ + + + + diff --git a/2020/10/03/fast-thread-locals-in-rust.html b/2020/10/03/fast-thread-locals-in-rust.html new file mode 100644 index 00000000..ab93a386 --- /dev/null +++ b/2020/10/03/fast-thread-locals-in-rust.html @@ -0,0 +1,359 @@ + + + + + + + Fast Thread Locals In Rust + + + + + + + + + + + + +
+ +
+ +
+
+ +

Fast Thread Locals In Rust

+

Rust thread-locals are slower than they could be. +This is because they violate the zero-cost abstraction principle, specifically the "you don't pay for what you don't use" part.

+

Rust's thread-local implementation( +1, +2 +) comes with built-in support for laziness: thread locals are initialized on the first access. +Sometimes this overhead is a big deal, as thread locals are a common tool for writing high-performance code. +For example, allocator fast path often involves looking into thread-local heap.

+

There's an unstable #[thread_local] attribute for a zero-cost implementation +(see the tracking issue).

+

Let's see how much the "is the thread local initialized?" check costs by comparing these two programs:

+ +
+
./src/main.rs
+ + +
use std::{cell::Cell, time::Instant};
+
+thread_local! {
+  static COUNTER: Cell<u32> = Cell::new(0);
+}
+
+const STEPS: u32 = 1_000_000_000;
+fn sum_rust() -> u32 {
+  for step in 0..STEPS {
+    COUNTER.with(|it| {
+      let inc = step.wrapping_mul(step) ^ step;
+      it.set(it.get().wrapping_add(inc))
+    })
+  }
+  COUNTER.with(|it| it.get())
+}
+
+fn main() {
+  let t = Instant::now();
+  let r = sum_rust();
+  eprintln!("Rust:   {} {}ms", r, t.elapsed().as_millis());
+}
+ +
+ +
+
./src/main.c
+ + +
#define _POSIX_C_SOURCE 200809L
+
+#include "inttypes.h"
+#include "stdint.h"
+#include "stdio.h"
+#include "threads.h"
+#include "time.h"
+
+thread_local uint32_t COUNTER = 0;
+
+const uint32_t STEPS = 1000000000;
+
+uint32_t sum_c() {
+  for (uint32_t step = 0; step < STEPS; step++) {
+    uint32_t inc = (step * step) ^ step;
+    COUNTER += inc;
+  }
+  return COUNTER;
+}
+
+uint64_t now_ms() {
+  struct timespec spec;
+  clock_gettime(CLOCK_MONOTONIC, &spec);
+  return spec.tv_sec * 1000 + spec.tv_nsec / 1000000;
+}
+
+int main(void) {
+  uint64_t t = now_ms();
+  uint32_t r = sum_c();
+  printf("C:      %" PRIu32 " %"PRIu64"ms\n", r, now_ms() - t);
+  return 0;
+}
+ +
+

In this test, we declare an integer thread-local variable, and use it as an accumulator for the summation.

+

We use a non-trivial summation term: (step * step) ^ step. This is to prevent LLVM from evaluating the sum at compile time. +If a term of a summation is a polynomial (like 1, step or step * step), then the sum itself is a one degree higher polynomial, and LLVM can figure this out! +We rely on wrapping overflow of unsigned integers in C, and use wrapping_mul and wrapping_add in Rust. +To make sure that both programs are equivalent, we also print the result.
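As a worked illustration of the polynomial case (standard closed forms over the integers, ignoring the wrapping arithmetic the benchmark relies on), each sum of a degree-d term is a polynomial of degree d + 1 in the number of steps n:

```latex
\sum_{k=0}^{n-1} 1 = n, \qquad
\sum_{k=0}^{n-1} k = \frac{n(n-1)}{2}, \qquad
\sum_{k=0}^{n-1} k^2 = \frac{(n-1)\,n\,(2n-1)}{6}
```

XOR is not a polynomial operation, so no closed form of this kind is available for (step * step) ^ step, and the loop has to actually run.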

+

One optimization we specifically don't protect from is caching thread-local access. +That is, instead of doing a billion thread-local loads and stores, the compiler could generate code to compute the sum into a local variable, and do a single store at the end. +This is because "can the compiler optimize thread-local access?" is exactly the property we want to measure.
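To make the caching transformation concrete, here is a minimal sketch of what it would look like if done by hand. The function names `sum_per_step` and `sum_hoisted` and the small `STEPS` value are illustrative assumptions, not part of the benchmark; whether the compiler actually performs this rewrite is exactly what the benchmark probes.

```rust
use std::cell::Cell;

thread_local! {
    static COUNTER: Cell<u32> = Cell::new(0);
}

// Hypothetical small step count so the sketch runs instantly.
const STEPS: u32 = 1_000;

/// One thread-local load and store per iteration, as in the benchmark.
fn sum_per_step() -> u32 {
    COUNTER.with(|it| it.set(0));
    for step in 0..STEPS {
        COUNTER.with(|it| {
            let inc = step.wrapping_mul(step) ^ step;
            it.set(it.get().wrapping_add(inc))
        })
    }
    COUNTER.with(|it| it.get())
}

/// Accumulate in a plain local, then touch the thread-local once at the end.
fn sum_hoisted() -> u32 {
    let mut acc = 0u32;
    for step in 0..STEPS {
        let inc = step.wrapping_mul(step) ^ step;
        acc = acc.wrapping_add(inc);
    }
    COUNTER.with(|it| it.set(acc));
    COUNTER.with(|it| it.get())
}

fn main() {
    // Both variants compute the same wrapped sum.
    assert_eq!(sum_per_step(), sum_hoisted());
    println!("both variants agree: {}", sum_hoisted());
}
```

Both functions observe the same final value; they differ only in how often the thread-local is accessed.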

+

There's no standard way to get monotonic wall-clock time in C, so the C version is not cross-platform.

+

This code gives the following results on my machine:

+ +
+ + +
$ cargo build --release -q        && ./target/release/ftl
+Rust:   62565888 487ms
+$ clang -std=c17 -O3 ./src/main.c && ./a.out
+C:      62565888 239ms
+ +
+

This benchmark doesn't allow us to measure the cost of thread-local access per se, but the overall time is about 2x longer for Rust.

+

Can we make Rust faster? +I don't know how to do that, but I know how to cheat. +We can apply a general Rust extension trick: write some C code and link it with Rust!

+

Let's implement a simple C library which declares a thread-local and provides access to it:

+ +
+
../src/thread_local.c
+ + +
#include "stdint.h"
+#include "threads.h"
+
+thread_local uint32_t COUNTER = 0;
+
+uint32_t* get_thread_local() {
+  return &COUNTER;
+}
+ +
+

Link it with Rust:

+ +
+
../build.rs
+ + +
use std::{env, path::Path, process::Command};
+
+fn main() {
+  let out_dir = env::var("OUT_DIR").unwrap();
+
+  Command::new("clang")
+    .args(&[ "src/thread_local.c", "-O3", "-c", "-o"])
+    .arg(&format!("{}/thread_local.o", out_dir))
+    .status()
+    .unwrap();
+  Command::new("ar")
+    .args(&["crus", "libthread_local.a", "thread_local.o"])
+    .current_dir(&Path::new(&out_dir))
+    .status()
+    .unwrap();
+
+  println!("cargo:rustc-link-search=native={}", out_dir);
+  println!("cargo:rustc-link-lib=static=thread_local");
+  println!("cargo:rerun-if-changed=src/thread_local.c");
+}
+ +
+

And use it:

+ +
+
../src/main.rs
+ + +
fn with_counter<T>(f: impl FnOnce(&Cell<u32>) -> T) -> T {
+  extern "C" { fn get_thread_local() -> *mut u32; }
+  let counter =
+    unsafe { &*(get_thread_local() as *mut Cell<u32>) };
+  f(&counter)
+}
+
+fn sum_rust_c() -> u32 {
+  for step in 0..STEPS {
+    with_counter(|it| {
+      let inc = step.wrapping_mul(step) ^ step;
+      it.set(it.get().wrapping_add(inc))
+    })
+  }
+  with_counter(|it| it.get())
+}
+ +
+

The results are underwhelming:

+ +
+ + +
C:               62565888 239ms
+Rust:            62565888 485ms
+Rust/C:          62565888 1198ms
+ +
+

This is expected: we replaced access to a thread local with a function call. +As we are crossing the language boundary, the compiler can't inline it, which destroys performance. +However, there's a way around that: Rust allows cross-language Link Time Optimization (docs). +That is, Rust and C compilers can cooperate, to allow the linker to do inlining across the languages.

+

This requires manually aligning a bunch of stars:

+ +

Now, just recompiling the old code gives the same performance for C and Rust:

+ +
+ + +
C:               62565888 240ms
+Rust:            62565888 495ms
+Rust/C:          62565888 241ms
+ +
+

Interestingly, this is the same performance we get without any thread-locals at all:

+ +
+ + +
fn sum_local() -> u32 {
+  let mut counter = 0u32;
+  for step in 0..STEPS {
+    let inc = step.wrapping_mul(step) ^ step;
+    counter = counter.wrapping_add(inc)
+  }
+  counter
+}
+ +
+

So, either the compiler/linker was able to lift thread-local access out of the loop, or its cost is masked by the arithmetic.

+

Full code for the benchmarks is available at https://github.com/matklad/ftl. +Note that this research only scratches the surface of the topic: thread locals are implemented differently on different OSes. +Even on a single OS, there are differences depending on compilation flags (dynamic libraries differ from static libraries, for example). +Looking at the generated assembly could also be illuminating (code on Compiler Explorer).

+

Discussion on /r/rust.

+

Update(2023-12-18): since writing this post, Rust gained the ability to opt out of lazy +initialization semantics by using a const block in the thread_local macro:

+ +
+ + +
thread_local! {
+  static COUNTER: Cell<u32> = const { Cell::new(0) };
+}
+ +
+

This removes the overhead measured in this article. Note that in this case const { is a feature of +the thread_local macro. That is, const { is parsed specifically by the declarative macro +machinery, it is not a part of a more general (currently unstable) const block syntax.
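A small runnable sketch of the const form in use. The helper `bump` is a hypothetical name introduced only for illustration; the point is that access still goes through the usual `with` API and each thread still gets its own copy, only the initialization strategy changes.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // `const { ... }` asks the macro for a constant initializer,
    // so no lazy-initialization check is needed on access.
    static COUNTER: Cell<u32> = const { Cell::new(0) };
}

/// Hypothetical helper: add `by` to this thread's counter and return the new value.
fn bump(by: u32) -> u32 {
    COUNTER.with(|it| {
        it.set(it.get().wrapping_add(by));
        it.get()
    })
}

fn main() {
    let before = bump(0);
    assert_eq!(bump(5), before.wrapping_add(5));

    // A freshly spawned thread starts from the constant initial value.
    let other = thread::spawn(|| bump(10)).join().unwrap();
    assert_eq!(other, 10);
    println!("main thread: {}, other thread: {}", bump(0), other);
}
```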

+
+
+ + + + + diff --git a/2020/10/15/study-of-std-io-error.html b/2020/10/15/study-of-std-io-error.html new file mode 100644 index 00000000..7d133089 --- /dev/null +++ b/2020/10/15/study-of-std-io-error.html @@ -0,0 +1,582 @@ + + + + + + + Study of std::io::Error + + + + + + + + + + + + +
+ +
+ +
+
+ +

Study of std::io::Error

+

In this article we'll dissect the implementation of the std::io::Error type from Rust's standard library. +The code in question is here: +library/std/src/io/error.rs.

+

You can read this post as either of:

+
    +
  1. +A study of a specific bit of standard library. +
  2. +
  3. +An advanced error management guide. +
  4. +
  5. +A case of a beautiful API design. +
  6. +
+

The article requires basic familiarity with Rust error handling.

+
+

When designing an Error type for use with Result<T, E>, the main question to ask is "how will the error be used?". +Usually, one of the following is true.

+ +

Note that there's a tension between exposing implementation details and encapsulating them. A common anti-pattern for implementing the first case is to define a kitchen-sink enum:

+ +
+ + +
pub enum Error {
+  Tokio(tokio::io::Error),
+  ConnectionDiscovery {
+    path: PathBuf,
+    reason: String,
+    stderr: String,
+  },
+  Deserialize {
+    source: serde_json::Error,
+    data: String,
+  },
+  ...,
+  Generic(String),
+}
+ +
+

There are a number of problems with this approach.

+

First, exposing errors from underlying libraries makes them a part of your public API. +Major semver bump in your dependency would require you to make a new major version as well.

+

Second, it sets all the implementation details in stone. +For example, if you notice that the size of ConnectionDiscovery is huge, boxing this variant would be a breaking change.

+

Third, it is usually indicative of a larger design issue. +Kitchen sink errors pack dissimilar failure modes into one type. +But, if failure modes vary widely, it probably isn't reasonable to handle them! +This is an indication that the situation looks more like case two.

+ + +

However bad the enum approach might be, it does achieve maximum inspectability of the first case.

+

The propagation-centered second case of error management is typically handled by using a boxed trait object. +A type like Box<dyn std::error::Error> can be constructed from any specific concrete error, can be printed via Display, and can still optionally expose the underlying error via dynamic downcasting. +The anyhow crate is a great example of this style.
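A minimal sketch of this propagation-centered style. The function `parse_port` and its messages are hypothetical, chosen only to show the three properties named above: construction from any concrete error, printing via Display, and optional dynamic downcasting.

```rust
use std::error::Error;
use std::num::ParseIntError;

/// Hypothetical function: any concrete error is boxed on the way out.
fn parse_port(text: &str) -> Result<u16, Box<dyn Error>> {
    // `?` converts the concrete `ParseIntError` into `Box<dyn Error>`.
    let port: u16 = text.trim().parse()?;
    if port == 0 {
        // A plain string also converts into a boxed error.
        return Err("port must be non-zero".into());
    }
    Ok(port)
}

fn main() {
    assert_eq!(parse_port("8080").unwrap(), 8080);

    let err = parse_port("not a number").unwrap_err();
    // Printing works for any boxed error...
    println!("failed: {}", err);
    // ...and the underlying type is still reachable by downcasting.
    assert!(err.downcast_ref::<ParseIntError>().is_some());

    let err = parse_port("0").unwrap_err();
    assert!(err.downcast_ref::<ParseIntError>().is_none());
}
```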

+

The case of std::io::Error is interesting because it wants to be both of the above and more.

+ +

Heres what std::io::Error looks like:

+ +
+ + +
pub struct Error {
+  repr: Repr,
+}
+
+enum Repr {
+  Os(i32),
+  Simple(ErrorKind),
+  Custom(Box<Custom>),
+}
+
+struct Custom {
+  kind: ErrorKind,
+  error: Box<dyn error::Error + Send + Sync>,
+}
+ +
+

First thing to notice is that it's an enum internally, but this is a well-hidden implementation detail. +To allow inspecting and handling of various error conditions there's a separate public fieldless kind enum:

+ +
+ + +
#[derive(Clone, Copy)]
+#[non_exhaustive]
+pub enum ErrorKind {
+  NotFound,
+  PermissionDenied,
+  Interrupted,
+  ...
+  Other,
+}
+
+impl Error {
+  pub fn kind(&self) -> ErrorKind {
+    match &self.repr {
+      Repr::Os(code) => sys::decode_error_kind(*code),
+      Repr::Custom(c) => c.kind,
+      Repr::Simple(kind) => *kind,
+    }
+  }
+}
+ +
+

Although both ErrorKind and Repr are enums, publicly exposing ErrorKind is much less scary. +The design space of a #[non_exhaustive] Copy fieldless enum is a single point: there are no plausible alternatives or compatibility hazards.

+

Some io::Errors are just raw OS error codes:

+ +
+ + +
impl Error {
+  pub fn from_raw_os_error(code: i32) -> Error {
+    Error { repr: Repr::Os(code) }
+  }
+  pub fn raw_os_error(&self) -> Option<i32> {
+    match self.repr {
+      Repr::Os(i) => Some(i),
+      Repr::Custom(..) => None,
+      Repr::Simple(..) => None,
+    }
+  }
+}
+ +
+

Platform-specific sys::decode_error_kind function takes care of mapping error codes to ErrorKind enum. +All this together means that code can handle error categories in a cross-platform way by inspecting the .kind(). +However, if the need arises to handle a very specific error code in an OS-dependent way, that is also possible. +The API carefully provides a convenient abstraction without abstracting away important low-level details.
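As a usage sketch of the two levels of inspection: the helper `open_optional` and the path are hypothetical, while `kind()` and `raw_os_error()` are the public methods shown above.

```rust
use std::fs::File;
use std::io::{self, ErrorKind};

/// Hypothetical helper: open a file, treating "missing" as `None`.
/// The check is cross-platform because it goes through `.kind()`.
fn open_optional(path: &str) -> io::Result<Option<File>> {
    match File::open(path) {
        Ok(file) => Ok(Some(file)),
        Err(err) if err.kind() == ErrorKind::NotFound => Ok(None),
        Err(err) => Err(err),
    }
}

fn main() {
    // Assumed not to exist on the machine running this sketch.
    let path = "/definitely/not/a/real/path";
    assert!(open_optional(path).unwrap().is_none());

    // The raw OS code is still available for OS-specific handling.
    let err = File::open(path).unwrap_err();
    println!("kind: {:?}, raw code: {:?}", err.kind(), err.raw_os_error());
    assert!(err.raw_os_error().is_some());
}
```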

+

An std::io::Error can also be constructed from an ErrorKind:

+ +
+ + +
impl From<ErrorKind> for Error {
+  fn from(kind: ErrorKind) -> Error {
+    Error { repr: Repr::Simple(kind) }
+  }
+}
+ +
+

This provides cross-platform access to error-code style error handling. +This is handy if you need the fastest possible errors.
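A short sketch of this error-code style. The function `check_len` is a hypothetical example; the conversion itself is the `From<ErrorKind>` impl shown above, and it carries neither an OS code nor a boxed payload.

```rust
use std::io::{self, ErrorKind};

/// Hypothetical check that fails with a bare `ErrorKind`.
fn check_len(buf: &[u8], want: usize) -> io::Result<()> {
    if buf.len() < want {
        return Err(ErrorKind::UnexpectedEof.into());
    }
    Ok(())
}

fn main() {
    assert!(check_len(&[1, 2, 3], 2).is_ok());

    let err = check_len(&[1], 4).unwrap_err();
    assert_eq!(err.kind(), ErrorKind::UnexpectedEof);
    // There is no OS code and no custom payload behind this error.
    assert!(err.raw_os_error().is_none());
    assert!(err.get_ref().is_none());
    println!("error: {}", err);
}
```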

+

Finally, there's a third, fully custom variant of the representation:

+ +
+ + +
impl Error {
+  pub fn new<E>(kind: ErrorKind, error: E) -> Error
+  where
+    E: Into<Box<dyn error::Error + Send + Sync>>,
+  {
+    Self::_new(kind, error.into())
+  }
+
+  fn _new(
+    kind: ErrorKind,
+    error: Box<dyn error::Error + Send + Sync>,
+  ) -> Error {
+    Error {
+      repr: Repr::Custom(Box::new(Custom { kind, error })),
+    }
+  }
+
+  pub fn get_ref(
+    &self,
+  ) -> Option<&(dyn error::Error + Send + Sync + 'static)> {
+    match &self.repr {
+      Repr::Os(..) => None,
+      Repr::Simple(..) => None,
+      Repr::Custom(c) => Some(&*c.error),
+    }
+  }
+
+  pub fn into_inner(
+    self,
+  ) -> Option<Box<dyn error::Error + Send + Sync>> {
+    match self.repr {
+      Repr::Os(..) => None,
+      Repr::Simple(..) => None,
+      Repr::Custom(c) => Some(c.error),
+    }
+  }
+}
+ +
+

Things to note:

+ +

Similarly, Display implementation reveals the most important details about internal representation.

+ +
+ + +
impl fmt::Display for Error {
+  fn fmt(&self, fmt: &mut fmt::Formatter<'_>) -> fmt::Result {
+    match &self.repr {
+      Repr::Os(code) => {
+        let detail = sys::os::error_string(*code);
+        write!(fmt, "{} (os error {})", detail, code)
+      }
+      Repr::Simple(kind) => write!(fmt, "{}", kind.as_str()),
+      Repr::Custom(c) => c.error.fmt(fmt),
+    }
+  }
+}
+ +
+

To sum up, std::io::Error:

+ +

The last point means that io::Error can be used for ad-hoc errors, as &str and String are convertible to Box<dyn std::error::Error>:

+ +
+ + +
io::Error::new(io::ErrorKind::Other, "something went wrong")
+ +
+

It also can be used as a simple replacement for anyhow. +I think some libraries might simplify their error handling with this:

+ +
+ + +
io::Error::new(io::ErrorKind::InvalidData, my_specific_error)
+ +
+
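To make the round trip concrete, here is a minimal sketch of stashing a domain error into `io::Error` and getting it back. `ParseError`, `wrap`, and `unwrap_parse_error` are hypothetical names standing in for `my_specific_error` and its surrounding code; only the `io::Error` and `Box<dyn Error>` methods come from std.

```rust
use std::{error::Error, fmt, io};

/// A made-up domain error, standing in for `my_specific_error`.
#[derive(Debug, PartialEq)]
struct ParseError {
    line: usize,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "parse error on line {}", self.line)
    }
}

impl Error for ParseError {}

/// Stashes the domain error inside an `io::Error` (the `Custom` variant).
fn wrap(err: ParseError) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, err)
}

/// Recovers the domain error, if that is what the `io::Error` carries.
fn unwrap_parse_error(err: io::Error) -> Option<ParseError> {
    let inner = err.into_inner()?;
    inner.downcast::<ParseError>().ok().map(|boxed| *boxed)
}

fn main() {
    let err = wrap(ParseError { line: 3 });
    assert_eq!(err.kind(), io::ErrorKind::InvalidData);
    // Display is delegated to the wrapped error.
    assert_eq!(err.to_string(), "parse error on line 3");
    assert_eq!(unwrap_parse_error(err), Some(ParseError { line: 3 }));

    // An ad-hoc string error carries a payload too, just not a ParseError.
    let adhoc = io::Error::new(io::ErrorKind::Other, "something went wrong");
    assert_eq!(unwrap_parse_error(adhoc), None);
}
```

The caller can still branch on `.kind()` in a cross-platform way, and only the code that knows about `ParseError` needs to downcast.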

For example, serde_json provides the following method:

+ +
+ + +
fn from_reader<R, T>(rdr: R) -> Result<T, serde_json::Error>
+where
+  R: Read,
+  T: DeserializeOwned,
+ +
+

Read can fail with io::Error, so serde_json::Error needs to be able to represent io::Error internally. +I think this is backwards (but I don't know the whole context, I'd be delighted to be proven wrong!), and the signature should have been this instead:

+ +
+ + +
fn from_reader<R, T>(rdr: R) -> Result<T, io::Error>
+where
+  R: Read,
+  T: DeserializeOwned,
+ +
+

Then, serde_json::Error wouldn't have an Io variant and would be stashed into io::Error with the InvalidData kind.

+ + +

I think std::io::Error is a truly marvelous type, which manages to serve many different use-cases without much compromise. +But can we perhaps do better?

+

The number one problem with std::io::Error is that, when a file-system operation fails, you don't know which path it has failed for! +This is understandable: Rust is a systems language, so it shouldn't add much fat over what the OS provides natively. +The OS returns an integer return code, and coupling that with a heap-allocated PathBuf could be an unacceptable overhead!

+ + +

I don't know an obviously good solution here. +One option would be to add a compile-time (once we get std-aware cargo) or runtime (a-la RUST_BACKTRACE) switch to heap-allocate all path-related IO errors. +A similarly-shaped problem is that io::Error doesn't carry a backtrace.

+

The other problem is that std::io::Error is not as efficient as it could be:

+ +

I think we can fix this now!

+

First, we can get rid of double indirection by using a thin trait object, a-la +failure or +anyhow. +Now that GlobalAlloc exists, it's a relatively straightforward implementation.

+

Second, we can make use of the fact that pointers are aligned, and stash both Os and Simple variants into usize with the least significant bit set. +I think we can even get creative and use the second least significant bit, leaving the first one as a niche. +That way, even something like io::Result<i32> can be pointer-sized!
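The bit-stashing idea can be sketched without any unsafe code by working on plain `usize` values. Everything here is an assumption made for illustration: the tag layout (`0b01` for OS codes, `0b11` for simple kinds, `0b00` for a real pointer), the `Unpacked` enum, the use of a `u8` in place of `ErrorKind`, and a 64-bit `usize` so that a shifted `i32` fits. It is not how std lays out `io::Error`.

```rust
// Assumption: a 64-bit target, so a 32-bit code shifted left by 2 still fits.
const _: () = assert!(usize::BITS == 64);

/// What a packed word decodes to (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Unpacked {
    Os(i32),
    Simple(u8),     // stand-in for an ErrorKind discriminant
    Pointer(usize), // address of a heap-allocated custom error
}

const TAG_MASK: usize = 0b11;
const TAG_OS: usize = 0b01;
const TAG_SIMPLE: usize = 0b11;

fn pack_os(code: i32) -> usize {
    ((code as u32 as usize) << 2) | TAG_OS
}

fn pack_simple(kind: u8) -> usize {
    ((kind as usize) << 2) | TAG_SIMPLE
}

fn unpack(bits: usize) -> Unpacked {
    match bits & TAG_MASK {
        TAG_OS => Unpacked::Os((bits >> 2) as u32 as i32),
        TAG_SIMPLE => Unpacked::Simple((bits >> 2) as u8),
        // Low bit clear: treated as an (aligned) pointer in this sketch.
        _ => Unpacked::Pointer(bits),
    }
}

fn main() {
    assert_eq!(unpack(pack_os(-1)), Unpacked::Os(-1));
    assert_eq!(unpack(pack_os(2)), Unpacked::Os(2));
    assert_eq!(unpack(pack_simple(7)), Unpacked::Simple(7));

    // A Box<u64> is 8-byte aligned, so its two low address bits are zero.
    let boxed = Box::new(0u64);
    let addr = &*boxed as *const u64 as usize;
    assert_eq!(unpack(addr), Unpacked::Pointer(addr));
}
```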

+

And this concludes the post. +Next time you'll be designing an error type for your library, take a moment to peer through +sources +of std::io::Error, you might find something to steal!

+

Discussion on /r/rust.

+ + +
+
+ + + + + diff --git a/2020/11/01/notes-on-paxos.html b/2020/11/01/notes-on-paxos.html new file mode 100644 index 00000000..13fad850 --- /dev/null +++ b/2020/11/01/notes-on-paxos.html @@ -0,0 +1,1050 @@ + + + + + + + Notes on Paxos + + + + + + + + + + + + +
+ +
+ +
+
+ +

Notes on Paxos

+

These are my notes after learning the Paxos algorithm. +The primary goal here is to sharpen my own understanding of the algorithm, but maybe someone will find this explanation of Paxos useful! +This post assumes fluency with mathematical notation.

+

I must confess it took me a long time to understand distributed consensus. +I've read a whole bunch of papers +(Part Time Parliament, +Paxos Made Simple, +Practical BFT, +In Search of an Understandable Consensus Algorithm, +CASPaxos: Replicated State Machines without logs), but they didn't make sense. +Or rather, nothing specific was unclear, but, at the same time, I was unable to answer the core question:

+ + +

That means that I didn't actually understand the algorithm.

+

What finally made the whole thing click are

+ +

I now think that the thing is actually much simpler than it is made out to be :-)

+

Buckle in, we are starting!

+
+ +

+ What is Paxos? +

+

Paxos is an algorithm for implementing distributed consensus. +Suppose you have N machines which communicate over a faulty network. +The network may delay, reorder, and lose messages (it cannot corrupt them though). +Some machines might die, and might return later. +Due to network delays, "machine is dead" and "machine is temporarily unreachable" are indistinguishable. +What we want to do is to make machines agree on some value. +"Agree" here means that if some machine says "value is X", and another machine says "value is Y", then X is necessarily equal to Y. +It is OK for a machine to answer "I don't know yet".

+

The problem with this formulation is that Paxos is an elementary, but subtle algorithm. +To understand it (at least for me), a precise, mathematical formulation is needed. +So, let's try again.

+

What is Paxos? +Paxos is a theorem about sets! +This is definitely mathematical, and is true (as long as you base math on set theory), but is not that helpful. +So, let's try again.

+

What is Paxos? +Paxos is a theorem about nondeterministic state machines!

+

A system is characterized by a state. +The system evolves in discrete steps: each step takes the system from state to state'. +Transitions are non-deterministic: from a single current state s1, you may get to different next states s2 and s3 +(non-determinism models a flaky network). +An infinite sequence of the system's states is called a behavior:

+ +
+ + +
state_0 → state_1 → ... → state_n → ...
+ +
+

Due to non-determinism, there's a potentially infinite number of possible behaviors. +Nonetheless, depending on the transition function, we might be able to prove that some condition is true for any state in any behavior.

+

Let's start with a simple example, and also introduce some notation. +I won't use TLA+, as I don't enjoy its concrete syntax. +Instead, math will be set in monospaced unicode.

+

The example models an integer counter. +Each step the counter decrements or increments (non-deterministically), but never gets too big or too small:

+ +
+
Counter
+ + +
Sets:
+  ℕ -- Natural numbers with zero
+
+Vars:
+  counter ∈ ℕ
+
+Init ≡
+  counter = 0
+
+Next ≡
+    (counter < 9 ∧ counter' = counter + 1)
+  ∨ (counter > 0 ∧ counter' = counter - 1)
+
+Theorem:
+  ∀ i: 0 ≤ counter_i ≤ 9
+
+-- Notation
+-- ≡: equals by definition
+-- ∧: "and", conjunction
+-- ∨: "or",  disjunction
+ +
+

The state of the system is a single variable counter. +It holds a natural number. +In general, we will represent a state of any system by a fixed set of variables. +Even if the system logically consists of several components, we model it using a single unified state.

+

The Init formula specifies the initial state, the counter is zero. +Note that = is a mathematical equality, and not an assignment. +Init is a predicate on states.

+

Init is true for {counter: 0}.
+Init is false for {counter: 92}.

+

Next defines a non-deterministic transition function. +It is a predicate on pairs of states, s1 and s2. +counter is a variable in the s1 state, counter' is the corresponding variable in the s2 state. +In plain English, transition from s1 to s2 is valid if one of these is true:

+
    +
  • +Value of counter in s1 is less than 9 and value of counter in s2 is greater by 1. +
  • +
  • +Value of counter in s1 is greater than 0, and value of counter in s2 is smaller by 1. +
  • +
+

Next is true for ({counter: 5}, {counter: 6}).
+Next is false for ({counter: 5}, {counter: 5}).

+

Here are some behaviors of this system:

+
    +
  • +0 → 1 → 2 → 3 → 4 → 5 → 6 → 7 → 8 → 9 +
  • +
  • +0 → 1 → 0 → 1 → 0 → 1 +
  • +
  • +0 → 1 → 2 → 3 → 2 → 1 → 0 +
  • +
+

Here are some non-behaviors of this system:

+
    +
  • +1 → 2 → 3 → 4 → 5: Init does not hold for initial state +
  • +
  • +0 → 2: Next does not hold for (0, 2) pair +
  • +
  • +0 → 1 → 0 → -1: Next does not hold for (0, -1) pair +
  • +
+

"Behavior" means that the initial state satisfies Init, and each transition satisfies Next.

+

We can state and prove a theorem about this system: for every state in every behavior, the value of counter is between 0 and 9. +Proof is by induction:

+
    +
  • +The condition is true in the initial state. +
  • +
  • +If the condition is true for state s1, and Next holds for (s1, s2), then the condition is true for s2. +
  • +
  • +QED. +
  • +
+

As usual with induction, sometimes we would want to prove a stronger property, because it gives us more powerful base for an induction step.

+

To sum up, we define a non-deterministic state machine using two predicates Init and Next. +Init is a predicate on states which restricts possible initial states. +Next is a predicate on pairs of states, which defines a non-deterministic transition function. +Vars section describes the state as a fixed set of typed variables. +Sets defines auxiliary fixed sets, elements of which are values of variables. +Theorem section specifies a predicate on behaviors: sequences of steps evolving according to Init and Next.

+

The theorem does not automatically follow from Init and Next, it needs to be proven. +Alternatively, we can simulate a range of possible behaviors on a computer and check the theorem for the specific cases. +If the set of reachable states is small enough (finite would be a good start), we can enumerate all behaviors and produce a brute force proof. +If there are too many reachable states, we can't prove the theorem this way, but we often can prove it to be wrong, by finding a counterexample. +This is the idea behind model checking in general and TLA+ specifically.
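For the Counter spec the brute force proof fits on a page. The sketch below is an illustration only, not how TLA+ tooling works: `init_states`, `next_states`, `invariant`, and `reachable` are names chosen for this example, and the counter is stored as `i64` (rather than a natural number) so that a buggy `Next` producing `-1` would be caught by the invariant instead of by the type.

```rust
use std::collections::{BTreeSet, VecDeque};

/// States satisfying `Init`.
fn init_states() -> Vec<i64> {
    vec![0]
}

/// All states `s2` such that `Next(s1, s2)` holds.
fn next_states(counter: i64) -> Vec<i64> {
    let mut out = Vec::new();
    if counter < 9 {
        out.push(counter + 1);
    }
    if counter > 0 {
        out.push(counter - 1);
    }
    out
}

/// The property from the Theorem section, checked on a single state.
fn invariant(counter: i64) -> bool {
    (0..=9).contains(&counter)
}

/// Breadth-first enumeration of every state reachable from `Init`.
fn reachable() -> BTreeSet<i64> {
    let mut seen: BTreeSet<i64> = init_states().into_iter().collect();
    let mut queue: VecDeque<i64> = seen.iter().copied().collect();
    while let Some(state) = queue.pop_front() {
        for next in next_states(state) {
            // `insert` returns true only for states we have not seen before.
            if seen.insert(next) {
                queue.push_back(next);
            }
        }
    }
    seen
}

fn main() {
    let states = reachable();
    assert!(states.iter().all(|&s| invariant(s)));
    println!("checked {} reachable states", states.len());
}
```

Because every state of every behavior is reachable, checking the invariant on the reachable set is enough for this particular theorem.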

+
+
+ +

+ What is Consensus? +

+

Having mastered the basic vocabulary, let's start slowly building towards Paxos. +We begin with defining what consensus is. +As this is math, we'll do it using sets.

+ +
+ + +
Sets:
+  𝕍 -- Arbitrary set of values
+
+Vars:
+  chosen ∈ 2^𝕍 -- Subset of values
+
+Theorem:
+    ∀ i: |chosen_i| ≤ 1
+  ∧ ∀ i, j: i ≤ j ∧ chosen_i ≠ {} ⇒ chosen_i = chosen_j
+
+-- Notation
+-- {}:  empty set
+-- 2^X: set of all subsets of X, powerset
+-- |X|: cardinality (size) of the set
+ +
+

The state of the system is a set of chosen values. +For this set to constitute consensus (over time) we need two conditions to hold:

+
    +
  • +at most one value is chosen +
  • +
  • +if we choose a value at one point in time, we stick to it (math friendly: any two chosen values are equal to each other) +
  • +
+

Here's the simplest possible implementation of consensus:

+ +
+
Consensus
+ + +
Sets:
+  𝕍 -- Arbitrary set of values
+
+Vars:
+  chosen ∈ 2^𝕍 -- Subset of values
+
+Init ≡
+  chosen = {}
+
+Next ≡
+  chosen = {} ∧ ∃ v ∈ 𝕍: chosen' = {v}
+
+
+Theorem:
+    ∀ i: |chosen_i| ≤ 1
+  ∧ ∀ i, j: i ≤ j ∧ chosen_i ≠ {} ⇒ chosen_i = chosen_j
+ +
+

In the initial state, the set of chosen values is empty. +We can make a step if the current set of chosen values is empty, in which case we select an arbitrary value.

+

This technically breaks our behavior theory: we require behaviors to be infinite, but, for this spec, we can only make a single step. +The fix is to allow empty steps: a step which does not change the state at all is always valid. +We call such steps stuttering steps.

+

The proof of the first condition of the consensus theorem is a trivial induction. +The proof of the second part is actually non-trivial, here's a sketch. +Assume that i and j are indices which violate the condition. +They might be far from each other in state-space, so we can't immediately apply Next. +So let's choose the smallest j1 ∈ [i+1;j] such that the condition is violated. +Let i1 = j1 - 1. +The condition is still violated for the (i1, j1) pair, but this time they are subsequent steps, and we can show that Next does not hold for them, concluding the proof.

+

Yay! We have a distributed consensus algorithm which works for 1 (one) machine:

+ +
+
Distributed Consensus For One Machine
+ + +
Pick arbitrary value.
+ +
+
+
+ +

+ Simple Voting +

+

Let's try to extend this to a truly distributed case, where we have N machines (acceptors). +We start with formalizing the naive consensus algorithm: let acceptors vote for values, and select the value which gets a majority of votes.

+ +
+
Majority Vote
+ + +
Sets:
+  𝕍 -- Arbitrary set of values
+  𝔸 -- Finite set of acceptors
+
+Vars:
+  votes ∈ 2^(𝔸×𝕍) -- Set of (acceptor, value) pairs
+
+Init ≡
+  votes = {}
+
+Next ≡
+  ∃ a ∈ 𝔸:
+      ∃ v ∈ 𝕍: votes' = votes ∪ {(a, v)}
+    ∧ ∀ v ∈ 𝕍: (a, v) ∉ votes
+
+chosen ≡
+  {v ∈ 𝕍: |{a ∈ 𝔸: (a, v) ∈ votes}| > |𝔸| / 2}
+ +
+

The state of the system is the set of all votes cast by all acceptors. +We represent a vote as a pair of an acceptor and the value it voted for. +Initially, the set of votes is empty. +On each step, some acceptor casts a vote for some value (adds (a, v) pair to the set of votes), but only if it hasn't voted yet. +Remember that Next is a predicate on pairs of states, so we check votes for an existing vote, but add a new one to votes'. +The value is chosen if the set of acceptors which voted for the value ({a ∈ 𝔸: (a, v) ∈ votes}) is more than half the size of the set of all acceptors. +In other words, if a majority of acceptors has voted for the value.

+ + +

Let's prove the consensus theorem for the Majority Vote protocol. +TYPE ERROR, DOES NOT COMPUTE. +The consensus theorem is a predicate on behaviors of states consisting of a chosen variable. +Here, chosen isn't a variable, votes is! +chosen is a function which maps the current state to a set of values.

+

While it is intuitively clear what the consensus theorem would look like for this case, let's make this precise. +Let's map states with votes variable to states with chosen variable using the majority rule, f. +This mapping naturally extends to a mapping between corresponding behaviors (sequences of steps):

+ +
+ + +
  f(votes_0   →   votes_1  → ...)
+= f(votes_0)  → f(votes_1) → ...
+=  chosen_0   →  chosen_1  → ...
+ +
+

Now we can precisely state that for every behavior B of the majority voting spec, the theorem holds for f(B). +This yields a better way to prove this! +Instead of proving the theorem directly (which would again require the i1, j1 trick), we prove that our mapping f is a homomorphism. +That is, we prove that if votes_0 → votes_1 → ... is a behavior of the majority voting spec, then f(votes_0) → f(votes_1) → ... is a behavior of the consensus spec. +This lets us re-use the existing proof.

+

The proof for the initial step is trivial, but let's spell it out just to appreciate the amount of detail a human mind can glance through:

+ +
+ + +
  f({votes: {}})
+= {chosen: {v ∈ 𝕍: |{a ∈ 𝔸: (a, v) ∈ {}}| > |𝔸| / 2}}
+= {chosen: {v ∈ 𝕍: |{}| > |𝔸| / 2}}
+= {chosen: {v ∈ 𝕍: 0 > |𝔸| / 2}}
+= {chosen: {v ∈ 𝕍: FALSE}}
+= {chosen: {}}
+ +
+

Let's show that if Majority Vote's Next_m holds for (votes, votes'), then Consensus's Next_c holds for (f(votes), f(votes')). +There's one obstacle on our way: this claim is false! +Consider a case with three acceptors and two values: 𝔸 = {a1, a2, a3}, 𝕍 = {v1, v2}. +Consider these values of votes and votes':

+ +
+ + +
votes  = {(a1, v1), (a2, v1), (a1, v2)}
+votes' = {(a1, v1), (a2, v1), (a1, v2), (a3, v2)}
+ +
+

If you just mechanically check Next, you see that it works! +a3 hasn't cast its vote, so it can do this now. +The problem is that chosen(votes) = {v1} and chosen(votes') = {v1, v2}.

+

We are trying to prove too much! +f works correctly only for states reachable from Init, and the bad value of votes where a1 votes twice is not reachable.

+

So, we first should prove a lemma: each acceptor votes at most once. +After that, we can prove Next_m(votes, votes') ⇒ Next_c(f(votes), f(votes')) (or f(votes) = f(votes'), a stuttering step) under the assumption of at most once voting. +Specifically, if |f(votes')| turns out to be larger than 1, then we can pick two majorities which voted for different values, which allows us to pin down a single acceptor which voted twice, which is a contradiction. +The rest is left as an exercise for the reader :)
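The counterexample with a1 voting twice can be checked mechanically. Below is a minimal sketch that transcribes `chosen` and `Next` for Majority Vote into executable predicates; acceptors and values are encoded as small integers, and the names `next_m` and `votes_at_most_once` are chosen for this example.

```rust
use std::collections::BTreeSet;

type Acceptor = u32;
type Value = u32;
type Votes = BTreeSet<(Acceptor, Value)>;

/// The majority rule f: values backed by more than half of the acceptors.
fn chosen(votes: &Votes, acceptors: &[Acceptor], values: &[Value]) -> BTreeSet<Value> {
    values
        .iter()
        .copied()
        .filter(|&v| {
            let supporters = acceptors.iter().filter(|&&a| votes.contains(&(a, v))).count();
            2 * supporters > acceptors.len()
        })
        .collect()
}

/// `Next` of Majority Vote, as a predicate on a pair of states.
fn next_m(votes: &Votes, votes_next: &Votes, acceptors: &[Acceptor], values: &[Value]) -> bool {
    acceptors.iter().any(|&a| {
        let not_voted_yet = values.iter().all(|&v| !votes.contains(&(a, v)));
        let adds_one_vote = values.iter().any(|&v| {
            let mut expected = votes.clone();
            expected.insert((a, v));
            &expected == votes_next
        });
        not_voted_yet && adds_one_vote
    })
}

/// The lemma: every acceptor has at most one vote in the set.
fn votes_at_most_once(votes: &Votes, acceptors: &[Acceptor]) -> bool {
    acceptors
        .iter()
        .all(|&a| votes.iter().filter(|&&(a1, _)| a1 == a).count() <= 1)
}

fn main() {
    let (acceptors, values) = ([1, 2, 3], [1, 2]);
    // The unreachable state from the text: acceptor 1 has voted twice.
    let votes: Votes = [(1, 1), (2, 1), (1, 2)].into_iter().collect();
    let mut votes_next = votes.clone();
    votes_next.insert((3, 2));

    assert!(next_m(&votes, &votes_next, &acceptors, &values));
    assert_eq!(chosen(&votes_next, &acceptors, &values).len(), 2);
    // The lemma is exactly what rules this state out.
    assert!(!votes_at_most_once(&votes, &acceptors));
}
```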

+

So, majority vote indeed implements consensus. +Let's look closer at the majority condition. +It is clearly important. +If we define chosen as

+ +
+ + +
chosen ≡
+  {v ∈ 𝕍: |{a ∈ 𝔸: (a, v) ∈ votes}| > 0}
+ +
+

then it's easy to construct a behavior with several chosen values. +The property of majority we use is that any two majorities have at least one acceptor in common. +But any other condition with this property would work as well as majority. +For example, we can assign an integer weight to each acceptor, and require the sum of weights to be more than half of the total weight. +As a more specific example, consider a set of four acceptors {a, b, c, d}.

+

Its majorities are:

+ +
+ + +
{a, b, c, d}
+{a, b, c}
+{a, b, d}
+{a, c, d}
+{b, c, d}
+ +
+

But the following set of sets would also satisfy non-empty intersection condition:

+ +
+ + +
{a, b, c, d}
+{a, b, c}
+{a, b, d}
+{a, c}
+{b, c}
+ +
+

Operationally, it is strictly better, as fewer acceptors are needed to reach a decision.
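The non-empty intersection condition is easy to check by brute force. Here is a small sketch, with acceptors encoded as characters; the helper names `quorum` and `pairwise_intersecting` are made up for this example.

```rust
use std::collections::BTreeSet;

type Quorum = BTreeSet<char>;

/// Builds a quorum from a string of acceptor names, e.g. "abd".
fn quorum(members: &str) -> Quorum {
    members.chars().collect()
}

/// True if every two quorums have at least one acceptor in common.
fn pairwise_intersecting(quorums: &[Quorum]) -> bool {
    quorums
        .iter()
        .all(|q1| quorums.iter().all(|q2| !q1.is_disjoint(q2)))
}

fn main() {
    let majorities: Vec<Quorum> =
        ["abcd", "abc", "abd", "acd", "bcd"].iter().map(|s| quorum(s)).collect();
    let smaller: Vec<Quorum> =
        ["abcd", "abc", "abd", "ac", "bc"].iter().map(|s| quorum(s)).collect();
    // Two halves of the acceptors do not intersect, so they are not valid quorums.
    let broken: Vec<Quorum> = ["ab", "cd"].iter().map(|s| quorum(s)).collect();

    assert!(pairwise_intersecting(&majorities));
    assert!(pairwise_intersecting(&smaller));
    assert!(!pairwise_intersecting(&broken));
}
```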

+

So let's refine the protocol to a more general form.

+ +
+
Quorum Vote
+ + +
Sets:
+  𝕍       -- Arbitrary set of values
+  𝔸       -- Finite set of acceptors
+  ℚ ⊆ 2^𝔸 -- Set of quorums
+
+Assume:
+  ∀ q1, q2 ∈ ℚ: q1 ∩ q2 ≠ {}
+
+Vars:
+  votes ∈ 2^(𝔸×𝕍) -- Set of (acceptor, value) pairs
+
+Init ≡
+  votes = {}
+
+Next ≡
+  ∃ a ∈ 𝔸:
+      ∃ v ∈ 𝕍: votes' = votes ∪ {(a, v)}
+    ∧ ∀ v ∈ 𝕍: (a, v) ∉ votes
+
+chosen ≡
+  {v ∈ 𝕍: ∃ q ∈ ℚ: AllVotedFor(q, v)}
+
+AllVotedFor(q, v) ≡
+  ∀ a ∈ q: (a, v) ∈ votes
+ +
+

We require a set of quorums to be specified: a set of subsets of acceptors such that every two quorums have at least one acceptor in common. +The value is chosen if there exists a quorum such that every member of it voted for the value.

+

There's one curious thing worth noting here. +Consensus is a property of the whole system, there's no single place where we can point to and say "hey, this is it, this is consensus". +Imagine 3 acceptors, sitting on Earth, Venus, and Mars, and choosing between values v1 and v2. +They can execute the Quorum Vote algorithm without communicating with each other at all. +They will necessarily reach consensus without knowing which specific value they agreed on! +An external observer can then travel to the three planets, collect the votes and discover the chosen value, but this feature isn't built into the algorithm itself.

+

OK, so we've just described an algorithm for finding consensus among N machines, proved the consensus theorem for it, and noted that it has staggering communication efficiency: zero messages. +Should we collect our Turing Award?

+

Well, no, there's a big problem with Quorum Vote: it can get stuck. +Specifically, if there are three values, and the votes are evenly split between them, then no value is chosen, and only stuttering steps are possible. +If you can vote for different values, it might happen that no value receives a majority of votes. +Voting satisfies the safety property, but not the liveness property: the algorithm can get stuck even if all machines are on-line and communication is perfect.

+

There is a simple fix to the problem, with a rich historical tradition among many democratic governments. +Let's have a vote, and let's pick the value chosen by the majority, but let's allow voting only for a single candidate value:

+ +
+
Rigged Quorum Vote
+ + +
Sets:
+  𝕍       -- Arbitrary set of values
+  𝔸       -- Finite set of acceptors
+  ℚ ⊆ 2^𝔸 -- Set of quorums
+
+Assume:
+  ∀ q1, q2 ∈ ℚ: q1 ∩ q2 ≠ {}
+
+Vars:
+  votes ∈ 2^(𝔸×𝕍) -- Set of (acceptor, value) pairs
+
+Init ≡
+  votes = {}
+
+Next ≡
+  ∃ a ∈ 𝔸, v ∈ 𝕍:
+      ∀ (a1, v1) ∈ votes: v1 = v
+    ∧ votes' = votes ∪ {(a, v)}
+
+chosen ≡
+  {v ∈ 𝕍: ∃ q ∈ ℚ: AllVotedFor(q, v)}
+
+AllVotedFor(q, v) ≡
+  ∀ a ∈ q: (a, v) ∈ votes
+ +
+

The new condition says that an acceptor is only allowed to cast a vote if all other votes are for the same value. +As a special case, if the set of votes is empty, the acceptor can vote for any value (but all other acceptors would have to vote for this value afterwards).

+

From a mathematical point of view, this algorithm is perfect. +From a practical standpoint, not so much: the acceptor casting the first vote somehow needs to make sure that it is indeed the first one. +The obvious fix to this problem is to assign a unique integer number to each acceptor, call the highest-numbered acceptor the leader, and allow only the leader to cast the first decisive vote.

+

So acceptors first communicate with each other to figure out who the leader is, then the leader casts the vote, and the followers follow. +But this also violates liveness: if the leader dies, then the followers would wait indefinitely. +A fix for this problem is to let the second highest acceptor take over the leadership if the leader perishes. +But under our assumptions, it's impossible to distinguish a situation when the leader is dead from a situation when it just has a really bad internet connection. +So naively picking a successor would lead to a split vote and a standstill again (power transitions are known to be problematic for authoritarian regimes in real life too!). +If only there were some kind of distributed consensus algorithm for picking the leader!

+
+
+ +

+ Ballot Voting +

+

This is the place where we start discussing real Paxos :-) +It starts with a ballot voting algorithm. +This algorithm, just like the ones we've already seen, does not define any messages. +Rather, message passing is an implementation detail, so we'll get to it later.

+

Recall that rigged voting requires all acceptors to vote for a single value. +It is immune to split voting, but is susceptible to getting stuck when the leader goes offline. +The idea behind ballot voting is to have many voting rounds, ballots. +In each ballot, acceptors can vote only for a single value, so each ballot individually can get stuck. +However, as we are running many ballots, some ballots will make progress. +The value is chosen in a ballot if it is chosen by some quorum of acceptors. +The value is chosen in the overall algorithm if it is chosen in some ballot.

+

The Turing award question is: how do we make sure that no two ballots choose different values? +Note that it is OK if two ballots choose the same value.

+

Let's just brute force this question, really. +First, assume that the ballots are ordered (for example, by numbering them with natural numbers). +And let's say we want to pick some value v to vote for in ballot b. +When is v safe? +Well, when no other value v1 can be chosen by any other ballot. +Let's tighten this up a bit.

+

Value v is safe at ballot b if any smaller ballot b1 (b1 < b) did not choose and will not choose any value other than v.

+

So yeah, easy-peasy, we just need to predict which values will be chosen in the future, and we are done! +We'll deal with it in a moment, but let's first convince ourselves that, if we only select safe values for voting, we won't violate the consensus spec.

+

So, when we select a safe value v to vote for in a particular ballot, it might get chosen in this ballot. +We need to check that it won't conflict with any other value. +For smaller ballots that's easy: it's the definition of the safety condition. +What if we conflict with some value v1 chosen in a future ballot? +Well, that value is also safe, so whoever chose v1 was sure that it won't conflict with v.

+

How do we tackle the precognition problem? +We'll ask acceptors to commit to not voting in certain ballots. +For example, if you are looking for a safe value for ballot b and know that there's a quorum q such that each quorum member never voted in smaller ballots, and promised to never vote in smaller ballots, you can be sure that any value is safe. +Indeed, any quorum in smaller ballots will have at least one member which would refuse to vote for any value.

+

Ok, but what if there's some quorum member which has already voted for some v1 in some ballot b1 < b? +(Take a deep breath, the next sentence is the kernel of the core idea of Paxos). +Well, that means that v1 was safe at b1, so, if there will be no votes between b1 and b, v1 is also safe at b! +(Exhale). +In other words, to pick a safe value at b we:

+
    +
  1. +Take some quorum q. +
  2. +
  3. +Make everyone in q promise to never vote in ballots earlier than b. +
  4. +
  5. +Among all of the votes already cast by the quorum members we pick the one with the highest ballot number. +
  6. +
  7. +If such vote exists, its value is a safe value. +
  8. +
  9. +Otherwise, any value is safe. +
  10. +
+

To implement the "never vote" promise, each acceptor will maintain a maxBal value. +It will never vote in ballots smaller than or equal to maxBal.
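The last three steps of the recipe above boil down to a "highest ballot wins" lookup. Here is a minimal sketch of that lookup alone, assuming the quorum has already been picked and has already made its promise; the function name `safe_value` and the encoding of "any value is safe" as `None` are choices made for this example.

```rust
type Ballot = u64;
type Value = char;

/// Given every vote already cast by members of the quorum, returns the
/// value that must be proposed, or `None` if any value is safe.
fn safe_value(votes_by_quorum: &[(Ballot, Value)]) -> Option<Value> {
    votes_by_quorum
        .iter()
        .max_by_key(|&&(ballot, _)| ballot)
        .map(|&(_, value)| value)
}

fn main() {
    // Nobody in the quorum has voted: any value is safe.
    assert_eq!(safe_value(&[]), None);
    // The vote from the highest ballot (4) decides.
    assert_eq!(safe_value(&[(1, 'x'), (4, 'y'), (2, 'x')]), Some('y'));
}
```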

+

Let's stop hand-waving and put this algorithm in math. +Again, we are not thinking about messages yet, and just assume that each acceptor can observe the state of the whole system.

+ +
+
Ballot Vote
+ + +
Sets:
+  𝔹       -- Numbered set of ballots (for example, ℕ)
+  𝕍       -- Arbitrary set of values
+  𝔸       -- Finite set of acceptors
+  ℚ ⊆ 2^𝔸 -- Set of quorums
+
+Assume:
+  ∀ q1, q2 ∈ ℚ: q1 ∩ q2 ≠ {}
+
+Vars:
+  -- Set of (acceptor, ballot, value) triples
+  votes ∈ 2^(𝔸×𝔹×𝕍)
+
+  -- Function that maps acceptors to ballot numbers or -1.
+  -- maxBal :: 𝔸 -> 𝔹 ∪ {-1}
+  maxBal ∈ (𝔹 ∪ {-1})^𝔸
+
+Voted(a, b) ≡
+  ∃ v ∈ 𝕍: (a, b, v) ∈ votes
+
+Safe(v, b) ≡
+  ∃ q ∈ ℚ:
+      ∀ a ∈ q: maxBal(a) ≥ b - 1
+    ∧ ∃ b1 ∈ 𝔹 ∪ {-1}:
+          ∀ b2 ∈ [b1+1; b-1], a ∈ q: ¬Voted(a, b2)
+        ∧ (b1 = -1 ∨ ∃ a ∈ q: (a, b1, v) ∈ votes)
+
+AdvanceMaxBal(a, b) ≡
+    maxBal(a) < b
+  ∧ votes' = votes
+  ∧ maxBal' = λ a1 ∈ 𝔸: if a1 = a then b else maxBal(a1)
+
+Vote(a, b, v) ≡
+    maxBal(a) < b
+  ∧ ∀ (a1, b1, v1) ∈ votes: b = b1 ⇒ v = v1
+  ∧ Safe(v, b)
+  ∧ votes' = votes ∪ {(a, b, v)}
+  ∧ maxBal' = λ a1 ∈ 𝔸: if a1 = a then b else maxBal(a1)
+
+Init ≡
+    votes = {}
+  ∧ maxBal = λ a ∈ 𝔸: -1
+
+Next ≡
+  ∃ a ∈ 𝔸, b ∈ 𝔹:
+      AdvanceMaxBal(a, b)
+    ∨ ∃ v ∈ 𝕍: Vote(a, b, v)
+
+chosen ≡
+  {v ∈ 𝕍: ∃ q ∈ ℚ, b ∈ 𝔹: AllVotedFor(q, b, v)}
+
+AllVotedFor(q, b, v) ≡
+  ∀ a ∈ q: (a, b, v) ∈ votes
+
+-- Notation
+-- [b1;b2]: inclusive interval of ballots
+-- Y^X: set of functions from X to Y (f: X -> Y)
+-- λ x ∈ X: y: function that maps x to y
+-- ¬: "not", negation
+--
+-- f' = λ x1 ∈ X: if x1 = x then y else f(x1):
+-- A tedious way to write that f' is the same function as f,
+-- except on x, where it returns y instead.
+--
+-- I am sorry! In my defense, TLA+ notation for this
+-- is also horrible :-)
+ +
+

Let's unwrap this top-down. +First, the chosen condition says that it is enough for some quorum to cast votes in some ballot for a value to be accepted. +It's trivial to see that, if we fix the ballot, then any two quorums would vote for the same value: quorums intersect. +Showing that quorums vote for the same value in different ballots is the tricky bit.

+

The Init condition is simple: no votes, any acceptor can vote in any ballot (= any ballot with number larger than -1).

+

The Next consists of two cases. +On each step of the protocol, some acceptor either votes for some value in some ballot (∃ v ∈ 𝕍: Vote(a, b, v)), or declares that it won't cast additional votes in small ballots (AdvanceMaxBal(a, b)). +Advancing ballot just sets maxBal for this acceptor (but takes care not to rewind older decisions). +Casting a vote is more complicated and is predicated on three conditions:

+
    +
  • +We haven't forfeited our right to vote in this ballot. +
  • +
  • +If there's some vote in this ballot already, we are voting for the same value. +
  • +
  • +If there are no votes, then the value should be safe. +
  • +
+

Note that the last two checks overlap a bit: if the set of votes cast in a ballot is not empty, we immediately know that the value is safe: somebody has proven this before. +But it doesn't harm to check for safety again: a safe value cannot become unsafe.

+

Finally, the safety check. +It is done in relation to some quorum: if q proves that v is safe, then members of this quorum would prevent any other value from being accepted in earlier ballots. +To be able to do this, we first need to make sure that q indeed finalized their votes for ballots less than b (maxBal is at least b - 1). +Then, we need to find the latest vote of q. +There are two cases:

+
    +
  • +No one in q ever voted (b1 = -1). +In this case, there are no additional conditions on v, any value would work. +
  • +
  • +Someone in q voted, and b1 is the last ballot when someone voted. +Then v must be the value voted for in b1. +This implies Safe(v, b1). +
  • +
+

If all of these conditions are fulfilled, we cast our vote and advance maxBal.
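To get a feel for the spec, it helps to run it. The sketch below transcribes `Voted`, `Safe`, `Vote`, and `chosen` into Rust, under a few assumptions made for this example: acceptors are indices into a vector, ballots are `i64` so that -1 is representable, quorums are passed in as plain vectors, and `Safe` only searches b1 below b (which is what the prose describes). `AdvanceMaxBal` is left out for brevity.

```rust
use std::collections::BTreeSet;

type Acc = usize;
type Bal = i64;
type Val = u32;

#[derive(Clone, Debug, PartialEq)]
struct State {
    votes: BTreeSet<(Acc, Bal, Val)>,
    max_bal: Vec<Bal>, // indexed by acceptor, -1 means "no promise yet"
}

fn voted(s: &State, a: Acc, b: Bal) -> bool {
    s.votes.iter().any(|&(a1, b1, _)| a1 == a && b1 == b)
}

/// Safe(v, b): some quorum has closed all ballots below b, and its latest
/// vote (if there is one) was for v.
fn safe(s: &State, quorums: &[Vec<Acc>], v: Val, b: Bal) -> bool {
    quorums.iter().any(|q| {
        q.iter().all(|&a| s.max_bal[a] >= b - 1)
            && (-1..b).any(|b1| {
                let quiet = (b1 + 1..b).all(|b2| q.iter().all(|&a| !voted(s, a, b2)));
                let witnessed = b1 == -1 || q.iter().any(|&a| s.votes.contains(&(a, b1, v)));
                quiet && witnessed
            })
    })
}

/// Vote(a, b, v): returns the next state, or None if the step is not allowed.
fn vote(s: &State, quorums: &[Vec<Acc>], a: Acc, b: Bal, v: Val) -> Option<State> {
    let allowed = s.max_bal[a] < b
        && s.votes.iter().all(|&(_, b1, v1)| b1 != b || v1 == v)
        && safe(s, quorums, v, b);
    if !allowed {
        return None;
    }
    let mut next = s.clone();
    next.votes.insert((a, b, v));
    next.max_bal[a] = b;
    Some(next)
}

fn chosen(s: &State, quorums: &[Vec<Acc>]) -> BTreeSet<Val> {
    s.votes
        .iter()
        .filter(|&&(_, b, v)| {
            quorums.iter().any(|q| q.iter().all(|&a| s.votes.contains(&(a, b, v))))
        })
        .map(|&(_, _, v)| v)
        .collect()
}

fn main() {
    // Three acceptors, quorums are all pairs.
    let quorums = vec![vec![0, 1], vec![0, 2], vec![1, 2]];
    let init = State { votes: BTreeSet::new(), max_bal: vec![-1; 3] };

    // Acceptors 0 and 1 vote for value 7 in ballot 0: 7 is chosen.
    let s1 = vote(&init, &quorums, 0, 0, 7).expect("any value is safe initially");
    let s2 = vote(&s1, &quorums, 1, 0, 7).expect("same value, same ballot");
    assert_eq!(chosen(&s2, &quorums).into_iter().collect::<Vec<_>>(), vec![7]);

    // In ballot 1, a different value is not safe, but 7 still is.
    assert!(vote(&s2, &quorums, 2, 1, 9).is_none());
    assert!(vote(&s2, &quorums, 2, 1, 7).is_some());
}
```

The scenario in `main` is the interesting one: once 7 has been chosen in ballot 0, the only quorum that has closed ballot 0 contains voters for 7, so no other value passes the safety check in ballot 1.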

+

This is the hardest part of the article. +Take time to fully understand Ballot Vote.

+ + +

Rigorously proving that Ballot Voting satisfies Consensus would be tedious: the specification is large, and the proof would necessarily use every single piece of the spec! +But let's add some hand-waving. +Again, we want to provide a homomorphism from Ballot Voting to Consensus. +Cases where the image of a step is a stuttering step (the set of chosen values is the same) are obvious. +It's also obvious that the set of chosen values never decreases (we never remove votes, so a value cannot become unchosen). +It also increases by at most one value with each step.

+

The complex case is to prove that, if currently only v1 is chosen, no other v2 can be chosen as a result of the current step. +Suppose the contrary, let v2 be the newly chosen value, and v1 be a different value chosen some time ago. +v1 and v2 can't belong to the same ballot, because every ballot contains votes only for a single value (this needs proof!). +Let's say they belong to b1 and b2, and that b1 < b2. +Note that v2 might belong to b1: nothing prevents a smaller ballot from finishing later. +When we chose v2 for b2, it was safe. +This means that some quorum either promised not to vote in b1 (but then v1 couldn't have been chosen in b1), or someone from the quorum voted for v2 in b1 (but then v1 = v2 (proving this might require repeated application of the safety condition)).

+

Ok, but is this better than Majority Voting? +Can Ballot Voting get stuck? +No: if at least one quorum of machines is online, they can bump their maxBal to a ballot bigger than any existing one. +After they do this, there necessarily will be a safe value relative to this quorum, which they can then vote on.

+

However, Ballot Voting is prone to a live lock: if acceptors continue to bump maxBal instead of voting, they'll never select any value. +In fact, in the current formulation one needs to be pretty lucky to not get stuck. +To finish voting, there needs to be a quorum which can vote in ballot b, but not in any smaller ballot, and in the above spec this can only happen by luck.

+

It is impossible to completely eliminate live locks without assumptions about real time. However, when we implement Ballot Voting with real message passing, we try to reduce the probability of a live lock.

+
+
+ +

+ Paxos for Real +

+

One final push left! +Given the specification of Ballot Voting, how do we implement it using message passing? +Specifically, how do we implement the logic for selecting the first (safe) value for the ballot?

+

The idea is to have a designated leader for each ballot. +As there are many ballots, we don't need a leader selection algorithm, and can just statically assign ballot leaders. +For example, if there are N acceptors, acceptor 0 can lead ballots 0, N, 2N, …, acceptor 1 can lead 1, N + 1, 2N + 1, … etc.

+

To select a value for ballot b, the ballot's leader broadcasts a message to initiate the ballot. +Upon receiving this message, each acceptor advances its maxBal to b - 1, and sends the leader its latest vote, unless the acceptor has already made a promise to not vote in b. +If the leader receives replies from some quorum, it can be sure that this quorum won't vote in smaller ballots. +Besides, the leader knows the quorum's votes, so it can pick a safe value.

+

In other words, the practical trick for picking a safe value is to ask some quorum to abstain from voting in small ballots and to pick a value consistent with votes already cast. +This is the first phase of Paxos, consisting of two message types, 1a and 1b.

+

The second phase is to ask the quorum to cast the votes. +The leader picks a safe value and broadcasts it for the quorum. +Quorum members vote for the value, unless in the meantime they happened to promise to a leader of the bigger ballot to not vote. +After a member voted, it broadcasts its vote. +When a quorum of votes is observed, the value is chosen and the consensus is reached. +This is the second phase of Paxos with messages 2a and 2b.

+

Let's write this in math! +To model message passing, we will use the msgs variable: a set of messages which have ever been sent. +Sending a message is adding it to this set. +Receiving a message is asserting that it is contained in the set. +By not removing messages, we model reorderings and duplications.

+

The messages themselves will be represented by records. For example, phase 1a message which initiates voting in ballot b will look like this:

+ +
+ + +
{type: "1a", bal: b}
+ +
+

Another bit of state we'll need is lastVote: for each acceptor, what was the last ballot the acceptor voted in, together with the corresponding vote. +It will be null if the acceptor hasn't voted.

+

Without further ado,

+ +
+
Paxos
+ + +
Sets:
+  𝔹       -- Numbered set of ballots (for example, ℕ)
+  𝕍       -- Arbitrary set of values
+  𝔸       -- Finite set of acceptors
+  ℚ ⊆ 2^𝔸 -- Set of quorums
+
+  -- Sets of messages for each of the four subphases
+  Msgs1a ≡ {type: {"1a"}, bal: 𝔹}
+
+  Msgs1b ≡ {type: {"1b"}, bal: 𝔹, acc: 𝔸,
+            vote: {bal: 𝔹, val: 𝕍} ∪ {null}}
+
+  Msgs2a ≡ {type: {"2a"}, bal: 𝔹, val: 𝕍}
+
+  Msgs2b ≡ {type: {"2b"}, bal: 𝔹, val: 𝕍, acc: 𝔸}
+
+Assume:
+  ∀ q1, q2 ∈ ℚ: q1 ∩ q2 ≠ {}
+
+Vars:
+  -- Set of all messages sent so far
+  msgs ∈ 2^(Msgs1a ∪ Msgs1b ∪ Msgs2a ∪ Msgs2b)
+
+  -- Function that maps acceptors to ballot numbers or -1
+  -- maxBal :: 𝔸 -> 𝔹 ∪ {-1}
+  maxBal ∈ (𝔹 ∪ {-1})^𝔸
+
+  -- Function that maps acceptors to their last vote
+  -- lastVote :: 𝔸 -> {bal: 𝔹, val: 𝕍} ∪ {null}
+  lastVote ∈ ({bal: 𝔹, val: 𝕍} ∪ {null})^𝔸
+
+Send(m) ≡ msgs' = msgs ∪ {m}
+
+Phase1a(b) ≡
+    Send({type: "1a", bal: b})
+  ∧ maxBal' = maxBal
+  ∧ lastVote' = lastVote
+
+Phase1b(a) ≡
+  ∃ m ∈ msgs:
+      m.type = "1a" ∧ maxBal(a) < m.bal
+    ∧ maxBal' = λ a1 ∈ 𝔸: if a = a1
+                            then m.bal - 1
+                            else maxBal(a1)
+    ∧ lastVote' = lastVote
+    ∧ Send({type: "1b", bal: m.bal, acc: a, vote: lastVote(a)})
+
+Phase2a(b, v) ≡
+   ¬∃ m ∈ msgs: m.type = "2a" ∧ m.bal = b
+  ∧ ∃ q ∈ ℚ:
+    let
+      qmsgs  ≡ {m ∈ msgs: m.type = "1b" ∧ m.bal = b ∧ m.acc ∈ q}
+      qvotes ≡ {m ∈ qmsgs: m.vote ≠ null}
+    in
+        ∀ a ∈ q: ∃ m ∈ qmsgs: m.acc = a
+      ∧ (  qvotes = {}
+         ∨ ∃ m ∈ qvotes:
+               m.vote.val = v
+             ∧ ∀ m1 ∈ qvotes: m1.vote.bal <= m.vote.bal)
+      ∧ Send({type: "2a", bal: b, val: v})
+      ∧ maxBal' = maxBal
+      ∧ lastVote' = lastVote
+
+Phase2b(a) ≡
+  ∃ m ∈ msgs:
+      m.type = "2a" ∧ maxBal(a) < m.bal
+    ∧ maxBal' = λ a1 ∈ 𝔸: if a = a1 then m.bal else maxBal(a1)
+    ∧ lastVote' = λ a1 ∈ 𝔸: if a = a1
+                              then {bal: m.bal, val: m.val}
+                              else lastVote(a1)
+    ∧ Send({type: "2b", bal: m.bal, val: m.val, acc: a})
+
+Init ≡
+    msgs = {}
+  ∧ maxBal   = λ a ∈ 𝔸: -1
+  ∧ lastVote = λ a ∈ 𝔸: null
+
+Next ≡
+    ∃ b ∈ 𝔹:
+        Phase1a(b) ∨ ∃ v ∈ 𝕍: Phase2a(b, v)
+  ∨ ∃ a ∈ 𝔸:
+        Phase1b(a) ∨ Phase2b(a)
+
+chosen ≡
+  {v ∈ 𝕍: ∃ q ∈ ℚ, b ∈ 𝔹: AllVotedFor(q, b, v)}
+
+AllVotedFor(q, b, v) ≡
+  ∀ a ∈ q: (a, b, v) ∈ votes
+
+votes ≡
+  let
+    msgs2b ≡ {m ∈ msgs: m.type = "2b"}
+  in
+    {(m.acc, m.bal, m.val): m ∈ msgs2b}
+
+-- Notation
+-- {f1: value1, f2: value2} -- a record with .f1 and .f2 fields
+-- {f1: Set1, f2: Set2}     -- set of records
+-- let name ≡ def in expr   -- local definition of name
+ +
+

Let's go through each of the phases.

+

Phase1a initiates ballot b. +It is executed by the ballot's leader, but there's no need to model who exactly the leader is, as long as it is unique. +This stage simply broadcasts a 1a message.

+

Phase1b is executed by an acceptor a. +If a receives a 1a message for ballot b and it can vote in b, then it replies with its lastVote. +If it can't vote (it has already started some larger ballot), it simply doesn't respond. +If enough acceptors don't respond, the ballot will get stuck, but some other ballot might succeed.

+

Phase2a is the tricky bit: it checks if the value v is safe for ballot b.

+

First, we need to make sure that we haven't already initiated Phase2a for this ballot. +Otherwise, we might initiate Phase2a for different values. +Here is the bit where it is important that the ballot's leader is stable. +The leader needs to remember if it has already picked a safe value.

+

Then, we collect 1b messages from some quorum (we need to make sure that every quorum member has sent a 1b message for this ballot). +Value v is safe if the whole quorum didn't vote (vote is null), or if it is the value of the latest vote (the one with the largest ballot) among the quorum members. +We know that quorum members won't vote in earlier ballots, because they had increased maxBal before sending 1b messages.

+
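The safety check in Phase2a boils down to "take the value of the highest-ballot vote reported by the quorum, or anything if nobody voted". Here is a minimal Rust sketch of that rule, assuming the leader has already collected one 1b reply from every member of a quorum. The `Vote` and `Safe` types and the `safe_value` function are illustrative names, not part of the spec.

```rust
/// A vote reported in a 1b reply: the ballot it was cast in and the value.
#[derive(Clone, Debug, PartialEq)]
struct Vote {
    bal: u64,
    val: String,
}

/// What the leader is allowed to propose after hearing from a whole quorum.
#[derive(Debug, PartialEq)]
enum Safe {
    /// Nobody in the quorum has voted yet: every value is safe.
    Any,
    /// Some member has voted: only the value of the latest vote is safe.
    Only(String),
}

/// `quorum_votes` holds the `vote` field of each 1b reply (None models null).
fn safe_value(quorum_votes: &[Option<Vote>]) -> Safe {
    // Skip the null votes and pick the vote with the largest ballot.
    match quorum_votes.iter().flatten().max_by_key(|vote| vote.bal) {
        None => Safe::Any,
        Some(latest) => Safe::Only(latest.val.clone()),
    }
}
```

The sketch returns the constraint rather than checking a given v, which is the shape a real leader would likely want: it decides what to propose instead of guessing a value and testing it.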

If the value indeed turns out to be safe, we broadcast 2a message for this ballot and value.

+

Finally, in Phase2b an acceptor a votes for this value, if its maxBal is still good. +The bookkeeping is updating maxBal, lastVote, and sending the 2b message.

+
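To make the acceptor side concrete, here is a small Rust sketch of the state transitions in Phase1b and Phase2b. It follows the spec's convention that maxBal starts at -1 and that a 1a message for ballot b moves it to b - 1. The `Acceptor` and `Reply1b` types and the method names are assumptions of this sketch; message passing itself is left out.

```rust
/// The 1b reply an acceptor sends back to the ballot's leader.
#[derive(Debug, PartialEq)]
struct Reply1b {
    bal: i64,
    /// Last vote as (ballot, value); None models null.
    vote: Option<(i64, String)>,
}

struct Acceptor {
    max_bal: i64,
    last_vote: Option<(i64, String)>,
}

impl Acceptor {
    fn new() -> Acceptor {
        Acceptor { max_bal: -1, last_vote: None }
    }

    /// Phase1b: promise not to vote in ballots below `bal`, report last vote.
    /// Returns None when the acceptor simply does not respond.
    fn on_1a(&mut self, bal: i64) -> Option<Reply1b> {
        if self.max_bal < bal {
            self.max_bal = bal - 1;
            Some(Reply1b { bal, vote: self.last_vote.clone() })
        } else {
            None
        }
    }

    /// Phase2b: vote for `val` in `bal` if no larger promise was made.
    /// Returns true when a 2b message should be broadcast.
    fn on_2a(&mut self, bal: i64, val: &str) -> bool {
        if self.max_bal < bal {
            self.max_bal = bal;
            self.last_vote = Some((bal, val.to_string()));
            true
        } else {
            false
        }
    }
}
```

Note that after voting in ballot b the acceptor's `max_bal` equals b, so a duplicate 2a message for the same ballot is ignored.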

The set of 2b messages corresponds to the votes variable of the Ballot Voting specification.

+
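The `chosen` definition from the spec can also be spelled out in Rust. This is only a sketch: votes are modelled as (acceptor, ballot, value) triples extracted from 2b messages, quorums as lists of acceptor ids, and the function name and type aliases are invented here. It assumes every quorum is non-empty, which follows from the intersection assumption.

```rust
use std::collections::HashSet;

type Acc = u32;
type Ballot = u64;

/// Values for which some quorum voted unanimously within a single ballot.
fn chosen(
    votes: &HashSet<(Acc, Ballot, String)>,
    quorums: &[Vec<Acc>],
) -> HashSet<String> {
    let mut result = HashSet::new();
    // A chosen value must appear in at least one vote, so it is enough
    // to try the (ballot, value) pair of every vote.
    for (_, bal, val) in votes {
        let all_voted_for = quorums.iter().any(|quorum| {
            quorum
                .iter()
                .all(|acc| votes.contains(&(*acc, *bal, val.clone())))
        });
        if all_voted_for {
            result.insert(val.clone());
        }
    }
    result
}
```

Two votes for the same value in different ballots do not add up to a choice, which is why the ballot is part of the lookup key.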
+
+ +

+ Notes on Notes +

+

There's a famous result called FLP impossibility: Impossibility of Distributed Consensus with One Faulty Process. +But we've just presented the Paxos algorithm, which works as long as more than half of the processes are alive. +What gives? +The FLP theorem states that there's no consensus algorithm with finite behaviors. +Stated in a positive way, any asynchronous distributed consensus algorithm is prone to live-lock. +This is indeed the case for Paxos.

+

Liveness can be improved under partial synchrony assumptions. +That is, if we give each process a good enough clock, such that we can say things like "if no process fails, Paxos completes in t seconds". +If this is the case, we can fix live locking (ballots conflicting with each other) by using a naive leader selection algorithm to select the single acceptor which can initiate ballots. +If we don't reach consensus after t seconds, we can infer that someone has failed and re-run naive leader selection. +If we are unlucky, naive leader selection will produce two leaders, but this won't be a problem for safety.

+

Paxos requires atomicity and durability to function correctly. +For example, once the leader has picked a safe value and has broadcast a 2a message, it should persist the selected value. +Otherwise, if it goes down and then resurrects, it might choose a different value. +How to make a choice of value atomic and durable? +Write it to a local database! +How to make a local transaction atomic and durable? +Write it first into the write-ahead log! +How to write something to WAL? +Using the write syscall/DMA. +What happens if the power goes down exactly in the middle of the write operation? +Well, we can write a chunk of bytes with a checksum! +Even if the write itself is not atomic, a checksummed write is! +If we read the record from disk and the checksum matches, then the record is valid.

+
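A checksummed record can be sketched as follows. The record layout (length, payload, checksum) and the function names are assumptions made for this example, and FNV-1a stands in for a production checksum such as CRC32 only to keep the code dependency-free.

```rust
/// FNV-1a hash, used here as a stand-in for a real checksum like CRC32.
fn checksum(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for &byte in bytes {
        hash ^= byte as u32;
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

/// Layout: 4-byte little-endian length, payload, 4-byte little-endian checksum.
fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 8);
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(payload);
    out.extend_from_slice(&checksum(payload).to_le_bytes());
    out
}

fn read_u32(bytes: &[u8]) -> u32 {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

/// Returns the payload only if the record is complete and the checksum matches.
/// A torn or corrupted write yields None, as if it never happened.
fn decode_record(bytes: &[u8]) -> Option<&[u8]> {
    let len = read_u32(bytes.get(..4)?) as usize;
    let payload_end = 4usize.checked_add(len)?;
    let payload = bytes.get(4..payload_end)?;
    let stored = read_u32(bytes.get(payload_end..payload_end.checked_add(4)?)?);
    if stored == checksum(payload) {
        Some(payload)
      } else {
        None
    }
}
```

The reader treats a record that fails validation as absent, which is what makes the write look all-or-nothing from the point of view of recovery. A checksum only makes undetected corruption very unlikely, not impossible.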

I use a slightly different definition of maxBal (less by one) than the one in the linked lecture, don't get confused about this!

+

See https://github.com/matklad/paxosnotes for TLA specs.

+
+
+
+ + + + + diff --git a/2020/11/11/yde.html b/2020/11/11/yde.html new file mode 100644 index 00000000..5e894603 --- /dev/null +++ b/2020/11/11/yde.html @@ -0,0 +1,224 @@ + + + + + + + Why an IDE? + + + + + + + + + + + + +
+ +
+ +
+
+ +

Why an IDE?

+

Some time ago I wrote a reddit comment explaining the benefits of IDEs. +Folks refer to it from time to time, so I decided to edit it into an article form. +Enjoy!

+

I think I have a rather balanced perspective on IDEs. +I used to be a heavy Emacs user (old config, current config). +I worked at JetBrains on IntelliJ Rust for several years. +I used evil mode and vim for a bit, and tried tmux and kakoune. +Nowadays, I primarily use VS Code to develop rust-analyzer: LSP-based editor-independent IDE backend for Rust.

+

I will be focusing on IntelliJ family of IDEs, as I believe these are the most advanced IDEs today.

+

The main distinguishing feature of IntelliJ is semantic understanding of code. +The core of IntelliJ is a compiler which parses, type checks and otherwise understands your code. +PostIntelliJ is the canonical post about this. +That article also refutes the claim that Smalltalk IDE is the best we've ever had.

+

Note that semantic understanding is mostly unrelated to the traditional interpretation of IDE as Integrated Development Environment. +I personally don't feel that the Integrated bit is all that important. +I commit&push from the command line using Julia scripts, rebase in magit, and do code reviews in a browser. +If anything, there's ample room for improvement for the integration bits. +For me, the I in IDE stands for intelligent, smart.

+

Keep in mind this terminology difference. +I feel it is a common source of misunderstanding. +“Unix and command line can do anything an IDE can do” is correct about integrated bits, but is wrong about semantic bits.

+

Traditional editors like Vim or Emacs understand programming languages very approximately, mostly via regular expressions. +For me, this feels very wrong. +It's common knowledge that HTML shall not be parsed with regex. +Yet this is exactly what happens every time one does vim index.html with syntax highlighting on. +I sincerely think that almost every syntax highlighter out there is wrong and we, as an industry, should do better. +I also understand that this is a tall order, but I do my best to change the status quo here :-)

+

These are mostly theoretical concerns though. +The question is, does semantic understanding help in practice? +I am pretty sure that it is non-essential, especially for smaller code bases. +My first non-trivial Rust program was written in Emacs, and it was fine. +Most of rust-analyzer was written using pretty spartan IDE support. +There are a lot of insanely productive folks who are like “sometimes I type vim, sometimes I type vi, they are sufficiently similar”. +Regex-based syntax highlighting and regex-based fuzzy symbol search (ctags) get you a really long way.

+

However, I do believe that features unlocked by deep understanding of the language help. +The funniest example here is extend/shrink selection. +This feature allows you to extend the current selection to the next encompassing syntactic construct. +It's the simplest feature a PostIntelliJ IDE can have, it only needs the parser. +But it is sooo helpful when writing code, it just completely blows vim's text objects out of the water, especially when combined with multiple cursors. +In a sense, this is structural editing which works for text.

+ +
+ + +
+
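To show why extend selection needs nothing beyond a parser, here is a rough Rust sketch. It assumes the parser has already produced the text ranges of all syntax nodes; `Range` and `extend_selection` are names invented for this illustration and are not IntelliJ or rust-analyzer APIs.

```rust
/// Half-open byte range `[start, end)` in the source text.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Range {
    start: usize,
    end: usize,
}

impl Range {
    fn contains(self, other: Range) -> bool {
        self.start <= other.start && other.end <= self.end
    }

    fn len(self) -> usize {
        self.end - self.start
    }
}

/// Returns the smallest syntax node that covers `selection`
/// and is not equal to it, or None at the root.
fn extend_selection(nodes: &[Range], selection: Range) -> Option<Range> {
    nodes
        .iter()
        .copied()
        .filter(|node| node.contains(selection) && *node != selection)
        .min_by_key(|node| node.len())
}
```

For the text `foo(bar + 1, baz)` with the cursor inside `bar`, repeated calls would select `bar`, then `bar + 1`, then the argument list, then the whole call. Shrink selection is the same walk in reverse, typically implemented by remembering the previous selections.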

If you add further knowledge of the language into the mix, you'll get the assists system: micro-refactorings which are available in a particular context. +For example, if the cursor is on a comma in a list of function arguments, you can alt+enter > swap arguments, and the order of arguments will be changed in the declaration and on various call-sites as well. +(See this post to learn how assists are implemented).

+

These small dwim things add up to a really nice editing experience, where you mostly express the intention, and the IDE deals with boring syntactical aspects of code editing:

+ +
+ + +
+

For larger projects, complex refactors are a huge time-saver. +Doing project-wide renames and signature changes automatically and without thinking reduces the cost of keeping the code clean.

+

Another transformative experience is navigation. +In IntelliJ, you generally don't open a file. +Instead you think directly in terms of functions, types and modules, and navigate to those using file structure, goto symbol, go to definition/implementation/type, etc:

+

https://www.jetbrains.com/help/idea/navigating-through-the-source-code.html

+

When I used Emacs, I really admired its buffer management facilities, because they made opening a file I want a breeze. +When I later switched to IntelliJ, I stopped thinking in terms of a set of opened files altogether. +I disabled editor tabs and started using editor splits less often: you don't need bookmarks if you can just find things.

+

For me, there's one aspect of traditional editors which is typically not matched in IDEs out of the box: basic cursor motion. +Using arrow keys for that is slow and flow-breaking, because one needs to move the hand from the home row. +Even Emacs' horrific C-p, C-n are a big improvement, and vim's hjkl go even further. +One fix here is to configure each tool to use your favorite shortcuts, but this is a whack-a-mole game. +What I do is remap CapsLock to act as an extra modifier, such that ijkl are arrow keys. +(There are also keyboards with hardware support for this). +This works in all applications the same way. +Easy motion / ace jump functionality for jumping to any visible character is also handy, and usually is available via a plugin.

+

Recent advancements with LSP protocol promise to give one the best of both worlds, where semantic-aware backend and light-weight editor frontend are different processes, which can be mixed and matched. +This is nice in theory, but not as nice in practice as IntelliJ yet, mostly because IntelliJ is way more polished.

+

To give a simple example, in IntelliJ for go to symbol by fuzzy name functionality, I can filter the search scope by:

+ +

VS Code and LSP simply do not have capabilities for such filters yet, they have to be bolted on using hacks. +Support for LSP in other editors is even more hit-and-miss.

+

LSP did achieve a significant breakthrough: it made people care about implementing IDE backends. +Experience shows that re-engineering an existing compiler to power an IDE is often impossible, or isomorphic to a rewrite. +How a compiler talks to an editor is the smaller problem. +The hard one is building a compiler that can do IDE stuff in the first place. +Check out this post for some of the technical details. +Starting with this use-case in mind saves a lot of effort down the road.

+

This I think is a big deal. +I hypothesize that the reason why IDEs do not completely dominate tooling landscape is the lack of good IDE backends.

+

If we look at the set of languages fairly popular recently, a significant fraction of them is dynamically typed: PHP, JavaScript, Python, Ruby. +The helpfulness of an IDE for dynamically typed languages is severely limited: while approximations and heuristics can get you a long way, you still need humans in the loop to verify the IDE's guesses.

+

There's C++, but its templates are effectively dynamically typed, with exactly the same issues (and a very complex base language to boot). +Curiously, C looks like a language for which implementing a near-perfect IDE is pretty feasible. +I don't know why it didn't happen before CLion.

+

This leaves C# and Java. +Indeed, these languages are dominated by IDEs. +There's a saying that you can't write Java without an IDE. +I think it gets the causation direction backwards: Java is one of the few languages for which it is possible to implement a great IDE without great pain. +Supporting evidence here is Go. +According to survey results, text editors are steadily declining in popularity in favor of IDEs.

+

I think this is because Go actually has good IDEs. +This is possible because the language is sufficiently statically typed for an IDE to be a marked improvement. +Additionally, the language is very simple, so the amount of work you need to put in to make a decent IDE is much lower than for other languages. +If you have something like JavaScript… +Well, you first need to build an alternative language for which you can actually implement an IDE (TypeScript) and only then you can build the IDE itself (VS Code).

+
+
+ + + + + diff --git a/2020/12/12/notes-on-lock-poisoning.html b/2020/12/12/notes-on-lock-poisoning.html new file mode 100644 index 00000000..f345ee37 --- /dev/null +++ b/2020/12/12/notes-on-lock-poisoning.html @@ -0,0 +1,231 @@ + + + + + + + Notes On Lock Poisoning + + + + + + + + + + + + +
+ +
+ +
+
+ +

Notes On Lock Poisoning

+

Rust's libs team is considering overhauling the std::sync module. +As a part of this effort, they are launching a lock poisoning survey.

+

https://blog.rust-lang.org/2020/12/11/lock-poisoning-survey.html

+

This post is an extended response to that survey. +It is not well-edited :-)

+
+ +

+ Panics Should Propagate +

+

The Midori error model makes a sharp distinction between two kinds of errors:

+
    +
  • +bugs in the program, like indexing an array with -92
  • +error conditions in the program's environment (reading a file which doesn't exist)
+

In Rust, those correspond to panics and Results. +It's important to not mix the two.

+

Sadly, I think std does mix them in its sync API. +The following APIs convert panics to recoverable results:

+
    +
  • +Mutex::lock
  • +thread::JoinHandle::join
  • +mpsc::Sender::send
+

All those APIs return a Result when the other thread panicked. +This leads to people using ? with these methods, using recoverable error handling for bugs in the program.

+

In my mind, a better design would be to make those APIs panic by default. +Sometimes synchronization points also happen to be failure isolation boundaries. +More verbose result-returning catching_lock, catching_join, catching_send would work for those special cases.

+

If std::Mutex did implement lock poisoning, but the lock method returned a LockGuard<T>, rather than Result<LockGuard<T>, PoisonError>, then we wouldn't be discussing poisoning in the Rust book, in every mutex example, and wouldn't consider changing the status quo. +At the same time, we'd preserve the safer semantics of lock poisoning.

+

There's an additional consideration here. +In a single-threaded program, panic propagation is linear. +One panic is unwound past a sequence of frames. +If we get a second panic in some Drop, the process aborts.

+

In a multi-threaded program, the stack is tree-shaped. +What should happen if one of the three parallel threads panics? +I believe the right semantics here is that siblings are cancelled, and then the panic is propagated to the parent. +How to implement cancellation is an open question. +If two children panic, we should propagate a pair of panics.

+
+
+ +

+ Almost UnwindSafe +

+

A topic closely related to lock poisoning is unwinding safety: the UnwindSafe and RefUnwindSafe traits. +I want to share an amusing story of how this machinery almost, but not quite, saved my bacon.

+

rust-analyzer implements cancellation via unwinding. +After a user types something and we have new code to process, we set a global flag. +Long-running background tasks like syntax highlighting read this flag and, if it is set, panic with a struct Cancelled payload. +We use resume_unwind and not panic to avoid printing backtrace. +After the stack is unwound, we can start processing new code.

+
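The cancellation scheme described above can be sketched with the standard library alone. This is a simplified illustration rather than rust-analyzer's actual code: `check_cancelled` and `run_cancellable` are made-up names, and the flag is passed explicitly instead of being global. `std::panic::resume_unwind` is used because, unlike `panic!`, it does not invoke the panic hook and so prints nothing.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};

/// Panic payload that marks an unwind as a cancellation, not a bug.
#[derive(Debug, PartialEq)]
struct Cancelled;

/// Called periodically by long-running tasks.
fn check_cancelled(flag: &AtomicBool) {
    if flag.load(Ordering::SeqCst) {
        panic::resume_unwind(Box::new(Cancelled));
    }
}

/// Runs `task`, turning a `Cancelled` unwind into an `Err`.
/// Any other panic is a genuine bug and keeps propagating.
fn run_cancellable<T>(task: impl FnOnce() -> T) -> Result<T, Cancelled> {
    match panic::catch_unwind(AssertUnwindSafe(task)) {
        Ok(value) => Ok(value),
        Err(payload) => match payload.downcast::<Cancelled>() {
            Ok(_) => Err(Cancelled),
            Err(other) => panic::resume_unwind(other),
        },
    }
}
```

The `AssertUnwindSafe` wrapper is exactly the kind of manual promise the rest of this story warns about: it silences the compiler's unwind-safety check, so the data touched by `task` really does have to tolerate being abandoned mid-operation.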

This means that rust-analyzer's data, stored in the Db type, needs to be unwind safe.

+

One day, while I was idly hacking on rust-analyzer during Rust all-hands, I noticed a weird compilation error, telling me that Db doesn't implement the corresponding trait. +What's worse, removing the target directory fixed the bug. +This was an instance of incorrect incremental compilation.

+

The problem stemmed from two issues:

+
    +
  • +UnwindSafe and RefUnwindSafe are auto traits, and inference rules for those are complicated
  • +Db type has a curiously recurring template structure
+

With incremental compilation in the mix, something somewhere went wrong.

+

The compiler bug was fixed after several months, but, to work around it in the meantime, we've added a manual impl UnwindSafe for Db which masked the bug.

+

A couple more months passed, and we started integrating chalk into rust-analyzer. +At that time, chalk had its own layer of caching, in addition to the incremental compilation of rust-analyzer itself. +So we had something like this:

+ +
+ + +
struct Db {
+    solver: parking_lot::Mutex<ChalkSolver>,
+    ...
+}
+ +
+

(We used parking_lot for perf, and to share mutex impl between salsa and rust-analyzer).

+

Now, one of the differences between std::Mutex and parking_lot::Mutex is lock poisoning. +And that means that std::Mutex is unwind safe (as it just becomes poisoned), while parking_lot::Mutex is not. +Chalk used some RefCells internally, so it wasn't unwind safe. +So the whole Db stopped being UnwindSafe after the addition of chalk. +But because we had that manual impl UnwindSafe for Db, we haven't noticed this.

+

And that led to a heisenbug. +If cancellation happened during trait solving, we unwound past ChalkSolver. +And, as chalk didn't have strict exception safety guarantees, that messed up its internal state. +So the next trait solving query would observe really weird errors like index out of bounds inside chalk.

+

The solution was to:

+
    +
  • +remove the manual impl (by that time the underlying compiler bug was fixed).
  • +get the Db: !UnwindSafe expected error.
  • +replace parking_lot::Mutex with std::Mutex to get unwind-safety.
  • +change calls to .lock to propagate cancellation.
+

The last point is interesting, it means that we need support for recoverable poisoning in this case. +We need to understand that the other thread was cancelled mid-operation (so that chalk's state might be inconsistent). +And we also need to re-raise the panic with a specific payload: the Cancelled struct. +This is because the situation is not a bug.

+

Discussion on /r/rust.

+
+
+
+ + + + + diff --git a/2020/12/28/csdi.html b/2020/12/28/csdi.html new file mode 100644 index 00000000..416422da --- /dev/null +++ b/2020/12/28/csdi.html @@ -0,0 +1,356 @@ + + + + + + + Call Site Dependency Injection + + + + + + + + + + + + +
+ +
+ +
+
+ +

Call Site Dependency Injection

+

This post documents the call site dependency injection pattern. +It is a rather low level specimen and has little to do with enterprise DI. +The pattern is somewhat Rust-specific.

+

Usually, when you implement a type which needs some user-provided functionality, the first thought is to supply it in constructor:

+ +
+ + +
struct Engine {
+    config: Config,
+    ...
+}
+
+impl Engine {
+    fn new(config: Config) -> Engine { ... }
+    fn go(&mut self) { ... }
+}
+ +
+

In this example, we implement Engine and the caller supplies Config.

+

An alternative is to pass the dependency to every method call:

+ +
+ + +
struct Engine {
+    ...
+}
+
+impl Engine {
+    fn new() -> Engine { ... }
+    fn go(&mut self, config: &Config) { ... }
+}
+ +
+

In Rust, the latter (call-site injection) sometimes works with lifetimes better. +Let's see the examples!

+
+ +

+ Lazy Field +

+

In the first example, we want to lazily compute a field's value based on other fields. +Something like this:

+ +
+ + +
struct Widget {
+    name: String,
+    name_hash: Lazy<u64>,
+}
+
+impl Widget {
+    fn new(name: String) -> Widget {
+        Widget {
+            name,
+            name_hash: Lazy::new(|| {
+                compute_hash(&self.name)
+            }),
+        }
+    }
+}
+ +
+

The problem with this design is that it doesn't work in Rust. +The closure in Lazy needs access to self, and that would create a self-referential data structure!

+

The solution is to supply the closure at the point where the Lazy is used:

+ +
+ + +
struct Widget {
+    name: String,
+    name_hash: OnceCell<u64>,
+}
+
+impl Widget {
+    fn new(name: String) -> Widget {
+        Widget {
+            name,
+            name_hash: OnceCell::new(),
+        }
+    }
+    fn name_hash(&self) -> u64 {
+        *self.name_hash.get_or_init(|| {
+            compute_hash(&self.name)
+        })
+    }
+}
+ +
+
+
+ +

+ Indirect Hash Table +

+

The next example is about plugging a custom hash function into a hash table. +In Rust's standard library, this is only possible on the type level, by implementing the Hash trait for a type. +A more general design would be to parameterize the table with a hash function at run-time. +This is what C++ does. +However, in Rust this won't be general enough.

+

Consider a string interner, which stores strings in a vector and additionally maintains a hash-based index:

+ +
+ + +
struct Interner {
+    vec: Vec<String>,
+    set: HashSet<usize>,
+}
+
+impl Interner {
+    fn intern(&mut self, s: &str) -> usize { ... }
+    fn lookup(&self, i: usize) -> &str { ... }
+}
+ +
+

The set field stores the strings in a hash table, but it represents them using indices into neighboring vec.

+

Constructing the set with a closure won't work for the same reason Lazy didn't work: this creates a self-referential structure. +In C++ there exists a work-around: it is possible to box the vec and share a stable pointer between Interner and the closure. +In Rust, that would create aliasing, preventing the use of &mut Vec.

+

Curiously, using a sorted vec instead of a hash works with std APIs:

+ +
+ + +
struct Interner {
+    vec: Vec<String>,
+    // Invariant: sorted
+    set: Vec<usize>,
+}
+
+impl Interner {
+    fn intern(&mut self, s: &str) -> usize {
+        let idx = self.set.binary_search_by(|&idx| {
+            self.vec[idx].cmp(s)
+        });
+        match idx {
+            Ok(idx) => self.set[idx],
+            Err(idx) => {
+                let res = self.vec.len();
+                self.vec.push(s.to_string());
+                self.set.insert(idx, res);
+                res
+            }
+        }
+    }
+    fn lookup(&self, i: usize) -> &str { ... }
+}
+ +
+

This is because the closure is supplied at the call site rather than at the construction site.

+

The hashbrown crate provides this style of API for hashes via RawEntry.

+
+
+ +

+ Per Container Allocators +

+

The third example is from the Zig programming language. +Unlike Rust, Zig doesn't have a blessed global allocator. +Instead, containers in Zig come in two flavors. +The Managed flavor accepts an allocator as a constructor parameter and stores it as a field +(Source). +The Unmanaged flavor adds an allocator parameter to every method +(Source).

+

The second approach is more frugal: it is possible to use a single allocator reference with many containers.

+
+
+ +

+ Fat Pointers +

+

The final example comes from the Rust language itself. +To implement dynamic dispatch, Rust uses fat pointers, which are two words wide. +The first word points to the object, the second one to the vtable. +These pointers are manufactured at the point where a concrete type is used generically.

+

This is different from C++, where vtable pointer is embedded into the object itself during construction.

+
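The difference is easy to observe from Rust itself. In the sketch below the `Shape` trait and `Square` type are invented for the example; the sizes it reports match the two-word description above on current compilers.

```rust
use std::mem::size_of;

trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

/// Returns the sizes of a thin reference, a fat reference, and the object.
fn pointer_widths() -> (usize, usize, usize) {
    (
        size_of::<&Square>(),    // one word: data pointer only
        size_of::<&dyn Shape>(), // two words: data pointer + vtable pointer
        size_of::<Square>(),     // just the f64: no vtable pointer inside
    )
}

/// The vtable pointer is attached here, at the call site that
/// converts `&Square` into `&dyn Shape`, not when `Square` is built.
fn dynamic_area(square: &Square) -> f64 {
    let shape: &dyn Shape = square;
    shape.area()
}
```

Because the object carries no vtable pointer, the same `Square` value can be viewed through several different trait objects, each call site supplying its own vtable.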
+

Having seen all these examples, I am warming up to Scala-style implicit parameters. +Consider this hypothetical bit of Rust code with Zig-style vectors:

+ +
+ + +
{
+    let mut a = get_allocator();
+    let mut xs = Vec::new();
+    let mut ys = Vec::new();
+    xs.push(&mut a, 1);
+    ys.push(&mut a, 2);
+}
+ +
+

The problem here is Drop: freeing the vectors requires access to the allocator, and it's unclear how to provide one. +Zig dodges the problem by using the defer statement rather than destructors. +In Rust with implicit parameters, I imagine the following would work:

+ +
+ + +
impl<implicit a: &mut Allocator, T> Drop for Vec<T>
+ +
+
+

To conclude, I want to share one last example where CSDI thinking helped me to discover a better application-level architecture.

+

A lot of rust-analyzer's behavior is configurable. +There are toggles for inlay hints, completion can be tweaked, and some features work differently depending on the editor. +The first implementation was to store a global Config struct together with the rest of analysis state. +Various subsystems then read bits of this Config. +To avoid coupling distinct features together via this shared struct, config keys were dynamic:

+ +
+ + +
type Config = HashMap<String, String>;
+ +
+

This system worked, but felt rather awkward.

+

The current implementation is much simpler. +Rather than storing a single Config as a part of the state, each method now accepts a specific config parameter:

+ +
+ + +
fn get_completions(
+    analysis: &Analysis,
+    config: &CompletionConfig,
+    file: FileId,
+    offset: usize,
+)
+
+fn get_inlay_hints(
+    analysis: &Analysis,
+    config: &HintsConfig,
+    file: FileId,
+)
+ +
+

Not only is the code simpler, it is also more flexible. +Because configuration is no longer a part of the state, it is possible to use different configs for the same functionality depending on the context. +For example, explicitly invoked completion might be different from the asynchronous one.

+

Discussion on /r/rust.

+
+
+
+ + + + + diff --git a/2021/01/03/two-kinds-of-code-review.html b/2021/01/03/two-kinds-of-code-review.html new file mode 100644 index 00000000..715b98cf --- /dev/null +++ b/2021/01/03/two-kinds-of-code-review.html @@ -0,0 +1,189 @@ + + + + + + + Two Kinds of Code Review + + + + + + + + + + + + +
+ +
+ +
+
+ +

Two Kinds of Code Review

+

I've read a book about management and it helped me to solve a long-standing personal conundrum about the code review process. +The book is High Output Management. +Naturally, I recommend it (and read this review as well: https://apenwarr.ca/log/20190926).

+

One of the smaller ideas of the book is that of managerial meddling. +If my manager micro-manages me and always tells me what to do, I'll grow accustomed to that and won't be able to contribute without close supervision. +This is a facet of a more general Task-Relevant Maturity framework. +Irrespective of the overall level of seniority, a person has some expertise level for each specific task. +The optimal quantity and quality of the supervisor's involvement depends on this level (TRM). +When TRM grows, the management style should go from structured control to supervision to nudges and consultations. +I don't need a ton of support when writing Rust, I can benefit a lot from a thorough review when coding in Julia and I certainly require hand-holding when attempting to write Spanish! +But the overarching goal is to improve my TRM, as that directly improves my productivity and frees up my supervisor's time. +The problem with meddling is not excessive control (it might be appropriate in low-TRM situations), it is that meddling removes the motivation to learn to take the wheel yourself.

+

Now, how on earth does all this managerial gibberish relate to the pull request review? +I now believe that there are two largely orthogonal (and even conflicting) goals to a review process.

+

One goal of a review process is good code. +The review ensures that each change improves the overall quality of a code base. +Without continuous betterment any code under change reverts to the default architecture: a ball of goo.

+

Another goal of a review is good coders. +The review is a perfect mentorship opportunity, it is a way to increase the contributor's TRM. +This is vital for community-driven open-source projects.

+

I personally always felt that the review process I use falls quite short of the proper level of quality. +Which didn't really square with me bootstrapping a couple of successful open source projects. +Now I think that I just happen to optimize for the people aspect of the review process, while most guides +(with a notable exception of Optimistic Merging) focus on code aspects.

+

Now, (let me stress this point), I do not claim that the second goal is inherently better (though it sounds nicer). +It's just that in the context of both IntelliJ Rust and rust-analyzer (green-field projects with massive scope, big uncertainties and limited paid-for hours) growing the community of contributors and maintainers was more important than maintaining perfectly clean code.

+

Reviews for quality are hard and time consuming. +I personally can't really review the code looking at the diff, I can give only superficial comments. +To understand the code, most of the time I need to fetch it locally and to try to implement the change myself in a different way. +To make a meaningful suggestion, I need to implement and run it on my machine (and the first two attempts won't fly). +Hence, a proper review for me takes roughly the same time as the implementation itself. +Taking into account the fact that there are many more contributors than maintainers, this is an instant game over for reviews for quality.

+

Luckily, folks submitting PRs generally have medium/high TRM. +They were able to introduce themselves to the codebase, find an issue to work on and come up with working code without me! +So, instead of scrutinizing away every last bit of the diff's imperfection, my goal is to promote the contributor to an autonomous maintainer status. +This is mostly just a matter of trust. +I don't read every line of code, as I trust the author of the PR to handle ifs and whiles well enough (this is the major time saver). +I trust that people address my comments and let them merge their own PRs (bors d+). +I trust that people can review others' code, and share commit access (r+) liberally.

+ + +

What new contributors don't have and what I do talk about in reviews is the understanding of project-specific architecture and values. +These are best demonstrated on specific issues with the diff. +But the focus isn't the improvement of a specific change, the focus is teaching the author of (hopefully) subsequent changes. +I liberally digress into discussing general code philosophy issues. +As disseminating this knowledge 1-1 is not very efficient, I also try to document it. +Rather than writing a PR comment, I put the text into +architecture.md or +style.md +and link that instead. +I also try to do only a small fixed number of review rounds. +Roughly, the PR is merged after two round-trips, not when there's nothing left to improve.

+

All this definitely produces warm fuzzy feelings, but what about code quality? +Gating PRs on quality is one way, but not the only way, to maintain clean code. +The approach I use instead is continuous refactoring / asynchronous reviews. +One of the (documented) values in rust-analyzer is that anyone is allowed and encouraged to refactor all the code, old and new.

+

Instead of blocking the PR, I merge it and then refactor the code in a follow-up (cc'ing the original author), when I touch this area next time. +This gives me a much better context than a diff view, as I can edit the code in-place and run the tests. +I also don't waste time transforming the change I have in mind to a PR comment (the motivation bits go directly into comment/commit message). +It's also easy to do unrelated drive-by fixes!

+

I wish this asynchronous review workflow was better supported by tools. +By default, changes are merged by the author, but the PR also goes to a review queue. +Later, the reviewer looks at the merged code in the main branch. +Any suggestions are submitted as a new PR, with the original author set as a reviewer. +(The in-editor reviewing reminds me of the iron workflow.)

+
+

In conclusion, let me reference another book. +I like item 32 from C++ Coding Standards: be clear what kind of class you're writing. +A value type is not an interface is not a base class. +All three are classes, but each needs a unique set of rules.

+

When doing/receiving a code review, understand the context and purpose. +If this is a homework assignment, you want to share knowledge. +In a critical crypto library, you need perfect code. +And for a young open source project, you aim to get a co-maintainer!

+
+
+ + + + + diff --git a/2021/02/06/ARCHITECTURE.md.html b/2021/02/06/ARCHITECTURE.md.html new file mode 100644 index 00000000..79b47810 --- /dev/null +++ b/2021/02/06/ARCHITECTURE.md.html @@ -0,0 +1,159 @@ + + + + + + + ARCHITECTURE.md + + + + + + + + + + + + +
+ +
+ +
+
+ +

ARCHITECTURE.md

+

If you maintain an open-source project in the range of 10k-200k lines of code, I strongly encourage you to add an ARCHITECTURE document next to README and CONTRIBUTING. +Before going into the details of why and how, I want to emphasize that this is not another "docs are good, write more docs" advice. +I am pretty sloppy about documentation, and, e.g., I often use just "simplify" as a commit message. +Nonetheless, I feel strongly about the issue, even to the point of pestering you :-)

+

I have experience with both contributing to and maintaining open-source projects. +One of the lessons I've learned is that the biggest difference between an occasional contributor and a core developer lies in the knowledge about the physical architecture of the project. +Roughly, it takes 2x more time to write a patch if you are unfamiliar with the project, but it takes 10x more time to figure out where you should change the code. +This difference might be hard to perceive if you've been working with the project for a while. +If I am new to a code base, I read each file as a sequence of logical chunks specified in some pseudo-random order. +If I've made significant contributions before, the perception is quite different. +I have a mental map of the code in my head, so I no longer read sequentially. +Instead, I just jump to where the thing should be, and, if it is not there, I move it. +One's mental map is the source of truth.

+

I find the ARCHITECTURE file to be a low-effort high-leverage way to bridge this gap. +As the name suggests, this file should describe the high-level architecture of the project. +Keep it short: every recurring contributor will have to read it. +Additionally, the shorter it is, the less likely it will be invalidated by some future change. +This is the main rule of thumb for ARCHITECTURE: only specify things that are unlikely to frequently change. +Don't try to keep it synchronized with code. +Instead, revisit it a couple of times a year.

+

Start with a bird's-eye overview of the problem being solved. +Then, specify a more-or-less detailed codemap. +Describe coarse-grained modules and how they relate to each other. +The codemap should answer "where's the thing that does X?". +It should also answer "what does the thing that I am looking at do?". +Avoid going into details of how each module works, pull this into separate documents or (better) inline documentation. +A codemap is a map of a country, not an atlas of maps of its states. +Use this as a chance to reflect on the project structure. +Are the things you want to put near each other in the codemap adjacent when you run tree .?

+

Do name important files, modules, and types. +Do not directly link them (links go stale). +Instead, encourage the reader to use symbol search to find the mentioned entities by name. +This doesn't require maintenance and will help to discover related, similarly named things.

+

Explicitly call out architectural invariants. +Often, important invariants are expressed as an absence of something, and it's pretty hard to divine that from reading the code. +Think about a common example from web development: the invariant is precisely that nothing in the model layer depends on the views.

+

Point out boundaries between layers and systems as well. +A boundary implicitly contains information about the implementation of the system behind it. +It even constrains all possible implementations. +But finding a boundary by just randomly looking at the code is hard: good boundaries have measure zero.

+

After finishing the codemap, add a separate section on cross-cutting concerns.

+

A good example of an ARCHITECTURE document is this one from rust-analyzer: +architecture.md.

+ +
+
+ + + + + diff --git a/2021/02/10/a-better-profiler.html b/2021/02/10/a-better-profiler.html new file mode 100644 index 00000000..29934506 --- /dev/null +++ b/2021/02/10/a-better-profiler.html @@ -0,0 +1,255 @@ + + + + + + + A Better Rust Profiler + + + + + + + + + + + + +
+ +
+ +
+
+ +

A Better Rust Profiler

+

I want a better profiler for Rust. +Here's what a rust-analyzer benchmark looks like:

+ +
+ + +
#[test]
+fn benchmark_syntax_highlighting_parser() {
+  if skip_slow_tests() {
+    return;
+  }
+
+  let fixture = bench_fixture::glorious_old_parser();
+  let (analysis, file_id) = fixture::file(&fixture);
+
+  let hash = {
+    let _pt = bench("syntax highlighting parser");
+    analysis
+      .highlight(file_id)
+      .unwrap()
+      .iter()
+      .filter(|it| {
+        it.highlight.tag == HlTag::Symbol(SymbolKind::Function)
+      })
+      .count()
+  };
+  assert_eq!(hash, 1629);
+}
+ +
+

Here's how I want to profile it:

+ +
+ + +
#[test]
+fn benchmark_syntax_highlighting_parser() {
+  if skip_slow_tests() {
+    return;
+  }
+
+  let fixture = bench_fixture::glorious_old_parser();
+  let (analysis, file_id) = fixture::file(&fixture);
+
+  let hash = {
+    let _b = bench("syntax highlighting parser");
+    let _p = better_profiler::profile();
+    analysis
+      .highlight(file_id)
+      .unwrap()
+      .iter()
+      .filter(|it| {
+        it.highlight.tag == HlTag::Symbol(SymbolKind::Function)
+      })
+      .count()
+  };
+  assert_eq!(hash, 1629);
+}
+ +
+

First, if the build is not configured for profiling, the profiler prints warnings to stderr:

+ +
+ + +
warning: run with `--release`
+warning: add `debug=true` to Cargo.toml
+warning: set `RUSTFLAGS="-Cforce-frame-pointers=yes"`
+ +
+

Otherwise, if everything is set up correctly, the output is

+ +
+ + +
Output is saved to:
+   ~/projects/rust-analyzer/profile-results/
+ +
+

The profile-results folder contains the following:

+ +

To tweak settings, the following API is available:

+ +
+ + +
let _p = better_profiler::profile()
+  .output("./other-dir/")
+  .samples_per_second(999)
+  .flamegraph(false);
+ +
+

Naturally, the following also works and produces an aggregate profile:

+ +
+ + +
for _ in 0..100 {
+  {
+    let _p = profile();
+    interesting_computation();
+  }
+  not_interesting_computation();
+}
+ +
+

I don't know how this should work. +I think I would be happy with a perf-based Linux-only implementation. +The perf-event crate by Jim Blandy (co-author of Programming Rust) is good.
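Whatever the sampling backend ends up being, the scoping part of the wished-for API can be sketched with an ordinary drop guard. This is only a minimal illustration under stated assumptions: it records wall-clock time rather than taking perf samples, and `ProfileGuard`, `scopes_recorded` and `total_nanos` are hypothetical names that do not come from any existing crate.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

// Aggregated across every profiled scope in the process.
static TOTAL_NANOS: AtomicU64 = AtomicU64::new(0);
static SCOPES: AtomicU64 = AtomicU64::new(0);

/// Hypothetical guard: measurement starts on creation and stops on drop.
pub struct ProfileGuard {
    start: Instant,
}

/// Starts measuring the current scope (stand-in for `better_profiler::profile()`).
pub fn profile() -> ProfileGuard {
    ProfileGuard { start: Instant::now() }
}

impl Drop for ProfileGuard {
    fn drop(&mut self) {
        // A real profiler would stop sampling here; this sketch only adds
        // the elapsed wall-clock time to the running totals.
        let nanos = self.start.elapsed().as_nanos() as u64;
        TOTAL_NANOS.fetch_add(nanos, Ordering::Relaxed);
        SCOPES.fetch_add(1, Ordering::Relaxed);
    }
}

/// Number of profiled scopes that have finished so far.
pub fn scopes_recorded() -> u64 {
    SCOPES.load(Ordering::Relaxed)
}

/// Total time spent inside profiled scopes, in nanoseconds.
pub fn total_nanos() -> u64 {
    TOTAL_NANOS.load(Ordering::Relaxed)
}
```

Because the totals live outside the guard, creating a guard inside a loop naturally yields an aggregate over all iterations, while the code between guards is not counted.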

+

Have I missed something? +Does this tool already exist? +Or is it impossible for some reason?

+

Discussion on /r/rust.

+
+
+ + + + + diff --git a/2021/02/14/for-the-love-of-macros.html b/2021/02/14/for-the-love-of-macros.html new file mode 100644 index 00000000..5a11bce4 --- /dev/null +++ b/2021/02/14/for-the-love-of-macros.html @@ -0,0 +1,302 @@ + + + + + + + For the Love of Macros + + + + + + + + + + + + +
+ +
+ +
+
+ +

For the Love of Macros

+

I've been re-reading Ted Kaminski's blog about software design. +I highly recommend all the posts, especially the earlier ones +(here's the first). +He manages to offer design advice which is both non-trivial and sound (a subjective judgment of course), a rare specimen!

+

Anyway, one of the insights of the series is that, when designing an abstraction, we always face the inherent tradeoff between power and properties. +The more we can express using a particular abstraction, the less we can say about the code using it. +Our human bias for more expressive power is not inherent however. +This is evident in programming language communities, where users unceasingly ask for new features and language designers say no.

+

Macros are a language feature which is very far on the "more power" side of the chart. +Macros give you an ability to abstract over the source code. +In exchange, you give up the ability to (automatically) reason about the surface syntax. +As a specific example, rename refactoring doesn't work 100% reliably in languages with powerful macro systems.

+

I do think that, in the ideal world, this is a wrong trade for a language which wants to scale to gigantic projects. +The ability to automatically reason about and transform source code gains in importance when you add more programmers, more years, and more millions of lines of code. +But take this with a huuuge grain of salt: I am obviously biased, having spent several years developing Rust IDEs.

+

That said, macros have a tremendous appeal: they are a language designer's duct tape. +Macros are rarely the best tool for the job, but they can do almost any job. +The language design is incremental. +A macro system relieves the design pressure by providing a ready poor man's substitute for many features.

+

In this post, I want to explore what macros are used for in Rust. +The intention is to find solutions which do not give up the "reasoning about source code" property.

+
+ +

+ String Interpolation +

+

By far, the most common use-case is the format! family of macros. +The macro-less solution here is straightforward: a string interpolation syntax.

+ +
+ + +
let key = "number";
+let value = || 92;
+let t = f"$key: ${value()}";
+assert_eq!(t.to_string(), "number: 92");
+ +
+

In Rust, interpolation probably shouldn't construct a string directly. +Instead, it can produce a value implementing Display (just like format_args!), which can avoid allocations. +An interesting extension would be to allow iterating over format string pieces. +That way, the interpolation syntax could be used for things like SQL statements or command line arguments, without the fear of introducing injection vulnerabilities:

+ +
+ + +
let arg = "my dir";
+let cmd = f"ls $arg".to_cmd();
+assert_eq!(cmd.to_string(), "ls 'my dir'");
+ +
+

This post about the Julia programming language explains the issue. +The xshell crate implements this idea for Rust.
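To make the "iterate over format string pieces" idea concrete, here is a rough sketch in today's Rust. It assumes the compiler would hand us the literal and interpolated parts separately; the `Piece` enum and the `quote` and `render` functions are hypothetical names for illustration, not the API of xshell or of any proposed language feature, and the quoting rule is a simplified POSIX-shell one.

```rust
/// One part of an interpolated string: either text written by the
/// programmer or a value substituted at run time.
enum Piece<'a> {
    Literal(&'a str),
    Arg(&'a str),
}

/// Quotes a single argument for a POSIX-like shell (simplified rule).
fn quote(arg: &str) -> String {
    let safe = !arg.is_empty()
        && arg
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || "-_./".contains(c));
    if safe {
        arg.to_string()
    } else {
        // Wrap in single quotes and escape embedded single quotes.
        format!("'{}'", arg.replace('\'', "'\\''"))
    }
}

/// Renders the pieces, escaping only the interpolated arguments.
fn render(pieces: &[Piece]) -> String {
    let mut out = String::new();
    for piece in pieces {
        match piece {
            Piece::Literal(text) => out.push_str(text),
            Piece::Arg(value) => out.push_str(&quote(value)),
        }
    }
    out
}
```

With this shape, `render(&[Piece::Literal("ls "), Piece::Arg("my dir")])` yields `ls 'my dir'`: the literal part is trusted and copied as-is, while the interpolated value can never be split into several arguments.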

+
+
+ +

+ Derives +

+

I think the second most common, and probably the most important, use of macros in Rust is derives. +Rust is one of the few languages which gets equality right (and forbids comparing apples and oranges), but this crucially depends on the ability to derive(Eq). +Common solutions in this space are special casing in the compiler (Haskell's deriving) or runtime reflection.

+

But the solution I am most excited about is C# source generators. +They are nothing new: this is just the old (source) code generation, with a nice quality of implementation. +You can supply custom code which gets run during the build and which can read existing sources and generate additional files, which are then added back to the compilation.

+

The beauty of this solution is that it moves all the complexity out of the language and into the build system. +This means that you get baseline tooling support for free. +Goto definition for generated code? Just works. +Want to step into some serialization code while debugging? There's actual source code on disk, so feel free to! +You are more of a printf person? Well, you'd need to convince the build system to not stomp over your changes, but, otherwise, why not?

+

Additionally, source generators turn out to be significantly more expressive. +They can call into the Roslyn compiler to analyze the source code, so they are capable of type-directed code generation.

+

To be useful, source generators require some language level support for splitting a single entity across several files. +In C#, partial classes play this role.

+
+
+ +

+ Domain Specific Languages +

+

The raison d'être of macros is implementation of embedded DSLs. +We want to introduce custom syntax within the language for succinctly modeling the program's domain. +For example, a macro can be used to embed HTML fragments in Rust code.

+

To me personally, eDSL is not a problem to be solved, but just a problem. +Introducing a new sublanguage (even if small) spends a lot of cognitive complexity budget. +If you need it once in a while, better stick to just chaining together somewhat verbose function calls. +If you need it a lot, it makes sense to introduce an external DSL, with a compiler, a language server, and all the tooling that makes programming productive. +To me, macro-based DSLs just don't feel like an interesting point on the cost-benefit curve.

+

That being said, the Kotlin programming language solves the problem of strongly-typed, tooling-friendly DSL nicely (example). +Infuriatingly, it's hard to pinpoint what specifically is the solution. +It's just the concrete syntax mostly. +Here are some ingredients:

+
    +
  • +The syntax for closures is { arg -> body }, or just { body }, so closures syntactically resemble blocks. +
  • +
  • +Extension methods (which are just sugar for static methods). +
  • +
  • +Java style implicit this, which introduces names into scope without an explicit declaration. +
  • +
  • +TCP-preserving inline closures (this is the single non-syntactical feature) +
  • +
+

Nonetheless, this was not enough to implement the Jetpack Compose UI DSL; it also needs a compiler plugin.

+
+
+ +

+ sqlx +

+

An interesting case of a DSL I want to call out is sqlx::query. +It allows one to write code like this:

+ +
+ + +
let account =
+  sqlx::query!("select (1) as id, 'Herp Derpinson' as name")
+    .fetch_one(&mut conn)
+    .await?;
+
+// anonymous struct has `#[derive(Debug)]` for convenience
+println!("{:?}", account);
+println!("{}: {}", account.id, account.name);
+ +
+

This I think is one of the few cases where eDSL does really pull its weight. +I don't know how to do this without macros. +Using string interpolation (the advanced version to protect from injection), it is possible to specify the query. +Using a source generator, it is possible to check the syntax of the query and verify the types, to, e.g., raise a type error in this case:

+ +
+ + +
let (id, name): (i32, f32) =
+  query("select (1) as id, 'Herp Derpinson' as name")
+    .fetch_one(&mut conn)
+    .await?;
+ +
+

But this won't be enough to generate an anonymous struct, or to get rid of dynamic casts.

+
+
+ +

+ Conditional Compilation +

+

Rust also uses macros for conditional compilation. +This use case convincingly demonstrates the "lack of properties" aspect of power. +Dealing with feature combinations is a perpetual headache for Cargo. +Users have to repeatedly recompile large chunks of the crate graph when feature flags change. +Catching a type error on CI with cargo test --no-default-features is pretty annoying, especially if you did run cargo test before submitting a PR. +"Additive features" is uncheckable wishful thinking.

+

In this case, I don't know a good macro-less alternative. +But, in principle, this seems doable, if conditional compilation is pushed further down the compiler pipeline, to the code generation and linking stage. +Rather than discarding some code early during parsing, the compiler can select the platform-specific version just before producing machine code for a function. +Before that, it checks that all conditionally-compiled versions of the function have the same interface. +That way, platform-specific type errors are impossible.
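Today's Rust already shows the difference on a small scale, which may help to picture the proposal. In the sketch below (function names are made up for illustration), the `cfg!` version is type-checked in full on every platform, because `cfg!` expands to a plain `true` or `false` literal, while with the `#[cfg]` attribute only one of the two definitions ever reaches the type checker.

```rust
/// Both branches are parsed and type-checked on every platform:
/// `cfg!(windows)` is just a boolean literal after expansion.
fn separator_runtime_check() -> char {
    if cfg!(windows) {
        '\\'
    } else {
        '/'
    }
}

/// Only the definition matching the current target is compiled.
/// A type error in the other one goes unnoticed until somebody
/// builds for that target.
#[cfg(windows)]
fn separator_attribute() -> char {
    '\\'
}

#[cfg(not(windows))]
fn separator_attribute() -> char {
    '/'
}
```

The `cfg!` form only works when both branches can compile everywhere, so it is not a general replacement for `#[cfg]`; it merely illustrates what checking all variants up front would feel like.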

+
+
+ +

+ Placeholder Syntax +

+

The final use-case I want to cover is that of a placeholder syntax. +Rust's macro_call!(...) syntax carves a well-isolated region where anything goes, syntax wise, as long as the parentheses are balanced. +In theory, this allows language designers to experiment with provisional syntax before setting something in stone. +In practice, it looks like this is not at all that beneficial? +There was some opposition to stabilizing postfix .await without going via an intermediate period with the await! macro. +And, after stabilization, all syntax discussions were immediately forgotten? +On the other hand, we did have the try! -> ? transition, and I don't think it helped to uncover any design pitfalls? +At least, we managed to stabilize the unnecessarily restrictive desugaring on that one.

+
+

In conclusion, I want to circle back to source generators. +What exactly makes them easier for tooling than macros? +I think the following three properties do. +First, both input and output are, fundamentally, text. +There's no intermediate representation (like token trees), which is used by this meta-programming facility. +This means that it doesn't need to be integrated deeply with the compiler. +Of course, internally the tool is free to parse, typecheck and transform the code however it likes. +Second, there is a phase distinction. +Source generators are executed once, in unordered fashion. +There's no back and forth between meta programming and name resolution, which, again, allows keeping the meta part outside. +Third, source generators can only add code, they cannot change the meaning of the existing code. +This means that semantically sound source code transformations remain so in the presence of a code generator.

+

That's all! +Discussion on /r/rust.

+
+
+
+ + + + + diff --git a/2021/02/15/NEAR.html b/2021/02/15/NEAR.html new file mode 100644 index 00000000..6d6a47da --- /dev/null +++ b/2021/02/15/NEAR.html @@ -0,0 +1,142 @@ + + + + + + + matklad @ NEAR + + + + + + + + + + + + +
+ +
+ +
+
+ +

matklad @ NEAR

+

Hey, I have a short announcement to make: I am joining NEAR (sharded proof of stake public blockchain)! +TL;DR: I'll be spending 60% of my time on WASM runtime for smart contracts and 40% on rust-analyzer.

+

Why NEAR? +One of the problems I have with current popular blockchain technologies is that they are not scalable. +Every node needs to process every transaction in the network. +For a network with N nodes that is roughly O(N^2) total work (assuming the number of transactions grows in proportion to N). +NEAR aims to solve exactly this problem using the classic big data trick: sharding the data across several partitions.

+

Another aspect of NEAR I am particularly excited about is the strategic focus on the smart contracts developer experience. +That's why NEAR is particularly interested in supporting rust-analyzer. +Rust, with its top-notch WASM ecosystem and focus on correctness, is a natural choice for writing contracts. +At the same time, it is not the most approachable language there is. +Good tooling can help a lot with surmounting the language's inherent complexity, making writing smart contracts in Rust easy.

+

What does it mean for rust-analyzer? +We'll see: I will still be putting significant hours into it, although a bit less than previously. +I'll also help to manage the rust-analyzer Open Collective. +And, naturally, my know-how about building IDEs isn't going anywhere :) +At the same time, I am excited about lowering the bus factor and distributing rust-analyzer maintainership. +I do want to take credit for initiating the effort, but it's high time for some structured leadership rotation. +It's exciting to see @jonas-schievink from Ferrous Systems taking on more team leadership tasks. +(I am hyped about support for inner items, kudos Jonas!) +I am also delighted with the open source community that formed around rust-analyzer. +@edwin0cheng, +@flodiebold, +@kjeremy, +@lnicola, +@SomeoneToIgnore, +@Veetaha, +@Veykril +you are awesome, and rust-analyzer wouldn't be possible without you ❤️

+

Finally, I can't help but notice that IntelliJ Rust, which I left completely a while ago, is doing better than ever. +Overall, I must say I am quite happy with today's state of Rust IDE tooling. +The basics are firmly in place. +Let's just finish the remaining 90%!

+
+
+ + + + + diff --git a/2021/02/24/another-generic-dilemma.html b/2021/02/24/another-generic-dilemma.html new file mode 100644 index 00000000..41ad5f0b --- /dev/null +++ b/2021/02/24/another-generic-dilemma.html @@ -0,0 +1,158 @@ + + + + + + + Another Generic Dilemma + + + + + + + + + + + + +
+ +
+ +
+
+ +

Another Generic Dilemma

+

In The Generic Dilemma, Russ Cox observes that you can have only two of

+ +

(but see 1 and 2 for how you can achieve a middle ground with enough compiler wizardry)

+

Now that Go is getting generics, I want to point out another dilemma:

+

Any language has parametric polymorphism, eventually

+

If you start with just dynamic dispatch, you'll end up adding generics down the road. +This happened with C++ and Java, and is now happening with Go. +The last one is interesting: even if you don't carry accidental OOP baggage (inheritance), interfaces alone are not enough.

+

Why does it happen? +Well, because generics are useful for simple things. +Even if the language special-cases several parametric data structures, like Go does with slices, maps and channels, it is impossible to abstract over them. +In particular, it's impossible to write list_reverse or list_sort functions without some awkward workarounds.

+

Ok, but where's the dilemma? +The dilemma is that adding parametric polymorphism to the language opens floodgates of complexity. +At least in my experience, Rust traits, Haskell type classes, and Java generics are the main reason why some libraries in those languages are hard to use.

+

It's not that generics are inherently hard, fn reverse<T>(xs: [T]) -> [T] is simple. +It's that they allow creating complicated solutions, and this doesn't play well with our human bias for complexity.

+

One thing I am wondering is whether a polymorphic language without bounded quantification would be practical? +Again, in my anecdotal experience, cognitive complexity soars when there are bounds on type parameters: T: This<S> + That. +But parametric polymorphism can be useful without them:

+ +
+ + +
fn sort<T: Ord>(xs: &mut [T]) { ... }
+ +
+

is equivalent to

+ +
+ + +
struct Ord<T> {
+  cmp: fn(&T, &T) -> Ordering
+}
+
+fn sort<T>(ord: Ord<T>, xs: &mut [T]) { ... }
+ +
+

Can we build an entire language out of this pattern?
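As a runnable illustration of the pattern, here is a minimal sketch in which the comparison is passed as an explicit value instead of a trait bound. The struct is named `OrdDict` here only to avoid confusion with the standard `Ord` trait, the helper names are hypothetical, and a plain insertion sort is swapped in for brevity.

```rust
use std::cmp::Ordering;

/// An explicit "dictionary" of operations, standing in for a `T: Ord` bound.
struct OrdDict<T> {
    cmp: fn(&T, &T) -> Ordering,
}

/// Sorts without any bounds on `T`: everything the function needs to know
/// about `T` arrives through `ord`. Insertion sort, chosen for brevity.
fn sort<T>(ord: &OrdDict<T>, xs: &mut [T]) {
    for i in 1..xs.len() {
        let mut j = i;
        while j > 0 && (ord.cmp)(&xs[j - 1], &xs[j]) == Ordering::Greater {
            xs.swap(j - 1, j);
            j -= 1;
        }
    }
}

/// Two different orderings for the same type; with a trait bound there
/// could be only one.
fn ascending(a: &i32, b: &i32) -> Ordering {
    a.cmp(b)
}

fn descending(a: &i32, b: &i32) -> Ordering {
    b.cmp(a)
}
```

One consequence of this design is visible immediately: the caller chooses the ordering at each call site, so `sort(&OrdDict { cmp: descending }, &mut xs)` needs no newtype wrapper, at the cost of threading the dictionary through by hand.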

+
+
+ + + + + diff --git a/2021/02/27/delete-cargo-integration-tests.html b/2021/02/27/delete-cargo-integration-tests.html new file mode 100644 index 00000000..45257d65 --- /dev/null +++ b/2021/02/27/delete-cargo-integration-tests.html @@ -0,0 +1,327 @@ + + + + + + + Delete Cargo Integration Tests + + + + + + + + + + + + +
+ +
+ +
+
+ +

Delete Cargo Integration Tests

+

Click bait title! +We'll actually look into how integration and unit tests are implemented in Cargo. +A few guidelines for organizing test suites in large Cargo projects naturally arise out of these implementation differences. +And, yes, one of those guidelines will turn out to be: delete all integration tests but one.

+

Keep in mind that this post is explicitly only about Cargo concepts. +It doesn't discuss relative merits of integration or unit styles of testing. +I'd love to, but that's going to be a loooong article some other day!

+
+ +

+ Loomings 🐳 +

+

When you use Cargo, you can put #[test] functions directly next to code, in files inside the src/ directory. +Alternatively, you can put them into dedicated files inside tests/:

+ +
+ + +
awesomeness-rs/
+  Cargo.toml
+  src/          # unit tests go here
+    lib.rs
+    submodule.rs
+    submodule/
+      tests.rs
+
+  tests/        # integration tests go here
+    is_awesome.rs
+ +
+

I stress that unit/integration terminology is based purely on the location of the #[test] functions, and not on what those functions actually do.

+

To build unit tests, Cargo runs

+ +
+ + +
rustc --test src/lib.rs
+ +
+

Rustc then compiles the library with --cfg test. +It also injects a generated fn main(), which invokes all functions annotated with #[test]. +The result is an executable file which, when run subsequently by Cargo, executes the tests.

+

Integration tests are built differently. +First, Cargo uses rustc to compile the library as usual, without --cfg test:

+ +
+ + +
rustc --crate-type=rlib src/lib.rs
+ +
+

This produces an .rlib file: a compiled library.

+

Then, for each file in the tests directory, Cargo runs the equivalent of

+ +
+ + +
rustc --test --extern awesomeness=path/to/awesomeness.rlib \
+    ./tests/is_awesome.rs
+ +
+

That is, each integration test is compiled into a separate binary. +Running those binaries executes the test functions.

+
+
+ +

+ Implications +

+

Note that rustc needs to repeatedly re-link the library crate with each of the integration tests. +This can add up to a significant compilation time blow up for tests. +That is why I recommend that large projects should have only one integration test crate with several modules. +That is, don't do this:

+ +
+ + +
tests/
+  foo.rs
+  bar.rs
+ +
+

Do this instead:

+ +
+ + +
tests/
+  integration/
+    main.rs
+    foo.rs
+    bar.rs
+ +
+

When a refactoring along these lines was applied to Cargo itself, the effects were substantial (numbers). +The time to compile the test suite decreased 3x. +The size of on-disk artifacts decreased 5x.

+

It can't get better than this, right? +Wrong! +Rust tests by default are run in parallel. +The main that is generated by rustc spawns several threads to saturate all of the CPU cores. +However, Cargo itself runs test binaries sequentially. +This makes sense: otherwise, concurrently executing test binaries would oversubscribe the CPU. +But this means that multiple integration tests leave performance on the table. +The critical path is the sum of the longest tests in each binary. +The more binaries, the longer the path. +For one of my projects, consolidating several integration tests into one reduced the time to run the test suite from 20 seconds to just 13.

+

A nice side-effect of a single modularized integration test is that sharing the code between separate tests becomes trivial, you just pull it into a submodule. +There's no need to awkwardly repeat mod common; for each integration test.
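As a sketch of what the root of such a single integration test crate might look like: the module and function names below are made up for illustration, and the modules are written inline only to keep the example in one piece. In a real project each of them would be a plain mod declaration backed by its own file next to main.rs.

```rust
// tests/integration/main.rs (illustrative sketch; all names are hypothetical)

// Shared helpers. In a real project: `mod common;` backed by
// tests/integration/common.rs.
mod common {
    /// Builds input data that several tests reuse.
    pub fn fixture() -> Vec<u32> {
        vec![1, 2, 3]
    }
}

// In a real project: `mod foo;` backed by tests/integration/foo.rs.
mod foo {
    use super::common::fixture;

    #[test]
    fn sum_of_fixture() {
        assert_eq!(fixture().iter().sum::<u32>(), 6);
    }
}

// In a real project: `mod bar;` backed by tests/integration/bar.rs.
mod bar {
    use super::common::fixture;

    #[test]
    fn fixture_is_sorted() {
        let xs = fixture();
        assert!(xs.windows(2).all(|w| w[0] <= w[1]));
    }
}
```

All tests end up in one binary, so the library is linked once and the shared helper is an ordinary submodule reachable through super.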

+
+
+ +

+ Rules of Thumb +

+

If the project I am working with is small, I don't worry about test organization. +There's no need to make tests twice as fast if they are already nearly instant.

+

Conversely, if the project is large (a workspace with many crates) I worry about test organization a lot. +Slow tests are a boiling frog kind of problem. +If you do not proactively fix it, everything is fine up until the moment you realize you need to sink a week to untangle the mess.

+

For a library with a public API which is published to crates.io, I avoid unit tests. +Instead, I use a single integration test, called it (short for integration test):

+ +
+ + +
tests/
+  it.rs
+
+# Or, for larger crates
+
+tests/
+  it/
+    main.rs
+    foo.rs
+    bar.rs
+ +
+

Integration tests use the library as an external crate. +This forces the usage of the same public API that consumers use, resulting in a better design feedback.

+

For an internal library, I avoid integration tests altogether. +Instead, I use Cargo unit tests for integration bits:

+ +
+ + +
src/
+  lib.rs
+  tests.rs
+  tests/
+    foo.rs
+    bar.rs
+ +
+

That way, I avoid linking the separate integration tests binary altogether. +I also have access to non-pub API of the crate, which is often useful.

+
+
+ +

+ Assorted Tricks +

+

First, documentation tests are extremely slow. +Each doc test is linked as a separate binary. +For this reason, avoid doc tests in internal libraries for big projects and add this to Cargo.toml:

+ +
+ + +
[lib]
+doctest = false
+ +
+

Second, prefer

+ +
+ + +
#[cfg(test)]
+mod tests; // tests in `tests.rs` file
+ +
+

to

+ +
+ + +
#[cfg(test)]
+mod tests {
+    // tests here
+}
+ +
+

This way, when you modify just the tests, Cargo is smart enough not to recompile the library crate. +It knows that the contents of tests.rs only affect compilation when --test is passed to rustc. +Learned this one from @petrochenkov, thanks!

+

Third, even if you stick to unit tests, the library is compiled twice: once with, and once without --test. +For this reason, folks from pernosco go even further. +They add

+ +
+ + +
[lib]
+test = false
+ +
+

to Cargo.toml, make all APIs they want to unit test public and have a single test crate for the whole workspace. +This crate links everything and contains all the unit tests.

+

Discussion on /r/rust.

+ +
+
+
+ + + + + diff --git a/2021/03/12/goroutines-are-not-significantly-smaller-than-threads.html b/2021/03/12/goroutines-are-not-significantly-smaller-than-threads.html new file mode 100644 index 00000000..c876ead5 --- /dev/null +++ b/2021/03/12/goroutines-are-not-significantly-smaller-than-threads.html @@ -0,0 +1,232 @@ + + + + + + + Goroutines Are Not Significantly Smaller Than Threads + + + + + + + + + + + + +
+ +
+ +
+
+ +

Goroutines Are Not Significantly Smaller Than Threads

+

The most commonly cited drawback of OS-level threads is that they use a lot of RAM. +This is not true on Linux.

+

Let's compare the memory footprint of 10_000 Linux threads with 10_000 goroutines. +We spawn 10k workers, which sleep for about 10 seconds, waking up every 10 milliseconds. +Each worker is staggered by a pseudorandom delay up to 200 milliseconds to avoid the thundering herd problem.

+ +
+
main.rs
+ + +
use std::{thread, time::Duration};
+
+fn main() {
+    let mut threads = Vec::new();
+    for i in 0u32..10_000 {
+        let t = thread::spawn(move || {
+            let bad_hash = i.wrapping_mul(2654435761) % 200_000;
+            thread::sleep(Duration::from_micros(bad_hash as u64));
+            for _ in 0..1000 {
+                thread::sleep(Duration::from_millis(10));
+            }
+        });
+        threads.push(t);
+    }
+
+    for t in threads {
+        t.join().unwrap()
+    }
+}
+ +
+ +
+
main.go
+ + +
package main
+
+import (
+    "sync"
+    "time"
+)
+
+func main() {
+    var wg sync.WaitGroup
+    for i := uint32(0); i < 10_000; i++ {
+        i := i
+        wg.Add(1)
+        go func() {
+            defer wg.Done()
+            bad_hash := (i * 2654435761) % 200_000
+            time.Sleep(time.Duration(bad_hash) * time.Microsecond)
+            for j := 0; j < 1000; j++ {
+                time.Sleep(10 * time.Millisecond)
+            }
+        }()
+    }
+    wg.Wait()
+}
+ +
+

We use the time utility to measure memory usage:

+ +
+
t
+ + +
#!/bin/sh
+command time --format 'real %es\nuser %Us\nsys  %Ss\nrss  %Mk' "$@"
+ +
+

The results:

+ +
+ + +
λ rustc main.rs -C opt-level=3 && ./t ./main
+real 10.35s
+user 4.96s
+sys  16.06s
+rss  94472k
+
+λ go build main.go && ./t ./main
+real 10.92s
+user 13.30s
+sys  0.55s
+rss  34924k
+ +
+

A thread is only 3 times as large as a goroutine. +Absolute numbers are also significant: 10k threads require only 100 megabytes of overhead. +If the application does 10k concurrent things, 100 MB might be negligible.
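As a back-of-the-envelope check, the per-worker cost follows from dividing the reported peak RSS by the number of workers. The sketch below does that arithmetic for the numbers above. It assumes the whole RSS is attributable to the workers, which slightly overstates the per-worker cost because each runtime's baseline is included; the helper name is made up for illustration.

```rust
/// Peak resident set size per worker, in kilobytes as reported by `time`.
/// Assumption: all of the RSS is attributed to the workers.
fn rss_per_worker_kb(rss_kb: u64, workers: u64) -> f64 {
    rss_kb as f64 / workers as f64
}

#[test]
fn thread_vs_goroutine_footprint() {
    let thread = rss_per_worker_kb(94_472, 10_000); // Rust threads
    let goroutine = rss_per_worker_kb(34_924, 10_000); // goroutines

    // Roughly 9.4k per thread versus roughly 3.5k per goroutine.
    assert!((thread - 9.4472).abs() < 1e-9);
    assert!((goroutine - 3.4924).abs() < 1e-9);

    // The ratio is a bit under 3x.
    let ratio = thread / goroutine;
    assert!(ratio > 2.7 && ratio < 2.8);
}
```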

+ + +
+

Note that it is wrong to use this benchmark to compare performance of threads and goroutines. +The workload is representative for measuring absolute memory overhead, but is not representative for time overhead.

+

That being said, it is possible to explain why threads need 21 seconds of CPU time while goroutines need only 14. +The Go runtime spawns a thread per CPU core, and tries hard to keep each goroutine tied to a specific thread (and, by extension, CPU). +Threads by default migrate between CPUs, which incurs synchronization overhead. +Pinning threads to cores in a round-robin fashion removes this overhead:

+ +
+ + +
$ cargo build --release && ./t ./target/release/main --pin-to-core
+    Finished release [optimized] target(s) in 0.00s
+real 10.36s
+user 3.01s
+sys  9.08s
+rss  94856k
+ +
+

The total CPU time now is approximately the same, but the distribution is different. +On this workload, the goroutine scheduler spends roughly the same number of cycles in userspace as the thread scheduler spends in the kernel.

+

Code for the benchmarks is available here: matklad/10k_linux_threads.

+
+
+ + + + + diff --git a/2021/03/22/async-benchmarks-index.html b/2021/03/22/async-benchmarks-index.html new file mode 100644 index 00000000..e627d2d3 --- /dev/null +++ b/2021/03/22/async-benchmarks-index.html @@ -0,0 +1,230 @@ + + + + + + + Async Benchmarks Index + + + + + + + + + + + + +
+ +
+ +
+
+ +

Async Benchmarks Index

+

I don't understand the performance characteristics of async programming when applied to typical HTTP based web applications. +Let's say we have a CRUD app with a relational database, where a typical request results in N queries to the database and transfers M bytes over the network. +How much (orders of magnitude?) faster/slower would an async solution be in comparison to a threaded solution?

+

In this live post, I am collecting the benchmarks that help to shed light on this and related questions. +Note that I am definitely not the right person to do this work, so, if there is a better resource, I'll gladly just use that instead. +Feel free to send pull requests with benchmarks! +Every benchmark will be added, but some might go to the rejected section.

+

I am interested in understanding differences between several execution models, regardless of programming language:

+
+
Threads:
+
+

Good old POSIX threads, as implemented on modern Linux.

+
+
Stackful Coroutines
+
+

M:N threading, which exposes the same programming model as threads, but is implemented by multiplexing several user-space coroutines over a single OS-level thread. +The most prominent example here is Go.

+
+
Stackless Coroutines
+
+

In this model, each concurrent computation is represented by a fixed-size state machine which reacts to events. +This model often uses async / await syntax for describing and composing state machines using standard control flow constructs.

+
+
Threads With Cooperative Scheduling
+
+

This is a mostly hypothetical model of OS threads with an additional primitive for directly switching between two threads of the same process. +It is not implemented on Linux (see this presentation for some old work towards that). +It is implemented on Windows under the fiber branding.

+
+
+
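To make the stackless model from the list above concrete, here is a minimal hand-written sketch of a computation expressed as a fixed-size state machine that reacts to events. The `Countdown` and `Step` types are invented for illustration; this is roughly the shape of what `async`/`await` generates, not what any particular compiler actually emits.

```rust
/// A hand-written "stackless coroutine": everything that must survive
/// across suspension points lives in this fixed-size enum.
#[derive(Debug, PartialEq)]
enum Countdown {
    /// Still running; `remaining` tick events are left to wait for.
    Running { remaining: u32 },
    /// Finished; further events are ignored.
    Done,
}

/// What a single step reports back to whoever drives the machine.
#[derive(Debug, PartialEq)]
enum Step {
    Pending,
    Ready(&'static str),
}

impl Countdown {
    fn new(ticks: u32) -> Countdown {
        if ticks == 0 {
            Countdown::Done
        } else {
            Countdown::Running { remaining: ticks }
        }
    }

    /// React to one "tick" event: either stay suspended or complete.
    fn on_tick(&mut self) -> Step {
        match *self {
            Countdown::Running { remaining } if remaining > 1 => {
                *self = Countdown::Running { remaining: remaining - 1 };
                Step::Pending
            }
            Countdown::Running { .. } | Countdown::Done => {
                *self = Countdown::Done;
                Step::Ready("done")
            }
        }
    }
}

#[test]
fn countdown_completes_after_two_ticks() {
    let mut task = Countdown::new(2);
    assert_eq!(task.on_tick(), Step::Pending);
    assert_eq!(task.on_tick(), Step::Ready("done"));
    assert_eq!(task, Countdown::Done);
}
```

No stack is kept between events: suspending the computation means returning from `on_tick`, and resuming it means calling `on_tick` again.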

I am also interested in Rust's specific implementation of stackless coroutines.

+
+ +

+ Benchmarks +

+
+
https://github.com/jimblandy/context-switch
+
+

This is a micro benchmark comparing the cost of primitive operations of threads and stackless coroutines as implemented in Rust. +Findings:

+
    +
  • +Thread creation is an order of magnitude slower. +
  • +
  • +Threads use an order of magnitude more RAM. +
  • +
  • +IO-related context switches take the same time. +
  • +
  • +Thread-to-thread context switches (channel sends) take the same time, if threads are pinned to one core. +This is surprising to me. +I'd expect channel send to be significantly more efficient for either stackful or stackless coroutines. +
  • +
  • +Thread-to-thread context switches are an order of magnitude slower if there's no pinning. +
  • +
  • +Threads hit non-memory resource limitations quickly (it's hard to spawn > 50k threads). +
  • +
+
+
https://github.com/jkarneges/rust-async-bench
+
+

Micro benchmark which compares Rust's implementation of stackless coroutines with a manually coded state machine. +Rust's async/await turns out to not be zero-cost, pure overhead is about 4x. +The absolute numbers are still low though, and adding even a single syscall of work reduces the difference to only 10%.

+
+
https://matklad.github.io/2021/03/12/goroutines-are-not-significantly-smaller-than-threads.html
+
+

This is a micro benchmark comparing just the memory overhead of threads and stackful coroutines as implemented in Go. +Threads are several times larger, but not orders of magnitude larger.

+
+
https://calpaterson.com/async-python-is-not-faster.html
+
+

Macro benchmark which compares many different Python web frameworks. +The conclusion is that async is worse for both latency and throughput. +Note two important things. +First, the servers are run behind a reverse proxy (nginx), which drastically changes IO patterns that are observed by the server. +Second, Python is not the fastest language, so throughput is roughly correlated with the amount of C code in the stack.

+

There is also a rebuttal post.

+
+
+
+
+ +

+ Rejected Benchmarks +

+
+
https://matej.laitl.cz/bench-actix-rocket/
+
+

This is a macro benchmark comparing performance of sync and async Rust web servers. +This is the kind of benchmark I want to see, and the analysis is exceptionally good. +Sadly, a big part of the analysis is fighting with unreleased versions of software and working around bugs, so I don't trust that the results are representative.

+
+
https://www.techempower.com/benchmarks/
+
+

This is a micro benchmark that pretends to be a macro benchmark. +The code is overly optimized to fit a very specific task. +I don't think the results are easily transferable to real-world applications. +At the same time, the lack of analysis and the macro scale of the task itself don't help with building a mental model for explaining the observed performance.

+
+
https://inside.java/2020/08/07/loom-performance
+
+

The opposite of a benchmark actually. +This post gives a good theoretical overview of why async might lead to performance improvements. +Sadly, it drops the ball when it comes to practice:

+ +
+

millions of user-mode threads instead of the meager thousands the OS can support.

+
+ +
+

What is the limiting factor for OS threads?

+
+
+
+
+
+ + + + + diff --git a/2021/04/26/concurrent-expression-problem.html b/2021/04/26/concurrent-expression-problem.html new file mode 100644 index 00000000..8fc636e9 --- /dev/null +++ b/2021/04/26/concurrent-expression-problem.html @@ -0,0 +1,218 @@ + + + + + + + Concurrent Expression Problem + + + + + + + + + + + + +
+ +
+ +
+
+ +

Concurrent Expression Problem

+

I am struggling with designing concurrent code. +In this post, I want to share a model problem which exemplifies some of the issues. +It is reminiscent of the famous expression problem in that there's a two dimensional design grid, and a win along one dimension translates to a loss along the other. +If you want a refresher on the expression problem (not required to understand this article), take a look at this post. +It's not canonical, but I like it.

+

Without further ado, concurrent expression problem:

+ + +

I am not sure that's exactly the right formulation, I feel like I am straining it a bit to fit the expression problem shape. +The explanation that follows matters more.

+

I think there are two ways to code the system described. +The first approach is to use a separate thread / goroutine / async task for each concurrent activity, with some synchronization around the access to the shared state. +The alternative approach is to write an explicit state machine / actor loop to receive the next event and process it.

+

In the first scheme, adding new activities is easy, as you just write straight-line code with maybe some .awaits here and there. +In the second scheme, it's easy to check and act on invariants, as there is only a single place where the state is modified.

+

Let's take a look at a concrete example. +We'll be using pseudocode for a language with cooperative concurrency and explicit yield points (think Python with async/await).

+

The state consists of two counters. +One activity decrements the first counter every second. +The other activity does the same to the other counter. +When both counters reach zero, we want to print something.

+

The first approach would look roughly like this:

+ +
+ + +
struct State { c1: u32, c2: u32 }
+
+async fn act1(state: State) {
+  while state.c1 > 0 {
+    sleep(1).await;
+    state.c1 -= 1;
+    if state.c1 == 0 && state.c2 == 0 {
+      print("both are zero")
+    }
+  }
+}
+
+async fn act2(state: State) {
+  while state.c2 > 0 {
+    sleep(1).await;
+    state.c2 -= 1;
+    if state.c1 == 0 && state.c2 == 0 {
+      print("both are zero")
+    }
+  }
+}
+ +
+

And the second one like this:

+ +
+ + +
async fn run(state: State) {
+  loop {
+    let event = next_event().await;
+    match event {
+      Event::Dec1 => {
+        state.c1 -= 1;
+        if state.c1 > 0 {
+          send_event_with_delay(Event::Dec1, 1)
+        }
+      }
+      Event::Dec2 => {
+        state.c2 -= 1;
+        if state.c2 > 0 {
+          send_event_with_delay(Event::Dec2, 1)
+        }
+      }
+    }
+    if state.c1 == 0 && state.c2 == 0 {
+      print("both are zero")
+    }
+  }
+}
+ +
+

It's much easier to see what the concurrent activities are in the first case. +It's clearer how the overall state evolves in the second case.
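For readers who want something that compiles, here is a small Rust sketch of the second scheme. It is a simplification under stated assumptions: timers are replaced by a plain FIFO queue, so `send_event_with_delay` becomes "push the follow-up event to the back of the queue", and the message is collected into a log instead of printed. The names `apply` and `run` are mine.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq)]
struct State {
    c1: u32,
    c2: u32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    Dec1,
    Dec2,
}

/// Applies one event to the state and returns the follow-up event to
/// schedule, if the corresponding counter has not reached zero yet.
fn apply(state: &mut State, event: Event) -> Option<Event> {
    let counter = match event {
        Event::Dec1 => &mut state.c1,
        Event::Dec2 => &mut state.c2,
    };
    *counter = counter.saturating_sub(1);
    if *counter > 0 {
        Some(event)
    } else {
        None
    }
}

/// Drives the loop until the queue is empty. Returns the messages that the
/// pseudocode above would print.
fn run(state: &mut State, mut queue: VecDeque<Event>) -> Vec<String> {
    let mut log = Vec::new();
    while let Some(event) = queue.pop_front() {
        if let Some(next) = apply(state, event) {
            queue.push_back(next); // stands in for a delayed send
        }
        // The single place where the invariant is checked.
        if state.c1 == 0 && state.c2 == 0 {
            log.push("both are zero".to_string());
        }
    }
    log
}

#[test]
fn both_counters_reach_zero_once() {
    let mut state = State { c1: 2, c2: 3 };
    let queue = VecDeque::from(vec![Event::Dec1, Event::Dec2]);
    let log = run(&mut state, queue);
    assert_eq!(state, State { c1: 0, c2: 0 });
    assert_eq!(log, vec!["both are zero".to_string()]);
}
```

Because `apply` is a plain function from state and event to follow-up event, the state evolution can be tested without any real concurrency, which is exactly the strength of this scheme.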

+

The second approach also gives you more control: if several events are ready, you can process them in the order of priority (usually it makes sense to prioritize writes over reads). +You can trivially add some logging at the start and end of the loop to collect data about slow events and overall latency. +But the hit to the programming model is big. +If you are new to the code and don't know which conceptual activities are there, it's hard to figure that out just from the code. +The core issue is that causal links between asynchronous events are not reified in the code:

+ +
+ + +
match {
+  Event::X => { do_x() },
+  Event::Y => { do_y() },
+}
+
+// vs
+
+async fn do_xy() {
+  do_x().await;
+  do_y().await;
+}
+ +
+
+
+ + + + + diff --git a/2021/05/12/design-pattern-dumping-ground.html b/2021/05/12/design-pattern-dumping-ground.html new file mode 100644 index 00000000..ee63d82f --- /dev/null +++ b/2021/05/12/design-pattern-dumping-ground.html @@ -0,0 +1,149 @@ + + + + + + + Design Pattern: Kitchen Sink + + + + + + + + + + + + +
+ +
+ +
+
+ +

Design Pattern: Kitchen Sink

+

These are the notes on a design pattern I noticed in several contexts.

+

Suppose, metaphorically, you have a neatly organized bookcase which categorizes the books by their topics. +And now, suppose you've got a new book, which doesn't fit clearly into any existing category. +What would you do?

+

Here are some common solutions I've seen:

+ +

Here's the kitchen sink pattern solution for this problem: have the Uncategorized shelf for books which don't clearly fit into the existing hierarchy.

+

The idea here is that the overall organization becomes better if you explicitly designate some place as "stuff that doesn't fit goes here by default". +Let's see the examples.

+

First, the Django web framework has a shortcuts module which contains convenience functions that don't fit the model/view separation. +The get_object_or_404 function looks up an object in the database and returns HTTP 404 if it is not found. +Models (SQL) and views (HTTP) don't know about each other, so the function doesn't belong to either of these modules. +Placing it in shortcuts allows this separation to be more crisp.

+

Second, I have two tricks to keep my home folder organized. +I have a script that clears ~/downloads on every reboot, and I have a ~/tmp as my dumping ground. +Before ~/tmp, various semi-transient things polluted my otherwise perfectly organized workspace.

+

Third, I asked my colleague recently about some testing infrastructure. +They replied that they have an extensive document for it in their fork, because it's unclear what the proper place for it is in the main repo. +In this case the absence of a dumping ground prevented useful work for no good reason.

+

Fourth, in rust-analyzer we have an ast::make module which is intended to contain the minimal orthogonal set of constructors for AST nodes. +Historically, people kept adding non-minimal, non-orthogonal constructors there as well. +Useful work was done, but it muddied the design. +This was fixed by adding a dedicated ast::make::ext submodule for convenient shortcuts.

+

Fifth, for big projects I like having stdext modules, which fill in missing batteries for the standard library. +Without them, various modules tend to accumulate unrelated, and often slightly duplicated, functionality.

+

Sixth, to avoid overthinking and setup costs to start a new hobby project (of which I have a tonne), I have a single monorepo for all incomplete things. +Adding a folder there is much easier than creating a GitHub repo.

+

To sum up, many classifications work best if there is an explicit "can't classify this" category. +If there's no obvious place to put things which don't fit, a solid design might erode with time. +Note that for this pattern to be useful, the existence of a good solid design is a prerequisite, lest all the code ends up in a utils module.

+
+
+ + + + + diff --git a/2021/05/31/how-to-test.html b/2021/05/31/how-to-test.html new file mode 100644 index 00000000..9cc4047f --- /dev/null +++ b/2021/05/31/how-to-test.html @@ -0,0 +1,887 @@ + + + + + + + How to Test + + + + + + + + + + + + +
+ +
+ +
+
+ +

How to Test

+

Alternative titles:
+     Unit Tests are a Scam
+     Test Features, Not Code
+     Data Driven Integrated Tests
+

+

This post describes my current approach to testing. +When I started programming professionally, I knew how to write good code, but good tests remained a mystery for a long time. +This is not due to a lack of advice; on the contrary, there's an abundance of information & terminology about testing. +This celestial emporium of benevolent knowledge includes TDD, BDD, unit tests, integrated tests, integration tests, end-to-end tests, functional tests, non-functional tests, blackbox tests, glassbox tests, and so on.

+

Knowing all this didn't help me to create better software. +What did help was trying out different testing approaches myself, and looking at how other people write tests. +Keep in mind that my background is mostly in writing compiler front-ends for IDEs. +This is a rather niche area, which is especially amenable to testing. +Compilers are pure self-contained functions. +I don't know how to best test modern HTTP applications built around inter-process communication.

+

Without further ado, let's see what I have learned!

+

Further ado (2024-05-21): while writing this post, I was missing a key piece of terminology for +crisply describing various kinds of tests. If you like this post, you might want to read +Unit and Integration Tests. +That post supplies better vocabulary for talking about the phenomena described in the present article.

+
+ +

+ Test Driven Design Ossification +

+

This is something I inflicted upon myself early in my career, and something I routinely observe. +You want to refactor some code, say add a new function parameter. +Turns out, there are a dozen tests calling this function, so now a simple refactor also involves fixing all the tests.

+

There is a simple, mechanical fix to this problem: introduce the check function which encapsulates the API under test. +It's easier to explain using a toy example. +Let's look at testing something simple, like a binary search, just to illustrate the technique.

+

We start with direct testing:

+ +
+ + +
/// Given a *sorted* `haystack`, returns `true`
+/// if it contains the `needle`.
+fn binary_search<T: Ord>(haystack: &[T], needle: &T) -> bool {
+    ...
+}
+
+#[test]
+fn binary_search_empty() {
+  let res = binary_search(&[], &0);
+  assert_eq!(res, false);
+}
+
+#[test]
+fn binary_search_singleton() {
+  let res = binary_search(&[92], &0);
+  assert_eq!(res, false);
+
+  let res = binary_search(&[92], &92);
+  assert_eq!(res, true);
+
+  let res = binary_search(&[92], &100);
+  assert_eq!(res, false);
+}
+
+// And a dozen more of other similar tests...
+ +
+

Some time passes, and we realize that -> bool is not the best signature for binary search. +It would be better if it returned an insertion point (an index where the element should be inserted to maintain sortedness). +That is, we want to change the signature to

+ +
+ + +
fn binary_search<T: Ord>(haystack: &[T], needle: &T) -> Result<usize, usize>;
+ +
+

Now we have to change every test, because the tests are tightly coupled to the specific API.

+

My solution to this problem is making the tests data driven. +Instead of every test interacting with the API directly, I like to define a single check function which calls the API. +This function takes a pair of input and expected result. +For the binary search example, it will look like this:

+ +
+ + +
#[track_caller]
+fn check(
+  input_haystack: &[i32],
+  input_needle: i32,
+  expected_result: bool,
+) {
+  let actual_result =
+    binary_search(input_haystack, &input_needle);
+  assert_eq!(expected_result, actual_result);
+}
+
+#[test]
+fn binary_search_empty() {
+  check(&[], 0, false);
+}
+
+#[test]
+fn binary_search_singleton() {
+  check(&[92], 0, false);
+  check(&[92], 92, true);
+  check(&[92], 100, false);
+}
+ +
+

Now, when the API of the binary_search function changes, we only need to adjust a single place, the check function:

+ +
+ + +
#[track_caller]
+fn check(
+  input_haystack: &[i32],
+  input_needle: i32,
+  expected_result: bool,
+) {
+  let actual_result =
+    binary_search(input_haystack, &input_needle).is_ok();
+  assert_eq!(expected_result, actual_result);
+}
+ +
+

To be clear, after you've done the refactor, you'll need to adjust the tests to check the index as well, but this can be done separately. +The existing test suite does not impede changes.
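That follow-up step can be sketched as below. The `binary_search` body is a stand-in implementation written for this example (the standard library's `slice::binary_search` has the same return shape); the point is that only `check` and the expected values in the table of cases change.

```rust
use std::cmp::Ordering;

/// Given a *sorted* `haystack`, returns `Ok(index)` if `needle` is found,
/// or `Err(insertion_point)` otherwise. Stand-in implementation.
fn binary_search<T: Ord>(haystack: &[T], needle: &T) -> Result<usize, usize> {
    let (mut lo, mut hi) = (0, haystack.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        match haystack[mid].cmp(needle) {
            Ordering::Less => lo = mid + 1,
            Ordering::Greater => hi = mid,
            Ordering::Equal => return Ok(mid),
        }
    }
    Err(lo)
}

#[track_caller]
fn check(
    input_haystack: &[i32],
    input_needle: i32,
    expected_result: Result<usize, usize>,
) {
    let actual_result = binary_search(input_haystack, &input_needle);
    assert_eq!(expected_result, actual_result);
}

#[test]
fn binary_search_empty() {
    check(&[], 0, Err(0));
}

#[test]
fn binary_search_singleton() {
    check(&[92], 0, Err(0));
    check(&[92], 92, Ok(0));
    check(&[92], 100, Err(1));
}
```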

+ +

Keep in mind that the binary search example is artificially simple. +The main danger here is that this is a boiling frog type of situation. +While the project is small and the tests are few, you don't notice that refactors are ever so slightly longer than necessary. +Then, several tens of thousands of lines of code later, you realize that to make a simple change you need to fix a hundred tests.

+
+
+ +

+ Test Friction +

+

Almost no one likes to write tests. +I've noticed many times how, upon fixing a trivial bug, I am prone to skipping the testing work. +Specifically, if writing a test is more effort than the fix itself, testing tends to go out of the window. +Hence,

+ +

Coming back to the binary search example, note how the check function reduces the amount of typing needed to add a new test. +For tests, this is a significant saving, not because typing is hard, but because it lowers the cognitive barrier to actually doing the work.

+
+
+ +

+ Test Features, Not Code +

+

The over-simplified binary search example can be stretched further. +What if you replace the sorted array with a hash map inside your application? +Or what if the calling code no longer needs to search at all, and wants to process all of the elements instead?

+

Good code is easy to delete. +Tests represent an investment into existing code, and make it costlier to delete (or change).

+

The solution is to write tests for features in such a way that they are independent of the code. +I like to use the neural network test for this:

+
+
Neural Network Test
+
+

Can you re-use the test suite if your entire software is replaced with an opaque neural network?

+
+
+

To give a real-life example this time, suppose that you are writing the part of a code-completion engine which sorts potential completions according to relevance. +(something I should probably be doing right now, instead of writing this article :-) )

+

Internally, you have a bunch of functions that compute relevance facts, like:

+
    +
  • +Is there direct type match (.foo has the desired type)? +
  • +
  • +Is there an indirect type match (.foo.bar has the right type)? +
  • +
  • +How frequently is this completion used in the current module? +
  • +
+

Then, there's the final ranking function that takes these facts and comes up with an overall rank.

+

The classical unit-test approach here would be to write a bunch of isolated tests for each of the relevance functions, +and a separate bunch of tests which feed the ranking function a list of relevance facts and check the final score.

+

This approach obviously fails the neural network test.

+

An alternative approach is to write a test to check that at a given position a specific ordered list of entries is returned. +That suite could work as a cross-validation for an ML-based implementation.
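A minimal sketch of such a feature-level test is below. The `complete` function is a toy stand-in invented for this example (it ranks identifiers by how often they occur, which is just one of the relevance facts above); what matters is that `check` only mentions the input text and the ordered list a user would see, so the ranking internals can be replaced freely.

```rust
use std::collections::HashMap;

/// Toy completion engine, purely illustrative: suggests identifiers from
/// `source` that start with `prefix`, most frequently used first, with ties
/// broken alphabetically.
fn complete<'a>(source: &'a str, prefix: &str) -> Vec<&'a str> {
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for word in source.split(|c: char| !(c.is_alphanumeric() || c == '_')) {
        if !word.is_empty() && word.starts_with(prefix) {
            *counts.entry(word).or_insert(0) += 1;
        }
    }
    let mut ranked: Vec<(&str, usize)> = counts.into_iter().collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    ranked.into_iter().map(|(word, _)| word).collect()
}

/// Feature-level check: input text in, ordered list of completions out.
/// Nothing here refers to relevance facts or scores.
#[track_caller]
fn check(source: &str, prefix: &str, expected: &[&str]) {
    let actual = complete(source, prefix);
    assert_eq!(expected, actual.as_slice());
}

#[test]
fn frequently_used_items_come_first() {
    check("foo_bar foo_baz foo_baz other", "foo", &["foo_baz", "foo_bar"]);
}

#[test]
fn ties_are_alphabetical() {
    check("fb fa", "f", &["fa", "fb"]);
}
```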

+

In practice, it's unlikely (but not impossible) that we use actual ML here. +But it's highly probable that the naive independent weights model isn't the end of the story. +At some point there will be special cases which would necessitate a change of the interface.

+ +

Note that this advice goes directly against one common understanding of unit-testing. +I am fairly confident that it results in better software over the long run.

+
+
+ +

+ Make Tests Fast +

+

There's one talk about software engineering which stands out for me, and which is my favorite. +It is Boundaries by Gary Bernhardt. +There's a point there though, which I strongly disagree with:

+
+
Integration Tests are Superlinear?
+
+

When you use integration tests, any new feature is accompanied by a bit of new code and a new test. +However, new code slows down all other tests, so the overall test suite becomes slow, as the total time grows super-linearly.

+
+
+

I don't think more code under test translates to a slower test suite. +Merge sort takes more lines of code than bubble sort, but it is way faster.

+

In the abstract, yes, more code generally means more execution time, but I doubt this is the defining factor in test execution time. +What actually happens is usually:

+
    +
  • +Input/Output: reading just a bit from a disk, network or another process slows down the tests significantly. +
  • +
  • +Outliers: very often, testing time is dominated by only a couple of slow tests. +
  • +
  • +Overly large input: throwing enough data at any software makes it slow. +
  • +
+

The problem with integrated tests is not code volume per se, but the fact that they typically mean doing a lot of IO. +But this doesn't need to be the case.

+ +

Nonetheless, some tests are going to be slow. +It pays off to introduce the concept of slow tests early on, arrange the skipping of such tests by default and only exercise them on CI. +You don't need to be fancy, just checking an environment variable at the start of the test is perfectly fine:

+ +
+ + +
#[test]
+fn completion_works_with_real_standard_library() {
+  if std::env::var("RUN_SLOW_TESTS").is_err() {
+    return;
+  }
+  ...
+}
+ +
+

Definitely do not use conditional compilation to hide slow tests: it's an obvious solution which makes your life harder +(similar observation from the Go ecosystem).

+

To deal with outliers, print each test's execution time by default. +Having the numbers fly by gives you immediate feedback and incentive to improve.

+
+
+ +

+ Data Driven Testing +

+

All these together lead to a particular style of architecture and tests, which I call data driven testing. +The bulk of the software is a pure function, where the state is passed in explicitly. +Removing IO from the picture necessitates that the interface of software is specified in terms of data. +Value in, value out.

+

One property of data is that it can be serialized and deserialized. +That means that the check style tests can easily accept arbitrarily complex input, which is specified in a structured format (JSON), an ad-hoc plain text format, or via an embedded DSL (builder-style interface for data objects).

+

Similarly, the expected argument of check is data. +It is a result which is more-or-less directly displayed to the user.

+

A convincing example of a data driven test would be a Goto Definition test from rust-analyzer (source):

+ +
+ + +
+

In this case, the check function has only a single argument: a string which specifies both the input and the expected result. +The input is a Rust project with three files (the //- /file.rs syntax shows the boundary between the files). +The current cursor position is also a part of the input and is specified with the $0 syntax. +The result is the //^^^ comment which marks the target of the Goto Definition call. +The check function creates an in-memory Rust project, invokes Goto Definition at the position signified by $0, and checks that the result is the position marked with ^^^.

+

Note that this is decidedly not a unit test. +Nothing is stubbed or mocked. +This test invokes the whole compilation pipeline: virtual file system, parser, macro expander, name resolution. +It runs on top of our incremental computation engine. +It touches a significant fraction of the IDE APIs. +Yet, it takes 4ms in debug mode (and 500µs in release mode). +And note that it absolutely does not depend on any internal API: if we replace our dumb compiler with a sufficiently smart neural net, nothing needs to be adjusted in the tests.

+

There's one question though: why on earth am I using a png image to display a bit of code? +Only to show that the raw string literal (r#""#) which contains Rust code is highlighted as such. +This is possible because we re-use the same input format (with //-, $0 and a couple of other markup elements) for almost every test in rust-analyzer. +As such, we can invest effort into building cool things on top of this format, which subsequently benefit all our tests.
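To give a feel for how little machinery such a format needs, here is a simplified sketch of a fixture parser. It is not rust-analyzer's actual implementation: the `Fixture` and `FixtureFile` types and `parse_fixture` are invented for this example, and only the `//- /path` file headers and the `$0` cursor marker are handled (the `//^^^` annotations are left out).

```rust
/// One file of an in-memory project.
#[derive(Debug, PartialEq)]
struct FixtureFile {
    path: String,
    text: String,
}

/// Parsed fixture: the files, plus the cursor as (file index, byte offset).
#[derive(Debug, PartialEq)]
struct Fixture {
    files: Vec<FixtureFile>,
    cursor: Option<(usize, usize)>,
}

fn parse_fixture(fixture: &str) -> Fixture {
    let mut files: Vec<FixtureFile> = Vec::new();
    let mut cursor = None;
    for line in fixture.lines() {
        // A `//- /path` header starts a new file.
        if let Some(path) = line.strip_prefix("//- ") {
            files.push(FixtureFile {
                path: path.trim().to_string(),
                text: String::new(),
            });
            continue;
        }
        let file_index = match files.len() {
            0 => continue, // text before the first header is ignored here
            n => n - 1,
        };
        let file = &mut files[file_index];
        match line.find("$0") {
            // Record the cursor position and strip the marker from the text.
            Some(col) => {
                cursor = Some((file_index, file.text.len() + col));
                file.text.push_str(&line[..col]);
                file.text.push_str(&line[col + 2..]);
            }
            None => file.text.push_str(line),
        }
        file.text.push('\n');
    }
    Fixture { files, cursor }
}

#[test]
fn parses_two_files_and_a_cursor() {
    let fixture = parse_fixture(
        "//- /main.rs\nfn main() { foo$0(); }\n//- /lib.rs\nfn foo() {}\n",
    );
    assert_eq!(fixture.files.len(), 2);
    assert_eq!(fixture.files[0].text, "fn main() { foo(); }\n");
    assert_eq!(fixture.cursor, Some((0, 15)));
}
```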

+
+
+ +

+ Expect Tests +

+

The previous example had a complex data input, but a relatively simple data output: a position in the file. +Often, the output is messy and has a complicated structure as well (a symptom of the rho problem). +Worse, sometimes the output is a part that is changed frequently. +This often necessitates updating a lot of tests. +Going back to the binary search example, the change from -> bool to -> Result<usize, usize> was an example of this effect.

+

There is a technique to make such simultaneous changes to all gold outputs easy: testing with expectations. +You specify the expected result as a bit of data inline with the test. +There's a special mode of running the test suite for updating this data. +Instead of failing the test, a mismatch between expected and actual causes the gold value to be updated in-place. +That is, the test framework edits the code of the test itself.

+

Here's an example of this workflow in rust-analyzer, used for testing code completion:

+ +
+ + +
+

Often, just the Debug representation of the type works well for expect tests, but you can do something more fun. +See this post from Jane Street for a great example: +Using ASCII waveforms to test hardware designs.

+

There are several libraries for this in Rust: insta, k9, expect-test.
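The core of the update mode fits in a few lines. The sketch below is a deliberately simpler variant than what those libraries do: it keeps the gold value in a separate file rather than rewriting the test's source code, and both `check_golden` and the `UPDATE_GOLDEN` environment variable are names made up for this example.

```rust
use std::{env, fs, path::Path};

/// Compares `actual` with the gold value stored at `path`.
/// In update mode, a mismatch overwrites the gold file instead of failing.
fn check_golden(path: &Path, actual: &str, update: bool) -> Result<(), String> {
    // A missing gold file is treated as an empty expectation.
    let expected = fs::read_to_string(path).unwrap_or_default();
    if expected == actual {
        return Ok(());
    }
    if update {
        fs::write(path, actual).map_err(|err| err.to_string())?;
        return Ok(());
    }
    Err(format!(
        "gold value mismatch in {}\nexpected:\n{}\nactual:\n{}",
        path.display(),
        expected,
        actual
    ))
}

/// Update mode is requested through a (hypothetical) environment variable.
fn update_requested() -> bool {
    env::var("UPDATE_GOLDEN").is_ok()
}
```

A test would then call something like `check_golden(path, &format!("{:#?}", value), update_requested())` and unwrap the result, so that a normal run fails on a mismatch while a run with the variable set refreshes every gold file in one go.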

+
+
+ +

+ Fluent Assertions +

+

An extremely popular genre for a testing library is a collection of fluent assertions:

+ +
+ + +
// Built-in assertion:
+assert!(x > y);
+
+// Fluent assertion:
+assert_that(x).is_greater_than(y);
+ +
+

The benefit of this style is better error messages. +Instead of just "false is not true", the testing framework can print values for x and y.

+

I don't find this useful. +Using the check style testing, there are very few assertions actually written in code. +Usually, I start with plain asserts without messages. +The first time I debug an actual test failure for a particular function, I spend some time to write a detailed assertion message. +To me, fluent assertions are not an attractive point on the curve that includes plain asserts and hand-written, context aware explanations of failures. +A notable exception here is the pytest approach: this testing framework overrides the standard assert to provide a rich diff without ceremony.

+ +
+
+ +

+ Peeking Inside +

+

One apparent limitation of the style of integrated testing I am describing is checking for properties which are not part of the output. +For example, if some kind of caching is involved, you might want to check that the cache is actually being hit, and is not just sitting there. +But, by definition, cache is not something that an outside client can observe.

+

The solution to this problem is to make this extra data a part of the system's output by adding extra observability points. +A good example here is Cargo's test suite. +It is set up in an integrated, data-driven fashion. +Each test starts with a succinct DSL for setting up a tree of files on disk. +Then, a full cargo command is invoked. +Finally, the test looks at the command's output and the resulting state of the file system, and asserts the relevant facts.

+

Tests for caching additionally enable verbose internal logging. +In this mode, Cargo prints information about cache hits and misses. +These messages are then used in assertions.

+

A closely related idea is coverage marks. +Sometimes, you want to check that something does not happen. +Tests for this tend to be fragile: often the thing does not happen, but for the wrong reason. +You can add a side channel which explains the reasoning behind a particular behavior, and additionally assert on it as well.
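A minimal sketch of such a side channel, assuming a global atomic counter is good enough as the mark. The names `SKIPPED_COMMENT` and `first_code_line` are made up for illustration and are not taken from any real code base:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical coverage mark: counts how often the "skip a comment" branch ran.
static SKIPPED_COMMENT: AtomicUsize = AtomicUsize::new(0);

/// Returns the first line that is neither blank nor a `//` comment.
fn first_code_line(text: &str) -> Option<&str> {
    text.lines().find(|line| {
        if line.trim_start().starts_with("//") {
            // Record *why* this line was rejected.
            SKIPPED_COMMENT.fetch_add(1, Ordering::Relaxed);
            return false;
        }
        !line.trim().is_empty()
    })
}

#[test]
fn comments_are_skipped_for_the_right_reason() {
    let before = SKIPPED_COMMENT.load(Ordering::Relaxed);
    assert_eq!(first_code_line("// note\nlet x = 1;"), Some("let x = 1;"));
    // The result alone could be right by accident; the mark shows the branch ran.
    assert!(SKIPPED_COMMENT.load(Ordering::Relaxed) > before);
}
```

The counter only ever grows, so the check stays valid even when other tests hit the same mark in parallel.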

+
+
+ +

+ Externalized Tests +

+

In the ultimate stage of data-driven tests, the definitions of test cases are moved out of test functions and into external files. +That is, you don't do this:

+ +
+ + +
#[test]
+fn test_foo() {
+  check("foo", "oof")
+}
+
+#[test]
+fn test_bar() {
+  check("bar", "rab")
+}
+ +
+

Rather, there is a single test that looks like this:

+ +
+ + +
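#[test]
+fn test_all() {
+  // Every file in `in/` has a same-named expected-output file in `out/`.
+  for entry in std::fs::read_dir("./test_data/in").unwrap() {
+    let name = entry.unwrap().file_name();
+    let name = name.to_string_lossy();
+    let input = std::fs::read_to_string(
+      format!("./test_data/in/{}", name),
+    ).unwrap();
+    let output = std::fs::read_to_string(
+      format!("./test_data/out/{}", name),
+    ).unwrap();
+    check(&input, &output)
+  }
+}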
+ +
+

I have a love-hate relationship with this approach. +It has at least two attractive properties. +First, it forces a data-driven approach without any cheating. +Second, it makes the test suite more reusable. +An alternative implementation in a different programming language can use the same tests.

+

But there's a drawback as well: without literal #[test] attributes, integration with tooling suffers. +For example, you don't automatically get "X out of Y tests passed" at the end of a test run. +You can't conveniently debug just a single test, and there isn't a helpful "Run" icon/shortcut you can use in an IDE.

+

When I do externalized test cases, I like to leave a trivial smoke test behind:

+ +
+ + +
#[test]
+fn smoke() {
+  check("", "");
+}
+ +
+

If I need to debug a failing external test, I first paste the input into this smoke test, and then get my IDE tooling back.

+
+
+ +

+ Beyond Example Based Testing +

+

Reading from a file is not the most fun way to come up with a data input for a check function.

+

Here are a few other popular ones:

+
+
Property Based Testing
+
+

Generate the input at random and verify that the output makes sense. +For a binary search, check that the needle indeed lies between the two elements at the insertion point.

+
+
Full Coverage
+
+

Better still, instead of generating some random inputs, just check that the answer is correct for all inputs. +This is how you should be testing binary search: generate every sorted list of length at most 7 with numbers in the 0..=6 range. +Then, for each list and for each number, check that the binary search gives the same result as a naive linear search.

+
+
Coverage Guided Fuzzing
+
+

Just throw random bytes at the check function. +Random bytes probably don't make much sense, but it's good to verify that the program returns an error instead of summoning nasal demons. +Instead of piling up bytes completely at random, observe which branches are taken, and try to invent byte sequences which cover more branches. +Note that this test is polymorphic in the system under test.

+
+
Structured Fuzzing / Coverage Guided Property Testing
+
+

Use random bytes as a seed to generate syntactically valid inputs, then see your software crash and burn when the most hideous edge cases are uncovered. +If you use Rust, check out the wasm-smith and arbitrary crates.

+
+
+ +
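The "Full Coverage" recipe for binary search could be sketched roughly like this. `insertion_point` is a stand-in for whatever binary search is under test, and all helper names here are invented for the example:

```rust
/// Binary search: index of the first element that is >= `needle`.
fn insertion_point(xs: &[u8], needle: u8) -> usize {
    let (mut lo, mut hi) = (0, xs.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if xs[mid] < needle { lo = mid + 1 } else { hi = mid }
    }
    lo
}

/// Naive linear-search oracle.
fn insertion_point_naive(xs: &[u8], needle: u8) -> usize {
    xs.iter().position(|&x| x >= needle).unwrap_or(xs.len())
}

/// Calls `f` on every sorted list over 0..=max with length at most `max_len`.
fn for_each_sorted_list(max_len: usize, max: u8, f: &mut dyn FnMut(&[u8])) {
    fn go(list: &mut Vec<u8>, max_len: usize, max: u8, f: &mut dyn FnMut(&[u8])) {
        f(list.as_slice());
        if list.len() == max_len {
            return;
        }
        // Only append values >= the last element, so the list stays sorted.
        let start = list.last().copied().unwrap_or(0);
        for x in start..=max {
            list.push(x);
            go(list, max_len, max, f);
            list.pop();
        }
    }
    go(&mut Vec::new(), max_len, max, f);
}

#[test]
fn binary_search_agrees_with_linear_search_everywhere() {
    for_each_sorted_list(7, 6, &mut |xs: &[u8]| {
        // 7 is included as a needle larger than every element.
        for needle in 0..=7u8 {
            assert_eq!(insertion_point(xs, needle), insertion_point_naive(xs, needle));
        }
    });
}
```

The input space is small (a few thousand lists), so exhaustive checking is cheap here while still covering empty lists, duplicates and needles outside the range.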
+
+ +

+ The External World +

+

What if isolating IO is not possible, and the application is fundamentally built around interacting with external systems? +In this case, my advice is to just accept that the tests are going to be slow, and might need extra effort to avoid flakiness.

+

Cargo is the perfect case study here. +Its raison d'être is orchestrating a herd of external processes. +Let's look at the basic test:

+ +
+ + +
#[test]
+fn cargo_compile_simple() {
+  let p = project()
+    .file("Cargo.toml", &basic_bin_manifest("foo"))
+    .file("src/foo.rs", &main_file(r#""i am foo""#, &[]))
+    .build();
+
+  p.cargo("build").run();
+
+  assert!(p.bin("foo").is_file());
+  p.process(&p.bin("foo")).with_stdout("i am foo\n").run();
+}
+ +
+

The project() part is a builder, which describes the state of the system. +First, .build() writes the specified files to disk in a temporary directory. +Then, p.cargo("build").run() executes the real cargo build command. +Finally, a bunch of assertions are made about the end state of the file system.

+

Neural network test: this is completely independent of internal Cargo APIs, by virtue of interacting with a cargo process via IPC.

+

To give an order-of-magnitude feeling for the cost of IO, Cargo's test suite takes around seven minutes (-j 1), while rust-analyzer finishes in less than half a minute.

+

An interesting case is the middle ground, when the IO-ing part is just big and important enough to be annoying. +That is the case for rust-analyzer: although almost all code is pure, there's a part which interacts with a specific editor. +What makes this especially finicky is that, in the case of Cargo, it's Cargo who calls external processes. +With rust-analyzer, it's something which we don't control, the editor, which schedules the IO. +This often results in hard-to-imagine bugs which are caused by particularly weird environments.

+

I don't have good answers here, but here are the tricks I use:

+
    +
  1. +Accept that something will break during integration. +Even if you always write perfect code and never introduce bugs, your upstream integration point will be buggy sometimes.
  2. +Make integration bugs less costly:
    • +use release trains,
    • +make the patch release process non-exceptional and easy,
    • +have a checklist for manual QA before the release.
  3. +Separate the tricky-to-test bits into a separate project. +This allows you to write slow and not 100% reliable tests for the integration parts, while keeping the core test suite fast and dependable.
+ +
+
+ +

+ The Concurrent World +

+

Consider the following API:

+ +
+ + +
fn do_stuff_in_background(p: Param) {
+  std::thread::spawn(move || {
+    // Stuff
+  });
+}
+ +
+

This API is fundamentally untestable. +Can you see why? +It spawns a concurrent computation, but it doesn't allow waiting for this computation to be finished. +So, any test that calls do_stuff_in_background can't check that the Stuff is done. +Worse, even tests which do not call this function might start to fail: they can now get interference from other tests. +The concurrent computation can outlive the test that originally spawned it.

+

This problem plagues almost every concurrent application I see. +A common symptom is adding timeouts and sleeps to tests, to increase the probability of stuff getting done. +Such timeouts are another common cause of slow test suites.

+

What makes this problem truly insidious is that there's no work-around. +Once broken, a causality link cannot be reforged by a layer above.

+

The solution is simple: don't do this.
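One way to keep the causality link, sketched under the assumption that handing a `JoinHandle` back to the caller is acceptable for the API. The `log` parameter is invented here only to make the effect of "Stuff" observable:

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical testable variant: the caller gets a handle to wait on.
fn do_stuff_in_background(log: Arc<Mutex<Vec<u32>>>, p: u32) -> JoinHandle<()> {
    thread::spawn(move || {
        // Stuff
        log.lock().unwrap().push(p * 2);
    })
}

#[test]
fn stuff_is_done() {
    let log = Arc::new(Mutex::new(Vec::new()));
    let handle = do_stuff_in_background(Arc::clone(&log), 21);
    // Joining makes completion observable: no sleeps, no timeouts.
    handle.join().unwrap();
    assert_eq!(*log.lock().unwrap(), vec![42]);
}
```

Because the test joins the thread before it returns, the computation cannot outlive the test and interfere with its neighbours.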

+ +
+
+ +

+ Layers +

+

Another common problem I see in complex projects is a beautifully layered architecture, which is inverted in tests.

+

Let's say you have something fabulous, like L1 <- L2 <- L3 <- L4. +To test L1, the path of least resistance is often to write tests which exercise L4. +You might even think that this is the setup I am advocating for. +Not exactly.

+

The problem with L1 <- L2 <- L3 <- L4 <- Tests is that working on L1 becomes slower, especially in compiled languages. +If you make a change to L1, then, before you get to the tests, you need to recompile the whole chain of reverse dependencies. +My favorite example here is rustc: when I worked on the lexer (L1), I spent a lot of time waiting for the rest of the compiler to be rebuilt to check my small change.

+

The right setup here is to write integrated tests for each layer:

+ +
+ + +
L1 <- Tests
+L1 <- L2 <- Tests
+L1 <- L2 <- L3 <- Tests
+L1 <- L2 <- L3 <- L4 <- Tests
+ +
+

Note that testing L4 involves testing L1, L2 and L3. +This is not a problem. +Due to layering, only L4 needs to be recompiled. +Other layers don't affect run time meaningfully. +Remember: it's IO (and sleep-based synchronization) that kills performance, not just code volume.

+
+
+ +

+ Test Everything +

+

In a nutshell, a #[test] is just a bit of code which is plugged into the build system to be executed automatically. +Use this to your advantage: simplify the automation by moving as much as possible into tests.

+

Here are some things in rust-analyzer which are just tests:

+
    +
  • +Code formatting (the most common one: you don't need an extra pile of YAML in CI, you can shell out to the formatter from the test).
  • +Checking that the history does not contain merge commits and teaching new contributors git survival skills.
  • +Collecting the manual from specially-formatted doc comments across the code base.
  • +Checking that the code base is, in fact, reasonably well-documented.
  • +Ensuring that the licenses of dependencies are compatible.
  • +Ensuring that high-level operations are linear in the size of the input. +Syntax-highlight a synthetic file of 1, 2, 4, 8, 16 kilobytes, run linear regression, check that the result looks like a line rather than a parabola.
+
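To make the last item concrete, here is a rough sketch of a "line rather than parabola" check. It is not rust-analyzer's actual test: it assumes a deterministic step count instead of wall-clock time, and `steps_linear`, `steps_quadratic` and the 0.99 threshold are placeholders chosen for the example:

```rust
/// R² of the least-squares line through `points` (1.0 means a perfect line).
fn linear_fit_r2(points: &[(f64, f64)]) -> f64 {
    let n = points.len() as f64;
    let mean_x = points.iter().map(|p| p.0).sum::<f64>() / n;
    let mean_y = points.iter().map(|p| p.1).sum::<f64>() / n;
    let sxy: f64 = points.iter().map(|p| (p.0 - mean_x) * (p.1 - mean_y)).sum();
    let sxx: f64 = points.iter().map(|p| (p.0 - mean_x).powi(2)).sum();
    let syy: f64 = points.iter().map(|p| (p.1 - mean_y).powi(2)).sum();
    // For simple linear regression, R² equals the squared correlation.
    (sxy * sxy) / (sxx * syy)
}

// Hypothetical stand-ins for "process a file of `kb` kilobytes":
// they return a step count, so the test is not at the mercy of timing noise.
fn steps_linear(kb: u32) -> f64 {
    (3 * kb + 1) as f64
}

fn steps_quadratic(kb: u32) -> f64 {
    (kb * kb) as f64
}

/// Measures the cost at 1, 2, 4, 8, 16 kilobytes and checks the fit.
fn looks_linear(cost: fn(u32) -> f64) -> bool {
    let points: Vec<(f64, f64)> =
        [1u32, 2, 4, 8, 16].iter().map(|&kb| (kb as f64, cost(kb))).collect();
    linear_fit_r2(&points) > 0.99
}

#[test]
fn highlighting_is_linear() {
    assert!(looks_linear(steps_linear));
    assert!(!looks_linear(steps_quadratic));
}
```

With real timings the threshold would need tuning, and the measurement should probably be repeated to smooth out noise.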
+
+ +

+ Use Bors +

+

This essay already mentioned a couple of cognitive tricks for better testing: reducing the fixed costs for adding new tests, and plotting/printing test times. +The best trick in a similar vein is the "not rocket science" rule of software engineering.

+

The idea is to have a robot which checks that the merge commit passes all the tests, before advancing the state of the main branch.

+

Besides the evergreen master, such a bot adds pressure to keep the test suite fast and non-flaky. +This is another boiling frog, something you need to constantly keep an eye on. +If you have even a single flaky test, it's very easy to miss when the second one is added.

+ +
+
+ +

+ Recap +

+

This was a long essay. +Let's look back at some of the key points:

+
    +
  1. +There is a lot of information about testing, but it is not always helpful. +At least, it was not helpful for me.
  2. +The core characteristic of the test suite is how easy it makes changing the software under test.
  3. +To that end, a good strategy is to focus on testing the features of the application, rather than on testing the code used to implement those features.
  4. +A good test suite passes the neural network test: it is still useful if the entire application is replaced by an ML model which just comes up with the right answer.
  5. +Corollary: good tests are not helpful for design in the small: a good test won't tell you the best signatures for functions.
  6. +Testing time is something worth optimizing for. +Tests are sensitive to IO and IPC. +Tests are relatively insensitive to the amount of code under test.
  7. +There are useful techniques which are underused: expectation tests, coverage marks, externalized tests.
  8. +There are not so useful techniques which are over-represented in the discourse: fluent assertions, mocks, BDD.
  9. +The key for unlocking many of the above techniques is thinking in terms of data, rather than interfaces or objects.
  10. +Corollary: good tests are helpful for design in the large. +They help to crystallize the data model your application is built around.
+
+ +
+
+ + + + + diff --git a/2021/07/09/inline-in-rust.html b/2021/07/09/inline-in-rust.html new file mode 100644 index 00000000..7b3568f3 --- /dev/null +++ b/2021/07/09/inline-in-rust.html @@ -0,0 +1,269 @@ + + + + + + + Inline In Rust + + + + + + + + + + + + +
+ +
+ +
+
+ +

Inline In Rust

+

There's a lot of tribal knowledge surrounding the #[inline] attribute in Rust. +I often find myself teaching how it works, so I finally decided to write this down.

+

Caveat Emptor: this is what I know, not necessarily what is true. +Additionally, the exact semantics of #[inline] are not set in stone and may change in future Rust versions.

+
+ +

+ Why Inlining Matters? +

+

Inlining is an optimizing transformation which replaces a call to a function with its body.

+

To give a trivial example, during compilation the compiler can transform this code:

+ +
+ + +
fn f(w: u32) -> u32 {
+    inline_me(w, 2)
+}
+
+fn inline_me(x: u32, y: u32) -> u32 {
+    x * y
+}
+ +
+

Into this code:

+ +
+ + +
fn f(w: u32) -> u32 {
+    w * 2
+}
+ +
+

To paraphrase A Catalogue of Optimizing Transformations by Frances Allen and John Cocke:

+ +
+ + +
There are many obvious advantages to inlining; two are:
+
+a. There is no function call overhead whatsoever.
+b. Caller and callee are optimized together. Advantage can be taken
+   of particular argument values and relationships: constant arguments
+   can be folded into the code, invariant instructions in the callee
+   can be moved to infrequently executed areas of the caller, etc.
+ +
+

In other words, for an ahead-of-time compiled language, inlining is the mother of all other optimizations. +It gives the compiler the necessary context to apply further transformations.

+
+
+ +

+ Inlining and Separate Compilation +

+

Inlining is at odds with another important idea in compilers: that of separate compilation. +When compiling big programs, it is desirable to separate them into modules which can be compiled independently to:

+
    +
  • +Process everything in parallel.
  • +Scope incremental recompilations to individual changed modules.
+

To achieve separate compilation, compilers expose signatures of functions, but keep function bodies invisible to other modules, preventing inlining. +This fundamental tension is what makes #[inline] in Rust trickier than just a hint for the compiler to inline the function.

+
+
+ +

+ Inlining in Rust +

+

In Rust, a unit of (separate) compilation is a crate. +If a function f is defined in a crate A, then all calls to f from within A can be inlined, as the compiler has full access to f. +If, however, f is called from some downstream crate B, such calls can't be inlined. +B has access only to the signature of f, not its body.

+

That's where the main usage of #[inline] comes from: it enables cross-crate inlining. +Without #[inline], even the most trivial of functions can't be inlined across the crate boundary. +The benefit is not without a cost: the compiler implements this by compiling a separate copy of the #[inline] function with every crate it is used in, significantly increasing compile times.

+

Besides #[inline], there are two more exceptions to this. +Generic functions are implicitly inlinable. +Indeed, the compiler can only compile a generic function when it knows the specific type arguments it is instantiated with. +As that is known only in the calling crate, bodies of generic functions have to be always available.

+

The other exception is link-time optimization. +LTO opts out of separate compilation: it makes bodies of all functions available, at the cost of making compilation much slower.

+
+
+ +

+ Inlining in Practice +

+

Now that the underlying semantics is explained, it's possible to infer some rules of thumb for using #[inline].

+

First, it's not a good idea to apply #[inline] indiscriminately, as that makes compile time worse. +If you don't care about compile times, a much better solution is to set lto = true in the Cargo profile (docs).

+

Second, it usually isn't necessary to apply #[inline] to private functions within a crate: the compiler generally makes good inlining decisions. +There's a joke that LLVM's heuristic for when the function should be inlined is "yes".

+

Third, when building an application, apply #[inline] reactively when profiling shows that a particular small function is a bottleneck. +Consider using lto for releases. +It might make sense to proactively #[inline] trivial public functions.

+

Fourth, when building libraries, proactively add #[inline] to small non-generic functions. +Pay special attention to impls: Deref, AsRef and the like often benefit from inlining. +A library can't anticipate all usages upfront, so it makes sense to not prematurely pessimize future users. +Note that #[inline] is not transitive: if a trivial public function calls a trivial private function, you need to #[inline] both. +See this benchmark for details.
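As an illustration of the non-transitivity point, a library might annotate both the public entry point and the private helper it calls. `Celsius` and `clamp_to_absolute_zero` are hypothetical names for this sketch:

```rust
/// Imagine this type lives in an upstream library crate.
pub struct Celsius(f64);

impl Celsius {
    // Small, non-generic, public: a candidate for a proactive #[inline].
    #[inline]
    pub fn new(degrees: f64) -> Celsius {
        Celsius(clamp_to_absolute_zero(degrees))
    }

    #[inline]
    pub fn degrees(&self) -> f64 {
        self.0
    }
}

// Private helper called from an #[inline] public function. It carries its
// own attribute: marking only `Celsius::new` would not cover this call.
#[inline]
fn clamp_to_absolute_zero(degrees: f64) -> f64 {
    degrees.max(-273.15)
}

#[test]
fn clamps_impossible_temperatures() {
    assert_eq!(Celsius::new(-300.0).degrees(), -273.15);
    assert_eq!(Celsius::new(20.0).degrees(), 20.0);
}
```

Whether the annotations pay off for a given library is still something to confirm with a benchmark in a downstream crate.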

+

Fifth, mind generic functions. +It's not too wrong to say that generic functions are implicitly inline. +As a result, they are often a cause of code bloat. +Generic functions, especially in libraries, should be written to minimize unwanted inlining. +To give an example from wat:

+ +
+ + +
// Public, generic function.
+// Will cause code bloat if not handled carefully!
+pub fn parse_str(wat: impl AsRef<str>) -> Result<Vec<u8>> {
+  // Immediately delegate to a non-generic function.
+  _parse_str(wat.as_ref())
+}
+
+// Separate-compilation friendly private implementation.
+fn _parse_str(wat: &str) -> Result<Vec<u8>> {
+    ...
+}
+ +
+
+
+ +

+ References +

+
    +
  1. +Language reference.
  2. +Rust performance book.
  3. +@alexcrichton explains inline. +Note that, in reality, the compile time costs are worse than what I described: inline functions are compiled per codegen-unit, not per crate.
  4. +More @alexcrichton.
  5. +Even more @alexcrichton.
+

Discussion on /r/rust.

+

There is now a follow-up post: It's Not Always ICache.

+ +
+
+
+ + + + + diff --git a/2021/07/10/its-not-always-icache.html b/2021/07/10/its-not-always-icache.html new file mode 100644 index 00000000..6c2b8005 --- /dev/null +++ b/2021/07/10/its-not-always-icache.html @@ -0,0 +1,496 @@ + + + + + + + It's Not Always ICache + + + + + + + + + + + + +
+ +
+ +
+
+ +

It's Not Always ICache

+

This is a follow-up to the previous post about #[inline] in Rust specifically. +This post is a bit more general, and a bit more ranty. +Reader, beware!

+

When inlining optimization is discussed, the following is almost always mentioned: inlining can also make code slower, because inlining increases the code size, blowing the instruction cache size and causing cache misses.

+

I myself have seen this repeated in various forms many times. +I have also seen a lot of benchmarks where judicious removal of inlining annotations did increase performance. +However, not once have I seen the performance improvement being traced to ICache specifically. +To me at least, this explanation doesn't seem to be grounded: people know that ICache is to blame because other people say this, not because there's a benchmark everyone points to. +It doesn't mean that the ICache explanation is wrong, just that I personally don't have evidence to believe it is better than any other explanation.

+

Anyway, I've decided to look at a specific case where I know #[inline] to cause an observable slowdown, and understand why it happens. +Note that the goal here is not to explain the real-world impact of #[inline]; the benchmark is artificial. +The goal is, first and foremost, to learn more about the tools to use for explaining results. +The secondary goal is to either observe ICache effects in practice, or else to provide an alternative hypothesis for why removing inlining can speed things up.

+

The benchmark is based on my once_cell Rust library. +The library provides a primitive, similar to double-checked locking. +There's a function that looks like this:

+ +
+ + +
fn get_or_try_init<F, E>(&self, f: F) -> Result<&T, E>
+where
+ F: FnOnce() -> Result<T, E>,
+{
+  if let Some(value) = self.get() {
+    // Fast path.
+    return Ok(value);
+  }
+
+  // Slow path.
+  self.0.initialize(f)?;
+  Ok(unsafe { self.get_unchecked() })
+}
+ +
+

I know that performance improves significantly when the initialize function is not inlined. +It's somewhat obvious that this is the case (that's why the benchmark is synthetic: real-world examples are about cases where we don't know if inline is needed). +But it is unclear why, exactly, inlining initialize leads to slower code.

+

For the experiment, I wrote a simple high-level benchmark calling get_or_try_init in a loop:

+ +
+ + +
const N_LOOPS: usize = 8;
+static CELL: OnceCell<usize> = OnceCell::new();
+
+fn main() {
+  for i in 0..N_LOOPS {
+    go(i)
+  }
+}
+
+fn go(i: usize) {
+  for _ in 0..100_000_000 {
+    let &value = CELL.get_or_init(|| i);
+    assert!(value < N_LOOPS);
+  }
+}
+ +
+

I also added a compile-time toggle to force or forbid inlining:

+ +
+ + +
#[cfg_attr(feature = "inline_always", inline(always))]
+#[cfg_attr(feature = "inline_never", inline(never))]
+fn initialize() { ... }
+ +
+

You can see the full benchmark in this commit: matklad/once_cell@a741d5f.

+

Running both versions shows that #[inline(never)] is indeed measurably faster:

+ +
+ + +
$ cargo run -q --example bench  --release --features inline_always
+330ms
+
+$ cargo run -q --example bench  --release --features inline_never
+259ms
+ +
+ +

How do we explain the difference? +The first step is to remove cargo from the equation and make two binaries for comparison:

+ +
+ + +
$ cargo build --example bench --release --features inline_never
+$ cp ./target/release/examples/bench never
+$ cargo build --example bench --release --features inline_always
+$ cp ./target/release/examples/bench always
+ +
+

On Linux, the best tool to quickly assess the performance of any program is perf stat. +It runs the program and shows a bunch of CPU-level performance counters, which might explain what's going on. +As we suspect that ICache might be to blame, let's include the counters for caches:

+ +
+ + +
$ perf stat -e instructions,cycles,\
+  L1-dcache-loads,L1-dcache-load-misses,L1-dcache-prefetches,\
+  L1-icache-loads,L1-icache-load-misses,cache-misses \
+  ./always
+348ms
+
+ 6,396,754,995      instructions:u
+ 1,601,314,994      cycles:u
+ 1,600,621,170      L1-dcache-loads:u
+         4,806      L1-dcache-load-misses:u
+         4,402      L1-dcache-prefetches:u
+        69,594      L1-icache-loads:u
+           461      L1-icache-load-misses:u
+         1,928      cache-misses:u
+
+$ perf stat -e instructions,cycles,\
+  L1-dcache-loads,L1-dcache-load-misses,L1-dcache-prefetches,\
+  L1-icache-loads,L1-icache-load-misses,cache-misses \
+  ./never
+261ms
+
+ Performance counter stats for './never':
+
+ 5,597,215,493      instructions:u
+ 1,199,960,402      cycles:u
+ 1,599,404,303      L1-dcache-loads:u
+         4,612      L1-dcache-load-misses:u
+         4,290      L1-dcache-prefetches:u
+        62,268      L1-icache-loads:u
+           603      L1-icache-load-misses:u
+         1,675      cache-misses:u
+ +
+

There is some difference in L1-icache-load-misses, but there's also a surprising difference in instructions. +What's more, the L1-icache-load-misses difference is hard to estimate, because it's unclear what L1-icache-loads are. +As a sanity check, statistics for dcache are the same, just as we expect.

+

While perf takes the real data from the CPU, an alternative approach is to run the program in a simulated environment. +That's what the cachegrind tool does. +Fun fact: the primary author of cachegrind is @nnethercote, whose Rust Performance Book we saw in the last post. +Let's see what cachegrind thinks about the benchmark.

+ +
+ + +
$ valgrind --tool=cachegrind ./always
+10s
+ I   refs:      6,400,577,147
+ I1  misses:            1,560
+ LLi misses:            1,524
+ I1  miss rate:          0.00%
+ LLi miss rate:          0.00%
+
+ D   refs:      1,600,196,336
+ D1  misses:            5,549
+ LLd misses:            4,024
+ D1  miss rate:           0.0%
+ LLd miss rate:           0.0%
+
+ LL refs:               7,109
+ LL misses:             5,548
+ LL miss rate:            0.0%
+
+$ valgrind --tool=cachegrind ./never
+9s
+ I   refs:      5,600,577,226
+ I1  misses:            1,572
+ LLi misses:            1,529
+ I1  miss rate:          0.00%
+ LLi miss rate:          0.00%
+
+ D   refs:      1,600,196,330
+ D1  misses:            5,553
+ LLd misses:            4,024
+ D1  miss rate:           0.0%
+ LLd miss rate:           0.0%
+
+ LL refs:               7,125
+ LL misses:             5,553
+ LL miss rate:            0.0%
+ +
+

Note that, because cachegrind simulates the program, it runs much slower. +Here, we don't see a big difference in ICache misses (I1: first level instruction cache, LLi: last level instruction cache). +We do see a difference in ICache references. +Note that the number of times the CPU refers to ICache should correspond to the number of instructions it executes. +Cross-checking the number with perf, we see that both perf and cachegrind agree on the number of instructions executed. +They also agree that the inline_always version executes more instructions. +It's still hard to say what perf's L1-icache-loads means. +Judging by the name, it should correspond to cachegrind's I refs, but it doesn't.

+

Anyway, it seems there's one thing that bears further investigation: why does inlining change the number of instructions executed? +Inlining doesn't actually change the code the CPU runs, so the number of instructions should stay the same. +Let's look at the asm then! +The right tool here is cargo-asm.

+

Again, here's the function we will be looking at:

+ +
+ + +
fn go(i: usize) {
+  for _ in 0..100_000_000 {
+    let &value = CELL.get_or_init(|| i);
+    assert!(value < N_LOOPS);
+  }
+}
+ +
+

The call to get_or_init will be inlined, and the nested call to initialize will be inlined depending on the flag.

+

Let's first look at the inline_never version:

+ +
+ + +
  push    r14 ;
+  push    rbx ; prologue
+  push    rax ;
+  mov     qword, ptr, [rsp], rdi
+  mov     ebx, 100000001 ; loop counter
+  mov     r14, rsp
+  jmp     .LBB14_1
+ .loop:
+  cmp     qword, ptr, [rip, +, CELL+16], 8
+  jae     .assert_failure
+ .LBB14_1:
+  add     rbx, -1
+  je      .normal_exit
+  mov     rax, qword, ptr, [rip, +, CELL]
+  cmp     rax, 2
+  je      .loop
+  mov     rdi, r14
+  call    once_cell::imp::OnceCell<T>::initialize
+  jmp     .loop
+ .normal_exit:
+  add     rsp, 8 ;
+  pop     rbx    ; epilogue
+  pop     r14    ;
+  ret            ;
+ .assert_failure:
+  lea     rdi, [rip, +, .L__unnamed_12]
+  lea     rdx, [rip, +, .L__unnamed_13]
+  mov     esi, 35
+  call    qword, ptr, [rip, +, core::panicking::panic@GOTPCREL]
+  ud2
+ +
+

And then at the inline_always version:

+ +
+ + +
  push    rbp  ;
+  push    r15  ;
+  push    r14  ;
+  push    r13  ; prologue
+  push    r12  ;
+  push    rbx  ;
+  sub     rsp, 24
+  mov     r12, rdi
+  xor     ebx, ebx
+  mov     r13d, 1
+  lea     r14, [rip, +, CELL]
+  mov     rbp, qword, ptr, [rip, +, WaiterQueue::drop@GOTPCREL]
+  mov     r15, qword, ptr, [rip, +, once_cell::imp::wait@GOTPCREL]
+  jmp     .LBB10_1
+ .LBB10_10:
+  mov     qword, ptr, [rsp, +, 8], r14
+  mov     qword, ptr, [rip, +, CELL+8], 1
+  mov     qword, ptr, [rip, +, CELL+16], r12
+  mov     qword, ptr, [rsp, +, 16], 2
+  lea     rdi, [rsp, +, 8]
+  call    rbp
+ .loop:
+  add     rbx, 1
+  cmp     qword, ptr, [rip, +, CELL+16], 8
+  jae     .assert_failure
+ .LBB10_1:
+  cmp     rbx, 100000000
+  je      .normal_exit
+  mov     rax, qword, ptr, [rip, +, CELL]
+  cmp     rax, 2
+  je      .loop
+ .LBB10_3:
+  mov     rax, qword, ptr, [rip, +, CELL]
+ .LBB10_4:
+  test    rax, rax
+  jne     .LBB10_5
+  xor     eax, eax
+  lock    cmpxchg, qword, ptr, [rip, +, CELL], r13
+  jne     .LBB10_4
+  jmp     .LBB10_10
+ .LBB10_5:
+  cmp     rax, 2
+  je      .loop
+  mov     ecx, eax
+  and     ecx, 3
+  cmp     ecx, 1
+  jne     .LBB10_8
+  mov     rdi, r14
+  mov     rsi, rax
+  call    r15
+  jmp     .LBB10_3
+ .normal_exit:
+  add     rsp, 24 ;
+  pop     rbx     ;
+  pop     r12     ;
+  pop     r13     ; epilogue
+  pop     r14     ;
+  pop     r15     ;
+  pop     rbp     ;
+  ret
+ .assert_failure:
+  lea     rdi, [rip, +, .L__unnamed_9]
+  lea     rdx, [rip, +, .L__unnamed_10]
+  mov     esi, 35
+  call    qword, ptr, [rip, +, core::panicking::panic@GOTPCREL]
+  ud2
+ .LBB10_8:
+  lea     rdi, [rip, +, .L__unnamed_11]
+  lea     rdx, [rip, +, .L__unnamed_12]
+  mov     esi, 57
+  call    qword, ptr, [rip, +, core::panicking::panic@GOTPCREL]
+  ud2
+ +
+

I've slightly edited the code and also highlighted the hot loop which constitutes the bulk of the benchmark.

+

Looking at the assembly, we can see the following:

+ +

Note that it's highly unlikely that ICache affects the running code, as it's a small bunch of instructions next to each other in memory. +On the other hand, an extra cmp with a large immediate precisely accounts for the number of extra instructions we observe (the loop is run 800_000_000 times).
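The arithmetic behind that claim can be checked directly. This sketch just plugs in the instruction counts reported by perf stat and cachegrind above; the constant and function names are mine:

```rust
// Instruction counts copied from the perf stat and cachegrind runs above.
const PERF_ALWAYS: u64 = 6_396_754_995;
const PERF_NEVER: u64 = 5_597_215_493;
const CACHEGRIND_ALWAYS: u64 = 6_400_577_147;
const CACHEGRIND_NEVER: u64 = 5_600_577_226;

// N_LOOPS calls to `go`, 100_000_000 iterations each.
const ITERATIONS: u64 = 8 * 100_000_000;

/// Extra instructions per loop iteration in the inline_always binary.
fn extra_per_iteration(always: u64, never: u64) -> f64 {
    (always - never) as f64 / ITERATIONS as f64
}

#[test]
fn one_extra_instruction_per_iteration() {
    let perf = extra_per_iteration(PERF_ALWAYS, PERF_NEVER);
    let cachegrind = extra_per_iteration(CACHEGRIND_ALWAYS, CACHEGRIND_NEVER);
    // Both tools report a difference within 1% of one instruction per iteration.
    assert!((perf - 1.0).abs() < 0.01);
    assert!((cachegrind - 1.0).abs() < 0.01);
}
```

Both differences come out at just under 800 million instructions, which matches one additional instruction in each pass through the hot loop.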

+
+ +

+ Conclusions +

+

It's hard enough to come up with a benchmark which demonstrates the difference between two alternatives. +It's even harder to explain the difference: there might be many readily available explanations, but they are not necessarily true. +Nonetheless, today we have a wealth of helpful tooling. +Two notable examples are perf and valgrind. +Tools are not always correct: it's a good idea to sanity check different tools against each other and against a common-sense understanding of the problem.

+

For inlining in particular, we found the following reasons why inlining S into C might cause a slow down:

+
    +
  1. +Inlining might cause C to use more registers. +This means that prologue and epilogue grow additional push/pop instructions, which also use stack memory. +Without inlining, these instructions are hidden in S and are only paid for when C actually calls into S, as opposed to every time C itself is called.
  2. +Generalizing from the first point, if S is called in a loop or in an if, the compiler might hoist some instructions of S to before the branch, moving them from the cold path to the hot path.
  3. +With more local variables and control flow in the stack frame to juggle, the compiler might accidentally pessimize the hot loop.
+
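A common way to act on these observations is to keep the slow path S out of line explicitly. The following is only a sketch with a made-up `Cache` type, playing the roles of C (`get`) and S (`compute_and_insert`) from the list above:

```rust
use std::collections::HashMap;

pub struct Cache {
    slots: HashMap<u32, u64>,
}

impl Cache {
    pub fn new() -> Cache {
        Cache { slots: HashMap::new() }
    }

    // C: small and hot, a reasonable inlining candidate.
    #[inline]
    pub fn get(&mut self, key: u32) -> u64 {
        if let Some(&value) = self.slots.get(&key) {
            return value; // Fast path.
        }
        self.compute_and_insert(key) // Slow path.
    }

    // S: rarely executed, so it is kept out of line. Its register pressure
    // and stack usage are then paid for only when it actually runs.
    #[cold]
    #[inline(never)]
    fn compute_and_insert(&mut self, key: u32) -> u64 {
        let value = (key as u64) * (key as u64); // stand-in for expensive work
        self.slots.insert(key, value);
        value
    }
}

#[test]
fn second_lookup_hits_the_fast_path() {
    let mut cache = Cache::new();
    assert_eq!(cache.get(7), 49);
    assert_eq!(cache.get(7), 49);
}
```

As with the benchmark in this post, whether `#[inline(never)]` helps in a particular program is something to measure rather than assume.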

If you are curious under which conditions ICache does become an issue, there's this excellent article about one such case.

+
+
+
+ + + + + diff --git a/2021/07/30/shell-injection.html b/2021/07/30/shell-injection.html new file mode 100644 index 00000000..0c2f553c --- /dev/null +++ b/2021/07/30/shell-injection.html @@ -0,0 +1,432 @@ + + + + + + + ; echo Shell Injection + + + + + + + + + + + + +
+ +
+ +
+
+ +

; echo Shell Injection

+

This is an introductory article about shell injection, a security vulnerability allowing an attacker to execute arbitrary code on the user's machine. +This is a well-studied problem, and there are simple and efficient solutions to it. +It's relatively easy to design a library API in such a way as to shield the application developer from the risk of shell injections.

+

There are two reasons why I am writing this post. +First, this year I've pointed out this issue in three different libraries. +It seems that, although the problem is well-studied, it's not well known, so just repeating some things might help. +Second, I've recently reported a related problem about one of the VS Code APIs, and I want to use this piece as an extended GitHub comment :-)

+
+ +

+ A Curious Case Of Pwnd Script +

+

Shell injection can happen when a program needs to execute another program, and one of the arguments is controlled by the user/attacker. +As a model example, let's write a quick script to read a list of URLs from stdin, and run curl for each one of those.

+

That's not realistic, but small and illustrative. +This is what the script could look like in NodeJS:

+ +
+
curl-all.js
+ + +
const readline = require('readline');
+
+const util = require('util');
+const exec = util.promisify(require('child_process').exec);
+
+async function main() {
+  const input = readline.createInterface({
+    input: process.stdin,
+    output: process.stdout,
+    terminal: false,
+  });
+
+  for await (const line of input) {
+    if (line.trim().length > 0) {
+      const { stdout, stderr } = await exec(`curl ${line}`);
+      console.log({ stdout, stderr });
+    }
+  }
+}
+
+main()
+ +
+

I would have written this in Rust, but, alas, it's not vulnerable to this particular attack :)

+

The interesting line is this one:

+ +
+ + +
const { stdout, stderr } = await exec(`curl ${line}`);
+ +
+

Here, we are using the exec API from node to spawn a child curl process, passing a line of input as an argument.

+

Seems to work for simple cases?

+ +
+ + +
$ cat urls.txt
+https://example.com
+
+$ node curl-all.js < urls.txt
+{
+  stdout: '<!doctype html>...</html>\n',
+  stderr: '% Total    % Received ...'
+}
+ +
+

But what if we use a slightly more imaginative input?

+ +
+ + +
$ node curl-all.js < malice_in_the_wonderland.txt
+{
+  stdout: 'PWNED, reading your secrets from /etc/passwd\n' +
+    'root:x:0:0:System administrator:/root:/bin/fish\n' +
+    '...' +
+    'matklad:x:1000:100::/home/matklad:/bin/fish\n',
+  stderr: "curl: try 'curl --help' for more information\n"
+}
+ +
+

That feels bad: it seems that the script somehow reads the contents of my /etc/passwd. +How did this happen? We've only invoked curl!

+
+
+ +

+ Spawning a Process +

+

To understand what has just happened, we need to learn a bit about how spawning a process works in general. +This section is somewhat UNIX-specific; things are implemented a bit differently on Windows. +Nonetheless, the big picture conclusions hold there as well.

+

The main API to run a program with command line arguments is the exec family of functions. +For example, here's execve:

+ +
+ + +
int execve(const char *pathname, char *const argv[],
+           char *const envp[]);
+ +
+

It takes the name of the program (pathname), a list of command line arguments (argv), and a list of environment variables for the new process (envp), and uses those to run the specified binary. +How exactly this happens is a fascinating story with many forks in the plot, but it is beyond the scope of the article.

+

What is curious, though, is that while the underlying system API wants an array of arguments, the child_process.exec function from node takes only a single string: exec("curl http://example.com"). +So how does a single string turn into an array of arguments?

+

Let's find out! +To do that, we'll use the strace tool. +This tool inspects (traces) all the system calls invoked by the program. +We'll ask strace to look for execve in particular, to understand how node's exec maps to the underlying system's API. +We'll need the --follow argument to trace all processes, and not just the top-level one. +To reduce the amount of output and only print execve, we'll use the --trace flag:

+ +
+ + +
$ strace --follow --trace execve node curl-all.js < urls.txt
+execve("/bin/node", ["node", "curl-all.js"], 0x7fff97776be0)
+...
+execve("/bin/sh", ["/bin/sh", "-c", "curl https://example.com"], 0x3fcacc0)
+...
+execve("/bin/curl", ["curl", "https://example.com"], 0xec4008)
+ +
+

The first execve we see here is our original invocation of the node binary itself. +The last one is what we want to do: spawn curl with a single argument, a URL. +And the middle one is what node's exec actually does.

+

Let's take a closer look:

+ +
+ + +
/bin/sh -c "curl https://example.com"
+ +
+

Here, node invokes the sh binary (the system's shell) with two arguments: -c and the string we originally passed to child_process.exec. +-c stands for command, and instructs the shell to interpret the value as a shell command, parse it, and then run it.

+

In other words, rather than running the command directly, node asks the shell to do the heavy lifting. +But the shell is an interpreter of the shell language, and, by carefully crafting the input to exec, we can ask it to run arbitrary code. +In particular, that's what we used as a payload in the bad example above:

+ +
+
malice_in_the_wonderland.txt
+ + +
; echo 'PWNED, reading your secrets from /etc/passwd' && cat /etc/passwd
+ +
+

After the string interpolation, the resulting command was

+ +
+ + +
/bin/sh -c "curl; echo '...' && cat /etc/passwd"
+ +
+

That is, first run curl (with no arguments, hence its complaint on stderr), then echo, then read /etc/passwd.

+
+
+ +

+ Those Who Study History Are Doomed to Repeat It +

+

There's an equivalent safe API in node: spawn. +Unlike exec, it uses an array of arguments rather than a single string.

+ +
+ + +
-  exec(`curl ${line}`)
++ spawn("curl", [line])
+ +
+

Internally, the API bypasses the shell and uses execve directly. +Thus, this API is not vulnerable to shell injection: an attacker can run curl with bad arguments, but can't run anything other than curl.

+

Note that it's easy to implement exec in terms of spawn:

+ +
+ + +
function myExec(cmd) {
+  return spawn("/bin/sh", ["-c", cmd])
+}
+ +
+
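The same contrast can be sketched in Rust, where the standard library only offers the array-style API. This is a minimal illustration, not the article's script: the function names `spawn_style`, `exec_style`, and `argv` are made up for the example, and nothing is actually executed; the commands are only built and inspected.

```rust
use std::process::Command;

/// Shell-less style: the program and each argument are passed separately,
/// so `line` can only ever be a single argument to curl.
pub fn spawn_style(line: &str) -> Command {
    let mut cmd = Command::new("curl");
    cmd.arg(line);
    cmd
}

/// `exec`-style, shown only for contrast: the whole string is handed to
/// the shell, which parses it. This is the injectable shape.
pub fn exec_style(line: &str) -> Command {
    let mut cmd = Command::new("/bin/sh");
    cmd.arg("-c").arg(format!("curl {}", line));
    cmd
}

/// Renders the argv a `Command` would pass to the OS, without running it.
pub fn argv(cmd: &Command) -> Vec<String> {
    std::iter::once(cmd.get_program())
        .chain(cmd.get_args())
        .map(|s| s.to_string_lossy().into_owned())
        .collect()
}
```

With a payload like `; echo PWNED`, `spawn_style` yields an argv where the payload stays one opaque argument to curl, while `exec_style` yields a single string that the shell gets to interpret.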

It's a common pattern among many languages:

+
    +
  • +there's an exec-style function that takes a string and spawns /bin/sh -c under the hood, +
  • +
  • +the docs for this function include a giant disclaimer, saying that using it with user input is a bad idea, +
  • +
  • +there's a safe alternative which takes arguments as an array and spawns the process directly. +
  • +
+

Why provide an exploitable API, while a safe version is possible and is more direct? +I don't know, but my guess is that it's mostly just history. +C has system, Perl's backticks correspond directly to that, Ruby got backticks from Perl, Python just has system, node was probably influenced by all these scripting languages.

+

Note that security isn't the only issue with /bin/sh -c based APIs. +Read this other post to learn about the rest of the problems.

+
+
+ +

+ Take Aways +

+

If you are an application developer, be aware that this issue exists. +Read the language documentation carefully: most likely, there are two flavors of process spawning functions. +Note how shell injection is similar to SQL injection and XSS.

+

If you develop a library for conveniently working with external processes, use and expose only the shell-less API from the underlying platform.

+

If you build a new platform, don't provide a /bin/sh -c API in the first place. +Be like deno (and also Go, Rust, Julia), don't be like node (and also Python, Ruby, Perl, C). +If you have to maintain such an API for legacy reasons, clearly document the issue about shell injection. +Documenting how to do /bin/sh -c by hand might also be a good idea.

+

If you are designing a programming language, be careful with string interpolation syntax. +It's important that string interpolation can be used to spawn a command in a safe way. +That mostly means that library authors should be able to deconstruct a "cmd -j $arg1 -f $arg2" literal into two (compile-time) arrays: ["cmd -j ", " -f "] and [arg1, arg2]. +If you don't provide this feature in the language, library authors will split the interpolated string, which would be unsafe (not only for shelling out, but for SQLing or HTMLing as well). +Good examples to learn from are JavaScript's +tagged templates +and Julia's +backticks.

+
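To make the "two arrays" idea concrete, here is a small Rust sketch of what a library could do once the language hands it the literal fragments and the interpolated values separately. The function name `build_argv` and its splitting rules are assumptions made for illustration; a real library would also need to decide how to treat quotes inside the literal parts.

```rust
/// Builds an argv from literal fragments and interpolated values.
/// Only the literal fragments are split on whitespace; each interpolated
/// value is appended verbatim, so it can never introduce a new argument.
/// `fragments[i]` is the literal text that precedes `args[i]`.
pub fn build_argv(fragments: &[&str], args: &[&str]) -> Vec<String> {
    let mut argv = Vec::new();
    let mut current = String::new();
    let mut in_token = false;
    for (i, fragment) in fragments.iter().enumerate() {
        for ch in fragment.chars() {
            if ch.is_whitespace() {
                // Whitespace in a literal ends the current argument.
                if in_token {
                    argv.push(std::mem::take(&mut current));
                    in_token = false;
                }
            } else {
                current.push(ch);
                in_token = true;
            }
        }
        // The interpolated value joins the current argument as-is,
        // whitespace and shell metacharacters included.
        if let Some(arg) = args.get(i) {
            current.push_str(arg);
            in_token = true;
        }
    }
    if in_token {
        argv.push(current);
    }
    argv
}
```

Calling `build_argv(&["cmd -j ", " -f "], &[arg1, arg2])` keeps `arg1` and `arg2` as single arguments no matter what they contain, which is exactly what splitting the already-interpolated string cannot guarantee.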
+
+ +

+ What's About VS Code? +

+

Oh, right, the actual reason why I am writing this thing. +The TL;DR for this section is that I want to complain about a specific API design a bit.

+

This story begins in #9058.

+

I was happily hacking on some Rust library. +At some point I pressed the "run tests" button in rust-analyzer. +And, to my surprise, accidentally pwned myself!

+ +
+ + +
Executing task: cargo test --doc --- Plotter<D>::line_fill --nocapture
+
+warning: An error occurred while redirecting file 'D'
+open: No such file or directory
+
+The terminal process
+/bin/fish '-c', 'cargo test --doc --- Plotter<D>::line_fill --nocapture'
+failed to launch (exit code: 1).
+
+Terminal will be reused by tasks, press any key to close it.
+ +
+

That was disappointing. +C'mon, how come there's a shell injection in the code I help to maintain? +While this is not a big problem for rust-analyzer (our security model assumes trusted code, as each of rustup, cargo, and rustc can execute arbitrary code by design), it definitely was a big blow to my aesthetic sensibilities!

+

Looking at the git history, it was me who had missed the code that concatenates arguments into a single string during review. +So I was definitely a part of the problem here. +But the other part is that the API that takes a single string exists at all.

+

Let's look at the API:

+ +
+ + +
export class ShellExecution {
+  /**
+    * Creates a shell execution with a full command line.
+    *
+    * @param commandLine The command line to execute.
+    * @param options Optional options for the started the shell.
+    */
+  constructor(
+    commandLine: string,
+    options?: ShellExecutionOptions
+  );
+
+  /* ... */
+}
+ +
+

So, this is exactly what I am describing: a process-spawning API that takes a single string. +I guess, in this case this might even be justified: the API opens a literal shell in the GUI, and the user can interact with it after the command finishes.

+

Anyway, after looking around I quickly found another API, which seemed (ominous music in the background) like what I was looking for:

+ +
+ + +

+export class ShellExecution {
+  /**
+    * Creates a shell execution with a command and arguments.
+    * For the real execution the editor will construct a
+    * command line from the command and the arguments. This
+    * is subject to interpretation especially when it comes to
+    * quoting. If full control over the command line is needed
+    * please use the constructor that creates a `ShellExecution`
+    * with the full command line.
+    *
+    * @param command The command to execute.
+    * @param args The command arguments.
+    * @param options Optional options for the started the shell.
+    */
+  constructor(
+    command: string | ShellQuotedString,
+    args: (string | ShellQuotedString)[],
+    options?: ShellExecutionOptions
+  );
+}
+ +
+

The API takes an array of strings. +It also tries to say something about quoting, which is a good sign! +The wording is perplexing, but it seems to be struggling to explain to me that passing ["ls", ">", "out.txt"] won't actually redirect, because > will get quoted. +This is exactly what I want! +The absence of any kind of a security note on both APIs is concerning, but oh well.

+

So, I refactored the code to use this second constructor, and, 🥁 🥁 🥁, it still had the exact same behavior! +Turns out that this API takes an array of arguments, and just concatenates them, unless I explicitly say that each argument needs to be escaped.

+

And this is what I am complaining about: the API looks like it is safe for untrusted user input, while it is not. +This is misuse resistance resistance.

+

That's all, thanks for reading!

+
+
+
+ + + + + diff --git a/2021/08/22/large-rust-workspaces.html b/2021/08/22/large-rust-workspaces.html new file mode 100644 index 00000000..d482e812 --- /dev/null +++ b/2021/08/22/large-rust-workspaces.html @@ -0,0 +1,273 @@ + + + + + + + Large Rust Workspaces + + + + + + + + + + + + +
+ +
+ +
+
+ +

Large Rust Workspaces

+

In this article, I'll share my experience with organizing large Rust projects. +This is in no way authoritative, just some tips I've discovered through trial and error.

+

Cargo, Rust's build system, follows the convention-over-configuration principle. +It provides a set of good defaults for small projects, and it is especially well-tailored for public crates.io libraries. +The defaults are not perfect, but they are good enough. +The resulting ecosystem-wide consistency is also welcome.

+

However, Cargo is less opinionated when it comes to large, multi-crate projects, organized as a Cargo workspace. +Workspaces are flexible: Cargo doesn't have a preferred layout for them. +As a result, people try different things, with varying degrees of success.

+

To cut to the chase, I think for projects in between ten thousand and one million lines of code, the flat layout makes the most sense. +rust-analyzer (200k lines) is a good example here. +The repository is laid out like this:

+ +
+ + +
rust-analyzer/
+  Cargo.toml
+  Cargo.lock
+  crates/
+    rust-analyzer/
+    hir/
+    hir_def/
+    hir_ty/
+    ...
+ +
+

In the root of the repo, Cargo.toml defines a virtual manifest:

+ +
+
Cargo.toml
+ + +
[workspace]
+members = ["crates/*"]
+ +
+

Everything else (including the main rust-analyzer crate) is nested one level deep under crates/. +The name of each directory is equal to the name of the crate:

+ +
+
crates/hir_def/Cargo.toml
+ + +
[package]
+name = "hir_def"
+version = "0.0.0"
+edition = "2018"
+ +
+

At the time of writing, there are 32 different subfolders in crates/.

+
+ +

+ Flat Is Better Than Nested +

+

It's interesting that this advice goes against the natural tendency to just organize everything hierarchically:

+ +
+ + +
rust-analyzer/
+  Cargo.toml
+  src/
+  hir/
+    Cargo.toml
+    src/
+    def/
+    ty/
+ +
+

There are several reasons why trees are inferior in this case.

+

First, the Cargo-level namespace of crates is flat. +It's not possible to write hir::def in Cargo.toml, so crates typically have prefixes in their names. +A tree layout creates an alternative hierarchy, which adds a possibility for inconsistencies.

+

Second, even comparatively large lists are easier to understand at a glance than even small trees. +ls ./crates gives an immediate bird's-eye view of the project, and this view is small enough:

+ +
+ + +
16:22:57|~/projects/rust-analyzer|master✓
+λ ls ./crates
+base_db
+cfg
+flycheck
+hir
+hir_def
+hir_expand
+hir_ty
+ide
+ide_assists
+ide_completion
+ide_db
+ide_diagnostics
+ide_ssr
+limit
+mbe
+parser
+paths
+proc_macro_api
+proc_macro_srv
+proc_macro_test
+profile
+project_model
+rust-analyzer
+sourcegen
+stdx
+syntax
+test_utils
+text_edit
+toolchain
+tt
+vfs
+ +
+

Doing the same for a tree-based layout is harder. +Looking at a single level doesn't tell you which folders contain nested crates. +Looking at all levels lists too many folders. +Looking only at folders that contain Cargo.toml gives the right result, but is not as trivial as just ls.

+

It is true that a nested structure scales better than a flat one. +But the constant matters: until you hit a million lines of code, the number of crates in the project will probably fit on one screen.

+

Finally, the last problem with hierarchical layout is that there are no perfect hierarchies. +With a flat structure, adding or splitting the crates is trivial. +With a tree, you need to figure out where to put the new crate, and, if there isn't a perfect match for it already, you'll have to either:

+
    +
  • +add a stupid mostly empty folder near the top +
  • +
  • +add a catch-all utils folder +
  • +
  • +place the code in a known suboptimal directory. +
  • +
+

This is a significant issue for long-lived multi-person projects: a tree structure tends to deteriorate over time, while a flat structure doesn't need maintenance.

+
+
+ +

+ Smaller Tips +

+

Make the root of the workspace a virtual manifest. +It might be tempting to put the main crate into the root, but that pollutes the root with src/, requires passing --workspace to every Cargo command, and adds an exception to an otherwise consistent structure.

+

Don't succumb to the temptation to strip the common prefix from folder names. +If each crate is named exactly as the folder it lives in, navigation and renames become easier. +The Cargo.toml files of reverse dependencies mention both the folder and the crate name; it's useful when they are exactly the same.

+

For large projects, a lot of repository bloat often comes from ad-hoc automation: Makefiles and various prepare.sh scripts here and there. +To avoid both the bloat and proliferation of ad-hoc workflows, write all automation in Rust in a dedicated crate. +One pattern useful for this is cargo xtask.

+

Use version = "0.0.0" for internal crates you don't intend to publish. +If you do want to publish a subset of crates with proper semver API, be very deliberate about them. +It probably makes sense to extract all such crates into a separate top-level folder, libs/. +It makes it easier to check that things in libs/ don't use things from crates/.

+

Some crates consist only of a single file. +For those, it is tempting to flatten out the src directory and keep lib.rs and Cargo.toml in the same directory. +I suggest not doing that: even if a crate is a single file now, it might get expanded later.

+ +
+
+
+ + + + + diff --git a/2021/09/04/fast-rust-builds.html b/2021/09/04/fast-rust-builds.html new file mode 100644 index 00000000..baef9602 --- /dev/null +++ b/2021/09/04/fast-rust-builds.html @@ -0,0 +1,601 @@ + + + + + + + Fast Rust Builds + + + + + + + + + + + + +
+ +
+ +
+
+ +

Fast Rust Builds

+

It's common knowledge that Rust code is slow to compile. +But I have a strong gut feeling that most Rust code out there compiles much slower than it could.

+

As an example, one fairly recent post says:

+ +
+

With Rust, on the other hand, it takes between 15 and 45 minutes to run a CI pipeline, depending on your project and the power of your CI servers.

+
+ +
+

This doesn't make sense to me. +rust-analyzer CI takes 8 minutes on GitHub Actions. +It is a fairly large and complex project with 200k lines of its own code and 1 million lines of dependencies on top.

+

It is true that Rust is slow to compile in a rather fundamental way. +It picked "slow compiler" in the generic dilemma, and its overall philosophy prioritizes runtime over compile time (an excellent series of posts about that: +1, +2, +3, +4). +But rustc is not a slow compiler: it implements the most advanced incremental compilation in industrial compilers, it takes advantage of a compilation model based on proper modules (crates), and it has been meticulously optimized. +Fast-to-compile Rust projects are a reality, even if they are not common. +Admittedly, some care and domain knowledge is required to do that.

+

So let's take a closer look at what it took for us to keep the compilation time within reasonable bounds for rust-analyzer!

+
+ +

+ Why Care About Build Times +

+

One thing I want to make clear is that optimizing a project's build time is in some sense busy-work. +Reducing compilation time provides very small direct benefits to the users, and is pure accidental complexity.

+

That being said, compilation time is a multiplier for basically everything. +Whether you want to ship more features, to make code faster, to adapt to a change of requirements, or to attract new contributors, build time is a factor in that.

+

It also is a non-linear factor. +Just waiting for the compiler is the smaller problem. +The big one is losing the state of flow or (worse) a mental context switch to do something else while the code is compiling. +One minute of work for the compiler wastes more than one minute of work for the human.

+

It's hard for me to quantify the impact, but my intuitive understanding is that, as soon as the project grows beyond several thousand lines written by a single person, build times become pretty darn important!

+

The most devilish property of build times is that they creep up on you. +While the project is small, build times are going to be acceptable. +As projects grow incrementally, build times start to slowly increase as well. +And if you let them grow, it might be rather hard to get them back in check later!

+

If a project is already too slow to compile, then:

+
    +
  • +Improving build times will be time consuming, because each iteration of "try a change, trigger the build, measure improvement" will take a long time (yes, build times are a multiplier for everything, including build times themselves!) +
  • +
  • +There won't be easy wins: in contrast to runtime performance, the Pareto principle doesn't work! +If you write a thousand lines of code, maybe one hundred of them will be performance-sensitive, but each line will add to compile times! +
  • +
  • +Small wins will seem too small until they add up: shaving off five seconds is a much bigger deal for a five minute build than for an hour-long build. +
  • +
  • +Dually, small regressions will go unnoticed. +
  • +
+

There's also a culture aspect to it: if you join a project and its CI takes one hour, then an hour-long CI is normal, right?

+

Luckily, there's one simple trick to solve the problem of build times...

+
+
+ +

+ The Silver Bullet +

+

You need to care about build times, keep an eye on them, and fix them before they become a problem. +Build times are a fairly easy optimization problem: it's trivial to get direct feedback (just time the build), there are a bunch of tools for profiling, and you don't even need to come up with a representative benchmark. +The task is to optimize a particular project's build time, not performance of the compiler in general. +That's a nice property of most instances of accidental complexity: they tend to be well defined engineering problems with well understood solutions.

+

The only hard bit about compilation time is that you don't know that it is a problem until it actually is one! +So, the most valuable thing you can get from this post is this: +if you are working on a Rust project, take some time to optimize its build today, and try to repeat the exercise once in a while.

+

Now, with the software engineering bits cleared, let's finally get to some actionable programming advice!

+
+
+ +

+ bors +

+

I like to use CI time as one of the main metrics to keep an eye on.

+

Part of that is that CI time is important in itself. +While you are not bound by CI when developing features, CI time directly affects how annoying it is to context switch when finishing one piece of work and starting the next one. +Juggling five outstanding PRs waiting for CI to complete is not productive. +Longer CI also creates a pressure to not split the work into independent chunks. +If correcting a typo requires keeping a PR tab open for half an hour, it's better to just make a drive-by fix in the next feature branch, right?

+

But a bigger part is that CI gives you a standardized benchmark. +Locally, you compile incrementally, and the time of build varies greatly with the kinds of changes you are doing. +Often, you compile just a subset of the project. +Due to this inherent variability, local builds give poor continuous feedback about build times. +Standardized CI though runs for every change and gives you a time series where numbers are directly comparable.

+

To increase this standardization pressure of CI, I recommend following the "not rocket science" rule and setting up a merge robot which guarantees that every state of the main branch passes CI. +bors is a particular implementation I use, but there are others.

+

While it's by far not the biggest reason to use something like bors, it gives two benefits for healthy compile times:

+
    +
  • +It ensures that every change goes via CI, and creates pressure to keep CI healthy overall +
  • +
  • +The time between leaving an r+ comment on the PR and receiving the "PR merged" notification gives you an always-on feedback loop. +You don't need to specifically time the build, every PR is a build benchmark. +
  • +
+
+
+ +

+ CI Caching +

+

If you think about it, it's pretty obvious how a good caching strategy for CI should work. +It makes sense to cache stuff that changes rarely, but it's useless to cache frequently changing things. +That is, cache all the dependencies, but don't cache the project's own crates.

+

Unfortunately, almost nobody does this. +A typical example would just cache the whole ./target directory. +That's wrong: ./target is huge, and most of it is useless on CI.

+

It's not super trivial to fix, though: sadly, Cargo doesn't make it too easy to figure out which parts of ./target are durable dependencies, and which parts are volatile local crates. +So, you'll need to write some code to clean ./target before storing the cache. +For GitHub Actions in particular you can also use Swatinem/rust-cache.

+
+
+ +

+ CI Workflow +

+

Caching is usually the low-hanging watermelon, but there are several more things to tweak.

+

Split CI into separate cargo test --no-run and cargo test. +It is vital to know which part of your CI is the build, and which are the tests.

+

Disable incremental compilation. +CI builds often are closer to from-scratch builds, as changes are typically much bigger than from a local edit-compile cycle. +For from-scratch builds, incremental adds an extra dependency-tracking overhead. +It also significantly increases the amount of IO and the size of ./target, which makes caching less effective.

+

Disable debuginfo: it makes ./target much bigger, which again harms caching. +Depending on your preferred workflow, you might consider disabling debuginfo unconditionally; this brings some benefits for local builds as well.

+

While we are at it, add -D warnings to the RUSTFLAGS environment variable to deny warnings for all crates at the same time. +It's a bad idea to #![deny(warnings)] in code: you need to repeat it for every crate, it needlessly makes local development harder, and it might break your users when they upgrade their compiler. +It might also make sense to bump cargo network retry limits.

+
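In the spirit of writing automation in Rust, the CI tweaks above could be collected in a small helper. This is only a sketch: the function names `ci_cargo` and `ci_steps` are invented here, and the retry count of 10 is an arbitrary assumption. The environment variables themselves (`CARGO_INCREMENTAL`, `CARGO_PROFILE_DEV_DEBUG`, `CARGO_NET_RETRY`, `RUSTFLAGS`) are ones Cargo reads; the commands are built but not run.

```rust
use std::process::Command;

/// Builds (but does not run) a `cargo` invocation with CI-friendly settings.
pub fn ci_cargo(args: &[&str]) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(args)
        // CI builds are close to from-scratch, so skip incremental bookkeeping.
        .env("CARGO_INCREMENTAL", "0")
        // No debuginfo: smaller ./target, friendlier to caching.
        .env("CARGO_PROFILE_DEV_DEBUG", "0")
        // Be more tolerant of flaky networks (the value is an assumption).
        .env("CARGO_NET_RETRY", "10")
        // Deny warnings for all crates without touching the source code.
        .env("RUSTFLAGS", "-D warnings");
    cmd
}

/// The two CI steps, kept separate so that build time and test time
/// show up as separate numbers.
pub fn ci_steps() -> Vec<Command> {
    vec![ci_cargo(&["test", "--no-run"]), ci_cargo(&["test"])]
}
```

A CI entry point would run each step in order, calling `.status()` on it and timing the two separately.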
+
+ +

+ Read The Lockfile +

+

Another obvious piece of advice is to use fewer, smaller dependencies.

+

This is nuanced: libraries do solve actual problems, and it would be stupid to roll your own solution to something already solved by crates.io. +And it's not like it's guaranteed that your solution will be smaller.

+

But it's important to realise what problems your application is and is not solving. +If you are building a CLI utility for thousands of people to use, you absolutely need clap with all of its features. +If you are writing a quick script to run during CI, which only the team will be using, it's probably fine to start with simplistic command line parsing and get faster builds.

+

One tremendously useful exercise here is to read Cargo.lock (not Cargo.toml) and for each dependency think about the actual problem this dependency solves for the person in front of your application. +It's very frequent that you'll find dependencies that just don't make sense at all, in your context.

+

As an illustrative example, rust-analyzer depends on regex. +This doesn't make sense: we have exact parsers and lexers for Rust and Markdown, we don't need to interpret regular expressions at runtime. +regex is also one of the heavier dependencies: it's a full implementation of a small language! +The reason why this dependency is there is because the logging library we use allows you to say something like:

+ +
+ + +
RUST_LOG=rust_analyzer=very complex filtering expression
+ +
+

where parsing of the filtering expression is done by regular expressions.

+

This is undoubtedly a very useful feature to have for some applications, but in the context of rust-analyzer we don't need it. +Simple env_logger-style filtering would be enough.

+

Once you identify a similar redundant dependency, it's usually enough to tweak the features field somewhere, or to send a PR upstream to make non-essential bits configurable.

+

Sometimes it is a bigger yak to shave :) +For example, rust-analyzer optionally uses the jemalloc crate, and its build script pulls in fs_extra and (of all the things!) paste. +The ideal solution here would be of course to have a production grade, stable, pure Rust memory allocator.

+
+
+ +

+ Profile Before Optimize +

+

Now that we've dealt with things which are just sensible to do, it's time to start measuring before cutting. +A tool to use here is the timings flag for Cargo (documentation). +Sadly, I lack the eloquence to adequately express the level of quality and polish of this feature, so let me just say ❤️ and continue with my dry prose.

+

cargo build -Z timings records profiling data during the build, and then renders it as a very legible and information-dense HTML file. +This is a nightly feature, so you'll need the +nightly toggle. +This isn't a problem in practice, as you only need to run this manually once in a while.

+

Here's an example from rust-analyzer:

+ +
+ + +
$ cargo +nightly build -p rust-analyzer --bin rust-analyzer \
+  -Z timings --release
+ +
+ +
+ + +
+

Not only can you see how long each crate took to compile, but you'll also see how individual compilations were scheduled, when each crate started to compile, and its critical dependency.

+
+
+ +

+ Compilation Model: Crates +

+

This last point is important: crates form a directed acyclic graph of dependencies and, on a multicore CPU, the shape of this graph affects the compilation time a lot.

+

This is slow to compile, as all the crates need to be compiled sequentially:

+ +
+ + +
A -> B -> C -> D -> E
+ +
+

This version is much faster, as it enables significantly more parallelism:

+ +
+ + +
   +-  B  -+
+  /         \
+A  ->  C  ->  E
+  \         /
+   +-  D  -+
+ +
+

There's also a connection between parallelism and incrementality. +In the wide graph, changing B doesn't entail recompiling C and D.

+
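The difference between the two shapes can be put into numbers with a back-of-the-envelope model. The sketch below is a simplification and rests on loudly stated assumptions: every crate costs one abstract time unit, there are enough cores to build all ready crates at once, and pipelining is ignored. The function name `critical_path` is made up for the example.

```rust
/// Length of the critical path through a crate graph, in abstract time units.
/// `costs[i]` is the compile time of crate `i`; `deps[i]` lists the crates it
/// depends on. Crates must be listed in topological order (dependencies first).
pub fn critical_path(costs: &[u32], deps: &[Vec<usize>]) -> u32 {
    let mut finish = vec![0u32; costs.len()];
    for i in 0..costs.len() {
        // A crate can start only when its slowest dependency has finished.
        let start = deps[i]
            .iter()
            .map(|&d| {
                assert!(d < i, "crates must be listed dependencies-first");
                finish[d]
            })
            .max()
            .unwrap_or(0);
        finish[i] = start + costs[i];
    }
    // With unlimited parallelism, the build ends when the last crate finishes.
    finish.into_iter().max().unwrap_or(0)
}
```

For five crates of unit cost, the sequential chain gives a critical path of 5, while the wide graph (one root, three independent crates, one leaf) gives 3, even though the total amount of work is the same.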

The first advice you get when complaining about compile times in Rust is: split the code into crates. +It is not that easy: if you ended up with a graph like the first one, you are not winning much. +It is important to architect the applications to look like the second picture: a common vocabulary crate, a number of independent features, and a leaf crate to tie everything together. +The most important property of a crate is which crates it doesn't (transitively) depend on.

+

Another important consideration is the number of final artifacts (most typically binaries). +Rust is statically linked, so, if two different binaries use the same library, each binary contains a separately linked copy of the library. +If you have n binaries and m libraries, and each binary uses each library, then the amount of work to do during the linking is m * n. +For this reason, it's better to minimize the number of artifacts. +One common technique here is BusyBox-style Swiss Army knife executables. +The idea is that you can hardlink the same executable as several files with different names. +The program then can look at the zeroth command line argument to learn the name it was invoked with, and use it effectively as a name of a subcommand. +One cargo-specific gotcha here is that, by default, each file in the ./examples or ./tests folder creates a new executable.

+
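A minimal sketch of the zeroth-argument trick might look like this. The applet names `fmt-check` and `codegen` are hypothetical, as are the function names; a real program would call into actual subcommand implementations instead of returning strings.

```rust
use std::path::Path;

/// Extracts the name the binary was invoked under from `argv[0]`,
/// dropping any leading directories.
pub fn applet_name(argv0: &str) -> &str {
    Path::new(argv0)
        .file_name()
        .and_then(|name| name.to_str())
        .unwrap_or(argv0)
}

/// Dispatches to a subcommand based on the invocation name.
/// The applet names here are made up for illustration.
pub fn dispatch(argv0: &str) -> Result<&'static str, String> {
    match applet_name(argv0) {
        "fmt-check" => Ok("running fmt-check"),
        "codegen" => Ok("running codegen"),
        other => Err(format!("unknown applet: {}", other)),
    }
}
```

In `main`, the argument would come from `std::env::args().next()`, and each hardlink or symlink to the one executable selects a different branch, so only one binary has to be linked.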
+
+ +

+ Compilation Model: Macros And Pipelining +

+

But Cargo is even smarter than that! +It does pipelined compilation: splitting the compilation of a crate into metadata and codegen phases, and starting compilation of dependent crates as soon as the metadata phase is over.

+

This has interesting interactions with procedural macros (and build scripts). +rustc needs to run procedural macros to compute a crate's metadata. +That means that procedural macros can't be pipelined, and crates using procedural macros are blocked until the proc macro is fully compiled to the binary code.

+

Separately from that, procedural macros need to parse Rust code, and that is a relatively complex task. +The de-facto crate for this, syn, takes quite some time to compile (not because it is bloated, just because parsing Rust is hard).

+

This generally means that projects tend to have a syn / serde shaped hole in the CPU utilization profile during compilation. +It's relatively important to use procedural macros only where they pull their weight, and try to push crates before syn in the cargo -Z timings graph.

+

The latter can be tricky, as proc macro dependencies can sneak up on you. +The problem here is that they are often hidden behind feature flags, and those feature flags might be enabled by downstream crates. +Consider this example:

+

You have a convenient utility type, for example an SSO string, in a small-string crate. +To implement serialization, you don't actually need derive (just delegating to String works), so you add an (optional) dependency on serde:

+ +
+ + +
[package]
+name = "small-string"
+
+[dependencies]
+serde = { version = "1", optional = true }
+ +
+

SSO string is a rather useful abstraction, so it gets used throughout the codebase. +Then in some leaf crate which, e.g., needs to expose a JSON API, you add a dependency on small-string with the serde feature, as well as serde with derive itself:

+ +
+ + +
[package]
+name = "json-api"
+
+[dependencies]
+small-string = { version = "1", features = [ "serde" ] }
+serde = { version = "1", features = [ "derive" ] }
+ +
+

The problem here is that json-api enables the derive feature of serde, and that means that small-string and all of its reverse-dependencies now need to wait for syn to compile! +Similarly, if a crate depends on a subset of syn's features, but something else in the crate graph enables all features, the original crate gets them as a bonus as well!

+

It's not necessarily the end of the world, but it shows that the dependency graph can get tricky in the presence of features. +Luckily, cargo -Z timings makes it easy to notice that something strange is happening, even if it might not always be obvious what exactly went wrong.

+

There's also a much more direct way for procedural macros to slow down compilation: if the macro generates a lot of code, the result would take some time to compile. +That is, some macros allow you to write just a bit of source code, which feels innocuous enough, but expands to a substantial amount of logic. +The prime example is serialization: I've noticed that converting values to/from JSON accounts for a surprisingly big amount of compiling. +Thinking in terms of the overall crate graph helps here: you want to keep serialization at the boundary of the system, in the leaf crates. +If you put serialization near the foundation, then all intermediate crates would have to pay its build-time costs.

+

All that being said, an interesting side-note here is that procedural macros are not inherently slow to compile. +Rather, it's the fact that most proc macros need to parse Rust or to generate a lot of code that makes them slow. +Sometimes, a macro can accept a simplified syntax which can be parsed without syn, and emit a tiny bit of Rust code based on that. +Producing valid Rust is not nearly as complicated as parsing it!

+
+
+ +

+ Compilation Model: Monomorphization +

+

Now that we've covered macro issues at the level of crates, it's time to look closer, at the code-level concerns. +The main thing to look at here is generics. +It's vital to understand how they are compiled, which, in the case of Rust, is achieved by monomorphization. +Consider a run of the mill generic function:

+ +
+ + +
fn frobnicate<T: SomeTrait>(x: &T) {
+   ...
+}
+ +
+

When Rust compiles this function, it doesn't actually emit machine code. +Instead, it stores an abstract representation of the function body in the library. +The actual compilation happens when you instantiate the function with a particular type parameter. +The C++ terminology gives the right intuition here: frobnicate is a template, and it produces an actual function when a concrete type is substituted for the parameter T.

+

In other words, in the following case

+ +
+ + +
fn frobnicate_both(x: String, y: Widget) {
+  frobnicate(&x);
+  frobnicate(&y);
+}
+ +
+

on the level of machine code there will be two separate copies of frobnicate, which would differ in the details of how they deal with the parameter, but would be otherwise identical.
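To make the two instantiations visible, here is a small self-contained sketch. `SomeTrait`, its `size_hint` method, and `Widget` are stand-ins invented for this example; the point is only that the type parameter is fixed at compile time for each call site, so each call goes to its own copy of the function.

```rust
use std::any::type_name;

// Hypothetical trait and types, standing in for the ones in the text.
trait SomeTrait {
    fn size_hint(&self) -> usize;
}

impl SomeTrait for String {
    fn size_hint(&self) -> usize {
        self.len()
    }
}

struct Widget {
    parts: Vec<u8>,
}

impl SomeTrait for Widget {
    fn size_hint(&self) -> usize {
        self.parts.len()
    }
}

/// Generic function: the compiler emits one copy per concrete `T` used.
fn frobnicate<T: SomeTrait>(x: &T) -> (&'static str, usize) {
    // `type_name::<T>()` is resolved per instantiation, at compile time.
    (type_name::<T>(), x.size_hint())
}

/// Instantiates `frobnicate::<String>` and `frobnicate::<Widget>`.
fn frobnicate_both(x: String, y: Widget) -> [(&'static str, usize); 2] {
    [frobnicate(&x), frobnicate(&y)]
}
```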

+

Sounds pretty bad, right? +It seems that you can write a gigantic generic function, and then write just a small bit of code to instantiate it with a bunch of types, to create a lot of load for the compiler.

+

Well, I have bad news for you: the reality is much, much worse. +You don't even need different types to create duplication. +Let's say we have four crates which form a diamond:

+ +
+ + +
   +- B -+
+  /       \
+A           D
+  \       /
+   +- C -+
+ +
+

The frobnicate function is defined in A, and is used by B and C:

+ +
+ + +
// A
+pub fn frobnicate<T: SomeTrait>(x: &T) { ... }
+
+// B
+pub fn do_b(s: String) { a::frobnicate(&s) }
+
+// C
+pub fn do_c(s: String) { a::frobnicate(&s) }
+
+// D
+fn main() {
+  let hello = "hello".to_owned();
+  b::do_b(hello.clone());
+  c::do_c(hello);
+}
+ +
+

In this case, we only ever instantiate frobnicate with String, but it will get compiled twice, because monomorphization happens per crate. +B and C are compiled separately, and each includes machine code for do_* functions, so they need frobnicate<String>. +If optimizations are disabled, rustc can share template instantiations with dependencies, but that doesn't work for sibling dependencies. +With optimizations, rustc doesn't share monomorphizations even with direct dependencies.

+

In other words, generics in Rust can lead to accidentally-quadratic compilation times across many crates!

+

If you are wondering whether it gets worse than that, the answer is yes. +I think the actual unit of monomorphization is a codegen unit, so duplicates are possible even within one crate.

+
+
+ +

+ Keeping an Eye on Instantiations +

+

Besides just duplication, generics add one more problem: they shift the blame for compile times to consumers. +Most of the compile time cost of generic functions is borne by the crates that use the functionality, while the defining crate just typechecks the code without doing any code generation. +Coupled with the fact that at times it is not at all obvious what gets instantiated where and why (example), this makes it hard to directly see the footprint of generic APIs.

+

Luckily, this is not needed: there's a tool for that! +cargo llvm-lines tells you which monomorphizations are happening in a specific crate.

+

Here's an example from a recent investigation:

+ +
+ + +
$ cargo llvm-lines --lib --release -p ide_ssr | head -n 12
+ Lines          Copies        Function name
+  -----          ------        -------------
+  533069 (100%)  28309 (100%)  (TOTAL)
+   20349 (3.8%)    357 (1.3%)  RawVec<T,A>::current_memory
+   18324 (3.4%)    332 (1.2%)  <Weak<T> as Drop>::drop
+   14024 (2.6%)    332 (1.2%)  Weak<T>::inner
+   11718 (2.2%)    378 (1.3%)  core::ptr::metadata::from_raw_parts_mut
+   10710 (2.0%)    357 (1.3%)  <RawVec<T,A> as Drop>::drop
+    7984 (1.5%)    332 (1.2%)  <Arc<T> as Drop>::drop
+    7968 (1.5%)    332 (1.2%)  Layout::for_value_raw
+    6790 (1.3%)     97 (0.3%)  hashbrown::raw::RawTable<T,A>::drop_elements
+    6596 (1.2%)     97 (0.3%)  <hashbrown::raw::RawIterRange<T> as Iterator>::next
+ +
+

It shows, for each generic function, how many copies of it were generated, and what their total size is. +The size is measured very coarsely, in the number of LLVM IR lines it takes to encode the function. +A useful fact: LLVM doesn't have generic functions; it's the job of rustc to turn a function template and a set of instantiations into a set of actual functions.

+
+
+ +

+ Keeping Instantiations In Check +

+

Now that we understand the pitfalls of monomorphization, a rule of thumb becomes obvious: do not put generic code at the boundaries between crates. +When designing a large system, architect it as a set of components where each of the components does something concrete and has a non-generic interface.

+

If you do need a generic interface for better type-safety and ergonomics, make sure that the interface layer is thin, and that it immediately delegates to a non-generic implementation. +The classical example to internalize here is the various functions from the std::fs module which operate on paths:

+ +
+ + +
pub fn read<P: AsRef<Path>>(path: P) -> io::Result<Vec<u8>> {
+  fn inner(path: &Path) -> io::Result<Vec<u8>> {
+    let mut file = File::open(path)?;
+    let mut bytes = Vec::new();
+    file.read_to_end(&mut bytes)?;
+    Ok(bytes)
+  }
+  inner(path.as_ref())
+}
+ +
+

The outer function is parameterized: it is ergonomic to use, but is compiled afresh for every downstream crate. +That's not a problem though, because it is very small, and immediately delegates to a non-generic function that gets compiled in the std.

+

If you are writing a function which takes a path as an argument, either use &Path, or use impl AsRef<Path> and delegate to a non-generic implementation. +If you care about API ergonomics enough to use impl Trait, you should use the inner trick: compile times are as big a part of ergonomics as the syntax used to call the function.

+

A second common case here is closures: by default, prefer &dyn Fn() over impl Fn(). +Similarly to paths, a nice impl-based API might be a thin wrapper around a dyn-based implementation which does the bulk of the work.
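A minimal sketch of that closure variant of the inner trick. The function names are made up for this example; the generic wrapper is a one-liner, and the loop lives in a non-generic function that is compiled once in the defining crate.

```rust
/// Thin generic wrapper: instantiated per closure type, but trivially small.
pub fn count_matching<F: Fn(u32) -> bool>(items: &[u32], pred: F) -> usize {
    count_matching_dyn(items, &pred)
}

/// Non-generic implementation: the bulk of the work, compiled exactly once.
fn count_matching_dyn(items: &[u32], pred: &dyn Fn(u32) -> bool) -> usize {
    let mut count = 0;
    for &item in items {
        if pred(item) {
            count += 1;
        }
    }
    count
}
```

The price is a dynamic call per element, which is usually a fair trade outside of hot loops.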

+

Another idea along these lines is "generic, inline hotpath; concrete, outline coldpath". +In the once_cell crate, there's this curious pattern (simplified, here's the actual source):

+ +
+ + +
struct OnceCell<T> {
+  state: AtomicUsize,
+  inner: Option<T>,
+}
+
+impl<T> OnceCell<T> {
+  #[cold]
+  fn initialize<F: FnOnce() -> T>(&self, f: F) {
+    let mut f = Some(f);
+    synchronize_access(&self.state, &mut || {
+      let f = f.take().unwrap();
+      match self.inner {
+        None => self.inner = Some(f()),
+        Some(_value) => (),
+      }
+    });
+  }
+}
+
+fn synchronize_access(state: &AtomicUsize, init: &mut dyn FnMut()) {
+  // One hundred lines of tricky synchronization code on atomics.
+}
+ +
+

Here, the initialize function is generic twice: first, the OnceCell is parametrized with the type of value being stored, and then initialize takes a generic closure parameter. +The job of initialize is to make sure (even if it is called concurrently from many threads) that at most one f is run. +This mutual exclusion task doesn't actually depend on the specific T and F and is implemented as a non-generic synchronize_access, to improve compile time. +One wrinkle here is that, ideally, we'd want an init: dyn FnOnce() argument, but that's not expressible in today's Rust. +The let mut f = Some(f) / let f = f.take().unwrap() is a standard work-around for this case.
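The Some / take work-around can be seen in isolation in the following sketch. Both function names are hypothetical, and `run_exclusively` stands in for the non-generic synchronization code (here it simply calls the callback once).

```rust
/// Non-generic part. In real code this is where the tricky logic would live;
/// in this sketch it just invokes the callback a single time.
fn run_exclusively(callback: &mut dyn FnMut()) {
    callback();
}

/// Generic part: passes an `FnOnce` through a `&mut dyn FnMut()` interface.
fn call_once_via_dyn<T, F: FnOnce() -> T>(f: F) -> T {
    let mut f = Some(f);
    let mut result = None;
    run_exclusively(&mut || {
        // `take` moves the FnOnce out of the Option; if the callback were
        // invoked a second time, this `unwrap` would panic.
        let f = f.take().unwrap();
        result = Some(f());
    });
    // `run_exclusively` called the callback, so the result is present.
    result.unwrap()
}
```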

+
+
+ +

+ Conclusions +

+

I guess that's it! +To repeat the main ideas:

+

Build times are a big factor in the overall productivity of the humans working on the project. +Optimizing this is a straightforward engineering task: the tools are there. +What might be hard is not letting them slowly regress. +I hope this post provides enough motivation and inspiration for that! +As a rough baseline, a 200k line Rust project somewhat optimized for reasonable build times should take about 10 minutes of CI on GitHub Actions.

+

Discussion on /r/rust.

+ +
+
+
+ + + + + diff --git a/2021/09/05/Rust100k.html b/2021/09/05/Rust100k.html new file mode 100644 index 00000000..bd97bd4c --- /dev/null +++ b/2021/09/05/Rust100k.html @@ -0,0 +1,131 @@ + + + + + + + One Hundred Thousand Lines of Rust + + + + + + + + + + + + +
+ +
+ +
+
+ +

One Hundred Thousand Lines of Rust

+

In 2021, I wrote a series of posts about lessons learned maintaining medium-sized Rust projects. +Here's the list, in chronological order:

+ +
+
+ + + + + diff --git a/2021/11/07/generate-all-the-things.html b/2021/11/07/generate-all-the-things.html new file mode 100644 index 00000000..1db89fa9 --- /dev/null +++ b/2021/11/07/generate-all-the-things.html @@ -0,0 +1,669 @@ + + + + + + + Generate All the Things + + + + + + + + + + + + +
+ +
+ +
+
+ +

Generate All the Things

+

In this post, we'll look at one technique from the property-based testing repertoire: full coverage / exhaustive testing. +Specifically, we will learn how to conveniently enumerate any kind of combinatorial object without using recursion.

+

To start, let's assume we have some algorithmic problem to solve. +For example, we want to sort an array of numbers:

+ +
+ + +
fn sort(xs: &mut [u32]) {
+    ...
+}
+ +
+

To test that the sort function works, we can write a bunch of example-based test cases. +This approach has two flaws:

+ +

A better approach is randomized testing: just generate a random array, sort it, and check that the result is sorted:

+ +
+ + +
#[test]
+fn naive_randomized_testing() {
+  let mut rng = rand::thread_rng();
+  for _ in 0..100_000 {
+    let n: usize = rng.gen_range(0..1_000);
+    let mut xs: Vec<u32> =
+      std::iter::repeat_with(|| rng.gen()).take(n).collect();
+
+    sort(&mut xs);
+
+    for i in 1..xs.len() {
+      assert!(xs[i - 1] <= xs[i]);
+    }
+  }
+}
+ +
+

Here, we generated one hundred thousand completely random test cases!

+

Sadly, the result might actually be worse than a small set of hand-picked examples. +The problem here is that, if you pick an array completely at random (sample uniformly), it will be a rather ordinary array. +In particular, given that the elements are arbitrary u32 numbers, it's highly unlikely that we generate an array with at least some equal elements. +And when I write quick sort, I always have that nasty bug where it just loops infinitely when all elements are equal.

+

There are several fixes for the problem. +The simplest one is to just make the sampling space smaller:

+ +
+ + +
std::iter::repeat_with(|| rng.gen_range(0..10)).take(n).collect();
+ +
+

If we generate not an arbitrary u32, but a number below 10, we'll get some short arrays where all elements are equal. +Another trick is to use a property-based testing library, which comes with some predefined strategies for generating interesting sequences. +Yet another approach is to combine property-based testing and coverage guided fuzzing. +When checking a particular example, we will collect coverage information for this specific input. +Given a set of inputs with coverage info, we can apply targeted genetic algorithms to try to cover more of the code. +A particularly fruitful insight here is that we don't have to invent a novel structure-aware fuzzer for this. +We can take an existing fuzzer which emits a sequence of bytes, and use those bytes as a sequence of random numbers to generate structured input. +Essentially, we say that the fuzzer is a random number generator. +That way, when the fuzzer flips bits in the raw bytes array, it applies local semantically valid transformations to the random data structure.
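A rough sketch of the "fuzzer is a random number generator" idea, under stated assumptions: `ByteSource`, `next_up_to` and `array_from_bytes` are names made up for this example, and the raw bytes would in practice come from whatever fuzzer you use. Each decision consumes one byte, so flipping a byte changes exactly one length or element.

```rust
/// Treats a fuzzer-provided byte slice as a stream of "random" numbers.
struct ByteSource<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> ByteSource<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        ByteSource { bytes, pos: 0 }
    }

    /// Returns a number in `0..=bound` (assumes `bound < u32::MAX`).
    /// Once the input is exhausted, every further call yields 0.
    fn next_up_to(&mut self, bound: u32) -> u32 {
        let byte = self.bytes.get(self.pos).copied().unwrap_or(0);
        self.pos += 1;
        u32::from(byte) % (bound + 1)
    }
}

/// Builds a structured input (an array) from unstructured fuzzer bytes:
/// the first byte picks the length, the following bytes pick the elements.
fn array_from_bytes(bytes: &[u8], max_len: u32, max_value: u32) -> Vec<u32> {
    let mut src = ByteSource::new(bytes);
    let len = src.next_up_to(max_len);
    (0..len).map(|_| src.next_up_to(max_value)).collect()
}
```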

+

But this post isn't about those techniques :) +Instead, it is about the idea of full coverage. +Most bugs involve small, tricky examples. +If a sorting routine breaks on some array with ten thousand elements, it's highly likely that there's a much smaller array (a handful of elements) which exposes the same bug. +So what we can do is to just generate every array of length at most n with numbers up to m and exhaustively check them all:

+ +
+ + +
#[test]
+fn exhaustive_testing() {
+  let n = 5;
+  let m = 5;
+  for mut xs in every_array(n, m) {
+    sort(&mut xs);
+
+    for i in 1..xs.len() {
+      assert!(xs[i - 1] <= xs[i]);
+    }
+  }
+}
+ +
+

The problem here is that implementing every_array is tricky. +It is one of those puzzlers you know how to solve, but which are excruciatingly annoying to implement for the umpteenth time:

+ +
+ + +
fn every_array(n: usize, m: u32) -> Vec<Vec<u32>> {
+  if n == 0 {
+    return vec![Vec::new()];
+  }
+
+  let mut res = Vec::new();
+  for xs in every_array(n - 1, m) {
+    for x in 0..=m {
+      let mut ys = xs.clone();
+      ys.push(x);
+      res.push(ys)
+    }
+  }
+
+  res
+}
+ +
+

What's more, for algorithms you often need to generate permutations, combinations and subsets, and they all have similar simple but tricky recursive solutions.

+

Yesterday I needed to generate a sequence of up to n segments with integer coordinates up to m, which finally pushed me to realize that there's a relatively simple way to exhaustively enumerate arbitrary combinatorial objects. +I don't recall seeing it anywhere else, which is surprising, as the technique seems rather elegant.

+
+

Let's look again at how we generate a random array:

+ +
+ + +
let l: usize = rng.gen_range(0..n);
+let mut xs: Vec<u32> =
+  std::iter::repeat_with(|| rng.gen_range(0..m)).take(l).collect();
+ +
+

This is definitely much more straightforward than the every_array function above, although it does sort-of the same thing. +The trick is to take this "generate a random thing" code and just make it "generate every thing" instead. +In the above code, we base decisions on random numbers. +Specifically, an input sequence of random numbers generates one element in the search space. +If we enumerate all sequences of random numbers, we then explore the whole space.

+

Essentially, we'll rig the rng to not be random, but instead to emit all finite sequences of numbers. +By writing a single generator of such sequences, we gain an ability to enumerate arbitrary objects. +As we are interested in generating all small objects, we always pass an upper bound when asking for a random number. +We can use the bounds to enumerate only the sequences which fit under them.

+

So, the end result will look like this:

+ +
+ + +
#[test]
+fn for_every_array() {
+  let n = 5;
+  let m = 4;
+
+  let mut g = Gen::new();
+  while !g.done() {
+    let l = g.gen(n) as usize;
+    let xs: Vec<_> =
+      std::iter::repeat_with(|| g.gen(m)).take(l).collect::<_>();
+    // `xs` enumerates all arrays
+  }
+}
+ +
+

The implementation of Gen is relatively straightforward. +On each iteration, we will remember the sequence of numbers we generated together with bounds the user requested, something like this:

+ +
+ + +
value:  3 1 4 4
+bound:  5 4 4 4
+ +
+

To advance to the next iteration, we will find the smallest sequence of values which is larger than the current one, but still satisfies all the bounds. +"Smallest" means that we'll try to increment the rightmost number. +In the above example, the last two fours already match the bound, so we can't increment them. +However, we can increment the one to get 3 2 4 4. +This isn't the smallest sequence though; 3 2 0 0 would be smaller. +So, after incrementing the rightmost number we can increment, we zero the rest.
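The advance step on its own can be sketched as a standalone function over fixed-length slices. The name `next_sequence` is made up for this example, and unlike a generator that supports a varying number of decisions, this version zeroes the tail in place.

```rust
/// Advances `values` to the next sequence that stays within the inclusive
/// `bounds`. Returns `false` if `values` was already the last sequence.
fn next_sequence(values: &mut [u32], bounds: &[u32]) -> bool {
    assert_eq!(values.len(), bounds.len());
    // Scan from the right for the first number we are allowed to increment.
    for i in (0..values.len()).rev() {
        if values[i] < bounds[i] {
            values[i] += 1;
            // Everything to the right restarts from zero.
            for v in &mut values[i + 1..] {
                *v = 0;
            }
            return true;
        }
    }
    false
}
```

On the example from the text, `3 1 4 4` under bounds `5 4 4 4` advances to `3 2 0 0`.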

+

Here's the full implementation:

+ +
+ + +
struct Gen {
+  started: bool,
+  v: Vec<(u32, u32)>,
+  p: usize,
+}
+
+impl Gen {
+  fn new() -> Gen {
+    Gen { started: false, v: Vec::new(), p: 0 }
+  }
+  fn done(&mut self) -> bool {
+    if !self.started {
+      self.started = true;
+      return false;
+    }
+
+    for i in (0..self.v.len()).rev() {
+      if self.v[i].0 < self.v[i].1 {
+        self.v[i].0 += 1;
+        self.v.truncate(i + 1);
+        self.p = 0;
+        return false;
+      }
+    }
+
+    true
+  }
+  fn gen(&mut self, bound: u32) -> u32 {
+    if self.p == self.v.len() {
+      self.v.push((0, 0));
+    }
+    self.p += 1;
+    self.v[self.p - 1].1 = bound;
+    self.v[self.p - 1].0
+  }
+}
+ +
+

Some notes:

+ +

Let's see how our gen fares at generating arrays of length at most n. +We'll count how many distinct cases were covered:

+ +
+ + +
#[test]
+fn gen_arrays() {
+  let n = 5;
+  let m = 4;
+  let expected_total =
+    (0..=n).map(|l| (m + 1).pow(l)).sum::<u32>();
+
+  let mut total = 0;
+  let mut all = HashSet::new();
+
+  let mut g = Gen::new();
+  while !g.done() {
+    let l = g.gen(n) as usize;
+    let xs: Vec<_> =
+      std::iter::repeat_with(|| g.gen(m)).take(l).collect::<_>();
+
+    all.insert(xs);
+    total += 1
+  }
+
+  assert_eq!(all.len(), total);
+  assert_eq!(expected_total, total as u32)
+}
+ +
+

This test passes. +That is, the gen approach for this case is both exhaustive (it generates all arrays) and efficient (each array is generated once).
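For the concrete values in the test (n = 5, m = 4, with inclusive bounds, so each element has m + 1 = 5 possible values), the expected count works out as follows:

```latex
\sum_{l=0}^{n} (m+1)^{l} \;=\; \sum_{l=0}^{5} 5^{l} \;=\; 1 + 5 + 25 + 125 + 625 + 3125 \;=\; 3906
```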

+

As promised in the post's title, let's now generate all the things.

+

First case: there should be only one nothing (that's the reason why we need the started flag):

+ +
+ + +
#[test]
+fn gen_nothing() {
+  let expected_total = 1;
+
+  let mut total = 0;
+  let mut g = Gen::new();
+  while !g.done() {
+    total += 1;
+  }
+  assert_eq!(expected_total, total)
+}
+ +
+

Second case: as the bounds are inclusive, we expect to see n + 1 numbers and (n + 1)^2 ordered pairs of numbers.

+ +
+ + +
#[test]
+fn gen_number() {
+  let n = 5;
+  let expected_total = n + 1;
+
+  let mut total = 0;
+  let mut all = HashSet::new();
+  let mut g = Gen::new();
+  while !g.done() {
+    let a = g.gen(n);
+
+    all.insert(a);
+    total += 1;
+  }
+
+  assert_eq!(expected_total, total);
+  assert_eq!(expected_total, all.len() as u32);
+}
+
+#[test]
+fn gen_number_pair() {
+  let n = 5;
+  let expected_total = (n + 1) * (n + 1);
+
+  let mut total = 0;
+  let mut all = HashSet::new();
+  let mut g = Gen::new();
+  while !g.done() {
+    let a = g.gen(n);
+    let b = g.gen(n);
+
+    all.insert((a, b));
+    total += 1;
+  }
+
+  assert_eq!(expected_total, total);
+  assert_eq!(expected_total, all.len() as u32);
+}
+ +
+

Third case: we expect to see n * (n + 1) / 2 unordered pairs of distinct numbers. +This one is interesting: here, our second decision is based on the first one, but we still enumerate all the cases efficiently (without duplicates). +(Aside: did you ever realise that the number of ways to pick two objects out of n + 1 is equal to the sum of the first n natural numbers?)

+ +
+ + +
#[test]
+fn gen_number_combination() {
+  let n = 5;
+  let expected_total = n * (n + 1) / 2;
+
+  let mut total = 0;
+  let mut all = HashSet::new();
+  let mut g = Gen::new();
+  while !g.done() {
+    let a = g.gen(n - 1);
+    let b = a + 1 + g.gen(n - a - 1);
+    all.insert((a, b));
+    total += 1;
+  }
+
+  assert_eq!(expected_total, total);
+  assert_eq!(expected_total, all.len() as u32);
+}
+ +
+

We've already generated all arrays, so let's try to create all permutations. +Still efficient:

+ +
+ + +
#[test]
+fn gen_permutations() {
+  let n = 5;
+  let expected_total = (1..=n).product::<u32>();
+
+  let mut total = 0;
+  let mut all = HashSet::new();
+  let mut g = Gen::new();
+  while !g.done() {
+    let mut candidates: Vec<u32> = (1..=n).collect();
+    let mut permutation = Vec::new();
+    for _ in 0..n {
+      let idx = g.gen(candidates.len() as u32 - 1);
+      permutation.push(candidates.remove(idx as usize));
+    }
+
+    all.insert(permutation);
+    total += 1;
+  }
+
+  assert_eq!(expected_total, total);
+  assert_eq!(expected_total, all.len() as u32);
+}
+ +
+

Subsets:

+ +
+ + +
#[test]
+fn gen_subset() {
+    let n = 5;
+    let expected_total = 1 << n;
+
+    let mut total = 0;
+    let mut all = HashSet::new();
+    let mut g = Gen::new();
+    while !g.done() {
+        let s: Vec<_> = (0..n).map(|_| g.gen(1) == 1).collect();
+
+        all.insert(s);
+        total += 1;
+    }
+
+    assert_eq!(expected_total, total);
+    assert_eq!(expected_total, all.len() as u32);
+}
+ +
+

Combinations:

+ +
+ + +
#[test]
+fn gen_combinations() {
+    let n = 5;
+    let m = 3;
+    let fact = |n: u32| -> u32 { (1..=n).product() };
+    let expected_total = fact(n) / (fact(m) * fact(n - m));
+
+    let mut total = 0;
+    let mut all = HashSet::new();
+    let mut g = Gen::new();
+    while !g.done() {
+        let mut candidates: Vec<u32> = (1..=n).collect();
+        let mut combination = BTreeSet::new();
+        for _ in 0..m {
+            let idx = g.gen(candidates.len() as u32 - 1);
+            combination.insert(candidates.remove(idx as usize));
+        }
+
+        all.insert(combination);
+        total += 1;
+    }
+
+    assert_eq!(expected_total, total);
+    assert_eq!(expected_total, all.len() as u32);
+}
+ +
+

Now, this one actually fails: while this code generates all combinations, some combinations are generated more than once. +Specifically, what we are generating here are k-permutations (combinations with significant order of elements). +While this is not efficient, this is OK for the purposes of exhaustive testing (as we still generate every combination). +Nonetheless, there's an efficient version as well:

+ +
+ + +
let mut combination = BTreeSet::new();
+for c in 1..=n {
+  if combination.len() as u32 == m {
+    break;
+  }
+  if combination.len() as u32 + (n - c + 1) == m {
+    combination.extend(c..=n);
+    break;
+  }
+  if g.gen(1) == 1 {
+    combination.insert(c);
+  }
+}
+ +
+

I think this covers all standard combinatorial structures. +What's interesting is that this approach works for non-standard structures as well. +For example, for https://cses.fi/problemset/task/2168, the problem which started all this, I need to generate sequences of segments:

+ +
+ + +
#[test]
+fn gen_segments() {
+  let n = 5;
+  let m = 6;
+
+  let mut total = 0;
+  let mut all = HashSet::new();
+  let mut g = Gen::new();
+  while !g.done() {
+    let l = g.gen(n);
+
+    let mut xs = Vec::new();
+    for _ in 0..l {
+      if m > 0 {
+        let l = g.gen(m - 1);
+        let r = l + 1 + g.gen(m - l - 1);
+        if !xs.contains(&(l, r)) {
+          xs.push((l, r))
+        }
+      }
+    }
+
+    all.insert(xs);
+    total += 1;
+  }
+  assert_eq!(all.len(), 2_593_942);
+  assert_eq!(total, 4_288_306);
+}
+ +
+

Due to the .contains check there are some duplicates, but that's not a problem as long as all sequences of segments are generated. +Additionally, examples are strictly ordered by their complexity: earlier examples have fewer segments with smaller coordinates. +That means that the first example which fails a property test is actually guaranteed to be the smallest counterexample! Nifty!
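To illustrate the "first failure is the smallest counterexample" point on the simpler array case, here is a hedged sketch. It reuses the Gen logic from earlier in this post, with the method renamed to `choose` because `gen` is a reserved word in the Rust 2024 edition. `buggy_max` and `first_counterexample` are hypothetical names introduced just for this example.

```rust
struct Gen {
    started: bool,
    v: Vec<(u32, u32)>, // (value, inclusive bound) for each decision
    p: usize,
}

impl Gen {
    fn new() -> Gen {
        Gen { started: false, v: Vec::new(), p: 0 }
    }
    fn done(&mut self) -> bool {
        if !self.started {
            self.started = true;
            return false;
        }
        for i in (0..self.v.len()).rev() {
            if self.v[i].0 < self.v[i].1 {
                self.v[i].0 += 1;
                self.v.truncate(i + 1);
                self.p = 0;
                return false;
            }
        }
        true
    }
    fn choose(&mut self, bound: u32) -> u32 {
        if self.p == self.v.len() {
            self.v.push((0, 0));
        }
        self.p += 1;
        self.v[self.p - 1].1 = bound;
        self.v[self.p - 1].0
    }
}

/// Deliberately buggy maximum: it never examines the last element.
fn buggy_max(xs: &[u32]) -> Option<u32> {
    let mut best = *xs.first()?;
    // Bug: the range should be `1..xs.len()`.
    for i in 1..xs.len() - 1 {
        best = best.max(xs[i]);
    }
    Some(best)
}

/// Returns the first array (length <= n, elements <= m) where `buggy_max`
/// disagrees with the standard library's `max`.
fn first_counterexample(n: u32, m: u32) -> Option<Vec<u32>> {
    let mut g = Gen::new();
    while !g.done() {
        let l = g.choose(n) as usize;
        let xs: Vec<u32> =
            std::iter::repeat_with(|| g.choose(m)).take(l).collect();
        if buggy_max(&xs) != xs.iter().copied().max() {
            return Some(xs);
        }
    }
    None
}
```

Because shorter arrays are enumerated before longer ones, and smaller elements before larger ones, the array returned here is the two-element `[0, 1]` rather than some larger failing input.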

+

That's all! +Next time you need to test something, consider if you can just exhaustively enumerate all sufficiently small inputs. +If that's feasible, you can either write the classical recursive enumerator, or use this imperative Gen thing.

+

Update(2021-11-28):

+

There are now Rust (crates.io link) and C++ (GitHub link) implementations. +Capturing the Future by Replaying the Past is a related paper which includes the above technique as a special case of the "simulate any monad by simulating delimited continuations via exceptions and replay" trick.

+

Balanced parentheses sequences:

+ +
+ + +
#[test]
+fn gen_parenthesis() {
+  let n = 5;
+  let expected_total = 1 + 1 + 2 + 5 + 14 + 42;
+
+  let mut total = 0;
+  let mut all = HashSet::new();
+  let mut g = Gen::new();
+  while !g.done() {
+    let l = g.gen(n);
+    let mut s = String::new();
+    let mut bra = 0;
+    let mut ket = 0;
+    while ket < l {
+      if bra < l && (bra == ket || g.gen(1) == 1) {
+        s.push('(');
+        bra += 1;
+      } else {
+        s.push(')');
+        ket += 1;
+      }
+    }
+
+    all.insert(s);
+    total += 1;
+  }
+
+  assert_eq!(expected_total, total);
+  assert_eq!(expected_total, all.len() as u32);
+}
+ +
+
+
+ + + + + diff --git a/2021/11/27/notes-on-module-system.html b/2021/11/27/notes-on-module-system.html new file mode 100644 index 00000000..20b3102d --- /dev/null +++ b/2021/11/27/notes-on-module-system.html @@ -0,0 +1,299 @@ + + + + + + + Notes On Module System + + + + + + + + + + + + +
+ +
+ +
+
+ +

Notes On Module System

+

Unedited summary of what I think a better module system for a Rust-like +language would look like.

+

Today's Rust module system is its most exciting feature, after the borrow checker. +Explicit separation between crates (which form a DAG) and modules (which might +be mutually dependent) and the absence of a single global namespace (crates +don't have innate names; instead, the name is written on a dependency edge +between two crates, and the same crate might be known under different names in +two of its dependents) make decentralized ecosystems of libraries a-la +crates.io robust. Specifically, Rust allows linking-in several versions of the +same crate without the fear of naming conflicts.

+

However, I feel that the specific surface syntax we use to express the model is +suboptimal. The module system is pretty confusing (in the pre-2018 surveys, it was +by far the most confusing aspect of the language after lifetimes. The post-2018 +system is better, but there are still regular questions about the module system). +What can we do better?

+

First, be more precise about visibilities. The single most important +question about an item is "can it be visible outside of the compilation unit (CU)?". Depending on the +answer to that, you have either a closed world (all usages are known) or an open +world (usages are not knowable) assumption. This should be reflected in the +module system. pub is for "visible inside the whole CU, but not further". +export or (my favorite) pub* is for "visible to the outer world". You sorta +can have these in today's Rust with pub(crate), -Dunreachable_pub and some +tolerance for compiler false positives.

+

I am not sure if the rest of Rust's visibility system pulls its weight. It is OK, +but it is pretty complex (pub(in some::path)) and doesn't really help: +making visibilities more precise within a single CU doesn't meaningfully make +the code better, as you can control and rewrite all the code anyway. A CU doesn't +have internal boundaries which can be reflected in visibilities. If we go this +way, we get a nice, simple system: fn foo() is visible in the current module +only (not its children), pub fn foo() is visible anywhere inside the current +crate, and pub* fn foo() is visible to other crates using ours. But then, +again, the current tree-based visibility is OK, and we can leave it in as long as +pub/pub* is more explicit and -Dunreachable_pub is an error by default.

+

In a similar way, the fact that use is an item (i.e., a::b can use items +imported in a) is an unnecessary cuteness. Imports should only introduce the +name into the module's namespace, and should be separate from intentional +re-exports. It might make sense to ban glob re-exports: this'll give you a +nice property that all the names existing in the module are spelled out +explicitly, which is useful for tooling. Though, as Rust has namespaces, looking +at pub use submod::thing doesn't tell you whether the thing is a type or a +value, so this might not be a meaningful property after all.

+

The second thing to change would be module tree/directory structure mapping. +The current system creates quite some visible problems:

+ +

A bunch of less-objective issues:

+ +

I think a better system would say that a compilation unit is equivalent to a +directory with Rust source files, and that (relative) file paths correspond to +module paths. There's neither mod foo; nor mod foo {} (yes, sometimes those +are genuinely useful. No, the fact that something can be useful doesn't mean +it should be part of the language: it's very hard to come up with a language +feature which would be completely useless (though mod foo {} I think can be +added back relatively painlessly)). We use mod.rs, but we name it +_$name_of_the_module$.rs instead, to solve two issues: sort it first +alphabetically, and generate a unique fuzzy-findable name. So, something like +this:

+ +
+ + +
/home/matklad/projects/regex
+  Cargo.toml
+  src/
+    _regex.rs
+    parsing/
+      _parsing.rs
+      ast.rs
+    rt/
+     _rt.rs
+     dfa.rs
+     nfa.rs
+  bins/
+    grep/
+      _grep.rs
+      cli.rs
+  tests/
+    _tests.rs   # just a single integration tests binary by default!
+    lookahead.rs
+    fuzz.rs
+ +
+

The library there would give the following module tree:

+ +
+ + +
crate::{
+    parsing::{ast}
+    rt::{nfa, dfa}
+}
+ +
+

To do conditional compilation, you'd do:

+ +
+ + +
mutex/
+  _mutex.rs
+  linux_mutex.rs
+  windows_mutex.rs
+ +
+

where _mutex.rs is

+ +
+ + +
#[cfg(linux)]
+use linux_mutex as os_mutex;
+#[cfg(windows)]
+use windows_mutex as os_mutex;
+
+pub struct Mutex {
+   inner: os_mutex::Mutex
+}
+ +
+

and linux_mutex.rs starts with #![cfg(linux)]. But of course we shouldn't +implement conditional compilation by barbarically cutting the AST, and instead +should push conditional compilation to after the type checking, so that you at +least can check, on Linux, that the windows version of your code wouldn't fail +due to some stupid typos in the name of #[cfg(windows)] functions. Alas, I +don't know how to design such a conditional compilation system.
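For contrast, here is a rough sketch of how the same shape looks in today's Rust, where cfg really does cut the AST. The `BACKEND` constants and their values are placeholders invented for this example. Only one of the three `os_mutex` modules is compiled on any given target, so a typo inside the Windows one goes unnoticed when building on Linux.

```rust
// Exactly one of these modules survives cfg-stripping on a given target.
#[cfg(target_os = "linux")]
mod os_mutex {
    pub const BACKEND: &str = "linux";
}

#[cfg(target_os = "windows")]
mod os_mutex {
    pub const BACKEND: &str = "windows";
}

#[cfg(not(any(target_os = "linux", target_os = "windows")))]
mod os_mutex {
    pub const BACKEND: &str = "fallback";
}

/// Platform-independent facade over the platform-specific module.
pub struct Mutex {
    backend: &'static str,
}

impl Mutex {
    pub fn new() -> Mutex {
        Mutex { backend: os_mutex::BACKEND }
    }

    /// Reports which platform module was compiled in.
    pub fn backend(&self) -> &'static str {
        self.backend
    }
}
```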

+

The same re-export idiom would be used for specifying non-default visibility: +pub* use rt; would make regex::rt a public module (yeah, this +particular bit is sketchy :-) ).

+

I think this approach would make most of the pitfalls impossible. E.g., it wouldn't +be possible to mix several different crates in one source tree. Additionally, +it'd be a great help for IDEs, as each file can be processed independently, and +it would be clear just from the file contents and path where in the crate +namespace the items are mounted, unlocking +map-reduce +style IDE.

+

While we are at it, use definitely should use exactly the same path resolution +rules as the rest of the language, without any kind of implicit "leading ::" +special cases. Oh, and we shouldn't have nested use groups:

+ +
+ + +
use collections::{
+    hash::{HashMap, HashSet},
+    BTreeMap,
+}
+ +
+

Some projects use them, some projects don't use them, sufficiently large +projects inconsistently both use and don't use them.

+

Afterword: as I've said in the beginning, this is unedited and not generally +something I've thought very hard and long about. Please don't take this as the one +true way to do things, my level of confidence about these ideas is about 0.5 I +guess.

+
+
+ + + + + diff --git a/2022/03/14/rpath-or-why-lld-doesnt-work-on-nixos.html b/2022/03/14/rpath-or-why-lld-doesnt-work-on-nixos.html new file mode 100644 index 00000000..4238fd97 --- /dev/null +++ b/2022/03/14/rpath-or-why-lld-doesnt-work-on-nixos.html @@ -0,0 +1,439 @@ + + + + + + + RPATH, or why lld doesn't work on NixOS + + + + + + + + + + + + +
+ +
+ +
+
+ +

RPATH, or why lld doesn't work on NixOS

+

I've learned a thing I wish I didn't know. +As a revenge, I am going to write it down so that you, my dear reader, also learn about this. +You probably want to skip this post unless you are interested and somewhat experienced in all of Rust, NixOS, and dynamic linking.

+
+ +

+ Problem +

+

I use NixOS and Rust. +For linking my Rust code, I would love to use lld, the LLVM linker, as it is significantly faster. +Unfortunately, this often leads to errors when trying to run the resulting binary:

+ +
+ + +
error while loading shared libraries: libbla.so.92:
+cannot open shared object file: No such file or directory
+ +
+

Let's see what's going on here!

+
+
+ +

+ Baseline +

+

We'll be using evdev-rs as a running example. +It is a binding to the evdev shared library on Linux. +First, we'll build it with the default linker, which just works (haha, nope, this is NixOS).

+

Let's get the crate:

+ +
+ + +
$ git clone git@github.com:ndesh26/evdev-rs.git
+$ cd evdev-rs
+ +
+

And run the example:

+ +
+ + +
$ cargo run --example evtest
+    Updating crates.io index
+  Downloaded libc v0.2.120
+  Downloaded 1 crate (574.7 KB) in 1.10s
+   Compiling cc v1.0.73
+   Compiling pkg-config v0.3.24
+   Compiling libc v0.2.120
+   Compiling log v0.4.14
+   Compiling cfg-if v1.0.0
+   Compiling bitflags v1.3.2
+   Compiling evdev-sys v0.2.4
+error: failed to run custom build command for `evdev-sys`
+<---SNIP--->
+  Couldn't find libevdev from pkgconfig
+<---SNIP--->
+ +
+

This of course doesn't just work and spits out a humongous error message, which contains one line of important information: we are missing the libevdev library. +As this is NixOS, we are not going to barbarically install it globally. +Let's create an isolated environment instead, using nix-shell:

+ +
+
shell.nix
+ + +
with import <nixpkgs> {};
+mkShell {
+    buildInputs = [
+        pkgconfig
+        libevdev
+    ];
+}
+ +
+

And activate it:

+ +
+ + +
$ nix-shell
+ +
+

This environment gives us two things: the pkg-config binary and the evdev library. +pkg-config is a sort of half of a C package manager for UNIX: it can't install libraries, but it helps to locate them. +Let's ask it about libevdev:

+ +
+ + +
$ pkg-config --libs libevdev
+-L/nix/store/62gwpvp0c1i97lr84az2p0qg8nliwzgh-libevdev-1.11.0/lib -levdev
+ +
+

Essentially, it resolved the library's short name (libevdev) to the full path to the directory where the library resides:

+ +
+ + +
$ exa -l /nix/store/62gwpvp0c1i97lr84az2p0qg8nliwzgh-libevdev-1.11.0/lib
+libevdev.la
+libevdev.so -> libevdev.so.2.3.0
+libevdev.so.2 -> libevdev.so.2.3.0
+libevdev.so.2.3.0
+pkgconfig
+ +
+

The libevdev.so.2.3.0 file is the actual dynamic library. +The symlinks stuff is another bit of a C package manager which implements somewhat-semver: libevdev.so.2 version requirement gets resolved to libevdev.so.2.3.0 version.

+

Anyway, this works well enough to allow us to finally run the example

+ +
+ + +
$ cargo run --example evtest
+    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
+     Running `target/debug/examples/evtest`
+Usage: evtest /path/to/device
+ +
+

Success!

+

Ooook, so let's now do what we wanted to from the beginning and configure cargo to use lld, for blazingly fast linking.

+
+
+ +

+ lld +

+

The magic spell you need to put into .cargo/config is (courtesy of @lnicola):

+ +
+ + +
[build]
+rustflags = ["-Clink-arg=-fuse-ld=lld"]
+ +
+

To unpack this:

+
    +
  • +-C sets the codegen option link-arg=-fuse-ld=lld. +
  • +
  • +link-arg means that rustc will pass -fuse-ld=lld to the linker. +
  • +
  • +Because linkers are not in the least confusing, the linker here is actually the whole gcc/clang. +That is, rather than invoking the linker, rustc will call cc and that will then call the linker. +
  • +
  • +So -fuse-ld (unlike -C, I think this is an atomic option, not -f use-ld) is an argument to gcc/clang, +which asks it to use lld linker. +
  • +
  • +And note that it's lld rather than ldd, which confusingly exists and does something completely different. +
  • +
+

Anyhow, the end result is that we switch the linker from ld (default slow GNU linker) to lld (fast LLVM linker).

+

And that breaks!

+

Building the code still works fine:

+ +
+ + +
$ cargo build --example evtest
+   Compiling libc v0.2.120
+   Compiling pkg-config v0.3.24
+   Compiling cc v1.0.73
+   Compiling log v0.4.14
+   Compiling cfg-if v1.0.0
+   Compiling bitflags v1.3.2
+   Compiling evdev-sys v0.2.4 (/home/matklad/tmp/evdev-rs/evdev-sys)
+   Compiling evdev-rs v0.5.0 (/home/matklad/tmp/evdev-rs)
+    Finished dev [unoptimized + debuginfo] target(s) in 2.87s
+ +
+

But running the binary fails:

+ +
+ + +
$ cargo run --example evtest
+    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
+     Running `target/debug/examples/evtest`
+target/debug/examples/evtest: error while loading shared libraries:
+libevdev.so.2: cannot open shared object file: No such file or directory
+ +
+
+
+ +

+ rpath +

+

Ok, what's now? +Now, let's understand why the first example, with ld rather than lld, can't work :-)

+

As a reminder, we use NixOS, so there's no global folder a-la /usr/lib where all shared libraries are stored. +Coming back to our pkgconfig example,

+ +
+ + +
$ pkg-config --libs libevdev
+-L/nix/store/62gwpvp0c1i97lr84az2p0qg8nliwzgh-libevdev-1.11.0/lib -levdev
+ +
+

the libevdev.so is well-hidden behind the hash. +So we need a pkg-config binary at compile time to get from libevdev name to actual location.

+

However, as this is a dynamic library, we need it not only during compilation, but during runtime as well. +And at runtime the loader (also known as the dynamic linker (its binary name is something like ld-linux-x86-64.so, but despite the .so suffix, it's an executable (I kid you not, this stuff is indeed this confusing))) loads the executable together with shared libraries required by it. +Normally, the loader looks for libraries in well-known locations, like the aforementioned /usr/lib or LD_LIBRARY_PATH. +So we need something which would tell the loader that libevdev lives at /nix/store/$HASH/lib.

+

That something is rpath (also known as RUNPATH): this is more or less LD_LIBRARY_PATH, just hard-coded into the executable. +We can use readelf to inspect a program's rpath.

+

When the binary is linked with the default linker, the result is as follows (lightly edited for clarity):

+ +
+ + +
$ readelf -d target/debug/examples/evtest | rg PATH
+ 0x000000000000001d (RUNPATH)            Library runpath: [
+    /nix/store/a9m53x4b3jf6mp1ll9acnh55lnx48hcj-nix-shell/lib64
+    /nix/store/a9m53x4b3jf6mp1ll9acnh55lnx48hcj-nix-shell/lib
+    /nix/store/62gwpvp0c1i97lr84az2p0qg8nliwzgh-libevdev-1.11.0/lib
+    /nix/store/z56jcx3j1gfyk4sv7g8iaan0ssbdkhz1-glibc-2.33-56/lib
+    /nix/store/c9f15p1kwm0mw5p13wsnvd1ixrhbhb12-gcc-10.3.0-lib/lib
+]
+ +
+

And sure, we see the path to libevdev right there!

+

With rustflags = ["-Clink-arg=-fuse-ld=lld"], the result is different, the library is missing from rpath:

+ +
+ + +
0x000000000000001d (RUNPATH)            Library runpath: [
+    /nix/store/a9m53x4b3jf6mp1ll9acnh55lnx48hcj-nix-shell/lib64
+    /nix/store/a9m53x4b3jf6mp1ll9acnh55lnx48hcj-nix-shell/lib
+]
+ +
+

At this point, I think we know what's going on. +To recap:

+
    +
  • +With both ld and lld, we don't have problems at compile time, because pkg-config helps the compiler to find the library. +
  • +
  • +At runtime, the binary linked with lld fails to find the shared library, while the one linked with ld works. +
  • +
  • +The difference between the two binaries is the value of rpath in the binary itself. +ld somehow manages to include an rpath which contains the path to the library. +This rpath is what allows the loader to locate the library at runtime. +
  • +
+

Curious observation: dynamic linking on NixOS is not entirely dynamic. +Because executables expect to find shared libraries in specific locations marked with hashes of the libraries themselves, it's not possible to just upgrade .so on disk for all the binaries to pick it up.
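To make the role of rpath a bit more concrete, here is a deliberately simplified model of the lookup described above, written as a Rust function. It only walks a colon-separated RUNPATH string and returns the first directory that contains the requested file; the real loader also consults LD_LIBRARY_PATH, caches, and default directories, none of which is modeled here. The name `find_in_runpath` is made up for this sketch:

```rust
use std::path::{Path, PathBuf};

/// Simplified model of a RUNPATH lookup (an illustration, not the real
/// loader algorithm): return the first candidate path that exists.
fn find_in_runpath(runpath: &str, lib_name: &str) -> Option<PathBuf> {
    runpath
        .split(':') // RUNPATH is stored as a colon-separated list
        .filter(|dir| !dir.is_empty())
        .map(|dir| Path::new(dir).join(lib_name))
        .find(|candidate| candidate.exists())
}
```

With the lld-linked binary from above, the libevdev directory is simply absent from the list, so a search like this comes back empty-handed.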

+
+
+ +

+ Who sets rpath? +

+

At this point, we have only one question left:

+

Why?

+

Why do we have that magical rpath thing in one of the binaries? +The answer is simple: to set rpath, one passes the -rpath /nix/store/... flag to the linker at compile time. +The linker then just embeds the specified string as the rpath field in the executable, without really inspecting it in any way.

+

And here comes the magical/hacky bit: the thing that adds that -rpath argument to the linker's command line is the NixOS wrapper script! +That is, the ld on NixOS is not a proper ld, but rather a shell script which does a bit of extra fudging here and there, including the rpath:

+ +
+ + +
$ cat (which ld)
+<---SNIP--->
+
+# Three tasks:
+#
+#   1. Find all -L... switches for rpath
+#
+#   2. Find relocatable flag for build id.
+#
+#   3. Choose 32-bit dynamic linker if needed
+declare -a libDirs
+<---SNIP--->
+        case "$prev" in
+            -L)
+                libDirs+=("$p")
+                ;;
+<---SNIP--->
+
+    for dir in ${libDirs+"${libDirs[@]}"}; do
+        <---SNIP--->
+                extraAfter+=(-rpath "$dir")
+        <---SNIP--->
+    done
+<---SNIP--->
+/nix/store/sga0l55gm9nlwglk79lmihwb2bpv597j-binutils-2.35.2/bin/ld \
+    ${extraBefore+"${extraBefore[@]}"} \
+    ${params+"${params[@]}"} \
+    ${extraAfter+"${extraAfter[@]}"}
+ +
+

There's a lot going on in that wrapper script, but the relevant thing to us, as far as I understand, is that everything that gets passed as -L at compile time gets embedded into the binary's rpath, so that it can be used at runtime as well.
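The gist of that transformation can be modeled in a few lines of Rust. This is a sketch of the idea only, not a port of the NixOS script: the real wrapper filters directories and does other work in the snipped parts, and handling the joined `-Ldir` spelling here is my assumption (the excerpt shows only the separate `-L dir` case). The function name `add_rpaths` is made up:

```rust
/// Illustrative model: every directory passed via `-L` is appended again
/// as a `-rpath` argument, so it is also known at runtime.
fn add_rpaths(params: &[&str]) -> Vec<String> {
    let mut lib_dirs: Vec<String> = Vec::new();
    let mut prev = "";
    for &p in params {
        if prev == "-L" {
            // `-L dir`: the directory is the argument after the flag.
            lib_dirs.push(p.to_string());
        } else if let Some(dir) = p.strip_prefix("-L") {
            // `-Ldir`: flag and directory joined (assumed to be handled too).
            if !dir.is_empty() {
                lib_dirs.push(dir.to_string());
            }
        }
        prev = p;
    }

    // Original parameters first, then the extra `-rpath` pairs.
    let mut out: Vec<String> = params.iter().map(|s| s.to_string()).collect();
    for dir in lib_dirs {
        out.push("-rpath".to_string());
        out.push(dir);
    }
    out
}
```

A linker invoked without such a wrapper sees only the original parameters, which is why nothing ends up in the rpath.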

+

Now, let's take a look at lld's wrapper:

+ +
+ + +
$ cat (which lld)
+@@@@@@@TT@@pHpH<<E8o	8o	wN:HgPHwHpp@p@ @@  Stdpp@p@ Ptd@G@@QtdRtd/nix/store/4s21k8k7p1mfik0b33r2spq5hq7774k1-glibc-2.33-108/lib/ld-linux-x86-64.so.2GNUGNU r	\X
+0F                                                                                                                                                                        <C5`
+Bx	rZ1V3	y
+ +
+

Haha, nope, there's no wrapper! +Unlike ld, lld on NixOS is an honest-to-Bosch binary file, and that's why we can't have great things! +This is tracked in issue #24744 in the nixpkgs repo :)

+

Update:

+

So... turns out there's more than one lld on NixOS. +There's pkgs.lld, the thing I have been using in the post. +And then there's the pkgs.llvmPackages.bintools package, which also contains lld. +And that version is actually wrapped into an rpath-setting shell script, the same way ld is.

+

That is, pkgs.lld is the wrong lld, the right one is pkgs.llvmPackages.bintools.

+
+
+
+ + + + + diff --git a/2022/03/26/self-modifying-code.html b/2022/03/26/self-modifying-code.html new file mode 100644 index 00000000..fd12608c --- /dev/null +++ b/2022/03/26/self-modifying-code.html @@ -0,0 +1,646 @@ + + + + + + + Self Modifying Code + + + + + + + + + + + + +
+ +
+ +
+
+ +

Self Modifying Code

+

This post has nothing to do with JIT-like techniques for patching machine code on the fly (though they are cool!). +Instead, it describes a cute/horrible trick/hack you can use to generate source code if you are not a huge fan of macros. +The final technique is going to be independent of any particular programming language, but the lead-up is going to be Rust-specific. +The pattern can be applied to a wide variety of tasks, but we'll use a model problem to study different solutions.

+
+ +

+ Problem +

+

I have a field-less enum representing various error conditions:

+ +
+ + +
#[derive(Debug, Clone, Copy)]
+pub enum Error {
+  InvalidSignature,
+  AccountNotFound,
+  InsufficientBalance,
+}
+ +
+

This is a type I expect to change fairly often. +I predict that it will grow a lot. +Even the initial version contains half a dozen variants already! +For brevity, I am showing only a subset here.

+

For the purposes of serialization, I would like to convert this error to and from an error code. +One direction is easy, there's a built-in mechanism for this in Rust:

+ +
+ + +
impl Error {
+  pub fn as_code(self) -> u32 {
+    self as u32
+  }
+}
+ +
+

The other direction is more annoying: it isn't handled by the language automatically yet (although there's an in-progress PR which adds just that!), so we have to write some code ourselves:

+ +
+ + +
impl Error {
+  pub fn as_code(self) -> u32 {
+    self as u32
+  }
+
+  pub fn from_code(code: u32) -> Option<Error> {
+    let res = match code {
+      0 => Error::InvalidSignature,
+      1 => Error::AccountNotFound,
+      2 => Error::InsufficientBalance,
+      _ => return None,
+    };
+    Some(res)
+  }
+}
+ +
+

Now, given that I expect this type to change frequently, this is asking for trouble! +It's very easy for the match and the enum definition to get out of sync!

+

What should we do? What can we do?

+
+
+ +

+ Minimalist Solution +

+

Now, seasoned Rust developers are probably already thinking about macros (or maybe even about specific macro crates). +And we'll get there! +But first, let's see how I usually solve the problem, when (as I am by default) I am not keen on adding macros.

+

The idea is to trick the compiler into telling us the number of elements in the enum, which would allow us to implement some sanity checking. +We can do this by adding a fake element at the end of the enum:

+ +
+ + +
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum Error {
+  InvalidSignature,
+  AccountNotFound,
+  InsufficientBalance,
+  __LAST,
+}
+
+impl Error {
+  const ALL: [Error; Error::__LAST as usize] = [
+    Error::InvalidSignature,
+    Error::AccountNotFound,
+    Error::InsufficientBalance,
+  ];
+
+  pub fn from_code(code: u32) -> Option<Error> {
+    Error::ALL.get(code as usize).copied()
+  }
+  pub fn as_code(self) -> u32 {
+    Error::ALL
+      .into_iter()
+      .position(|it| it == self)
+      .unwrap_or_default() as u32
+  }
+}
+ +
+

Now, if we add a new error variant, but forget to update the ALL array, the code will fail to compile, which is exactly the reminder we need. +The major drawback here is that the __LAST variant has to exist. +This is fine for internal stuff, but not really great for a public, clean API.
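One thing the length check does not catch is the ALL array being listed in a different order than the enum itself, in which case from_code and a plain `self as u32` would disagree. A small extra check can cover that. The sketch below repeats the enum so that it is self-contained; the helper `all_is_in_declaration_order` is a name I made up, and it would naturally live in a `#[test]`:

```rust
#[allow(non_camel_case_types, dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Error {
    InvalidSignature,
    AccountNotFound,
    InsufficientBalance,
    __LAST,
}

impl Error {
    // The array length is tied to the number of real variants via `__LAST`.
    const ALL: [Error; Error::__LAST as usize] = [
        Error::InvalidSignature,
        Error::AccountNotFound,
        Error::InsufficientBalance,
    ];

    pub fn from_code(code: u32) -> Option<Error> {
        Error::ALL.get(code as usize).copied()
    }
}

/// Hypothetical sanity check: the element at index `i` must be the variant
/// whose discriminant is `i`, i.e. `ALL` follows declaration order.
fn all_is_in_declaration_order() -> bool {
    Error::ALL
        .iter()
        .enumerate()
        .all(|(i, &err)| err as usize == i)
}
```

If the check holds, `as_code` could arguably go back to the simple `self as u32` cast.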

+
+
+ +

+ Minimalist Macro +

+

Now, let's get to macros, and let's start with the simplest possible one I can think of!

+ +
+ + +
define_error![
+  InvalidSignature,
+  AccountNotFound,
+  InsufficientBalance,
+];
+
+impl Error {
+  pub fn as_code(self) -> u32 {
+    self as u32
+  }
+}
+ +
+

Pretty simple, heh? Let's look at the definition of define_error! though:

+ +
+ + +
macro_rules! define_error {
+  ($($err:ident,)*) => {
+    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
+    pub enum Error {
+      $($err,)*
+    }
+
+    impl Error {
+      pub fn from_code(code: u32) -> Option<Error> {
+        #![allow(non_upper_case_globals)]
+        $(const $err: u32 = Error::$err as u32;)*
+        match code {
+          $($err => Some(Error::$err),)*
+          _ => None,
+        }
+      }
+    }
+  };
+}
+ +
+

That's quite literally a puzzle! +Declarative macro machinery is comparatively inexpressive, so you need to get creative to get what you want. +Here, ideally I'd write

+ +
+ + +
match code {
+  0 => Error::InvalidSignature,
+  1 => Error::AccountNotFound,
+  2 => Error::InsufficientBalance,
+}
+ +
+

Alas, counting in macro by example is possible, but not trivial. +It's a subpuzzle! +Rather than solving it, I use the following work-around:

+ +
+ + +
const InvalidSignature: u32 = Error::InvalidSignature as u32;
+match code {
+  InvalidSignature => Error::InvalidSignature,
+}
+ +
+

And then I have to #![allow(non_upper_case_globals)], to prevent the compiler from complaining.
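For the curious, the counting subpuzzle mentioned above does have a well-known recursive solution. The sketch below is not what the post uses, and `count_idents!` is a name invented for this example. It yields the number of identifiers as a constant expression, which is still a step short of producing literal match patterns:

```rust
/// Counts comma-separated identifiers by peeling off one at a time.
macro_rules! count_idents {
    // No identifiers left: the count is zero.
    () => { 0usize };
    // One identifier plus the rest: one more than the count of the rest.
    ($head:ident $(, $tail:ident)*) => {
        1usize + count_idents!($($tail),*)
    };
}

// The result is usable in const contexts, e.g. as an array length.
const VARIANT_COUNT: usize =
    count_idents!(InvalidSignature, AccountNotFound, InsufficientBalance);
```

Because the expansion recurses once per identifier, very long lists can run into the compiler's macro recursion limit, which is one reason this counts as "possible, but not trivial".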

+
+
+ +

+ Idiomatic Macro +

+

The big problem with this macro is that it's not only the internal implementation which is baroque! +The call-site is pretty inscrutable as well! +Let's imagine we are new to a codebase, and come across the following snippet:

+ +
+ + +
define_error![
+  InvalidSignature,
+  AccountNotFound,
+  InsufficientBalance,
+];
+
+impl Error {
+  pub fn as_code(self) -> u32 {
+    self as u32
+  }
+}
+ +
+

The question I would ask here would be "what is that Error thing?". +Luckily, we live in the age of powerful IDEs, so we can just goto definition to answer that, right?

+ +
+ + +
+

Well, not really. +An IDE says that the Error token is produced by something inside that macro invocation. +That's a correct answer, if not the most useful one! +So I have to read the definition of the define_error macro and understand how that works internally to get the idea about the public API available externally (e.g., that Error refers to a public enum). +And here the puzzler nature of declarative macros is exacerbated. +It's hard enough to figure out how to express the idea you want using the restricted language of macros. +It's doubly hard to understand the idea the macro's author had when you can't peek inside their brain and can observe only the implementation of the macro.

+

One remedy here is to make macro input look more like the code we want to produce. +Something like this:

+ +
+ + +
define_error![
+  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
+  pub enum Error {
+    InvalidSignature,
+    AccountNotFound,
+    InsufficientBalance,
+  }
+];
+
+impl Error {
+  pub fn as_code(self) -> u32 {
+    self as u32
+  }
+}
+ +
+

This indeed is marginally friendlier for IDEs and people to make sense of:

+ +
+ + +
+

The cost for this is a more complicated macro implementation. +Generally, a macro needs to do two things: parse arbitrary token stream input, and emit valid Rust code as output. +Parsing is usually the more complicated task. +That's why in our minimal attempt we used maximally simple syntax, just a list of identifiers. +However, if we want to make the input of the macro look more like Rust, we have to parse a subset of Rust, and that's more involved:

+ +
+ + +
macro_rules! define_error {
+  (
+    $(#[$meta:meta])*
+    $vis:vis enum $Error:ident {
+      $($err:ident,)*
+    }
+  ) => {
+    $(#[$meta])*
+    $vis enum $Error {
+      $($err,)*
+    }
+
+    impl $Error {
+      pub fn from_code(code: u32) -> Option<$Error> {
+        #![allow(non_upper_case_globals)]
+        $(const $err: u32 = $Error::$err as u32;)*
+        match code {
+          $($err => Some($Error::$err),)*
+          _ => None,
+        }
+      }
+    }
+  };
+}
+
+define_error![
+  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
+  pub enum Error {
+    InvalidSignature,
+    AccountNotFound,
+    InsufficientBalance,
+  }
+];
+ +
+

We have to carefully deal with all those visibilities and attributes. +Even after we do that, the connection between the input Rust-like syntax and the output Rust is skin-deep. +This is mostly smoke and mirrors, and is not much different from, e.g., using Haskell syntax here:

+ +
+ + +
macro_rules! define_error {
+  (
+    data $Error:ident = $err0:ident $(| $err:ident)*
+      $(deriving ($($derive:ident),*))?
+  ) => {
+    $(#[derive($($derive),*)])?
+    enum $Error {
+      $err0,
+      $($err,)*
+    }
+
+    impl $Error {
+      pub fn from_code(code: u32) -> Option<$Error> {
+        #![allow(non_upper_case_globals)]
+        const $err0: u32 = $Error::$err0 as u32;
+        $(const $err: u32 = $Error::$err as u32;)*
+        match code {
+          $err0 => Some($Error::$err0),
+          $($err => Some($Error::$err),)*
+          _ => None,
+        }
+      }
+    }
+  };
+}
+
+define_error![
+  data Error = InvalidSignature | AccountNotFound | InsufficientBalance
+    deriving (Debug, Clone, Copy, PartialEq, Eq)
+
+];
+
+impl Error {
+  pub fn as_code(self) -> u32 {
+    self as u32
+  }
+}
+ +
+
+
+ +

+ Attribute Macro +

+

We can meaningfully increase the fidelity between macro input and macro output by switching to a derive macro. +In contrast to function-like macros, derives require that their input is syntactically and even semantically valid Rust.

+

So the result looks like this:

+ +
+ + +
use macros::FromCode;
+
+#[derive(FromCode, Debug, Clone, Copy, PartialEq, Eq)]
+enum Error {
+  InvalidSignature,
+  AccountNotFound,
+  InsufficientBalance,
+}
+
+impl Error {
+  pub fn as_code(self) -> u32 {
+    self as u32
+  }
+}
+ +
+

Again, the enum Error here is an honest, simple enum! +It's not an alien beast which just wears an enum's skin.

+

And the implementation of the macro doesn't look too bad either, thanks to @dtolnay's tasteful API design:

+ +
+ + +
use proc_macro::TokenStream;
+use quote::quote;
+use syn::{parse_macro_input, DeriveInput};
+
+#[proc_macro_derive(FromCode)]
+pub fn from_code(input: TokenStream) -> TokenStream {
+  let input = parse_macro_input!(input as DeriveInput);
+  let error_name = input.ident;
+  let enum_ = match input.data {
+    syn::Data::Enum(it) => it,
+    _ => panic!("expected an enum"),
+  };
+
+  let arms =
+    enum_.variants.iter().enumerate().map(|(i, var)| {
+      let i = i as u32;
+      let var_name = &var.ident;
+      quote! {
+        #i => Some(#error_name::#var_name),
+      }
+    });
+
+  quote! {
+    impl #error_name {
+      pub fn from_code(code: u32) -> Option<#error_name> {
+        match code {
+          #(#arms)*
+          _ => None,
+        }
+      }
+    }
+  }
+  .into()
+}
+ +
+

Unlike declarative macros, here we just directly express the syntax that we want to emit: a match over consecutive natural numbers.

+

The biggest drawback here is that on the call-site now we don't have any idea about the extra API generated by the macro. +If, with declarative macros, you can notice a pub fn from_code in the same file and guess that that's a part of an API, with a procedural macro that string is in a completely different crate! +While proc-macro can greatly improve the ergonomics of using and implementing macros (inflated compile times notwithstanding), for the reader, they are arguably even more opaque than declarative macros.

+
+
+ +

+ Self Modifying Code +

+

Finally, let's see the promised hacky solution :) +While, as you might have noticed, I am not a huge fan of macros, I like plain old code generation: text in, text out. +Text manipulation is much worse-is-betterer than advanced macro systems.

+

So what we are going to do is:

+
    +
  • +Read the file with the enum definition as a string (file!() macro will be useful here). +
  • +
  • +"Parse" the enum definition using unsophisticated string splitting (str::split_once, aka cut, would be our parser). +
  • +
  • +Generate the code we want by concatenating strings. +
  • +
  • +Paste the resulting code into a specially marked position. +
  • +
  • +Overwrite the file in place, if there are changes. +
  • +
  • +And we are going to use a #[test] to drive the process! +
  • +
+ +
+ + +
#[derive(Debug, Clone, Copy)]
+pub enum Error {
+  InsufficientBalance,
+  InvalidSignature,
+  AccountNotFound,
+}
+
+impl Error {
+  fn as_code(self) -> u32 {
+    self as u32
+  }
+
+  fn from_code(code: u32) -> Option<Error> {
+    let res = match code {
+      // region:sourcegen
+      0 => Error::InsufficientBalance,
+      1 => Error::InvalidSignature,
+      2 => Error::AccountNotFound,
+      // endregion:sourcegen
+      _ => return None,
+    };
+    Some(res)
+  }
+}
+
+#[test]
+fn sourcegen_from_code() {
+  let original_text = std::fs::read_to_string(file!()).unwrap();
+  let (_, variants, _) =
+    split_twice(&original_text, "pub enum Error {\n", "}")
+      .unwrap();
+
+  let arms = variants
+    .lines()
+    .map(|line| line.trim().trim_end_matches(','))
+    .enumerate()
+    .map(|(i, var)| format!("      {i} => Error::{var},\n"))
+    .collect::<String>();
+
+  let new_text = {
+    let start_marker = "      // region:sourcegen\n";
+    let end_marker = "      // endregion:sourcegen\n";
+    let (prefix, _, suffix) =
+      split_twice(&original_text, start_marker, end_marker)
+        .unwrap();
+    format!("{prefix}{start_marker}{arms}{end_marker}{suffix}")
+  };
+
+  if new_text != original_text {
+    std::fs::write(file!(), new_text).unwrap();
+    panic!("source was not up-to-date")
+  }
+}
+
+fn split_twice<'a>(
+  text: &'a str,
+  start_marker: &str,
+  end_marker: &str,
+) -> Option<(&'a str, &'a str, &'a str)> {
+  let (prefix, rest) = text.split_once(start_marker)?;
+  let (mid, suffix) = rest.split_once(end_marker)?;
+  Some((prefix, mid, suffix))
+}
+ +
+

That's the whole pattern! +Note how, unlike every other solution, it is crystal clear how the generated code works. +It's just code which you can goto-definition, or step through in debugging. +You can be completely oblivious about the shady #[test] machinery, and that won't harm understanding in any way.

+

The code of the "macro" is also easy to understand: that's literally string manipulation. +What's more, you can easily see how it works by just running the test!

+

The "read and update your own source code" part is a bit mind-bending! +But the implementation is tiny and only uses the standard library, so it should be easy to understand.

+

Unlike macros, this doesn't try to enforce at compile time that the generated code is fresh. +If you update the Error definition, you need to re-run the test for the generated code to be updated as well. +But this will be caught by the tests. +Note the important detail: the test only tries to update the source code if there are, in fact, changes. +That is, a writable src/ is required only during development.
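The "write only if there are changes" step is small enough to pull out into a reusable helper. Here is one possible shape for it, using only the standard library; the name `update_if_changed` and the choice to return a bool are my assumptions, not something the code above defines:

```rust
use std::{fs, io, path::Path};

/// Hypothetical helper: overwrite `path` only if its contents differ from
/// `new_text`. Returns `Ok(true)` if the file had to be rewritten.
fn update_if_changed(path: &Path, new_text: &str) -> io::Result<bool> {
    let old_text = fs::read_to_string(path)?;
    if old_text == new_text {
        // Already fresh: no write, so a read-only checkout is fine.
        return Ok(false);
    }
    fs::write(path, new_text)?;
    Ok(true)
}
```

A sourcegen test could then call this and panic whenever it returns `Ok(true)`, so that a stale file still fails the run while being fixed in the same step.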

+

That's all, hope this survey was useful! Discussion on /r/rust.

+
+
+
+ + + + + diff --git a/2022/04/25/why-lsp.html b/2022/04/25/why-lsp.html new file mode 100644 index 00000000..64452d96 --- /dev/null +++ b/2022/04/25/why-lsp.html @@ -0,0 +1,327 @@ + + + + + + + Why LSP? + + + + + + + + + + + + +
+ +
+ +
+
+ +

Why LSP?

+

LSP (language server protocol) is fairly popular today. +There's a standard explanation of why that is the case. +You probably have seen this picture before:

+ +
+ + +
+

I believe that this standard explanation of LSP popularity is wrong. +In this post, I suggest an alternative picture.

+
+ +

+ Standard Explanation +

+

The explanation goes like this:

+

There are M editors and N languages. +If you want to support a particular language in a particular editor, you need to write a dedicated plugin for that. +That means M * N work, as the picture on the left vividly demonstrates. +What LSP does is cut that to M + N, by providing a common thin waist, as shown in the picture on the right.

+
+
+ +

+ Why is the explanation wrong? +

+

The problem with the explanation is also best illustrated pictorially. +In short, the picture above is not drawn to scale. +Here's a better illustration of how, for example, the rust-analyzer + VS Code combo works together:

+ +
+ + +
+

The (big) ball on the left is rust-analyzer, a language server. +The similarly sized ball on the right is VS Code, an editor. +And the small ball in the center is the code to glue them together, including LSP implementations.

+

That code is relatively and absolutely tiny. +The codebases behind either the language server or the editor are enormous.

+

If the standard theory were correct, then, before LSP, we would have lived in a world where some languages had superb IDE support in some editors. +For example, IntelliJ would have been great at Java, Emacs at C++, Vim at C#, etc. +My recollection of that time is quite different. +To get decent IDE support, you either used a language supported by JetBrains (IntelliJ or ReSharper) or.

+

There was just a single editor providing meaningful semantic IDE support.

+
+
+ +

+ Alternative Theory +

+

I would say that the reason for such poor IDE support in the days of yore is different. +Rather than M * N being too big, it was too small, because N was zero and M just slightly more than that.

+

I'd start with N, the number of language servers; this is the side I am relatively familiar with. +Before LSP, there simply weren't a lot of working language-server shaped things. +The main reason for that is that building a language server is hard.

+

The essential complexity for a server is pretty high. +It is known that compilers are complicated, and a language server is a compiler and then some.

+

First, like a compiler, a language server needs to fully understand the language, it needs to be able to distinguish between valid and invalid programs. +However, while for invalid programs a batch compiler is allowed to emit an error message and exit promptly, a language server must analyze any invalid program as best as it can. +Working with incomplete and invalid programs is the first complication of a language server in comparison to a compiler.

+

Second, while a batch compiler is a pure function which transforms source text into machine code, a language server has to work with a code base which is constantly being modified by the user. +It is a compiler with a time dimension, and evolution of state over time is one of the hardest problems in programming.

+

Third, a batch compiler is optimized for maximum throughput, while a language server aims to minimize latency (while not completely forgoing throughput). +Adding a latency requirement doesn't mean that you need to optimize harder. +Rather, it means that you generally need to turn the architecture on its head to have an acceptable latency at all.

+

And this brings us to a related cluster of accidental complexity surrounding language servers. +It is well understood how to write a batch compiler. +It's common knowledge. +While not everyone has read the dragon book (I didn't meaningfully get past the parsing chapters), everyone knows that that book contains all the answers. +So most existing compilers end up looking like a typical compiler. +And, when compiler authors start thinking about IDE support, the first thought is "well, IDE is kinda a compiler, and we have a compiler, so problem solved, right?". +This is quite wrong: internally an IDE is very different from a compiler. But, until very recently, this wasn't common knowledge.

+

Language servers are a counterexample to the "never rewrite" rule. +The majority of well-regarded language servers are rewrites or alternative implementations of batch compilers.

+

Both IntelliJ and Eclipse wrote their own compilers rather than re-using javac inside an IDE. +To provide an adequate IDE support for C#, Microsoft rewrote their batch compiler written in C++ into an interactive self-hosted one (project Roslyn). +Dart, despite being a from-scratch, relatively modern language, ended up with three implementations (host AOT compiler, host IDE compiler (dart-analyzer), on-device JIT compiler). +Rust tried both incremental evolution of rustc (RLS) and from-scratch implementation (rust-analyzer), and rust-analyzer decisively won.

+

The two exceptions I know are C++ and OCaml. +Curiously, both require forward declarations and header files, and I don't think this is a coincidence. +See the Three Architectures for a Responsive IDE post for details.

+

To sum up, on the language servers' side things were in a bad equilibrium. +It was totally possible to implement language servers, but that required a bit of an iconoclastic approach, and it's hard to be a pioneering iconoclast.

+

I am less certain what was happening on the editors' side. +Still, I do want to claim that we had no editors capable of being an IDE.

+

IDE experience consists of a host of semantic features. +The most notable example is, of course, completion. +If one wants to implement custom completion for VS Code, one needs to implement the +CompletionItemProvider interface:

+ +
+ + +
interface CompletionItemProvider {
+    provideCompletionItems(
+        document: TextDocument,
+        position: Position,
+    ): CompletionItem[]
+}
+ +
+

This means that, in VS Code, code completion (as well as dozens of other IDE-related features) is an editor's first-class concept, with a uniform UI and developer API.

+

Contrast this with Emacs and Vim. +They just don't have proper completion as an editor's extension point. +Rather, they expose a low-level cursor and screen manipulation API, and then people implement competing completion frameworks on top of that!

+

And that's just code completion! +What about parameter info, inlay hints, breadcrumbs, extend selection, assists, symbol search, find usages (I'll stop here :) )?

+

To sum the above succinctly, the problem with decent IDE support was not of N * M, but rather of an inadequate equilibrium of a two-sided market.

+

Language vendors were reluctant to create language servers, because it was hard, the demand was low (= no competition from other languages), and, even if one creates a language server, one would find a dozen editors absolutely unprepared to serve as a host for a smart server.

+

On the editors' side, there was little incentive for adding high-level APIs needed for IDEs, because there were no potential providers for those APIs.

+
+
+ +

+ Why LSP is great +

+

And that's why I think LSP is great!

+

I don't think it was a big technical innovation (it's obvious that you want to separate a language-agnostic editor and a language-specific server). +I think it's a rather bad (aka, good enough) technical implementation (stay tuned for "Why LSP sucks?" post I guess? (update)). +But it moved us from a world where not having a language IDE was normal and no one was even thinking about language servers, to a world where a language without working completion and goto definition looks unprofessional.

+

Notably, the two-sided market problem was solved by Microsoft, who were a vendor of both languages (C# and TypeScript) and editors (VS Code and Visual Studio), and who were generally losing in the IDE space to a competitor (JetBrains). +While I may rant about particular technical details of LSP, I absolutely admire their strategic vision in this particular area. +They:

+
    +
  • +built an editor on web technologies. +
  • +
  • +identified webdev as a big niche where JetBrains struggles (supporting JS in an IDE is next to impossible). +
  • +
  • +built a language (!!!!) to make it feasible to provide IDE support for webdev. +
  • +
  • +built an IDE platform with a very forward-looking architecture (stay tuned for a post where I explain why vscode.d.ts is a marvel of technical excellence). +
  • +
  • +launched LSP to increase the value of their platform in other domains for free (moving the whole world to a significantly better IDE equilibrium as a collateral benefit). +
  • +
  • +and now, with Codespaces, are poised to become the dominant player in remote-first development, should we indeed stop editing, building, and running code on our local machines. +
  • +
+

Though, to be fair, I still hope that, in the end, the winner would be JetBrains with their idea of Kotlin as a universal language for any platform :-) +While Microsoft takes full advantage of worse-is-better technologies which are dominant today (TypeScript and Electron), JetBrains tries to fix things from the bottom up (Kotlin and Compose).

+
+
+ +

+ More on M * N +

+

Now I am just going to hammer it in that it's really not M * N :)

+

First, the M * N argument ignores the fact that this is an embarrassingly parallel problem. +Language designers don't need to write plugins for all editors, and editors don't need to add special support for all languages. +Rather, a language should implement a server which speaks some protocol, an editor needs to implement language-agnostic APIs for providing completions and such, and, if both the language and the editor are not esoteric, someone who is interested in both would just write a bit of glue code to bind the two together! +rust-analyzer's VS Code plugin is 3.2k lines of code, the neovim plugin is 2.3k and the Emacs plugin is 1.2k. +All three are developed independently by different people. +That's the magic of decentralized open source development at its finest! +If the plugins were to support a custom protocol instead of LSP (provided that the editor supports a high-level IDE API inside), I'd expect to add maybe 2k lines for that, which is still well within the budget of a hobbyist working part-time.

+

Second, for M * N optimization you'd expect the protocol implementation to be generated from some machine-readable implementation. +But until the latest release, the source of truth for the LSP spec was an informal markdown document. +Every language and client was coming up with their own way to extract the protocol out of it, many (including rust-analyzer) were just syncing the changes manually, with quite a bit of duplication.

+

Third, if M * N is a problem, you'd expect to see only one LSP implementation for each editor. +In reality, there are two competing Emacs implementations (lsp-mode and eglot) and, I kid you not, at the time of writing rust-analyzer's manual contains instructions for integration with 6 (six) different LSP clients for vim. +To echo the first point, this is open source! +The total amount of work is almost irrelevant, the thing that matters is the amount of coordination to get things done.

+

Fourth, Microsoft itself doesn't try to take advantage of M + N. +There's no universal LSP implementation in VS Code. +Instead, each language is required to have a dedicated plugin with physically independent implementations of LSP.

+
+
+ +

+ Action Items +

+
+
Everyone
+
+

Please demand better IDE support! +I think today we crossed the threshold of general availability of baseline IDE support, but there's so much we can do beyond the basics. +In the ideal world, it should be possible to inspect every little semantic detail about the expression at the cursor, using the same simple API one can use today to inspect the contents of the editor's buffer.

+
+
Text Editor Authors
+
+

Pay attention to the architecture of VS Code. +While electron delivers questionable user experience, the internal architecture has a lot of wisdom in it. +Do orient the editor's API around presentation-agnostic high-level features. +Basic IDE functionality should be a first-class extension point, it shouldn't be re-invented by every plugin's author. +In particular, add assist/code action/💡 as a first-class UX concept already. +It's the single most important UX innovation of IDEs, which is very old at this point. +It's outright ridiculous that this isn't a standard interface across all editors.

+

But don't make LSP itself a first-class concept. +Surprising as it might seem, VS Code knows nothing about LSP. +It just provides a bunch of extension points without caring the least how they are implemented. +LSP implementation then is just a library, which is used by language-specific plugins. +E.g., Rust and C++ extensions for VS Code do not share the same LSP implementation at runtime, there are two copies of LSP library in memory!
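A rough Rust sketch of that layering may help. Every name in it (`CompletionProvider`, `Editor`, `LspClient`, `RustPlugin`) is invented for illustration and is not taken from VS Code or from any real LSP library; the "client" simply returns a canned reply instead of talking to a server.

```rust
use std::collections::HashMap;

/// Extension point owned by the editor. Nothing here mentions LSP.
trait CompletionProvider {
    fn provide_completions(&self, text: &str, offset: usize) -> Vec<String>;
}

/// The editor only keeps a registry of providers, keyed by language id.
#[derive(Default)]
struct Editor {
    providers: HashMap<String, Box<dyn CompletionProvider>>,
}

impl Editor {
    fn register(&mut self, language: &str, provider: Box<dyn CompletionProvider>) {
        self.providers.insert(language.to_string(), provider);
    }

    fn complete(&self, language: &str, text: &str, offset: usize) -> Vec<String> {
        match self.providers.get(language) {
            Some(provider) => provider.provide_completions(text, offset),
            None => Vec::new(),
        }
    }
}

/// Hypothetical stand-in for an LSP client library that a plugin links against.
/// A real one would exchange messages with a server process; this one fakes the reply.
struct LspClient {
    canned_reply: Vec<String>,
}

impl LspClient {
    fn request_completion(&self, _text: &str, _offset: usize) -> Vec<String> {
        self.canned_reply.clone()
    }
}

/// A language-specific plugin: the only place that knows about LSP.
struct RustPlugin {
    client: LspClient,
}

impl CompletionProvider for RustPlugin {
    fn provide_completions(&self, text: &str, offset: usize) -> Vec<String> {
        self.client.request_completion(text, offset)
    }
}

/// Wires one plugin into an otherwise LSP-unaware editor.
fn demo_editor() -> Editor {
    let mut editor = Editor::default();
    let client = LspClient {
        canned_reply: vec!["push".to_string(), "pop".to_string()],
    };
    editor.register("rust", Box::new(RustPlugin { client }));
    editor
}
```

In this shape a second plugin could implement `CompletionProvider` with an entirely different protocol, and `Editor` would not change.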

+

Also, try to harness the power of open-source. +Don't enforce centralization of all LSP implementations! +Make it possible for separate groups of people to independently work on perfect Go support and perfect Rust support for your editor. +VS Code is one possible model, with a marketplace and distributed, independent plugins. +But it probably should be possible to organize the work as a single shared repo/source tree, as long as languages can have independent sets of maintainers.

+
+
Language Server Authors
+
+

You are doing a great job! +The quality of IDE support is improving rapidly for all the languages, though I feel this is only the beginning of a long road. +One thing to keep in mind is that LSP is an interface to semantic info about the language, but it isn't the interface. +A better thing might come along. +Even today, limitations of LSP prevent shipping useful features. +So, try to treat LSP as a serialization format, not as an internal data model. +And try to write more about how to implement language servers: I feel like there's still not enough knowledge about this out there.
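One way to read the "serialization format" advice, sketched in Rust under stated assumptions: `Completion`, `SymbolKind`, `LspCompletionItem` and `to_lsp` are hypothetical names, and the wire struct is a hand-written stand-in rather than a type from a real LSP crate. The numeric kind codes are meant to follow LSP's `CompletionItemKind` (Function = 3, Struct = 22).

```rust
/// Internal model: shaped by what the analysis needs, not by the protocol.
#[derive(Debug, Clone, PartialEq)]
enum SymbolKind {
    Function,
    Struct,
}

#[derive(Debug, Clone, PartialEq)]
struct Completion {
    label: String,
    kind: SymbolKind,
    /// Internal-only ranking detail with no direct protocol counterpart.
    relevance: u32,
}

/// Wire-level shape, produced only at the very edge of the server.
#[derive(Debug, PartialEq)]
struct LspCompletionItem {
    label: String,
    kind: u32,
    sort_text: String,
}

/// The single place where internal data is translated to the protocol.
fn to_lsp(completion: &Completion) -> LspCompletionItem {
    let kind = match completion.kind {
        SymbolKind::Function => 3,
        SymbolKind::Struct => 22,
    };
    LspCompletionItem {
        label: completion.label.clone(),
        kind,
        // Zero-padded so that more relevant items sort first as strings.
        sort_text: format!("{:010}", u32::MAX - completion.relevance),
    }
}
```

If a better protocol comes along, only `to_lsp` and the wire struct would need a sibling; the internal `Completion` type stays as it is.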

+
+
+

That's it!

+
+

P.S. If by any chance you are benefiting from using rust-analyzer, consider sponsoring Ferrous Systems Open Source Collective for rust-analyzer to support its development!

+
+
+
+ + + + + diff --git a/2022/05/29/binary-privacy.html b/2022/05/29/binary-privacy.html new file mode 100644 index 00000000..8917c101 --- /dev/null +++ b/2022/05/29/binary-privacy.html @@ -0,0 +1,200 @@ + + + + + + + Binary Privacy + + + + + + + + + + + + +
+ +
+ +
+
+ +

Binary Privacy

+

This post documents one rule of thumb I find useful when coding:

+ + +

Being a rule-of-thumb, it naturally has exceptions, but those are relatively few. +The primary context here is application development. +Libraries with semver-constrained API have other guidelines: the rules are different at the boundaries.

+

This privacy rule is a manifestation of the fact that the two most popular kinds of entities in programs are:

+ +

If some fields of a type are private, it can't be data. +If some fields of a type are public, it can still be an ADT, but the abstraction boundary will be a bit awkward. +Better to just add getters for (usually few) fields which can be public, to make it immediately obvious what role is played by the type.
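A minimal sketch of the "add getters instead" option, using a made-up `Span` type (not from rust-analyzer) whose invariant is `start <= end`:

```rust
/// An ADT: every field is private, and `new` enforces `start <= end`.
pub struct Span {
    start: u32,
    end: u32,
}

impl Span {
    /// Returns `None` when the invariant would be violated.
    pub fn new(start: u32, end: u32) -> Option<Span> {
        if start <= end {
            Some(Span { start, end })
        } else {
            None
        }
    }

    // Read-only getters for the fields that are fine to expose.
    pub fn start(&self) -> u32 {
        self.start
    }

    pub fn end(&self) -> u32 {
        self.end
    }

    /// Cannot underflow, because no caller can break the invariant.
    pub fn len(&self) -> u32 {
        self.end - self.start
    }
}
```

Making `start` public would let any caller set it past `end`, and `len` would then have to defend against that.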

+

An example of an ADT would be FileSet from rust-analyzer's virtual file system implementation.

+ +
+ + +
#[derive(Default)]
+pub struct FileSet {
+  files: HashMap<VfsPath, FileId>,
+  paths: HashMap<FileId, VfsPath>,
+}
+
+impl FileSet {
+  pub fn insert(&mut self, file_id: FileId, path: VfsPath) {
+    self.files.insert(path.clone(), file_id);
+    self.paths.insert(file_id, path);
+  }
+
+  pub fn len(&self) -> usize {
+    self.files.len()
+  }
+
+  pub fn file_for_path(
+    &self,
+    path: &VfsPath,
+  ) -> Option<FileId> {
+    self.files.get(path).copied()
+  }
+
+  pub fn path_for_file(
+    &self,
+    file: &FileId,
+  ) -> Option<&VfsPath> {
+    self.paths.get(file)
+  }
+
+  pub fn iter(&self) -> impl Iterator<Item = FileId> + '_ {
+    self.paths.keys().copied()
+  }
+}
+ +
+

This type maintains a bidirectional mapping between string paths and integral file ids. +How exactly the mapping is maintained (hash map, search tree, trie?) is irrelevant, this implementation detail is abstracted away. +Additionally, there's an invariant: files and paths fields are consistent, complementary mappings. +So this is the case where all fields are private and there's a bunch of accessor functions.

+

An example of data would be the Directories struct:

+ +
+ + +
#[derive(Debug, Clone, Default)]
+pub struct Directories {
+    pub extensions: Vec<String>,
+    pub include: Vec<AbsPathBuf>,
+    pub exclude: Vec<AbsPathBuf>,
+}
+ +
+

This type specifies a set of paths to include in VFS, a sort-of simplified gitignore. +This is an inert piece of data: a bunch of extensions, include paths and exclude paths. +Any combination of the three is valid, so there's no need for privacy here.

+
+ +

+ Connections +

+

This rule is very mechanical, but it reflects a deeper distinction between flavors of types. +For a more thorough treatment of the underlying phenomenon, see the "Be clear what kind of class you're writing" chapter from Alexandrescu's C++ Coding Standards and +The Expression Problem from ever thought-provoking Kaminski.

+
+
+
+ + + + + diff --git a/2022/05/29/builder-lite.html b/2022/05/29/builder-lite.html new file mode 100644 index 00000000..c72889fa --- /dev/null +++ b/2022/05/29/builder-lite.html @@ -0,0 +1,243 @@ + + + + + + + Builder Lite + + + + + + + + + + + + +
+ +
+ +
+
+ +

Builder Lite

+

In this short post, I describe and name a cousin of the builder pattern: builder lite.

+

Unlike a traditional builder, which uses a separate builder object, builder lite re-uses the object itself to provide builder functionality.

+

Here's an illustrative example:

+ +
+
Builder Lite
+ + +
pub struct Shape {
+  position: Vec3,
+  geometry: Geometry,
+  material: Option<Material>,
+}
+
+impl Shape {
+  pub fn new(geometry: Geometry) -> Shape {
+    Shape {
+      position: Vec3::default(),
+      geometry,
+      material: None,
+    }
+  }
+
+  pub fn with_position(mut self, position: Vec3) -> Shape {
+    self.position = position;
+    self
+  }
+
+  pub fn with_material(mut self, material: Material) -> Shape {
+    self.material = Some(material);
+    self
+  }
+}
+
+// Call site
+
+let shape = Shape::new(Geometry::Sphere::with_radius(1))
+  .with_position(Vec3(0, 9, 2))
+  .with_material(Material::SolidColor(Color::Red));
+ +
+

In contrast, the full builder is significantly wordier at the definition site, and requires a couple of extra invocations at the call site:

+ +
+
Builder
+ + +
pub struct Shape {
+  position: Vec3,
+  geometry: Geometry,
+  material: Option<Material>,
+}
+
+pub struct ShapeBuilder {
+  position: Option<Vec3>,
+  geometry: Option<Geometry>,
+  material: Option<Material>,
+}
+
+impl Shape {
+  pub fn builder() -> ShapeBuilder { ... }
+}
+
+impl ShapeBuilder {
+  pub fn position(&mut self, position: Vec3) -> &mut Self { ... }
+  pub fn geometry(&mut self, geometry: Geometry) -> &mut Self { ... }
+  pub fn material(&mut self, material: Material) -> &mut Self { ... }
+  pub fn build(&self) -> Shape { ... }
+}
+
+// Call site
+
+let shape = Shape::builder()
+  .position(Vec3(0, 9, 2))
+  .geometry(Geometry::Sphere::with_radius(1))
+  .material(Material::SolidColor(Color::Red))
+  .build();
+ +
+

The primary benefit of builder-lite is that it is an incremental, zero-cost evolution from the new method. +As such, it is especially useful in the context where the code evolves rapidly, in an uncertain direction. +That is, when building applications rather than libraries.

+

To pull a motivational example from work, we had the following typical code:

+ +
+ + +
impl PeerManagerActor {
+  pub fn new(
+    store: Store,
+    config: NetworkConfig,
+    client_addr: Recipient<NetworkClientMessages>,
+    view_client_addr: Recipient<NetworkViewClientMessages>,
+    routing_table_addr: Addr<RoutingTableActor>,
+  ) -> anyhow::Result<Self> {
+ +
+

Here's a new method with a whole bunch of arguments for various dependencies. +What we needed to do is to add yet another dependency, so that it could be overridden in tests. +The first attempt just added one more parameter to the new method:

+ +
+ + +
  pub fn new(
+    store: Store,
+    config: NetworkConfig,
+    client_addr: Recipient<NetworkClientMessages>,
+    view_client_addr: Recipient<NetworkViewClientMessages>,
+    routing_table_addr: Addr<RoutingTableActor>,
++   ping_counter: Box<dyn PingCounter>,
+  ) -> anyhow::Result<Self> {
+ +
+

However, this change required updating all seven call-sites where new was called, just to supply the default counter. +Switching that to builder lite allowed us to only modify a single call-site where we cared to override the counter.
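A hedged sketch of what the builder-lite version might look like. The types are heavily simplified stand-ins: the real constructor takes more dependencies and returns `anyhow::Result<Self>`, and `DefaultPingCounter`, `TestPingCounter`, `name` and `describe` are invented here only to make the example self-contained.

```rust
/// Simplified stand-in for the dependency that tests want to override.
pub trait PingCounter {
    fn name(&self) -> &'static str;
}

struct DefaultPingCounter;

impl PingCounter for DefaultPingCounter {
    fn name(&self) -> &'static str {
        "default"
    }
}

struct TestPingCounter;

impl PingCounter for TestPingCounter {
    fn name(&self) -> &'static str {
        "test"
    }
}

pub struct PeerManagerActor {
    config: String,
    ping_counter: Box<dyn PingCounter>,
}

impl PeerManagerActor {
    /// The signature does not change, so existing call sites keep compiling.
    pub fn new(config: String) -> PeerManagerActor {
        PeerManagerActor {
            config,
            ping_counter: Box::new(DefaultPingCounter),
        }
    }

    /// Builder-lite method: only the call sites that care need to use it.
    pub fn with_ping_counter(mut self, ping_counter: Box<dyn PingCounter>) -> PeerManagerActor {
        self.ping_counter = ping_counter;
        self
    }

    /// Helper for the example, to observe which counter is installed.
    pub fn describe(&self) -> String {
        format!("{} / {}", self.config, self.ping_counter.name())
    }
}
```

A test would then write `PeerManagerActor::new(config).with_ping_counter(Box::new(TestPingCounter))`, while production call sites stay untouched.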

+

A note on naming:
+If builder methods are to be used only occasionally, with_foo is the best naming. +If most call-sites make use of builder methods, just .foo might work better. +For boolean properties, sometimes it makes sense to have both:

+ +
+ + +
pub fn fancy(self) -> Self {
+  self.with_fancy(true)
+}
+
+pub fn with_fancy(mut self, yes: bool) -> Self {
+  self.fancy = yes;
+  self
+}
+ +
+

Discussion on /r/rust.

+
+
+ + + + + diff --git a/2022/06/11/caches-in-rust.html b/2022/06/11/caches-in-rust.html new file mode 100644 index 00000000..be7b6593 --- /dev/null +++ b/2022/06/11/caches-in-rust.html @@ -0,0 +1,432 @@ + + + + + + + Caches In Rust + + + + + + + + + + + + +
+ +
+ +
+
+ +

Caches In Rust

+

In this post I'll describe how to implement caches in Rust. +It is inspired by two recent refactors I landed at nearcore (nearcore#6549, nearcore#6811). +Based on that experience, it seems that implementing caches wrong is rather easy, and making a mistake there risks spilling over, and spoiling the overall architecture of the application a bit.

+

Let's start with an imaginary setup with an application with some configuration and a database:

+ +
+ + +
struct App {
+  config: Config,
+  db: Db,
+}
+ +
+

The database is an untyped key-value store:

+ +
+ + +
impl Db {
+  pub fn load(&self, key: &[u8]) -> io::Result<Option<Vec<u8>>> {
+    ...
+  }
+}
+ +
+

And the App encapsulates the database and provides typed access to the domain-specific Widget:

+ +
+ + +
#[derive(serde::Serialize, serde::Deserialize)]
+struct Widget {
+  title: String,
+}
+
+impl App {
+  pub fn get_widget(
+    &self,
+    id: u32,
+  ) -> io::Result<Option<Widget>> {
+    let key = id.to_be_bytes();
+    let value = match self.db.load(&key)? {
+      None => return Ok(None),
+      Some(it) => it,
+    };
+
+    let widget: Widget =
+      bincode::deserialize(&value).map_err(|it| {
+        io::Error::new(io::ErrorKind::InvalidData, it)
+      })?;
+
+    Ok(Some(widget))
+  }
+}
+ +
+

Now, for the sake of argument let's assume that database access and subsequent deserialization are costly, and that we want to add a cache of Widgets in front of the database. +Data-oriented thinking would compel us to get rid of the deserialization step instead, but we will not pursue that idea this time.

+

We'll use a simple HashMap for the cache:

+ +
+ + +
struct App {
+  config: Config,
+  db: Db,
+  cache: HashMap<u32, Widget>,
+}
+ +
+

And we need to modify get_widget method to return the value from the cache, if there is one:

+ +
+ + +
impl App {
+  pub fn get_widget(
+    &mut self,
+    id: u32,
+  ) -> io::Result<Option<&Widget>> {
+
+    if self.cache.contains_key(&id) {
+      let widget = self.cache.get(&id).unwrap();
+      return Ok(Some(widget));
+    }
+
+    let key = id.to_be_bytes();
+    let value = match self.db.load(&key)? {
+      None => return Ok(None),
+      Some(it) => it,
+    };
+    let widget: Widget =
+      bincode::deserialize(&value).map_err(|it| {
+        io::Error::new(io::ErrorKind::InvalidData, it)
+      })?;
+
+    self.cache.insert(id, widget);
+    let widget = self.cache.get(&id).unwrap();
+
+    Ok(Some(widget))
+  }
+}
+ +
+

The biggest change is the &mut self. +Even when reading the widget, we need to modify the cache, and the easiest way to get that ability is to require an exclusive reference.

+

I want to argue that this path of least resistance doesn't lead to a good place. +There are many problems with methods of the following shape:

+ +
+ + +
fn get(&mut self) -> &Widget
+ +
+

First, such methods conflict with each other. +For example, the following code won't work, because we'll try to borrow the app exclusively twice.

+ +
+ + +
let app: &mut App = ...;
+let w1 = app.get_widget(1)?;
+let w2 = app.get_widget(2)?;
+ +
+

Second, the &mut methods conflict even with & methods. +Naively, it would seem that, as get_widget returns a shared reference, we should be able to call & methods. +So, one can expect something like this to work:

+ +
+ + +
let w: &Widget = app.get_widget(1)?.unwrap();
+let c: &Color = &app.config.main_color;
+ +
+

Alas, it doesn't. +Rust's borrow checker doesn't distinguish between mut and non-mut lifetimes (for a good reason: doing that would be unsound). +So, although w is just &Widget, the lifetime there is the same as on the &mut self, so the app remains mutably borrowed while the widget exists.

+

Third, perhaps the most important point, the &mut self becomes viral: most of the functions in the program begin requiring &mut, and you lose the type-system distinction between read-only and read-write operations. +There's no distinction between "this function can only modify the cache" and "this function can modify literally everything".

+

Finally, even implementing get_widget is not pleasant. +Seasoned rustaceans among you might twitch at the needlessly-repeated hashmap lookups. +But trying to get rid of those with the help of the entry-API runs into current borrow checker limitations.

+

Let's look at how we can better tackle this!

+

The general idea for this class of problems is to think what the ownership and borrowing situation should be and try to achieve that, as opposed to merely following suggestions by the compiler. +That is, most of the time just using &mut and & as the compiler guides you is a path to success, as, it turns out, the majority of the code naturally follows simple aliasing rules. +But there are exceptions; it's important to recognize them as such and make use of interior mutability to implement the aliasing structure which makes sense.

+

Let's start with a simplified case. +Suppose that there's only one Widget to deal with. +In this case, we'd want something like this:

+ +
+ + +
struct App {
+  ...
+  cache: Option<Widget>,
+}
+
+impl App {
+  fn get_widget(&self) -> &Widget {
+    if let Some(widget) = &self.cache {
+      return widget;
+    }
+    self.cache = Some(create_widget());
+    self.cache.as_ref().unwrap()
+  }
+}
+ +
+

This doesn't work as is: modifying the cache needs &mut, which we'd very much prefer to avoid. +However, thinking about this pattern, it feels like it should be valid: we enforce at runtime that the contents of the cache is never overwritten. +That is, we actually do have exclusive access to cache on the line that assigns to self.cache at runtime, we just can't explain that to the type system. +But we can reach out for unsafe for that. +What's more, Rust's type system is powerful enough to encapsulate that usage of unsafe into a safe and generally re-usable API. +So let's pull in the once_cell crate for this:

+ +
+ + +
struct App {
+  ...
+  cache: once_cell::sync::OnceCell<Widget>,
+}
+
+impl App {
+  fn get_widget(&self) -> &Widget {
+    self.cache.get_or_init(create_widget)
+  }
+}
+ +
+
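For a version that runs without external crates, here is a minimal sketch using the standard library's `std::cell::OnceCell` (stable since Rust 1.70). Note that this is the single-threaded cell rather than the `sync` one used above, and that `create_widget` and `demo_once_cell` are illustrative helpers, not part of the original code.

```rust
use std::cell::OnceCell;

struct Widget {
    title: String,
}

struct App {
    cache: OnceCell<Widget>,
}

/// Stand-in for an expensive computation.
fn create_widget() -> Widget {
    Widget {
        title: "expensive".to_string(),
    }
}

impl App {
    /// Takes `&self`, initializes the cell at most once,
    /// and hands out a shared reference tied to `&self`.
    fn get_widget(&self) -> &Widget {
        self.cache.get_or_init(create_widget)
    }
}

/// Two calls on the same `&App` coexist and return the same widget.
fn demo_once_cell() -> bool {
    let app = App {
        cache: OnceCell::new(),
    };
    let w1 = app.get_widget();
    let w2 = app.get_widget();
    std::ptr::eq(w1, w2) && w1.title == "expensive"
}
```

Both `w1` and `w2` are alive at the same time here, which is exactly what the `&mut self` signature ruled out.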

Coming back to the original hash-map example, we can apply the same logic here: +as long as we never overwrite, delete or move values, we can safely return references to them. +This is handled by the elsa crate:

+ +
+ + +
struct App {
+  config: Config,
+  db: Db,
+  cache: elsa::map::FrozenMap<u32, Box<Widget>>,
+}
+
+impl App {
+  pub fn get_widget(
+    &self,
+    id: u32,
+  ) -> io::Result<Option<&Widget>> {
+    if let Some(widget) = self.cache.get(&id) {
+      return Ok(Some(widget));
+    }
+
+    let key = id.to_be_bytes();
+    let value = match self.db.load(&key)? {
+      None => return Ok(None),
+      Some(it) => it,
+    };
+    let widget: Widget =
+      bincode::deserialize(&value).map_err(|it| {
+        io::Error::new(io::ErrorKind::InvalidData, it)
+      })?;
+
+    let widget = self.cache.insert(id, Box::new(widget));
+
+    Ok(Some(widget))
+  }
+}
+ +
+

The third case is that of a bounded cache. +If you need to evict values, then the above reasoning does not apply. +If the user of a cache gets a &T, and then the corresponding entry is evicted, the reference would dangle. +In these situations, we want the clients of the cache to co-own the value. +This is easily handled by an Rc:

+ +
+ + +
struct App {
+  config: Config,
+  db: Db,
+  cache: RefCell<lru::LruCache<u32, Rc<Widget>>>,
+}
+
+impl App {
+  pub fn get_widget(
+    &self,
+    id: u32,
+  ) -> io::Result<Option<Rc<Widget>>> {
+    {
+      let mut cache = self.cache.borrow_mut();
+      if let Some(widget) = cache.get(&id) {
+        return Ok(Some(Rc::clone(widget)));
+      }
+    }
+
+    let key = id.to_be_bytes();
+    let value = match self.db.load(&key)? {
+      None => return Ok(None),
+      Some(it) => it,
+    };
+    let widget: Widget =
+      bincode::deserialize(&value).map_err(|it| {
+        io::Error::new(io::ErrorKind::InvalidData, it)
+      })?;
+
+    let widget = Rc::new(widget);
+    {
+      let mut cache = self.cache.borrow_mut();
+      cache.put(id, Rc::clone(&widget));
+    }
+
+    Ok(Some(widget))
+  }
+}
+ +
+
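To see the co-ownership at work without pulling in the `lru` crate, here is a small sketch with a toy FIFO-bounded cache. `BoundedCache` and `demo_eviction` are made up for this illustration, and the eviction policy is deliberately simpler than LRU.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

struct Widget {
    title: String,
}

/// A toy FIFO-bounded cache, standing in for `lru::LruCache`.
struct BoundedCache {
    capacity: usize,
    entries: RefCell<VecDeque<(u32, Rc<Widget>)>>,
}

impl BoundedCache {
    fn new(capacity: usize) -> BoundedCache {
        BoundedCache {
            capacity,
            entries: RefCell::new(VecDeque::new()),
        }
    }

    /// Returns a co-owning handle, so the caller never borrows from the cache.
    fn get(&self, id: u32) -> Option<Rc<Widget>> {
        let entries = self.entries.borrow();
        let found = entries
            .iter()
            .find(|(key, _)| *key == id)
            .map(|(_, widget)| Rc::clone(widget));
        found
    }

    fn put(&self, id: u32, widget: Rc<Widget>) {
        let mut entries = self.entries.borrow_mut();
        if entries.len() == self.capacity {
            entries.pop_front(); // evict the oldest entry
        }
        entries.push_back((id, widget));
    }
}

/// Holds a widget across its eviction and reports what happened.
fn demo_eviction() -> (String, bool) {
    let cache = BoundedCache::new(1);
    cache.put(1, Rc::new(Widget { title: "first".to_string() }));
    let held = cache.get(1).unwrap();
    // Inserting a second entry evicts id 1 from the cache...
    cache.put(2, Rc::new(Widget { title: "second".to_string() }));
    // ...but the handle we hold keeps the widget alive.
    (held.title.clone(), cache.get(1).is_none())
}
```

The widget is freed only when the last `Rc` is dropped, so eviction can never leave a caller with a dangling reference.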

To sum up: when implementing a cache, the path of the least resistance is to come up with a signature like this:

+ +
+ + +
fn get(&mut self) -> &T
+ +
+

This often leads to problems down the line. +It's usually better to employ some interior mutability and get either of these instead:

+ +
+ + +
fn get(&self) -> &T
+fn get(&self) -> T
+ +
+

This is an instance of the more general effect: despite the "mutability" terminology, Rust references track not mutability, but aliasing. +Mutability and exclusive access are correlated, but not perfectly. +It's important to identify instances where you need to employ interior mutability, often they are architecturally interesting.

+

To learn more about relationships between aliasing and mutability, I recommend the following two posts:

+
+
Rust: A unique perspective
+
+

https://limpet.net/mbrubeck/2019/02/07/rust-a-unique-perspective.html

+
+
Accurate mental model for Rust’s reference types
+
+

https://docs.rs/dtolnay/latest/dtolnay/macro._02__reference_types.html

+
+
+

Finally, the borrow checker limitation is explained (with much skill and humor, I should add), in this document:

+
+
Polonius the Crab
+
+

https://docs.rs/polonius-the-crab/0.2.1/polonius_the_crab/

+
+
+

That's all! Discussion on /r/rust.

+
+
+ + + + + diff --git a/2022/06/29/notes-on-gats.html b/2022/06/29/notes-on-gats.html new file mode 100644 index 00000000..32728225 --- /dev/null +++ b/2022/06/29/notes-on-gats.html @@ -0,0 +1,137 @@ + + + + + + + Notes on GATs + + + + + + + + + + + + +
+ +
+ +
+
+ +

Notes on GATs

+

There's a bit of discussion happening in the Rust community on the generic associated types topic. +I can not help but add my own thoughts to the pile :-)

+

I don't intend to write a well-edited post considering all pros and cones (intentional typo to demonstrate how unedited this is). +Rather, I just want to dump my experience as is. +Ultimately I trust the lang team to make the right call here way more than I trust myself. +The post could be read as a bit inflammatory, but my stated goal here is not to sway someone's mind by the arguments, but rather expose my own thinking process.

+

This post is partially prompted by the following comment from the RFC:

+ +
+

I probably have GATs in every project I do write.

+
+ +
+

It stuck with me, because this is very much the opposite of the experience I have. +I've been using Rust extensively for a while, mostly as an application (as opposed to library) developer, and I can't remember a single instance where I really wanted to have GATs. +This is a consequence of my overall code style: I try to use abstraction sparingly and rarely reach out for traits. +I don't think I've ever built a meaningful abstraction which was expressed via traits? +On the contrary, I try hard to make everything concrete and non-generic on the language level.

+

What's more, when I do reach out for traits, most of the time this is to use trait objects, which give me a new runtime capability to use different, substitutable concrete types. +For the static, monomorphization-based subset of traits I find that most of the time non-trait solutions seem to work.

+

And I think GATs (and associated types in general) don't work with trait objects, which probably explains why, even when I use traits, I don't generally need GATs. +Though, it seems to me that the lifetime-only subset of GATs actually works with trait objects? +That is, lending iterator seems to be object safe?
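For readers who have not met it, the lending iterator mentioned above is usually written with a lifetime-only GAT. The sketch below shows only the static, generic use; it makes no claim either way about trait objects. `WindowsMut` and `demo_lending` are illustrative names, not something from the RFC discussion.

```rust
/// An iterator whose items may borrow from the iterator itself.
trait LendingIterator {
    type Item<'a>
    where
        Self: 'a;

    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>>;
}

/// Overlapping mutable windows over a slice. This cannot be a regular
/// `Iterator`, because two live windows would alias the same elements.
struct WindowsMut<'s, T> {
    slice: &'s mut [T],
    start: usize,
    size: usize,
}

impl<'s, T> LendingIterator for WindowsMut<'s, T> {
    type Item<'a> = &'a mut [T]
    where
        Self: 'a;

    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>> {
        let end = self.start.checked_add(self.size)?;
        if end > self.slice.len() {
            return None;
        }
        let start = self.start;
        self.start += 1;
        // The returned window borrows from `self`, so the caller must
        // let go of it before asking for the next one.
        Some(&mut self.slice[start..end])
    }
}

/// Turns a vector into its running sums using windows of size two.
fn demo_lending() -> Vec<i32> {
    let mut data = vec![1, 2, 3, 4];
    let mut windows = WindowsMut {
        slice: &mut data,
        start: 0,
        size: 2,
    };
    while let Some(window) = windows.next() {
        window[1] += window[0];
    }
    data
}
```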

+

I guess, the only place where I do, indirectly, want GATs is to make async trait work, but even then, I usually am interested in object-safe async traits, which I think don't need and can't use GATs?

+
+

Another disconnection between my usage of Rust and the discussion surrounding GATs is in one of the prominent examples: a parser combinator library. +In practice, for me parser combinators' primary use-case was always a vehicle for teaching advanced types (e.g., the monads paper uses parsers as one of the examples). +For production use-cases I've encountered, it was always either a hand-written parser, or a full-blown parser generator.

+
+
+ + + + + diff --git a/2022/07/04/unit-and-integration-tests.html b/2022/07/04/unit-and-integration-tests.html new file mode 100644 index 00000000..420489ca --- /dev/null +++ b/2022/07/04/unit-and-integration-tests.html @@ -0,0 +1,257 @@ + + + + + + + Unit and Integration Tests + + + + + + + + + + + + +
+ +
+ +
+
+ +

Unit and Integration Tests

+

In this post I argue that integration-vs-unit is a confused, and harmful, distinction. +I provide a more useful two-dimensional mental model instead. +The model is descriptive (it allows one to think more clearly about any test), but I also include my personal prescriptions (the model shows metrics which are and aren't worth optimizing).

+

Credit for the idea goes to the SWE book. +I always felt that integration versus unit debate is confused, the book helped me to formulate in which way exactly.

+

I won't actually rigorously demonstrate the existing confusion: I find it self-evident. +As just two examples:

+ +

Most of the time, it's more productive to speak about just "tests", or maybe "automated tests", rather than argue whether something should be considered a unit or an integration test.

+

But I argue that a useful, more precise classification exists.

+
+ +

+ Purity +

+

The first axis of classification is, broadly speaking, performance. +“How much time would a thousand similar tests take?” is a very useful metric. +The dependency between the time from making an edit to getting the test results and most other interesting metrics in software (performance, time to fix defects, security) is super-linear. +Tests longer than attention span obliterate productivity.

+

It's useful to take a closer look at what constitutes a performant test. +One non-trivial observation here is that test speed is categorical, rather than numerical. +Certain tests are order-of-magnitude slower than others. +Consider the following list:

+
    +
  1. +Single-threaded pure computation +
  2. +
  3. +Multi-threaded parallel computation +
  4. +
  5. +Multi-threaded concurrent computation with time-based synchronization and access to disk +
  6. +
  7. +Multi-process computation +
  8. +
  9. +Distributed computation +
  10. +
+

Each step of this ladder adds half-an-order of magnitude to a test's runtime.

+

Time is not the only thing affected: the higher you go, the bigger is the fraction of flaky tests. +It's nigh impossible to make a test for a pure function flaky. +If you add threads into the mix, keeping flakiness out requires some careful thinking about synchronization. +And if the test spans several processes, it is almost bound to fail under some more unusual circumstances.

+

Yet another effect we observe along this axis is resilience to unrelated changes. +The more of operating system and other processes is involved in the test, the higher is the probability that some upgrade somewhere breaks something.

+

I think the purity concept from functional programming is a good way to generalize this axis of the differences between the tests. +Pure tests do little-to-no IO, they are independent of timings and environment. +Less pure tests do more of the impure things. +Purity is correlated with performance, repeatability and stability. +Test purity is non-binary, but it is mostly discrete. +Threads, time, file-system, network, processes are the notches to think about.

+
+
+ +

+ Extent +

+

The second axis is the fraction of the code which gets exercised, potentially indirectly, by the test. +Does the test exercise only the business logic module, or is the database API and the HTTP handling also required? +This is distinct from performance: running more code doesn't mean that the code will run slower. +An infinite loop takes very little code. +What affects performance is not whether tests for business logic touch persistence, but whether, in tests, persistence is backed by an in-memory hash-map or by an out-of-process database server.
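To make the last point concrete, here is a minimal sketch of business logic written against a storage interface, so that tests can plug in an in-memory hash-map. The `Storage` trait, the `InMemory` type, and the `register_user` function are hypothetical names invented for this illustration; they do not come from any particular codebase.

```rust
use std::collections::HashMap;

/// Hypothetical persistence interface. A production implementation might
/// talk to an out-of-process database server instead.
trait Storage {
    fn get(&self, key: &str) -> Option<String>;
    fn put(&mut self, key: &str, value: String);
}

/// In-memory implementation intended for tests: no IO, no threads.
#[derive(Default)]
struct InMemory {
    map: HashMap<String, String>,
}

impl Storage for InMemory {
    fn get(&self, key: &str) -> Option<String> {
        self.map.get(key).cloned()
    }

    fn put(&mut self, key: &str, value: String) {
        self.map.insert(key.to_string(), value);
    }
}

/// Hypothetical business logic: registers a user unless the name is taken.
/// Returns `true` if the user was newly registered.
fn register_user(storage: &mut dyn Storage, name: &str, email: &str) -> bool {
    if storage.get(name).is_some() {
        return false;
    }
    storage.put(name, email.to_string());
    true
}
```

A test that passes `InMemory::default()` to `register_user` still has a large extent (it runs the business logic and the persistence interface together), but it stays on the lowest rung of the purity ladder.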

+

The extent of the tests is a good indicator of the overall architecture of the application, but usually it isn't a worthy metric to optimize by itself. +On the contrary, artificially limiting the extent of tests by mocking your own code (as opposed to mocking impure IO) reduces the fidelity of the tests, and makes the code more brittle in the face of refactors.

+

One potential exception here is the impact on compilation time. +In a layered application A < B < C, it's possible to test A either through its interface to B (small-extent test) or by driving A indirectly through C. +The latter has the problem that, after changing A, running tests might require, depending on the language, rebuilding B and C as well.

+
+

Summing up:

+
    +
  • +Don't think about tests in terms of an opposition between unit and integration, whatever that means. Instead,
  • +Think in terms of a test's purity and extent.
  • +Purity corresponds to the amount of generalized IO the test is doing and is correlated with desirable metrics, namely performance and resilience.
  • +Extent corresponds to the amount of code the test exercises. Extent somewhat correlates with impurity, but generally does not directly affect performance.
+

And, the prescriptive part:

+
    +
  • +Ruthlessly optimize purity: moving one step down the ladder of impurity has a huge impact.
  • +Generally, just let the tests have their natural extent. Extent isn't worth optimizing by itself, but it can tell you something about your application's architecture.
+

If you enjoyed this post, you might like How to Test as well. +It goes further in the prescriptive direction, but, when writing it, I didn't have the two-dimensional purity-extent vocabulary yet.

+
+

As I've said, this framing is lifted from the SWE book. +There are two differences, one small and one big. +The small difference is that the book uses size terminology in place of purity. +The big difference is that the second axis is different: rather than looking at which fraction of the code gets exercised by the test, the book talks about test scope: how large is the bit we are actually testing?

+

I do find the scope concept useful to think about! +And, unlike with extent, keeping most tests focused is good, actionable prescriptive advice.

+

However, I find the scope concept a bit too fuzzy for actual classification.

+

Consider this test from rust-analyzer, which checks that we can complete a method from a trait if the trait is implemented:

+ +
+ + +
#[test]
+fn completes_trait_method() {
+    check(
+        r"
+struct S {}
+pub trait T {
+    fn f(&self)
+}
+impl T for S {}
+
+fn main(s: S) {
+    s.$0
+}
+",
+        expect![[r#"
+            me f() (as T) fn(&self)
+        "#]],
+    );
+}
+ +
+

I struggle with determining the scope of this test. +On the one hand, this clearly tests a very narrow, very specific scenario. +On the other hand, to make this work, all the layers of the system have to work just right. +The lexer, the parser, name resolution and type checking all have to be prepared for incomplete code. +This test tests not so much the completion logic itself, as all the underlying infrastructure for semantic analysis.

+

The test is very easy to classify in the purity/extent framework. +It's 100% pure: no IO, just a single thread. +It has maximal extent: the test exercises the bulk of the rust-analyzer codebase; the only thing that isn't touched here is the LSP itself.

+

Also, as a pitch for the How to Test post, take a second to appreciate how simple the test is, considering that it tests an error-resilient, highly incremental compiler :)

+
+
+
+ + + + + diff --git a/2022/07/10/almost-rules.html b/2022/07/10/almost-rules.html new file mode 100644 index 00000000..18a50cf1 --- /dev/null +++ b/2022/07/10/almost-rules.html @@ -0,0 +1,287 @@ + + + + + + + Almost Rules + + + + + + + + + + + + +
+ +
+ +
+
+ +

Almost Rules

+

This is going to be a philosophical post, vaguely about language design, and vaguely about Rust. +If you've been following this blog for a while, you know that one theme I consistently hammer at is that of boundaries. +This article is no exception!

+

Obligatory link to Ted Kaminski:

+

https://www.tedinski.com/2018/02/06/system-boundaries.html

+

The most important boundary for a software project is its external interface, that which the users directly interact with and which you give backwards compatibility guarantees for. +For a web-service, this would be the URL scheme and the shape of JSON requests and responses. +For a command line application, the set and the meaning of command-line flags. +For an OS kernel, the set of syscalls (Linux) or the blessed user-space libraries (Mac). +And, for a programming language, this would be the definition of the language itself, its syntax and semantics.

+

Sometimes, however, it is beneficial to install somewhat artificial, internal boundaries, a sort-of macro level layers pattern. +Boundaries have a high cost. +They prevent changes. +But a skillfully placed internal (or even an artificial external) boundary can also help.

+

It cuts the system in two, and, if the cut is relatively narrow in comparison to the overall size of the system (hourglass shape), this boundary becomes a great way to understand the system. +Understanding just the boundary allows you to imagine how the subsystem beneath it could be implemented. +Most of the time, your imaginary version would be pretty close to what actually happens, and this mental map would help you a great deal to peel off the layers of glue code and get a gut feeling for where the core logic is.

+

Even if an internal boundary starts out in the right place, it, unlike an external one, is ever in danger of being violated. +An “internal boundary” is a very non-physical thing; most of the time it's just informal rules like “module A shall not import module B”. +It's very hard to notice that something is not being done! +That's why, I think, larger companies can benefit from microservices architecture: in theory, if we just solve the human coordination problem, a monolith can be architected just as cleanly, while offering much better performance. +In practice, at sufficient scale, maintaining good architecture across teams is hard, and becomes much easier if the intended internal boundaries are reified as processes.

+

It's hard enough to protect from accidental breaching of internal boundaries. +But there's a bigger problem: often, internal boundaries stand in the way of user-visible system features, and it takes a lot of authority to protect an internal system boundary at the cost of not shipping something.

+

In this post, I want to catalog some of the cases I've seen in the Rust programming language where I think internal boundaries were eroded with time.

+
+ +

+ Namespaces +

+

It's a somewhat obscure feature of Rust's name resolution, but various things that inhabit Rust's scopes (structs, modules, traits, variables) are split into three namespaces: types, values and macros. +This allows having two things with the same name in the same scope without causing conflicts:

+ +
+ + +
struct x { }
+fn x() {}
+ +
+

The above is legal Rust, because the x struct lives in the types namespace, while the x function lives in the values namespace. +The namespaces are reflected syntactically: . is used to traverse value namespace, while :: traverses types.
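As a small sketch of all three namespaces at once, the following declares a struct, a function, and a macro that share one name. The name `thing` and the helper `use_all_three` are made up for illustration. Note that this relies on `thing` being a struct with named fields: tuple and unit structs also occupy the value namespace, and would clash with the function.

```rust
// Types namespace: a struct with named fields.
#[allow(non_camel_case_types)]
struct thing {
    n: u32,
}

// Values namespace: a function with the same name.
fn thing() -> u32 {
    1
}

// Macros namespace: a macro with the same name.
macro_rules! thing {
    () => {
        2
    };
}

fn use_all_three() -> (u32, u32, u32) {
    // `thing { .. }` is resolved in the types namespace,
    // `thing()` in the values namespace, `thing!()` in the macros namespace.
    let from_type = thing { n: 3 };
    (from_type.n, thing(), thing!())
}
```

Calling `use_all_three()` touches each of the three definitions once, without any of them shadowing the others.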

+

Except that this is almost a rule. +There are some cases where the compiler gives up on clear syntax-driven namespacing rules and just does ad-hoc disambiguation. +For example:

+ +
+ + +
use std::str;
+
+fn main() {
+  let s: &str = str::from_utf8(b"hello").unwrap();
+  str::len(s);
+}
+ +
+

Here, the str in &str and str::len is the str type, from the type namespace. +The two other strs are the str module. +In other words, str::len is a method of the str type, while str::from_utf8 is a free-standing function in the str module. +Like types, modules inhabit the types namespace, so normally the code here would cause a compilation error. +The compiler (and rust-analyzer) just special-cases primitive types.

+

Another recently added case is that of const generics. +Previously, the T in foo::<T>() was a syntactically-unambiguous reference to something from the types namespace. +Today, it can refer either to a type or to a value. +This raises the question: is splitting type and value namespaces a good idea? +If we have to disambiguate anyway, perhaps we could have just a single namespace and avoid introducing a second lookup syntax? +That is, just use std.collections.HashMap;.

+

I think these namespace aspirations re-enact similar developments from C. +I haven't double checked my history here, so take the following with a grain of salt and do your own research before quoting, but I think that C, in the initial versions, used to have very strict syntactic separation between types and values. +That's why you are required to write struct when declaring a local variable of struct type:

+ +
+ + +
struct foo { int a; };
+
+int main(void) {
+  struct foo x;
+  return 0;
+}
+ +
+

The struct keyword tells the parser that it is parsing a type, and, therefore, a declaration. +But then at a later point typedefs were added, and so the parser was taught to disambiguate types and values via the lexer hack:

+ +
+ + +
struct foo {
+  int a;
+};
+typedef struct foo bar;
+
+int main(void) {
+  bar x;
+  return 0;
+}
+ +
+
+
+ +

+ Patterns and Expressions +

+

Rust has separate grammatical categories for patterns and expressions. +It used to be the case that any utterance could be unambiguously classified, depending solely on the syntactic context, as either an expression or a pattern. +But then a minor exception happened:

+ +
+ + +
fn f(value: Option<i32>) {
+  match value {
+    None => (),
+    none => (),
+  }
+}
+ +
+

Syntactically, None and none are indistinguishable. +But they play quite different roles: None refers to the Option::None constant, while none introduces a fresh binding into the scope. +Swift elegantly disambiguates the two at the syntax level, by requiring a leading . for enum variants. +Rust just hacks this at the name-resolution layer, by defaulting to a new binding unless there's a matching constant in the scope.
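The same name-resolution rule can be seen with an ordinary constant. In this sketch (the names `LIMIT`, `matches_constant` and `binds_fresh_name` are hypothetical), two match arms have exactly the same syntactic shape, a bare identifier, yet one compares against a constant and the other binds a new variable:

```rust
const LIMIT: i32 = 10;

fn matches_constant(x: i32) -> bool {
    match x {
        // `LIMIT` names a constant in scope, so this arm matches only 10.
        LIMIT => true,
        _ => false,
    }
}

fn binds_fresh_name(x: i32) -> i32 {
    match x {
        // No constant named `limit` is in scope, so this is a fresh binding
        // that matches every value.
        limit => limit + 1,
    }
}
```

Which of the two behaviors you get cannot be read off the syntax of the arm alone; it depends on what else happens to be in scope.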

+

Recently, the scope of the hack was increased greatly: with destructuring assignment implemented, an expression can now be re-classified as a pattern:

+ +
+ + +
let (mut a, mut b) = (0, 1);
+(a, b) = (b, a)
+ +
+

Syntactically, = is a binary expression, so both the left hand side and the right hand side are expressions. +But now the lhs is re-interpreted as a pattern.
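For a slightly larger illustration, here is a sketch of a Fibonacci loop that relies on destructuring assignment (this assumes a toolchain recent enough to support the feature; the function name `fib` is just for this example). The tuple on the left of `=` is parsed as an expression and only later treated as an assignment target:

```rust
fn fib(n: u32) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        // Left-hand side: an expression by syntax, a pattern by meaning.
        (a, b) = (b, a + b);
    }
    a
}
```

Here `fib(10)` evaluates to 55.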

+

So perhaps the syntactic boundary between expressions and patterns is a fake one, and we should have used unified expression syntax throughout?

+
+
+ +

+ ::<> +

+

A boundary which stands intact is the class of the grammar. +Rust is still an LL(k) language: it can be parsed using a straightforward single-pass algorithm which doesn't require backtracking. +The cost of this boundary is that we have to type .collect::<Vec<_>>() rather than .collect<Vec<_>>() (nowadays, I type just .collect() and use the light-bulb to fill in the turbofish).
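For reference, a minimal sketch of the turbofish in use; the helper names `squares` and `parse_number` are invented for this example. Without the leading `::`, the parser would see `collect < Vec ...` and could not tell a generic argument list from a comparison without looking arbitrarily far ahead.

```rust
fn squares(n: u32) -> Vec<u32> {
    // `::<Vec<_>>` tells the parser that `<` opens generic arguments here.
    (1..=n).map(|x| x * x).collect::<Vec<_>>()
}

fn parse_number(text: &str) -> Option<i64> {
    // The same syntax applies to any generic method call in expression position.
    text.trim().parse::<i64>().ok()
}
```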

+
+
+ +

+ ().0.0 +

+

Another recent development is the erosion of the boundary between the lexer and the parser. +Rust has tuple structs, and uses the cutesy .0 syntax to access numbered fields. +This is problematic for nested tuple structs. +They need syntax like foo.1.2, but to the lexer this string looks like three tokens: foo, ., 1.2. +That is, 1.2 is a floating point number, 6/5. +So, historically one had to write this expression as foo.1 .2, with a meaningful whitespace.

+

Today, this is hacked in the parser, which takes the 1.2 token from the lexer, inspects its text and further breaks it up into 1, . and 2 tokens.
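A short sketch of both spellings, which should be accepted by current compilers. The `Outer` and `Inner` tuple structs are hypothetical, defined only for this example:

```rust
#[allow(dead_code)]
struct Inner(i32, i32, i32);
#[allow(dead_code)]
struct Outer(i32, Inner);

fn nested_fields() -> (i32, i32) {
    let foo = Outer(0, Inner(10, 20, 30));
    // Modern spelling: the lexer produces a float-like `1.2` token,
    // which the parser splits back into `1`, `.`, `2`.
    let modern = foo.1.2;
    // Historical spelling: whitespace keeps the lexer from seeing `1.2`.
    let historical = foo.1 .2;
    (modern, historical)
}
```

Both expressions read the third field of the inner struct, so the two results agree.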

+

The last example is quite interesting: in Rust, unlike many programming languages, the separation between the lexer and the parser is not an arbitrary internal boundary, but is actually a part of an external, semver protected API. +Tokens are the input to macros, so macro behavior depends on how exactly the input text is split into tokens.

+

And there's a second boundary violation here: in theory, a token as seen by a macro is just its text plus hygiene info. +In practice though, to implement captures in macro by example ($x:expr things), a token could also be a fully-formed fragment of the compiler's internal AST data structure. +The API is carefully future proofed such that, as soon as the macro looks at such a magic token, it gets decomposed into underlying true tokens, but there are some examples where the internal details leak via changes in observable behavior.

+
+
+ +

+ Lifetime Parametricity +

+

To end this on a more positive note, here's one pretty important internal boundary which is holding up pretty well. +In Rust, lifetimes don't affect code generation. +In fact, lifetimes are fully stripped from the data which is passed to codegen. +This is pretty important: although the inferred lifetimes are opaque and hard to reason about, you can be sure that, for example, the exact location where a value is dropped is independent from the whims of the borrow checker.

+
+

Conclusion: not really? It seems that we are generally overly optimistic about internal boundaries, and they seem to crumble under the pressure of feature requests, unless the boundary in question is physically reified (please don't take this as an endorsement of microservice architecture for compilers).

+
+
+
+ + + + + diff --git a/2022/10/03/from-paxos-to-bft.html b/2022/10/03/from-paxos-to-bft.html new file mode 100644 index 00000000..e796bef4 --- /dev/null +++ b/2022/10/03/from-paxos-to-bft.html @@ -0,0 +1,667 @@ + + + + + + + From Paxos to BFT + + + + + + + + + + + + +
+ +
+ +
+
+ +

From Paxos to BFT

+

This is a sequel to the Notes on Paxos post. +Similarly, the primary goal here is for me to understand why the BFT consensus algorithm works in detail. +This might or might not be useful for other people! +The Paxos article is a prerequisite, best to read that now, and return to this article tomorrow :)

+

Note also that while Paxos was more or less a direct translation of Lamport's lecture, this post is a mish-mash of the original BFT paper by Liskov and Castro, my own thinking, and a cursory glance at this formalization. +As such, the probability that there are no mistakes here is quite low.

+
+ +

+ What is BFT? +

+

BFT stands for Byzantine Fault Tolerant consensus. +Similarly to Paxos, we imagine a distributed system of computers communicating over a faulty network which can arbitrarily reorder, delay, and drop messages. +And we want computers to agree on some specific choice of value among the set of possibilities, such that any two computers pick the same value. +Unlike Paxos though, we also assume that computers themselves might be faulty or malicious. +So, we add a new condition to our list of bad things. +Besides reordering, duplication, delaying and dropping, a fake message can be manufactured out of thin air.

+

Of course, if absolutely arbitrary messages can be forged, then no consensus is possible: each machine lives in its own solipsistic world which might be completely unlike the world of every other machine. +So there's one restriction: messages are cryptographically signed by the senders, and it is assumed that it is impossible for a faulty node to impersonate a non-faulty one.

+

Can we still achieve consensus? +Yes, as long as for each f faulty, malicious nodes, we have at least 2f + 1 honest ones.

+

Similarly to the Paxos post, we will capture this intuition into a precise mathematical statement about trajectories of state machines.

+
+
+ +

+ Paxos Revisited +

+

Our plan is to start with vanilla Paxos, and then patch it to allow byzantine behavior. +Here's what we arrived at last time:

+ +
+
Paxos
+ + +
Sets:
+  𝔹       -- Numbered set of ballots (for example, ℕ)
+  𝕍       -- Arbitrary set of values
+  𝔸       -- Finite set of acceptors
+  ℚ ∈ 2^𝔸 -- Set of quorums
+
+  -- Sets of messages for each of the four subphases
+  Msgs1a ≡ {type: {"1a"}, bal: 𝔹}
+
+  Msgs1b ≡ {type: {"1b"}, bal: 𝔹, acc: 𝔸,
+            vote: {bal: 𝔹, val: 𝕍} ∪ {null}}
+
+  Msgs2a ≡ {type: {"2a"}, bal: 𝔹, val: 𝕍}
+
+  Msgs2b ≡ {type: {"2b"}, bal: 𝔹, val: 𝕍, acc: 𝔸}
+
+Assume:
+  ∀ q1, q2 ∈ ℚ: q1 ∩ q2 ≠ {}
+
+Vars:
+  -- Set of all messages sent so far
+  msgs ∈ 2^(Msgs1a ∪ Msgs1b ∪ Msgs2a ∪ Msgs2b)
+
+  -- Function that maps acceptors to ballot numbers or -1
+  -- maxBal :: 𝔸 -> 𝔹 ∪ {-1}
+  maxBal ∈ (𝔹 ∪ {-1})^𝔸
+
+  -- Function that maps acceptors to their last vote
+  -- lastVote :: 𝔸 -> {bal: 𝔹, val: 𝕍} ∪ {null}
+  lastVote ∈ ({bal: 𝔹, val: 𝕍} ∪ {null})^𝔸
+
+Send(m) ≡ msgs' = msgs ∪ {m}
+
+Safe(b, v) ≡
+  ∃ q ∈ ℚ:
+  let
+    qmsgs  ≡ {m ∈ msgs: m.type = "1b" ∧ m.bal = b ∧ m.acc ∈ q}
+    qvotes ≡ {m ∈ qmsgs: m.vote ≠ null}
+  in
+      ∀ a ∈ q: ∃ m ∈ qmsgs: m.acc = a
+    ∧ (  qvotes = {}
+       ∨ ∃ m ∈ qvotes:
+             m.vote.val = v
+           ∧ ∀ m1 ∈ qvotes: m1.vote.bal <= m.vote.bal)
+
+Phase1a(b) ≡
+    maxBal' = maxBal
+  ∧ lastVote' = lastVote
+  ∧ Send({type: "1a", bal: b})
+
+Phase1b(a) ≡
+  ∃ m ∈ msgs:
+      m.type = "1a" ∧ maxBal(a) < m.bal
+    ∧ maxBal' = λ a1 ∈ 𝔸: if a = a1
+                            then m.bal - 1
+                            else maxBal(a1)
+    ∧ lastVote' = lastVote
+    ∧ Send({type: "1b", bal: m.bal, acc: a, vote: lastVote(a)})
+
+Phase2a(b, v) ≡
+   ¬∃ m ∈ msgs: m.type = "2a" ∧ m.bal = b
+  ∧ Safe(b, v)
+  ∧ maxBal' = maxBal
+  ∧ lastVote' = lastVote
+  ∧ Send({type: "2a", bal: b, val: v})
+
+Phase2b(a) ≡
+  ∃ m ∈ msgs:
+      m.type = "2a" ∧ maxBal(a) < m.bal
+    ∧ maxBal' = λ a1 ∈ 𝔸: if a = a1 then m.bal else maxBal(a1)
+    ∧ lastVote' = λ a1 ∈ 𝔸: if a = a1
+                              then {bal: m.bal, val: m.val}
+                              else lastVote(a1)
+    ∧ Send({type: "2b", bal: m.bal, val: m.val, acc: a})
+
+Init ≡
+    msgs = {}
+  ∧ maxBal   = λ a ∈ 𝔸: -1
+  ∧ lastVote = λ a ∈ 𝔸: null
+
+Next ≡
+    ∃ b ∈ 𝔹:
+        Phase1a(b) ∨ ∃ v ∈ 𝕍: Phase2a(b, v)
+  ∨ ∃ a ∈ 𝔸:
+        Phase1b(a) ∨ Phase2b(a)
+
+chosen ≡
+  {v ∈ 𝕍: ∃ q ∈ ℚ, b ∈ 𝔹: AllVotedFor(q, b, v)}
+
+AllVotedFor(q, b, v) ≡
+  ∀ a ∈ q: (a, b, v) ∈ votes
+
+votes ≡
+  let
+    msgs2b ≡ {m ∈ msgs: m.type = "2b"}
+  in
+    {(m.acc, m.bal, m.val): m ∈ msgs2b}
+ +
+

Our general idea is to add some evil acceptors 𝔼 to the mix and allow them to send arbitrary messages, while at the same time making sure that the subset of good acceptors continues to run Paxos. +What makes this complex is that we don't know which acceptors are good and which are bad. +So this is our setup:

+ +
+ + +
Sets:
+  𝔹       -- Numbered set of ballots (for example, ℕ)
+  𝕍       -- Arbitrary set of values
+  𝔸       -- Finite set of good acceptors
+  𝔼       -- Finite set of evil acceptors
+  𝔸𝔼 ≡ 𝔸 ∪ 𝔼 -- All acceptors
+  ℚ ∈ 2^𝔸𝔼 -- Set of quorums
+
+  Msgs1a ≡ {type: {"1a"}, bal: 𝔹}
+
+  Msgs1b ≡ {type: {"1b"}, bal: 𝔹, acc: 𝔸𝔼,
+            vote: {bal: 𝔹, val: 𝕍} ∪ {null}}
+
+  Msgs2a ≡ {type: {"2a"}, bal: 𝔹, val: 𝕍}
+
+  Msgs2b ≡ {type: {"2b"}, bal: 𝔹, val: 𝕍, acc: 𝔸𝔼}
+
+Assume:
+  𝔼 ∩ 𝔸 = {}
+  ∀ q1, q2 ∈ ℚ: q1 ∩ q2 ∩ 𝔸 ≠ {}
+ +
+

If previously the quorum condition was “any two quorums have an acceptor in common”, it is now “any two quorums have a good acceptor in common”. +An alternative way to say that is that a byzantine quorum is a super-set of a normal quorum, which corresponds to the intuition where we are running normal Paxos, and there are just some extra evil guys whom we try to ignore. +For Paxos, we allowed f faulty out of 2f + 1 total nodes with f+1 quorums. +For Byzantine Paxos, we'll have f byzantine out of 3f + 1 nodes with 2f+1 quorums. +As I've said, if we forget about byzantine folks, we get exactly the f + 1 out of 2f + 1 picture of normal Paxos.
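The arithmetic behind these numbers can be checked by brute force. The sketch below (the function name `min_quorum_overlap` is made up, and it assumes quorums are simply "any subset of the given size") enumerates all quorums as bitmasks and reports the smallest intersection between any two of them. With 3f + 1 nodes and quorums of size 2f + 1, the smallest overlap is f + 1, so even if f of the shared nodes are evil, at least one is good.

```rust
/// Smallest intersection size between any two quorums, where a quorum is
/// any subset of `n` acceptors with exactly `quorum_size` members.
/// Intended for small `n` only, since it enumerates all subsets.
fn min_quorum_overlap(n: u32, quorum_size: u32) -> u32 {
    // Each acceptor is one bit; a quorum is a mask with `quorum_size` bits set.
    let quorums: Vec<u32> = (0u32..(1u32 << n))
        .filter(|mask| mask.count_ones() == quorum_size)
        .collect();
    let mut min_overlap = n;
    for &q1 in &quorums {
        for &q2 in &quorums {
            min_overlap = min_overlap.min((q1 & q2).count_ones());
        }
    }
    min_overlap
}
```

For example, `min_quorum_overlap(3, 2)` is 1 (plain Paxos with f = 1), while `min_quorum_overlap(4, 3)` is 2 and `min_quorum_overlap(7, 5)` is 3 (the byzantine setting with f = 1 and f = 2).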

+

The next step is to determine behavior for byzantine nodes. +They can send any message, as long as they are the author:

+ +
+ + +
Byzantine(a) ≡
+      ∃ b ∈ 𝔹:             Send({type: "1a", bal: b})
+    ∨ ∃ b ∈ 𝔹, v ∈ 𝕍:      Send({type: "2a", bal: b, val: v})
+    ∨ ∃ b1, b2 ∈ 𝔹, v ∈ 𝕍: Send({type: "1b", bal: b1, acc: a,
+                                  vote: {bal: b2, val: v}})
+    ∨ ∃ b ∈ 𝔹, v ∈ 𝕍:      Send({type: "2b", bal: b, val: v, acc: a})
+  ∧ maxBal' = maxBal
+  ∧ lastVote' = lastVote
+ +
+

That is, a byzantine acceptor can send any 1a or 2a message at any time, while for 1b and 2b the author should match.

+

What breaks? +The most obvious thing is Phase2b, that is, voting. +In Paxos, as soon as an acceptor receives a 2a message, it votes for it. +The correctness of Paxos hinges on the Safe check before we send a 2a message, but a Byzantine node can send an arbitrary 2a.

+

The solution here is natural: rather than blindly trusting 2a messages, acceptors would themselves double-check the safety condition, and reject the message if it doesn't hold:

+ +
+ + +
Phase2b(a) ≡
+  ∃ m ∈ msgs:
+      m.type = "2a" ∧ maxBal(a) < m.bal
+    ∧ Safe(m.bal, m.val)
+    ∧ maxBal' = λ a1 ∈ 𝔸: if a = a1 then m.bal else maxBal(a1)
+    ∧ lastVote' = λ a1 ∈ 𝔸: if a = a1
+                              then {bal: m.bal, val: m.val}
+                              else lastVote(a1)
+    ∧ Send({type: "2b", bal: m.bal, val: m.val, acc: a})
+ +
+

Implementation-wise, this means that, when a coordinator sends a 2a, it also wants to include 1b messages proving the safety of 2a. +But in the spec we can just assume that all messages are broadcast, for simplicity. +Ideally, for correct modeling you also want to model how each acceptor learns new messages, to make sure that negative reasoning about a certain message not being sent doesn't creep in, but we'll avoid that here.

+

However, just re-checking safety doesn't fully solve the problem. +It might be the case that several values are safe at a particular ballot (indeed, in the first ballot any value is safe), and it is exactly the job of a coordinator / 2a message to pick one value to break the tie. +And in our case a byzantine coordinator can send two 2a messages for different valid values.

+

And here we'll make the single non-trivial modification to the algorithm. +Just as the Safe condition is at the heart of Paxos, the Confirmed condition is the heart here.

+

So basically we expect a good coordinator to send just one 2a message, but a bad one can send many. +And we want to somehow distinguish the two cases. +One way to do that is to broadcast ACKs for 2a among acceptors. +If I received a 2a message, checked that the value therein is safe, and also know that everyone else received this same 2a message, I can safely vote for the value.

+

So we introduce a new message type, 2ac, which confirms a valid 2a message:

+ +
+ + +
Msgs2ac ≡ {type: {"2ac"}, bal: 𝔹, val: 𝕍, acc: 𝔸𝔼}
+ +
+

Naturally, evil acceptors can confirm whatever:

+ +
+ + +
Byzantine(a) ≡
+      ∃ b ∈ 𝔹:             Send({type: "1a", bal: b})
+    ∨ ∃ b1, b2 ∈ 𝔹, v ∈ 𝕍: Send({type: "1b", bal: b1, acc: a,
+                                 vote: {bal: b2, val: v}})
+    ∨ ∃ b ∈ 𝔹, v ∈ 𝕍:      Send({type: "2a", bal: b, val: v})
+    ∨ ∃ b ∈ 𝔹, v ∈ 𝕍:      Send({type: "2ac", bal: b, val: v, acc: a})
+    ∨ ∃ b ∈ 𝔹, v ∈ 𝕍:      Send({type: "2b", bal: b, val: v, acc: a})
+  ∧ maxBal' = maxBal
+  ∧ lastVote' = lastVote
+ +
+

But, if we get a quorum of confirmations, we can be sure that no other value will be confirmed in a given ballot: each good acceptor confirms at most a single message in a ballot (and we need a bit of state for that as well).

+ +
+ + +
Confirmed(b, v) ≡
+  ∃ q ∈ ℚ: ∀ a ∈ q: {type: "2ac", bal: b, val: v, acc: a} ∈ msgs
+ +
+
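As a rough executable sketch of this check, assume quorums are "any `quorum_size` distinct acceptors" (so 2f + 1 out of 3f + 1) and that signatures have already been verified, so that `acc` can be trusted to name the real sender. The `Msg2ac` struct and the `confirmed` function are illustrative names, not part of the spec.

```rust
use std::collections::HashSet;

/// A 2ac message: acceptor `acc` confirms value `val` for ballot `bal`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Msg2ac {
    bal: u64,
    val: String,
    acc: u32,
}

/// True if at least `quorum_size` distinct acceptors confirmed `(bal, val)`.
fn confirmed(msgs: &[Msg2ac], quorum_size: usize, bal: u64, val: &str) -> bool {
    // Collect senders into a set so that duplicated messages count once.
    let confirmers: HashSet<u32> = msgs
        .iter()
        .filter(|m| m.bal == bal && m.val == val)
        .map(|m| m.acc)
        .collect();
    confirmers.len() >= quorum_size
}
```

With four acceptors and quorums of three, an evil acceptor that confirms both "x" and "y" in the same ballot can help "x" reach a quorum, but cannot get "y" confirmed on its own, because every good acceptor confirms at most one value per ballot.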

Putting everything so far together, we get

+ +
+
Not Yet BFT Paxos
+ + +
Sets:
+  𝔹          -- Numbered set of ballots (for example, ℕ)
+  𝕍          -- Arbitrary set of values
+  𝔸          -- Finite set of good acceptors
+  𝔼          -- Finite set of evil acceptors
+  𝔸𝔼 ≡ 𝔸 ∪ 𝔼 -- Set of all acceptors
+  ℚ ∈ 2^𝔸𝔼   -- Set of quorums
+
+  Msgs1a ≡ {type: {"1a"}, bal: 𝔹}
+
+  Msgs1b  ≡ {type: {"1b"}, bal: 𝔹, acc: 𝔸𝔼,
+             vote: {bal: 𝔹, val: 𝕍} ∪ {null}}
+
+  Msgs2a  ≡ {type: {"2a"}, bal: 𝔹, val: 𝕍}
+  Msgs2ac ≡ {type: {"2ac"}, bal: 𝔹, val: 𝕍, acc: 𝔸𝔼}
+
+  Msgs2b  ≡ {type: {"2b"}, bal: 𝔹, val: 𝕍, acc: 𝔸𝔼}
+
+Assume:
+  𝔼 ∩ 𝔸 = {}
+  ∀ q1, q2 ∈ ℚ: q1 ∩ q2 ∩ 𝔸 ≠ {}
+
+Vars:
+  -- Set of all messages sent so far
+  msgs ∈ 2^(Msgs1a ∪ Msgs1b ∪ Msgs2a ∪ Msgs2ac ∪ Msgs2b)
+
+  -- Function that maps acceptors to ballot numbers or -1
+  -- maxBal :: 𝔸 -> 𝔹 ∪ {-1}
+  maxBal ∈ (𝔹 ∪ {-1})^𝔸
+
+  -- Function that maps acceptors to their last vote
+  -- lastVote :: 𝔸 -> {bal: 𝔹, val: 𝕍} ∪ {null}
+  lastVote ∈ ({bal: 𝔹, val: 𝕍} ∪ {null})^𝔸
+
+  -- Function which maps acceptors to values they confirmed as safe
+  -- confirm :: (𝔸, 𝔹) -> 𝕍 ∪ {null}
+  confirm ∈ (𝕍 ∪ {null})^(𝔸 × 𝔹)
+
+Send(m) ≡ msgs' = msgs ∪ {m}
+
+Confirmed(b, v) ≡
+  ∃ q ∈ ℚ: ∀ a ∈ q: {type: "2ac", bal: b, val: v, acc: a} ∈ msgs
+
+Safe(b, v) ≡
+  ∃ q ∈ ℚ:
+  let
+    qmsgs  ≡ {m ∈ msgs: m.type = "1b" ∧ m.bal = b ∧ m.acc ∈ q}
+    qvotes ≡ {m ∈ qmsgs: m.vote ≠ null}
+  in
+      ∀ a ∈ q: ∃ m ∈ qmsgs: m.acc = a
+    ∧ (  qvotes = {}
+       ∨ ∃ m ∈ qvotes:
+             m.vote.val = v
+           ∧ ∀ m1 ∈ qvotes: m1.vote.bal <= m.vote.bal)
+
+Byzantine(a) ≡
+      ∃ b ∈ 𝔹:             Send({type: "1a", bal: b})
+    ∨ ∃ b1, b2 ∈ 𝔹, v ∈ 𝕍: Send({type: "1b", bal: b1, acc: a,
+                                 vote: {bal: b2, val: v}})
+    ∨ ∃ b ∈ 𝔹, v ∈ 𝕍:      Send({type: "2a", bal: b, val: v})
+    ∨ ∃ b ∈ 𝔹, v ∈ 𝕍:      Send({type: "2ac", bal: b, val: v, acc: a})
+    ∨ ∃ b ∈ 𝔹, v ∈ 𝕍:      Send({type: "2b", bal: b, val: v, acc: a})
+  ∧ maxBal' = maxBal
+  ∧ lastVote' = lastVote
+  ∧ confirm' = confirm
+
+Phase1b(a) ≡
+  ∃ m ∈ msgs:
+      m.type = "1a" ∧ maxBal(a) < m.bal
+    ∧ maxBal' = λ a1 ∈ 𝔸: if a = a1
+                            then m.bal - 1
+                            else maxBal(a1)
+    ∧ lastVote' = lastVote
+    ∧ confirm' = confirm
+    ∧ Send({type: "1b", bal: m.bal, acc: a, vote: lastVote(a)})
+
+Phase2ac(a) ≡
+  ∃ m ∈ msgs:
+      m.type = "2a"
+    ∧ confirm(a, m.bal) = null
+    ∧ Safe(m.bal, m.val)
+    ∧ maxBal' = maxBal
+    ∧ lastVote' = lastVote
+    ∧ confirm' = λ a1 ∈ 𝔸, b1 ∈ 𝔹:
+                 if a = a1 ∧ b1 = m.bal then m.val else confirm(a1, b1)
+    ∧ Send({type: "2ac", bal: m.bal, val: m.val, acc: a})
+
+Phase2b(a) ≡
+  ∃ b ∈ 𝔹, v ∈ 𝕍:
+      Confirmed(b, v) ∧ maxBal(a) < b
+    ∧ maxBal' = λ a1 ∈ 𝔸: if a = a1 then b else maxBal(a1)
+    ∧ lastVote' = λ a1 ∈ 𝔸: if a = a1
+                              then {bal: b, val: v}
+                              else lastVote(a1)
+    ∧ confirm' = confirm
+    ∧ Send({type: "2b", bal: b, val: v, acc: a})
+
+Init ≡
+    msgs = {}
+  ∧ maxBal   = λ a ∈ 𝔸: -1
+  ∧ lastVote = λ a ∈ 𝔸: null
+  ∧ confirm = λ a ∈ 𝔸, b ∈ 𝔹: null
+
+Next ≡
+    ∃ a ∈ 𝔸:
+        Phase1b(a) ∨ Phase2ac(a) ∨ Phase2b(a)
+  ∨ ∃ a ∈ 𝔼:
+        Byzantine(a)
+
+chosen ≡
+  {v ∈ 𝕍: ∃ q ∈ ℚ, b ∈ 𝔹: AllVotedFor(q, b, v)}
+
+AllVotedFor(q, b, v) ≡
+  ∀ a ∈ q: (a, b, v) ∈ votes
+
+votes ≡
+  let
+    msgs2b ≡ {m ∈ msgs: m.type = "2b"}
+  in
+    {(m.acc, m.bal, m.val): m ∈ msgs2b}
+ +
+

In the above, I've also removed phases 1a and 2a, as byzantine acceptors are allowed to send arbitrary messages as well (we'll need explicit 1a/2a for liveness, but we won't discuss that here).

+

The most important conceptual addition is Phase2ac: if an acceptor receives a new 2a message for some ballot with a safe value, it sends out the confirmation provided that it hasn't done that already. +In Phase2b then we can vote for confirmed values: confirmation by a quorum guarantees both that the value is safe at this ballot, and that this is the single value that can be voted for in this ballot (two different values can't be confirmed in the same ballot, because quorums have an honest acceptor in common). +This almost works, but there's still a problem. +Can you spot it?

+

The problem is in the Safe condition. +Recall that the goal of the Safe condition is to pick a value v for ballot b, such that, if any earlier ballot b1 concludes, the value chosen in b1 would necessarily be v. +The way Safe works for ballot b in normal Paxos is that the coordinator asks a certain quorum to abstain from further voting in ballots earlier than b, collects existing votes, and uses those votes to pick a safe value. +Specifically, it looks at the vote for the highest-numbered ballot in the set, and declares a value from it as safe (it is safe: it was safe at that ballot, and for all future ballots there's a quorum which abstained from voting).
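Here is a minimal sketch of that selection rule for one fixed quorum, as it works in normal (non-byzantine) Paxos. The types `Vote`, `Msg1b`, `SafeChoice` and the function `safe_choice` are hypothetical names for this illustration. Note that the function takes every reported vote at face value, which is exactly the kind of trust discussed here.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Vote {
    bal: u64,
    val: String,
}

/// A 1b message: acceptor `acc` promises to abstain from ballots below
/// `bal` and reports its last vote, if any.
struct Msg1b {
    bal: u64,
    acc: u32,
    vote: Option<Vote>,
}

#[derive(Debug, PartialEq)]
enum SafeChoice {
    /// Not every member of the quorum has replied for this ballot yet.
    NoQuorumYet,
    /// Nobody in the quorum has voted: any value is safe.
    AnyValue,
    /// Only the value of the highest-ballot reported vote is safe.
    Only(String),
}

fn safe_choice(msgs: &[Msg1b], quorum: &[u32], bal: u64) -> SafeChoice {
    // 1b messages for this ballot that come from members of the quorum.
    let qmsgs: Vec<&Msg1b> = msgs
        .iter()
        .filter(|m| m.bal == bal && quorum.contains(&m.acc))
        .collect();
    let all_replied = quorum.iter().all(|a| qmsgs.iter().any(|m| m.acc == *a));
    if !all_replied {
        return SafeChoice::NoQuorumYet;
    }
    // Pick the reported vote with the highest ballot, if there is one.
    match qmsgs
        .iter()
        .filter_map(|m| m.vote.as_ref())
        .max_by_key(|vote| vote.bal)
    {
        None => SafeChoice::AnyValue,
        Some(vote) => SafeChoice::Only(vote.val.clone()),
    }
}
```

For instance, if the quorum reports no vote, a vote for "x" at ballot 2, and a vote for "y" at ballot 3, the only safe value for the new ballot is "y".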

+

This procedure puts a lot of trust in that highest vote, which makes it vulnerable. +An evil acceptor can just say that it voted in some high ballot, and force a choice of arbitrary value. +So, we need some independent confirmation that the vote was cast for a safe value. +And we can re-use 2ac messages for this:

+ +
+ + +
Safe(b, v) ≡
+  ∃ q ∈ ℚ:
+  let
+    qmsgs  ≡ {m ∈ msgs: m.type = "1b" ∧ m.bal = b ∧ m.acc ∈ q}
+    qvotes ≡ {m ∈ qmsgs: m.vote ≠ null}
+  in
+      ∀ a ∈ q: ∃ m ∈ qmsgs: m.acc = a
+    ∧ (  qvotes = {}
+       ∨ ∃ m ∈ qvotes:
+             m.vote.val = v
+           ∧ ∀ m1 ∈ qvotes: m1.vote.bal <= m.vote.bal
+           ∧ Confirmed(m.vote.bal, v))
+ +
+

And that's it, really. +Now we can sketch a proof that this thing indeed achieves BFT consensus, because it actually models normal Paxos among non-byzantine acceptors.

+

Phase1a messages of Paxos are modeled by Phase1a messages of BFT Paxos, as they don't have any preconditions; the same goes for Phase1b. +A Phase2a message of Paxos is emitted when a value becomes confirmed in BFT Paxos. +This is correct modeling, because BFT's Safe condition models normal Paxos's Safe condition (this is a bit inexact I think; to make this exact, we want to separate “this value is safe” from “we are voting for this value” in original Paxos as well). +Finally, Phase2b also displays direct correspondence.

+

As a final pop-quiz, I claim that the Confirmed(m.vote.bal, v) condition in Safe above can be relaxed. +As stated, Confirmed needs a byzantine quorum of confirmations, which guarantees both that the value is safe and that it is the single confirmed value, which is a bit more than we need here. +Do you see what would be enough?

+

The final specification contains this relaxation:

+ +
+
BFT Paxos
+ + +
Sets:
+  𝔹          -- Numbered set of ballots (for example, ℕ)
+  𝕍          -- Arbitrary set of values
+  𝔸          -- Finite set of good acceptors
+  𝔼          -- Finite set of evil acceptors
+  𝔸𝔼 ≡ 𝔸 ∪ 𝔼 -- Set of all acceptors
+  ℚ ∈ 2^𝔸𝔼   -- Set of quorums
+  𝕎ℚ ∈ 2^𝔸𝔼  -- Set of weak quorums
+
+  Msgs1a ≡ {type: {"1a"}, bal: 𝔹}
+
+  Msgs1b  ≡ {type: {"1b"}, bal: 𝔹, acc: 𝔸𝔼,
+             vote: {bal: 𝔹, val: 𝕍} ∪ {null}}
+
+  Msgs2a  ≡ {type: {"2a"}, bal: 𝔹, val: 𝕍}
+  Msgs2ac ≡ {type: {"2ac"}, bal: 𝔹, val: 𝕍, acc: 𝔸𝔼}
+
+  Msgs2b  ≡ {type: {"2b"}, bal: 𝔹, val: 𝕍, acc: 𝔸𝔼}
+
+Assume:
+  𝔼 ∩ 𝔸 = {}
+  ∀ q1, q2 ∈ ℚ: q1 ∩ q2 ∩ 𝔸 ≠ {}
+  ∀ q ∈ 𝕎ℚ: q ∩ 𝔸 ≠ {}
+
+Vars:
+  -- Set of all messages sent so far
+  msgs ∈ 2^(Msgs1a ∪ Msgs1b ∪ Msgs2a ∪ Msgs2ac ∪ Msgs2b)
+
+  -- Function that maps acceptors to ballot numbers or -1
+  -- maxBal :: 𝔸 -> 𝔹 ∪ {-1}
+  maxBal ∈ (𝔹 ∪ {-1})^𝔸
+
+  -- Function that maps acceptors to their last vote
+  -- lastVote :: 𝔸 -> {bal: 𝔹, val: 𝕍} ∪ {null}
+  lastVote ∈ ({bal: 𝔹, val: 𝕍} ∪ {null})^𝔸
+
+  -- Function which maps acceptors to values they confirmed as safe
+  -- confirm :: (𝔸, 𝔹) -> 𝕍 ∪ {null}
+  confirm ∈ (𝕍 ∪ {null})^(𝔸 × 𝔹)
+
+Send(m) ≡ msgs' = msgs ∪ {m}
+
+Safe(b, v) ≡
+  ∃ q ∈ ℚ:
+  let
+    qmsgs  ≡ {m ∈ msgs: m.type = "1b" ∧ m.bal = b ∧ m.acc ∈ q}
+    qvotes ≡ {m ∈ qmsgs: m.vote ≠ null}
+  in
+      ∀ a ∈ q: ∃ m ∈ qmsgs: m.acc = a
+    ∧ (  qvotes = {}
+       ∨ ∃ m ∈ qvotes:
+             m.vote.val = v
+           ∧ ∀ m1 ∈ qvotes: m1.vote.bal <= m.vote.bal
+           ∧ ConfirmedWeak(m.vote.bal, v))
+
+Confirmed(b, v) ≡
+  ∃ q ∈ ℚ: ∀ a ∈ q: {type: "2ac", bal: b, val: v, acc: a} ∈ msgs
+
+ConfirmedWeak(b, v) ≡
+  ∃ q ∈ 𝕎ℚ: ∀ a ∈ q: {type: "2ac", bal: b, val: v, acc: a} ∈ msgs
+
+Byzantine(a) ≡
+      ∃ b ∈ 𝔹:             Send({type: "1a", bal: b})
+    ∨ ∃ b1, b2 ∈ 𝔹, v ∈ 𝕍: Send({type: "1b", bal: b1, acc: a,
+                                 vote: {bal: b2, val: v}})
+    ∨ ∃ b ∈ 𝔹, v ∈ 𝕍:      Send({type: "2a", bal: b, val: v})
+    ∨ ∃ b ∈ 𝔹, v ∈ 𝕍:      Send({type: "2ac", bal: b, val: v, acc: a})
+    ∨ ∃ b ∈ 𝔹, v ∈ 𝕍:      Send({type: "2b", bal: b, val: v, acc: a})
+  ∧ maxBal' = maxBal
+  ∧ lastVote' = lastVote
+  ∧ confirm' = confirm
+
+Phase1b(a) ≡
+  ∃ m ∈ msgs:
+      m.type = "1a" ∧ maxBal(a) < m.bal
+    ∧ maxBal' = λ a1 ∈ 𝔸: if a = a1
+                            then m.bal - 1
+                            else maxBal(a1)
+    ∧ lastVote' = lastVote
+    ∧ confirm' = confirm
+    ∧ Send({type: "1b", bal: m.bal, acc: a, vote: lastVote(a)})
+
+Phase2ac(a) ≡
+  ∃ m ∈ msgs:
+      m.type = "2a"
+    ∧ confirm(a, m.bal) = null
+    ∧ Safe(m.bal, m.val)
+    ∧ maxBal' = maxBal
+    ∧ lastVote' = lastVote
+    ∧ confirm' = λ a1 ∈ 𝔸, b1 ∈ 𝔹:
+                 if a = a1 ∧ b1 = m.bal then m.val else confirm(a1, b1)
+    ∧ Send({type: "2ac", bal: m.bal, val: m.val, acc: a})
+
+Phase2b(a) ≡
+  ∃ b ∈ 𝔹, v ∈ 𝕍:
+      Confirmed(b, v)
+    ∧ maxBal' = λ a1 ∈ 𝔸: if a = a1 then b else maxBal(a1)
+    ∧ lastVote' = λ a1 ∈ 𝔸: if a = a1
+                              then {bal: b, val: v}
+                              else lastVote(a1)
+    ∧ confirm' = confirm
+    ∧ Send({type: "2b", bal: b, val: v, acc: a})
+
+Init ≡
+    msgs = {}
+  ∧ maxBal   = λ a ∈ 𝔸: -1
+  ∧ lastVote = λ a ∈ 𝔸: null
+  ∧ confirm = λ a ∈ 𝔸, b ∈ 𝔹: null
+
+Next ≡
+    ∃ b ∈ 𝔹:
+        Phase1a(b) ∨ ∃ v ∈ 𝕍: Phase2a(b, v)
+  ∨ ∃ a ∈ 𝔸:
+        Phase1b(a) ∨ Phase2ac(a) ∨ Phase2b(a)
+  ∨ ∃ a ∈ 𝔼:
+        Byzantine(a)
+
+chosen ≡
+  {v ∈ 𝕍: ∃ q ∈ ℚ, b ∈ 𝔹: AllVotedFor(q, b, v)}
+
+AllVotedFor(q, b, v) ≡
+  ∀ a ∈ q: (a, b, v) ∈ votes
+
+votes ≡
+  let
+    msgs2b ≡ {m ∈ msgs: m.type = "2b"}
+  in
+    {(m.acc, m.bal, m.val): m ∈ msgs2b}
+ +
+

TLA+ specs for this post are available here: https://github.com/matklad/paxosnotes.
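
The two quorum assumptions listed under Assume can be sanity-checked by brute force for small clusters. The sketch below is not part of the linked TLA+ specs. It assumes, purely for illustration, that quorums are all sets of `n - f` acceptors and weak quorums are all sets of `f + 1` acceptors, with acceptors encoded as bits of a `u32`; the function names are made up for this example.

```rust
/// All subsets of `0..n` (as bitmasks) that have exactly `k` members.
fn subsets_of_size(n: u32, k: u32) -> Vec<u32> {
    (0u32..(1 << n)).filter(|s| s.count_ones() == k).collect()
}

/// Checks both `Assume` conditions for `n` acceptors, `f` of them evil.
/// Assumption of this sketch: quorums are the subsets of size `n - f`,
/// weak quorums are the subsets of size `f + 1`.
fn assumptions_hold(n: u32, f: u32) -> bool {
    assert!(f <= n && n < 32);
    // By symmetry it does not matter which acceptors are evil,
    // so we pick acceptors 0..f.
    let evil: u32 = (1 << f) - 1;
    let honest: u32 = ((1 << n) - 1) & !evil;

    let quorums = subsets_of_size(n, n - f);
    let weak_quorums = subsets_of_size(n, f + 1);

    // ∀ q1, q2 ∈ ℚ: q1 ∩ q2 ∩ 𝔸 ≠ {}
    let quorums_ok = quorums
        .iter()
        .all(|q1| quorums.iter().all(|q2| q1 & q2 & honest != 0));
    // ∀ q ∈ 𝕎ℚ: q ∩ 𝔸 ≠ {}
    let weak_ok = weak_quorums.iter().all(|q| q & honest != 0);

    quorums_ok && weak_ok
}
```

Under these assumptions, four acceptors tolerate one evil acceptor, while three do not: with three acceptors, two quorums of size two can overlap in the evil acceptor alone.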

+
+
+
+ + + + + diff --git a/2022/10/06/hard-mode-rust.html b/2022/10/06/hard-mode-rust.html new file mode 100644 index 00000000..8c1a1163 --- /dev/null +++ b/2022/10/06/hard-mode-rust.html @@ -0,0 +1,1050 @@ + + + + + + + Hard Mode Rust + + + + + + + + + + + + +
+ +
+ +
+
+ +

Hard Mode Rust

+

This post is a case study of writing a Rust application using only a minimal, artificially constrained API (e.g., no dynamic memory allocation). +It assumes a fair bit of familiarity with the language.

+
+ +

+ Hard Mode Rust +

+

The back story here is a particular criticism of Rust and C++ from hard-core C programmers. +This criticism is aimed at RAII, the language-defining feature of C++, which was imported wholesale into Rust as well. +RAII makes using various resources requiring cleanup (file descriptors, memory, locks) easy: any place in the program can create a resource, and the cleanup code will be invoked automatically when needed. +And herein lies the problem: because allocating resources becomes easy, RAII encourages a sloppy attitude to resources, where they are allocated and destroyed all over the place. +In particular, this leads to:

+
    +
  • +Decrease in reliability. Resources are usually limited in principle, but actual resource exhaustion happens rarely. +If resources are allocated throughout the program, there are many virtually untested codepaths. +
  • +Lack of predictability. It is usually impossible to predict up front how many resources the program will consume. +Instead, resource consumption is observed empirically. +
  • +Poor performance. Usually, it is significantly more efficient to allocate and free resources in batches. +Cleanup code for individual resources is scattered throughout the codebase, increasing code bloat. +
  • +Spaghetti architecture. Resource allocation is an architecturally salient thing. +If all resource management is centralized in a single place, it becomes significantly easier to understand the lifecycle of resources. +
+

I think this is a fair criticism. +In fact, I think this is the same criticism that C++ and Rust programmers aim at garbage collected languages. +This is a spectrum:

+ +
+ + +
           GC object graph
+                 v v
+                  v
+        Tree of values with RAII
+                 v v
+                  v
+Static allocation of resources at startup
+ +
+

Rust programmers typically are not exposed to the lowest level of this pyramid. +But there's a relatively compact exercise to gain the relevant experience: try re-implementing your favorite Rust programs on hard mode.

+

Hard Mode means that you split your program into a std binary and a #![no_std] no-alloc library. +Only the small binary is allowed to directly ask the OS for resources. +For the library, all resources must be injected. +In particular, to do memory allocation, the library receives a slice of bytes of a fixed size, and should use that for all storage. +Something like this:

+ +
+ + +
// app/src/main.rs
+fn main() {
+  let mem_limit = 64 * 1024;
+  let mut memory = vec![0u8; mem_limit];
+  app::run(&mut memory)
+}
+
+// app/src/lib.rs
+#![no_std] // <- the point of the exercise
+
+pub fn run(memory: &mut [u8]) {
+  ...
+}
+ +
+
+
+ +

+ Ray Tracing +

+

So, this is what the post is about: my experience implementing a toy hard mode ray tracer. +You can find the code on GitHub: http://github.com/matklad/crt.

+

The task of a ray tracer is to convert a description of a 3D scene like the following one:

+ +
+ + +
background #000000
+
+camera {
+    pos 0,10,-50
+    look_at 0,0,0
+    up 0,-1,0
+    focus 50
+    dim 80x60
+}
+
+light {
+    pos -20,10,0
+    color #aa1111
+}
+
+plane {
+    pos 0,-10,0
+    normal 0,1,0
+    material {
+        color #5566FF
+        diffuse 3
+    }
+}
+
+mesh {
+    material {
+        color #BB5566
+        diffuse 3
+    }
+
+    data {
+        v 5.92,4.12,0.00
+        v 5.83,4.49,0.00
+        v 5.94,4.61,0.00
+        v 6.17,4.49,0.00
+        v 6.42,4.12,0.00
+        v 5.38,4.12,2.74
+        ...
+
+        vn -0.96,-0.25,0.00
+        vn -0.96,0.25,0.00
+        vn -0.09,0.99,0.00
+        vn 0.68,0.73,0.00
+        vn 0.87,0.49,0.00
+        vn -0.89,-0.25,-0.36
+        ...
+
+        f 1/1 2/2 3/3
+        f 4/4 5/5 6/6
+        ...
+    }
+
+}
+ +
+

Into a rendered image like this:

+ +
+ + +
+

Conceptually, this works rather intuitively. +First, imagine the above scene, with an infinite fuchsia colored plane and a red Utah teapot hovering above that. +Then, imagine a camera standing at 0,10,-50 (in Cartesian coordinates) and aiming at the origin. +Now, draw an imaginary rectangular 80x60 screen at a focus distance of 50 from the camera along its line of sight. +To get a 2D picture, we shoot a ray from the camera through each pixel on the screen, note which object in the scene is hit (plane, teapot, background), and color the pixel accordingly. +See the PBRT Book if you feel like falling further into this particular rabbit hole (warning: it is very deep). I apologize for the "little square pixels" simplification I use throughout the post :-)

+

I won't focus on specific algorithms to implement that (indeed, crt is a very naive tracer), but rather highlight Hard Mode Rust specific concerns.

+
+
+ +

+ Pixel Buffer +

+

Ultimately, the output of a ray tracer is a 2D buffer with 8-bit RGB pixels. +One would typically represent it as follows:

+ +
+ + +
pub struct Color { r: u8, g: u8, b: u8 }
+
+pub struct Buf {
+  dim: [u32; 2],
+  // invariant: data.len() == dim[0] * dim[1]
+  data: Box<[Color]>,
+}
+ +
+

For us, we want someone else (main) to allocate that box of colors for us, so instead we do the following:

+ +
+ + +
pub struct Buf<'m> {
+  dim: [u32; 2],
+  buf: &'m mut [Color],
+}
+
+impl<'m> Buf<'m> {
+  pub fn new(dim: [u32; 2], buf: &'m mut [Color]) -> Buf<'m> {
+    assert!(dim[0] * dim[1] == buf.len() as u32);
+    Buf { dim, buf }
+  }
+}
+ +
+

We use the 'm lifetime for abstract memory managed elsewhere. +Note how the struct grew an extra lifetime! +This is the extra price we have to pay for not relying on RAII to clean up resources for us:

+ +
+ + +
// Easy Mode
+fn paint(buf: &mut Buf) { ... }
+
+struct PaintCtx<'a> {
+  buf: &'a mut Buf
+}
+
+// Hard Mode
+fn paint(buf: &mut Buf<'_>) { ... }
+
+struct PaintCtx<'a, 'm> {
+  buf: &'a mut Buf<'m>
+}
+ +
+

Note in particular how the Ctx struct now has to include two lifetimes. +This feels unnecessary: 'a is shorter than 'm. +I wish it was possible to somehow abstract that away:

+ +
+ + +
struct PaintCtx<'a> {
+  buf: &'a mut Buf<'_> // &'a mut exists<'m>: Buf<'m>
+}
+ +
+

I don't think that's really possible (earlier post about this). +In particular, the following would run into variance issues:

+ +
+ + +
struct PaintCtx<'a> {
+  buf: &'a mut Buf<'a>
+}
+ +
+

Ultimately, this is annoying, but not a deal breaker.

+

With this rgb::Buf<'_>, we can sketch the program:

+ +
+ + +
// hard mode library
+#![no_std]
+pub fn render<'a>(
+  crt: &'a str,   // textual description of the scene
+  mem: &mut [u8], // all the memory we can use
+  buf: &mut rgb::Buf, // write image here
+) -> Result<(), Error<'a>> {
+  ...
+}
+
+// main
+#[derive(argh::FromArgs)]
+struct Args {
+  #[argh(option, default = "64")]  mem: usize,
+  #[argh(option, default = "800")] width: u32,
+  #[argh(option, default = "600")] height: u32,
+}
+
+fn main() -> anyhow::Result<()> {
+  let args: Args = argh::from_env();
+
+  let mut crt = String::new();
+  io::stdin()
+    .read_to_string(&mut crt)
+    .context("reading input")?;
+
+  // Allocate all the memory.
+  let mut mem = vec![0; args.mem * 1024];
+
+  // Allocate the image
+  let mut buf = vec![
+    rgb::Color::default();
+    (args.width * args.height) as usize
+  ];
+  let mut buf =
+    rgb::Buf::new([args.width, args.height], &mut buf);
+
+  render::render(
+    &crt,
+    &mut mem,
+    &mut buf,
+  )
+  .map_err(|err| anyhow::format_err!("{err}"))?;
+
+  // Write result as a PPM image format.
+  write_ppm(&buf, &mut io::stdout().lock())
+    .context("writing output")?;
+  Ok(())
+}
+
+fn write_ppm(
+  buf: &rgb::Buf,
+  w: &mut dyn io::Write,
+) -> io::Result<()> {
+  ...
+}
+ +
+
+
+ +

+ Hard Mode Rayon +

+

Ray tracing is an embarrassingly parallel task: the color of each output pixel can be computed independently. +Usually, the excellent rayon library is used to take advantage of parallelism, but for our ray tracer I want to show a significantly simpler API design for taking advantage of many cores. +I've seen this design in Sorbet, a type checker for Ruby.

+

Here's how a render function with support for parallelism looks:

+ +
+ + +
type ThreadPool<'t> = dyn Fn(&(dyn Fn() + Sync)) + 't;
+
+pub fn render<'a>(
+  crt: &'a str,
+  mem: &mut [u8],
+  in_parallel: &ThreadPool<'_>,
+  buf: &mut rgb::Buf<'_>,
+) -> Result<(), Error<'a>> {
+ +
+

The interface here is the in_parallel function, which takes another function as an argument and runs it, in parallel, on all available threads. +You typically use it like this:

+ +
+ + +
let work: ConcurrentQueue<Work> = ConcurrentQueue::new();
+work.extend(available_work);
+in_parallel(&|| {
+  while let Some(item) = work.pop() {
+    process(item);
+  }
+})
+ +
+

This is similar to a typical threadpool, but different. +Similar to a threadpool, there's a number of threads (typically one per core) which execute arbitrary jobs. +The first difference is that a typical threadpool sends a job to a single thread, while in this design the same job is broadcast to all threads. +That is why the job is Fn + Sync rather than FnOnce + Send. +The second difference is that we block until the job is done on all threads, so we can borrow data from the stack.
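
To make the broadcast idea concrete, here is a minimal sketch built on `std::thread::scope`. It is an assumption of this example, not how crt does it: it spawns fresh threads on every call instead of keeping a persistent pool, and `in_parallel` takes the thread count as an extra parameter. The work queue is just an atomic counter, and `sum_of_squares` is a made-up workload.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Runs the same `job` on `n_threads` threads and blocks until every
/// thread has returned. Because of the blocking, `job` may borrow
/// from the caller's stack.
fn in_parallel(n_threads: usize, job: &(dyn Fn() + Sync)) {
    thread::scope(|s| {
        for _ in 0..n_threads {
            // `job` is a shared reference, so each thread gets a copy of it.
            s.spawn(move || job());
        }
    });
}

/// Hypothetical workload: sums i*i for i in 0..n, with work items
/// handed out through a shared atomic counter.
fn sum_of_squares(n: usize, n_threads: usize) -> usize {
    let next = AtomicUsize::new(0);
    let total = AtomicUsize::new(0);
    in_parallel(n_threads, &|| loop {
        // Claim the next unprocessed item.
        let i = next.fetch_add(1, Ordering::Relaxed);
        if i >= n {
            break;
        }
        total.fetch_add(i * i, Ordering::Relaxed);
    });
    // All threads have been joined at this point.
    total.load(Ordering::Relaxed)
}
```

Note that both `next` and `total` live on the stack of `sum_of_squares`; borrowing them from the job is exactly what the blocking design allows.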

+

It's on the caller to explicitly implement a concurrent queue to distribute specific work items. +In my implementation, I slice the image into rows:

+ +
+ + +
type ThreadPool<'t> = dyn Fn(&(dyn Fn() + Sync)) + 't;
+
+pub fn render<'a>(
+  crt: &'a str,
+  mem: &mut [u8],
+  in_parallel: &ThreadPool<'_>,
+  buf: &mut rgb::Buf<'_>,
+) -> Result<(), Error<'a>> {
+  ...
+  // Note: this is not mut, because this is
+  // a concurrent iterator.
+  let rows = buf.partition();
+  in_parallel(&|| {
+    // next_row increments an atomic and
+    // uses the row index to give an `&mut`
+    // into the row's pixels.
+    while let Some(row) = rows.next_row() {
+      let y: u32 = row.y;
+      let buf: &mut [rgb::Color] = row.buf;
+      for x in 0..dim[0] {
+        let color = render::render_pixel(&scene, [x, y]);
+        buf[x as usize] = to_rgb(&color);
+      }
+    }
+  });
+  ...
+}
+ +
+
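
The row iterator in the listing above hands out `&mut` rows using an atomic counter, which needs some unsafe code. As a sketch of the same interface in safe Rust, the version below swaps in a different technique: a `Mutex` around the standard `chunks_mut` iterator. The `Rows` and `fill` names and the gradient "renderer" are assumptions of this example.

```rust
use std::iter::Enumerate;
use std::slice::ChunksMut;
use std::sync::Mutex;
use std::thread;

#[derive(Clone, Copy, Default, PartialEq, Debug)]
pub struct Color { pub r: u8, pub g: u8, pub b: u8 }

/// Hands out disjoint rows of the image to whichever thread asks next.
pub struct Rows<'a> {
    inner: Mutex<Enumerate<ChunksMut<'a, Color>>>,
}

impl<'a> Rows<'a> {
    /// `width` must be non-zero (`chunks_mut` panics otherwise).
    pub fn new(pixels: &'a mut [Color], width: usize) -> Rows<'a> {
        Rows { inner: Mutex::new(pixels.chunks_mut(width).enumerate()) }
    }

    /// Takes `&self`, so many threads can call it concurrently.
    /// Returns the row index together with that row's pixels.
    pub fn next_row(&self) -> Option<(usize, &'a mut [Color])> {
        self.inner.lock().unwrap().next()
    }
}

/// Fills the image with a gradient, using `n_threads` threads.
pub fn fill(pixels: &mut [Color], width: usize, n_threads: usize) {
    let rows = Rows::new(pixels, width);
    thread::scope(|s| {
        for _ in 0..n_threads {
            s.spawn(|| {
                while let Some((y, row)) = rows.next_row() {
                    for (x, px) in row.iter_mut().enumerate() {
                        *px = Color { r: x as u8, g: y as u8, b: 0 };
                    }
                }
            });
        }
    });
}
```

The lock is held only while fetching the next row, not while rendering it, so contention stays low as long as a row takes noticeably longer to render than to hand out.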

In main, we implement a concrete ThreadPool by spawning a thread per core:

+ +
+ + +
fn main() -> anyhow::Result<()> {
+  ...
+  let threads = match args.jobs {
+    Some(it) => Threads::new(it),
+    None => Threads::with_max_threads()?,
+  };
+  render::render(
+    &crt,
+    &mut mem,
+    &|f| threads.in_parallel(f),
+    &mut buf,
+  )
+  .map_err(|err| anyhow::format_err!("{err}"))?;
+}
+ +
+
+
+ +

+ Allocator +

+

The scenes we are going to render are fundamentally dynamically sized. +They can contain an arbitrary number of objects. +So we can't just statically allocate all the memory up-front. +Instead, there's a CLI argument which sets the amount of memory a ray tracer can use, and we should either manage with that, or return an error. +So we do need to write our own allocator. +But we'll try very hard to only allocate the memory we actually need, so we won't have to implement memory deallocation at all. +So a simple bump allocator would do:

+ +
+ + +
pub struct Mem<'m> {
+  raw: &'m mut [u8],
+}
+
+#[derive(Debug)]
+pub struct Oom;
+
+impl<'m> Mem<'m> {
+  pub fn new(raw: &'m mut [u8]) -> Mem<'m> {
+    Mem { raw }
+  }
+
+  pub fn alloc<T>(&mut self, t: T) -> Result<&'m mut T, Oom> { ... }
+
+  pub fn alloc_array<T>(
+    &mut self,
+    n: usize,
+    mut element: impl FnMut(usize) -> T,
+  ) -> Result<&'m mut [T], Oom> { ... }
+
+  pub fn alloc_array_default<T: Default>(
+    &mut self,
+    n: usize,
+  ) -> Result<&'m mut [T], Oom> {
+    self.alloc_array(n, |_| T::default())
+  }
+}
+ +
+

We can create an allocator from a slice of bytes, and then ask it to allocate values and arrays. +Schematically, alloc looks like this:

+ +
+ + +
// PSEUDOCODE, doesn't handle alignment and is broken.
+pub fn alloc<'a, T>(
+  &'a mut self,
+  val: T,
+) -> Result<&'m mut T, Oom> {
+  let size = mem::size_of::<T>();
+  if self.raw.len() < size {
+    // Return error if there isn't enough memory.
+    return Err(Oom);
+  }
+
+  // Split off size_of::<T> bytes from the start,
+  // doing a little `mem::take` dance to placate
+  // the borrowchecker.
+  let res: &'m mut [u8] = {
+    let raw = mem::take(&mut self.raw);
+    let (res, raw) = raw.split_at_mut(size);
+    self.raw = raw;
+    res
+  };
+
+  // Initialize the value
+  let res = res as *mut [u8] as *mut u8 as *mut T;
+  unsafe {
+    ptr::write(res, val);
+    Ok(&mut *res)
+  }
+}
+ +
+

To make this fully kosher we need to handle alignment as well, but I cut that bit out for brevity.

+

For allocating arrays, it's useful if the all-zeros bit pattern is a valid default instance of the type, as that allows skipping element-wise initialization. +This condition isn't easily expressible in today's Rust though, so we require initializing every array member.

+

The result of an allocation is &'m mut T: this is how we spell Box<T> on hard mode.
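
For completeness, here is one way the alignment handling that was cut from the pseudocode could look. This is a sketch under my own assumptions rather than the code from crt: it skips padding bytes until the start of the free region is suitably aligned for `T`, and it never runs destructors for allocated values.

```rust
use std::mem;

#[derive(Debug)]
pub struct Oom;

pub struct Mem<'m> {
    raw: &'m mut [u8],
}

impl<'m> Mem<'m> {
    pub fn new(raw: &'m mut [u8]) -> Mem<'m> {
        Mem { raw }
    }

    pub fn alloc<T>(&mut self, val: T) -> Result<&'m mut T, Oom> {
        let size = mem::size_of::<T>();
        let align = mem::align_of::<T>();

        // Bytes to skip so that the allocation starts at a multiple of `align`.
        let addr = self.raw.as_ptr() as usize;
        let padding = (align - addr % align) % align;
        let needed = padding.checked_add(size).ok_or(Oom)?;
        if self.raw.len() < needed {
            return Err(Oom);
        }

        // Split off the padding and then `size` bytes from the start.
        let raw = mem::take(&mut self.raw);
        let (_padding, raw) = raw.split_at_mut(padding);
        let (res, raw) = raw.split_at_mut(size);
        self.raw = raw;

        let ptr = res.as_mut_ptr() as *mut T;
        // SAFETY: `ptr` points at `size_of::<T>()` bytes that are aligned
        // for `T`, exclusively borrowed for 'm, and no longer reachable
        // through `self.raw`.
        unsafe {
            ptr.write(val);
            Ok(&mut *ptr)
        }
    }
}
```

The padding bytes are simply wasted; with a bump allocator there is no free list to return them to.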

+
+
+ +

+ Parsing +

+

The scene contains various objects, like spheres and planes:

+ +
+ + +
pub struct Sphere {
+  pub center: v64, // v64 is [f64; 3]
+  pub radius: f64,
+}
+
+pub struct Plane {
+  pub origin: v64,
+  pub normal: v64,
+}
+ +
+

Usually, we'd represent a scene as

+ +
+ + +
pub struct Scene {
+  pub camera: Camera,
+  pub spheres: Vec<Sphere>,
+  pub planes: Vec<Plane>,
+}
+ +
+

We could implement a resizable array (Vec), but doing that would require us to either leak memory, or to implement proper deallocation logic in our allocator, and add destructors to reliably trigger that. +But destructors are exactly what we are trying to avoid in this exercise. +So our scene will have to look like this instead:

+ +
+ + +
pub struct Scene<'m> {
+  pub camera: Camera,
+  pub spheres: &'m mut [Sphere],
+  pub planes: &'m mut [Plane],
+}
+ +
+

And that means we want to know the number of objects we'll need upfront. +The way we solve this problem is by doing two-pass parsing. +In the first pass, we just count things, then we allocate them, then we actually parse them into the allocated space.

+ +
+ + +
pub(crate) fn parse<'m, 'i>(
+  mem: &mut Mem<'m>,
+  input: &'i str,
+) -> Result<Scene<'m>, Error<'i>> {
+  // Size the allocations.
+  let mut n_spheres = 0;
+  let mut n_planes = 0;
+  for word in input.split_ascii_whitespace() {
+    match word {
+      "sphere" => n_spheres += 1,
+      "plane" => n_planes += 1,
+      _ => (),
+    }
+  }
+
+  // Allocate.
+  let mut res = Scene {
+    camera: Default::default(),
+    spheres: mem.alloc_array_default(n_spheres)?,
+    planes: mem.alloc_array_default(n_planes)?,
+  };
+
+  // Parse _into_ the allocated scene.
+  let mut p = Parser::new(mem, input);
+  scene(&mut p, &mut res)?;
+  Ok(res)
+}
+ +
+

If an error is encountered during parsing, we want to create a helpful error message. +If the message is fully dynamic, we'd have to allocate it into 'm, but it seems simpler to just re-use bits of the input for the error message. +Hence, Error<'i> is tied to the input lifetime 'i, rather than the memory lifetime 'm.
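
The count, allocate, parse sequence and the input-borrowing error can be shown on a much smaller scale. The sketch below is a toy, not the crt parser: the input format (`sphere <radius>` pairs), the `parse_radii` function, and the use of a plain `&mut [f64]` in place of the allocator are all assumptions made for illustration.

```rust
#[derive(Debug, PartialEq)]
pub enum Error<'i> {
    /// The caller-provided storage is too small.
    Oom,
    /// Borrows the offending word straight from the input.
    BadNumber(&'i str),
}

pub fn parse_radii<'m, 'i>(
    storage: &'m mut [f64],
    input: &'i str,
) -> Result<&'m mut [f64], Error<'i>> {
    // Pass 1: only count, so that we know how much space we need.
    let n = input
        .split_ascii_whitespace()
        .filter(|word| *word == "sphere")
        .count();

    // "Allocate" exactly `n` slots, or fail if there isn't enough room.
    let radii = storage.get_mut(..n).ok_or(Error::Oom)?;

    // Pass 2: parse into the space reserved above.
    let mut words = input.split_ascii_whitespace();
    let mut i = 0;
    while let Some(word) = words.next() {
        if word == "sphere" {
            let num = words.next().unwrap_or("");
            radii[i] = num.parse().map_err(|_| Error::BadNumber(num))?;
            i += 1;
        }
    }
    Ok(radii)
}
```

The returned slice lives as long as the storage ('m), while an error lives as long as the input ('i), so reporting an error needs no allocation at all.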

+
+
+ +

+ Nested Objects +

+

One interesting type of object on the scene is a mesh of triangles (for example, the teapot is just a bunch of triangles). +A naive way to represent a bunch of triangles is to use a vector:

+ +
+ + +
pub struct Triangle {
+  pub a: v64,
+  pub b: v64,
+  pub c: v64,
+}
+
+type Mesh = Vec<Triangle>;
+ +
+

This is wasteful: in a mesh, each edge is shared by two triangles. +So a single vertex belongs to a bunch of triangles. +If we store a vector of triangles, we are needlessly duplicating vertex data. +A more compact representation is to store unique vertexes once, and to use indexes for sharing:

+ +
+ + +
pub struct Mesh {
+  pub vertexes: Vec<v64>,
+  pub faces: Vec<MeshFace>,
+}
+// Indexes point into vertexes vector.
+pub struct MeshFace { a: u32, b: u32, c: u32 }
+ +
+

Again, on hard mode that would be

+ +
+ + +
pub struct Mesh<'m> {
+  pub vertexes: &'m mut [v64],
+  pub faces: &'m mut [MeshFace],
+}
+ +
+

And a scene contains a bunch of meshes:

+ +
+ + +
pub struct Scene<'m> {
+  pub camera: Camera,
+  pub spheres: &'m mut [Sphere],
+  pub planes: &'m mut [Plane],
+  pub meshes: &'m mut [Mesh<'m>],
+}
+ +
+

Note how, if the structure is recursive, we have owned pointers of &'m mut T<'m> shape. +Originally I worried that that would cause problems with variance, but it seems to work fine for ownership specifically. +During processing, you still need &'a mut T<'m> though.

+

And that's why parsing functions hold an uncomfortable bunch of lifetimes:

+ +
+ + +
fn mesh<'m, 'i>(
+  p: &mut Parser<'m, 'i, '_>,
+  res: &mut Mesh<'m>,
+) -> Result<(), Error<'i>> { ... }
+ +
+

The parser p holds &'i str input and a &'a mut Mem<'m> memory. +It parses input into a &'b mut Mesh<'m>.

+
+
+ +

+ Bounding Volume Hierarchy +

+

With Scene<'m> fully parsed, we can finally get to rendering the picture. +A naive way to do this would be to iterate through each pixel, shooting a ray through it, and then do a nested iteration over every shape, looking for the closest intersection. +That's going to be slow! +The teapot model contains about 1k triangles, and we have 640*480 pixels, which gives us 307_200_000 ray-triangle intersection tests, which is quite slow even with multithreading.

+

So we are going to speed this up. +The idea is simple: just don't intersect a ray with each triangle. +It is possible to quickly discard batches of triangles. +If we have a batch of triangles, we can draw a 3D box around them as a pre-processing step. +Now if the ray doesn't intersect the bounding box, we know that it can't intersect any of the triangles. +So we can use one test with a bounding box instead of many tests for each triangle.

+

This is of course one-sided: if the ray intersects the box, it might still miss all of the triangles. +But, if we place bounding boxes smartly (small boxes which cover many adjacent triangles), we can hope to skip a lot of work.
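
The box test itself is cheap. A common way to do it is the "slab" method, sketched below; I am not claiming this is the exact test crt uses, and the `Ray` type and `hits` function are names invented for this example. For each axis we compute the range of ray parameters `t` for which the ray is between the two box planes, and the ray hits the box if the three ranges overlap for some `t >= 0`.

```rust
#[derive(Clone, Copy)]
pub struct BoundingBox {
    // Opposite corners of the box.
    pub lo: [f64; 3],
    pub hi: [f64; 3],
}

pub struct Ray {
    pub origin: [f64; 3],
    pub dir: [f64; 3],
}

/// Returns true if the ray `origin + t * dir`, `t >= 0`, touches the box.
pub fn hits(bb: &BoundingBox, ray: &Ray) -> bool {
    let mut t_min = 0.0f64;
    let mut t_max = f64::INFINITY;
    for axis in 0..3 {
        // For a zero direction component this is an infinity, which
        // makes the slab either unconstrained or impossible to enter.
        let inv = 1.0 / ray.dir[axis];
        let mut t0 = (bb.lo[axis] - ray.origin[axis]) * inv;
        let mut t1 = (bb.hi[axis] - ray.origin[axis]) * inv;
        if t0 > t1 {
            std::mem::swap(&mut t0, &mut t1);
        }
        // Intersect the running interval with this axis's interval.
        t_min = t_min.max(t0);
        t_max = t_max.min(t1);
    }
    t_min <= t_max
}
```

This is three pairs of multiplications and comparisons per box, compared to a full ray-triangle test for every triangle inside it.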

+

We won't go for really smart ways of doing that, and instead will use a simple divide-and-conquer scheme. +Specifically, we'll draw a large box around all triangles we have. +Then, we'll note which dimension of the resulting box is the longest. +If, for example, the box is very tall, we'll cut it in half horizontally, such that each half contains half of the triangles. +Then, we'll recursively subdivide the two halves.

+

In the end, we get a binary tree, where each node contains a bounding box and two children, whose bounding boxes are contained in the parent's bounding box. +Leaves contain triangles. +This construction is called a bounding volume hierarchy, bvh.

+

To intersect the ray with bvh, we use a recursive procedure. +Starting at the root node, we descend into children whose bounding boxes are intersected by the ray. +Sometimes we'll have to descend into both children, but often enough at least one child's bounding box won't touch the ray, allowing us to completely skip the subtree.

+

On easy mode Rust, we can code it like this:

+ +
+ + +
struct BoundingBox {
+  // Opposite corners of the box.
+  lo: v64, hi: v64,
+}
+
+struct Bvh {
+  root: BvhNode
+}
+
+enum BvhNode {
+  Split {
+    bb: BoundingBox,
+    children: [Box<BvhNode>; 2],
+    /// Which of X,Y,Z dimensions was used
+    /// to cut the bb in two.
+    axis: u8,
+  },
+  Leaf {
+    bb: BoundingBox,
+    /// Index of the triangle in a mesh.
+    triangle: u32,
+  }
+}
+ +
+
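
With easy-mode types like these, the divide-and-conquer construction and the recursive descent fit in a few dozen lines. The following is a sketch under simplifying assumptions, not the crt code: triangles are represented only by an index and a precomputed bounding box, the `axis` field is left out, the split is by sorting box centers along the longest axis, and the `hit` callback stands in for a real ray-box test.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct BoundingBox {
    pub lo: [f64; 3],
    pub hi: [f64; 3],
}

impl BoundingBox {
    fn union(self, other: BoundingBox) -> BoundingBox {
        let mut res = self;
        for i in 0..3 {
            res.lo[i] = res.lo[i].min(other.lo[i]);
            res.hi[i] = res.hi[i].max(other.hi[i]);
        }
        res
    }

    fn longest_axis(&self) -> usize {
        let extent = |i: usize| self.hi[i] - self.lo[i];
        let mut axis = 0;
        for i in 1..3 {
            if extent(i) > extent(axis) {
                axis = i;
            }
        }
        axis
    }

    fn center(&self, axis: usize) -> f64 {
        (self.lo[axis] + self.hi[axis]) / 2.0
    }
}

pub enum BvhNode {
    Split { bb: BoundingBox, children: [Box<BvhNode>; 2] },
    Leaf { bb: BoundingBox, triangle: u32 },
}

/// Builds a tree over `(triangle index, bounding box)` pairs.
/// Panics if `items` is empty.
pub fn build(items: &mut [(u32, BoundingBox)]) -> BvhNode {
    if items.len() == 1 {
        let (triangle, bb) = items[0];
        return BvhNode::Leaf { bb, triangle };
    }
    // Large box around everything, then cut across its longest dimension
    // so that each half gets half of the triangles.
    let bb = items.iter().map(|it| it.1).reduce(BoundingBox::union).unwrap();
    let axis = bb.longest_axis();
    items.sort_by(|a, b| a.1.center(axis).total_cmp(&b.1.center(axis)));
    let (left, right) = items.split_at_mut(items.len() / 2);
    BvhNode::Split { bb, children: [Box::new(build(left)), Box::new(build(right))] }
}

/// Collects triangles whose leaf box passes `hit`, skipping every
/// subtree whose bounding box fails it.
pub fn candidates(node: &BvhNode, hit: &dyn Fn(&BoundingBox) -> bool, out: &mut Vec<u32>) {
    match node {
        BvhNode::Leaf { bb, triangle } => {
            if hit(bb) {
                out.push(*triangle);
            }
        }
        BvhNode::Split { bb, children } => {
            if hit(bb) {
                for child in children {
                    candidates(child, hit, out);
                }
            }
        }
    }
}
```

Every `Box::new` here is a separate allocation, which is exactly what the hard-mode representation has to get rid of.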

On hard mode, we don't really love all those separate boxes, we love arrays! +So what we'd rather have is

+ +
+ + +
pub struct Bvh<'m> {
+  splits: &'m mut [BvhSplit],
+  leaves: &'m mut [BvhLeaf],
+}
+
+struct BvhSplit {
+  /// Index into either splits or leaves.
+  /// The `tag` is in the highest bit.
+  children: [u32; 2],
+  bb: BoundingBox,
+  axis: u8,
+}
+
+struct BvhLeaf {
+  face: u32,
+  bb: BoundingBox,
+}
+ +
+
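
The "tag in the highest bit" trick mentioned in the comment on `children` can be spelled out as a pair of small functions. Which tag value means "leaf", as well as the `Child`, `encode` and `decode` names, are assumptions of this sketch.

```rust
/// Highest bit of a `u32`; set for leaves, clear for splits (an
/// assumption of this sketch, the opposite convention works equally well).
const LEAF_TAG: u32 = 1 << 31;

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Child {
    /// Index into the `splits` array.
    Split(u32),
    /// Index into the `leaves` array.
    Leaf(u32),
}

/// Packs a child reference into a single `u32`.
/// Panics if the index does not fit into 31 bits.
pub fn encode(child: Child) -> u32 {
    match child {
        Child::Split(index) => {
            assert!(index < LEAF_TAG);
            index
        }
        Child::Leaf(index) => {
            assert!(index < LEAF_TAG);
            index | LEAF_TAG
        }
    }
}

/// Inverse of `encode`.
pub fn decode(raw: u32) -> Child {
    if raw & LEAF_TAG != 0 {
        Child::Leaf(raw & !LEAF_TAG)
    } else {
        Child::Split(raw)
    }
}
```

The price is that each array can hold at most 2^31 entries, which is far more than a 64 KiB memory budget could ever fit.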

So we want to write the following function which recursively constructs a bvh for a mesh:

+ +
+ + +
pub fn build<'m>(
+  mem: &mut Mem<'m>,
+  mesh: &Mesh<'m>,
+) -> Result<Bvh<'m>, Oom> { ... }
+ +
+

The problem is, unlike the parser, we can't cheaply determine the number of leaves and splits without actually building the whole tree.

+
+
+ +

+ Scratch Space +

+

So what we are going to do here is to allocate a pointer-tree structure into some scratch space, and then copy that into an &'m mut array. +How do we find the scratch space? +Our memory is &'m mut [u8]. +We allocate stuff from the start of the region. +So we can split off some amount of scratch space from the end:

+ +
+ + +
&'m mut [u8] -> (&'m mut [u8], &'s mut [u8])
+ +
+

Stuff we allocate into the first half is allocated permanently. +Stuff we allocate into the second half is allocated temporarily. +When we drop the temp buffer, we can reclaim all that space.

+

This probably is the most sketchy part of the whole endeavor. +It is unsafe, requires lifetime casting, and I actually can't get it past miri. +But it should be fine, right?

+

So, I have the following API:

+ +
+ + +
impl<'m> Mem<'m> {
+  pub fn with_scratch<T>(
+    &mut self,
+    size: usize,
+    f: impl FnOnce(&mut Mem<'m>, &mut Mem<'_>) -> T,
+  ) -> T { ... }
+}
+ +
+

It can be used like this:

+ +
+ + +
#[test]
+fn test_scratch() {
+  let mut buf = [0u8; 4];
+  let mut mem = Mem::new(&mut buf);
+
+  let x = mem.alloc(0u8).unwrap();
+  let y = mem.with_scratch(2, |mem, scratch| {
+    // Here, we can allocate _permanent_ stuff from `mem`,
+    // and temporary stuff from `scratch`.
+    // Only permanent stuff can escape.
+
+    let y = mem.alloc(1u8).unwrap();
+    let z = scratch.alloc(2u8).unwrap();
+    assert_eq!((*x, *y, *z), (0, 1, 2));
+
+    // The rest of memory is occupied by scratch.
+    assert!(mem.alloc(0u8).is_err());
+
+    y // Returning z here fails.
+  });
+
+  // The scratch memory is now reclaimed.
+  let z = mem.alloc(3u8).unwrap();
+  assert_eq!((*x, *y, *z), (0, 1, 3));
+  assert_eq!(buf, [0, 1, 3, 0]);
+  // Will fail to compile.
+  // assert_eq!(*x, 0);
+}
+ +
+

And here's how with_scratch is implemented:

+ +
+ + +
pub fn with_scratch<T>(
+  &mut self,
+  size: usize,
+  f: impl FnOnce(&mut Mem<'m>, &mut Mem<'_>) -> T,
+) -> T {
+  let raw = mem::take(&mut self.raw);
+
+  // Split off scratch space.
+  let mid = raw.len() - size;
+  let (mem, scratch) = raw.split_at_mut(mid);
+
+  self.raw = mem;
+  let res = f(self, &mut Mem::new(scratch));
+
+  let data = self.raw.as_mut_ptr();
+  // Glue the scratch space back in.
+  let len = self.raw.len() + size;
+  // This makes miri unhappy, any suggestions? :(
+  self.raw = unsafe { slice::from_raw_parts_mut(data, len) };
+  res
+}
+ +
+

With this infrastructure in place, we can finally implement bvh construction! +We'll do it in three steps:

+
    +
  1. +Split off half the memory into a scratch space. +
  2. +Build a dynamically-sized tree in that space, counting leaves and interior nodes. +
  3. +Allocate arrays of the right size in the permanent space, and copy data over once. +
+ +
+ + +
pub struct Bvh<'m> {
+  splits: &'m mut [BvhSplit],
+  leaves: &'m mut [BvhLeaf],
+}
+
+struct BvhSplit {
+  children: [u32; 2],
+  bb: BoundingBox,
+  axis: u8,
+}
+
+struct BvhLeaf {
+  face: u32,
+  bb: BoundingBox,
+}
+
+// Temporary tree we store in the scratch space.
+enum Node<'s> {
+  Split {
+    children: [&'s mut Node<'s>; 2],
+    bb: BoundingBox,
+    axis: u8
+  },
+  Leaf { face: u32, bb: BoundingBox },
+}
+
+pub fn build<'m>(
+  mem: &mut Mem<'m>,
+  mesh: &Mesh<'m>,
+) -> Result<Bvh<'m>, Oom> {
+  let free_mem = mem.free();
+  mem.with_scratch(free_mem / 2, |mem, scratch| {
+    let (node, n_splits, n_leaves) =
+      build_scratch(scratch, mesh)?;
+
+    let mut res = Bvh {
+      splits: mem.alloc_array_default(n_splits as usize)?,
+      leaves: mem.alloc_array_default(n_leaves as usize)?,
+    };
+    copy(&mut res, &node);
+
+    Ok(res)
+  })
+}
+
+fn build_scratch<'s>(
+  mem: &mut Mem<'s>,
+  mesh: &Mesh<'_>,
+) -> Result<(&'s mut Node<'s>, usize, usize), Oom> {
+  ...
+}
+
+fn copy<'m, 's>(res: &mut Bvh<'m>, node: &Node<'s>) {
+  ...
+}
+ +
+

And that's it! +The thing actually works, miri complaints notwithstanding!

+
+
+ +

+ Conclusions +

+

Actually, I am impressed. +I was certain that this wouldn't actually work out, and that I'd have to write copious amounts of unsafe to get the runtime behavior I want. +Specifically, I believed that the &'m mut T<'m> variance issue would force my hand to add 'm, 'mm, 'mmm and further lifetimes, but that didn't happen. +For owning pointers, &'m mut T<'m> turned out to work fine! +It's only when processing that you might need extra lifetimes. +Parser<'m, 'i, 'a> is at least two lifetimes more than I am completely comfortable with, but I guess I can live with that.

+

I wonder how far this style of programming can be pushed. +Aesthetically, I quite like that I can tell precisely how much memory the program would use!

+

Code for the post: http://github.com/matklad/crt.

+

Discussion on /r/rust.

+
+
+
+ + + + + diff --git a/2022/10/19/why-linux-troubleshooting-advice-sucks.html b/2022/10/19/why-linux-troubleshooting-advice-sucks.html new file mode 100644 index 00000000..e05b7692 --- /dev/null +++ b/2022/10/19/why-linux-troubleshooting-advice-sucks.html @@ -0,0 +1,172 @@ + + + + + + + Why Linux Troubleshooting Advice Sucks + + + + + + + + + + + + +
+ +
+ +
+
+ +

Why Linux Troubleshooting Advice Sucks

+

A short post on how to create better troubleshooting documentation, prompted by me spending last evening trying to get the built-in display of my laptop working with Linux.

+

What finally fixed the blank screen for me was this advice from the NixOS wiki:

+ + +

While this particular approach worked, in contrast to a dozen different ones I tried before, I think it shares a very common flaw, which is endemic to troubleshooting documentation. +Can you spot it?

+

The advice tells you the remedy (add this kernel parameter), but it doesn't explain how to verify that this indeed is the problem. +That is, if the potential problem is a kernel driver that is not loaded, it would really help me to know how to check which kernel driver is in use, so that I can do both:

+ +

If a fix doesn't come with a linked diagnostic, a very common outcome is:

+
    +
  1. +Apply some random fix from the Internet +
  2. +Observe that the final problem (blank screen) isn't fixed +
  3. +Wonder which of the two is the case: +
      +
    • +the fix is not relevant for the problem, +
    • +the fix is relevant, but is applied wrong. +
    +
+

So, call to action: if you are writing any kind of documentation, before explaining how to fix the problem, teach the user how to diagnose it.

+

When helping with git, start with explaining git log and git status, not with git reset or git reflog.

+
+

While the post might come across as just a tiny bit angry, I want to explicitly mention that I am eternally grateful to all the people who write any kind of docs for using Linux on desktop. +I've been running it for more than 10 years at this point, and I am still completely clueless as to how to debug issues from first principles. +If not for all of the wikis, stackoverflows and random forum posts out there, I wouldn't be able to use the OS, so thank you all!

+
+
+ + + + + diff --git a/2022/10/24/actions-permissions.html b/2022/10/24/actions-permissions.html new file mode 100644 index 00000000..2d200552 --- /dev/null +++ b/2022/10/24/actions-permissions.html @@ -0,0 +1,122 @@ + + + + + + + GitHub Actions Permissions + + + + + + + + + + + + +
+ +
+ +
+
+ +

GitHub Actions Permissions

+

This short note documents an important wrong default in GitHub Actions, which should be corrected for a much better contribution experience.

+

Under Settings › Actions › General there's this setting (default pictured):

+ +
+ + +
+

To save your contributors quite a bit of frustration, you want to flip it to this instead:

+ +
+ + +
+

Obviously, the first best solution here is for GitHub itself to change the default.

+
+
+ + + + + diff --git a/2022/10/28/elements-of-a-great-markup-language.html b/2022/10/28/elements-of-a-great-markup-language.html new file mode 100644 index 00000000..3e0f3044 --- /dev/null +++ b/2022/10/28/elements-of-a-great-markup-language.html @@ -0,0 +1,483 @@ + + + + + + + Elements Of a Great Markup Language + + + + + + + + + + + + +
+ +
+ +
+
+ +

Elements Of a Great Markup Language

+

This post contains some inconclusive musings on lightweight markup languages (Markdown, AsciiDoc, LaTeX, reStructuredText, etc). +The overall mood is that I don't think a genuinely great markup language exists. +I wish it did though. +As an appropriate disclosure, this text is written in AsciiDoctor.

+

EDIT: if you like this post, you should definitely check out https://djot.net.

+

EDIT: welp, that escalated quickly, this post is now written in Djot.

+
+ +

+ Document Model +

+

This I think is the big one. +Very often, a particular markup language is married to a particular output format, either syntactically (markdown supports HTML syntax), or by the processor just not making a crisp enough distinction between the input document and the output (AsciiDoctor).

+

Roughly, if the markup language is for emitting HTML, or PDF, or DocBook XML, that's bad. +A good markup language describes an abstract hierarchical structure of the document, and lets a separate program adapt that structure to the desired output.

+

More or less, what I want from markup is to convert a text string into a document tree:

+ +
+ + +
enum Element {
+  Text(String),
+  Node {
+    tag: String,
+    attributes: Map<String, String>,
+    children: Vec<Element>,
+  }
+}
+
+fn parse_markup(input: &str) -> Element { ... }
+ +
+

The markup language which nails this perfectly is HTML. +It directly expresses this tree structure. +Various viewers for HTML can then render the document in a particular fashion. +HTML's syntax itself doesn't really care about tag names and semantics: you can imagine authoring HTML documents using an alternative set of tag names.

+

The markup language which completely falls over this is Markdown. +There's no way to express generic tree structure, and conversion to HTML with specific browser tags is hard-coded.

+

Language which does this half-good is AsciiDoctor.

+

In AsciiDoctor, it is possible to express genuine nesting. +Here's a bunch of nested blocks with some inline content and attributes:

+ +
+ + +
====
+Here are your options:
+
+.Red Pill
+[%collapsible]
+======
+Escape into the real world.
+======
+
+.Blue Pill
+[%collapsible]
+======
+Live within the simulated reality without want or fear.
+======
+
+====
+ +
+

The problem with AsciiDoctor is that generic blocks come off as a bit of an implementation detail, not as a foundation. +It is difficult to untangle presentation-specific semantics of particular blocks (examples, admonitions, etc) from the generic document structure. +As a fun consequence, a semantic-neutral block (equivalent of a </div>) is the only kind of block which can't actually nest in AsciiDoctor, due to syntactic ambiguity.

+ +
+
+ +

+ Concrete Syntax +

+

Syntax matters. +For lightweight text markup languages, syntax is of utmost importance.

+

The only right way to spell a list is

+ +
+ + +
- Foo
+- Bar
+- Baz
+ +
+

Not

+ +
+ + +
<ul>
+    <li>Foo</li>
+    <li>Bar</li>
+    <li>Baz</li>
+</ul>
+ +
+

And most definitely not

+ +
+ + +
\begin{itemize}
+    \item foo
+    \item Bar
+    \item Baz
+\end{itemize}
+ +
+

Similarly, you lose if you spell links like this:

+ +
+ + +
`My Blog <https://matklad.github.io>`_
+ +
+

Markdown is the trailblazer here, it picked a lot of great concrete syntaxes. +Though, some choices are questionable, like the trailing double space rule, or the syntax for including images.

+

AsciiDoctor is the treasure trove of tasteful syntactic decisions.

+
+ +

+ Inline Formatting +

+

For example *bold* is bold, _italics_ is italics, and repeating the emphasis symbol twice (__like *this*__) allows for unambiguous nesting.

+
+ +
+ +

+ Lists +

+

Another tasteful decision is numbered lists, which use . to avoid tedious renumbering:

+
+ +
+ + +
[lowerroman]
+. One
+. Two
+. Three
+ +
+
    +
  1. +One +
  2. +Two +
  3. +Three +
+
+
+
+ +

+ Tables +

+

And AsciiDoctor also has a reasonable-ish syntax for tables, with one line per cell and a blank line to delimit rows.

+
+ +
+ + +
[cols="1,1"]
+|===
+|First
+|Row
+
+|X
+|Y
+
+|Last
+|Row
+|===
+ +
+ + + + + + + + + + + + + +
First | Row
X | Y
Last | Row
+
+
+ +
+
+
+ +

+ Composable Processing +

+

To convert our nice, sweet syntax to a general tree and then into the final output, we need some kind of a tool. +One way to do that is by direct translation from our source document to, e.g., html.

+

Such one-step translation is convenient for all-inclusive tools, but is a barrier for extensibility. +Amusingly, AsciiDoctor is both a positive and a negative example here.

+

On the negative side of things, classical AsciiDoctor is an extensible Ruby processor. +To extend it, you essentially write a compiler plugin: a bit of Ruby code which gets hooked into the main processor and gets invoked as a callback when certain tags are parsed. +This plugin interacts with the Ruby API of the processor itself, and is tied to a particular toolchain.

+

In contrast, asciidoctor-web-pdf, a newer thing (which nonetheless uses the same Ruby core), approaches the task a bit differently. +There's no API to extend the processor itself. +Rather, the processor produces an abstract document tree, and then a user-supplied JavaScript function can convert that piece of data into whatever html it needs, by following a lightweight visitor pattern. +I think this is the key to a rich ecosystem: strictly separate converting input text to an abstract document model from rendering the model through some template. +The two parts could be done by two separate processes which exchange serialized data. +It's even possible to imagine some canonical JSON encoding of the parsed document.
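To make the "canonical JSON encoding" idea a bit more tangible, here is a minimal sketch of how the document tree from the beginning of the post could be serialized. The JSON shape (`tag`, `attributes`, `children` keys, plain strings for text), the choice of `BTreeMap` for `Map`, and the helper names are all assumptions made for illustration, not an existing format.

```rust
use std::collections::BTreeMap;

// The document tree from earlier in the post. `BTreeMap` is an assumption:
// it keeps attributes in a deterministic order, which is handy for a
// "canonical" encoding.
enum Element {
    Text(String),
    Node {
        tag: String,
        attributes: BTreeMap<String, String>,
        children: Vec<Element>,
    },
}

// Appends `s` to `out` as a JSON string literal, escaping as needed.
fn write_json_string(out: &mut String, s: &str) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => {
                out.push_str(&format!("\\u{:04x}", c as u32))
            }
            c => out.push(c),
        }
    }
    out.push('"');
}

// Hypothetical encoding: text becomes a JSON string, a node becomes
// {"tag": ..., "attributes": {...}, "children": [...]}.
fn to_json(element: &Element, out: &mut String) {
    match element {
        Element::Text(text) => write_json_string(out, text),
        Element::Node { tag, attributes, children } => {
            out.push_str("{\"tag\":");
            write_json_string(out, tag);
            out.push_str(",\"attributes\":{");
            for (i, (key, value)) in attributes.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_json_string(out, key);
                out.push(':');
                write_json_string(out, value);
            }
            out.push_str("},\"children\":[");
            for (i, child) in children.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                to_json(child, out);
            }
            out.push_str("]}");
        }
    }
}
```

A parser process could print this string to stdout, and any renderer, in any language, could pick it up from there. Note that the sketch recurses over the tree, so it assumes documents of modest nesting depth.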

+

There's one more behavior where the all-inclusive approach of AsciiDoctor gets in the way of doing the right thing. +AsciiDoctor supports includes, and they are textual, preprocessor includes, meaning that syntax of the included file affects what follows afterwards. +A much cleaner solution would have been to keep includes in the document tree as distinct nodes (with the path to the included file as an attribute), and leave it to the output layer to interpret those as either verbatim text, or subdocuments.

+

Another aspect of composability is that the parsing part of the processing should have, at minimum, a lightweight, embeddable implementation. +Ideally, of course, there's a spec and an array of implementations to choose from.

+

Markdown fares fairly well here: there never was a shortage of implementations, and today we even have a bunch of different specs!

+

AsciiDoctor… +Well, I am amazed. +The original implementation of AsciiDoc was in Python. +AsciiDoctor, the current tool, is in Ruby. +Neither is too embeddable. +But! AsciiDoctor folks are crazy, they compiled Ruby to JavaScript (and Java), and so the toolchain is available on JVM and Node. +At least for Node, I can confidently say that that's a real production-ready thing which is quite convenient to use! +Still, I'd prefer a Rust library or a small WebAssembly blob instead.

+

A different aspect of composability is extensibility. +In Markdown land, the usual answer for when Markdown doesn't quite do everything needed (i.e., in 90% of cases) is to extend concrete syntax. +This is quite unfortunate, changing syntax is hard. +A much better avenue I think is to take advantage of the generic tree structure, and extend the output layer instead. +Tree-with-attributes should be enough to express whatever structure is needed, and then it's up to the converter to pattern-match this structure and emit its special thing.
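As a rough sketch of what "extending the output layer" could look like in code: a generic renderer walks the tree and emits HTML, and a single extra match arm handles a custom node. Everything here is an assumption for illustration: the `spoiler` tag, its `title` attribute, the `<details>` output, and the use of `BTreeMap` for attributes. The concrete syntax and the parser stay untouched.

```rust
use std::collections::BTreeMap;

enum Element {
    Text(String),
    Node {
        tag: String,
        attributes: BTreeMap<String, String>,
        children: Vec<Element>,
    },
}

// Minimal HTML escaping for text and attribute values.
fn escape(text: &str) -> String {
    text.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
}

fn render(element: &Element, out: &mut String) {
    match element {
        Element::Text(text) => out.push_str(&escape(text)),
        // The "extension": a hypothetical `spoiler` node is pattern-matched
        // in the converter and rendered as <details>.
        Element::Node { tag, attributes, children }
            if tag.as_str() == "spoiler" =>
        {
            let title = attributes
                .get("title")
                .map(String::as_str)
                .unwrap_or("Spoiler");
            out.push_str("<details><summary>");
            out.push_str(&escape(title));
            out.push_str("</summary>");
            for child in children {
                render(child, out);
            }
            out.push_str("</details>");
        }
        // Generic fallback: emit the node as-is (tag and attribute names
        // are assumed to be well-formed).
        Element::Node { tag, attributes, children } => {
            out.push('<');
            out.push_str(tag);
            for (key, value) in attributes {
                out.push(' ');
                out.push_str(key);
                out.push_str("=\"");
                out.push_str(&escape(value));
                out.push('"');
            }
            out.push('>');
            for child in children {
                render(child, out);
            }
            out.push_str("</");
            out.push_str(tag);
            out.push('>');
        }
    }
}
```

The point of the design is that adding another special case means adding another match arm in the converter, while the parser and the syntax stay exactly the same.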

+

Do you remember the fancy two-column rendering above with source-code on the left, and rendered document on the right? +This is how I've done it:

+ +
+ + +
[.two-col]
+--
+```
+[lowerroman]
+. One
+. Two
+. Three
+```
+
+[lowerroman]
+. One
+. Two
+. Three
+--
+ +
+

That is, a generic block, with .two-col attribute and two children: a listing block and a list. +Then there's a separate css which assigns an appropriate flexbox layout for .two-col elements. +There's no need for a special two column layout extension. +It would be perhaps nice to have a dedicated syntax here, but just re-using the generic -- block is quite ok!

+ +
+
+ +

+ Where Do We Stand Now? +

+

Not quite there, I would think! +AsciiDoctor at least half-ticks quite a few of the checkboxes, but it is still not perfect.

+

There is a specification in progress, I have high hopes that it'll spur alternative implementations (and most of AsciiDoctor problems are implementation issues). +At the same time, I am not overly-optimistic. +The overriding goal for AsciiDoctor is compatibility, and rightfully so. +There's a lot of content already written, and I would hate to migrate this blog, for example :)

+

At the same time, there are quite a few rough edges in AsciiDoctor:

+
    +
  • +includes +
  • +non-nestable generic blocks +
  • +many ways to do certain things (AsciiDoctor essentially supports the union of Markdown and AsciiDoc concrete syntaxes) +
  • +lack of some concrete sugar (reference-style links are notably better in Markdown) +
+

It feels like there's a smaller, simpler language somewhere (no, I will not link that xkcd for once (though xkcd:927[] would be a nice use of AsciiDoctor extensibility))

+

On the positive side of things, it seems that in the recent years we built a lot of infrastructure to make these kinds of projects more feasible.

+

Rust is just about the perfect language to take a String from a user and parse it into some sort of a tree, while packaging the whole thing into a self-contained zero-dependency, highly +embeddable, reliable, and reusable library.

+

WebAssembly greatly extends reusability of low-level libraries: between a static library with a C ABI, and a .wasm module, you got all important platforms covered.

+

True extensibility fundamentally requires taking code as input data. +A converter from a great markup language to HTML should accept some user-written script file as an argument, to do fine tweaking of the conversion process. +WebAssembly can be a part of the solution, it is a toolchain-neutral way of expressing computation. +But we have something even more appropriate. +Deno with its friendly scripting language with nice template literals and a capabilities based security model, is just about the perfect runtime to implement a static site generator which takes a bunch of input documents, a custom conversion script, and outputs a bunch of HTML files.

+

If I didn't have anything else to do, I'd certainly be writing my own lightweight markup language today!

+
+
+
+ + + + + diff --git a/2022/11/05/accessibility-px-or-rem.html b/2022/11/05/accessibility-px-or-rem.html new file mode 100644 index 00000000..b583970f --- /dev/null +++ b/2022/11/05/accessibility-px-or-rem.html @@ -0,0 +1,252 @@ + + + + + + + Accessibility: px or rem? + + + + + + + + + + + + +
+ +
+ +
+
+ +

Accessibility: px or rem?

+

The genre of this post is: I am having opinions on something I am not an expert at, so hopefully the Internet would correct me.

+

The specific question in question is:

+ +
+

Should you use px or rem units in your CSS?

+
+ +
+

I am not a web developer, but I do have a blog where I write CSS myself, and I very much want to do the right thing. +I was researching and agonizing over this question for years, as I wasn't able to find a conclusive argument one way or another. +So I am writing one.

+

This isn't ideal, but I am lazy, so this post assumes that you already did the research and understand the mechanics of and the difference between px, em, and rem. +And so, your position is probably:

+ +
+

Of course rem, because that honors the user's setting for the font-size, and so is more accessible, although

+
+ +
+

Although there are buts:

+

But the default font-size is 16px, and that's just too small. +If you just roll with intended defaults, then the text will be painful to read even for folks with great vision!

+

But a default font-size of x pixels just doesn't make sense: the actual perceived font size very much depends on the font itself. +At 16px, some fonts will be small, some tiny, and some maybe even just about right.

+

But the recommended way to actually use rem boils down to setting a percentage font-size for the root element, such that 1rem is not the intended font size of the root element, but is equal to 1px (under default settings). +Which, at this point, sounds like using pixels, just with more steps? +After all, the modern browsers can zoom the pixels just fine?

+

So, yeah, lingering doubts… +If you are like me, you painstakingly used rems everywhere, and then html { font-size: 22px } because default is unusable, and percentage of default is stupidly ugly :-)

+
+

So let's settle the question then.

+

The practical data we want is: what do users actually do in practice? +Do they zoom or do they change the default font size? +I spent 10 minutes googling that and didn't find the answer.

+

After that, I decided to just check how it actually works. +So, I opened the browser's settings, cranked the font size to the max, and opened Google.

+

To be honest, that was the moment where the question was mentally settled for me. +If Google's search page doesn't respect the user agent's default font-size, it's indirect, but also very strong, evidence that that's not a meaningful thing to do.

+

The result of my ad-hoc survey:

+
+
+
Don't care:
+
+
    +
  • +Google +
  • +Lobsters +
  • +Hackernews +
  • +Substack +
  • +antirez.com +
  • +tonsky.me +
  • +New Reddit +
+
+
+


+

+
+
Embiggen:
+
+
    +
  • +Wikipedia +
  • +Discourse +
  • +Old Reddit +
+
+
+
+

Google versus Wikipedia it is, eh? +But this is actually quite informative: if you adjust your browser's default font-size, you are in an Alice in Wonderland version of the web which alternates between too large and too small.

+

The next useful question is: what about mobile? +After some testing and googling, it seems that changing the browser's default font-size is just not possible on the iPhone? +That the only option is page zoom?

+

Again, I don't actually have the data on whether users rely on zoom or on font size. +But so far it looks like the user doesn't really have a choice? +Only zoom seems to actually work in practice?

+

The final bit of evidence which completely settled the question in my mind comes from this post:

+

https://www.craigabbott.co.uk/blog/accessibility-and-font-sizes

+

It tells us that

+ +
+

Using the wrong units of measurement in your Cascading Style Sheets (CSS) is a +big barrier for many visually impaired users, and it can cause your website fail +the Web Content Accessibility Guidelines (WCAG) 2.1 on +1.4.4 Resize text.

+
+ +
+

That WCAG document is really worth the read:

+ +
+

The scaling of content is primarily a user agent responsibility. User agents +that satisfy UAAG 1.0 Checkpoint 4.1 allow users to configure text scale. The +author's responsibility is to create Web content that does not prevent the +user agent from scaling the content effectively. Authors may satisfy this +Success Criterion by verifying that content does not interfere with user agent +support for resizing text, including text-based controls, or by providing direct +support for resizing text or changing the layout. An example of direct support +might be via server-side script that can be used to assign different style +sheets.

+

The author cannot rely on the user agent to satisfy this Success Criterion +for HTML content if users do not have access to a user agent with zoom support. +For example, if they work in an environment that requires them to use IE 6.

+

If the author is using a technology whose user agents do not provide zoom +support, the author is responsible to provide this type of functionality +directly or to provide content that works with the type of functionality +provided by the user agent. If the user agent doesn't provide zoom functionality +but does let the user change the text size, the author is responsible for +ensuring that the content remains usable when the text is resized.

+
+ +
+

My reading of the above text: it's on me, as an author, to ensure that my readers can scale the content using whatever method their user agent employs. +If the UA can zoom, that's perfect, we are done.

+

If the reader's actual UA can't zoom, but it can change default font size (eg, IE 6), then I need to support that.

+

That's most reasonable I guess? +Just make sure that your actual users, in their actual use, can read stuff. +And I am pretty sure my target audience doesn't use IE 6, which I don't support anyway.

+

TL;DR for the whole post:

+

Use pixels. +The goal is not to check the "I suffered pain to make my website accessible" checkbox, the goal is to make the site accessible to real users. +There's an explicit guideline about that. +There's strong evidence that, barring highly unusual circumstances, real users zoom, and pixels zoom just fine.

+
+

As a nice bonus, if you don't use rem, you make the browser's font size setting more useful, because it can control the scale of the browser's own chrome (which is fixed) independently from the scale of websites (which vary).

+
+
+ + + + + diff --git a/2022/11/18/if-a-tree-falls-in-a-forest-does-it-overflow-the-stack.html b/2022/11/18/if-a-tree-falls-in-a-forest-does-it-overflow-the-stack.html new file mode 100644 index 00000000..d4377486 --- /dev/null +++ b/2022/11/18/if-a-tree-falls-in-a-forest-does-it-overflow-the-stack.html @@ -0,0 +1,330 @@ + + + + + + + If a Tree Falls in a Forest, Does It Overflow the Stack? + + + + + + + + + + + + +
+ +
+ +
+
+ +

If a Tree Falls in a Forest, Does It Overflow the Stack?

+

A well-known pitfall when implementing a linked list in Rust is that the default recursive drop implementation causes a stack overflow for long lists. +A similar problem exists for tree data structures as well. +This post describes a couple of possible solutions for trees. +This is a rather esoteric problem, so the article is denser than is appropriate for a tutorial.

+

Let's start with our beloved linked list:

+ +
+ + +
struct Node<T> {
+  value: T,
+  next: Option<Box<Node<T>>>,
+}
+
+impl<T> Node<T> {
+  fn new(value: T) -> Node<T> {
+    Node { value, next: None }
+  }
+  fn with_next(mut self, next: Node<T>) -> Node<T> {
+    self.next = Some(Box::new(next));
+    self
+  }
+}
+ +
+

It's easy to cause this code to crash:

+ +
+ + +
#[test]
+fn stack_overflow() {
+  let mut node = Node::new(0);
+  for _ in 0..100_000 {
+    node = Node::new(0).with_next(node);
+  }
+  drop(node) // boom
+}
+ +
+

The crash happens in the automatically generated recursive drop function. +The fix is to write drop manually, in a non-recursive way:

+ +
+ + +
impl<T> Drop for Node<T> {
+  fn drop(&mut self) {
+    while let Some(next) = self.next.take() {
+      *self = *next;
+    }
+  }
+}
+ +
+

What about trees?

+ +
+ + +
struct Node<T> {
+  value: T,
+  left: Option<Box<Node<T>>>,
+  right: Option<Box<Node<T>>>,
+}
+ +
+

If the tree is guaranteed to be balanced, the automatically generated drop is actually fine, because the height of the tree will be logarithmic. +If the tree is unbalanced though, the same stack overflow might happen.

+

Let's write an iterative Drop to fix this. +The problem though is that the "swap with self" trick we used for the list doesn't work, as we have two children to recur into. +The standard solution would be to replace the call stack with an explicit vector of work items:

+ +
+ + +
impl<T> Drop for Node<T> {
+  fn drop(&mut self) {
+    let mut work = Vec::new();
+    work.extend(self.left.take());
+    work.extend(self.right.take());
+    while let Some(mut node) = work.pop() {
+      work.extend(node.left.take());
+      work.extend(node.right.take());
+    }
+  }
+}
+ +
+
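Here is a self-contained sketch that exercises the work-list version on a badly unbalanced tree. The `left_spine` and `count` helpers are hypothetical and exist only to build and inspect a deep tree without recursion.

```rust
struct Node<T> {
    value: T,
    left: Option<Box<Node<T>>>,
    right: Option<Box<Node<T>>>,
}

impl<T> Drop for Node<T> {
    fn drop(&mut self) {
        // `extend` with an `Option` pushes zero or one items.
        let mut work = Vec::new();
        work.extend(self.left.take());
        work.extend(self.right.take());
        while let Some(mut node) = work.pop() {
            work.extend(node.left.take());
            work.extend(node.right.take());
            // `node` is dropped here with both children detached,
            // so its own `drop` has nothing left to do.
        }
    }
}

fn leaf(value: usize) -> Option<Box<Node<usize>>> {
    Some(Box::new(Node { value, left: None, right: None }))
}

// Builds a degenerate tree: a left spine of `depth` nodes, where every
// spine node except the bottom one also has a single right leaf.
fn left_spine(depth: usize) -> Node<usize> {
    let mut root = Node { value: 0, left: None, right: None };
    for value in 1..depth {
        root = Node {
            value,
            left: Some(Box::new(root)),
            right: leaf(value),
        };
    }
    root
}

// Counts nodes with an explicit stack of references.
fn count<T>(root: &Node<T>) -> usize {
    let mut work = vec![root];
    let mut total = 0;
    while let Some(node) = work.pop() {
        total += 1;
        work.extend(node.left.as_deref());
        work.extend(node.right.as_deref());
    }
    total
}
```

A tree built with `left_spine(100_000)` has 199,999 nodes and the same depth as the list that crashed earlier, yet dropping it uses only a constant amount of call stack, at the price of the heap-allocated `work` vector.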

This works, but also makes my internal C programmer scream: we allocate a vector to free memory! +Can we do better?

+

One approach would be to build on the balanced trees observation. +If we recur into the shorter branch, and iteratively drop the longer one, we should be fine:

+ +
+ + +
impl<T> Drop for Node<T> {
+  fn drop(&mut self) {
+    loop {
+      match (self.left.take(), self.right.take()) {
+        (None, None) => break,
+        (None, Some(it)) | (Some(it), None) => *self = *it,
+        (Some(left), Some(right)) => {
+          *self =
+            *if left.depth > right.depth { left } else { right }
+        }
+      }
+    }
+  }
+}
+ +
+

This requires maintaining the depths though. +Can we make do without? +My C instinct (not that I wrote any substantial amount of C though) would be to go down the tree, and stash the parent links into the nodes themselves. +And we actually can do something like that:

+ +

Here's how a single rotation could look:

+ +
+ + +
+

Or, in code,

+ +
+ + +
impl<T> Drop for Node<T> {
+  fn drop(&mut self) {
+    loop {
+      match (self.left.take(), self.right.take()) {
+        (None, None) => break,
+        (None, Some(it)) | (Some(it), None) => *self = *it,
+        (Some(mut left), Some(right)) => {
+          mem::swap(self, &mut *left);
+          left.left = self.right.take();
+          left.right = Some(right);
+          self.right = Some(left);
+        }
+      }
+    }
+  }
+}
+ +
+

Ok, what if we have an n-ary tree?

+ +
+ + +
struct Node<T> {
+  value: T,
+  children: Vec<Node<T>>,
+}
+ +
+

I think the same approach works: we can treat the first child as left, and the last child as right, and do essentially the same rotations. +Though, we will rotate in the other direction (as removing the right child is cheaper), and we'll also check that we have at least two grandchildren (to avoid allocation when pushing to an empty vector).

+

Which gives something like this:

+ +
+ + +
impl<T> Drop for Node<T> {
+  fn drop(&mut self) {
+    loop {
+      let Some(mut right) = self.children.pop() else {
+        break;
+      };
+      if self.children.is_empty() {
+        *self = right;
+        continue;
+      }
+      if right.children.len() < 2 {
+        self.children.extend(right.children.drain(..));
+        continue;
+      }
+      // Non trivial case:
+      //   >= 2 children,
+      //   >= 2 grandchildren.
+      let mut me = mem::replace(self, right);
+      mem::swap(&mut self.children[0], &mut me);
+      // Doesn't allocate, this is the same slot
+      // we popped from at the start of the loop.
+      self.children[0].children.push(me);
+    }
+  }
+}
+ +
+

I am not sure this works, and I am not sure this works in linear time, but I am fairly certain that something like this could be made to work if need be.

+

Though, practically, if something like this is a concern, you probably want to re-design the tree structure to be something like this instead:

+ +
+ + +
struct Node<T> {
+  value: T,
+  children: Range<usize>,
+}
+
+struct Tree<T> {
+  nodes: Vec<Node<T>>,
+}
+ +
+

Update(2023-11-03): Apparently, dropping trees iteratively was trendy in Dec. 2022! The same +idea was described by Ismail in this great post:

+

https://ismailmaj.github.io/destructing-trees-safely-and-cheaply

+
+
+ + + + + diff --git a/2022/12/31/raytracer-construction-kit.html b/2022/12/31/raytracer-construction-kit.html new file mode 100644 index 00000000..3a8279c1 --- /dev/null +++ b/2022/12/31/raytracer-construction-kit.html @@ -0,0 +1,558 @@ + + + + + + + Ray Tracer Construction Kit + + + + + + + + + + + + +
+ +
+ +
+
+ +

Ray Tracer Construction Kit

+

Ray or path tracing is an algorithm for getting a 2D picture out of a 3D virtual scene, by simulating a trajectory of a particle of light which hits the camera. +It's one of the fundamental techniques of computer graphics, but that's not why it is the topic for today's blog post. +Implementing a toy ray tracer is one of the best exercises for learning a particular programming language (and a great deal about software architecture in general as well), and that's the "why?" for this text. +My goal here is to teach you to learn new programming languages better, by giving a particularly good exercise for that.

+

But first, some background

+
+ +

+ Background +

+

Learning a programming language consists of learning the theory (knowledge) and the set of tricks to actually make the computer do things (skills). +For me, the best way to learn skills is to practice them. +A ray tracer is an exceptionally good practice dummy, because:

+
    +
  • +It is a project of an appropriate scale: a couple of weekends. +
  • +It is a project with a flexible scale: if you get carried away, you can sink a lot of weekends before you hit diminishing returns on effort. +
  • +A ray tracer can make use of a lot of aspects of the language: modules, static and runtime polymorphism, parallelism, operator overloading, IO, string parsing, performance optimization, custom data structures. +Really, I think the project leaves only a couple of big things untouched, namely networking and evented programming. +
  • +It is a very visual and feedback-friendly project: a bug is not some constraint violation deep in the guts of the database, it's a picture upside-down! +
+

I want to stress once again that here I view the ray tracer as a learning exercise. +We aren't going to draw any beautiful photorealistic pictures here, we'll settle for ugly things with artifacts.

+

Eg, this beauty is the final result of my last exercise:

+ +
+ + +
+

And, to maximize learning, I think it's better to do everything yourself from scratch. +A crappy teapot which you did from first principles is full to the brim with knowledge, while a beautiful landscape which you got by following step-by-step instructions is hollow.

+

And that's the gist of the post: I'll try to teach you as little about ray tracing as possible, to give you just enough clues to get some pixels to the screen. +To be more poetic, you'll draw the rest of the proverbial owl.

+

This is in contrast to Ray Tracing in One Weekend which does a splendid job teaching ray tracing, but contains way too many spoilers if you want to learn software architecture (rather than graphics programming). +In particular, it contains snippets of code. +We won't see that here. As a corollary, all the code you'll write is fully your invention!

+

Sadly, there's one caveat to the plan: as the fundamental task is tracing a ray as it gets reflected through the 3D scene, we'll need a hefty amount of math. +Not an insurmountable amount: everything is going to be pretty visual and logical. +But still, we'll need some of the more advanced stuff, such as vectors and cross product.

+

If you are very comfortable with that, you can approach the math parts the same way as the programming parts: grab a pencil and a stack of paper and try to work out formulas yourself. +If solving math puzzlers is not your cup of tea, feel absolutely free to just look up formulas online. +https://avikdas.com/build-your-own-raytracer is a great resource for that. +If, however, linear algebra is your worst nightmare, you might want to look for a more step-by-step tutorial (or maybe pick a different problem altogether! Another good exercise is a small chat server, for example).

+
+
+ +

+ Algorithm Overview +

+

So, what exactly is ray tracing? +Imagine a 3D scene with different kinds of objects: an infinite plane, a sphere, a bunch of small triangles which resemble a teapot from afar. +The scene is illuminated by some distant light source, and so objects cast shadows and reflect each other. +We observe the scene from a particular view point. +Roughly, a ray of light is emitted by a light source, bounces off scene objects and eventually, if it gets into our eye, we perceive a sensation of color, which is mixed from the light's original color, as well as the colors of all the objects the ray reflected from.

+

Now, we are going to crudely simplify the picture. +Rather than casting rays from the light source, we'll cast rays from the point of view. +Whatever is intersected by the ray will be painted as a pixel in the resulting image.

+

Let's do this step-by-step

+
+
+ +

+ Images +

+

The ultimate result of our ray tracer is an image. +A straightforward way to represent an image is to use a 2D grid of pixels, where each pixel is a (red, green, blue) triple where color values vary from 0 to 255. +How do we display the image? +One can reach out for graphics libraries like OpenGL, or image formats like BMP or PNG.

+

But, in the spirit of simplifying the problem so that we can do everything ourselves, we will simplify the problem! +As a first step, we'll display the image as text in the terminal. +That is, we'll print . for white pixels and x for black pixels.

+

So, as the very first step, let's write some code to display such an image by just printing it. +A good example image would be 64 pixels wide by 48 pixels tall, with a 5 pixel large circle in the center. +And here's the first encounter with math: to do this, we want to iterate over all (x, y) pixels and fill them if they are inside the circle. +It's useful to recall the equation for a circle at the origin: x^2 + y^2 = r^2 where r is the radius.

+

🎉 we got hello-world working! +Now, let's go for more image-y images. +We can roll our own real format like BMP (I think that one is comparatively simple), but there's a cheat code here. +There are text-based image formats! +In particular, PPM is an especially convenient one. +The Wikipedia article should be enough to write our own impl. +I suggest using the P3 variation, but P6 is also nice if you want something less offensively inefficient.

+

So, rewrite your image outputting code to produce a .ppm file, and also make sure that you have an image viewer that can actually display it. +Spend some time viewing your circle in its colorful glory (can you color it with a gradient?).

+

If you made it this far, I think you understand the spirit of the exercise: you've just implemented an encoder for a real image format, using nothing but a Wikipedia article. +It might not be the fastest encoder out there, but it's the thing you did yourself. +You probably want to encapsulate it in a module or something, and do a nice API over it. +Go for it! Experiment with various abstractions in the language.

+ +
+
+ +

+ One Giant Leap Into 3D +

+

Now that we can display stuff, let's do an absolutely basic ray tracer. +We'll use a very simple scene: just a single sphere with the camera looking directly at it. +And we'll use a trivial ray tracing algorithm: shoot the ray from the camera, if it hits the sphere, paint black, else, paint white. +If you do this as a mental experiment, you'll realize that the end result is going to be exactly what we've got so far: a picture with a circle in it. +Except now, it's going to be in 3D!

+

This is going to be the most annoying part, as there are a lot of fiddly details to get this right, while the result is, ahem, underwhelming. +Let's do this though.

+

First, the sphere. +For simplicity, let's assume that its center is at the origin, and it has radius 5, and so its equation is

+ +
+ + +
x^2 + y^2 + z^2 = 25
+ +
+

Or, in vector form:

+ +
+ + +
v̅ ⋅ v̅ = 25
+ +
+

Here, v̅ is a point on a sphere (an (x, y, z) vector) and ⋅ is the dot product. +As a bit of foreshadowing, if you are brave enough to take a stab at deriving various formulas, keeping to vector notation might be simpler.

+

Now, let's place the camera. +It is convenient to orient axes such that Y points up, X points to the right, and Z points at the viewer (ie, Z is depth). +So let's say that the camera is at (0, 0, -20) and it looks at (0, 0, 0) (so, directly at the sphere's center).

+

Now, the fiddly bit. +It's somewhat obvious how to cast a ray from the camera. If the camera's position is C̅, and we cast the ray in the direction d̅, then the equation of points on the ray is

+ +
+ + +
C̅ + t d̅
+ +
+

where t is a scalar parameter. +Or, in the cartesian form,

+ +
+ + +
(0 + t dx, 0 + t dy, -20 + t dz)
+ +
+

where (dx, dy, dz) is the direction vector for a particular ray. +For example, for a ray which goes straight to the center of the sphere, that would be (0, 0, 1).

+

What is not obvious is how do we pick direction d? +We'll figure that out later. +For now, assume that we have some magical box, which, given (x, y) position of the pixel in the image, gives us the (dx, dy, dz) of the corresponding ray. +With that, we can use the following algorithm:

+

Iterate through all (x, y) pixels of our 64x48 image. +From the (x, y) of each pixel, compute the corresponding ray's (dx, dy, dz). +Check if the ray intersects the sphere. +If it does, paint the (x, y) pixel black.

+

To check for intersection, we can plug the ray equation, C̅ + t d̅, into the sphere equation, v̅ ⋅ v̅ = r^2. +That is, we can substitute C̅ + t d̅ for v̅. +As C̅, d̅ and r are specific numbers, the resulting equation would have only a single variable, t, and we could solve for that. +For details, either apply pencil and paper, or look up ray sphere intersection.
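If you'd rather check your pencil-and-paper result against code, here is a minimal sketch of that substitution for a sphere centered at the origin. The names `V3`, `dot` and `ray_sphere` are made up for this example, and the direction is assumed to be non-zero. Expanding (C̅ + t d̅) ⋅ (C̅ + t d̅) = r^2 gives a quadratic in t, and we keep the nearest solution in front of the camera.

```rust
/// A 3D vector as a plain array; a dedicated type would be nicer in real code.
type V3 = [f64; 3];

fn dot(a: V3, b: V3) -> f64 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Returns the smallest non-negative `t` at which the ray `c + t * d`
/// hits a sphere of radius `r` centered at the origin, if there is one.
/// Assumes `d` is not the zero vector.
fn ray_sphere(c: V3, d: V3, r: f64) -> Option<f64> {
    // Substituting the ray into v·v = r^2 gives a quadratic in t:
    //   (d·d) t^2 + 2 (c·d) t + (c·c - r^2) = 0
    let a = dot(d, d);
    let b = 2.0 * dot(c, d);
    let k = dot(c, c) - r * r;
    let disc = b * b - 4.0 * a * k;
    if disc < 0.0 {
        return None; // no real roots: the ray misses the sphere
    }
    let sqrt_disc = disc.sqrt();
    let t1 = (-b - sqrt_disc) / (2.0 * a);
    let t2 = (-b + sqrt_disc) / (2.0 * a);
    // t1 <= t2, so prefer t1 unless it lies behind the camera.
    if t1 >= 0.0 {
        Some(t1)
    } else if t2 >= 0.0 {
        Some(t2)
    } else {
        None
    }
}
```

For the camera at (0, 0, -20) looking straight along Z, the ray should hit the front of the radius-5 sphere at t = 15, which is a handy sanity check.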

+

But how do we find d̅ for each pixel? +To do that, we actually need to add the screen to the scene. +Our image is a 64x48 rectangle. +So let's place that between the camera and the sphere.

+

We have the camera at (0, 0, -20), our rectangular screen at, say, (0, 0, -10) and a sphere at (0, 0, 0). +Now, each pixel in our 2D image has a corresponding point in our 3D scene, and we'll cast the ray from the camera's position through this point.

+

The full list of parameters to define the scene is:

+ +
+ + +
sphere center:   0 0 0
+sphere radius:   5
+camera position: 0 0 -20
+camera up:       0 1 0
+camera right:    1 0 0
+focal distance:  10
+screen width:    64
+screen height:   48
+ +
+

Focal distance is the distance from the camera to the screen. +If we know the direction camera is looking along and the focal distance, we can calculate the position of the center of the screen, but that's not enough. +The screen can rotate, as we didn't fix which side is up, so we need an extra parameter for that. +We also add a parameter for direction to the right for convenience, though it's possible to derive right from up and forward directions.

+

Given this set of parameters, how do we calculate the ray corresponding to, say, the (10, 20) pixel? +Well, I'll leave that up to you, but one hint I'll give is that you can calculate the middle of the screen (camera position + view direction × focal distance). +If you have the middle of the screen, you can get to the (x, y) pixel by stepping x steps right (and we know right!) and y steps up (and we know up!). +Once we know the coordinates of the point of the screen through which the ray shoots, we can compute the ray's direction as the difference between that point and the camera's origin.

+

Again, this is super fiddly and frustrating! +My suggestion would be:

+
    +
  • +Draw some illustrations to understand relation between camera, screen, sphere, and rays. +
  • +
  • +Try to write the code which, given (x, y) position of the pixel in the image, gives (dx, dy, dz) coordinates of the direction of the ray from the camera through the pixel. +
  • +
  • +If that doesn't work, look up the solution, https://avikdas.com/build-your-own-raytracer/01-casting-rays/project.html describes one way to do it! +
  • +
+
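Spoiler warning: if you want to solve this yourself, skip the snippet. What follows is one possible sketch of the "magical box", not the canonical answer. It assumes that one pixel is one world unit wide, that pixel (0, 0) is the top-left corner of the image, and that rays go through pixel centers. The `Camera` struct, its field names and `ray_direction` are all invented for this example.

```rust
type V3 = [f64; 3];

fn add(a: V3, b: V3) -> V3 {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2]]
}

fn sub(a: V3, b: V3) -> V3 {
    [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
}

fn scale(a: V3, k: f64) -> V3 {
    [a[0] * k, a[1] * k, a[2] * k]
}

/// Hypothetical container for the scene parameters related to the camera.
struct Camera {
    position: V3,
    forward: V3, // unit vector the camera looks along
    up: V3,      // unit vector
    right: V3,   // unit vector
    focal_distance: f64,
    width: u32,
    height: u32,
}

impl Camera {
    /// Direction (not normalized) of the ray through the center of pixel (x, y).
    fn ray_direction(&self, x: u32, y: u32) -> V3 {
        // Middle of the screen: camera position + view direction * focal distance.
        let center = add(self.position, scale(self.forward, self.focal_distance));
        // Offsets from the middle of the screen. Image rows grow downwards,
        // while `up` points upwards, hence the flipped sign for y.
        let dx = x as f64 + 0.5 - self.width as f64 / 2.0;
        let dy = self.height as f64 / 2.0 - (y as f64 + 0.5);
        let point = add(center, add(scale(self.right, dx), scale(self.up, dy)));
        sub(point, self.position)
    }
}
```

With the parameters from the list above, the pixel just below and to the right of the image center, (32, 24), gets the direction (0.5, -0.5, 10), which is almost straight ahead, as expected.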

Coding wise, we obviously want to introduce some machinery here. +The basic unit we need is a 3D vector: a triple of real numbers (x, y, z). +It should support all the expected operations: addition, subtraction, multiplication by scalar, dot product, etc. +If your language supports operator overloading, you might look that up now. +Is it a good idea to overload operator for dot product? +You won't know unless you try!
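As an illustration of what such machinery could look like in a language with operator overloading, here is a small Rust sketch. The `Vec3` name and the choice of which operations get operators are just one option: this version overloads `+`, `-` and scalar `*`, but keeps the dot product as a named method.

```rust
use std::ops::{Add, Mul, Sub};

#[derive(Debug, Clone, Copy, PartialEq)]
struct Vec3 {
    x: f64,
    y: f64,
    z: f64,
}

impl Vec3 {
    fn new(x: f64, y: f64, z: f64) -> Vec3 {
        Vec3 { x, y, z }
    }

    /// Dot product, deliberately a method rather than an operator.
    fn dot(self, other: Vec3) -> f64 {
        self.x * other.x + self.y * other.y + self.z * other.z
    }

    fn length(self) -> f64 {
        self.dot(self).sqrt()
    }

    /// Vector of length one pointing the same way; assumes non-zero length.
    fn normalized(self) -> Vec3 {
        self * (1.0 / self.length())
    }
}

impl Add for Vec3 {
    type Output = Vec3;
    fn add(self, other: Vec3) -> Vec3 {
        Vec3::new(self.x + other.x, self.y + other.y, self.z + other.z)
    }
}

impl Sub for Vec3 {
    type Output = Vec3;
    fn sub(self, other: Vec3) -> Vec3 {
        Vec3::new(self.x - other.x, self.y - other.y, self.z - other.z)
    }
}

/// Multiplication by a scalar on the right: `v * 2.0`.
impl Mul<f64> for Vec3 {
    type Output = Vec3;
    fn mul(self, k: f64) -> Vec3 {
        Vec3::new(self.x * k, self.y * k, self.z * k)
    }
}
```

With this in place, a point on the ray reads as `c + d * t`, which is pleasantly close to the formula.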

+

We also need something to hold the info about sphere, camera and the screen and to do the ray casting.

+

If everything works, you should get a familiar image of the circle. +But it's now powered by a real ray tracer and it's real honest to god 3D, even if it doesn't look like it! +Indeed, with ray casting and ray-sphere intersection code, all the essential aspects are in place, from now on everything else is just bells and whistles.

+
+
+ +

+ Second Sphere +

+

Ok, now that we can see one sphere, let's add the second one. +We need to solve two subproblems for this to make sense. +First, we need to parameterize our single sphere with the color (so that the second one looks different, once we add it). +Second, we should no longer hard-code (0, 0, 0) as a center of the sphere, and make that a parameter, adjusting the formulas accordingly. +This is a good place to debug the code. +If you think you move the sphere up, does it actually move up in the image?

+

Now, the second sphere can be added with different radius, position and color. +The ray casting code now needs to be adjusted to say which sphere intersected the ray. +Additionally, it needs to handle the case where the ray intersects both spheres and figure out which one is closer.

+

With this machinery in hand, we can now create some true 3D scenes. +If one sphere is fully in front of the other, that's just concentric circles. +But if the spheres intersect, the picture is somewhat more interesting.

+
+
+ +

+ Let There Be Phong +

+

The next step is going to be comparatively easy implementation wise, but it will fill our spheres with vibrant colors and make them spring out in their full 3D glory. +We will add light to the scene.

+

Light source will be parameterized by two values:

+
    +
  • +Position of the light source. +
  • +
  • +Color and intensity of light. +
  • +
+

For the latter, we can use a vector with three components (red, green, blue), where each component varies from 0.0 (no light) to 1.0 (maximally bright light). +We can use a similar vector to describe a color of the object. +Now, when the light hits the object, the resulting color would be a componentwise product of the light's color and the object's color.

+

Another contributor is the direction of light. +If the light falls straight at the object, it seems bright. +If the light falls obliquely, it is more dull.

+

Let's get more specific:

+
    +
  • +P̅ is a point on our sphere where the light falls. +
  • +
  • +N̅ is the normal vector at P̅. +That is, it's a vector with length 1, which is locally perpendicular to the surface at P̅ +
  • +
  • +L̅ is the position of the light source +
  • +
  • +R̅ is a vector of length one from P̅ to L̅: R̅ = (L̅ - P̅) / |L̅ - P̅| +
  • +
+

Then, R̅ ⋅ N̅ gives us this "is the light falling straight at the surface?" coefficient between 0 and 1. +Dot product between two unit vectors measures how similar their direction is (it is 0 for perpendicular vectors, and 1 for collinear ones). +So, "is light perpendicular" is the same as "is light collinear with normal" is "dot product".

+

The final color will be the memberwise product of light's color and sphere's color multiplied by this attenuating coefficient. +Putting it all together:

+

For each pixel (x, y) we cast a C̅ + t d̅ ray through it. +If the ray hits the sphere, we calculate point P where it happens, as well as sphere's normal at point P. +For a sphere, the normal is a vector which connects the sphere's center with P. +Then we cast a ray from P to the light source L̅. +If this ray hits the other sphere, the point is occluded and the pixel remains dark. +Otherwise, we compute the color using the angle between the normal and the direction to the light.
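The shading part of these steps can be sketched as follows. This is only the color computation for a point that is already known to be lit: the occlusion check (the shadow ray) is left out. The function name `diffuse` and the array-based `V3` are assumptions of this sketch. One detail worth calling out: the dot product is negative when the light is behind the surface, so it gets clamped to zero.

```rust
type V3 = [f64; 3];

fn dot(a: V3, b: V3) -> f64 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

fn sub(a: V3, b: V3) -> V3 {
    [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
}

/// Scales a non-zero vector to length one.
fn normalize(a: V3) -> V3 {
    let len = dot(a, a).sqrt();
    [a[0] / len, a[1] / len, a[2] / len]
}

/// Diffuse color of point `p` on a sphere centered at `center`,
/// lit by a light of color `light_color` placed at `light_pos`.
/// Does not check whether something blocks the light.
fn diffuse(p: V3, center: V3, light_pos: V3, light_color: V3, object_color: V3) -> V3 {
    // For a sphere, the normal points from the center to the surface point.
    let n = normalize(sub(p, center));
    // Unit vector from the surface point towards the light.
    let r = normalize(sub(light_pos, p));
    // Attenuation coefficient; zero if the light is behind the surface.
    let k = dot(r, n).max(0.0);
    [
        light_color[0] * object_color[0] * k,
        light_color[1] * object_color[1] * k,
        light_color[2] * object_color[2] * k,
    ]
}
```

A white light shining head-on leaves the object's color unchanged, and a light on the far side of the sphere yields black.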

+

With this logic in place, the picture now should display two 3D-looking spheres, rather than a pair of circles. +In particular, our spheres now cast shadows!

+

What we implemented here is a part of Phong reflection model, specifically, the diffuse part. +Extending the code to include ambient and specular parts is a good way to get some nicer looking pictures!

+
+
+ +

+ Scene Description Language +

+

At this point, we accumulated quite a few parameters: camera config, positions of spheres, their colors, light sources (you totally can have many of them!). +Specifying all those things as constants in the code makes experimentation hard, so a next logical step is to devise some kind of textual format which describes the scene. +That way, our ray tracer reads a textual scene description as an input, and renders a .ppm as an output.

+

One obvious choice is to use JSON, though it's not too convenient to edit by hand, and bringing in a JSON parser is contrary to our "do it yourself" approach. +So I would suggest to design your own small language to specify the scene. +You might want to take a look at https://kdl.dev for the inspiration.

+

Note how the program grows bigger: there are now distinctive parts for input parsing, output formatting, rendering per se, as well as the underlying nascent 3D geometry library. +As usual, if you feel like organizing all that somewhat better, go for it!

+
+
+ +

+ Plane And Other Shapes +

+

So far, we've only rendered spheres. +There's a huge variety of other shapes we can add, and it makes sense to tackle at least a couple. +A good candidate is a plane. +To specify a plane, we need a normal, and a point on a plane. +For example, N̅ ⋅ v̅ = 0 is the equation of the plane which goes through the origin and is orthogonal to N̅. +We can plug our ray equation in place of v̅ and solve for t as usual.
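For a plane which does not pass through the origin, the equation becomes N̅ ⋅ (v̅ - P̅₀) = 0, where P̅₀ is any point on the plane. Plugging in the ray gives t = N̅ ⋅ (P̅₀ - C̅) / (N̅ ⋅ d̅). Here is a hedged sketch of that; the `ray_plane` name and the `1e-12` threshold for "the ray is parallel to the plane" are arbitrary choices of mine.

```rust
type V3 = [f64; 3];

fn dot(a: V3, b: V3) -> f64 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

fn sub(a: V3, b: V3) -> V3 {
    [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
}

/// Returns the `t` at which the ray `c + t * d` crosses the plane
/// through `p0` with normal `n`, if that happens in front of the camera.
fn ray_plane(c: V3, d: V3, p0: V3, n: V3) -> Option<f64> {
    let denom = dot(n, d);
    if denom.abs() < 1e-12 {
        return None; // the ray runs (almost) parallel to the plane
    }
    let t = dot(n, sub(p0, c)) / denom;
    if t >= 0.0 {
        Some(t)
    } else {
        None // the plane is behind the camera
    }
}
```

Unlike the sphere, there is only one intersection to consider, so no quadratic is needed.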

+

The second shape to add is a triangle. +A triangle can be naturally specified using its three vertexes. +One of the more advanced math exercises would be to derive a formula for ray-triangle intersection. +As usual, math isn't the point of the exercise, so feel free to just look that up!

+

With spheres, planes and triangles which are all shapes, there clearly is some amount of polymorphism going on! +You might want to play with various ways to best express that in your language of choice!

+
+
+ +

+ Meshes +

+

Triangles are interesting, because there are a lot of existing 3D models specified as a bunch of triangles. +If you download such a model and put it into the scene, you can render somewhat impressive images.

+

There are many formats for storing 3D meshes, but for our purposes .obj files are the best. +Again, this is a plain text format which you can parse by hand.

+

There are plenty of .obj models to download, with the Utah teapot being the most famous one.

+

Note that the model specifies three parameters for each triangle's vertex:

+
    +
  • +coordinate (v) +
  • +
  • +normal (vn) +
  • +
  • +texture (vt) +
  • +
+

For the first implementation, you'd want to ignore vn and vt, and aim at getting a highly polygonal teapot on the screen. +Note that the model contains thousands of triangles, and would take significantly more time to render. +You might want to downscale the resolution a bit until we start optimizing performance.

+

To make the picture less polygony, you'd want to look at those vn normals. +The idea here is that, instead of using the triangle's true normal when calculating light, we use a fake normal, as if the triangle wasn't actually flat. +To do that, the .obj file specifies fake normals for each vertex of a triangle. +If a ray intersects a triangle somewhere in the middle, you can compute a fake normal at that point by taking a weighted average of the three normals at the vertexes.

+

At this point, you should get a picture roughly comparable to the one at the start of the article!

+
+
+ +

+ Performance Optimizations +

+

With all bells and whistles, our ray tracer should be rather slow, especially for larger images. +There are three tricks I suggest to make it faster (and also to learn a bunch of stuff).

+

First, ray tracing is an embarrassingly parallel task: each pixel is independent from the others. +So, as a quick win, make sure that your program uses all the cores for rendering. +Did you manage to get a linear speedup?

+

Second, it's a good opportunity to look into profiling tools. +Can you figure out what specifically is the slowest part? +Can you make it faster?

+

Third, our implementation which loops over each shape to find the closest intersection is a bit naive. +It would be cool if we had something like a binary search tree, which would show us the closest shape automatically. +As far as I know, there isn't a general algorithmically optimal index data structure for doing spatial lookups. +However, there's a bunch of somewhat heuristic data structures which tend to work well in practice.

+

One that I suggest implementing is the bounding volume hierarchy. +The crux of the idea is that we can take a bunch of triangles and place them inside a bigger object (eg, a gigantic sphere). +Then, if a ray doesn't intersect this bigger object, we don't need to check any triangles contained within. +There's a certain freedom in how one picks such bounding objects.

+

For BVH, we will use axis-aligned bounding boxes as our bounding volumes. +An AABB is a cuboid whose edges are parallel to the coordinate axes. +You can parametrize an AABB with two points: the one with the lowest coordinates, and the one with the highest. +It's also easy to construct an AABB which bounds a set of shapes: take the minimum and maximum coordinates of all vertexes. +Similarly, intersecting an AABB with a ray is fast.
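To make "easy to construct" and "fast to intersect" a bit more concrete, here is a sketch of both operations. The intersection uses what is usually called the slab method: for each axis, compute the interval of t during which the ray is between the box's two planes, and check that the three intervals overlap. The `Aabb` type and its method names are assumptions of this example.

```rust
type V3 = [f64; 3];

#[derive(Debug, Clone, Copy, PartialEq)]
struct Aabb {
    min: V3, // corner with the lowest coordinates
    max: V3, // corner with the highest coordinates
}

impl Aabb {
    /// Smallest box containing all the points; `None` for an empty slice.
    fn from_points(points: &[V3]) -> Option<Aabb> {
        let (first, rest) = points.split_first()?;
        let mut b = Aabb { min: *first, max: *first };
        for p in rest {
            for i in 0..3 {
                b.min[i] = b.min[i].min(p[i]);
                b.max[i] = b.max[i].max(p[i]);
            }
        }
        Some(b)
    }

    /// Does the ray `origin + t * dir`, with `t >= 0`, pass through the box?
    fn hit(&self, origin: V3, dir: V3) -> bool {
        let mut t_near = 0.0_f64;
        let mut t_far = f64::INFINITY;
        for i in 0..3 {
            if dir[i] == 0.0 {
                // The ray is parallel to this pair of planes: it either
                // stays between them forever, or never gets there.
                if origin[i] < self.min[i] || origin[i] > self.max[i] {
                    return false;
                }
                continue;
            }
            let t0 = (self.min[i] - origin[i]) / dir[i];
            let t1 = (self.max[i] - origin[i]) / dir[i];
            let (lo, hi) = if t0 < t1 { (t0, t1) } else { (t1, t0) };
            // Shrink the interval of t where the ray is inside all slabs so far.
            t_near = t_near.max(lo);
            t_far = t_far.min(hi);
        }
        t_near <= t_far
    }
}
```

Note that the test needs no square roots, only a handful of subtractions, divisions and comparisons per axis.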

+

The next idea is to define a hierarchy of AABBs. +First, we define a root AABB for the whole scene. +If the ray doesn't hit it, we are done. +The root box is then subdivided into two smaller boxes. +The ray can hit one or two of them, and we recur into each box that got hit. +Worst case, we are recurring into both subdivisions, which isn't any faster, but in the common case we can skip at least a half. +For simplicity, we also start with computing an AABB for each triangle we have in a scene, so we can think uniformly about a bunch of AABBs.

+

Putting everything together, we start with a bunch of small AABBs for our primitives. +As a first step, we compute their common AABB. +This will be the basis of our recursion step: a bunch of small AABBs, and a huge AABB encompassing all of them. +We want to subdivide the big box. +To do that, we select its longest axis (eg, if the big box is very tall, we aim to cut it in two horizontally), and find a midpoint. +Then, we sort small AABBs into those whose center is before or after midpoint along this axis. +Finally, for each of the two subsets we compute a pair of new AABBs, and then recur.
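A single subdivision step, as described above, might look like the sketch below. It only splits one set of boxes in two; building the actual tree means calling it recursively and deciding when to stop (for instance, when one of the halves comes out empty, which this sketch does not handle). `Aabb`, `union`, `center` and `split` are names made up for the example.

```rust
type V3 = [f64; 3];

#[derive(Debug, Clone, Copy, PartialEq)]
struct Aabb {
    min: V3,
    max: V3,
}

impl Aabb {
    /// Smallest box containing both `self` and `other`.
    fn union(self, other: Aabb) -> Aabb {
        let mut out = self;
        for i in 0..3 {
            out.min[i] = self.min[i].min(other.min[i]);
            out.max[i] = self.max[i].max(other.max[i]);
        }
        out
    }

    /// Coordinate of the box's center along the given axis (0, 1 or 2).
    fn center(self, axis: usize) -> f64 {
        (self.min[axis] + self.max[axis]) / 2.0
    }
}

/// Splits the boxes into those whose center lies before the midpoint of
/// the common bounding box along its longest axis, and the rest.
/// Returns `None` for an empty input.
fn split(boxes: &[Aabb]) -> Option<(Vec<Aabb>, Vec<Aabb>)> {
    let (first, rest) = boxes.split_first()?;
    // The common AABB of all the small boxes.
    let bounds = rest.iter().fold(*first, |acc, b| acc.union(*b));
    // Pick the axis along which the common box is the longest.
    let mut axis = 0;
    for i in 1..3 {
        if bounds.max[i] - bounds.min[i] > bounds.max[axis] - bounds.min[axis] {
            axis = i;
        }
    }
    let mid = bounds.center(axis);
    let (before, after): (Vec<Aabb>, Vec<Aabb>) =
        boxes.iter().copied().partition(|b| b.center(axis) < mid);
    Some((before, after))
}
```

Each half then gets its own common AABB, computed with the same `union` fold, before the recursion continues.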

+

Crucially, the two new bounding boxes might intersect. +We can't just cut the root box in two and unambiguously assign small AABBs to the two halves, as they might not be entirely within one. +But, we can expect the intersection to be pretty small in practice.

+
+
+ +

+ Next Steps +

+

If you've made it this far, you have a pretty amazing piece of software! +While it probably clocks in at only a couple of thousand lines of code, it covers a pretty broad range of topics, from text file parsing to advanced data structures for spatial data. +I deliberately spent no time explaining how to best fit all these pieces into a single box, that's the main thing for you to experiment with and to learn.

+

There are two paths one can take from here:

+
    +
  • +If you liked the graphics programming aspect of the exercise, there's a lot you can do to improve the quality of the output. +https://pbrt.org is the canonical book on the topic. +
  • +
  • +If you liked the software engineering side of the project, you can try to re-implement it in different programming languages, to get a specific benchmark to compare different programming paradigms. +Alternatively, you might want to look for other similar self-contained hand-made projects. +Some options include: +
      +
    • +Software rasterizer: rather than simulating a path of a ray, we can project triangles onto the screen. +This is potentially much faster, and should allow for real-time rendering. +
    • +
    • +A highly concurrent chat server: a program which listens on a TCP port, allows clients to connect to it and exchange messages. +
    • +
    • +A toy programming language: going the full road from a text file to executable .wasm. Bonus points if you also do an LSP server for your language. +
    • +
    • +A distributed key-value store based on Paxos or Raft. +
    • +
    • +A toy relational database +
    • +
    +
  • +
+
+
+
+ + + + + diff --git a/2023/01/04/on-random-numbers.html b/2023/01/04/on-random-numbers.html new file mode 100644 index 00000000..d6f488a7 --- /dev/null +++ b/2023/01/04/on-random-numbers.html @@ -0,0 +1,253 @@ + + + + + + + On Random Numbers + + + + + + + + + + + + +
+ +
+ +
+
+ +

On Random Numbers

+

This is a short post which decomposes the topic of random numbers into principal components and maps them to the Rust ecosystem.

+
+ +

+ True Randomness +

+

For cryptographic purposes (eg, generating a key pair for public key cryptography), you want to use real random numbers, derived from genuinely stochastic physical signals +(hardware random number generator, keyboard input, etc). +The shape of the API here is:

+ +
+ + +
fn fill_buffer_with_random_data(buf: &mut [u8])
+ +
+

As this fundamentally requires talking to some physical devices, this task is handled by the operating system. +Different operating systems provide different APIs, covering which is beyond the scope of this article (and my own knowledge).

+

In Rust, getrandom crate provides a cross-platform wrapper for this functionality.

+

It is a major deficiency of Rust standard library that this functionality is not exposed there. +Getting cryptographically secure random data is in the same class of OS services as getting the current time or reading standard input. +Arguably, it's even more important, as most applications for this functionality are security-critical.

+
+
+ +

+ Pseudorandom Number Generator +

+

For various non-cryptographic randomized algorithms, you want to start with a fixed, deterministic seed, and generate a stream of numbers, statistically indistinguishable from random. +The shape of the API here is:

+ +
+ + +
fn random_u32(state: &mut u64) -> u32
+ +
+

There are many different algorithms to do that. +fastrand crate implements something sufficiently close to the state of the art.

+

Alternatively, a good-enough PRNG can be implemented in 9 lines of code:

+ +
+ + +
pub fn random_numbers(seed: u32) -> impl Iterator<Item = u32> {
+  let mut random = seed;
+  std::iter::repeat_with(move || {
+    random ^= random << 13;
+    random ^= random >> 17;
+    random ^= random << 5;
+    random
+  })
+}
+ +
+

This code was lifted from Rust's standard library (source).
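The same idea also fits the "explicit state" API shape. The following is a sketch under my own assumptions: the state is a `u64`, the shifts are the 13/7/17 xorshift64 triple (a different triple from the 32-bit one above), and the upper half of the state is returned as the result. An all-zero state would stay zero forever, so it is replaced with an arbitrary non-zero constant.

```rust
/// Advances `state` by one xorshift64 step and returns 32 bits of it.
fn random_u32(state: &mut u64) -> u32 {
    if *state == 0 {
        // Zero is a fixed point of xorshift; nudge it to an arbitrary constant.
        *state = 0x9E37_79B9_7F4A_7C15;
    }
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    // Return the upper 32 bits of the new state.
    (x >> 32) as u32
}
```

Because all the state lives in the caller's variable, the same seed always gives the same sequence, which is exactly what you want for reproducible tests.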

+

The best way to seed a PRNG is usually by using a fixed constant. +If you absolutely need some amount of randomness in the seed, you can use the following hack:

+ +
+ + +
pub fn random_seed() -> u64 {
+  std::hash::Hasher::finish(&std::hash::BuildHasher::build_hasher(
+    &std::collections::hash_map::RandomState::new(),
+  ))
+}
+ +
+

In Rust, hash maps include some amount of randomization to avoid exploitable pathological behavior due to collisions. +The above snippet extracts that randomness.

+
+
+ +

+ Non-Uniformly Distributed Random Numbers, Uniformly Distributed Random Non-Numbers. +

+

A good PRNG gives you a sequence of u32 numbers where each number is as likely as every other one. +You can convert that to a number from 0 to 9 with random_u32() % 10. +This will be good enough for most purposes, but will fail rigorous statistical tests. +Because 2^32 isn't evenly divisible by 10, 0 would be ever so slightly more frequent than 9. +There is an algorithm to do this correctly (if random_u32() is very large, and falls into the literal remainder after dividing 2^32 by 10, throw it away and try again).
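The "throw it away and try again" algorithm can be sketched like this. The generator is passed in as a closure so that the snippet stands on its own; `random_below` is a name I made up. For n = 10, the accepted range is 0..4294967290, so only the six largest u32 values are rejected.

```rust
/// Returns a uniformly distributed number in `0..n`, given a source of
/// uniformly distributed `u32` values. Panics if `n` is zero.
fn random_below(n: u32, mut next_u32: impl FnMut() -> u32) -> u32 {
    assert!(n > 0);
    let n = n as u64;
    let range = 1u64 << 32; // number of distinct u32 values
    // Largest multiple of n which is not bigger than 2^32.
    let limit = range - range % n;
    loop {
        let r = next_u32() as u64;
        // Values at or above the limit fall into the leftover remainder
        // and would make small results slightly more likely, so skip them.
        if r < limit {
            return (r % n) as u32;
        }
    }
}
```

The loop almost never runs more than once for small n, as the rejected tail is tiny compared to 2^32.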

+

Sometimes you want to use random_u32() to generate other kinds of random things, like a random point on a 3D sphere, or a random permutation. +There are also algorithms for that.

+

Sphere: generate a random point in the cube which bounds the unit ball (each coordinate between -1 and 1); if it is also in the unit ball, project it onto the surface, otherwise throw it away and try again.
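A sketch of this rejection approach, with the generator again passed in as a closure. The conversion from `u32` to a coordinate in [-1, 1] is the simplest one I could think of rather than the most precise, and points extremely close to the center are also rejected to avoid dividing by a near-zero length. `random_on_sphere` is a hypothetical name.

```rust
/// Returns a random point on the unit sphere, using rejection sampling.
fn random_on_sphere(mut next_u32: impl FnMut() -> u32) -> [f64; 3] {
    loop {
        // Random point in the cube [-1, 1]^3.
        let mut p = [0.0_f64; 3];
        for c in p.iter_mut() {
            *c = next_u32() as f64 / u32::MAX as f64 * 2.0 - 1.0;
        }
        let len2 = p[0] * p[0] + p[1] * p[1] + p[2] * p[2];
        // Keep only points inside the unit ball (and not at its very center).
        if len2 > 1e-12 && len2 <= 1.0 {
            // Project the point onto the surface.
            let len = len2.sqrt();
            return [p[0] / len, p[1] / len, p[2] / len];
        }
    }
}
```

The ball takes up a bit more than half of the cube's volume, so on average the loop runs about two times.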

+

Permutation: naive algorithm of selecting a random element to be the first, then selecting a random element among the rest to be the second, etc, works.
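Done in place, this naive algorithm is the Fisher-Yates shuffle. In the sketch below, `random_below(n)` is assumed to return a uniformly distributed index in `0..n`; both it and `shuffle` are names chosen for the example.

```rust
/// Shuffles `items` in place. `random_below(n)` must return a value in `0..n`.
fn shuffle<T>(items: &mut [T], mut random_below: impl FnMut(usize) -> usize) {
    for i in 0..items.len() {
        // Pick a random element among the not-yet-placed ones (i..len)
        // and move it to position i.
        let j = i + random_below(items.len() - i);
        items.swap(i, j);
    }
}
```

Each element is equally likely to end up in each position, provided `random_below` itself is unbiased.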

+

There are libraries which provide collections of such algorithms. +For example, fastrand includes most common ones, like generating numbers in range, generating floating point numbers or shuffling slices.

+

rand includes more esoteric cases like the aforementioned point on a sphere or a normal distribution.

+
+
+ +

+ Ambient Global Source Of Random Numbers +

+

It is customary to expect existence of a global random number generator seeded for you. +This is an anti-pattern in the overwhelming majority of cases, passing a random number generator explicitly leads to better software. +In particular, this is a requirement for deterministic tests.

+

In any case, this functionality can be achieved by storing a state of PRNG in a thread local:

+ +
+ + +
use std::cell::Cell;
+
+pub fn thread_local_random_u32() -> u32 {
+  thread_local! {
+      static STATE: Cell<u64> = Cell::new(random_seed())
+  }
+  STATE.with(|cell| {
+    let mut state = cell.get();
+    let result = random_u32(&mut state);
+    cell.set(state);
+    result
+  })
+}
+ +
+
+
+ +

+ rand +

+

rand is an umbrella crate which includes all of the above. +rand also provides flexible trait-based plugin interface, allowing you to mix and match different combinations of PRNGs and algorithms. +User interface of rand is formed primarily by extension traits.

+
+
+ +

+ Kinds Of Randomness +

+

Circling back to the beginning of the post, it is very important to distinguish between the two use-cases:

+
    +
  • +using unpredictable data for cryptography +
  • +
  • +using statistically uniform random data for stochastic algorithms +
  • +
+

Although the two use-cases both have randomness in their name, they are disjoint, and underlying algorithms and APIs don't have anything in common. +They are physically different: one is a syscall, another is a pure function mapping integers to integers.

+
+
+
+ + + + + diff --git a/2023/01/25/next-rust-compiler.html b/2023/01/25/next-rust-compiler.html new file mode 100644 index 00000000..435dbcb7 --- /dev/null +++ b/2023/01/25/next-rust-compiler.html @@ -0,0 +1,194 @@ + + + + + + + Next Rust Compiler + + + + + + + + + + + + +
+ +
+ +
+
+ +

Next Rust Compiler

+

In Rust in 2023, @nrc floated an idea of a Rust compiler rewrite. +As my hobby is writing Rust compiler frontends (1, 2), I have some (but not very many) thoughts here! +The post consists of two parts, covering organizational and technical aspects.

+
+ +

+ Organization +

+

Writing a production-grade compiler is not a small endeavor. +The questions of who writes the code, who pays the people writing the code, and what's the economic incentive to fund the work in the first place are quite important.

+

My naive guesstimate is that Rust is currently at that stage of its life where it's clear that the language won't die, and would be deployed quite widely, but where, at the same time, the said deployment didn't quite happen to the full extent yet. +From within the Rust community, it seems like Rust is everywhere. +My guess is that from the outside it looks like there's Rust in at least some places.

+

In other words, it's high time to invest substantially into the Rust ecosystem, as the risk that the investment sinks completely is relatively low, but the expected growth is still quite high. +This makes me think that a next-gen Rust compiler isn't too unlikely: I feel that rustc is stuck in a local optimum, and that, with some boldness, it is possible to deliver something more awesome.

+
+
+ +

+ Technicalities +

+

Here's what I think an awesome Rust compiler would do:

+
+
rust-native compilation model
+
+

Like C++, Rust (ab)uses the C compilation model: compilation units are separately compiled into object files, which are then linked into a single executable by the linker. +This model is at odds with how the language works. +In particular, compiling a generic function isn't actually possible until you know specific type parameters at the call-site. +Rust and C++ hack around that by compiling a separate copy for every call-site (C++ even re-type-checks every call-site), and deduplicating instantiations during the link step. +This creates a lot of wasted work, which is only there because we try to follow the "compile to object files then link" model of operation. +It would be significantly more efficient to merge compiler and linker, such that only the minimal amount of code is compiled, compiled code is fully aware about surrounding context and can be inlined across crates, and where the compilation makes the optimal use of all available CPU and RAM.

+
+
intra-crate parallelism
+
+

C compilation model is not stupid: it is the way it is to enable separate compilation. +Back in the day, compiling whole programs was simply not possible due to the limitations of the hardware. +Rather, a program had to be compiled in separate parts, and then the parts linked together into the final artifact. +With bigger computers today, we don't think about separate compilation as much. +It is still important though: not only are our computers more powerful, our programs are much bigger too. +Moreover, computing power comes not from increasing clock speeds, but from a larger number of cores.

+

Rust's DAG of anonymous crates with well-defined declaration-site checked interfaces is actually quite great for compiling Rust in parallel (especially if we get rid of completely accidental interactions between monomorphization and existing linkers). +However, even a single crate can be quite large, and is compiled sequentially. +For example, in the recent compile time benchmark, a significant chunk of time was spent compiling just this file with a bunch of functions. +Intuitively, as all these functions are completely independent, compiler should be able to process them in parallel. +In reality, Rust doesn't actually make that as easy as it seems, but it definitely is possible to do better than the current compiler.

+
+
open-world compiling; stable MIR
+
+

Today, Rust tooling is a black box: you feed it source text and get an executable binary as the output. +This solves the problem of producing executable binaries quite well!

+

However, for more complex projects you want to have a more direct relationship with the code. +You want tools other than the compiler to understand the meaning of the code, and to act on it. +For example, automated large scale refactors and code analysis, project-specific linting rules or formal proofs of correctness all could benefit from having access to a semantically rich model of the language.

+

Providing such semantic model, where AST is annotated with resolved names, inferred types, and bodies are converted to a simple and precise IR, is a huge ask. +Not because it is technically hard to implement, but because this adds an entirely new stable API to the language. +Nonetheless, such an API would unlock quite a few use cases, so the tradeoff is worth it.

+
+
hermetic deterministic compilation
+
+

It is increasingly common to want reproducible builds. +With NixOS and Guix, whole Linux distros are built in a deterministic fashion. +It is possible to achieve reproducibility by carefully freezing whatever mess you are currently in, the docker way. +But a better approach is to start with inherently pure and hermetic components, and assemble them into a larger system.

+

Today, Rust has some amount of determinism in its compilation, but it is achieved by plugging loopholes, rather than by not admitting impurities into the system in the first place. +For example, the env! macro literally looks up a value in compilers environment, without any attempt at restricting or at least enumerating available inputs. +Procedural macros are an unrestricted RCE.

+

It feels like we can do better, and that we should do better, if the goal is still less mess.

+
+
lazy and error-resilient compilation
+
+

For the task of providing immediate feedback right in the editor when the user types the code, compilation pipeline needs to be changed significantly. +It should be lazy (so that only the minimal amount of code is inspected and re-analyzed on typing) and resilient and robust to errors (IDE job mostly ends when the code is error free). +rust-analyzer shows one possible way to do that, with the only drawback of being a completely separate tool for IDE, and only IDE. +There's no technical limitation why the full compiler can't be like that, just the organizational limitation of it being very hard to re-architect existing entrenched code, perfected for its local optimum.

+
+
cargo install rust-compiler
+
+

Finally, for the benefit of compiler writers themselves, a compiler should be a simple rust crate, which builds with stable Rust and is otherwise a very boring text processing utility. +Again, rust-analyzer shows that it is possible, and that the benefits for development velocity are enormous. +I am glad to see a recent movement to making the build process for the compiler simpler!

+
+
+

Discussion on /r/rust

+
+
+
+ + + + + diff --git a/2023/01/26/rusts-ugly-syntax.html b/2023/01/26/rusts-ugly-syntax.html new file mode 100644 index 00000000..7b40efbb --- /dev/null +++ b/2023/01/26/rusts-ugly-syntax.html @@ -0,0 +1,328 @@ + + + + + + + Rust's Ugly Syntax + + + + + + + + + + + + +
+ +
+ +
+
+ +

Rust's Ugly Syntax

+

People complain about Rust syntax. +I think that most of the time when people think they have an issue with Rust's syntax, they actually object to Rust's semantics. +In this slightly whimsical post, I'll try to disentangle the two.

+

Let's start with an example of an ugly Rust syntax:

+ +
+ + +
pub fn read<P: AsRef<Path>>(path: P) -> io::Result<Vec<u8>> {
+  fn inner(path: &Path) -> io::Result<Vec<u8>> {
+    let mut file = File::open(path)?;
+    let mut bytes = Vec::new();
+    file.read_to_end(&mut bytes)?;
+    Ok(bytes)
+  }
+  inner(path.as_ref())
+}
+ +
+

This function reads contents of a given binary file. +This is lifted straight from the standard library, so it is very much not a strawman example. +And, at least to me, it's definitely not a pretty one!

+

Let's try to imagine what this same function would look like if Rust had a better syntax. +Any resemblance to real programming languages, living or dead, is purely coincidental!

+

Let's start with Rs++:

+ +
+ + +
template<std::HasConstReference<std::Path> P>
+std::io::outcome<std::vector<uint8_t>>
+std::read(P path) {
+    return read_(path.as_reference());
+}
+
+static
+std::io::outcome<std::vector<uint8_t>>
+read_(&auto const std::Path path) {
+    auto file = try std::File::open(path);
+    std::vector bytes;
+    try file.read_to_end(&bytes);
+    return okey(bytes);
+}
+ +
+

A Rhodes variant:

+ +
+ + +
public io.Result<ArrayList<Byte>> read<P extends ReferencingFinal<Path>>(
+        P path) {
+    return myRead(path.get_final_reference());
+}
+
+private io.Result<ArrayList<Byte>> myRead(
+        final reference lifetime var Path path) {
+    var file = try File.open(path);
+    ArrayList<Byte> bytes = ArrayList.new();
+    try file.readToEnd(borrow bytes);
+    return Success(bytes);
+}
+ +
+

Typical RhodesScript:

+ +
+ + +
public function read<P extends IncludingRef<Path>>(
+    path: P,
+): io.Result<Array<byte>> {
+    return myRead(path.included_ref());
+}
+
+private function myRead(
+    path: &const Path,
+): io.Result<Array<byte>> {
+    let file = try File.open(path);
+    Array<byte> bytes = Array.new()
+    try file.readToEnd(&bytes)
+    return Ok(bytes);
+}
+ +
+

Rattlesnake:

+ +
+ + +
def read[P: Refing[Path]](path: P): io.Result[List[byte]]:
+    def inner(path: @Path): io.Result[List[byte]]:
+        file := try File.open(path)
+        bytes := List.new()
+        try file.read_to_end(@: bytes)
+        return Ok(bytes)
+    return inner(path.ref)
+ +
+

And, to conclude, CrabML:

+ +
+ + +
read :: 'p  ref_of => 'p -> u8 vec io.either.t
+let read p =
+  let
+    inner :: &path -> u8 vec.t io.either.t
+    inner p =
+      let mut file = try (File.open p) in
+      let mut bytes = vec.new () in
+      try (file.read_to_end (&mut bytes)); Right bytes
+  in
+    ref_op p |> inner
+;;
+ +
+

As a slightly more serious and useful exercise, let's do the opposite: keep the Rust syntax, but try to simplify semantics until the end result looks presentable.

+

Here's our starting point:

+ +
+ + +
pub fn read<P: AsRef<Path>>(path: P) -> io::Result<Vec<u8>> {
+  fn inner(path: &Path) -> io::Result<Vec<u8>> {
+    let mut file = File::open(path)?;
+    let mut bytes = Vec::new();
+    file.read_to_end(&mut bytes)?;
+    Ok(bytes)
+  }
+  inner(path.as_ref())
+}
+ +
+

The biggest source of noise here is the nested function. +The motivation for it is somewhat esoteric. +The outer function is generic, while the inner function isn't. +With the current compilation model, that means that the outer function is compiled together with the user's code, gets inlined and is optimized down to nothing. +In contrast, the inner function is compiled when the std itself is being compiled, saving time when compiling the user's code. +One way to simplify this (losing a bit of performance) is to say that generic functions are always separately compiled, but accept an extra runtime argument under the hood which describes the physical dimension of input parameters.

+

With that, we get

+ +
+ + +
pub fn read<P: AsRef<Path>>(path: P) -> io::Result<Vec<u8>> {
+  let mut file = File::open(path.as_ref())?;
+  let mut bytes = Vec::new();
+  file.read_to_end(&mut bytes)?;
+  Ok(bytes)
+}
+ +
+

The next noisy element is the <P: AsRef<Path>> constraint. +It is needed because Rust loves exposing physical layout of bytes in memory as an interface, specifically for cases where that brings performance. +In particular, the meaning of Path is not that it is some abstract representation of a file path, but that it is just literally a bunch of contiguous bytes in memory. +So we need AsRef to make this work with any abstraction which is capable of representing such a slice of bytes. +But if we don't care about performance, we can require that all interfaces are fairly abstract and mediated via virtual function calls, rather than direct memory access. +Then we won't need AsRef at all:

+ +
+ + +
pub fn read(path: &Path) -> io::Result<Vec<u8>> {
+  let mut file = File::open(path)?;
+  let mut bytes = Vec::new();
+  file.read_to_end(&mut bytes)?;
+  Ok(bytes)
+}
+ +
+

Having done this, we can actually get rid of Vec<u8> as well: we can no longer use generics to express an efficient growable array of bytes in the language itself. +We'd have to use some opaque Bytes type provided by the runtime:

+ +
+ + +
pub fn read(path: &Path) -> io::Result<Bytes> {
+  let mut file = File::open(path)?;
+  let mut bytes = Bytes::new();
+  file.read_to_end(&mut bytes)?;
+  Ok(bytes)
+}
+ +
+

Technically, we are still carrying the ownership and borrowing system with us, but, without direct control over memory layout of types, it no longer brings massive performance benefits. +It still helps to avoid GC, prevent iterator invalidation, and statically check that non-thread-safe code isn't actually used across threads. +Still, we can easily get rid of those &-pretzels if we just switch to GC. +We don't even need to worry about concurrency much: as our objects are separately allocated and always behind a pointer, we can hand-wave data races away by noticing that operations with pointer-sized things are atomic on x86 anyway.

+ +
+ + +
pub fn read(path: Path) -> io::Result<Bytes> {
+  let file = File::open(path)?;
+  let bytes = Bytes::new();
+  file.read_to_end(bytes)?;
+  Ok(bytes)
+}
+ +
+

Finally, we are being overly pedantic with error handling here: not only do we mention a possibility of failure in the return type, we even use ? to highlight any specific expression that might fail. +It would be much simpler to not think about error handling at all, and let some top-level
+try { } catch (...) { /* intentionally empty */ }
+handler deal with it:

+ +
+ + +
pub fn read(path: Path) -> Bytes {
+  let file = File::open(path);
+  let bytes = Bytes::new();
+  file.read_to_end(bytes);
+  bytes
+}
+ +
+

Much better now!

+
+
+ + + + + diff --git a/2023/02/10/how-a-zig-ide-could-work.html b/2023/02/10/how-a-zig-ide-could-work.html new file mode 100644 index 00000000..783b2691 --- /dev/null +++ b/2023/02/10/how-a-zig-ide-could-work.html @@ -0,0 +1,340 @@ + + + + + + + How a Zig IDE Could Work + + + + + + + + + + + + +
+ +
+ +
+
+ +

How a Zig IDE Could Work

+

Zig is a very interesting language from an IDE point of view. +Some aspects of it are friendly to IDEs, like a very minimal and simple-to-parse syntax +(Zig can even be correctly lexed line-by-line, very cool!), +the absence of syntactic macros, and the ability to do a great deal of semantic analysis on a file-by-file basis, in parallel. +On the other hand, comptime. +I accidentally spent some time yesterday thinking about how to build an IDE for that; this post is the result.

+
+ +

+ How Does the Zig Compiler Work? +

+

It's useful to discuss a bit how the compiler works today. +For something more thorough, refer to this excellent series of posts: https://mitchellh.com/zig.

+

First, each Zig file is parsed into an AST. +Delightfully, parsing doesn't require any context whatsoever, it's a pure []const u8 -> Ast function, and the resulting Ast is just a piece of data.

+

After parsing, the Ast is converted to an intermediate representation, Zir. +This is where Zig diverges a bit from more typical statically compiled languages. +Zir actually resembles something like Python's bytecode: an intermediate representation that an interpreter for a dynamically-typed language would use. +That's because it is an interpreter's IR: the next stage would use Zir to evaluate comptime.

+

Let's look at an example:

+ +
+ + +
fn generic_add(comptime T: type, lhs: T, rhs: T) T {
+  return lhs + rhs;
+}
+ +
+

Here, the Zir for generic_add would encode addition as a typeless operation, because we don't know types at this point. +In particular, T can be whatever. +When the compiler instantiates generic_add with different Ts, like generic_add(u32, ...), generic_add(f64, ...), it will re-use the same Zir for different instantiations. +Those are the two purposes of Zir: to directly evaluate code at compile time, and to serve as a template for monomorphisation.

+

The next stage is where the magic happens: the compiler partially evaluates dynamically typed Zir to convert it into a fairly standard statically typed IR. +The process starts at the main function. +The compiler more or less tries to evaluate the Zir. +If it sees something like 90 + 2, it directly evaluates that to 92. +For something which can't be evaluated at compile time, like a + 2 where a is a runtime variable, the compiler generates typed IR for addition (as, at this point, we already know the type of a).

+
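To make the folding-versus-emitting split concrete, here is a toy sketch in Python (not Zig, and not the real compiler's data structures). The tuple-based expression format and the names `Const`, `Runtime` and `partial_eval` are invented for illustration; the only point is that comptime-known operands get folded, while anything touching a runtime value turns into a typed instruction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    """A value fully known at compile time."""
    value: int


@dataclass(frozen=True)
class Runtime:
    """A value only known at run time: we track its type and the IR computing it."""
    type: str
    ir: str


def show(value):
    """Render an operand for the textual toy IR."""
    return str(value.value) if isinstance(value, Const) else value.ir


def partial_eval(expr, env):
    """Evaluate an untyped expression tree as far as possible.

    expr is an int literal, a variable name, or a tuple ("add", lhs, rhs).
    env maps variable names to Const or Runtime values.
    """
    if isinstance(expr, int):
        return Const(expr)
    if isinstance(expr, str):
        return env[expr]
    op, lhs, rhs = expr
    assert op == "add", "this toy only knows addition"
    left, right = partial_eval(lhs, env), partial_eval(rhs, env)
    if isinstance(left, Const) and isinstance(right, Const):
        return Const(left.value + right.value)  # fold at compile time
    # At least one side is a runtime value: emit a typed instruction.
    ty = left.type if isinstance(left, Runtime) else right.type
    return Runtime(ty, f"add.{ty}({show(left)}, {show(right)})")


folded = partial_eval(("add", 90, 2), {})
emitted = partial_eval(("add", "a", 2), {"a": Runtime("u32", "%a")})
```

Here `folded` is a compile-time constant, while `emitted` carries a `u32` addition instruction, because the type of `a` is already known when the instruction is generated.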

When the compiler sees something like

+ +
+ + +
const T = u8;
+const x = generic_add(T, a, b);
+ +
+

the compiler monomorphises the generic call. +It checks that all comptime arguments (T) are fully evaluated, and starts partial evaluation of the called function, with comptime parameters fixed to particular values (this of course is memoized).

+

The whole process is lazy: only things transitively used from main are analyzed. +The compiler won't complain about something like

+ +
+ + +
fn unused() void {
+    1 + "";
+}
+ +
+

This looks perfectly fine at the Zir level, and the compiler will not move beyond Zir unless the function is actually called somewhere.

+
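The laziness and memoization described above can be modelled with a plain worklist. This is a sketch in Python under simplifying assumptions: each function is reduced to a "is the body well typed" flag plus a list of calls, and the table `functions` and the helper `analyze_from` are hypothetical names, not anything from the Zig compiler.

```python
def analyze_from(root, functions):
    """Analyze only what is transitively reachable from root.

    functions maps a name to (body_is_well_typed, calls), where calls is a
    list of (callee, comptime_args) pairs.
    Returns the set of analyzed instantiations and the errors found.
    """
    analyzed = set()  # memo: each (name, comptime_args) pair is analyzed once
    errors = []
    worklist = [(root, ())]
    while worklist:
        instantiation = worklist.pop()
        if instantiation in analyzed:
            continue
        analyzed.add(instantiation)
        name, _comptime_args = instantiation
        well_typed, calls = functions[name]
        if not well_typed:
            errors.append(name)
        worklist.extend(calls)
    return analyzed, errors


functions = {
    "main": (True, [
        ("generic_add", ("u8",)),
        ("generic_add", ("u8",)),   # same comptime args: served from the memo
        ("generic_add", ("f64",)),
    ]),
    "generic_add": (True, []),
    "unused": (False, []),          # contains `1 + ""`, but nothing calls it
}
analyzed, errors = analyze_from("main", functions)
```

In this model `unused` never enters the worklist, so its type error is never reported, and `generic_add` is analyzed once per distinct set of comptime arguments.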
+
+ +

+ And an IDE? +

+

An IDE adds several dimensions to the compiler:

+
    +
  • +works with incomplete and incorrect code +
  • +
  • +works with code which rapidly changes over time +
  • +
  • +gives results immediately, there is no edit/compile cycle +
  • +
  • +provides source to source transformations +
  • +
+

The hard bit is the combination of rapid changes and immediate results. +This is usually achieved using some smart, language-specific combination of

+
    +
  • +

    Incrementality: although changes are frequent and plentiful, they are local, and it is often possible to re-use large chunks of previous analysis.

    +
  • +
  • +

    Laziness: unlike a compiler, an IDE does not need full analysis results for the entirety of the codebase. +Usually, analysis of the function which is currently being edited is the only time-critical part, everything else can be done asynchronously, later.

    +
  • +
+

This post gives an overview of some specific fruitful combinations of the two ideas:

+

https://rust-analyzer.github.io/blog/2020/07/20/three-architectures-for-responsive-ide.html

+

How can we apply the ideas to Zig? +Let's use this as our running example:

+ +
+ + +
fn guinea_pig(comptime T: type, foo: Foo) void {
+    foo.<complete here>;
+
+    helper(T).<here>;
+
+    var t: T = undefined;
+    t.<and here>;
+}
+ +
+

There are two separate, interesting questions to ask here:

+
    +
  • +what result do we even want here? +
  • +
  • +how to achieve that given strict performance requirements? +
  • +
+
+
+ +

+ Just Compile Everything +

+

It's useful to start with a pedantically correct approach. +Let's run our usual compilation (recursively monomorphising called functions starting from the main). +The result would contain a bunch of different monomorphisations of guinea_pig, for different values of T. +For each specific monomorphisation it's now clear what the correct answer is. +For the unspecialized case as written in the source code, the IDE can now show something reasonable by combining partial results from each monomorphisation.

+

There are several issues with this approach.

+

First, collecting the full set of monomorphisations is not well-defined in the presence of conditional compilation. +Even if you run the full compilation starting from main, today the compiler assumes some particular environment (e.g., Windows or Linux), which doesn't give you a full picture. +There's a fascinating issue about multibuilds, making the compiler process all combinations of conditional compilation flags at the same time: zig#3028. +With my IDE writer hat on, I really hope it gets in, as it will move IDE support from inherently heuristic territory, to something where, in principle, there's a correct result (even if it might not be particularly easy to compute).

+

The second problem is that this probably is going to be much too slow. +If you think about IDE support for the first time, a very tantalizing idea is to try to lean just into incremental compilation. +Specifically, you can imagine a compiler that maintains fully type-checked and resolved view of the code at all times. +If a user edits something, the compiler just incrementally changes what needs to be changed. +So the trick for IDE-grade interactive performance is just to implement sufficiently advanced incremental compilation.

+

The problem with a sufficiently incremental compiler is that even perfect incrementality, which does the minimal required amount of work, will be slow in a non-negligible number of cases. +The nature of code is that a small change to the source in a single place might lead to a large change to resolved types all over the project. +For example, changing the name of some popular type invalidates all the code that uses this type. +That's the fundamental reason why IDEs try hard to maintain an ability to not analyze everything.

+

On the other hand, at the end of the day you'll have to do this work at least by the time you run the tests. +And Zig's compiler is written from the ground up to be very incremental and very fast, so perhaps this will be good enough? +My current gut feeling is that the answer is no: even if you can re-analyze everything in, say, 100ms, that'll still require burning the battery for essentially useless work. +Usually, there are many small atomic edits for a single test run.

+

The third problem with the approach of collecting all monomorphisations is that it simply does not work if the function isn't actually called yet. +Which is common in incomplete code that is being written, exactly the use-case where the IDE is most useful!

+
+
+ +

+ Compile Only What We Need +

+

Thinking about the full approach more, it feels like it could be, at least in theory, optimized somewhat. +Recall that in this approach we have a graph of function instantiations, which starts at the root (main), and contains various monomorphisations of guinea_pig on paths reachable from the root.

+

It is clear we actually don't need the full graph to answer queries about instantiations of guinea_pig. +For example, if we have something like

+ +
+ + +
fn helper() i32 {
+    ...
+}
+ +
+

and the helper does not (transitively) call guinea_pig, we can avoid looking into its body, as the signature is enough to analyze everything else.

+

More precisely, given the graph of monomorphisations, we can select the minimal subgraph which includes all paths from main to guinea_pig instantiations, as well as all the functions whose bodies we need to process to understand their signatures. +My intuition is that the size of that subgraph is going to be much smaller than the whole thing, and, in principle, an algorithm which would analyze only that subgraph should be speedy enough in practice.

+
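Assuming the call graph were somehow already known, the path part of that subgraph is just "reachable from main" intersected with "can reach guinea_pig". A minimal sketch in Python, with a made-up call graph and hypothetical helper names; it deliberately ignores the extra functions whose bodies are needed only to infer signatures:

```python
def reachable(start, edges):
    """All nodes reachable from the start nodes by following edges."""
    seen, stack = set(), list(start)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, ()))
    return seen


def relevant_subgraph(calls, root, target):
    """Nodes lying on some path from root to target in the call graph."""
    reverse = {}
    for caller, callees in calls.items():
        for callee in callees:
            reverse.setdefault(callee, []).append(caller)
    return reachable([root], calls) & reachable([target], reverse)


# Hypothetical call graph: caller -> callees.
calls = {
    "main": ["parse", "run"],
    "parse": ["helper"],
    "run": ["guinea_pig", "helper"],
    "guinea_pig": [],
    "helper": [],
}
subgraph = relevant_subgraph(calls, "main", "guinea_pig")
```

For this graph only `main`, `run` and `guinea_pig` survive; `parse` and `helper` can be skipped because no path through them leads to `guinea_pig`.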

The problem though is that, as far as I know, it's not possible to understand what belongs to the subgraph without analysing the whole thing! +In particular, using compile-time reflection our guinea_pig can be called through something like comptime "guinea" ++ "_pig". +It's impossible to infer the call graph just from Zir.

+

And of course this does not help the case where the function isn't called at all.

+
+
+ +

+ Abstract Comptime Interpretation +

+

It is possible to approach

+ +
+ + +
fn guinea_pig(comptime T: type, foo: Foo) void {
+    foo.<complete here>;
+
+    helper(T).<here>;
+
+    var t: T = undefined;
+    t.<and here>;
+}
+ +
+

from a different direction. +What if we just treat this function as the root of our graph? +We can't do that exactly, because it has some comptime parameters. +But we can say that we have some opaque values for the parameters: T = opaquevalue. +Of course, we won't be able to fully evaluate everything and things like if (T == int) would probably need to propagate opaqueness. +At the same time, something like the result of BoundedArray(opaque) would still be pretty useful for an IDE.

+
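One way to picture opaque values is as a sentinel that taints anything depending on its identity, while type constructors still produce a usable shape. The following is a toy Python model, not a proposal for how Sema should do it; `OPAQUE`, `comptime_eq`, `comptime_if` and the field layout returned by `bounded_array` are all assumptions made for illustration.

```python
class Opaque:
    """Stands for a comptime value we do not know, such as an uninstantiated T."""

    def __repr__(self):
        return "opaque"


OPAQUE = Opaque()


def comptime_eq(a, b):
    """`a == b` at comptime; the answer is unknown if either side is opaque."""
    if a is OPAQUE or b is OPAQUE:
        return OPAQUE
    return a == b


def comptime_if(cond, then_value, else_value):
    """`if (cond) a else b`; an opaque condition makes the result opaque too."""
    if cond is OPAQUE:
        return OPAQUE
    return then_value if cond else else_value


def bounded_array(elem_type):
    """A type constructor: the result's shape is known even if elem_type is not."""
    return {
        "kind": "struct",
        "fields": {"buffer": ("array", elem_type), "len": "usize"},
    }


T = OPAQUE
branch = comptime_if(comptime_eq(T, "int"), "fast_path", "slow_path")
completions = sorted(bounded_array(T)["fields"])
```

The branch on `T == int` stays opaque, but the field names of `bounded_array(T)` are fully known, which is exactly the information completion needs.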

I am wondering if there's even perhaps some compilation-time savings in this approach? +My understanding (which might be very wrong!) is that if a generic function contains something like 90 + 2, this expression would be comptime-evaluated anew for every instantiation. +In theory, what we could do is to partially evaluate this function substituting opaque values for comptime parameters, and then, for any specific instantiation, we can use the result of this partial evaluation as a template. +Not sure what that would mean precisely though: it definitely would be more complicated than just substituting Ts in the result.

+
+
+ +

+ What is to Be Done? +

+

Ast and Zir infra is good. +It is per-file, so it naturally just works in an IDE.

+

Multibuilds are important. +I am somewhat skeptical that they'll actually fly, and it's not a complete game over if they don't +(Rust has the same problem with conditional compilation, and it does create fundamental problems for both the users and authors of IDEs, but the end result is still pretty useful). +Still, if Zig does ship multibuilds, that'd be awesome.

+

Given the unused function problem, I think it's impossible to avoid at least some amount of abstract interpretation, so Sema has to learn to deal with opaque values.

+

With abstract interpretation machinery in place, it can be used as a first, responsive layer of IDE support.

+

Computing the full set of monomorphisations in the background can be used to augment these limited synchronous features with precise results asynchronously. +Though, this might be tough to express in existing editor UIs. +E.g., the goto definition result is now an asynchronous stream of values.

+

Discussion on /r/zig.

+
+
+
+ + + + + diff --git a/2023/02/12/a-love-letter-to-deno.html b/2023/02/12/a-love-letter-to-deno.html new file mode 100644 index 00000000..d04e9223 --- /dev/null +++ b/2023/02/12/a-love-letter-to-deno.html @@ -0,0 +1,206 @@ + + + + + + + <3 Deno + + + + + + + + + + + + +
+ +
+ +
+
+ +

<3 Deno

+

Deno is a relatively new JavaScript runtime. +I find it quite interesting and aesthetically appealing, in line with the recent trend to rein in the worse-is-better law of software evolution. +This post explains why.

+

The way I see it, the primary goal of Deno is to simplify development of software, relative to the status quo. +Simplifying means removing the accidental complexity. +To me, a big source of accidental complexity in today's software is implicit dependencies. +Software is built of many components, and while some components are relatively well-defined (Linux syscall interface, amd64 ISA), others are much less so. +Example: upgrading OpenSSL for your Rust project from 1.1.1 to 3.0.0 works on your machine, but breaks on CI, because 3.0.0 now needs some new perl module, which is expected to usually be there together with the perl installation, but that is not universally so. +One way to solve these kinds of problems is by putting an abstraction boundary (a docker container) around them. +But a different approach is to very carefully avoid creating the issues. +Deno, in the general sense, picks this second noble hard path.

+

One of the first problems in this area is bootstrapping. +In general, you can paper over quite a bit of complexity by writing some custom script to do all the grunt work. +But how do you run it?

+

One answer is to use a shell script, as the shell is already installed. +Which shell? Bash, sh, powershell? +Probably POSIX sh is a sane choice, Windows users can just run a docker container, or rather a Linux in their subsystem. +You'll also want to install shellcheck to make sure you don't accidentally use bashisms. +At some point your script grows too large, and you rewrite it in Python. +You now have to install Python, I've heard it's much easier these days on Windows. +Of course, you'll run that inside a docker container, or rather a virtual environment. +And you would be careful to use python3 -m pip rather than pip3 to make sure you use the right thing.

+

Although scripting and plumbing should be a way to combat complexity, just getting to the point where every contributor to your software can run scripts requires a docker container, or rather a great deal of futzing with the environment!

+

Deno doesn't solve the problem of just being already there on every imaginable machine. +However, it strives very hard to not create additional problems once you get the deno binary onto the machine. +Some manifestations of that:

+

Deno comes with a code formatter (deno fmt) and an LSP server (deno lsp) out of the box. +The high order bit here is not that these are high-value features which drive productivity (though that is so), but that you don't need to pull extra deps to get these features. +Similarly, Deno is a TypeScript runtime: there's no transpilation step involved, you just deno main.ts.

+

Deno does not rely on the system's shell. +Most scripting environments, including node, python, and ruby, make the grave mistake of adding an API to spawn a process intermediated by the shell. +This is slow, insecure, and brittle (which shell was that, again?). +I have a longer post about the issue. +Deno doesn't have this vulnerable API. +Not that not having an API is a particularly challenging technical achievement, but it is better than the current default.

+

Deno has a correctly designed tasks system. +Whenever you do a non-trivial software project, there inevitably comes a point where you need to write some software to orchestrate your software. +Accidental complexity creeps in in the form of a Makefile (which make is that?) or a ./scripts/*.sh directory. +Node (as far as I know) pioneered a great idea to treat these as a first-class concern of the project, by including a scripts field in the package.json. +It then botched the execution by running the scripts through the system's shell, which downgrades it to a ./scripts directory with more indirection. +In contrast, Deno runs the scripts in deno_task_shell, a purpose-built small cross-platform shell. +You no longer need to worry that rm might behave differently depending on which rm it is, because it's a shell built-in now.

+

These are all engineering nice-to-haves. +They don't necessarily matter as much in isolation, but together they point at project values which align very well with my own. +But there are a couple of innovative, bigger features as well.

+

The first big feature is the permissions system. +When you run a Deno program, you need to specify explicitly which OS resources it can access. +Pinging google.com would require an explicit opt-in. +You can safely run

+ +
+ + +
$ deno run https://shady.website.eu/caesar-cipher.ts < in.txt > out.txt
+ +
+

and be sure that this won't steal your secrets. +Of course, it can still burn the CPU indefinitely or fill out.txt with garbage, but it won't be able to read anything beyond explicitly passed input. +For many, if not most, scripting tasks this is a nice extra protection from supply chain attacks.

+

The second big feature is Deno's interesting, minimal, while still practical, take on dependency management. +First, it goes without saying that there are no global dependencies. +Everything is scoped to the current project. +Naturally, there are also lockfiles with checksums.

+

However, there's no package registry or even a separate package manager. +In Deno, a dependency is always a URL. +The runtime itself understands URLs, downloads their contents and loads the resulting TypeScript or JavaScript. +Surprisingly, it feels like this is enough to express various dependency patterns. +For example, if you need a centralized registry, like https://deno.land/x, you can use URLs pointing to that! +URLs can also express semver, with foo@1 redirecting to foo@1.2.3. +Import maps are a standard, flexible way to remap dependencies, for when you need to tweak something deep in the tree. +Crucially, in addition to lockfiles Deno comes with a built-in deno vendor command, which fetches all of the dependencies of the current project and puts them into a subfolder, making production deployments immune to dependency hosting failures.

+

Deno's approach to built-in APIs beautifully bootstraps from its URL-based dependency management. +First, Deno provides a set of runtime APIs. +These APIs are absolutely stable, follow existing standards (e.g., fetch for doing networking), and play the role of providing a cross-platform interface for the underlying OS. +Then there's the standard library. +There's an ambition to provide a comprehensive batteries-included standard library, which is vetted by core developers, à la Go. +At the same time, a huge stdlib requires a lot of work over many years. +So, as a companion to the stable 1.30.3 runtime APIs, which are a part of the deno binary, there's a 0.177.0 version of stdlib, which is downloaded just like any other dependency. +I am fairly certain that in time this will culminate in an actually stable, comprehensive, and high quality stdlib.

+

All these together mean that you can be sure that, if you got deno --version working, then deno run your-script.ts will always work, as the surface area for things to go wrong due to differences in the environment is drastically cut.

+

The only big drawback of Deno is the language: all this runtime awesomeness is tied to TypeScript. +JavaScript is a curious beast: post ES6, it is actually quite pleasant to use, and has some really good parts, like injection-proof template literal semantics. +But all the old WATs like

+ +
+ + +
["10", "10", "10"].map(parseInt)
+ +
+

are still there. +TypeScript does an admirable job with typing JavaScript, as it exists in the wild, but the resulting type system is not simple. +It seems that, linguistically, something substantially better than TypeScript is possible in theory. +But among the actually existing languages, TypeScript seems like a solid choice.

+

To sum up, historically the domain of scripting and glue code was plagued by the problem of accidentally supergluing oneself to a particular UNIX flavor at hand. +Deno finally seems like a technology that tries to solve this issue of implicit dependencies by not having the said dependencies instead of putting everything in a docker container.

+
+
+ + + + + diff --git a/2023/02/16/three-state-stability.html b/2023/02/16/three-state-stability.html new file mode 100644 index 00000000..a1b788bd --- /dev/null +++ b/2023/02/16/three-state-stability.html @@ -0,0 +1,177 @@ + + + + + + + Three-State Stability + + + + + + + + + + + + +
+ +
+ +
+
+ +

Three-State Stability

+

Usually, when discussing stability of the APIs (in a broad sense; databases and programming languages are also APIs), only two states are mentioned:

+ +

This is reflected in, e.g., SemVer: before 1.0, anything goes; after 1.0, you are only allowed to break the API if you bump the major version.

+

I think the actual situation in the real world is a bit more nuanced than that. +In addition to clearly stable or clearly unstable, there's often a poorly defined third category. +It often manifests as either:

+ +

Here's what I think happens over the lifetime of a typical API:

+

In the first phase, the API is actively evolving. +There is a promise of anti-stability: there's constant change and a lot of experimentation. +Almost no one is using the project seriously:

+ +

In the second phase, the API is mostly settled. +It does everything it needs to do, and the shape feels mostly right. +Transition to this state happens when the API maintainers feel like they nailed down everything. +However, no wide deployment has happened yet, so there might still be minor, but backwards incompatible adjustments wanting to be made. +It makes sense to use the API for all active projects (though it costs you an innovation token). +The thing basically works, you might need to adjust your code from time to time, occasionally an adjustment is not trivial, but the overall expected effort is low. +The API is fully production ready, and has everything except stability. +If you write a program on top of the API today, and try to run it ten years later, it will fail. +But if you are making your own releases a couple of times a year, you should be fine.

+

In the third phase, the API is fully stable, and no backwards-incompatible changes are expected. +Otherwise, it is identical to the second phase. +Transition to this phase happens after:

+ +

In other words, it is not unstable -> stable, it is rather:

+ +

We don't have great, catchy terms to describe the second bullet, so it gets lumped together with the first or the last one.

+
+
+ + + + + diff --git a/2023/02/21/why-SAT-is-hard.html b/2023/02/21/why-SAT-is-hard.html new file mode 100644 index 00000000..a6d2fe73 --- /dev/null +++ b/2023/02/21/why-SAT-is-hard.html @@ -0,0 +1,258 @@ + + + + + + + Why SAT Is Hard + + + + + + + + + + + + +
+ +
+ +
+
+ +

Why SAT Is Hard

+

An introductory post about complexity theory today! +It is relatively well-known that there exist so-called NP-complete problems: particularly hard problems, such that, if you solve one of them efficiently, you can solve all of them efficiently. +I think I've learned relatively early that, e.g., SAT is such a hard problem. +I've similarly learned a bunch of specific examples of equally hard problems, where solving one solves the other. +However, why SAT is at least as hard as any other NP problem remained a mystery for a rather long time to me. +It is a shame, as this fact is rather intuitive and easy to understand. +This post is my attempt at an explanation. +It assumes some familiarity with the space, but it's not going to be too technical or thorough.

+
+ +

+ Summary +

+

Let's say you are solving some search problem, like "find a path that visits every vertex in a graph once". +It is often possible to write a naive algorithm for it, where we exhaustively check every possible prospective solution:

+ +
+ + +
for every possible path:
+    if path visits every vertex once:
+        return path
+else:
+    return "no solution"
+ +
+

Although checking each specific candidate is pretty fast, the whole algorithm is exponential, because there are too many (exponentially many) candidates. +Turns out, it is possible to write the "check if solution fits" part as a SAT formula! +And, if you have a magic algorithm which solves SAT, you can use that to find a candidate solution which would work instead of enumerating all solutions!

+
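The naive search-and-check loop from the pseudocode can be written out as runnable Python. This is a small sketch for an undirected graph given as a set of vertex pairs; the function names are chosen here for illustration. The check is cheap, and the search walks all n! orderings.

```python
from itertools import permutations


def visits_every_vertex_once(path, edges):
    """The cheap check: consecutive vertices of the ordering must be connected."""
    return all(
        (a, b) in edges or (b, a) in edges
        for a, b in zip(path, path[1:])
    )


def find_path(vertices, edges):
    """The expensive search: try every ordering of the vertices."""
    for path in permutations(vertices):
        if visits_every_vertex_once(path, edges):
            return list(path)
    return None  # "no solution"


edges = {(0, 1), (1, 2), (2, 3), (0, 2)}
found = find_path([0, 1, 2, 3], edges)
missing = find_path([0, 1, 2, 3], {(0, 1), (2, 3)})
```

A SAT solver would replace the `for path in permutations(...)` loop, while `visits_every_vertex_once` is the part that gets translated into a formula.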

In other words, solving SAT removes the "search" from "search and check".

+

That's more or less everything I wanted to say today, but let's make this a tiny bit more formal.

+
+
+ +

+ Background +

+

We will be discussing algorithms and their runtime. +Big-O notation is a standard instrument for describing performance of algorithms, as it erases small differences which depend on a particular implementation of the algorithm. +Both 2N + 1000 and 100N are O(N), linear.

+

In this post we will be even less precise. +We will talk about polynomial time: an algorithm is polynomial if it is O(N^k) for some k. +For example, N^100 is polynomial, while 2^N is not.

+

We will also be thinking about Turing machines (TMs) as our implementation device. +Programming algorithms directly on Turing machines is cumbersome, but TMs have two advantages for our use case:

+
    +
  • +it's natural to define the runtime of a TM +
  • +
  • +it's easy to simulate a TM as a part of some larger algorithm (an interpreter for a TM is a small program) +
  • +
+

Finally, we will only think about problems with binary answers (decision problems). +"Is there a solution to this formula?" rather than "what is the solution to this formula?". +"Is there a path in the graph of length at least N?" rather than "what is the longest path in this graph?".

+
+
+ +

+ Definitions +

+

Intuitively, a problem is NP if it's easy to check that a solution is valid (even if finding the solution might be hard). +This intuition doesn't exactly work for the yes/no problems we are considering. +To fix this, we will also provide a "hint" for the checker. +For example, if the problem is "is there a path of length N in a given graph?" the hint will be a path.

+

A decision problem is NP, if there's an algorithm that can verify a "yes" answer in polynomial time, given a suitable hint.

+

That is, for every input where the answer is yes (and only for those inputs) there should be a hint that makes our verifying algorithm answer yes.

+

Boolean satisfiability, or SAT is a decision problem where an input is a boolean formula like

+ +
+ + +
(A and B and !C) or
+(C and D) or
+!B
+ +
+

and the answer is yes if the formula evaluates to true for some variable assignment.

+

It's easy to see that SAT is NP: the hint is a variable assignment which satisfies the formula, and the verifier evaluates the formula.

+
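For the example formula above, the hint-plus-verifier definition can be spelled out directly. A minimal Python sketch (the names `formula`, `verify` and `solve_by_search` are introduced here): the verifier is a single evaluation, while finding a hint without further insight means trying up to 2^4 assignments.

```python
from itertools import product


def formula(A, B, C, D):
    """(A and B and !C) or (C and D) or !B, as in the example above."""
    return (A and B and not C) or (C and D) or (not B)


def verify(hint):
    """Polynomial-time verifier: evaluate the formula on the hinted assignment."""
    return bool(formula(**hint))


def solve_by_search():
    """Exponential search over all assignments, with the verifier as the check."""
    for values in product([False, True], repeat=4):
        hint = dict(zip("ABCD", values))
        if verify(hint):
            return hint
    return None


hint = solve_by_search()
```

Any satisfying assignment is a valid hint; an assignment such as B true with A, C and D false is rejected by the verifier, but that alone says nothing about whether the formula is satisfiable.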
+
+ +

+ Sketch of a Proof +

+

Turns out, there is a "hardest" problem in NP: solving just that single problem in polynomial time automatically solves every other NP problem in polynomial time (we call such problems NP-complete). +Moreover, there's actually a bunch of such problems, and SAT is one of them. +Let's see why!

+

First, let's define a (somewhat artificial) problem which is trivially NP-complete.

+

Let's start with this one: "Given a Turing machine and an input for it of length N, will the machine output "yes" after N^k steps?" +(here k is a fixed parameter; pedantically, I describe a family of problems, one for each k)

+

This is very similar to the halting problem, but also much easier. +We explicitly bound the runtime of the Turing machine by a polynomial, so we don't need to worry about the "looping forever" case: that would be a "no" for us. +The naive algorithm here works: we just run the given machine on a given input for a given amount of steps and look at the answer.

+
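The "just run it with a step budget" algorithm is short enough to write down. Below is a sketch of such a bounded interpreter in Python; the rule-table encoding, the `"yes"` state name and both example machines are assumptions made for illustration, not a standard format.

```python
def run_bounded(rules, tape, max_steps, state="start"):
    """Run a Turing machine for at most max_steps steps.

    rules maps (state, symbol) to (new_symbol, move, new_state), move is -1 or +1.
    Returns True only if the machine reaches state "yes" within the budget.
    """
    cells = dict(enumerate(tape))  # sparse tape; unset cells read as "_"
    head = 0
    for _ in range(max_steps):
        if state == "yes":
            return True
        key = (state, cells.get(head, "_"))
        if key not in rules:
            return False           # the machine is stuck: that is a "no"
        cells[head], move, state = rules[key]
        head += move
    return state == "yes"          # out of budget: "no" unless we just finished


# Hypothetical machine: answers "yes" if the input contains a "1".
find_one = {
    ("start", "0"): ("0", +1, "start"),
    ("start", "1"): ("1", +1, "yes"),
}

# Hypothetical machine that never halts, bouncing the head back and forth.
looper = {
    ("start", "_"): ("_", +1, "back"),
    ("back", "_"): ("_", -1, "start"),
}

k = 2
word = "0001"
answer = run_bounded(find_one, word, len(word) ** k)
```

With the budget fixed at N^k, the non-terminating `looper` machine simply runs out of steps and produces a "no", which is why this problem is so much easier than the halting problem.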

Now, if we formulate the problem as "Is there an input I for a given Turing machine M such that M(I) answers "yes" after N^k steps?" we get our NP-complete problem. +It's trivially NP: the hint is the input that makes the machine answer "yes", and the verifier just runs our TM with this input for N^k steps. +It can also be used to efficiently solve any other NP problem (e.g. SAT). +Indeed, we can use the verifying TM as M, and that way find if there's any hint that makes it answer "yes".

+

This is a bit circular and hard to wrap one's head around, but, at the same time, trivial. +We essentially just carefully stare at the definition of an NP problem, specifically produce an algorithm that can solve any NP problem by directly using the definition, and notice that the resulting algorithm is also in NP. +Now there's no surprise that there exists a hardest NP problem: we essentially defined NP such that this is the case.

+

What is still a bit mysterious is why non-weird problems like SAT also turn out to be NP-complete. +This is because SAT is powerful enough to encode a Turing machine!

+

First, note that we can encode the state of a Turing machine as a set of boolean variables. +We'll need a boolean variable T_i for each position on the tape. +The tape is in general infinite, but all our Turing machines run for polynomial (finite) time, so they use only a finite number of cells, and it's enough to create variables only for those cells. +The position of the head can also be described by a set of boolean variables. +For example, we can have a P_i "is the head at cell i?" variable for each cell. +Similarly, we can encode the finite number of states our machine can be in as a set of S_i variables ("is the machine in state i?").

+

Second, we can write a set of boolean equations which describe a single transition of our Turing machine. +For example, the value of cell i at the second step, T2_i, will depend on its value at the previous step, T1_i, on whether the head was at i (P1_i), and on the rules of our specific states. +For example, if our machine flips bits in state 0 and keeps them in state 1, then the formula we get for each cell is

+ +
+ + +
T2_i <=>
+  (!P1_i and T1_i) # head is not on our cell, it can't change
+or (P1_i and (
+    S1_0 and !T1_i # flip case
+or  S1_1 and T1_i  # keep case
+))
+ +
+

We can write similar formulas for changes of P and S families of variables.
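To make the cell formula concrete, here is a small Rust sketch that writes it as a boolean function and checks it against a direct simulation of the same flip/keep rule. The function names are invented for this example, and only the two-state machine described above is covered.

```rust
/// Next value of tape cell `i`, transcribed from the formula for T2_i.
/// `p1`: the head is on this cell; `t1`: the current cell value;
/// `s1_0` / `s1_1`: the machine is in state 0 (flip) / state 1 (keep).
pub fn next_cell(p1: bool, t1: bool, s1_0: bool, s1_1: bool) -> bool {
    (!p1 && t1) || (p1 && ((s1_0 && !t1) || (s1_1 && t1)))
}

/// Direct simulation of the same rule, used as a cross-check.
/// `state` is 0 (flip the bit) or anything else (keep the bit).
pub fn simulate_cell(head_here: bool, cell: bool, state: u8) -> bool {
    if !head_here {
        return cell; // the head is elsewhere, so the cell can't change
    }
    match state {
        0 => !cell,
        _ => cell,
    }
}

/// Compares the formula with the simulation on every combination of inputs.
pub fn formula_matches_simulation() -> bool {
    for p1 in [false, true] {
        for t1 in [false, true] {
            for state in [0u8, 1] {
                let by_formula = next_cell(p1, t1, state == 0, state == 1);
                if by_formula != simulate_cell(p1, t1, state) {
                    return false;
                }
            }
        }
    }
    true
}
```

The exhaustive check is only feasible because a single cell depends on a handful of variables; that locality is also what keeps the per-step formula small.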

+

Third, after we wrote the transition formula for a single step, we can stack several such formulas on top of each other to get a formula for N steps.

+

Now let's come back to our universal problem: "is there an input which makes a given Turing machine answer 'yes' in N^k steps?". +At this point, it's clear that we can replace a Turing machine with N^k steps with our transition formula duplicated N^k times. +So, the question of existence of an input for a Turing machine reduces to the question of existence of a solution to a (big, but still polynomial) SAT formula.

+

And this concludes the sketch!

+
+
+ +

+ Summary, Again +

+

SAT is hard, because it allows encoding Turing machine transitions. +We can't encode loops in SAT, but we can encode N steps of a Turing machine by repeating the same formula N times with small variations. +So, if we know that a particular Turing machine runs in polynomial time, we can encode it by a polynomially-sized formula. +(see also pure meson ray-tracer for a significantly more practical application of a similar idea).

+

And that means that every problem that can be solved by a brute-force search over all solutions can be reduced to a SAT instance, by encoding the body of the search loop as a SAT formula!

+
+
+
+ + + + + diff --git a/2023/03/08/an-engine-for-an-editor.html b/2023/03/08/an-engine-for-an-editor.html new file mode 100644 index 00000000..6101e4e4 --- /dev/null +++ b/2023/03/08/an-engine-for-an-editor.html @@ -0,0 +1,204 @@ + + + + + + + An Engine For An Editor + + + + + + + + + + + + +
+ +
+ +
+
+ +

An Engine For An Editor

+

A common trope is how, if one wants to build a game, one should build a game, rather than a game engine, because it is all too easy to fall into the trap of building a generic solution, without getting to the game proper. +It seems to me that the situation with code editors is the opposite: many people build editors, but few are building "editor engines". +What's an "editor engine"? A made-up term I use to denote the thin waist the editor is built upon: the set of core concepts, entities and APIs which power the variety of the editor's components. +In this post, I will highlight Emacs' thin waist, which I think is worthy of imitation!

+

Before we get to Emacs, let's survey various APIs for building interactive programs.

+
+
Plain text
+
+

The simplest possible thing, the UNIX way of programs-filters, reading input from stdin and writing data to stdout. +The language here is just plain text.

+
+
ANSI escape sequences
+
+

Adding escape codes to plain text (and a bunch of ioctls) allows changing colors and clearing the screen. +The language becomes a sequence of commands for the terminal (with "print text" being a fairly frequent one). +This already is rich enough to power a variety of terminal applications, such as vim!

+
+
HTML
+
+

With more structure, we can disentangle ourselves from text, and say that all the stuff is made of trees of attributed elements (whose content might be text). +That turns out to be enough to express basically whatever, as the world of modern web apps testifies.

+
+
Canvas
+
+

Finally, to achieve maximal flexibility, we can start with a clean 2d canvas with pixels and an event stream, and let the app draw however it likes. +Desktop GUIs usually work that way (using some particular widget library to encapsulate common patterns of presentation and event handling).

+
+
+
+

Emacs is different. +Its thin waist consists of (using idiosyncratic olden editor terminology) frames, windows, buffers and attributed text. +This is less general than canvas or HTML, but more general (and way more principled) than ANSI escapes. +Crucially, this also retains most of plain text's composability.

+

The foundation is a text with attributes: a pair of a string and a map from the string's subranges to key-value dictionaries. +Attributes express presentation (color, font, text decoration), but also semantics. +A range of text can be designated as clickable. +Or it can specify a custom keymap, which is only active when the cursor is on this range.
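A rough Rust sketch of that data model might look as follows. This is not how Emacs implements text properties; `AttributedText`, `set`, and `get` are hypothetical names, and the linear scan over spans is chosen for brevity rather than speed.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Key-value attributes attached to a range of text.
pub type Attrs = HashMap<String, String>;

/// A string plus a map from byte subranges to attribute dictionaries.
pub struct AttributedText {
    pub text: String,
    pub spans: Vec<(Range<usize>, Attrs)>,
}

impl AttributedText {
    pub fn new(text: &str) -> Self {
        AttributedText { text: text.to_string(), spans: Vec::new() }
    }

    /// Sets `key = value` on `range`, reusing an existing span with the
    /// exact same range if there is one.
    pub fn set(&mut self, range: Range<usize>, key: &str, value: &str) {
        assert!(range.start <= range.end && range.end <= self.text.len());
        match self.spans.iter().position(|(r, _)| *r == range) {
            Some(i) => {
                self.spans[i].1.insert(key.to_string(), value.to_string());
            }
            None => {
                let mut attrs = Attrs::new();
                attrs.insert(key.to_string(), value.to_string());
                self.spans.push((range, attrs));
            }
        }
    }

    /// Looks up `key` at a byte offset; spans added later take precedence.
    pub fn get(&self, offset: usize, key: &str) -> Option<&str> {
        self.spans
            .iter()
            .rev()
            .filter(|(r, _)| r.contains(&offset))
            .find_map(|(_, attrs)| attrs.get(key).map(|v| v.as_str()))
    }
}
```

With this shape, "clickable" or "keymap" is just another key in the dictionary, so presentation and semantics travel together with the text.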

+

I find this to be a sweet spot for building efficient user interfaces. +Consider magit:

+ +
+ + +
+

The interface is built from text, but it is more discoverable, more readable, and more efficient than GUI solutions.

+

Text is surprisingly good at communicating with humans! +Forgoing arbitrary widgets and restricting oneself to a grid of characters greatly constrains the set of possible designs, but designs which come out of these constraints tend to be better.

+
+

The rest (buffers, windows, and frames) serve to present attributed strings to the user. +A Buffer holds a piece of text and stores the position of the cursor (and the rest of the editor's state for this particular piece of text). +A tiling window manager displays buffers:

+ +

There's also a tasteful selection of extras outside this orthogonal model. +A buffer holds a status bar at the bottom and a set of fringe decorations at the left edge. +Each floating window has a minibuffer, an area to type commands into (the minibuffer is a buffer though; only its presentation is slightly unusual).

+

But the vast majority of everything else is not special: every significant thing is a buffer. +So, a ./main.rs file, a ./src file tree, and a terminal session where you type cargo build are all displayed as attributed text. +All use the same tools for navigation and manipulation.

+

Universality is the power of the model. +Good old UNIX pipes, except interactive. +With a GUI file manager, mass-renaming files requires a dedicated utility. +In Emacs, the file manager's state is text, so you can use standard text-manipulation tools (regexes, multiple cursors, vim's .) for the same task.

+
+ +

+ Conclusions +

+

Pay more attention to the editor's thin waist. +Don't take it as a given that an editor should be a terminal, HTML, or GUI app: there might be a better vocabulary. +In particular, Emacs seems to hit the sweet spot with its language of attributed strings and buffers.

+

I am not sure that Emacs is the best we can do, but having a Rust library which implements Emacs' model more or less as is would be nice! +The two best resources to learn about this model are

+ +
+
+
+ + + + + diff --git a/2023/03/26/zig-and-rust.html b/2023/03/26/zig-and-rust.html new file mode 100644 index 00000000..e8332544 --- /dev/null +++ b/2023/03/26/zig-and-rust.html @@ -0,0 +1,361 @@ + + + + + + + Zig And Rust + + + + + + + + + + + + +
+ +
+ +
+
+ +

Zig And Rust

+

This post will be a bit all over the place. +Several months ago, I wrote Hard Mode Rust, exploring an allocation-conscious style of programming. +In the ensuing discussion, @jamii name-dropped TigerBeetle, a reliable, distributed, fast, and small database written in Zig in a similar style, and, well, I now find myself writing Zig full-time, after more than seven years of Rust. +This post is a hand-wavy answer to the "why?" question. +It is emphatically not a balanced and thorough comparison of the two languages. +I haven't yet written my 100k lines of Zig to do that. +(if you are looking for a more general "what the heck is Zig", I can recommend @jamii's post). +In fact, this post is going to be less about languages, and more about styles of writing software (but pre-existing knowledge of Rust and Zig would be very helpful). +Without further caveats, let's get started.

+
+ +

+ Reliable Software +

+

To the first approximation, we all strive to write bug-free programs. +But I think a closer look reveals that we don't actually care about programs being correct 100% of the time, at least in the majority of the domains. +Empirically, almost every program has bugs, and yet it somehow works out OK. +To pick one specific example, most programs use the stack, but almost no programs understand what their stack usage is exactly, and how far they can go. +When we call malloc, we just hope that we have enough stack space for it; we almost never check. +Similarly, all Rust programs abort on OOM, and can't state their memory requirements up-front. +Certainly good enough, but not perfect.

+

The second approximation is that we strive to balance program usefulness with the effort to develop the program. +Bugs reduce usefulness a lot, and there are two styles of software engineering to deal with them:

+

Erlang style, where we embrace failability of both hardware and software and explicitly design programs to be resilient to partial faults.

+

SQLite style, where we overcome an unreliable environment at the cost of rigorous engineering.

+

rust-analyzer and TigerBeetle are perfect specimens of the two approaches; let me describe them.

+
+
+ +

+ rust-analyzer +

+

rust-analyzer is an LSP server for the Rust programming language. +By its nature, it's expansive. +Great developer tools usually have a feature for every niche use-case. +It also is a fast-moving open source project which has to play catch-up with the rustc compiler. +Finally, the nature of IDE dev tooling makes availability significantly more important than correctness. +An erroneous completion option would cause a smirk (if it is noticed at all), while the server crashing and all syntax highlighting turning off will be noticed immediately.

+

For this cluster of reasons, rust-analyzer is shifted far towards the "embrace software imperfections" side of the spectrum. +rust-analyzer is designed around having bugs. +All the various features are carefully compartmentalized at runtime, such that panicking code in just a single feature can't bring down the whole process. +Critically, almost no code has access to any mutable state, so usage of catch_unwind can't lead to a rotten state.
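The compartmentalization idea can be sketched with `std::panic::catch_unwind`. This is an illustration, not rust-analyzer's actual code: `run_feature` and `first_word_len` are made-up names, and the sketch assumes the default `panic = "unwind"` setting.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Runs one feature over an immutable snapshot of the input.
/// A panic inside the feature is turned into `None` instead of
/// taking the whole process down.
pub fn run_feature<T>(snapshot: &str, feature: impl FnOnce(&str) -> T) -> Option<T> {
    // The feature only gets a shared reference, so a panic halfway through
    // can't leave shared state half-updated; that is what makes
    // `AssertUnwindSafe` a reasonable claim here.
    panic::catch_unwind(AssertUnwindSafe(|| feature(snapshot))).ok()
}

/// A hypothetical feature with a bug: it panics on input without words.
pub fn first_word_len(text: &str) -> usize {
    text.split_whitespace().next().expect("no words").len()
}
```

A caller would treat `None` as "this feature produced nothing this time" and keep serving other requests. The default panic hook still prints the panic message to stderr.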

+

The development process itself is informed by this calculus. +For example, PRs with new features land when there's a reasonable certainty that the happy case works correctly. +If some weird incomplete code would cause the feature to crash, that's OK. +It might even be a benefit: fixing a well-reproducible bug in an isolated feature is a gateway drug to heavy contribution to rust-analyzer. +Our tight weekly release schedule (and the nightly release) help to get bug fixes out there faster.

+

Overall, the philosophy is to maximize provided value by focusing on the common case. +Edge cases become eventually correct over time.

+
+
+ +

+ TigerBeetle +

+

TigerBeetle is the opposite of that.

+

It is a database, with domain model fixed at compile time (we currently do double-entry bookkeeping). +The database is distributed, meaning that there are six TigerBeetle replicas running on different geographically and operationally isolated machines, which together implement a replicated state machine. +That is, TigerBeetle replicas exchange messages to make sure every replica processes the same set of transactions, in the same order. +That's a surprisingly hard problem if you allow machines to fail (the whole point of using many machines for redundancy), so we use a smart consensus algorithm (non-byzantine) for this. +Traditionally, consensus algorithms assume reliable storage: data once written to disk can always be retrieved later. +In reality, storage is unreliable, nearly byzantine: a disk can return bogus data without signaling an error, and even a single such error can break consensus. +TigerBeetle combats that by allowing a replica to repair its local storage using data from other replicas.

+

On the engineering side of things, we are building a reliable, predictable system. +And predictable means really predictable. +Rather than reining in sources of non-determinism, we build the whole system from the ground up from a set of fully deterministic, hand crafted components. +Here are some of our unconventional choices (design doc):

+

It's hard mode! +We allocate all the memory at startup, and there's zero allocation after that. +This removes all the uncertainty about allocation.

+

The code is architected with brutal simplicity. +As a single example, we don't use JSON, or ProtoBuf, or Cap'n'Proto for serialization. +Rather, we just cast the bytes we received from the network to a desired type. +The motivation here is not so much performance, as reduction of the number of moving parts. +Parsing is hard, but, if you control both sides of the communication channel, you don't need to do it, you can send checksummed data as is.

+

We aggressively minimize all dependencies. +We know exactly the system calls our system is making, because all IO is our own code (on Linux, our main production platform, we don't link libc).

+

There's little abstraction between components: all parts of TigerBeetle work in concert. +For example, one of our core types, Message, is used throughout the stack:

+
    +
  • +network receives bytes from a TCP connection directly into a Message +
  • +
  • +consensus processes and sends Messages +
  • +
  • +similarly, storage writes Messages to disk +
  • +
+

This naturally leads to very simple and fast code. +We don't need to do anything special to be zero copy: given that we allocate everything up-front, we simply don't have any extra memory to copy the data to! +(A separate issue is that, arguably, you just can't treat storage as a separate black box in a fault-tolerant distributed system, because storage is also faulty).

+

Everything in TigerBeetle has an explicit upper-bound. +There's not a thing which is just a u32: all data is checked to meet specific numeric limits at the edges of the system.

+

This includes Messages. +We just upper-bound how many messages can be in-memory at the same time, and allocate precisely that amount of messages (source). +Getting a new message from the message pool can't allocate and can't fail.
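As an illustration of that idea (TigerBeetle itself is written in Zig, and this is not its message pool), a fixed-size pool could be sketched in Rust like this. `MessagePool`, `MESSAGE_SIZE`, and the method names are assumptions made for the example.

```rust
/// Hypothetical fixed message size, chosen only for this sketch.
pub const MESSAGE_SIZE: usize = 64;

/// A pool whose memory is allocated exactly once, in `new`.
pub struct MessagePool {
    messages: Vec<[u8; MESSAGE_SIZE]>,
    free: Vec<usize>, // indices of messages that are not in use
}

impl MessagePool {
    /// The only place that allocates: `count` is the precomputed upper bound.
    pub fn new(count: usize) -> Self {
        MessagePool {
            messages: vec![[0; MESSAGE_SIZE]; count],
            free: (0..count).collect(),
        }
    }

    /// Hands out a free message by index. This only pops from a pre-sized
    /// vector, so it never allocates. Running out would mean the upper
    /// bound was computed incorrectly, which is a bug, hence the panic.
    pub fn acquire(&mut self) -> usize {
        self.free.pop().expect("message pool exhausted: upper bound violated")
    }

    /// Returns a message to the pool. `free` never grows past its initial
    /// length, so this push does not reallocate.
    pub fn release(&mut self, id: usize) {
        debug_assert!(!self.free.contains(&id), "double release");
        self.free.push(id);
    }

    pub fn message_mut(&mut self, id: usize) -> &mut [u8; MESSAGE_SIZE] {
        &mut self.messages[id]
    }

    pub fn free_count(&self) -> usize {
        self.free.len()
    }
}
```

The design choice worth noticing is that exhaustion is treated as a programming error rather than a runtime condition to handle: the limit is established once, up front, and everything downstream relies on it.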

+

With all that strictness and explicitness about resources, of course we also fully externalize any IO, including time. +All inputs are passed in explicitly, there are no ambient influences from the environment. +And that means that the bulk of our testing consists of trying all possible permutations of effects of the environment. +Deterministic randomized simulation is very effective at uncovering issues in real implementations of distributed systems.
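A toy Rust sketch of the approach is below. Everything in it is hypothetical (the `Replica` type, the timeout of 10 ticks, the one-in-four delivery rate); the point is only that time and randomness enter as explicit inputs, so a seed fully determines a run.

```rust
/// Tiny deterministic PRNG (xorshift64), so the sketch needs no crates.
pub struct Prng(u64);

impl Prng {
    pub fn new(seed: u64) -> Self {
        Prng(seed.max(1)) // xorshift state must be non-zero
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical component that counts missed-heartbeat timeouts.
/// It never reads a clock: `now` is always passed in by the caller.
pub struct Replica {
    last_heartbeat: u64,
    pub timeouts: u32,
}

impl Replica {
    pub const TIMEOUT: u64 = 10;

    pub fn new() -> Self {
        Replica { last_heartbeat: 0, timeouts: 0 }
    }

    pub fn on_heartbeat(&mut self, now: u64) {
        self.last_heartbeat = now;
    }

    pub fn tick(&mut self, now: u64) {
        if now.saturating_sub(self.last_heartbeat) > Self::TIMEOUT {
            self.timeouts += 1;
            self.last_heartbeat = now;
        }
    }
}

/// Runs one simulated schedule and returns the number of timeouts seen.
pub fn simulate(seed: u64, ticks: u64) -> u32 {
    let mut prng = Prng::new(seed);
    let mut replica = Replica::new();
    for now in 1..=ticks {
        // The simulated network delivers a heartbeat on roughly one tick in four.
        if prng.next_u64() % 4 == 0 {
            replica.on_heartbeat(now);
        }
        replica.tick(now);
    }
    replica.timeouts
}
```

Because the same seed always replays the same schedule, a failure found by a random run can be reproduced exactly by rerunning with that seed.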

+

What I am getting at is that TigerBeetle isn't really a normal "program" program. +It strictly is a finite state machine, explicitly coded as such.

+
+
+ +

+ Back From The Weeds +

+

Oh, right, Rust and Zig, the topic of the post!

+

I find myself often returning to the first Rust slide deck. +A lot of core things are different (no longer "Rust uses only the old ideas"), but a lot is the same. +To be a bit snarky, while Rust "is not for lone genius hackers", Zig kinda is. +On more peaceable terms, while Rust is a language for building modular software, Zig is in some sense anti-modular.

+

It's appropriate to quote Bryan Cantrill here:

+ +
+

I can write C that frees memory properly...that basically doesn't suffer from +memory corruption...I can do that, because I'm controlling heaven and earth in +my software. It makes it very hard to compose software. Because even if you and +I both know how to write memory safe C, it's very hard for us to have an +interface boundary where we can agree about who does what.

+
+ +
+

That's the core of what Rust is doing: it provides you with a language to precisely express the contracts between components, such that components can be integrated in a machine-checkable way.

+

Zig doesn't do that. It isn't even memory safe. My first experience writing a non-trivial Zig program went like this:

+ +
+

ME: Oh wow! Do you mean I can finally just store a pointer to a struct's field in the struct itself?

+

30 seconds later

+

PROGRAM: Segmentation fault.

+
+ +
+

However!
+Zig is a much smaller language than Rust. +Although you'll have to be able to keep the entirety of the program in your head, to control heaven and earth to not mess up resource management, doing that could be easier.

+

It's not true that rewriting a Rust program in Zig would make it simpler. +On the contrary, I expect the result to be significantly more complex (and segfaulty). +I noticed that a lot of Zig code written in "let's replace RAII with defer" style has resource-management bugs.

+

But it often is possible to architect the software such that there's little resource management to do (e.g., allocating everything up-front, like TigerBeetle, or even at compile time, like many smaller embedded systems). +It's hard: simplicity is always hard. +But, if you go this way, I feel like Zig can provide substantial benefits.

+

Zig has just a single feature, dynamically-typed comptime, which subsumes most of the special-cased Rust machinery. +It is definitely a tradeoff: instantiation-time errors are much worse for complex cases. +But a lot more of the cases are simple, because there's no need for programming in the language of types. +Zig is very spartan when it comes to the language. +There are no closures: if you want them, you'll have to pack a wide-pointer yourself. +Zig's expressiveness is aimed at producing just the right assembly, not at allowing maximally concise and abstract source code. +In the words of Andrew Kelley, Zig is a DSL for emitting machine code.

+

Zig strongly prefers explicit resource management. +A lot of Rust programs are web-servers. +Most web servers have a very specific execution pattern of processing multiple independent short-lived requests concurrently. +The most natural way to code this would be to give each request a dedicated bump allocator, which turns drops into no-ops and frees the memory in bulk after each request by resetting the offset to zero. +This would be pretty efficient, and would provide per-request memory profiling and limiting out of the box. +I don't think any popular Rust frameworks do this: using the global allocator is convenient enough and creates a strong local optimum. +Zig forces you to pass the allocator in, so you might as well think about the most appropriate one!
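The per-request bump allocator can be sketched in Rust as follows. This is a deliberately simplified illustration with made-up names (`Bump`, `alloc`, `reset`): it hands out plain byte slices, ignores alignment, and, because of the `&mut` borrow, lets only one allocation be alive at a time, which a real arena would not require.

```rust
/// A bump arena over a buffer that is allocated once, up front.
pub struct Bump {
    buf: Vec<u8>,
    offset: usize, // everything below `offset` is in use
}

impl Bump {
    /// `cap` doubles as the per-request memory limit.
    pub fn with_capacity(cap: usize) -> Self {
        Bump { buf: vec![0; cap], offset: 0 }
    }

    /// Reserves `len` bytes by moving the offset forward.
    /// Returns `None` when the request would exceed its memory limit.
    pub fn alloc(&mut self, len: usize) -> Option<&mut [u8]> {
        let end = self.offset.checked_add(len)?;
        if end > self.buf.len() {
            return None;
        }
        let start = self.offset;
        self.offset = end;
        Some(&mut self.buf[start..end])
    }

    /// Bytes used so far: per-request memory profiling for free.
    pub fn used(&self) -> usize {
        self.offset
    }

    /// Frees everything at once by resetting the offset to zero.
    pub fn reset(&mut self) {
        self.offset = 0;
    }
}
```

A server built this way would call `reset` after finishing each request, so individual frees never happen and the cost of cleanup is a single store.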

+

Similarly, the standard library is very conscious about allocation, more so than Rust's. +Collections are not parametrized by an allocator, like in C++ or (future) Rust. +Rather, an allocator is passed in explicitly to every method which actually needs to allocate. +This is Call Site Dependency Injection, and it is more flexible. +For example, in TigerBeetle we need a couple of hash maps. +These maps are sized at startup time to hold just the right number of elements, and are never resized. +So we pass an allocator to the init method, but we don't pass it to the event loop. +We get to both use the standard hash-map, and to feel confident that there's no way we can allocate in the actual event loop, because it doesn't have access to an allocator.

+
+
+ +

+ Wishlist +

+

Finally, my wishlist for Zig.

+

First, I think Zig's strength lies strictly in the realm of writing "perfect" systems software. +It is a relatively thin slice of the market, but it is important. +One of the problems with Rust is that we don't have a reliability-oriented high-level programming language with a good quality of implementation ("modern ML", if you will). +This is a blessing for Rust, because it makes its niche bigger, increasing the amount of community momentum behind the language. +This is also a curse, because a bigger niche makes it harder to maintain focus. +For Zig, Rust already plays this role of "modern ML", which creates bigger pressure to specialize.

+

Second, my biggest worry about Zig is its semantics around the "aliasing, provenance, mutability and self-reference" ball of problems. +I don't worry all that much about this creating "iterator invalidation" style of UB. +TigerBeetle runs in -DReleaseSafe, which mostly solves spatial memory safety, it doesn't really do dynamic memory allocation, which unasks the question about temporal memory safety, +and it has a very thorough fuzzer-driven test suite, which squashes the remaining bugs. +I do worry about the semantics of the language itself. +My current understanding is that, to correctly compile a C-like low-level language, one really needs to nail down semantics of pointers. +I am not sure "portable assembly" is really a thing: it is possible to create a compiler which does little optimization and "works as expected" most of the time, but I am doubtful that it's possible to correctly describe the behavior of such a compiler. +If you start asking questions about what are pointers, and what is memory, you end up in a fairly complicated land, where bytes are poison. +Rust tries to define that precisely, but writing code which abides by the Rust rules without a borrow-checker isn't really possible: the rules are too subtle. +Zig's implementation today is very fuzzy around potentially aliased pointers, copies of structs with interior-pointers and the like. +I wish that Zig had a clear answer to what the desired semantics is.

+

Third, IDE support. +I've written about that before on this blog. +As of today, developing Zig is quite pleasant: the language server is pretty spartan, but already is quite helpful, and for the rest, Zig is exceptionally greppable. +But, with the lazy compilation model and the absence of out-of-the-language meta programming, I feel like Zig could be more ambitious here. +To position itself well for the future in terms of IDE support, I think it would be nice if the compiler gets the basic data model for IDE use-case. +That is, there should be an API to create a persistent analyzer process, which ingests a stream of code edits, and produces a continuously updated model of the code without explicit compilation requests. +The model can be very simple, just "give me an AST of this file at this point in time" would do; all the fancy IDE features can be filled in later. +What matters is the shape of data flow through the compiler: not an edit-compile cycle, but rather a continuously updated view of the world.

+

Fourth, one of the values of Zig which resonates with me a lot is a preference for low-dependency, self-contained processes. +Ideally, you get yourself a ./zig binary, and go from there. +The preference, at this time of changes, is to bundle a particular version of ./zig with a project, instead of using a system-wide zig. +There are two aspects that could be better.

+

Getting yourself a Zig is a finicky problem, because it requires bootstrapping. +To do that, you need to run some code that will download the binary for your platform, but each platform has its own way to run code. +I wish that Zig provided a blessed set of scripts, get_zig.sh, get_zig.bat, etc (or maybe a small actually portable binary?), which projects could just vendor, so that the contribution experience becomes fully project-local and self-contained:

+ +
+ + +
$ ./get_zig.sh
+$ ./zig build
+ +
+

Once you have ./zig, you can use that to drive the rest of the automation. +You already can ./zig build to drive the build, but there's more to software than just building. +There's always a long tail of small things which traditionally get solved with a pile of platform-dependent bash scripts. +I wish that Zig pushed the users harder towards specifying all that automation in Zig. +A picture is worth a thousand words, so

+ +
+ + +
# BAD: dependency on the OS
+$ ./scripts/deploy.sh --port 92
+
+# OK: no dependency, but a mouthful to type
+$ ./zig build task -- deploy --port 92
+
+# Would be GREAT:
+$ ./zig do deploy --port 92
+ +
+

Attempting to summarize,

+
    +
  • +Rust is about compositional safety, it's a more scalable language than Scala. +
  • +
  • +Zig is about perfection. +It is a very sharp, dangerous, but, ultimately, more flexible tool. +
  • +
+

Discussion on /r/Zig and /r/rust.

+
+
+
+ + + + + diff --git a/2023/03/28/rust-is-a-scalable-language.html b/2023/03/28/rust-is-a-scalable-language.html new file mode 100644 index 00000000..efc7712c --- /dev/null +++ b/2023/03/28/rust-is-a-scalable-language.html @@ -0,0 +1,156 @@ + + + + + + + Rust Is a Scalable Language + + + + + + + + + + + + +
+ +
+ +
+
+ +

Rust Is a Scalable Language

+

In my last post about Zig and Rust, I mentioned that Rust is a scalable language. +Let me expand on this a bit.

+
+ +

+ Vertical Scalability +

+

Rust is vertically scalable, in that you can write all kinds of software in it. +You can write an advanced zero-alloc image compression library, build a web server exposing the library to the world as an HTTP SAAS, and cobble together a "script" for building, testing, and deploying it to wherever people deploy software these days. +And you would only need Rust: while it excels in the lowest half of the stack, it's pretty ok everywhere else too.

+
+
+ +

+ Horizontal Scalability +

+

Rust is horizontally scalable, in that you can easily parallelize development of large software artifacts across many people and teams. +Rust itself moves with a breakneck speed, which is surprising for such a loosely coordinated and chronically understaffed open source project of this scale. +The relatively small community managed to put together a comprehensive ecosystem of composable high-quality crates on short notice. +Rust is so easy to compose reliably that even the stdlib itself does not shy from pulling dependencies from crates.io.

+

Steve Klabnik wrote about Rust's Golden Rule, +how function signatures are mandatory and authoritative and explicitly define the interface both for the callers of the function and for the function's body. +This thinking extends to other parts of the language.

+

My second most favorite feature of Rust (after safety) is its module system. +It has first-class support for the concept of a library. +A library is called a crate and is a tree of modules, a unit of compilation, and a principal visibility boundary. +Modules can contain circular dependencies, but libraries always form a directed acyclic graph. +There's no global namespace of symbols: libraries are anonymous, names only appear on dependency edges between two libraries, and are local to the downstream crate.

+

The benefits of this core compilation model are then greatly amplified by Cargo, which is not a generalized task runner, but rather a rigid specification for what is a package of Rust code:

+
    +
  • +a (library) crate, +
  • +
  • +a manifest, which defines dependencies between packages in a declarative way, using semver, +
  • +
  • +an ecosystem-wide agreement on the semantics of dependency specification, and an accompanying dependency resolution algorithm. +
  • +
+

Crucially, there's absolutely no way in Cargo to control the actual build process. +The build.rs file can be used to provide extra runtime inputs, but it's cargo who calls rustc.

+

Again, Cargo defines a rigid interface for a reusable piece of Rust code. +Both producers and consumers must abide by these rules, there is no way around them. +As a reward, they get a super-power of working together by working apart. +I don't need to ping dtolnay in Slack when I want to use serde-json because we implicitly pre-agreed to a shared golden rule.

+
+
+
+ + + + + diff --git a/2023/04/02/ub-might-be-the-wrong-term-for-newer-languages.html b/2023/04/02/ub-might-be-the-wrong-term-for-newer-languages.html new file mode 100644 index 00000000..9a98a635 --- /dev/null +++ b/2023/04/02/ub-might-be-the-wrong-term-for-newer-languages.html @@ -0,0 +1,132 @@ + + + + + + + UB Might Be a Wrong Term for Newer Languages + + + + + + + + + + + + +
+ +
+ +
+
+ +

UB Might Be a Wrong Term for Newer Languages

+

A short note on undefined behavior, which assumes familiarity with the subject (see this article for the introduction). +The TL;DR is that I think that carrying the wording from the C standard into newer languages, like Zig and Rust, might be a mistake. +This is strictly the word choice, the "lexical syntax of the comments" argument.

+

The C standard leaves many behaviors undefined. +However, it allows any particular implementation to fill in the gaps and define some of undefined-in-the-standard behaviors. +For example, C23 makes realloc(ptr, 0) into an undefined behavior, so that POSIX can further refine it without interfering with the standard (source).

+

It's also valid for an implementation to leave UB undefined. +If a program compiled with this implementation hits this UB path, the behavior of the program as a whole is undefined +(or rather, bounded by the execution environment. It is not actually possible to summon nasal daemons, because a user-space process can not escape its memory space other than by calling syscalls, and there are no nasal daemons summoning syscalls).

+

C implementations are not required to but may define behaviors left undefined by the standard. +A C program written for a specific implementation may rely on undefined-in-the-standard but defined-in-the-implementation behavior.

+

Modern languages like Rust and Zig re-use the "undefined behavior" term. +However, the intended semantics is subtly different. +A program exhibiting UB is always considered invalid. +Even if an alternative implementation of Rust defines some of Rust's UB, the programs hitting those behaviors would still be incorrect.

+

For this reason, I think it would be better to use a different term here. +I am not ready to suggest a specific wording, but a couple of reasonable options would be "non-trapping programming error" or "invalid behavior". +The intended semantics being that any program execution containing illegal behavior is invalid under any implementation.

+

Curiously, C++ is ahead of the pack here, as it has an explicit notion of ill-formed, no diagnostic required.

+

Update: I've since learned that Zig is updating its terminology. +The new term is "illegal behavior". +This is perfect: "illegal" has just the right connotation of being explicitly declared incorrect by a written specification.

+
+
+ + + + + diff --git a/2023/04/09/can-you-trust-a-compiler-to-optimize-your-code.html b/2023/04/09/can-you-trust-a-compiler-to-optimize-your-code.html new file mode 100644 index 00000000..2d8e142e --- /dev/null +++ b/2023/04/09/can-you-trust-a-compiler-to-optimize-your-code.html @@ -0,0 +1,557 @@ + + + + + + + Can You Trust a Compiler to Optimize Your Code? + + + + + + + + + + + + +
+ +
+ +
+
+ +

Can You Trust a Compiler to Optimize Your Code?

+

More or less the title this time, but first, a story about SIMD. There are three +levels of understanding how SIMD works (well, at least I am level 3 at the moment):

+
  1. Compilers are smart! They will auto-vectorize all the code!

  2. Compilers are dumb, auto-vectorization is fragile, it's very easy to break it +by unrelated changes to the code. It's always better to manually write +explicit SIMD instructions.

  3. Writing SIMD by hand is really hard: you'll need to re-do the work for +every different CPU architecture. Also, you probably think that, for scalar +code, a compiler writes better assembly than you. What makes you think that +you'd beat the compiler at SIMD, where there are more funky instructions and +constraints? Compilers are tools. They can reliably vectorize code if it is +written in an amenable-to-vectorization form.
+

I've recently moved from the second level to the third one, and that made me aware of the moment when the model used by a compiler for optimization clicked in my head. +In this post, I want to explain the general framework for reasoning about compiler optimizations for static languages such as Rust or C++. +After that, I'll apply that framework to auto-vectorization.

+

I haven't worked on backends of production optimizing compilers, so the following will not be academically correct, but these models are definitely helpful, at least to me!

+
+ +

+ Seeing Like a Compiler +

+

The first bit of a puzzle is understanding how a compiler views code. Some useful references here include +The SSA Book or LLVM's +Language Reference.

+

Another interesting choice would be the WebAssembly Specification. +While WASM would be a poor IR for an optimizing compiler, it has a lot of structural similarities, and the core spec is exceptionally readable.

+

A unit of optimization is a function. +Let's take a simple function like the following:

+ +
+ + +
fn sum(xs: &[i32]) -> i32 {
+  let mut total = 0;
+  for i in 0..xs.len() {
+    total = total.wrapping_add(xs[i]);
+  }
+  total
+}
+ +
+

In some pseudo-IR, it would look like this:

+ +
+ + +
fn sum return i32 {
+  param xs_ptr: ptr
+  param xs_len: size
+
+  local total: i32 = 0
+  local i: size = 0
+  local x: i32
+
+loop:
+  branch_if i >= xs_len :ret
+  load x base=xs_ptr offset=i
+  add total x
+  add i 1
+  goto :loop
+
+ret:
+  return total
+}
+ +
+

The most important characteristic here is that there are two kinds of entities:

+

First, there is program memory, very roughly an array of bytes. +Compilers generally can not reason about the contents of the memory very well, because it is shared by all the functions, and different functions might interpret the contents of the memory differently.

+

Second, there are local variables. +Local variables are not bytes: they are integers, and they obey mathematical properties which a compiler can reason about.

+

For example, if a compiler sees a loop like

+ +
+ + +
param n: u32
+local i: u32 = 0
+local total: u32 = 0
+local tmp
+
+loop:
+  branch_if i >= n :ret
+  set tmp i
+  mul tmp 4
+  add total tmp
+  add i 1
+  goto :loop
+
+ret:
+  return total
+ +
+

It can reason that on each iteration tmp holds i * 4 and optimize the code to

+ +
+ + +
param n: u32
+local i: u32 = 0
+local total: u32 = 0
+local tmp = 0
+
+loop:
+  branch_if i >= n :ret
+  add total tmp
+  add tmp 4  # replace multiplication with addition
+  add i 1
+  goto :loop
+
+ret:
+  return total
+ +
+

This works, because all locals are just numbers. +If we did the same computation, but all numbers were located in memory, it would be significantly harder for a compiler to reason that the transformation is actually correct. +What if the storage for n and total actually overlaps? +What if tmp overlaps with something which isn't even in the current function?
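To make the transformation concrete, here is a minimal Rust sketch of the two loops from the pseudo-IR above. The function names `sum_mul` and `sum_add` are made up for illustration, and wrapping arithmetic is an assumption chosen so that both versions are defined for every `n`.

```rust
/// Straightforward version: recompute `i * 4` on every iteration.
fn sum_mul(n: u32) -> u32 {
    let mut total: u32 = 0;
    let mut i: u32 = 0;
    while i < n {
        let tmp = i.wrapping_mul(4);
        total = total.wrapping_add(tmp);
        i += 1;
    }
    total
}

/// Strength-reduced version: `tmp` is advanced by 4 instead of being
/// recomputed, so the multiplication disappears from the loop body.
fn sum_add(n: u32) -> u32 {
    let mut total: u32 = 0;
    let mut tmp: u32 = 0;
    let mut i: u32 = 0;
    while i < n {
        total = total.wrapping_add(tmp);
        tmp = tmp.wrapping_add(4);
        i += 1;
    }
    total
}
```

Because `total`, `tmp` and `i` are all locals, nothing else can observe or modify them, which is what lets a compiler treat the two functions as interchangeable.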

+

However, there's a bridge between the world of mathematical local variables and the world of memory bytes: load and store instructions. +The load instruction takes a range of bytes in memory, interprets the bytes as an integer, and stores that integer into a local variable. +The store instruction does the opposite. +By loading something from memory into a local, a compiler gains the ability to reason about it precisely. +Thus, the compiler doesn't need to track the general contents of memory. +It only needs to check that it would be correct to load from memory at a specific point in time.

+

So, a compiler really doesn't see all that well: it can only really reason about a single function at a time, and only about the local variables in that function.

+
+
+ +

+ Bringing Code Closer to Compiler's Nose +

+

Compilers are myopic. +This can be fixed by giving more context to the compiler, which is the task of two core optimizations.

+

The first core optimization is inlining. +It substitutes the callee's body for a specific call. +The benefit here is not that we eliminate function call overhead; that's relatively minor. +The big thing is that locals of both the caller and the callee are now in the same frame, and a compiler can optimize them together.

+

Let's look again at that Rust code:

+ +
+ + +
fn sum(xs: &[i32]) -> i32 {
+  let mut total = 0;
+  for i in 0..xs.len() {
+    total = total.wrapping_add(xs[i]);
+  }
+  total
+}
+ +
+

The xs[i] expression there is actually a function call. +The indexing function does a bounds check before accessing the element of an array. +After inlining it into sum, the compiler can see that the bounds check is dead code and eliminate it.
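As a rough illustration, this is what the loop might look like right after the indexing call is inlined and before any cleanup. It is a hand-written sketch, not actual compiler output; `sum_inlined` is a hypothetical name, and the use of `get_unchecked` stands in for the raw load.

```rust
/// `sum` with the bounds check of `xs[i]` spelled out by hand.
fn sum_inlined(xs: &[i32]) -> i32 {
    let len = xs.len();
    let mut total = 0i32;
    let mut i = 0;
    while i < len {
        // Inlined bounds check. The loop condition already proves
        // `i < len`, so this branch can never be taken: dead code.
        if i >= len {
            panic!("index out of bounds");
        }
        // SAFETY: `i < len` holds at this point.
        let x = unsafe { *xs.get_unchecked(i) };
        total = total.wrapping_add(x);
        i += 1;
    }
    total
}
```

Once both `i < len` and `i >= len` are visible in the same function body, removing the second check is a purely local deduction about two integer locals.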

+

If you look at various standard optimizations, they often look like getting rid of dumb things, which no one would actually write in the first place, so it's not immediately clear whether it is worth implementing such optimizations. +But the thing is, after inlining a lot of dumb things appear, because functions tend to handle the general case, and, at a specific call-site, there are usually enough constraints to dismiss many edge cases.

+

The second core optimization is scalar replacement of aggregates. +It is a generalization of the "let's use load to avoid reasoning about memory and reason about a local instead" idea we've already seen.

+

If you have a function like

+ +
+ + +
fn permute(xs: &mut Vec<i32>) {
+  ...
+}
+ +
+

it's pretty difficult for the compiler to reason about it. +It receives a pointer to some memory which holds a complex struct (ptr, len, capacity triple), so reasoning about evolution of this struct is hard. +What the compiler can do is to load this struct from memory, replacing the aggregate with a bunch of scalar local variables:

+ +
+ + +
fn permute(xs: &mut Vec<i32>) {
+  local ptr: ptr
+  local len: usize
+  local cap: usize
+
+  load ptr xs.ptr
+  load len xs.len
+  load cap xs.cap
+
+  ...
+
+  store xs.ptr ptr
+  store xs.len len
+  store xs.cap cap
+}
+ +
+

This way, a compiler again gains reasoning power. +SROA is like inlining, but for memory rather than code.
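The same load, compute, store shape can be written out by hand in Rust. This is only a sketch of the idea; the `Stats` struct and the `update` function are invented for the example and do not come from any real code base.

```rust
/// A small aggregate that lives in memory behind a `&mut`.
struct Stats {
    min: i32,
    max: i32,
    count: usize,
}

fn update(stats: &mut Stats, xs: &[i32]) {
    // "load": copy every field into a scalar local.
    let mut min = stats.min;
    let mut max = stats.max;
    let mut count = stats.count;

    // The hot loop touches only locals, never `stats` itself.
    for &x in xs {
        if x < min {
            min = x;
        }
        if x > max {
            max = x;
        }
        count += 1;
    }

    // "store": write the final values back in the original layout.
    stats.min = min;
    stats.max = max;
    stats.count = count;
}
```

Usually there is no need to write code this way, since the optimizer performs the replacement itself; the sketch only shows what the result looks like.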

+
+
+ +

+ Impossible and Possible +

+

Using this mental model of a compiler which:

+
  • optimizes on a per-function basis,
  • can inline function calls,
  • is great at noticing relations between local variables and rearranging the code based on that,
  • is capable of limited reasoning about the memory (namely, deciding when it's safe to load or store)
+

we can describe which code is reliably optimizable, and which code prevents optimizations, explaining zero cost abstractions.

+

To enable inlining, a compiler needs to know which function is actually called. +If a function is called directly, it's pretty much guaranteed that a compiler would try to inline it. +If the call is indirect (via function pointer, or via a table of virtual functions), in the general case a compiler won't be able to inline that. +Even for indirect calls, sometimes the compiler can reason about the value of the pointer and de-virtualize the call, but that relies on successful optimization elsewhere.

+

This is the reason why, in Rust, every function has a unique, zero-sized type with no runtime representation. +It statically guarantees that the compiler could always inline the code, and makes this abstraction zero cost, because any decent optimizing compiler will melt it to nothing.
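The zero-sized claim is easy to observe directly. The sketch below uses a hypothetical helper, `callable_size`, which measures whatever callable it receives through a generic parameter.

```rust
use std::mem::size_of_val;

fn hello() {}

/// Size in bytes of the callable passed in, as seen through the
/// generic parameter `F`.
fn callable_size<F: Fn()>(f: &F) -> usize {
    size_of_val(f)
}
```

Calling `callable_size(&hello)` reports zero bytes, because the function item's type alone identifies the function. Coercing first with `let p: fn() = hello;` and calling `callable_size(&p)` reports the size of a pointer, because now the callee is runtime data.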

+

A higher level language might choose to always represent functions with function pointers. +In practice, in many cases the resulting code would be equivalently optimizable. +But there won't be any indication in the source whether this is an optimizable case (the actual pointer is knowable at compile time) or a genuinely dynamic call. +With Rust, the difference between "guaranteed to be optimizable" and "potentially optimizable" is reflected in the source language:

+ +
+ + +
// Compiler is guaranteed to be able to inline call to `f`.
+fn call1<F: Fn()>(f: F) {
+  f()
+}
+
+// Compiler _might_ be able to inline call to `f`.
+fn call2(f: fn()) {
+  f()
+}
+ +
+

So, the first rule is to make most of the calls statically resolvable, to allow inlining. +Function pointers and dynamic dispatch prevent inlining. +Separate compilation might also get in the way of inlining, see this separate essay on the topic.

+

Similarly, indirection in memory can cause trouble for the compiler.

+

For something like this

+ +
+ + +
struct Foo {
+  bar: Bar,
+  baz: Baz,
+}
+ +
+

the Foo struct is completely transparent for the compiler.

+

While here:

+ +
+ + +
struct Foo {
+  bar: Box<Bar>,
+  baz: Baz,
+}
+ +
+

it is not clear cut. +Proving something about the memory occupied by Foo does not in general transfer to the memory occupied by Bar. +Again, in many cases a compiler can reason through boxes thanks to uniqueness, but this is not guaranteed.

+

A good homework at this point is to look at Rust's iterators and understand why they look the way they do.

+

Why do the signature and definition of map look like this?

+ +
+ + +
#[inline]
+fn map<B, F>(self, f: F) -> Map<Self, F>
+where
+  Self: Sized,
+  F: FnMut(Self::Item) -> B,
+{
+  Map::new(self, f)
+}
+ +
+

Another important point about memory is that, in general, a compiler can't change the overall layout of stuff. +SROA can load some data structure into a bunch of local variables, which then can, e.g., replace a pointer and an index representation with a pair of pointers. +But at the end of the day SROA would have to materialize a pointer and an index back and store that representation back into the memory. +This is because memory layout is shared across all functions, so a function cannot unilaterally dictate a more optimal representation.

+

Together, these observations give a basic rule for the baseline of performant code.

+ +
+
+ +

+ SIMD +

+

Let's apply this general framework of giving a compiler optimizable code to work with to auto-vectorization. +We will be optimizing the function which computes the longest common prefix between two slices of bytes (thanks @nkkarpov for the example).

+

A direct implementation would look like this:

+ +
+ + +
use std::iter::zip;
+
+// 650 milliseconds
+fn common_prefix(xs: &[u8], ys: &[u8]) -> usize {
+  let mut result = 0;
+  for (x, y) in zip(xs, ys) {
+    if x != y { break; }
+    result += 1
+  }
+  result
+}
+ +
+

If you already have a mental model for auto-vectorization, or if you look at the assembly output, you can realize that the function as written works one byte at a time, which is much slower than it needs to be. +Let's fix that!

+

SIMD works on many values simultaneously. +Intuitively, we want the compiler to compare a bunch of bytes at the same time, but our current code does not express that. +Let's make the structure explicit, by processing 16 bytes at a time, and then handling the remainder separately:

+ +
+ + +
// 450 milliseconds
+fn common_prefix(xs: &[u8], ys: &[u8]) -> usize {
+  let chunk_size = 16;
+
+  let mut result = 0;
+
+  'outer: for (xs_chunk, ys_chunk) in
+    zip(xs.chunks_exact(chunk_size), ys.chunks_exact(chunk_size))
+  {
+    for (x, y) in zip(xs_chunk, ys_chunk) {
+      if x != y { break 'outer; }
+      result += 1
+    }
+  }
+
+  for (x, y) in zip(&xs[result..], &ys[result..]) {
+    if x != y { break; }
+    result += 1
+  }
+
+  result
+}
+ +
+

Amusingly, this is already a bit faster, but not quite there yet. +Specifically, SIMD needs to process all values in the chunk in parallel in the same way. +In our code above, we have a break, which means that processing of the nth pair of bytes depends on the (n-1)st pair. +Let's fix that by disabling short-circuiting. +We will check if the whole chunk of bytes matches or not, but we won't care which specific byte is a mismatch:

+ +
+ + +
// 80 milliseconds
+fn common_prefix3(xs: &[u8], ys: &[u8]) -> usize {
+  let chunk_size = 16;
+
+  let mut result = 0;
+  for (xs_chunk, ys_chunk) in
+    zip(xs.chunks_exact(chunk_size), ys.chunks_exact(chunk_size))
+  {
+    let mut chunk_equal: bool = true;
+    for (x, y) in zip(xs_chunk, ys_chunk) {
+      // NB: &, unlike &&, doesn't short-circuit.
+      chunk_equal = chunk_equal & (x == y);
+    }
+
+    if !chunk_equal { break; }
+    result += chunk_size;
+  }
+
+  for (x, y) in zip(&xs[result..], &ys[result..]) {
+    if x != y { break; }
+    result += 1
+  }
+
+  result
+}
+ +
+

And this version finally lets vectorization kick in, reducing the runtime almost by an order of magnitude. +We can now compress this version using iterators.

+ +
+ + +
// 80 milliseconds
+fn common_prefix5(xs: &[u8], ys: &[u8]) -> usize {
+  let chunk_size = 16;
+
+  let off =
+    zip(xs.chunks_exact(chunk_size), ys.chunks_exact(chunk_size))
+      .take_while(|(xs_chunk, ys_chunk)| xs_chunk == ys_chunk)
+      .count() * chunk_size;
+
+  off + zip(&xs[off..], &ys[off..])
+    .take_while(|(x, y)| x == y)
+    .count()
+}
+ +
+

Note how the code is meaningfully different from our starting point. +We do not blindly rely on the compiler's optimization. +Rather, we are aware of the specific optimizations we need in this case, and write the code in a way that triggers them.

+

Specifically, for SIMD:

+
  • we express the algorithm in terms of processing chunks of elements,
  • within each chunk, we make sure that there's no branching and all elements are processed in the same way.
+
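Restructuring a loop for the optimizer's benefit makes it easy to introduce an off-by-one somewhere, so it is worth checking the chunked version against the naive one. The sketch below repeats both variants under made-up names (`common_prefix_naive`, `common_prefix_chunked`) so that it is self-contained. It only checks that the results agree and says nothing about timings.

```rust
use std::iter::zip;

/// Reference implementation: one byte at a time.
fn common_prefix_naive(xs: &[u8], ys: &[u8]) -> usize {
    zip(xs, ys).take_while(|(x, y)| x == y).count()
}

/// Chunked implementation: branch-free inside each 16-byte chunk.
fn common_prefix_chunked(xs: &[u8], ys: &[u8]) -> usize {
    const CHUNK: usize = 16;
    let mut result = 0;
    for (xc, yc) in zip(xs.chunks_exact(CHUNK), ys.chunks_exact(CHUNK)) {
        let mut equal = true;
        for (x, y) in zip(xc, yc) {
            // `&=` does not short-circuit, so every pair is compared.
            equal &= x == y;
        }
        if !equal {
            break;
        }
        result += CHUNK;
    }
    // Tail: the first mismatching chunk (if any) plus the remainder.
    for (x, y) in zip(&xs[result..], &ys[result..]) {
        if x != y {
            break;
        }
        result += 1;
    }
    result
}
```

A simple way to exercise it is to flip one byte at every possible position of a buffer and compare the two functions, which covers mismatches in full chunks as well as in the tail.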
+
+ +

+ Conclusion +

+

Compilers are tools. +While there's a fair share of optimistic transformations which sometimes kick in, the bulk of the impact of an optimizing compiler comes from guaranteed optimizations with specific preconditions. +Compilers are myopic: they have a hard time reasoning about code outside of the current function and values not held in the local variables. +Inlining and scalar replacement of aggregates are two optimizations to remedy the situation. +Zero cost abstractions work by expressing opportunities for guaranteed optimizations in the language's type system.

+

If you like this post, I highly recommend A Catalogue of Optimizing Transformations by Frances Allen.

+
+
+
+ + + + + diff --git a/2023/04/13/reasonable-bootstrap.html b/2023/04/13/reasonable-bootstrap.html new file mode 100644 index 00000000..902eff40 --- /dev/null +++ b/2023/04/13/reasonable-bootstrap.html @@ -0,0 +1,191 @@ + + + + + + + Reasonable Bootstrap + + + + + + + + + + + + +
+ +
+ +
+
+ +

Reasonable Bootstrap

+

Compilers for systems programming languages (C, C++, Rust, Zig) tend to be implemented in the languages themselves. +The idea being that the current version of the compiler is built using some previous version. +But how can you get a working compiler if you start out from nothing?

+

The traditional answer has been via a bootstrap chain. +You start with the first version of the compiler implemented in assembly, use that to compile the latest version of the compiler it is capable of compiling, then repeat. +This historically worked OK because older versions of GCC were implemented in C (and C is easy to provide a compiler for) and, even today, GCC itself is very conservative in using language features. +I believe GCC 10.4 released in 2022 can be built with just a C++98 compiler. +So, if you start with a C compiler, it's not too many hops to get to the latest GCC.

+

This doesn't feel entirely satisfactory, as this approach requires artificially constraining the compiler itself to be very conservative. +Rust does the opposite of that. +Rust requires that rustc 1.x.0 is built by rustc 1.(x-1).0, and there's a new rustc version every six weeks. +This seems like a very reasonable way to build compilers, but it also is incompatible with chain bootstrapping. +In the limit, one would need infinite time to compile modern rustc ex nihilo!

+

I think there's a better way if the goal is to compile the world from nothing. +To cut to the chase, the minimal bootstrap seed for Rust could be:

+ +

Bootstrapping from this should be easy. +WebAssembly is a very small language, so a runtime for it can be built out of nothing. +Using this runtime and rustc-compiled-to-wasm, we can re-compile rustc itself. +Then, if the architecture we need is supported by rustc, we can cross-compile to it. +If the architecture is not supported, we can implement a new backend for that arch in Rust, compile our modified compiler to wasm, and then cross-compile to the desired target.

+

A more complete bootstrap seed would include:

+ +

And this seed is provided for every version of a language. +This way, it is possible to bootstrap, in constant time, any version of Rust.

+

Specific properties we use for this setup:

+ +

This setup does not prevent the "trusting trust" attack. +However, it is possible to rebuild the bootstrap seed using a different compiler. +Using that compiler to compile rustc to .wasm will produce a different blob. +But using that .wasm to recompile rustc again should produce the blob from the seed (unless, of course, there's a trojan in the seed).

+

This setup does not minimize the size of opaque binary blobs in the seed. +The size of the .wasm would be substantial. +This setup, however, does minimize the total size of the seed. +In the traditional bootstrap, source code for rustc 1.0.0, rustc 1.1.0, rustc 1.2.0, etc would also have to be part of the seed. +For the suggested approach, you need only one version, at the cost of a bigger binary blob.

+

This idea is not new. +I think it was popularized by Pascal with p-code. +OCaml uses a similar strategy. +Finally, Zig makes an important observation that we no longer need to implement language-specific virtual machines, because WebAssembly is a good fit for the job.

+
+
+ + + + + diff --git a/2023/04/23/data-oriented-parallel-value-interner.html b/2023/04/23/data-oriented-parallel-value-interner.html new file mode 100644 index 00000000..200a06ce --- /dev/null +++ b/2023/04/23/data-oriented-parallel-value-interner.html @@ -0,0 +1,654 @@ + + + + + + + Data Oriented Parallel Value Interner + + + + + + + + + + + + +
+ +
+ +
+
+ +

Data Oriented Parallel Value Interner

+

In this post, I will present a theoretical design for an interner. +It should be fast, but there will be no benchmarks as I haven't implemented the thing. +So it might actually be completely broken or super slow for one reason or another. +Still, I think there are a couple of neat ideas, which I would love to call out.

+

The context for the post is this talk by Andrew Kelley, which notices that it's hard to reconcile interning and parallel compilation. +This is something I have been thinking about a lot in the context of rust-analyzer, which relies heavily on pointers, atomic reference counting and indirection to make incremental and parallel computation possible.

+

And yes, interning (or, more generally, assigning unique identities to things) is a big part of that.

+

Usually, compilers intern strings, but we will be interning trees today. +Specifically, we will be looking at something like a Value type from the Zig compiler. +In a simplified RAII style it could look like this:

+ +
+ + +
const Value = union(enum) {
+    // A bunch of payload-less variants.
+    u1_type,
+    u8_type,
+    i8_type,
+
+    // A number.
+    u64: u64,
+
+    // A declaration.
+    // Declarations and types are also values in Zig.
+    decl: DeclIndex,
+
+    // Just some bytes for a string.
+    bytes: []u8,
+
+    // The interesting case which makes it a tree.
+    // This is how struct instances are represented.
+    aggregate: []Value,
+};
+
+const DeclIndex = u32;
+ +
+

Such values are individually heap-allocated and in general are held behind pointers. +Zig's compiler adds a couple of extra tricks to this structure, like not overallocating for small enum variants:

+ +
+ + +
const Value = struct {
+    payload: *Payload,
+};
+
+// Payload is an "abstract" type:
+// There's some data following the `tag`,
+// whose type and size is determined by
+// this `tag`.
+const Payload = struct {
+    tag: Tag,
+
+    pub const U64 = struct {
+        base: Payload,
+        data: u64,
+    };
+
+    pub const Decl = struct {
+        base: Payload,
+        decl: DeclIndex,
+    };
+};
+ +
+

But how do we intern this stuff, such that:

+ +

Let's start with a concurrent SegmentList:

+ +
+ + +
fn SegmentList(comptime T: type) type {
+    return struct {
+        echelons: [31]?[*]T,
+    };
+}
+ +
+

A segmented list is like an ArrayList with an extra superpower: pushing new items does not move/invalidate old ones. +In a normal ArrayList, when the backing storage fills up, you allocate a slice twice as long, copy over the elements from the old slice and then destroy it. +In SegmentList, you leave the old slice where it is, and just allocate a new one.

+

Now, as we are writing an interner and want to use u32 for an index, we know that we need to store 1<<32 items max. +But that means that we'll need at most 31 segments for our SegmentList:

+ +
+ + +
[1 << 0]T
+[1 << 1]T
+[1 << 2]T
+...
+[1 << 31]T
+ +
+

So we can just pre-allocate array of 31 pointers to the segments, hence

+ +
+ + +
echelons: [31]?[*]T,
+ +
+

If we want to be more precise with types, we can even use a tuple whose elements are nullable pointers to arrays of power-of-two sizes:

+ +
+ + +
fn SegmentList(comptime T: type) type {
+    return struct {
+        echelons: std.meta.Tuple(get_echelons(31, T)),
+    };
+}
+
+fn get_echelons(
+    comptime level: usize,
+    comptime T: type,
+) []const type {
+    if (level == 0) return &.{ ?*[1]T };
+    return get_echelons(level - 1, T) ++ .{ ?*[1 << level]T };
+}
+ +
+

Indexing into such an echeloned array is still O(1). +Here's how echelons look in terms of indexes:

+ +
+ + +
0                      = 1  total
+1 2                    = 3  total
+3 4 5 6                = 7  total
+7 8 9 10 11 12 13 14   = 15 total
+ +
+

The first n echelons hold 2**n - 1 elements. +So, if we want to find the ith item, we first find the echelon it is in, by computing the nearest smaller power of two of i + 1, and then index into the echelon with i - (2**n - 1), give or take a +1 here or there.

+ +
+ + +
// Warning: untested, probably has a couple of bugs.
+
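+pub fn get(self: Self, index: u32) *const T {
+    const e = get_echelon(index);
+    // The first `e` echelons hold `(1 << e) - 1` items.
+    // Parentheses matter: in Zig, `-` binds tighter than `<<`.
+    const i = index - ((@as(u32, 1) << e) - 1);
+    return &self.echelons[e].?[i];
+}
+
+fn get_echelon(index: u32) u5 {
+    return @ctz(std.math.floorPowerOfTwo(u32, index + 1));
+}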
+ +
+
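The index arithmetic is easy to get wrong by one, so here is the same mapping as a small Rust sketch that can be checked against the table of indexes above. The function name `locate` is made up, and widening to `u64` is a choice made here to keep `index + 1` from overflowing.

```rust
/// Maps a flat index to `(echelon, offset inside that echelon)`,
/// assuming echelon `e` holds `1 << e` items.
fn locate(index: u32) -> (u32, u32) {
    // Widen so that `index + 1` cannot overflow.
    let n = u64::from(index) + 1;
    // Position of the highest set bit of `n`, i.e. floor(log2(n)).
    let echelon = 63 - n.leading_zeros();
    // The echelons before this one hold `(1 << echelon) - 1` items,
    // so the offset is `index - ((1 << echelon) - 1)`.
    let offset = n - (1u64 << echelon);
    (echelon, offset as u32)
}
```

For example, index 6 is the last slot of the third row in the table, which corresponds to echelon 2 and offset 3.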

Note that we pre-allocate an array of pointers to segments, but not the segments themselves. +Pointers are nullable, and we allocate new segments lazily, when we actually write to the corresponding indexes. +This structure is very friendly to parallel code. +Reading items works because items are never reallocated. +Lazily allocating new echelons is easy, because the position of the pointer is fixed. +That is, we can do something like this to insert an item at position i:

+
  1. compute the echelon index
  2. @atomicLoad(.Acquire) the pointer
  3. if the pointer is null
       • allocate the echelon
       • @cmpxchgStrong(.Acquire, .Release) the pointer
       • free the redundant echelon if exchange failed
  4. insert the item
+

Notice how we don't need any locks or even complicated atomics, at the price of sometimes doing a second redundant allocation.
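The allocation part of these steps can be sketched in Rust with `AtomicPtr`. Everything here is illustrative: `LazyEchelon` is a hypothetical type holding a single segment of `u64` items, and the memory orderings (`AcqRel` on success, `Acquire` on failure) are my assumption about what a correct publication needs, not something taken from the Zig pseudo-code.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// One lazily allocated segment of `N` zero-initialized items.
struct LazyEchelon<const N: usize> {
    ptr: AtomicPtr<[u64; N]>,
}

impl<const N: usize> LazyEchelon<N> {
    fn new() -> Self {
        Self { ptr: AtomicPtr::new(ptr::null_mut()) }
    }

    /// Returns the segment, allocating it on first use.
    fn get_or_alloc(&self) -> &[u64; N] {
        let mut p = self.ptr.load(Ordering::Acquire);
        if p.is_null() {
            let fresh = Box::into_raw(Box::new([0u64; N]));
            match self.ptr.compare_exchange(
                ptr::null_mut(),
                fresh,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                // We won the race: our allocation is now published.
                Ok(_) => p = fresh,
                // Somebody else won: free the redundant allocation
                // and use theirs.
                Err(winner) => {
                    unsafe { drop(Box::from_raw(fresh)) };
                    p = winner;
                }
            }
        }
        // SAFETY: `p` is non-null and is freed only in `drop`.
        unsafe { &*p }
    }
}

impl<const N: usize> Drop for LazyEchelon<N> {
    fn drop(&mut self) {
        let p = *self.ptr.get_mut();
        if !p.is_null() {
            // SAFETY: `p` came from `Box::into_raw` and is freed once.
            unsafe { drop(Box::from_raw(p)) };
        }
    }
}
```

The sketch covers only the lazy allocation. Writing items into the segment from several threads needs its own synchronization, which the post handles later with the shard mutex.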

+

One thing this data structure is bad at is doing bounds checks and tracking which items are actually initialized. +For the interner use-case, we will rely on an invariant that we always use indexes provided to us by someone else, such that possession of the index signifies that:

+ +

If, instead, we manufacture an index out of thin air, we might hit all kinds of nasty behavior without any bullet-proof way to check that.

+

Okay, now that we have this SegmentList, how would we use it?

+

Recall that our simplified value is

+ +
+ + +
const Value = union(enum) {
+    // A bunch of payload-less variants.
+    u1_type,
+    u8_type,
+    i8_type,
+
+    // A number.
+    u64: u64,
+
+    // A declaration.
+    // Declarations and types are also values in Zig.
+    decl: Decl,
+
+    // Just some bytes for a string.
+    bytes: []u8,
+
+    // The interesting case which makes it a tree.
+    // This is how struct instances are represented.
+    aggregate: []Value,
+};
+
+// Index of a declaration.
+const Decl = u32;
+ +
+

Of course we will struct-of-array it now, to arrive at something like this:

+ +
+ + +
const Value = u32;
+
+const Tag = enum(u8) {
+    u1_type, u8_type, i8_type,
+    u64, decl, bytes, aggregate,
+};
+
+const ValueTable = struct {
+    tag: SegmentList(Tag),
+    data: SegmentList(u32),
+
+    u64: SegmentList(u64),
+    aggregate: SegmentList([]Value),
+    bytes: SegmentList([]u8),
+};
+ +
+

A Value is now an index. +This index works for two fields of ValueTable, tag and data. +That is, the index addresses five bytes of payload, which is all that is needed for small values. +For large tags like aggregate, the data field stores an index into the corresponding payload SegmentList.

+

That is, every value allocates a tag and data elements, but only actual u64s occupy a slot in u64 SegmentList.

+

So now we can write a lookup function which takes a value index and reconstructs a value from pieces:

+ +
+ + +
const ValueFull = union(enum) {
+    u1_type,
+    u8_type,
+    i8_type,
+    u64: u64,
+    decl: Decl,
+    bytes: []u8,
+    aggregate: []Value,
+};
+
+fn lookup(self: Self, value: Value) ValueFull {
+    const tag = self.tag.get(value).*;
+    switch (tag) {
+        .aggregate => return ValueFull{
+            .aggregate = self.aggregate.get(self.data.get(value).*).*,
+        },
+    }
+}
+ +
+

Note that here ValueFull is a non-owning type: it is a reference into the actual data. +Note as well that aggregates now store a slice of indexes, rather than a slice of pointers.

+

Now let's deal with creating and interning values. +We start by creating a ValueFull using data owned by us +(e.g. if we are creating an aggregate, we may use a stack-allocated array as a backing store for []Value slice). +Then we ask ValueTable to intern the data:

+ +
+ + +
fn intern(self: *Self, value_full: ValueFull) Value {
+}
+ +
+

If the table already contains an equal value, its index is returned. +Otherwise, the table copies ValueFull data such that it is owned by the table itself, and returns a freshly allocated index.

+

For bookkeeping, we'll need a hash table with existing values and a counter to use for a fresh index, something like this:

+ +
+ + +
const ValueTable = struct {
+    value_set: AutoHashMapUnmanaged(Value, void),
+    value_count: u32,
+    tag: SegmentList(Tag),
+    index: SegmentList(u32),
+
+    u64_count: u32,
+    u64: SegmentList(u64),
+
+    aggregate_count: u32,
+    aggregate: SegmentList([]Value),
+
+    bytes_count: u32,
+    bytes: SegmentList([]u8),
+
+    pub fn intern(self: *Self, value_full: ValueFull) Value {
+        ...
+    }
+};
+ +
+

Pay attention to the _count fields: we have value_count guarding the tag and index, and separate counts for specific kinds of values, as we don't want to allocate, e.g., a u64 for every value.

+

Our hashmap is actually a set which stores u32 integers, but uses ValueFull to do a lookup: when we consider interning a new ValueFull, we don't know its index yet. +Luckily, the getOrPutAdapted API provides the required flexibility. +We can use it to compare a Value (index) and a ValueFull by hashing a ValueFull and doing component-wise comparisons in the case of a collision.

+

Note that, because of interning, we can also hash ValueFull efficiently! +As any subvalues in ValueFull are guaranteed to be already interned, we can rely on a shallow hash and hash only the child values' indexes, rather than their data.
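Here is a minimal single-threaded sketch of that idea in Rust. It is deliberately simpler than the design above: the map is keyed by an owned copy of the value rather than by an index with an adapted lookup, so every value is stored twice, and the names `Interner`, `values` and `index_of` are invented for the example. What it does show is that aggregates hold child indexes, so the derived `Hash` and `Eq` never look at the children's data.

```rust
use std::collections::HashMap;

/// An interned value is just an index.
type Value = u32;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum ValueFull {
    U8Type,
    U64(u64),
    Bytes(Vec<u8>),
    // Children are already-interned indexes, so hashing and
    // comparison are shallow.
    Aggregate(Vec<Value>),
}

#[derive(Default)]
struct Interner {
    values: Vec<ValueFull>,
    index_of: HashMap<ValueFull, Value>,
}

impl Interner {
    /// Returns the existing index for an equal value, or a fresh one.
    fn intern(&mut self, value: ValueFull) -> Value {
        if let Some(&index) = self.index_of.get(&value) {
            return index;
        }
        let index = self.values.len() as Value;
        self.values.push(value.clone());
        self.index_of.insert(value, index);
        index
    }

    fn lookup(&self, value: Value) -> &ValueFull {
        &self.values[value as usize]
    }
}
```

Interning the same aggregate twice yields the same index, and comparing two aggregates costs one comparison per child index regardless of how deep the children are.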

+

This is a nice design for a single thread, but how do we make it thread safe? +The straightforward solution would be to slap a mutex around the logic in intern.

+

This actually is not as bad as it seems, as we'd need a lock only in intern, and lookup would work without any synchronization whatsoever. +Recall that obtaining an index of a value is a proof that the value was properly published. +Still, we expect to intern a lot of values, and that mutex is all but guaranteed to become a point of contention. +And some amount of contention is inevitable here: if two threads try to intern two identical values, we want them to clash, communicate, and end up with a single, shared value.

+

There's a rather universal recipe for dealing with contention: you can shard the data. +In our case, rather than using something like

+ +
+ + +
mutex: Mutex,
+value_set: AutoHashMapUnmanaged(Value, void),
+ +
+

we can do

+ +
+ + +
mutex: [16]Mutex,
+value_set: [16]AutoHashMapUnmanaged(Value, void),
+ +
+

That is, we create not one, but sixteen hashmaps, and use, e.g., lower 4 bits of the hash to decide which mutex and hashmap to use. +Depending on the structure of the hashmap, such locks could even be pushed as far as individual buckets.
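A sharded set along these lines might look as follows in Rust. This is a sketch under assumptions: keys are plain `u64`s rather than values, `ShardedSet` and `shard_of` are made-up names, and `DefaultHasher` is used only because it is available in the standard library.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

const SHARDS: usize = 16;

/// Sixteen independently locked sets instead of one big one.
struct ShardedSet {
    shards: [Mutex<HashSet<u64>>; SHARDS],
}

impl ShardedSet {
    fn new() -> Self {
        Self { shards: std::array::from_fn(|_| Mutex::new(HashSet::new())) }
    }

    /// Picks a shard from the lower 4 bits of the key's hash.
    fn shard_of(key: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() & 0xF) as usize
    }

    /// Returns `true` if the key was not present before.
    fn insert(&self, key: u64) -> bool {
        let shard = &self.shards[Self::shard_of(key)];
        shard.lock().unwrap().insert(key)
    }
}
```

Two threads inserting equal keys always land on the same shard and serialize on its mutex, so exactly one of them observes `true`. Threads inserting keys from different shards do not block each other.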

+

This doesn't solve all our contention problems: now that several threads can simultaneously intern values (as long as they are hashed into different shards), we have to make all count variables atomic. +So we essentially moved the single global point of contention from a mutex to the value_count field, which is incremented for every interned value.

+

We can apply the sharding trick again, and shard all our SegmentLists. +But that would mean that we have to dedicate some bits from Value index to the shard number, and to waste some extra space for non-perfectly balanced shards.

+

There's a better way: we can amortize atomic increments by allowing each thread to bulk-allocate indexes. +That is, if a thread wants to allocate a new value, it atomically increments value_count by, say, 1024, and uses those indexes for the next thousand allocations. +In addition to ValueTable, each thread now gets its own distinct LocalTable:

+ +
+ + +
const LocalTable = struct {
+    global: *ValueTable,
+
+    // Invariant: if any `index % 1024 == 0`,
+    // it's time to visit `global` to
+    // refill our budget via atomic fetchAndAdd.
+    value_index: u32,
+    u64_index: u32,
+    aggregate_index: u32,
+    bytes_index: u32,
+};
+ +
+

An attentive reader would notice a bonus here: in this setup, a thread allocates a contiguous chunk of values. +It is reasonable to assume that values allocated together would also be used together, so we potentially increase future spatial locality here.
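The bulk-allocation trick on its own fits in a few lines of Rust. The sketch below is an illustration, not the design from the post: `LocalIndexes` is a hypothetical name, it tracks an explicit `next`/`end` pair instead of relying on the `index % 1024 == 0` invariant, and it ignores `u32` overflow.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// How many indexes a thread reserves per visit to the global counter.
const BATCH: u32 = 1024;

/// Per-thread allocator that hands out indexes from a reserved range.
struct LocalIndexes<'a> {
    global: &'a AtomicU32,
    next: u32,
    end: u32,
}

impl<'a> LocalIndexes<'a> {
    fn new(global: &'a AtomicU32) -> Self {
        // An empty range forces a refill on the first allocation.
        Self { global, next: 0, end: 0 }
    }

    fn alloc(&mut self) -> u32 {
        if self.next == self.end {
            // Budget exhausted: reserve the next `BATCH` indexes with
            // a single atomic operation.
            self.next = self.global.fetch_add(BATCH, Ordering::Relaxed);
            self.end = self.next + BATCH;
        }
        let index = self.next;
        self.next += 1;
        index
    }
}
```

With two allocators sharing one counter, the first hands out 0, 1, 2 and so on while the second starts at 1024, and the shared counter is touched once per 1024 allocations rather than once per value.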

+

Putting everything together, the pseudo-code for interning would look like this:

+ +
+ + +
fn intern(table: *LocalTable, value_full: ValueFull) Value {
+    const hash = shallow_hash(value_full);
+
+    // Find & lock the shard.
+    const shard = hash & 0xF;
+    const mutex = &table.global.mutex[shard];
+    const value_set = &table.global.value_set[shard];
+
+    mutex.lock();
+    defer mutex.unlock();
+
+    // Either find that this value has been interned already...
+    const gop = value_set.get_or_put(hash, value_full, ...);
+    if (gop.found_existing) return gop.key_ptr.*;
+
+    // ... or proceed to allocate a new index for it
+
+    if (table.value_index % 1024 == 0) {
+        // Ran out of indexes, refill our budget!
+        table.value_index = @atomicRmw(
+            u32, &table.global.value_count,
+            .Add, 1024,
+            .Relaxed,
+        );
+    }
+
+    // Assign the index to the new value
+    // and put it into the hash map.
+    const value = table.value_index;
+    table.value_index += 1;
+    gop.key_ptr.* = value;
+
+    // Now initialize the value.
+    // Note that we still hold shard's mutex at this point.
+
+    switch (value_full) {
+        .aggregate => |fields| {
+            // Initialize the tag, common for all values.
+            table.global.tag.set(value, .aggregate);
+
+            // Allocate tag-specific data using
+            // the same atomic add trick.
+            if (table.aggregate_index % 1024 == 0) {
+                table.aggregate_index = @atomicRmw(
+                    u32, &table.global.aggregate_count,
+                    .Add, 1024,
+                    .Relaxed,
+                );
+            }
+            const index = table.aggregate_index;
+            table.aggregate_index += 1;
+
+            // Make it possible to find tag-specific data
+            // from the value index.
+            table.global.index.set(value, index);
+
+            // `value_full` is borrowed, so we must
+            // create a copy that we own.
+            const fields_owned = allocator.dup(fields)
+                catch unreachable;
+
+            table.global.aggregate.set(index, fields_owned);
+        },
+    }
+
+    return value;
+}
+
+// Code for assigning an index of a SegmentList.
+// Shard's mutex guarantees exclusive access to the index.
+// Accesses to the echelon might race though.
+fn set(list: SegmentList(T), index: u32, value: T) {
+    const e = list.get_echelon(index);
+    const i = index - ((1 << e) - 1);
+
+    var echelon = @atomicLoad(?[*]T, &list.echelons[e], .Acquire);
+    if (echelon == null) {
+        // Race with other threads to allocate the echelon.
+        const echelon_new = allocator.alloc(T, 1 << e)
+            catch unreachable;
+
+        const modified = @cmpxchgStrong(
+            ?[*]T, &list.echelons[e],
+            null, echelon_new,
+            .Release, .Acquire,
+        );
+
+        if (modified) |echelon_modified| {
+            // Another thread won, free our useless allocation.
+            echelon = echelon_modified;
+            allocator.free(echelon_new);
+        } else {
+            echelon = echelon_new;
+        }
+    }
+
+    echelon.?[i] = value;
+}
+ +
+
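
The index arithmetic in `set` is easy to get wrong, so here is a small Rust sketch of it. It assumes that echelon `e` holds `1 << e` elements and therefore starts at global index `(1 << e) - 1`, which is what the allocation size and the offset computation above imply. The body of `get_echelon` is a guess at what such a helper would compute, and `locate` is a hypothetical name.

```rust
/// Hypothetical helper: the echelon that holds `index`, assuming echelon `e`
/// stores 2^e elements and starts at global index 2^e - 1.
/// Computes floor(log2(index + 1)); `index` must be below `u32::MAX`.
fn get_echelon(index: u32) -> u32 {
    31 - (index + 1).leading_zeros()
}

/// Splits a global index into (echelon, offset within that echelon).
fn locate(index: u32) -> (u32, u32) {
    let e = get_echelon(index);
    let i = index - ((1u32 << e) - 1);
    (e, i)
}
```

With this layout, index 0 is the single slot of echelon 0, indexes 1 and 2 fill echelon 1, indexes 3 through 6 fill echelon 2, and so on.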

Note that it is important that we don't release the mutex immediately after assigning the index for a value, but rather keep it locked all the way until we have fully copied the value into the ValueTable. +If we release the lock earlier, a different thread which tries to intern the same value would get the correct index, but would risk accessing partially-initialized data. +This can be optimized a bit by adding a value-specific lock (or rather, a Once). +So we use the shard lock to assign an index, then release the shard lock, and use the value-specific lock to do the actual (potentially slow) initialization.

+

And that's all I have for today! +Again, I haven't implemented this, so I have no idea how fast or slow it actually is. +But the end result looks rather beautiful, and builds upon many interesting ideas:

+ +

Discussion on /r/Zig.

+
+
+ + + + + diff --git a/2023/05/02/implicits-for-mvs.html b/2023/05/02/implicits-for-mvs.html new file mode 100644 index 00000000..6ab6e46a --- /dev/null +++ b/2023/05/02/implicits-for-mvs.html @@ -0,0 +1,285 @@ + + + + + + + Value Oriented Programming Needs Implicits? + + + + + + + + + + + + +
+ +
+ +
+
+ +

Value Oriented Programming Needs Implicits?

+

An amateur note on language design which explores two important questions:

+ +

Let's start with the second question. +What is the basic stuff that everything else is made of?

+

Not so long ago, the most popular answer to that question was objects: blobs of mutable state with references to other blobs. +This turned out to be problematic: local mutation of an object might accidentally cause unwanted changes elsewhere. +Defensive copying of collections at the API boundary was a common pattern.

+

Another answer to the question of basic stuff is immutable values, as exemplified by functional programming. +This fixes the ability to reason about programs locally at the cost of developer ergonomics and expressiveness. +A lot of code is naturally formulated in terms of "let's mutate this little thing", and functionally threading the update through all the layers is tiresome.

+

The C answer is that everything is made of memory (*). +It is almost as if memory is an array of bytes. +Almost, but not quite: to write portable programs amenable to optimization, certain restrictions must be placed on the ways memory is accessed and manipulated, hence (*). +These restrictions not being checked by the compiler (and not even visible in the source code) create a fertile ground for subtle bugs.

+

Rust takes this basic C model and:

+ +

Curiously, this approach allows Rust to have an "immutable values" feel, without requiring the user to thread updates manually: +"In Rust, Ordinary Vectors are Values". +But the cognitive cost for this approach is pretty high, as the universe of values is now forked by different flavors of owning/referencing.

+

Let's go back to the pure FP model. +Can we just locally fix it? +Let's take a look at an example:

+ +
+ + +
let xs1 = get_items() in
+let xs2 = modify_items(xs1) in
+let xs3 = sort_items(xs2) in
+...
+ +
+

It is pretty clear that we can allow mutation of local variables via a simple rewrite, as that won't compromise local reasoning:

+ +
+ + +
var xs = get_items()
+xs = modify_items(xs)
+xs = sort_items(xs)
+ +
+

Similarly, we can introduce a rewrite rule for the ubiquitous x = f(x) pattern, such that the code looks like this:

+ +
+ + +
var xs = get_items()
+modify_items(xs)
+sort_items(xs)
+ +
+

Does this actually work? +Yes, it does, as popularized by Swift and distilled in its pure form by Val.

+

Formalizing the rewriting reasoning, we introduce second-class references, which can only appear in function arguments (inout parameters), but, e.g., can't be stored as fields. +With these restrictions, borrow checking becomes fairly simple: at each function call it suffices to check that no two inout arguments overlap.

+
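
To make the rewrite concrete, here is a rough Rust approximation. It is only an analogy: Rust's `&mut` references are first-class (they can be stored in fields), unlike the second-class inout parameters described above, and the function names are invented for the example.

```rust
// The `x = f(x)` style: take the value, return the updated value.
fn sort_items_pure(mut xs: Vec<i32>) -> Vec<i32> {
    xs.sort();
    xs
}

// The inout style: the caller's variable is updated in place.
fn sort_items_inout(xs: &mut Vec<i32>) {
    xs.sort();
}

// Two inout parameters: the call is only accepted if they do not overlap.
fn move_last(from: &mut Vec<i32>, to: &mut Vec<i32>) {
    if let Some(x) = from.pop() {
        to.push(x);
    }
}
```

A caller can write `xs = sort_items_pure(xs)` or `sort_items_inout(&mut xs)` with the same observable result. A call like `move_last(&mut ys, &mut ys)` is rejected by the Rust compiler, which is the "no two inout arguments overlap" check in action.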

Now, let's switch gears and explore the first question: polymorphism.

+

Starting again with OOP, you can use subtyping with its familiar class Dog extends Triangle, but that is not very flexible. +In particular, expressing something like sorting a list of items with pure subtyping is not too natural. +What works better is parametric polymorphism, where you add type parameters to your data structures:

+ +
+ + +
fn sort<T>(items: &mut Vec<T>)
+ +
+

Except that it doesn't quite work, as we also need to specify how to sort the Ts. +One approach here would be to introduce some sort of type-of-types, to group types with similar traits into a class:

+ +
+ + +
fn sort<T: Comparable>(items: &mut Vec<T>)
+ +
+

A somewhat simpler approach is to just explicitly pass in a comparison function:

+ +
+ + +
fn sort<T>(
+    compare: fn(T, T) -> bool,
+    items: &mut Vec<T>,
+)
+ +
+

How does this relate to value oriented programming? +It happens that, when programming with values, a very common pattern is to use indexes to express relationships. +For example, to model parent-child relations (or arbitrary graphs), the following setup works:

+ +
+ + +
type Tree = Vec<Node>;
+struct Node {
+    parent: usize,
+    children: Vec<usize>,
+}
+ +
+

Using direct references hits language limitations:

+ +
+ + +
struct Node {
+    parent: Node, // Who owns that?
+    children: Vec<Node>,
+}
+ +
+

Another good use-case is interning, where you have something like this:

+ +
+ + +
struct NameTable {
+    strings: Vec<String>,
+}
+
+struct Name(u32);
+ +
+

How do we sort a Vec<Name>? +We can't use the type class approach here, as knowing the type of Name isn't enough to sort names lexicographically: an instance of NameTable is also required to fetch the actual string data. +The approach of just passing in a comparison function works, as it can close over the correct NameTable in scope.
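
A minimal Rust sketch of that last point, assuming hypothetical `intern`, `get` and `sort_names` helpers (none of them come from an existing library): the comparison closure borrows the `NameTable`, so the `Name` indexes are ordered by the strings they refer to.

```rust
struct NameTable {
    strings: Vec<String>,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Name(u32);

impl NameTable {
    // Hypothetical helper: returns the existing index or appends the string.
    fn intern(&mut self, s: &str) -> Name {
        if let Some(i) = self.strings.iter().position(|x| x.as_str() == s) {
            return Name(i as u32);
        }
        self.strings.push(s.to_owned());
        Name((self.strings.len() - 1) as u32)
    }

    fn get(&self, name: Name) -> &str {
        &self.strings[name.0 as usize]
    }
}

// The closure closes over `table`; `Name` alone could not be compared
// lexicographically.
fn sort_names(table: &NameTable, names: &mut [Name]) {
    names.sort_by(|a, b| table.get(*a).cmp(table.get(*b)));
}
```

Sorting the names for "pear", "apple" and "fig" this way puts "apple" first, even though its index is not the smallest.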

+

The problem with "just pass a function" is that it gets tedious quickly. +Rather than xs.print() you now need to say xs.print(Int::print). +Luckily, similarly to how the compiler infers the type parameter T by default, we can allow limited inference of value parameters, which should remove most of the boilerplate. +So, something which looks like names.print() would desugar to Vec::print_vec(self.name_table.print, names).

+

This could also synergize well with compile-time evaluation. +If (as is the common case) the value of the implicit function table is known at compile time, no table needs to be passed in at runtime (and we don't have to repeatedly evaluate the table itself). +We can even compile-time partially evaluate things within the compilation unit, and use runtime parameters at the module boundaries, just like Swift does.

+

And that's basically it! +TL;DR: value oriented programming / mutable value semantics is an interesting "everything is X" approach to get the benefits of functional purity without giving up on mutable hash tables. +This style of programming doesn't work with cyclic data structures (values are always trees), so indexes are often used to express auxiliary relations. +This, however, gets in the way of type-based generic programming: a T is no longer Comparable, only T + Context is. +A potential fix for that is to base generic programming on explicit dictionary passing combined with implicit value parameter inference.

+

Is there a language like this already?

+

Links:

+ +
+
+ + + + + diff --git a/2023/05/06/zig-language-server-and-cancellation.html b/2023/05/06/zig-language-server-and-cancellation.html new file mode 100644 index 00000000..15702932 --- /dev/null +++ b/2023/05/06/zig-language-server-and-cancellation.html @@ -0,0 +1,292 @@ + + + + + + + Zig Language Server And Cancellation + + + + + + + + + + + + +
+ +
+ +
+
+ +

Zig Language Server And Cancellation

+

I already have a dedicated post about a hypothetical Zig language server. +But perhaps the most important thing I've written so far on the topic is the short note at the end of Zig and Rust.

+

If you want to implement an LSP for a language, you need to start with a data model. +If you correctly implement a store of source code which evolves over time and allows computing (initially trivial) derived data, then filling in the data until it covers the whole language is a question of incremental improvement. +If, however, you don't start with a rock-solid data model, and rush to implement language features, you might find yourself needing to make a sharp U-turn several years down the road.

+

I find this pretty insightful! +At least, this evening I've been pondering a particular aspect of the data model, and I think I realized something new about the problem space! +The aspect is cancellation.

+
+ +

+ Cancellation +

+

Consider this. +Your language server is happily doing something very useful and computationally-intensive — +typechecking a giant typechecker, +computing comptime Ackermann function, +or talking to Postgres. +Now, the user comes in and starts typing in the very file the server is currently processing. +What is the desired behavior, and how could it be achieved?

+

One useful model here is strong consistency. +If the language server acknowledged a source code edit, all future semantic requests (like go to definition or code completion) reflect this change. +The behavior is as if all changes and requests are sequentially ordered, and the server fully processes all preceding edits before responding to a request. +There are two great benefits to this model. +First, for the implementor it's an easy model to reason about. It's always clear what the answer to a particular request should be; the model is fully deterministic. +Second, the model gives maximally useful guarantees to the user: strict serializability.

+

So consider this sequence of events:

+
    +
  1. User types fo.
  2. The editor sends the edit to the language server.
  3. The editor requests completions for fo.
  4. The server starts furiously typechecking the modified file to compute the result.
  5. User types o.
  6. The editor sends the o.
  7. The editor re-requests completions, now for foo.
+

How does the server deal with this?

+

The trivial solution is to run everything sequentially to completion. +So, at step 6, the server doesn't immediately acknowledge the edit, but rather blocks until it fully completes step 4. +This is suboptimal behavior, because reads (computing completion) block writes (updating source code). +As a rule of thumb, writes should be prioritized over reads, because they reflect more up-to-date and more useful data.

+

A more optimal solution is to make the whole data model of the server immutable, such that edits do not modify data in place, but rather create a separate, new state. +In this model, computing results for 3 and 7 proceeds in parallel, and, crucially, the edit 6 is accepted immediately. +The cost of this model is the requirement that all data structures are immutable. +It also is a bit wasteful: burning CPU to compute code completion for an already old file is useless; it is better to dedicate all cores to the latest version.

+

A third approach is cancellation. +At step 6, when the server becomes aware of the pending edit, it actively cancels all in-flight work pertaining to the old state and then applies the modification in place. +That way we don't need to defensively copy the data, and also avoid useless CPU work. +This is the strategy employed by rust-analyzer.
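
One simple way to sketch cancellation in Rust is a shared flag that long-running work polls. This is a generic cooperative-cancellation pattern, shown under assumptions; it is not a description of how rust-analyzer implements cancellation, and `typecheck_all` and `Cancelled` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Debug, PartialEq)]
struct Cancelled;

// Stand-in for expensive analysis: checks the flag before each unit of work
// and bails out early once an edit has been announced.
fn typecheck_all(files: &[&str], cancel: &AtomicBool) -> Result<usize, Cancelled> {
    let mut checked = 0;
    for _file in files {
        if cancel.load(Ordering::Relaxed) {
            return Err(Cancelled);
        }
        checked += 1; // a real server would do the actual typechecking here
    }
    Ok(checked)
}
```

The thread that receives the edit sets the flag, waits for in-flight work to return `Err(Cancelled)`, and only then mutates the state in place.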

+

It's useful to think about why the server can't just, like, apply the edit in place, completely ignoring any possible background work. +The edit ultimately changes some memory somewhere, which might be concurrently read by the code completion thread, yielding a data race and full-on UB. +It is possible to work around this by applying feral concurrency control and just wrapping each individual bit of data in a mutex. +This removes the data race, but leads to excessive synchronization, sprawling complexity and broken logical invariants (a function body might change in the middle of typechecking).

+

Finally, there's this final solution, or rather, idea for a solution. +One interesting approach for dealing with memory which is needed now, but not in the future, is semi-space garbage collection. +We divide the available memory in two equal parts, use one half as a working copy which accumulates useful objects and garbage, and then at some point switch the halves, copying the live objects (but not the garbage) over. +Another place where this idea comes up is Carmack's architecture for functional games. +On every frame, a game copies over the game state applying frame update function. +Because frames happen sequentially, you only need two copies of game state for this. +We can think about applying something like that for cancellation: without going for full immutability, we can let the cancelled analysis work with the old half-state, while we switch to the new one.

+

This is not particularly actionable, but a good set of ideas to start thinking about evolution of a state in a language server. +And now for something completely different!

+
+
+ +

+ Relaxed Consistency +

+

Strong consistency is a good default, and works especially well for languages with good support for separate compilation, as the amount of work a language server needs to do after an update is proportional to the size of the update, and to the amount of code on the screen, both of which are typically O(1). +For Zig, whose compilation model is "start from the entry point and lazily compile everything that's actually used", this might be difficult to pull off. +It seems that Zig naturally gravitates to a Smalltalk-like image-based programming model, where the server stores fully resolved code all the time, and, if some edit triggers re-analysis of a huge chunk of code, the user just has to wait until the server catches up.

+

But what if we don't do strong consistency? +What if we allow the IDE to temporarily return non-deterministic and wrong results? +I think we can get some nice properties in exchange, if we use that semi-space idea.

+

The state of our language server would be comprised of three separate pieces of data:

+
    +
  • +A fully analyzed snapshot of the world, ready. +This is a bunch of source files, plus their ASTs, ZIRs and AIRs. +This also probably contains an index of cross-references, so that finding all usages of an identifier requires just listing already precomputed results. +
  • +
  • +The next snapshot, which is being analyzed, working. +This is essentially the same data, but the AIR is being constructed. +We need two snapshots because we want to be able to query one of them while the second one is being updated. +
  • +
  • +Finally, we also hold ASTs for the files which are currently being modified, pending. +
  • +
+

The overall evolution of data is as follows.

+

All edits synchronously go to the pending state. +pending is organized strictly on a per-file basis, so updating it can be done quickly on the main thread (maaaybe we want to move the parsing off the main thread, but my gut feeling is that we don't need to). +pending always reflects the latest state of the world; it is the latest state of the world.

+

Periodically, we collect a batch of changes from pending, create a new working and kick off a full analysis in the background. +A good point to do that would be when there are no syntax errors, or when the user saves a file. +There's at most one analysis in progress, so we accumulate changes in pending until the previous analysis finishes.

+

When working is fully processed, we atomically update the ready. +As ready is just an inert piece of data, it can be safely accessed from whatever thread.
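
The evolution of ready, working and pending can be sketched in Rust as follows. This is a single-threaded illustration under assumptions: `Server`, `Snapshot`, `start_analysis` and `publish` are invented names, a map from path to text stands in for ASTs and semantic data, and "analysis" is just merging the pending files into a copy of the previous snapshot.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Debug, Default)]
struct Snapshot {
    files: HashMap<String, String>,
}

struct Server {
    // Inert, fully analyzed data; readers clone the Arc and use it freely.
    ready: Mutex<Arc<Snapshot>>,
    // Latest edits, keyed by file path.
    pending: HashMap<String, String>,
}

impl Server {
    fn new() -> Self {
        Server {
            ready: Mutex::new(Arc::new(Snapshot::default())),
            pending: HashMap::new(),
        }
    }

    fn edit(&mut self, path: &str, text: &str) {
        self.pending.insert(path.to_owned(), text.to_owned());
    }

    // Collect the batch of pending changes into a new `working` snapshot.
    fn start_analysis(&mut self) -> Snapshot {
        let mut files = self.ready().files.clone();
        files.extend(self.pending.drain());
        Snapshot { files }
    }

    // Swap in the finished snapshot; old readers keep their own Arc.
    fn publish(&self, working: Snapshot) {
        *self.ready.lock().unwrap() = Arc::new(working);
    }

    fn ready(&self) -> Arc<Snapshot> {
        self.ready.lock().unwrap().clone()
    }
}
```

A request that grabbed `ready()` before `publish` keeps working with the old snapshot, while later requests see the new one.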

+

When processing requests, we only use ready and pending. +Processing requires some heuristics. +ready and pending describe different states of the world. +pending guarantees that its state is up-to-date, but it only has AST-level data. +ready is outdated, but it has every bit of semantic information pre-computed. +In particular, it includes cross-reference data.

+

So, our choices for computing results are:

+
    +
  • +

    Use the pending AST. +Features like displaying the outline of the current file or globally fuzzy-searching function by name can be implemented like this. +These features always give correct results.

    +
  • +
  • +

    Find the match between the pending AST and the ready semantics. +This works perfectly for non-local goto definition. +Here, we can temporarily get wrong results, or no result at all. +However, the results we get are always instant.

    +
  • +
  • +

    Re-analyze pending AST using results from ready for the analysis of the context. +This is what we'll use for code completion. +For code completion, pending will be maximally diverging from ready (especially if we use "no syntax errors" as a heuristic for promoting pending to working), +so we won't be able to complete based purely on ready. +At the same time, completion is heavily semantics-dependent, so we won't be able to drive it through pending. +And we also can't launch full semantic analysis on pending (what we effectively do in rust-analyzer), due to the "from root" nature of the analysis.

    +

    But we can merge two analysis techniques. +For example, if we are completing in a function which starts as fn f(comptime T: type, param: T), +we can use ready to get a set of values of T the function is actually called with, to complete param. in a useful way. +Dually, if inside f we have something like const list = std.ArrayList(u32){}, we don't have to comptime evaluate the ArrayList function, we can fetch the result from ready.

    +

    Of course, we must also handle the case where there's no ready yet (it's a first compilation, or we switched branches), so completion would be somewhat non-deterministic.

    +
  • +
+

One important flow where non-determinism would get in the way is refactoring. +When you rename something, you should be 100% sure that you've found all usages. +So, any refactor would have to be a blocking operation where we first wait for the current working to complete, then update working with the pending accumulated so far, and wait for that to complete, to, finally, apply the refactor using only up-to-date ready. +Luckily, refactoring is almost always a two-phase flow, reminiscent of a GET/POST flow for HTTP form (more about that). +Any refactor starts with read-only analysis to inform the user about available options and to gather input. +For rename, you wait for the user to type the new name, for change signature the user needs to rearrange params. +This brief interactive window should give enough headroom to flush all pending changes, masking the latency.

+

I am pretty excited about this setup. +I think that's the way to go for Zig.

+
    +
  • +The approach meshes extremely well with the ambition of doing incremental binary patching, both because it leans on complete global analysis, and because it contains an explicit notion of switching from one snapshot to the next one +(in contrast, rust-analyzer never really thinks about previous state of the code. There's always only the current state, with lazy, partially complete analysis). +
  • +
  • +Zig lacks declared interfaces, so a quick "find all calls to this function" operation is required for useful completion. +Fully resolved historical snapshot gives us just that. +
  • +
  • +Zig is carefully designed to make a lot of semantic information obvious just from the syntax. +Unlike Rust, Zig lacks syntactic macros or glob imports. +This makes it possible to do a lot of analysis correctly using only pending ASTs. +
  • +
  • +This approach nicely dodges the cancellation problem I've spent half of the blog post explaining, and has a relatively simple threading story, which reduces implementation complexity. +
  • +
  • +Finally, it feels like it should be super fast (if not the most CPU efficient). +
  • +
+ +
+ + +
+

Discussion on /r/Zig.

+
+
+
+ + + + + diff --git a/2023/05/21/resilient-ll-parsing-tutorial.html b/2023/05/21/resilient-ll-parsing-tutorial.html new file mode 100644 index 00000000..9b2711a1 --- /dev/null +++ b/2023/05/21/resilient-ll-parsing-tutorial.html @@ -0,0 +1,1624 @@ + + + + + + + Resilient LL Parsing Tutorial + + + + + + + + + + + + +
+ +
+ +
+
+ +

Resilient LL Parsing Tutorial

+

In this tutorial, I will explain a particular approach to parsing, which gracefully handles syntax errors and is thus suitable for language servers, which, by their nature, have to handle incomplete and invalid code. +Explaining the problem and the solution requires a somewhat less than trivial worked example, and I want to share a couple of tricks not directly related to resilience, so the tutorial builds a full, self-contained parser, instead of explaining abstractly just the resilience.

+

The tutorial is descriptive, rather than prescriptive: it tells you what you can do, not what you should do.

+ +
+ +

+ Why Resilience is Needed? +

+

Let's look at one motivational example for resilient parsing:

+ +
+ + +
fn fib_rec(f1: u32,
+
+fn fib(n: u32) -> u32 {
+  fib_rec(1, 1, n)
+}
+ +
+

Here, a user is in the process of defining the fib_rec helper function. +For a language server, it's important that the incompleteness doesn't get in the way. +In particular:

+
    +
  • +

    The following function, fib, should be parsed without any errors such that syntax and semantic highlighting is not disturbed, and all calls to fib elsewhere typecheck correctly.

    +
  • +
  • +

    The fib_rec function itself should be recognized as a partially complete function, so that various language server assists can help complete it correctly.

    +
  • +
  • +

    In particular, a smart language server can actually infer the expected type of fib_rec from a call we already have, and suggest completing the whole prototype. +rust-analyzer doesn't do that today, but one day it should.

    +
  • +
+

Generalizing this example, what we want from our parser is to recognize as much of the syntactic structure as feasible. +It should be able to localize errors: a mistake in a function generally should not interfere with parsing unrelated functions. +As the code is read and written left-to-right, the parser should also recognize valid partial prefixes of various syntactic constructs.

+

Academic literature suggests another lens to use when looking at this problem: error recovery. +Rather than just recognizing incomplete constructs, the parser can attempt to guess a minimal edit which completes the construct and gets rid of the syntax error. +From this angle, the above example would look rather like fn fib_rec(f1: u32, /* ) {} */ , where the stuff in a comment is automatically inserted by the parser.

+

Resilience is a more fruitful framing to use for a language server: incomplete code is the ground truth, and only the user knows how to correctly complete it. +A language server can only offer guesses and suggestions, and they are more precise if they employ post-parsing semantic information.

+

Error recovery might work better when emitting understandable syntax errors, but, in a language server, the importance of clear error messages for syntax errors is relatively lower, as highlighting such errors right in the editor synchronously with typing usually provides tighter, more useful tacit feedback.

+
+
+ +

+ Approaches to Error Resilience +

+

The classic approach for handling parser errors is to explicitly encode error productions and synchronization tokens into the language grammar. +This approach isn't a natural fit for the resilience framing: you don't want to anticipate every possible error, as there are just too many possibilities. +Rather, you want to recover as much of a valid syntax tree as possible, and more or less ignore arbitrary invalid parts.

+

Tree-sitter does something more interesting. +It is a GLR parser, meaning that it non-deterministically tries many possible LR (bottom-up) parses, and looks for the best one. +This allows Tree-sitter to recognize many complete valid small fragments of a tree, but it might have trouble assembling them into incomplete larger fragments. +In our example fn fib_rec(f1: u32, , Tree-sitter correctly recognizes f1: u32 as a formal parameter, but doesn't recognize fib_rec as a function.

+

The top-down (LL) parsing paradigm makes it harder to recognize valid small fragments, but naturally allows for incomplete large nodes. +Because code is written top-down and left-to-right, LL seems to have an advantage for typical patterns of incomplete code. +Moreover, there isn't really anything special you need to do to make LL parsing resilient. +You sort of just don't crash on the first error, and everything else more or less just works.

+

Details are fiddly though, so, in the rest of the post, we will write a complete implementation of a hand-written recursive descent + Pratt resilient parser.

+
+
+ +

+ Introducing L +

+

For the lack of imagination on my side, the toy language we will be parsing is called L. +It is a subset of Rust, which has just enough features to make some syntax mistakes. +Here's Fibonacci:

+ +
+ + +
fn fib(n: u32) -> u32 {
+    let f1 = fib(n - 1);
+    let f2 = fib(n - 2);
+    return f1 + f2;
+}
+ +
+

Note that there's no base case, because L doesn't have syntax for if. +Here's the syntax it does have, as an ungrammar:

+ +
+ + +
File = Fn*
+
+Fn = 'fn' 'name' ParamList ('->' TypeExpr)? Block
+
+ParamList = '(' Param* ')'
+Param = 'name' ':' TypeExpr ','?
+
+TypeExpr = 'name'
+
+Block = '{' Stmt* '}'
+
+Stmt =
+  StmtExpr
+| StmtLet
+| StmtReturn
+
+StmtExpr = Expr ';'
+StmtLet = 'let' 'name' '=' Expr ';'
+StmtReturn = 'return' Expr ';'
+
+Expr =
+  ExprLiteral
+| ExprName
+| ExprParen
+| ExprBinary
+| ExprCall
+
+ExprLiteral = 'int' | 'true' | 'false'
+ExprName = 'name'
+ExprParen = '(' Expr ')'
+ExprBinary = Expr ('+' | '-' | '*' | '/') Expr
+ExprCall = Expr ArgList
+
+ArgList = '(' Arg* ')'
+Arg = Expr ','?
+ +
+

The meta syntax here is similar to BNF, with two important differences:

+
    +
  • +the notation is better specified and more familiar (recursive regular expressions), +
  • +
  • +it describes syntax trees, rather than strings (sequences of tokens). +
  • +
+

Single quotes signify terminals: 'fn' and 'return' are keywords, 'name' stands for any identifier token, like foo, and '(' is punctuation. +Unquoted names are non-terminals. For example, x: i32, would be an example of Param. +Unquoted punctuation are meta symbols of ungrammar itself, semantics identical to regular expressions. Zero or more repetition is *, zero or one is ?, | is alternation and () are used for grouping.

+

The grammar doesn't nail the syntax precisely. For example, the rule for Param, Param = 'name' ':' TypeExpr ','? , says that the Param syntax node has an optional comma, but there's nothing in the above ungrammar specifying whether trailing commas are allowed.

+

Overall, L has very little to it: a program is a series of function declarations, each function has a body which is a sequence of statements, and the set of expressions is spartan, without even an if. Still, it'll take us some time to parse all that. +But you can already try the end result in the text-box below. +The syntax tree is updated automatically on typing. +Do make mistakes to see how a partial tree is recovered.

+ +
+
+ +

+ Designing the Tree +

+

A traditional AST for L might look roughly like this:

+ +
+ + +
struct File {
+  functions: Vec<Function>
+}
+
+struct Function {
+  name: String,
+  params: Vec<Param>,
+  return_type: Option<TypeExpr>,
+  block: Block,
+}
+ +
+

Extending this structure to be resilient is non-trivial. There are two problems: trivia and errors.

+

For resilient parsing, we want the AST to contain every detail about the source text. +We actually don't want to use an abstract syntax tree, and need a concrete one. +In a traditional AST, the tree structure is rigidly defined: any syntax node has a fixed number of children. +But there can be any number of comments and whitespace anywhere in the tree, and making space for them in the structure requires some fiddly data manipulation. +Similarly, errors (e.g., unexpected tokens) can appear anywhere in the tree.

+

One trick to handle these in the AST paradigm is to attach trivia and error tokens to other tokens. +That is, for something like +fn /* name of the function -> */ f() {} , +the fn and f tokens would be explicit parts of the AST, while the comment and surrounding whitespace would belong to the collection of trivia tokens hanging off the fn token.

+

One complication here is that it's not always just tokens that can appear anywhere; sometimes you can have full trees like that. +For example, comments might support markdown syntax, and you might actually want to parse that properly (e.g., to resolve links to declarations). +Syntax errors can also span whole subtrees. +For example, when parsing pub(crate) nope in Rust, it would be smart to parse pub(crate) as a visibility modifier, and nest it into a bigger Error node.

+

SwiftSyntax meticulously adds error placeholders between any two fields of an AST node, giving rise to +unexpectedBetweenModifiersAndDeinitKeyword +and such (source, docs).

+

An alternative approach, used by IntelliJ and rust-analyzer, is to treat the syntax tree as a somewhat dynamically-typed data structure:

+ +
+ + +
enum TokenKind {
+  ErrorToken, LParen, RParen, Eq,
+  ...
+}
+
+struct Token {
+  kind: TokenKind,
+  text: String,
+}
+
+enum TreeKind {
+  ErrorTree, File, Fn, Param,
+  ...
+}
+
+struct Tree {
+  kind: TreeKind,
+  children: Vec<Child>,
+}
+
+enum Child {
+  Token(Token),
+  Tree(Tree),
+}
+ +
+

This structure does not enforce any constraints on the shape of the syntax tree at all, and so it naturally accommodates errors anywhere. +It is possible to layer a well-typed API on top of this dynamic foundation. +An extra benefit of this representation is that you can use the same tree type for different languages; this is a requirement for universal tools.
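As an illustration of layering a typed API over the dynamic tree, here is a minimal sketch. The FnNode wrapper and its cast, name and param_list methods are hypothetical names chosen for this example, and the kind enums are trimmed to what the example needs.

```rust
// Dynamically-typed tree, mirroring the definitions above (trimmed).
#[derive(Debug, Clone, Copy, PartialEq)]
enum TokenKind { FnKeyword, Name, LParen, RParen }

#[derive(Debug, Clone, Copy, PartialEq)]
enum TreeKind { ErrorTree, Fn, ParamList }

struct Token { kind: TokenKind, text: String }
struct Tree { kind: TreeKind, children: Vec<Child> }
enum Child { Token(Token), Tree(Tree) }

// Hypothetical typed view: a `FnNode` is just a `Tree` whose kind is `Fn`.
struct FnNode<'a> { tree: &'a Tree }

impl<'a> FnNode<'a> {
    // Checked downcast from the untyped tree.
    fn cast(tree: &'a Tree) -> Option<FnNode<'a>> {
        if tree.kind == TreeKind::Fn { Some(FnNode { tree }) } else { None }
    }

    // Accessors return `Option`, because an incomplete function
    // may lack a name or a parameter list.
    fn name(&self) -> Option<&'a str> {
        self.tree.children.iter().find_map(|child| match child {
            Child::Token(t) if t.kind == TokenKind::Name => Some(t.text.as_str()),
            _ => None,
        })
    }

    fn param_list(&self) -> Option<&'a Tree> {
        self.tree.children.iter().find_map(|child| match child {
            Child::Tree(t) if t.kind == TreeKind::ParamList => Some(t),
            _ => None,
        })
    }
}
```

The point of the design is that the typed layer never rejects a tree: a malformed Fn node is still a FnNode, and each missing piece simply shows up as None.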

+

Discussing specifics of syntax tree representation goes beyond this article, as the topic is vast and lacks a clear winning solution. +To learn about it, take a look at Roslyn, SwiftSyntax, rowan and IntelliJ.

+

To simplify things, we'll ignore comments and whitespace, though you'll absolutely want those in a real implementation. +One approach would be to do the parsing without comments, like we do here, and then attach comments to the nodes in a separate pass. +Attaching comments needs some heuristics: for example, non-doc comments generally want to be a part of the following syntax node.

+

Another design choice is the handling of error messages. +One approach is to treat error messages as properties of the syntax tree itself, by either inferring them from the tree structure, or just storing them inline. +Alternatively, errors can be considered to be a side-effect of the parsing process (that way, trees constructed manually during, e.g., refactors won't carry any error messages, even if they are invalid).
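To make the "infer errors from the tree structure" option concrete, here is a rough sketch that walks a tree and derives one message per ErrorTree node. The collect_errors and text_of helpers and the message wording are assumptions made for this example; the tree types are trimmed copies of the ones above.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum TreeKind { ErrorTree, File, Fn }

struct Token { text: String }
struct Tree { kind: TreeKind, children: Vec<Child> }
enum Child { Token(Token), Tree(Tree) }

// Concatenates the text of all tokens below `tree`.
fn text_of(tree: &Tree, out: &mut String) {
    for child in &tree.children {
        match child {
            Child::Token(token) => out.push_str(&token.text),
            Child::Tree(tree) => text_of(tree, out),
        }
    }
}

// Hypothetical: derive one diagnostic per outermost `ErrorTree` node.
fn collect_errors(tree: &Tree, errors: &mut Vec<String>) {
    if tree.kind == TreeKind::ErrorTree {
        let mut text = String::new();
        text_of(tree, &mut text);
        errors.push(format!("unexpected `{text}`"));
        return;
    }
    for child in &tree.children {
        if let Child::Tree(tree) = child {
            collect_errors(tree, errors);
        }
    }
}
```

With this approach a manually constructed invalid tree also yields diagnostics, which is the trade-off against reporting errors as a parsing side effect.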

+

Here's the full set of token and tree kinds for our language L:

+ +
+ + +
enum TokenKind {
+  ErrorToken, Eof,
+
+  LParen, RParen, LCurly, RCurly,
+  Eq, Semi, Comma, Colon, Arrow,
+  Plus, Minus, Star, Slash,
+
+  FnKeyword, LetKeyword, ReturnKeyword,
+  TrueKeyword, FalseKeyword,
+
+  Name, Int,
+}
+
+enum TreeKind {
+  ErrorTree,
+  File, Fn, TypeExpr,
+  ParamList, Param,
+  Block,
+  StmtLet, StmtReturn, StmtExpr,
+  ExprLiteral, ExprName, ExprParen,
+  ExprBinary, ExprCall,
+  ArgList, Arg,
+}
+ +
+

Things to note:

+
    +
  • +explicit Error kinds; +
  • +
  • +no whitespace or comments, as an unrealistic simplification; +
  • +
  • +Eof virtual token simplifies parsing, removing the need to handle Option<Token>; +
  • +
  • +punctuators are named after what they are, rather than after what they usually mean: Star, rather than Mult; +
  • +
  • +a good set of names for various kinds of braces is {L,R}{Paren,Curly,Brack,Angle}. +
  • +
+
+
+ +

+ Lexer +

+

We won't be covering the lexer here; let's just say we have a function fn lex(text: &str) -> Vec<Token>. Two points are worth mentioning:

+
    +
  • +The lexer itself should be resilient, but that's easy: produce an Error token for anything which isn't a valid token. +
  • +
  • +Writing a lexer by hand is somewhat tedious, but very simple relative to everything else. +If you are stuck in analysis paralysis picking a lexer generator, consider cutting the Gordian knot and writing the lexer by hand. +
  • +
+
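For readers who want something concrete, here is one possible hand-written lexer for L. It is a sketch under assumptions the article does not state: identifiers and integers are ASCII-only, whitespace is dropped, and the prefix_len helper is a name made up for this example. No Eof token is emitted, since the parser's lookahead synthesizes it.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum TokenKind {
    ErrorToken, Eof,
    LParen, RParen, LCurly, RCurly,
    Eq, Semi, Comma, Colon, Arrow,
    Plus, Minus, Star, Slash,
    FnKeyword, LetKeyword, ReturnKeyword,
    TrueKeyword, FalseKeyword,
    Name, Int,
}

#[derive(Debug)]
struct Token { kind: TokenKind, text: String }

fn lex(text: &str) -> Vec<Token> {
    use TokenKind::*;
    // `->` comes first so that it wins over a lone `-`.
    let punctuation = [
        ("->", Arrow), ("(", LParen), (")", RParen), ("{", LCurly), ("}", RCurly),
        ("=", Eq), (";", Semi), (",", Comma), (":", Colon),
        ("+", Plus), ("-", Minus), ("*", Star), ("/", Slash),
    ];
    let keywords = [
        ("fn", FnKeyword), ("let", LetKeyword), ("return", ReturnKeyword),
        ("true", TrueKeyword), ("false", FalseKeyword),
    ];

    let mut tokens = Vec::new();
    let mut rest = text;
    loop {
        rest = rest.trim_start(); // whitespace is skipped, as in the article
        let Some(first) = rest.chars().next() else { break };

        let (kind, len) = if let Some((p, kind)) =
            punctuation.iter().find(|(p, _)| rest.starts_with(*p))
        {
            (*kind, p.len())
        } else if first.is_ascii_digit() {
            (Int, prefix_len(rest, |c| c.is_ascii_digit()))
        } else if first.is_ascii_alphabetic() || first == '_' {
            let len = prefix_len(rest, |c| c.is_ascii_alphanumeric() || c == '_');
            let kind = keywords
                .iter()
                .find(|(kw, _)| *kw == &rest[..len])
                .map_or(Name, |(_, kind)| *kind);
            (kind, len)
        } else {
            // Resilience: anything unrecognized becomes a one-character error token.
            (ErrorToken, first.len_utf8())
        };

        tokens.push(Token { kind, text: rest[..len].to_string() });
        rest = &rest[len..];
    }
    tokens
}

// Length in bytes of the longest prefix of `s` whose chars satisfy `pred`.
fn prefix_len(s: &str, pred: impl Fn(char) -> bool) -> usize {
    s.find(|c: char| !pred(c)).unwrap_or(s.len())
}
```

Note that the lexer never fails: a stray character such as $ simply becomes an ErrorToken, and lexing continues with the next character.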
+
+ +

+ Parser +

+

With homogeneous syntax trees, the task of parsing admits an elegant formalization: we want to insert extra parentheses into a stream of tokens.

+ +
+ + +
+-Fun
+|      +-Param
+|      |
+[fn f( [x: Int] ) {}]
+     |            |
+     |            +-Block
+     +-ParamList
+ +
+

Note how the sequence of tokens with extra parentheses is still a flat sequence. +The parsing will be two-phase:

+
    +
  • +in the first phase, the parser emits a flat list of events, +
  • +
  • +in the second phase, the list is converted to a tree. +
  • +
+

Here's the basic setup for the parser:

+ +
+ + +
enum Event {
+  Open { kind: TreeKind }, 
+  Close,
+  Advance,
+}
+
+struct MarkOpened {
+  index: usize,
+}
+
+struct Parser {
+  tokens: Vec<Token>,
+  pos: usize,
+  fuel: Cell<u32>, 
+  events: Vec<Event>,
+}
+
+impl Parser {
+  fn open(&mut self) -> MarkOpened { 
+    let mark = MarkOpened { index: self.events.len() };
+    self.events.push(Event::Open { kind: TreeKind::ErrorTree });
+    mark
+  }
+
+  fn close(  
+    &mut self,
+    m: MarkOpened,
+    kind: TreeKind, 
+  ) {
+    self.events[m.index] = Event::Open { kind };
+    self.events.push(Event::Close);
+  }
+
+  fn advance(&mut self) { 
+    assert!(!self.eof());
+    self.fuel.set(256); 
+    self.events.push(Event::Advance);
+    self.pos += 1;
+  }
+
+  fn eof(&self) -> bool {
+    self.pos == self.tokens.len()
+  }
+
+  fn nth(&self, lookahead: usize) -> TokenKind { 
+    if self.fuel.get() == 0 { 
+      panic!("parser is stuck")
+    }
+    self.fuel.set(self.fuel.get() - 1);
+    self.tokens.get(self.pos + lookahead)
+      .map_or(TokenKind::Eof, |it| it.kind)
+  }
+
+  fn at(&self, kind: TokenKind) -> bool { 
+    self.nth(0) == kind
+  }
+
+  fn eat(&mut self, kind: TokenKind) -> bool { 
+    if self.at(kind) {
+      self.advance();
+      true
+    } else {
+      false
+    }
+  }
+
+  fn expect(&mut self, kind: TokenKind) {
+    if self.eat(kind) {
+      return;
+    }
+    // TODO: Error reporting.
+    eprintln!("expected {kind:?}");
+  }
+
+  fn advance_with_error(&mut self, error: &str) {
+    let m = self.open();
+    // TODO: Error reporting.
+    eprintln!("{error}");
+    self.advance();
+    self.close(m, TreeKind::ErrorTree);
+  }
+}
+ +
+
    +
  1. +

    open, advance, and close form the basis for constructing the stream of events.

    +
  2. +
  3. +

    Note how kind is stored in the Open event, but is supplied with the close method. +This is required for flexibility: sometimes it's possible to decide on the type of syntax node only after it is parsed. +The way this works is that the open method returns a MarkOpened, which is subsequently passed to close to modify the corresponding Open event.

    +
  4. +
  5. +

    There's a set of short, convenient methods to navigate through the sequence of tokens:

    +
      +
    • +nth is the lookahead method. Note how it doesn't return an Option, and uses the special Eof value for out-of-bounds indexes. +This simplifies the call-site: "no more tokens" and "token of a wrong kind" are always handled the same way. +
    • +
    • +at is a convenient specialization to check for a specific next token. +
    • +
    • +eat is at combined with consuming the next token. +
    • +
    • +expect is eat combined with error reporting. +
    • +
    +

    These methods are not a very orthogonal basis, but they are a convenient basis for parsing. +Finally, advance_with_error advances over any token, but also wraps it into an error node.

    +
  6. +
  7. +

    When writing parsers by hand, it's very easy to accidentally write code which loops or recurses forever. +To simplify debugging, it's helpful to add an explicit notion of "fuel", which is replenished every time the parser makes progress, +and is spent every time it does not.

    +
  8. +
+

The function to transform a flat list of events into a tree is a bit involved. +It juggles three things: an iterator of events, an iterator of tokens, and a stack of partially constructed nodes (we expect the stack to contain just one node at the end).

+ +
+ + +
impl Parser {
+  fn build_tree(self) -> Tree {
+    let mut tokens = self.tokens.into_iter();
+    let mut events = self.events;
+    let mut stack = Vec::new();
+
+    // Special case: pop the last `Close` event to ensure
+    // that the stack is non-empty inside the loop.
+    assert!(matches!(events.pop(), Some(Event::Close)));
+
+    for event in events {
+      match event {
+        // Starting a new node; just push an empty tree to the stack.
+        Event::Open { kind } => {
+          stack.push(Tree { kind, children: Vec::new() })
+        }
+
+        // A tree is done.
+        // Pop it off the stack and append to a new current tree.
+        Event::Close => {
+          let tree = stack.pop().unwrap();
+          stack
+            .last_mut()
+            // If we don't pop the last `Close` before this loop,
+            // this unwrap would trigger for it.
+            .unwrap()
+            .children
+            .push(Child::Tree(tree));
+        }
+
+        // Consume a token and append it to the current tree
+        Event::Advance => {
+          let token = tokens.next().unwrap();
+          stack
+            .last_mut()
+            .unwrap()
+            .children
+            .push(Child::Token(token));
+        }
+      }
+    }
+
+    // Our parser will guarantee that all the trees are closed
+    // and cover the entirety of tokens.
+    assert!(stack.len() == 1);
+    assert!(tokens.next().is_none());
+
+    stack.pop().unwrap()
+  }
+}
+ +
+
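The indented tree dumps shown later in this article can be produced by a small recursive printer. The following is a sketch of one way to write it; the print method name and the exact output format are assumptions for this example, and the types are trimmed copies of the ones above.

```rust
#[derive(Debug)]
enum TreeKind { File, Fn, ParamList }

struct Token { text: String }
struct Tree { kind: TreeKind, children: Vec<Child> }
enum Child { Token(Token), Tree(Tree) }

impl Tree {
    // Appends an indented dump of this tree to `buf`:
    // the node kind on its own line, children one level deeper.
    fn print(&self, buf: &mut String, level: usize) {
        let indent = "  ".repeat(level);
        buf.push_str(&format!("{indent}{:?}\n", self.kind));
        for child in &self.children {
            match child {
                Child::Token(token) => {
                    buf.push_str(&format!("{indent}  '{}'\n", token.text))
                }
                Child::Tree(tree) => tree.print(buf, level + 1),
            }
        }
    }
}
```

Calling tree.print(&mut buf, 0) on the root gives one line per node or token, which makes it easy to compare parser output against expectations in tests.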
+
+ +

+ Grammar +

+

We are finally getting to the actual topic of resilient parsing. +Now we will write a full grammar for L as a sequence of functions. +Usually both atomic parser operations, like fn advance, and grammar productions, like fn parse_fn, are implemented as methods on the Parser struct. +I prefer to separate the two and to use free functions for the latter category, as the code is a bit more readable that way.

+

Let's start with parsing the top level.

+ +
+ + +
use TokenKind::*;
+use TreeKind::*;
+
+// File = Fn*
+fn file(p: &mut Parser) {
+  let m = p.open(); 
+
+  while !p.eof() { 
+    if p.at(FnKeyword) {
+      func(p)
+    } else {
+      p.advance_with_error("expected a function"); 
+    }
+  }
+
+  p.close(m, File);  
+}
+ +
+
    +
  1. +

    Wrap the whole thing into a File node.

    +
  2. +
  3. +

    Use the while loop to parse a file as a series of functions. +Importantly, the entirety of the file is parsed; we break out of the loop only when the eof is reached.

    +
  4. +
  5. +

    To not get stuck in this loop, it's crucial that every iteration consumes at least one token. +If the token is fn, we'll parse at least a part of a function. +Otherwise, we consume the token and wrap it into an error node.

    +
  6. +
+

Let's parse functions now:

+ +
+ + +
// Fn = 'fn' 'name' ParamList ('->' TypeExpr)? Block
+fn func(p: &mut Parser) {
+  assert!(p.at(FnKeyword)); 
+  let m = p.open(); 
+
+  p.expect(FnKeyword);
+  p.expect(Name);
+  if p.at(LParen) { 
+    param_list(p);
+  }
+  if p.eat(Arrow) {
+    type_expr(p);
+  }
+  if p.at(LCurly) { 
+    block(p);
+  }
+
+  p.close(m, Fn); 
+}
+ +
+
    +
  1. +

    When parsing a function, we assert that the current token is fn. +There's some duplication with the if p.at(FnKeyword) check at the call-site, but this duplication actually helps readability.

    +
  2. +
  3. +

    Again, we surround the body of the function with an open/close pair.

    +
  4. +
  5. +

    Although the parameter list and the function body are mandatory, we precede them with an at check. +We can still report the syntax error by analyzing the structure of the syntax tree (or we can report it as a side effect of parsing in the else branch if we want). +It wouldn't be wrong to just remove the if altogether and try to parse param_list unconditionally, but the if helps with reducing cascading errors.

    +
  6. +
+

Now, the list of parameters:

+ +
+ + +
// ParamList = '(' Param* ')'
+fn param_list(p: &mut Parser) {
+  assert!(p.at(LParen));
+  let m = p.open();
+
+  p.expect(LParen); 
+  while !p.at(RParen) && !p.eof() { 
+    if p.at(Name) { 
+      param(p);
+    } else {
+      break; 
+    }
+  }
+  p.expect(RParen); 
+
+  p.close(m, ParamList);
+}
+ +
+
    +
  1. +Inside, we have a standard code shape for parsing a bracketed list. +It can be extracted into a higher-order function, but typing out the code manually is not a problem either. +This bit of code starts and ends with consuming the corresponding parentheses. +
  2. +
  3. +In the happy case, we loop until the closing parenthesis. +However, it could also be the case that there's no closing parenthesis at all, so we add an eof condition as well. +Generally, every loop we write would have && !p.eof() tacked on. +
  4. +
  5. +As with any loop, we need to ensure that each iteration consumes at least one token to not get stuck. +If the current token is an identifier, everything is ok, as we'll parse at least some part of the parameter. +
  6. +
+

Parsing a parameter involves almost nothing new at this point:

+ +
+ + +
// Param = 'name' ':' TypeExpr ','?
+fn param(p: &mut Parser) {
+  assert!(p.at(Name));
+  let m = p.open();
+
+  p.expect(Name);
+  p.expect(Colon);
+  type_expr(p);
+  if !p.at(RParen) { 
+    p.expect(Comma);
+  }
+
+  p.close(m, Param);
+}
+ +
+
    +
  1. +This is the only interesting bit. +To parse a comma-separated list of parameters with an optional trailing comma, it's enough to check whether the token following the parameter is ). +This correctly handles all three cases: +
      +
    • +if the next token is ), we are at the end of the list, and no comma is required; +
    • +
    • +if the next token is ,, we correctly advance past it; +
    • +
    • +finally, if the next token is anything else, then it's not a ), so we are not at the last element of the list and correctly emit an error. +
    • +
    +
  2. +
+

Parsing types is trivial:

+ +
+ + +
// TypeExpr = 'name'
+fn type_expr(p: &mut Parser) {
+  let m = p.open();
+  p.expect(Name);
+  p.close(m, TypeExpr);
+}
+ +
+

The notable aspect here is naming. +The production is deliberately named TypeExpr, rather than Type, to avoid confusion down the line. +Consider fib(92). +It is an expression, which evaluates to a value. +The same thing happens with types. +For example, Foo<Int> is not a type yet; it's an expression which can be evaluated (at compile time) to a type (if Foo is a type alias, the result might be something like Pair<Int, Int>).

+

Parsing a block gets a bit more involved:

+ +
+ + +
// Block = '{' Stmt* '}'
+//
+// Stmt =
+//   StmtLet
+// | StmtReturn
+// | StmtExpr
+fn block(p: &mut Parser) {
+  assert!(p.at(LCurly));
+  let m = p.open();
+
+  p.expect(LCurly);
+  while !p.at(RCurly) && !p.eof() {
+    match p.nth(0) {
+      LetKeyword => stmt_let(p),
+      ReturnKeyword => stmt_return(p),
+      _ => stmt_expr(p),
+    }
+  }
+  p.expect(RCurly);
+
+  p.close(m, Block);
+}
+ +
+

A block can contain many different kinds of statements, so we branch on the first token in the loop's body. +As usual, we need to maintain an invariant that the body consumes at least one token. +For let and return statements that's easy: they consume the fixed first token. +For the expression statement (things like 1 + 1;) it gets more interesting, as an expression can start with many different tokens. +For the time being, we'll just kick the can down the road and require stmt_expr to deal with it (that is, to guarantee that at least one token is consumed).

+

Statements themselves are straightforward:

+ +
+ + +
// StmtLet = 'let' 'name' '=' Expr ';'
+fn stmt_let(p: &mut Parser) {
+  assert!(p.at(LetKeyword));
+  let m = p.open();
+
+  p.expect(LetKeyword);
+  p.expect(Name);
+  p.expect(Eq);
+  expr(p);
+  p.expect(Semi);
+
+  p.close(m, StmtLet);
+}
+
+// StmtReturn = 'return' Expr ';'
+fn stmt_return(p: &mut Parser) {
+  assert!(p.at(ReturnKeyword));
+  let m = p.open();
+
+  p.expect(ReturnKeyword);
+  expr(p);
+  p.expect(Semi);
+
+  p.close(m, StmtReturn);
+}
+
+// StmtExpr = Expr ';'
+fn stmt_expr(p: &mut Parser) {
+  let m = p.open();
+
+  expr(p);
+  p.expect(Semi);
+
+  p.close(m, StmtExpr);
+}
+ +
+

Again, for stmt_expr, we push the "must consume a token" invariant onto expr.

+

Expressions are tricky. +They always are. +For starters, let's handle just the clearly-delimited cases, like literals and parentheses:

+ +
+ + +
fn expr(p: &mut Parser) {
+  expr_delimited(p)
+}
+
+fn expr_delimited(p: &mut Parser) {
+  let m = p.open();
+  match p.nth(0) {
+    // ExprLiteral = 'int' | 'true' | 'false'
+    Int | TrueKeyword | FalseKeyword => {
+      p.advance();
+      p.close(m, ExprLiteral)
+    }
+
+    // ExprName = 'name'
+    Name => {
+      p.advance();
+      p.close(m, ExprName)
+    }
+
+    // ExprParen   = '(' Expr ')'
+    LParen => {
+      p.expect(LParen);
+      expr(p);
+      p.expect(RParen);
+      p.close(m, ExprParen)
+    }
+
+    _ => {
+      if !p.eof() {
+        p.advance();
+      }
+      p.close(m, ErrorTree)
+    }
+  }
+}
+ +
+

In the catch-all arm, we take care to consume the token, to make sure that the statement loop in block can always make progress.

+

The next expression to handle is ExprCall. +This requires some preparation. +Consider this example: f(1)(2).

+

We want the following structure of parentheses here:

+ +
+ + +
+-ExprCall
+|
+|   +-ExprName
+|   |       +-ArgList
+|   |       |
+[ [ [f](1) ](2) ]
+  |    |
+  |    +-ArgList
+  |
+  +-ExprCall
+ +
+

The problem is, when the parser is at f, it doesn't yet know how many Open events it should emit.

+

We solve the problem by adding an API to go back and inject a new Open event into the middle of existing events.

+ +
+ + +
struct MarkOpened {
+  index: usize,
+}
+
+struct MarkClosed {
+  index: usize,
+}
+
+impl Parser {
+  fn open(&mut self) -> MarkOpened {
+    let mark = MarkOpened { index: self.events.len() };
+    self.events.push(Event::Open { kind: TreeKind::ErrorTree });
+    mark
+  }
+
+  fn close(
+    &mut self,
+    m: MarkOpened,
+    kind: TreeKind,
+  ) -> MarkClosed { 
+    self.events[m.index] = Event::Open { kind };
+    self.events.push(Event::Close);
+    MarkClosed { index: m.index }
+  }
+
+  fn open_before(&mut self, m: MarkClosed) -> MarkOpened { 
+    let mark = MarkOpened { index: m.index };
+    self.events.insert(
+      m.index,
+      Event::Open { kind: TreeKind::ErrorTree },
+    );
+    mark
+  }
+}
+ +
+
    +
  1. +

    Here we adjust close to also return a MarkClosed, such that we can go back and add a new event before it.

    +
  2. +
  3. +

    The new API. It is like open, but also takes a MarkClosed which carries an index of an Open event in front of which we are to inject a new Open. +In the current implementation, for simplicity, we just inject into the middle of the vector, which is an O(N) operation worst-case. +A proper solution here would be to use an index-based linked list. +That is, open_before can push the new open event to the end of the list, and also mark the old event with a pointer to the freshly inserted one. +To store a pointer, an extra field is needed:

    + +
    + + +
    enum Event {
    +  Open {
    +    kind: TreeKind,
    +    // Points forward into a list at the Open event
    +    // which logically happens before this one.
    +    open_before: Option<usize>,
    +  },
    +}
    + +
    +

    The loop in build_tree needs to follow the open_before links.

    +
  4. +
+
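The linked-list variant described above can be sketched as follows. This is one possible reading of the idea, not code from the article: the Tombstone variant, the Events struct and the linearize function are names invented for this example. Instead of teaching build_tree about the links, linearize rewrites the events into logical order first, so the existing loop could consume its output unchanged.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum TreeKind { ErrorTree, ExprName, ExprCall, ArgList }

#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    Open { kind: TreeKind, open_before: Option<usize> },
    Close,
    Advance,
    // Hypothetical marker for an event that was already consumed.
    Tombstone,
}

struct MarkOpened { index: usize }
struct MarkClosed { index: usize }

struct Events { events: Vec<Event> }

impl Events {
    fn open(&mut self) -> MarkOpened {
        self.events.push(Event::Open { kind: TreeKind::ErrorTree, open_before: None });
        MarkOpened { index: self.events.len() - 1 }
    }

    fn close(&mut self, m: MarkOpened, kind: TreeKind) -> MarkClosed {
        // A node can only be linked after it is closed, so the link is still empty.
        self.events[m.index] = Event::Open { kind, open_before: None };
        self.events.push(Event::Close);
        MarkClosed { index: m.index }
    }

    // O(1): append the new `Open` and link the old one to it.
    fn open_before(&mut self, m: MarkClosed) -> MarkOpened {
        let new = self.open();
        if let Event::Open { open_before, .. } = &mut self.events[m.index] {
            *open_before = Some(new.index);
        }
        new
    }

    fn advance(&mut self) {
        self.events.push(Event::Advance);
    }
}

// Rewrites the events into logical order by following `open_before` links.
fn linearize(mut events: Vec<Event>) -> Vec<Event> {
    let mut out = Vec::with_capacity(events.len());
    for i in 0..events.len() {
        match std::mem::replace(&mut events[i], Event::Tombstone) {
            Event::Open { kind, open_before } => {
                // Each further link in the chain is a more outer node.
                let mut kinds = vec![kind];
                let mut next = open_before;
                while let Some(j) = next {
                    next = match std::mem::replace(&mut events[j], Event::Tombstone) {
                        Event::Open { kind, open_before } => {
                            kinds.push(kind);
                            open_before
                        }
                        other => panic!("open_before points at {other:?}"),
                    };
                }
                // The outermost node must be opened first.
                for kind in kinds.into_iter().rev() {
                    out.push(Event::Open { kind, open_before: None });
                }
            }
            // This `Open` was already emitted through a link.
            Event::Tombstone => {}
            other => out.push(other),
        }
    }
    out
}
```

Links always point forward in the vector, because the injected Open is pushed after the event it wraps, so a single left-to-right pass is enough.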

With this new API, we can parse function calls:

+ +
+ + +
fn expr_delimited(p: &mut Parser) -> MarkClosed { 
+  ...
+}
+
+fn expr(p: &mut Parser) {
+  let mut lhs = expr_delimited(p); 
+
+  // ExprCall = Expr ArgList
+  while p.at(LParen) { 
+    let m = p.open_before(lhs);
+    arg_list(p);
+    lhs = p.close(m, ExprCall);
+  }
+}
+
+// ArgList = '(' Arg* ')'
+fn arg_list(p: &mut Parser) {
+  assert!(p.at(LParen));
+  let m = p.open();
+
+  p.expect(LParen);
+  while !p.at(RParen) && !p.eof() { 
+    arg(p);
+  }
+  p.expect(RParen);
+
+  p.close(m, ArgList);
+}
+
+// Arg = Expr ','?
+fn arg(p: &mut Parser) {
+  let m = p.open();
+
+  expr(p);
+  if !p.at(RParen) { 
+    p.expect(Comma);
+  }
+
+  p.close(m, Arg);
+}
+ +
+
    +
  1. +

    expr_delimited now returns a MarkClosed rather than (). +No code changes are required for this, as close calls are already in the tail position.

    +
  2. +
  3. +

    To parse function calls, we check whether we are at ( and use open_before API if that is the case.

    +
  4. +
  5. +

    Parsing the argument list should be routine by now. +Again, as an expression can start with many different tokens, we don't add an if p.at check to the loop's body, and require arg to consume at least one token.

    +
  6. +
  7. +

    Inside arg, we use an already familiar construct to parse an optionally trailing comma.

    +
  8. +
+

Now only binary expressions are left. +We will use a Pratt parser for those. +This is genuinely tricky code, so I have a dedicated article explaining how it all works:

+

Simple but Powerful Pratt Parsing .

+

Here, I'll just dump a pageful of code without much explanation:

+ +
+ + +
fn expr(p: &mut Parser) {
+  expr_rec(p, Eof); 
+}
+
+fn expr_rec(p: &mut Parser, left: TokenKind) { 
+  let mut lhs = expr_delimited(p);
+
+  while p.at(LParen) {
+    let m = p.open_before(lhs);
+    arg_list(p);
+    lhs = p.close(m, ExprCall);
+  }
+
+  loop {
+    let right = p.nth(0);
+    if right_binds_tighter(left, right) { 
+      let m = p.open_before(lhs);
+      p.advance();
+      expr_rec(p, right);
+      lhs = p.close(m, ExprBinary);
+    } else {
+      break;
+    }
+  }
+}
+
+fn right_binds_tighter( 
+  left: TokenKind,
+  right: TokenKind,
+) -> bool {
+  fn tightness(kind: TokenKind) -> Option<usize> {
+    [
+      // Precedence table:
+      [Plus, Minus].as_slice(),
+      &[Star, Slash],
+    ]
+    .iter()
+    .position(|level| level.contains(&kind))
+  }
+
+  let Some(right_tightness) = tightness(right) else { 
+    return false
+  };
+  let Some(left_tightness) = tightness(left) else {
+    assert!(left == Eof);
+    return true;
+  };
+
+  right_tightness > left_tightness
+}
+ +
+
    +
  1. +

    In this version of Pratt, rather than passing numerical precedence, I pass the actual token (learned that from jamii's post). +So, to determine whether to break or recur in the Pratt loop, we ask which of the two tokens binds tighter and act accordingly.

    +
  2. +
  3. +

    When we start parsing an expression, we don't have an operator to the left yet, so I just pass Eof as a dummy token.

    +
  4. +
  5. +

    The code naturally handles the case when the next token is not an operator (that is, when the expression is complete, or when there's some syntax error).

    +
  6. +
+
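To see the token-comparison rule at work in isolation, here is a small standalone sketch that applies the same right_binds_tighter idea to a flat token list and renders the grouping as a string instead of emitting events. The Tok and Toks types and the sexpr function are made up for this example.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok { Int(i64), Plus, Minus, Star, Slash, Eof }

fn tightness(t: Tok) -> Option<usize> {
    match t {
        Tok::Plus | Tok::Minus => Some(0),
        Tok::Star | Tok::Slash => Some(1),
        _ => None,
    }
}

fn right_binds_tighter(left: Tok, right: Tok) -> bool {
    // Not an operator on the right: the expression ends here.
    let Some(right_tightness) = tightness(right) else { return false };
    // No operator on the left yet (the `Eof` dummy): always continue.
    let Some(left_tightness) = tightness(left) else { return true };
    right_tightness > left_tightness
}

struct Toks { toks: Vec<Tok>, pos: usize }

impl Toks {
    fn peek(&self) -> Tok {
        self.toks.get(self.pos).copied().unwrap_or(Tok::Eof)
    }
    fn bump(&mut self) -> Tok {
        let t = self.peek();
        self.pos += 1;
        t
    }
}

fn expr_rec(p: &mut Toks, left: Tok) -> String {
    let mut lhs = match p.bump() {
        Tok::Int(n) => n.to_string(),
        other => format!("<error {other:?}>"),
    };
    loop {
        let right = p.peek();
        if !right_binds_tighter(left, right) {
            break;
        }
        p.bump(); // consume the operator
        let rhs = expr_rec(p, right);
        lhs = format!("({lhs} {right:?} {rhs})");
    }
    lhs
}

// Renders the grouping chosen by the Pratt loop as a parenthesized string.
fn sexpr(toks: Vec<Tok>) -> String {
    expr_rec(&mut Toks { toks, pos: 0 }, Tok::Eof)
}
```

For the tokens of 1 + 2 * 3 this yields (1 Plus (2 Star 3)), and for 1 - 2 - 3 it yields ((1 Minus 2) Minus 3): equal tightness does not count as "tighter", which is what makes the operators left-associative.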

And that's it! We have parsed the entirety of L!

+
+
+ +

+ Basic Resilience +

+

Let's see how resilient our basic parser is. +Let's check our motivational example:

+ +
+ + +
fn fib_rec(f1: u32,
+
+fn fib(n: u32) -> u32 {
+  return fib_rec(1, 1, n);
+}
+ +
+

Here, the syntax tree our parser produces is surprisingly exactly what we want:

+ +
+ + +
File
+  Fn
+    'fn'
+    'fib_rec'
+    ParamList
+      '('
+      (Param 'f1' ':' (TypeExpr 'u32') ',')
+    error: expected RParen
+
+  Fn
+    'fn'
+    'fib'
+    ...
+ +
+

For the first incomplete function, we get Fn, Param and ParamList, as we should. +The second function is parsed without any errors.

+

Curiously, we get this great result without much explicit effort to make parsing resilient; it's a natural outcome of just not failing in the presence of errors. +The following ingredients help us:

+
    +
  • +homogeneous syntax tree supports arbitrary malformed code, +
  • +
  • +any syntactic construct is parsed left-to-right, and valid prefixes are always recognized, +
  • +
  • +our top-level loop in file is greedy: it either parses a function, or skips a single token and tries to parse a function again. +That way, if there's a valid function somewhere, it will be recognized. +
  • +
+

Thinking about the last case both reveals the limitations of our current code, and shows avenues for improvement. +In general, parsing works as a series of nested loops:

+ +
+ + +
loop { // parse a list of functions
+
+  loop { // parse a list of statements inside a function
+
+    loop { // parse a list of expressions
+
+    }
+  }
+}
+ +
+

If something goes wrong inside a loop, our choices are:

+
    +
  • +skip a token, and continue with the next iteration of the current loop, +
  • +
  • +break out of the inner loop, and let the outer loop handle recovery. +
  • +
+

The top-most loop must use the "skip a token" solution, because it needs to consume all of the input tokens.

+
+
+ +

+ Improving Resilience +

+

Right now, each loop either always skips, or always breaks. +This is not optimal. +Consider this example:

+ +
+ + +
fn f1(x: i32,
+
+fn f2(x: i32,, z: i32) {}
+
+fn f3() {}
+ +
+

Here, for f1 we want to break out of the param_list loop, and our code does just that. +For f2 though, the error is a duplicated comma (the user will add a new parameter between x and z shortly), so we want to skip here. +We don't, and, as a result, the syntax tree for f2 is a train wreck:

+ +
+ + +
Fn
+  'fn'
+  'f2'
+  ParamList
+    '('
+    (Param 'x' ':' (TypeExpr 'i32') ',')
+(ErrorTree ',')
+(ErrorTree 'z')
+(ErrorTree ':')
+(ErrorTree 'i32')
+(ErrorTree ')')
+(ErrorTree '{')
+(ErrorTree '}')
+ +
+

For parameters, it is reasonable to skip tokens until we see something which implies the end of the parameter list. +For example, if we are parsing a list of parameters and see an fn token, then we'd better stop. +If we see some less salient token, it's better to gobble it up. +Let's implement the idea:

+ +
+ + +
const PARAM_LIST_RECOVERY: &[TokenKind] = &[Arrow, LCurly, FnKeyword];
+fn param_list(p: &mut Parser) {
+  assert!(p.at(LParen));
+  let m = p.open();
+
+  p.expect(LParen);
+  while !p.at(RParen) && !p.eof() {
+    if p.at(Name) {
+      param(p);
+    } else {
+      if p.at_any(PARAM_LIST_RECOVERY) {
+        break;
+      }
+      p.advance_with_error("expected parameter");
+    }
+  }
+  p.expect(RParen);
+
+  p.close(m, ParamList);
+}
+ +
+

Here, we use at_any helper function, which is like at, but takes a list of tokens. +The real implementation would use bitsets for this purpose.
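Neither version of the helper is spelled out above, so here is a hedged sketch of both. The trimmed Parser (no fuel, kinds only), the TokenSet type and the at_set method are assumptions for this example; the bitset relies on there being at most 64 token kinds.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum TokenKind { Eof, LParen, RParen, LCurly, Arrow, Comma, FnKeyword, Name }

// Hypothetical bitset: one bit per token kind, so it works for up to 64 kinds.
#[derive(Clone, Copy)]
struct TokenSet(u64);

impl TokenSet {
    // `const fn`, so that recovery sets can be ordinary constants.
    const fn new(kinds: &[TokenKind]) -> TokenSet {
        let mut bits = 0u64;
        let mut i = 0;
        while i < kinds.len() {
            bits |= 1 << (kinds[i] as u64);
            i += 1;
        }
        TokenSet(bits)
    }

    const fn contains(self, kind: TokenKind) -> bool {
        (self.0 & (1 << (kind as u64))) != 0
    }
}

const PARAM_LIST_RECOVERY: TokenSet =
    TokenSet::new(&[TokenKind::Arrow, TokenKind::LCurly, TokenKind::FnKeyword]);

// Trimmed parser: only what the lookahead helpers need.
struct Parser { tokens: Vec<TokenKind>, pos: usize }

impl Parser {
    fn nth(&self, lookahead: usize) -> TokenKind {
        self.tokens.get(self.pos + lookahead).copied().unwrap_or(TokenKind::Eof)
    }

    // Slice-based version, matching the `&[TokenKind]` constants above.
    fn at_any(&self, kinds: &[TokenKind]) -> bool {
        kinds.contains(&self.nth(0))
    }

    // Bitset-based version: a single mask test instead of a linear scan.
    fn at_set(&self, set: TokenSet) -> bool {
        set.contains(self.nth(0))
    }
}
```

Both checks answer the same question; the bitset just makes membership a constant-time operation, which matters when recovery checks run on every loop iteration.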

+

The example now parses correctly:

+ +
+ + +
File
+  Fn
+    'fn'
+    'f1'
+    ParamList
+      '('
+      (Param 'x' ':' (TypeExpr 'i32') ',')
+      error: expected RParen
+  Fn
+    'fn'
+    'f2'
+    ParamList
+      '('
+      (Param 'x' ':' (TypeExpr 'i32') ',')
+      ErrorTree
+        error: expected parameter
+        ','
+      (Param 'z' ':' (TypeExpr 'i32'))
+      ')'
+    (Block '{' '}')
+  Fn
+    'fn'
+    'f3'
+    (ParamList '(' ')')
+    (Block '{' '}')
+ +
+

What is a reasonable RECOVERY set in a general case? +I don't know the answer to this question, but follow sets from formal grammar theory give a good intuition. +We don't want exactly the follow set: for ParamList, { is in follow, and we do want it to be a part of the recovery set, but fn is not in follow, and yet it is important to recover on it. +fn is included because it's in the follow for Fn, and ParamList is a child of Fn: we also want to recursively include ancestor follow sets into the recovery set.

+

For expressions and statements, we have the opposite problem: block and arg_list loops eagerly consume erroneous tokens, but sometimes it would be wise to break out of the loop instead.

+

Consider this example:

+ +
+ + +
fn f() {
+  g(1,
+  let x =
+}
+
+fn g() {}
+ +
+

It gives another train wreck syntax tree, where the g function is completely missed:

+ +
+ + +
File
+  Fn
+    'fn'
+    'f'
+    (ParamList '(' ')')
+    Block
+      '{'
+      StmtExpr
+        ExprCall
+          (ExprName 'g')
+          ArgList
+            '('
+            (Arg (ExprLiteral '1') ',')
+            (Arg (ErrorTree 'let'))
+            (Arg (ExprName 'x'))
+            (Arg (ErrorTree '='))
+            (Arg (ErrorTree '}'))
+            (Arg (ErrorTree 'fn'))
+            Arg
+              ExprCall
+                (ExprName 'g')
+                (ArgList '(' ')')
+            (Arg (ErrorTree '{'))
+            (Arg (ErrorTree '}'))
+ +
+

Recall that the root cause here is that we require expr to consume at least one token, because it's not immediately obvious which tokens can start an expression. +It's not immediately obvious, but it is easy to compute: that's exactly the first set from formal grammars.

+

Using it, we get:

+ +
+ + +
const STMT_RECOVERY: &[TokenKind] = &[FnKeyword];
+const EXPR_FIRST: &[TokenKind] =
+  &[Int, TrueKeyword, FalseKeyword, Name, LParen];
+
+fn block(p: &mut Parser) {
+  assert!(p.at(LCurly));
+  let m = p.open();
+
+  p.expect(LCurly);
+  while !p.at(RCurly) && !p.eof() {
+    match p.nth(0) {
+      LetKeyword => stmt_let(p),
+      ReturnKeyword => stmt_return(p),
+      _ => {
+        if p.at_any(EXPR_FIRST) {
+          stmt_expr(p)
+        } else {
+          if p.at_any(STMT_RECOVERY) {
+            break;
+          }
+          p.advance_with_error("expected statement");
+        }
+      }
+    }
+  }
+  p.expect(RCurly);
+
+  p.close(m, Block);
+}
+
+fn arg_list(p: &mut Parser) {
+  assert!(p.at(LParen));
+  let m = p.open();
+
+  p.expect(LParen);
+  while !p.at(RParen) && !p.eof() {
+    if p.at_any(EXPR_FIRST) {
+      arg(p);
+    } else {
+      break;
+    }
+  }
+  p.expect(RParen);
+
+  p.close(m, ArgList);
+}
+ +
+

This fixes the syntax tree:

+ +
+ + +
File
+  Fn
+    'fn'
+    'f'
+    (ParamList '(' ')')
+    Block
+      '{'
+      StmtExpr
+        ExprCall
+          (ExprName 'g')
+          ArgList
+            '('
+            (Arg (ExprLiteral '1') ',')
+      StmtLet
+        'let'
+        'x'
+        '='
+        (ErrorTree '}')
+  Fn
+    'fn'
+    'g'
+    (ParamList '(' ')')
+    (Block '{' '}')
+ +
+

There's only one issue left. +Our expr parsing is still greedy, so, in a case like this

+ +
+ + +
fn f() {
+  let x = 1 +
+  let y = 2
+}
+ +
+

the let will be consumed as a right-hand-side operand of +. +Now that the callers of expr contain a check for EXPR_FIRST, we no longer need this greediness and can return None if no expression can be parsed:

+ +
+ + +
fn expr_delimited(p: &mut Parser) -> Option<MarkClosed> {
+  let result = match p.nth(0) {
+    // ExprLiteral = 'int' | 'true' | 'false'
+    Int | TrueKeyword | FalseKeyword => {
+      let m = p.open();
+      p.advance();
+      p.close(m, ExprLiteral)
+    }
+
+    // ExprName = 'name'
+    Name => {
+      let m = p.open();
+      p.advance();
+      p.close(m, ExprName)
+    }
+
+    // ExprParen   = '(' Expr ')'
+    LParen => {
+      let m = p.open();
+      p.expect(LParen);
+      expr(p);
+      p.expect(RParen);
+      p.close(m, ExprParen)
+    }
+
+    _ => {
+      assert!(!p.at_any(EXPR_FIRST));
+      return None;
+    }
+  };
+  Some(result)
+}
+
+fn expr_rec(p: &mut Parser, left: TokenKind) {
+  let Some(mut lhs) = expr_delimited(p) else {
+    return;
+  };
+  ...
+}
+ +
+

This gives the following syntax tree:

+ +
+ + +
File
+  Fn
+    'fn'
+    'f'
+    (ParamList '(' ')')
+    Block
+      '{'
+      StmtLet
+        'let'
+        'x'
+        '='
+        (ExprBinary (ExprLiteral '1') '+')
+      StmtLet
+        'let'
+        'y'
+        '='
+        (ExprLiteral '2')
+      '}'
+ +
+

And this concludes the tutorial! +You are now capable of implementing an IDE-grade parser for a real programming language from scratch.

+

Summarizing:

+
    +
  • +

    Resilient parsing means recovering as much syntactic structure from erroneous code as possible.

    +
  • +
  • +

    Resilient parsing is important for IDEs and language servers, whose job mostly ends when the code does not have errors any more.

    +
  • +
  • +

    Resilient parsing is related to, but distinct from, error recovery and repair. +Rather than guessing what the user meant to write, the parser tries to make sense of what is actually written.

    +
  • +
  • +

    Academic literature tends to focus on error repair, and mostly ignores pure resilience.

    +
  • +
  • +

    The biggest challenge of resilient parsing is the design of a syntax tree data structure. +It should provide convenient and type-safe access to well-formed syntax trees, while allowing arbitrary malformed trees.

    +
  • +
  • +

    One possible design here is to make the underlying tree a dynamically-typed data structure (like JSON), and layer typed accessors on top (not covered in this article).

    +
  • +
  • +

    LL style parsers are a good fit for resilient parsing. +Because code is written left-to-right, it's important that the parser recognizes well-formed prefixes of incomplete syntactic constructs, and LL does just that.

    +
  • +
  • +

    Ultimately, parsing works as a stack of nested for loops. +Inside a single for loop, on each iteration, we need to decide between:

    +
      +
    • +trying to parse a sequence element, +
    • +
    • +skipping over an unexpected token, +
    • +
    • +breaking out of the nested loop and delegating recovery to the parent loop. +
    • +
    +
  • +
  • +

    First, follow, and recovery sets help make a specific decision.

    +
  • +
  • +

    In any case, if a loop tries to parse an item, item parsing must consume at least one token (if only to report an error).

    +
  • +
+ +
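The three-way decision inside a single loop can be sketched roughly as follows. This is a toy illustration, not the article's parser: the `Tok` kinds, the `ITEM_FIRST` and `RECOVERY` sets, and the `Event` log are all hypothetical names chosen for the example.

```rust
// Toy token kinds for a hypothetical language; not the article's real grammar.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tok {
    FnKw,
    Name,
    Plus,
    RBrace,
    Eof,
}

// Tokens that can start an item in this loop (a "first" set).
const ITEM_FIRST: &[Tok] = &[Tok::FnKw];
// Tokens that belong to an enclosing construct (a "recovery" set).
const RECOVERY: &[Tok] = &[Tok::RBrace];

// A log of the decisions the loop makes, so they can be inspected.
#[derive(Debug, PartialEq)]
enum Event {
    Item(usize),       // parsed an item spanning this many tokens
    Skipped(Tok),      // skipped an unexpected token
    LeftToParent(Tok), // stopped and left this token for the parent loop
}

/// Parses a sequence of `fn name` items, recording each decision.
fn parse_items(tokens: &[Tok]) -> Vec<Event> {
    let mut pos = 0;
    let mut events = Vec::new();
    let at = |pos: usize| tokens.get(pos).copied().unwrap_or(Tok::Eof);
    loop {
        let tok = at(pos);
        if tok == Tok::Eof {
            break;
        }
        if ITEM_FIRST.contains(&tok) {
            // 1. Try to parse a sequence element. This always consumes the
            //    `fn` keyword, so the loop makes progress even if the name
            //    is missing.
            let start = pos;
            pos += 1;
            if at(pos) == Tok::Name {
                pos += 1;
            }
            events.push(Event::Item(pos - start));
        } else if RECOVERY.contains(&tok) {
            // 3. Break out and delegate recovery to the parent loop.
            events.push(Event::LeftToParent(tok));
            break;
        } else {
            // 2. Skip over an unexpected token.
            events.push(Event::Skipped(tok));
            pos += 1;
        }
    }
    events
}
```

For the input `fn name + fn }`, this sketch records a two-token item, a skipped `+`, a one-token item with a missing name, and then hands `}` back to the parent.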

Source code for the article is here: https://github.com/matklad/resilient-ll-parsing/blob/master/src/lib.rs#L44 .

+
+
+
+ + + + + diff --git a/2023/06/02/the-worst-zig-version-manager.html b/2023/06/02/the-worst-zig-version-manager.html new file mode 100644 index 00000000..641db56d --- /dev/null +++ b/2023/06/02/the-worst-zig-version-manager.html @@ -0,0 +1,240 @@ + + + + + + + The Worst Zig Version Manager + + + + + + + + + + + + +
+ +
+ +
+
+ +

The Worst Zig Version Manager

+ +
+
./getzig.ps1
+ + +
#!/bin/sh
+echo `# <#`
+
+mkdir -p ./zig
+
+wget https://ziglang.org/download/0.10.1/zig-linux-x86_64-0.10.1.tar.xz -O ./zig/zig-linux-x86_64-0.10.1.tar.xz
+tar -xf ./zig/zig-linux-x86_64-0.10.1.tar.xz -C ./zig --strip-components=1
+rm ./zig/zig-linux-x86_64-0.10.1.tar.xz
+
+echo "Zig installed."
+./zig/zig version
+
+exit
+#> > $null
+
+Invoke-WebRequest -Uri "https://ziglang.org/download/0.10.1/zig-windows-x86_64-0.10.1.zip" -OutFile ".\zig-windows-x86_64-0.10.1.zip"
+Expand-Archive -Path ".\zig-windows-x86_64-0.10.1.zip" -DestinationPath ".\" -Force
+Remove-Item -Path ".\zig-windows-x86_64-0.10.1.zip"
+Rename-Item -Path ".\zig-windows-x86_64-0.10.1" -NewName ".\zig"
+
+Write-Host "Zig installed."
+./zig/zig.exe version
+ +
+

https://github.com/matklad/hello-getzig

+

Longer version:

+

One of the values of Zig which resonates with me deeply is a mindful approach to dependencies. +Zig tries hard not to ask too much from the environment, such that, if you get zig version running, you can be reasonably sure that everything else works. +That's one of the main motivations for adding an HTTP client to the Zig distribution recently. +Building software today involves downloading various components from the Internet, and, if Zig wants software built with Zig to be hermetic and self-sufficient, it needs to provide the ability to download files from HTTP servers.

+

There's one hurdle for self-sufficiency: how do you get Zig in the first place? +One answer to this question is "from your distribution's package manager". +This is not a very satisfying answer, at least until the language is both post 1.0 and semi-frozen in development. +And even then, what if your distribution is Windows? +How many distributions should be covered by the "Installing Zig" section of your CONTRIBUTING.md?

+

Another answer would be a version manager, à la rustup, nvm, or asdf. +These tools work well, but they are quite complex, and rely on various subtle properties of the environment, like PATH, shell activation scripts, and busybox-style multipurpose executables. +And, well, this also kicks the can down the road: you can use zvm to get Zig, but how do you get zvm?

+

I like how we do this in TigerBeetle. +We don't use zig from PATH. +Instead, we just put the correct version of Zig into the ./zig folder in the root of the repository, and run it like this:

+ +
+ + +
$ ./zig/zig build test
+ +
+

Suddenly, whole swaths of complexity go away. +Quiz time: if you need to add a directory to PATH, which script should be edited so that both the graphical environment and the terminal are affected?

+

Finally, another interesting case study is Gradle. +Usually Gradle is a negative example, but they do have a good approach for installing Gradle itself. +The standard pattern is to store two scripts, gradlew.sh and gradlew.bat, which bootstrap the right version of Gradle by downloading a jar file (java itself is not bootstrapped this way though).

+

What all these approaches struggle to overcome is the problem of bootstrapping. +Generally, if you need to automate anything, you can write a program to do that. +But you need some pre-existing program runner! +And there are just no good options out of the box: bash and powershell are passable, but barely, and they are different. +And bash and the set of coreutils also differ depending on the Unix in question. +But there's just no good solution here: if you want to bootstrap automatically, you must start with universally available tools.

+

But is there perhaps some scripting language which is shared between Windows and Unix? +@cspotcode suggests a horrible workaround. +You can write a script which is both a bash script and a powershell script. +And it isn't even too ugly!

+ +
+ + +
#!/bin/bash
+echo `# <#`
+
+echo "Bash!"
+
+exit
+#> > $null
+
+Write-Host "PowerShell!"
+ +
+

So, here's an idea for a hermetic Zig version management workflow. +There's a canonical, short getzig.ps1 PowerShell/sh script which is vendored verbatim by various projects. +Running this script downloads an appropriate version of Zig, and puts it into ./zig/zig inside the repository (.gitignore contains /zig). +Building, testing, and other workflows use ./zig/zig instead of relying on global system state ($PATH).

+

A proof-of-concept getzig.ps1 is at the start of this article. +Note that I don't know bash, powershell, or how to download files from the Internet securely, so the above PoC was mostly written by ChatGPT. +But it seems to work on my machine. +I clone https://github.com/matklad/hello-getzig and run

+ +
+ + +
$ ./getzig.ps1
+$ ./zig/zig run ./hello.zig
+ +
+

on both NixOS and Windows 10, and it prints hello.

+

If anyone wants to make an actual thing out of this idea, here are possible desiderata:

+ +
+
+ + + + + diff --git a/2023/06/18/GitHub-merge-queue.html b/2023/06/18/GitHub-merge-queue.html new file mode 100644 index 00000000..95168016 --- /dev/null +++ b/2023/06/18/GitHub-merge-queue.html @@ -0,0 +1,129 @@ + + + + + + + GitHub Merge Queue + + + + + + + + + + + + +
+ +
+ +
+
+ +

GitHub Merge Queue

+

Short, unedited note on GitHub merge queue.

+

TL;DR, https://bors.tech delivers a meaningfully better experience, although it suffers from being a third-party integration.

+

Specific grievances:

+

Complexity. This is a vague feeling, but merge queue feels like it is built by complexity merchants: there are a lot of unclear settings and voluminous and byzantine docs. +Good for allocating extra budget towards build engineering, bad for actual build engineering.

+

GUI-only configuration. Bors is set up using bors.toml in the repository, while merge queue is set up by clicking through the web GUI. +To share config with other maintainers, I resorted to a zoomed-out screenshot of the page.

+

Unclear set of checks. The purpose of the merge queue is to enforce the "not rocket science" rule of software engineering: making sure that the code in the main branch satisfies certain quality invariants (all tests are passing). +It is impossible to tell what merge queue actually enforces. +Typically, when you enable merge queue, you subsequently find out that it actually merges anything, without any checks whatsoever.

+

Double latency. One of the biggest benefits of a merge queue for a high velocity project is its asynchrony. +After submitting a PR, you can do a review and schedule the PR to be merged without waiting for CI to finish. +This is massive: it is a 2X reduction in the human attention required. +Without a queue, you need to look at a PR twice: once to do a review, and once to click merge after the green checkmark is in. +With the queue, you only need a review, and the green checkmark comes in asynchronously. +Except that with GitHub merge queue, you can't actually add a PR to the queue until you get a green checkmark. +In effect, that's still 2X attention, and then a PR runs through the same CI checks twice (yes, you can have separate checks for merge queue and PR. No, this is not a good idea, this is complexity and busywork).

+

Lack of delegation. With bors, you can use bors delegate+ to delegate merging of a single, specific pull request to its author. +This is helpful to drive contributor engagement, and to formalize the "LGTM with the nits fixed" approval (which again reduces the number of human round trips).

+

You still should use GitHub merge queue, rather than bors-ng, as that's now a first-party feature. +Still, it's important to understand how things should work, to be able to improve the state of the art some other time.

+
+
+ + + + + diff --git a/2023/07/16/three-different-cuts.html b/2023/07/16/three-different-cuts.html new file mode 100644 index 00000000..dfffffcf --- /dev/null +++ b/2023/07/16/three-different-cuts.html @@ -0,0 +1,244 @@ + + + + + + + Three Different Cuts + + + + + + + + + + + + +
+ +
+ +
+
+ +

Three Different Cuts

+

In this post, we'll look at how Rust, Go, and Zig express the signature of function cut, the power tool of string manipulation. +Cut takes a string and a pattern, and splits the string around the first occurrence of the pattern: +cut("life", "if") = ("l", "e").

+

At a glance, it seems like a non-orthogonal jumbling together of searching and slicing. +However, in practice a lot of ad-hoc string processing can be elegantly expressed via cut.

+

A lot of things are key=value pairs, and cut fits perfectly there. +What's more, many more complex sequences, like +--arg=key=value, +can be viewed as nested pairs. +You can cut around = once to get --arg and key=value, and then cut the second time to separate key from value.

+

In Rust, this function looks like this:

+ +
+ + +
fn split_once<'a, P>(
+  &'a self,
+  delimiter: P,
+) -> Option<(&'a str, &'a str)>
+where
+  P: Pattern<'a>,
+{
+}
+ +
+

Rust's Option is a good fit for the result type: it clearly describes the behavior of the function when the pattern isn't found in the string at all. +Lifetime 'a expresses the relationship between the result and the input: both pieces of the result are substrings of &'a self, so, as long as they are used, the original string must be kept alive as well. +Finally, the separator isn't another string, but a generic P: Pattern. +This gives a somewhat crowded signature, but allows using strings, single characters, and even fn(c: char) -> bool functions as patterns.

+

When using the function, there is a multitude of ways to access the result:

+ +
+ + +
// Propagate `None` upwards:
+let (prefix, suffix) = line.split_once("=")?;
+
+// Handle `None` in an ad-hoc way:
+let Some((prefix, suffix)) = line.split_once("=") else {
+    return
+};
+
+// Ignore `None`:
+if let Some((prefix, suffix)) = line.split_once("=") {
+    ...
+};
+
+// Handle `Some` and `None` in a symmetric way:
+let result = match line.split_once("=") {
+    Some((prefix, suffix)) => { ... }
+    None => { ... }
+};
+
+// Access only one component of the result:
+let suffix = line.split_once("=")?.1;
+
+// Use high-order functions to extract key with a default:
+let key = line.split_once("=")
+    .map(|(key, _value)| key)
+    .unwrap_or(line);
+ +
+
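The nested-pairs idea from earlier (`--arg=key=value`) can be sketched with two calls to the standard `str::split_once`. The helper name `parse_nested` is made up for this illustration.

```rust
/// Splits `--arg=key=value` into its three parts with two cuts.
/// Returns `None` if either `=` is missing.
fn parse_nested(arg: &str) -> Option<(&str, &str, &str)> {
    // First cut: separate the flag from the `key=value` remainder.
    let (flag, rest) = arg.split_once('=')?;
    // Second cut: separate the key from the value.
    let (key, value) = rest.split_once('=')?;
    Some((flag, key, value))
}
```

With this sketch, `parse_nested("--arg=key=value")` yields `Some(("--arg", "key", "value"))`, while `parse_nested("--arg=key")` yields `None` because the second cut finds no `=`.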

Heres a Go equivalent:

+ +
+ + +
func Cut(s, sep string) (before, after string, found bool) {
+    ...
+}
+ +
+

It has a better name! +It's important that frequently used building-block functions have short, memorable names, and "cut" is just perfect for what the function does. +Go doesn't have an Option, but it allows multiple return values, and any type in Go has a zero value, so a boolean flag can be used to signal None. +Curiously, if the sep is not found in s, after is set to "", but before is set to s (that is, the whole string). +This is occasionally useful, and corresponds to the last Rust example. +But it also isn't something immediately obvious from the signature; it's an extra detail to keep in mind. +Which might be fine for a foundational function! +Similarly to Rust, the resulting strings point to the same memory as s. +There are no lifetimes, but there is a potential performance gotcha: if one of the resulting strings is alive, then the entire s can't be garbage collected.

+

There isn't much variety in the way of using the function in Go:

+ +
+ + +
prefix, suffix, ok := strings.Cut(line, "=")
+if !ok {
+    ...
+}
+ +
+
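To make the "not found" convention concrete, here is a rough Rust rendering of Go's return shape, built on `str::split_once`. The function `cut` below is a hypothetical helper for comparison, not part of either standard library.

```rust
/// Go-style cut: when `sep` is absent, `before` is the whole string,
/// `after` is empty, and `found` is false.
fn cut<'a>(s: &'a str, sep: &str) -> (&'a str, &'a str, bool) {
    match s.split_once(sep) {
        Some((before, after)) => (before, after, true),
        // The detail that is not visible in the signature:
        // `before` falls back to `s` itself.
        None => (s, "", false),
    }
}
```

Here `cut("key=value", "=")` gives `("key", "value", true)` and `cut("key", "=")` gives `("key", "", false)`, so a caller that only wants the key can ignore the flag entirely.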

Zig doesn't yet have an equivalent function in its standard library, but it probably will at some point, and the signature might look like this:

+ +
+ + +
pub fn cut(
+    s: []const u8,
+    sep: []const u8
+) ?struct { prefix: []const u8, suffix: []const u8 } {
+    ...
+}
+ +
+

Similarly to Rust, Zig can express optional values. +Unlike Rust, the option is a built-in, rather than a user-defined type (Zig can express a generic user-defined option, but chooses not to). +All types in Zig are strictly prefix, so leading ? concisely signals optionality. +Zig doesn't have first-class tuple types, but uses very concise and flexible type declaration syntax, so we can return a named tuple. +Curiously, this anonymous struct is still a nominal, rather than a structural, type! +Similarly to Rust, prefix and suffix borrow the same memory that s does. +Unlike Rust, this isn't expressed in the signature: while in this case it is obvious that the lifetime would be bound to s, rather than sep, there are no type system guardrails here.

+

Because ? is a built-in type, we need some amount of special syntax to handle the result, but it curiously feels less special-case and more versatile than the Rust version.

+ +
+ + +
// Propagate `null` upwards / handle `null` in an ad-hoc way.
+const cut = mem.cut(line, "=") orelse return null;
+const cut = mem.cut(line, "=") orelse return;
+
+// Ignore or handle `null`.
+if (mem.cut(line, "=")) |cut| {
+
+} else {
+
+}
+
+// Go semantics: extract key with a default
+const key = if (mem.cut(line, "=")) |cut| cut.prefix else line;
+ +
+

Moral of the story? +Work with the grain of the language: expressing the same concept in different languages usually requires a slightly different vocabulary.

+
+
+ + + + + diff --git a/2023/08/01/on-modularity-of-lexical-analysis.html b/2023/08/01/on-modularity-of-lexical-analysis.html new file mode 100644 index 00000000..aa68ccec --- /dev/null +++ b/2023/08/01/on-modularity-of-lexical-analysis.html @@ -0,0 +1,216 @@ + + + + + + + On Modularity of Lexical Analysis + + + + + + + + + + + + +
+ +
+ +
+
+ +

On Modularity of Lexical Analysis

+

I was going to write a long post about designing an IDE-friendly language. I wrote an intro and +figured that it would make a better, shorter post on its own. Enjoy!

+

The big idea of language server construction is that language servers are not magic: capabilities +and performance of tooling are constrained by the syntax and semantics of the underlying language. +If a language is not designed with toolability in mind, some capabilities (e.g., fully automated +refactors) are impossible to implement correctly. What's more, an IDE-friendly language turns out to +be a fast-to-compile language with easy-to-compose libraries!

+

More abstractly, there's this cluster of properties, unrelated at first sight but intimately intertwined and +mutually supportive:

+ +

Separate compilation measures how fast we can compile a codebase from scratch if we have an unlimited +number of CPU cores. For a language server, it solves the cold start problem: time to +code-completion when the user opens the project for the first time or switches branches. Incremental +compilation is the steady state of the language server: the user types code and expects to see +immediate effects throughout the project. Resilience to errors is important for two different +sub-reasons. First, when the user edits the code it is by definition incomplete and erroneous, but a +language server still must analyze the surrounding context correctly. But the killer feature of +resilience is that, if you are absolutely immune to some errors, you don't even have to look at the +code. If a language server can ignore errors in function bodies, it doesn't have to look at the +bodies of functions from dependencies.

+

All three properties, parallelism, incrementality, and resilience, boil down to modularity — +partitioning the code into disjoint components with well-defined interfaces, such that each +particular component is aware only about the interfaces of other components.

+
+ +

+ Minimized Example: Lexical Analysis +

+

Let's do a short drill and observe how the three properties interact at a small scale. Let's +minimize the problem of separate compilation to just lexical analysis. How can we build a +language that is easier to tokenize for a language server?

+

An unclosed quote is a nasty little problem! Practically, it is rare enough that it doesn't really +matter how you handle it, but qualitatively it is illuminating. In a language like Rust, where +strings can span multiple lines, inserting a " in the middle of a file changes the lexical structure +of the following text completely (/*, start of a block comment, has the same effect). When tokens +change, so do the syntax tree and the set of symbols defined by the file. A tiny edit, just one +symbol, unhinges the semantic structure of the entire compilation unit.

+

Zig solves this problem. In Zig, no token can span several lines. That is, it would be correct to +first split a Zig source file by \n, and then tokenize each line separately. This is achieved by +finding better solutions to the underlying problems that usually require multi-line tokens. Specifically:

+
    +
  • +

    there's a single syntax for comments, //,

    +
  • +
  • +

    double-quoted strings can't contain a \n,

    +
  • +
  • +

    but there's a really nice syntax for multiline strings:

    + +
    + + +
    const greeting =
    +    \\This is
    +    \\a multiline string
    +    \\   <- with a leading whitespace here.
    +    \\
    + +
    +
  • +
+

Do you see modules here? Disjoint-partitioning into interface-connected components? From the +perspective of lexical analysis, each line is a module. And a line always has a trivial, empty +interface: different lines are completely independent. As a result:

+

First, we can do lexical analysis in parallel. If you have N CPU cores, you can split file into N +equal chunks, then in parallel locally adjust chunk boundaries such that they fall on newlines, and +then tokenize each chunk separately.

+

Second, we have quick incremental tokenization: given a source edit, you determine the set of +lines affected, and re-tokenize only those. The work is proportional to the size of the edit plus at +most two boundary lines.

+

Third, any lexical error in a line is isolated just to this line. There's no unclosed quote +problem; mistakes are contained.
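A minimal sketch of these three consequences, written in Rust for a toy language rather than for Zig itself. The `Token` kinds and the functions `tokenize_line`, `tokenize`, and `retokenize` are assumptions made up for the illustration; the only property borrowed from the text is that no token crosses a newline.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Word(String),
    Str(String),
    // An unclosed quote is an error that stays local to its line.
    UnclosedStr(String),
}

/// Tokenizes one line in isolation; no state crosses line boundaries.
fn tokenize_line(line: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = line.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '"' {
            chars.next(); // consume the opening quote
            let mut text = String::new();
            let mut closed = false;
            for c in chars.by_ref() {
                if c == '"' {
                    closed = true;
                    break;
                }
                text.push(c);
            }
            tokens.push(if closed { Token::Str(text) } else { Token::UnclosedStr(text) });
        } else {
            let mut text = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_whitespace() || c == '"' {
                    break;
                }
                text.push(c);
                chars.next();
            }
            tokens.push(Token::Word(text));
        }
    }
    tokens
}

/// Full tokenization is a plain map over lines, so the lines could just as
/// well be distributed across threads.
fn tokenize(src: &str) -> Vec<Vec<Token>> {
    src.lines().map(tokenize_line).collect()
}

/// Incremental update: replaces `old_len` lines starting at index `first`
/// with freshly tokenized `new_lines`; all other lines are left untouched.
fn retokenize(tokens: &mut Vec<Vec<Token>>, first: usize, old_len: usize, new_lines: &[&str]) {
    let fresh = new_lines.iter().map(|line| tokenize_line(line));
    let _old: Vec<Vec<Token>> = tokens.splice(first..first + old_len, fresh).collect();
}
```

In this sketch an unclosed quote on one line produces an `UnclosedStr` token on that line only, and fixing it calls `retokenize` for that single line without touching its neighbours.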

+

I am by no means saying that line-by-line lexing is a requirement for an IDE-friendly language +(though it would be nice)! Rather, I want you to marvel how the same underlying structure of the +problem can be exploited for quarantining errors, reacting to changes quickly, and parallelizing the +processing.

+

The three properties are just three different faces of modularity in the end!

+
+

I do want to write that IDE-friendly language post at some point, but, as a hedge (after all, I +still owe you the "Why LSP Sucks?" one), here are two comments where I explored the idea somewhat: +1, +2.

+

I also recommend these posts, which explore the same underlying phenomenon from the software +architecture perspective:

+ +
+
+
+ + + + + diff --git a/2023/08/06/fantastic-learning-resources.html b/2023/08/06/fantastic-learning-resources.html new file mode 100644 index 00000000..5125c375 --- /dev/null +++ b/2023/08/06/fantastic-learning-resources.html @@ -0,0 +1,382 @@ + + + + + + + Fantastic Learning Resources + + + + + + + + + + + + +
+ +
+ +
+
+ +

Fantastic Learning Resources

+

People sometimes ask me: "Alex, how do I learn X?". This article is a compilation of advice I +usually give. These are "things that worked for me" rather than "the most awesome things on earth". I +do consider every item on the list to be fantastic though, and I am forever grateful to people +putting these resources together.

+
+ +

+ Learning to Code +

+

I don't think I have any useful advice on how to learn programming from zero. The rest of the post +assumes that you at least can, given sufficient time, write simple programs. E.g., a program that +reads a list of integers from an input textual file, sorts them using a quadratic algorithm, and +writes the result to a different file.

+
+
+ +

+ Project Euler +

+

https://projecteuler.net/archives is fantastic. The first 50 problems or so are a perfect "drill" +to build programming muscle, to go from "I can write a program to sort a list of integers" to "I can +easily write a program to sort a list of integers".

+

Later problems are very heavily math based. If you are mathematically inclined, this is perfect: +you get to solve fun puzzles while also practicing coding. If advanced math isn't your cup of tea, +feel free to stop doing problems as soon as it stops being fun.

+
+
+ +

+ Modern Operating System +

+

https://en.wikipedia.org/wiki/Modern_Operating_Systems is fantastic. A version of the +book was the first +thick programming related tome I devoured. It gives a big picture of the inner workings of software +stack, and was a turning point for me personally. After reading this book I realized that I want to +be a programmer.

+
+
+ +

+ Nand to Tetris +

+

https://www.nand2tetris.org is fantastic. It plays a similar "big picture" role as MOS, +but this time you are the painter. In this course you build a whole computing system yourself, +starting almost from nothing. It doesn't teach you how the real software/hardware stack works, but +it thoroughly dispels any magic, and is extremely fun.

+
+
+ +

+ CSES Problem Set +

+

https://cses.fi/problemset/ is fantastic. This is a list of algorithmic problems, which is +meticulously crafted to cover all the standard topics to a reasonable depth. This is by far the best +source for practicing algorithms.

+
+
+ +

+ Programming Languages +

+

https://www.coursera.org/learn/programming-languages is fantastic. This course is a whirlwind tour +across several paradigms of programming, and makes you really get what programming languages are +about (and variance).

+
+
+ +

+ Compilers +

+

http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=Compilers is fantastic. In this +course, you implement a working compiler for a simple, but real programming language. Note that you +can implement your compiler in any language.

+
+
+ +

+ Software Architecture +

+

https://www.tedinski.com/archive/ is fantastic. Work through the whole archive in chronological +order. This is by far the best resource on programming in the large.

+
+
+ +

+ Random Bits of Advice +

+

What follows are some things I've learned for myself. Take with a pinch of salt!

+
+ +

+ On Mentorship +

+

Having a great mentor is fantastic, but mentors are not always available. Luckily, programming can +be mastered without a mentor, once you get past the initial learning step. When you code, you get a +lot of feedback, and, through trial and error, you can process the feedback to improve your skills. +In fact, the hardest bit is actually finding the problems to solve (and this article suggests many). +But if you have the problem, you can self-improve by noticing the following:

+
    +
  • +How you verify that the solution works. +
  • +
  • +Common bugs and techniques to avoid them in the future. +
  • +
  • +Length of the solution: can you solve the problem using shorter, simpler code? +
  • +
  • +Techniques: can you apply anything you've read about this week? How would the problem be solved +in Haskell? Could you apply a pattern from language X in language Y? +
  • +
+

In this context it is important to solve the same problem repeatedly. E.g., you could try solving +the same model problem in all languages you know, with a month or two break between attempts. +Repeatedly doing the same thing and noticing differences and similarities between tries is the +essence of self-learning.

+
+
+ +

+ On Programming Languages +

+

Learning your first programming language is a nightmare, because you are learning your editing +environment (PyScripter, IntelliJ IDEA, VS Code) first, simple algorithms second, and the language +itself third. It gets much easier afterwards!

+

Learning different programming languages is one of the best ways to improve your programming skills. +By seeing what's similar, and what's different, you learn more deeply how things work under the hood. +Different languages put different idioms to the forefront, and learning several expands your +vocabulary considerably. As a bonus, after learning N languages, learning the N+1st becomes a question +of skimming through the official docs.

+

In general, you want to cover big families of languages: Python, Java, Haskell, C, Rust, Clojure +would be a good baseline. Erlang, Forth, and Prolog would be good additions afterwards.

+
+
+ +

+ On Algorithms +

+

There are three levels of learning algorithms

+
+
Level 1
+
+

You are not actually learning algorithms, you are learning programming. At this stage, it doesn't +matter how long your code is, how pretty it is, or how efficient it is. The only thing that +matters is that it solves the problem. Generally, this level ends when you are fairly comfortable +with recursion. The first few problems from Project Euler are a great resource here.

+
+
Level 2
+
+

Here you learn algorithms proper. The goal here is mostly encyclopedic knowledge of common +techniques. There are quite a few, but not too many of those. At this stage, the most useful thing +is understanding the math behind the algorithms: being able to explain the algorithm using +pencil and paper, prove its correctness, and analyze Big-O runtime. Generally, you want to learn the +name of the algorithm or technique, read and grok the full explanation, and then implement it.

+

I recommend doing an abstract implementation first (i.e., not "HashMap to solve problem X", but +"just HashMap"). Include tests in your implementation. Use randomized testing (e.g., when testing +sorting algorithms, don't use a finite set of examples; generate a million random ones).
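As a rough sketch of what randomized testing of a sorting algorithm could look like, the snippet below checks an insertion sort against the standard library's `sort` on many random inputs. The names `insertion_sort`, `Lcg`, and `check_sort` are invented for this example, and the tiny linear congruential generator is only there to keep the sketch free of external crates.

```rust
/// Quadratic insertion sort, the implementation under test.
fn insertion_sort(xs: &mut [i32]) {
    for i in 1..xs.len() {
        let mut j = i;
        // Move xs[i] left until the prefix xs[..=i] is sorted.
        while j > 0 && xs[j - 1] > xs[j] {
            xs.swap(j - 1, j);
            j -= 1;
        }
    }
}

/// Tiny linear congruential generator; good enough for test inputs.
struct Lcg(u64);

impl Lcg {
    fn next(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// Runs `rounds` random trials and returns the first failing input, if any.
fn check_sort(seed: u64, rounds: usize) -> Option<Vec<i32>> {
    let mut rng = Lcg(seed);
    for _ in 0..rounds {
        // Short arrays with a small value range produce many duplicates
        // and edge cases (empty and single-element inputs included).
        let len = (rng.next() % 20) as usize;
        let input: Vec<i32> = (0..len).map(|_| (rng.next() % 100) as i32 - 50).collect();

        let mut actual = input.clone();
        insertion_sort(&mut actual);

        let mut expected = input.clone();
        expected.sort(); // trusted reference implementation

        if actual != expected {
            return Some(input);
        }
    }
    None
}
```

Returning the failing input, rather than just panicking, makes it easy to turn a random failure into a small fixed regression test.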

+

It's OK and even desirable to implement the same algorithm multiple times. When solving problems, +like CSES, you could abstract your solutions and re-use them, but it's better to code everything +from scratch every time, until you've fully internalized the algorithm.

+
+
Level 3
+
+

One day, long after I'd finished university, I was a TA for an algorithms course. The lecturer +for the course was the person who originally taught me to program, through a similar algorithms +course. And, during one coffee break, he said something like

+ +
+

We don't teach algorithms so that students can code Dijkstra with their eyes closed on the job. +They probably won't have to code any fancy algorithms themselves.

+

We teach algorithms so that students learn to think about invariants and properties when writing +code. Real-life code is usually simple enough that it mostly works if you just throw spaghetti +onto the wall. But it doesn't always work. To write correct, robust code at work, you need to +think about invariants.

+

The trick with algorithms is that coding them is hard. The only way to avoid bugs is to force +yourself to think in terms of invariants.

+
+ +
+

I was thunderstruck! I didn't realize that's the reason why I am learning (well, teaching at that +point) algorithms! Before, I always muddled through my algorithms by randomly tweaking generally +correct stuff until it works. E.g., with a binary search, just add +1 somewhere until it doesn't +loop on random arrays. After hearing this advice, I went home and wrote my millionth binary +search, but this time I actually added comments with loop invariants, and it worked on the first +try! I applied similar techniques for the rest of the course, and since then my subjective +perception of bug rate (for normal work code) went down dramatically.
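For illustration, here is one way a binary search with explicit loop invariants might look. This is a sketch of the general technique, not the code from that evening; the function name `lower_bound` and the choice of a half-open range are assumptions of the example.

```rust
/// Returns the index of the first element that is `>= target`
/// (or `xs.len()` if there is none). `xs` must be sorted.
fn lower_bound(xs: &[i32], target: i32) -> usize {
    let (mut lo, mut hi) = (0, xs.len());
    // Invariant: every element of xs[..lo] is <  target,
    //            every element of xs[hi..] is >= target.
    // It holds initially because both slices are empty.
    while lo < hi {
        // lo <= mid < hi, so either branch shrinks the range hi - lo.
        let mid = lo + (hi - lo) / 2;
        if xs[mid] < target {
            // xs is sorted, so all of xs[..=mid] is < target.
            lo = mid + 1;
        } else {
            // xs is sorted, so all of xs[mid..] is >= target.
            hi = mid;
        }
    }
    // lo == hi: by the invariant, this is the partition point.
    lo
}
```

With the invariant written down, the questions of "`mid` or `mid + 1`?" and "`<` or `<=`?" stop being guesses: each assignment is justified by checking that it keeps the invariant true.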

+

So this is the third level of algorithms: you hone your coding skills to program without bugs. +If you are already fairly comfortable with algorithms, try doing CSES again. But this time, spend +however much time you need double-checking the code before submission, and try to get everything +correct on the first try.

+
+
+
+
+ +

+ On Algorithm Names +

+

Here's the list of things you might want to be able to do, algorithmically. You don't need to be +able to code everything on the spot. I think it would help if you know what each word is about, and +have implemented the thing at least once in the past.

+

Linear search, binary search, quadratic sorting, quick sort, merge sort, heap sort, binary heap, +growable array (aka ArrayList, vector), doubly-linked list, binary search tree, avl tree, red-black +tree, B-tree, splay tree, hash table (chaining and open addressing), depth first search, breadth first +search, topological sort, strongly connected components, minimal spanning tree (Prim & Kruskal), +shortest paths (bfs, Dijkstra, Floyd–Warshall, Bellman–Ford), substring search (quadratic, +Rabin-Karp, Boyer-Moore, Knuth-Morris-Pratt), trie, Aho-Corasick, dynamic programming (longest +common subsequence, edit distance).

+
+
+ +

+ On Larger Programs +

+

A very powerful exercise is coding a medium-sized project from scratch. Something that takes more +than a day, but less than a week, and has a meaningful architecture which can be just right, or +messed up. Here are some great projects to do:

+
+
Ray Tracer
+
+

Given an analytical description of a 3D scene, convert it to a colored 2D image, by simulating a +path of a ray of light as it bounces off objects.

+
+
Software Rasterizer
+
+

Given a description of a 3D scene as a set of triangles, convert it to a colored 2D image by +projecting triangles onto the viewing plane and drawing the projections in the correct order.

+
+
Dynamically Typed Programming Language
+
+

An interpreter which reads source code as text, parses it into an AST, and directly executes the +AST (or maybe converts AST to the byte code for some speed up)

+
+
Statically Typed Programming Language
+
+

A compiler which reads source code as text, and spits out a binary (WASM would be a terrific +target).

+
+
Relational Database
+
+

Several components:

+
    +
  • +Storage engine, which stores data durably on disk and implements on-disk ordered data structures +(B-tree or LSM) +
  • +
  • +Relational data model which is implemented on top of primitive ordered data structures. +
  • +
  • +Relational language to express schema and queries. +
  • +
  • +Either a TCP server to accept transactions as a database server, or an API for embedding for an +in-process embedded database. +
  • +
+
+
Chat Server
+
+

An exercise in networking and asynchronous programming. Multiple client programs connect to a +server program. A client can send a message either to a specific different client, or to all other +clients (broadcast). There are many variations on how to implement this: blocking read/write +calls, epoll, io_uring, threads, callbacks, futures, manually-coded state machines.

+
+
+

Again, it's more valuable to do the same exercise six times with variations, than to blast through +everything once.

+
+
+
+
+ + + + + diff --git a/2023/08/09/types-and-zig.html b/2023/08/09/types-and-zig.html new file mode 100644 index 00000000..57a5cff0 --- /dev/null +++ b/2023/08/09/types-and-zig.html @@ -0,0 +1,389 @@ + + + + + + + Types and the Zig Programming Language + + + + + + + + + + + + +
+ +
+ +
+
+ +

Types and the Zig Programming Language

+

Notes on less-than-obvious aspects of Zig's type system and things that surprised me after diving +deeper into the language.

+
+ +

+ Nominal Types +

+

Zig has a nominal type system despite the fact that types lack names. A struct type is declared by +struct { field: T }. +It's anonymous; an explicit assignment is required to name the type:

+ +
+ + +
const S = struct {
+  field: T,
+};
+ +
+

Still, the type system is nominal, not structural. The following does not compile:

+ +
+ + +
fn f() struct { f: i32 } {
+  return .{ .f = 92 };
+}
+
+fn g(s: struct { f: i32 }) void {
+  _ = s;
+}
+
+pub fn main() void {
+  g(f()); // <- type mismatch
+}
+ +
+

The following does:

+ +
+ + +
const S = struct { f: i32 };
+
+fn f() S {
+  return .{ .f = 92 };
+}
+
+fn g(s: S) void {
+  _ = s;
+}
+
+pub fn main() void {
+  g(f());
+}
+ +
+

One place where Zig is structural is anonymous struct literals:

+ +
+ + +
const assert = @import("std").debug.assert;
+
+pub fn main() void {
+  const x                      = .{ .foo = 1 };
+  const y: struct { foo: i32 } = x;
+  comptime assert(@TypeOf(x) != @TypeOf(y));
+}
+ +
+

The types of x and y are different, but x can be coerced to the type of y.

+

In other words, Zig structs are anonymous and nominal, but anonymous structs are structural!

+
+
+ +

+ No Unification +

+

Simple type inference for an expression works by first recursively inferring the types of +subexpressions, and then deriving the result type from that. So, to infer types in +foo().bar(), we first derive the type of foo(), then lookup method bar on that +type, and use the return type of the method.

+

More complex type inference works through the so-called unification algorithm. It starts with a similar +recursive walk over the expression tree, but this walk doesn't infer types directly. Instead, it +assigns a type variable to each subexpression and generates equations relating the type variables. So the +result of this first phase looks like this:

+ +
+ + +
x = y
+Int = y
+ +
+

Then, in the second phase the equations are solved, yielding, in this case, x = Int and y = Int.
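To make the two phases concrete, here is a minimal TypeScript sketch of solving such equations by substitution. The `Term` representation and the `resolve` and `unify` helpers are hypothetical names picked for illustration; a real unifier would also handle compound types and an occurs check, which this sketch deliberately leaves out.

```typescript
// A type term is either a type variable or a concrete type constant.
type Term =
  | { tag: "var"; name: string }
  | { tag: "con"; name: string };

const tvar = (name: string): Term => ({ tag: "var", name });
const tcon = (name: string): Term => ({ tag: "con", name });

// The solution: a mapping from variable names to terms.
type Subst = Map<string, Term>;

// Follow variable bindings until we reach an unbound variable or a constant.
function resolve(term: Term, subst: Subst): Term {
  while (term.tag === "var") {
    const bound = subst.get(term.name);
    if (bound === undefined) return term;
    term = bound;
  }
  return term;
}

// Phase two: process the equations one by one, extending the substitution.
function unify(equations: Array<[Term, Term]>): Subst {
  const subst: Subst = new Map();
  for (const [lhs, rhs] of equations) {
    const a = resolve(lhs, subst);
    const b = resolve(rhs, subst);
    if (a.tag === "var") {
      // Bind the variable, unless the equation is the trivial `x = x`.
      if (!(b.tag === "var" && b.name === a.name)) subst.set(a.name, b);
    } else if (b.tag === "var") {
      subst.set(b.name, a);
    } else if (a.name !== b.name) {
      throw new Error(`type mismatch: ${a.name} vs ${b.name}`);
    }
  }
  return subst;
}

// The equations from the text: x = y, Int = y.
const solution = unify([
  [tvar("x"), tvar("y")],
  [tcon("Int"), tvar("y")],
]);
```

Resolving `x` and `y` against `solution` yields the constant `Int` for both, which matches the result stated above.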

+

Usually languages with powerful type systems have unification somewhere, though often unification +is limited in scope (for example, Kotlin infers types statement-at-a-time).

+

It is curious that Zig doesn't do unification: type inference is a simple single-pass recursion (or +at least it should be, I haven't looked at how it is actually implemented). So, any time there's a +generic function like +fn reverse(comptime T: type, xs: []T) void, +the call site has to pass the type in explicitly:

+ +
+ + +
pub fn main() void {
+  var xs: [3]i32 = .{1, 2, 3};
+  reverse(i32, &xs);
+}
+ +
+

Does it mean that you have to pass the types all the time? Not really! In fact, the only place which +feels like a burden is the functions in the std.mem module which operate on slices, but that's just +because slices are builtin types (a kind of pointer really) without methods. The thing is, when you +call a method on a generic type, its type parameters are implicitly in scope, and don't have to be +specified. Study this example:

+ +
+ + +
const std = @import("std");
+const assert = std.debug.assert;
+
+pub fn Slice(comptime T: type) type {
+  return struct {
+    ptr: [*]T,
+    len: usize,
+
+    fn init(ptr: [*]T, len: usize) @This() {
+      return .{ .ptr = ptr, .len = len };
+    }
+
+    fn reverse(slice: @This()) void {
+      ...
+    }
+  };
+}
+
+pub fn main() void {
+  var xs: [3]i32 = .{1, 2, 3};
+  var slice = Slice(i32).init(&xs, xs.len);
+
+  slice.reverse(); // <- look, no types!
+}
+ +
+

There's a runtime parallel here. At runtime, there's single dynamic dispatch, which prioritizes the +dynamic type of the first argument, and multiple dynamic dispatch, which can look at the dynamic types +of all arguments. Here, at compile time, the type of the first argument gets preferential +treatment. And, similarly to runtime, this covers 80% of use cases! Though, I'd love for things like +std.mem.eql to be actual methods on slices.

+
+
+ +

+ Mandatory Function Signatures +

+

One of the best tricks a language server can pull off for as-you-type analysis is skipping bodies of +the functions in dependencies. This works as long as the language requires complete signatures. In +functional languages, it's customary to make signatures optional, which precludes this crucial +optimization. As per Modularity Of Lexical +Analysis, this has +repercussions for all of:

+
    +
  • +incremental compilation, +
  • +
  • +parallel compilation, +
  • +
  • +robustness to errors. +
  • +
+

I always assumed that Zig with its crazy comptime requires autopsy. +But that's not actually the case! Zig doesn't have decltype(auto), signatures are always explicit!

+

Let's look at, e.g., std.mem.bytesAsSlice:

+ +
+ + +
fn bytesAsSlice(
+  comptime T: type,
+  bytes: anytype,
+) BytesAsSliceReturnType(T, @TypeOf(bytes)) {
+ +
+

Note how the return type is not anytype, but the actual, real thing. You could write complex +computations there, but you can't look inside the body. Of course, it is also possible to write fn +foo() @TypeOf(bar()) {, but that feels like fair game: bar() will be evaluated at +compile time. In other words, only bodies of functions invoked at comptime need to be looked at by +a language server. This potentially improves performance for this use case quite a bit!

+

It's useful to contrast this with Rust. There, you could write

+ +
+ + +
fn sneaky() -> impl Sized {
+  0i32
+}
+ +
+

Although it feels like you are stating the interface, it's not really the case. Auto traits like +Send and Sync leak, and that can be detected by downstream code and lead to, e.g., different +methods being called via Deref-based specialization depending on : Send being implemented:

+ +
+ + +
struct X<T>(T);
+
+impl<T: Send> X<T> {
+  fn foo(&self) -> i32 { todo!() }
+}
+
+struct Y;
+impl Y {
+  fn foo(&self) -> String { todo!() }
+}
+
+impl<T> std::ops::Deref for X<T> {
+  type Target = Y;
+  fn deref(&self) -> &Y { todo!() }
+}
+
+fn f() -> impl Sized {
+  ()
+//  std::rc::Rc::new(())
+}
+
+fn main() {
+  let x = X(f());
+  let t = x.foo(); // <- which `foo`?
+  // The answer is inside f's body!
+}
+ +
+

Zig is much more strict here: you have to fully name the return type (the name doesn't have to be +pretty, take a second look at bytesAsSlice). But it's not perfect: a genuine leakage happens with +inferred error types (!T syntax). A bad example would look like this:

+ +
+ + +
fn f() !void {
+   // Mystery!
+}
+
+pub fn main() !void {
+  f() catch |err| {
+    comptime assert(
+      @typeInfo(@TypeOf(err)).ErrorSet.?.len == 1,
+    );
+  };
+}
+ +
+

Here, to check main, we actually do need to dissect f's body, we can't treat the error union +abstractly. When the compiler analyzes main, it needs to stop to process f's signature (which is +very fast, as it is very short) and then f's body (this part could be quite slow, there might be a +lot of code behind that Mystery!). It's interesting to ponder alternative semantics, where, during +type checking, inferred types are treated abstractly, and error exhaustiveness is a separate late +pass in the compiler. That way, the compiler only needs f's signature to check main. And that means +that bodies of main and f could be checked in parallel.

+

That's all for today! The type system surprises I've found so far are:

+
    +
  • +

    Nominal type system despite notable absence of names of types.

    +
  • +
  • +

    Unification-less generics which don't incur unreasonable annotation burden due to methods closing +over generic parameters.

    +
  • +
  • +

    Explicit signatures with no Voldemort types with a +notable exception of error unions.

    +
  • +
+

Discussion on ziggit.dev.

+
+
+
+ + + + + diff --git a/2023/08/13/role-of-algorithms.html b/2023/08/13/role-of-algorithms.html new file mode 100644 index 00000000..982d790b --- /dev/null +++ b/2023/08/13/role-of-algorithms.html @@ -0,0 +1,389 @@ + + + + + + + Role Of Algorithms + + + + + + + + + + + + +
+ +
+ +
+
+ +

Role Of Algorithms

+

This is a lobste.rs comment as an article, so expect even more abysmal editing than usual.

+

Let me expand on something I mentioned in the +https://matklad.github.io/2023/08/06/fantastic-learning-resources.html +post:

+

Algorithms are a useful skill not because you use them at work every day, but because they train you +to be better at particular aspects of software engineering.

+

Specifically:

+

First, algorithms drill the skill of bug-free coding. Algorithms are hard and frustrating! A subtle +off-by-one might not matter for simple tests, but breaks corner cases. But if you practice +algorithms, you get better at this particular skill of writing correct small programs, and I think +this probably generalizes.

+

To give an array of analogies:

+ +

I still remember two specific lessons I learned when I started doing algorithms many years ago:

+
+
Debugging complex code is hard, first simplify, then debug
+
+

Originally, when I was getting a failed test, I sort of tried to add more code to my program to +make it pass. At some point I realized that this is going nowhere, and then I changed my workflow +to first try to remove as much code as I can, and only then investigate the problematic test +case (which with time morphed into a skill of not writing more code than necessary in the first +place).

+
+
Single source of truth is good
+
+

A lot of my early bugs were due to me duplicating the same piece of information in two places and +then getting them out of sync. Internalizing the single source of truth principle fixed the issues.

+
+
+

Meta note: if you already know this, my lessons are useless. If you don't yet know them, they are +still useless and most likely will bounce off you. This is tacit knowledge: it's very hard to +convey it verbally, and it is much more efficient to learn these things yourself by doing.

+

Somewhat related, I noticed a surprising correlation between programming skills in the small, and +programming skills in the large. You can solve a problem in five lines of code, or, if you try hard, +in ten lines of code. If you consistently come up with concise solutions in the small, chances are +large scale design will be simple as well.

+

I don't know how true that is, as I never tried to look at a proper study, but it looks very +plausible from what I've seen. If this is true, the next interesting question is: if you train +programming-in-the-small skills, do they transfer to programming in the large? Again, I don't +know, but I'd take this Pascal's wager. As an imperfect and self-serving illustration of this point, +consider that both +https://matklad.github.io/2023/12/21/retry-loop.html +and +https://users.rust-lang.org/t/soft-question-scaling-codebase-50k-loc-500k-loc/104129/10 +were written in a span of a single morning.

+

Second, algorithms teach about properties and invariants. Some lucky people get those skills from +a hard math background, but algorithms are a much more accessible way to learn them, as everything +is very visual, immediately testable, and has a very short and clear feedback loop.

+

And properties and invariants are what underlie most big and successful systems. Like 90% of the +code is just fluff and glue, and if you have the skill to see the 10% that is architecturally +salient properties, you could comprehend the system much faster.

+

Third, algorithms occasionally are useful at the job! Just last week on our design walk&talk we +were brainstorming one particular problem, and I was like

+ +
+

Wait, so the problem here is that our solution is O(1) amortized, but really that means O(N) +occasionally and that creates a problem. I wonder if we could shift amortized work to when we do the +real work, sort of how there are helper threads in concurrent programming. Ohh, this actually sounds +like a range query problem! Yeah, I think that cryptic trick that is called дерево отрезков in +Russian and doesn't have a meme name in English (monoid tree is a good, but unknown, name) could +help here. Yup, that actually does solve the amortization issue, this will be O(log N) non-amortized.

+
+ +
+

We probably won't go with that solution as that's too complex algorithmically for what ultimately is +a corner case, but it's important that we understand the problem space in detail before we pick a +solution.

+

Note also how algorithms vocabulary helps me to think about the problem. In math (including +algorithms), there's just like a handful of ideas which are applied again and again under different +guises. You need some amount of insight of course, but, for most simple problems, what you actually +need is just an ability to recognize the structure you've seen somewhere already.

+

Fourth, connecting to the previous ones, the ideas really do form an interconnected web which, on a +deep level, underpins a whole lot of stuff. So, if you do have a non-zero amount of pure curiosity +when it comes to learning programming, algorithms cut pretty deep to the foundation. Let me repeat +the list from the last post, but with explicit connections to other things:

+
+
linear search
+
+

assoc lists in most old functional languages work that way

+
+
binary search
+
+

It is literally everywhere. Also, binary search got a cute name, but actually it isn't the +primitive operation. The primitive operation is partition_point, a predicate version of binary +search. This is what you should add to your language's stdlib as a primitive, and build everything +else on top of it. Also, it is one of the few cases where we know a lower bound on complexity. If +an algorithm does k binary comparisons, it can give at most 2^k distinct answers. So, to find the +insertion point among n items, you need at least k questions such that 2^k > n.
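As a rough illustration of "predicate version first, everything else on top", here is a minimal TypeScript sketch. The names `partitionPoint` and `binarySearch` are chosen for this example, and it assumes the predicate is true for a prefix of the array and false for the rest.

```typescript
// Returns the first index at which `pred` is false, assuming `pred` holds
// for a (possibly empty) prefix of `xs` and fails for everything after it.
function partitionPoint<T>(xs: readonly T[], pred: (x: T) => boolean): number {
  let lo = 0;
  let hi = xs.length;
  // Invariant: pred is true for all indices below lo, false from hi onwards.
  while (lo < hi) {
    const mid = lo + Math.floor((hi - lo) / 2);
    if (pred(xs[mid])) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// The classic "find this element" search is a thin wrapper over the primitive.
function binarySearch(xs: readonly number[], target: number): number | undefined {
  const i = partitionPoint(xs, (x) => x < target);
  return i < xs.length && xs[i] === target ? i : undefined;
}
```

Each loop iteration asks one yes/no question, so the loop above is exactly the "k questions" from the lower-bound argument.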

+
+
quadratic sorting
+
+

We use it at work! Some collections are statically bounded by a small constant, and quadratically +sorting them just needs less machine code. We are also a bit paranoid that production sort +algorithms are very complex and might have subtle bugs, especially in newer languages.

+
+
merge sort
+
+

This is how you sort things on disk. This is also how LSM-trees, the most practically important +data structure you haven't learned about in school, work! And k-way merge also is occasionally +useful (this is from work from three weeks ago).

+
+
heap sort
+
+

Well, this one is only actually useful for the heap, but I think maybe the kernel uses it when +it needs to sort something in place, without extra memory, and in guaranteed O(N log N)?

+
+
binary heap
+
+

Binary heaps are everywhere! Notably, simple timers are a binary heap of things in the order of +expiration. This is also a part of Dijkstra and k-way-merge.

+
+
growable array
+
+

That's the most widely used collection of them all! Did you know that growth factor 2 has a +problem: the size after n reallocations is larger than the sum total of all previous sizes, +so the allocator can't re-use the space? Anecdotally, growth factors less than two are preferable +for this reason.
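The arithmetic behind this can be checked with a toy model. The sketch below is an assumption-heavy simplification: it starts at capacity 1, pretends all freed blocks are adjacent and can be coalesced, and ignores allocator headers and size classes. `firstReusableRealloc` is a hypothetical helper, not something from a real allocator.

```typescript
// Models a growable array that starts at capacity 1 and multiplies its
// capacity by `factor` on every reallocation. During reallocation number k
// the old block (k - 1) is still live, so only blocks 0..k-2 are free.
// Returns the first k at which the new block fits into the space freed by
// all earlier blocks combined, or undefined if that never happens.
function firstReusableRealloc(factor: number, maxReallocs: number): number | undefined {
  let freed = 0; // total size of blocks 0..k-2
  let live = 1;  // size of block k-1
  for (let k = 1; k <= maxReallocs; k++) {
    const next = live * factor;
    if (freed >= next) return k;
    freed += live; // block k-1 is released once the copy is done
    live = next;
  }
  return undefined;
}
```

Under this model, `firstReusableRealloc(2, 50)` never finds a fit, because 1 + 2 + ... + 2^(n-1) = 2^n - 1 is always smaller than the next size, while `firstReusableRealloc(1.5, 50)` finds one on the fifth reallocation.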

+
+
doubly-linked list
+
+

At the heart of rust-analyzer is a two-dimensional doubly-linked +list.

+
+
binary search tree
+
+

Again, rust-analyzer green trees are binary search trees using offset as an implicit key. +Monoid trees are also binary search trees.

+
+
AVL tree
+
+

Ok, this one I actually don't know a direct application of! But I remember two +programming-in-the-small lessons AVL could have taught me, but didn't. I struggled a lot +implementing all of small left rotation, small right rotation, big left rotation, big right +rotation. Some years later, I've learned that you don't do

+ +
+ + +
left: Tree,
+right: Tree,
+ +
+

as that forces code duplication. Rather, you do children: [Tree; 2] and then you could +use child_index and child_index ^ 1 to abstract over left-right.
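Here is how that trick might look in TypeScript. This is only a sketch: `BstNode` and `rotate` are illustrative names, and the AVL height bookkeeping is omitted so that the symmetry stands out.

```typescript
interface BstNode {
  key: number;
  // children[0] is the left child, children[1] is the right child.
  children: [BstNode | null, BstNode | null];
}

// Rotates the subtree rooted at `root` in direction `dir`:
// dir = 0 is a left rotation, dir = 1 is a right rotation.
// The child on the opposite side (dir ^ 1) becomes the new subtree root.
function rotate(root: BstNode, dir: 0 | 1): BstNode {
  const other = (dir ^ 1) as 0 | 1;
  const pivot = root.children[other];
  if (pivot === null) return root; // nothing to rotate around
  root.children[other] = pivot.children[dir];
  pivot.children[dir] = root;
  return pivot;
}
```

One function covers both small rotations; a big rotation is then `rotate` on the child followed by `rotate` on the parent in the opposite direction.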

+

And then some years later still I read on Wikipedia that big rotations are actually a composition +of two small rotations.

+

Actually, I lied that I don't know connections here. You use the same rotations for the splay +tree.

+
+
Red Black Tree
+
+

A red-black tree is a 2-3 tree is a B-tree. Also, you probably use jemalloc, and it has a red-black +tree implemented as a C +macro. +Left-leaning red-black trees are an interesting variation, which is claimed to be simpler, but is +also claimed to not actually be simpler, because it is not symmetric and neuters the children +trick.

+
+
B-tree
+
+

If you use Rust, you probably use B-tree. Also, if you use a database, it stores data either in +LSM or in a B-tree. Both of these are because B-trees play nice with memory hierarchy.

+
+
Splay Tree
+
+

Worth knowing just to have a laugh at https://www.link.cs.cmu.edu/splay/tree5.jpg.

+
+
HashTable
+
+

Literally everywhere, both chaining and open-addressing versions are widely used.

+
+
Depth First Search
+
+

This is something I have to code, explicitly or implicitly, fairly often. Every time you +have a DAG, where things depend on other things, you'd have a DFS somewhere. In rust-analyzer, +there are at least a couple: one in the borrow checker for something (have no idea what that does, +just grepped for fn dfs) and one in the crate graph to detect cycles.

+
+
Breadth First Search
+
+

Ditto, any kind of exploration problem is usually solved with bfs. E.g., rust-analyzer uses bfs +for directory traversal.

+

Which is better, bfs or dfs? Why not both?! Take a look at bdfs from rust-analyzer:

+

https://github.com/rust-lang/rust-analyzer/blob/2fbe69d117ff8e3ffb9b21c4a564f835158eb67b/crates/hir-expand/src/ast_id_map.rs#L195-L222

+
+
Topological Sort
+
+

Again, comes up every time you deal with things which depend on each other. rust-analyzer has +crates_in_topological_order.

+
+
Strongly Connected Components
+
+

This is needed every time things depend on each other, but you also allow cyclic dependencies. I +don't think I've needed this one in real life. But, given that SCC is how you solve 2-SAT in +polynomial time, it seems important to know in order to understand the "3" in 3-SAT.

+
+
Minimal Spanning Tree
+
+

Ok, really drawing a blank here! Connects to sorting, disjoint set union (which is needed for +unification in type-checkers), and binary heap. Seems a practically important algorithm though! Ah, +MST also gives an approximation for planar traveling salesman I think, another border between hard +& easy problems.

+
+
Dijkstra
+
+

Dijkstra is what I think about when I imagine a Platonic algorithm, though +I don't think I've used it in practice? Connects to heap.

+

Do you know why we use i, j, k for loop indices? Because D ijk stra!

+
+
Floyd-Warshall
+
+

This one is cool! Everybody knows why any regular expression can be compiled to an equivalent +finite state machine. Few people know the reverse, why each automaton has an equivalent regex +(many people know this fact, but few understand why). Well, because Floyd-Warshall! To convert an +automaton to regex use the same algorithm you use to find pairwise distances in a graph.

+

Also, this is a final boss of dynamic programming. If you understand why this algorithm works, you +understand dynamic programming. Despite being tricky to understand, it's very easy to implement! I +randomly stumbled into Floyd-Warshall, when I tried to implement a different, wrong approach, and +made a bug which turned my broken algo into a correct Floyd-Warshall.
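For reference, the whole algorithm fits in a handful of lines. The sketch below is a textbook formulation in TypeScript, not code from any particular project; the input convention (a square matrix with `Infinity` for missing edges and 0 on the diagonal) is an assumption of this example.

```typescript
// `weights[i][j]` is the weight of the edge i -> j, `Infinity` if there is no
// such edge, and 0 on the diagonal. Returns the matrix of pairwise distances.
function floydWarshall(weights: number[][]): number[][] {
  const n = weights.length;
  const dist = weights.map((row) => row.slice());
  // After iteration k, dist[i][j] is the shortest path from i to j that only
  // uses vertices 0..k as intermediate stops. That is the DP state.
  for (let k = 0; k < n; k++) {
    for (let i = 0; i < n; i++) {
      for (let j = 0; j < n; j++) {
        dist[i][j] = Math.min(dist[i][j], dist[i][k] + dist[k][j]);
      }
    }
  }
  return dist;
}
```

The only subtle part is the loop order: the intermediate vertex `k` has to be the outermost loop, which is exactly the part that is easy to get right by accident.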

+
+
Bellman-Ford
+
+

Again, not many practical applications here, but the theory is well connected. All shortest path +algorithms are actually fixed-point iterations! But with Bellman-Ford and its explicit edge +relaxation operator that's most obvious. Next time you open a static analysis textbook and learn +about fixed-point iteration, map that onto the problem of finding shortest paths!
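To make the fixed-point reading visible, here is a small TypeScript sketch of Bellman-Ford written as "relax until nothing changes". `Edge` and `shortestPaths` are names made up for this example, and the code assumes there are no negative cycles reachable from the source (the round limit only guards against looping forever in that case; it does not report the cycle).

```typescript
interface Edge {
  from: number;
  to: number;
  weight: number;
}

// Shortest distances from `source`, computed as a fixed-point iteration:
// keep applying edge relaxation until a whole round changes nothing.
function shortestPaths(nodeCount: number, edges: Edge[], source: number): number[] {
  const dist: number[] = new Array<number>(nodeCount).fill(Infinity);
  dist[source] = 0;
  let changed = true;
  // Without negative cycles the fixed point is reached within nodeCount rounds.
  for (let round = 0; changed && round < nodeCount; round++) {
    changed = false;
    for (const { from, to, weight } of edges) {
      if (dist[from] + weight < dist[to]) {
        dist[to] = dist[from] + weight; // the relaxation step
        changed = true;
      }
    }
  }
  return dist;
}
```

The `changed` flag plays the same role as the "did any fact change?" check in a dataflow analysis worklist.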

+
+
Quadratic Substring Search
+
+

This is what your language's standard library does.

+
+
Rabin-Karp
+
+

An excellent application of hashes. The same idea, hash(composite) = +combine(hash(component)*), is used in rust-analyzer to intern syntax +trees.

+
+
Boyer-Moore
+
+

This is a beautiful and practical algorithm which probably handles the bulk of real-world searches +(that is, it's probably the hottest bit of ripgrep as used by an average person). Delightfully, +this algorithm is faster than theoretically possible: it doesn't even look at every byte of +input data!

+
+
Knuth-Morris-Pratt
+
+

Another "this is how you do string search in the real world" algorithm. It also is the platonic +ideal of a finite state machine, and almost everything is an FSM. It also is Aho-Corasick.

+
+
Aho-Corasick
+
+

This is the same as Knuth-Morris-Pratt, but also teaches you about tries. Again, super-useful for +string searches. As it is an FSM, and a regex is an FSM, and there's a general construct for +building a product of two FSMs, you can use it to implement fuzzy search. The "Workspace symbol" +feature in rust-analyzer works like this. Here's a part +of the implementation.

+
+
Edit Distance
+
+

Everywhere in Bioinformatics (not the actual edit distance, but this problem shape). The first +post on this blog is about this problem:

+

https://matklad.github.io/2017/03/12/min-of-three.html

+

It's not about algorithms though, it's about CPU-level parallelism.

+
+
+
+
+ + + + + diff --git a/2023/08/17/typescript-is-surprisingly-ok-for-compilers.html b/2023/08/17/typescript-is-surprisingly-ok-for-compilers.html new file mode 100644 index 00000000..288c10e6 --- /dev/null +++ b/2023/08/17/typescript-is-surprisingly-ok-for-compilers.html @@ -0,0 +1,605 @@ + + + + + + + TypeScript is Surprisingly OK for Compilers + + + + + + + + + + + + +
+ +
+ +
+
+ +

TypeScript is Surprisingly OK for Compilers

+

There are two main historical trends when choosing an implementation language for something +compiler-shaped.

+

For more language-centric tasks, like a formal specification, or a toy hobby language, OCaml makes +most sense. See, for example, plzoo or WebAssembly reference +interpreter.

+

For something implementation-centric and production ready, C++ is often chosen: LLVM, clang, v8, +HotSpot are all C++.

+

These days, Rust is a great new addition to the landscape. It is influenced most directly by ML and +C++, combines their strengths, and even brings something new of its own to the table, like seamless, +safe multithreading. Still, Rust leans heavily towards the production readiness side of the spectrum. +While some aspects of it, like a "just works" build system, help with prototyping as well, there's +still an extra complexity tax due to the necessity to model the physical layout of data. The usual advice, +when you start building a compiler in Rust, is to avoid pointers and use indexes. Indexes are great! +In a large codebase, they allow greater decoupling (side tables can stay local to relevant modules), +improved performance (an index is u32 and nudges you towards struct-of-arrays layouts), and more +flexible computation strategies (indexes are easier to serialize or plug into an incremental +compilation framework). But they do make programming-in-the-small significantly more annoying, which +is a deal-breaker for hobbyist tinkering.

+

But OCaml is crufty! Is there something better? Today, I realized that TypeScript might actually be +OK? It is not really surprising, given how the language works, but it never occurred to me to think +about TypeScript as an ML equivalent before.

+

So, let's write a tiny-tiny typechecker in TS!

+

Of course, we start with deno. See A Love Letter to +Deno for more details, but the +TL;DR is that deno provides an out-of-the-box experience for TypeScript. This is a pain point for +OCaml, and something that Rust does better than either OCaml or C++. But deno does this better than +Rust! It's just a single binary, it comes with linting and formatting, there's no compilation step, +and there are a built-in task runner and watch mode. A dream setup for quick PLT hacks!

+

And then there's TypeScript itself, with its sufficiently flexible, yet light-ceremony type system.

+

Let's start with defining an AST. As we are hacking, we won't bother with making it an IDE-friendly +concrete syntax tree, or an incremental-friendly "only store relative offsets" tree, and will just tag +AST nodes with locations in the file:

+ +
+ + +
export interface Location {
+  file: string;
+  line: number;
+  column: number;
+}
+ +
+

Even here, we already see the high-level nature of TypeScript: string is just a string, and there's no +thinking about usize vs u32, as numbers are just numbers.

+

Usually, an expression is defined as a sum-type. As we want to tag each expression with a location, +that representation would be slightly inconvenient for us, so we split things up a bit:

+ +
+ + +
export interface Expr {
+    location: Location;
+    kind: ExprKind;
+}
+
+export type ExprKind = ExprBool | ExprInt | ... ;
+ +
+

One more thing: as we are going for something quick, we'll be storing inferred types directly in +the AST nodes. Still, we want to keep raw and type-checked AST separate, so what we are going to do +here is to parametrize the Expr over the associated data it stores. A freshly parsed expression would +use void as data, and the type checker will set it to Type. Here's what we get:

+ +
+ + +
export interface Expr<T> {
+  location: Location;
+  data: T;
+  kind: ExprKind<T>;
+}
+
+export type ExprKind<T> =
+  | ExprBool<T>
+  | ExprInt<T>
+  | ExprBinary<T>
+  | ExprControl<T>;
+ +
+

A definition of ExprBinary could look like this:

+ +
+ + +
export interface ExprBinary<T> {
+  op: BinaryOp;
+  lhs: Expr<T>;
+  rhs: Expr<T>;
+}
+
+export enum BinaryOp {
+  Add, Sub, Mul, Div,
+  Eq, Neq,
+  Lt, Gt, Le, Ge,
+}
+ +
+

Note how I don't introduce separate types for, e.g., AddExpr and SubExpr: all binary +expressions have the same shape, so one type is enough!

+

But we need a tiny adjustment here. Our Expr kind is defined as a union type. To match a value of +a union type, a bit of runtime type information is needed. However, it's one of the core properties +of TypeScript that it doesn't add any runtime behaviors. So, if we want to match on expression kinds +(and we for sure do!), we need to give a helping hand to the compiler and include a bit of RTTI +manually. That would be the tag field:

+ +
+ + +
export interface ExprBinary<T> {
+  tag: "binary";
+  op: BinaryOp;
+  lhs: Expr<T>;
+  rhs: Expr<T>;
+}
+ +
+

tag: "binary" means that the only possible runtime value for tag is the string "binary".
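To see the tag doing its job, here is a minimal, self-contained sketch of matching on it. `MiniExpr`, `LitInt`, `Binary` and `evaluate` are hypothetical names for this illustration and deliberately smaller than the AST in this post; the point is only that checking `tag` lets the compiler narrow the union.

```typescript
interface LitInt {
  tag: "int";
  value: number;
}

interface Binary {
  tag: "binary";
  op: "add" | "mul";
  lhs: MiniExpr;
  rhs: MiniExpr;
}

type MiniExpr = LitInt | Binary;

function evaluate(expr: MiniExpr): number {
  switch (expr.tag) {
    case "int":
      // Here the compiler knows `expr` is a LitInt, so `value` is available.
      return expr.value;
    case "binary": {
      // And here it is a Binary, so `op`, `lhs` and `rhs` are available.
      const lhs = evaluate(expr.lhs);
      const rhs = evaluate(expr.rhs);
      return expr.op === "add" ? lhs + rhs : lhs * rhs;
    }
  }
}
```

At runtime the tag is an ordinary string field; all the narrowing happens in the type checker and is erased afterwards.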

+

Similarly to various binary expressions, boolean literal and int literal expressions have almost +identical shape. Almost, because the payload (boolean or number) is different. TypeScript +allows us to neatly abstract over this:

+ +
+ + +
export type ExprBool<T> = ExprLiteral<T, boolean, "bool">;
+export type ExprInt<T> = ExprLiteral<T, number, "int">;
+
+export interface ExprLiteral<T, V, Tag> {
+  tag: Tag;
+  value: V;
+}
+ +
+

Finally, for control-flow expressions we only add if for now:

+ +
+ + +
export type ExprControl<T> = ExprIf<T>;
+
+export interface ExprIf<T> {
+  tag: "if";
+  cond: Expr<T>;
+  then_branch: Expr<T>;
+  else_branch: Expr<T>;
+}
+ +
+

This concludes the definition of the AST! Let's move on to type inference! Start with types:

+ +
+ + +
type Type = TypeBool | TypeInt;
+
+interface TypeBool {
+  tag: "Bool";
+}
+const TypeBool: TypeBool = { tag: "Bool" };
+
+interface TypeInt {
+  tag: "Int";
+}
+const TypeInt: TypeInt = { tag: "Int" };
+ +
+

Our types are really simple, we could have gone with type Type = "Int" | "Bool", but +let's do this a bit more enterprisy! We define separate types for integer and boolean types. As these +types are singletons, we also provide canonical definitions. And here is another TypeScript-ism. +Because TypeScript fully erases types, everything related to types lives in a separate namespace. So +you can have a type and a value sharing the same name. Which is exactly what we use to define the +singletons!

+

Finally, we can take advantage of our associated-data parametrized expression and write the +signature of

+ +
+ + +
function infer_types(expr: ast.Expr<void>): ast.Expr<Type>
+ +
+

As it says on the tin, infer_types fills in Type information into the void! Let's fill in the +details!

+ +
+ + +
function infer_types(expr: ast.Expr<void>): ast.Expr<Type> {
+  switch (expr.kind.tag) {
+    cas
+  }
+}
+ +
+

If at this point we hit Enter, the editor completes:

+ +
+ + +
function infer_types(expr: ast.Expr<void>): ast.Expr<Type> {
+  switch (expr.kind.tag) {
+    case "bool":
+    case "int":
+    case "binary":
+    case "if":
+  }
+}
+ +
+

There's one problem though. What we really want to write here is something like +const inferred_type = switch(..), +but in TypeScript switch is a statement, not an expression. +So let's define a generic visitor!

+ +
+ + +
export type Visitor<T, R> = {
+  bool(kind: ExprBool<T>): R;
+  int(kind: ExprInt<T>): R;
+  binary(kind: ExprBinary<T>): R;
+  if(kind: ExprIf<T>): R;
+};
+
+export function visit<T, R>(
+  expr: Expr<T>,
+  v: Visitor<T, R>,
+): R {
+  switch (expr.kind.tag) {
+    case "bool": return v.bool(expr.kind);
+    case "int": return v.int(expr.kind);
+    case "binary": return v.binary(expr.kind);
+    case "if": return v.if(expr.kind);
+  }
+}
+ +
+

Armed with the visit, we can ergonomically match over the expression:

+ +
+ + +
function infer_types(expr: ast.Expr<void>): ast.Expr<Type> {
+  const ty = visit(expr, {
+    bool: () => TypeBool,
+    int: () => TypeInt,
+    binary: (kind: ast.ExprBinary<void>) => result_type(kind.op),
+    if: (kind: ast.ExprIf<void>) => {
+      ...
+    },
+  });
+  ...
+}
+
+function result_type(op: ast.BinaryOp): Type {
+  switch (op) { // A tad verbose, but auto-completed!
+    case ast.BinaryOp.Add: case ast.BinaryOp.Sub:
+    case ast.BinaryOp.Mul: case ast.BinaryOp.Div:
+      return TypeInt
+
+    case ast.BinaryOp.Eq: case ast.BinaryOp.Neq:
+      return TypeBool
+
+    case ast.BinaryOp.Lt: case ast.BinaryOp.Gt:
+    case ast.BinaryOp.Le: case ast.BinaryOp.Ge:
+      return TypeBool
+  }
+}
+ +
+

Before we go further, let's generalize this visiting pattern a bit! Recall that our expressions are +parametrized by the type of associated data, and type-checker-shaped transformations are essentially an +Expr<U> -> Expr<V> +transformation.

+

Let's make this generic!

+ +
+ + +
export function transform<U, V>(expr: Expr<U>, v: Visitor<V, V>): Expr<V> {
+ +
+

transform maps an expression carrying U into an expression carrying V by applying the +visitor v. Importantly, it's Visitor<V, V>, rather than a Visitor<U, V>. This is +counter-intuitive, but correct: we run the transformation bottom up, transforming the leaves first. +So, when the time comes to visit an interior node, all subexpressions will have been transformed!

+

The body of transform is wordy, but regular, rectangular, and auto-completes itself:

+ +
+ + +
export function transform<U, V>(expr: Expr<U>, v: Visitor<V, V>): Expr<V> {
+  switch (expr.kind.tag) {
+    case "bool":
+      return {
+        location: expr.location,
+        data: v.bool(expr.kind),
+        kind: expr.kind, 
+      };
+    case "int":
+      return {
+        location: expr.location,
+        data: v.int(expr.kind),
+        kind: expr.kind,
+      };
+    case "binary": {
+      const kind: ExprBinary<V> = { 
+        tag: "binary",
+        op: expr.kind.op,
+        lhs: transform(expr.kind.lhs, v),
+        rhs: transform(expr.kind.rhs, v),
+      };
+      return {
+        location: expr.location,
+        data: v.binary(kind), 
+        kind: kind,
+      };
+    }
+    case "if": {
+      const kind: ExprIf<V> = {
+        tag: "if",
+        cond: transform(expr.kind.cond, v),
+        then_branch: transform(expr.kind.then_branch, v),
+        else_branch: transform(expr.kind.else_branch, v),
+      };
+      return {
+        location: expr.location,
+        data: v.if(kind),
+        kind: kind,
+      };
+    }
+  }
+}
+ +
+
    +
  1. +

    Note how in the literal cases expr.kind is both ExprKind<U> and ExprKind<V>: literals don't depend on this type +parameter, and TypeScript is smart enough to figure this out without us manually re-assembling +the same value with a different type.

    +
  +
  2. +

    Passing the already transformed kind to v.binary and v.if is where that magic with Visitor<V, V> happens.

    +
  +
+

The code is pretty regular here though! So at this point we might actually recall that TypeScript is a dynamically-typed language, and write a generic traversal using Object.keys, while keeping the static function signature in place. I don't think we need to do it here, but there's comfort in knowing that it's possible!

+

Now implementing type inference should be a breeze! We need some way to emit type errors though. With TypeScript, it would be trivial to accumulate errors into an array as a side effect, but let's actually represent type errors as instances of a specific type, TypeError (pun intended):

+ +
+ + +
type Type = TypeBool | TypeInt | TypeError;
+
+interface TypeError {
+  tag: "Error";
+  location: ast.Location;
+  message: string;
+}
+ +
+

To check ifs and binary expressions, we would also need a utility for comparing types:

+ +
+ + +
function type_equal(lhs: Type, rhs: Type): boolean {
+  if (lhs.tag == "Error" || rhs.tag == "Error") return true;
+  return lhs.tag == rhs.tag;
+}
+ +
+

We make the Error type equal to any other type to prevent cascading failures. With all that +machinery in place, our type checker is finally:

+ +
+ + +
function infer_types(expr: ast.Expr<void>): ast.Expr<Type> {
+  return ast.transform(expr, {
+    bool: (): Type => TypeBool,
+    int: (): Type => TypeInt,
+
+    binary: (kind: ast.ExprBinary<Type>, location: ast.Location): Type => {
+      if (!type_equal(kind.lhs.data, kind.rhs.data)) {
+        return {
+          tag: "Error",
+          location,
+          message: "binary expression operands have different types",
+        };
+      }
+      return result_type(kind.op);
+    },
+
+    if: (kind: ast.ExprIf<Type>, location: ast.Location): Type => {
+      if (!type_equal(kind.cond.data, TypeBool)) {
+        return {
+          tag: "Error",
+          location,
+          message: "if condition is not a boolean",
+        };
+      }
+      if (!type_equal(kind.then_branch.data, kind.else_branch.data)) {
+        return {
+          tag: "Error",
+          location,
+          message: "if branches have different types",
+        };
+      }
+      return kind.then_branch.data;
+    },
+  });
+}
+
+function result_type(op: ast.BinaryOp): Type {
+    ...
+}
+ +
+

An astute reader will notice that our visitor functions now take an extra ast.Location argument. TypeScript allows using this argument only in cases where it is needed, cutting down verbosity.

+

And that's all for today! The end result is pretty neat and concise. It took some typing to get there, but TypeScript autocompletion really helps with that! What's more important, there was very little fighting with the language, and the result feels quite natural and directly corresponds to the shape of the problem.

+

I am not entirely sure about the conclusion just yet, but I think I'll be using TypeScript as my tool of choice for various small language hacks. It is surprisingly productive due to the confluence of three aspects:

+ +
+

Just kidding, here's one more cute thing. Let's say that we want to have lots of syntactic sugar, and also want type-safe desugaring. We could tweak our setup a bit for that: instead of Expr and ExprKind being parametrized over associated data, we circularly parametrize Expr by the whole ExprKind and vice versa:

+ +
+ + +
interface Expr<K> {
+  location: Location,
+  kind: K,
+}
+
+interface ExprBinary<E> {
+  op: BinaryOp,
+  lhs: E,
+  rhs: E,
+}
+ +
+

This allows expressing desugaring in a type-safe manner!

+ +
+ + +
// Fundamental, primitive expressions.
+type ExprKindCore<E> =
+    ExprInt<E> | ExprBinary<E> | ExprIf<E>
+
+// Expressions which are either themselves primitive,
+// or can be desugared to primitives.
+type ExprKindSugar<E> = ExprKindCore<E>
+    | ExprCond<E> | ExprUnless<E>
+
+type ExprCore = Expr<ExprKindCore<ExprCore>>;
+type ExprSugar = Expr<ExprKindSugar<ExprSugar>>;
+
+// Desugaring works by reducing the set of expression kinds.
+function desugar(expr: ExprSugar): ExprCore
+
+// A desugaring step takes a (potentially sugar) expression,
+// whose subexpressions are already desugared,
+// and produces an equivalent core expression.
+function desugar_one(
+    expr: ExprKindSugar<ExprCore>,
+): ExprKindCore<ExprCore>
+ +
+
+
+ + + + + diff --git a/2023/09/13/comparative-analysis.html b/2023/09/13/comparative-analysis.html new file mode 100644 index 00000000..9450ff11 --- /dev/null +++ b/2023/09/13/comparative-analysis.html @@ -0,0 +1,196 @@ + + + + + + + Comparative Analysis + + + + + + + + + + + + +
+ +
+ +
+
+ +

Comparative Analysis

+

Most languages provide 6 comparison operators:

+ +
+ + +
<
+<=
+>
+>=
+=
+!=
+ +
+

That's too damn many of them! Some time ago I noticed that my code involving comparisons is often hard to understand, and hides bugs. I've figured out some rules of thumb to reduce complexity, which I want to share.

+

The core idea is to canonicalize things. Both x < y and y > x mean the same, and, if you use +them with roughly equal frequency, you need to spend extra mental capacity to fold the two versions +into the single x tiny, y HUGE concept in your head.

+

The number line is a great intuition and visualization for comparisons. If you order things from small to big, A B C D, you get an intuitive concept of ordering without using comparison operators. You also plug into your existing intuition that the sort function arranges arrays in ascending order.

+

So, as a first-order rule of thumb: strongly prefer < and <= over > and >=. And, when using comparisons, use number line intuition.

+

Some snippets:

+

Checking if a point is inside the interval:

+ +
+ + +
lo <= x and x <= hi
+ +
+

Checking if a point is outside of the interval:

+ +
+ + +
x < lo or hi < x
+ +
+

Segment a is inside segment b:

+ +
+ + +
b.start <= a.start and a.end <= b.end
+ +
+

Segments a and b are disjoint (either a is to the left of b or a is to the right of b):

+ +
+ + +
a.end < b.start or b.end < a.start
+ +
+
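The four snippets above can be collected into a small Rust sketch. The `Interval` type and the function names are assumptions made for illustration (closed intervals that include both ends); the only point is that every predicate is spelled with `<` and `<=`, reading left to right along the number line.

```rust
/// A closed interval [start, end]; a hypothetical type for illustration.
#[derive(Clone, Copy, Debug)]
struct Interval {
    start: i32,
    end: i32,
}

/// The point is inside the interval: start <= x <= end.
fn contains_point(i: Interval, x: i32) -> bool {
    i.start <= x && x <= i.end
}

/// The point is outside: either left of start or right of end.
fn is_outside(i: Interval, x: i32) -> bool {
    x < i.start || i.end < x
}

/// Segment a is inside segment b.
fn is_inside(a: Interval, b: Interval) -> bool {
    b.start <= a.start && a.end <= b.end
}

/// a is entirely to the left of b, or b is entirely to the left of a.
fn are_disjoint(a: Interval, b: Interval) -> bool {
    a.end < b.start || b.end < a.start
}
```

Each body mirrors the picture: whatever is written on the left of an operator also sits on the left of the number line.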

A particularly common case for ordered comparisons is checking that an index is in bounds for an array. Here, the rule about the number line works together with another important rule: state invariants positively.

+

The indexing invariant is spelled as index < xs.len(),

+

and you should prefer to see it exactly that way in the source code. Concretely,

+ +
+ + +
if (index >= xs.len) {
+
+}
+ +
+

is hard to get right, because it spells the converse of the invariant, and involves an extra mental negation (this is subtle: although there isn't a literal negation operator, you absolutely do think about this as a negation of the invariant). If possible, the code should be reshaped to

+ +
+ + +
if (index < xs.len) {
+
+} else {
+
+}
+ +
+
+
+ + + + + diff --git a/2023/10/06/what-is-an-invariant.html b/2023/10/06/what-is-an-invariant.html new file mode 100644 index 00000000..57a74498 --- /dev/null +++ b/2023/10/06/what-is-an-invariant.html @@ -0,0 +1,385 @@ + + + + + + + What is an Invariant? + + + + + + + + + + + + +
+ +
+ +
+
+ +

What is an Invariant?

+

I extolled the benefits of programming with invariants in a couple of recent posts. Naturally, I didn't explain what I mean when I write "invariant". This post fixes that.

+

There are at least three different concepts I label with invariant:

+ +

I won't discuss the first point here: I don't know how to describe this better than "that thing that you do when you solve a non-trivial math puzzler". The bulk of the post describes the second bullet point, for which I think I have a perfect litmus test to explain exactly what I am thinking here. I also touch a bit on the last point at the end.

+

So let's start with a litmus test program to show invariants in the small in action:

+ + +

You might want to write one yourself before proceeding. Here's an exhaustive test for this functionality, using the exhaustigen crate:

+ +
+ + +
fn main() {
+  let N = 5;
+  let M = 5;
+
+  let mut g = exhaustigen::Gen::new();
+  while !g.done() {
+    // Generate an arbitrary sorted array of length at most N.
+    let mut xs =
+      (0..g.gen(N)).map(|_| g.gen(M) as i32).collect::<Vec<_>>();
+    xs.sort();
+
+    let x = g.gen(M) as i32;
+
+    let i = insertion_point(&xs, x);
+    if i > 0        { assert!(xs[i - 1] < x) }
+    if i < xs.len() { assert!(x <= xs[i]) }
+  }
+}
+ +
+

Here's how I would naively write this function. First, I start with defining the boundaries for the binary search:

+ +
+ + +
fn insertion_point(xs: &[i32], x: i32) -> usize {
+    let mut lo = 0;
+    let mut hi = xs.len();
+    ...
+}
+ +
+

Then, repeatedly cut the interval in half until it vanishes

+ +
+ + +
    while lo < hi {
+        let mid = lo + (hi - lo) / 2;
+        ...
+    }
+ +
+

and recur into the left or the right half accordingly:

+ +
+ + +
        if x < xs[mid] {
+            hi = mid;
+        } else {
+            lo = mid;
+        }
+ +
+

Altogether:

+ +
+ + +
fn insertion_point(xs: &[i32], x: i32) -> usize {
+  let mut lo = 0;
+  let mut hi = xs.len();
+
+  while lo < hi {
+    let mid = lo + (hi - lo) / 2;
+    if x < xs[mid] {
+      hi = mid;
+    } else {
+      lo = mid;
+    }
+  }
+
+  lo
+}
+ +
+

I love this code! It has so many details right!

+ +

There's only one problem with this code: it doesn't work. Just blindly following rules of thumb gives you working code surprisingly often, but this particular algorithm is an exception.
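One way to see the failure without hanging your terminal is to run the naive version above under a step budget. This is a minimal sketch: the `max_steps` parameter and the `Option` return type are additions for illustration and are not part of the original function.

```rust
/// The naive search from above, with a hypothetical step budget added so
/// that a search which stops making progress returns None instead of looping.
fn naive_insertion_point(xs: &[i32], x: i32, max_steps: usize) -> Option<usize> {
    let mut lo = 0;
    let mut hi = xs.len();
    let mut steps = 0;

    while lo < hi {
        if steps == max_steps {
            return None; // budget exhausted: the interval is not shrinking
        }
        steps += 1;

        let mid = lo + (hi - lo) / 2;
        if x < xs[mid] {
            hi = mid;
        } else {
            lo = mid; // when mid == lo, this leaves the interval unchanged
        }
    }

    Some(lo)
}
```

With a single-element array `[1]` and `x = 1`, `mid` equals `lo`, the else branch assigns `lo = mid`, and nothing changes between iterations, so the budget runs out.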

+

The question is, how do we fix this otherwise great code? And here's where thinking in invariants helps. Before I internalized invariants, my approach would be to find a failing example, and to fumble with some plus or minus ones here and there and other special casing to make it work. That is, find a concrete problem, solve it. This works, but is slow, and doesn't allow discovering the problem before running the code.

+

The alternative is to actually make an effort and spell out, explicitly, what the code is supposed +to do. In this case, we want lo and hi to bound the result. That is, +lo <= insertion_point <= hi +should hold on every iteration. It clearly holds before we enter the loop. On each iteration, we +would like to shorten this interval, cutting away the part that definitely does not contain +insertion point.

+

Elaborating the invariant, all elements to the left of lo should be less than the target. +Conversely, all elements to the right of hi should be at least as large as the target.

+ +
+ + +
for i in 0..lo: xs[i] < x
+for i in hi..:  x <= xs[i]
+ +
+

Let's now take a second look at the branching condition:

+ +
+ + +
x < xs[mid]
+ +
+

It matches neither invariant prong exactly: x is on the left, but inequality is strict. We can +rearrange the code to follow the invariant more closely:

+ +
+ + +
if xs[mid] < x {
+    lo = mid + 1;
+} else {
+    hi = mid;
+}
+ +
+ +

The code now works. So what went wrong with the original version with x < xs[mid]? In the else case, when x >= xs[mid], we set lo = mid, but that's wrong! It might be the case that x == xs[mid] and x == xs[mid - 1], which would break the invariant for lo.
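Putting the corrected branch back into the loop, here is a sketch of the whole function with the two invariant prongs spelled out as `debug_assert!` checks. The assertions and the brute-force `insertion_point_slow` helper are additions for illustration; the assertions make each iteration linear in debug builds, so they are a teaching aid rather than something to keep in a hot path.

```rust
/// Returns the first index `i` such that `x <= xs[i]`,
/// or `xs.len()` if there is no such index. `xs` must be sorted.
fn insertion_point(xs: &[i32], x: i32) -> usize {
    let mut lo = 0;
    let mut hi = xs.len();

    while lo < hi {
        // Invariant: everything to the left of lo is smaller than x,
        // and everything from hi onwards is at least x.
        debug_assert!(xs[..lo].iter().all(|&y| y < x));
        debug_assert!(xs[hi..].iter().all(|&y| x <= y));

        let mid = lo + (hi - lo) / 2;
        if xs[mid] < x {
            lo = mid + 1; // xs[mid] < x, so mid joins the left prong
        } else {
            hi = mid; // x <= xs[mid], so mid joins the right prong
        }
    }

    lo
}

/// Brute-force reference: linear scan for the first element not less than x.
fn insertion_point_slow(xs: &[i32], x: i32) -> usize {
    xs.iter().position(|&y| x <= y).unwrap_or(xs.len())
}
```

Both branches strictly shrink the interval, and each assignment is justified by exactly one prong of the invariant.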

+

The point isn't in this particular invariant or this particular algorithm. It's the general pattern that it's easy to write code which implements the right algorithm, and sort of works, but is wrong in the details. To get the details right for the right reason, you need to understand precisely what the result should be, and formulating this as a (loop or recursion) invariant helps.

+
+

Perhaps it's time to answer the title question: an invariant is some property which holds at all times during the dynamic evolution of the system. In the above example, the evolution is the program progressing through subsequent loop iterations. The invariant, the condition binding lo and hi, holds on every iteration. Invariants are powerful because they are compressed descriptions of the system: they collapse away the time dimension, which is a huge simplification. Reasoning about each particular path the program could take is hard, because there are so many different paths. Reasoning about invariants is easy, because they capture properties shared by all execution paths.

+

The same idea applies when programming in the large. In the small, we looked at how the state of a running program evolves over time. In the large, we will look at how the source code of the program itself evolves, as it is being refactored and extended to support new features. Here are some invariants from the systems I've worked with:

+

Cargo:

+

File system paths entered by users are preserved exactly. If the user types cargo frob ../some/dir, Cargo doesn't attempt to resolve ../some/dir to an absolute path and passes the path to the underlying OS as is. The reason for that is that file systems are very finicky. Although it might look as if two paths are equivalent, there are bound to be cases where they are not. If the user typed a particular form of a path, they believe that it'll work, and any changes can mess things up easily.

+

This is a relatively compact invariant: basically, code is just forbidden from calling fs::canonicalize.

+

rust-analyzer:

+

Syntax trees are identity-less value types. That is, if you take an object representing an if expression, that object doesn't have any knowledge of where in the larger program the if expression is. The thinking behind this invariant was that it simplifies refactors: while in a static program it's natural to talk about "the if on line X in file Y", when you start modifying code, identity becomes much more fluid.

+

This is an invariant with far-reaching consequences: it means that literally everything in rust-analyzer needs to track identities of things explicitly. You don't just pass around syntax nodes, you pass nodes with extra breadcrumbs describing their origin. I think this might have been a mistake: while it does make refactoring APIs more principled, refactoring is not the common case! Most of the work of a language server consists of read-only analysis of existing code, and the actual refactor is just a cherry on top. So perhaps it's better to try to bind identity more tightly into the core data structure, and just use fake identities for temporary trees that arise during refactors.

+

A more successful invariant from rust-analyzer is that the IDE has a full, frozen view of a snapshot of the world. There's no API for inferring the types; rather, the API looks as if all the types are computed at all times. Similarly, there's no explicit API for changing the code or talking about different historical versions of the code: the IDE sees a single "current" snapshot with all derived data computed. Underneath, there's a smart system to secretly compute the information on demand and re-use previous results, but this is all hidden from the API.

+

This is a great, simple mental model, and it provides for a nice boundary between the compiler proper and IDE fluff like refactors and code completion. Long term, I'd love to see several implementations of the compiler parts.

+

TigerBeetle:

+

A lot of thoughtful invariants here! To touch only a few:

+

TigerBeetle doesn't allocate memory after startup. This simple invariant affects every bit of code: whatever you do, you must manage with existing, pre-allocated data structures. You can't just memcpy stuff around, there's no ambient available space to memcpy to! As a consequence (and, historically, as a motivation for the design) everything has a specific numeric limit.

+

Another fun one is that transaction logic can't read from disk. Every object which could be touched by a transaction needs to be explicitly prefetched into memory before the transaction begins. Because disk IO happens separately from the execution, it is possible to parallelize IO for a whole batch of transactions. The actual transaction execution is then a very tight serial CPU loop without any locks.

+

Speaking of disk IO, in TigerBeetle reading from disk can't fail. The central API for reading takes a data block address, a checksum, and invokes the callback with data with a matching checksum. Everything built on top doesn't need to worry about error handling. The way this works internally is that reads that fail on a local disk are repaired through other replicas in the cluster. It's just that the repair happens transparently to the caller. If the block of data of interest isn't found on the set of reachable replicas, the cluster correctly gets stuck until it is found.

+
+

Summing up: invariants are helpful for describing systems that evolve over time. There's a combinatorial explosion of trajectories that a system could take. Invariants compactly describe properties shared by an infinite number of trajectories.

+

In the small, formulating invariants about program state helps to write correct code.

+

In the large, formulating invariants about the code itself helps to go from a small, simple system +that works to a large system which is used in production.

+
+
+ + + + + diff --git a/2023/10/11/unix-structured-concurrency.html b/2023/10/11/unix-structured-concurrency.html new file mode 100644 index 00000000..403bdaa6 --- /dev/null +++ b/2023/10/11/unix-structured-concurrency.html @@ -0,0 +1,208 @@ + + + + + + + UNIX Structured Concurrency + + + + + + + + + + + + +
+ +
+ +
+
+ +

UNIX Structured Concurrency

+

A short note on a particular structured concurrency pattern for UNIX systems programming.

+
+ +

+ The pattern +

+ + +

That is, in the child process (which you control), do a blocking read on stdin, and exit promptly +if the read returned zero bytes.

+

Example of the pattern from one of the side hacks:

+ +
+ + +
fn main() -> anyhow::Result<()> {
+  let args = Args::parse()?;
+
+  let token = CancellationToken::new();
+  let _guard = token.clone().drop_guard();
+  let _watchdog_thread = std::thread::spawn({
+    let token = token.clone();
+    move || run_watchdog(token)
+  });
+
+  let tcp_socket = TcpListener::bind(args.addr.sock)?;
+  let udp_socket = UdpSocket::bind(args.addr.sock)?;
+  println!("listening on {}", args.addr.sock);
+  run(args, &token, tcp_socket, udp_socket)
+}
+
+fn run_watchdog(token: CancellationToken) {
+  let _guard = token.drop_guard();
+  let stdin = std::io::stdin();
+  let mut stdin = stdin.lock();
+  let mut buf = [0];
+  let n = stdin.read(&mut buf).unwrap();
+  if n != 0 {
+    panic!("unexpected input");
+  }
+}
+ +
+
+
+ +

+ Context +

+

Two bits of background reading here:

+

A famous blog post by njs:

+

https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/

+

A less famous, but no less classic, gotchas.md from duct.py:

+

https://github.com/oconnor663/duct.py/blob/master/gotchas.md#killing-grandchild-processes

+

It is often desirable to spawn a process, and make sure that, when the parent process exits, the child process is also killed. This cannot be achieved using a pattern equivalent to

+ +
+ + +
try {
+    process = spawn(...)
+} finally {
+    _ = process.kill()
+}
+ +
+

The parent process itself might be abruptly killed, and the finally blocks / destructors / atexit +hooks are not run in this case.

+

The natural habitat for this pattern is integration tests, where you often spawn external processes in large amounts, and expect occasional abrupt crashes.

+

Sadly, as far as I know, UNIX doesn't provide an easy mechanism to bind the lifetimes of two processes thusly. There's the process group mechanism, but it is one level deep and is mostly reserved for the shell. There are cgroups, but that's a Linux-specific mechanism which isn't usually exposed by cross-platform standard libraries of various languages.

+

The trick is using closed stdin as the signal for exit, as that is evenly supported by all platforms, doesn't require much code, and will do nearly the right thing most of the time.
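For a child that doesn't already have a cancellation mechanism, the same trick can be sketched with the standard library only. The function names and exit codes below are assumptions for illustration; the shape is just "block on stdin in a background thread, exit the process on EOF".

```rust
use std::io::Read;

/// Blocks until `input` yields data or reaches end-of-file.
/// Returns `Ok(true)` when the read returned zero bytes, that is, on EOF.
fn reached_eof(mut input: impl Read) -> std::io::Result<bool> {
    let mut buf = [0u8; 1];
    let n = input.read(&mut buf)?;
    Ok(n == 0)
}

/// Spawns a thread which terminates the whole process once stdin is closed.
/// The exit codes are arbitrary choices for this sketch.
fn spawn_stdin_watchdog() -> std::thread::JoinHandle<()> {
    std::thread::spawn(|| {
        let stdin = std::io::stdin();
        let outcome = reached_eof(stdin.lock());
        match outcome {
            // The parent is gone (or closed the pipe): exit promptly.
            Ok(true) => std::process::exit(0),
            // Nobody is supposed to write to our stdin.
            Ok(false) => {
                eprintln!("unexpected input on stdin");
                std::process::exit(1)
            }
            Err(err) => {
                eprintln!("reading stdin failed: {err}");
                std::process::exit(1)
            }
        }
    })
}
```

The child would call `spawn_stdin_watchdog()` early in `main`. Note that `std::process::exit` skips destructors, which is usually acceptable here since the goal is to not outlive the parent.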

+

The drawbacks of this pattern:

+
    +
  • It's cooperative in the child (you must control the code of the child process to inject the exit logic).
  • It's somewhat cooperative in the parent: while exiting on standard input EOF will do the right thing most of the time, there are exceptions. For example, reading from /dev/null returns 0 (as opposed to blocking), and daemon processes often have their stdin set to /dev/null. Sadly, there's no /dev/reads-and-writes-block-forever.
  • It is not actually structured. Ideally, the parent's exit should block on all descendants exiting, but that's not the case in this pattern. Still, it's good enough for cleaning up in tests!
+
+
+
+ + + + + diff --git a/2023/10/12/lsp-could-have-been-better.html b/2023/10/12/lsp-could-have-been-better.html new file mode 100644 index 00000000..a8807bfd --- /dev/null +++ b/2023/10/12/lsp-could-have-been-better.html @@ -0,0 +1,508 @@ + + + + + + + LSP could have been better + + + + + + + + + + + + +
+ +
+ +
+
+ +

LSP could have been better

+ +
+

We talk about programming like it is about writing code, but the code ends up being less important +than the architecture, and the architecture ends up being less important than social issues.

+
+
The Success and Failure of Ninja
+
+

The Why LSP post discusses the social issues solved by LSP. LSP (as a part of overarching Microsoft strategy) is brilliant, because it moved the world to a new equilibrium where not having basic IDE support is frowned upon. This post instead discusses architectural aspects of LSP, which I personally find not as brilliant (especially given that Dart Analysis Protocol predates LSP and is technically superior in some aspects). Perhaps it could be useful for someone designing other LSP-shaped protocols! Note that it's been a couple of years since I was actively involved in LSP, probably the grass is greener these days!

+

Let's get to the list of properties, good and bad, in no particular order.

+
+ +

+ Focus on Presentation +

+

And let's start with an aspect of the architecture which is genius, and which, I think, is responsible for a big share of LSP success on the technical side. If you build a tool for working with multiple programming languages, one of the biggest questions is how to find common ground among different, but ultimately similar, languages. A first attempt is to uncover essential commonality: after all, all languages have files, variables, functions, classes, right? This is … maybe not necessarily a dead end, but definitely a thorny and treacherous path: languages are different, each language is weird in at least some of its aspects, and common ground risks leveling away meaningful distinctions.

+

So, what does LSP do here? It just doesn't provide a semantic model of the code base. Instead, it is focused squarely on the presentation. No matter how different each programming language is, they all, in the end, use the same completion widget. So LSP is formulated in terms of what's shown in the completion widget, not in terms of the underlying semantic language entities. That means that each language has an internal semantic model which is full fidelity for this particular language, and uses it to provide the best completion experience which is possible for a given completion widget. This is how rust-analyzer is structured internally as well:

+
    +
  1. Compiler layer deals with the messy language analysis tasks, it derives more structured information (types) from less structured information (source text), explicitly tracking analysis layers and phases.
  2. The HIR (high-level intermediate representation) is a façade around the compiler, which provides a rich graph-based object model of code which looks as if all derived information, like types, is pre-computed.
  3. The IDE layer uses HIR to compute things like completions, and presents them as Rust-specific, but semantics-less POD structures to be shown to the user in GUI more or less as is.
+

One consequence of this architecture is that LSP requests map to editor widgets, and not to the +underlying language concepts, even when several different widgets are powered by the same underlying +data. For example, LSP has separate requests for:

+
    +
  • hierarchical outline of a file displayed in the side bar,
  • "breadcrumbs" shown in the header,
  • syntax-aware selection ranges,
  • code folding.
+

Although all four features are just different views into an AST, there's no "get AST" request in the LSP. Different requests allow fine-tuning presentation for the different use cases, and the details do differ! Semantic selection might contain some sub-syntax ranges inside string literals and comments, breadcrumbs need to include things like conditionals of if expressions, while the outline might want to get rid of less important nodes. An attentive reader will notice that breadcrumbs and the outline actually use the same LSP request. Even LSP doesn't follow LSP philosophy fully!

+
+
+ +

+ Transport +

+

After a big thing that LSP did right, let's look at a small thing that it got wrong. Let's look at how information is transmitted over the wire.

+

JSON is actually OK! Many people complain that JSON is slow, but that's not actually the case generally. There are some edge cases where particular client libraries can be slow, as was the case at least at some point with Swift and Emacs, but JSON is definitely fast enough for Rust, Java and JavaScript. Of course, something substantially better than JSON is possible in theory.

+

I think ideally we need "WebAssembly for IPC", a format that:

+
    +
  • has dual text and binary encoding,
  • is stupidly simple,
  • is thoroughly, readably, and precisely specified,
  • and, in general, is principled and a joy to use.
+

There's no such format yet, so JSON it is. Good enough.

+

HTTP framing is not OK. On the wire, the messages are framed like this:

+ +
+ + +
Content-Length: 92 \r\n
+\r\n
+Actual message
+ +
+

That is:

+
    +
  • case-insensitive content-length header,
  • followed by length of the following message, formatted as a decimal number in ASCII,
  • followed by double \r\n,
  • followed by the actual message.
+

This resembles HTTP, but is not actual HTTP, so you need to write a bit of custom code to deal with the framing. That's not hard:

+ +
+ + +
  let mut size = None;
+  let mut buf = String::new();
+  loop {
+    buf.clear();
+    if inp.read_line(&mut buf)? == 0 {
+      return Ok(None);
+    }
+    if !buf.ends_with("\r\n") {
+      return Err(invalid_data!("malformed header: {:?}", buf));
+    }
+    let buf = &buf[..buf.len() - 2];
+    if buf.is_empty() {
+      break;
+    }
+    let mut parts = buf.splitn(2, ": ");
+    let header_name = parts.next().unwrap();
+    let header_value = parts.next().ok_or_else(|| {
+      invalid_data!("malformed header: {:?}", buf)
+    })?;
+    if header_name.eq_ignore_ascii_case("Content-Length") {
+      size = Some(
+        header_value.parse::<usize>().map_err(invalid_data)?,
+      );
+    }
+  }
+  let size: usize =
+    size.ok_or_else(|| invalid_data!("no Content-Length"))?;
+  let mut buf = buf.into_bytes();
+  buf.resize(size, 0);
+  inp.read_exact(&mut buf)?;
+  let buf = String::from_utf8(buf).map_err(invalid_data)?;
+ +
+

But, still, decoding an ASCII message length from a variable-length header? That's accidental complexity. Just separate JSON objects with newlines instead:

+

https://jsonlines.org

+

Framing using \n as a separator is almost certainly available out of the box in the programming +language of choice.
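For comparison with the header-parsing code above, here is roughly what newline framing looks like with only the Rust standard library. This is a sketch: `read_message` is a hypothetical helper, it assumes each message is serialized without embedded newlines, and it leaves JSON parsing of the returned text to the caller.

```rust
use std::io::{self, BufRead};

/// Reads one newline-delimited message.
/// Returns `Ok(None)` once the input is exhausted.
fn read_message(inp: &mut impl BufRead) -> io::Result<Option<String>> {
    let mut buf = String::new();
    if inp.read_line(&mut buf)? == 0 {
        return Ok(None);
    }
    // `read_line` keeps the separator; strip it before handing the text on.
    if buf.ends_with('\n') {
        buf.pop();
    }
    Ok(Some(buf))
}
```

The whole framing layer is a single `read_line` call, with no header names, no length arithmetic, and no intermediate byte buffer.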

+

Wiping away the tears and peeling one more layer from the onion, we see json-rpc:

+ +
+ + +
{
+    "jsonrpc": "2.0",
+    "method": "initialize",
+    "id": 1,
+    "params": { ... }
+}
+ +
+

This again is a bit of needless accidental complexity. Again, not hard to handle:

+ +
+ + +
fn _write(self, w: &mut dyn Write) -> io::Result<()> {
+  #[derive(Serialize)]
+  struct JsonRpc {
+    jsonrpc: &'static str,
+    #[serde(flatten)]
+    msg: Message,
+  }
+  let text = serde_json::to_string(&JsonRpc {
+    jsonrpc: "2.0",
+    msg: self,
+  })?;
+  write_msg_text(w, &text)
+}
+ +
+

But:

+
    +
  • Prone to complexity amplification, invites a jsonrpc framework with all the latest patterns.
  • "jsonrpc": "2.0" is meaningless noise which you have to look at during debugging.
  • Error codes like -32601 (ah, that comes from xml-rpc!).
  • Includes notifications. Notifications are a big anti-pattern in RPC, for a somewhat subtle reason. More on this later.
+

What to do instead? Do what Dart does, some excerpts from the specification:

+ +
+

Messages are delineated by newlines. This means, +in particular, that the JSON encoding process must not introduce newlines within a message. Note +however that newlines are used in this document for readability.

+

To ease interoperability with Lisp-based clients (which may not be able to easily distinguish between empty lists, empty maps, and null), client-to-server communication is allowed to replace any instance of {} or [] with null. The server will always properly represent empty lists as [] and empty maps as {}.

+

Clients can make a request of the server and the server will provide a response for each request +that it receives. While many of the requests that can be made by a client are informational in +nature, we have chosen to always return a response so that clients can know whether the request was +received and was correct.

+

Example request:

+ +
+ + +
request: {
+  "id": String
+  "method": "server.getVersion"
+}
+
+response: {
+  "id": String
+  "error": optional RequestError
+  "result": {
+    "version": String
+  }
+}
+ +
+
+ +
+

That's basically jsonrpc, the good parts, including using "UNKNOWN_REQUEST" instead of -32601.

+
+
+ +

+ Coordinates +

+

LSP uses (line, column) pairs for coordinates. The neat thing here is that this solves a significant chunk of \n vs \r\n problems: client and server may represent line endings differently, but this doesn't matter, because coordinates are the same.

+

Focus on the presentation provides another motivation, because location information received by the +client can be directly presented to the user, without the need to parse the underlying file. I have +mixed feelings about this.

+

The problem: the column is counted using UTF-16 code units. This is, like, no. For many reasons, but in particular, UTF-16 is definitely the wrong number to show to the user as a column.

+

There's no entirely obvious answer as to what should be used instead. My personal favorite would be counting utf-8 code units (so, just bytes). You need some coordinate space. Any reasonable coordinate space won't be useful for presentation, so you might as well use the space that matches the underlying utf-8 encoding, so that accessing substrings is O(1).

+

Using Unicode codepoints would perhaps be the most agreeable solution. Codepoints are useless: you'll need to convert to grapheme clusters for presentation, and to utf-8 code units to do anything with the string. Still, codepoints are a common denominator, they are more often correct if incorrectly used for presentation, and they have a nice property that any index less than length is valid irrespective of the actual string.
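To make the three candidate coordinate spaces concrete, here is a small Rust sketch that measures the same line in each of them, plus the conversion a UTF-8-based server has to do for every incoming UTF-16 column. Both function names are made up for illustration.

```rust
/// Length of `line` in UTF-8 bytes, UTF-16 code units, and codepoints.
fn column_widths(line: &str) -> (usize, usize, usize) {
    let utf8_bytes = line.len();
    let utf16_units = line.encode_utf16().count();
    let codepoints = line.chars().count();
    (utf8_bytes, utf16_units, codepoints)
}

/// Converts a column counted in UTF-16 code units into a byte offset.
/// Returns None if the column points into the middle of a surrogate pair
/// or past the end of the line.
fn utf16_col_to_byte_offset(line: &str, utf16_col: usize) -> Option<usize> {
    let mut units = 0;
    for (byte_offset, ch) in line.char_indices() {
        if units == utf16_col {
            return Some(byte_offset);
        }
        if units > utf16_col {
            return None; // the column fell inside the previous character
        }
        units += ch.len_utf16();
    }
    if units == utf16_col {
        Some(line.len())
    } else {
        None
    }
}
```

For a line like `"a\u{1F600}b"` the three widths all differ, and the conversion is a linear scan over the line. That scan is the price of not using bytes as the coordinate space.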

+
+
+ +

+ Causality Casualty +

+

As mentioned above, one drawback of one-way notifications from jsonrpc is that they don't allow signaling errors. But there's a more subtle problem here: because you don't receive a response to a notification, it might be hard to order it relative to other events. The Dart protocol is pretty strict about the ordering of events:

+ +
+

There is no guarantee concerning the order in which responses will be returned, but there is a +guarantee that the server will process requests in the order in which they are sent as long as the +transport mechanism also makes this guarantee.

+
+ +
+

This guarantee ensures that the client and the server mutually understand each other's state. For every request, the client knows which file modifications happened before it, and which came afterwards.

+

In LSP, when the client wants to modify the state of a file on the server, it sends a notification. LSP also supports server-initiated edits. Now, if the client sends a didChangeTextDocument notification, and then receives a workspace/applyEdit request from the server, there's no way for the client to know whether the edit takes the latest change into account or not. Were didChangeTextDocument a request instead, the client could have looked at the relative order of the corresponding response and workspace/applyEdit.

+

LSP papers over this fundamental loss of causality by including numeric versions of the documents +with every edit, but this is a best effort solution. Edits might be invalidated by changes to +unrelated documents. For example, for a rename refactor, if a new usage was introduced in a new file +after the refactor was computed, version numbers of the changed files would wrongly tell you that +the edit is still correct, while it will miss this new usage.
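The version check and its blind spot can be sketched in a few lines. This is an illustration under assumed names (`Client`, `VersionedEdit`, `did_change`, and `can_apply` are all hypothetical), not the actual LSP data model:

```rust
use std::collections::HashMap;

/// Hypothetical client-side record of the current version of each document.
struct Client {
    versions: HashMap<String, i32>,
}

/// Hypothetical server-initiated edit, tagged with the version of the
/// document it was computed against.
struct VersionedEdit {
    uri: String,
    version: i32,
}

impl Client {
    /// Bumps the version on every local change; this is the number a
    /// change notification would carry.
    fn did_change(&mut self, uri: &str) -> i32 {
        let version = self.versions.entry(uri.to_string()).or_insert(0);
        *version += 1;
        *version
    }

    /// Accepts an edit only if every touched document is still at the
    /// version the server saw. Documents the edit does not touch are
    /// not consulted at all.
    fn can_apply(&self, edits: &[VersionedEdit]) -> bool {
        edits
            .iter()
            .all(|edit| self.versions.get(&edit.uri) == Some(&edit.version))
    }
}
```

In this sketch, a change to an unrelated document leaves `can_apply` returning `true`, which is exactly the rename scenario: the check has no way to notice that a new usage appeared in a file the edit never mentions.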

+

Practically, this is a small problem: it works most of the time (I think I have seen zero actual bugs caused by causality loss), and even the proper solution can't order events originating from the client relative to the events originating from the file system. But the fix is also very simple: just don't voluntarily lose causality links!

+
+
+ +

+ Remote Procedural State Synchronization +

+

And this touches on what I think is the biggest architectural issue with LSP. LSP is an RPC protocol: it is formed by edge-triggered requests that make something happen on the other side. But this is not how most IDE features work. What is actually needed is level-triggered state synchronization. The client and the server need to agree on what something is; deciding the course of action is secondary. It is "to be or not to be" rather than "what is to be done".

+

At the bottom is synchronization of text documents: the server and the client need to agree on which files there are, and what their content is.

+

Above is synchronization of derived data. For example, there's a set of errors in the project. This set changes when the underlying text files change. Errors change with some lag, as it takes time to compute them (and sometimes files change faster than the errors can be re-computed).

+

Things like file outline, syntax highlighting, cross-reference information, etc., all follow the same pattern.

+

Crucially, predicting which changes to the source invalidate which derived data requires language +specific knowledge. Changing the text of foo.rs might affect syntax highlighting in bar.rs (as +syntax highlighting is affected by types).

+

In LSP, highlighting and such are requests. This means that either the client is incorrect and shows stale highlighting results, or it conservatively re-queries all highlighting results after every change, wasting CPU, and still sometimes showing stale results when an update happens outside of the client (e.g., when cargo finishes downloading external crates).

+

The Dart model is more flexible, performant and elegant. Instead of highlighting being a request, it +is a subscription. The client subscribes to syntax highlighting of particular files, the server +notifies the client whenever highlights for the selected files change. That is, two pieces of state +are synchronized between the client and the server:

+
    +
  • +The set of files the client is subscribed to +
  • +
  • +The actual state of syntax highlighting for these files. +
  • +
+

The former is synchronized by sending the whole current set of files in a request, whenever the +set changes. The latter is synchronized by sending incremental updates.
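The two pieces of synchronized state can be sketched as follows. This is a simplified illustration with hypothetical names, not the Dart analysis server API; in particular, it re-sends the full highlight list for a changed file rather than a true incremental diff.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical server-side state for a single feature (highlighting).
#[derive(Default)]
struct HighlightServer {
    /// State 1: the set of files the client is subscribed to.
    subscribed: BTreeSet<String>,
    /// State 2: the latest computed highlights per file.
    highlights: BTreeMap<String, Vec<String>>,
}

impl HighlightServer {
    /// The client sends the whole current set; the server replaces its copy.
    fn set_subscriptions(&mut self, files: &[&str]) {
        self.subscribed = files.iter().map(|f| f.to_string()).collect();
    }

    /// Called whenever analysis recomputes highlights for `file`, for
    /// whatever reason. Returns the notification to send, if any: only
    /// subscribed files whose highlights actually changed produce one.
    fn update(&mut self, file: &str, new: Vec<String>) -> Option<(String, Vec<String>)> {
        let changed = self.highlights.get(file) != Some(&new);
        self.highlights.insert(file.to_string(), new.clone());
        if changed && self.subscribed.contains(file) {
            Some((file.to_string(), new))
        } else {
            None
        }
    }
}
```

The point of the design is that the server, which has the language-specific knowledge, decides when derived data is stale. The client never has to guess whether an edit to foo.rs invalidated highlighting in bar.rs.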

+

Subscriptions are granular both in terms of the file set and in terms of features. The client might subscribe to errors in the whole project, and to highlights in the currently opened documents only.

+

Subscriptions are implemented in terms of RPC, but they are an overarching organizational pattern followed by the majority of the requests. LSP doesn't have an equivalent, and has real bugs with outdated information shown to the user.

+

I don't think Dart goes as far as possible here. JetBrains Rider, if I understand correctly, does something smarter:

+

https://www.codemag.com/Article/1811091/Building-a-.NET-IDE-with-JetBrains-Rider

+

I think the idea behind the Rider protocol is that you directly define the state you want to synchronize between the client and the server as state. The protocol then manages "magic" synchronization of the state by sending minimal diffs.

+
+
+ +

+ Simplistic Refactorings +

+

Let's unwind to something more down to earth, like refactorings. Not the simple ones, like rename, but complex ones, like change signature:

+

https://www.jetbrains.com/idea/guide/tips/change-signature/

+

In this refactoring, the user selects a function declaration, then rearranges +parameters in some way (reorders, removes, adds, renames, changes types, whatever), and then the IDE +fixes all call-sites.

+

The thing that makes this refactor complex is that it is interactive: it's not an atomic request "rename foo to bar", it's a dialog between the IDE and the user. There are many parameters that the user tweaks based on the analysis of the original code and the already specified aspects of the refactoring.

+

LSP doesn't support these workflows. Dart somewhat supports them, though each refactoring gets to use custom messages (that is, there's a quite good overall protocol for multistep refactorings, but each refactoring essentially sends any over the wire, and the IDE on the other side hard-codes specific GUIs for specific refactorings). This per-refactoring work is not nice, but it is much better than not having these complex refactorings at all.

+
+
+ +

+ Dynamic Registration +

+

A small one to conclude. A significant chunk of conceptual LSP complexity comes from support for dynamic registration of capabilities. I don't understand why that feature is there: rust-analyzer uses dynamic registration only for specifying which files should be watched. And that would be much simpler if it used a plain request (or a subscription mechanism).

+
+
+
+ + + + + diff --git a/2023/10/18/obligations.html b/2023/10/18/obligations.html new file mode 100644 index 00000000..1598e360 --- /dev/null +++ b/2023/10/18/obligations.html @@ -0,0 +1,161 @@ + + + + + + + Unless Explicitly Specified Otherwise, Open Source Software With Users Carries Moral Obligations + + + + + + + + + + + + +
+ +
+ +
+
+ +

Unless Explicitly Specified Otherwise, Open Source Software With Users Carries Moral Obligations

+

My thoughts on the topic of whether maintainers owe you anything. Speaking as an author, a maintainer, +a user of, and a contributor to open-source software.

+

Let's start with a thing which I find obvious and non-negotiable: I can't lie in my README.md.

+

I can't write "this software is reliable, fast, and secure" if in fact my software is slow, crashes, and comes with a backdoor pre-installed. More generally, if I promise something in the readme, I'd better follow up on the promise and be ready to apologize if I fail.

+

If I create expectations between me and my users, I am on the hook for conforming to them.

+

The subtle point here is, if I make an Open Source Project, push it to some forge, write a nice readme explaining why one would want to use it, provide a one-liner for installation, and publish builds to some package registries, I am already creating some expectations. The act of inviting users (and writing usage instructions aimed at a general audience is an act of inviting users) forms an agreement between me as a maintainer and the user.

+

Expectations, but how great? Let's say that tomorrow at this place I am run over by an automobile. That would be a tragedy for many reasons! But should I worry, on top of all that, that I can no longer swiftly react to vulnerabilities reported against my open-source software? Obviously not! And that's the bound on expectations here: it is absolutely ok for a maintainer to do absolutely nothing.

+

At the same time, if I publish a project, write a nice readme, provide installation instructions, etc., and then add a backdoor to my software, I am wrong. Yes, I didn't explicitly mention in the readme that I am not going to add a backdoor. Still, there is a basic, implicit expectation about software security, and it is wrong for me to violate it without an explicit mention.

+

So I think the default expectations for a published open-source project boil down to:

+ +

What about the license? Doesn't it say that "THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED"?

+

It does, but that's a statement about legality, not ethicality. If my readme says that my software is fit for a particular purpose, while it actually (and subtly) isn't in a big way, my users have the moral right to be mad at me. They don't have the legal right to sue me though.

+

So, if you, as an open-source maintainer, publish your software and gain users, you should ask yourself: "do I actually want to have users?". It is totally fine if the answer is no! It is a safe default answer and what governs most of the git repositories out there.

+

Nevertheless, if the answer to the question of users is "no", you should make it clear in your readme that it is a hobby, non-production-ready project which isn't intended to be used by anyone but you. Usually, it's enough to just not have a readme at all, or have a very short readme which makes it obvious that the project isn't supported.

+

However, if you do have a nice README with installation instructions and such, that constitutes a "yes" answer. And then you, as a maintainer, are responsible for a tiny bit of the life of your explicitly invited users. It's not expected that you do much (in fact, doing nothing is totally OK), but the amount of expectation is greater than zero.

+
+
+ + + + + diff --git a/2023/10/23/unified-vs-split-diff.html b/2023/10/23/unified-vs-split-diff.html new file mode 100644 index 00000000..c16f68f3 --- /dev/null +++ b/2023/10/23/unified-vs-split-diff.html @@ -0,0 +1,173 @@ + + + + + + + Unified Versus Split Diff + + + + + + + + + + + + +
+ +
+ +
+
+ +

Unified Versus Split Diff

+

Which is better for code reviews, a unified diff or a split diff?

+

A split diff looks like this for me:

+ +
+ + +
+

And this is a unified one:

+ +
+ + +
+

If the changes are simple and small, both views are good. But for larger, more complex changes +neither works for me.

+

For a large change, I don't want to do a "diff review", I want to do a proper code review of a codebase at a particular instant in time, paying specific attention to the recently changed areas, but mostly just doing general review, as if I am writing the code. I need to run tests, use goto definition and other editor navigation features, apply local changes to check if some things could have been written differently, look at the wider context to notice things that should have been changed, and in general notice anything that might be not quite right with the codebase, irrespective of the historical path to the current state of the code.

+

So, for me, the ideal diff view would look rather like this:

+ +
+ + +
+

On the left, the current state of the code (which is also the on-disk state), with changes subtly +highlighted in the margins. On the right, the unified diff for the portion of the codebase currently +visible on the left.

+

Sadly, this format of review isn't well supported by the tools. Everyone seems to be happy reviewing diffs, rather than the actual code?

+

I have a low-tech and pretty inefficient workflow for this style of review. A gpr +script for checking out a pull +request locally:

+ +
+ + +
$ gpr 1234 --review
+ +
+

Internally, it does roughly

+ +
+ + +
$ git fetch upstream refs/pull/1234/head
+$ git switch --detach FETCH_HEAD
+$ git reset $(git merge-base HEAD main)
+ +
+

The last line is the key: it erases all the commits from the pull request, but keeps all of the changes. This lets me abuse my workflow for staging and committing to do a code review: edamagit shows the list of changed files, I get "go to next/previous change" shortcuts in the editor, and I can even use the staging area to mark hunks I have reviewed.

+

The only thing I don't get is automatic synchronization between the magit status buffer and the file that's currently open in the editor. That is, to view the current file and the diff on the side, I have to manually open the diff and scroll it to the point I am currently looking at.

+

I wish it was easier to get this close to the code without building custom ad-hoc tools!

+

P.S. This post talks about how to review code, but reviewing the code is not necessarily the primary goal of code review. See this related post: Two Kinds of Code Review.

+
+
+ + + + + diff --git a/2023/11/07/dta-oriented-blogging.html b/2023/11/07/dta-oriented-blogging.html new file mode 100644 index 00000000..0e291907 --- /dev/null +++ b/2023/11/07/dta-oriented-blogging.html @@ -0,0 +1,305 @@ + + + + + + + Data Oriented Blogging + + + + + + + + + + + + +
+ +
+ +
+
+ +

Data Oriented Blogging

+

Wherein I describe the setup of this blog. The main takeaways from the post are not specific technical tools, but the underlying principles and ideas, which I wish I had articulated earlier.

+ +
+

If you don’t understand the data you don’t understand the problem.

+
+
Sun Tzu
+
+

Physically, a typical blog is a directory of .html and .css files which are available over HTTP. The simplest way to create those files is to just write them by hand. While in some cases that might be enough, often it isn't.

+

If a blog has multiple pages, you usually want to have some common elements: header, footer, style, etc. It is possible to get a common style by copy-pasting an existing page every time you need to add something. This makes changes hard: having a consistent layout at a single point in time is not sufficient, one almost always wants to be able to apply consistent modifications as well.

+

The second issue with hand-written html is that some parts might be very annoying to hand-write. For +example, code snippets with syntax highlighting require quite a few tags.

+

Finally, writing html by hand is not necessarily the most convenient. A * bullet-list certainly is more pleasant to look at than a <ul><li></li></ul>!

+

That's why a blog usually is some sort of a script (a program) which reads input content in some light markup language (Markdown typically) and writes HTML. As most blogs are similar, it is possible to generalize and abstract such a script, and the result would be called a "static site generator". I don't like this term very much, as it sounds more complicated than the underlying problem at hand: reading .md files and writing out .htmls.

+

Static Site Generators are an example of the template method pattern, where the framework +provides the overall control flow, but also includes copious extension points for customizing +behaviors. Template method allows for some code re-use at the cost of obscure and indirect control +flow. This pattern pays off when you have many different invocations of template method with few, if +any, non-trivial customizations. Conversely, a template method with a single, highly customized +call-site probably should be refactored away in favor of direct control flow.
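The trade-off can be sketched in a few lines. Both halves below are illustrations with made-up names (`SiteHooks`, `generate`, `generate_direct`), not the API of any real static site generator:

```rust
/// Template method: the framework owns the control flow in `generate`,
/// and the user customizes behavior through hooks.
trait SiteHooks {
    /// Required extension point: turn a post's source into an HTML fragment.
    fn render_post(&self, source: &str) -> String;

    /// Optional extension point with a default implementation.
    fn wrap_page(&self, body: &str) -> String {
        format!("<html><body>{body}</body></html>")
    }
}

/// The "framework" part: the caller cannot see or change this flow
/// without going through the hooks.
fn generate(hooks: &dyn SiteHooks, posts: &[&str]) -> Vec<String> {
    posts
        .iter()
        .map(|post| hooks.wrap_page(&hooks.render_post(post)))
        .collect()
}

/// A single customization of the template.
struct Plain;

impl SiteHooks for Plain {
    fn render_post(&self, source: &str) -> String {
        format!("<p>{source}</p>")
    }
}

/// Direct control flow: with a single call-site, the same behavior is
/// one function that can be read top to bottom.
fn generate_direct(posts: &[&str]) -> Vec<String> {
    posts
        .iter()
        .map(|post| format!("<html><body><p>{post}</p></body></html>"))
        .collect()
}
```

With one implementor, `generate(&Plain, posts)` and `generate_direct(posts)` produce the same pages, and the trait buys nothing but indirection. The trait starts to pay for itself only when there are many implementors that each override little.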

+

If you maintain dozens of mostly identical websites, you definitely need a static site generator. If you have only one site to maintain, you might consider writing the overall scaffolding yourself.

+

If you pick an SSG, pay close attention to its extensibility mechanism: you want to avoid situations where you know how to do something by hand, but the SSG is not flexible enough to express that. Flexible SSGs typically have some way to inject user code in their processing, for free-form customization. For this reason, I am somewhat skeptical of static site generators implemented in languages without eval, such as Go and Rust. They might be excellent as long as they fulfill your needs exactly. However, should you need something more custom than what's available out of the box, you might find yourself unable to implement that. This creates a discontinuity in complexity.

+

Note that I am not saying that every site out there needs some custom plugins. Most (certainly, most well-maintained ones) work just fine with fairly vanilla configurations. Rather, it's a statement about risks: there's a small, but non-zero, probability that you'll need something quite unusual. However, should you find yourself with a use-case which is not supported by your SSG's available customization options, the cost of the work-around could be very high.

+

I only have to maintain this single blog, and I want the freedom to experiment with fairly custom +things, so, in this context, writing the scaffolding script myself makes more sense.

+
+ +

+ The Best Tool For The Job +

+

Converting from one text format to another isn't particularly resource intensive and is trivial to parallelize. But it requires a fair amount of work with strings, dates, file-system and such, which points towards a higher-level programming language. However, the overriding concern is stability: blogs usually don't enjoy active daily maintenance, and nothing can distract more than the need to sort out the tooling even before you get to writing.

+

I wish I could recommend a stable high-level scripting language, but, to my knowledge, there isn't any yet. Python falls apart as soon as you need to install a dependency. While I highly recommend Boring Dependency Management, knowing that pip-compile is that one extra tool you need is one extra tool too many. While Node works for adding dependencies, it makes it hard to keep up with them (in Node.JS, dependencies manage you). I don't know a lot about Ruby, but, in the Jekyll days of this blog, I never learned how to configure bundler (or is it gem?) to use project-local dependencies by default.

+

For this reason, I'd say picking Go or Rust for the task makes sense. Yes, those are quite a bit more verbose than what you'd ideally need for bossing Markdown around, but their quality of implementation is great, and QoI is what matters here most.

+

I use Deno for this blog. Deno is poised to become that scripting environment I +wish existed: https://matklad.github.io/2023/02/12/a-love-letter-to-deno.html

+

In addition to the overall QoI, it has particular affinity for web stuff. Out of the box, it has +extra niceties, like file system watching and hot-reloading, or the permissions system to catch +mistakes when reading or writing wrong files. The only reason to recommend Rust and Go over Deno at +this point is that Deno is still pretty young, and, subjectively, needs more time to graduate into +boring tech.

+

Having picked the language, which text format should be the input?

+
+
+ +

+ Data In +

+

The most typical choice here is Markdown. Markdown is fine overall, but it does have one pretty glaring deficiency: vanilla Markdown doesn't allow for custom elements. An example of a custom element would be a shortcut, like ctrl + c. In stock Markdown, there's no syntactic mechanism to designate something as "this is my custom element". You can add syntactic extensions, but then you'll need new syntax for each custom element. Alternatively, you can use a Markdown dialect which supports generic extensibility, like Pandoc Markdown or MDX.

+

An interesting choice for source data format would be HTML with custom tags. If you write some HTML by hand, that doesn't mean you have to write all HTML manually: a script can desugar hand-written HTML into a more verbose form for the browser. For example, the source for a post can contain a snippet like this:

+ +
+ + +
<listing lang="rust">
+fn main() {
+    println!("Hello, World!")
+}
+</listing>
+ +
+

The script then reads and parses this HTML, and produces appropriate pre > code with syntax +highlighting soup.
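As a rough illustration of the desugaring step, here is a deliberately naive sketch. It uses plain string search instead of a real HTML parser, assumes the tag is written exactly as `<listing lang="...">`, leaves out the syntax highlighting itself, and the `language-…` class name is an assumption rather than what this blog emits:

```rust
/// Rewrites every `<listing lang="X">...</listing>` into
/// `<pre><code class="language-X">...</code></pre>`.
///
/// Malformed or unterminated listings are left untouched.
fn desugar_listings(html: &str) -> String {
    let open = "<listing lang=\"";
    let close = "</listing>";
    let mut out = String::new();
    let mut rest = html;
    while let Some(start) = rest.find(open) {
        let after_open = &rest[start + open.len()..];
        // The language name runs up to the end of the opening tag.
        let Some(quote) = after_open.find("\">") else { break };
        let lang = &after_open[..quote];
        let body = &after_open[quote + 2..];
        let Some(end) = body.find(close) else { break };
        // Only commit output once the whole element has been found.
        out.push_str(&rest[..start]);
        out.push_str(&format!(
            "<pre><code class=\"language-{lang}\">{}</code></pre>",
            &body[..end]
        ));
        rest = &body[end + close.len()..];
    }
    out.push_str(rest);
    out
}
```

A real implementation would walk a parsed tree and call a highlighter on the element's text. The sketch only shows that the custom tag is a source-level convenience which disappears before the browser sees it.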

+

HTML would be my recommendation if I were optimizing for stability. These days, many editors have +emmet out of the box, which makes producing HTML not that horrible:

+ +
+ + +
+

But wouldn't it be great if there was an extensible light markup language, which combines the concise syntax and tasty sugar of Markdown with the extensibility and flexibility of HTML (see The Elements of a Great Markup Language)? It actually sort-of-exists already: https://djot.net

+

Djot is quite a bit like Deno in that it takes well established good ideas, and just doesn't mess up the implementation. But it is also an emerging technology at this point, not even at 1.0, so use at your own risk.

+
+
+ +

+ Data Out +

+

The output clearly has to be HTML, but there are many ways to manufacture this markup. Producing HTML is not an entirely solved problem. Usually, some sort of textual templating is used, but that's a fundamentally wrong approach: https://www.devever.net/~hl/stringtemplates

+

For this problem, the shape of data is not that of a string, rather it is a tree.

+

Luckily for the blogging domain, the main motivation for a proper solution is protection from XSS. Blogs usually don't include user-submitted content, so one can play fast and loose with escaping. That is to say, if your language supports string interpolation, that might be enough of a templating engine. That is what this blog does: just backticks in TypeScript.
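The same idea carries over to any language with string formatting. As a hedged sketch in Rust (the blog itself uses TypeScript template literals; the `page` and `escape_html` helpers here are hypothetical), a whole "templating engine" can be one `format!` call plus a small escaping function for the few places where text is not already HTML:

```rust
/// Escapes the characters that are significant in HTML text and in
/// double-quoted attribute values.
fn escape_html(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(ch),
        }
    }
    out
}

/// Builds a page by interpolation. `title` is plain text and gets
/// escaped; `body_html` is trusted, already-rendered markup and is
/// inserted as is, which is only acceptable because the author wrote it.
fn page(title: &str, body_html: &str) -> String {
    format!(
        "<!DOCTYPE html>\n<html>\n<head><title>{}</title></head>\n<body>\n{}\n</body>\n</html>\n",
        escape_html(title),
        body_html,
    )
}
```

The split between escaped and trusted parameters is carried only by naming convention here, which is precisely the weakness of string templates: nothing stops a caller from passing unescaped text as `body_html`.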

+

What's tantalizing is that the proper solution is clearly visible, and is just out of reach. We have JSX now, the proper "write code to produce trees" solution. Sadly, I don't think it's immediately usable as of yet. Deno docs mention some "new JSX API", with an initial support. I also don't see some built-in (or obviously blessed) way to take my JSX syntax and convert it to string when writing to an .html file.

+
+
+ +

+ Look and Feel +

+

It's not enough to produce HTML, it also has to look good. I don't find this acceptable:

+

https://danluu.com/input-lag/

+

It is completely unreadable on a 16:9 screen. Which might have good second-order effects (people +clearly come for content rather than for style), but, really, just no :-)

+

It would be fair to say that that's the browsers' (aka backwards compatibility's) fault: ideally, unstyled HTML would look good, with some reasonable max-width and a default body font-size a touch larger than 16px. I'd love it if there were some sort of <style modern/> tag to opt into a new set of default CSS rules, which would be consistent across browsers (obviating the need for CSS reset), and would make classless HTML readable. Alas, we don't have that, and need to provide browsers with some minimum amount of CSS ourselves.

+

The good news is, that's not so hard these days. When I was starting with programming, web dev was pretty arcane, and consisted mostly of clever hacks, like tables & floats. At that time, I didn't feel qualified to do CSS.

+

Today, the specifications evolved to become much simpler to use (if sprawling at the edges), and +browsers are significantly more uniform, so even I can cook up something presentable. Some survival +tips here:

+
    +
  • +At small scale, built-in web technologies work. HTML & CSS are plenty; you could use, but you don't necessarily need, React, CSS processors, transpilers, and the rest. +
  • +
  • +MDN docs are awesome. +
  • +
  • +box-sizing: border-box +and understanding margin +collapsing +are two required things to make sense of layout in the small. +
  • +
  • +Flexbox is the modern, +intuitive way for layout in the large. +
  • +
  • +CSS reset/normalization is sadly still a thing. Browsers come with default CSS rules for various elements, and sometimes these rules differ between them, which requires an explicit override in the CSS you write. Unfortunately, I don't know much beyond that, but https://github.com/sindresorhus/modern-normalize looks like a reasonable place to start. +
  • +
+
+

To conclude, let's circle back to that claim that a typical blog is a directory with a bunch of .html and .css files. This is not true. There's no physical relation between HTTP requests and responses, and the contents of the file system. Rather, it's just a thin waist, a convention collectively employed by many different HTTP servers to allow easy customization of HTTP responses, yet another template method pattern. This is a remarkably successful thin waist though: it merges completely with the background and is invisible unless you really go looking for it.

+
+
+
+ + + + + diff --git a/2023/11/15/push-ifs-up-and-fors-down.html b/2023/11/15/push-ifs-up-and-fors-down.html new file mode 100644 index 00000000..554208ef --- /dev/null +++ b/2023/11/15/push-ifs-up-and-fors-down.html @@ -0,0 +1,293 @@ + + + + + + + Push Ifs Up And Fors Down + + + + + + + + + + + + +
+ +
+ +
+
+ +

Push Ifs Up And Fors Down

+

A short note on two related rules of thumb.

+
+ +

+ Push Ifs Up +

+

If there's an if condition inside a function, consider if it could be moved to the caller instead:

+ +
+ + +
// GOOD
+fn frobnicate(walrus: Walrus) {
+    ...
+}
+
+// BAD
+fn frobnicate(walrus: Option<Walrus>) {
+  let walrus = match walrus {
+    Some(it) => it,
+    None => return,
+  };
+  ...
+}
+ +
+

As in the example above, this often comes up with preconditions: a function might check a precondition inside and "do nothing" if it doesn't hold, or it could push the task of precondition checking to its caller, and enforce via types (or an assert) that the precondition holds. With preconditions especially, "pushing up" can become viral, and result in fewer checks overall, which is one motivation for this rule of thumb.
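To show where the check ends up, here is a small sketch of the GOOD variant together with a caller. The `Walrus` fields and the `handle_request` function are made up for the illustration:

```rust
struct Walrus {
    name: String,
}

/// The callee requires a walrus: the precondition is enforced by the
/// type, so there is nothing to check inside.
fn frobnicate(walrus: &Walrus) -> String {
    format!("frobnicated {}", walrus.name)
}

/// Hypothetical caller. The single branch lives here, at the point
/// where a missing walrus can be handled meaningfully rather than
/// silently ignored.
fn handle_request(walrus: Option<Walrus>) -> String {
    match walrus {
        Some(walrus) => frobnicate(&walrus),
        None => String::from("no walrus"),
    }
}
```

If `handle_request` itself is later changed to take a `Walrus`, the check moves one more level up. That is the viral effect: each function that stops accepting `Option` removes a branch from its body.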

+

Another motivation is that control flow and ifs are complicated, and are a source of bugs. By +pushing ifs up, you often end up centralizing control flow in a single function, which has a +complex branching logic, but all the actual work is delegated to straight line subroutines.

+

If you have complex control flow, better to fit it on a screen in a single function, rather than spread throughout the file. What's more, with all the flow in one place it often is possible to notice redundancies and dead conditions. Compare:

+ +
+ + +
fn f() {
+  if foo && bar {
+    if foo {
+
+    } else {
+
+    }
+  }
+}
+
+fn g() {
+  if foo && bar {
+    h()
+  }
+}
+
+fn h() {
+  if foo {
+
+  } else {
+
+  }
+}
+ +
+

For f, it's much easier to notice a dead branch than for a combination of g and h!

+

A related pattern here is what I call the "dissolving enum" refactor. Sometimes, the code ends up looking like this:

+ +
+ + +
enum E {
+  Foo(i32),
+  Bar(String),
+}
+
+fn main() {
+  let e = f();
+  g(e)
+}
+
+fn f() -> E {
+  if condition {
+    E::Foo(x)
+  } else {
+    E::Bar(y)
+  }
+}
+
+fn g(e: E) {
+  match e {
+    E::Foo(x) => foo(x),
+    E::Bar(y) => bar(y)
+  }
+}
+ +
+

There are two branching instructions here and, by pulling them up, it becomes apparent that it is +the exact same condition, triplicated (the third time reified as a data structure):

+ +
+ + +
fn main() {
+  if condition {
+    foo(x)
+  } else {
+    bar(y)
+  }
+}
+ +
+
+
+ +

+ Push Fors Down +

+

This comes from the data-oriented school of thought. Few things are few, many things are many. Programs usually operate with bunches of objects. Or at least the hot path usually involves handling many entities. It is the volume of entities that makes the path hot in the first place. So it often is prudent to introduce a concept of a "batch" of objects, and make operations on batches the base case, with a scalar version being a special case of a batched one:

+ +
+ + +
// GOOD
+frobnicate_batch(walruses)
+
+// BAD
+for walrus in walruses {
+  frobnicate(walrus)
+}
+ +
+

The primary benefit here is performance. Plenty of performance, in extreme +cases.

+

If you have a whole batch of things to work with, you can amortize startup cost and be flexible about the order you process things. In fact, you don't even need to process entities in any particular order, you can do vectorized/struct-of-array tricks to process one field of all entities first, before continuing with other fields.
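A minimal sketch of the field-at-a-time idea follows. The `Walruses` layout, its fields, and the `scale` parameter are invented for the example; the only point is the shape of the loops:

```rust
/// Hypothetical struct-of-arrays batch: each field of all walruses is
/// stored contiguously, instead of one struct per walrus.
struct Walruses {
    weight: Vec<f64>,
    tusk_len: Vec<f64>,
}

/// Batched operation: any setup happens once per batch, and each field
/// is processed in its own tight loop, which is friendly to caches and
/// to auto-vectorization.
fn frobnicate_batch(walruses: &mut Walruses, scale: f64) {
    for weight in walruses.weight.iter_mut() {
        *weight *= scale;
    }
    for tusk_len in walruses.tusk_len.iter_mut() {
        *tusk_len *= scale;
    }
}

/// The scalar version is just the batched one applied to a batch of one.
fn frobnicate(weight: f64, tusk_len: f64, scale: f64) -> (f64, f64) {
    let mut one = Walruses {
        weight: vec![weight],
        tusk_len: vec![tusk_len],
    };
    frobnicate_batch(&mut one, scale);
    (one.weight[0], one.tusk_len[0])
}
```

Whether the compiler actually vectorizes these loops depends on the target and optimization level, so treat the performance claim as something to measure rather than assume.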

+

Perhaps the most fun example here is FFT-based polynomial +multiplication: turns out, +evaluating a polynomial at a bunch of points simultaneously could be done faster than a bunch of +individual point evaluations!

+

The two pieces of advice about fors and ifs even compose!

+ +
+ + +
// GOOD
+if condition {
+  for walrus in walruses {
+    walrus.frobnicate()
+  }
+} else {
+  for walrus in walruses {
+    walrus.transmogrify()
+  }
+}
+
+// BAD
+for walrus in walruses {
+  if condition {
+    walrus.frobnicate()
+  } else {
+    walrus.transmogrify()
+  }
+}
+ +
+

The GOOD version is good, because it avoids repeatedly re-evaluating condition, removes a branch from the hot loop, and potentially unlocks vectorization. This pattern works on a micro level and on a macro level: the good version is the architecture of TigerBeetle, where in the data plane we operate on batches of objects at the same time, to amortize the cost of decision making in the control plane.

+

While performance is perhaps the primary motivation for the for advice, sometimes it helps with +expressiveness as well. jQuery was quite successful back in the day, and it operates on +collections of elements. The language of abstract vector spaces is often a better tool for thought +than bunches of coordinate-wise equations.

+

To sum up, push the ifs up and the fors down!

+
+
+
+ + + + + diff --git a/2023/11/16/IronBeetle.html b/2023/11/16/IronBeetle.html new file mode 100644 index 00000000..af370530 --- /dev/null +++ b/2023/11/16/IronBeetle.html @@ -0,0 +1,117 @@ + + + + + + + IronBeetle + + + + + + + + + + + + +
+ +
+ +
+
+ +

IronBeetle

+

Hey, I am trying my hand at this Twitch thing and stream stuff about TigerBeetle at 17:00 UTC on +Thursdays. The format is unscripted, unedited stream&talk, so this is not particularly information +dense, but it is fun (at least for me):

+

https://www.twitch.tv/tigerbeetle

+

The videos are mirrored on YouTube at

+

https://www.youtube.com/playlist?list=PL9eL-xg48OM3pnVqFSRyBFleHtBBw-nmZ

+

IronBeetle logo

+
+
+ + + + + diff --git a/2023/12/10/nsfw.html b/2023/12/10/nsfw.html new file mode 100644 index 00000000..f6cb6821 --- /dev/null +++ b/2023/12/10/nsfw.html @@ -0,0 +1,474 @@ + + + + + + + Non-Send Futures When? + + + + + + + + + + + + +
+ +
+ +
+
+ +

Non-Send Futures When?

+

Ever since reading What If We Pretended That a Task = Thread? I can't stop thinking about borrowing non-Sync data across .await. In this post, I'd love to take one more look at the problem.

+
+ +

+ Send And Sync +

+

To warm up, a refresher on Send and Sync auto-traits. These traits are a library feature that enables "fearless concurrency": a statically checked guarantee that non-thread-safe data structures don't escape from their original thread.

+

Why do we need two traits, rather than just a single ThreadSafe? Because there are two degrees of +thread-unsafety.

+

Some types are fine to use from multiple threads, as long as only a single thread at a time uses a particular value. An example here would be a Cell<i32>. If two threads have a reference to a cell at the same time, a &Cell<i32>, we are in trouble: Cell's loads and stores are not atomic and are UB by definition if used concurrently. However, if two different threads have exclusive access to a Cell, that's fine: because the access is exclusive, it necessarily means that it is not simultaneous. That is, it's OK for thread A to send a Cell<i32> to a different thread B, as long as A itself loses access to the cell.
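The sending case can be checked directly with the standard library. In this small sketch the function name is made up; `Cell`, `thread::spawn`, and `join` are the ordinary std APIs:

```rust
use std::cell::Cell;
use std::thread;

/// Moves a `Cell<i32>` into a new thread, mutates it there, and hands
/// it back. This compiles because `Cell<i32>: Send`.
fn bump_on_other_thread(cell: Cell<i32>) -> Cell<i32> {
    // `move` transfers ownership into the closure: the calling thread
    // can no longer touch `cell` while the worker is using it.
    let handle = thread::spawn(move || {
        cell.set(cell.get() + 1);
        cell
    });
    handle.join().expect("worker thread panicked")
}

// Sharing instead of sending is rejected by the compiler, because the
// closure would capture a `&Cell<i32>`, and `Cell<i32>` is not `Sync`:
//
//     let cell = Cell::new(0);
//     thread::scope(|s| {
//         s.spawn(|| cell.set(1));
//     });
```

Calling `bump_on_other_thread(Cell::new(41))` returns a cell holding 42: the value was used by two threads, but never by both at once.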

+

But there are also types which are unsafe to use from multiple threads even if only a single thread +at a time has access to a value. An example here would be an Arc<Cell<i32>>. It's not possible +to safely send such an Arc to a different thread, because a .clone call can be used to get an +independent copy of an Arc, effectively creating a share operation out of a send one.

+

But it turns out both cases are covered by just a single trait, Send. The thing is, to share a +Cell<i32> across two threads, it is necessary to send an &Cell<i32>. So we get the following +table:

+ + + + + + + + + + + + + + + + + +
Send | !Send
Cell<i32> | &Cell<i32>
i32 | Arc<Cell<i32>>
&i32 | &Arc<Cell<i32>>
+

If T is Send, &T might or might not be Send. And that's where the Sync trait +comes from: &T: Send if and only if (iff) T: Sync. Which gives the following table:

+ + + + + + + + + + + + + + + + +
      | Send      | !Send
Sync  | i32       |
!Sync | Cell<i32> | Arc<Cell<i32>>
+

What about that last empty cell? Types which are Sync and !Send are indeed quite rare, and I +don't know examples which don't boil down to "underlying API mandates that a type doesn't leave a +thread". One example here would be MutexGuard from the standard library: pthreads require +that only the thread that originally locked a mutex can unlock it. This isn't a fundamental +requirement for a mutex: a MutexGuard from parking lot +can be Send.
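The two tables can be spot-checked with the compiler itself. This is a minimal sketch, not something from the original discussion: `is_send`, `is_sync` and `check_tables` are hypothetical helper names. The positive rows compile as written; the negative rows are left as comments because uncommenting any of them is a compile error.

```rust
use std::cell::Cell;
use std::sync::Arc;

// Hypothetical helpers: each one only compiles for types that satisfy the bound.
fn is_send<T: Send>() -> bool {
    true
}

fn is_sync<T: Sync>() -> bool {
    true
}

pub fn check_tables() -> bool {
    // Rows that hold, so these calls type-check.
    let holds = is_send::<i32>()
        && is_sync::<i32>()
        && is_send::<&i32>()
        && is_send::<Cell<i32>>()
        && is_send::<Arc<i32>>();

    // Rows that do not hold. Uncommenting any line fails to compile.
    // is_sync::<Cell<i32>>();
    // is_send::<&Cell<i32>>();
    // is_send::<Arc<Cell<i32>>>();

    holds
}
```

The commented-out lines are the interesting part: `&Cell<i32>` fails the `Send` check precisely because `Cell<i32>` fails the `Sync` check.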

+
+
+ +

+ Thread Safety And Async +

+

As you see, the Send & Sync infrastructure is quite intricate. Is it worth it? Absolutely, as it +leads to simpler code. In Rust, you can explicitly designate certain parts of a code base as +non-thread-safe, and then avoid worrying about threads, because the compiler will catch your hand if you +accidentally violate this constraint.

+

The power of Rust is not defensively making everything thread safe, it's the ability to use +thread-unsafe code fearlessly.

+

And it seems like async doesn't quite have this power. Let's build an example, a litmus test!

+

Let's start with a Context pattern, where a bunch of stuff is grouped into a single struct, so +that it can be threaded through the program as one parameter. Such a Context object is usually +scoped to a particular operation: the ultimate owner of Context is a local variable in some +top-level main function, it is threaded as &Context or &mut Context everywhere, and usually +isn't stored anywhere. For the &Context variant, it is also customary to add some interior +mutability for things like caches. One real-life example would be a Config type from Cargo: +config/mod.rs#L168.

+

Distilling the pattern down, we get something like this:

+ +
+ + +
use std::cell::Cell;
+
+#[derive(Default)]
+pub struct Context {
+  counter: Cell<i32>
+}
+
+impl Context {
+  fn increment(&self) {
+    self.counter.set(self.counter.get() + 1);
+  }
+}
+ +
+

Here, a counter is an interior-mutable value which could, e.g., track cache hit rate. And here is how +this type could be used:

+ +
+ + +
fn f(context: &Context) {
+  g(context);
+  context.increment();
+}
+
+fn g(_context: &Context) {
+}
+ +
+

However, the async version of the code doesn't really work, and in a subtle way:

+ +
+ + +
async fn f(context: &Context) {
+  g(context).await;
+  context.increment();
+}
+
+async fn g(_context: &Context) {
+}
+ +
+

Do you see the problem? Surprisingly, even rustc doesn't see it, the code above compiles in +isolation. However, when we start using it with Tokio's work-stealing runtime,

+ +
+ + +
async fn task_main() {
+  let context = Context::default();
+  f(&context).await;
+}
+
+#[tokio::main]
+async fn main() {
+  tokio::spawn(task_main());
+}
+ +
+

we'll hit an error:

+ +
+ + +
error: future cannot be sent between threads safely
+
+--> src/main.rs:29:18
+ |
+ | tokio::spawn(task_main());
+ |              ^^^^^^^^^^^ future returned by `task_main` is not `Send`
+ |
+
+within `Context`, the trait `Sync` is not implemented for `Cell<i32>`.
+
+if you want to do aliasing and mutation between multiple threads,
+use `std::sync::RwLock` or `std::sync::atomic::AtomicI32` instead.
+ +
+

What happened here? When compiling async fn f, the compiler reifies its stack frame as a Rust struct:

+ +
+ + +
struct FStackFrame<'a> {
+  context: &'a Context,
+  await_state: usize
+}
+ +
+

This struct contains a reference to our Context type, and then Context: !Sync implies &Context: +!Send implies FStackFrame<'_>: !Send. And that finally clashes with the signature of +tokio::spawn:

+ +
+ + +
pub fn spawn<F>(future: F) -> JoinHandle<F::Output>
+where
+    F: Future + Send + 'static, // <- note this Send
+    F::Output: Send + 'static,
+ +
+

Tokio's default executor is work-stealing. It's going to poll the future from different threads, and that's +why it is required that the future is Send.

+

In my eyes this is a rather significant limitation, and a big difference with synchronous Rust. +Async Rust has to be defensively thread-safe, while sync Rust is free to use non-thread-safe data +structures when convenient.

+
+
+ +

+ A Better Spawn +

+

One solution here is to avoid work-stealing executors:

+

Local Async Executors and Why They Should be the Default

+

That post correctly identifies the culprit:

+ +
+

I suggest to you, dear reader, that this function signature:

+ +
+ + +
pub fn spawn<T>(future: T) -> JoinHandle<T::Output> where
+    T: Future + Send + 'static,
+    T::Output: Send + 'static,
+ +
+

is a gun.

+
+ +
+

But as for the fix, I think Auri (blaz.is) got it right. The fix is not to +remove the + Send bound, but rather to mirror std::thread::spawn more closely:

+ +
+ + +
// std::thread::spawn
+pub fn spawn<F, T>(f: F) -> JoinHandle<T>
+where
+    F: FnOnce() -> T + Send + 'static,
+    T: Send + 'static,
+
+// A hypothetical better async spawn
+pub fn spawn<F, Fut>(f: F) -> JoinHandle<Fut::Output>
+where
+    F: FnOnce() -> Fut + Send + 'static,
+    Fut: Future,
+    Fut::Output: Send + 'static,
+ +
+

Let me explain first why this works, and then why this can't work.

+

A Future is essentially a stack frame of an asynchronous function. The original tokio version requires +that all such stack frames are thread safe. This is not what happens in synchronous code: there, +functions are free to put cells on their stacks. The Sendness is only guarded when data are +actually sent to a different thread, in Channel::send and thread::spawn. The spawn function in +particular says nothing about the stack of a new thread. It only requires that the data used to +create the first stack frame is Send.

+

And that's what we do in the async version: instead of spawning a future directly, it, just like the +sync version, takes a closure. The closure is moved to a different execution context, so it must be +: Send. The actual future created by the closure in the new context can be whatever. An async +runtime is free to poll this future from different threads regardless of its Send status.
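To make the closure-taking signature concrete, here is a minimal sketch built only on the standard library. It is not how Tokio or any real runtime works: `block_on`, `ThreadWaker` and `demo` are hypothetical names, and this toy `spawn` simply runs each task to completion on its own dedicated OS thread, so it never migrates a task. What it does show is that the bounds from the hypothetical signature are enough to accept a future that holds a `&Cell<i32>` across an `.await`.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context as TaskContext, Poll, Wake, Waker};
use std::thread::{self, JoinHandle, Thread};

// Hypothetical waker: unparks the thread that is driving the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Toy single-future executor: polls until ready, parking while pending.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = TaskContext::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}

// Same bounds as the hypothetical async spawn: only the closure and the
// output cross the thread boundary, the future itself is created in place.
pub fn spawn<F, Fut>(f: F) -> JoinHandle<Fut::Output>
where
    F: FnOnce() -> Fut + Send + 'static,
    Fut: Future,
    Fut::Output: Send + 'static,
{
    thread::spawn(move || block_on(f()))
}

pub fn demo() -> i32 {
    let handle = spawn(|| async {
        let counter = Cell::new(0);
        let counter_ref = &counter; // a &Cell<i32> held across an await point
        std::future::ready(()).await;
        counter_ref.set(counter_ref.get() + 1);
        counter.get()
    });
    handle.join().unwrap()
}
```

The async block inside `demo` is not `Send`, yet the call compiles, because the `Send` bound sits on the closure rather than on the future it produces.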

+

Async work-stealing still works for the same reason that blocking work stealing works. Logical +threads of execution can migrate between physical CPU cores because the OS restores execution context +when switching threads. Tasks can migrate between threads because the async runtime restores execution +context when switching tasks. Go is a proof that this is possible: goroutines migrate between +different threads but they are free to use on-stack non-thread-safe state. The pattern is clearly +sound, the question is, can we express this fundamental soundness in Rust's type system, like we +managed to do for OS threads?

+

This is going to be tricky, because Send today absolutely means "same thread", not "same +execution context". Here's one example that would break:

+ +
+ + +
async fn sneaky() {
+  thread_local! { static TL: Rc<()> = Rc::new(()); }
+  let rc = TL.with(|it| it.clone());
+  async {}.await;
+  rc.clone();
+}
+ +
+

If the .await migrates to a different thread, we are in trouble: two tasks can start on the same +thread, then diverge, but continue to hammer the same non-atomic reference count.

+

Another breakage example is various OS APIs that just mandate that things happen on a particular +execution thread, like pthread_mutex_unlock. Though I think that the turtle those APIs stand on +are thread locals again?

+

Can we fix it? As an absolute strawman proposal, let's redefine Send & Sync in terms of abstract +"execution contexts", add OsThreadSend and OsThreadSync, and change APIs which involve thread +locals to use the OsThread variants. It seems that everything else works?

+
+
+ +

+ Four Questions +

+

I would like to posit four questions to the wider async Rust community.

+
    +
  1. +

    Does this work in theory? As far as I can tell, this does indeed work, but I am not an async +expert. Am I missing something?

    +

    Ideally, I'd love to see small, self-contained litmus test examples that break OsThreadSend +Rust.

    +
    +
  2. +

    Is this an important problem in practice to look into? On the one hand, people are quite +successful with async Rust as it is. On the other hand, the expressivity gap here is real, and +Rust, as a systems programming language, strives to minimize such gaps. And then there's the fact +that the failure mode today is rather nasty: although the actual type error is inside the f +function, we learn about it only at the call site in main.

    +

    EDIT: I am also wondering: if we stop caring whether futures are : Send, does that mean we +no longer need an explicit syntax for Send bounds in async traits?

    +
    +
  3. +

    Assuming that this idea does work, and we decide that we care enough to try to fix it, is there a +backwards-compatible path we could take to make this a reality?

    +

    EDIT: to clarify, no way we are really adding a new auto-trait like OsThreadSend. But there +could be some less invasive change to get the desired result. For example, a more promising +approach is to expose some runtime hook for async runtimes to switch TLS, such that each task +gets an independent copy of thread-local storage, as if task=thread.

    +
    +
  4. +

    Is it a new idea that !Send futures and work-stealing don't conflict with each other? For me, +that 22.05.2023 post +was the first time I've learned that having a &Cell<i32> in a future's state machine does not +preclude polling it from different OS threads. But there's nothing particularly new there, the +relevant APIs were stabilized years ago. Was this issue articulated and discussed back when +async Rust was designed, or is it a genuinely new finding?

    +
    +
+
+

Update(2023-12-30): there was some discussion of the ideas on +Zulip. +It looks like this isn't completely broken and that, indeed, thread-locals are the main principled obstacle.

+

I think I also got a clear picture of a solution for an ideal world, where we are not bound by +backwards compatibility requirements: make thread local access unsafe. Specifically:

+

First, remove any references to OS threads from the definition of Send and Sync. Instead, +define them in terms of abstract concurrency. I am not well-versed enough in the formal side of things +to understand precisely what that should entail, but I have a litmus test. The new definition should +work for interrupt handlers in embedded. In OS and embedded programming, one needs to deal with +interrupt handlers: code that is run by a CPU as a response to a hardware interrupt. When a CPU is +interrupted, it saves the current execution context, runs the interrupt, and then restores the +original context. Although it all happens on a single core and there are no OS threads in sight, the +restrictions are similar to those of threads: an interrupt can arrive in the middle of a reference +counter upgrade. To rephrase: Sync should be a core trait. Right now it is defined in core, +but its definition references OS threads, a concept no_std is agnostic about!

+

Second, replace thread_local! macro with a #[thread_local] attribute on (unsafe) statics. +There are two reasons why people reach for thread locals:

+
    +
  • +to implement really fast concurrent data structures (e.g., a global allocator or an async runtime), +
    +
  • +as a programming shortcut, to avoid passing a Context argument everywhere. +
    +
+

The thread_local! macro mostly addresses the second use-case: for a very long time, it even was +a non-zero cost abstraction, so that implementing a fast allocator in Rust was impossible! But, +given that this pattern is rare in software (and, where it is used, it then takes years to refactor +it away, as was the case with rustc's usage of thread locals for the parsing session), I think it's +OK to say that Rust flat-out doesn't support it safely, like it doesn't support mutable statics.

+

The safety contract for #[thread_local] statics would be stricter than the contract on static +mut: the user must also ensure that the value isn't used past the corresponding thread's lifetime.

+
+
+
+ + + + + diff --git a/2023/12/21/retry-loop.html b/2023/12/21/retry-loop.html new file mode 100644 index 00000000..079dcfa2 --- /dev/null +++ b/2023/12/21/retry-loop.html @@ -0,0 +1,207 @@ + + + + + + + Retry Loop + + + + + + + + + + + + +
+ +
+ +
+
+ +

Retry Loop

+

A post about writing a retry loop. Not a smart post about avoiding thundering herds and resonance. +A simpleton kind of post about wrangling ifs and fors together to minimize bugs.

+

Stage: you are writing a script for some build automation or some such.

+

Example problem: you want to get a freshly deployed package from Maven Central. As you learn after +a CI failure, packages in Maven don't become available immediately after a deploy, there could be a +delay. This is a poor API which breaks causality and is impossible to code against correctly, +but what other alternative do you have? You just need to go and write a retry loop.

+

You want to retry some action. The action either succeeds or fails. Some, but not all, failures +are transient and can be retried after a timeout. If a failure persists after a bounded number +of retries, it should be propagated.

+

The runtime sequence of events we want to see is:

+ +
+ + +
action()
+sleep()
+action()
+sleep()
+action()
+ +
+

It has that mightily annoying a-loop-and-a-half shape.

+

Here's the set of properties I would like to see in a solution:

+
    +
  1. +No useless sleep. A naive loop would sleep one extra time before reporting a retry failure, but +we don't want to do that. +
    +
  2. +In the event of a retry failure, the underlying error is reported. I don't want to see just +that all attempts failed, I want to see an actual error from the last attempt. +
    +
  3. +Obvious upper bound: I don't want to write a while (true) loop with a break in the middle. If I +am to do at most 5 attempts, I want to see a for (0..5) loop. Don't ask me +why. +
    +
  4. +No syntactic redundancy: there should be a single call to action and a single sleep in the +source code. +
    +
+

I don't know how to achieve all four. That's the best I can do:

+ +
+ + +
fn action() !enum { ok, retry: anyerror } {
+
+}
+
+fn retry_loop() !void {
+    for (0..5) {
+        if (try action() == .ok) break;
+        sleep();
+    } else {
+        switch (try action()) {
+            .ok => {},
+            .retry => |err| return err
+        }
+    }
+}
+ +
+

This solution achieves 1-3, fails at 4, and relies on a somewhat esoteric language feature — +for/else.
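For comparison, the same shape can be sketched in Rust, which has no for/else, so the final attempt moves after the loop and success becomes an early return. `Attempt` and `retry_loop` are hypothetical names for illustration, and `action` and `sleep` are passed in as closures so the sketch is self-contained. Like the version above, it trades property 4 for the other three: `action` is called in two places.

```rust
/// Outcome of one attempt: success, or a transient failure worth retrying.
pub enum Attempt<E> {
    Ok,
    Retry(E),
}

/// Calls `action` up to six times (five inside the loop plus one final
/// attempt), sleeping only between attempts. Non-transient errors returned
/// as `Err` are propagated immediately by `?`.
pub fn retry_loop<E>(
    mut action: impl FnMut() -> Result<Attempt<E>, E>,
    mut sleep: impl FnMut(),
) -> Result<(), E> {
    for _ in 0..5 {
        if let Attempt::Ok = action()? {
            return Ok(());
        }
        sleep();
    }
    // Final attempt: no sleep afterwards, and the underlying error is reported.
    match action()? {
        Attempt::Ok => Ok(()),
        Attempt::Retry(err) => Err(err),
    }
}
```

Passing `sleep` as a parameter is only a convenience for testing; in a build script it would typically be a fixed delay.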

+

Salient points:

+ +
+
+ + + + + diff --git a/2023/12/24/ci-dream.html b/2023/12/24/ci-dream.html new file mode 100644 index 00000000..b93c92f7 --- /dev/null +++ b/2023/12/24/ci-dream.html @@ -0,0 +1,158 @@ + + + + + + + CI Dream + + + + + + + + + + + + +
+ +
+ +
+
+ +

CI Dream

+

This is more of an android dream (that one with a unicorn) than a coherent post, but please indulge me. +It's a short one at least!

+

Several years ago, it made sense for things like Travis CI or GitHub Actions to exist as technical products as well as businesses. +Back in the day, maintaining a fleet of machines was hard. +So you could take that shepherd job onto yourself, and provide your users and customers with an API to run their tests.

+

Is it true today though? +I am not well-versed in cloud things, but my impression is that today one can rent machines as a commodity. +Cloud providers give you a distributed computer which you pay for as you go.

+

In this world, CI as a SaaS feels like accidental complexity of the "midlayer mistake" variety. +Can we make it simpler? +Can we say that CI is just a program for a distributed computer? +So, in your project's repo, there's a ./ci folder with such a program: +a bunch of Docker files, or .yamls, or whatever is the programming language of the cloud. +You then point, say, AWS to it, tell it "run this, here are my credentials", and you get your entire CI infra, +with the not rocket science rule, continuous fuzzing, releases, and what not. +And, crucially, whatever project-specific logic you need: AWS doesn't care what it runs, everything is under your control.

+

Of course, there's a hefty amount of logic required: +interacting with your forge webhooks, +UI through @magic comments and maybe a web server with an HTML GUI, +the management of storage to ensure that cross-build caches stay close, +the management of compute and idempotence, to allow running on cheap spot instances, +and perhaps a thousand other CI concerns.

+

But it feels like all that could conceivably be a library (an ecosystem of competing projects even)?

+

If I want to have a merge queue, why are these my choices?:

+ +

Why isn't this the world we live in?:

+ +
+ + +
$ cd ci
+$ cargo add protection-agency
+ +
+

Update(2024-01-01): If you like this post, please also read +https://gregoryszorc.com/blog/2021/04/07/modern-ci-is-too-complex-and-misdirected/

+

Although that post contains far fewer references to +Philip K. Dick, +it is superior in every other respect.

+
+
+ + + + + diff --git a/2023/12/31/O(1)-build-file.html b/2023/12/31/O(1)-build-file.html new file mode 100644 index 00000000..4f24f2f3 --- /dev/null +++ b/2023/12/31/O(1)-build-file.html @@ -0,0 +1,163 @@ + + + + + + + O(1) Build File + + + + + + + + + + + + +
+ +
+ +
+
+ +

O(1) Build File

+

Rule of thumb: the size of build or CI configuration should be mostly independent of the project size. +In other words, adding, say, a new test should not require adding a new line to the build file to build the test, and a new line to .yml to run it on CI.

+

Lines in CI config are costly: each line is typically a new entry point, +and a bit of required knowledge to be able to run the project locally. +That is, every time you add something to CI, you need to explain that to your colleagues, +so that they know that they need to run more things locally.

+

Lines in build config are usually a little cheaper, but are still far from free. +Often a new build config also implies a new entry point. +At other times, it's just a new build artifact tied to an existing entry point, for example, a new integration test binary. +Build artifacts are costly in terms of compile time: as your project is linked with every build artifact, the total linking time is quadratic.

+

What to do instead?

+

Minimize the number of entry points and artifacts. +Enumerate O(1) of project entry points explicitly. +You probably need:

+ +

This is a point of contention, but consider if you can avoid separate lint and fmt entry points, as those are a form of automated tests.

+

Of course, an entry point can allow filters to run a subset of things: run --test-filter=tidy. +It's much easier to discover how to filter out things you don't need, +than to realize that there's something you need to opt into.

+

Minimize the number of build artifacts, Delete Cargo Integration Tests. +You probably need separate production and test builds, to avoid linking in test code with the production binaries. +But chances are, these two binaries are all you need. +Avoid building a set of related binaries, use subcommands or BusyBox-style multicall binaries instead. +Not only does this improve compile times, it also helps with putting out fires in the field, as the binary you have in production also contains all the debug tools.
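As an illustration of the multicall idea, here is a minimal sketch of how a single binary might pick which tool to run. Everything in it is an assumption made for the example: the function name `select_tool` and the tool names are hypothetical, and a real binary would go on to parse the remaining arguments.

```rust
use std::path::Path;

/// Picks the tool to run, BusyBox-style: first by the name the binary was
/// invoked as (argv[0]), then by an explicit subcommand (argv[1]).
pub fn select_tool(args: &[String]) -> Option<&'static str> {
    // Hypothetical tool names, listed once in a single place.
    const TOOLS: [&str; 3] = ["server", "repl", "inspect"];

    let invoked_as = args
        .first()
        .and_then(|arg0| Path::new(arg0).file_name())
        .and_then(|name| name.to_str());
    let subcommand = args.get(1).map(String::as_str);

    invoked_as
        .into_iter()
        .chain(subcommand)
        .find_map(|candidate| TOOLS.iter().copied().find(|tool| *tool == candidate))
}
```

With this shape, adding a debug tool means adding one more name and one more match arm in the same binary, rather than a new build artifact.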

+
+

On rules of thumb in general: for me, the term doesn't mean that what follows is the "correct" way to do things, better than alternatives. +Rather:

+ +
+
+ + + + + diff --git a/2023/12/31/git-things.html b/2023/12/31/git-things.html new file mode 100644 index 00000000..753cc035 --- /dev/null +++ b/2023/12/31/git-things.html @@ -0,0 +1,276 @@ + + + + + + + Git Things + + + + + + + + + + + + +
+ +
+ +
+
+ +

Git Things

+

A grab bag of less frequently talked about git adjacent points.

+
+ +

+ Not Rocket Science Rule Applies To Merge Commits +

+

Should every commit pass the tests? If it should, then your not rocket science +rule implementation must be verifying this property. It +probably doesn't, and only tests the final result of merging the feature branch into the main +branch.

+

That's why for a typical project it is useful to merge pull requests into the main branch: the +linear sequence of merge commits is a record of successful CI runs, and is a set of commits you want +to git bisect over.

+

Within a feature branch, not every commit necessarily passes the tests (or even builds), and that is a +useful property! Here are some ways this can be exploited:

+
    +
  • +

    When fixing a bug, add a failing test first, as a separate commit. +That way it becomes easy to verify for anyone that the test indeed fails without the follow up +fix.

    +

    Related advice: often I see people commenting out tests that currently fail, or tests that are yet +to be fixed in the future. That's bad, because commented-out code rots faster than the JavaScript +framework of the day. Instead, adjust the asserts such that they lock down the current (wrong) +behavior, and add a clear // TODO: comment explaining what would be the correct result. This +prevents such tests from rotting and also catches cases where the behavior is fixed by an +unrelated change.

    +
  • +
  • +

    To refactor an API which has a lot of usages, split the work in two commits. In the first commit, +change the API itself, but don't touch the usages. In the second commit, mechanically adjust all +call sites.

    +

    That way during review it is trivial to separate meaningful changes from a large, but trivial +diff.

    +
  • +
  • +

    git mv is fake. For a long time, I believed that git mv adds some special bit of git metadata +which tells it that the file was moved, such that it can be understood by diff or blame. +That's not the case: git mv is essentially mv followed by git add. There's nothing in git to +track that a file was moved specifically, the "moved" illusion is created by the diff tool when it +heuristically compares repository state at two points in time.

    +

    For this reason, if you want to reliably record file moves during refactors in git, you should do +two commits: the first commit just moves the file without any changes, the second commit applies +all the required fixups.

    +

    Speaking of moves, consider adding this to your gitconfig:

    + +
    + + +
    [diff]
    +  colormoved = "default"
    +  colormovedws = "allow-indentation-change"
    + +
    +

    This way, moved lines will be colored differently in diff, so that code motion is not confused +with additions and deletions, and is easier to review. It is unclear to me why this isn't the +default, and why this isn't an option in GitHub's UI.

    +
  • +
+

"Merge into main, but rebase feature branches" might be a hard rule to wrap your head around if you +are new to git. Luckily, it's easy to use the not-rocket-science rule to enforce this property. The +history is as much a part of your project as is the source code. You can write a test that shells +out to git and checks that the only merge commits in the history are those from the merge bot. While +you are at it, it would be a good idea to test that no large files are present in the +repository: the size of a repository only grows, and you can't easily remove large blobs from the +repo later on!
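Such a history test could be sketched roughly as follows. The assumptions are loud ones: the merge bot is identified by its author name (`merge-bot` in the usage note is a made-up name), the check is split into a function that shells out to `git log --merges --format=%an` and a pure function that inspects the output, and both function names are hypothetical.

```rust
use std::process::Command;

/// Runs `git log --merges --format=%an` in the current directory and returns
/// its output: one author name per merge commit.
pub fn merge_commit_authors() -> std::io::Result<String> {
    let output = Command::new("git")
        .args(["log", "--merges", "--format=%an"])
        .output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

/// Returns the authors of merge commits that were not made by the bot.
/// An empty result means the history obeys the rule.
pub fn unexpected_merge_authors<'a>(log: &'a str, bot: &str) -> Vec<&'a str> {
    log.lines()
        .map(str::trim)
        .filter(|author| !author.is_empty() && *author != bot)
        .collect()
}
```

A test in the repository would then assert that `unexpected_merge_authors(&merge_commit_authors()?, "merge-bot")` is empty. Matching on the author name is the weakest link here; a stricter check might compare committer emails instead.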

+
+
+ +

+ Commit Messages +

+

Let me phrase this in the most inflammatory way possible :)

+

If your project has great commit messages, with short and precise summary lines and long and +detailed bodies, this probably means that your CI and code review process suck.

+

Not all changes are equal. In a typical project, most of the changes that should be made are small +and trivial: some renames, visibility tightening, attention-to-detail polish in user-visible +features.

+

However, in a typical project, landing a trivial change is slow. How long would it take you to fix +it's/its typo in a comment? Probably 30 seconds to push the actual change, 30 minutes to get the +CI results, and 3 hours for a review roundtrip.

+

The fixed costs to making a change are tremendous. Main branch gatekeeping strongly incentivizes +against trivial changes. As a result, such changes either are not being made, or are tacked onto +larger changes as a drive by bonus. In any case, the total number of commits and PRs goes down. And +you are crafting a novel of a commit message because you have to wait for your previous PR to land +anyway.

+

What can be done better?

+

First, make changes smaller and more frequent. +Most likely, this is possible for you. +At least, I tend to out-commit most colleagues (example). +That's not because I am more productive, I just do work in smaller batches.

+

Second, make CI asynchronous. +At no point in your workflow should you be waiting for CI to pass. +You should flag a change for merging, move on to the next thing, and only get back if CI fails. +This is something bors-ng does right: it's possible to r+ a commit immediately on submission. +This is something GitHub merge queue does wrong: it's impossible to add a PR to the queue until checks on the PR itself are green.

+

Third, our review process is backwards. Review is done before code gets into main, but that's +inefficient for most of the non-mission critical projects out there. A better approach is to +optimistically merge most changes as soon as not-rocket-science allows it, and then later review the +code in situ, in the main branch. And instead of adding comments in the web UI, just change the code +in place, sending a new PR ccing the original author.

+ +
+
    +
  1. +Maintainers SHALL NOT make value judgments on correct patches. +
    +
  2. +Maintainers SHALL merge correct patches from other Contributors rapidly. +
    +
+

+
    +
  1. +Any Contributor who has value judgments on a patch SHOULD express these via their own patches. +
  2. +
+
+
Collective Code Construction Contract
+
+

I am skeptical that this exact workflow would +ever fly, but I am cautiously optimistic about Zed's idea about just allowing +two people to code in the same editor at the same time. I think that achieves a similar effect, and +nicely dodges unease about allowing temporarily unreviewed code.

+

Ok, back to git!

+

First, not every project needs a clean history. Have you ever looked at the git history of your +personal blog or dotfiles? If you haven't, feel free to use a . as a commit message. I do that for +https://github.com/matklad/matklad.github.io, +it works fine so far.

+

Second, not every change needs a great commit message. If a change is really minor, I would say +minor is an okay commit message!

+

Third, some changes absolutely do require very detailed commit messages. If there is a context, +by all means, include all of it into the commit message (and spill some as comments in the source +code). And here's a tip for this case: write the commit message first!

+

When I work on a larger feature, I start with +git commit --allow-empty +to type out what I set out to do. Most of the time, by the third paragraph of the commit message I +realize that there's a flaw in my plan and refine it. So, by the time I get to actually writing the +code, I am already on the second iteration. And, when I am done, I just amend the commit with the +actual changes, and the commit message is already there, needing only minor adjustments.

+

And the last thing I want to touch on about commit messages: man git-commit tells me that the summary +line should be shorter than 50 characters. This feels obviously wrong, that's much too short! +Kernel docs suggest a much +more reasonable 70-75 limit! And indeed, looking at some recent kernel commits, 50 is clearly not +enough!

+ +
+ + +
<---               50 characters              --->
+
+get_maintainer: remove stray punctuation when cleaning file emails
+get_maintainer: correctly parse UTF-8 encoded names in files
+locking/osq_lock: Clarify osq_wait_next()
+locking/osq_lock: Clarify osq_wait_next() calling convention
+locking/osq_lock: Move the definition of optimistic_spin_node into osq_lock.c
+ftrace: Fix modification of direct_function hash while in use
+tracing: Fix blocked reader of snapshot buffer
+ring-buffer: Fix wake ups when buffer_percent is set to 100
+platform/x86/intel/pmc: Move GBE LTR ignore to suspend callback
+platform/x86/intel/pmc: Allow reenabling LTRs
+platform/x86/intel/pmc: Add suspend callback
+platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe
+
+<---               50 characters              --->
+ +
+

Happy new year, dear reader!

+
+
+
+ + + + + diff --git a/2024/01/03/of-rats-and-ratchets.html b/2024/01/03/of-rats-and-ratchets.html new file mode 100644 index 00000000..3c440eb5 --- /dev/null +++ b/2024/01/03/of-rats-and-ratchets.html @@ -0,0 +1,159 @@ + + + + + + + Of Rats and Ratchets + + + + + + + + + + + + +
+ +
+ +
+
+ +

Of Rats and Ratchets

+

This is going to be related to software engineering, pinky promise!

+

I was re-reading Doctor Zhivago by Boris Pasternak recently. It is a beautiful novel set in Russia +during the revolutionary years before World War II. It focuses on the life of Yuri Zhivago, a doctor +and a poet, while the Russian revolutions roar in the background. It is a poignant and topical tale +of a country descending into blood-thirsty madness.

+

Being a doctor, a literati, and a descendant of a once wealthy family, Zhivago is not exactly welcomed +in the new Russia. That's why a significant part of the novel takes place far away from Moscow and +St. Petersburg, in Siberia, where it is easier for undesirables to exist in a fragile truce with the +state.

+

What's your first problem, if you are going to live in someone else's abandoned house in Siberia, +eking out a living off whatever supplies had been left? The rats, who are also very keen on the said +supplies. Clearly, rats are a big problem, and require immediate attention.

+

It's easy to exert effort and get rid of the rats: take a broom, some light source, and just +chase away the rascals from the house. However observably effective the method is, it is not a +solution: the rats will come back as soon as you are asleep. The proper solution starts with +identifying all the holes through which the pest gets in, and thoroughly plugging those! Only then +can you hope that the house stays rat free.

+

I feel the same dynamics play out in software projects. There's lots of rats, everything's broken and in +need of fixing, all the time. And there's usually plenty of desire and energy to fix things. The +problem is, often times the fixes are not durable: an immediate problem is resolved promptly, but +then it comes back two years down the line. This is most apparent in benchmarks: everyone loves +adding a microbenchmark to motivate a particular change, and then the benchmark bitrots with no one +to run it.

+

It's important not only to fix things, but to fix them in a durable way; to seal up the holes, not +just to wave the broom vigorously.

+

The best way to do this is to set up a not rocket science rule, and then to use it as a ratchet to +monotonically increase the set of properties the codebase possesses, one small check at a time. +Crucially, the ratchet should be set up up front, before any of the problems are actually fixed, +and it must allow for incremental steps.

+

Let's say you lack documentation, and want to ensure that every file in the code-base has a +top-level comment explaining the relevant context. A good way to approach this problem is to write +a test that reads every file in the project, computes the set of poorly documented files, and xors +that against the hard-coded naughty list. This test is then committed to the project with the +naughty list encompassing all the existing files. Although no new docs are added, the ratchet is in +place: all new files are guaranteed to be documented. And it's easy to move a notch up the +ratchet by documenting a single file and crossing it out from the naughty list.
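A minimal sketch of such a ratchet test, in Rust. Everything here is an assumption made for illustration: the `is_documented` heuristic (the first non-empty line is a `//` comment), the function names, and the choice to pass file contents in as `(path, contents)` pairs instead of walking the repository.

```rust
use std::collections::BTreeSet;

/// Hypothetical heuristic: a file counts as documented if its first
/// non-empty line is a `//` (or `//!`) comment.
fn is_documented(source: &str) -> bool {
    source
        .lines()
        .map(str::trim)
        .find(|line| !line.is_empty())
        .map_or(false, |line| line.starts_with("//"))
}

/// Compares the files that are undocumented right now against the hard-coded
/// naughty list and reports every mismatch (the symmetric difference).
/// An empty result means the ratchet holds.
fn ratchet_violations<'a>(
    files: &[(&'a str, &'a str)], // (path, contents)
    naughty_list: &[&'a str],
) -> Vec<String> {
    let undocumented: BTreeSet<&str> = files
        .iter()
        .filter(|(_, source)| !is_documented(source))
        .map(|(path, _)| *path)
        .collect();
    let naughty: BTreeSet<&str> = naughty_list.iter().copied().collect();

    undocumented
        .symmetric_difference(&naughty)
        .map(|path| {
            if naughty.contains(path) {
                // The file got documented: the list must shrink with it.
                format!("{path}: documented now, remove it from the naughty list")
            } else {
                // A new file slipped in without docs.
                format!("{path}: new file without a top-level comment")
            }
        })
        .collect()
}
```

Reporting both directions of the difference is what makes this a ratchet: a new undocumented file fails the test, and so does a stale naughty-list entry, so the list can only shrink.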

+

More generally, widen your view of tests: a test is a program that checks a property of a +repository of code at a particular commit. Any property: code style, absence of warnings, +licenses of dependencies, the maximum size of any binary file committed into the repository, +presence of unwanted merge commits, average assertion density.

+

Not everything can be automated though. For things which can't be, the best trick I've found is +writing them down. Just agreeing that X is a team practice is not enough, even if it might +work for the first six months. Only when X is written down in a markdown document inside a +repository might it become a durable practice. But beware: document what is, rather than what +should be. If there's a clear disagreement between what the docs say the world is, and the actual +world, the ratcheting effect of the written word disappears. If there's a large diff between reality +and documentation, don't hesitate to remove conflicting parts of the documentation. Having a ratchet +that enforces a tiny set of properties is much more valuable than aspirations to enforce everything.

+

Coming back to Doctor Zhivago, it is worth noting that the novel is arranged into a myriad of +self-contained small chapters, a blessing for a modern attention-deprived world, as it creates a +clear sense of progression even when you don't have enough focus to get lost in a book for hours.

+
+
+ + + + + diff --git a/2024/01/12/write-less.html b/2024/01/12/write-less.html new file mode 100644 index 00000000..f0ea08fc --- /dev/null +++ b/2024/01/12/write-less.html @@ -0,0 +1,119 @@ + + + + + + + Write Less + + + + + + + + + + + + +
+ +
+ +
+
+ +

Write Less

+ +
+

If we wish to count lines of code, we should not regard them as lines produced but as lines spent

+
+
Dijkstra
+
+

The same applies to technical writing. There's a tendency to think that the more is written, the +better. It is wrong: given the same information content, a shorter piece of prose is easier to +understand, up to a reasonable limit.

+

To communicate effectively, write a bullet-point list of ideas that you need to get across. Then, +write a short paragraph in simple language that communicates these ideas precisely.

+
+
+ + + + + diff --git a/2024/02/10/window-live-constant-time-grep.html b/2024/02/10/window-live-constant-time-grep.html new file mode 100644 index 00000000..d3e2ccc4 --- /dev/null +++ b/2024/02/10/window-live-constant-time-grep.html @@ -0,0 +1,327 @@ + + + + + + + Window: Live, Constant Time Grep + + + + + + + + + + + + +
+ +
+ +
+
+ +

Window: Live, Constant Time Grep

+

In this post, I describe the design of window, a small +grep-like utility I implemented in 500 lines of Rust. The utility itself is likely not that +interesting: I bet some greybeard can implement an equivalent in 5 lines of bash. But the +design principles behind it might be interesting: this small utility manages to combine core +ideas of rust-analyzer and TigerBeetle!

+
+ +

+ Problem Statement +

+

TigerBeetle is tested primarily through a deterministic simulator: a cluster of replicas runs in a +single process (in a single thread even), replicas are connected to a virtual network and a virtual +hard drive. Both the net and the disk are extra nasty, and regularly drop, reorder, and corrupt IO +requests. The cluster has to correctly process randomly generated load in spite of this radioactive +environment. You can play with a visualization of the simulator here: +https://sim.tigerbeetle.com

+

Of course, sometimes we have bugs, and need to debug crashes found by the simulator. Because +everything is perfectly deterministic, a crash is a pair of a commit hash and a seed for a random +number generator. We don't yet have any minimization infrastructure, so some crashes tend to be +rather large: a debug log from a crash can easily reach 50 gigabytes!

+

So that's my problem: given a multi-gigabyte log of a crash, find a dozen or so log lines which +explain the crash.

+

I think you are supposed to use coreutils to solve this problem, but I am not good enough with +grep to make that efficient: my experience is that grepping anything in this large file takes seconds, +and still produces gigabytes of output which is hard to make heads or tails of.

+

I had relatively more success with lnav.org, but:

+
    +
  • +it is still slower than I would like, +
  • +
  • +it comes with its own unique TUI interface, shortcuts, and workflow, which is at odds with my +standard editing environment. +
  • +
+
+
+ +

+ Window +

+

So, I made window. You run it as

+ +
+ + +
$ window huge-file.log &
+ +
+

It then creates two files:

+
    +
  • +window.toml: the file with the input query, +
  • +
  • +huge-file.log.window: the result of the query. +
  • +
+

You open both files side-by-side in your editor of choice. Edits to the query file are immediately +reflected in the results file (assuming the editor has auto-save and automatically reloads files +changed on disk):

+

Here's a demo in Emacs (you might want to full-screen that video):

+ +

In the demo, I have to manually save the window.toml file with C-x C-s, but in my +actual usage in VS Code the file is saved automatically after 100ms.

+

As you can see, window is pretty much instant. How is this possible?

+
+
+ +

+ When Best Ideas of rust-analyzer and TigerBeetle are Combined in a Tool of Questionable +Usefulness +

+

Let's take a closer look at that query string:

+ +
+ + +
reverse = false
+position = "0%"
+anchor = ""
+source_bytes_max = 104857600
+target_bytes_max = 102400
+target_lines_max = 50
+filter_in = [
+      ["(replica): 0", "view=74"],
+      ["(replica): 1", "view=74"]
+]
+filter_out = [
+       "ping", "pong"
+]
+ +
+

The secret sauce is the source_bytes_max and target_bytes_max parameters.

+

Let's start with target_bytes_max. This is a lesson from rust-analyzer. For dev tools, the user +of software is a human. Humans are slow, and can't process a lot of information. That means it is +generally useless to produce more than a hundred lines of output: a human won't be able to make +use of a larger result set, and they'd rather refine the query than manually sift through pages of +results.

+

So, when designing software to execute a user-supplied query, the inner loop should have some idea +about the amount of results produced so far, and a short-circuiting logic. It is more valuable to +produce some result quickly and to inform the user that the query is not specific, than to spend a +second computing the full result set.

+

A similar assumption underpins the architecture of a lot of language servers. No matter the size of +the codebase, the amount of information displayed on the screen in the user's IDE at a given point in +time is O(1). A typical successful language server tries hard to do the absolute minimal amount of +work to compute the relevant information, and nothing more.

+

So window, by default, limits the output to 100 kilobytes or 50 lines, whichever is reached first, and +never tries to compute more than that. If the first 50 lines of the output don't contain the result, +the user can make the query more specific by adding more AND terms to filter_in clauses, or adding +OR terms to filter_out.

+

TigerBeetle gives window the second magic parameter: source_bytes_max. The big insight of +TigerBeetle is that all software always has limits. Sometimes the limit is a hard wall: if a server +runs out of file descriptors, it just crashes. The limit can also be a soft, sloughy bog: if +the server runs out of memory, it might start paging memory in and out, slowing to a crawl. Even if +some requests are, functionally speaking, fulfilled, the results are useless, as they arrive too +late. Or, in other words, every request has a (potentially quite large) latency window.

+

It might be a good idea to make the limits explicit, and design software around them. That gives +predictable performance, and allows the user to manually chunk larger requests into manageable pieces.

+

That is exactly what window does. Grepping 100 megabytes is pretty fast. Grepping more might be +slow. So window just doesn't do it. Here's a rough rundown of the algorithm:

+
    +
  1. +mmap the entire input file to a &[u8]. +
  +
  2. +Wait until the control file (window.toml) changes and contains a valid query. +
  +
  3. +Convert the position field (which might be absolute or a percentage) to an absolute offset. +
  +
  4. +Select a slice of source_bytes_max bytes starting at that offset. +
  +
  5. +Adjust boundaries of the slice to be on \n. +
  +
  6. +Iterate lines. +
  +
  7. +If a line matches any of the filter_out conditions, skip over it. +
  +
  8. +If a line matches any of the filter_in conditions, add it to the result. +
  +
  9. +Break when reaching the end of the source_bytes_max window, or when the size of the output exceeds +target_bytes_max. +
  +
+
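The filtering part of that rundown can be sketched in Rust roughly as follows. This is an illustration and not the actual code of window: the `Query` struct, the `contains` helper, and the `run` function are hypothetical names, the `mmap` and TOML-watching steps are left out (the input is a plain byte slice and `position` is assumed to be already resolved to an absolute offset), and blank lines are skipped for simplicity.

```rust
/// Query parameters, named after the fields of window.toml.
struct Query<'a> {
    position: usize, // absolute byte offset, percentages already resolved
    source_bytes_max: usize,
    target_bytes_max: usize,
    target_lines_max: usize,
    filter_in: Vec<Vec<&'a str>>, // OR of clauses, each clause an AND of terms
    filter_out: Vec<&'a str>,
}

/// Naive substring search over bytes; enough for a sketch.
fn contains(line: &[u8], needle: &str) -> bool {
    let needle = needle.as_bytes();
    needle.is_empty() || line.windows(needle.len()).any(|w| w == needle)
}

fn run<'s>(source: &'s [u8], query: &Query) -> Vec<&'s [u8]> {
    // Select at most source_bytes_max bytes starting at the offset.
    let mut start = query.position.min(source.len());
    let mut end = start.saturating_add(query.source_bytes_max).min(source.len());

    // Adjust the boundaries to be on '\n': drop a partial first line...
    if start > 0 && source[start - 1] != b'\n' {
        start = match source[start..end].iter().position(|&b| b == b'\n') {
            Some(i) => start + i + 1,
            None => end,
        };
    }
    // ...and a partial last line.
    if end < source.len() && end > start && source[end - 1] != b'\n' {
        end = match source[start..end].iter().rposition(|&b| b == b'\n') {
            Some(i) => start + i + 1,
            None => start,
        };
    }

    let mut result = Vec::new();
    let mut bytes = 0usize;
    for line in source[start..end].split(|&b| b == b'\n') {
        if line.is_empty() {
            continue;
        }
        if query.filter_out.iter().any(|term| contains(line, term)) {
            continue;
        }
        // In this sketch an empty filter_in matches nothing.
        let matches = query
            .filter_in
            .iter()
            .any(|clause| clause.iter().all(|term| contains(line, term)));
        if !matches {
            continue;
        }
        // Short-circuit as soon as either output limit would be exceeded.
        if result.len() >= query.target_lines_max
            || bytes + line.len() > query.target_bytes_max
        {
            break;
        }
        bytes += line.len() + 1;
        result.push(line);
    }
    result
}
```

Both loops are bounded by constants from the query, not by the size of the file, which is where the constant-time behavior comes from.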

The deal is:

+
    +
  • +It's on the user to position a limited window over the interesting part of the input. +
  • +
  • +In exchange, the window tool guarantees constant-time performance. +
  • +
+
+
+ +

+ Limits of Applicability +

+

Important prerequisites for making “limit the size of the output” work are:

+
    +
  • +The user can refine the query. +
  • +
  • +The results are computed instantly. +
  • +
+

If these assumptions are violated, it might be best to return the full list of results.

+

Here's one counterexample! I love reading blogs. When I find a great post, I often try to read all +other posts by the same author: older posts which are still relevant are usually much more +valuable than the news of the day. I love when blogs have a simple chronological list of all +articles, à la: https://matklad.github.io

+

Two blogging platforms mess up this feature:

+

WordPress blogs love to have archives organized by month, where a month's page typically has 1 to +3 entries. What's more, WordPress loves to display a couple of pages of content for each entry. This +is just comically unusable: the number of entries on a page is too small to effectively search +them, but the actual amount of content on a page is overwhelming.

+

Substack's archive is an infinite scroll that fetches 12 entries at a time. 12 entries is a joke! +It's only 1kb compressed, and is clearly below the human processing limit. There might be some +argument for client-side pagination to postpone loading of posts' images, but feeding the posts +themselves over the network one tiny droplet at a time seems excessive.

+
+

To recap:

+
    +
  • +

    Limiting output size might be a good idea, because, with a human on the other side of display, +any additional line of output has a diminishing return (and might even be a net-negative). At the +same time, constant-time output allows reducing latency, and can even push a batch workflow into +an interactive one.

    +
  • +
  • +

    Limiting input size might be a good idea, because the input is always limited anyway. The +question is whether you know the limit, and whether the clients know how to cut their queries into +reasonably-sized batches.

    +
  • +
  • +

    If you have exactly the same 20 GB log file problems as me, you might install window with

    + +
    + + +
    $ cargo install --git https://github.com/matklad/window
    + +
    +
  • +
+
+
+
+ + + + + diff --git a/2024/03/02/Kafka-vs-Nabokov.html b/2024/03/02/Kafka-vs-Nabokov.html new file mode 100644 index 00000000..986a1314 --- /dev/null +++ b/2024/03/02/Kafka-vs-Nabokov.html @@ -0,0 +1,193 @@ + + + + + + + Kafka versus Nabokov + + + + + + + + + + + + +
+ +
+ +
+
+ +

Kafka versus Nabokov

+

Uplifting a lobste.rs comment to a stand-alone post.

+

objectif_lune asks:

+ +
+

I am on the cusp (hopefully) of kicking off development of a fairly large and complex system +(multiple integrated services, kafkas involved, background processes, multiple client frontends, +etc…). It’s predominantly going to be built in rust (but that’s only trivially relevant; i.e. not +following standard OOP).

+

Here’s where i’m at:

+
    +
  1. +I have defined all the components, services, data stores to use / or develop +
  +
  2. +I have a fairly concrete conceptualisation of how to structure and manage data on the storage +end of the system which i’m formalizing into a specification +
  +
  3. +I have a deployment model for the various parts of the system to go into production +
  +
+

The problem is, I have a gap, from these specs of the individual components and services that need +to be built out, to the actual implementation of those services. I’ve scaffolded the code-base +around what “feels” like sensible semantics, but bridging from the scope, through the high-level +code organisation through to implementation is where I start to get a bit queasy.

+

In the past, i’ve more or less dove head-first into just starting to implement, but the problem has +been that I will very easily end up going in circles, or I end up with a lot of duplicated code +across areas and just generally feel like it’s not working out the way I had hoped (obviously +because i’ve just gone ahead and implemented).

+

What are some tools, processes, design concepts, thinking patterns that you can use to sort of fill +in that “last mile” from high-level spec to implementing to try and ensure that things stay on track +and limit abandonment or going down dead-ends?

+

I’m interested in advice, articles, books, or anything else that makes sense in the rough context +above. Not specifically around for instance design patterns themselves, i’m more than familiar with +the tools in that arsenal, but how do you bridge the gap between the concept and the implementation +without going too deep down the rabbit-hole of modelling out actual code and everything else in UML +for instance? How do you basically minimize getting mired in massive refactors once you get to +implementation phase?

+
+ +
+

My answer:

+
+

I don’t have much experience building this kind of system (I like Kafka, but I must say I prefer +Nabokov’s rendition of similar ideas in “Invitation to a Beheading” and “Pale Fire” more), but +here’s a couple of things that come to mind.

+

First, every complex system that works started out as a simple system that worked. Write code top +down: https://www.teamten.com/lawrence/programming/write-code-top-down.html

+

Even if it is a gigantic complex system with many moving parts, start with spiking an end-to-end +solution which can handle one particular variation of a happy path. Build skeleton first, flesh can +be added incrementally.

+

To do this, you’ll need some way to actually run the entire system while it isn’t deployed yet, +which is something you need to solve before you start writing pages of code.

+

Similarly, include testing strategy in the specification, and start with one single simple +end-to-end test. I think that TDD as a way to design a class or a function is mostly snake oil +(because “unit” tests are mostly snake +oil), but the overall large scale design of +the system should absolutely be driven by the way the system will be tested.

+

It is helpful to dwell on these two laws:

+

First Law of Distributed Object Design:

+ +
+

Don’t distribute your objects.

+
+ +
+

Conway’s law:

+ +
+

Organizations which design systems are constrained to produce designs which are copies of the +communication structures of these organizations.

+
+ +
+

The code architecture of your solution is going to be isomorphic to your org chart, not to your +deployment topology. Let’s say you want to deploy three different services: foo, bar, and baz. +Just put all three into a single binary, which can be invoked as app foo, app bar, and app +baz. This mostly solves any code duplication issues — if there’s shared code, just call it!
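As a rough illustration of the single-binary idea, here is a minimal Rust sketch. The `dispatch` and `shared_banner` functions are hypothetical, and the services are reduced to returning a string so that the shape of the dispatch is visible; a real `main` would pass `std::env::args().skip(1)` to `dispatch` and start the chosen service.

```rust
/// Code shared by all three services is an ordinary function call away.
fn shared_banner(service: &str) -> String {
    format!("starting {service}")
}

/// Maps the first command-line argument to a service entry point.
fn dispatch(args: &[String]) -> Result<String, String> {
    match args.first().map(String::as_str) {
        Some("foo") => Ok(shared_banner("foo")),
        Some("bar") => Ok(shared_banner("bar")),
        Some("baz") => Ok(shared_banner("baz")),
        Some(other) => Err(format!("unknown service: {other}")),
        None => Err("usage: app <foo|bar|baz>".to_string()),
    }
}
```

The deployment topology still has three services, but the code base has one crate graph, one build, and no need to publish shared code as a separate library.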

+

Finally, system boundaries are the focus of the design: +https://www.tedinski.com/2018/02/06/system-boundaries.html

+

Figure out hard system boundaries between “your system” and “not your system”, and do design those +carefully. Anything else that looks like a boundary isn’t. It is useful to spend some effort +designing those things as well, but it’s more important to make sure that you can easily change +them. Solid upgrade strategy for deployment trumps any design which seems perfect at a given moment +in time.

+
+
+ + + + + diff --git a/2024/03/21/defer-patterns.html b/2024/03/21/defer-patterns.html new file mode 100644 index 00000000..c6c4149e --- /dev/null +++ b/2024/03/21/defer-patterns.html @@ -0,0 +1,246 @@ + + + + + + + Zig defer Patterns + + + + + + + + + + + + +
+ +
+ +
+
+ +

Zig defer Patterns

+

A short note about some unexpected usages of Zig's defer statement.

+

This post assumes that you already know the basics about RAII, defer and errdefer. While +discussing the differences between them is not the point, I will allow myself one high level +comment. I don't like defer as a replacement for RAII: after writing Zig for some time, I am +relatively confident that humans are just not good at not forgetting defers, especially when +“optional ownership transfer” is at play (i.e., this function takes ownership of an argument, unless +an error is returned). But defer is good at discouraging RAII oriented programming. RAII encourages +binding lifetime of resources (such as memory) with lifetimes of individual domain objects (such as +a String). But often, in pursuit of performance and small code size, you want to separate the two +concerns, and let many domain objects share a single pool of resources. Instead of each +individual string managing its own allocation, you might want to store the contents of all related +strings in a single contiguously allocated buffer. Because RAII with defer is painful, Zig +naturally pushes you towards batching your resource acquisition and release calls, such that you have +far fewer resources than objects in your program.

+

But, as I've said, this post isn't about all that. This post is about non-resource-oriented usages +of defer. There's more to defer than just RAII, it's a nice little powerful construct! This is way +too much ado already, so here come the patterns:

+
+ +

+ Asserting Post Conditions +

+

defer gives you poor man's contract programming in the form of

+ +
+ + +
assert(precondition)
+defer assert(postcondition)
+ +
+

Real life example:

+ +
+ + +
{
+  assert(!grid.free_set.opened);
+  defer assert(grid.free_set.opened);
+
+  // Code to open the free set
+}
+ +
+
+
+ +

+ Statically Enforcing Absence of Errors +

+

This is basically peak Zig:

+ +
+ + +
errdefer comptime unreachable
+ +
+

errdefer runs when a function returns an error (e.g., when a try fails). unreachable +crashes the program (in ReleaseSafe). But comptime unreachable straight up fails compilation +if the compiler tries to generate the corresponding runtime code. The three together ensure the +absence of error-returning paths.

+

Here's an example +from the standard library, the function to grow a hash map:

+ +
+ + +
// The function as a whole can fail...
+fn grow(
+  self: *Self,
+  allocator: Allocator,
+  new_capacity: Size,
+) Allocator.Error!void {
+  @setCold(true);
+  var map: Self = .{};
+  try map.allocate(allocator, new_capacity);
+
+  // ...but from this point on, failure is impossible
+  errdefer comptime unreachable;
+
+  // Code to rehash&copy self to map
+  std.mem.swap(Self, self, &map);
+  map.deinit(allocator);
+}
+ +
+
+
+ +

+ Logging Errors +

+

Zig's error handling mechanism provides only an error code (a number) and an error trace. This is +usually plenty to programmatically handle the error in an application and for the operator to +debug a failure, but this is decidedly not enough to provide a nice report for the end user. +However, if you are in the business of reporting errors to users, you are likely writing an +application, and an application might get away without propagating extra information about the error +to the caller. Often, there's enough context at the point where the error originates in the first +place to produce a user-facing report right there.

+

Example:

+ +
+ + +
const port = port: {
+  errdefer |err| log.err("failed to read the port number: {}", .{err});
+
+  var buf: [fmt.count("{}\n", .{maxInt(u16)})]u8 = undefined;
+  const len = try process.stdout.?.readAll(&buf);
+  break :port try fmt.parseInt(u16, buf[0 .. len -| 1], 10);
+};
+ +
+
+
+ +

+ Post Increment +

+

Finally, defer can be used as an i++ of sorts. For +example, +here's how you can pop an item off a free list:

+ +
+ + +
pub fn acquire(self: *ScanBufferPool) Error!*const ScanBuffer {
+  if (self.scan_buffer_used == constants.lsm_scans_max) {
+    return Error.ScansMaxExceeded;
+  }
+
+  defer self.scan_buffer_used += 1;
+  return &self.scan_buffers[self.scan_buffer_used];
+}
+ +
+
+
+
+ + + + + diff --git a/2024/03/22/basic-things.html b/2024/03/22/basic-things.html new file mode 100644 index 00000000..6b54667d --- /dev/null +++ b/2024/03/22/basic-things.html @@ -0,0 +1,635 @@ + + + + + + + Basic Things + + + + + + + + + + + + +
+ +
+ +
+
+ +

Basic Things

+

After working on the initial stages of several largish projects, I accumulated a list of things that +share the following three properties:

+ +

Here's the list:

+
+ +

+ READMEs +

+

A project should have a short one-page readme that is mostly links to more topical documentation. +The two most important links are the user docs and the dev docs.

+

A common failure is a readme growing haphazardly by accretion, such that it is neither a good +landing page, nor a source of comprehensive docs on any particular topic. It is hard to refactor +such an unstructured readme later. The information is valuable, if disorganized, but there +isn't any better place to move it to.

+
+
+ +

+ Developer Docs +

+

For developers, you generally want to have a docs folder in the repository. The docs folder should +also contain a short landing page describing the structure of the documentation. This structure +should allow for both a small number of high quality curated documents, and a large number of ad-hoc +append-only notes on any particular topic. For example, docs/README.md could point to carefully +crafted +ARCHITECTURE.md +and CONTRIBUTING.md, which describe high level code and social +architectures, and explicitly say that everything else in the docs/ folder is a set of unorganized +topical guides.

+

Common failure modes here:

+
    +
  1. +

    There's no place to put new developer documentation at all. As a result, no docs are +getting written, and, by the time you do need docs, the knowledge is lost.

    +
  +
  2. +

    There's only highly structured, carefully reviewed developer documentation. Contributing docs +requires a lot of effort, and many small things go undocumented.

    +
  +
  3. +

    There's only an unstructured append-only pile of isolated documents. Things are mostly documented, +often two or three times, but any new team member has to do the wheat from the chaff thing anew.

    +
  +
+
+
+ +

+ Users' Website +

+

Most projects can benefit from a dedicated website targeted at the users. You want to have the website +ready when there are few-to-no users: usage compounds over time, so, if you find yourself with a +significant number of users and no web face, you've lost quite a bit of value already!

+

Some other failure modes here:

+
    +
  1. +

    A different team manages the website. This prevents project developers from directly contributing +improvements, and may lead to divergence between the docs and the shipped product.

    +
  +
  2. +

    Today's web stacks gravitate towards infinite complexity. It's all too natural to pick an “easy” +heavy framework at the start, and then get yourself into npm's bog. A website is about content, and +content has gravity. Whatever markup language dialect you choose at the beginning is going to +stay with you for some time. Do carefully consider the choice of your web stack.

    +
  +
  3. +

    Saying that which isn't quite done yet. Don't overpromise: it's much easier to say more later +than to take back your words, and humility might be good marketing. Consider if you are in a +domain where engineering credibility travels faster than buzzwords. But this is situational. More +general advice would be that marketing also compounds over time, so it pays off to be deliberate +about your image from the start.

    +
  +
+
+
+ +

+ Internal Website +

+

This is more situational, but consider if, in addition to the public-facing website, you also need an +internal, engineering-facing one. At some point you'll probably need a bit more interactivity than +what's available in a README.md: perhaps you need a place to display code-related metrics like +coverage or some JavaScript to compute release rotation. Having a place on the web where a +contributor can place something they need right now without much red tape is nice!

+

This is a recurring theme: you should be organized, you should not be organized. Some things +have large fan-out and should be guarded with careful review. Other things benefit from just being +there and a lightweight process. You need to create places for both kinds of things, and a clear +decision rule about what goes where.

+

For the internal website, you'll probably need some kind of data store as well. If you want to track +binary size across commits, something needs to map commit hashes to (let's be optimistic) +kilobytes! I don't know a good solution here. I use a JSON file in a GitHub repository for similar +purposes.

+
+
+ +

+ Process Docs +

+

There are many possible ways to get some code into the main branch. Pick one, and spell it out in +an .md file explicitly:

+
    +
  • +

    Are feature branches pushed to the central repository, or does everyone work off their own fork? I find +forks work better in general as they automatically namespace everyone's branches, and put team +members and external contributors on equal footing.

    +
  • +
  • +

    If the repository is shared, what is the naming convention for branches? I prefix mine with +matklad/.

    +
  • +
  • +

    You use the not rocket science rule (more on this later :).

    +
  • +
  • +

    Who should do code review of a particular PR? A single person, to avoid bystander effect and to +reduce notification fatigue. The reviewer is picked by the author of the PR, as that's a stable +equilibrium in a high-trust team and cuts red tape.

    +
  • +
  • +

    How does the reviewer know that they need to review code? On GitHub, you want to assign rather than +request a review. Assign is level-triggered: it won't go away until the PR is merged, and it +becomes the responsibility of the reviewer to help the PR along until it is merged (request +review is still useful to poke the assignee after a round of feedback and changes). More generally, +code review is the highest priority task: there's no reason to work on new code +if there's already some finished code which is just blocked on your review.

    +
  • +
  • +

    What is the purpose of review? Reviewing for correctness, for single voice, for idioms, for +knowledge sharing, for high-level architecture are choices! Explicitly spell out what makes most +sense in the context of your project.

    +
  • +
  • +

    Meta process docs: positively encourage contributing process documentation itself.

    +
  • +
+
+
+ +

+ Style +

+

Speaking about meta process, the style guide is where it is most practically valuable. Make sure that +most stylistic comments during code reviews are immediately codified in the project-specific style +document. New contributors should learn the project's voice not through a hundred repetitive comments on +PRs, but through a dozen links to specific items of the style guide.

+

Do you even need a project-specific style guide? I think you do: cutting down mental energy for +trivial decisions is helpful. If you need a result variable, and half of the functions call it res +and another half of the functions call it result, making this choice is just distracting.

+

Project-specific naming conventions are one of the more useful things to place in the style guide.

+

Optimize style guide for extensibility. Uplifting a comment from a code review to the style guide +should not require much work.

+

Ensure that there's a style tzar: building consensus around specific style choices is very +hard, better to delegate the entire responsibility to one person who can make good enough choices. +Style usually is not about what's better, it's about removing needless options in a semi-arbitrary +way.

+
+
+ +

+ Git +

+

Document stylistic details pertaining to git. If the project uses area: prefixes for commits, spell +out an explicit list of such prefixes.

+

Consider documenting acceptable line length for the summary line. The Git man page boldly declares that +a summary should be under 50 characters, but that is just plain false. Even in the kernel, most +summaries are somewhere between 50 and 80 characters.

+

Definitely explicitly forbid adding large files to git. Repository size increases monotonically, +git clone time is important.

+

Document the merge-vs-rebase thing. My preferred answer is:

+
    +
  • +A unit of change is a pull request, which might contain several commits +
  • +
  • +Merge commit for the pull request is what is being tested +
  • +
  • +The main branch contains only merge commits +
  • +
  • +Conversely, only the main branch contains merge commits, pull requests themselves are always +rebased. +
  • +
+

Forbidding large files in the repo is a good policy, but it's hard to follow. Over the lifetime of +the project, someone somewhere will sneakily add and revert a megabyte of generated protobufs, and +that will fly under code review radar.

+

This brings us to the most basic thing of them all:

+
+
+ +

+ Not Rocket Science Rule +

+

Maintain a well-defined set of automated checks that pass on the main branch at all times. If you +don't want large blobs in the git repository, write a test rejecting large git objects and run that +right before updating the main branch. No merge commits on feature branches? Write a test which +fails with a pageful of Git self-help if one is detected. Want to wrap .md at 80 columns? Write a +test :)
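To make the last example concrete, here is a minimal Rust sketch of the core of such a check. The function name `overlong_lines` and its signature are made up for illustration; walking the repository to find `.md` files and wiring the function into the test suite are left out, and a real check would likely need exemptions for long URLs and tables.

```rust
/// Returns one message per line of `text` that is wider than `max_columns`.
/// Width is counted in `char`s, which is an approximation of display width.
fn overlong_lines(path: &str, text: &str, max_columns: usize) -> Vec<String> {
    text.lines()
        .enumerate()
        .filter(|(_, line)| line.chars().count() > max_columns)
        .map(|(index, line)| {
            format!(
                "{path}:{}: {} columns, limit is {max_columns}",
                index + 1,
                line.chars().count()
            )
        })
        .collect()
}
```

A test would call this for every Markdown file and assert that the combined list of messages is empty, printing the messages on failure so that the fix is obvious.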

+

It is perhaps worth your while to re-read the original post: +https://graydon2.dreamwidth.org/1597.html

+

This mindset of a monotonically growing set of properties +that are true about the codebase is incredibly powerful. You start seeing code as a temporary, fluid +thing that can always be changed relatively cheaply, and the accumulated set of automated tests as +the real value of the project.

+

Another second order effect is that NRSR puts pressure on you to optimize your build and test +infrastructure. If you don't have an option to merge the code when an unrelated flaky test fails, +you won't have flaky tests.

+

A common anti-pattern here is that a project grows a set of semi-checks: tests that exist, but +are not 100% reliable, and thus are not exercised by the CI routinely. And that creates ambiguity: +are tests failing due to a regression which should be fixed, or were they never reliable, and +just test a property that isn't actually essential for functioning of the project? This fuzziness +compounds over time. If a check isn't reliable enough to be part of the NRSR CI gate, it isn't actually +a check you care about, and should be removed.

+

But to do NRSR, you need to build & CI your code first:

+
+
+ +

+ Build & CI +

+

This is a complex topic. Let's start with the basics: what is a build system? I would love to highlight a couple of slightly unconventional answers here.

+

First, a build system is a bootstrap process: it is how you get from git clone to a working binary. Two aspects of this bootstrapping process are important:

+
    +
  • +It should be simple. No sudo apt-get install of a bazillion packages: the single binary of your build system should be able to bring in everything else that's needed, automatically. +
  • +
  • +It should be repeatable. Your laptop and your CI should end up with an exactly identical set of dependencies. The end result should be a function of the commit hash, and not of your local shell history, otherwise NRSR doesn't work. +
  • +
+

Second, a build system is developer UI. To do almost anything, you need to type some sort of build system invocation into your shell. There should be a single, clearly documented command for building and testing the project. If it is not a single makebelieve test, something's wrong.

+

One anti-pattern here is when the build system spills over to CI: when, to figure out what the set of checks even is, you need to read .github/workflows/*.yml to compile a list of commands. That's accidental complexity! Sprawling yamls are a bad entry point. Put all the logic into the build system and let the CI drive that, and not vice versa.

+

There is a stronger version of the advice. No matter the size of the project, there's probably only a handful of workflows that make sense for it: testing, running, releasing, etc. This small set of workflows should be nailed down from the start, and specific commands should be documented. When the project subsequently grows in volume, this set of build-system entry points should not grow.

+

If you add a Frobnicator, makebelieve test invocation should test that Frobnicator works. If +instead you need a dedicated makebelieve test-frobnicator and the corresponding line in some CI +yaml, you are on a perilous path.

+

Finally, a build system is a collection of commands to make stuff happen. In larger projects, you'll inevitably need some non-trivial amount of glue automation. Even if the entry point is just makebelieve release, internally that might require any number of different tools to build, sign, tag, upload, validate, and generate a changelog for a new release.

+

A common anti-pattern is to write these sorts of automations in bash and Python, but that's almost pure technical debt. These ecosystems are extremely finicky in and of themselves, and, crucially (unless your project itself is written in bash or Python), they are a second ecosystem to what you already have in your project for normal code.

+

But releasing software is also just code, which you can write in your primary language. The right tool for the job is often the tool you are already using. It pays off to explicitly attack the problem of glue from the start, and to pick/write a library that makes writing subprocess-wrangling logic easy.
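As an illustration of what "subprocess wrangling in the primary language" can look like when that language is Rust, here is a small sketch built only on `std::process::Command`. The helper name `run` and its error format are assumptions for this example, not an existing library.

```rust
use std::process::Command;

/// Runs `program` with `args` and returns its trimmed stdout.
/// Failing to start the program, a non-zero exit status, or non-UTF-8
/// output are all reported as a human-readable error string.
pub fn run(program: &str, args: &[&str]) -> Result<String, String> {
    let output = Command::new(program)
        .args(args)
        .output()
        .map_err(|err| format!("failed to start `{program}`: {err}"))?;
    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr);
        return Err(format!(
            "`{program}` exited with {}: {}",
            output.status,
            stderr.trim()
        ));
    }
    String::from_utf8(output.stdout)
        .map(|stdout| stdout.trim_end().to_string())
        .map_err(|err| format!("`{program}` printed invalid UTF-8: {err}"))
}
```

A release script can then be a plain function that chains calls such as `run("git", &["tag", "v0.1.0"])?`, with the same error handling and the same tooling as the rest of the code.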

+

Summing the build and CI story up:

+

The build system is self-contained, reproducible and takes on the task of downloading all external dependencies. Irrespective of the size of the project, it contains O(1) different entry points. One of those entry points is triggered by the not rocket science rule CI infra to run the set of canonical checks. There's explicit support for free-form automation, which is implemented in the same language as the bulk of the project.

+

Integration with NRSR is the most important aspect of the build process, as it determines how the project evolves over time. Let's zoom in.

+
+
+ +

+ Testing +

+

Testing is a primary architectural concern. When the first line of code is written, you should already understand the big-picture testing story. It is emphatically not "every class and module has a unit test". Testing should be data oriented: the job of a particular piece of software is to take some data in, transform it, and spit different data out. An overall testing strategy requires:

+
    +
  • +some way to specify/generate input data, +
  • +
  • +some way to assert desired properties of output data, and +
  • +
  • +a way to run many individual checks very fast. +
  • +
+

If time is a meaningful part of the input data, it should be modeled explicitly. Not getting the +testing architecture right usually results in:

+
    +
  • +Software that is hard to change because thousands of tests nail down existing internal APIs. +
  • +
  • +Software that is hard to change because there are no tests to confidently verify the absence of unintended breakages. +
  • +
  • +Software that is hard to change because each change requires hours of testing time to verify. +
  • +
+

How to architect a test suite goes beyond the scope of this article, but please read +Unit and Integration Tests +and +How To Test.

+

Some specific things that are in scope for this article:

+

Zero tolerance for flaky tests. A strict not rocket science rule gives this by construction: if you can't merge your pull request because someone else's test is flaky, that flaky test immediately becomes your problem.

+

Fast tests. Again, NRSR already provides a natural pressure for this, but it also helps to make +testing time more salient otherwise. Just by default printing the total test time and five slowest +tests in a run goes a long way.
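A rough sketch of the "total time and five slowest tests" report, assuming the test harness already collects a `(test name, duration)` pair per test. The function names `total` and `slowest` are hypothetical.

```rust
use std::time::Duration;

/// Total wall-clock time across all recorded tests.
pub fn total(timings: &[(&str, Duration)]) -> Duration {
    timings.iter().map(|(_, duration)| *duration).sum()
}

/// The `n` slowest tests, slowest first. Ties keep their original order.
pub fn slowest<'a>(timings: &[(&'a str, Duration)], n: usize) -> Vec<(&'a str, Duration)> {
    let mut sorted = timings.to_vec();
    sorted.sort_by(|a, b| b.1.cmp(&a.1));
    sorted.truncate(n);
    sorted
}
```

Printing `total(&timings)` and `slowest(&timings, 5)` at the end of every run is enough to keep test time visible.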

+

Not all tests can be fast. Continuing the yin-yang theme of embracing order and chaos simultaneously, it helps to introduce the concept of slow tests early on. CI always runs the full suite of tests, fast and slow. But the local makebelieve test by default runs only fast tests, with an opt-in for slow tests. The opt-in can be as simple as a SLOW_TESTS=1 environment variable.
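A minimal sketch of such an opt-in. The exact policy here (unset, empty, or `0` means "fast tests only") is an assumption, as are the function names.

```rust
/// Decides whether slow tests should run, given the raw value of the
/// SLOW_TESTS environment variable (None if the variable is unset).
pub fn slow_tests_enabled(value: Option<&str>) -> bool {
    matches!(value, Some(v) if !v.is_empty() && v != "0")
}

/// Reads SLOW_TESTS from the process environment.
pub fn should_run_slow_tests() -> bool {
    slow_tests_enabled(std::env::var("SLOW_TESTS").ok().as_deref())
}
```

A slow test then starts with `if !should_run_slow_tests() { return; }`, while CI sets `SLOW_TESTS=1` so that the full suite always runs there.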

+

Introduce a snapshot testing library early. Although the bulk of tests should probably use a project-specific testing harness, for everything else inline repl-driven snapshot testing is a good default approach, and is something costly to introduce once you've accumulated a body of non-snapshot-based tests.

+

Alongside the tests, come the benchmarks.

+
+
+ +

+ Benchmarking +

+

I don't have a grand vision of how to make benchmarks work in a large, living project; it always feels like a struggle to me. I do have a couple of tactical tips though.

+

Firstly, any code that is not running during NRSR is effectively dead. It is exceedingly common for benchmarks to be added alongside a performance improvement, and then not get hooked up with CI. So, two months down the line, the benchmark either stops compiling outright, or maybe just panics at startup due to some unrelated change.

+

The fix here is to make sure that every benchmark is also a test. Parametrize every benchmark by input size, such that with a small input it finishes in milliseconds. Then write a test that literally just calls the benchmarking code with this small input. And remember that your build system should have O(1) entry points. Plug this into makebelieve test, not into a dedicated makebelieve benchmark --small-size.
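Here is one way this could look, as a sketch: a benchmark of the standard library sort, parametrized by input size, plus a smoke test that calls it with a tiny input. The workload and the names `bench_sort` and `bench_sort_smoke_test` are invented for illustration.

```rust
use std::time::{Duration, Instant};

/// Sorts `input_size` pseudo-random numbers and returns how long the
/// sort itself took. The input is generated deterministically, so runs
/// are comparable with each other.
pub fn bench_sort(input_size: usize) -> Duration {
    let mut state: u64 = 0x2545_F491_4F6C_DD1D;
    let mut data: Vec<u64> = (0..input_size)
        .map(|_| {
            // xorshift step: cheap, deterministic pseudo-randomness.
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            state
        })
        .collect();

    let start = Instant::now();
    data.sort_unstable();
    let elapsed = start.elapsed();

    // A benchmark that computes the wrong thing is useless, so check it.
    assert!(data.windows(2).all(|pair| pair[0] <= pair[1]));
    elapsed
}

/// The benchmark doubles as a test: a small input finishes instantly,
/// and the benchmark code is compiled and executed on every test run.
#[test]
fn bench_sort_smoke_test() {
    bench_sort(1_000);
}
```

The real benchmark run calls the same `bench_sort` with a large input, so the code measured there is the code the test suite keeps alive.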

+

Secondly, any large project has a certain amount of very important macro metrics.

+
    +
  • +How long does it take to build? +
  • +
  • +How long does it take to test? +
  • +
  • +How large is the resulting artifact shipping to users? +
  • +
+

These are some of the questions that always matter. You need infrastructure to track these numbers, and to see them regularly. This is where the internal website and its data store come in. During CI, note those numbers. After the CI run, upload a record with commit hash, metric name, and metric value somewhere. Don't worry if the results are noisy: you target the baseline here, the ability to notice large changes over time.
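The record itself can be tiny. A sketch, assuming one JSON object per line and integer-valued metrics; the field names and the `MetricRecord` type are made up, and the hand-rolled formatting assumes the commit hash and metric name contain no quotes or backslashes.

```rust
/// One data point: which commit, which metric, what value.
pub struct MetricRecord<'a> {
    pub commit: &'a str,
    pub name: &'a str,
    pub value: u64,
}

impl MetricRecord<'_> {
    /// Renders the record as a single line of JSON.
    /// No escaping is performed, see the assumption above.
    pub fn to_json_line(&self) -> String {
        format!(
            "{{\"commit\":\"{}\",\"metric\":\"{}\",\"value\":{}}}",
            self.commit, self.name, self.value
        )
    }
}
```

Appending such lines to a file keeps the data store append-only and trivially diffable.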

+

Two options for the upload part:

+
    +
  • +

    Just put them into some .json file in a git repo, and LLM a bit of javascript to display a nice +graph from these data.

    +
  • +
  • +

    https://nyrkio.com is a surprisingly good SaaS offering that I can recommend.

    +
  • +
+
+
+ +

+ Fuzz Testing +

+

Serious fuzz testing curiously shares characteristics of tests and benchmarks. Like a normal test, a +fuzz test informs you about a correctness issue in your application, and is reproducible. Like a +benchmark, it is (infinitely) long running and infeasible to do as a part of NRSR.

+

I don't yet have a good handle on how to most effectively integrate continuous fuzzing into the development process. I don't know what the not rocket science rule of fuzzing is. But two things help:

+

First, even if you can't run the fuzzing loop during CI, you can run isolated seeds. To help ensure that the fuzzing code doesn't get broken, do the same thing as with benchmarks: add a test that runs the fuzzing logic with a fixed seed and small, fast parameters. One variation here is that you can use the commit SHA as a random seed: that way the code is still reproducible, but there is enough variation to avoid dynamically dead code.
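A sketch of both ideas: deriving a seed from the commit SHA and running a single, small fuzzing iteration from that seed. The property being fuzzed (sorting keeps the length and produces an ordered result) is a stand-in, and all names (`seed_from_commit`, `XorShift64`, `fuzz_one`) are hypothetical.

```rust
/// Uses the first 16 hex digits of a commit SHA as a 64-bit seed.
/// Returns None if the string is too short or is not hexadecimal.
pub fn seed_from_commit(sha_hex: &str) -> Option<u64> {
    let prefix = sha_hex.get(..16)?;
    u64::from_str_radix(prefix, 16).ok()
}

/// Tiny deterministic pseudo-random generator (xorshift).
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> XorShift64 {
        // The all-zero state is a fixed point of xorshift, so avoid it.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// One fuzzing iteration: generate an input from `seed`, run the code
/// under test, check a property. `max_len` keeps the iteration fast.
pub fn fuzz_one(seed: u64, max_len: usize) -> Result<(), String> {
    let mut rng = XorShift64::new(seed);
    let len = (rng.next_u64() % (max_len as u64 + 1)) as usize;
    let input: Vec<u8> = (0..len).map(|_| rng.next_u64() as u8).collect();

    let mut sorted = input.clone();
    sorted.sort();

    if sorted.len() != input.len() || sorted.windows(2).any(|pair| pair[0] > pair[1]) {
        return Err(format!("seed {seed} found a bug for input {input:?}"));
    }
    Ok(())
}
```

The error message includes the seed, so a failure seen on one commit can be replayed locally by calling `fuzz_one` with the same seed.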

+

Second, it is helpful to think about fuzzing in terms of level triggering. With tests, when you +make an erroneous commit, you immediately know that it breaks stuff. With fuzzing, you generally +discover this later, and a broken seed generally persists for several commits. So, as an output of +the fuzzer, I think what you want is not a set of GitHub issues, but rather a dashboard of sorts +which shows a table of recent commits and failing seeds for those commits.

+

With not rocket science rule firmly in place, it makes sense to think about releases.

+
+
+ +

+ Releases +

+

Two core insights here:

+

First, the release process is orthogonal to software being production ready. You can release stuff before it is ready (provided that you add a short disclaimer to the readme). So, it pays off to add a proper release process early on, such that, when the time comes to actually release the software, it comes down to removing disclaimers and writing the announcement post, as all the technical work has been done ages ago.

+

Second, software engineering in general observes a reverse triangle inequality: to get from A to C, it is faster to go from A to B and then from B to C than to move from A to C atomically. If you make a pull request, it helps to split it up into smaller parts. If you refactor something, it is faster to first introduce a new working copy and then separately retire the old code, rather than changing the thing in place.

+

Releases are no different: faster, more frequent releases are easier and less risky. Weekly cadence +works great, provided that you have a solid set of checks in your NRSR.

+

It is much easier to start with a state where almost nothing works, but there's a solid release (with an empty set of features), and ramp up from there, than to hack with reckless abandon without thinking much about the eventual release, and then scramble to decide what is ready and releasable, and what should be cut.

+
+
+ +

+ Summary +

+

I think that's it for today? That's a lot of small points! Here's a bullet list for convenient reference:

+
    +
  • +README as a landing page. +
  • +
  • +Dev docs. +
  • +
  • +User docs. +
  • +
  • +Structured dev docs (architecture and processes). +
  • +
  • +Unstructured ingest-optimized dev docs (code style, topical guides). +
  • +
  • +User website, beware of content gravity. +
  • +
  • +Ingest-optimized internal web site. +
  • +
  • +Meta documentation process: it's everyone's job to append to code style and process docs. +
  • +
  • +Clear code review protocol (in whose court is the ball currently?). +
  • +
  • +Automated check for no large blobs in a git repo. +
  • +
  • +Not rocket science rule. +
  • +
  • +Let's repeat: at all times, the main branch points at a commit hash which is known to pass a set of well-defined checks. +
  • +
  • +No semi tests: if the code is not good enough to add to NRSR, it is deleted. +
  • +
  • +No flaky tests (mostly by construction from NRSR). +
  • +
  • +Single command build. +
  • +
  • +Reproducible build. +
  • +
  • +Fixed number of build system entry points. No separate lint step, a lint is a kind of a test. +
  • +
  • +CI delegates to the build system. +
  • +
  • +Space for ad-hoc automation in the main language. +
  • +
  • +Overarching testing infrastructure, grand unified theory of the project's testing. +
  • +
  • +Fast/Slow test split (fast=seconds per test suite, slow=low digit minutes per test suite). +
  • +
  • +Snapshot testing. +
  • +
  • +Benchmarks are tests. +
  • +
  • +Macro metrics tracking (time to build, time to test). +
  • +
  • +Fuzz tests are tests. +
  • +
  • +Level-triggered display of continuous fuzzing results. +
  • +
  • +Inverse triangle inequality. +
  • +
  • +Weekly releases. +
  • +
+
+
+
+ + + + + diff --git a/2024/06/04/regular-recursive-restricted.html b/2024/06/04/regular-recursive-restricted.html new file mode 100644 index 00000000..b9dc91d2 --- /dev/null +++ b/2024/06/04/regular-recursive-restricted.html @@ -0,0 +1,338 @@ + + + + + + + Regular, Recursive, Restricted + + + + + + + + + + + + +
+ +
+ +
+
+ +

Regular, Recursive, Restricted

+

A post/question about formal grammars, wherein I search for a good formalism for describing infix +expressions.

+

Problem statement: it's hard to describe arithmetic expressions in a way that:

+ +

Let's start with the following grammar for arithmetic expressions:

+ +
+ + +
Expr =
+    'number'
+  | '(' Expr ')'
+  | Expr '+' Expr
+  | Expr '*' Expr
+ +
+

It is definitely declarative and obvious. But it is ambiguous: it doesn't tell whether * or + binds tighter, nor what their associativity is. You can express those properties directly in the grammar:

+ +
+ + +
Expr =
+    Factor
+  | Expr '+' Factor
+
+Factor =
+    Atom
+  | Factor '*' Atom
+
+Atom = 'number' | '(' Expr ')'
+ +
+

But at this point we lose declarativeness. The way my brain parses the above grammar is by pattern matching it as a grammar for infix expressions and folding it back to the initial compressed form, not by reading the grammar rules as written.

+

To go in another direction, you can define ambiguity away and get parsing expression grammars:

+ +
+ + +
Expr =
+    Sum
+  / Product
+  / Atom
+
+Sum     = Expr (('+' / '-') Expr)+
+Product = Expr (('*' / '/') Expr)+
+
+Atom = 'number' / '(' Expr ')'
+ +
+

This captures precedence mostly declaratively: we first match Sum, and, failing that, match Product. But the clarity of semantics is lost: PEGs are never ambiguous by virtue of always picking the first alternative, so it's too easy to introduce an unintended ambiguity.

+

Can we have both? Clarity with respect to tree shape and clarity with respect to ambiguity?

+

Let me present a formalism that, I think, ticks both boxes for the toy example and pose a question +of whether it generalizes.

+
+

Running example:

+ +
+ + +
E =
+    'number'
+  | '(' E ')'
+  | E '+' E
+ +
+

As a grammar for strings, it is ambiguous. There are two parse trees for 1 + 2 + 3: the "correct" one, (1 + 2) + 3, and the alternative, 1 + (2 + 3).

+

Let's instead see it as a grammar for trees. Specifically, trees where:

+ +

For trees, this is a perfectly fine grammar! Given a labeled tree, it's trivial to check whether it matches the grammar: for each node, you can directly match the regular expression. There's also no meaningful ambiguity: while arbitrary regular expressions can be ambiguous (aa | a*), this doesn't really come up as harmful in practice all that often, and, in any case, it's easy to check that any two regular alternatives are disjoint (intersect the two automata, minimize the result, check if it is empty).

+

As a grammar for trees, it has the following property: there are two distinct trees which +nevertheless share the same sequence of leaves:

+ +
+ + +
        E                  E
+        o                  o
+      / | \              / | \
+     E '+' E            E '+' E
+     o     |            |     o
+   / | \  '3'          '1'  / | \
+  E '+' E                  E '+' E
+  |     |                  |     |
+ '1'   '2'                '2'   '3'
+ +
+

So let's restrict the set of trees, in the most straightforward manner, by adding some inequalities:

+ +
+ + +
E =
+    'number'
+  | '(' E ')'
+  | E '+' E
+
+E !=
+    E '+' [E '+' E]
+ +
+

Here, square brackets denote a child. E '+' [E '+' E] is a plus node whose right child is also a plus node. Checking whether a tree conforms to this modified set of rules is easy, as negative rules are also just regular expressions. Well, I think you need some fiddling here: as written, a negative rule matches two different levels of the tree, but you can flatten both the rule and the actual tree to the grandchildren level by enclosing children in brackets. Let me show an example:

+

We want to match this node:

+ +
+ + +
    E
+    o
+  / | \
+ E '+' E
+ |     o
+'1'  / | \
+    E '+' E
+ +
+

against this rule concerning children and grandchildren:

+ +
+ + +
E '+' [E '+' E]
+ +
+

We write the list of children and grandchildren of the node, while adding extra [], to get this string:

+ +
+ + +
['1'] '+' [E '+' E]
+ +
+

And in the rule we replace top-level non-terminals with [.*], to get this regular expression:

+ +
+ + +
[.*] '+' [E '+' E]
+ +
+

Now we can match the string against the regex, get a match, and rule out the tree (remember, this is !=).

+

So here it is, a perfectly functional mathematical animal: recursive restricted regular expression:

+ +

This construction denotes a set of labeled trees, where interior nodes are labeled with N, leaves +are labeled with T and for each interior node

+ +

And the main question one would have, if confronted with a specimen, is: is it ambiguous? That is, are there two trees in the set which have the same sequence of leaves?

+

Let's look at an example:

+ +
+ + +
Expr =
+    'number'
+  | '(' Expr ')'
+  | Expr '+' Expr
+  | Expr '*' Expr
+
+Expr !=
+             Expr '+' [Expr '+' Expr]
+|            Expr '*' [Expr '*' Expr]
+|            Expr '*' [Expr '+' Expr]
+| [Expr '+' Expr] '*' Expr
+ +
+

It looks unambiguous to me! And I am pretty sure that I can prove, by hand, that it is in fact unambiguous (well, I might discover that I missed a couple of restrictions in the process, but it feels like it should work in principle). The question is, can a computer take an arbitrary recursive restricted regular expression and tell me that it's unambiguous, or, failing that, provide a counter-example?
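Short of a proof, a bounded brute-force search can at least hunt for counter-examples for this particular specimen. The sketch below makes several simplifying assumptions: parenthesized nodes are left out, all number leaves are treated as identical (so a leaf sequence is determined by its operator sequence), and the four negative rules are hard-coded rather than read from a grammar. The names `Op`, `Expr`, `all_trees`, `allowed`, and `find_ambiguity` are made up for this example.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Op {
    Add,
    Mul,
}

#[derive(Clone, Debug, PartialEq)]
pub enum Expr {
    Num,
    Bin(Box<Expr>, Op, Box<Expr>),
}

/// All binary trees whose operators, read left to right, are `ops`.
pub fn all_trees(ops: &[Op]) -> Vec<Expr> {
    if ops.is_empty() {
        return vec![Expr::Num];
    }
    let mut result = Vec::new();
    // Pick each operator in turn as the root of the tree.
    for (root, &op) in ops.iter().enumerate() {
        let rights = all_trees(&ops[root + 1..]);
        for left in all_trees(&ops[..root]) {
            for right in &rights {
                result.push(Expr::Bin(Box::new(left.clone()), op, Box::new(right.clone())));
            }
        }
    }
    result
}

fn op_of(expr: &Expr) -> Option<Op> {
    match expr {
        Expr::Num => None,
        Expr::Bin(_, op, _) => Some(*op),
    }
}

/// True if no node of the tree matches one of the four negative rules.
pub fn allowed(expr: &Expr) -> bool {
    let Expr::Bin(left, op, right) = expr else {
        return true;
    };
    let forbidden = match (op_of(left), *op, op_of(right)) {
        (_, Op::Add, Some(Op::Add)) => true, // Expr '+' [Expr '+' Expr]
        (_, Op::Mul, Some(Op::Mul)) => true, // Expr '*' [Expr '*' Expr]
        (_, Op::Mul, Some(Op::Add)) => true, // Expr '*' [Expr '+' Expr]
        (Some(Op::Add), Op::Mul, _) => true, // [Expr '+' Expr] '*' Expr
        _ => false,
    };
    !forbidden && allowed(left) && allowed(right)
}

/// Searches all operator sequences of length up to `max_ops` for one
/// that admits more than one allowed tree.
pub fn find_ambiguity(max_ops: usize) -> Option<Vec<Op>> {
    for n in 0..=max_ops {
        for bits in 0u32..(1 << n) {
            let ops: Vec<Op> = (0..n)
                .map(|i| if (bits >> i) & 1 == 1 { Op::Mul } else { Op::Add })
                .collect();
            let count = all_trees(&ops).into_iter().filter(allowed).count();
            if count > 1 {
                return Some(ops);
            }
        }
    }
    None
}
```

Finding nothing up to some bound is evidence rather than proof, and the search says nothing about arbitrary grammars, which is exactly the open question.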

+

In the general case, the answer is no: this is at least as expressive as CFGs, and ambiguity of an arbitrary CFG is undecidable. But perhaps there's some reasonable set of restrictions under which it is in fact possible to prove the absence of ambiguity?

+
+
+ + + + + diff --git a/2024/07/05/properly-testing-concurrent-data-structures.html b/2024/07/05/properly-testing-concurrent-data-structures.html new file mode 100644 index 00000000..7f61933d --- /dev/null +++ b/2024/07/05/properly-testing-concurrent-data-structures.html @@ -0,0 +1,1471 @@ + + + + + + + Properly Testing Concurrent Data Structures + + + + + + + + + + + + +
+ +
+ +
+
+ +

Properly Testing Concurrent Data Structures

+

There's a fascinating Rust library, loom, which can be used to thoroughly test lock-free data structures. I always wanted to learn how it works. I still do! But recently I accidentally implemented a small toy which, I think, contains some of loom's ideas, and it seems worthwhile to write about that. The goal here isn't to teach you what you should be using in practice (if you need that, go read loom's docs), but rather to derive a couple of neat ideas from first principles.

+
+ +

+ One, Two, Three, Two +

+

As usual, we need the simplest possible model program to mess with. The example we use comes from +this excellent article. +Behold, a humble (and broken) concurrent counter:

+ +
+ + +
use std::sync::atomic::{
+  AtomicU32,
+  Ordering::SeqCst,
+};
+
+#[derive(Default)]
+pub struct Counter {
+  value: AtomicU32,
+}
+
+impl Counter {
+  pub fn increment(&self) {
+    let value = self.value.load(SeqCst);
+    self.value.store(value + 1, SeqCst);
+  }
+
+  pub fn get(&self) -> u32 {
+    self.value.load(SeqCst)
+  }
+}
+ +
+

The bug is obvious here: the increment is not atomic. But what is the best test we can write to expose it?

+
+
+ +

+ Trivial Test +

+

The simplest idea that comes to mind is to just hammer the same counter from multiple threads and check the result at the end:

+ +
+ + +
#[test]
+fn threaded_test() {
+  let counter = Counter::default();
+
+  let thread_count = 100;
+  let increment_count = 100;
+
+  std::thread::scope(|scope| {
+    for _ in 0..thread_count {
+      scope.spawn(|| {
+        for _ in 0..increment_count {
+          counter.increment()
+        }
+      });
+    }
+  });
+
+  assert_eq!(counter.get(), thread_count * increment_count);
+}
+ +
+

This fails successfully:

+ +
+ + +
thread 'counter::trivial' panicked:
+assertion `left == right` failed
+  left: 9598
+ right: 10000
+ +
+

But I wouldn't call this test satisfactory: it very much depends on the timing, so you can't reproduce it deterministically and you can't debug it. You also can't minimize it: if you reduce the number of threads and increments, chances are the test passes by luck!

+
+
+ +

+ PBT +

+

Of course the temptation is to apply property based testing here! The problem almost fits: we have +easy-to-generate input (the sequence of increments spread over several threads), a good property to +check (result of concurrent increments is identical to that of sequential execution) and the desire +to minimize the test.

+

But just how can we plug threads into a property-based test?

+

PBTs are great for testing state machines. You can run your state machine through a series of steps +where at each step a PBT selects an arbitrary next action to apply to the state:

+ +
+ + +
#[test]
+fn state_machine_test() {
+  arbtest::arbtest(|rng| {
+    // This is our state machine!
+    let mut state: i32 = 0;
+
+    // We'll run it for up to 100 steps.
+    let step_count: usize = rng.int_in_range(0..=100)?;
+
+    for _ in 0..step_count {
+      // At each step, we flip a coin and
+      // either increment or decrement.
+      match *rng.choose(&["inc", "dec"])? {
+        "inc" => state += 1,
+        "dec" => state -= 1,
+        _ => unreachable!(),
+      }
+    }
+    Ok(())
+  });
+}
+ +
+

And it feels like we should be able to apply the same technique here. At every iteration, pick a +random thread and make it do a single step. If you can step the threads manually, it should be easy +to maneuver one thread in between load&store of a different thread.
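To see what such maneuvering would buy us, here is the problematic interleaving replayed by hand on a single thread. This is only an illustration of the schedule we want the test to discover, not a part of the testing machinery; the function name `lost_update` is made up.

```rust
use std::sync::atomic::{AtomicU32, Ordering::SeqCst};

/// Replays, on one thread, the schedule where "thread B" runs its load
/// in between the load and the store of "thread A".
pub fn lost_update() -> u32 {
    let value = AtomicU32::new(0);

    let seen_by_a = value.load(SeqCst); // A: load, sees 0
    let seen_by_b = value.load(SeqCst); // B: load, also sees 0
    value.store(seen_by_a + 1, SeqCst); // A: store 1
    value.store(seen_by_b + 1, SeqCst); // B: store 1, A's increment is lost

    value.load(SeqCst)
}
```

Two increments were performed, but the counter ends up at 1. A test that can pick the order of individual loads and stores finds this with just two threads and one increment each.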

+

But we can't step through threads! Or can we?

+
+
+ +

+ Simple Instrumentation +

+

Ok, let's fake it until we make it! Let's take a look at the buggy increment method:

+ +
+ + +
pub fn increment(&self) {
+  let value = self.value.load(SeqCst);
+  self.value.store(value + 1, SeqCst);
+}
+ +
+

Ideally, we'd love to be able to somehow pause the thread in between atomic operations. Something like this:

+ +
+ + +
pub fn increment(&self) {
+  pause();
+  let value = self.value.load(SeqCst);
+  pause();
+  self.value.store(value + 1, SeqCst);
+  pause();
+}
+
+fn pause() {
+    // ¯\_(ツ)_/¯
+}
+ +
+

So let's start with implementing our own wrapper for AtomicU32 which includes calls to pause.

+ +
+ + +
use std::sync::atomic::Ordering;
+
+struct AtomicU32 {
+  inner: std::sync::atomic::AtomicU32,
+}
+
+impl AtomicU32 {
+  pub fn load(&self, ordering: Ordering) -> u32 {
+    pause();
+    let result = self.inner.load(ordering);
+    pause();
+    result
+  }
+
+  pub fn store(&self, value: u32, ordering: Ordering) {
+    pause();
+    self.inner.store(value, ordering);
+    pause();
+  }
+}
+
+fn pause() {
+  // still no idea :(
+}
+ +
+
+
+ +

+ Managed Threads API +

+

One rule of great API design is that you start by implementing a single user of an API, to understand how the API should feel, and only then proceed to the actual implementation.

+

So, in the spirit of faking, let's just write a PBT using these pausable, managed threads, even if we still have no idea how to actually implement pausing.

+

We start with creating a counter and two managed threads. And we probably want to pass a reference +to the counter to each of the threads:

+ +
+ + +
let counter = Counter::default();
+let t1 = managed_thread::spawn(&counter);
+let t2 = managed_thread::spawn(&counter);
+ +
+

Now, we want to step through the threads:

+ +
+ + +
while !rng.is_empty() {
+  let coin_flip: bool = rng.arbitrary()?;
+  if t1.is_paused() {
+    if coin_flip {
+      t1.unpause();
+    }
+  }
+  if t2.is_paused() {
+    if coin_flip {
+      t2.unpause();
+    }
+  }
+}
+ +
+

Or, refactoring this a bit to semantically compress:

+ +
+ + +
let counter = Counter::default();
+let t1 = managed_thread::spawn(&counter);
+let t2 = managed_thread::spawn(&counter);
+let mut threads = [t1, t2];
+
+while !rng.is_empty() {
+  for t in &mut threads {
+    if t.is_paused() && rng.arbitrary()? {
+      t.unpause()
+    }
+  }
+}
+ +
+

That is, on each step of our state machine, we loop through all threads and unpause a random subset +of them.

+

But besides pausing and unpausing, we need our threads to actually do something, to increment the counter. One idea is to mirror the std::thread::spawn API and pass a closure in:

+ +
+ + +
let t1 = managed_thread::spawn({
+  let counter = &counter;
+  move || {
+    for _ in 0..100 {
+      counter.increment();
+    }
+  }
+});
+ +
+

But as these are managed threads, and we want to control them from our tests, let's actually go all the way there and give the controlling thread an ability to change the code running in a managed thread. That is, we'll start managed threads without a main function, and provide an API to execute arbitrary closures in the context of this by-default inert thread (universal server anyone?):

+ +
+ + +
let counter = Counter::default();
+
+// We pass the state, &counter, in, but otherwise the thread is inert.
+let t = managed_thread::spawn(&counter);
+
+// But we can manually poke it:
+t.submit(|thread_state: &Counter| thread_state.increment());
+t.submit(|thread_state: &Counter| thread_state.increment());
+ +
+

Putting everything together, we get a nice-looking property test:

+ +
+ + +
#[cfg(test)]
+use managed_thread::AtomicU32;
+#[cfg(not(test))]
+use std::sync::atomic::AtomicU32;
+
+#[derive(Default)]
+pub struct Counter {
+  value: AtomicU32,
+}
+
+impl Counter {
+  // ...
+}
+
+#[test]
+fn test_counter() {
+  arbtest::arbtest(|rng| {
+    // Our "Concurrent System Under Test".
+    let counter = Counter::default();
+
+    // The sequential model we'll compare the result against.
+    let mut counter_model: u32 = 0;
+
+    // Two managed threads which we will be stepping through
+    // manually.
+    let t1 = managed_thread::spawn(&counter);
+    let t2 = managed_thread::spawn(&counter);
+    let mut threads = [t1, t2];
+
+    // Bulk of the test: in a loop, flip a coin and advance
+    // one of the threads.
+    while !rng.is_empty() {
+      for t in &mut threads {
+        if rng.arbitrary()? {
+          if t.is_paused() {
+            t.unpause()
+          } else {
+            // Standard "model equivalence" property: apply
+            // isomorphic actions to the system and its model.
+            t.submit(|c| c.increment());
+            counter_model += 1;
+          }
+        }
+      }
+    }
+
+    for t in threads {
+      t.join();
+    }
+
+    assert_eq!(counter_model, counter.get());
+
+    Ok(())
+  });
+}
+ +
+

Now, if only we could make this API work... Remember, our pause implementation is a shrug emoji!

+

At this point, you might be mightily annoyed at me for this rhetorical device where I pretend that I don't know the answer. No need for annoyance: when writing this code for the first time, I traced exactly these steps. I realized that I need a pausing AtomicU32, so I did that (with dummy pause calls), then I played with the API I wanted to have, ending at roughly this spot, without yet knowing how I would make it work or, indeed, if it is possible at all.

+

Well, if I am being honest, there is a bit of up-front knowledge here. I don't think we can avoid spawning real threads here, unless we do something really cursed with inline assembly. When something calls that pause() function, and we want it to stay paused until further notice, that just has to happen in a thread which maintains a stack separate from the stack of our test. And, if we are going to spawn threads, we might as well spawn scoped threads, so that we can freely borrow stack-local data. And to spawn a scoped thread, you need a Scope parameter. So in reality we'll need one more level of indentation here:

+ +
+ + +
    std::thread::scope(|scope| {
+      let t1 = managed_thread::spawn(scope, &counter);
+      let t2 = managed_thread::spawn(scope, &counter);
+      let mut threads = [t1, t2];
+      while !rng.is_empty() {
+        for t in &mut threads {
+          // ...
+        }
+      }
+    });
+ +
+
+
+ +

+ Managed Threads Implementation +

+

Now, the fun part: how the heck are we going to make pausing and unpausing work? For starters, there clearly needs to be some communication between the main thread (t.unpause()) and the managed thread (pause()). And, because we don't want to change the Counter API to thread some kind of test-only context through it, the context needs to be smuggled. So thread_local! it is. And this context is going to be shared between two threads, so it must be wrapped in an Arc.

+ +
+ + +
struct SharedContext {
+  // 🤷
+}
+
+thread_local! {
+  static INSTANCE: RefCell<Option<Arc<SharedContext>>> =
+    RefCell::new(None);
+}
+
+impl SharedContext {
+  fn set(ctx: Arc<SharedContext>) {
+    INSTANCE.with(|it| *it.borrow_mut() = Some(ctx));
+  }
+
+  fn get() -> Option<Arc<SharedContext>> {
+    INSTANCE.with(|it| it.borrow().clone())
+  }
+}
+ +
+

As usual when using thread_local! or lazy_static!, it is convenient to immediately wrap it into better-typed accessor functions. And, given that we are using an Arc here anyway, we can conveniently escape thread_local's with by cloning the Arc.

+

So now we finally can implement the global pause function (or at least can kick the proverbial can +a little bit farther):

+ +
+ + +
fn pause() {
+  if let Some(ctx) = SharedContext::get() {
+    ctx.pause()
+  }
+}
+
+impl SharedContext {
+  fn pause(&self) {
+    // 😕
+  }
+}
+ +
+

Ok, what to do next? We somehow need to coordinate the control thread and the managed thread. And we +need some sort of notification mechanism, so that the managed thread knows when it can continue. The +most brute force solution here is a pair of a mutex protecting some state and a condition variable. +Mutex guards the state that can be manipulated by either of the threads. Condition variable can be +used to signal about the changes.

+ +
+ + +
struct SharedContext {
+  state: Mutex<State>,
+  cv: Condvar,
+}
+
+struct State {
+  // 🤡
+}
+ +
+

Okay, it looks like I am running out of emojis here. There are no more layers of indirection or infrastructure left, we need to write some real code that actually does that pausing thing. So let's say that the state is tracking, well, the state of our managed thread, which can be either running or paused:

+ +
+ + +
#[derive(PartialEq, Eq, Default)]
+enum State {
+  #[default]
+  Running,
+  Paused,
+}
+ +
+

And then the logic of the pause function: flip the state from Running to Paused, notify the controlling thread that we are Paused, and wait until the controlling thread flips our state back to Running:

+ +
+ + +
impl SharedContext {
+  fn pause(&self) {
+    let mut guard = self.state.lock().unwrap();
+    assert_eq!(*guard, State::Running);
+    *guard = State::Paused;
+    self.cv.notify_all();
+    while *guard == State::Paused {
+      guard = self.cv.wait(guard).unwrap();
+    }
+    assert_eq!(*guard, State::Running);
+  }
+}
+ +
+

Aside: Rust's API for condition variables is beautiful. Condvars are tricky, and I didn't really +understand them until seeing the signatures of Rust functions. Notice how the wait function +takes a mutex guard as an argument, and returns a mutex guard. This protects you from the logical +races and guides you towards the standard pattern of using condvars:

+

First, you lock the mutex around the shared state. Then, you inspect whether the state is what you +need. If that's the case, great, you do what you wanted to do and unlock the mutex. If not, then, +while still holding the mutex, you wait on the condition variable. Which means that the +mutex gets unlocked, and other threads get the chance to change the shared state. When they do +change it, and notify the condvar, your thread wakes up, and it gets the locked mutex back (but the +state now is different). Due to the possibility of spurious wake-ups, you need to double check the +state and be ready to loop back again to waiting.

+

Naturally, there's a helper that encapsulates this whole pattern:

+ +
+ + +
impl SharedContext {
+  fn pause(&self) {
+    let mut guard = self.state.lock().unwrap();
+    assert_eq!(*guard, State::Running);
+    *guard = State::Paused;
+    self.cv.notify_all();
+    guard = self
+      .cv
+      .wait_while(guard, |state| *state == State::Paused)
+      .unwrap();
+    assert_eq!(*guard, State::Running)
+  }
+}
+ +
+
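To see the wait_while pattern in isolation, here is a minimal, self-contained sketch. It is not the managed_thread code from this post: the tuple of Mutex and Condvar and the function names pause_and_wait, resume, and demo are made up for illustration. One thread parks itself in Paused, the other waits until it observes Paused and flips the state back.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum State {
    Running,
    Paused,
}

/// Hypothetical helper: flips the state to `Paused`, waits until another
/// thread flips it back, and returns the state observed after waking up.
fn pause_and_wait(shared: &(Mutex<State>, Condvar)) -> State {
    let (lock, cv) = shared;
    let mut guard = lock.lock().unwrap();
    *guard = State::Paused;
    cv.notify_all();
    // `wait_while` re-checks the predicate after every wake-up,
    // so spurious wake-ups simply loop back to waiting.
    guard = cv.wait_while(guard, |s| *s == State::Paused).unwrap();
    *guard
}

/// Hypothetical helper: blocks until the state is `Paused`,
/// then flips it back to `Running` and notifies the waiter.
fn resume(shared: &(Mutex<State>, Condvar)) {
    let (lock, cv) = shared;
    let guard = lock.lock().unwrap();
    let mut guard = cv.wait_while(guard, |s| *s != State::Paused).unwrap();
    *guard = State::Running;
    cv.notify_all();
}

/// Spawns a worker that pauses itself, resumes it from the calling thread,
/// and returns the state the worker saw when it woke up.
fn demo() -> State {
    let shared = Arc::new((Mutex::new(State::Running), Condvar::new()));
    let worker = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || pause_and_wait(&shared))
    };
    resume(&shared);
    worker.join().unwrap()
}
```

Whichever thread grabs the mutex first, the predicate checks make the hand-off work: the worker only proceeds once the state is Running again, and the resuming side only proceeds once it has seen Paused.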

Ok, this actually does look like a reasonable implementation of pause. Let's move on to +managed_thread::spawn:

+ +
+ + +
fn spawn<'scope, T: 'scope + Send>(
+  scope: &Scope<'scope, '_>,
+  state: T,
+) {
+  // ? ? ?? ??? ?????
+}
+ +
+

There's a bunch of stuff that needs to happen here:

+
    +
  • +As we have established, we are going to spawn a (scoped) thread, so we need the scope parameter +with its three lifetimes. I don't know how it works, so I am just going by the docs here! +
  • +
  • +We are going to return some kind of handle, which we can use to pause and unpause our managed +thread. And that handle is going to be parametrized over the same 'scope lifetime, because it'll +hold onto the actual join handle. +
  • +
  • +We are going to pass the generic state to our new thread, and that state needs to be Send, and +bounded by the same lifetime as our scoped thread. +
  • +
  • +Inside, we are going to spawn a thread for sure, and we'll need to set up the INSTANCE thread +local on that thread. +
  • +
  • +And it would actually be a good idea to stuff a reference to that SharedContext into the handle +we return. +
  • +
+

A bunch of stuff, in other words. Let's do it:

+ +
+ + +
struct ManagedHandle<'scope> {
+  inner: std::thread::ScopedJoinHandle<'scope, ()>,
+  ctx: Arc<SharedContext>,
+}
+
+fn spawn<'scope, T: 'scope + Send>(
+  scope: &'scope Scope<'scope, '_>,
+  state: T,
+) -> ManagedHandle<'scope> {
+  let ctx: Arc<SharedContext> = Default::default();
+  let inner = scope.spawn({
+    let ctx = Arc::clone(&ctx);
+    move || {
+      SharedContext::set(ctx);
+      drop(state); // TODO: ¿
+    }
+  });
+  ManagedHandle { inner, ctx }
+}
+ +
+

The essentially no-op function we spawn looks sus. We'll fix it later! Let's try to implement +is_paused and unpause first! They should be relatively straightforward. For is_paused, we just +need to lock the mutex and check the state:

+ +
+ + +
impl ManagedHandle<'_> {
+  pub fn is_paused(&self) -> bool {
+    let guard = self.ctx.state.lock().unwrap();
+    *guard == State::Paused
+  }
+}
+ +
+

For unpause, we should additionally flip the state back to Running and notify the other thread:

+ +
+ + +
impl ManagedHandle<'_> {
+  pub fn unpause(&self) {
+    let mut guard = self.ctx.state.lock().unwrap();
+    assert_eq!(*guard, State::Paused);
+    *guard = State::Running;
+    self.ctx.cv.notify_all();
+  }
+}
+ +
+

But I think that's not quite correct. Can you see why?

+

With this implementation, after unpause, the controlling and the managed threads will be running +concurrently. And that can lead to non-determinism, the very problem we are trying to avoid here! In +particular, if you call is_paused right after you unpause the thread, you'll most likely get +false back, as the other thread will still be running. But it might also hit the next pause +call, so, depending on timing, you might also get true.

+

What we want is actually completely eliminating all unmanaged concurrency. That means that at any +given point in time, only one thread (controlling or managed) should be running. So the right +semantics for unpause is to unblock the managed thread, and then block the controlling thread +until the managed one hits the next pause!

+ +
+ + +
impl ManagedHandle<'_> {
+  pub fn unpause(&self) {
+    let mut guard = self.ctx.state.lock().unwrap();
+    assert_eq!(*guard, State::Paused);
+    *guard = State::Running;
+    self.ctx.cv.notify_all();
+    guard = self
+      .ctx
+      .cv
+      .wait_while(guard, |state| *state == State::Running)
+      .unwrap();
+  }
+}
+ +
+

At this point we can spawn a managed thread, pause it and resume it. But right now it doesn't do +anything. The next step is implementing that idea where the controlling thread can directly send an +arbitrary closure to the managed one to make it do something:

+ +
+ + +
impl<'scope> ManagedHandle<'scope> {
+  pub fn submit<F: FnSomething>(&self, f: F)
+}
+ +
+

Let's figure out this FnSomething bound! We are going to yeet this f over to the managed thread and +run it there once, so it is FnOnce. It is crossing the thread boundary, so it needs to be + Send. +And, because we are using scoped threads, it doesn't have to be 'static, just 'scope is +enough. Moreover, in that managed thread the f will have exclusive access to the thread's state, T. +So we have:

+ +
+ + +
impl<'scope> ManagedHandle<'scope> {
+  pub fn submit<F: FnOnce(&mut T) + Send + 'scope>(&self, f: F)
+}
+ +
+

Implementing this is a bit tricky. First, we'll need some sort of channel to actually move the +function. Then, similarly to the unpause logic, we'll need synchronization to make sure that the +control thread doesn't resume until the managed thread starts running f and hits a pause (or maybe +completes f). And we'll also need a new state, Ready, because now there are two different +reasons why a managed thread might be blocked: it might wait for an unpause event, or it might +wait for the next f to execute. This is the new code:

+ +
+ + +
#[derive(Default, PartialEq, Eq, Debug)]
+enum State {
+  #[default]
+  Ready,
+  Running,
+  Paused,
+}
+
+struct ManagedHandle<'scope, T> {
+  inner: std::thread::ScopedJoinHandle<'scope, ()>,
+  ctx: Arc<SharedContext>,
+  sender: mpsc::Sender<Box<dyn FnOnce(&mut T) + 'scope + Send>>,
+}
+
+pub fn spawn<'scope, T: 'scope + Send>(
+  scope: &'scope Scope<'scope, '_>,
+  mut state: T,
+) -> ManagedHandle<'scope, T> {
+  let ctx: Arc<SharedContext> = Default::default();
+  let (sender, receiver) =
+    mpsc::channel::<Box<dyn FnOnce(&mut T) + 'scope + Send>>();
+  let inner = scope.spawn({
+    let ctx = Arc::clone(&ctx);
+    move || {
+      SharedContext::set(Arc::clone(&ctx));
+
+      for f in receiver {
+        f(&mut state);
+
+        let mut guard = ctx.state.lock().unwrap();
+        assert_eq!(*guard, State::Running);
+        *guard = State::Ready;
+        ctx.cv.notify_all()
+      }
+    }
+  });
+  ManagedHandle { inner, ctx, sender }
+}
+
+impl<'scope, T> ManagedHandle<'scope, T> {
+  pub fn submit<F: FnOnce(&mut T) + Send + 'scope>(&self, f: F) {
+    let mut guard = self.ctx.state.lock().unwrap();
+    assert_eq!(*guard, State::Ready);
+    *guard = State::Running;
+    self.sender.send(Box::new(f)).unwrap();
+    guard = self
+      .ctx
+      .cv
+      .wait_while(guard, |state| *state == State::Running)
+      .unwrap();
+  }
+}
+ +
+

The last small piece of the puzzle is the join function. It's almost standard! First we close +our side of the channel. This serves as a natural stop signal for the other thread, so it exits. +Which in turn allows us to join it. The small wrinkle here is that the thread might be paused when +we try to join it, so we need to unpause it beforehand:

+ +
+ + +
impl<'scope, T> ManagedHandle<'scope, T> {
+  pub fn join(self) {
+    while self.is_paused() {
+      self.unpause();
+    }
+    drop(self.sender);
+    self.inner.join().unwrap();
+  }
+}
+ +
+

That's it! Let's put everything together!

+

Helper library, managed_thread.rs:

+ +
+ + +
use std::{
+  cell::RefCell,
+  sync::{atomic::Ordering, mpsc, Arc, Condvar, Mutex},
+  thread::Scope,
+};
+
+#[derive(Default)]
+pub struct AtomicU32 {
+  inner: std::sync::atomic::AtomicU32,
+}
+
+impl AtomicU32 {
+  pub fn load(&self, ordering: Ordering) -> u32 {
+    pause();
+    let result = self.inner.load(ordering);
+    pause();
+    result
+  }
+
+  pub fn store(&self, value: u32, ordering: Ordering) {
+    pause();
+    self.inner.store(value, ordering);
+    pause();
+  }
+}
+
+fn pause() {
+  if let Some(ctx) = SharedContext::get() {
+    ctx.pause()
+  }
+}
+
+#[derive(Default)]
+struct SharedContext {
+  state: Mutex<State>,
+  cv: Condvar,
+}
+
+#[derive(Default, PartialEq, Eq, Debug)]
+enum State {
+  #[default]
+  Ready,
+  Running,
+  Paused,
+}
+
+thread_local! {
+  static INSTANCE: RefCell<Option<Arc<SharedContext>>> =
+    RefCell::new(None);
+}
+
+impl SharedContext {
+  fn set(ctx: Arc<SharedContext>) {
+    INSTANCE.with(|it| *it.borrow_mut() = Some(ctx));
+  }
+
+  fn get() -> Option<Arc<SharedContext>> {
+    INSTANCE.with(|it| it.borrow().clone())
+  }
+
+  fn pause(&self) {
+    let mut guard = self.state.lock().unwrap();
+    assert_eq!(*guard, State::Running);
+    *guard = State::Paused;
+    self.cv.notify_all();
+    guard = self
+      .cv
+      .wait_while(guard, |state| *state == State::Paused)
+      .unwrap();
+    assert_eq!(*guard, State::Running)
+  }
+}
+
+pub struct ManagedHandle<'scope, T> {
+  inner: std::thread::ScopedJoinHandle<'scope, ()>,
+  sender: mpsc::Sender<Box<dyn FnOnce(&mut T) + 'scope + Send>>,
+  ctx: Arc<SharedContext>,
+}
+
+pub fn spawn<'scope, T: 'scope + Send>(
+  scope: &'scope Scope<'scope, '_>,
+  mut state: T,
+) -> ManagedHandle<'scope, T> {
+  let ctx: Arc<SharedContext> = Default::default();
+  let (sender, receiver) =
+    mpsc::channel::<Box<dyn FnOnce(&mut T) + 'scope + Send>>();
+  let inner = scope.spawn({
+    let ctx = Arc::clone(&ctx);
+    move || {
+      SharedContext::set(Arc::clone(&ctx));
+      for f in receiver {
+        f(&mut state);
+        let mut guard = ctx.state.lock().unwrap();
+        assert_eq!(*guard, State::Running);
+        *guard = State::Ready;
+        ctx.cv.notify_all()
+      }
+    }
+  });
+  ManagedHandle { inner, ctx, sender }
+}
+
+impl<'scope, T> ManagedHandle<'scope, T> {
+  pub fn is_paused(&self) -> bool {
+    let guard = self.ctx.state.lock().unwrap();
+    *guard == State::Paused
+  }
+
+  pub fn unpause(&self) {
+    let mut guard = self.ctx.state.lock().unwrap();
+    assert_eq!(*guard, State::Paused);
+    *guard = State::Running;
+    self.ctx.cv.notify_all();
+    guard = self
+      .ctx
+      .cv
+      .wait_while(guard, |state| *state == State::Running)
+      .unwrap();
+  }
+
+  pub fn submit<F: FnOnce(&mut T) + Send + 'scope>(&self, f: F) {
+    let mut guard = self.ctx.state.lock().unwrap();
+    assert_eq!(*guard, State::Ready);
+    *guard = State::Running;
+    self.sender.send(Box::new(f)).unwrap();
+    guard = self
+      .ctx
+      .cv
+      .wait_while(guard, |state| *state == State::Running)
+      .unwrap();
+  }
+
+  pub fn join(self) {
+    while self.is_paused() {
+      self.unpause();
+    }
+    drop(self.sender);
+    self.inner.join().unwrap();
+  }
+}
+ +
+

System under test, not-exactly-atomic counter:

+ +
+ + +
use std::sync::atomic::Ordering::SeqCst;
+
+#[cfg(test)]
+use managed_thread::AtomicU32;
+#[cfg(not(test))]
+use std::sync::atomic::AtomicU32;
+
+#[derive(Default)]
+pub struct Counter {
+  value: AtomicU32,
+}
+
+impl Counter {
+  pub fn increment(&self) {
+    let value = self.value.load(SeqCst);
+    self.value.store(value + 1, SeqCst);
+  }
+
+  pub fn get(&self) -> u32 {
+    self.value.load(SeqCst)
+  }
+}
+ +
+

And the test itself:

+ +
+ + +
#[test]
+fn test_counter() {
+  arbtest::arbtest(|rng| {
+    eprintln!("begin trace");
+    let counter = Counter::default();
+    let mut counter_model: u32 = 0;
+
+    std::thread::scope(|scope| {
+      let t1 = managed_thread::spawn(scope, &counter);
+      let t2 = managed_thread::spawn(scope, &counter);
+      let mut threads = [t1, t2];
+
+      while !rng.is_empty() {
+        for (tid, t) in threads.iter_mut().enumerate() {
+          if rng.arbitrary()? {
+            if t.is_paused() {
+              eprintln!("{tid}: unpause");
+              t.unpause()
+            } else {
+              eprintln!("{tid}: increment");
+              t.submit(|c| c.increment());
+              counter_model += 1;
+            }
+          }
+        }
+      }
+
+      for t in threads {
+        t.join();
+      }
+      assert_eq!(counter_model, counter.get());
+
+      Ok(())
+    })
+  });
+}
+ +
+

Running it identifies a failure:

+ +
+ + +
---- test_counter stdout ----
+begin trace
+0: increment
+1: increment
+0: unpause
+1: unpause
+1: unpause
+0: unpause
+0: unpause
+1: unpause
+0: unpause
+0: increment
+1: unpause
+0: unpause
+1: increment
+0: unpause
+0: unpause
+1: unpause
+0: unpause
+thread 'test_counter' panicked at src/lib.rs:56:7:
+assertion `left == right` failed
+  left: 4
+ right: 3
+
+arbtest failed!
+    Seed: 0x4fd7ddff00000020
+ +
+

Which is something we got like 5% into this article already, with normal threads! But there's +more to this failure. First, it is reproducible. If I specify the same seed, I get the exact same +interleaving:

+ +
+ + +
#[test]
+fn test_counter() {
+  arbtest::arbtest(|rng| {
+    eprintln!("begin trace");
+    ...
+  })
+    .seed(0x4fd7ddff00000020);
+}
+ +
+

And this is completely machine independent! If you specify this seed, you'll get the exact same +interleaving. So, if I am having trouble debugging this, I can DM you this hex in Zulip, and +you'll be able to help out!

+

But there's more: we don't need to debug this failure, we can minimize it!

+ +
+ + +
#[test]
+fn test_counter() {
+  arbtest::arbtest(|rng| {
+    eprintln!("begin trace");
+    ...
+  })
+    .seed(0x4fd7ddff00000020)
+    .minimize();
+}
+ +
+

This gives me the following minimization trace:

+ +
+ + +
begin trace
+0: increment
+1: increment
+0: unpause
+1: unpause
+1: unpause
+0: unpause
+0: unpause
+1: unpause
+0: unpause
+0: increment
+1: unpause
+0: unpause
+1: increment
+0: unpause
+0: unpause
+1: unpause
+0: unpause
+seed 0x4fd7ddff00000020, seed size 32, search time 106.00ns
+
+begin trace
+0: increment
+1: increment
+0: unpause
+0: unpause
+1: unpause
+0: unpause
+1: unpause
+0: unpause
+1: unpause
+1: unpause
+1: increment
+seed 0x540c0c1c00000010, seed size 16, search time 282.16µs
+
+begin trace
+0: increment
+1: increment
+0: unpause
+1: unpause
+1: unpause
+1: unpause
+seed 0x084ca71200000008, seed size 8, search time 805.74µs
+
+begin trace
+0: increment
+1: increment
+0: unpause
+1: unpause
+seed 0x5699b19400000004, seed size 4, search time 1.44ms
+
+begin trace
+0: increment
+1: increment
+0: unpause
+1: unpause
+seed 0x4bb0ea5c00000002, seed size 2, search time 4.03ms
+
+begin trace
+0: increment
+1: increment
+0: unpause
+1: unpause
+seed 0x9c2a13a600000001, seed size 1, search time 4.31ms
+
+minimized
+seed 0x9c2a13a600000001, seed size 1, search time 100.03ms
+ +
+

That is, we ended up with this tiny, minimal example:

+ +
+ + +
#[test]
+fn test_counter() {
+  arbtest::arbtest(|rng| {
+    eprintln!("begin trace");
+    ...
+  })
+    .seed(0x9c2a13a600000001);
+}
+ +
+ +
+ + +
begin trace
+0: increment
+1: increment
+0: unpause
+1: unpause
+ +
+
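To make the minimized trace concrete, here is a hand-written replay of that interleaving on a single thread, using the plain std atomic. This is only an illustrative sketch (the function name replay_lost_update is made up), and it assumes the reading of the trace where both threads perform their load before either performs its store:

```rust
use std::sync::atomic::{AtomicU32, Ordering::SeqCst};

/// Replays the minimized interleaving by hand: both "threads" load the
/// counter before either of them stores, so one increment is lost.
fn replay_lost_update() -> u32 {
    let value = AtomicU32::new(0);

    // 0: unpause, thread 0 performs its load.
    let seen_by_t0 = value.load(SeqCst);
    // 1: unpause, thread 1 performs its load and sees the same value.
    let seen_by_t1 = value.load(SeqCst);

    // On join, both threads are run to completion and each stores `seen + 1`.
    value.store(seen_by_t0 + 1, SeqCst);
    value.store(seen_by_t1 + 1, SeqCst);

    // Two increments were requested, but the counter only advanced by one.
    value.load(SeqCst)
}
```

The final value is 1 while the model expects 2, which matches the assertion failure the minimized seed reproduces.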

And this is how you properly test concurrent data structures.

+
+
+ +

+ Postscript +

+

Of course, this is just a toy. But you can see some ways to extend it. For example, right now our +AtomicU32 just delegates to the real one. But what you could do instead is, for each atomic, to +maintain a set of values written and, on read, return an arbitrary written value consistent with a +weak memory model.

+

You could also be smarter with exploring interleavings. Instead of interleaving threads at random, +like we do here, you can try to apply model checking approaches and prove that you have considered +all meaningfully different interleavings.

+

Or you can apply the approach from Generate All The +Things and exhaustively +enumerate all interleavings for up to, say, five increments. In fact, why don't we just do this?

+

$ cargo add exhaustigen

+ +
+ + +
#[test]
+fn exhaustytest() {
+  let mut g = exhaustigen::Gen::new();
+  let mut interleavings_count = 0;
+
+  while !g.done() {
+    interleavings_count += 1;
+    let counter = Counter::default();
+    let mut counter_model: u32 = 0;
+
+    let increment_count = g.gen(5) as u32;
+    std::thread::scope(|scope| {
+      let t1 = managed_thread::spawn(scope, &counter);
+      let t2 = managed_thread::spawn(scope, &counter);
+
+      'outer: while t1.is_paused()
+        || t2.is_paused()
+        || counter_model < increment_count
+      {
+        for t in [&t1, &t2] {
+          if g.flip() {
+            if t.is_paused() {
+              t.unpause();
+              continue 'outer;
+            }
+            if counter_model < increment_count {
+              t.submit(|c| c.increment());
+              counter_model += 1;
+              continue 'outer;
+            }
+          }
+        }
+        return for t in [t1, t2] {
+          t.join()
+        };
+      }
+
+      assert_eq!(counter_model, counter.get());
+    });
+  }
+  eprintln!("interleavings_count = {:?}", interleavings_count);
+}
+ +
+

The shape of the test is more or less the same, except that we need to make sure that there are no +“dummy” iterations, and that we always either unpause a thread or submit an increment.

+

It finds the same bug, naturally:

+ +
+ + +
thread 'exhaustytest' panicked at src/lib.rs:103:7:
+assertion `left == right` failed
+  left: 2
+ right: 1
+ +
+

But the cool thing is, if we fix the issue by using atomic increment,

+ +
+ + +
impl AtomicU32 {
+  pub fn fetch_add(
+    &self,
+    value: u32,
+    ordering: Ordering,
+  ) -> u32 {
+    pause();
+    let result = self.inner.fetch_add(value, ordering);
+    pause();
+    result
+  }
+}
+
+impl Counter {
+  pub fn increment(&self) {
+    self.value.fetch_add(1, SeqCst);
+  }
+}
+ +
+

we can get a rather specific correctness statement out of our test: that any sequence of at +most five increments is correct:

+ +
+ + +
$ t cargo t -r -- exhaustytest --nocapture
+running 1 test
+all 81133 interleavings are fine!
+test exhaustytest ... ok
+
+real 8.65s
+cpu  8.16s (2.22s user + 5.94s sys)
+rss  63.91mb
+ +
+

And the last small thing. Recall that our PBT minimized the first sequence it found:

+ +
+ + +
begin trace
+0: increment
+1: increment
+0: unpause
+1: unpause
+1: unpause
+0: unpause
+0: unpause
+1: unpause
+0: unpause
+0: increment
+1: unpause
+0: unpause
+1: increment
+0: unpause
+0: unpause
+1: unpause
+0: unpause
+thread 'test_counter' panicked at src/lib.rs:56:7:
+assertion `left == right` failed
+  left: 4
+ right: 3
+
+arbtest failed!
+    Seed: 0x4fd7ddff00000020
+ +
+

down to just

+ +
+ + +
begin trace
+0: increment
+1: increment
+0: unpause
+1: unpause
+thread 'test_counter' panicked at src/lib.rs:57:7:
+assertion `left == right` failed
+  left: 2
+ right: 1
+
+arbtest failed!
+    Seed: 0x9c2a13a600000001
+ +
+

But we never implemented shrinking! How is this possible? Well, strictly speaking, this is out of +scope for this post. And I've already described this +elsewhere. And, at 32k, this is the +third-longest post on this blog. And it's 3AM here in Lisbon right now. But of course I'll explain!

+

The trick is the simplified hypothesis +approach. The +arbtest PBT library we use in this post is based on a +familiar interface of a PRNG:

+ +
+ + +
arbtest::arbtest(|rng| {
+  let random_int: usize = rng.int_in_range(0..=100)?;
+  let random_bool: bool = rng.arbitrary()?;
+  Ok(())
+});
+ +
+

But there's a twist! This is a finite PRNG. So, if you ask it to flip a coin it can give you +heads. And next time it might give you tails. But if you continue asking it for more, at some point +it'll give you Err(OutOfEntropy).

+

That's why we have all these ? and the outer loop of +while !rng.is_empty() {.

+

In other words, as soon as the test runs out of entropy, it short-circuits and completes. And that +means that by reducing the amount of entropy available the test becomes shorter, and this works +irrespective of how complex the logic inside the test is!

+

And entropy is a big scary word here, what actually happens is that the PRNG is just an &mut +&[u8] inside. That is, a slice of random bytes, which is shortened every time you ask for a random +number. And the shorter the initial slice, the simpler the test gets. Minimization can be this +simple!

+
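To illustrate the idea, here is a minimal sketch of such a finite source of randomness. This is not the actual arbtest implementation: FiniteRng, OutOfEntropy, byte, flip, and count_steps are hypothetical names used only for this example. The point is that the length of the byte slice directly bounds how many steps the test can take.

```rust
/// Hypothetical error returned once the byte slice is exhausted.
#[derive(Debug, PartialEq, Eq)]
struct OutOfEntropy;

/// Hypothetical finite "PRNG": just a shrinking slice of pre-generated bytes.
struct FiniteRng<'a> {
    bytes: &'a [u8],
}

impl<'a> FiniteRng<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        FiniteRng { bytes }
    }

    fn is_empty(&self) -> bool {
        self.bytes.is_empty()
    }

    /// Consumes one byte, shortening the slice, or fails if nothing is left.
    fn byte(&mut self) -> Result<u8, OutOfEntropy> {
        let (&first, rest) = self.bytes.split_first().ok_or(OutOfEntropy)?;
        self.bytes = rest;
        Ok(first)
    }

    /// A coin flip derived from the lowest bit of the next byte.
    fn flip(&mut self) -> Result<bool, OutOfEntropy> {
        Ok(self.byte()? & 1 == 1)
    }
}

/// A stand-in for a test body: takes one step per coin flip until
/// the entropy runs out, and reports how many steps it managed.
fn count_steps(seed: &[u8]) -> usize {
    let mut rng = FiniteRng::new(seed);
    let mut steps = 0;
    while !rng.is_empty() {
        if rng.flip().is_err() {
            break;
        }
        steps += 1;
    }
    steps
}
```

With this shape, handing the test a shorter slice can only make the run shorter, whatever the logic inside the loop is.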

You can find source code for this article at +https://github.com/matklad/properly-concurrent

+
+
+
+ + + + + diff --git a/2024/07/25/git-worktrees.html b/2024/07/25/git-worktrees.html new file mode 100644 index 00000000..20ab93de --- /dev/null +++ b/2024/07/25/git-worktrees.html @@ -0,0 +1,317 @@ + + + + + + + How I Use Git Worktrees + + + + + + + + + + + + +
+ +
+ +
+
+ +

How I Use Git Worktrees

+

There are a bunch of posts on the internet about using the git worktree command. As far as I can tell, +most of them are primarily about using worktrees as a replacement of, or a supplement to, git +branches. Instead of switching branches, you just change directories. This is also how I originally +used worktrees, but that didn't stick, and I abandoned them. But recently worktrees grew +on me, though my new use-case is unlike branching.

+
+ +

+ When a Branch is Enough +

+

If you use worktrees as a replacement for branching, that's great, no need to change anything! But +let me start with explaining why that workflow isn't for me.

+

The principal problem with using branches is that it's hard to context switch in the middle of doing +something. You have your branch, your commit, a bunch of changes in the work tree, some of them +might be staged and some unstaged. You can't really tell Git "save all this context and restore it +later". The solution that Git suggests here is to use stashing, but that's awkward, as it is too +easy to get lost when stashing several things at the same time, and then applying the stash on top +of the wrong branch.

+

Managing Git state became much easier for me when I realized that the staging area and the stash are just bad +features, and life is easier if I avoid them. Instead, I just commit whatever and deal with +it later. So, when I need to switch a branch in the middle of things, what I do is, basically:

+ +
+ + +
$ git add .
+$ git commit -m.
+$ git switch another-branch
+ +
+

And, to switch back,

+ +
+ + +
$ git switch -
+
+# Undo the last commit, but keep its changes in the working tree
+$ git reset HEAD~
+ +
+

To make this more streamlined, I have a ggc utility which does “commit all with a trivial message” +atomically.

+ +

And I don't always reset HEAD~. I usually just continue hacking with . in my Git log and then amend the commit +once I am satisfied with a subset of changes.

+ +

So that's how I deal with switching branches. But why worktrees then?

+
+
+ +

+ Worktree Per Concurrent Activity +

+

It's a bit hard to describe, but:

+
    +
  • +I have a fixed number of worktrees (5, to be exact) +
  • +
  • +worktrees are mostly uncorrelated to branches +
  • +
  • +but instead correspond to my concurrent activities during coding. +
  • +
+

Specifically:

+
    +
  • +

    The main worktree is a readonly worktree that contains a recent snapshot of the remote main +branch. I use this tree to compare the code I am currently working on and/or reviewing with the +master version (this includes things like how long the build takes, what is the behavior of +this test and the like, so not just the actual source code).

    +
  • +
  • +

    The work worktree, where I write most of the code. I often need to write new code and compare it +with old code at the same time. But I can't actually work on two different things in parallel. +That's why main and work are different worktrees, but work also constantly switches branches.

    +
  • +
  • +

    The review worktree, where I check out code for code review. While I can't review code and write +code at the same time, there is one thing I am implementing, and one thing I am reviewing, but the +review and implementation proceed concurrently.

    +
  • +
  • +

    Then, there's the fuzz tree, where I run long-running fuzzing jobs for the code I am actively working +on. My overall idealized feature workflow looks like this:

    + +
    + + +
    # go to the `work` worktree
    +$ cd ~/projects/tigerbeetle/work
    +
    +# Create a new branch. As we work with a centralized repo,
    +# rather than personal forks, I tend to prefix my branch names
    +# with `matklad/`
    +$ git switch -c matklad/awesome-feature
    +
    +# Start with a reasonably clean slate.
    +# In reality, I have yet another script to start a branch off
    +# fresh from the main remote, but this reset is a good enough approximation.
    +$ git reset --hard origin/main
    +
    +# For more complicated features, I start with an empty commit
    +# and write the commit message _first_, before starting the work.
    +# That's a good way to collect your thoughts and discover dead
    +# ends more gracefully than hitting a brick wall coding at 80 WPM.
    +$ git commit --allow-empty
    +
    +# Hack furiously writing throwaway code.
    +$ code .
    +
    +# At this point, I have something that I hope works
    +# but would be embarrassed to share with anyone!
    +# So that's the good place to kick off fuzzing.
    +
    +# First, I commit everything so far.
    +# Remember, I have `ggc` one liner for this:
    +$ git add . && git commit -m.
    +
    +# Now I go to my `fuzz` worktree and kick off fuzzing.
    +# I usually split screen here.
    +# On the left, I copy the current commit hash.
    +# On the right, I switch to the fuzzing worktree,
    +# switch to the copied commit, and start fuzzing:
    +
    +$ git add . && git commit -m.  |
    +$ git rev-parse HEAD | ctrlc   | $ cd ../fuzz
    +$                              | $ git switch -d $(ctrlv)
    +$                              | $ ./zig/zig build fuzz
    +$                              |
    +
    +# While the fuzzer hums on the right, I continue to furiously refactor
    +# the code on the left and hammer my empty commit with a wishful
    +# thinking message and my messy code commit with `.` message into
    +# a semblance of clean git history
    +
    +$ code .
    +$ magit-goes-brrrrr
    +
    +# At this point, in the work tree, I am happy with both the code
    +# and the Git history, so, if the fuzzer on the right is happy,
    +# a PR is opened!
    +
    +$                              |
    +$ git push --force-with-lease  | $ ./zig/zig build fuzz
    +$ gh pr create --web           | # Still hasn't failed
    +$                              |
    + +
    +

    This is again concurrent: I can hack on the branch while the fuzzer tests the same code. Note +that it is crucial that the fuzzing tree operates in the detached head state (-d flag for git +switch). In general, -d is very helpful with this style of worktree work. I am also +sympathetic to the argument that, like the staging area +and the stash, Git branches are a misfeature, but I haven't taken the plunge personally yet.

    +
  • +
  • +

    Finally, the last tree I have is scratch. This is a tree for arbitrary random things I need +to do while working on something else. For example, if I am working on matklad/my-feature in +work, and reviewing #6292 in review, and, while reviewing, notice a tiny unrelated typo, the +PR for that typo is quickly prepped in the scratch worktree:

    + +
    + + +
    $ cd ../scratch
    +$ git switch -c matklad/quick-fix
    +$ code . && git add . && git commit -m 'typo' && git push
    +$ cd -
    + +
    +
  • +
+

TL;DR: consider using worktrees not as a replacement for branches, but as a means to manage +concurrency in your tasks. My level of concurrency is:

+
    +
  • +main for looking at the pristine code, +
  • +
  • +work for looking at my code, +
  • +
  • +review for looking at someone else's code, +
  • +
  • +fuzz for my computer to look at my code, +
  • +
  • +scratch for everything else! +
  • +
+
+
+
+ + + + + diff --git a/2024/08/01/primitive-recursive-functions.html b/2024/08/01/primitive-recursive-functions.html new file mode 100644 index 00000000..b10dc528 --- /dev/null +++ b/2024/08/01/primitive-recursive-functions.html @@ -0,0 +1,1404 @@ + + + + + + + Primitive Recursive Functions For A Working Programmer + + + + + + + + + + + + +
+ +
+ +
+
+ +

Primitive Recursive Functions For A Working Programmer

+

Programmers on the internet often use Turing-completeness terminology. Typically, not being +Turing-complete is extolled as a virtue or even a requirement in specific domains. I claim that most +such discussions are misinformed: not being Turing complete doesn't actually mean what folks +want it to mean, and is instead a stand-in for a bunch of different practically useful properties, +which are mostly orthogonal to actual Turing completeness.

+

While I am generally descriptivist in nature and am ok with words losing their original meaning +as long as the new meaning is sufficiently commonly understood, Turing completeness is a hill I will +die on. It is a term from math, it has a very specific meaning, and you are not allowed to +re-purpose it for anything else, sorry!

+

I understand why this happens: to really understand what Turing completeness is and is not you need +to know one (simple!) theoretical result about so-called primitive recursive functions. And, +although this result is simple, I was only made aware of it in a fairly advanced course during my +master's. That's the CS education deficiency I want to rectify: you can't teach students the +halting problem without also teaching them about primitive recursion!

+

The post is going to be rather meaty, and will be split in three parts:

+

In Part I, I give a TL;DR for the theoretical result and some of its consequences. Part II is going +to be a whirlwind tour of Turing Machines, Finite State Automata and Primitive Recursive Functions. +And then Part III will circle back to practical matters.

+

If math makes you slightly nauseous, you might want to skip Part II. But maybe give it a try? The math +we'll need will be baby math from first principles, without reference to any advanced results.

+
+ +

+ Part I: TL;DR +

+

Here's the key result: suppose you have a program in some Turing complete language, and you also +know that it's not too slow. Suppose it runs faster than +$O(2^{2^N})$. +That is, two to the power of two to the power of N, a very large number. In this case, you can +implement this algorithm in a non-Turing complete language.

+

Most practical problems fall into this "faster than two to the two to the power of N" space. +Hence it follows that you don't need the full power of a Turing Machine to tackle them. Hence, a +language not being Turing complete doesn't in any way restrict you in practice, or give you extra +powers to control the computation.

+

Or, to restate this: in practice, a program which doesn't terminate, and a program that needs a +billion billion steps to terminate are equivalent. Making something non-Turing complete by itself +doesn't help with the second problem in any way. And there's a trivial approach that solves the +first problem for any existing Turing-complete language: in the implementation, count the steps +and bail with an error after a billion.
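The "count the steps and bail" approach can be sketched in a few lines. The example below is only an illustration: it uses the Collatz iteration as a stand-in for an arbitrary interpreter loop, and the names Outcome and collatz_steps are made up for this sketch.

```rust
/// Result of a step-limited run.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    /// The computation finished after this many steps.
    Done(u64),
    /// The step budget was exhausted before the computation finished.
    OutOfSteps,
}

/// Runs the Collatz iteration starting from `n`, taking at most `budget`
/// steps. The loop body stands in for "one step of the interpreter".
/// (Overflow of `3 * n + 1` is ignored here; this is a sketch.)
fn collatz_steps(mut n: u64, budget: u64) -> Outcome {
    let mut steps = 0;
    while n != 1 {
        if steps == budget {
            return Outcome::OutOfSteps;
        }
        n = if n % 2 == 0 { n / 2 } else { 3 * n + 1 };
        steps += 1;
    }
    Outcome::Done(steps)
}
```

Starting from 0 the loop would never reach 1, but with the budget the call still returns, reporting OutOfSteps instead of hanging.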

+
+
+ +

+ Part II: Weird Machines +

+

The actual theoretical result is quite a bit more general than that. It is (unsurprisingly) +recursive:

+ +
+

If a function is computed by a Turing Machine, and the runtime of this machine is bounded by some +primitive recursive function of input, then the original function itself can be written as a +primitive recursive function.

+
+ +
+

It is expected that this sounds like gibberish at this point! So let's just go and prove this thing, +right here in this blog post! We'll work up slowly towards this result. The plan is as follows:

+
    +
  • +First, to brush up notation, we'll define Finite State Machines. +
  • +
  • +Second, we'll turn our humble Finite State Machine into the all-powerful Turing Machine (spoiler: +a Turing Machine is an FSM with a pair of stacks), and, as is customary, wave our hands about +the Universal Turing Machine. +
  • +
  • +Third, we leave the cozy world of imperative programming and define primitive recursive +functions. +
  • +
  • +Finally, we'll talk about the relative computational power of TMs and PRFs, including the teased +result and more! +
  • +
+
+
+ +

+ Finite State Machines +

+

Finite State Machines are simple! An FSM takes a string as input, and returns a binary +answer, "yes" or "no". Unsurprisingly an FSM has a finite number of states: Q0, Q1, ..., Qn. +A subset of states are designated as "yes" states, the rest are "no" states. There's also one +specific starting state.

+

The behavior of the state machine is guided by a transition (step) function, s. This function +takes the current state of FSM, the next symbol of input, and returns a new state.

+

The semantics of an FSM is determined by repeatedly applying the single step function for all symbols of +the input, and noting whether the final state is a "yes" state or a "no" state.

+

Here's an FSM which accepts only strings of zeros and ones of even length:

+ +
+ + +
States:     { Q0, Q1 }
+Yes States: { Q0 }
+Start State:  Q0
+
+s :: State -> Symbol -> State
+s Q0 0 = Q1
+s Q0 1 = Q1
+s Q1 0 = Q0
+s Q1 1 = Q0
+ +
+

This machine ping-pongs between states Q0 and Q1, and ends up in Q0 only for inputs of even length +(including an empty input).
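The even-length machine above translates almost mechanically into code. Here is a minimal sketch, where the names `TRANSITIONS`, `ACCEPTING`, `START`, and `accepts` are mine and the input is assumed to contain only the characters `0` and `1`:

```python
# The transition table is the whole "program" of the FSM.
TRANSITIONS = {
    ("Q0", "0"): "Q1",
    ("Q0", "1"): "Q1",
    ("Q1", "0"): "Q0",
    ("Q1", "1"): "Q0",
}
ACCEPTING = {"Q0"}
START = "Q0"


def accepts(text: str) -> bool:
    state = START
    # One transition per input symbol, so the run is linear in the input.
    for symbol in text:
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING
```

The loop body does a constant amount of work and there is nothing else to do after the input ends, which is why an FSM cannot loop forever.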

+

What can FSMs do? As they give a binary answer, they are recognizers: they don't compute +functions, but rather just characterize certain sets of strings. A famous result is that the +expressive power of FSMs is equivalent to the expressive power of regular expressions. If you can +write a regular expression for it, you could also do an FSM!

+

There are also certain things that state machines can't do. For example they can't enter an infinite +loop. Any FSM is linear in the input size and always terminates. But there are much more specific +sets of strings that couldn't be recognized by an FSM. Consider this set:

+ +
+ + +
1
+010
+00100
+0001000
+...
+ +
+

That is, an infinite set which contains 1s surrounded by an equal number of 0s on both +sides. Let's prove that there isn't a state machine that recognizes this set!

+

As usual, suppose there is such a state machine. It has a certain number of states: maybe a +dozen, maybe a hundred, maybe a thousand, maybe even more. But let's say fewer than a million. +Then, let's take a string which looks like a million zeros, followed by a one, followed by a million +zeros. And let's observe our FSM eating this particular string.

+

First of all, because the string is in fact a one surrounded by an equal number of zeros on both +sides, the FSM ends up in a "yes" state. Moreover, because the leading run of zeros alone is longer +than the number of states in the state machine, the state machine necessarily visits some state twice while still reading those zeros. +There is a cycle, where the machine goes from A to B to C to D and back to A. This cycle might be +pretty long, but it's definitely no longer than the total number of states we have.

+

And now we can fool the state machine. Let's make it eat our string again, but this time, once it +completes the ABCDA cycle, we'll force it to traverse this cycle again. That is, the original cycle +corresponds to some portion of our giant string:

+ +
+ + +
0000 0000000000000000000 00 .... 1 .... 00000
+     <- cycle portion ->
+ +
+

If we duplicate this portion, our string will no longer look like a one surrounded by an equal number of +zeros, but the state machine will still end up in the "yes" state. Which is a contradiction that completes +the proof.

+
+
+ +

+ Turing Machine: Definition +

+

A Turing Machine is only slightly more complex than an FSM. Like an FSM, a TM has a bunch of states +and a single-step transition function. While an FSM has an immutable input which is being fed to it +symbol by symbol, a TM operates with a mutable tape. The input gets written to the tape at the +start. At each step, a TM looks at the current symbol on the tape, changes its state according to a +transition function and, additionally:

+
    +
  • +Replaces the current symbol with a new one (which might or might not be different). +
  • +
  • +Moves the reading head that points at the current symbol one position to the left or to the right. +
  • +
+

When a machine reaches a designated halt state, it stops, and whatever is written on the tape at +that moment is the result. That is, while FSMs are binary recognizers, TMs are functions. Keep in +mind that a TM does not necessarily stop. It might be the case that a TM goes back and forth over the +tape, overwrites it, changes its internal state, but never quite gets to the final state.

+

Here's an example Turing Machine:

+ +
+ + +
States:  {A, B, C, H}
+Start State: A
+Final State: H
+
+s :: State -> Symbol -> (State, Symbol, Left | Right)
+s A 0 = (B, 1, Right)
+s A 1 = (H, 1, Right)
+s B 0 = (C, 0, Right)
+s B 1 = (B, 1, Right)
+s C 0 = (C, 1, Left)
+s C 1 = (A, 1, Left)
+ +
+

If the configuration of the machine looks like this:

+ +
+ + +
000010100000
+     ^
+     A
+ +
+

Then we are in the s A 0 = (B, 1, Right) case, so we should change the state to B, replace 0 with +1, and move to the right:

+ +
+ + +
000011100000
+      ^
+      B
+ +
+
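The single step shown above can be replayed in code. This is a sketch under a simplifying assumption: the tape is a finite Python list rather than an infinite one, and the names `RULES` and `step` are mine.

```python
# (state, symbol) -> (new state, symbol to write, head movement)
RULES = {
    ("A", 0): ("B", 1, +1),
    ("A", 1): ("H", 1, +1),
    ("B", 0): ("C", 0, +1),
    ("B", 1): ("B", 1, +1),
    ("C", 0): ("C", 1, -1),
    ("C", 1): ("A", 1, -1),
}


def step(state, tape, head):
    new_state, symbol, move = RULES[(state, tape[head])]
    # Overwrite the current cell, then move the head by one position.
    new_tape = tape[:head] + [symbol] + tape[head + 1:]
    return new_state, new_tape, head + move


tape = [int(c) for c in "000010100000"]
state, tape, head = step("A", tape, 5)
```

After this call the state is `B`, the head is at position 6, and the tape reads `000011100000`, matching the picture above.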
+
+ +

+ Turing Machine: Programming +

+

There are a bunch of fiddly details to Turing Machines!

+

The tape is conceptually infinite, so beyond the input, everything is just zeros. This creates a +problem: it might be hard to say where the input (or the output) ends! There are a couple of +technical solutions here. One is to say that there are three different symbols on the tape — +zeros, ones, and blanks, and require that the tape is initialized with blanks. A different solution +is to invent some encoding scheme. For example, we can say that the input is a sequence of 8-bit +bytes, without interior null bytes. So, eight consecutive zeros at a byte boundary designate the end +of input/output.

+

It's useful to think about how this byte-oriented TM could be implemented. We could have one large +state for each byte of input. So, Q142 would mean that the head is on the byte with value 142. And +then we'll have a bunch of small states to read out the current byte. E.g., we start reading a byte in +state S. Depending on the next bit we move to S0 or S1, then to S00, or S01, etc. Once we reach +something like S01111001, we move back 8 positions and enter state Q121. This is one of the patterns +of Turing Machine programming: while your main memory is the tape, you can represent some +constant amount of memory directly in the states.

+

What we've done here is essentially lowering a byte-oriented Turing Machine to a bit-oriented +machine. So, we could think only in terms of big states operating on bytes, as we know the general +pattern for converting that to direct bit-twiddling.

+

With this encoding scheme in place, we now can feed arbitrary files to a Turing Machine! Which will +be handy for the next observation:

+

You can't actually program a Turing Machine. What I mean is that, counter-intuitively, there isn't +some user-supplied program that a Turing Machine executes. Rather, the program is hard-wired into +the machine. The transition function is the program.

+

But with some ingenuity we can regain our ability to write programs. Recall that we've just learned +to feed arbitrary files to a TM. So what we could do is to write a text file that specifies a TM and +its input, and then feed that entire file as an input to an "interpreter" Turing Machine which would +read the file, and act as the machine specified there. A Turing Machine can have an eval +function.

+

Is such an "interpreter" Turing Machine possible? Yes! And it is not hard: if you spend a couple of hours +programming Turing Machines by hand, you'll see that you pretty much can do anything: +numbers, arithmetic, loops, control flow. It's just very very tedious.

+

So let's just declare that we've actually coded up this Universal Turing Machine which simulates a +TM given to it as an input in a particular encoding.

+

This sort of construct also gives rise to the Church-Turing thesis. We have a TM which can run other +TMs. And you can implement a TM interpreter in something like Python. And, with a bit of legwork, +you could also implement a Python interpreter as a TM (you likely want to avoid doing that +directly, and instead do a simpler interpreter for WASM, and then use a Python interpreter compiled +to WASM). This sort of bidirectional interpretation shows that Python and TMs have equivalent +computing power. Moreover, it's quite hard to come up with a reasonable computational device which +is more powerful than a Turing Machine.

+

There are computational devices that are strictly weaker than TMs though. Recall FSMs. By this point, +it should be obvious that a TM can simulate an FSM. Everything a Finite State Machine can do, a +Turing Machine can do as well. And it should be intuitively clear that a TM is more powerful than an +FSM. An FSM gets to use only a finite number of states. A TM has these same states, but it also possesses +a tape which serves as an infinitely sized external memory.

+

Directly proving that you can't encode a Universal Turing Machine as an FSM sounds complicated, +so let's prove something simpler. Recall that we have established that there's no FSM that accepts +only ones surrounded by an equal number of zeros on both sides (because a sufficiently large word +of this form would necessarily enter a cycle in a state machine, which could then be further pumped). +But it's actually easy to write a Turing Machine that does this:

+
    +
  • +Erase zero (at the left side of the tape) +
  • +
  • +Go to the right end of the tape +
  • +
  • +Erase zero +
  • +
  • +Go to the left side of the tape +
  • +
  • +Repeat +
  • +
  • +If what's left is a single 1 the answer is "yes", otherwise it is a "no". +
  • +
+

We found a specific problem that can be solved by a TM, but is out of reach of any FSM. So it +necessarily follows that there isn't an FSM that can simulate an arbitrary TM.
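The erase-from-both-ends procedure is easy to mimic outside of a Turing Machine. The sketch below is not a TM: it uses a `deque` as a stand-in for the tape, and the function name is made up. It only shows that the procedure itself is a correct recognizer for this set.

```python
from collections import deque


def is_one_between_equal_zeros(text: str) -> bool:
    tape = deque(text)
    # Erase one zero on the left and one on the right, and repeat.
    while len(tape) >= 2 and tape[0] == "0" and tape[-1] == "0":
        tape.popleft()
        tape.pop()
    # Accept only if exactly a single "1" is left.
    return list(tape) == ["1"]
```

The key difference from an FSM is the mutable, unbounded storage: the amount of "memory" used grows with the input, instead of being fixed by the number of states.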

+

It is also useful to take a closer look at the tape. It is a convenient skeuomorphic abstraction +which makes the behavior of the machine intuitive, but it is inconvenient to implement in a normal +programming language. There isn't a standard data structure that behaves just like a tape.

+

One cool practical trick is to simulate the tape as a pair of stacks. Take this:

+ +
+ + +
Tape: A B C D E F G
+Head:     ^
+ +
+

And transform it to something like this:

+ +
+ + +
Left Stack:  [A, B, C]
+Right Stack: [G, F, E, D]
+ +
+

That is, everything to the left of the head is one stack, everything to the right, reversed, is the +other. Here, moving the reading head left or right corresponds to popping a value off one stack and +pushing it onto another.
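A minimal sketch of this two-stack tape, following the layout above (the cell under the head is the top of the left stack). The `Tape` class and the `BLANK` symbol are my own names; padding with blanks when a stack runs dry is one way to model the infinite tape.

```python
BLANK = "_"


class Tape:
    def __init__(self, left, right):
        self.left = list(left)    # top (last item) is the cell under the head
        self.right = list(right)  # cells to the right of the head, reversed

    def read(self):
        return self.left[-1]

    def write(self, symbol):
        self.left[-1] = symbol

    def move_right(self):
        # Pop from the right stack, push onto the left one.
        self.left.append(self.right.pop() if self.right else BLANK)

    def move_left(self):
        # Pop from the left stack, push onto the right one.
        self.right.append(self.left.pop())
        if not self.left:
            self.left.append(BLANK)


tape = Tape("ABC", "GFED")
tape.move_right()
```

After `move_right()` the head is over `D`, the left stack is `[A, B, C, D]`, and the right stack is `[G, F, E]`.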

+

So, an equivalent-in-power definition would be to say that a TM is an FSM endowed with two +stacks.

+

This of course creates an obvious question: is an FSM with just one stack a thing? Yes! It would be +called a pushdown automaton, and it would correspond to context-free languages. But that's beyond +the scope of this post!

+

There's yet another way to look at the tape, or the pair of stacks, if the set of symbols is 0 and +1. You could say that a stack is just a number! So, something like +[1, 0, 1, 1] +will be +1 + 2 + 8 = 11. +Looking at the top of the stack is stack % 2, removing an item from the stack is stack / 2 and +pushing x onto the stack is stack * 2 + x. We won't need this right now, so just hold onto this +for a brief moment.
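The stack-as-a-number trick, spelled out. The function names are mine, and `/` from the text becomes integer division `//` in Python:

```python
def push(stack: int, bit: int) -> int:
    return stack * 2 + bit


def top(stack: int) -> int:
    return stack % 2


def pop(stack: int) -> int:
    return stack // 2


# Build [1, 0, 1, 1] by pushing the items in order; the last one is the top.
stack = 0
for bit in [1, 0, 1, 1]:
    stack = push(stack, bit)
```

Here `stack` ends up as 11. A pleasant side effect of this encoding is that the empty stack, 0, behaves like an endless supply of zeros: `top(0)` is 0 and `pop(0)` is still 0, which is exactly how the blank part of the tape should behave.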

+
+
+ +

+ Turing Machine: Limits +

+

Ok, so we have some idea about the lower bound for the power of a Turing Machine: FSMs are strictly +less expressive. What about the opposite direction? Is there some computation that a Turing Machine +is incapable of doing?

+

Yes! Let's construct a function which maps natural numbers to natural numbers, which can't be +implemented by a Turing Machine. Recall that we can encode an arbitrary Turing Machine as text. That +means that we can actually enumerate all possible Turing Machines, and write them in a giant line, +from the most simple Turing Machine to more complex ones:

+ +
+ + +
TM_0
+TM_1
+TM_2
+...
+TM_326
+...
+ +
+

This is of course going to be an infinite list.

+

Now, let's see how TM_0 behaves on input 0: it either prints something, or doesn't terminate. Then, +note how TM_1 behaves on input 1, and generalizing, create function f that behaves as the nth TM +on input n. It might look something like this:

+ +
+ + +
f(0) = 0
+f(1) = 111011
+f(2) = doesn't terminate
+f(3) = 0
+f(4) = 101
+...
+ +
+

Now, let's construct a function g which is maximally different from f: where f gives 0, g will +return 1, and it will return 0 in all other cases:

+ +
+ + +
g(0) = 1
+g(1) = 0
+g(2) = 0
+g(3) = 1
+g(4) = 0
+...
+ +
+

There isn't a Turing machine that computes g. For suppose there is. Then, it exists in our list of +all Turing Machines somewhere. Let's say it is TM_1000064. So, if we feed 0 to it, it will return +g(0), which is 1, which is different from f(0). And the same holds for 1, and 2, and 3. +But once we get to g(1000064), we are in trouble, because, by the definition of g, g(1000064) +is different from what is computed by TM_1000064! So such a machine is impossible.

+

Those who are math savvy might express this more succinctly: there's a countably-infinite number of +Turing Machines, and an uncountably-infinite number of functions. So there must be some functions +which do not have a corresponding Turing Machine. It is the same proof: the diagonalization +argument is hiding in the claim that the set of all functions is an uncountable set.

+

But this is super weird and abstract. Let's rather come up with some very specific problem which +isn't solvable by a Turing Machine. The halting problem: given source code for a Turing Machine and +its input, determine if the machine halts on this input eventually.

+

As we have waved our hands sufficiently vigorously to establish that Python and Turing Machines have +equivalent computational power, I am going to try to solve this in Python:

+ +
+ + +
def halts(program_source_code: str, program_input: str) -> bool:
+    # One million lines of readable, but somewhat
+    # unsettling and intimidating Python code.
+    return the_answer
+
+raw_input = input()
+[program_source_code, program_input] = parse(raw_input)
+print("Yes" if halts(program_source_code, program_input) else "No")
+ +
+

Now, I will do a weird thing and start asking whether a program terminates, if it is fed its own +source code, in a reverse-quine of sorts:

+ +
+ + +
def halts_on_self(program_source_code: str) -> bool:
+    program_input = program_source_code
+    return halts(program_source_code, program_input)
+ +
+

and finally I construct this weird beast of a program:

+ +
+ + +
def halts(program_source_code: str, program_input: str) -> bool:
+    # ...
+    return the_answer
+
+def halts_on_self(program_source_code: str) -> bool:
+    program_input = program_source_code
+    return halts(program_source_code, program_input)
+
+def weird(program_input):
+    if halts_on_self(program_input):
+        while True:
+            pass
+
+weird(input())
+ +
+

To make this even worse, I'll feed the text of this weird program to itself. Does it terminate +with this input? Well, if it terminates, and if our halts function is implemented correctly, then +the halts_on_self(program_input) invocation above returns True. But then we enter the infinite +loop and don't actually terminate.

+

Hence, it must be the case that weird does not terminate when self-applied. But then +halts_on_self returns False, and it should terminate. So we get a contradiction both ways. Which +necessarily means that either our halts sometimes returns a straight-up incorrect answer, or that it +sometimes does not terminate.

+

So this is the flip side of a Turing Machine's power: it is so powerful that it becomes impossible +to tell whether it'll terminate or not!

+

It actually gets much worse, because this result can be generalized to an unreasonable degree! +In general, there's very little we can say about arbitrary programs.

+

We can easily check syntactic properties (is the program text shorter than 4 kilobytes?), but they +are, in some sense, not very interesting, as they depend a lot on how exactly one writes a program. +It would be much more interesting to check some refactoring-invariant properties, which hold when +you change the text of the program, but leave the behavior intact. Indeed, "does this change +preserve behavior?" would be one very useful property to check!

+

So let's define two TMs to be equivalent, if they have identical behavior. That is, for each +specific input, either both machines don't terminate, or they both halt, and give identical results.

+

Then, our refactoring-invariant properties are, by definition, properties that hold (or do not hold) +for entire equivalence classes of TMs.

+

And a somewhat depressing result here is that there are no non-trivial refactoring-invariant +properties that you can algorithmically check.

+

Suppose we have some magic TM, called P, which checks such a property. Let's show that, using P, we can +solve the problem we know we cannot solve: the halting problem.

+

Consider a Turing Machine that is just an infinite loop and never terminates, M1. P might or might +not hold for it. But, because P is non-trivial (it holds for some machines and doesn't hold for some +machines), there's some different machine M2 which differs from M1 with respect to P. That is, +P(M1) xor P(M2) holds.

+

Let's use these M1 and M2 to figure out whether a given machine M halts on input I. Using a Universal +Turing Machine (interpreter), we can construct a new machine, M12, that just runs M on input I, then +erases the contents of the tape and runs M2. Now, if M halts on I, then the resulting machine M12 is +behaviorally-equivalent to M2. If M doesn't halt on I, then the result is equivalent to the infinite +loop program, M1. Or, in pseudo-code:

+ +
+ + +
def M1(input):
+    while True:
+        pass
+
+def M2(input):
+    # We don't actually know what's here
+    # but we know that such a machine exists.
+    ...
+
+assert(P(M1) != P(M2))
+
+def halts(M, I):
+    def M12(input):
+        M(I) # might or might not halt
+        return M2(input)
+
+    return P(M12) == P(M2)
+ +
+

This is pretty bad and depressing: we can't learn anything meaningful about an arbitrary Turing +Machine! So let's finally get to the actual topic of today's post:

+
+
+ +

+ Primitive Recursive Functions +

+

This is going to be another computational device, like FSMs and TMs. Like an FSM, it's going to be a +nice, always terminating, non-Turing complete device. But it will turn out to have quite a bit of +the power of a full Turing Machine!

+

However, unlike both TMs and FSMs, Primitive Recursive Functions are defined directly as +functions which take a tuple of natural numbers and return a natural number. The two simplest ones +are zero (that is, a zero-arity function that returns 0) and succ, a unary function that +just adds 1. Everything else is going to get constructed out of these two:

+ +
+ + +
zero = 0
+succ(x) = x + 1
+ +
+

One way we are allowed to combine these functions is by composition. So we can get all the constants +right off the bat:

+ +
+ + +
succ(zero) = 1
+succ(succ(zero)) = 2
+succ(succ(succ(zero))) = 3
+ +
+

We aren't going to be allowed to use general recursion (because it can trivially fail to terminate), +but we do get to use a restricted form of C-style loop. It is a bit fiddly to define formally! The +overall shape is LOOP(init, f, n).

+

Here, init and n are numbers: the initial value of the accumulator and the total number of +iterations. The f is a unary function that specifies the loop body: it takes the current value +of the accumulator and returns the new value. So

+ +
+ + +
LOOP(init, f, 0) = init
+LOOP(init, f, 1) = f(init)
+LOOP(init, f, 2) = f(f(init))
+LOOP(init, f, 3) = f(f(f(init)))
+ +
+

While this is similar to a C-style loop, the crucial difference here is that the total number of +iterations n is fixed up-front. There's no way to mutate the loop counter in the loop body.

+

This allows us to define addition:

+ +
+ + +
add(x, y) = LOOP(x, succ, y)
+ +
+

Multiplication is trickier. Conceptually, to multiply x and y, we want to LOOP from zero, and +repeat "add x" y times. The problem here is that we can't write an "add x" function yet:

+ +
+ + +
# Doesn't work, add is a binary function!
+mul(x, y) = LOOP(0, add, y)
+ +
+ +
+ + +
# Doesn't work either, no x in scope!
+add_x v = add(x, v)
+mul(x, y) = LOOP(0, add_x, y)
+ +
+

One way around this is to define LOOP as a family of operators, which can pass extra arguments to +the iteration function:

+ +
+ + +
LOOP0(init, f, 2) = f(f(init))
+LOOP1(c1, init, f, 2) = f(c1, f(c1, init))
+LOOP2(c1, c2, init, f, 2) = f(c1, c2, f(c1, c2, init))
+ +
+

That is, LOOP_N takes an extra N arguments, and passes them through to any invocation of the body +function. To express this idea a little bit more succinctly, let's just allow partially applying +the second argument of LOOP. That is:

+
    +
  • +All our functions are going to be first order. All arguments are numbers, the result is a number. +There aren't higher order functions, there aren't closures. +
  • +
  • +The LOOP is not a function in our language: it's a builtin operator, a keyword. So, for +convenience, we allow passing partially applied functions to it. But semantically this is +equivalent to just passing in extra arguments on each iteration. +
  • +
+

Which finally allows us to write

+ +
+ + +
mul(x, y) = LOOP(0, add x, y)
+ +
+

Ok, so that's progress: we made something as complicated as multiplication, and we still are in +the guaranteed-to-terminate land. Because each loop has a fixed number of iterations, everything +eventually finishes.

+

We can go on and define x^y:

+ +
+ + +
pow(x, y) = LOOP(1, mul x, y)
+ +
+

And this in turn allows us to define a couple of concerningly fast-growing functions:

+ +
+ + +
pow_2(n) = pow(2, n)
+pow_2_2(n) = pow_2(pow_2(n))
+ +
+
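Everything defined so far fits in a short Python sketch. `LOOP` is modeled as an ordinary `for` loop over a fixed `range`, partial application such as `add x` is modeled with a `lambda`, and `pow` is renamed to `power` to avoid shadowing the Python builtin; all of these are my choices for illustration.

```python
def LOOP(init, f, n):
    # The iteration count n is fixed before the loop starts.
    acc = init
    for _ in range(n):
        acc = f(acc)
    return acc


def succ(x):
    return x + 1


def add(x, y):
    return LOOP(x, succ, y)


def mul(x, y):
    # "add x" partially applied: the loop body only sees the accumulator.
    return LOOP(0, lambda acc: add(x, acc), y)


def power(x, y):
    return LOOP(1, lambda acc: mul(x, acc), y)


def pow_2_2(n):
    return power(2, power(2, n))
```

Because every number is ultimately built by counting with `succ`, this gets slow very quickly: `pow_2_2(3)` is 256 and returns instantly, but I would not try `pow_2_2(5)`.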

That's fun, but to do some programming, we'll need an if. We'll get to it, but first we'll need +some boolean operations. We can encode false as 0 and true as 1. Then

+ +
+ + +
and(x, y) = mul(x, y)
+ +
+

But or creates a problem: we'll need subtraction.

+ +
+ + +
or(x, y) = sub(
+  add(x, y),
+  mul(x, y),
+)
+ +
+

Defining sub is tricky, due to two problems:

+

First, we only have natural numbers, no negatives. This one is easy to solve: we'll just define +subtraction to saturate.

+

The second problem is more severe: I think we actually can't express subtraction given the set of +allowable operations so far. That is because all our operations are monotonic: the result is +never less than the arguments. One way to solve this problem is to define the LOOP in such a way +that the body function also gets passed a second argument: the current iteration. So, if you +iterate up to n, the last iteration will observe n - 1, and that would be the non-monotonic +operation that creates subtraction. But that seems somewhat inelegant to me, so instead I will just +add a pred function to the basis, and use that to add loop counters to our iterations.

+ +
+ + +
pred(0) = 0 # saturate
+pred(1) = 0
+pred(2) = 1
+...
+ +
+

Now we can say:

+ +
+ + +
sub(x, y) = LOOP(x, pred, y)
+
+and(x, y) = mul(x, y)
+or(x, y) = sub(
+  add(x, y),
+  mul(x, y)
+)
+not(x) = sub(1, x)
+
+if(cond, a, b) = add(
+  mul(a, cond),
+  mul(b, not(cond)),
+)
+ +
+

And now we can do a bunch of comparison operators:

+ +
+ + +
is_zero(x) = sub(1, x)
+
+# x >= y
+ge(x, y) = is_zero(sub(y, x))
+
+# x == y
+eq(x, y) = and(ge(x, y), ge(y, x))
+
+# x > y
+gt(x, y) = and(ge(x, y), not(eq(x, y)))
+
+# x < y
+lt(x, y) = gt(y, x)
+ +
+
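The saturating subtraction, booleans, and comparisons can be checked with a small sketch. To keep it short, `and_` and `if_` use Python's native `*` and `+` as stand-ins for `mul` and `add`, and the trailing underscores only exist to dodge Python keywords; those are my shortcuts.

```python
def LOOP(init, f, n):
    acc = init
    for _ in range(n):
        acc = f(acc)
    return acc


def pred(x):
    return x - 1 if x > 0 else 0  # saturates at zero


def sub(x, y):
    return LOOP(x, pred, y)


def not_(x):
    return sub(1, x)


def is_zero(x):
    return sub(1, x)


def and_(x, y):
    return x * y  # stand-in for mul(x, y)


def ge(x, y):
    return is_zero(sub(y, x))


def eq(x, y):
    return and_(ge(x, y), ge(y, x))


def gt(x, y):
    return and_(ge(x, y), not_(eq(x, y)))


def lt(x, y):
    return gt(y, x)


def if_(cond, a, b):
    return a * cond + b * not_(cond)  # stand-in for add(mul(..), mul(..))
```

Note that `if_` always evaluates both branches. That is fine here, because every branch is itself guaranteed to terminate.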

With that we could implement modulus. To compute x % m we will start with x, and will be +subtracting m until we get a number smaller than m. We'll need at most x iterations for that.

+

In pseudo-code:

+ +
+ + +
def mod(x, m):
+  current = x
+
+  for _ in 0..x:
+    if current < m:
+      current = current
+    else:
+      current = current - m
+
+  return current
+ +
+

And as a bona fide PRF:

+ +
+ + +
mod_iter(m, x) = if(
+  lt(x, m),
+  x,        # then
+  sub(x, m) # else
+)
+mod(x, m) = LOOP(x, mod_iter m, x)
+ +
+

That's a curious structure: rather than computing the modulo directly, we essentially search for +it using trial and error, relying on the fact that the search has a clear upper bound.
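The bounded search for the modulus can be run directly. In this sketch the comparison and subtraction inside `mod_iter` use native Python operators as stand-ins for `lt`, `if`, and `sub`, which is my shortcut to keep it readable.

```python
def LOOP(init, f, n):
    acc = init
    for _ in range(n):
        acc = f(acc)
    return acc


def mod_iter(m, x):
    # Once x drops below m, further iterations leave it unchanged.
    return x if x < m else x - m


def mod(x, m):
    # x iterations are always enough, and often far more than needed.
    return LOOP(x, lambda acc: mod_iter(m, acc), x)
```

For example, `mod(17, 5)` is 2. One quirk of this definition: `mod(x, 0)` quietly returns `x` instead of failing, since subtracting zero never changes the accumulator.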

+

Division can be done similarly: to divide x by y, start with 0, and then repeatedly add one to the +accumulator until the product of the accumulator and y exceeds x:

+ +
+ + +
div_iter x y acc = if(
+  le(mul(succ(acc), y), x), # le(a, b) = ge(b, a)
+  succ(acc), # then
+  acc        # else
+)
+div(x, y) = LOOP(0, div_iter x y, x)
+ +
+
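Division follows the same bounded-search shape. A sketch, assuming the loop body compares the candidate product against `x` (the number being divided) and again using native Python arithmetic as a stand-in for `le`, `mul`, and `succ`:

```python
def LOOP(init, f, n):
    acc = init
    for _ in range(n):
        acc = f(acc)
    return acc


def div_iter(x, y, acc):
    # Bump the quotient only while (acc + 1) * y still fits into x.
    return acc + 1 if (acc + 1) * y <= x else acc


def div(x, y):
    return LOOP(0, lambda acc: div_iter(x, y, acc), x)
```

For example, `div(17, 5)` is 3. Like the modulus, this is total: `div(x, 0)` returns `x` rather than raising an error, simply because the check always succeeds.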

This really starts to look like programming! One thing we are currently missing are data structures. +While our functions take multiple arguments, they only return one number. But it's easy enough to +pack two numbers into one: to represent an (a, b) pair, we'll use the number 2^a * 3^b:

+ +
+ + +
mk_pair(a, b) = mul(pow(2, a), pow(3, b))
+ +
+

To deconstruct such a pair into its first and second components, we need to find the maximum power +of 2 or 3 that divides our number. Which is exactly the same shape we used to implement div:

+ +
+ + +
max_factor_iter p m acc = if(
+  is_zero(mod(p, pow(m, succ(acc)))),
+  succ(acc), # then
+  acc,       # else
+)
+max_factor(p, m) = LOOP(0, max_factor_iter p m, p)
+
+fst(p) = max_factor(p, 2)
+snd(p) = max_factor(p, 3)
+ +
+

Here again we use the fact that the maximal power of two that divides p is not larger than p +itself, so we can over-estimate the number of iterations we'll need as p.
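A quick sketch of the pair encoding, keeping the deliberately wasteful "iterate p times" bound. Native Python arithmetic stands in for `mul`, `pow`, `mod`, and `is_zero`, which is my simplification.

```python
def mk_pair(a, b):
    return 2**a * 3**b


def max_factor(p, m):
    acc = 0
    # p iterations is a gross over-estimate, but it is a safe upper bound.
    for _ in range(p):
        if p % m ** (acc + 1) == 0:
            acc += 1
    return acc


def fst(p):
    return max_factor(p, 2)


def snd(p):
    return max_factor(p, 3)


pair = mk_pair(3, 2)
```

Here `pair` is 72, and `fst(pair)` and `snd(pair)` recover 3 and 2. Unpacking works because the factorization of a number into primes is unique.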

+

Using this pair construction, we can finally add a loop counter to our LOOP construct. To track +the counter, we pack it as a pair with the accumulator:

+ +
+ + +
LOOP(mk_pair(init, 0), f, n)
+ +
+

And then inside f, we first unpack that pair into accumulator and counter, pass them to actual loop +iteration, and then pack the result again, incrementing the counter:

+ +
+ + +
f acc = mk_pair(
+  g(fst(acc), snd(acc)),
+  succ(snd(acc)),
+)
+ +
+
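Putting the packing and unpacking together gives a loop whose body can see the current iteration. In this sketch `loop_with_counter` is a made-up name, and `max_factor` uses direct division instead of the bounded search, purely so that the example runs fast.

```python
def LOOP(init, f, n):
    acc = init
    for _ in range(n):
        acc = f(acc)
    return acc


def mk_pair(a, b):
    return 2**a * 3**b


def max_factor(p, m):
    # Shortcut: divide directly rather than searching up to p.
    count = 0
    while p > 0 and p % m == 0:
        p //= m
        count += 1
    return count


def fst(p):
    return max_factor(p, 2)


def snd(p):
    return max_factor(p, 3)


def loop_with_counter(init, g, n):
    def f(packed):
        # Unpack, run the real body, repack with the counter incremented.
        acc, i = fst(packed), snd(packed)
        return mk_pair(g(acc, i), i + 1)

    return fst(LOOP(mk_pair(init, 0), f, n))
```

For example, `loop_with_counter(0, lambda acc, i: acc + i, 5)` sums the counters 0 through 4 and gives 10.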

Ok, so we have achieved something remarkable: while we are writing terminating-by-construction +programs, which are definitely not Turing complete, we have constructed basic programming staples, +like boolean logic and data structures, and we have also built some rather complicated mathematical +functions, like 2^(2^N).

+

We could try to further enrich our little primitive recursive kingdom by adding more and more +functions on an ad hoc basis, but let's try to be really ambitious and go for the main prize: +simulating Turing Machines.

+

We know that we will fail: Turing machines can enter an infinite loop, but PRFs necessarily terminate. +That means that, if a PRF were able to simulate an arbitrary TM, it would have to say after a certain +finite number of steps that this TM doesn't terminate. And, while we didn't do this, it's easy to +see that you could simulate the other way around and implement PRFs in a TM. But that would give +us a TM algorithm to decide if an arbitrary TM halts, which we know doesn't exist.

+

So, this is hopeless! But we might still be able to learn something from failing.

+

Ok! So let's start with a configuration of a TM which we somehow need to encode into a single +number. First, we need the state variable proper (Q0, Q1, etc), which seems easy enough to represent +with a number. Then, we need a tape and a position of the reading head. Recall how we used a pair of +stacks to represent exactly the tape and the position. And recall that we can look at a stack of +zeros and ones as a number in binary form, where push and pop operations are implemented using %, +*, and /: exactly the operations we already can do. So, our configuration is just three +numbers: (S, stack1, stack2).

+

And, using the 2^a * 3^b * 5^c trick, we can pack this triple into just a single number. But that means we +could directly encode a single step of a Turing Machine:

+ +
+ + +
single_step(config) = if(
+  # if the state is Q0 ...
+  eq(fst(config), 0),
+
+  # and the symbol at the top of left stack is 0
+  if(is_zero(mod(snd(config), 2)),
+    mk_triple(
+      1,                    # move to state Q1
+      div(snd(config), 2),  # pop value from the left stack
+      mul(trd(config), 2),  # push zero onto the right stack
+    ),
+    ... # Handle symbol 1 in state Q0
+  ),
+  # otherwise, if the state is Q1 ...
+  if(eq(fst(config), 1),
+    ...
+  )
+)
+ +
+

And now we could plug that into our LOOP to simulate a Turing Machine running for N steps:

+ +
+ + +
n_steps initial_config n =
+  LOOP(initial_config, single_step, n)
+ +
+
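To see the whole pipeline work, here is a sketch that runs the example machine from the Turing Machine section (states A, B, C, H) for a fixed number of steps, with both halves of the tape stored as numbers. Two simplifications are mine: the configuration is a Python tuple rather than a single packed number, and the transition table is a dictionary rather than nested `if`s. The halt state is treated as a fixed point, so running "too many" steps is harmless.

```python
A, B, C, H = 0, 1, 2, 3
RIGHT, LEFT = +1, -1
RULES = {
    (A, 0): (B, 1, RIGHT), (A, 1): (H, 1, RIGHT),
    (B, 0): (C, 0, RIGHT), (B, 1): (B, 1, RIGHT),
    (C, 0): (C, 1, LEFT),  (C, 1): (A, 1, LEFT),
}


def LOOP(init, f, n):
    acc = init
    for _ in range(n):
        acc = f(acc)
    return acc


def single_step(config):
    state, left, right = config
    if state == H:
        return config  # halted: nothing changes any more
    new_state, written, move = RULES[(state, left % 2)]
    left //= 2  # pop the cell under the head
    if move == RIGHT:
        left = left * 2 + written    # put the rewritten cell back
        left = left * 2 + right % 2  # next cell becomes the head
        right //= 2
    else:
        right = right * 2 + written  # rewritten cell goes to the right
    return (new_state, left, right)


def n_steps(initial_config, n):
    return LOOP(initial_config, single_step, n)


# Tape 000010100000 with the head on the sixth cell, in state A.
start = (A, 0b000010, 0b000001)
```

With this starting configuration, `n_steps(start, 1)` gives `(B, 7, 0)`, which is the tape `000011100000` from the earlier picture, and the machine reaches the halt state after seven steps.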

The catch of course is that we can't know the N that's going to be enough. But we can make a very +good guess! We could do something like this:

+ +
+ + +
hopefully_enough_steps initial_config =
+  LOOP(initial_config, single_step, pow_2_2(initial_config))
+ +
+

That is, run for some large tower of exponents of the initial state. Which would be plenty for +normal algorithms, which are usually 2^N at worst!

+

Or, generalizing:

+ +
+

If a TM has a runtime which is bounded by some primitive-recursive function, then the entire +TM can be replaced with a PRF. Be advised that PRFs can grow really fast.

+
+ +
+

Which is the headline result we have set out to prove!

+
+
+ +

+ Primitive Recursive Functions: Limit +

+

It might seem that non-termination is the only principal obstacle. That anything that terminates at +all has to be implementable as a PRF. Alas, that's not so. Let's go and construct a function that is +surmountable by a TM, but is out of reach of PRFs.

+

We will combine the ideas of the impossibility proofs for FSMs (noting that if a function is +computed by some machine, that machine has a specific finite size) and TMs (diagonalization).

+

So, suppose we have some function f that can't be computed by a PRF. How would we go about proving +that? Well, we'd start with "suppose that we have a PRF P that computes f". And then we could +notice that P would have some finite size. If you look at it abstractly, the P is its syntax tree, +with lots of LOOP constructs, but it always boils down to some succs and zeros at the leaves. +Let's say that the depth of P is d.

+

And, actually, if you look at it, there are only a finite number of PRFs with depth at most d. Some +of them describe pretty fast growing functions. But probably there's a limit to how fast a function +can grow, given that it is computed by a PRF of depth d. Or, to use a concrete example: we have +constructed a PRF of depth 5 that computes two to the power of two to the power of N. Probably if we +were smarter, we could have squeezed a couple more levels into that tower of exponents. But +intuitively it seems that if you build a tower of, say, 10 exponents, that would grow faster than +any PRF of depth 5. And that this generalizes: for any fixed depth, there's a high-enough +tower of exponents that grows faster than any PRF with that depth.

+

So we could conceivably build an f that defeats our d-deep P. But that's not quite a victory +yet: maybe that f is feasible for d+2-deep PRFs! So here we'll additionally apply +diagonalization: for each depth, we'll build its own depth-specific nemesis f_d. And then we'll +define our overall function as

+ +
+ + +
a(n) = f_n(n)
+ +
+

So, for n large enough it'll grow faster than a PRF with any fixed depth.

+

So that's the general plan, the rest of the proof is basically just calculating the upper bound on the +growth of a PRF of depth d.

+

One technical difficulty here is that PRFs tend to have different arities:

+ +
+ + +
f(x, y)
+g(x, y, z, t)
+h(x)
+ +
+

Ideally, we'd use just one upper bound for them all. So we'll be looking for an upper bound of the +following form:

+ +
+ + +
f(x, y, z, t) <= A_d(max(x, y, z, t))
+ +
+

That is:

+
    +
  • +Compute the depth of f, d. +
  • +
  • +Compute the largest of its arguments. +
  • +
  • +And plug that into the unary function for depth d. +
  • +
+

Let's start with d=1. We have only primitive functions on this level, succ, zero, and pred, +so we could say that

+ +
+ + +
A_1(x) = x + 1
+ +
+

Now, let's handle an arbitrary other depth d + 1. In that case, our function is non-primitive, so at +the root of the syntax tree we have either a composition or a LOOP.

+

Composition would look like this:

+ +
+ + +
f(x, y, z, ...) = g(
+  h1(x, y, z, ...),
+  h2(x, y, z, ...),
+  h3(x, y, z, ...),
+)
+ +
+

where g and h_n are d deep and the resulting f is d+1 deep. We can immediately estimate +the h_n then:

+ +
+ + +
f(args...) <= g(
+  A_d(maxarg),
+  A_d(maxarg),
+  A_d(maxarg),
+  ...
+)
+ +
+

In this somewhat loose notation, args... stands for a tuple of arguments, and maxarg stands for +the largest one.

+

And then we could use the same estimate for g:

+ +
+ + +
f(args...) <= A_d(A_d(maxarg))
+ +
+

This is super high-order, so let's do a concrete example for a depth-2 two-argument function which +starts with a composition:

+ +
+ + +
f(x, y) <= A_1(A_1(max(x, y)))
+         = A_1(max(x, y) + 1)
+         = max(x, y) + 2
+ +
+

This sounds legit: if we don't use LOOP, then f(x, y) is either succ(succ(x)) or succ(succ(y)), +so max(x, y) + 2 indeed is the bound!

+

Ok, now the fun case! If the top-level node is a LOOP, then we have

+ +
+ + +
f(args...) = LOOP(
+  g(args...),
+  h(args...),
+  t(args...),
+)
+ +
+

This sounds complicated to estimate, especially due to that last t(args...) argument, which is the +number of iterations. So we'll be cowards and won't actually try to estimate this case. Instead, +we will require that our PRF is written in a simplified form, where the first and the last arguments +to LOOP are simple.

+

So, if your PRF looks like

+ +
+ + +
f(x, y) = LOOP(x + y, mul, pow2(x))
+ +
+

you are required to re-write it first as

+ +
+ + +
helper(u, v) = LOOP(u, mul, v)
+f(x, y) = helper(x + y, pow2(x))
+ +
+

So now we only have to deal with this:

+ +
+ + +
f(args...) = LOOP(
+  arg,
+  g(args...),
+  arg,
+)
+ +
+

f has depth d+1, g has depth d.

+

On the first iteration, we'll call g(args..., arg), which we can estimate as A_d(maxarg). That +is, g does get an extra argument, but it is one of the original arguments of f, and we are +looking at the maximum argument anyway, so it doesn't matter.

+

On the second iteration, we are going to call +g(args..., prev_iteration) +which we can estimate as +A_d(max(maxarg, prev_iteration)).

+

Now we plug our estimation for the first iteration:

+ +
+ + +
g(args..., prev_iteration)
+  <= A_d(max(maxarg, prev_iteration))
+  <= A_d(max(maxarg, A_d(maxarg)))
+  =  A_d(A_d(maxarg))
+ +
+

That is, the estimate for the first iteration is A_d(maxarg). The estimate for the second +iteration adds one more layer: A_d(A_d(maxarg)). For the third iteration we'll get +A_d(A_d(A_d(maxarg))).

+

So the overall thing is going to be smaller than A_d iteratively applied to itself some number of +times, where "some number" is one of f's original arguments. But no harm's done if we iterate up +to maxarg.

+

As a sanity check, the worst depth-2 function constructed with iteration is probably

+ +
+ + +
f(x, y) = LOOP(x, succ, y)
+ +
+

which is x + y. And our estimate gives x + 1 applied maxarg times to maxarg, which is 2 * +maxarg, which is indeed the correct upper bound!
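That sanity check can be replayed mechanically. Here is a minimal Python sketch; it assumes that LOOP(init, step, count) means "apply step to init, count times", and the helper names `loop`, `succ` and `add` are made up for illustration.

```python
def loop(init, step, count):
    """Assumed LOOP semantics: apply `step` to `init`, `count` times."""
    acc = init
    for _ in range(count):
        acc = step(acc)
    return acc


def succ(n):
    return n + 1


def add(x, y):
    # f(x, y) = LOOP(x, succ, y), which is x + y
    return loop(x, succ, y)


# The estimate from the text: x + y never exceeds 2 * max(x, y).
bound_holds = all(
    add(x, y) <= 2 * max(x, y)
    for x in range(30)
    for y in range(30)
)
```

The check only covers small arguments, of course; it is a spot check of the estimate, not a proof.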

+

Combining everything together, we have:

+ +
+ + +
A_1(x) = x + 1
+
+f(args...) <= max(
+  A_d(A_d(maxarg)),               # composition case
+  A_d(A_d(A_d(... A_d(maxarg)))), # LOOP case,
+   <-    maxarg A's         ->
+)
+ +
+

That max there is significant: although it seems like the second line, with maxarg +applications, is always going to be longer, maxarg, in fact, could be as small as zero. But we +can take maxarg + 2 repetitions to fix this:

+ +
+ + +
f(args...) <=
+  A_d(A_d(A_d(... A_d(maxarg)))),
+  <-    maxarg + 2 A's         ->
+ +
+

So let's just define A_{d+1}(x) to make that inequality work:

+ +
+ + +
A_{d+1}(x) = A_d(A_d( .... A_d(x)))
+            <- x + 2 A_d's in total->
+ +
+

Unpacking:

+

We define a family of unary functions A_d, such that each A_d grows faster than any n-ary PRF +of depth d. If f is a ternary PRF of depth 3, then f(1, 92, 10) <= A_3(92).

+

To evaluate A_d at point x, we use the following recursive procedure:

+
    +
  • +If d is 1, return x + 1. +
  • +
  • +Otherwise, evaluate A_{d-1} at point x to get, say, v. Then evaluate A_{d-1} again at +point v this time, yielding u. Then compute A_{d-1}(u). Overall, repeat this process x+2 +times, and return the final number. +
  • +
+

We can simplify this a bit if we stop treating d as a kind of function index, and instead say +that our A is just a function of two arguments. Then we have the following equations:

+ +
+ + +
A(1, x) = x + 1
+A(d + 1, x) = A(d, A(d, A(d, ..., A(d, x))))
+                <- x + 2 A_d's in total->
+ +
+

The last equation can be re-formatted as

+ +
+ + +
A(
+  d,
+  A(d, A(d, ..., A(d, x))),
+  <- x + 1 A_d's in total->
+)
+ +
+

And for non-zero x that is (up to the innermost argument, which becomes x - 1 instead of x) just

+ +
+ + +
A(
+  d,
+  A(d + 1, x - 1),
+)
+ +
+

So we get the following recursive definition for A(d, x):

+ +
+ + +
A(1, x) = x + 1
+A(d + 1, 0) = A(d, A(d, 0))
+A(d + 1, x) = A(d, A(d + 1, x - 1))
+ +
+

As a Python program:

+ +
+ + +
def A(d, x):
+  if d == 1: return x + 1
+  if x == 0: return A(d-1, A(d-1, 0))
+  return A(d-1, A(d, x - 1))
+ +
+

It's easy to see that computing A on a Turing Machine using this definition terminates: this +is a function with two arguments, and every recursive call uses a lexicographically smaller pair of +arguments. And we constructed A in such a way that A(d, x) as a function of x is larger than any +PRF with a single argument of depth d. But that means that the following function with one argument +a(x) = A(x, x)

+

grows faster than any PRF. And that's an example of a function which a Turing Machine has no +trouble computing (given sufficient time), but which is beyond the capabilities of PRFs.
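To get a feel for how quickly this explodes, here is a small Python sketch that follows the iterated definition literally (A_{d+1}(x) is A_d applied x + 2 times, starting from x) and then takes the diagonal. The names `A_iter` and `a` are mine, and only tiny inputs are attempted.

```python
def A_iter(d, x):
    """A_1(x) = x + 1; A_{d+1}(x) = A_d applied x + 2 times, starting from x."""
    if d == 1:
        return x + 1
    value = x
    for _ in range(x + 2):
        value = A_iter(d - 1, value)
    return value


def a(x):
    # The diagonal function: depth and argument grow together.
    return A_iter(x, x)


# Only tiny inputs are feasible: the values explode almost immediately.
small_values = [a(x) for x in (1, 2, 3)]
```

Because this unrolls the chain starting from x, its values are a bit larger than those of the recursive program (for instance, 2x + 2 rather than x + 2 at d = 2). Already a(4) is far out of reach of any practical computation.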

+
+
+ +

+ Part III, Descent From the Ivory Tower +

+

Remember, this is a three-part post! And we are finally at part 3! So let's circle back to the +practical matters. We have learned that:

+
    +
  • +Turing machines don't necessarily terminate. +
  • +
  • +While other computational devices, like FSMs and PRFs, can be made to always terminate, there's no +guarantee that they'll terminate fast. PRFs in particular can compute quite large functions! +
  • +
  • +And non-Turing complete devices can be quite expressive. For example, any real-world algorithm +that works on a TM can be adapted to run as a PRF. +
  • +
  • +Moreover, you don't even have to contort the algorithm much to make it fit. There's a universal +recipe for how to take something Turing complete and make it a primitive recursive function +instead: just add an iteration counter to the device, and forcibly halt it if the counter grows +too large. +
  • +
+
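The "add an iteration counter" recipe from the last bullet can be sketched in a few lines of Python. This is only an illustration under assumed names (`run_bounded`, `gcd_step`); the point is that the unbounded `while` becomes a `for` with a fixed trip count.

```python
def run_bounded(step, state, max_steps):
    """Run `step` until it reports completion or the budget is used up.

    `step` takes a state and returns (new_state, done). The `for` loop has a
    fixed iteration count, so this function terminates whatever `step` does
    (as long as each individual call to `step` returns).
    """
    for _ in range(max_steps):
        state, done = step(state)
        if done:
            return state
    return None  # forcibly halted: the counter grew too large


def gcd_step(state):
    # One step of Euclid's algorithm on a pair (a, b).
    a, b = state
    if b == 0:
        return (a, b), True
    return (b, a % b), False


finished = run_bounded(gcd_step, (48, 18), max_steps=100)
halted = run_bounded(gcd_step, (48, 18), max_steps=2)
```

With a generous budget the result is the same as the unbounded algorithm's; with a tiny budget the run is cut short and reports that instead of spinning.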

Or, more succinctly: there's no practical difference between a program that doesn't terminate, and +the one that terminates after a billion years. As a practitioner, if you think you need to solve the +first problem, you need to solve the second problem as well. And making your programming language +non-Turing complete doesn't really help with this.

+

And yet, there are a lot of configuration languages out there that use non-Turing completeness as +one of their key design goals. Why is that?

+

I would say that we are never interested in Turing-completeness per se. We usually want some much +stronger properties. And yet there's no convenient catchy name for that bag of features of a good +configuration language. So, "non-Turing-complete" gets used as a sort of rallying cry to signal that +something is a good configuration language, and maybe sometimes even to justify to others inventing +a new language instead of taking something like Lua. That is, the real reason why you want at +least a different implementation is all those properties you really need, but they are kinda hard to +explain, or at least much harder than "we can't use Python/Lua/JavaScript because they are +Turing-complete".

+

So what are the properties of a good configuration language?

+

First, we need the language to be deterministic. If you launch Python and type id([]), you'll +see some number. If you hit ^C, and then do this again, you'll see a different number. This is OK +for normal programming, but is usually anathema for configuration. Configuration is often used as a +key in some incremental, caching system, and letting in non-determinism there wreaks absolute chaos!

+

Second, you need the language to be well-defined. You can compile Python with ASLR disabled, and +use some specific allocator, such that id([]) always returns the same result. But that result +would be hard to predict! And if someone tries to do an alternative implementation, even if they +disable ASLR as well, they are likely to get a different deterministic number! Or the same could +happen if you just update the version of Python. So, the semantics of the language should be clearly +pinned-down by some sort of a reference, such that it is possible to guarantee not only +deterministic behavior, but fully identical behavior across different implementations.

+

Third, you need the language to be pure. If your configuration can access environment variables or +read files on disk, then the meaning of the configuration would depend on the environment where the +configuration is evaluated, and you again don't want that, to make caching work.

+

Fourth, a thing that is closely related to purity is security and sandboxing. The mechanism to +achieve both purity and security is the same: you don't expose general IO to your language. But +the purpose is different: purity is about not letting the results be non-deterministic, while +security is about not exposing access tokens to the attacker.

+

And now this gets tricky. One particular possible attack is a denial of service: sending some bad +config which makes our system just spin there burning the CPU. Even if you control all IO, you +are generally still open to these kinds of attacks. It might be OK to say this is outside of the +threat model: that no one would find it valuable enough to just burn your CPU, if they can't also +do IO, and that, even in the event that this happens, there's going to be some easy mitigation in the +form of a higher-level timeout.

+

But you also might choose to provide some sort of guarantees about execution time, and that's really +hard. Two approaches work. One is to make sure that processing is obviously linear. Not just +terminates, but is actually proportional to the size of inputs, and in a very direct way. If the +correspondence is not direct, then it's highly likely that it is in fact non-linear. The second +approach is to ensure metered execution: during processing, decrement a counter for every +simple atomic step and terminate processing when the counter reaches zero.
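Metered execution is easy to picture with a toy evaluator. The sketch below is not taken from any real configuration language; the expression format, `Meter` and `OutOfFuel` are all invented for illustration. Every visited node costs one unit of fuel, and evaluation aborts when the fuel is gone.

```python
class OutOfFuel(Exception):
    """Raised when the step budget is exhausted."""


class Meter:
    def __init__(self, fuel):
        self.fuel = fuel

    def tick(self):
        # Charge one unit per atomic step; refuse to continue at zero.
        if self.fuel == 0:
            raise OutOfFuel
        self.fuel -= 1


def evaluate(expr, meter):
    """Evaluate ints and ("add" | "mul", lhs, rhs) tuples under a budget."""
    meter.tick()
    if isinstance(expr, int):
        return expr
    op, lhs, rhs = expr
    left = evaluate(lhs, meter)
    right = evaluate(rhs, meter)
    if op == "add":
        return left + right
    if op == "mul":
        return left * right
    raise ValueError(f"unknown operator: {op}")


program = ("add", 1, ("mul", 2, 3))  # five nodes in total
result = evaluate(program, Meter(fuel=5))
```

The budget is counted in evaluator steps rather than wall-clock time, so the same config either fits or does not fit regardless of the machine it runs on.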

+

Finally, one more vague property you'd want from a configuration language is for it to be simple. +That is, to ensure that, when people use your language, they write simple programs. It seems to me +that this might actually be the case where banning recursion and unbounded loops could help, though +I am not sure. As we know from the PRF exercise, this won't actually prevent people from writing +arbitrary recursive programs. It'll just require some roundabout +code to do that. But maybe that'll be enough of a +speedbump to make someone invent a simple solution, instead of brute-forcing the most obvious one?

+

That's all for today! Have a great weekend, and remember:

+ +
+

Any algorithm that can be implemented by a Turing Machine such that its runtime is bounded by some +primitive recursive function of input can also be implemented by a primitive recursive function!

+
+ +
+
+
+
+ + + + + diff --git a/2024/08/12/std-io.html b/2024/08/12/std-io.html new file mode 100644 index 00000000..3afa81dd --- /dev/null +++ b/2024/08/12/std-io.html @@ -0,0 +1,161 @@ + + + + + + + STD Doesn't Have to Abstract OS IO + + + + + + + + + + + + +
+ +
+ +
+
+ +

STD Doesn't Have to Abstract OS IO

+

A short note on what goes into a language's standard library, and what's left for third-party +libraries to implement!

+

Usually, the main underlying driving factor here is cardinality. If it is important that there's +only one of a thing, it goes into std. If having many of a thing is a requirement, it is better +handled by a third-party library. That is, the usual physical constraint is that there's only a +single standard library, and everyone uses the same standard library. In contrast, there are many +different third-party libraries, and they all can be used at the same time.

+

So, until very recently, my set of rules of thumb for what goes into stdlib looked roughly like +this:

+
    +
  1. +If this is a vocabulary type, which will be used by APIs of different libraries, it should be in +the stdlib. +
  2. +If this is a cross-platform abstraction around an IO facility provided by an OS, and this IO +facility has a reasonable common subset across most OSes, it should be in the stdlib. +
  3. +If there's one obvious way to implement it, it might go to stdlib. +
+

So for example something like Vec goes +into a standard library, because all other libraries are going to use vectors at the interfaces.

+

Something like lazy_static +doesn't: while it is often needed, it is not a vocabulary interface type.

+

But it is acceptable for something like +OnceCell to be in std: +it is still not a vocabulary type, but, unlike lazy_static, it is clear that the API is more +or less optimal, and that there aren't that many good options to do this differently.

+

But I've changed my mind about the second bullet point, about facilities like file IO or TCP +sockets. I was always under the impression that these things are a must for a standard library. +But now I think that's not necessarily true!

+

Consider randomness. Not the PRNG kind of randomness you'd use to make a game fun, but a +cryptographically secure randomness that you'd use to generate an SSH key pair. This sort of +randomness ultimately bottoms out in hardware, and fundamentally requires talking to the OS and +doing IO. This is squarely the bullet point number 2. And Rust is an interesting case study here: it +failed to provide this abstraction in std, even though std itself actually needs it! But this turned +out to be mostly a non-issue in practice: a third-party crate, getrandom, took the job of +writing all the relevant bindings to various platform-specific APIs and using a bunch of conditional +compilation to abstract that all away and provide a nice cross-platform API.

+

So, no, it is not a requirement that std has to wrap any wrappable IOing API. This could be +handled by the library ecosystem, if the language allows first-class bindings to raw OS APIs +outside of compiler-privileged code (and Rust certainly allows for that).

+

So perhaps it won't be too unreasonable to leave even things like files and sockets to community +experimentation? In a sense, that is happening in the async land anyway.

+
+

To clarify, I still believe that Rust should provide bindings to OS-sourced crypto randomness, and +I am extremely happy to see recent motion in that area. But the reason for this belief changed. I no +longer feel the mere fact that OS-specific APIs are involved to be particularly salient. However, it +is still true that there's more or less one correct way to do +this.

+
+
+ + + + + diff --git a/2024/09/03/the-fundamental-law-of-dependencies.html b/2024/09/03/the-fundamental-law-of-dependencies.html new file mode 100644 index 00000000..76d0b2c3 --- /dev/null +++ b/2024/09/03/the-fundamental-law-of-dependencies.html @@ -0,0 +1,157 @@ + + + + + + + The Fundamental Law Of Software Dependencies + + + + + + + + + + + + +
+ +
+ +
+
+ +

The Fundamental Law Of Software Dependencies

+ +
+

Canonical source code for software should include checksums of the content of all its +dependencies.

+
+ +
+

Several examples of the law:

+

Software obviously depends on its source code. The law says that something should hold the hash of +the entire source, and thus mandates the use of a content-addressed version control system such as +git.

+

Software often depends on 3rd party libraries. These libraries could in turn depend on other +libraries. It is imperative to include a lockfile that covers this entire set and comes with +checksums. Curiously, the lockfile itself is a part of source code, and gets mixed into the VCS +root hash.

+

Software needs a compiler. The hash of the required compiler should be included in the lockfile. +Typically, this is not done: only the version is specified. I think that is a mistake. Specifying +a version and a hash is not much more trouble than just the version, but that gives you a superpower: +you no longer need to trust the party that distributes your compiler. You could take a shady +blob of bytes you've found lying on the street, as long as its checksum checks out.

+

Note that you can compress hashes by mixing them. For the compiler use-case, there's a separate hash per +platform, because the Linux and the Windows versions of the compiler differ. This doesn't mean that +your project should include one compiler's hash per platform, one hash is enough. The compiler +distribution should include a manifest: a small text file which lists all platforms and their +platform-specific hashes. The single hash of that file is what is to be included by downstream +consumers. To verify a specific binary, the consumer first downloads a manifest, checks that it +has the correct hash, and then extracts the hash for the specific platform.
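The two-step check is short enough to sketch in Python. Everything specific here is an assumption made for illustration: the manifest format (one `<platform> <sha256>` pair per line), the choice of SHA-256, and the names `verify_binary` and `sha256_hex`.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_binary(pinned_manifest_hash: str, manifest: bytes,
                  platform: str, binary: bytes) -> None:
    # Step 1: the manifest itself must match the single hash we pinned.
    if sha256_hex(manifest) != pinned_manifest_hash:
        raise ValueError("manifest does not match the pinned hash")
    # Step 2: look up the per-platform hash inside the trusted manifest.
    # Hypothetical format: one "<platform> <sha256>" pair per line.
    per_platform = dict(line.split() for line in manifest.decode().splitlines())
    # Step 3: the downloaded binary must match its platform's entry.
    if sha256_hex(binary) != per_platform[platform]:
        raise ValueError("binary does not match the manifest")


linux_blob = b"pretend linux compiler"
windows_blob = b"pretend windows compiler"
manifest = (
    f"linux {sha256_hex(linux_blob)}\n"
    f"windows {sha256_hex(windows_blob)}\n"
).encode()
pinned = sha256_hex(manifest)  # the one hash a project would commit

verify_binary(pinned, manifest, "linux", linux_blob)  # passes silently
```

Neither the manifest nor the binary needs to come from a trusted source: only the pinned hash does, and that lives in the project's own repository.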

+
+

The law is an instrumental goal. By themselves, hashes are not that useful. But to get to the point +where you actually know the hashes requires:

+ +

These things are what actually make developing software easier.

+
+
+ + + + + diff --git a/2024/09/06/fix-one-level-deeper.html b/2024/09/06/fix-one-level-deeper.html new file mode 100644 index 00000000..4726adc9 --- /dev/null +++ b/2024/09/06/fix-one-level-deeper.html @@ -0,0 +1,154 @@ + + + + + + + Try to Fix It One Level Deeper + + + + + + + + + + + + +
+ +
+ +
+
+ +

Try to Fix It One Level Deeper

+

I had a productive day today! I did many different and unrelated things, but they all had the same +unifying theme:

+

There's a bug! And it is sort-of obvious how to fix it. But if you don't laser-focus on that, and +try to perceive the surrounding context, it turns out that the bug is valuable, and it is pointing +in the direction of a bigger related problem. So, instead of fixing the bug directly, a detour is +warranted to close off the avenue for a class of bugs.

+

Here are the examples!

+

In the morning, my colleague pointed out that we are giving a substandard error message for a pretty +stressful situation when the database runs out of disk space. I went ahead and added appropriate log +messages to make it clearer. But then I stopped for a moment and noticed that the problem is bigger: +we are missing an infrastructure for fatal errors, and NoSpaceLeft is just one of a kind. So I +went ahead and added that along the way: +#2289.

+

Then, I was reviewing a PR by @martinconic which was fixing some typos, and noticed that it was +also changing the formatting of our Go code. The latter is by far the biggest problem, as it is the +sign that we somehow are not running gofmt during our CI, which I fixed in +#2287.

+

Then, there was a PR from yesterday, where we again had a not quite right log message. The cause was +a confusion between two compile-time configuration parameters, which were close, but not quite +identical. So, instead of fixing the error message I went ahead and made the two parameters +exactly the same. But then my colleague noticed that I actually failed to fix it one level deeper +in this case! Turns out, it is possible to remove this compile-time parametrization altogether, +which I did in #2292.

+

But these all were randomly-generated side quests. My intended story line for today was to refactor +the piece of code I had trouble explaining (and understanding!) on yesterday's +episode +of Iron Beetle. To get into the groove, I decided to first refactor the code that calls the +problematic piece of logic, as I noticed a couple of minor stylistic problems there. Of course, when +doing that, I discovered that we have a bit of dead code, which luckily doesn't affect correctness, +but does obscure the logic. While fixing that, I used one of my favorite Zig patterns: +defer assert(postcondition);

+

It of course failed in the simulator in a way postcondition checks tend to fail: there was an +unintended reentrancy in the code. So I slacked my colleague something like

+ +
+

I thought myself to be so clever adding this assert, but now it fails and I have to fix it TT +I think I'll just go and .next_tick the prefetch path. It feels like there should be a more +elegant solution here, but I am not seeing it.

+
+ +
+

But of course I can't just go and .next_tick it, so here I am, trying to figure out how to +encode a Duff's device in Zig +pre-#8220, so as to make this class of issues much +less likely.

+
+
+ + + + + diff --git a/2024/09/23/what-is-io-uring.html b/2024/09/23/what-is-io-uring.html new file mode 100644 index 00000000..a787e2be --- /dev/null +++ b/2024/09/23/what-is-io-uring.html @@ -0,0 +1,142 @@ + + + + + + + What is io_uring? + + + + + + + + + + + + +
+ +
+ +
+
+ +

What is io_uring?

+

An attempt at a concise explanation of what io_uring is.

+

io_uring is a new Linux kernel interface for making system calls. +Traditionally, syscalls are submitted to the kernel individually and +synchronously: a syscall CPU instruction transfers control from the +application to the kernel; control returns to the application only when the +syscall is completed. In contrast, io_uring is a batched and asynchronous +interface. The application submits several syscalls by writing their codes & +arguments to a lock-free shared-memory ring buffer. The kernel reads the +syscalls from this shared memory and executes them at its own pace. To +communicate results back to the application, the kernel writes the results to a +second lock-free shared-memory ring buffer, where they become available to the +application asynchronously.
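The shape of the interface (two queues, batched submission, results picked up later) can be mimicked with a toy model in plain Python. To be clear, this is not the io_uring API and does not talk to the kernel at all: `ToyRing` and its methods are invented here, and ordinary Python callables stand in for syscalls.

```python
from collections import deque


class ToyRing:
    """A toy stand-in for the submission/completion queue pair."""

    def __init__(self):
        self.submission = deque()  # application -> "kernel"
        self.completion = deque()  # "kernel" -> application

    def submit(self, user_data, op, *args):
        # The application only records the request; nothing runs yet.
        self.submission.append((user_data, op, args))

    def kernel_run(self):
        # The "kernel" drains the submission queue at its own pace and
        # posts one completion per request, tagged with its user_data.
        while self.submission:
            user_data, op, args = self.submission.popleft()
            self.completion.append((user_data, op(*args)))

    def reap(self):
        # The application collects whatever has completed so far.
        results = list(self.completion)
        self.completion.clear()
        return results


ring = ToyRing()
ring.submit(1, len, "abc")
ring.submit(2, max, 3, 9)
before = ring.reap()   # nothing has completed yet
ring.kernel_run()
after = ring.reap()
```

The `user_data` tag is what lets the application match a completion back to the request that caused it, since results arrive separately from submissions.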

+

You might want to use io_uring if:

+ +

You might want to avoid io_uring if:

+ +
+
+ + + + + diff --git a/2024/09/24/watermelon-operator.html b/2024/09/24/watermelon-operator.html new file mode 100644 index 00000000..17e05821 --- /dev/null +++ b/2024/09/24/watermelon-operator.html @@ -0,0 +1,862 @@ + + + + + + + The Watermelon Operator + + + + + + + + + + + + +
+ +
+ +
+
+ +

The Watermelon Operator

+

In these two most excellent articles, +https://without.boats/blog/let-futures-be-futures +and +https://without.boats/blog/futures-unordered, +withoutboats introduces the concepts of multi-task and intra-task concurrency. +I want to revisit this distinction: while I agree that there are different classes +of patterns of concurrency here, I am not quite satisfied with this specific partitioning of the +design space. I will use Rust-like syntax for most of the examples, but I am more interested in the +language-agnostic patterns, rather than in Rust's specific implementation of async.

+
+ +

+ The Two Examples +

+

Let's introduce the two kinds of concurrency using a somewhat abstract example. We want to handle a +Request by doing some computation and then persisting the results in the database and in the cache. +Notably, writes to the cache and to the database can proceed concurrently. So, something like this:

+ +
+ + +
async fn process(
+  db: Database,
+  cache: Cache,
+  request: Request,
+) -> Response {
+  let response = compute_response(db, cache, request).await;
+  spawn(update_db(db, response));
+  spawn(update_cache(cache, response));
+  response
+}
+
+async fn update_db(db: Database, response: Response);
+async fn update_cache(cache: Cache, response: Response);
+
+fn spawn<T>(f: impl Future<Output = T>) -> JoinHandle<T>;
+ +
+

This is multi-task concurrency style: we fire off two tasks for updating the database and the +cache. Here's the same snippet in intra-task style, where we use the join function on futures:

+ +
+ + +
async fn process(
+  db: Database,
+  cache: Cache,
+  request: Request,
+) -> Response {
+  let response = compute_response(db, cache, request).await;
+  join(
+    update_db(db, response),
+    update_cache(cache, response),
+  ).await;
+  response
+}
+
+async fn update_db(db: Database, response: Response) { ... }
+async fn update_cache(cache: Cache, response: Response) { ... }
+
+async fn join<U, V>(
+  f: impl Future<Output = U>,
+  g: impl Future<Output = V>,
+) -> (U, V);
+ +
+

In other words:

+

Multi-task concurrency uses spawn: an operation that takes a future and starts a task that +executes independently of the parent task.

+

Intra-task concurrency uses join: an operation that takes a pair of futures and executes them +concurrently as a part of the current task.

+

But what is the actual difference between the two?

+
+
+ +

+ Parallelism is not +

+

One candidate is parallelism: with spawn, the tasks can run not only concurrently, but actually +in parallel, on different CPU cores. join restricts them to the same thread that runs the main +task. But I think this is not quite right, abstractly, and is more of a product of specific Rust +APIs. There are executors which spawn on the current thread only. And, while in Rust it's not +really possible to make join poll the futures in parallel, I think this is just an artifact of +Rust's existing API design (futures can't opt out of synchronous cancellation). In other words, I +think it is possible in theory to implement an async runtime which provides all of the following +functions at the same time:

+ +
+ + +
fn spawn<F>(fut: F) -> JoinHandle<Output = F::Output>
+where
+  F: Future;
+
+fn pspawn<F>(fut: F) -> PJoinHandle<Output = F::Output>
+where
+  F: Future + Send + 'static,
+  F::Output: Send + 'static;
+
+async fn join<F1, F2>(
+  fut1: F1,
+  fut2: F2,
+) -> (F1::Output, F2::Output)
+where
+  F1: Future,
+  F2: Future;
+
+async fn pjoin<F1, F2>(
+  fut1: F1,
+  fut2: F2,
+) -> (F1::Output, F2::Output)
+where
+  F1: Future + Send, // NB: only Send, no 'static
+  F1::Output:  Send,
+  F2: Future + Send,
+  F2::Output:  Send;
+ +
+

To confuse matters further, let's rewrite our example in TypeScript:

+ +
+ + +
async function process(
+  db: Database,
+  cache: Cache,
+  request: Request,
+): Promise<Response> {
+  const response = await compute_response(db, cache, request);
+  const db_update = update_db(db, response);
+  const cache_update = update_cache(cache, response);
+  await Promise.all([db_update, cache_update]);
+  return response
+}
+ +
+

and using Rust's rayon library:

+ +
+ + +
fn process(
+  db: Database,
+  cache: Cache,
+  request: Request,
+) -> Response {
+  let response = compute_response(db, cache, request);
+  rayon::join(
+    || update_db(db, response),
+    || update_cache(cache, response),
+  );
+  response
+}
+ +
+

Are these examples multi-task or intra-task? To me, the TypeScript one feels multi-task: although +it is syntactically close to join().await, the two update promises are running independently from +the parent task. If we forget the call to Promise.all, the cache and the database would still get +updated (but likely after we would have returned the response to the user)! In contrast, rayon +feels intra-task: although the closures could get stolen and be run by a different thread, they +won't escape the dynamic extent of the encompassing process call.

+
+
+ +

+ To await or await to? +

+

Let's zoom in onto the JS and the join examples:

+ +
+ + +
async function process(
+  db: Database,
+  cache: Cache,
+  request: Request
+): Promise<Response> {
+  const response = await compute_response(db, cache, request);
+
+  await Promise.all([
+    update_db(db, response),
+    update_cache(cache, response),
+  ]);
+
+  return response;
+}
+ +
+ +
+ + +
async fn process(
+  db: Database,
+  cache: Cache,
+  request: Request,
+) -> Response {
+  let response = compute_response(db, cache, request).await;
+
+  join(
+    update_db(db, response),
+    update_cache(cache, response),
+  ).await;
+
+  response
+}
+ +
+

I've re-written the JavaScript version to be syntactically isomorphic to the Rust one. The +difference is on the semantic level: JavaScript promises are eager, they start executing as soon as +a promise is created. In contrast, Rust futures are lazy: they do nothing until polled. And +this I think is the fundamental difference, it is lazy vs. eager futures (thread::spawn is an +eager future while rayon::join is a lazy one).
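The same split can be observed in Python's asyncio, which happens to offer both flavors: calling an `async def` function gives a lazy coroutine object, while wrapping it in `asyncio.create_task` schedules it eagerly. The sketch below is only an analogy to the Rust and JavaScript behavior described above, with made-up names (`update`, `log`).

```python
import asyncio

log = []


async def update(name):
    log.append(f"start {name}")
    await asyncio.sleep(0)
    log.append(f"done {name}")


async def main():
    lazy = update("lazy")  # a coroutine object: nothing runs yet
    eager = asyncio.create_task(update("eager"))  # scheduled on the event loop
    await asyncio.sleep(0.01)  # let the event loop run scheduled tasks
    snapshot = list(log)  # what has happened without awaiting either one
    await lazy  # only now does the lazy one start
    await eager
    return snapshot


snapshot = asyncio.run(main())
```

At the time of the snapshot only the eager task has made progress; the lazy coroutine runs entirely inside the `await` that consumes it.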

+

And it seems that lazy semantics is quite a bit more elegant! The beauty of

+ +
+ + +
join(
+  update_db(db, response),
+  update_cache(cache, response),
+).await;
+ +
+

is that it's Molière's prose: this is structured concurrency, but without bundles, nurseries, +scopes, and other weird APIs.

+

It makes runtime semantics nicer even in dynamically typed languages. In JavaScript, forgetting an +await is a common, and very hard to spot problem: without await, the code still works, but is +sometimes wrong (if the async operation doesn't finish quite as fast as usual). Imagine JS with +lazy promises: there, forgetting an await would always consistently break. So, the need to +statically lint missing awaits would be less pressing.

+

Compare this with Erlang's take on nulls: while in typical dynamically typed languages partial +functions can return a value T or a None, in Erlang the convention is to return either {ok, T} +or none. That is, even if the value is non-null, the call-site is forced to unpack it, you can't +write code that happens to work as long as T is non-null.

+

And of course, in Rust, the killer feature of lazy futures is that you can just borrow data from the +enclosing scope.

+

But it seems like there is one difference between multi-task and intra-task concurrency.

+
+
+ +

+ One, Two, N, and More +

+

In the words of withoutboats:

+ +
+

The first limitation is that it is only possible to achieve a static arity of concurrency with +intra-task concurrency. That is, you cannot join (or select, etc) an arbitrary number of futures +with intra-task concurrency: the number must be fixed at compile time.

+
+ +
+

That is, you can do +join(a, b).await, +and

+ +
+ + +
join(
+  join(a, b),
+  c,
+).await
+ +
+

and, with some macros, even

+ +
+ + +
join!(a, b, c, d, e, f).await;
+ +
+

but you can't do join(xs...).await.

+

I think this is incorrect, in a trivial and in an interesting way.

+

The trivial incorrectness is that there's join_all, that takes a slice of futures and is a direct +generalization of join to a runtime-variable number of futures.

+

But join_all still can't express the case where you don't know the number of futures up-front, +where you spawn some work, and only later realize that you need to spawn some more.

+

This is sort-of possible to express with FuturesUnordered, but that's a yuck API. I mean, even +its name screams “DO NOT USE ME!”.

+

But I do think that this is just an unfortunate API, and that the pattern actually can be expressed +in intra-task concurrency style nicely.

+

Let's take a closer look at the base case, join!

+
+
+ +

+ Asynchronous Semicolon +

+

Section title is a bit of a giveaway. The join operator is async ;. The semicolon is an +operator of sequential composition: +A; B

+

runs A first and then B.

+

In contrast, join is concurrent composition: +join(A, B)

+

runs A and B concurrently.
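To make "concurrently" concrete, here is a hand-rolled sketch of a two-way join using only the standard library. It is not the implementation from the futures crate: `block_on` (a toy busy-polling executor), `yield_now` and `interleaving_demo` are names invented for this illustration.

```rust
use std::cell::RefCell;
use std::future::{poll_fn, Future};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Toy busy-polling executor with a no-op waker (an assumption of this sketch).
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// Returns `Pending` exactly once, giving sibling futures a chance to run.
async fn yield_now() {
    let mut yielded = false;
    poll_fn(|cx| {
        if yielded {
            return Poll::Ready(());
        }
        yielded = true;
        cx.waker().wake_by_ref(); // Ask to be polled again.
        Poll::Pending
    })
    .await
}

// Concurrent composition: poll both futures until both have finished.
async fn join<A: Future, B: Future>(a: A, b: B) -> (A::Output, B::Output) {
    let mut a = pin!(a);
    let mut b = pin!(b);
    let (mut ra, mut rb) = (None, None);
    poll_fn(move |cx| {
        if ra.is_none() {
            if let Poll::Ready(out) = a.as_mut().poll(cx) {
                ra = Some(out);
            }
        }
        if rb.is_none() {
            if let Poll::Ready(out) = b.as_mut().poll(cx) {
                rb = Some(out);
            }
        }
        if ra.is_some() && rb.is_some() {
            return Poll::Ready((ra.take().unwrap(), rb.take().unwrap()));
        }
        Poll::Pending
    })
    .await
}

// Both strands borrow `log` from the enclosing stack frame; no `'static` needed.
fn interleaving_demo() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    block_on(join(
        async {
            log.borrow_mut().push("a1");
            yield_now().await;
            log.borrow_mut().push("a2");
        },
        async {
            log.borrow_mut().push("b1");
            yield_now().await;
            log.borrow_mut().push("b2");
        },
    ));
    log.into_inner()
}
```

With `A; B` the log would read a1, a2, b1, b2. With this join the two strands interleave at the yield point, while still borrowing data from the parent frame.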

+

And both join and ; share the same problem: they can compose only a fixed number of things.

+

But that's why we have other operators for sequential composition! If we know how many things we +need to run, we can use a counted for loop. And join_all is an analogue of a counted for loop!

+

In the case where we don't know up-front when to stop, we use a while. And this is exactly what we +miss: there's no concurrently-flavored while operator.

+

Importantly, what we are looking for is not an async for:

+ +
+ + +
async for x in iter {
+  process(x).await;
+}
+ +
+

Here, although there could be some concurrency inside a single loop iteration, the iterations +themselves are run sequentially. The second iteration starts only when the first one has finished. +Pictorially, this looks like a spiral, or a loop if we look from the side:

+ +
+ + +
+

What we rather want is to run many copies of the body concurrently, something like this:

+ +
+ + +
+

A spindle-like shape with many concurrent strands, which looks like a wheel's spokes from the side. +Or, if you are really short on fitting metaphors:

+
+
+ +

+ The Watermelon Operator +

+

Now, I understand that I've already poked fun at the unfortunate FuturesUnordered name, but I can't +really find a fitting name for the construct we want here. So I am going to boringly use the +concurrently keyword, which is way too long, but I'll refer to it as “the watermelon operator”. +The stripes on the watermelon resemble the independent strands of execution this operator creates:

+ +
+ +wikipedia watermelons +
+

So, if you are writing a TCP server, your accept loop could look like this:

+ +
+ + +
concurrently let Some(socket) = listener.accept().await in {
+  handle_connection(socket).await;
+}.await
+ +
+

This runs accept in a loop, and, for each accepted socket, runs handle_connection concurrently. +There are as many concurrent handle_connection calls as there are ready sockets in our listener!

+

Let's limit the maximum number of concurrent connections, to provide back pressure:

+ +
+ + +
let semaphore = Semaphore::new(16);
+
+concurrently
+  let Some((socket, permit)) = try {
+    let permit = semaphore.acquire().await;
+    let socket = listener.accept().await?;
+    (socket, permit)
+  }
+in {
+  handle_connection(socket).await;
+  drop(permit);
+}.await
+ +
+

You get the idea (hopefully):

+
    +
  • +In the head of our concurrent loop (cooloop?) construct, we first acquire a semaphore permit +and then fetch a socket. +
  • +
  • +Both the socket and the permit are passed to the body. +
  • +
  • +The body releases the permit at the end. +
  • +
  • +While the head construct runs in a loop concurrently to bodies, it is throttled by the minimum +of the available permits and ready connections. +
  • +
+

To make this more concrete, let's spell this out as a library +function:

+ +
+ + +
async fn join<F1, F2>(
+  fut1: F1,
+  fut2: F2,
+) -> (F1::Output, F2::Output)
+where
+  F1: Future,
+  F2: Future;
+
+async fn join_all<F>(futs: Vec<F>) -> Vec<F::Output>
+where
+  F: Future;
+
+async fn concurrently<C, FC, B, FB, T>(condition: C, body: B)
+where
+  C: FnMut() -> FC,
+  FC: Future<Output = Option<T>>,
+  B: FnMut(T) -> FB,
+  FB: Future<Output = ()>;
+ +
+
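For illustration, here is one way the concurrently signature could be given a body using only the standard library. This is a sketch under loud assumptions: it boxes every strand on the heap, it relies on a toy busy-polling `block_on`, and `yield_now` and `watermelon_demo` are names invented here.

```rust
use std::cell::{Cell, RefCell};
use std::future::{poll_fn, Future};
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Toy busy-polling executor with a no-op waker (an assumption of this sketch).
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// Returns `Pending` exactly once, giving sibling futures a chance to run.
async fn yield_now() {
    let mut yielded = false;
    poll_fn(|cx| {
        if yielded {
            return Poll::Ready(());
        }
        yielded = true;
        cx.waker().wake_by_ref(); // Ask to be polled again.
        Poll::Pending
    })
    .await
}

async fn concurrently<C, FC, B, FB, T>(mut condition: C, mut body: B)
where
    C: FnMut() -> FC,
    FC: Future<Output = Option<T>>,
    B: FnMut(T) -> FB,
    FB: Future<Output = ()>,
{
    // Heap allocation is an assumption of this sketch, not a requirement of the operator.
    let mut head: Option<Pin<Box<FC>>> = Some(Box::pin(condition()));
    let mut bodies: Vec<Pin<Box<FB>>> = Vec::new();
    poll_fn(move |cx| {
        // Drive the head: every `Some(item)` starts a new body strand.
        while let Some(fut) = head.as_mut() {
            let polled = fut.as_mut().poll(cx);
            match polled {
                Poll::Ready(Some(item)) => {
                    bodies.push(Box::pin(body(item)));
                    head = Some(Box::pin(condition()));
                }
                Poll::Ready(None) => head = None,
                Poll::Pending => break,
            }
        }
        // Drive every running body, dropping the ones that have finished.
        bodies.retain_mut(|fut| fut.as_mut().poll(cx).is_pending());
        if head.is_none() && bodies.is_empty() {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    })
    .await
}

fn watermelon_demo() -> Vec<String> {
    let next = Cell::new(0u32);
    let log = RefCell::new(Vec::new());
    let log_ref = &log;
    block_on(concurrently(
        || {
            // The head hands out 0, 1, 2 and then stops.
            let item = next.get();
            next.set(item + 1);
            async move { if item < 3 { Some(item) } else { None } }
        },
        |item: u32| async move {
            log_ref.borrow_mut().push(format!("start {item}"));
            yield_now().await;
            log_ref.borrow_mut().push(format!("end {item}"));
        },
    ));
    log.into_inner()
}
```

In the demo all three bodies start before any of them finishes, and each body borrows `log` from the caller's frame. The `Vec` of boxed bodies is the dynamic allocation that a real design would have to address.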

I claim that this is the full set of primitive operations needed to express +more-or-less everything in intra-task concurrency style.

+

In particular, we can implement multi-task concurrency this way! To do so, we'll write a universal +watermelon operator, where the T which is passed to the body is a +Box<dyn Future<Output=()>>, +and where the body just runs this future:

+ +
+ + +
async fn multi_task_concurrency_main(
+  spawn: impl Fn(impl Future<Output = ()> + 'static),
+) {
+    ...
+}
+
+type AnyFuture = Box<dyn Future<Output = ()> + 'static>;
+
+async fn universal_watermelon() {
+  let (sender, receiver) = channel::<AnyFuture>();
+  join(
+    multi_task_concurrency_main(move |fut| {
+      sender.send(Box::new(fut))
+    }),
+    concurrently(
+      || async {
+        // No trailing semicolon: the head must yield the received Option.
+        receiver.recv().await
+      },
+      |fut| async move {
+        fut.await;
+      },
+    ),
+  )
+  .await;
+}
+ +
+

Note that the conversion in the opposite direction is not possible! With intra-task concurrency, we +can borrow from the parent stack frame. So it is not a problem to restrict that to only allow +'static futures into the channel. In a sense, in the above example we return the future up the +stack, which explains why it can't borrow locals from our stack frame.

+

With multi-task concurrency though, we start with static futures. To let them borrow any stack data +requires unsafe.

+

Note also that the above set of operators (join, join_all, concurrently) is orthogonal to +parallelism. Alongside those operators, there could exist pjoin, pjoin_all and pconcurrently +with the Send bounds, such that you could mix and match parallel and single-core concurrency.

+
+
+ +

+ If a Stack is a Tree, Does it Make Any Difference? +

+

One possible objection to the above framing of watermelon as a language-level operator is that it +seemingly doesn't pass the zero-cost abstraction test. It can start an unbounded number of futures, +and those futures have to be stored somewhere. So we have a language operator which requires +dynamic memory allocation, which is a big no-no for any systems programming language.

+

I think there is some truth to it, and not an insignificant amount of it, but I think I can maybe +weasel out of it.

+

Consider recursion. Recursion also can allocate an arbitrary amount of memory (on the stack), but +that is considered fine (I would also agree that it is not in fact fine that unbounded recursion +is considered fine, but, for the scope of this discussion, I will be a hypocrite and will ignore +that opinion of mine).

+

And here, we have essentially the same situation: we want to allocate arbitrarily many (async) +stack frames, arranged in a tree. Doing it on the heap is easy, but we don't like the heap here. +Luckily, I believe there's a compilation scheme (hat tip to @rpjohnst +for patiently explaining it to me five times in different words) that implements this more-or-less +as efficiently as the normal call stack.

+

The idea is that we will have two stacks: a sync one and an async one. Specifically:

+
    +
  • +Every sync function we compile normally, with a single stack. +
  • +
  • +Async functions get two stack pointers. So, we burn sp and one other register +(let's call it asp). +
  • +
  • +If an async function calls a sync function, the callee's frame is pushed onto sp. +Crucially, because sync functions can only call other sync functions, the callee doesn't need +to know the value of asp. +
  • +
  • +If an async function calls another async function, the frame (specifically, the part of it +holding variables live across an await point) is pushed onto asp. +
  • +
  • +This async stack is segmented. So, for async function calls, we also do a “do we have +enough stack?” check and, if not, allocate a new segment, linking them via a frame pointer. +
  • +
  • +“Allocating a new segment” doesn't mean that we actually go and call malloc. Rather, there's a +fixed-size contiguous slab of, say, 8 megs, out of which all async frames are allocated. +
  • +
  • +If we are out of async-stack, we crash in pretty much the same way as for the boring sync stack +overflow. +
  • +
+

While this looks just like Go-style segmented stacks, I think this scheme is quite a bit more +efficient (warning: I in general have a tendency to confidently talk about things I know little +about, and this one is the extreme case of that. If some Go compiler engineer disagrees with me, I +am probably in the wrong!).

+

The main difference is that the distinction between sync and async functions is maintained in the +type system. There are no changes for sync functions at all, so the principle of “don't pay for +what you don't use” is observed. This is in contrast to Go: I believe that Go, in general, can't +know whether a particular function can yield (that is, if any function it (indirectly) calls can +yield), so it has to conservatively insert stack checks everywhere.

+

Then, even the async stack frames don't have to store everything, but just the stuff live across +await. Everything that happens between two awaits can go to the normal stack.

+

On top of that, async functions can still do aggressive inlining. So, the async call (and the stack +growth check) has to happen only for dynamically dispatched async calls!

+

Furthermore, the future trait could have some kind of size_hint method, which returns the lower +and the upper bound on the size of the stack. Fully concrete futures type-erased to dyn Future +would return the exact amount (a, Some(a)). The caller would be required to allocate at least +a bytes of the async stack. The callee uses that contract to elide stack checks. Unknown bound, +(a, None) would only be returned if type-erased concrete future itself calls something +dynamically dispatched. So only dynamically dispatched calls would have to do stack grow checks, and +that cost seems negligible in comparison to the cost of missing optimizations due to inability to +inline.

+

Altogether, it feels like this adds up to something sufficiently cheap to just call it async stack +allocation.

+

I guess that's all for today? Summarizing:

+
    +
  • +Inter-task vs intra-task distinction is mostly orthogonal to the question of parallelism. +
  • +
  • +I claim that this is the same distinction as between eager and lazy futures. +
  • +
  • +In particular, there are no principled obstacles for runtime-bounded intra-task concurrency. +
  • +
  • +But we do miss FuturesUnordered, but nice. The concurrently operator/function feels like a +sufficiently low-hanging watermelon here. +
  • +
  • +One wrinkle is that watermelon requires dynamic allocation, but it looks like we could just +completely upend the compilation strategy we use for futures, implement async segmented stacks +which should be pretty fast, and also gain nice dynamically dispatched (and recursive) async +functions for free? +
  • +
+
+

Haha, just kidding! Bonus content! This really should be a separate blog post, but it is +tangentially related, so here we go:

+
+
+ +

+ Applied Duality +

+

So far, we've focused on join, the operator that takes two futures, and runs them concurrently, +returning both results as a pair. But there's a second, dual operator:

+ +
+ + +
async fn race<F1, F2>(
+  fut1: F1,
+  fut2: F2,
+) -> Either<F1::Output, F2::Output>
+where
+  F1: Future,
+  F2: Future,
+ +
+

Like join, race runs two futures concurrently. Unlike join, it returns only one result — +that which came first. This operator is the basis for a more general select facility.

+

Although race is dual to join, I don't think it is as fundamental. It is possible to have two +dual things, where one of them is in the basis and the other is derived. For example, it is an axiom +of set theory that the union of two sets, A ∪ B, is a set. Although the intersection of sets, +A ∩ B, is a dual for union, existence of intersection is not an axiom. Rather, the intersection +is defined using the axiom of specification:

+ +
+ + +
A ∩ B := {x ∈ A : x ∈ B}
+ +
+

Proposition 131.7.1: race can be defined in terms of join

+

The race operator is trickier than it seems. Yes, it returns the result of the future that +finished first, but what happens with the other one? It gets cancelled. Rust implements this +cancellation for free, by just dropping the future, but this is restrictive. This is precisely the +issue that prevents pjoin from working.

+

I postulate that fully general cancellation is an asynchronous protocol:

+
    +
  1. +A requests that B is cancelled. +
  2. +
  3. +B receives this cancellation request and starts winding down. +
  4. +
  5. +A waits until B is cancelled. +
  6. +
+

That is, cancellation is not “I cancel thou”. Rather it is “I ask you to stop, and then I +cooperatively wait until you do so”. This is very abstract, but the following three examples should +help make this concrete.

+
    +
  1. +

    A is some generic asynchronous task, which offloads some computation-heavy work to a CPU pool. +That work (B) doesn't have checks for cancelled flags. So, if A is cancelled, it can't really +stop B, which means we are violating structured concurrency.

    +
  2. +
  3. +

    A is doing async IO. Specifically, A uses io_uring to read data from a socket. A owns a buffer, +and passes a pointer to it to the kernel via io_uring as the target buffer for a read +syscall. While A is being cancelled, the kernel writes data to this buffer. If A doesn't wait +until the kernel is done, the buffer's memory might get reused, and the kernel would corrupt some +unrelated data.

    +
  4. +
+

These examples are somewhat unsatisfactory: A is philosophical (who needs structured +concurrency?), while B is esoteric (who uses io_uring in 2024?). But the two can be combined into +something rather pedestrianly bad:

+

Like in the case A, an async task submits some work to a CPU pool. But this time the work is very +specific: computing a cryptographic checksum of a message owned by A. Because this is +cryptography, this is going to be some hyper-optimized SIMD loop which definitely won't have any +affordance for checking some sort of a cancelled flag. The loop would have to run to completion, +or at least to a safe point. And, because the loop checksums data owned by A, we can't destroy A +before the loop exits, otherwise it'll be reading garbage memory!

+

And this example is the reason why

+ +
+ + +
async fn pjoin<F1, F2>(
+  fut1: F1,
+  fut2: F2,
+) -> (F1::Output, F2::Output)
+where
+  F1: Future + Send,
+  F1::Output:  Send,
+  F2: Future + Send,
+  F2::Output:  Send,
+ +
+

can't be a thing in Rust: if fut1 runs on a thread separate from the pjoin future, then, if +pjoin ends up being cancelled, fut1 would be pointing at garbage. You could have

+ +
+ + +
async fn pjoin<F1, F2>(
+  fut1: F1,
+  fut2: F2,
+) -> (F1::Output, F2::Output)
+where
+  F1: Future + Send + 'static,
+  F1::Output:  Send + 'static,
+  F2: Future + Send + 'static,
+  F2::Output:  Send + 'static,
+ +
+

but that removes one of the major benefits of the intra-task style API: the ability to just borrow data.

+

So the fully general cancellation should be cooperative. Let's assume that it is driven by some sort +of cancellation token API:

+ +
+ + +
impl CancellationSource {
+  fn request_cancellation(&self) { ... }
+  async fn await_cancellation(self) { ...  }
+
+  async fn cancel(self) {
+    self.request_cancellation();
+    self.await_cancellation().await;
+  }
+
+  fn new_token(&self) -> CancellationToken { ... }
+}
+
+impl CancellationToken {
+  fn is_cancelled(&self) -> bool { ... }
+  fn on_cancelled(&self, callback: impl FnOnce()) { ... }
+}
+ +
+

Note that the question of cancellation being cooperative is orthogonal to the question of explicit +threading of cancellation tokens! They can be threaded implicitly (cooperative, implicit +cancellation is how Python's trio does this, though they don't really document the cooperative part +(the shields stuff)).

+

With this, we can write our own race: we'll create a cancellation scope and then join +modified futures, each of which would cancel the other upon completion:

+ +
+ + +
async fn race<U, V>(
+  fut1: impl async FnOnce(&CancellationToken) -> U,
+  fut2: impl async FnOnce(&CancellationToken) -> V,
+) -> Either<U, V> {
+  let source = CancellationSource::new();
+  let token = source.new_token();
+  let u_or_v = join(
+    async {
+      let u = fut1(&token).await;
+      if token.is_cancelled() {
+        return None;
+      }
+      source.request_cancellation();
+      Some(u)
+    },
+    async {
+      let v = fut2(&token).await;
+      if token.is_cancelled() {
+        return None;
+      }
+      source.request_cancellation();
+      Some(v)
+    },
+  )
+  .await;
+  match u_or_v {
+    (Some(u), None) => Left(u),
+    (None, Some(v)) => Right(v),
+    _ => unreachable!(),
+  }
+}
+ +
+

In other words, race is but a cooperatively-cancelled join!
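A runnable approximation of that idea with only the standard library might look like the sketch below. It simplifies heavily: the source and token collapse into a single shared flag, the tasks receive an `Rc` to it, and `block_on`, `yield_now`, `guarded` and `count_to` are hypothetical helpers invented for this illustration.

```rust
use std::cell::Cell;
use std::future::{poll_fn, Future};
use std::pin::pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Toy busy-polling executor with a no-op waker (an assumption of this sketch).
struct NoopWake;
impl Wake for NoopWake { fn wake(self: Arc<Self>) {} }

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) { return out; }
    }
}

// Returns `Pending` exactly once, giving sibling futures a chance to run.
async fn yield_now() {
    let mut yielded = false;
    poll_fn(|cx| {
        if yielded { return Poll::Ready(()); }
        yielded = true;
        cx.waker().wake_by_ref(); // Ask to be polled again.
        Poll::Pending
    })
    .await
}

// Concurrent composition: poll both futures until both have finished.
async fn join<A: Future, B: Future>(a: A, b: B) -> (A::Output, B::Output) {
    let mut a = pin!(a);
    let mut b = pin!(b);
    let (mut ra, mut rb) = (None, None);
    poll_fn(move |cx| {
        if ra.is_none() {
            if let Poll::Ready(out) = a.as_mut().poll(cx) { ra = Some(out); }
        }
        if rb.is_none() {
            if let Poll::Ready(out) = b.as_mut().poll(cx) { rb = Some(out); }
        }
        if ra.is_some() && rb.is_some() {
            return Poll::Ready((ra.take().unwrap(), rb.take().unwrap()));
        }
        Poll::Pending
    })
    .await
}

// Simplified stand-in for the source/token pair: one shared flag.
#[derive(Default)]
struct CancellationToken(Cell<bool>);
impl CancellationToken {
    fn request_cancellation(&self) { self.0.set(true) }
    fn is_cancelled(&self) -> bool { self.0.get() }
}

#[derive(Debug, PartialEq)]
enum Either<L, R> { Left(L), Right(R) }

// Runs `fut` to completion; the first finisher wins and asks the other to stop.
async fn guarded<T>(fut: impl Future<Output = T>, token: &CancellationToken) -> Option<T> {
    let value = fut.await;
    if token.is_cancelled() {
        return None; // The other side already won; discard our result.
    }
    token.request_cancellation();
    Some(value)
}

// `race` as a cooperatively-cancelled `join`: we still wait for the loser to wind down.
async fn race<U, V, F1: Future<Output = U>, F2: Future<Output = V>>(
    task1: impl FnOnce(Rc<CancellationToken>) -> F1,
    task2: impl FnOnce(Rc<CancellationToken>) -> F2,
) -> Either<U, V> {
    let token = Rc::new(CancellationToken::default());
    let fut1 = guarded(task1(token.clone()), &token);
    let fut2 = guarded(task2(token.clone()), &token);
    match join(fut1, fut2).await {
        (Some(u), None) => Either::Left(u),
        (None, Some(v)) => Either::Right(v),
        _ => unreachable!("exactly one side finishes before cancellation"),
    }
}

// Hypothetical cooperative task: checks the token at every step and stops early.
async fn count_to(n: u32, token: Rc<CancellationToken>) -> u32 {
    let mut done = 0;
    while done < n && !token.is_cancelled() {
        yield_now().await;
        done += 1;
    }
    done
}
```

Calling `block_on(race(|t| count_to(2, t), |t| count_to(100, t)))` yields `Either::Left(2)`: the slower task notices the flag at its next step and winds down on its own, and race returns only after it has done so.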

+

That's all for real for today, viva la vida!

+
+
+
+ + + + + diff --git a/2024/09/32/-what-is-io-uring.html b/2024/09/32/-what-is-io-uring.html new file mode 100644 index 00000000..f94f84c7 --- /dev/null +++ b/2024/09/32/-what-is-io-uring.html @@ -0,0 +1,11 @@ + + + + Redirecting… + + + + +

Redirecting…

+ Click here if you are not redirected. + \ No newline at end of file diff --git a/2024/10/06/ousterhouts-dichotomy.html b/2024/10/06/ousterhouts-dichotomy.html new file mode 100644 index 00000000..e983d3ef --- /dev/null +++ b/2024/10/06/ousterhouts-dichotomy.html @@ -0,0 +1,192 @@ + + + + + + + On Ousterhout's Dichotomy + + + + + + + + + + + + +
+ +
+ +
+
+ +

On Ousterhout's Dichotomy

+

Why are there so many programming languages? One of the driving reasons for this is that some +languages tend to produce fast code, but are a bit of a pain to use (C++), while others are a breeze +to write, but run somewhat slow (Python). Depending on the ratio of CPUs to programmers, one or the +other might be relatively more important.

+

But can't we just, like, implement a universal language that is convenient but slowish by default, +but allows an expert programmer to drop to a lower, more performant but harder register? I think +there were many attempts at this, and they didn't quite work out.

+

The natural way to go about this is to start from the high-level side. Build a high-level +featureful language with a large runtime, and then provide granular opt-outs of specific runtime +facilities. Two great examples here are C# and D. And the most famous example of this paradigm is +Python, with its “rewrite slow parts in C” mantra.

+

It seems to me that such an approach can indeed solve the “easy to use” part of the dichotomy, but +doesn't quite work as promised for the “runs fast” one. And here's the reason. For performance, what +matters is not so much the code that's executed, but rather the layout of objects in memory. And the +high-level dialect locks in a pointer-heavy GC object model! Even if you write your code in assembly, +the performance ceiling will be determined by all those pointers the GC needs. To actually get full +“low-level” performance, you need to effectively mirror the data across the dialects across a +quasi-FFI boundary.

+

And that's what kills the “write most of the code in Python, rewrite hot spots in C” approach: the +overhead for transitioning between the native C data structures and the Python ones tends to eat any +performance benefits that C brings to the table. There are some very real, very important +exceptions, where it is possible to batch sufficiently large packages of work to minimize the +overhead: http://venge.net/graydon/talks/VectorizedInterpretersTalk-2023-05-12.pdf. +But it seems that the average case looks more like this: +https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation.

+

And this brings me to Rust. It feels like it accidentally blundered into the space of universal +languages through the floor. There are no heavy runtime features to opt out of in Rust. The object +model is universal throughout the language. There isn't a value-semantics/reference-semantics +dichotomy: references are first-class values. And yet:

+ +

As a result, there is a certain spectrum of Rust:

+ +

While the bottom end here sits pretty comfortably next to C, the upper tip doesn't quite reach the +usability level of Python. But this is mostly compensated through these three effects:

+ +
+
+ + + + + diff --git a/2024/10/08/two-tips.html b/2024/10/08/two-tips.html new file mode 100644 index 00000000..c6352f59 --- /dev/null +++ b/2024/10/08/two-tips.html @@ -0,0 +1,255 @@ + + + + + + + Two Workflow Tips + + + + + + + + + + + + +
+ +
+ +
+
+ +

Two Workflow Tips

+

An article about a couple of relatively recent additions to my workflow which I wish I knew about +years ago.

+
+ +

+ Split And Go To Definition +

+

Go to definition is super useful (in general, navigation is much more important than code +completion). But often, when I use “goto def” I don't actually mean to permanently go there. +Rather, I want to stay where I am, but I need a bit more context about a particular thing at point.

+

What I've found works really great in this context is to split the screen in two, and issue “go to +def” in the split. So that you see both the original context, and the definition at the same time, +and can choose just how far you would like to go. Here's an example, where I want to understand how +the apply function works, and, to understand that, I want to quickly look up the definition of +FuzzOp:

+ +
+ + +
+

VS Code actually has a first-class UI for something like this, called “Peek Definition”, but it is +just not good: it opens some kind of separate pop-up, with a completely custom UX. It's much more +fruitful to compose two existing basic features: splitting the screen and going to definition.

+

Note that in the example above I do move focus to the split. I also tried a version that keeps focus +in the original split, but focusing the new one turned out to be much better. You actually don't always +know up front which split would become the “main” one, and moving the focus gives you flexibility of +moving around, closing the split, or closing the other split.

+

I highly recommend adding a shortcut for this action. It's a good idea to make it a “complementary” +shortcut for the usual goto definition. I use , . for goto definition, and hence , > +is the splitting version:

+ +
+ + +
{ "key": ", .",       "command": "editor.action.revealDefinition" },
+{ "key": ", shift+.", "command": "editor.action.revealDefinitionAside" },
+ +
+
+
+ +

+ , . +

+

Yes, you are reading this right. , ., that is a comma followed by a full stop, is my goto +definition shortcut. This is not some kind of evil vim mode. I use pedestrian non-modal editing, +where I copy with ctrl + c, move to the beginning of line with Home and kill a word +with ctrl + Backspace (though keys like Home, Backspace, or arrows are on my +home row thanks to kanata).

+

And yet, I use , as a first keypress in a sequence for multiple shortcuts. That is, , . is +not , + . pressed together, but rather a , followed by a separate .. So, +when I press , my editor doesn't actually type a comma, but rather waits for me to complete +the shortcut. I have many of them, with just a few being:

+
    +
  • +, . goes to definition, +
  • +
  • +, > goes to definition in a split, +
  • +
  • +, r runs a task, +
  • +
  • +, e s edits selection by sorting it, , e C converts to camelCase, +
  • +
  • +, o g opens magit for VS Code, , o k opens +keybindings. +
  • +
  • +, w re-wraps selection at 80 (something I just did to format the previous bullet point), +, p pretty-prints the whole file. +
  • +
+

I've used many different shortcut schemes, but this is by far the most convenient one for me. How do +I type a comma? I bind , Space and , Enter to insert a comma and a space/newline +respectively, which handles most of the cases. And there's , , which types just a lone comma.

+

To remember longer sequences, I pair the comma with +whichkey, such that, when I type , e, what +I see is actually a menu of editing operations:

+ +
+ + +
+

This horrible idea was born in the mind of Susam Pal, and is officially (and aptly I should say) +named Devil Mode.

+

I highly recommend trying it out! It is the perfect interface for actions that you do once in a +while. Where it doesn't work is for actions you want to repeat. For example, if you want to cycle +through compilation errors, binding , e to the next error would probably be a bad +idea, as typing , e , e , e to cycle three times is quite tiring.

+

This is actually a common theme: there are many things you might want to cycle back and forward +through:

+
    +
  • +completion suggestions +
  • +
  • +compiler errors +
  • +
  • +textual search results +
  • +
  • +reference search results +
  • +
  • +merge conflicts +
  • +
  • +working tree changes +
  • +
+

It is mighty annoying to have to remember different shortcuts for all of them, isn't it? If only +there was some way to have a universal pair of shortcuts for the next/prev generalized motion...

+

The insight here is that you'd rarely need to cycle through several different categories of things +at the same time. So I bind the venerable ctrl+n and ctrl+p to repeating the last +next/prev motion. So, if the last “next” thing was a worktree change, then ctrl+n moves me to +the next worktree change. But if I then query the next compilation error, the subsequent +ctrl+n would continue cycling through compilation errors. To kick-start the cycle, I have a +, n hydra:

+
    +
  • +, n e next error +
  • +
  • +, n c next change +
  • +
  • +, n C next merge Conflict +
  • +
  • +, n r next reference +
  • +
  • +, n f next find +
  • +
  • +, n . previous edit +
  • +
+

I don't know if there's some existing VS Code extension to do this; I implement this in +my personal extension.

+

Hope this is useful! Now go and make a deal with the devil yourself!

+
+
+
+ + + + + diff --git a/2024/10/14/missing-ide-feature.html b/2024/10/14/missing-ide-feature.html new file mode 100644 index 00000000..8c20cd32 --- /dev/null +++ b/2024/10/14/missing-ide-feature.html @@ -0,0 +1,245 @@ + + + + + + + A Missing IDE Feature + + + + + + + + + + + + +
+ +
+ +
+
+ +

A Missing IDE Feature

+

Slightly unusual genre: with this article, I want to try to enact a change in the world. I +believe that there is a “missing” IDE feature which is:

+ +

The target audience here is anyone who can land a PR in Zed, VS Code, Helix, Neovim, Emacs, Kakoune, +or any other editor or any language server. The blog post would be a success if one of you feels +sufficiently inspired to do the thing!

+
+ +

+ The Feature +

+

Suppose you are casually reading the source code of rust-analyzer, and are curious about handling of +method bodies. There's a Body struct in the code base, and you want to understand how it is used.

+

Would you rather look at this?

+ +
+ + +
+

Or this?

+ +
+ + +
+

(The screenshots are from IntelliJ/RustRover, because of course it gets this right)

+

The second option is clearly superior: it conveys significantly more useful information in the +same amount of pixels. Function names, argument lists and return types are so much more valuable +than a body of any particular function. Especially if the function is a page-full of boilerplate +code!

+

And this is the feature I am asking for: make the code look like the second image. Or, +specifically, Fold Method Bodies by Default.

+

There are two components here. First, only method bodies are folded. This is a syntactic check — +we are not folding the second level. For code like

+ +
+ + +
fn f() { ... }
+
+impl S {
+    fn g(&self) { ... }
+}
+ +
+

Both f and g are folded, but impl S is not. Similarly, function parameters and function body +are actually on the same level of folding hierarchy, but it is imperative that parameters are not +folded. This is the part that was hard ten years ago but is easy today. “What is a function body?” is a +non-trivial question, which requires proper parsing of the code. These days, either an LSP server or +Tree-sitter can answer this question quickly and reliably.

+

The second component of the feature is that folded is the default state. It is not a “fold method +bodies” action. It is a setting that ensures that, whenever you visit a new file, bodies are +folded by default. To make this work, the editor should be smart enough to seamlessly unfold a specific +function when appropriate. For example, if you “go to definition” of a function, that function +should get unfolded, while the surrounding code should remain folded.

+

Now that I have explained how the feature works, I will not try to motivate it. I think it is +pretty obvious how awesome this actually is. Code is read more often than written, and this is one +of the best multipliers for readability. Most of the code is in method bodies, but the most important +code is in function signatures. Folding bodies auto-magically hides the 80% of boring code, leaving +the most important 20%. It was in 2018 when I last used an IDE (IntelliJ) which has this implemented +properly, and I've been missing this function ever since!

+

You might also be wondering whether it is the same feature as the Outline, that special UI which +shows a graphical, hierarchical table of contents of the file. It is true that outline and +fold-bodies-by-default attack the same issue. But I'd argue that folding solves it better. This is +an instance of a common pattern. In a smart editor, it is often possible to implement any given +feature either by “lowering” it to plain text, or by creating a dedicated GUI. And the lowering +approach almost always wins, because it gets to re-use all existing functionality for free. For +example, the folding approach trivially gives you an ability to move a bunch of functions from one +impl block to the other by selecting them with Shift + Down, cutting with Ctrl + X +and pasting with Ctrl + V.

+
+
+ +

+ Call to Action +

+

So, if you are a committer to one of the editors, please consider adding a “fold function bodies by +default” mode. It probably should be off by default, as it can easily scare new users away, but it +should be there for power users to enable, and it should be prominently documented, so that people +can learn that they want it. After the checkbox is in place, see if you can implement the actual +logic! If your editor uses Tree-sitter, this should be relatively easy: its syntax tree contains +all the information you need. Just make sure that:

+
    +
  • +bodies are folded when the new file is opened, +
  • +
  • +the editor unfolds them when appropriate (generally, when navigated to a function from elsewhere). +
  • +
+

If your editor is not based on Tree-sitter, you'll have a harder time. In theory, the information +should be readily available from the language server, but LSP currently doesn't expose it. Here's +the problem:

+

https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#foldingRangeKind

+

There's no body kind there! Adding it should be technically trivial, but it is always a pain to +get something into the protocol if you are not VS Code.

+
+
+ +

+ My Role +

+

What is my job here, besides sitting there and writing blog posts? I actually think that writing +this down is quite valuable!

+

I suppose the feature is still commonly missing due to a two-sided market failure: the feature +doesn't exist, so prospective users don't realize that it is possible, and don't ask editors' +authors to implement it. Without users asking, editor authors themselves don't realize this feature +could exist, and don't rush implementing it. This is exacerbated by the fact that it was a hard +feature to implement ten years ago, when we didn't have Tree-sitter/LSP, so there are poor +workarounds in place: actions to fold a certain level. These workarounds then prevent the proper +feature from gaining momentum.

+

So here I hope to maybe tip the equilibrium's scale a bit, and start a feedback loop where more +people realize that they want this feature, such that it is implemented in some of the more +experimental editors, which hopefully would expose the feature to more users, popularizing it until +it gets implemented everywhere!

+

Still, just talking isn’t everything I did here! Six years ago, I implemented the language-server +side of this in rust-analyzer:

+

https://github.com/rust-lang/rust-analyzer/commit/23b040962ff299feeef1f967bc2d5ba92b01c2bc

+

This currently isn’t exposed via LSP, because the protocol doesn’t allow flagging a folding range as a method +body. To fix that, I opened this VS Code issue (LSP generally avoids doing something before VS +Code):

+

https://github.com/microsoft/vscode/issues/128912

+

And since then I have been quietly waiting for some editor (not necessarily VS Code) to pick this up. +This hasn’t happened yet, hence this article!

+

Thanks for reading!

+
+
+
+ + + + + diff --git a/2024/11/23/semver-is-not-about-you.html b/2024/11/23/semver-is-not-about-you.html new file mode 100644 index 00000000..19baa143 --- /dev/null +++ b/2024/11/23/semver-is-not-about-you.html @@ -0,0 +1,252 @@ + + + + + + + SemVer Is Not About You + + + + + + + + + + + + +
+ +
+ +
+
+ +

SemVer Is Not About You

+

A popular genre of articles for the past few years has been a SemVer Critique, pointing out various +things that are wrong with SemVer itself, or with the way SemVer is being applied, and, customarily, +suggesting an alternative versioning scheme. Usually, the focus is either on how SemVer ought to be +used by library authors (nitpicking the definition of a breaking change), or on how SemVer is (not) +useful for a library consumer (nitpicking the definition of a breaking change).

+

I think these are valid lenses to study SemVer through, but not the most useful. This article +suggests an alternative framing: SemVer is not about you.

+

Before we begin, I would like to carefully delineate the scope. Among other things, SemVer can be +applied to, broadly speaking, applications and libraries. Applications are stand-alone software +artifacts, usable as is. Libraries exist within the larger library ecosystem, and are building +blocks for assembling applications. Libraries both depend on and are depended upon by other +libraries. In the present article, we will look only at the library side.

+
+

At first glance, it appears that SemVer solves the problem of informing the user when to do the +upgrade: upgrade patch for latest bugfixes, upgrade minor if you want new features, upgrade major if +you want new features and are ready to clean up your code. But this is not the primary value of this +versioning scheme. The real purpose of SemVer is managing transitive dependencies.

+

Let’s say you are using some version of the apples library and some version of the oranges library. And +suppose they both depend on the trees library. Because apples and oranges were authored at +different times, they do not necessarily depend on the same version of trees. There are two paths +from here.

+

The first is to include two different versions of the trees library with your app. This is unfortunate +for the trivial reason of code bloat, and for the more subtle reason of interface leaking: if for some +reason your code needs to pass a tree originating in apples over to oranges, both must use +exactly the same version of the trees library.

+

The second path is to somehow unify transitive dependencies, and pick a single version of trees +that’s good for both apples and oranges. But perhaps there isn’t a version that works for both?

+

Who’s the right person to choose the appropriate course of action? It could be you, but that’s +unfortunate: you are using libraries precisely because you want to avoid thinking too much about +their internals. You don’t know how apples is using trees. You could learn that, but, +arguably, that’s not a good tradeoff (if it is, perhaps you shouldn’t depend on apples and instead +maintain your own). What’s worse, for featureful applications dependency trees run very deep, +the potential for conflicts scales at least linearly, and there’s only a single you.

+

Another candidate is the author of the trees library: they don’t know apples and oranges +directly, but they should be thinking about how their library could be used. And, because +different libraries tend to have different authors, the work of resolving version conflicts gets +distributed across a set of people that also scales linearly!

+

This is the problem that SemVer solves: it has nothing to do with your code or your direct +dependencies, it’s all about dependencies of your dependencies. SemVer is the library maintainer +saying when two versions of their library can be unified:

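+
    +
  • +If the major version is bumped, no unification happens, the library will get duplicated. +
  • +
  • +If the major version is not bumped, the versions can be unified. +
  • +
+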

That’s it! That’s the whole thing! All the talk about breaking changes is downstream of this actual +behavior of version resolvers.
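The rule that resolvers apply (versions with the same major can be unified, versions with different majors get duplicated) can be sketched in a few lines. This is a simplified illustration, not how any particular resolver is implemented: `Version`, `can_unify` and `resolve` are names made up for this example, and real tools add special cases that are ignored here (for instance, many ecosystems treat `0.x` versions differently).

```rust
/// A simplified version number. The derived ordering compares
/// major first, then minor, then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u32,
    minor: u32,
    patch: u32,
}

/// Two versions can share a single copy only if their majors match.
fn can_unify(a: Version, b: Version) -> bool {
    a.major == b.major
}

/// Collapses the requested versions into the copies that end up in the
/// build: one copy (the newest requested) per major version.
fn resolve(requested: &[Version]) -> Vec<Version> {
    let mut copies: Vec<Version> = Vec::new();
    for &v in requested {
        match copies.iter_mut().find(|c| can_unify(**c, v)) {
            // Same major already present: keep the newer of the two.
            Some(c) => *c = (*c).max(v),
            // New major: the library gets duplicated.
            None => copies.push(v),
        }
    }
    copies
}
```

If `apples` asks for 1.2.0 and `oranges` asks for 1.4.1, `resolve` yields a single 1.4.1 copy. If one of them asks for 2.0.0 instead, two copies come out, and any `trees` type crossing between them stops lining up.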

+
+

Notably, if you are a library maintainer, SemVer isn’t about you either. When deciding between major +and minor, you shouldn’t be thinking about your direct dependents. They knowingly use your +library, so they are capable of making informed decisions and will manage just fine. The problem is +your transitive dependents. If you release a new major version, dependencies of some application up +the stack could get wedged if somewhere in its dependency tree there are both versions of your +library which need interoperable types.

+

Or, rather, if you release a new major version, it is guaranteed that some application would have +two copies of your library. There’s no such thing as an atomic upgrade of dependencies across the +ecosystem: propagating your new major will take time, and there will be an extended period where +both majors are used, by different libraries, and both majors end up in applications’ +lockfiles. The question is rather: would this be more harmful than just code bloat? If your library +ends up in others’ public API, you will likely lock some upstream applications in a variant of the +following problem:

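+
    +
  • +We need to update lemons to a new version to get access to this critical bug fix for the new MacOS +version +
  • +
  • +But lemons is an actively developed library, it upgraded to the new version of trees library +three months ago and MacOS bugfix sits on top of that version. +
  • +
  • +But we also use limes, which is a bit of a more niche product, and so hasn’t seen upgrade for +about a year. +
  • +
  • +And we also use the same pool of trees for both, so our latest limes prevent upgrading +lemons. +
  • +
+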

It’s also worth thinking about the virality of major versions: if your library is in someone else’s +public API, your major bump implies their major bump, which is of course bad because putting +work on the plate of other maintainers is bad, but, what’s worse, this virally amplifies the +number of unsatisfiable dependency graphs à la the example above.

+
+ +

+ SemVer-- +

+

I’ve seen two interesting extensions to the core SemVer. One is the observation that, to make +tooling work, only two version numbers are sufficient. There’s no real difference between patch +and minor, as far as the actual behavior of the version resolution algorithm goes. I am sympathetic to +this argument!

+

The second one is an observation that many projects follow the "deprecate, then remove" cycle. I’ve +learned this with the release of Ember 2.0. The big deal about Ember 2.0 is that the only thing +that it did was the removal of deprecation warnings. Code that didn’t emit warnings on the latest +Ember 1.x was compatible with 2.0.

+

This feels like the fundamentally right way of going about the larger, more important building blocks. +And you sort-of can do this with SemVer today, if you declare that you are compatible with "1.9, +2.0". But, even today, many years after Ember 2.0, this still feels like a cute trick. This isn’t +yet a pattern with a catchy name (like "release trains" or the "not rocket science rule") that everyone is +using because it is an obviously good idea.
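As a rough sketch of why such a declaration helps unification, assume it means "anything from 1.9 up to, but not including, 3.0" (in Cargo this could be spelled roughly as `>=1.9, <3`; that reading is my assumption, not something the declaration itself pins down). The `Requirement` type and the `unify` function below are hypothetical names for illustration only, and versions are reduced to (major, minor) pairs.

```rust
/// A range of acceptable versions, with versions as (major, minor) pairs.
#[derive(Debug, Clone, Copy)]
struct Requirement {
    min: (u32, u32), // inclusive lower bound
    max_major: u32,  // exclusive upper bound on the major version
}

impl Requirement {
    fn matches(&self, version: (u32, u32)) -> bool {
        // Tuples compare lexicographically: major first, then minor.
        version >= self.min && version.0 < self.max_major
    }
}

/// Picks the newest candidate that satisfies every requirement, if any.
fn unify(requirements: &[Requirement], candidates: &[(u32, u32)]) -> Option<(u32, u32)> {
    candidates
        .iter()
        .copied()
        .filter(|&v| requirements.iter().all(|r| r.matches(v)))
        .max()
}
```

A library that declares the wide range can be unified with a dependent still on 1.x as well as with one already on 2.x, so the migration period does not force two copies into the build. Two libraries pinned to different majors still cannot be unified.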

+
+
+ +

+ And Now To Something Completely Different +

+

Circling back to the introduction, the general pattern here is that there’s a prescriptivist +approach and a descriptivist one. A prescriptivist argues about the right and wrong ways to use a +particular tool. A descriptivist avoids value judgement, and describes how the thing actually behaves.

+

Another instance of this pattern that I’ve noticed playing out is log levels. You can get very +philosophical about the difference between error, warn and info. But what helps is looking at +what they do:

+
    +
  • +error pages the operator immediately. +
  • +
  • +warn pages if it repeats frequently. +
  • +
  • +info is what you see in the prod logs when you actively look at them. +
  • +
  • +And debug is what your developers see when they enable extra logging. +
  • +
+ +
+

let’s say "одевать одежду" for putting on clothes,
+let’s say "звОнит" for "calls",
+and as for the prescriptivists,
+we won’t care one bit ("ложить")

+
+
avva
+
+
+
+
+ + + + + diff --git a/about.html b/about.html new file mode 100644 index 00000000..76453125 --- /dev/null +++ b/about.html @@ -0,0 +1,116 @@ + + + + + + + matklad + + + + + + + + + + + + +
+ +
+ +
+
+ +

+ Hello! +

+

matklad +I am Alex Kladov, a programmer who loves simple code and programming languages. +You can find me on GitHub or send me an email. +If you want to find me in real life, I am in Lisbon. My resume is here.

+

Code samples on this blog are dual licensed under MIT OR Apache-2.0.

+
+ +
+ + + + + diff --git a/assets/LSP-MxN.png b/assets/LSP-MxN.png new file mode 100644 index 00000000..0b234d45 Binary files /dev/null and b/assets/LSP-MxN.png differ diff --git a/assets/PPerlang.png b/assets/PPerlang.png new file mode 100644 index 00000000..73666f21 Binary files /dev/null and b/assets/PPerlang.png differ diff --git a/assets/active-window.png b/assets/active-window.png new file mode 100644 index 00000000..68cf3f8c Binary files /dev/null and b/assets/active-window.png differ diff --git a/assets/adoc-hl-error.png b/assets/adoc-hl-error.png new file mode 100644 index 00000000..2c71d81d Binary files /dev/null and b/assets/adoc-hl-error.png differ diff --git a/assets/adoc-slide.png b/assets/adoc-slide.png new file mode 100644 index 00000000..a56723ae Binary files /dev/null and b/assets/adoc-slide.png differ diff --git a/assets/cargo-timings.png b/assets/cargo-timings.png new file mode 100644 index 00000000..2b9a8910 Binary files /dev/null and b/assets/cargo-timings.png differ diff --git a/assets/goto-definition-test.png b/assets/goto-definition-test.png new file mode 100644 index 00000000..43bd2182 Binary files /dev/null and b/assets/goto-definition-test.png differ diff --git a/assets/gotodef-macro-1.gif b/assets/gotodef-macro-1.gif new file mode 100644 index 00000000..985ddf85 Binary files /dev/null and b/assets/gotodef-macro-1.gif differ diff --git a/assets/gotodef-macro-2.gif b/assets/gotodef-macro-2.gif new file mode 100644 index 00000000..8c9ba235 Binary files /dev/null and b/assets/gotodef-macro-2.gif differ diff --git a/assets/icons.svg b/assets/icons.svg new file mode 100644 index 00000000..887d3f66 --- /dev/null +++ b/assets/icons.svg @@ -0,0 +1,31 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/assets/lang-pop.png b/assets/lang-pop.png new file mode 100644 index 00000000..fc32f71a Binary files /dev/null and b/assets/lang-pop.png differ diff --git a/assets/magit.png b/assets/magit.png new file mode 100644 index 
00000000..f88a5939 Binary files /dev/null and b/assets/magit.png differ diff --git a/assets/min3_diag.png b/assets/min3_diag.png new file mode 100644 index 00000000..f4c2edcd Binary files /dev/null and b/assets/min3_diag.png differ diff --git a/assets/min3_diag_color.png b/assets/min3_diag_color.png new file mode 100644 index 00000000..d9f49ad2 Binary files /dev/null and b/assets/min3_diag_color.png differ diff --git a/assets/min3_par.png b/assets/min3_par.png new file mode 100644 index 00000000..fc1523bf Binary files /dev/null and b/assets/min3_par.png differ diff --git a/assets/min3_rows.png b/assets/min3_rows.png new file mode 100644 index 00000000..533c073f Binary files /dev/null and b/assets/min3_rows.png differ diff --git a/assets/min3_table.png b/assets/min3_table.png new file mode 100644 index 00000000..f642a340 Binary files /dev/null and b/assets/min3_table.png differ diff --git a/assets/priority-inversion.png b/assets/priority-inversion.png new file mode 100644 index 00000000..e89c7ffd Binary files /dev/null and b/assets/priority-inversion.png differ diff --git a/assets/ra-code.png b/assets/ra-code.png new file mode 100644 index 00000000..1eb5758d Binary files /dev/null and b/assets/ra-code.png differ diff --git a/assets/resilient-parsing/lparso.js b/assets/resilient-parsing/lparso.js new file mode 100644 index 00000000..25d241cd --- /dev/null +++ b/assets/resilient-parsing/lparso.js @@ -0,0 +1,195 @@ +let wasm; + +let WASM_VECTOR_LEN = 0; + +let cachedUint8Memory0 = null; + +function getUint8Memory0() { + if (cachedUint8Memory0 === null || cachedUint8Memory0.byteLength === 0) { + cachedUint8Memory0 = new Uint8Array(wasm.memory.buffer); + } + return cachedUint8Memory0; +} + +const cachedTextEncoder = (typeof TextEncoder !== 'undefined' ? new TextEncoder('utf-8') : { encode: () => { throw Error('TextEncoder not available') } } ); + +const encodeString = (typeof cachedTextEncoder.encodeInto === 'function' + ? 
function (arg, view) { + return cachedTextEncoder.encodeInto(arg, view); +} + : function (arg, view) { + const buf = cachedTextEncoder.encode(arg); + view.set(buf); + return { + read: arg.length, + written: buf.length + }; +}); + +function passStringToWasm0(arg, malloc, realloc) { + + if (realloc === undefined) { + const buf = cachedTextEncoder.encode(arg); + const ptr = malloc(buf.length) >>> 0; + getUint8Memory0().subarray(ptr, ptr + buf.length).set(buf); + WASM_VECTOR_LEN = buf.length; + return ptr; + } + + let len = arg.length; + let ptr = malloc(len) >>> 0; + + const mem = getUint8Memory0(); + + let offset = 0; + + for (; offset < len; offset++) { + const code = arg.charCodeAt(offset); + if (code > 0x7F) break; + mem[ptr + offset] = code; + } + + if (offset !== len) { + if (offset !== 0) { + arg = arg.slice(offset); + } + ptr = realloc(ptr, len, len = offset + arg.length * 3) >>> 0; + const view = getUint8Memory0().subarray(ptr + offset, ptr + len); + const ret = encodeString(arg, view); + + offset += ret.written; + } + + WASM_VECTOR_LEN = offset; + return ptr; +} + +let cachedInt32Memory0 = null; + +function getInt32Memory0() { + if (cachedInt32Memory0 === null || cachedInt32Memory0.byteLength === 0) { + cachedInt32Memory0 = new Int32Array(wasm.memory.buffer); + } + return cachedInt32Memory0; +} + +const cachedTextDecoder = (typeof TextDecoder !== 'undefined' ? 
new TextDecoder('utf-8', { ignoreBOM: true, fatal: true }) : { decode: () => { throw Error('TextDecoder not available') } } ); + +if (typeof TextDecoder !== 'undefined') { cachedTextDecoder.decode(); }; + +function getStringFromWasm0(ptr, len) { + ptr = ptr >>> 0; + return cachedTextDecoder.decode(getUint8Memory0().subarray(ptr, ptr + len)); +} +/** +* @param {string} text +* @returns {string} +*/ +export function print_syntax_tree(text) { + let deferred2_0; + let deferred2_1; + try { + const retptr = wasm.__wbindgen_add_to_stack_pointer(-16); + const ptr0 = passStringToWasm0(text, wasm.__wbindgen_malloc, wasm.__wbindgen_realloc); + const len0 = WASM_VECTOR_LEN; + wasm.print_syntax_tree(retptr, ptr0, len0); + var r0 = getInt32Memory0()[retptr / 4 + 0]; + var r1 = getInt32Memory0()[retptr / 4 + 1]; + deferred2_0 = r0; + deferred2_1 = r1; + return getStringFromWasm0(r0, r1); + } finally { + wasm.__wbindgen_add_to_stack_pointer(16); + wasm.__wbindgen_free(deferred2_0, deferred2_1); + } +} + +async function __wbg_load(module, imports) { + if (typeof Response === 'function' && module instanceof Response) { + if (typeof WebAssembly.instantiateStreaming === 'function') { + try { + return await WebAssembly.instantiateStreaming(module, imports); + + } catch (e) { + if (module.headers.get('Content-Type') != 'application/wasm') { + console.warn("`WebAssembly.instantiateStreaming` failed because your server does not serve wasm with `application/wasm` MIME type. Falling back to `WebAssembly.instantiate` which is slower. 
Original error:\n", e); + + } else { + throw e; + } + } + } + + const bytes = await module.arrayBuffer(); + return await WebAssembly.instantiate(bytes, imports); + + } else { + const instance = await WebAssembly.instantiate(module, imports); + + if (instance instanceof WebAssembly.Instance) { + return { instance, module }; + + } else { + return instance; + } + } +} + +function __wbg_get_imports() { + const imports = {}; + imports.wbg = {}; + + return imports; +} + +function __wbg_init_memory(imports, maybe_memory) { + +} + +function __wbg_finalize_init(instance, module) { + wasm = instance.exports; + __wbg_init.__wbindgen_wasm_module = module; + cachedInt32Memory0 = null; + cachedUint8Memory0 = null; + + + return wasm; +} + +function initSync(module) { + if (wasm !== undefined) return wasm; + + const imports = __wbg_get_imports(); + + __wbg_init_memory(imports); + + if (!(module instanceof WebAssembly.Module)) { + module = new WebAssembly.Module(module); + } + + const instance = new WebAssembly.Instance(module, imports); + + return __wbg_finalize_init(instance, module); +} + +async function __wbg_init(input) { + if (wasm !== undefined) return wasm; + + if (typeof input === 'undefined') { + input = new URL('lparso_bg.wasm', import.meta.url); + } + const imports = __wbg_get_imports(); + + if (typeof input === 'string' || (typeof Request === 'function' && input instanceof Request) || (typeof URL === 'function' && input instanceof URL)) { + input = fetch(input); + } + + __wbg_init_memory(imports); + + const { instance, module } = await __wbg_load(await input, imports); + + return __wbg_finalize_init(instance, module); +} + +export { initSync } +export default __wbg_init; diff --git a/assets/resilient-parsing/lparso_bg.wasm b/assets/resilient-parsing/lparso_bg.wasm new file mode 100644 index 00000000..5c0c5e08 Binary files /dev/null and b/assets/resilient-parsing/lparso_bg.wasm differ diff --git a/assets/resilient-parsing/main.js b/assets/resilient-parsing/main.js new 
file mode 100644 index 00000000..236f1000 --- /dev/null +++ b/assets/resilient-parsing/main.js @@ -0,0 +1,15 @@ +import init, { print_syntax_tree } from "./lparso.js"; + +async function main() { + await init(); + const input = document.querySelector("#playground > .input"); + const output = document.querySelector("#playground > .output"); + + function update(text) { + output.textContent = print_syntax_tree(text); + } + input.addEventListener("input", (event) => update(event.target.value)); + update(input.textContent); +} + +main(); diff --git a/assets/zig-lsp.jpg b/assets/zig-lsp.jpg new file mode 100644 index 00000000..af821a56 Binary files /dev/null and b/assets/zig-lsp.jpg differ diff --git a/blogroll.html b/blogroll.html new file mode 100644 index 00000000..1a34153b --- /dev/null +++ b/blogroll.html @@ -0,0 +1,195 @@ + + + + + + + matklad + + + + + + + + + + + + +
+ +
+ +
+ +
+ + + + + diff --git a/css/EBGaramond-400-Italic.woff2 b/css/EBGaramond-400-Italic.woff2 new file mode 100644 index 00000000..924fe133 Binary files /dev/null and b/css/EBGaramond-400-Italic.woff2 differ diff --git a/css/EBGaramond-400-Normal.woff2 b/css/EBGaramond-400-Normal.woff2 new file mode 100644 index 00000000..88953224 Binary files /dev/null and b/css/EBGaramond-400-Normal.woff2 differ diff --git a/css/EBGaramond-700-Italic.woff2 b/css/EBGaramond-700-Italic.woff2 new file mode 100644 index 00000000..e53c9394 Binary files /dev/null and b/css/EBGaramond-700-Italic.woff2 differ diff --git a/css/EBGaramond-700-Normal.woff2 b/css/EBGaramond-700-Normal.woff2 new file mode 100644 index 00000000..9c46aca1 Binary files /dev/null and b/css/EBGaramond-700-Normal.woff2 differ diff --git a/css/JetBrainsMono-400-Normal.woff2 b/css/JetBrainsMono-400-Normal.woff2 new file mode 100644 index 00000000..fdf95dde Binary files /dev/null and b/css/JetBrainsMono-400-Normal.woff2 differ diff --git a/css/JetBrainsMono-700-Normal.woff2 b/css/JetBrainsMono-700-Normal.woff2 new file mode 100644 index 00000000..d0980761 Binary files /dev/null and b/css/JetBrainsMono-700-Normal.woff2 differ diff --git a/css/OpenSans-300-Normal.woff2 b/css/OpenSans-300-Normal.woff2 new file mode 100644 index 00000000..67a6cffc Binary files /dev/null and b/css/OpenSans-300-Normal.woff2 differ diff --git a/css/main.css b/css/main.css new file mode 100644 index 00000000..baa15eaf --- /dev/null +++ b/css/main.css @@ -0,0 +1,166 @@ +html { font-family: "EB Garamond", serif; font-size: 22px; line-height: 1.3em; } + +h1 { margin-bottom: 0.75rem; } +h2, h3 { margin-bottom: 0.5rem; } +section { margin-top: 1rem; } +p, table, ol, ul, figure, aside, dl, hr { margin-bottom: 0.5rem; } + +sup, sub { line-height: 0; } +svg.icon { width: 1rem; height: 1rem; vertical-align: middle; } + +h1, h2, h3 { + font-family: "Open Sans", sans-serif; + font-weight: 300; + color: #ba3925; + text-rendering: optimizeLegibility; + 
line-height: 1em; +} +h1, h2 { font-size: 1.5rem; } +h3 { font-size: 1.2rem;} + +section:target > :is(h1, h2, h3) { position: relative; } +section:target > :is(h1, h2, h3)::before { position: absolute; left: -1.2ch; content: "§"; } +:is(h1, h2, h3) > a { color: inherit; text-decoration:none } +:is(h1, h2, h3) > a:hover { color: inherit; text-decoration:none } + + +/* Block */ + +img, video { display: inline-block; vertical-align: middle; max-width: 100%; height: auto; } +figure > img, figure > video { display: block; margin-left: auto; margin-right: auto; } + +p { hyphens: auto; -webkit-hyphens: auto; text-align: justify; } +figure.blockquote { padding-left: 1em; border-left: 3px solid #ba3925; } +figure.blockquote > figcaption { text-align: right; } + +table { border-collapse: collapse; background: #fff; } +table, td, th { border: 1px solid #dedede; } +td, th { padding: 0.5625em 0.625em } + +ol, ul, dd { margin-left: 3ch; } +ul { list-style-type: circle;} +.roman { list-style-type: lower-roman; } + +dt { font-weight: bold; } + +aside.admn { display: flex; flex-direction: row; align-items: center; } +aside.admn > svg.icon { flex-shrink: 0; width: 2rem; height: 2rem; fill: #19407c; } +aside.admn.warn > svg.icon { fill: #ba3925; } +aside.admn > div { padding-left: 1ch; margin-left: 1ch; border-left: 1px solid #dddddf } + +aside.block { + border-style: solid; border-width: 1px; border-radius: 4px; border-color: #dbdbd6; + padding: 1em; + background: #f3f3f2; +} +aside.block > .title { + font-family: "Open Sans", sans-serif; font-size: 1.5rem; color: #7a2518; + text-align: center; + margin-top: 0; margin-bottom: 0.5rem; +} +aside.block > :last-child { margin-bottom: 0; } + +details { padding-left: 1em; border-left: 3px solid #19407c; } + +pre { line-height: 1rem;} +code { + font-family: "JetBrains Mono", monospace; font-variant-ligatures: none; font-size: 0.75em; + color: rgba(0, 0, 0, .9); +} +figcaption.title { + font-style: italic; font-weight: 400; + line-height: 
1.45; + color: #7a2518; + margin-top: 0; margin-bottom: 0.25em; +} +figure.code-block > pre > code { + display: flex; flex-direction: column; + overflow-x: auto; overflow-y: clip; + counter-reset: line; +} +figure.code-block > pre > code > .line { counter-increment: line; } +figure.code-block > pre > code > .line:before { + content: counter(line); + display: inline-block; + width: 3ch; padding-right: 0.5ch; margin-right: 1ch; + text-align: right; + opacity: .35; + border-right: 1px solid black; +} +ol.callout { list-style: none; counter-reset: callout; } +ol.callout > li { position: relative; } +ol.callout > li::before { + counter-increment: callout; content: counter(callout); + position: absolute; top: 0.2rem; left: -1.1rem; +} +i.callout::after { + content: attr(data-value); +} +ol.callout > li::before, i.callout::after { + font-family: "JetBrains Mono"; font-style: normal; font-size: 0.75rem; font-weight: bold; + display: inline-block; width: 0.9rem; height: 0.9rem; line-height: 0.9rem; + border-radius: 100%; + background-color: black; + color: white; + text-align: center; +} + +.two-col { display: flex; flex-direction: row; } +.two-col > *:first-child { flex: 30%; } +.two-col > *:last-child { flex: 30%; } + +hr { border: none; height: 0; overflow: visible; color: black; height: 1rem; } +hr::after { content: '❧'; display: block; text-align: center; } + +/* Inline */ + +p>code { white-space: nowrap; } /* Sadly, overflow-wrap: anywhere doesn't compose with this */ +.display { display: block; margin: 1em 0; text-align: center } + +a { text-decoration-color: #2156a5; color: black; } +a:hover, a:focus { color: #2156a5; fill: #2156a5; } +a.url { word-break: break-all; } + +kbd { + font-family: "JetBrains Mono", monospace; font-variant-ligatures: none; font-size: .65rem; + line-height: 1.45; +} + +kbd > kbd { + display: inline-block; + color: rgba(0, 0, 0, .8); background: #f7f7f7; border: 1px solid #ccc; border-radius: 3px; box-shadow: 0 1px 0 rgb(0 0 0 / 20%), 0 0 0 
0.1em #fff inset; + margin: 0 0.15em; padding: 0.2em 0.5em; top: -0.1em; + vertical-align: middle; position: relative; white-space: nowrap; +} + +dfn, .small-caps { font-style: normal; font-variant: small-caps; } + +.meta { display: block; display: block; color: #828282; font-family: "Open Sans", sans-serif; font-size: 1rem;} + +.menu { font-weight: bold; } + +/* Special Cases */ + +.post-list { margin-left: 0; list-style: none; } +.post-list > li { margin-top: 1em; } +.post-list h2 { margin-top: 0; } +.post-list a { color: #ba3925; text-decoration: none; display: block; } +.post-list a:hover { color: #ba3925; text-decoration: underline; } + +.about-ava { float: left; margin-right: 2ch; display: inline;} + +/* Highlighting */ + +.hl-keyword, .hl-literal { color: #000000; font-weight: bold; } +.hl-type { color: #0086B3; } +.hl-tag { color: #000080; } +.hl-title.function_ { color: #990000; font-weight: bold; } +.hl-title.class_{ color: #445588; font-weight: bold; } +.hl-comment { color: #008000; font-style: italic; } +.hl-built_in, .hl-meta { color: #3c5d5d; font-weight: bold; } +.hl-number { color: #009999; } +.hl-string { color: #d14; } +.hl-output { color: #2156a5; } +.hl-subst { color: rgba(0, 0, 0, .9); } +.hl-attr, .hl-symbol { color: #008080; } +.hl-line { background-color: #ffc; } diff --git a/css/resume.css b/css/resume.css new file mode 100644 index 00000000..e376af8a --- /dev/null +++ b/css/resume.css @@ -0,0 +1,16 @@ +@media print { + header { display: none; } + footer { display: none; } + main { display: block; } + html { font-size: 18px; } + h1 { display: none; } + .page-break { break-before: page; } + section:has(>h3) { break-inside: avoid;} +} + +h3 { + font: inherit; + font-weight: bold; + color: black; + font-size: 1.2rem; +} diff --git a/favicon.png b/favicon.png new file mode 100644 index 00000000..e3c0ed70 Binary files /dev/null and b/favicon.png differ diff --git a/favicon.svg b/favicon.svg new file mode 100644 index 00000000..6eda2dd5 --- 
/dev/null +++ b/favicon.svg @@ -0,0 +1,4 @@ + + + + diff --git a/feed.xml b/feed.xml new file mode 100644 index 00000000..3cd5c15e --- /dev/null +++ b/feed.xml @@ -0,0 +1,2885 @@ + + + + +2024-11-24T00:33:48.236Z +https://matklad.github.io/feed.xml +matklad +Yet another programming blog by Alex Kladov aka matklad. +Alex Kladov + + +SemVer Is Not About You + +2024-11-23T00:00:00+00:00 +2024-11-23T00:00:00+00:00 +https://matklad.github.io/2024/11/23/semver-is-not-about-you +Alex Kladov + +SemVer Is Not About You +

A popular genre of articles for the past few years has been a SemVer Critique, pointing out various +things that are wrong with SemVer itself, or with the way SemVer is being applied, and, customarily, +suggesting an alternative versioning scheme. Usually, the focus is either on how SemVer ought to be +used by library authors (nitpicking the definition of a breaking change), or on how SemVer is (not) +useful for a library consumer (nitpicking the definition of a breaking change).

+

I think these are valid lenses to study SemVer through, but not the most useful. This article +suggests an alternative framing: SemVer is not about you.

+

Before we begin, I would like to carefully delineate the scope. Among other things, SemVer can be +applied to, broadly speaking, applications and libraries. Applications are stand-alone software +artifacts, usable as is. Libraries exist within the larger library ecosystem, and are building +blocks for assembling applications. Libraries both depend on and are depended upon by other +libraries. In the present article, we will look only at the library side.

+
+

At first glance, it appears that SemVer solves the problem of informing the user when to do the +upgrade: upgrade patch for latest bugfixes, upgrade minor if you want new features, upgrade major if +you want new features and are ready to clean up your code. But this is not the primary value of this +versioning scheme. The real purpose of SemVer is managing transitive dependencies.

+

Let’s say you are using some version of the apples library and some version of the oranges library. And +suppose they both depend on the trees library. Because apples and oranges were authored at +different times, they do not necessarily depend on the same version of trees. There are two paths +from here.

+

The first is to include two different versions of the trees library with your app. This is unfortunate +for the trivial reason of code bloat, and for the more subtle reason of interface leaking: if for some +reason your code needs to pass a tree originating in apples over to oranges, both must use +exactly the same version of the trees library.

+

The second path is to somehow unify transitive dependencies, and pick a single version of trees +that’s good for both apples and oranges. But perhaps there isn’t a version that works for both?

+

Who’s the right person to choose the appropriate course of action? It could be you, but that’s +unfortunate: you are using libraries precisely because you want to avoid thinking too much about +their internals. You don’t know how apples is using trees. You could learn that, but, +arguably, that’s not a good tradeoff (if it is, perhaps you shouldn’t depend on apples and instead +maintain your own). What’s worse, for featureful applications dependency trees run very deep, +the potential for conflicts scales at least linearly, and there’s only a single you.

+

Another candidate is the author of the trees library: they don’t know apples and oranges +directly, but they should be thinking about how their library could be used. And, because +different libraries tend to have different authors, the work of resolving version conflicts gets +distributed across a set of people that also scales linearly!

+

This is the problem that SemVer solves: it has nothing to do with your code or your direct +dependencies, it’s all about dependencies of your dependencies. SemVer is the library maintainer +saying when two versions of their library can be unified:

+
    +
  • +If the major version is bumped, no unification happens, the library will get duplicated. +
  • +
  • +If the major version is not bumped, the versions can be unified. +
  • +
+

That’s it! That’s the whole thing! All the talk about breaking changes is downstream of this actual +behavior of version resolvers.

+
+

Notably, if you are a library maintainer, SemVer isn’t about you either. When deciding between major +and minor, you shouldn’t be thinking about your direct dependents. They knowingly use your +library, so they are capable of making informed decisions and will manage just fine. The problem is +your transitive dependents. If you release a new major version, dependencies of some application up +the stack could get wedged if somewhere in its dependency tree there are both versions of your +library which need interoperable types.

+

Or, rather, if you release a new major version, it is guaranteed that some application would have +two copies of your library. There’s no such thing as an atomic upgrade of dependencies across the +ecosystem: propagating your new major will take time, and there will be an extended period where +both majors are used, by different libraries, and both majors end up in applications’ +lockfiles. The question is rather: would this be more harmful than just code bloat? If your library +ends up in others’ public API, you will likely lock some upstream applications in a variant of the +following problem:

+
    +
  • +We need to update lemons to a new version to get access to this critical bug fix for the new MacOS +version +
  • +
  • +But lemons is an actively developed library, it upgraded to the new version of trees library +three months ago and MacOS bugfix sits on top of that version. +
  • +
+But we also use limes, which is a bit of a more niche product, and so hasn't seen an upgrade for +about a year. +
  • +
  • +And we also use the same pool of trees for both, so our latest limes prevent upgrading +lemons. +
  • +
+

It's also worth thinking about the virality of major versions: if your library is part of someone else's +public API, your major bump implies their major bump, which is of course bad because putting +work on the plate of other maintainers is bad, but, what's worse, is that this virally amplifies the +number of unsatisfiable dependency graphs à la the example above.

+
+ +

+ SemVer-- +

+

I've seen two interesting extensions to the core SemVer. One is the observation that, to make +tooling work, only two version numbers are sufficient. There's no real difference between patch +and minor, as far as the actual behavior of the version resolution algorithm goes. I am sympathetic to +this argument!

+

The second one is an observation that many projects follow the "deprecate, then remove" cycle. I've +learned this with the release of Ember 2.0. The big deal about Ember 2.0 is that the only thing +that it did was the removal of deprecation warnings. Code that didn't emit warnings on the latest +Ember 1.x was compatible with 2.0.

+

This feels like the fundamentally right way of going about the larger, more important building blocks. +And you sort-of can do this with semver today, if you declare that you are compatible with "1.9, +2.0". But, even today, many years after Ember 2.0, this still feels like a cute trick. This isn't +yet a pattern with a catchy name (like release trains or not rocket science rule) that everyone is +using because it is an obviously good idea.

+
+
+ +

+ And Now To Something Completely Different +

+

Circling back to the introduction, the general pattern here is that there's a prescriptivist +approach and a descriptivist one. Prescriptivist argues about the right and wrong ways to use a +particular tool. Descriptivist avoids value judgement, and describes how the thing actually behaves.

+

Another instance of this pattern playing out that I've noticed is log levels. You can get very +philosophical about the difference between error, warn and info. But what helps is looking at +what they do:

+
    +
  • +error pages the operator immediately. +
  • +
  • +warn pages if it repeats frequently. +
  • +
+info is what you see in the prod logs when you actively look at them. +
  • +
  • +And debug is what your developers see when they enable extra logging. +
  • +
+ +
+

давайте одевать одежду
+давайте звонит говорить
+а на прескриптивистов будем
+ложить

+
+
avva
+
+
+]]>
+
+ + +A Missing IDE Feature + +2024-10-14T00:00:00+00:00 +2024-10-14T00:00:00+00:00 +https://matklad.github.io/2024/10/14/missing-ide-feature +Alex Kladov + +A Missing IDE Feature +

Slightly unusual genre with this article, I want to try to enact a change in the world. I +believe that there is a missing IDE feature which is:

+
    +
  • +very easy to implement (these days), +
  • +
  • +is a large force multiplier for experienced users, +
  • +
  • +is conspicuously missing from almost every editor. +
  • +
+

The target audience here is anyone who can land a PR in Zed, VS Code, Helix, Neovim, Emacs, Kakoune, +or any other editor or any language server. The blog post would be a success if one of you feels +sufficiently inspired to do the thing!

+
+ +

+ The Feature +

+

Suppose you are casually reading the source code of rust-analyzer, and are curious about handling of +method bodies. There's a Body struct in the code base, and you want to understand how it is used.

+

Would you rather look at this?

+ +
+ + +
+

Or this?

+ +
+ + +
+

(The screenshots are from IntelliJ/RustRover, because of course it gets this right)

+

The second option is clearly superior: it conveys significantly more useful information in the +same amount of pixels. Function names, argument lists and return types are so much more valuable +than a body of any particular function. Especially if the function is a page-full of boilerplate +code!

+

And this is the feature I am asking for: make the code look like the second image. Or, +specifically, Fold Method Bodies by Default.

+

There are two components here. First, only method bodies are folded. This is a syntactic check — +we are not folding the second level. For code like

+ +
+ + +
fn f() { ... }
+
+impl S {
+    fn g(&self) { ... }
+}
+ +
+

Both f and g are folded, but impl S is not. Similarly, function parameters and function body +are actually on the same level of folding hierarchy, but it is imperative that parameters are not +folded. This is the part that was hard ten years ago but is easy today. "What is a function body" is a +non-trivial question, which requires proper parsing of the code. These days, either an LSP server or +Tree-sitter can answer this question quickly and reliably.
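To make the syntactic check concrete, here is a minimal sketch over a toy syntax tree. The `Item` and `LineRange` types are hypothetical stand-ins for whatever a real parser (Tree-sitter, a language server) would provide; the point is only that function bodies are collected at any nesting depth, while `impl` blocks and parameter lists are never folded themselves:

```rust
/// A span of lines in the file (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LineRange {
    start: u32,
    end: u32,
}

/// A toy syntax tree with just enough structure for the example.
enum Item {
    Function { params: LineRange, body: LineRange },
    Impl { block: LineRange, items: Vec<Item> },
}

/// Ranges to fold when a file is first opened: every function body,
/// however deeply nested, and nothing else.
fn default_folds(items: &[Item]) -> Vec<LineRange> {
    let mut folds = Vec::new();
    for item in items {
        match item {
            // Fold the body; the parameter list stays visible.
            Item::Function { body, .. } => folds.push(*body),
            // The impl block itself stays open, but we descend into it.
            Item::Impl { items, .. } => folds.extend(default_folds(items)),
        }
    }
    folds
}
```

For the snippet above, this yields the bodies of `f` and `g` and leaves `impl S` expanded.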

+

The second component of the feature is that "folded" is the default state. It is not a "fold method +bodies" action. It is a setting that ensures that, whenever you visit a new file, bodies are +folded by default. To make this work, the editor should be smart enough to seamlessly unfold a specific +function when appropriate. For example, if you "go to definition" to a function, that function +should get unfolded, while the surrounding code should remain folded.

+

Now that I have explained how the feature works, I will not try to motivate it. I think it is +pretty obvious how awesome this actually is. Code is read more often than written, and this is one +of the best multipliers for readability. Most of the code is in method bodies, but the most important +code is in function signatures. Folding bodies auto-magically hides the 80% of boring code, leaving +the most important 20%. It was in 2018 when I last used an IDE (IntelliJ) which has this implemented +properly, and I've been missing this feature ever since!

+

You might also be wondering whether it is the same feature as the Outline, that special UI which +shows a graphical, hierarchical table of contents of the file. It is true that outline and +fold-bodies-by-default attack the same issue. But I'd argue that folding solves it better. This is +an instance of a common pattern. In a smart editor, it is often possible to implement any given +feature either by lowering it to plain text, or by creating a dedicated GUI. And the lowering +approach almost always wins, because it gets to re-use all existing functionality for free. For +example, the folding approach trivially gives you an ability to move a bunch of functions from one +impl block to the other by selecting them with Shift + Down, cutting with Ctrl + X +and pasting with Ctrl + V.

+
+
+ +

+ Call to Action +

+

So, if you are a committer to one of the editors, please consider adding a "fold function bodies by +default" mode. It probably should be off by default, as it can easily scare new users away, but it +should be there for power users to enable, and it should be prominently documented, so that people +can learn that they want it. After the checkbox is in place, see if you can implement the actual +logic! If your editor uses Tree-sitter, this should be relatively easy: its syntax tree contains +all the information you need. Just make sure that:

+
    +
  • +bodies are folded when the new file is opened, +
  • +
  • +the editor unfolds them when appropriate (generally, when navigated to a function from elsewhere). +
  • +
+

If your editor is not based on Tree-sitter, you'll have a harder time. In theory, the information +should be readily available from the language server, but LSP currently doesn't expose it. Here's +the problem:

+

https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#foldingRangeKind

+

There's no body kind there! Adding it should be technically trivial, but it always is a pain to +get something into the protocol if you are not VS Code.

+
+
+ +

+ My Role +

+

What is my job here, besides sitting there and writing blog posts? I actually think that writing +this down is quite valuable!

+

I suppose the feature is still commonly missing due to a two-sided market failure: the feature +doesn't exist, so prospective users don't realize that it is possible, and don't ask editor +authors to implement it. Without users asking, editor authors themselves don't realize this feature +could exist, and don't rush implementing it. This is exacerbated by the fact that it was a hard +feature to implement ten years ago, when we didn't have Tree-sitter/LSP, so there are poor +workarounds in place: actions to fold a certain level. These workarounds then prevent the proper +feature from gaining momentum.

+

So here I hope to maybe tip the equilibrium's scale a bit, and start a feedback loop where more +people realize that they want this feature, such that it is implemented in some of the more +experimental editors, which hopefully would expose the feature to more users, popularizing it until +it gets implemented everywhere!

+

Still, just talking isn't everything I did here! Six years ago, I implemented the language-server +side of this in rust-analyzer:

+

https://github.com/rust-lang/rust-analyzer/commit/23b040962ff299feeef1f967bc2d5ba92b01c2bc

+

This currently isn't exposed to LSP, because the protocol doesn't allow flagging a folding range as a method +body. To fix that I opened this VS Code issue (LSP generally avoids doing something before VS +Code):

+

https://github.com/microsoft/vscode/issues/128912

+

And since then I have been quietly waiting for some editor (not necessarily VS Code) to pick this up. +This hasn't happened yet, hence this article!

+

Thanks for reading!

+
+]]>
+
+ + +Two Workflow Tips + +2024-10-08T00:00:00+00:00 +2024-10-08T00:00:00+00:00 +https://matklad.github.io/2024/10/08/two-tips +Alex Kladov + +Two Workflow Tips +

An article about a couple of relatively recent additions to my workflow which I wish I knew about +years ago.

+
+ +

+ Split And Go To Definition +

+

Go to definition is super useful (in general, navigation is much more important than code +completion). But often, when I use goto def I don't actually mean to permanently go there. +Rather, I want to stay where I am, but I need a bit more context about a particular thing at point.

+

What I've found works really great in this context is to split the screen in two, and issue go to +def in the split. So that you see both the original context, and the definition at the same time, +and can choose just how far you would like to go. Here's an example, where I want to understand how the +apply function works, and, to understand that, I want to quickly look up the definition of +FuzzOp:

+ +
+ + +
+

VS Code actually has a first-class UI for something like this, called Peek Definition, but it is +just not good: it opens some kind of separate pop-up, with a completely custom UX. It's much more +fruitful to compose two existing basic features: splitting the screen and going to definition.

+

Note that in the example above I do move focus to the split. I also tried a version that keeps focus +in the original split, but focusing the new one turned out to be much better. You actually don't always +know up front which split would become the main one, and moving the focus gives you flexibility of +moving around, closing the split, or closing the other split.

+

I highly recommend adding a shortcut for this action. It's a good idea to make it a "complementary" +shortcut for the usual goto definition. I use , . for goto definition, and hence , > +is the splitting version:

+ +
+ + +
{ "key": ", .",       "command": "editor.action.revealDefinition" },
+{ "key": ", shift+.", "command": "editor.action.revealDefinitionAside" },
+ +
+
+
+ +

+ , . +

+

Yes, you are reading this right. , ., that is a comma followed by a full stop, is my goto +definition shortcut. This is not some kind of evil vim mode. I use pedestrian non-modal editing, +where I copy with ctrl + c, move to the beginning of line with Home and kill a word +with ctrl + Backspace (though keys like Home, Backspace, or arrows are on my +home row thanks to kanata).

+

And yet, I use , as a first keypress in a sequence for multiple shortcuts. That is, , . is +not , + . pressed together, but rather a , followed by a separate .. So, +when I press , my editor doesn't actually type a comma, but rather waits for me to complete +the shortcut. I have many of them, with just a few being:

+
    +
  • +, . goes to definition, +
  • +
  • +, > goes to definition in a split, +
  • +
  • +, r runs a task, +
  • +
  • +, e s edits selection by sorting it, , e C converts to camelCase, +
  • +
  • +, o g opens magit for VS Code, , o k opens +keybindings. +
  • +
  • +, w re-wraps selection at 80 (something I just did to format the previous bullet point), +, p pretty-prints the whole file. +
  • +
+

I've used many different shortcut schemes, but this is by far the most convenient one for me. How do +I type a comma? I bind , Space and , Enter to insert comma and a space/newline +respectively, which handles most of the cases. And there's , , which types just a lone comma.

+

To remember longer sequences, I pair the comma with +whichkey, such that, when I type , e, what +I see is actually a menu of editing operations:

+ +
+ + +
+

This horrible idea was born in the mind of Susam Pal, and is officially (and aptly I should say) +named Devil Mode.

+

I highly recommend trying it out! It is the perfect interface for actions that you do once in a +while. Where it doesn't work is for actions you want to repeat. For example, if you want to cycle +through compilation errors, binding , e to the next error would probably be a bad +idea, as typing , e , e , e to cycle three times is quite tiring.

+

This is actually a common theme, there are many things you might want to cycle back and forward +through:

+
    +
  • +completion suggestions +
  • +
  • +compiler errors +
  • +
  • +textual search results +
  • +
  • +reference search results +
  • +
  • +merge conflicts +
  • +
  • +working tree changes +
  • +
+

It is mighty annoying to have to remember different shortcuts for all of them, isn't it? If only +there was some way to have a universal pair of shortcuts for the next/prev generalized motion...

+

The insight here is that you'd rarely need to cycle through several different categories of things +at the same time. So I bind the venerable ctrl+n and ctrl+p to repeating the last +next/prev motion. So, if the last next thing was a worktree change, then ctrl+n moves me to +the next worktree change. But if I then query the next compilation error, the subsequent +ctrl+n would continue cycling through compilation errors. To kick-start the cycle, I have a +, n hydra:

+
    +
  • +, n e next error +
  • +
  • +, n c next change +
  • +
  • +, n C next merge Conflict +
  • +
  • +, n r next reference +
  • +
  • +, n f next find +
  • +
  • +, n . previous edit +
  • +
+

I don't know if there's some existing VS Code extension to do this, I implemented this in +my personal extension.
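The state involved is tiny: the editor only has to remember which category was navigated last. Here is a minimal sketch of that idea, with hypothetical names (`Navigator`, `Target`, `Direction`); it is not the code of the extension mentioned above, and it models only the bookkeeping, not the editor integration:

```rust
/// Categories of things one can cycle through (a subset of the list above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Target {
    Error,
    Change,
    Conflict,
    Reference,
    Find,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Direction {
    Next,
    Prev,
}

/// Remembers the last explicitly requested category.
#[derive(Debug, Default)]
struct Navigator {
    last: Option<Target>,
}

impl Navigator {
    /// The `, n <key>` entry point: pick a category and move to its next item.
    fn start(&mut self, target: Target) -> (Target, Direction) {
        self.last = Some(target);
        (target, Direction::Next)
    }

    /// The `ctrl+n` / `ctrl+p` entry point: repeat the last category,
    /// or do nothing if no category was chosen yet.
    fn repeat(&self, direction: Direction) -> Option<(Target, Direction)> {
        self.last.map(|target| (target, direction))
    }
}
```

The returned pair is what an editor command would dispatch on; everything else (finding the actual next error or change) is delegated to the existing per-category commands.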

+

Hope this is useful! Now go and make a deal with the devil yourself!

+
+]]>
+
+ + +On Ousterhout's Dichotomy + +2024-10-06T00:00:00+00:00 +2024-10-06T00:00:00+00:00 +https://matklad.github.io/2024/10/06/ousterhouts-dichotomy +Alex Kladov + +On Ousterhouts Dichotomy +

Why are there so many programming languages? One of the driving reasons for this is that some +languages tend to produce fast code, but are a bit of a pain to use (C++), while others are a breeze +to write, but run somewhat slow (Python). Depending on the ratio of CPUs to programmers, one or the +other might be relatively more important.

+

But can't we just, like, implement a universal language that is convenient but slowish by default, +but allows an expert programmer to drop to a lower, more performant but harder register? I think +there were many attempts at this, and they didn't quite work out.

+

The natural way to go about this is to start from the high-level side. Build a high-level +featureful language with a large runtime, and then provide granular opt-outs of specific runtime +facilities. Two great examples here are C# and D. And the most famous example of this paradigm is +Python, with its "rewrite slow parts in C" mantra.

+

It seems to me that such an approach can indeed solve the "easy to use" part of the dichotomy, but +doesn't quite work as promised for the "runs fast" one. And here's the reason. For performance, what +matters is not so much the code that's executed, but rather the layout of objects in memory. And the +high-level dialect locks in a pointer-heavy GC object model! Even if you write your code in assembly, +the performance ceiling will be determined by all those pointers the GC needs. To actually get full +"low-level" performance, you need to effectively mirror the data across the dialects across a +quasi-FFI boundary.

+

And that's what kills the "write most of the code in Python, rewrite hot spots in C" approach: the +overhead for transitioning between the native C data structures and the Python ones tends to eat any +performance benefits that C brings to the table. There are some very real, very important +exceptions, where it is possible to batch sufficiently large packages of work to minimize the +overhead: http://venge.net/graydon/talks/VectorizedInterpretersTalk-2023-05-12.pdf. +But it seems that the average case looks more like this: +https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation.

+

And this brings me to Rust. It feels like it accidentally blundered into the space of universal +languages through the floor. There are no heavy runtime-features to opt out of in Rust. The object +model is universal throughout the language. There isn't a value-semantics/reference-semantics +dichotomy, references are first-class values. And yet:

+
    +
+There's memory safety, which removes most of the fun aspects of low-level programming. +
  • +
  • +The language didn't sleep on basic PL niceties like sum-types, generics and +"everything-is-expression". +
  • +
  • +And a healthy minority of rubyists in the community worked tirelessly to ensure that systems +programmers can have nice +things. +
  • +
+

As a result, there is a certain spectrum of Rust:

+
    +
  • +Sloppy Rust, which allocates and clones left-and-right. +
  • +
+Normal Rust, which opportunistically uses pretzels and avoids gratuitous allocations but otherwise +doesn't try to optimize anything specifically. +
  • +
  • +DoD Rust, which thinks a bit about cache-lines, packs things into arenas, uses indexes instead of +pointers with an occasional SoA and SIMD. +
  • +
  • +Crazy here-be-dragons Rust with untagged unions, unsafe, inline assembly and other wizardry. +
  • +
+
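As an illustration of the "DoD Rust" point on this spectrum, here is a minimal sketch of the struct-of-arrays layout with index handles instead of pointers. All names (`Particles`, `ParticleId`) are made up for the example, and the sketch says nothing about how much faster this is in any particular program:

```rust
/// An index into `Particles`, used instead of a pointer or reference.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ParticleId(u32);

/// Struct-of-arrays: each field lives in its own contiguous vector,
/// so a pass that only needs masses touches only `masses`.
#[derive(Default)]
struct Particles {
    xs: Vec<f32>,
    ys: Vec<f32>,
    masses: Vec<f32>,
}

impl Particles {
    /// Appends a particle and returns its index handle.
    fn push(&mut self, x: f32, y: f32, mass: f32) -> ParticleId {
        let id = ParticleId(self.masses.len() as u32);
        self.xs.push(x);
        self.ys.push(y);
        self.masses.push(mass);
        id
    }

    /// Looks up one field through the handle.
    fn mass(&self, id: ParticleId) -> f32 {
        self.masses[id.0 as usize]
    }

    /// A whole-collection pass over a single field.
    fn total_mass(&self) -> f32 {
        self.masses.iter().sum()
    }
}
```

Note that this is still entirely safe Rust; moving from the "normal" dialect to this one changes the data layout, not the language.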

While the bottom end here sits pretty comfortably next to C, the upper tip doesn't quite reach the +usability level of Python. But this is mostly compensated through these three effects:

+
    +
+Unified object model ensures that there's no performance tax and little ceremony when going up and +down the performance-sloppiness spectrum. +
  • +
  • +Unsafe abstractions +not only allow an expert programmer to write optimal code, but, crucially, they allow wrapping it +into a misuse-resistant interface, which a non-expert programmer can easily use from a high-level +Rust dialect. +
  • +
  • +Performance option is quite an unfair advantage. When you start writing something, you don't +necessarily know how fast the thing would have to be. It often depends on the uncertain future. But, +if you can sacrifice just a tiny bit of developer experience to get an insurance that, if push +comes to shove, you could incrementally arrive at the optimal performance without whole-system +rewrites, that is often a hard-to-refuse offer. +
  • +
+]]>
+
+ + +The Watermelon Operator + +2024-09-24T00:00:00+00:00 +2024-09-24T00:00:00+00:00 +https://matklad.github.io/2024/09/24/watermelon-operator +Alex Kladov + +The Watermelon Operator +

In these two most excellent articles, +https://without.boats/blog/let-futures-be-futures +and +https://without.boats/blog/futures-unordered, +withoutboats introduces the concepts of multi-task and intra-task concurrency. +I want to revisit this distinction: while I agree that there are different classes +of patterns of concurrency here, I am not quite satisfied with this specific partitioning of the +design space. I will use Rust-like syntax for most of the examples, but I am more interested in the +language-agnostic patterns, rather than in Rust's specific implementation of async.

+
+ +

+ The Two Examples +

+

Let's introduce the two kinds of concurrency using a somewhat abstract example. We want to handle a +Request by doing some computation and then persisting the results in the database and in the cache. +Notably, writes to the cache and to the database can proceed concurrently. So, something like this:

+ +
+ + +
async fn process(
+  db: Database,
+  cache: Cache,
+  request: Request,
+) -> Response {
+  let response = compute_response(db, cache, request).await;
+  spawn(update_db(db, response));
+  spawn(update_cache(cache, response));
+  response
+}
+
+async fn update_db(db: Database, response: Response);
+async fn update_cache(cache: Cache, response: Response);
+
+fn spawn<T>(f: impl Future<Output = T>) -> JoinHandle<T>;
+ +
+

This is multi-task concurrency style: we fire off two tasks for updating the database and the +cache. Here's the same snippet in intra-task style, where we use the join function on futures:

+ +
+ + +
async fn process(
+  db: Database,
+  cache: Cache,
+  request: Request,
+) -> Response {
+  let response = compute_response(db, cache, request).await;
+  join(
+    update_db(db, response),
+    update_cache(cache, response),
+  ).await;
+  response
+}
+
+async fn update_db(db: Database, response: Response) { ... }
+async fn update_cache(cache: Cache, response: Response) { ... }
+
+async fn join<U, V>(
+  f: impl Future<Output = U>,
+  g: impl Future<Output = V>,
+) -> (U, V);
+ +
+

In other words:

+

Multi-task concurrency uses spawn: an operation that takes a future and starts a task that +executes independently of the parent task.

+

Intra-task concurrency uses join: an operation that takes a pair of futures and executes them +concurrently as a part of the current task.

+

But what is the actual difference between the two?

+
+
+ +

+ Parallelism is not +

+

One candidate is parallelism: with spawn, the tasks can run not only concurrently, but actually +in parallel, on different CPU cores. join restricts them to the same thread that runs the main +task. But I think this is not quite right, abstractly, and is more of a product of specific Rust +APIs. There are executors which spawn on the current thread only. And, while in Rust it's not +really possible to make join poll the futures in parallel, I think this is just an artifact of +Rust's existing API design (futures can't opt out of synchronous cancellation). In other words, I +think it is possible in theory to implement an async runtime which provides all of the following +functions at the same time:

+ +
+ + +
fn spawn<F>(fut: F) -> JoinHandle<Output = F::Output>
+where
+  F: Future;
+
+fn pspawn<F>(fut: F) -> PJoinHandle<Output = F::Output>
+where
+  F: Future + Send + 'static,
+  F::Output: Send + 'static;
+
+async fn join<F1, F2>(
+  fut1: F1,
+  fut2: F2,
+) -> (F1::Output, F2::Output)
+where
+  F1: Future,
+  F2: Future;
+
+async fn pjoin<F1, F2>(
+  fut1: F1,
+  fut2: F2,
+) -> (F1::Output, F2::Output)
+where
+  F1: Future + Send, // NB: only Send, no 'static
+  F1::Output:  Send,
+  F2: Future + Send,
+  F2::Output:  Send;
+ +
+

To confuse matters further, let's rewrite our example in TypeScript:

+ +
+ + +
async function process(
+  db: Database,
+  cache: Cache,
+  request: Request,
+): Promise<Response> {
+  const response = await compute_response(db, cache, request);
+  const db_update = update_db(db, response);
+  const cache_update = update_cache(cache, response);
+  await Promise.all([db_update, cache_update]);
+  return response
+}
+ +
+

and using Rust's rayon library:

+ +
+ + +
fn process(
+  db: Database,
+  cache: Cache,
+  request: Request,
+) -> Response {
+  let response = compute_response(db, cache, request);
+  rayon::join(
+    || update_db(db, response),
+    || update_cache(cache, response),
+  );
+  response
+}
+ +
+

Are these examples multi-task or intra-task? To me, the TypeScript one feels multi-task: although +it is syntactically close to join().await, the two update promises are running independently from +the parent task. If we forget the call to Promise.all, the cache and the database would still get +updated (but likely after we would have returned the response to the user)! In contrast, rayon +feels intra-task: although the closures could get stolen and be run by a different thread, they +won't escape the dynamic extent of the encompassing process call.
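The "can run on another thread, but cannot escape the caller's dynamic extent" property is not specific to rayon. Here is a minimal sketch of the same shape using only the standard library's scoped threads; `pjoin` and `sum_halves` are names invented for this example, and spawning an OS thread per call is far cruder than rayon's work stealing:

```rust
use std::thread;

/// Runs two closures in parallel and returns both results.
/// Neither closure can outlive this call, so they may borrow local data.
fn pjoin<A, B, RA, RB>(a: A, b: B) -> (RA, RB)
where
    A: FnOnce() -> RA + Send,
    B: FnOnce() -> RB + Send,
    RA: Send,
    RB: Send,
{
    thread::scope(|s| {
        // `a` runs on a freshly spawned scoped thread...
        let handle = s.spawn(a);
        // ...while `b` runs on the current thread.
        let rb = b();
        let ra = handle.join().expect("first closure panicked");
        (ra, rb)
    })
}

/// Usage: both closures borrow from `data`, with no `'static` bound needed.
fn sum_halves(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    let (l, r) = pjoin(|| left.iter().sum::<u64>(), || right.iter().sum::<u64>());
    l + r
}
```

The scope guarantees that the spawned thread is joined before `pjoin` returns, which is exactly why borrowing from the caller's stack frame is allowed here.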

+
+
+ +

+ To await or await to? +

+

Let's zoom in on the JS and the join examples:

+ +
+ + +
async function process(
+  db: Database,
+  cache: Cache,
+  request: Request
+): Promise<Response> {
+  const response = await compute_response(db, cache, request);
+
+  await Promise.all([
+    update_db(db, response),
+    update_cache(cache, response),
+  ]);
+
+  return response;
+}
+ +
+ +
+ + +
async fn process(
+  db: Database,
+  cache: Cache,
+  request: Request,
+) -> Response {
+  let response = compute_response(db, cache, request).await;
+
+  join(
+    update_db(db, response),
+    update_cache(cache, response),
+  ).await;
+
+  response
+}
+ +
+

I've re-written the JavaScript version to be syntactically isomorphic to the Rust one. The +difference is on the semantic level: JavaScript promises are eager, they start executing as soon as +a promise is created. In contrast, Rust futures are lazy: they do nothing until polled. And +this I think is the fundamental difference, it is lazy vs. eager futures (thread::spawn is an +eager future while rayon::join is a lazy one).
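Laziness is easy to observe with nothing but the standard library. The sketch below creates a future, checks that its body has not run, polls it once by hand, and checks again. The `NoopWaker` and `poll_once` helpers are scaffolding written for this example, not part of any runtime:

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough for polling by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a pinned future exactly once.
fn poll_once<F: Future>(fut: Pin<&mut F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    fut.poll(&mut cx)
}

/// Returns whether the future's body had run (before polling, after polling).
fn lazy_demo() -> (bool, bool) {
    let ran = Cell::new(false);
    // Creating the future does not execute any of its body.
    let mut fut = Box::pin(async {
        ran.set(true);
    });
    let before = ran.get();
    // Only polling drives the body forward.
    let _ = poll_once(fut.as_mut());
    let after = ran.get();
    (before, after)
}
```

The JavaScript analogue would observe the side effect right after the promise is created, without any `await`.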

+

And it seems that lazy semantics is quite a bit more elegant! The beauty of

+ +
+ + +
join(
+  update_db(db, response),
+  update_cache(cache, response),
+).await;
+ +
+

is that it's Molière's prose: this is structured concurrency, but without bundles, nurseries, +scopes, and other weird APIs.

+

It makes runtime semantics nicer even in dynamically typed languages. In JavaScript, forgetting an +await is a common and very hard to spot problem: without await, code still works, but is +sometimes wrong (if the async operation doesn't finish quite as fast as usual). Imagine JS with +lazy promises: there, forgetting an await would always consistently break. So, the need to +statically lint missing awaits will be less pressing.

+

Compare this with Erlang's take on nulls: while in typical dynamically typed languages partial +functions can return a value T or a None, in Erlang the convention is to return either {ok, T} +or none. That is, even if the value is non-null, the call-site is forced to unpack it, you can't +write code that happens to work as long as T is non-null.

+

And of course, in Rust, the killer feature of lazy futures is that you can just borrow data from the +enclosing scope.

+

But it seems like there is one difference between multi-task and intra-task concurrency.

+
+
+ +

+ One, Two, N, and More +

+

In the words of withoutboats:

+ +
+

The first limitation is that it is only possible to achieve a static arity of concurrency with +intra-task concurrency. That is, you cannot join (or select, etc) an arbitrary number of futures +with intra-task concurrency: the number must be fixed at compile time.

+
+ +
+

That is, you can do +join(a, b).await, +and

+ +
+ + +
join(
+  join(a, b),
+  c,
+).await
+ +
+

and, with some macros, even

+ +
+ + +
join!(a, b, c, d, e, f).await;
+ +
+

but you can't do join(xs...).await.

+

I think this is incorrect, in a trivial and in an interesting way.

+

The trivial incorrectness is that there's join_all, which takes a slice of futures and is a direct +generalization of join to a runtime-variable number of futures.

+

But join_all still can't express the case where you don't know the number of futures up-front, +where you spawn some work, and only later realize that you need to spawn some more.

+

This is sort-of possible to express with FuturesUnordered, but that's a yuck API. I mean, even +its name screams "DO NOT USE ME!".

+

But I do think that this is just an unfortunate API, and that the pattern actually can be expressed +in intra-task concurrency style nicely.

+

Let's take a closer look at the base case, join!

+
+
+ +

+ Asynchronous Semicolon +

+

Section title is a bit of a giveaway. The join operator is async ;. The semicolon is an +operator of sequential composition: +A; B

+

runs A first and then B.

+

In contrast, join is concurrent composition: +join(A, B)

+

runs A and B concurrently.

+

And both join and ; share the same problem: they can compose only a finite number of things.

+

But that's why we have other operators for sequential composition! If we know how many things we +need to run, we can use a counted for loop. And join_all is an analogue of a counted for loop!

+

In the case where we don't know up-front when to stop, we use a while. And this is exactly what we +miss: there's no concurrently-flavored while operator.

+

Importantly, what we are looking for is not an async for:

+ +
+ + +
async for x in iter {
+  process(x).await;
+}
+ +
+

Here, although there could be some concurrency inside a single loop iteration, the iterations +themselves are run sequentially. The second iteration starts only when the first one finished. +Pictorially, this looks like a spiral, or a loop if we look from the side:

+ +
+ + +
+

What we rather want is to run many copies of the body concurrently, something like this:

+ +
+ + +
+

A spindle-like shape with many concurrent strands, which looks like a wheel's spokes from the side. +Or, if you are really short on fitting metaphors:

+
+
+ +

+ The Watermelon Operator +

+

Now, I understand that I've already poked fun at the unfortunate FuturesUnordered name, but I can't +really find a fitting name for the construct we want here. So I am going to boringly use the +concurrently keyword, which is way too long, but I'll refer to it as "the watermelon operator". +The stripes on the watermelon resemble the independent strands of execution this operator creates:

+ +
+ +wikipedia watermelons +
+

So, if you are writing a TCP server, your accept loop could look like this:

+ +
+ + +
concurrently let Some(socket) = listener.accept().await in {
+  handle_connection(socket).await;
+}.await
+ +
+

This runs accept in a loop, and, for each accepted socket, runs handle_connection concurrently. +There are as many concurrent handle_connection calls as there are ready sockets in our listener!

+

Let's limit the maximum number of concurrent connections, to provide back pressure:

+ +
+ + +
let semaphore = Semaphore::new(16);
+
+concurrently
+  let Some((socket, permit)) = try {
+    let permit = semaphore.acquire().await;
+    let socket = listener.accept().await?;
+    (socket, permit)
+  }
+in {
+  handle_connection(socket).await;
+  drop(permit);
+}.await
+ +
+

You get the idea (hopefully):

+
    +
  • +In the head of our concurrent loop (cooloop?) construct, we first acquire a semaphore permit +and then fetch a socket. +
  • +
  • +Both the socket and the permit are passed to the body. +
  • +
  • +The body releases the permit at the end. +
  • +
  • +While the head construct runs in a loop concurrently to bodies, it is throttled by the minimum +of the available permits and ready connections. +
  • +
+

To make this more concrete, let's spell this out as a library +function:

+ +
+ + +
async fn join<F1, F2>(
+  fut1: F1,
+  fut2: F2,
+) -> (F1::Output, F2::Output)
+where
+  F1: Future,
+  F2: Future;
+
+async fn join_all<F>(futs: Vec<F>) -> Vec<F::Output>
+where
+  F: Future;
+
+async fn concurrently<C, FC, B, FB, T>(condition: C, body: B)
+where
+  C: FnMut() -> FC,
+  FC: Future<Output = Option<T>>,
+  B: FnMut(T) -> FB,
+  FB: Future<Output = ()>;
+ +
+

I claim that this is the full set of primitive operations needed to express +more-or-less everything in intra-task concurrency style.
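To give a feel for what `concurrently` does at runtime, here is a minimal sketch built on the standard library alone. It is an illustration under assumptions, not the implementation the post has in mind: it heap-allocates every future with `Box::pin`, re-polls every body on each wake-up (which a real `FuturesUnordered`-style structure avoids), and the `block_on` helper with its `NoopWake` waker is a hypothetical busy-polling executor added only so the sketch can be driven.

```rust
use std::future::{poll_fn, Future};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Sketch of the watermelon operator: keep asking `condition` for the next
/// item, start `body(item)` for each one, and poll all started bodies
/// concurrently inside the current task.
pub async fn concurrently<C, FC, B, FB, T>(mut condition: C, mut body: B)
where
    C: FnMut() -> FC,
    FC: Future<Output = Option<T>>,
    B: FnMut(T) -> FB,
    FB: Future<Output = ()>,
{
    // The "head" future produces the next item; `None` means the loop is over.
    let mut head: Option<Pin<Box<FC>>> = Some(Box::pin(condition()));
    let mut bodies: Vec<Pin<Box<FB>>> = Vec::new();
    poll_fn(move |cx| {
        // Start a new body for every item the head can produce right now.
        while let Some(fut) = head.as_mut() {
            let polled = fut.as_mut().poll(cx);
            match polled {
                Poll::Ready(Some(item)) => {
                    bodies.push(Box::pin(body(item)));
                    head = Some(Box::pin(condition()));
                }
                Poll::Ready(None) => head = None,
                Poll::Pending => break,
            }
        }
        // Poll every live body once and drop the ones that have finished.
        bodies.retain_mut(|fut| fut.as_mut().poll(cx).is_pending());
        if head.is_none() && bodies.is_empty() {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    })
    .await
}

/// Hypothetical waker that does nothing; `block_on` below simply re-polls.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical busy-polling executor, just enough to drive the sketch.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

Because both closures run inside the caller's task, they are free to borrow local variables, which is the property the rest of the argument leans on.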

+

In particular, we can implement multi-task concurrency this way! To do so, we'll write a universal +watermelon operator, where the T which is passed to the body is a +Box<dyn Future<Output=()>>, +and where the body just runs this future:

+ +
+ + +
async fn multi_task_concurrency_main(
+  spawn: impl Fn(impl Future<Output = ()> + 'static),
+) {
+    ...
+}
+
+type AnyFuture = Box<dyn Future<Output = ()> + 'static>;
+
+async fn universal_watermelon() {
+  let (sender, receiver) = channel::<AnyFuture>();
+  join(
+    multi_task_concurrency_main(move |fut| {
+      sender.send(Box::new(fut))
+    }),
+    concurrently(
+      || async {
+        receiver.recv().await
+      },
+      |fut| async {
+        fut.await;
+      },
+    ),
+  )
+  .await;
+}
+ +
+

Note that the conversion in the opposite direction is not possible! With intra-task concurrency, we +can borrow from the parent stack frame. So it is not a problem to restrict that to only allow +'static futures into the channel. In a sense, in the above example we return the future up the +stack, which explains why it can't borrow locals from our stack frame.

+

With multi-task concurrency though, we start with static futures. To let them borrow any stack data +requires unsafe.

+

Note also that the above set of operators, join, join_all, concurrently is orthogonal to +parallelism. Alongside those operators, there could exist pjoin, pjoin_all and pconcurrently +with the Send bounds, such that you could mix and match parallel and single-core concurrency.

+
+
+ +

+ If a Stack is a Tree, Does it Make Any Difference? +

+

One possible objection to the above framing of watermelon as a language-level operator is that it +seemingly doesn't pass the zero-cost abstraction test. It can start an unbounded number of futures, +and those futures have to be stored somewhere. So we have a language operator which requires +dynamic memory allocation, which is a big no-no for any systems programming language.

+

I think there is some truth to it, and not an insignificant amount of it, but I think I can maybe +weasel out of it.

+

Consider recursion. Recursion also can allocate an arbitrary amount of memory (on the stack), but +that is considered fine (I would also agree that it is not in fact fine that unbounded recursion +is considered fine, but, for the scope of this discussion, I will be a hypocrite and will ignore +that opinion of mine).

+

And here, we have essentially the same situation: we want to allocate arbitrarily many (async) +stack frames, arranged in a tree. Doing it on the heap is easy, but we don't like the heap here. +Luckily, I believe there's a compilation scheme (hat tip to @rpjohnst +for patiently explaining it to me five times in different words) that implements this more-or-less +as efficiently as the normal call stack.

+

The idea is that we will have two stacks: a sync one and an async one. Specifically:

+
    +
  • +Every sync function we compile normally, with a single stack. +
  • +
  • +Async functions get two stack pointers. So, we burn sp and one other register +(let's call it asp). +
  • +
  • +If an async function calls a sync function, the callee's frame is pushed onto sp. +Crucially, because sync functions can only call other sync functions, the callee doesn't need +to know the value of asp. +
  • +
  • +If an async function calls another async function, the frame (specifically, the part of it that holds the variables live +across await points) is pushed onto asp. +
  • +
  • +This async stack is segmented. So, for async function calls, we also do a check for "do we have +enough stack?" and, if not, allocate a new segment, linking them via a frame pointer. +
  • +
  • +"Allocating a new segment" doesn't mean that we actually go and call malloc. Rather, there's a +fixed-sized contiguous slab of, say, 8 megs, out of which all async frames are allocated. +
  • +
  • +If we are out of async-stack, we crash in pretty much the same way as for the boring sync stack +overflow. +
  • +
+

While this looks just like Go-style segmented stacks, I think this scheme is quite a bit more +efficient (warning: I in general have a tendency to confidently talk about things I know little +about, and this one is the extreme case of that. If some Go compiler engineer disagrees with me, I +am probably in the wrong!).

+

The main difference is that the distinction between sync and async functions is maintained in the +type system. There are no changes for sync functions at all, so the principle of "don't pay for +what you don't use" is observed. This is in contrast to Go: I believe that Go, in general, can't +know whether a particular function can yield (that is, if any function it (indirectly) calls can +yield), so it has to conservatively insert stack checks everywhere.

+

Then, even the async stack frames don't have to store everything, but just the stuff live across +await. Everything that happens between two awaits can go to the normal stack.

+

On top of that, async functions can still do aggressive inlining. So, the async call (and the stack +growth check) has to happen only for dynamically dispatched async calls!

+

Furthermore, the future trait could have some kind of size_hint method, which returns the lower +and the upper bound on the size of the stack. Fully concrete futures type-erased to dyn Future +would return the exact amount (a, Some(a)). The caller would be required to allocate at least +a bytes of the async stack. The callee uses that contract to elide stack checks. Unknown bound, +(a, None) would only be returned if type-erased concrete future itself calls something +dynamically dispatched. So only dynamically dispatched calls would have to do stack grow checks, and +that cost seems negligible in comparison to the cost of missing optimizations due to inability to +inline.

+

Altogether, it feels like this adds up to something sufficiently cheap to just call it async stack +allocation.

+

I guess that's all for today? Summarizing:

+
    +
  • +Inter-task vs intra-task distinction is mostly orthogonal to the question of parallelism. +
  • +
  • +I claim that this is the same distinction as between eager and lazy futures. +
  • +
  • +In particular, there are no principled obstacles for runtime-bounded intra-task concurrency. +
  • +
  • +But we do miss FuturesUnordered, but nice. The concurrently operator/function feels like a +sufficiently low-hanging watermelon here. +
  • +
  • +One wrinkle is that watermelon requires dynamic allocation, but it looks like we could just +completely upend the compilation strategy we use for futures, implement async segmented stacks +which should be pretty fast, and also gain nice dynamically dispatched (and recursive) async +functions for free? +
  • +
+
+

Haha, just kidding! Bonus content! This really should be a separate blog post, but it is +tangentially related, so here we go:

+
+
+ +

+ Applied Duality +

+

So far, we've focused on join, the operator that takes two futures, and runs them concurrently, +returning both results as a pair. But there's a second, dual operator:

+ +
+ + +
async fn race<F1, F2>(
+  fut1: F1,
+  fut2: F2,
+) -> Either<F1::Output, F2::Output>
+where
+  F1: Future,
+  F2: Future,
+ +
+

Like join, race runs two futures concurrently. Unlike join, it returns only one result — +that which came first. This operator is the basis for a more general select facility.

+

Although race is dual to join, I don't think it is as fundamental. It is possible to have two +dual things, where one of them is in the basis and the other is derived. For example, it is an axiom +of set theory that the union of two sets, A ∪ B, is a set. Although the intersection of sets, +A ∩ B, is dual to union, the existence of intersection is not an axiom. Rather, the intersection +is defined using the axiom of specification:

+ +
+ + +
A ∩ B := {x ∈ A : x ∈ B}
+ +
+

Proposition 131.7.1: race can be defined in terms of join

+

The race operator is trickier than it seems. Yes, it returns the result of the future that +finished first, but what happens with the other one? It gets cancelled. Rust implements this +cancellation for free, by just dropping the future, but this is restrictive. This is precisely the +issue that prevents pjoin from working.

+

I postulate that fully general cancellation is an asynchronous protocol:

+
    +
  1. +A requests that B is cancelled. +
  2. +
  3. +B receives this cancellation request and starts winding down. +
  4. +
  5. +A waits until B is cancelled. +
  6. +
+

That is, cancellation is not "I cancel thou". Rather it is "I ask you to stop, and then I +cooperatively wait until you do so". This is very abstract, but the following three examples should +help make this concrete.

+
    +
  1. +

    A is some generic asynchronous task, which offloads some computation-heavy work to a CPU pool. +That work (B) doesn't have checks for cancelled flags. So, if A is cancelled, it can't really +stop B, which means we are violating structured concurrency.

    +
  2. +
  3. +

    A is doing async IO. Specifically, A uses io_uring to read data from a socket. A owns a buffer, +and passes a pointer to it to the kernel via io_uring as the target buffer for a read +syscall. While A is being cancelled, the kernel writes data to this buffer. If A doesn't wait +until the kernel is done, the buffer's memory might get reused, and the kernel would corrupt some +unrelated data.

    +
  4. +
+

These examples are somewhat unsatisfactory: the first is philosophical (who needs structured +concurrency?), while the second is esoteric (who uses io_uring in 2024?). But the two can be combined into +something rather pedestrianly bad:

+

As in the first example, an async task A submits some work to a CPU pool. But this time the work is very +specific: computing a cryptographic checksum of a message owned by A. Because this is +cryptography, this is going to be some hyper-optimized SIMD loop which definitely won't have any +affordance for checking some sort of a cancelled flag. The loop would have to run to completion, +or at least to a safe point. And, because the loop checksums data owned by A, we can't destroy A +before the loop exits, otherwise it'll be reading garbage memory!

+

And this example is the reason why

+ +
+ + +
async fn pjoin<F1, F2>(
+  fut1: F1,
+  fut2: F2,
+) -> (F1::Output, F2::Output)
+where
+  F1: Future + Send,
+  F1::Output:  Send,
+  F2: Future + Send,
+  F2::Output:  Send,
+ +
+

can't be a thing in Rust: if fut1 runs on a thread separate from the pjoin future, then, if +pjoin ends up being cancelled, fut1 would be pointing at garbage. You could have

+ +
+ + +
async fn pjoin<F1, F2>(
+  fut1: F1,
+  fut2: F2,
+) -> (F1::Output, F2::Output)
+where
+  F1: Future + Send + 'static,
+  F1::Output:  Send + 'static,
+  F2: Future + Send + 'static,
+  F2::Output:  Send + 'static,
+ +
+

but that removes one of the major benefits of the intra-task style API: the ability to just borrow data.

+

So the fully general cancellation should be cooperative. Let's assume that it is driven by some sort +of cancellation token API:

+ +
+ + +
impl CancellationSource {
+  fn request_cancellation(&self) { ... }
+  async fn await_cancellation(self) { ...  }
+
+  async fn cancel(self) {
+    self.request_cancellation();
+    self.await_cancellation().await;
+  }
+
+  fn new_token(&self) -> CancellationToken { ... }
+}
+
+impl CancellationToken {
+  fn is_cancelled(&self) -> bool { ... }
+  fn on_cancelled(&self, callback: impl FnOnce()) { ... }
+}
+ +
+

Note that the question of cancellation being cooperative is orthogonal to the question of explicit +threading of cancellation tokens! They can be threaded implicitly (cooperative, implicit +cancellation is how Python's trio does this, though they don't really document the cooperative part +(the shields stuff)).
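To make the "run to a safe point" idea concrete, here is a small synchronous sketch of the token half of such an API, assuming the flag is just a shared `AtomicBool`. The type and method names mirror the outline above, but the bodies, the `checksum` function, its 4096-byte chunk size and the toy mixing formula are all made up for illustration; a real cryptographic loop would look nothing like this, and the async `await_cancellation` part is left out.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical source side: can only request cancellation.
#[derive(Default)]
pub struct CancellationSource {
    flag: Arc<AtomicBool>,
}

/// Hypothetical token side: can only observe the request.
#[derive(Clone)]
pub struct CancellationToken {
    flag: Arc<AtomicBool>,
}

impl CancellationSource {
    pub fn request_cancellation(&self) {
        self.flag.store(true, Ordering::Release);
    }

    pub fn new_token(&self) -> CancellationToken {
        CancellationToken { flag: Arc::clone(&self.flag) }
    }
}

impl CancellationToken {
    pub fn is_cancelled(&self) -> bool {
        self.flag.load(Ordering::Acquire)
    }
}

/// Toy checksum over borrowed data. The hot inner loop never looks at the
/// token; the check happens only between chunks, which play the role of
/// safe points. Returns `None` if cancellation was observed.
pub fn checksum(data: &[u8], token: &CancellationToken) -> Option<u64> {
    let mut sum = 0u64;
    for chunk in data.chunks(4096) {
        if token.is_cancelled() {
            return None;
        }
        for &byte in chunk {
            sum = sum.wrapping_mul(31).wrapping_add(u64::from(byte));
        }
    }
    Some(sum)
}
```

The owner of `data` still has to wait for `checksum` to return before freeing the buffer; the token only shortens that wait, it does not remove it.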

+

With this, we can write our own race: we'll create a cancellation scope and then join +modified futures, each of which would cancel the other upon completion:

+ +
+ + +
async fn race<U, V>(
+  fut1: impl async FnOnce(&CancellationToken) -> U,
+  fut2: impl async FnOnce(&CancellationToken) -> V,
+) -> Either<U, V> {
+  let source = CancellationSource::new();
+  let token = source.new_token();
+  let u_or_v = join(
+    async {
+      let u = fut1(&token).await;
+      if token.is_cancelled() {
+        return None;
+      }
+      // Only request cancellation: the enclosing join already waits for the other future.
+      source.request_cancellation();
+      Some(u)
+    },
+    async {
+      let v = fut2(&token).await;
+      if token.is_cancelled() {
+        return None;
+      }
+      source.request_cancellation();
+      Some(v)
+    },
+  )
+  .await;
+  match u_or_v {
+    (Some(u), None) => Left(u),
+    (None, Some(v)) => Right(v),
+    _ => unreachable!(),
+  }
+}
+ +
+

In other words, race is but a cooperatively-cancelled join!

+

That's all for real for today, viva la vida!

+
+]]>
+
+ + +What is io_uring? + +2024-09-23T00:00:00+00:00 +2024-09-23T00:00:00+00:00 +https://matklad.github.io/2024/09/23/what-is-io-uring +Alex Kladov + +What is io_uring? +

An attempt at concise explanation of what io_uring is.

+

io_uring is a new Linux kernel interface for making system calls. +Traditionally, syscalls are submitted to the kernel individually and +synchronously: a syscall CPU instruction transfers control from the +application to the kernel; control returns to the application only when the +syscall is completed. In contrast, io_uring is a batched and asynchronous +interface. The application submits several syscalls by writing their codes & +arguments to a lock-free shared-memory ring buffer. The kernel reads the +syscalls from this shared memory and executes them at its own pace. To +communicate results back to the application, the kernel writes the results to a +second lock-free shared-memory ring buffer, where they become available to the +application asynchronously.

+

You might want to use io_uring if:

+
    +
  • +you need extra performance unlocked by amortizing userspace/kernelspace +context switching across entire batches of syscalls, +
  • +
  • +you want a unified asynchronous interface to the entire system. +
  • +
+

You might want to avoid io_uring if:

+
    +
  • +you need to write portable software, +
  • +
  • +you want to use only old, proven features, +
  • +
  • +and in particular you want to use features with a good security track record. +
  • +
+]]>
+
+ + +Try to Fix It One Level Deeper + +2024-09-06T00:00:00+00:00 +2024-09-06T00:00:00+00:00 +https://matklad.github.io/2024/09/06/fix-one-level-deeper +Alex Kladov + +Try to Fix It One Level Deeper +

I had a productive day today! I did many different and unrelated things, but they all had the same +unifying theme:

+

There's a bug! And it is sort-of obvious how to fix it. But if you don't laser-focus on that, and +try to perceive the surrounding context, it turns out that the bug is valuable, and it is pointing +in the direction of a bigger related problem. So, instead of fixing the bug directly, a detour is +warranted to close off the avenue for a class of bugs.

+

Here are the examples!

+

In the morning, my colleague pointed out that we are giving a substandard error message for a pretty +stressful situation when the database runs out of disk space. I went ahead and added appropriate log +messages to make it clearer. But then I stopped for a moment and noticed that the problem is bigger: +we are missing an infrastructure for fatal errors, and NoSpaceLeft is just one instance of that kind. So I +went ahead and added that along the way: +#2289.

+

Then, I was reviewing a PR by @martinconic which was fixing some typos, and noticed that it was +also changing the formatting of our Go code. The latter is by far the bigger problem, as it is a +sign that we somehow are not running gofmt during our CI, which I fixed in +#2287.

+

Then, there was a PR from yesterday, where we again had a not quite right log message. The cause was +a confusion between two compile-time configuration parameters, which were close, but not quite +identical. So, instead of fixing the error message I went ahead and made the two parameters +exactly the same. But then my colleague noticed that I actually failed to fix it one level deeper +in this case! Turns out, it is possible to remove this compile-time parametrization altogether, +which I did in #2292.

+

But these all were randomly-generated side quests. My intended story line for today was to refactor +the piece of code I had trouble explaining (and understanding!) on yesterday's +episode +of Iron Beetle. To get into the groove, I decided to first refactor the code that calls the +problematic piece of logic, as I noticed a couple of minor stylistic problems there. Of course, when +doing that, I discovered that we have a bit of dead code, which luckily doesn't affect correctness, +but does obscure the logic. While fixing that, I used one of my favorite Zig patterns: +defer assert(postcondition);

+

It of course failed in the simulator in the way postcondition checks tend to fail: there was an +unintended reentrancy in the code. So I slacked my colleague something like

+ +
+

I thought myself to be so clever adding this assert, but now it fails and I have to fix it TT +I think I'll just go and .next_tick the prefetch path. It feels like there should be a more +elegant solution here, but I am not seeing it.

+
+ +
+

But of course I can't just go and .next_tick it, so here I am, trying to figure out how to +encode a Duff's device in Zig +pre-#8220, so as to make this class of issues much +less likely.

+]]>
+
+ + +The Fundamental Law Of Software Dependencies + +2024-09-03T00:00:00+00:00 +2024-09-03T00:00:00+00:00 +https://matklad.github.io/2024/09/03/the-fundamental-law-of-dependencies +Alex Kladov + +The Fundamental Law Of Software Dependencies + +
+

Canonical source code for software should include checksums of the content of all its +dependencies.

+
+ +
+

Several examples of the law:

+

Software obviously depends on its source code. The law says that something should hold the hash of +the entire source, and thus mandates the use of a content-addressed version control system such as +git.

+

Software often depends on 3rd party libraries. These libraries could in turn depend on other +libraries. It is imperative to include a lockfile that covers this entire set and comes with +checksums. Curiously, the lockfile itself is a part of source code, and gets mixed into the VCS +root hash.

+

Software needs a compiler. The hash of the required compiler should be included in the lockfile. +Typically, this is not done: only the version is specified. I think that is a mistake. Specifying +a version and a hash is not much more trouble than just the version, but that gives you a superpower: +you no longer need to trust the party that distributes your compiler. You could take a shady +blob of bytes you've found lying on the street, as long as its checksum checks out.

+

Note that you can compress hashes by mixing them. For the compiler use-case, there's a separate hash per +platform, because the Linux and the Windows versions of the compiler differ. This doesn't mean that +your project should include one compiler's hash per platform, one hash is enough. The compiler +distribution should include a manifest: a small text file which lists all platforms and their +platform-specific hashes. The single hash of that file is what is to be included by downstream +consumers. To verify a specific binary, the consumer first downloads the manifest, checks that it +has the correct hash, and then extracts the hash for the specific platform.

+
+

The law is an instrumental goal. By themselves, hashes are not that useful. But to get to the point +where you actually know the hashes requires:

+
    +
  • +Actually learning what your dependencies are (this is not trivial! If you have a single +Makefile or an .sh, you most likely don't know the set of your dependencies). +
  • +
  • +Coming up with some automated way to download those dependencies. +
  • +
  • +Fixing the dependencies' build processes to become reproducible, so as to have a meaningful hash at +all. +
  • +
  • +Learning to isolate dependencies per project, as hashed dependencies can't be installed into a +global shared namespace. +
  • +
+

These things are what actually make developing software easier.

+]]>
+
+ + +STD Doesn't Have to Abstract OS IO + +2024-08-12T00:00:00+00:00 +2024-08-12T00:00:00+00:00 +https://matklad.github.io/2024/08/12/std-io +Alex Kladov + +STD Doesn't Have to Abstract OS IO +

A short note on what goes into a language's standard library, and what's left for third party +libraries to implement!

+

Usually, the main underlying driving factor here is cardinality. If it is important that there's +only one of a thing, it goes into std. If having many of a thing is a requirement, it is better +handled by a third-party library. That is, the usual physical constraint is that there's only a +single standard library, and everyone uses the same standard library. In contrast, there are many +different third-party libraries, and they all can be used at the same time.

+

So, until very recently, my set of rules of thumb for what goes into stdlib looked roughly like +this:

+
    +
  1. +If this is a vocabulary type, which will be used by APIs of different libraries, it should be in +the stdlib. +
  2. +
  3. +If this is a cross platform abstraction around an IO facility provided by an OS, and this IO +facility has a reasonable common subset across most OSes, it should be in the stdlib. +
  4. +
  3. +If there's one obvious way to implement it, it might go to stdlib. +
  6. +
+

So for example something like Vec goes +into a standard library, because all other libraries are going to use vectors at the interfaces.

+

Something like lazy_static +doesn't: while it is often needed, it is not a vocabulary interface type.

+

But it is acceptable for something like +OnceCell to be in std: +it is still not a vocabulary type, but, unlike lazy_static, it is clear that the API is more +or less optimal, and that there aren't that many good options to do this differently.

+

But I've changed my mind about the second bullet point, about facilities like file IO or TCP +sockets. I was always under the impression that these things are a must for a standard library. +But now I think that's not necessarily true!

+

Consider randomness. Not the PRNG kind of randomness you'd use to make a game fun, but a +cryptographically secure randomness that you'd use to generate an SSH key pair. This sort of +randomness ultimately bottoms out in hardware, and fundamentally requires talking to the OS and +doing IO. This is squarely the bullet point number 2. And Rust is an interesting case study here: it +failed to provide this abstraction in std, even though std itself actually needs it! But this turned +out to be mostly a non-issue in practice: a third party crate, getrandom, took the job of +writing all the relevant bindings to various platform-specific APIs and using a bunch of conditional +compilation to abstract that all away and provide a nice cross-platform API.

+

So, no, it is not a requirement that std has to wrap any wrappable IOing API. This could be +handled by the library ecosystem, if the language allows first-class bindings to raw OS APIs +outside of compiler-privileged code (and Rust certainly allows for that).

+

So perhaps it won't be too unreasonable to leave even things like files and sockets to community +experimentation? In a sense, that is happening in the async land anyway.

+
+

To clarify, I still believe that Rust should provide bindings to OS-sourced crypto randomness, and +I am extremely happy to see recent motion in that area. But the reason for this belief changed. I no +longer feel the mere fact that OS-specific APIs are involved to be particularly salient. However, it +is still true that there's more or less one correct way to do +this.

+]]>
+
+ + +Primitive Recursive Functions For A Working Programmer + +2024-08-01T00:00:00+00:00 +2024-08-01T00:00:00+00:00 +https://matklad.github.io/2024/08/01/primitive-recursive-functions +Alex Kladov + +Primitive Recursive Functions For A Working Programmer +

Programmers on the internet often use Turing-completeness terminology. Typically, not being +Turing-complete is extolled as a virtue or even a requirement in specific domains. I claim that most +such discussions are misinformed: that not being Turing complete doesn't actually mean what folks +want it to mean, and is instead a stand-in for a bunch of different practically useful properties, +which are mostly orthogonal to actual Turing completeness.

+

While I am generally descriptivist in nature and am ok with words losing their original meaning +as long as the new meaning is sufficiently commonly understood, Turing completeness is a hill I will +die on. It is a term from math, it has a very specific meaning, and you are not allowed to +re-purpose it for anything else, sorry!

+

I understand why this happens: to really understand what Turing completeness is and is not you need +to know one (simple!) theoretical result about so-called primitive recursive functions. And, +although this result is simple, I was only made aware of it in a fairly advanced course during my +master's. That's the CS education deficiency I want to rectify: you can't teach students the +halting problem without also teaching them about primitive recursion!

+

The post is going to be rather meaty, and will be split in three parts:

+

In Part I, I give a TL;DR for the theoretical result and some of its consequences. Part II is going +to be a whirlwind tour of Turing Machines, Finite State Automata and Primitive Recursive Functions. +And then Part III will circle back to practical matters.

+

If math makes you slightly nauseous, you might want to skip Part II. But maybe give it a try? The math +we'll need will be baby math from first principles, without reference to any advanced results.

+
+ +

+ Part I: TL;DR +

+

Here's the key result: suppose you have a program in some Turing complete language, and you also +know that it's not too slow. Suppose it runs faster than +O(2^(2^N)). +That is, two to the power of two to the power of N, a very large number. In this case, you can +implement this algorithm in a non-Turing complete language.

+

Most practical problems fall into this "faster than two to the power of two to the power of N" space. +Hence it follows that you don't need the full power of a Turing Machine to tackle them. Hence, a +language not being Turing complete doesn't in any way restrict you in practice, or give you extra +powers to control the computation.

+

Or, to restate this: in practice, a program which doesn't terminate, and a program that needs a +billion billion steps to terminate are equivalent. Making something non-Turing complete by itself +doesn't help with the second problem in any way. And there's a trivial approach that solves the +first problem for any existing Turing-complete language: in the implementation, count the steps +and bail with an error after a billion.
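The "count the steps and bail" approach can be sketched in a few lines. This is an illustration under assumptions rather than a real interpreter: `run_with_fuel` and `collatz_steps` are hypothetical names, the "program" is just a step function over some state, and running out of fuel is reported as `None`.

```rust
use std::ops::ControlFlow;

/// Drives `step` for at most `fuel` steps. `Break(result)` means the program
/// finished; `Continue(next)` carries the next state. Returns `None` when
/// the step budget is exhausted first.
pub fn run_with_fuel<S, T>(
    mut state: S,
    fuel: u64,
    mut step: impl FnMut(S) -> ControlFlow<T, S>,
) -> Option<T> {
    for _ in 0..fuel {
        match step(state) {
            ControlFlow::Break(result) => return Some(result),
            ControlFlow::Continue(next) => state = next,
        }
    }
    None
}

/// Example program: count Collatz iterations until `n` reaches 1.
/// For `n == 0` the unbounded loop would never terminate.
pub fn collatz_steps(n: u64, fuel: u64) -> Option<u64> {
    run_with_fuel((n, 0u64), fuel, |(n, steps)| {
        if n == 1 {
            ControlFlow::Break(steps)
        } else if n % 2 == 0 {
            ControlFlow::Continue((n / 2, steps + 1))
        } else {
            ControlFlow::Continue((3 * n + 1, steps + 1))
        }
    })
}
```

With a generous budget, `collatz_steps(6, 1_000_000_000)` finishes normally, while the non-terminating input `0` simply runs out of fuel. Nothing about the host language had to become less than Turing complete for that.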

+
+
+ +

+ Part II: Weird Machines +

+

The actual theoretical result is quite a bit more general than that. It is (unsurprisingly) +recursive:

+ +
+

If a function is computed by a Turing Machine, and the runtime of this machine is bounded by some +primitive recursive function of input, then the original function itself can be written as a +primitive recursive function.

+
+ +
+

It is expected that this sounds like gibberish at this point! So let's just go and prove this thing, +right here in this blog post! We'll work up slowly towards this result. The plan is as follows:

+
    +
  • +First, to brush up notation, we'll define Finite State Machines. +
  • +
  • +Second, we'll turn our humble Finite State Machine into the all-powerful Turing Machine (spoiler: +a Turing Machine is an FSM with a pair of stacks), and, as is customary, wave our hands about +the Universal Turing Machine. +
  • +
  • +Third, we leave the cozy world of imperative programming and define primitive recursive +functions. +
  • +
  • +Finally, we'll talk about the relative computational power of TMs and PRFs, including the result teased +above and more! +
  • +
+
+
+ +

+ Finite State Machines +

+

Finite State Machines are simple! An FSM takes a string as input, and returns a binary +answer, "yes" or "no". Unsurprisingly an FSM has a finite number of states: Q0, Q1, ..., Qn. +A subset of states are designated as "yes" states, the rest are "no" states. There's also one +specific starting state.

+

The behavior of the state machine is guided by a transition (step) function, s. This function +takes the current state of FSM, the next symbol of input, and returns a new state.

+

The semantics of an FSM is determined by repeatedly applying the single step function for all symbols of +the input, and noting whether the final state is a "yes" state or a "no" state.

+

Here's an FSM which accepts only strings of zeros and ones of even length:

+ +
+ + +
States:     { Q0, Q1 }
+Yes States: { Q0 }
+Start State:  Q0
+
+s :: State -> Symbol -> State
+s Q0 0 = Q1
+s Q0 1 = Q1
+s Q1 0 = Q0
+s Q1 1 = Q0
+ +
+

This machine ping-pongs between states Q0 and Q1, and ends up in Q0 only for inputs of even length +(including an empty input).
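The same machine can be transcribed into a few lines of Rust. This is only a sketch: `State`, `step` and `accepts` are names chosen here, and treating any symbol other than `0` or `1` as a rejection is an assumption, since the transition table above is simply undefined for such symbols.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum State {
    Q0,
    Q1,
}

/// The transition function `s`. Returns `None` for symbols outside the
/// alphabet, for which the table defines no transition.
pub fn step(state: State, symbol: char) -> Option<State> {
    match (state, symbol) {
        (State::Q0, '0') | (State::Q0, '1') => Some(State::Q1),
        (State::Q1, '0') | (State::Q1, '1') => Some(State::Q0),
        _ => None,
    }
}

/// Feeds the input to the machine symbol by symbol, starting in Q0, and
/// answers "yes" only if the final state is the single yes state Q0.
pub fn accepts(input: &str) -> bool {
    input.chars().try_fold(State::Q0, step) == Some(State::Q0)
}
```

Note that `accepts` does one constant-time step per input symbol and keeps nothing but the current state, which is exactly the restriction the following paragraphs exploit.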

+

What can FSMs do? As they give a binary answer, they are recognizers: they don't compute +functions, but rather just characterize certain sets of strings. A famous result is that the +expressive power of FSMs is equivalent to the expressive power of regular expressions. If you can +write a regular expression for it, you could also do an FSM!

+

There are also certain things that state machines can't do. For example they can't enter an infinite +loop. Any FSM runs in time linear in the input size and always terminates. But there are much more specific +sets of strings that can't be recognized by an FSM. Consider this set:

+ +
+ + +
1
+010
+00100
+0001000
+...
+ +
+

That is, an infinite set of strings, each of which is a single 1 surrounded by an equal number of 0s on both +sides. Let's prove that there isn't a state machine that recognizes this set!

+

As usual, suppose there is such a state machine. It has a certain number of states: maybe a +dozen, maybe a hundred, maybe a thousand, maybe even more. But let's say fewer than a million. +Then, let's take a string which looks like a million zeros, followed by a one, followed by a million +zeros. And let's observe our FSM eating this particular string.

+

First of all, because the string is in fact a one surrounded by an equal number of zeros on both +sides, the FSM ends up in a "yes" state. Moreover, because the leading run of zeros alone is longer +than the number of states in the state machine, the state machine necessarily visits some state twice while still reading those zeros. +There is a cycle, where the machine goes from A to B to C to D and back to A. This cycle might be +pretty long, but it's definitely no longer than the total number of states we have.

+

And now we can fool the state machine. Let's make it eat our string again, but this time, once it +completes the ABCDA cycle, we'll force it to traverse this cycle again. That is, the original cycle +corresponds to some portion of our giant string:

+ +
+ + +
0000 0000000000000000000 00 .... 1 .... 00000
+     <- cycle portion ->
+ +
+

If we duplicate this portion, our string will no longer look like a one surrounded by an equal number of +zeros, but the state machine will still end up in the "yes" state. This is a contradiction, which completes +the proof.

+
+
+ +

+ Turing Machine: Definition +

+

A Turing Machine is only slightly more complex than an FSM. Like an FSM, a TM has a bunch of states +and a single-step transition function. While an FSM has an immutable input which is being fed to it +symbol by symbol, a TM operates with a mutable tape. The input gets written to the tape at the +start. At each step, a TM looks at the current symbol on the tape, changes its state according to a +transition function and, additionally:

+
    +
  • +Replaces the current symbol with a new one (which might or might not be different). +
  • +
  • +Moves the reading head that points at the current symbol one position to the left or to the right. +
  • +
+

When a machine reaches a designated halt state, it stops, and whatever is written on the tape at +that moment is the result. That is, while FSMs are binary recognizers, TMs are functions. Keep in +mind that a TM does not necessarily stop. It might be the case that a TM goes back and forth over the +tape, overwrites it, changes its internal state, but never quite gets to the final state.

+

Here's an example Turing Machine:

+ +
+ + +
States:  {A, B, C, H}
+Start State: A
+Final State: H
+
+s :: State -> Symbol -> (State, Symbol, Left | Right)
+s A 0 = (B, 1, Right)
+s A 1 = (H, 1, Right)
+s B 0 = (C, 0, Right)
+s B 1 = (B, 1, Right)
+s C 0 = (C, 1, Left)
+s C 1 = (A, 1, Left)
+ +
+

If the configuration of the machine looks like this:

+ +
+ + +
000010100000
+     ^
+     A
+ +
+

Then we are in the s A 0 = (B, 1, Right) case, so we should change the state to B, replace 0 with +1, and move to the right:

+ +
+ + +
000011100000
+      ^
+      B
+ +
+
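The single step above can be replayed in Python. This is a minimal sketch under stated assumptions: the tape is a finite list of integers (the real tape is infinite), and the names `RULES` and `step` are invented here. The table itself mirrors the transition function `s` of the example machine.

```python
# Transition table of the example machine: (state, symbol) -> (state, symbol, move).
# A move of +1 means "Right", -1 means "Left".
RULES = {
    ("A", 0): ("B", 1, +1),
    ("A", 1): ("H", 1, +1),
    ("B", 0): ("C", 0, +1),
    ("B", 1): ("B", 1, +1),
    ("C", 0): ("C", 1, -1),
    ("C", 1): ("A", 1, -1),
}


def step(state, tape, head):
    """Perform one transition and return the new (state, tape, head)."""
    new_state, symbol, move = RULES[(state, tape[head])]
    new_tape = tape[:head] + [symbol] + tape[head + 1:]
    return new_state, new_tape, head + move


tape = [int(c) for c in "000010100000"]
state, tape, head = step("A", tape, 5)
print(state, "".join(map(str, tape)), head)
```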
+
+ +

+ Turing Machine: Programming +

+

There are a bunch of fiddly details to Turing Machines!

+

The tape is conceptually infinite, so beyond the input, everything is just zeros. This creates a +problem: it might be hard to say where the input (or the output) ends! There are a couple of +technical solutions here. One is to say that there are three different symbols on the tape — +zeros, ones, and blanks, and require that the tape is initialized with blanks. A different solution +is to invent some encoding scheme. For example, we can say that the input is a sequence of 8-bit +bytes, without interior null bytes. So, eight consecutive zeros at a byte boundary designate the end +of input/output.

+

It's useful to think about how this byte-oriented TM could be implemented. We could have one large +state for each byte of input. So, Q142 would mean that the head is on the byte with value 142. And +then we'll have a bunch of small states to read out the current byte. E.g., we start reading a byte in +state S. Depending on the next bit we move to S0 or S1, then to S00, or S01, etc. Once we reach +something like S01111001, we move back 8 positions and enter state Q121. This is one of the patterns +of Turing Machine programming: while your main memory is the tape, you can represent some +constant amount of memory directly in the states.

+

What we've done here is essentially lowering a byte-oriented Turing Machine to a bit-oriented +machine. So, we could think only in terms of big states operating on bytes, as we know the general +pattern for converting that to direct bit-twiddling.

+

With this encoding scheme in place, we now can feed arbitrary files to a Turing Machine! Which will +be handy for the next observation:

+

You can't actually program a Turing Machine. What I mean is that, counter-intuitively, there isn't +some user-supplied program that a Turing Machine executes. Rather, the program is hard-wired into +the machine. The transition function is the program.

+

But with some ingenuity we can regain our ability to write programs. Recall that we've just learned +to feed arbitrary files to a TM. So what we could do is to write a text file that specifies a TM and +its input, and then feed that entire file as an input to an "interpreter" Turing Machine which would +read the file, and act as the machine specified there. A Turing Machine can have an eval +function.

+

Is such an "interpreter" Turing Machine possible? Yes! And it is not hard: if you spend a couple of hours +programming Turing Machines by hand, you'll see that you can do pretty much anything: +numbers, arithmetic, loops, control flow. It's just very, very tedious.

+

So let's just declare that we've actually coded up this Universal Turing Machine which simulates a +TM given to it as an input in a particular encoding.

+

This sort of construct also gives rise to the Church-Turing thesis. We have a TM which can run other +TMs. And you can implement a TM interpreter in something like Python. And, with a bit of legwork, +you could also implement a Python interpreter as a TM (you likely want to avoid doing that +directly, and instead do a simpler interpreter for WASM, and then use a Python interpreter compiled +to WASM). This sort of bidirectional interpretation shows that Python and TMs have equivalent +computing power. Moreover, it's quite hard to come up with a reasonable computational device which +is more powerful than a Turing Machine.

+

There are computational devices that are strictly weaker than TMs though. Recall FSMs. By this point, +it should be obvious that a TM can simulate an FSM. Everything a Finite State Machine can do, a +Turing Machine can do as well. And it should be intuitively clear that a TM is more powerful than an +FSM. An FSM gets to use only a finite number of states. A TM has these same states, but it also possesses +a tape which serves as an infinitely sized external memory.

+

Directly proving that you can't encode a Universal Turing Machine as an FSM sounds complicated, +so let's prove something simpler. Recall that we have established that there's no FSM that accepts +only ones surrounded by an equal number of zeros on both sides (because a sufficiently large word +of this form would necessarily enter a cycle in a state machine, which could then be further pumped). +But it's actually easy to write a Turing Machine that does this:

+
    +
  • +Erase zero (at the left side of the tape) +
  • +
  • +Go to the right end of the tape +
  • +
  • +Erase zero +
  • +
  • +Go to the left side of the tape +
  • +
  • +Repeat +
  • +
  • +If what's left is a single 1 the answer is "yes", otherwise it is "no" +
  • +
+
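The erase-from-both-ends procedure can be imitated in Python. This is only a sketch of the idea, not an actual Turing Machine: it assumes the input is a string over `0` and `1`, uses two integer indices in place of the head walking back and forth, and the function name `accepts` is made up for illustration.

```python
def accepts(tape: str) -> bool:
    """Check whether tape is a single 1 surrounded by equally many 0s."""
    cells = list(tape)
    left, right = 0, len(cells) - 1
    # Erase one zero on the left and one on the right, then repeat.
    while left < right and cells[left] == "0" and cells[right] == "0":
        cells[left] = cells[right] = " "
        left += 1
        right -= 1
    # Whatever was not erased must be exactly a single 1.
    return "".join(cells).strip() == "1"
```

For example, `accepts("00100")` is true, while `accepts("0010")` is false because one side runs out of zeros first.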

We found a specific problem that can be solved by a TM, but is out of reach of any FSM. So it +necessarily follows that there isn't an FSM that can simulate an arbitrary TM.

+

It is also useful to take a closer look at the tape. It is a convenient skeuomorphic abstraction +which makes the behavior of the machine intuitive, but it is inconvenient to implement in a normal +programming language. There isn't a standard data structure that behaves just like a tape.

+

One cool practical trick is to simulate the tape as a pair of stacks. Take this:

+ +
+ + +
Tape: A B C D E F G
+Head:     ^
+ +
+

And transform it to something like this:

+ +
+ + +
Left Stack:  [A, B, C]
+Right Stack: [G, F, E, D]
+ +
+

That is, everything to the left of the head is one stack, everything to the right, reversed, is the +other. Here, moving the reading head left or right corresponds to popping a value off one stack and +pushing it onto another.
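A minimal Python sketch of the two-stack tape follows. The class name `Tape`, its method names, and the `BLANK` constant are assumptions made for this example. As in the picture above, the cell under the head sits on top of the left stack.

```python
BLANK = "0"  # assumption: cells beyond the written part read as zero


class Tape:
    def __init__(self, cells, head):
        # Left stack: everything up to and including the head; top is the current cell.
        self.left = list(cells[: head + 1])
        # Right stack: everything after the head, reversed so the nearest cell is on top.
        self.right = list(reversed(cells[head + 1:]))

    def read(self):
        return self.left[-1]

    def write(self, symbol):
        self.left[-1] = symbol

    def move_right(self):
        # Pop from the right stack (or take a blank) and push onto the left one.
        self.left.append(self.right.pop() if self.right else BLANK)

    def move_left(self):
        # Pop the current cell off the left stack and push it onto the right one.
        self.right.append(self.left.pop())
        if not self.left:
            self.left.append(BLANK)


tape = Tape("ABCDEFG", head=2)
```

Both moves are a pop followed by a push, so each takes constant time no matter how long the tape has grown.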

+

So, an equivalent-in-power definition would be to say that a TM is an FSM endowed with two +stacks.

+

This of course creates an obvious question: is an FSM with just one stack a thing? Yes! It would be +called a pushdown automaton, and it would correspond to context-free languages. But that's beyond +the scope of this post!

+

There's yet another way to look at the tape, or the pair of stacks, if the set of symbols is 0 and +1. You could say that a stack is just a number! So, something like +[1, 0, 1, 1] +will be +1 + 2 + 8 = 11. +Looking at the top of the stack is stack % 2, removing an item from the stack is stack / 2 and +pushing x onto the stack is stack * 2 + x. We won't need this right now, so just hold onto this +for a brief moment.
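The stack-as-a-number encoding is short enough to write down directly. In this sketch the function names `top`, `pop`, and `push` are chosen for illustration, and the division is integer division.

```python
def top(stack: int) -> int:
    """Look at the top bit without removing it."""
    return stack % 2


def pop(stack: int) -> int:
    """Return the stack with its top bit removed."""
    return stack // 2


def push(stack: int, x: int) -> int:
    """Return the stack with bit x pushed on top."""
    return stack * 2 + x


# Build [1, 0, 1, 1] by pushing the bits in order onto the empty stack 0.
stack = 0
for bit in [1, 0, 1, 1]:
    stack = push(stack, bit)
print(stack)
```

A convenient side effect is that the empty stack `0` behaves like an infinite supply of zeros: popping it yields `0` forever.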

+
+
+ +

+ Turing Machine: Limits +

+

Ok, so we have some idea about the lower bound for the power of a Turing Machine: FSMs are strictly +less expressive. What about the opposite direction? Is there some computation that a Turing Machine +is incapable of doing?

+

Yes! Let's construct a function which maps natural numbers to natural numbers, which can't be +implemented by a Turing Machine. Recall that we can encode an arbitrary Turing Machine as text. That +means that we can actually enumerate all possible Turing Machines, and write them in a giant line, +from the most simple Turing Machine to more complex ones:

+ +
+ + +
TM_0
+TM_1
+TM_2
+...
+TM_326
+...
+ +
+

This is of course going to be an infinite list.

+

Now, let's see how TM_0 behaves on input 0: it either prints something, or doesn't terminate. Then, +note how TM_1 behaves on input 1, and generalizing, create function f that behaves as the nth TM +on input n. It might look something like this:

+ +
+ + +
f(0) = 0
+f(1) = 111011
+f(2) = doesn't terminate
+f(3) = 0
+f(4) = 101
+...
+ +
+

Now, let's construct function g which is maximally diffed from f: where f gives 0, g will +return 1, and it will return 0 in all other cases:

+ +
+ + +
g(0) = 1
+g(1) = 0
+g(2) = 0
+g(3) = 1
+g(4) = 0
+...
+ +
+

There isn't a Turing machine that computes g. For suppose there is. Then, it exists in our list of +all Turing Machines somewhere. Let's say it is TM_1000064. So, if we feed 0 to it, it will return +g(0), which is 1, which is different from f(0). And the same holds for 1, and 2, and 3. +But once we get to g(1000064), we are in trouble, because, by the definition of g, g(1000064) +is different from what is computed by TM_1000064! So such a machine is impossible.
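The diagonal construction itself is tiny when written as code. The sketch below makes a big simplifying assumption that the real argument cannot make: every function in the (finite, made-up) list `machines` terminates. The names `diagonal` and `machines` are hypothetical.

```python
def diagonal(machines):
    """Build g that differs from the n-th machine on input n."""
    def g(n):
        # Where the n-th machine gives 0, g gives 1; otherwise g gives 0.
        return 1 if machines[n](n) == 0 else 0
    return g


# A stand-in for the enumeration TM_0, TM_1, ...; all of these terminate.
machines = [
    lambda n: 0,
    lambda n: n + 1,
    lambda n: 7,
    lambda n: 0,
]
g = diagonal(machines)
print([g(n) for n in range(len(machines))])
```

By construction `g(n)` never equals `machines[n](n)`, so `g` cannot be any of the listed functions.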

+

Those math savvy might express this more succinctly: there's a countably-infinite number of +Turing Machines, and an uncountably-infinite number of functions. So there must be some functions +which do not have a corresponding Turing Machine. It is the same proof: the diagonalization +argument is hiding in the claim that the set of all functions is an uncountable set.

+

But this is super weird and abstract. Let's rather come up with some very specific problem which +isn't solvable by a Turing Machine. The halting problem: given source code for a Turing Machine and +its input, determine if the machine halts on this input eventually.

+

As we have waved our hands sufficiently vigorously to establish that Python and Turing Machines have +equivalent computational power, I am going to try to solve this in Python:

+ +
+ + +
def halts(program_source_code: str, program_input: str) -> bool:
+    # One million lines of readable, but somewhat
+    # unsettling and intimidating Python code.
+    return the_answer
+
+raw_input = input()
+[program_source_code, program_input] = parse(raw_input)
+print("Yes" if halts(program_source_code, program_input) else "No")
+ +
+

Now, I will do a weird thing and start asking whether a program terminates, if it is fed its own +source code, in a reverse-quine of sorts:

+ +
+ + +
def halts_on_self(program_source_code: str) -> bool:
+    program_input = program_source_code
+    return halts(program_source_code, program_input)
+ +
+

and finally I construct this weird beast of a program:

+ +
+ + +
def halts(program_source_code: str, program_input: str) -> bool:
+    # ...
+    return the_answer
+
+def halts_on_self(program_source_code: str) -> bool:
+    program_input = program_source_code
+    return halts(program_source_code, program_input)
+
+def weird(program_input):
+    if halts_on_self(program_input):
+        while True:
+            pass
+
+weird(input())
+ +
+

To make this even worse, I'll feed the text of this weird program to itself. Does it terminate +with this input? Well, if it terminates, and if our halts function is implemented correctly, then +the halts_on_self(program_input) invocation above returns True. But then we enter the infinite +loop and don't actually terminate.

+

Hence, it must be the case that weird does not terminate when self-applied. But then +halts_on_self returns False, and it should terminate. So we get a contradiction both ways. Which +necessarily means that either our halts sometimes returns a straight-up incorrect answer, or that it +sometimes does not terminate.

+

So this is the flip side of a Turing Machine's power: it is so powerful that it becomes impossible +to tell whether it'll terminate or not!

+

It actually gets much worse, because this result can be generalized to an unreasonable degree! +In general, there's very little we can say about arbitrary programs.

+

We can easily check syntactic properties (is the program text shorter than 4 kilobytes?), but they +are, in some sense, not very interesting, as they depend a lot on how exactly one writes a program. +It would be much more interesting to check some refactoring-invariant properties, which hold when +you change the text of the program, but leave the behavior intact. Indeed, "does this change +preserve behavior?" would be one very useful property to check!

+

So let's define two TMs to be equivalent, if they have identical behavior. That is, for each +specific input, either both machines don't terminate, or they both halt, and give identical results.

+

Then, our refactoring-invariant properties are, by definition, properties that hold (or do not hold) +for the entire classes of equivalence of TMs.

+

And a somewhat depressing result here is that there are no non-trivial refactoring-invariant +properties that you can algorithmically check.

+

Suppose we have some magic TM, called P, which checks such a property. Let's show that, using P, we can +solve the problem we know we cannot solve: the halting problem.

+

Consider a Turing Machine that is just an infinite loop and never terminates, M1. P might or might +not hold for it. But, because P is non-trivial (it holds for some machines and doesn't hold for some +machines), there's some different machine M2 which differs from M1 with respect to P. That is, +P(M1) xor P(M2) holds.

+

Let's use these M1 and M2 to figure out whether a given machine M halts on input I. Using a Universal +Turing Machine (interpreter), we can construct a new machine, M12, that just runs M on input I, then +erases the contents of the tape and runs M2. Now, if M halts on I, then the resulting machine M12 is +behaviorally-equivalent to M2. If M doesn't halt on I, then the result is equivalent to the infinite +loop program, M1. Or, in pseudo-code:

+ +
+ + +
def M1(input):
+    while True:
+        pass
+
+def M2(input):
+    # We don't actually know what's here
+    # but we know that such a machine exists.
+
+assert(P(M1) != P(M2))
+
+def halts(M, I):
+    def M12(input):
+        M(I) # might or might not halt
+        return M2(input)
+
+    return P(M12) == P(M2)
+ +
+

This is pretty bad and depressing: we can't learn anything meaningful about an arbitrary Turing +Machine! So let's finally get to the actual topic of today's post:

+
+
+ +

+ Primitive Recursive Functions +

+

This is going to be another computational device, like FSMs and TMs. Like an FSM, it's going to be a +nice, always terminating, non-Turing complete device. But it will turn out to have quite a bit of +the power of a full Turing Machine!

+

However, unlike both TMs and FSMs, Primitive Recursive Functions are defined directly as +functions which take a tuple of natural numbers and return a natural number. The two simplest ones +are zero (that is, a zero-arity function that returns 0) and succ, a unary function that +just adds 1. Everything else is going to get constructed out of these two:

+ +
+ + +
zero = 0
+succ(x) = x + 1
+ +
+

One way we are allowed to combine these functions is by composition. So we can get all the constants +right off the bat:

+ +
+ + +
succ(zero) = 1
+succ(succ(zero)) = 2
+succ(succ(succ(zero))) = 3
+ +
+

We aren't going to be allowed to use general recursion (because it can trivially fail to terminate), +but we do get to use a restricted form of C-style loop. It is a bit fiddly to define formally! The +overall shape is LOOP(init, f, n).

+

Here, init and n are numbers: the initial value of the accumulator and the total number of +iterations. The f is a unary function that specifies the loop body: it takes the current value +of the accumulator and returns the new value. So

+ +
+ + +
LOOP(init, f, 0) = init
+LOOP(init, f, 1) = f(init)
+LOOP(init, f, 2) = f(f(init))
+LOOP(init, f, 3) = f(f(f(init)))
+ +
+

While this is similar to a C-style loop, the crucial difference here is that the total number of +iterations n is fixed up-front. There's no way to mutate the loop counter in the loop body.

+

This allows us to define addition:

+ +
+ + +
add(x, y) = LOOP(x, succ, y)
+ +
+
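For experimenting, the basis and the LOOP operator can be modeled in Python. This is a sketch, assuming Python integers stand in for natural numbers; writing `zero` as a zero-argument function is a choice made for this example.

```python
def zero():
    """The zero-arity function that returns 0."""
    return 0


def succ(x):
    """The unary function that adds 1."""
    return x + 1


def LOOP(init, f, n):
    """Apply f to the accumulator exactly n times, starting from init."""
    acc = init
    for _ in range(n):  # the iteration count is fixed before the loop starts
        acc = f(acc)
    return acc


def add(x, y):
    return LOOP(x, succ, y)


print(add(3, 4))
```

Since `range(n)` is evaluated once, the body has no way to extend the loop, which mirrors the restriction described above.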

Multiplication is trickier. Conceptually, to multiply x and y, we want to LOOP from zero, and +repeat "add x" y times. The problem here is that we can't write an "add x" function yet:

+ +
+ + +
# Doesn't work, add is a binary function!
+mul(x, y) = LOOP(0, add, y)
+ +
+ +
+ + +
# Doesn't work either, no x in scope!
+add_x v = add(x, v)
+mul(x, y) = LOOP(0, add_x, y)
+ +
+

One way around this is to define LOOP as a family of operators, which can pass extra arguments to +the iteration function:

+ +
+ + +
LOOP0(init, f, 2) = f(f(init))
+LOOP1(c1, init, f, 2) = f(c1, f(c1, init))
+LOOP2(c1, c2, init, f, 2) = f(c1, c2, f(c1, c2, init))
+ +
+

That is, LOOP_N takes an extra n arguments, and passes them through to any invocation of the body +function. To express this idea a little bit more succinctly, let's just allow partially applying +the second argument of LOOP. That is:

+
    +
  • +All our functions are going to be first order. All arguments are numbers, the result is a number. +There aren't higher order functions, there aren't closures. +
  • +
  • +The LOOP is not a function in our language: it's a builtin operator, a keyword. So, for +convenience, we allow passing partially applied functions to it. But semantically this is +equivalent to just passing in extra arguments on each iteration. +
  • +
+

Which finally allows us to write

+ +
+ + +
mul(x, y) = LOOP(0, add x, y)
+ +
+

Ok, so that's progress: we made something as complicated as multiplication, and we still are in +the guaranteed-to-terminate land. Because each loop has a fixed number of iterations, everything +eventually finishes.

+

We can go on and define x^y:

+ +
+ + +
pow(x, y) = LOOP(1, mul x, y)
+ +
+

And this in turn allows us to define a couple of concerningly fast-growing functions:

+ +
+ + +
pow_2(n) = pow(2, n)
+pow_2_2(n) = pow_2(pow_2(n))
+ +
+
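These definitions translate to Python almost word for word. In this sketch `functools.partial` stands in for the partial application `add x`, and the exponentiation function is named `power` (an assumption of this example) to avoid shadowing Python's builtin `pow`.

```python
from functools import partial


def succ(x):
    return x + 1


def LOOP(init, f, n):
    """Apply f to the accumulator exactly n times, starting from init."""
    acc = init
    for _ in range(n):
        acc = f(acc)
    return acc


def add(x, y):
    return LOOP(x, succ, y)


def mul(x, y):
    # partial(add, x) plays the role of the partially applied "add x".
    return LOOP(0, partial(add, x), y)


def power(x, y):
    return LOOP(1, partial(mul, x), y)


def pow_2(n):
    return power(2, n)


def pow_2_2(n):
    return pow_2(pow_2(n))


print(mul(6, 7), power(2, 10), pow_2_2(2))
```

Everything bottoms out in repeated `succ` calls, so even modest inputs to `pow_2_2` take a long time; that is a property of the unary encoding, not of the definitions.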

That's fun, but to do some programming, we'll need an if. We'll get to it, but first we'll need +some boolean operations. We can encode false as 0 and true as 1. Then

+ +
+ + +
and(x, y) = mul(x, y)
+ +
+

But or creates a problem: we'll need a subtraction.

+ +
+ + +
or(x, y) = sub(
+  add(x, y),
+  mul(x, y),
+)
+ +
+

Defining sub is tricky, due to two problems:

+

First, we only have natural numbers, no negatives. This one is easy to solve: we'll just define +subtraction to saturate.

+

The second problem is more severe: I think we actually can't express subtraction given the set of +allowable operations so far. That is because all our operations are monotonic: the result is +never less than the arguments. One way to solve this problem is to define the LOOP in such a way +that the body function also gets passed a second argument: the current iteration. So, if you +iterate up to n, the last iteration will observe n - 1, and that would be the non-monotonic +operation that creates subtraction. But that seems somewhat inelegant to me, so instead I will just +add a pred function to the basis, and use that to add loop counters to our iterations.

+ +
+ + +
pred(0) = 0 # saturate
+pred(1) = 0
+pred(2) = 1
+...
+ +
+

Now we can say:

+ +
+ + +
sub(x, y) = LOOP(x, pred, y)
+
+and(x, y) = mul(x, y)
+or(x, y) = sub(
+  add(x, y),
+  mul(x, y)
+)
+not(x) = sub(1, x)
+
+if(cond, a, b) = add(
+  mul(a, cond),
+  mul(b, not(cond)),
+)
+ +
+
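Here is the same set of definitions as a runnable Python sketch. Two shortcuts are assumptions of this example: native `+` and `*` stand in for the `add` and `mul` defined earlier, and trailing underscores are added to names that clash with Python keywords.

```python
def LOOP(init, f, n):
    """Apply f to the accumulator exactly n times, starting from init."""
    acc = init
    for _ in range(n):
        acc = f(acc)
    return acc


def pred(x):
    """Saturating predecessor: pred(0) is 0."""
    return x - 1 if x > 0 else 0


def sub(x, y):
    """Saturating subtraction: apply pred to x exactly y times."""
    return LOOP(x, pred, y)


def and_(x, y):
    return x * y


def or_(x, y):
    return sub(x + y, x * y)


def not_(x):
    return sub(1, x)


def if_(cond, a, b):
    # Exactly one of the two products survives, since cond is 0 or 1.
    return a * cond + b * not_(cond)


print(sub(5, 3), sub(3, 5), if_(1, 7, 9), if_(0, 7, 9))
```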

And now we can do a bunch of comparison operators:

+ +
+ + +
is_zero(x) = sub(1, x)
+
+# x >= y
+ge(x, y) = is_zero(sub(y, x))
+
+# x == y
+eq(x, y) = and(ge(x, y), ge(y, x))
+
+# x > y
+gt(x, y) = and(ge(x, y), not(eq(x, y)))
+
+# x < y
+lt(x, y) = gt(y, x)
+ +
+

With that we could implement modulus. To compute x % m we will start with x, and will be +subtracting m until we get a number smaller than m. We'll need at most x iterations for that.

+

In pseudo-code:

+ +
+ + +
def mod(x, m):
+  current = x
+
+  for _ in 0..x:
+    if current < m:
+      current = current
+    else:
+      current = current - m
+
+  return current
+ +
+

And as a bona fide PRF:

+ +
+ + +
mod_iter(m, x) = if(
+  lt(x, m),
+  x,        # then
+  sub(x, m) # else
+)
+mod(x, m) = LOOP(x, mod_iter m, x)
+ +
+
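The bounded-search version of modulus can be checked in Python. In this sketch the helpers `sub`, `lt`, and `if_` are written with native Python arithmetic as stand-ins for the PRF definitions above, and `functools.partial` plays the role of `mod_iter m`.

```python
from functools import partial


def LOOP(init, f, n):
    """Apply f to the accumulator exactly n times, starting from init."""
    acc = init
    for _ in range(n):
        acc = f(acc)
    return acc


def sub(x, y):
    return max(x - y, 0)  # native stand-in for saturating subtraction


def lt(x, y):
    return 1 if x < y else 0


def if_(cond, a, b):
    return a * cond + b * (1 - cond)


def mod_iter(m, x):
    # Keep x once it is below m, otherwise take one more m away.
    return if_(lt(x, m), x, sub(x, m))


def mod(x, m):
    # x iterations are always enough, and extra iterations change nothing.
    return LOOP(x, partial(mod_iter, m), x)


print(mod(17, 5), mod(4, 7), mod(10, 5))
```

Unlike Python's `%`, this definition is total: with `m` equal to zero the body keeps returning `x` unchanged.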

That's a curious structure: rather than computing the modulo directly, we essentially search for +it using trial and error, relying on the fact that the search has a clear upper bound.

+

Division can be done similarly: to divide x by y, start with 0, and then repeatedly add one to the +accumulator until the product of the accumulator and y exceeds x:

+ +
+ + +
div_iter x y acc = if(
+  ge(x, mul(succ(acc), y)), # (acc + 1) * y <= x
+  succ(acc), # then
+  acc        # else
+)
+div(x, y) = LOOP(0, div_iter x y, x)
+ +
+
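A Python sketch of this bounded-search division follows. It compares the candidate product against `x`, the number being divided. Native arithmetic stands in for `mul`, `succ`, and the comparison, and `functools.partial` models the partial application `div_iter x y`.

```python
from functools import partial


def LOOP(init, f, n):
    """Apply f to the accumulator exactly n times, starting from init."""
    acc = init
    for _ in range(n):
        acc = f(acc)
    return acc


def if_(cond, a, b):
    return a * cond + b * (1 - cond)


def le(a, b):
    return 1 if a <= b else 0


def div_iter(x, y, acc):
    # Bump the accumulator only while (acc + 1) * y still fits into x.
    return if_(le((acc + 1) * y, x), acc + 1, acc)


def div(x, y):
    # The quotient never exceeds x, so x iterations are enough.
    return LOOP(0, partial(div_iter, x, y), x)


print(div(17, 5), div(10, 5), div(3, 7))
```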

This really starts to look like programming! One thing we are currently missing are data structures. +While our functions take multiple arguments, they only return one number. But it's easy enough to +pack two numbers into one: to represent an (a, b) pair, we'll use the number 2^a * 3^b:

+ +
+ + +
mk_pair(a, b) = mul(pow(2, a), pow(3, b))
+ +
+

To deconstruct such a pair into its first and second components, we need to find the maximum power +of 2 or 3 that divides our number. Which is exactly the same shape we used to implement div:

+ +
+ + +
max_factor_iter p m acc = if(
+  is_zero(mod(p, pow(m, succ(acc)))),
+  succ(acc), # then
+  acc,       # else
+)
+max_factor(p, m) = LOOP(0, max_factor_iter p m, p)
+
+fst(p) = max_factor(p, 2)
+snd(p) = max_factor(p, 3)
+ +
+
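The pair encoding can be exercised with a short Python sketch. Native `**` and `%` stand in for `pow` and `mod` here (an assumption made to keep the example fast), while the search for the largest exponent keeps the LOOP shape from the definition above.

```python
from functools import partial


def LOOP(init, f, n):
    """Apply f to the accumulator exactly n times, starting from init."""
    acc = init
    for _ in range(n):
        acc = f(acc)
    return acc


def mk_pair(a, b):
    return 2 ** a * 3 ** b


def max_factor_iter(p, m, acc):
    # Bump acc while m ** (acc + 1) still divides p.
    return acc + 1 if p % m ** (acc + 1) == 0 else acc


def max_factor(p, m):
    # p iterations over-estimate the exponent, which is all we need.
    return LOOP(0, partial(max_factor_iter, p, m), p)


def fst(p):
    return max_factor(p, 2)


def snd(p):
    return max_factor(p, 3)


p = mk_pair(3, 2)
print(p, fst(p), snd(p))
```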

Here again we use the fact that the maximal power of two that divides p is not larger than p +itself, so we can over-estimate the number of iterations we'll need as p.

+

Using this pair construction, we can finally add a loop counter to our LOOP construct. To track +the counter, we pack it as a pair with the accumulator:

+ +
+ + +
LOOP(mk_pair(init, 0), f, n)
+ +
+

And then inside f, we first unpack that pair into accumulator and counter, pass them to actual loop +iteration, and then pack the result again, incrementing the counter:

+ +
+ + +
f acc = mk_pair(
+  g(fst(acc), snd(acc)),
+  succ(snd(acc)),
+)
+ +
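Put together, a loop with a counter might look like the Python sketch below. The wrapper name `loop_with_counter` is hypothetical, and `max_factor` uses a plain bounded `for` loop with native arithmetic as a stand-in for the PRF version.

```python
def LOOP(init, f, n):
    """Apply f to the accumulator exactly n times, starting from init."""
    acc = init
    for _ in range(n):
        acc = f(acc)
    return acc


def succ(x):
    return x + 1


def mk_pair(a, b):
    return 2 ** a * 3 ** b


def max_factor(p, m):
    k = 0
    for _ in range(p):  # bounded search, p iterations are always enough
        if p % m ** (k + 1) == 0:
            k += 1
    return k


def fst(p):
    return max_factor(p, 2)


def snd(p):
    return max_factor(p, 3)


def loop_with_counter(init, g, n):
    """Run g(acc, i) for i = 0 .. n - 1, threading (acc, i) as one number."""
    def f(acc):
        return mk_pair(g(fst(acc), snd(acc)), succ(snd(acc)))
    return fst(LOOP(mk_pair(init, 0), f, n))


# Sum of the counter values 0 + 1 + 2 + 3.
print(loop_with_counter(0, lambda acc, i: acc + i, 4))
```

The packed numbers grow exponentially in the accumulator, so this sketch is only practical for very small values.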
+

Ok, so we have achieved something remarkable: while we are writing terminating-by-construction +programs, which are definitely not Turing complete, we have constructed basic programming staples, +like boolean logic and data structures, and we have also built some rather complicated mathematical +functions, like 2^(2^N).

+

We could try to further enrich our little primitive recursive kingdom by adding more and more +functions on an ad hoc basis, but let's try to be really ambitious and go for the main prize: +simulating Turing Machines.

+

We know that we will fail: Turing machines can enter an infinite loop, but PRFs necessarily terminate. +That means that, if a PRF were able to simulate an arbitrary TM, it would have to say after a certain +finite amount of steps that "this TM doesn't terminate". And, while we didn't do this, it's easy to +see that you could simulate the other way around and implement PRFs in a TM. But that would give +us a TM algorithm to decide if an arbitrary TM halts, which we know doesn't exist.

+

So, this is hopeless! But we might still be able to learn something from failing.

+

Ok! So let's start with a configuration of a TM which we somehow need to encode into a single +number. First, we need the state variable proper (Q0, Q1, etc), which seems easy enough to represent +with a number. Then, we need a tape and a position of the reading head. Recall how we used a pair of +stacks to represent exactly the tape and the position. And recall that we can look at a stack of +zeros and ones as a number in binary form, where push and pop operations are implemented using %, +*, and /: exactly the operations we already can do. So, our configuration is just three +numbers: (S, stack1, stack2).

+

And, using the 2^a * 3^b * 5^c trick, we can pack this triple into just a single number. But that means we +could directly encode a single step of a Turing Machine:

+ +
+ + +
single_step(config) = if(
+  # if the state is Q0 ...
+  eq(fst(config), 0),
+
+  # and the symbol at the top of left stack is 0
+  if(is_zero(mod(snd(config), 2)),
+    mk_triple(
+      1,                    # move to state Q1
+      div(snd(config), 2),  # pop value from the left stack
+      mul(trd(config), 2),  # push zero onto the right stack
+    ),
+    ... # Handle symbol 1 in state Q0
+  ),
+  # if the state is Q1 ...
+  if(eq(fst(config), 1),
+    ...
+  )
+)
+ +
+

And now we could plug that into our LOOP to simulate a Turing Machine running for N steps:

+ +
+ + +
n_steps initial_config n =
+  LOOP(initial_config, single_step, n)
+ +
+
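To see the number-encoded configuration in action, here is a Python sketch that runs the example machine from the Turing Machine definition section for a fixed number of steps. It is an illustration under assumptions: states are kept as letters rather than numbers, the configuration is a Python tuple rather than a packed number, and the cell under the head is the top of the left stack.

```python
# (state, symbol) -> (new state, written symbol, move), as in the example machine.
RULES = {
    ("A", 0): ("B", 1, "R"),
    ("A", 1): ("H", 1, "R"),
    ("B", 0): ("C", 0, "R"),
    ("B", 1): ("B", 1, "R"),
    ("C", 0): ("C", 1, "L"),
    ("C", 1): ("A", 1, "L"),
}


def single_step(config):
    state, left, right = config
    if state == "H":
        return config  # halted: further steps change nothing
    symbol = left % 2  # top of the left stack is the current cell
    new_state, written, move = RULES[(state, symbol)]
    left = left - symbol + written  # overwrite the current cell
    if move == "R":
        # Pop from the right stack, push onto the left one.
        left, right = left * 2 + right % 2, right // 2
    else:
        # Pop from the left stack, push onto the right one.
        left, right = left // 2, right * 2 + left % 2
    return (new_state, left, right)


def n_steps(initial_config, n):
    config = initial_config
    for _ in range(n):  # a fixed iteration count, just like LOOP
        config = single_step(config)
    return config


# Tape 000010100000 with the head on index 5: left is 0b000010, right is 0b1.
print(n_steps(("A", 2, 1), 3))
```

Stacks encoded as numbers give an infinite tape of zeros for free, since popping `0` yields `0`.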

The catch of course is that we can't know the N that's going to be enough. But we can have a very +good guess! We could do something like this:

+ +
+ + +
hopefully_enough_steps initial_config =
+  LOOP(initial_config, single_step, pow_2_2(initial_config))
+ +
+

That is, run for some large tower of exponents of the initial state. Which would be plenty for +normal algorithms, which are usually 2^N at worst!

+

Or, generalizing:

+ +
+

If a TM has a runtime which is bounded by some primitive-recursive function, then the entire +TM can be replaced with a PRF. Be advised that PRFs can grow really fast.

+
+ +
+

Which is the headline result we have set out to prove!

+
+
+ +

+ Primitive Recursive Functions: Limit +

+

It might seem that non-termination is the only principal obstacle. That anything that terminates at +all has to be implementable as a PRF. Alas, that's not so. Let's go and construct a function that is +computable by a TM, but is out of reach of PRFs.

+

We will combine the ideas of the impossibility proofs for FSMs (noting that if a function is +computed by some machine, that machine has a specific finite size) and TMs (diagonalization).

+

So, suppose we have some function f that can't be computed by a PRF. How would we go about proving +that? Well, we'd start with "suppose that we have a PRF P that computes f". And then we could +notice that P would have some finite size. If you look at it abstractly, the P is its syntax tree, +with lots of LOOP constructs, but it always boils down to some succs and zeros at the leaves. +Let's say that the depth of P is d.

+

And, actually, if you look at it, there are only a finite number of PRFs with depth at most d. Some +of them describe pretty fast growing functions. But probably there's a limit to how fast a function +can grow, given that it is computed by a PRF of size d. Or, to use a concrete example: we have +constructed a PRF of depth 5 that computes two to the power of two to the power of N. Probably if we +were smarter, we could have squeezed a couple more levels into that tower of exponents. But +intuitively it seems that if you build a tower of, say, 10 exponents, that would grow faster than +any PRF of depth 5. And that this generalizes: for any fixed depth, there's a high-enough +tower of exponents that grows faster than any PRF with that depth.

+

So we could conceivably build an f that defeats our d-deep P. But that's not quite a victory +yet: maybe that f is feasible for d+2-deep PRFs! So here we'll additionally apply +diagonalization: for each depth, we'll build its own depth-specific nemesis f_d. And then we'll +define our overall function as

+ +
+ + +
a(n) = f_n(n)
+ +
+

So, for n large enough it'll grow faster than a PRF with any fixed depth.

+

So that's the general plan, the rest of the proof is basically just calculating the upper bound on the +growth of a PRF of depth d.

+

One technical difficulty here is that PRFs tend to have different arities:

+ +
+ + +
f(x, y)
+g(x, y, z, t)
+h(x)
+ +
+

Ideally, we'd use just one upper bound for them all. So we'll be looking for an upper bound of the +following form:

+ +
+ + +
f(x, y, z, t) <= A_d(max(x, y, z, t))
+ +
+

That is:

+
    +
  • +Compute the depth of f, d. +
  • +
  • +Compute the largest of its arguments. +
  • +
  • +And plug that into unary function for depth d. +
  • +
+

Let's start with d=1. We have only primitive functions on this level, succ, zero, and pred, +so we could say that

+ +
+ + +
A_1(x) = x + 1
+ +
+

Now, let's handle an arbitrary other depth d + 1. In that case, our function is non-primitive, so at +the root of the syntax tree we have either a composition or a LOOP.

+

Composition would look like this:

+ +
+ + +
f(x, y, z, ...) = g(
+  h1(x, y, z, ...),
+  h2(x, y, z, ...),
+  h3(x, y, z, ...),
+)
+ +
+

where g and h_n are d deep and the resulting f is d+1 deep. We can immediately estimate +the h_n then:

+ +
+ + +
f(args...) <= g(
+  A_d(maxarg),
+  A_d(maxarg),
+  A_d(maxarg),
+  ...
+)
+ +
+

In this somewhat loose notation, args... stands for a tuple of arguments, and maxarg stands for +the largest one.

+

And then we could use the same estimate for g:

+ +
+ + +
f(args...) <= A_d(A_d(maxarg))
+ +
+

This is super high-order, so let's do a concrete example for a depth-2 two-argument function which +starts with a composition:

+ +
+ + +
f(x, y) <= A_1(A_1(max(x, y)))
+         = A_1(max(x, y) + 1)
+         = max(x, y) + 2
+ +
+

This sounds legit: if we don't use LOOP, then f(x, y) is either succ(succ(x)) or succ(succ(y)), +so max(x, y) + 2 indeed is the bound!

+

Ok, now the fun case! If the top-level node is a LOOP, then we have

+ +
+ + +
f(args...) = LOOP(
+  g(args...),
+  h(args...),
+  t(args...),
+)
+ +
+

This sounds complicated to estimate, especially due to that last t(args...) argument, which is the +number of iterations. So we'll be cowards and won't actually try to estimate this case. Instead, +we will require that our PRF is written in a simplified form, where the first and the last arguments +to LOOP are simple.

+

So, if your PRF looks like

+ +
+ + +
f(x, y) = LOOP(x + y, mul, pow2(x))
+ +
+

you are required to re-write it first as

+ +
+ + +
helper(u, v) = LOOP(u, mul, v)
+f(x, y) = helper(x + y, pow2(x))
+ +
+

So now we only have to deal with this:

+ +
+ + +
f(args...) = LOOP(
+  arg,
+  g(args...),
+  arg,
+)
+ +
+

f has depth d+1, g has depth d.

+

On the first iteration, we'll call g(args..., arg), which we can estimate as A_d(maxarg). That +is, g does get an extra argument, but it is one of the original arguments of f, and we are +looking at the maximum argument anyway, so it doesn't matter.

+

On the second iteration, we are going to call +g(args..., prev_iteration) +which we can estimate as +A_d(max(maxarg, prev_iteration)).

+

Now we plug our estimation for the first iteration:

+ +
+ + +
g(args..., prev_iteration)
+  <= A_d(max(maxarg, prev_iteration))
+  <= A_d(max(maxarg, A_d(maxarg)))
+  =  A_d(A_d(maxarg))
+ +
+

That is, the estimate for the first iteration is A_d(maxarg). The estimation for the second +iteration adds one more layer: A_d(A_d(maxarg)). For the third iteration we'll get +A_d(A_d(A_d(maxarg))).

+

So the overall thing is going to be smaller than A_d iteratively applied to itself some number of +times, where "some number" is one of f's original arguments. But no harm's done if we iterate up +to maxarg.

+

As a sanity check, the worst depth-2 function constructed with iteration is probably

+ +
+ + +
f(x, y) = LOOP(x, succ, y)
+ +
+

which is x + y. And our estimate gives x + 1 applied maxarg times to maxarg, which is 2 * +maxarg, which is indeed the correct upper bound!

+

Combining everything together, we have:

+ +
+ + +
A_1(x) = x + 1
+
+f(args...) <= max(
+  A_d(A_d(maxarg)),               # composition case
+  A_d(A_d(A_d(... A_d(maxarg)))), # LOOP case,
+   <-    maxarg A's         ->
+)
+ +
+

That max there is significant: although it seems like the second line, with maxarg +applications, is always going to be longer, maxarg, in fact, could be as small as zero. But we +can take maxarg + 2 repetitions to fix this:

+ +
+ + +
f(args...) <=
+  A_d(A_d(A_d(... A_d(maxarg)))),
+  <-    maxarg + 2 A's         ->
+ +
+

So let's just define A_{d+1}(x) to make that inequality work:

+ +
+ + +
A_{d+1}(x) = A_d(A_d( .... A_d(x)))
+            <- x + 2 A_d's in total->
+ +
+

Unpacking:

+

We define a family of unary functions A_d, such that each A_d grows faster than any n-ary PRF +of depth d. If f is a ternary PRF of depth 3, then f(1, 92, 10) <= A_3(92).

+

To evaluate A_d at point x, we use the following recursive procedure:

+
    +
  • +If d is 1, return x + 1. +
  • +
  • +Otherwise, evaluate A_{d-1} at point x to get, say, v. Then evaluate A_{d-1} again at +point v this time, yielding u. Then compute A_{d-1}(u). Overall, repeat this process x+2 +times, and return the final number. +
  • +
+
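The procedure above translates fairly directly into code. Here is a minimal Rust sketch of it; the name `a_iter` and the choice of `u64` are assumptions made for this illustration, and anything beyond tiny inputs overflows quickly:

```rust
/// Evaluates A_d(x) by following the procedure literally:
/// for d == 1 return x + 1, otherwise apply A_{d-1} to x, x + 2 times in a row.
/// `a_iter` is a hypothetical name; u64 overflows for all but tiny inputs.
pub fn a_iter(d: u32, x: u64) -> u64 {
    assert!(d >= 1, "A_d is defined for d >= 1 only");
    if d == 1 {
        return x + 1;
    }
    let mut value = x;
    // The repetition count is fixed by the original x, not by the running value.
    for _ in 0..x + 2 {
        value = a_iter(d - 1, value);
    }
    value
}
```

For example, `a_iter(2, 3)` is 8 (five increments applied to 3), and `a_iter(3, 2)` is 62 (the chain 2, 6, 14, 30, 62).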

We can simplify this a bit if we stop treating d as a kind of function index, and instead say +that our A is just a function of two arguments. Then we have the following equations:

+ +
+ + +
A(1, x) = x + 1
+A(d + 1, x) = A(d, A(d, A(d, ..., A(d, x))))
+                <- x + 2 A_d's in total->
+ +
+

The last equation can be re-formatted as

+ +
+ + +
A(
+  d,
+  A(d, A(d, ..., A(d, x))),
+  <- x + 1 A_d's in total->
+)
+ +
+

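And for non-zero x that is almost (the inner chain now starts from x - 1 rather than x, which shifts the values down a little but still leaves the growth explosive)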

+ +
+ + +
A(
+  d,
+  A(d + 1, x - 1),
+)
+ +
+

So we get the following recursive definition for A(d, x):

+ +
+ + +
A(1, x) = x + 1
+A(d + 1, 0) = A(d, A(d, 0))
+A(d + 1, x) = A(d, A(d + 1, x - 1))
+ +
+

As a Python program:

+ +
+ + +
def A(d, x):
+  if d == 1: return x + 1
+  if x == 0: return A(d-1, A(d-1, 0))
+  return A(d-1, A(d, x - 1))
+ +
+

It's easy to see that computing A on a Turing Machine using this definition terminates: this +is a function with two arguments, and every recursive call uses a lexicographically smaller pair of +arguments. And we constructed A in such a way that A(d, x) as a function of x is larger than any +PRF with a single argument of depth d. But that means that the following function with one argument +a(x) = A(x, x)

+

grows faster than any PRF. And that's an example of a function which a Turing Machine has no +trouble computing (given sufficient time), but which is beyond the capabilities of PRFs.

+
+
+ +

+ Part III, Descent From the Ivory Tower +

+

Remember, this is a three-part post! And we are finally at part 3! So let's circle back to the +practical matters. We have learned that:

+
    +
  • +Turing machines don't necessarily terminate. +
  • +
  • +While other computational devices, like FSMs and PRFs, can be made to always terminate, there's no +guarantee that they'll terminate fast. PRFs in particular can compute quite large functions! +
  • +
  • +And non-Turing complete devices can be quite expressive. For example, any real-world algorithm +that works on a TM can be adapted to run as a PRF. +
  • +
  • +Moreover, you don't even have to contort the algorithm much to make it fit. There's a universal +recipe for how to take something Turing complete and make it a primitive recursive function +instead: just add an iteration counter to the device, and forcibly halt it if the counter grows +too large. +
  • +
+

Or, more succinctly: there's no practical difference between a program that doesn't terminate, and +the one that terminates after a billion years. As a practitioner, if you think you need to solve the +first problem, you need to solve the second problem as well. And making your programming language +non-Turing complete doesn't really help with this.

+

And yet, there are a lot of configuration languages out there that use non-Turing completeness as +one of their key design goals. Why is that?

+

I would say that we are never interested in Turing-completeness per se. We usually want some much +stronger properties. And yet there's no convenient catchy name for that bag of features of a good +configuration language. So, "non-Turing-complete" gets used as a sort of rallying cry to signal that +something is a good configuration language, and maybe sometimes even to justify to others inventing +a new language instead of taking something like Lua. That is, the real reason why you want at +least a different implementation is all those properties you really need, but they are kinda hard to +explain, or at least much harder than "we can't use Python/Lua/JavaScript because they are +Turing-complete".

+

So what are the properties of a good configuration language?

+

First, we need the language to be deterministic. If you launch Python and type id([]), you'll +see some number. If you hit ^C, and then do this again, you'll see a different number. This is OK +for normal programming, but is usually anathema for configuration. Configuration is often used as a +key in some incremental, caching system, and letting in non-determinism there wreaks absolute chaos!

+

Second, you need the language to be well-defined. You can compile Python with ASLR disabled, and +use some specific allocator, such that id([]) always returns the same result. But that result +would be hard to predict! And if someone tries to do an alternative implementation, even if they +disable ASLR as well, they are likely to get a different deterministic number! Or the same could +happen if you just update the version of Python. So, the semantics of the language should be clearly +pinned-down by some sort of a reference, such that it is possible to guarantee not only +deterministic behavior, but fully identical behavior across different implementations.

+

Third, you need the language to be pure. If your configuration can access environment variables or +read files on disk, then the meaning of the configuration would depend on the environment where the +configuration is evaluated, and you again don't want that, to make caching work.

+

Fourth, a thing that is closely related to purity is security and sandboxing. The mechanism to +achieve both purity and security is the same: you don't expose general IO to your language. But +the purpose is different: purity is about not letting the results be non-deterministic, while +security is about not exposing access tokens to the attacker.

+

And now this gets tricky. One particular possible attack is a denial of service: sending some bad +config which makes our system just spin there burning the CPU. Even if you control all IO, you +are generally still open to these kinds of attacks. It might be OK to say this is outside of the +threat model: that no one would find it valuable enough to just burn your CPU, if they can't also +do IO, and that, even in the event that this happens, there's going to be some easy mitigation in the +form of a higher-level timeout.

+

But you also might choose to provide some sort of guarantees about execution time, and that's really +hard. Two approaches work. One is to make sure that processing is obviously linear. Not just +terminates, but is actually proportional to the size of inputs, and in a very direct way. If the +correspondence is not direct, then it's highly likely that it is in fact non-linear. The second +approach is to ensure metered execution: during processing, decrement a counter for every +simple atomic step and terminate processing when the counter reaches zero.
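To make the second approach concrete, here is a minimal Rust sketch of metered execution. All names (`Step`, `Outcome`, `run_metered`, `collatz_step`) are hypothetical and made up for this illustration; the "program" is modelled as a step function over some state, which is an assumption about how the interpreter is structured rather than a description of any particular configuration language.

```rust
/// What a single atomic step of the interpreted program reports back.
pub enum Step<S, T> {
    Continue(S),
    Done(T),
}

/// Result of a metered run: either a value, or the fuel ran out first.
#[derive(Debug, PartialEq)]
pub enum Outcome<T> {
    Done(T),
    OutOfFuel,
}

/// Runs `step` repeatedly, spending one unit of fuel per step,
/// and forcibly halts when the counter reaches zero.
pub fn run_metered<S, T>(
    mut state: S,
    mut fuel: u64,
    step: impl Fn(S) -> Step<S, T>,
) -> Outcome<T> {
    loop {
        if fuel == 0 {
            return Outcome::OutOfFuel;
        }
        fuel -= 1;
        match step(state) {
            Step::Continue(next) => state = next,
            Step::Done(result) => return Outcome::Done(result),
        }
    }
}

/// Example "program": count Collatz steps until reaching 1.
/// The state is (current number, steps taken so far).
/// Starting from 0 it never finishes on its own, so only the fuel stops it.
pub fn collatz_step((n, steps): (u64, u64)) -> Step<(u64, u64), u64> {
    if n == 1 {
        Step::Done(steps)
    } else if n % 2 == 0 {
        Step::Continue((n / 2, steps + 1))
    } else {
        Step::Continue((3 * n + 1, steps + 1))
    }
}
```

With enough fuel, `run_metered((6, 0), 100, collatz_step)` finishes normally; with a budget of 5, or with the non-terminating start state `(0, 0)`, it reports `OutOfFuel` instead of spinning.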

+

Finally, one more vague property you'd want from a configuration language is for it to be simple. +That is, to ensure that, when people use your language, they write simple programs. It seems to me +that this might actually be the case where banning recursion and unbounded loops could help, though +I am not sure. As we know from the PRF exercise, this won't actually prevent people from writing +arbitrary recursive programs. It'll just require some roundabout +code to do that. But maybe that'll be enough of a +speedbump to make someone invent a simple solution, instead of brute-forcing the most obvious one?

+

That's all for today! Have a great weekend, and remember:

+ +
+

Any algorithm that can be implemented by a Turing Machine such that its runtime is bounded by some +primitive recursive function of input can also be implemented by a primitive recursive function!

+
+ +
+
+]]>
+
+ +
diff --git a/index.html b/index.html new file mode 100644 index 00000000..0d7945a6 --- /dev/null +++ b/index.html @@ -0,0 +1,510 @@ + + + + + + + matklad + + + + + + + + + + + + +
+ +
+ +
+ +
+ + + + + diff --git a/links.html b/links.html new file mode 100644 index 00000000..d4302984 --- /dev/null +++ b/links.html @@ -0,0 +1,314 @@ + + + + + + + matklad + + + + + + + + + + + + +
+ +
+ +
+

A bunch of things I find myself repeatedly referring to in various discussions!

+
+
Two meanings of systems programming
+
+

http://willcrichton.net/notes/systems-programming/

+
+
Systems programmers can have nice things
+
+

https://robert.ocallahan.org/2016/08/random-thoughts-on-rust-cratesio-and.html

+
+
Goals and priorities for C++
+
+

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2137r0.html

+
+
Boundaries
+
+

https://www.destroyallsoftware.com/talks/boundaries

+
+
Plugin diagram
+
+

https://www.tedinski.com/2018/01/30/the-one-ring-problem-abstraction-and-power.html

+
+
Data, ADT, Object
+
+

https://www.tedinski.com/2018/02/27/the-expression-problem.html

+
+
John Carmack on inlined code
+
+

http://number-none.com/blow/john_carmack_on_inlined_code.html

+
+
A few billion lines of code later
+
+

https://cacm.acm.org/magazines/2010/2/69354-a-few-billion-lines-of-code-later/abstract

+
+
Simple testing can prevent most critical failures
+
+

https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-yuan.pdf

+
+
The Error Model
+
+

http://joeduffyblog.com/2016/02/07/the-error-model/

+
+
Talks that changed the way I think about programming
+
+

http://www.opowell.com/post/talks-that-changed-the-way-i-think-about-programming/

+
+
Why rewriting hotspots in a faster language doesn't work
+
+

https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation

+ +
+

The right tool for the job is often the tool you are already using -- adding new tools has a higher cost than many people appreciate

+
+
John Carmack
+
+
+
Programming people
+
+

https://leftoversalad.com/c/015_programmingpeople/

+
+
Python spends almost all of its time in the C runtime
+
+

http://blog.kevmod.com/2016/07/why-is-python-slow/

+
+
Rider architecture
+
+

https://www.codemag.com/Article/1811091/Building-a-.NET-IDE-with-JetBrains-Rider

+
+
System programming values
+
+

https://www.youtube.com/watch?v=2wZ1pCpJUIM

+
+
Midlayer mistake
+
+

https://lwn.net/Articles/336262/

+
+
Technology from the past come to save the future from itself
+
+

http://venge.net/graydon/talks/

+
+
Not Rocket Science Rule
+
+

https://graydon2.dreamwidth.org/1597.html

+
+
Why C++ Sails When the Vasa Sank
+
+

https://www.youtube.com/watch?v=ltCgzYcpFUI

+
+
Composition of Unsafe
+
+

https://smallcultfollowing.com/babysteps/blog/2016/10/02/observational-equivalence-and-unsafe-code/

+
+
In Rust, Ordinary Vectors are Values
+
+

http://smallcultfollowing.com/babysteps/blog/2018/02/01/in-rust-ordinary-vectors-are-values/

+
+
Implementing Swift Generics
+
+

https://www.youtube.com/watch?v=ctS8FzqcRug

+
+
Generics Dilemma
+
+

https://research.swtch.com/generic

+
+
A Catalogue of Optimizing Transformations
+
+

https://www.clear.rice.edu/comp512/Lectures/Papers/1971-allen-catalog.pdf

+
+
Static Program Analysis
+
+

https://cs.au.dk/~amoeller/spa/spa.pdf

+
+
Accurate mental model for Rust's reference types
+
+

https://docs.rs/dtolnay/0.0.9/dtolnay/macro._02__reference_types.html

+
+
Don't write bugs
+
+

https://www.teamten.com/lawrence/programming/dont-write-bugs.html

+
+
Don't use _ patterns
+
+

https://youtu.be/-J8YyfrSwTk?t=1819

+
+
What Is The Minimal Set Of Optimizations Needed For Zero-Cost Abstraction?
+
+

https://robert.ocallahan.org/2020/08/what-is-minimal-set-of-optimizations.html

+
+
Expect Tests
+
+

https://blog.janestreet.com/using-ascii-waveforms-to-test-hardware-designs/

+
+
What Every C Programmer Should Know About UB
+
+

https://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html

+
+
Build a Mountain
+
+

https://www.youtube.com/watch?v=443UNeGrFoM&t=6949s

+
+
Precise Profiling via rdpmc
+
+

https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view

+
+
JSONMutexDB
+
+

https://tailscale.com/blog/an-unlikely-database-migration/

+
+
Swift Is Undecidable
+
+

https://forums.swift.org/t/swift-type-checking-is-undecidable/39024

+
+
Don Syme of F# on typeclasses
+
+

https://github.com/fsharp/fslang-suggestions/issues/243#issuecomment-916079347

+
+
Outlined Containers
+
+

https://github.com/rust-lang/rust/pull/60470#issuecomment-489136965

+
+
Your ABI is Probably Wrong
+
+

https://outerproduct.net/boring/2021-05-07_abi-wrong.html

+
+
Limits to Growth
+
+

https://graydon2.dreamwidth.org/263429.html

+
+
Subprocess Gotchas
+
+

https://github.com/oconnor663/duct.py/blob/0764961a8c799873a9375d4100ae9ddbee624594/gotchas.md

+
+
The Unix process API is unreliable and unsafe
+
+

https://catern.com/process.html

+
+
Distributed Systems via remote syscalls
+
+

http://www.catern.com/integration.html

+
+
Latency Numbers
+
+

https://github.com/sirupsen/napkin-math

+
+
Moderation
+
+

https://old.reddit.com/r/rust/comments/hnfnti/where_is_the_rust_community_allowed_to_talk_about/fxf65nf/

+
+
Interfaces Belong With Users
+
+

https://neugierig.org/software/blog/2019/11/interface-pattern.html

+
+
Structured Concurrency
+
+

https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/

+
+
Typographic Size on The Web
+
+

https://fonts.google.com/knowledge/using_type/the_complications_of_typographic_size

+
+
Snapshot Testing
+
+

https://ianthehenry.com/posts/my-kind-of-repl/

+
+
Post IntelliJ
+
+

https://martinfowler.com/bliki/PostIntelliJ.html

+
+
+ +
+ + + + + diff --git a/resume.html b/resume.html new file mode 100644 index 00000000..d4143bd1 --- /dev/null +++ b/resume.html @@ -0,0 +1,321 @@ + + + + + + + matklad + + + + + + + + + + + + +
+ +
+ +
+ +

Resume

+

Welcome to my resume! +It consists of two parts. +The first part is the free-form narrative of what I do work-wise. +This is something I would be excited to read from a person I am going to work with. +The second part is a more traditional bullet-list of companies, positions, and projects. +The resume is available as .html and .pdf.

+
+ +

+ Narrative +

+

I used to do math. +Although I no longer do mathematics daily, it is the basis I use to think about programming. +I enjoy solving an occasional puzzle. +See Generate All the Things and Notes on Paxos articles as examples of math I like.

+

I am a programmer. +I like writing code just for the sake of it. +I like deleting code even more. +I like short, simple, robust and beautiful code, which not only gets the job done, but does it in an obviously correct way. +See, eg, ungrammar for an example of a relatively short and self-contained piece of programming.

+

I am a pragmatist. +The above two points sound outright scary, but don't worry :) +While I do enjoy encoding lambda calculus in types, that's not what I spend most of my time on. +I see most code as something to be replaced and re-written later, and optimise for making changes over time, not for perfection right now. +This section from rust-analyzer style guide is a good example of this.

+

I loathe accidental complexity. +I think I spend most of my time trying to make things simpler, trying to remove parts, trying to make foundational APIs more crisp. +I have a visceral reaction to the gaps between how things should be, and how they are. +cargo xtask pattern shows to what lengths I am willing to go just to get rid of the mess the unix shell is.

+

I build systems. +Software engineering is programming integrated over time, and it's that time dimension that really matters. +The shape of the software today is determined by accidental, runaway, viral successes of yesterday. +There's a reason why VT100 interface is still programmed against today, and it is not its technical adequacy. +This is not my article, but I like it so much that I'll advertise it even in my resume. +Systems thinking is why I am fascinated with Rust and not, eg, with Kotlin. +Since Java with its reasonably fast managed runtime, Rust is the first PL revolution which meaningfully changes how we write software, and not just repacks known-good idioms with a better syntax (which is also important!, just not as exciting!).

+

I build open source communities. +My biggest successes so far I think are IntelliJ Rust and rust-analyzer. +I didn't write the hardest, smartest bits of those. +But I tried very hard to make sure that others can do that, by removing accidental complexity, by making contribution enjoyable, by trying to program the architecture which would be robust to time and systems effects.

+

More generally, I help build moderately large projects, which are combinations of all of the above: people, systematic forces, beautiful mathematical abstractions at the core, and hundreds of thousands of lines of code as a physical manifestation. +See One Hundred Thousand Lines of Rust series for a bunch of concrete, pragmatic lessons I've learned so far.

+

I love teaching! +See, for example, my Russian Rust Course (YouTube), a series of videos about rust-analyzer (YouTube), or the article about Pratt parsers.

+

Oh, and I love writing :-)

+
+
+
+ +

+ Contacts +

+
+
Name
+
+

Alex Kladov

+
+
GitHub
+
+

https://github.com/matklad

+
+
Email
+
+

aleksey.kladov@gmail.com

+
+
+
+
+ +

+ Core Competencies +

+ +
+
+ +

+ Education +

+ +
+
+ +

+ Professional Experience +

+
+ +

+ TigerBeetle +

+

From Dec 2022
+https://tigerbeetle.com
+https://github.com/tigerbeetle/tigerbeetle

+

At TigerBeetle I help to build a correct and fast distributed database for accounting in Zig. I work +throughout the stack, but my primary focus areas are consensus, testing, overall +build & development process, and knowledge sharing.

+
+
+ +

+ Rust Programming Language +

+

Sep 2015 to Dec 2022
+https://www.rust-lang.org/governance/teams/dev-tools

+

I was a member of the dev-tools team of the Rust programming language. I am the +original author of both IntelliJ +Rust and rust-analyzer, the two +tools which today power IDE support for Rust in virtually every editor. My work +included both the technical task of writing advanced, incremental, resilient +compilers and organizing a vibrant community of contributors and maintainers +to ensure that my direct involvement is not a requirement.

+

I made many smaller contributions across the Rust ecosystem. I was a +co-maintainer of Cargo in 2016-2018, and I +maintain prominent libraries and +document emerging ecosystem patterns.

+
+
+ +

+ NEAR +

+

Feb 2021 to Dec 2022
+https://near.org

+

At NEAR, I was a TLM/TL for the contract runtime team which is responsible for +secure, reliable, and fast execution of WebAssembly smart contracts.

+
+
+ +

+ Ferrous Systems +

+

Sep 2018 to Feb 2021
+https://ferrous-systems.com

+

With Ferrous Systems, we brought rust-analyzer project from an MVP to a de-facto +standard for the ecosystem. I also helped with teaching people to use Rust +efficiently.

+
+
+ +

+ Computer Science Center +

+

Sep 2014 to Sep 2019
+https://compscicenter.ru/teachers/934/

+

At CSC I taught two major courses:

+
+
Programming In Rust, Winter-Spring 2019, video
+
+

A semester long introduction course, focused on contrasting unique Rust +features with more mainstream languages like C++ or Java.

+
+
Programming In Python, Autumn-Winter 2018, video
+
+

A semester long advanced course focusing on the language inner workings and +programming idioms.

+
+
+

I have also worked as a teaching assistant for “Algorithms and Data structures” +and Python courses.

+
+
+ +

+ JetBrains +

+

Sep 2015 to Jan 2018
+https://intellij-rust.github.io

+

At JetBrains, I have led the development of the +IntelliJ Rust plugin for the Rust +programming language. The plugin is a Rust compiler written in Kotlin, with +full-blown parser, name resolution and type inference algorithms, and +integrations with build tools and debuggers. Besides solving the technical +problems, I've created an open source community around the plugin by mentoring +issues, writing developer documentation and supporting contributors.

+

The plugin later became the basis for the stand-alone Rust Rover +IDE.

+
+
+ +

+ Stepik.org +

+

2012 to 2014
+http://stepik.org/

+

Stepik is an e-learning platform, written in Python, focused on a rich variety of +practical exercises and ease of creating content. I was on the backend team of +three from the start of the project. Among other things, I've worked on +the exercises subsystem, student code sandboxing, and progress tracking, and +designed and implemented the JSON API for the single-page frontend.

+
+
+ +
+ + + + + diff --git a/resume.pdf b/resume.pdf new file mode 100644 index 00000000..8555cfec Binary files /dev/null and b/resume.pdf differ diff --git a/style.html b/style.html new file mode 100644 index 00000000..6f332375 --- /dev/null +++ b/style.html @@ -0,0 +1,327 @@ + + + + + + + matklad + + + + + + + + + + + + +
+ +
+ +
+ +

Programming Style

+

Congratulations, you've found a secret level!

+

This is a super work-in-progress page which collects various rules-of-thumb I use. +The primary goal so far is to collect the rules for myself, that's why I don't link to this page from anywhere yet.

+
+ +

+ General +

+
+ +

+ Naming +

+

Prefer full names except for extremely common cases (ctx for context), or equal-length pairs +(next/prev). Use consistent names. Naming variables after types (let thing: Thing) is a way +to achieve global consistency with little coordination.

+

Build a vocabulary of standard names and re-use it:

+
+
ctx
+
+

context of an operation. Typically holds something mutable. Read-only +context is named params.

+
+
params
+
+

A bag of named arguments. Unlike config, might hold not only POD types.

+
+
config
+
+

Generally user-specified POD parameters.

+
+
sink
+
+

output of an internal iterator, typically sink: &mut dyn FnMut(T) or sink: &mut Vec<T>.

+
+
lhs, rhs
+
+

operands of a binary operator.

+
+
fuel
+
+

Recursion and infinite loop guards

+
+
result
+
+

A return variable.

+
+
line_index, line_number
+
+

Index is unambiguously zero-based. By convention, number is one-based.

+
+
+
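As an illustration of the vocabulary in use, here is a small Rust sketch of an internal iterator that reports its output through `sink` and a caller that accumulates into `result`. The function names `for_each_even` and `collect_evens` are hypothetical, chosen only for this example:

```rust
/// Internal iterator: instead of returning a collection,
/// push every even number into `sink`.
/// `for_each_even` is a hypothetical name for this illustration.
pub fn for_each_even(numbers: &[u32], sink: &mut dyn FnMut(u32)) {
    for &number in numbers {
        if number % 2 == 0 {
            sink(number);
        }
    }
}

/// A caller that turns the internal iterator into a plain Vec,
/// using `result` as the name of the return variable.
pub fn collect_evens(numbers: &[u32]) -> Vec<u32> {
    let mut result = Vec::new();
    for_each_even(numbers, &mut |number| result.push(number));
    result
}
```

The `sink` parameter lets the caller decide what to do with each item (collect, count, print) without the iterator allocating anything itself.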

Equisized pairs:

+
    +
  • +add/sub, mul/div +
  • +
  • +lhs/rhs +
  • +
  • +s/e +
  • +
  • +next/prev +
  • +
  • +source/target +
  • +
  • +src/dst +
  • +
  • +index/count +
  • +
  • +insert/remove +
  • +
  • +beg/end +
  • +
  • +fresh/stale +
  • +
+
+
+ +

+ Explicit Data Tables +

+

Remove code duplication by extracting commonalities into tabular data

+ +
+ + +
// GOOD
+const cases = ["foo", "bar", "baz"];
+for case in cases {
+    if x == case {
+
+    }
+}
+
+// BAD
+if x == "foo" {
+
+} else if x == "bar" {
+
+} else if x == "baz" {
+
+}
+ +
+
+
+ +

+ Bulk IO +

+

Avoid opening file descriptors in favor of bulk operations. To write data to a file, you need to +follow a lifecycle: open file descriptor, issue write syscalls, close file descriptor. Lifecycle +handling requires complicated type-system machinery and is better avoided. Usually, the standard library +provides something like std::fs::read_to_string which encapsulates lifecycle management.
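A minimal Rust sketch of the bulk style, using the standard library's `std::fs::write` and `std::fs::read_to_string`. The helper name `save_and_reload` is hypothetical and exists only to show both calls side by side:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes `text` to `path` and reads it back, without ever holding
/// a file handle in user code: each call opens, transfers, and closes
/// the file internally. `save_and_reload` is a hypothetical helper.
pub fn save_and_reload(path: &Path, text: &str) -> io::Result<String> {
    fs::write(path, text)?;
    fs::read_to_string(path)
}
```

There is no handle to forget to close and no partially written state to reason about in the caller; the trade-off is that the whole content has to fit in memory.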

+
+
+
+ +

+ Rust +

+
+ +

+ No Self Types +

+

Write types out explicitly, avoid Self alias if possible:

+ +
+ + +
// Good
+pub struct Diagnostic {
+    pub code: DiagnosticCode,
+    pub text: String,
+}
+
+impl Diagnostic {
+    pub fn new(code: DiagnosticCode, text: String) -> Diagnostic {
+        Diagnostic { code, text }
+    }
+}
+
+// Bad
+impl Diagnostic {
+    pub fn new(code: DiagnosticCode, text: String) -> Self {
+        Self { code, text }
+    }
+}
+ +
+

Rationale: reducing cognitive load, optimizing for the reader. +Resolving Self is a small mental effort, but it can be avoided.

+
+
+ +

+ Prefer new Over default +

+

Use new over default to construct instances.

+

Rationale: new is too ingrained.
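A small Rust sketch of the rule, in the same Good/Bad spirit as the other sections. The `Counter` type and `make_counters` function are hypothetical and exist only for this illustration:

```rust
/// Hypothetical type used only to illustrate the rule.
#[derive(Default)]
pub struct Counter {
    pub count: u32,
}

impl Counter {
    pub fn new() -> Counter {
        Counter { count: 0 }
    }
}

pub fn make_counters() -> (Counter, Counter) {
    // Good: the constructor every reader expects.
    let good = Counter::new();
    // Bad: builds the same value here, but the call site is less familiar.
    let bad = Counter::default();
    (good, bad)
}
```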

+
+
+ +

+ Blank Line Between Declarations +

+

Leave a blank line between top-level declarations:

+ +
+ + +
// Good
+impl Foo {
+    pub fn foo() {
+    }
+
+    pub fn bar() {
+    }
+}
+
+// Bad
+impl Foo {
+    pub fn foo() {
+    }
+    pub fn bar() {
+    }
+}
+ +
+

Rationale: consistency. +Omitting the blank line leads to somewhat terser code, but is very hard to do consistently.

+
+
+ +

+ Derive Order +

+

Use the following order of derives:

+ +
+ + +
#[derive(Clone, Copy, Default, PartialEq, Eq, PartialOrd, Ord, Hash, Debug)]
+ +
+

Rationale: consistency. +Debug comes last because it is the most often added item.

+
+
+ +
+ + + + +