Make some changes to the "Why Maelstrom" blog entry.
nfachan committed Oct 16, 2024
1 parent 38989ca commit 4e428a7
Showing 1 changed file with 136 additions and 112 deletions.
248 changes: 136 additions & 112 deletions website/content/blog/why-maelstrom/index.md
@@ -1,169 +1,193 @@
+++
title = "Why Maelstrom?"
authors = ["Neal Fachan <[email protected]>"]
date = 2024-10-08
date = 2024-10-16
weight = 0
+++

At Maelstrom, our goal is to improve developer productivity. We believe that
tests are key to writing high-quality, reliable software, and we spend a lot of
time thinking about how to make them better.
In my 25 years of writing software professionally, I've witnessed a major shift
in our industry's attitude towards testing. When I started, we developers
rarely tested our software. We would run a few manual, ad hoc tests, and then
call it good. We would then throw our code over the figurative wall to a
dedicated testing team. If we were lucky, they'd get to our changes in a week
or two.

We believe that there are three important aspects to good tests. Tests should
be **fast**, **accessible**, and **reliable**.
Today, we strive to have as many automated tests as possible, and to run those
tests early and often. We continuously integrate our code, running those
automated tests when we do. We've all experienced, sometimes with great pain,
the fact that the earlier a defect is caught, the cheaper and easier it is to
fix.

Even though things have gotten better, we still don't run our tests nearly as
early or as often as we should. Our testing tools have failed to keep up with
our testing needs. This post looks at why that is, and introduces Maelstrom, a
toolset that we've built to address the problem.

<!-- more -->

### CI Is Part of the Problem

As we've increased the number of tests we write, we've moved to running them
primarily in CI.

Testing on our resource-constrained local machines is slow, making it
frustrating to do frequently. We also often have tests that can't even be run
on our local machines because of missing dependencies, reducing the efficacy of
local testing.

As a result, we tend to rely on CI for the majority of our test running. In CI
we can configure the required dependencies, and we can easily add more
resources as necessary.

However, running tests only in CI comes with its own set of problems, the
biggest being that we don't run our tests often enough. Instead of running
tests every few minutes, we run them a few times a day. This means we catch
issues later in the development cycle, which wastes our time.

Our assertion is that we should be able to run all of our tests locally, as
well as in CI, in the exact same environments with the exact same dependencies.
That's why we built Maelstrom.

### Easier, Better, Faster, Stronger

Our goal with Maelstrom is to give developers the tools to run their tests
after every change they make. You shouldn't have to wait for CI to run your
tests: you should be able to do it right from your machine, on demand.

Moreover, if you run your tests this often, you need those tests to run
quickly. Tests are embarrassingly parallel, as by definition, they shouldn't
interact with each other. Maelstrom harnesses the immense amount of
computational power available today to run tests in parallel, which reduces how
long you wait for tests.
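
To make the "embarrassingly parallel" point concrete, here is a minimal Python sketch. It is an illustration only and not how Maelstrom itself schedules work: the `check_*` functions, `run_one`, and `run_parallel` are hypothetical names invented for this example. Because the checks share no state, they can be handed to a pool of workers in any order.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical, independent checks: none of them reads or writes shared state.
def check_addition():
    assert 1 + 1 == 2


def check_upper():
    assert "abc".upper() == "ABC"


def check_broken():
    assert sorted([3, 1, 2]) == [3, 2, 1]  # deliberately wrong expectation


def run_one(test):
    """Run a single check and report (name, passed)."""
    try:
        test()
        return (test.__name__, True)
    except AssertionError:
        return (test.__name__, False)


def run_parallel(tests, workers=4):
    """Run independent checks concurrently and collect results by name."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run_one, tests))


if __name__ == "__main__":
    print(run_parallel([check_addition, check_upper, check_broken]))
```

The sketch only works because the checks are independent. Once tests interact with each other, neither the order nor the degree of parallelism can be changed safely.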

We've identified three traits a good test-running system must have to achieve
these goals. Running tests should be **fast**, **accessible**, and
**reliable**. We'll next look at what we mean by these traits and why they are
important, and then proceed to show what we've built to meet these
requirements.

### Fast

We believe that a developer should be able to run their tests quickly.
You should be able to run your tests quickly.

When a developer is waiting for tests to run, they're not doing productive
work --- namely, writing code. On top of that, if tests take too long to run, the
developer's mind starts to wander. They context switch to some other task and
lose their state of flow.
Waiting for tests to run is frustrating. When you're not able to do productive
work --- namely, writing code --- you inevitably have to move on to other
tasks. You context switch and lose your state of flow.

If running tests take too long, then running them will begin to feel like a
chore. A developer might stop running them as often, or they may choose to only
run a small subset of them regularly. Either of these behaviors can lead to a
developer discovering defects late in the development process. Defects are much
easier and cheaper to correct when discovered immediately. The longer they
persist, the harder and more expensive they are to fix.
If running tests takes too long, running them feels like a chore. You might
stop running them as often, or you may run a smaller subset of them. Both
responses, while understandable, can lead to defects that get discovered late
in the development process. Defects are much easier and cheaper to correct when
discovered immediately. The longer they persist, the harder and more expensive
they are to fix.

Even speeding up test running from one minute to fifteen seconds can have a
huge impact on a developer's behavior. With a fast test-running setup, a
developer can run their tests scores of times during a day, catch defects
immediately, and do so without losing interest or getting distracted.
We've found that even small differences in the time it takes tests to run can
have a huge impact on our testing practices. With a fast test-running setup,
you can run your tests scores of times during the day, catch defects
immediately, and do so without losing focus.

### Accessible

We believe that a developer should be able to run any test at any time, right
from their local development environment.
You should be able to run any test at any time, right from your machine.

We've noticed a disturbing trend where many tests can only be run in CI. The
implication of this behavior is that instead of running those tests many times
throughout the day, a developer only runs them occasionally, maybe as little as
a few times a week. A developer often completes hours or days of work only to
find out in CI that they have to start over with a new design.
There's a disturbing trend in today's software development practices where many
tests can only be run in CI. The implication is that instead of running those
tests many times throughout the day, you only run them occasionally, sometimes
as little as a few times a week. You often complete hours or days of work only
to find out in CI that you need a new design.

Additionally, it can be hard to diagnose and fix a failure that was seen in CI,
as it may not be reproducible locally. A developer may have to resort to
repeatedly submitting speculative change to CI just to kick off new runs of the
tests.
It can be incredibly hard to diagnose and fix a failure found in CI, as it may
not be reproducible locally. You may have to resort to repeatedly submitting
speculative changes to CI just to kick off new runs of the tests.

When a developer can simply and habitually run all of their tests, in exactly
the same environment that CI does, right from their local machine, it
dramatically decreases the time they spend on root-cause analysis, bug fixing,
or redesign.
When you can run all of your tests, in the exact same environment that CI does,
right from your machine, it dramatically decreases the time you spend on
root-cause analysis, bug fixing, or redesign later.

### Reliable

We believe that tests should be reliable.
Tests should be reliable.

It seems obvious that we should make our tests as reliable as possible.
However, there are some sources of unreliability that seem to go
under-acknowledged.
It should be obvious that tests need to be reliable, but we seem to accept some
sources of unreliability as if they are inevitable.

The first source of unreliability is the reliance of the test on the execution
environment. A test may have a dependency on a certain file or piece of
software that is installed on the test-running machine. If that software is
upgraded, or if the depended-upon file is changed, the test may suddenly break.
The first is the reliance on the test execution environment. A test may have
dependencies on software installed on the test-running machine, or on other
aspects of its file system. If the test is changed to require a newer version
of the software, but the test-running machine isn't, the test may break.
Conversely, if the test-running machine is changed --- its software updated or
the file system modified --- the test may break.

The second source of unreliability is test-to-test interactions. A test may
leave around artifacts --- in the process's memory, files stored on disk, or
data stored in network services --- that affect tests that are run afterwards.
If the tests are run in a different order, or if the first test is changed,
then a later test may fail in an unexpected way. Diagnosing that failure may be
then a later test may fail unexpectedly, and diagnosing that failure may be
incredibly difficult.

These instances of "action at a distance" can be confusing to understand
and frustrating to debug.
These instances of "action at a distance" can be confusing to understand and
frustrating to debug.
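
As an illustration of a test-to-test interaction, here is a small, hypothetical Python sketch (none of these names come from Maelstrom). Two checks share a module-level dictionary, and whether the second one passes depends entirely on the order in which they run within one process.

```python
# Hypothetical shared state, standing in for process memory, files on disk,
# or data left behind in a network service.
_cache = {}


def check_writes_cache():
    _cache["user"] = "alice"  # leaves an artifact behind
    assert _cache["user"] == "alice"


def check_expects_empty_cache():
    # Passes only if no earlier check left anything in the shared state.
    assert _cache == {}


def run_in_order(tests):
    """Run checks sequentially in one process; return the names that failed."""
    _cache.clear()  # simulate starting a fresh process
    failures = []
    for test in tests:
        try:
            test()
        except AssertionError:
            failures.append(test.__name__)
    return failures


if __name__ == "__main__":
    print(run_in_order([check_expects_empty_cache, check_writes_cache]))
    print(run_in_order([check_writes_cache, check_expects_empty_cache]))
```

Neither check is wrong on its own. The failure only appears when the order changes, which is what makes this kind of defect hard to trace back to its cause.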

### What We've Built

Maelstrom is a framework that addresses these problems. It follows three
guiding principles.
Maelstrom addresses these problems by following three guiding principles.

First, we provide a way to specify the dependencies of every test. These
dependencies include any external software, devices, and files the test expects
to be installed on the system, as well as any shared network resources the test
relies on.

These dependencies are "opt-in": very few dependencies are provided by default.
This ensure that the set of dependencies stays small and also that the
This ensures that the set of dependencies stays small and also that the
developer understands what's going on with their tests.

Second, we run every test in its own isolated environment that includes only
the test's enumerated dependencies. This is an extension of the best practice
of running every test in its own fixture. Here, the fixture is taken to include
the whole file system, system resources, etc. We've written our own extremely
lightweight container runtime to implement this feature, which means that there
is very little overhead compared to running every test in its own process.
lightweight container runtime to implement this feature, meaning there is very
little overhead compared to running every test in its own process.

Third, we provide a mechanism to build a cluster of tests runners, so
developers can either run tests locally on their machines, or in parallel on an
arbitrarily large cluster.
Third, we provide a mechanism to build a cluster of test runners, so you can
run tests locally on your machine, or in parallel on an arbitrarily large
cluster.

### How to Use Maelstrom

Maelstrom provides specialized test runners for a variety of test frameworks.
We currently support Rust (via `cargo test`), Golang (via `go
test`), and Pytest. We will continue to add more tests runners in the
future.

If Maelstrom doesn't currently support a test runner, a developer can write
their own using our Rust or gRPC bindings. We also provide a command-line
utility for running an arbitrary program in the Maelstrom environment. This can
be used for ad hoc testing or as a target for scripts.

To get started running their tests using Maelstrom, a developer starts by using
the Maelstrom test runner as a drop-in replacement for their test runner. For
example, a developer would type `cargo maelstrom` instead of `cargo
test`. This will run all tests in a minimal environment to start with. Some
tests may not run properly without more dependencies specified. A developer
can then opt in to these required dependencies by adding dependencies to tests
that match certain patterns. It usually only takes a few minutes to get a whole
project's tests working with Maelstrom.

### Advantages Once Set Up

Once a developer starts using Maelstrom, there are a lot of things they can do.
Ideally, they will have access to a cluster of test runners: the bigger the
better. This cluster should be shared by all developers on the project, and
probably also with CI.

They can run more of their tests more often. With a cluster available,
running tests can become a very fast affair. A developer can then get instant
feedback when something breaks.

They can run all of their tests while developing, including the ones run by CI.
And since CI uses the same specifications, a developer can be confident that a
test that passes for them will also pass in CI. The converse is also true: a
test that fails in CI will fail for the developer, making reproducing a failure
much easier.

CI will also complete more quickly, since it will have a cluster available to
it. It will efficiently distribute test runs across the cluster, utilizing all
of the execution machines at once, without a developer having to do manual
balancing of test runs.

Having a cluster available also makes debugging flaky tests easier. A developer
can tell Maelstrom to run thousands of individual instances of a test, and
return when one fails. With enough cores in the cluster, a developer can
quickly track down even the most troublesome Heisenbug.

We have found that once developers have the experience of having a system like
Maelstrom at their disposal, they start to change how they approach testing.
Being able to test more of the system more often is part of it, but they also
start to write test differently. It now becomes more feasible to engage in more
computationally intensive testing. Developers can rely more heavily on fuzz
testing, "chaos testing", and exhaustive simulation.
Maelstrom provides specialized test runners for multiple test frameworks.
We currently support Rust (via `cargo test`), Golang (via `go test`), and
Pytest. We will continue to add more test runners in the future.

If Maelstrom doesn't currently support your chosen test runner, you can write
your own using our Rust or gRPC bindings. We also provide a command-line
utility for running an arbitrary program in the Maelstrom environment. You can
use this for ad hoc testing or as a target for scripts.

To get started running your tests using Maelstrom, start by using the
Maelstrom test runner as a drop-in replacement for your test runner. For
example, you would type `cargo maelstrom` instead of `cargo test`. This will
run all tests in a minimal environment to start with. Some tests may not run
properly without more dependencies specified. You can then opt into these
required dependencies by adding dependencies to tests that match certain
patterns. It usually only takes a few minutes to get a whole project's tests
working with Maelstrom.

### Conclusion

In my experience, once developers have a system like Maelstrom at their
disposal, they change how they approach testing. Being able to test more of the
system more often is part of it, but they also start to write tests
differently. It becomes feasible to engage in more computationally intensive
testing. Developers can rely more heavily on fuzz testing, "chaos testing", and
exhaustive simulation.


While the full value of Maelstrom's model works best when developers have
clusters available to them, there are a still a lot of advantages even when
that is not the case. It's still great to have tests always run the same way,
whether in CI or locally. And it's still great to be able to run tests
everywhere. In fact, we imagine a lot of developers will start with Maelstrom
without a generally-available cluster, then maybe move to having a cluster just
clusters available to them, there are a multitude of advantages even when tests
run locally. We know that most developers will start with Maelstrom
without a generally available cluster, then maybe move to having a cluster just
for CI, and then progress eventually to having a shared cluster for the entire
project.

You should have testing tools that move as fast as you do. Give Maelstrom a
shot and tell us what you think.
