WASM 1.1 compiler / decompiler

A novel Clojure/script library for the WebAssembly (WASM) ecosystem:

  • WASM programs as simple immutable Clojure data structures
  • Decompiling and compiling WASM binaries
  • JVM and browser, no compromise
  • Allowing all sorts of crazy WASM analysis and metaprogramming
  • Working interactively as opposed to using the command-line
  • Generating random WASM programs for runtime testing
  • Fully described using Malli
  • Robust, backed up by generative testing

Supported platforms:

  • Babashka >= v0.3.5 (except for the helins.wasm.schema namespace)
  • Browser
  • JVM
  • NodeJS

All binary processing relies on the powerful BinF library.

Status

The implementation of this library follows the WASM specification very closely.

Even the order of the definitions in the namespaces is identical to the order of the definitions in the WASM binary specification so that they can be read alongside. Indeed, there is no better documentation than the specification itself.

This design also reduces the chance of introducing breaking changes. WASM development is very active and new proposals are being built, but there should never be a breaking change within WASM itself. Still, since this library is novel, its current status is alpha for the time being, even though the design is robust and well-tested.

The goal is to remain up-to-date with all stable WASM proposals.

Documentation

The full API is available on Cljdoc.

Namespaces follow one naming scheme. In the WASM binary specification, any item is defined by a so-called "non-terminal symbol". For instance, the function section is designated by funcsec.

Names that refer to those non-terminal symbols end with the ' character. For instance, the helins.wasm.read namespace for decompiling WASM code has a funcsec' function which decompiles a function section. Those names do not have docstrings in Cljdoc since it is best to read and follow the WASM specification. Namespaces mimic that specification exactly for that reason.

All other names, such as higher-level abstractions, are fully described on Cljdoc.

Examples

Working examples are available in the helins.wasm.example namespace.

Usage, brief overview

Compilation / decompilation is easy as demonstrated in the example below.

The rest of the document is about understanding and modifying a decompiled program.

In very, very short:

(require 'clojure.pprint
         '[helins.wasm :as wasm])


;; Reading an example file from this repo:
;;
(def decompiled
     (wasm/decompile-file "src/wasm/test.wasm"))


;; Pretty printing decompiled form (Clojure data):
;;
(clojure.pprint/pprint decompiled)


;; Of course, we can recompile it:
;;
(def compiled
     (wasm/compile-file decompiled
                        "/tmp/test2.wasm"))

Working with files is the only JVM-exclusive utility in this library.

WASM binaries are represented as BinF views. For instance, from ClojureScript:

(require '[helins.binf :as binf])


;; Storage for our decompiled WASM program
;;
(def *decompiled
     (atom nil))


;; Fetching and decompiling WASM source from somewhere
;;
(-> (js/fetch "some_url/some_module.wasm")
    (.then (fn [resp]
             (.arrayBuffer resp)))
    (.then (fn [array-buffer]
             (reset! *decompiled
                     (-> array-buffer
                         ;; Wrapping buffer in a BinF view and preparing it (will set the right endianness)
                         binf/view
                         wasm/prepare-view
                         ;; Decompiling
                         wasm/decompile)))))


;; And later, we can just as well recompile it to a BinF view
;;
(def compiled
     (wasm/compile @*decompiled))

Installation

After adding this library to dependencies, one must also manually add Malli. As of today, an unreleased version is needed:

{metosin/malli {:git/url "https://github.com/metosin/malli"
                :sha     "0e5e3f1ee9bc8d6ea60dc16e59abf9cc295ab510"}}

The imported version (latest release) does not support generation of instructions (and hence, modules).

Namespaces

In summary:

| Namespace | About |
|---|---|
| helins.wasm | Compiling and decompiling WASM modules |
| helins.wasm.bin | Defines all simple binary values such as opcodes |
| helins.wasm.ir | Simple manipulations of WASM programs in Clojure |
| helins.wasm.read | Implementing decompilation (for "experts") |
| helins.wasm.schema | Using Malli, describes the WASM binary format in Clojure |
| helins.wasm.write | Implementing compilation (for "experts") |

Schema

The Clojure data structures representing WASM programs are almost a direct translation of the WASM binary specification. Very little abstraction has been added on purpose. The goal is to leverage those wonderful data structures while having the illusion of working directly with the binary representation.

The registry of Malli schemas describes everything:

(require '[helins.wasm.schema :as wasm.schema]
         '[malli.core         :as malli]
         '[malli.generator    :as malli.gen]
         '[malli.util])


;; Merging all needed registries.
;;
(def registry
     (merge (malli/default-schemas)
            (malli.util/schemas)
            (wasm.schema/registry)))


;; What is a `funcsec`?
;;
(get registry
     :wasm/funcsec)


;; Let us generate a random WASM program.
;;
(malli.gen/generate :wasm/module
                    {:registry registry})
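The same registry can presumably be used for validation as well as generation. The following is a minimal sketch, assuming Malli's standard `malli.core/validate` accepts the merged registry through its options map in the same way `malli.gen/generate` does above. The name `random-module` is hypothetical and only used for illustration.

```clojure
(require '[helins.wasm.schema :as wasm.schema]
         '[malli.core         :as malli]
         '[malli.generator    :as malli.gen]
         '[malli.util])

;; Same merged registry as in the example above.
(def registry
     (merge (malli/default-schemas)
            (malli.util/schemas)
            (wasm.schema/registry)))

;; `random-module` is a hypothetical name for a generated program.
(def random-module
     (malli.gen/generate :wasm/module
                         {:registry registry}))

;; Checking a value against the schema it was generated from.
;; A hand-modified program could be checked the same way before compiling.
(malli/validate :wasm/module
                random-module
                {:registry registry})
```

Validating a program after modifying it by hand might help catch structural mistakes before attempting to compile it.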

Overall shape

A WASM program is a map referred to in the namespaces as a ctx (context). It holds the program itself (WASM sections) as well as a few extra things (akin to the context described in other sections of the WASM specification).

Almost everything is a map, except WASM instructions, which are vectors. All simple values, such as opcodes, remain as binary values (see "Instructions" section for an example).

Sections

In the binary format, most WASM sections are essentially a list of items, such as the data section being a list of data segments. Other parts of the program, such as instructions operating on such a data segment, refer to an item by its index in that list.

However, working with lists of items and addressing those items by index is hard work, especially maintaining those references when things are removed, added, and moved around. Hence, those sections are described by sorted maps of index -> item. They can be sparse and during compilation, indices (references) will be transparently recomputed into a dense list.
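To picture what recomputing sparse indices into a dense list means, here is a minimal sketch in plain Clojure. It does not use this library and is not its actual implementation: the names `sparse-datasec`, `dense-index-map`, and `densify`, as well as the `:label` key, are hypothetical and chosen only for illustration.

```clojure
;; Hypothetical section: a sorted map of index -> item with gaps,
;; as if the items at indices 1, 3, and 4 had been removed.
(def sparse-datasec
     (sorted-map 0 {:label "a"}
                 2 {:label "b"}
                 5 {:label "c"}))

(defn dense-index-map
  "Maps each old (sparse) index to its position in the dense list."
  [section]
  (into {}
        (map-indexed (fn [new-idx old-idx]
                       [old-idx new-idx]))
        (keys section)))

(defn densify
  "Returns the items as a dense vector, in index order."
  [section]
  (vec (vals section)))

(dense-index-map sparse-datasec)
;; => {0 0, 2 1, 5 2}

(densify sparse-datasec)
;; => [{:label "a"} {:label "b"} {:label "c"}]
```

Any reference to old index 5 would then be rewritten to 2 using the first map, which is the kind of bookkeeping the sorted-map representation spares the user from doing by hand.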

See helins.wasm.ir namespace for a few functions showing how to handle things like adding a data segment.

Instructions

Instructions are expressed as vectors where the first item is an opcode and the rest might be so-called "immediates" (i.e. mandatory arguments). Once again, they look almost exactly like the binary format and the official specification is the best documentation.

For example, here is a WASM block which adds 42 to a value from the WASM stack:

(require '[helins.wasm.bin :as wasm.bin])

[wasm.bin/block
 nil
 [wasm.bin/i32-const 42]
 [wasm.bin/i32-add]]
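Because instructions are plain vectors, ordinary sequence functions are enough for simple analysis. The sketch below counts opcodes in the block above. To stay independent of the library, it defines the three opcodes locally using their values from the WASM binary specification. The function name `opcode-frequencies` is hypothetical, and the sketch assumes that every nested vector is itself an instruction, which may not hold for every kind of immediate.

```clojure
;; Opcode values from the WASM binary specification, defined locally
;; so that this sketch does not depend on the library.
(def op-block     0x02)
(def op-i32-const 0x41)
(def op-i32-add   0x6a)

;; Same shape as the block shown above.
(def instr
     [op-block
      nil
      [op-i32-const 42]
      [op-i32-add]])

(defn opcode-frequencies
  "Walks an instruction and its nested instructions, counting opcodes.
   Assumes every nested vector is an instruction."
  [instr]
  (->> (tree-seq vector? seq instr)
       (filter vector?)
       (map first)
       frequencies))

(opcode-frequencies instr)
;; => {2 1, 65 1, 106 1}
```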

Modifying a WASM program

Since everything is described in the helins.wasm.schema namespace and since those definitions are well documented in the WASM binary specification, it is fairly easy to create or modify WASM programs. Once one understands the format, it is just common Clojure programming without much surprise.

The helins.wasm.ir namespace ("ir" standing for "Intermediary Representation") offers a few utilities for doing basic things such as adding a function. It is not extensively featured because doing almost anything is usually very straightforward and does not require special helpers.

Novel WASM tools

The vast majority of existing WASM tools are implemented in Rust or C++. Doing things such as dead code elimination of WASM is a tedious process performed from the command-line. Building new tools in that ecosystem means abiding by that fact and working exclusively with those native languages.

Hence, this library is one of a kind, offering a powerful interactive environment on the JVM as well as in the browser, and leveraging Clojure idioms which are excellent for analyzing WASM code.

Babashka

Currently, Babashka does not support Malli. Hence, the helins.wasm.schema namespace is not supported. However, compilation, decompilation, and everything else work.

This very simple script shows how to decompile a WASM file in the terminal using barely a few lines.

This opens the possibility for quickly developing WASM dev tools that start up fast and, for instance, output some structural information about given binaries.

Running tests

Depending on hardware, tests usually take a few minutes to run.

On the JVM, using Kaocha:

$ ./bin/test/jvm/run

On NodeJS, using Shadow-CLJS:

$ ./bin/test/node/run

# Or testing an advanced build:
$ ./bin/test/node/advanced

Development

Starting in Clojure JVM mode, mentioning an additional Deps alias (here, a local setup of NREPL):

$ ./bin/dev/clojure :nrepl

Starting in CLJS mode using Shadow-CLJS:

$ ./bin/dev/cljs
# Then open ./cljs/index.html

License

Copyright © 2021 Adam Helinski

Licensed under the terms of the Mozilla Public License 2.0, see LICENSE.