Lexer/parser/translator generator in pure Go. Generates compiler frontends written in and accessible via Go, exclusively.
Ictiobus is intended to have implementations of the techniques given in the textbook "Compilers: Principles, Techniques, and Tools", by Aho, Lam, Sethi, and Ullman (otherwise known as the "Purple Dragon Book"). It is first and foremost an experimental learning system and secondarily the parser generator used as the parser for a scripting language in the tunaquest text adventure engine.
The name ictiobus
comes from the Latin name for the buffalo fish. The buffalo
(as in the non-fish kind) is related to the bison, and bison
is a popular
parser-generator. That, combined with the fact that the name ictiobus
doesn't
seem likely to be confused with other tools, is why the name was chosen.
Ictiobus is used to generate parsers and other tools for languages. This allows new languages to be created for use within programs written in Go; it lets you create new scripting languages! In theory, its output could be plugged into a modern compiler middle end or back end, but this is not the intent; its main purposes is to help you design and use your own scripting languages.
Building and using a scripting language with Ictiobus consists of a few steps. First, you'll need to create a specification for the language. This includes a context-free grammar that describes the language, the regular expression patterns that are to be lexed as tokens, and any steps needed to translate parsed constructs from your program into something your code can analyze, such as an abstract syntax tree. This specification is then laid out in a markdown file in the FISHI specification language, which is explained in the FISHI manual and in the FISHI spec itself.
This file is then read by the ictcc
command, which compiles a compiler
frontend for the language it describes. It outputs Go code with the token
definitions, generated lexer, generated parser, and generated syntax-directed
translation scheme built from the spec. The package it outputs will include a
Frontend()
function which can be called by outside code to get a Frontend;
from there, Analyze(io.Reader)
can be called on readers containingg the
scripting language to parse them into the intermediate representation.
For more information on using ictcc
, invoke it with -h
to see the help,
check the Go docs page for the ictcc command, or see the
ictcc manual.
Ictiobus parsers are generated by running the ictcc
command on markdown files
that contain specially-formatted codeblocks that give a specification for a
programming language. For information on that format, called FISHI, see the
FISHI manual.
To install ictcc
on your system, either grab one of the distributions from
the Releases page of the
Ictiobus repository, or run Go install:
go install github.com/dekarrin/ictiobus/cmd/ictcc@latest
Once ictcc
is installed on your system, it's time to put together a FISHI spec
that describes the language you want to create. This section gives a brief
overview on the topic; for an in-depth description of the FISHI language and
using it to define specs, see the FISHI manual.
A FISHI spec is defined in a markdown file with codeblocks containing FISHI
marked with the label "fishi". These are the only parts of the document that
will be read by the parser generator tool, ictcc
. Within these blocks, you'll
need at least one of each of the three parts of a FISHI spec:
- A
%%tokens
block, which gives definitions of the tokens in your language and defines the text patterns that the lexer should use to find them in source text. - A
%%grammar
block, which defines the context-free grammar that the parser will use to generate a parse tree from input tokens. - An
%%actions
block, which gives the actions for a syntax-directed translation to take to convert a parse tree into a final value that you will receive when callingAnalyze(io.Reader)
on input written in the new language.
Your spec might look like the following example:
neatlang-spec.md
# NeatLang Specification
This is a cool new language made with ictiobus! It's called NeatLang and it
does simple math.
```fishi
%%tokens
\+ %token + %human plus sign '+'
\* %token * %human multiplication sign '*'
\( %token lp %human left parenthesis '('
\) %token rp %human right parenthesis ')'
\d+ %token int %human integer
[A-Za-z_][A-Za-z_0-9]* %token id %human identifier
# ignore whitespace
\s+ %discard
%%grammar
{SUM} = {SUM} + {PRODUCT} | {PRODUCT}
{PRODUCT} = {PRODUCT} * {TERM} | {TERM}
{TERM} = lp {S} rp | id | int
%%actions
%symbol {S}
-> {S} + {E} : {^}.value = add({0}.value, {2}.value)
-> {E} : {^}.value = identity({0}.value)
%symbol {E}
-> {E} * {F} : {^}.value = mult({0}.value, {2}.value)
-> {F} : {^}.value = identity({0}.value)
%symbol {F}
-> lp {S} rp : {^}.value = identity({1}.value)
-> id : {^}.value = lookup_value({0}.$text)
-> int : {^}.value = int({0}.$text)
```
The first action definition for the starting symbol in the %%actions
section
of the grammar is important; the name of the attribute used there (.value
in
%symbol {S}
in the above example) is the one that the final value parsed from
the input code is taken from (the "intermediate representation", or IR).
For the most part, FISHI is self-contained and builds up definitions used in one
section from FISHI code in a prior section; %%tokens
sections define token
classes to be used as terminal symbols in a grammar, %%grammar
sections define
non-terminal symbols by using the defined terminal symbols, and %%actions
sections will refer to both non-terminal and terminal symbols defined in the
prior section.
However, the %%actions
section needs a bit more extra tooling before it can
be used. It refers to hook functions by names in its syntax-directed
definitions: add
, identity
, mult
, lookup_value
, and int
. These
functions must be implemented in Go code and provided to ictiobus by creating a
github.com/dekarrin/ictiobus/trans.HookMap
called HookTable
that maps the
names used in the FISHI spec to their implementation functions. Later, when
ictcc
is called, it will be informed of this code's location using CLI
arguments.
The hook functions for the above spec might look something like the following:
neatlanghooks/neatlanghooks.go
package neatlanghooks
import (
"strconv"
"github.com/dekarrin/ictiobus/trans"
)
var (
HooksTable = trans.HookMap{
"int": hookInt,
"identity": hookIdentity,
"add": hookAdd,
"mult": hookMult,
"lookup_value": hookLookupValue,
}
)
func hookInt(_ trans.SetterInfo, args []interface{}) (interface{}, error) {
str, _ := args[0].(string)
return strconv.Atoi(str)
}
func hookIdentity(_ trans.SetterInfo, args []interface{}) (interface{}, error) {
return args[0], nil
}
func hookAdd(_ trans.SetterInfo, args []interface{}) (interface{}, error) {
num1, _ := args[0].(int)
num2, _ := args[1].(int)
return num1 + num2, nil
}
func hookMult(_ trans.SetterInfo, args []interface{}) (interface{}, error) {
num1, _ := args[0].(int)
num2, _ := args[1].(int)
return num1 * num2, nil
}
func hookLookupValue(_ trans.SetterInfo, args []interface{}) (interface{}, error) {
varName, _ := args[0].(string)
// in a real program, we might use a lookup table to find real value of the
// variable, as this function name implies. For now, to keep things simple
// we'll just use the length of the variable name as its value.
return len(varName), nil
}
Note that a spec is not necessarily required to have Go implementations
associated with the hook functions before ictcc
can be run on it at all. Some
modes of operation only require looking at the FISHI spec, and some don't even
require that an %%actions
section exist. But Go hook implementations are
required to use automatic SDTS validation when generating a new compiler
frontend, an important step that can greatly reduce testing time. Additionally,
the Go hook implementations are required at run-time to be provided by callers
who wish to obtain a Frontend, even if they are not provided when calling
ictcc
, so it's a good idea to start thinking about them early in the process.
The ictcc
command takes FISHI spec files and performs analysis on them to
ultimately produce a complete frontend for the language in the spec. The files
will be scanned for code fences with a language tag of fishi
. FISHI is the
Frontend Instruction Specification for languages Hosted on Ictiobus and is how
languages to build compilers for is are specified.
These files are read in and then a compiler frontend is produced. With no other
options, ictcc will read in the spec file(s), attempt to create the most
restrictive parser possible by trying them in order of most to least
restrictive, and if successful, outputs generated Go code containing the
Frontend into a package called fe
by default.
This section contains a brief overview of some of the most common usages of the ictcc command. For a complete guide and reference to using ictcc, be shore to check the ictcc manual, glub.
For instance, to build a compiler frontend for a new language called 'NeatLang' specified by a FISHI file called 'neatlang-spec.md' while specifying the name and version of the language:
$ ictcc neatlang-spec.md --lang NeatLang --lang-ver 1.0
Then, ictcc will get to work reading the spec and producing a frontend for it:
Reading FISHI input file...
Generating language spec from FISHI...
Creating most restrictive parser from spec...
Successfully generated SLR(1) parser from grammar
WARN: skipping SDTS validation due to missing --ir parameter
Generating compiler frontend in ./fe...
SDTS (syntax-directed translation scheme) validation is an important step for preparing a frontend that can automatically validate that the provided translation scheme can handle all possible combinations of grammar constructs in the language. If you have the Go implementations of the hook functions (see above section on creating a FISHI spec for an example of one), it is recommended you provide it to ictcc so the validation process can execute.
You will need to provide two flags to do this. The --ir
flag specifies the
type that the Frontend will return from parsing the input; SDTS validation
requires knowing the type in order to test the value returned from the SDTS.
But --ir
by itself isn't quite enough for validation to run; it also needs the
--hooks
flag. This flag tells ictiobus the path on disk to a Go package that
contains the special hooks table variable which binds names used in the
%%actions
section of the FISHI spec to their implementations. Validation needs
this by necessity; it cannot evaluate the translation scheme on a parse tree
without knowing which functions to call.
For instance, for the above NeatLang spec, we could give SDTS hook
implementations in a Go package called neatlanghooks
in the current directory.
The neatlanghooks package would define functions needed and provide bindings of
hook implementations to names in an exported variable, by default called
HooksTable
. When calling ictcc, we specify the path to neatlanghooks
and
that our SDTS will produce an int.
$ ictcc neatlang-spec.md --ir int --hooks neatlanghooks
Reading FISHI input file...
Generating language spec from FISHI...
Creating most restrictive parser from spec...
Successfully generated SLR(1) parser from grammar
Generating parser simulation binary in .sim...
Simulation completed with no errors
Generating compiler frontend in ./fe...
This time ictcc
has enough information to perform full validation, and does
so with language input simulation. Luckily, we have everything defined nicely in
NeatLang, so the simulation finds no errors. Great! This means that our NeatLang
translation scheme can handle any input the its grammar allows us to throw at it
without crashing.
In addition to outputting the generated Go code, ictcc supports building a 'diagnostics binary' for testing the frontend from the command line. This is a self-contained executable that can read code in the target language, parse it for an intermediate value, and return the string representation of that value.
Creating the diagnostics binary requires the same CLI flags as enabling SDTS
validation, --ir
and --hooks
. --ir
gives the type that the compiler
returns as its ultimate value, and --hooks
gives the path to a Go package
which binds implementations of hooks to their names. See the above section for a
more detailed explanation of the two flags.
The diagnostics binary is created by specifying --diag
(or just -d
) and
giving a path for the new binary. For example, if we wanted to create a
diagnostics binary called nli
(for NeatLang Interpreter, glub) for the
NeatLang language used as an example in this doc, we would do the following:
$ ictcc neatlang-spec.md --ir int --hooks ./neatlanghooks --diag nli -n
(The -n
flag instructs ictcc not to output the Go code as it normally does. We
used it above because right now we only care about the diagnostics binary, not
the generated code).
Now after ictcc reads the spec file, it will create a new nli
program:
Reading FISHI input file...
Generating language spec from FISHI...
Creating most restrictive parser from spec...
Successfully generated SLR(1) parser from grammar
Generating parser simulation binary in .sim...
Simulation completed with no errors
Format preprocessing disabled in diagnostics bin; set -f to enable
Generating diagnostics binary code in .gen...
Built diagnostics binary 'nli'
(frontend code output skipped due to flags)
And that nli program can be used to parse programs written in NeatLang and interpret them into their value:
$ echo '5 + 2' > input.txt
$ ./nli input.txt
=== Analysis of input.txt ===
7
It can even execute code directly with the -C flag:
$ ./nli -C '8 + 2 * 6'
20
$ ./nil -C '(8 + 2) * 6' # test parentheses grouping
60
The diagnostics binary is a powerful tool for testing. For more information on the diagnostics binary, including how to make it perform preprocessing on input before executing it or how to manually run SDTS validation and output debug traces for each phase of the frontend, see the appropriate section in the ictcc manual.
Once you've generated a new Go package containing the frontend with ictcc
, you
can access it from external Go code.
The generated package contains a Frontend()
function that will return the
ready-to-use frontend. The exact form of this function will vary a bit; by
default, this function will be parameterized with a generic type argument that
gives the type of the intermediate value returned by the frontend. Users of the
frontend must provide this parameter when they call Frontend()
and it must
match what the SDTS is actually generating.
The exception to this is if the frontend is generated with knowledge of what
that type is by giving it as the --ir
argument to ictcc
. This changes the
signature of Frontend()
from
Frontend[E](trans.HooksTable, *FrontendOptions) ictiobus.Frontend[E]
to
Frontend(trans.HooksTable, *FrontendOptions) ictiobus.Frontend[<type specified by --ir>]
so that callers need not provide the IR type.
To obtain the frontend from calling code, you will need to provide a hooks table
that maps SDTS hook names to their implementations (For an example of one, see
the neatlanghooks/neatlanghooks.go
file in the Creating The FISHI Spec section
above). Additionally, a FrontendOptions
may be provided to further customize
the returned Frontend, but it may also be set to nil
to use default options.
Once a Frontend is obtained, Analyze()
can be called on it, which accepts an
io.Reader
and parses any code inside it to the intermediate value.
To use the generated frontend for the above example of 'NeatLang' which was generated in the package fe, assuming that the local codebase we are working in is a Go module called 'github.com/example/neato', we might do the following in our main.go file:
main.go
package main
import (
"os"
"fmt"
"github.com/dekarrin/ictiobus/syntaxerr"
"github.com/example/neato/neatlanghooks"
"github.com/example/neato/fe"
)
func main() {
// The hooks table is defined in our neatlanghooks package, so we can get
// that now:
hooksTable := neatlanghooks.HooksTable
// we don't really care about setting debug options right now, so a nil
// frontend options object is fine.
var feOpts *fe.FrontendOptions
// we know the Frontend will produce an int because we have set up the hook
// function which creates the value to be assigned to the attribute called
// "value" to return an int. Specifically, according to our FISHI spec it is
// always going to be assigned using either the 'add' hook or 'identity'
// hook, i.e. hookAdd or hookIdentity in neatlanghooks, both of which we
// have written to ensure that they will always return an int.
scriptEngine := fe.Frontend[int](hooksTable, feOpts)
// alternatively, if we had used --ir int in ictcc while generating this,
// then Frontend() will not be parameterized:
// scriptEngine := fe.Frontend(hooksTable, feOpts)
// Great! Now that we have the input, we can use it to analyze code in a
// reader:
f, err := os.Open("some-neatlang-expressions.txt")
if err != nil {
panic(err)
}
defer f.Close()
// 2nd return value is the parse tree, which we don't really care about
// right here.
value, _, err := scriptEngine.Analyze(f)
if err != nil {
if syntaxErr, ok := err.(*syntaxerr.Error); ok {
// if it's an ictiobus syntax error, display the detailed message
// to the user:
fmt.Fprintf(os.Stderr, "%s\n", syntaxErr.FullMessage())
return
} else {
panic(err)
}
}
fmt.Printf("Result from file: %v\n", value)
// or we can use AnalyzeString if we just want to run a string of source code:
value, _, err = scriptEngine.AnalyzeString("8 + 24")
if err != nil {
if syntaxErr, ok := err.(*syntaxerr.Error); ok {
// if it's an ictiobus syntax error, display the detailed message
// to the user:
fmt.Fprintf(os.Stderr, syntaxErr.FullMessage())
return
} else {
panic(err)
}
}
fmt.Printf("Result from string: %v\n", value)
}
And that's pretty much it! This was just a simple example that uses the SDTS itself to both translate and immediately evaluate expressions; more advanced scripting languages might instead return an abstract syntax tree which is then further processed to do things.
If you're developing on ictiobus, you must have at least Go 1.19 in order to support generics. Ictiobus relies somewhat heavily on their existence.
To create a new version of the ictcc command, the primary entrypoint into
frontend generation, call scripts/build.sh
from bash or a compatible shell:
$ scripts/build.sh
The scripts directory contains several shell scripts useful for development.
scripts/all-tests.sh
- Executes all unit tests and assuming they pass, all integration tests as well, and then all example test runs.scripts/build.sh
- Builds theictcc
command in the root of the repo.scripts/create-dists.sh
- Creates distribution tarballs ofictcc
. By default, it will build for Darwin (Mac), Windows, and Linux, all amd64. Two tarballs per build are created; one denoted as 'latest' and the other denoted as the actual version of the build. Version info is automatically obtained by running the currently-builtictcc
with--version
.scripts/example-tests.sh
- Runs all test runs of example uses of Ictiobus.scripts/gendev.sh
- Generates a frontend for the FISHI language itself, using the FISHI spec in docs/fishi.md. This script will build a new diagnostics binary calledfishic
in the current directory, and output generated Go code to a directory called.testout
in the repo root for examination by the developer and to avoid replacing the actual live FISHI frontend. Does not automatically build a newictcc
binary to use for the generation; it will use whatever ictcc is in the repo root, and will fail if one has not yet been built.scripts/genfrontend.sh
- Generates a new live frontend for the FISHI spec language and replaces the one currently used in ictcc with the new one. This script will build a new version ofictcc
automatically by callingscripts/build.sh
.scripts/int-tests.sh
- Run all integration tests.