From f1da07a248c254c9ff36dfcffe39e7a3e4d2a70f Mon Sep 17 00:00:00 2001
From: Bill Hails
Date: Wed, 24 Jan 2024 13:50:51 +0000
Subject: [PATCH] Create CODEGEN.md

---
 docs/CODEGEN.md | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)
 create mode 100644 docs/CODEGEN.md

diff --git a/docs/CODEGEN.md b/docs/CODEGEN.md
new file mode 100644
index 0000000..a1f81b6
--- /dev/null
+++ b/docs/CODEGEN.md
@@ -0,0 +1,41 @@
+# Code Generation
+
+I'd initially wanted to hide the code generation aspects of this project and just commit the generated `.c` and `.h`
+files in the `src/` directory. However, this is disingenuous and probably hinders other people playing with the code,
+so I've made it official.
+
+There always was a `tmp/` directory created by `make` that hosts the generated flex and bison output, so it was
+simple enough to generate the additional code there instead of into `src/`, and remove those generated files from git.
+
+So what's the code generation for? It just removes the need to maintain a ton of boilerplate code around structures
+used by the project. There are a number of `.yaml` files in the `src/` directory which basically declare
+C structs and their typedefs. At the time of writing they are:
+
+* [`anf.yaml`](../src/anf.yaml) A-Normal form structures input to the bytecode compiler, generated from the lambda structures.
+* [`ast.yaml`](../src/ast.yaml) The abstract syntax tree generated by the parser.
+* [`lambda.yaml`](../src/lambda.yaml) Lambda calculus-like structures generated from the AST.
+* [`tc.yaml`](../src/tc.yaml) Type checking support for Algorithm W.
+* [`tpmc.yaml`](../src/tpmc.yaml) Term Pattern Matching Compiler support structures, part of lambda conversion.
+
+For example, `ast.yaml` contains the declarations for the abstract
+syntax tree generated by the parser. A Python script [makeAST.py](../tools/makeAST.py) is given each of those yaml files
+and generates the same set of `.c` and `.h` files for each. Continuing with the `ast.yaml` example, from that file
+will be generated:
+* `tmp/ast.c` a number of different functions for each structure:
+  * `new()` functions that allocate memory and populate the allocated structs with argument values.
+  * `copy()` functions that will make a deep copy of the struct.
+  * `push()` functions that will push data onto any declared 1-dimensional arrays.
+  * `mark()` functions that will recursively mark the structures as part of garbage collection.
+  * a generic `mark` function that will switch on the type and call the correct `mark` function.
+  * `free` functions that will release unused memory when requested by the garbage collection system.
+  * a generic `free` function that dispatches to the correct `free` function.
+  * a `typename` function that will return the name of a struct for debugging etc.
+* `tmp/ast_debug.c` debugging utilities, namely:
+  * `print()` functions that will recursively display a representation of the struct for debugging.
+  * `eq()` functions that perform deep comparisons for testing and debugging.
+* `tmp/ast_debug.h` header for `ast_debug.c`.
+* `tmp/ast.h` header for `ast.c`, which includes the structure declarations themselves.
+* `tmp/ast_objtypes.h` macros collecting the enums and case statements that can then easily be incorporated into the memory management system.
+
+This all means that it's relatively easy to make fairly sweeping changes to the various trees without all the
+tedious re-writing of the above.
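
To make the list of generated functions more concrete, here is a hand-written sketch of the kind of boilerplate described above, for a single list-like struct. It is not output from `makeAST.py`: the names `DemoList`, `DemoType`, `newDemoList` and so on are hypothetical, and the `type` and `marked` fields are only an assumption about how a mark-and-sweep collector might tag objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical type tag; the real enums would come from the generated headers. */
typedef enum { DEMO_TYPE_LIST } DemoType;

/* Hypothetical struct standing in for one declared in a yaml file. */
typedef struct DemoList {
    DemoType type;          /* assumed tag for the generic dispatchers */
    bool marked;            /* assumed flag set during the mark phase */
    int value;
    struct DemoList *next;
} DemoList;

/* new: allocate a node and populate it from the arguments. */
DemoList *newDemoList(int value, DemoList *next) {
    DemoList *node = malloc(sizeof *node);
    assert(node != NULL);   /* a real allocator would handle failure properly */
    node->type = DEMO_TYPE_LIST;
    node->marked = false;
    node->value = value;
    node->next = next;
    return node;
}

/* copy: deep copy, recursing into the child structure. */
DemoList *copyDemoList(const DemoList *node) {
    if (node == NULL) return NULL;
    return newDemoList(node->value, copyDemoList(node->next));
}

/* mark: recursively flag everything reachable from node. */
void markDemoList(DemoList *node) {
    if (node == NULL || node->marked) return;
    node->marked = true;
    markDemoList(node->next);
}

/* eq: deep structural comparison for tests and debugging. */
bool eqDemoList(const DemoList *a, const DemoList *b) {
    if (a == b) return true;
    if (a == NULL || b == NULL) return false;
    return a->value == b->value && eqDemoList(a->next, b->next);
}

/* free: release one node; children are left to the collector's sweep. */
void freeDemoList(DemoList *node) {
    free(node);
}

/* typename: map a type tag to a printable name. */
const char *typenameDemoObj(DemoType type) {
    switch (type) {
        case DEMO_TYPE_LIST: return "DemoList";
    }
    return "unknown";
}
```

Every one of these functions follows mechanically from the field list of the struct, which is why generating them from a declarative description is attractive: adding or renaming a field would otherwise mean touching `new`, `copy`, `mark` and `eq` by hand.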