More rigorous description of language philosophy
VonTum committed Oct 8, 2023
1 parent b9c6f19 commit 14ec0ee
Showing 6 changed files with 171 additions and 0 deletions.
9 changes: 9 additions & 0 deletions README.md
@@ -48,13 +48,22 @@ The main goals of the language are roughly listed below:
- [ ] Rythm Syntax
- [ ] Generator Syntax

### Linking and Name Resolution
- [x] Single File Name Resolution
- [x] Multi File Name Resolution
- [x] Incremental Linking
- [ ] Incremental Compilation
- [ ] Multi-Threaded Compilation

### LSP
- [x] Basic LSP for VSCode integration
- [x] Syntax Highlighting
- [x] Error and Warning Reporting
- [ ] Per-Line Resource Utilization Reporting

### Code Generation
- [x] Expression Flattening
- [ ] State Machine Generation
- [ ] Can Generate Verilog for Multiply-Add pipeline
- [ ] Can Generate Verilog for Blur2 filter
- [ ] Can Generate Verilog for FIFO
9 changes: 9 additions & 0 deletions philosophy/instantiation.md
@@ -0,0 +1,9 @@
# Instantiation Modifiers
Because we have a broader vocabulary describing our modules, it becomes possible to modify instantiations of modules to add functionality.

- Continuous (default): The module behaves like a freestanding module; inputs and outputs are expected on every clock pulse.
- Push: The module only advances when instructed to by the parent module. This only affects `state` registers; latency is unaffected (see the sketch below).

Additional modifiers
- Latency-free: All latency registers are removed
- Set latency: between two
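
To make the Continuous/Push distinction concrete, here is a minimal Rust simulation sketch. The module, its counter state, and the `push` signal are purely hypothetical illustrations, not SUS syntax: its `state` register only advances when the parent grants a push, while the latency register keeps advancing every cycle, so latency is unaffected.

```rust
// Hypothetical simulation of a module instantiated with the Push modifier.
struct CountingModule {
    count: u32,      // a `state` register: only advances when pushed
    output_reg: u32, // a latency register on the output path: always advances
}

impl CountingModule {
    fn tick(&mut self, push: bool) -> u32 {
        let previous_output = self.output_reg;
        // The latency register advances every clock pulse, so latency is unaffected.
        self.output_reg = self.count;
        // The state register only advances when the parent instructs it to.
        if push {
            self.count += 1;
        }
        previous_output
    }
}

fn main() {
    let mut m = CountingModule { count: 0, output_reg: 0 };
    let outputs: Vec<u32> = [true, false, true, true]
        .into_iter()
        .map(|push| m.tick(push))
        .collect();
    assert_eq!(m.count, 3); // the state only moved on the three pushed cycles
    println!("{outputs:?}");
}
```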
24 changes: 24 additions & 0 deletions philosophy/safety.md
@@ -0,0 +1,24 @@
# Safety

So what does the Safety-First in Safety-First HDL mean? As with our counterparts in software design, such as Rust, it does not mean that the code you write is guaranteed to be correct. Rather, it eliminates common classes of bugs that would otherwise have to be found through manual debugging. Counterintuitively, however, the safety abstractions employed should never limit the programmer in the hardware they want to design. This means *any* hardware design one could possibly build in Verilog or VHDL should also be representable in SUS. The difference should be that safe hardware is easy to design, while unsafe hardware is comparatively difficult. Finally, as with safe software languages, the goal is to enable fearless development and maintenance: the programmer should be able to rest easy that after implementing their change and fixing all compilation errors, the code again works properly.

Common classes of HW bugs are:
- Cycle-wise timing errors through incorrectly pipelined HW.
- Misunderstood module documentation leading to incorrect use.
- Operation results being cast to a too-small integer bitwidth.
- Data loss or state corruption for unready modules.
- Data duplication from held state.
- Data loss or duplication at Clock Domain Boundaries.

The SUS compiler attempts to make these classes of bugs impossible in the following ways:
- Cycle-wise timing errors through incorrectly pipelined HW.

Manually keeping the pipeline in sync is taken out of the programmer's hands. The language makes a distinction between registers used for *latency* and those used for *state*. Latency registers are handled by latency counting, which adds registers to the other paths to keep them in sync.


## Flow Descriptors
On any module or interface we can specify flow descriptors. These describe how and in which patterns data are allowed to flow through a module. Much like Rust's borrow checker, this provides an additional layer of code flow analysis that must be verified for correctness. They are written in a regex-like syntax, ideally with the full descriptive power of Linear Temporal Logic (LTL). As with typing and borrow checking, the additional information describes the *what*, whereas the code describes the *how*.

The exact notation is still in flux. A straightforward option would be to use LTL notation directly, though I have some reservations about this. Certainly there is already a great body of work on LTL, making it an attractive choice, but the first big spanner in the works is that LTL allows itself to be recursively nested within arbitrary boolean expressions. Allowing this much freedom would require the compiler to effectively contain a SAT solver as part of this typechecking. Instead, perhaps only a subset of LTL could be used, one that provides only simple regex-like pattern matching.
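
As a rough illustration, such a restricted, regex-like subset could be represented as a small pattern AST. The following Rust sketch is purely hypothetical; the type and variant names are assumptions for illustration, not the compiler's actual representation.

```rust
// Hypothetical sketch of a regex-like subset of LTL for flow descriptors.
// Only concatenation, repetition, and choice are kept, so patterns stay
// checkable without a full SAT solver.
#[derive(Debug, Clone)]
enum FlowPattern {
    /// A single named event on an interface, e.g. "valid data on the input".
    Event(String),
    /// Patterns that must occur one after the other (concatenation).
    Seq(Vec<FlowPattern>),
    /// Zero or more repetitions of a pattern (Kleene star).
    Repeat(Box<FlowPattern>),
    /// Exactly one of several alternatives.
    Choice(Vec<FlowPattern>),
}

fn main() {
    // "a request, any number of wait cycles, then either an ok or an error response"
    let descriptor = FlowPattern::Seq(vec![
        FlowPattern::Event("request".into()),
        FlowPattern::Repeat(Box::new(FlowPattern::Event("wait".into()))),
        FlowPattern::Choice(vec![
            FlowPattern::Event("response_ok".into()),
            FlowPattern::Event("response_err".into()),
        ]),
    ]);
    println!("{descriptor:?}");
}
```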


22 changes: 22 additions & 0 deletions philosophy/standardlibrary.md
@@ -0,0 +1,22 @@
# The SUS Standard Library
By making cycle-latency mostly transparent to the programmer, we enable the use of more generic building blocks. These should be grouped into a standard library built specifically for the language.

## Memory blocks and FIFOs
Configurable memory primitives and FIFOs should certainly be part of the standard library. They are fundamental to any hardware design and appear to be ubiquitous across FPGA vendors. These memory primitives should, however, not be fully fixed. Attributes such as read latency and read-write conflict resolution vary substantially between vendors, so these should be left up to the target platform. Of course, it should always remain possible to pin these values down in situations where the programmer needs them, such as when one needs a 0-cycle memory read, even if that means terrible timing or no synthesis at all on some platforms.

This is also why I believe the 'inference' doctrine of defining memory blocks is fundamentally flawed. An inference implementation will always make implicit assumptions about the read latency and read-write conflict resolution, meaning the code isn't properly portable across devices.
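
To make this concrete, here is a hypothetical Rust simulation model of a memory primitive in which exactly these attributes are explicit parameters. The names and parameter set are illustrative assumptions, not an actual STL interface.

```rust
// Hypothetical simulation model of a configurable memory primitive.
// Read latency and read-during-write behavior are explicit parameters,
// i.e. the attributes that vary between vendors and that inference hides.
#[derive(Clone, Copy)]
enum ReadDuringWrite {
    OldData, // a simultaneous read sees the value from before the write
    NewData, // a simultaneous read sees the value being written this cycle
}

struct SimMemory {
    data: Vec<u32>,
    read_latency: usize,     // 0 = combinational read, 1+ = registered read
    rdw_policy: ReadDuringWrite,
    read_pipeline: Vec<u32>, // models the registered read path
}

impl SimMemory {
    fn new(depth: usize, read_latency: usize, rdw_policy: ReadDuringWrite) -> Self {
        SimMemory {
            data: vec![0; depth],
            read_latency,
            rdw_policy,
            read_pipeline: vec![0; read_latency],
        }
    }

    /// One clock cycle: perform an optional write and a read.
    /// The read result comes out `read_latency` cycles after it was issued.
    fn tick(&mut self, write: Option<(usize, u32)>, read_addr: usize) -> u32 {
        let raw = match (write, self.rdw_policy) {
            (Some((wa, wd)), ReadDuringWrite::NewData) if wa == read_addr => wd,
            _ => self.data[read_addr],
        };
        if let Some((wa, wd)) = write {
            self.data[wa] = wd;
        }
        if self.read_latency == 0 {
            raw
        } else {
            self.read_pipeline.push(raw);
            self.read_pipeline.remove(0)
        }
    }
}

fn main() {
    let mut mem = SimMemory::new(16, 1, ReadDuringWrite::OldData);
    mem.tick(Some((3, 42)), 3);        // write 42; the simultaneous read sees old data
    mem.tick(None, 3);                 // issue a read of address 3
    assert_eq!(mem.tick(None, 3), 42); // that read arrives one cycle later
}
```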

### Multi-Clock Memories and FIFOs
It is still up for debate whether multi-clock variants should be implicit from their use or explicitly different types. There are arguments to be made for both approaches, and the distinction certainly gets blurry between synchronous and asynchronous clocks. In any case, multi-clock modules should be available in the STL in some form.

## Shift registers, packers, unpackers
These are quite natural utilities that any project could use.

## Clock Domain Crossings
Clock Domain Crossings are a famously difficult problem in hardware design and a source of many sporadic, difficult-to-find bugs. How one performs a clock domain crossing also depends very much on the circumstances, so again no all-encompassing generic solution can really be given. However, various common CDC implementations should be offered in the STL, which can then prove the connecting code safe using *rythms*.

# Implementation of the STL
Generally, a generic implementation can be given for all STL types, but these won't perform well on most platforms. For each platform there should therefore be platform-specific implementations of each of these.

# STL extensions
In fields like HPC, certain interfaces are ubiquitous, such as DDR memory interfaces, HBM, PCIe and Ethernet. In general these don't fit in the standard library itself, as they are not available on the majority of platforms, but they could instead be offered as separate libraries. These could (like the STL) provide differing implementations behind a generic interface, to again enable more cross-platform code.
94 changes: 94 additions & 0 deletions philosophy/state.md
@@ -0,0 +1,94 @@


# On Registers

## State vs Latency

In my experience, the use of registers usually boils down to two use cases:
- Representing a current working state, which gets updated across clock cycles
- Improving timing closure by introducing registers on tight paths.

While this distinction exists in the programmer's mind, it isn't in the vocabulary of common compilers. Verilog and VHDL just call both `reg` (and non-registers too, but that's another can of worms).

Philosophically though, the difference is quite important. Registers that are part of the state are critical: they directly govern the functioning of the device. Latency registers, on the other hand, should not affect the functioning of the design at all, aside from trivially affecting the latency of the whole design. Some would argue that worrying about latency registers is a solved problem, with retiming tools that can automatically migrate latency registers across a design to place them wherever more timing slack is required. In practice though, this capability is limited: either specific paths must be explicitly marked as latency insensitive, or a block of registers is synthesized somewhere and then migrated across the design. Even then, the practice is always limited by the first design register the tool comes across along the path. Explicitly differentiating between state and latency registers could make this automatic retiming much more powerful.

While latency generally can't affect the actual operation of the device, there are circumstances where it must be disallowed. Certain paths are latency sensitive and would no longer produce correct results if latency were introduced. A trivial example is any kind of feedback loop: no latency can be introduced within the loop itself, as the result for the current cycle wouldn't arrive in time. Here the latency should either be forbidden, or reincorporated in a different way, such as by interpreting the state loop as a [C-Slowed](https://en.wikipedia.org/wiki/C-slowing) state loop.

## Latency Counting
Inserting latency registers on every path that requires them is an incredibly tedious job, especially when many signals have to be kept in sync for every latency register added. This is why I propose a terse pipelining notation: simply add the `reg` keyword to any critical path, and any paths running parallel to it will get latency added to compensate. This is accomplished by adding a 'latency' field to every path. Starting from an arbitrary starting point, all locals connected to it can then be given an 'absolute' latency value, where locals dependent on multiple paths take the maximum latency of their source paths. From this we can recompute the path latencies to exact latencies and add the necessary registers (a rough sketch follows the example below).

Example:
```
(start - 0)
A -----------+-- reg -- reg --\
(-1) / +-- C (2)
B -- reg --/------------------/
```
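
The counting scheme above can be sketched in a few lines of Rust. This is a simplified illustration, assuming a feed-forward (acyclic) design with nodes numbered in topological order; it is not the actual compiler implementation.

```rust
// Simplified sketch of latency counting on a feed-forward design.
// Each edge carries the number of explicit `reg`s written on that path.
struct Edge {
    from: usize,
    to: usize,
    regs: i64,
}

/// Returns (absolute latency per node, extra registers to insert per edge),
/// assuming nodes are numbered in topological order and edges are sorted by `from`.
fn count_latencies(num_nodes: usize, edges: &[Edge]) -> (Vec<i64>, Vec<i64>) {
    let mut absolute = vec![0i64; num_nodes];
    for e in edges {
        // A node's absolute latency is the largest latency among its inputs.
        absolute[e.to] = absolute[e.to].max(absolute[e.from] + e.regs);
    }
    // Any path that arrives earlier than its destination gets padding registers.
    let extra: Vec<i64> = edges
        .iter()
        .map(|e| absolute[e.to] - (absolute[e.from] + e.regs))
        .collect();
    (absolute, extra)
}

fn main() {
    // Two inputs converging on one output, as in the example above:
    // node 0 (A) reaches node 2 (C) through two `reg`s, node 1 (B) through one.
    let edges = [
        Edge { from: 0, to: 2, regs: 2 },
        Edge { from: 1, to: 2, regs: 1 },
    ];
    let (absolute, extra) = count_latencies(3, &edges);
    assert_eq!(absolute, vec![0, 0, 2]); // C ends up at absolute latency 2
    assert_eq!(extra, vec![0, 1]);       // the path from B needs one extra register
}
```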

### Latency counting with state
Of course, state registers are also moved around by latency counting. This means that while it may appear that two stateful modules get updated at the same time, if they are independent they need not be.

However, state registers should not count towards the latency count: specifying `reg reg` increases the latency count by 2, but specifying `state` does not. This makes sense, because it means a feedback loop onto a state register has a latency of 0, which it must have for the loop to remain valid. It also maintains the invariant that removing all latency registers brings the total latency count to 0 on all ports.

Whether this rule holds for all possible hardware designs is a topic for further research.

## On State
State goes hand-in-hand with the flow descriptors on the ports of modules. Without state all a module could represent is a simple flow-through pipeline.

But once we introduce state, suddenly modules can have a wide range of output patterns and required input patterns. A simple example would be a data packer or unpacker. An unpacker receives a data packet and outputs its contents in parts over the next N cycles. How should this unpacker behave when it receives another data packet before it finishes? It can either discard what it's currently working on, or discard the incoming data; either way, data is lost. So the unpacker's interface must prohibit incoming data for N-1 cycles after a valid packet.

The language we choose for the interfaces is that of the regex. This is a natural choice: in effect, any module the user writes is a state machine, and regexes can be converted to state machines. State machines also have the nice property that the operators for working with them are polynomial and easy to understand.

### Structural and Data State
Of course, we have to check the state machine that each module is against the state machines of the modules it uses. Sadly, this checking can only really be done in a generic way by generating the full module state machine and checking its behavior against the state machines from the interfaces of the modules it uses, as well as its own.

Generating the whole state machine is a combinatorial endeavour, however, and a too-wide state vector quickly leads to an unmanageable number of states. This encourages us to differentiate between two types of state: Structural State, whose values are incorporated into the module's state machine, and Data State, which (aside from its validity) is not. We wouldn't care about every possible bit pattern of a floating point number we happened to include in our state, right?

### Examples
#### Summing module
```Verilog
timeline (X, false -> /)* .. (X, true -> T)
module Accumulator : int term, bool done -> int total {
state int tot init 0;
int new_tot = tot + term;
if done {
total = new_tot;
tot = 0;
} else {
tot = new_tot;
}
}
```

In this case the compiler would generate a state machine with one state. The regex maps to a 3-state state machine, represented below:

- A: `inactive`
- B: `(X, false - /)`
- C: `(X, true - T)`

The regex produces the following NFA: (-> is a consuming transition, => is not)
- A -> A when !valid
- A => B when valid
- B -> B when !done
- B => C when done
- C -> A

Compiled to a DFA this gives:
- A -> A when !valid
- A -> B when valid & !done
- A -> C when valid & done
- B -> B when !done
- B -> C when done
- C -> A when !valid
- C -> B when valid & !done
- C -> C when valid & done
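
For concreteness, this DFA can be written out directly as code. The following Rust sketch is illustrative only; it simply encodes the transitions listed above, with `valid` and `done` as the inputs.

```rust
// Direct encoding of the interface DFA listed above (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    A, // inactive
    B, // (X, false - /)
    C, // (X, true - T)
}

/// One transition of the DFA, driven by the `valid` and `done` inputs.
fn step(state: State, valid: bool, done: bool) -> State {
    match state {
        // A and C have identical outgoing edges in the DFA above.
        State::A | State::C => match (valid, done) {
            (false, _) => State::A,
            (true, false) => State::B,
            (true, true) => State::C,
        },
        State::B => if done { State::C } else { State::B },
    }
}

fn main() {
    // A packet of three terms: two with done == false, then one with done == true.
    let mut s = State::A;
    for (valid, done) in [(true, false), (true, false), (true, true)] {
        s = step(s, valid, done);
    }
    assert_eq!(s, State::C); // the stream ends in the state that produces the total
}
```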

These two state machines must be proven equivalent. There must be exactly one edge-preserving mapping from the regex states to the code states. This means each code state must uphold the constraints of all regex states that map to it, and there may be no additional reachable edges.

Finally, the initial conditions must be re-established on any edge back to `inactive`.

In this example all three regex states map onto the single code state, so the code must abide by all their constraints. And it does: in the case `done == false` the module may not output `total`; likewise, in the case `done == true`, the module *must* output `total`.

The caller is then responsible for providing a stream in the form of the regex.
13 changes: 13 additions & 0 deletions philosophy/types.md
@@ -0,0 +1,13 @@
# Product Types
Product types or structs are quite natural to express in hardware and should be supported by the language. A product type is represented as a bundle of the field data lines that together form the whole struct.

# Sum Types
Sum types are the natural companion of Product Types. Though they are far less commonly supported in software languages, they have been gaining ground in recent years due to the popularity of the Rust language. For software compilers their implementation is quite natural: since we only ever use one variant of a sum type at a time, we can share the memory that each of them would take up. This has many benefits, such as reduced struct size and the possibility of shared field access optimization, and no real downsides.

The same, however, cannot be said for hardware design. In hardware there are two main ways one could implement a sum type: either share the wires between the variants, or give each variant its own separate set. Which implementation is most efficient depends on how the sum type is used. Sharing the wires incurs the cost of the multiplexers needed to put the signals on the same wires, as well as the additional routing and timing cross-dependencies this introduces between the variants. Separating the variants into their own wires does not incur this cost, but storing and moving the sum type around takes far more wires and registers.
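
A back-of-the-envelope sketch of this wire-count trade-off, assuming a hypothetical two-variant sum type with payload widths `a_bits` and `b_bits` (the function names and the one-bit tag are illustrative assumptions):

```rust
// Rough wire-count comparison of the two hardware encodings of a two-variant sum type.
fn shared_wires(a_bits: u32, b_bits: u32) -> u32 {
    // One tag bit plus a payload bus wide enough for the larger variant;
    // requires multiplexers to place either variant onto the shared bus.
    1 + a_bits.max(b_bits)
}

fn separate_wires(a_bits: u32, b_bits: u32) -> u32 {
    // One tag bit plus a dedicated bus per variant; no multiplexers,
    // but storing or moving the value costs the sum of both widths.
    1 + a_bits + b_bits
}

fn main() {
    // e.g. a sum type over a 32-bit integer and a 64-bit value
    println!("shared:   {} wires", shared_wires(32, 64));   // 65
    println!("separate: {} wires", separate_wires(32, 64)); // 97
}
```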

No natural implementation choice exists for Sum Types, and thus they shouldn't be supported at the language level.

One exception, however, is quite natural in hardware: the Maybe (or Option) type. Sum types in general actually fit nicely into the flow descriptor system, where the developer can specify which level of wire sharing they want and which ports should describe separate variants.

Finally, there should be a type-safe implementation of a full wire-sharing sum type in the standard library, using something like a Union type, for those cases where the reduction in bus width is worth the additional multiplexers and routing constraints.
