Added more philosophy
VonTum committed Oct 10, 2023
1 parent 14ec0ee commit 46f57d5
Showing 7 changed files with 96 additions and 17 deletions.
5 changes: 5 additions & 0 deletions README.md
@@ -28,6 +28,11 @@ The main goals of the language are roughly listed below:
- Lambda Modules

## Tasks
### Major Milestones
- [ ] Arbitrary forward pipelines representable
- [ ] Arbitrary FPGA hardware representable
- [ ] Generative Code
- [ ] Templates
### Parsing
- [x] Basic Tokenizer
- [x] Basic Syntax Error Reporting
2 changes: 1 addition & 1 deletion philosophy/instantiation.md
@@ -6,4 +6,4 @@ Because we have a broader vocabulary describing our modules, it becomes possible

Additional modifiers
- Latency-free: All latency registers are removed
- Set latency: sets the latency between two connectors (ports, locals, fields etc.), adding or removing latency registers as needed. Mostly used to override latency for tight feedback loops, as sketched below.
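
A minimal sketch of how such a modifier could be applied. The attribute notation and the `Adder` module are invented here purely for illustration; the actual syntax for instantiation modifiers is not settled.

```Verilog
// Hypothetical: a latency-free instantiation inside a tight feedback loop.
module FeedbackSum : int x -> int sum {
    state int acc := 0; // Initial value, not a real assignment
    // Strip the adder's internal latency registers so the accumulator
    // feedback loop closes within a single cycle.
    #[latency_free] Adder add;
    acc = add(acc, x);
    sum = acc;
}
```
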
5 changes: 4 additions & 1 deletion philosophy/standardlibrary.md → philosophy/library.md
@@ -1,6 +1,9 @@
# The SUS Standard Library
By making cycle-latency mostly transparent to the programmer, we enable the use of more generic building blocks. These should be grouped into a standard library built specifically for the language.

## Common interfaces
The STL should provide standardized names for common interface types, such as memory read and write ports and AXI ports. This helps standardise the interfaces of distinct libraries, allowing for easier integration.

## Memory blocks and FIFOs
Configurable Memory primitives and FIFOs should certainly be part of the standard library. These are so fundamental to any hardware design, and appear to be ubiquitous across FPGA vendors. These Memory primitives should however not be fully fixed. Attributes such as read latency and read-write conflict resolution vary substantially between vendors, so these should be left up to the target platform. Of course, it should always be possible to properly fix these values in situations where the programmer needs them, such as when one needs a 0-cycle memory read, even if that would mean reaching terrible timing, or not synthesizing at all on some platforms.
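
To make this concrete, a configurable memory primitive could look roughly like the sketch below. The module name, the `READ_LATENCY` parameter and the template syntax are assumptions extrapolated from the examples in the state document, not settled STL design.

```Verilog
// Sketch of a configurable simple-dual-port memory. All names and the
// READ_LATENCY parameter are illustrative assumptions, not settled syntax.
module SimpleDualPortMem<T, int DEPTH, int READ_LATENCY> :
        int raddr, int waddr, T wdata, bool wen -> T rdata {
    state T[DEPTH] mem;
    // READ_LATENCY would default to whatever the target platform prefers,
    // but the programmer can pin it (even to 0) when they must.
    rdata = mem[raddr];
    if wen {
        mem[waddr] = wdata;
    }
}
```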

@@ -9,7 +12,7 @@ This is also the reason why I believe the 'inference' doctrine of defining memor
### Multi-Clock Memories and FIFOs
It is still up for debate whether multi-clock variants should be implicit from use, or explicitly distinct types. There are arguments to be made for both approaches. Certainly this gets blurry when making the distinction between synchronous and asynchronous clocks. In any case, multi-clock modules should be available in the STL in some form.

## Shift registers, skid buffers, packers, unpackers
These are quite natural utilities that any project could use.
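
As an example, a generic shift register is only a few lines. The generative for-loop and template syntax are assumptions here; this is a sketch of the kind of utility meant, not a finished STL module.

```Verilog
// Sketch of a generic shift register; the for-loop is assumed generative syntax.
module ShiftReg<T, int DEPTH> : T d_in -> T d_out {
    state T[DEPTH] buf;
    d_out = buf[DEPTH-1];
    // All state writes take effect at the clock edge, so each slot
    // reads its neighbour's value from the previous cycle.
    for i in 1..DEPTH {
        buf[i] = buf[i-1];
    }
    buf[0] = d_in;
}
```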

## Clock Domain Crossings
14 changes: 14 additions & 0 deletions philosophy/optimization.md
@@ -0,0 +1,14 @@
# Optimization
One important observation I recently made concerns Optimization. I had been quite proud of my unusual stance of "Optimization is a Non-Goal". But as I add more and more abstractions to the language, I have come around to a different conclusion. As a hardware designer working in the lowest abstraction layer, Verilog, I believed that hardware written at this level was fundamentally unoptimizable, because any optimization the compiler could do could go against the intention of the programmer, undoing for example place-and-route considerations the programmer had made. As I introduced new abstractions however, I kept bumping into the problem of "what if I don't need this abstraction?".

Making the abstractions always optional seemed to run counter to the safety promises I wanted to make. Instead, what would be ideal is if the programmer could specify their interface within the bounds of the abstraction, and somehow be assured that the compiler will recognise the situation and remove the abstraction's overhead. In essence, this is what it means to do optimization.

This more nuanced view is summarized as follows:

> Optimization should be a goal insofar as it is the un-doing of abstractions.
>
> Likewise, abstractions are only permissible if there is some all-encompassing optimization the compiler can perform to undo the abstraction if needed.

I still believe hardware is broadly unoptimizable. In contrast to software design, where the primary optimization target is Speed, in hardware there are multiple targets that are mutually at odds with each other: clock speed, cycle latency, logic, memory and DSP utilization, and routing congestion. No 'optimal' HW solution exists, and so it is up to the programmer to make this tradeoff.

This is why I consider HLS to be misguided in its "Just write it as software code and we'll do the optimization for you" approach.
35 changes: 26 additions & 9 deletions philosophy/safety.md
@@ -2,23 +2,40 @@

So what does the Safety-First in Safety-First HDL mean? Like with our counterparts in Software Design such as Rust, it does not mean that the code you write is guaranteed to be correct. Rather, it eliminates common classes of bugs that would otherwise have to be found through manual debugging. Counterintuitively however, the safety abstractions employed should never limit the programmer in the hardware they want to design. This means *any* hardware design one could possibly build in Verilog or VHDL should also be representable in SUS. The difference should be that safe hardware is easy to design, while unsafe hardware is comparatively difficult. Finally, as with Safe Software Languages, the goal is to enable fearless development and maintenance. The programmer should be able to rest easy that after implementing their change and fixing all compilation errors, the code again works properly.

## Common classes of HW bugs
### Cycle-wise timing errors through incorrectly pipelined HW.
Keeping the pipeline in sync manually is taken out of the programmer's hands. The language makes a distinction between registers used for *latency* and those used for *state*. Latency registers are handled by latency counting, which adds registers to the other paths to keep them in sync. A minimal sketch of the distinction is shown below.
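
This sketch assumes a `reg` keyword marks a latency register, as opposed to `state`; the keyword is an assumption for illustration.

```Verilog
// Sketch: `reg` marks a latency register, `state` would mark real state.
module MulAcc : int a, int b, int c -> int result {
    reg int m = a * b; // one latency register on the multiply path
    result = m + c;    // latency counting delays c by one cycle to match
}
```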

### Misunderstood module documentation leading to incorrect use.
The system of Flow Descriptors is there to prevent incorrect use of library modules. Flow descriptors are not optional, so they force the programmer to add the proper descriptors when they define a module containing state.

### Operation results being cast to a too small integer bitwidth.
SUS disallows implicit casts that lose information. Instead, the programmer is required to either specify unsafe casts, where runtime checks can be inserted, or use modular arithmetic to specify the overflow behaviour.
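
A sketch of what this could look like. The width notation, `unsafe_cast` and the modulo form are all illustrative assumptions, not final syntax.

```Verilog
// Sketch: width notation and unsafe_cast are assumptions, not final syntax.
module Add8 : int<8> a, int<8> b -> int<8> sum {
    int<9> full = a + b;             // the sum widens; no implicit truncation
    sum = unsafe_cast<int<8>>(full); // explicit unsafe cast; a runtime check may be inserted
    // alternatively: sum = full % 256; // modular arithmetic makes wrap-around explicit
}
```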

### Data loss or duplication
Examples:
- Data loss or state corruption for unready modules
- Data duplication from held state
- Data loss or duplication at Clock Domain Boundaries.

Data that sits unused, ports that go unread, and ports that are written when no data is expected are all caught by the flow descriptor system through compiler warnings or errors. Of course, it's not possible to prevent data from being lost within the module state itself.

## Flow Descriptors
On any module or interface we can specify flow descriptors. These describe how and in which patterns data is allowed to flow through a module. Much like Rust's borrow checker, this provides an additional layer of code flow analysis that must be verified for correctness. They are written in a kind of regex-like syntax, ideally with the full descriptive power of Linear Temporal Logic (LTL). Like with typing and borrow checking, the additional information describes the *what*, whereas the code describes the *how*.

The exact notation is still in flux. A straightforward option would be to use LTL notation directly, though I have some reservations about this. Certainly there's already a great body of work on LTL notation, making it an attractive choice, but the first big spanner in the works is that LTL allows itself to be recursively nested within arbitrary boolean expressions. Allowing this much freedom would require the compiler to effectively contain a SAT solver as part of its typechecking. Instead, perhaps only a subset of LTL could be used, one which provides only simple regex-like pattern matching.
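
Whatever the final notation, the timeline syntax already used in the state document gives a flavour of such a regex-like subset. Here is a small module in that style, following the conventions of the existing examples; nothing about it is final.

```Verilog
// (X -> X) consumes an input and produces an output that cycle;
// (/ -> X) produces an output without consuming an input.
timeline (X -> X) .. (/ -> X)
module Repeat2 : int x -> int y {
    state int saved;
    state bool second := false; // Initial value, not a real assignment
    if !second {
        y = x;     // first beat: pass the input through
        saved = x;
        second = true;
    } else {
        y = saved; // second beat: emit the stored copy
        second = false;
        finish;    // packet is hereby finished.
    }
}
```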

## Error locality
While the primary objective of a safety-first compiler is to prevent compilation of code that has proven errors, just providing a binary yes-no at the end of compilation doesn't help the programmer much. Errors should be descriptive and point to a relevant position in the code. In the best case, errors should also point the programmer towards a way to fix them.

But a type of locality that isn't often discussed is locality across template bounds. It's still a long way off for a fresh language, but it's important to think ahead a bit.

There are broadly two types of error reporting with templates: Pre-Instantiation and Post-Instantiation errors. As the names imply, these are errors that can be located before and after instantiation respectively.

There is a conflict between these error reporting types though. On the one hand, Post-Instantiation errors are easy to implement. The compiler deals only with known types and values, and any code generation will already have run, giving the compiler a simple component network to work with. However, Post-Instantiation errors can of course only be reported after the module is instantiated. So after the user writes a templated module, they can only know it is correct after actually using it somewhere with concrete parameters. Even then, the user will only see errors that occurred with this specific set of parameters, leaving them unsure whether their code is correct in the general case.
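
A small sketch of the problem, reusing the template syntax from the state document; the module itself is invented for illustration.

```Verilog
// This out-of-bounds access is only discoverable post-instantiation,
// and only for the parameter values that actually trigger it.
module FirstTwo<T, int N> : T[N] data -> T a, T b {
    a = data[0];
    b = data[1]; // fine for N >= 2; an error for FirstTwo<T, 1>,
                 // but the compiler only sees it once N is known
}
```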

This is why users strongly prefer Pre-Instantiation error reports. For these, the compiler only needs to look at the templated code itself to generate the errors. Famously, this is one of the biggest reasons Rust programmers cite for preferring the language over C++. Rust with its Trait system forces the user to apply the proper trait bounds to their template arguments in order to use the abilities provided by the trait, allowing both error reporting within the function code, as well as errors at instantiation time if the provided type doesn't implement the trait bounds. This is in stark contrast to C++'s approach, which doesn't even perform name resolution before template instantiation.

Sadly, Pre-Instantiation error reporting comes with a lot of strings attached, and in many cases it may actually be impossible. Errors such as unused variables are impossible to detect in the general case with generative code, for example, because ideally code generation should be Turing-complete. For the same reason, errors in integer bounds can't be caught in the general case either. Perhaps typing errors could be, by following Rust's approach of using Traits.

In any case, Post-Instantiation errors are just easier to implement. For a working first version, it's probably for the best to leave out these nice programmer-friendly improvements.
49 changes: 43 additions & 6 deletions philosophy/state.md
@@ -49,19 +49,20 @@ Generating the whole state machine is a combinatorial endeavour however, and a t
```Verilog
timeline (X, false -> /)* .. (X, true -> T)
module Accumulator : int term, bool done -> int total {
    state int tot := 0; // Initial value, not a real assignment
    int new_tot = tot + term;
    if done {
        total = new_tot;
        tot = 0;
        finish; // packet is hereby finished.
    } else {
        tot = new_tot;
    }
}
```

In this case the compiler would generate a state machine with two states: one for when the module is active, and one generated implicitly for the inactive case. The regex is mapped to a 3-state state machine, represented below:

- A: `inactive`
- B: `(X, false - /)`
@@ -84,11 +85,47 @@ Compiled to a DFA this gives:
- C -> B when valid & !done
- C -> C when valid & done

The code's state machine must be proven equivalent to the regex state machine. This is done by simulating the code STM based on the regex. The code must properly request inputs at regex states where inputs are provided, and may not when they are not. Its inputs must be valid for _any_ path in the regex STM, while its outputs must conform to _some_ path of the regex.

Any module working on finite packet sizes must also use the `finish` keyword when the module has finished sending a packet. At this point the initial conditions must be reestablished explicitly. After this, the module goes back into the inactive state.

In this example, the code simulation starts right in its initial state. Then the different paths of the regex STM are all simulated. To handle infinite loops, we save every distinct (regex state, code state) pair we encounter, and skip combinations we have already seen.

Since in this example the only active state of the code corresponds to both active states of the regex, the code must abide by the constraints of both regex paths. And it does: in the case `done == false` the module may not output `total`. Likewise, in the case `done == true`, the module *must* output `total`, and the code has to return to the initial state through the `finish` keyword.

The caller is then responsible for providing a stream of the form the regex describes.

### Unpacker
The previous example was quite simple though, with the code's active state machine containing only one state. In this example we explore a module that does have structural state.

```Verilog
timeline (X -> X) .. (/ -> X) .. (/ -> X) .. (/ -> X)
module Unpack4<T> : T[4] packed -> T out_stream {
    state int st := 0; // Initial value, not a real assignment
    state T[3] stored_packed;
    if st == 0 {
        out_stream = packed[0];
        stored_packed[0] = packed[1]; // Shorthand notation is possible here: "stored_packed[0:2] = packed[1:3];"
        stored_packed[1] = packed[2];
        stored_packed[2] = packed[3];
        st = 1;
    } else if st == 1 {
        out_stream = stored_packed[0];
        st = 2;
    } else if st == 2 {
        out_stream = stored_packed[1];
        st = 3;
    } else if st == 3 {
        out_stream = stored_packed[2];
        st = 0;
        finish; // packet is hereby finished.
    }
}
```

In this case, the regex has 4 states, but we don't know how many states the code has. One could bound the integer `st` of course, and multiply together the value counts of all structural state objects we find to bound the number of states. But we don't need to. We can simply simulate the code, only explicitly saving the structural state fields.

In this case, we know the starting value of `st`, and we just need to simulate the hardware from there. So in the first cycle, we are obligated to read from `packed` and write to `out_stream`. Following the code, that is the case, as we execute the first branch: `st == 0`. We know the next state is `st = 1`, so we continue along. This continues for the remaining states of the regex, ending at `st == 3` where we also call `finish`.

3 changes: 3 additions & 0 deletions philosophy/types.md
@@ -11,3 +11,6 @@ No natural implementation choice exists for Sum Types, and thus they shouldn't b
One exception however is quite natural in hardware, and that is the Maybe (or Option) type. Sum types in general actually fit nicely with the flow descriptors system, where the developer can specify which level of wire sharing they want, and which ports should describe separate variants.

Finally, there should be a type-safe implementation for a full wire-sharing sum type. That should be supported by the standard library, using something like a Union type, for those cases where the reduction in bus width is worth the additional multiplexers and routing constraints.
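
As a sketch of how the Maybe type could appear at a module boundary; the `Some`/`None` constructors are assumed syntax for illustration.

```Verilog
// Sketch: a Maybe-typed port; Some/None are assumed constructor syntax.
module SafeDivide : int num, int denom -> Maybe<int> quotient {
    if denom != 0 {
        quotient = Some(num / denom);
    } else {
        quotient = None; // no valid result this cycle
    }
}
```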

# Enums
Enums are lovely. It's important that the programmer can specify the exact representation, so that the compiler can optimize their use, be it as a one-hot vector, a binary encoding, or a combination of the two.
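
A sketch of what specifying the representation could look like; the attribute syntax is hypothetical.

```Verilog
// Hypothetical attribute for choosing the encoding of an enum.
#[encoding(onehot)] // alternatives could be binary, or a hybrid of the two
enum AluOp {
    Add,
    Sub,
    And,
    Or
}
```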
