-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
class Memory: add reset_less parameter that will not initialize memories with 0 value in generated RTL. #270
base: master
Are you sure you want to change the base?
Conversation
…ies with 0 value in generated RTL. This is to simulate better the behavior for ASIC memory implementation where memory blocks will have random value after startup. This parameter can be configured on class level so it can be set for all generated memory blocks if generating code for ASICs.
Codecov Report
@@ Coverage Diff @@
## master #270 +/- ##
=========================================
+ Coverage 82.29% 82.3% +<.01%
=========================================
Files 34 34
Lines 5598 5601 +3
Branches 1201 1202 +1
=========================================
+ Hits 4607 4610 +3
Misses 851 851
Partials 140 140
Continue to review full report at Codecov.
|
This is true. However, nMigen initializes signals to 0 as well (even |
I don't use Actually I have been thinking of doing that 'X' initialization also for |
But unlike on an FPGA, on an ASIC the flip-flops won't be reset unless you explicitly do so. So I see the challenges here as quite similar: both FPGA registers and (in most cases) memories are initialized, and neither ASIC registers nor memories are initialized. I think adding this parameter to |
Actually it's introducing back a feature from Migen where RTL generation actually was made altered for ASIC. I thought one of the goals for nMigen was to get rid of this difference... Anyway so what do you think about having an extra |
That feature was removed from Migen because it was thought that no one used it, and it bitrotted. I do not see a way to generate the same code for FPGA and ASIC because FPGA and ASIC inherently have different behavior. |
Actually one of the reasons for using (n)Migen is to avoid the problem that designs that have been tested and are working on a FPGA don't work as an ASIC. Ideally nMigen code would always behave the same on FPGA and ASIC. If that is not possible at least the flow should error out if a feature is used that is not supported on a certain platform. It should not be the case that the behavior is dependent on the platform.
|
Changing the behavior to what you suggest will break all (n)Migen code in existence. |
Bugger, due to lack of good documentation this was not clear to me up to now. Can at least this expectation be adapted for memories only ? POR (power-on-reset) for ASIC and flip-flops can be implemented if really needed. For memories it would mean adding a ROM and loading that into the RAM at POR. Can you give also example of code that actually depends on the fact that value is set at POR but not at clock domain reset ? |
Any FPGA design that is synthesized without an explicit reset (either external or generated inside the design) by relying on the ability to initialize flip-flops. For example, Xilinx specifically recommends against adding a global reset unless it is unavoidable: rather than resetting the design after power-on, they recommend gating the clock until end of startup. By default, nMigen generates code that complies with these recommendations.
It would have to be, if only because some FPGAs have memories that cannot be initialized (e.g. iCE40 SPRAM; IIRC, Xilinx UltraRAM). But it is not clear to me what is the ideal path forward here. One of the major design goals behind (n)Migen is that every design you write (without using instances) should be deterministic (modulo asynchronous clocks), so it tries very hard to make sure there are no uninitialized design elements. Whatever is the language-level solution here, I think that it should be applied to both signals and memories. Otherwise, it would lead to simulation-synthesis mismatch on ASICs, which is just as bad as platform-dependent behavior, if not worse. I do not like the idea of using random values in place of uninitialized ones in simulation. In case of registers, there is no guarantee that at power-on, the flip-flop is not in a metastable state. That can violate invariants in downstream logic. In all cases, using random values may give the impression that the downstream logic must behave correctly if it passes enough simulation runs, but of course that is not true, e.g. let's say such downstream logic breaks only if a 32-bit register has one particular value. So it can hide bugs. Are you using property checking with SymbiYosys? If you are, it is easy to set it up so that every possible power-on state will be checked. Of course, this does not address the issue of metastability. Overall, I am leaning towards implementing conservative X-propagation in pysim to handle uninitialized registers and memories. But it is a sizable task. |
It is documented in the docstring for Lines 809 to 812 in 834fe3c
But I agree that the documentation is currently not very accessible and must be improved. |
An ASIC designer like me sees POR also as reset logic; so 'no reset logic for this |
Indeed that is a very FPGA-centric statement. Before nMigen explicitly declares ASIC support all such statements should be clarified. |
That should in itself not be a problem. It should be possible to use global reset for ASIC and rely on initialization for FPGA. The problem is when you do want BTW, on ASICs external reset signals are going through a reset synchronizer. The synchronous reset deassertion is to make sure all set/reset signals for all the flip-flops are deasserted in the same clock cycle. Otherwise it could mess up state machines.
Metastability is only a problem for CDC, during power-on flip-flops will always initialize to 0 or 1. The metastable state is not stable due to noise in transistors and the live time is much shorter than the time taken for power ramp up. |
Actually I was only talking about random values for pysim. For external simulators like verilator/icarus I would use classic 'X' during simulation. I agree that also implementing unitialized value support in pysim would be the best solution but this would have a performance impact. |
I propose using a special value This is far more coarse grained than Verilog X-propagation--in fact in my proposal (call it "U-propagation") you would not be able to erase undefinedness by One drawback is that, by not behaving exactly like Verilog X-propagation, there will be a mismatch between pysim and Verilog simulators; more on that later.
I agree that if your ASIC comes with a reset synchronizer then metastability on power-on cannot possibly be a problem. (Yes, I know what a reset synchronizer is :) I'll elaborate the reason I mention metastability in this context below.
So let's take a step back and consider the wider context in which determinism appears in nMigen. It may be helpful to mention that my background is memory-safe low-level languages, so when working on nMigen, I tend to think in similar categories. In general, I see the two primary goals of nMigen as:
The second one should be self-evident. In case of the first one, which safety properties would these be? Well, nMigen is a low-level, RTL language--it does not even have a type system--so it does not provide extensive modeling tools that could uphold domain invariants. What it can do is provide tools for tackling nondeterminism. I believe that discussions of nondeterminism in HDLs are hampered by grouping three categories of nondeterminism under the same nondescript label
* With any current synthesis toolchain out of the box, as far as I know. In principle, with appropriate toolchain support or a preprocessor simulating it, this category may be reduced to (1); more on that later. By completely** (at the moment) excluding the ability to use don't-cares in expressions, nMigen eliminates category (3). By completely** (at the moment) excluding the ability to use uninitialized registers and memories, nMigen eliminates category (1). By requiring (in the future; see #4) explicit synchronization in every case where a signal is sampled that is not known to be synchronous, and adding (as currently done) appropriate clock constraints, nMigen has / will partially eliminate category (2); it is, of course, still the responsibility of the designer to ensure that the input signals meet declared constraints. (An additional practical complication is permitting the use of synchronous inputs with complex timing relationships, such as those requiring phase shifts during sampling, without unnecessary synchronization stages.) ** Of course, instances may be used to reintroduce both of these categories. This near-complete elimination of nondeterminism on language level is highly desirable. The reason I am hesitating regarding your suggestions is that they introduces much of it back--indeed, more than one might expect. Let's consider two of them by themselves. First, power-on reset. On FPGAs, the internal power-on reset signal is, in general, asynchronous to any user clock. This means that every design meets category (2) right at the start. As a consequence, every in-tree FPGA platform performs power-on sequencing in the default startup code, either synchronizing the reset to the clock, or synchronously gating the clock with the reset. Without this, the safety properties would not hold. The implementation of the checker #4 should take this into account. The reason I mentioned metastability is because it is not clear just what the contract of nMigen on ASIC should be. Should it consider the ASIC power-on reset an asynchronous event it has to take into account? Then the timings of that event in relation to clocks, etc, would need to be considered. In any case this problem is comparatively minor, since, as you have mentioned, you can always add a global power-on reset signal and make the behavior identical to existing FPGA platforms. If it is desirable to avoid that, and the platform guarantees that every flip-flop ends up as stable 0 or 1 shortly after reset, then they may be modelled as uninitialized values, i.e. category (1). Second, memory initialization. It is undeniable that all ASIC RAMs and some FPGA RAMs cannot be initialized. There is simply no way around this platform restriction, so these would have to be modelled as uninitialized values, i.e. category (1). And here's where the problem lies. Verilog collapses all three categories of nondeterminism into just (As an aside, VHDL appears to support "uninitialized" and "strong 0-or-1" as separate values. I am not sure what their exact semantics is, though, but it looks kind of like it would solve this problem.) This is not a new problem. Treating access to uninitialized data (and other illegal operations) as a marker of unreachable state space is something that C and C++ did for a very long time, often with disastrous results. The way undefined behavior would propagate without bounds has long been recognized as a source of serious issues in critical software, and recently LLVM has introduced the In context of a HDL in general and Yosys in particular, an equivalent for the LLVM In terms of implementation, I think this cell could be lowered to Verilog as a function that includes a |
But the |
No, taking care of the timing would be a task of the platform but the nMigen contract should be setup in such a way that the platform (e.g. ASIC) can do it efficiently.
The problem is not that minor because if you introduce logic on asynchronous signals like a reset you have to make sure it is glitch free. This also means this has to be custom and can't be done with regular synthesis flow and also that the synthesis flow does not try to do optimizations on the logic. |
As said timing should be taken care of by platform, but for ASIC handling of initialization with an asynchronous reset can be a solution. reg sync_sig = 1'h0;
reg \sync_sig$next ;
always @(posedge clk)
sync_sig <= \sync_sig$next ; ASIC friendly code: reg sync_sig;
reg \sync_sig$next ;
always @(posedge clk or negedge por_reset_n)
if (por_reset_n = 0)
sync_sig <= 1'h0;
else
sync_sig <= \sync_sig$next ;
end Active low logic for por_reset is handy as then the signal can be generated by extra delay on supply voltage during power-up with a RC circuit without needing an on-chip POR circuit. But this could also be done by custom yosys code for the ASIC platform. reg sync_sig;
reg \sync_sig$next ;
always @(posedge clk)
sync_sig <= \sync_sig$next ; And also handle the signal as unitialized in the simulators. |
Right now that is of course not the case because |
I'm not sure how I can make my point more clear. A I am looking for a real life use case for such a signal where you need the POR reset but not the clock domain reset other than not having to support unitialized values in the simulator. Such a use case would allow me to think about improving nMigen ASIC friendliness taking it into account. Personally I would currently say it is even better to use an unitialized value for |
OK. I understand your argument now. Thank you for the explanation. I would not call this nondeterminism (at least when the clock domain has a synchronous reset) because the circuit behaves in a completely predictable way, but I agree that it introduces similar design challenges.
I agree that we can change the semantics of
nMigen already supports asynchronous resets. Use
It would be trivial to add negedge resets, and this is planned specifically for ASICs, see #184. |
Robust in what way ? Do you mean the difference in behavior of pysim handling of unitialized values and 'X'/'U' propogation in different simulators ? For example verilator is known to not follow all Verilog rules strictly.
OK, but I would like to use the asynchronous reset with a signal coming from a POR circuit for handling the POR behavior as given in the example code earlier. For clock domains the current default of synchronous reset is also good for ASICs. |
The behavior of X-propagation in simulation is fine. You are correct that it is different in pysim, iverilog and verilator, but I believe that difference is not a significant hazard. However, the behavior of X-propagation in synthesis can be very harmful. Suppose you have a
What I meant is that nMigen can (with the exception of polarity, which is easy to fix) generate your "ASIC friendly code" example above. It is up to you whether and where you want to use that capability. |
Understood, thanks. Is it the same for 'U', e.g. unitialized ?
From what I understood it is only for clock domains and not the case for the POR behavior. For POR behavior code currently depends on initial behavior of Verilog which is not ASIC compatible. But like said above if clock domain reset is guaranteed to have the same effect than POR, POR would not be needed for ASIC by enforcing a clock domain reset at start. It would then be OK to just ignore initial setup in Verilog as is done for ASIC synthesis. |
Neither Yosys nor Verilog have |
Is the use case of undefined as a special value in simulation to prove that this value does not have an impact on some output? If yes, then aren't there better tools to prove that from formal verification? |
AFAIK, right now SymbiYosys does not model |
There may be no need for x or u at all if all you want is to prove that the actual value 0 or 1 does not matter (in an application specific sense of matter). |
That's only if you already have a formal specification, and this specification is sound. If you don't have one, there is no automatic way to verify this property. |
But the same is true for analysing a simulation with x/u. If you don't have a specification of where they must not propagate then the entire endeavor is meaningless. |
Sure. But you probably already have an informal specification, which you apply by looking at gtkwave for the output corresponding to your test vectors. That is a much weaker approach, but if you already consider simulation valuable in spite of it being much weaker than property testing, you're fine with that. Also, running |
True. But it's not obvious to me that covering that use case with some augmented formal verification tooling would be harder than adding support for u/x/ throughout the graph. It would be tooling tailored to proving the invariance of this value w.r.t. some input/reset value. |
Adding support for conservative U-propagation is the same as adding a Python class that overrides the arithmetic/bitwise operations on integers to return itself. It is essentially trivial. |
If |
it may be a good idea to track Uninitialized at the bit-level by using a mask, since that allows packing signals together using Cat and then unpacking them later without known bits becoming uninitialized. If you like, I could work on a pull request to implement that for the simulator. |
I'm not sure if that's a good idea. My proposal (coarse-grained Uninitialized) can be computed at very low cost and with almost no changes to the simulator. Your proposal has a significant runtime cost and development cost (how do you validate it? every operation gains a third row/column in the truth table, like in Verilog) and requires major changes to the simulator. |
If values where all bits are initialized are implemented using the existing mechanics of just using an class PartiallyInitialized:
"""
value: int
the value. uninitialized bits should be 0.
mask: int
the mask for which bits are initialized in `value`. 0 bits mean that the
corresponding bit in `value` is uninitialized. 1 bits mean that the
corresponding bit in `value` is initialized. The sign is treated similarly,
allowing values to have an uninitialized sign bit. mask is assumed
to have some zero bits, otherwise an int could have been used instead
of PartiallyUninitialized
"""
def __init__(self, value, mask):
self.value = value
self.mask = mask
@classmethod
def make(cls, value, mask):
"""get PartiallyInitialized or int."""
assert isinstance(value, int)
assert isinstance(mask, int)
value &= mask
if mask == -1:
return value
return cls(value, mask)
@staticmethod
def get_value_and_mask(value):
if isinstance(value, PartiallyInitialized):
return self.value, self.mask
assert isinstance(value, int)
return value, -1
@classmethod
def uninitialized(cls):
"""totally uninitialized value"""
return cls(0, 0)
# all arithmetic operations return uninitialized
def __add__(self, other):
return self.uninitialized()
def __radd__(self, other):
return self.uninitialized()
# similarly for all other arithmetic operations
# all bitwise operations `&` the masks together and do the operation on the values
def __invert__(self):
return self.make(~self.value, self.mask)
def __and__(self, other):
other_value, other_mask = self.get_value_and_mask(other)
return self.make(self.value & other_value, self.mask & other_mask)
def __rand__(self, other):
other_value, other_mask = self.get_value_and_mask(other)
return self.make(self.value & other_value, self.mask & other_mask)
def __or__(self, other):
other_value, other_mask = self.get_value_and_mask(other)
return self.make(self.value | other_value, self.mask & other_mask)
def __ror__(self, other):
other_value, other_mask = self.get_value_and_mask(other)
return self.make(self.value | other_value, self.mask & other_mask)
# similarly for xor
# shifts return `uninitialized` for partially-initialized shift amounts,
# otherwise just shift value and mask
def __lshift__(self, other):
other_value, other_mask = self.get_value_and_mask(other)
if other_mask != -1:
return self.uninitialized()
return self.make(self.value << other_value, self.mask << other_value)
def __rlshift__(self, other):
# self is partially uninitialized
return self.uninitialized()
def __rshift__(self, other):
other_value, other_mask = self.get_value_and_mask(other)
if other_mask != -1:
return self.uninitialized()
return self.make(self.value >> other_value, self.mask >> other_value)
def __rrshift__(self, other):
# self is partially uninitialized
return self.uninitialized() bitslicing, |
I see. In terms of performance impact, your proposal is similar to mine (they both rely on bimorphic call sites) with one important caveat: if you reconstruct a full-width mask from several partial-width masks, in this implementation it wouldn't become In terms of semantics, this is essentially the same as Verilog, right? I don't feel comfortable using it without randomized testing against Yosys, then. (Actually, it is bad enough that we don't test pysim against Yosys, but current pysim is way too slow. I'm working on pysim2 right now that should address this problem.) |
I don't quite follow, could you give a concrete example?
yeah, except that I picked the above rules since uninitialized bits always propagate, they can't be hidden (by anding with 0, for example). Mux would have to be implemented using a python
I'm more comfortable since uninitialized bits can't be hidden without Mux or actual control-flow, but I think testing against yosys is a good idea even if uninitialized doesn't get implemented at all.
yay for pysim2! |
Nevermind that, on second look I think I was just wrong.
Is this a good thing? I'm worried that we'll end up with rules almost but not quite matching Verilog X-propagation, so people will come to expect netlist simulators like iverilog to match pysim, which will not be the case but also won't be obvious. I think this isn't obviously desirable, and needs further thought. I wonder if we should rather do X-propagation completely on our own somehow, and use casex to detect the case of reading uninitialized data from memory, or something like that. |
I'm not sure myself. There is on the one hand the much less complex rules that uninitialized input bits always output uninitialized bits (unless using Mux or control-flow), on the other hand compatibility with Verilog and all its gnarly details does have its own benefits. If we implement uninitialized propagation ourselves (using casex or similar), we need to ensure it doesn't result in less optimal synthesis results, which may be difficult. We could also just disable all the uninitialized-handling code when synthesizing, which may work (assuming yosys or other tools don't take liberties). |
They do take liberties, so this is definitely not an option. Please refer to my earlier comment on the 3rd category of nondeterminism for a deeper explanation why.
I think less optimal synthesis results could be a consequence of "sealing away" uninitializedness at the point where it introduced, which is orthogonal to having it propagate. Indeed, even if nMigen would not have any built-in notion of uninitializedness whatsoever (like in @Fatsie's original proposal, which has the simulator randomize these values), then I would still insist that it use Once we do that, any X-propagation rules we impose on top would be considered unused logic by synthesis tools, so they won't affect synthesis any further. |
I am using nmigen for generating RTL to be implemented on an ASIC. In an ASIC the content of a memory block is random at startup. Currently generated RTL by nmigen initializes memories by default with 0, which does not match behavior of a memory on an ASIC.
I am using external simulators and not pysim as my designs currently contain external verilog and VHDL code.
Some comments on the code:
reset_less
after the same parameter name forSignal
. I am open for suggestion for other/better name.True
. I suppose pysim does not want to support 'X' or 'U' values for signals.