-
Notifications
You must be signed in to change notification settings - Fork 21
Using QuickCheck
So QuickCheck sounds like something that won't take long to explain?
Well, the basics are pretty quick yes, but the name really means that checks are performed quickly.
Okay, so what checks are we talking about here?
Do you remember what a Proof Obligation is, sometimes called a "PO"?
Mmmm, yeah you did mention that, but... why don't you remind me?
Sure. We'll need an example to work with anyway, so let start with that. Imagine you had this very small specification, with just one function:
functions
f: seq of nat * nat -> nat
f(s, i) == s(i);
That's pretty small. But it's also pointless, surely? Calling f(s, i)
is just the same as using s(i)
?
Yes, it's just a simple example to show you how QuickCheck works. As you say, all this function does is return the i'th element of the sequence passed in. So can you see anything wrong with this spec?
Well... not really. I mean, it just returns the i'th element. It looks fine.
So what would happen if I passed an empty sequence?
Ah yes, okay I see...
Or if I passed a valid sequence, but an index of zero or one beyond the end of the sequence?
Right, it would fail. But why don't the language tools tell me about this?
Well, the spec "isn't wrong" - it works perfectly if you use the function as intended. But if you aren't careful, there are problems. So really, the tools are saying that this is perfectly acceptable, BUT you have an obligation to call the function with sensible arguments.
That's quite a big BUT there. We're building formally defined systems, so we can't depend on callers "being careful", surely?
Absolutely not. So the tools can tell you about this.
I thought you said they didn't?
The default parse and typecheck doesn't raise this issue, because as I said, it "isn't wrong". But there is another level of checking that the tools can give you. In cases like this, the tools can produce "Proof Obligations" that effectively define what callers of this function must guarantee.
You've mentioned the Proof Obligation Generator before, I remember.
That's the one.
So what does a Proof Obligation look like?
Let's try with the example. You generate POs with the "pog" command:
> pog
Generated 1 proof obligation:
Proof Obligation 1: (Unproved)
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:16
(forall s:seq of (nat), i:nat &
i in set inds s)
Alright, hold on... that's just a VDM-SL expression?
We like to eat our own dogfood. But can you see what it's saying?
So it's a boolean expression for a start...
True.
And it's saying that for every possible seq of nat
and every possible nat
, the latter is a valid index of the former. Okay, so yes, this is saying that every number is a valid index of every sequence... but that's rubbish, surely?
Or "false", yes. There are some sequences and natural numbers that work together, but it isn't generally true for all sequences and numbers.
So, hold on. This "PO" is just a VDM expression. I can see how it relates to the spec, but it doesn't call f
to test it?
Right. It isn't like a unit test. It's an "obligation" on you that follows from the way you've written the function. So you must prove that this obligation is met - sometimes we use the term "discharge", meaning to show that the obligation expression is always true.
But it isn't true, surely?
I agree. But how do you know?
Well, just look at it! It's saying that all numbers are valid indexes for all sequences. That's rubbish. 12345 isn't a valid index of [1,2,3] for example, and zero isn't a valid index of anything. So it's obviously false.
Exactly. You know that the obligation isn't provable because you can think of a counterexample. That's definitely a way to show that an obligation cannot be met or discharged or proved or whatever you want to call it.
Okay, so the PO generator will show you the obligations, and then you have to try to think of counterexamples?
Well, that works for a simple example like this. But a spec doesn't have to get much more complicated before its POs are much harder to understand and discharge.
But can't the tools do that for me?
Sometimes, yes. That is what QuickCheck does. It tries to quickly find counterexamples to Proof Obligations.
Ah, hence the name! I see. So could QuickCheck find a counterexample here?
Let's try:
> qc
PO #1, FAILED in 0.001s
Counterexample: i = 0, s = []
----
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:16
(forall s:seq of (nat), i:nat &
i in set inds s)
Wow, that's quick!
It's a VERY small example, but yes, it is trying to take no more than a few seconds to find a counterexample.
And qc
just means QuickCheck?
Yes, you can actually type quickcheck
if you want, but... we're trying to be quick here.
So hold on, let me try to understand what it's saying. The PO "failed" - that's failed to verify in some way, I guess - and the counterexample is shown as a value for i
and s
. Those are from the forall
at the start of the PO. And sure enough, i
is not in the set inds s
. Fantastic!
Right, so we quickly identified a problem. But what should we do about it?
Why do we have to do anything about it?
If you leave it as it stands, and someone calls this function without meeting this obligation (and nothing is forcing them to), the specification will give undefined behaviour (in the real world, the system will crash). So until this is fixed, you're depending on human beings being careful. That's not good!
Okay, but surely if we fix this problem, won't it just mean that the callers of the function have to change? We've just shifted the problem somewhere else?
That's true. But if they have to change too, it means that there genuinely was a problem with what they were doing. And fixing that will allow us to find and fix problems with their callers, and so on. So the process of discharging POs ripples through your specification, bringing benefits throughout.
Okay I get that. So to fix this example, we have to tweak the spec somehow?
Yes, but how?
Hmmm. Well, that PO will never be true, so we have to change the spec so that it doesn't generate this PO, or perhaps doesn't generate any POs?
Basically, yes that's right. Either we have to tweak the spec so that it generates a PO that is always true, or so that there are simply no obligations placed on us. That will then have "tightened" the specification to the point where we are sure that it will always do what we expect, instead of giving undefined behaviour for some inputs. As you said before, this is a formal specification, so it should be well defined for all valid inputs.
So what can we tweak to fix this?
In cases like this, the problem is because the arguments passed to the function are not sufficiently constrained to guarantee what the function does. They're too loose, in a sense.
So we can tighten them?
Of course. Any idea how?
Hmm. One thing I thought of when you mentioned passing zero. The index to a sequence in VDM starts at 1. So really the i
parameter ought to be a nat1
. Would that be enough?
Let's try that. Here's the new obligation and qc
result...
> pog
Generated 1 proof obligation:
Proof Obligation 1: (Unproved)
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:16
(forall s:seq of (nat), i:nat1 &
i in set inds s)
> qc
PO #1, FAILED in 0.003s
Counterexample: i = 1, s = []
----
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:16
(forall s:seq of (nat), i:nat1 &
i in set inds s)
>
Okay, so I see that the i
is now a nat1
, so it's just switched the counterexample from 0 to 1. But we're still not done fixing the spec then? We still have to fix that obligation?
Yes. Any other ideas how?
All I can think of is spelling it out... adding a precondition that says the index matches the sequence?
Okay let's try that... I'll add a precondition that says i in set inds s
> pog
Generated 1 proof obligation:
Proof Obligation 1: (Unproved)
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:16
(forall s:seq of (nat), i:nat1 & pre_f(s, i) =>
i in set inds s)
> qc
PO #1, MAYBE in 0.001s
>
LOL, well I suppose MAYBE
is an improvement. But what does that mean?
MAYBE
means that it couldn't find any counterexamples.
But if there are no counterexamples, it must always be true, surely?
It only means it couldn't find any counterexamples. Maybe they exist, but it couldn't find one quickly.
Hmmm. I was hoping we could prove things. That's an important part of using formal specification, right?
Yes, if you want very high confidence in your specification, you need to prove its obligations. But that is a non-trivial process in general. The QuickCheck tool is trying to quickly catch the "stupid cases" where you have missed a check or a precondition and so on. That helps improve specifications, but a QuickCheck MAYBE
is not the same as a proof. It's necessary, but not sufficient.
So qc
can't really prove anything?
It is really designed to disprove POs. But there are also some cases where it can indicate POs should be provable. For example when POs follow very simple patterns which are known always to be true, or if you have finite types, where it can check every possible value, or if the PO starts with "exists" rather than "forall", then one witness value is proof of existence.
But this still only means that it should be provable, not that it is actually proved?
Yes. We have to be careful in what we claim about proof. QuickCheck is not a prover/solver and there are lots of subtle aspects to fully formal proof that the tool cannot analyse. But if we claim an obligation is PROVABLE
it means we have reason to believe that a proof should be easy to find, and the tool may explain how it came to that conclusion. This is very useful, but it is not a formal proof.
So for example, if I wrap the s(i)
result in a check, like if i in set inds s then s(i) else 0
, you get the following:
> pog
Generated 1 proof obligation:
Proof Obligation 1: (Unproved)
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:40
(forall s:seq of (nat), i:nat1 &
((i in set (inds s)) =>
i in set inds s))
> qc
PO #1, PROVABLE by trivial (i in set (inds s)) in 0.001s
>
You can see the PO says, "if the index works, the index works". Well duh, of course that's true, so the PO is "trivially" provable, and it shows you the expression that it knew had to be true.
So why didn't that work with that same clause in the precondition?
That's because the precondition uses pre_f(s, i)
as the test in the PO. So qc
can't quickly tell that it matches the simple pattern. A more powerful proof engine would be able to do that though.
And you said that finite types could be checked exhaustively? Does that lead to a provable PO?
Yes! It works when your Proof Obligation only uses finite types, and the size of the type must be less that 100,000 by default. But sometimes it can indicate these POs are provable. For example, imagine a function like this:
functions
f: set of set of bool -> nat1
f(s) ==
if s = {}
then 999
else card s;
> pog
Generated 1 proof obligation:
Proof Obligation 1: (Unproved)
f: subtype obligation in 'DEFAULT' (test.vdm) at line 2:5
(forall s:set of (set of (bool)) &
(if (s = {}) then 999 else (card s)) > 0)
> qc
PO #1, PROVABLE by finite types in 0.011s
>
Admittedly this is a contrived example, but you see that the obligation says that the body of the function must produce a nat1
for all sets of sets of booleans. But that is a finite type (there are only 16 members of the type), so it can easily try them all. And since there are no counterexamples found, and every value was tried, the PO should be PROVABLE
as you see.
Okay, so QuickCheck is small and fast and tries to find problems, but ultimately for formal proof you need more powerful tools?
Exactly. Though we can keep improving qc
to try more and more strategies, to find as many problems as we can. Ultimately, we can write a strategy that calls out to a more powerful solver.
What's a strategy?
Internally, qc
uses several strategies to find counterexamples. There are six of them built into the tool, which you can see with -help
:
> qc -help
Usage: quickcheck [-?|-help][-q|-v|-n][-t <msecs>]
[-i <status>]* [-s <strategy>]* [-<strategy:option>]* [<PO numbers/ranges/patterns>]
-?|-help - show command help
-q|-v|-n - run with minimal or verbose output
-t <msecs> - timeout in milliseseconds
-i <status> - only show this result status
-s <strategy> - enable this strategy (below)
-<strategy:option> - pass option to strategy
PO# numbers - only process these POs
PO# - PO# - process a range of POs
<pattern> - process PO names or modules matching
Enabled strategies:
fixed [-fixed:file <file> | -fixed:create <file>][-fixed:size <size>]
search (no options)
finite [-finite:size <size>]
trivial (no options)
direct (no options)
Disabled strategies (add with -s <name>):
random [-random:size <size>][-random:seed <seed>]
>
So that shows the general usage, and that there are five enabled strategies ("fixed", "search", "finite", "trivial" and "direct") and one disabled one ("random"). More strategies can be added as plugins.
Why would you disable a strategy?
You might choose to disable strategies that are very specialized or expensive to run, or if they more or less duplicate an enabled one. For example, the "random" strategy chooses moderately random numbers for a nat1
; the "fixed" strategy chooses 1, 2, 3, ... Those are very similar, so the random one is disabled by default. If they were both enabled, it would still work but it would not be as quick.
Okay, I see. What does the "search" strategy do?
It takes a different approach. Instead of generating lots of values for each forall binding, it searches the body of the PO expression, looking for boolean checks that it might be able to violate. So if the PO has x <> 0
, it will try setting x
to zero, naively. There are a dozen or more of these cases that it can violate easily, without complex analysis.
Right, I see how that could work. What about "finite"?
That one works for finite types - like set of bool
or <A>|<B>|<C>
that have a finite number of values. In these cases, if the type is small enough, we can enumerate all of the possible values, rather than selecting just a few as the other strategies do.
Is the "trivial" strategy as trivial as it sounds?
Pretty much, yes. This strategy takes advantage of the fact that some POs have a simple structure like <something> => <something>
. You get this because specs often guard against <something>
before doing something that requires it to be true. So the POs reflect that, but these are "trivially true". We saw an example of that earlier.
And "direct"?
This strategy is slightly different in that it is not trying to discharge obligations themselves. Rather, it works out what the obligation is trying to verify, and achieves the same effect "directly" - by a different analysis of the specification itself.
Umm... you'd better give me an example.
Well, for example, total function obligations are trying to verify that functions declared to be total genuinely produce a result for every possible argument value. So the PO says, "for all arguments, applying the function with those argument produces a value of the right type". It is possible to prove that obligation expression, but it is also possible to directly show that the body of the function is a total function - because it is comprised entirely of total operators and returns a well defined type, so it will always produce a result.
Okay, I see. So what do the extra options do, like -fixed:size
?
The size
option sets the number of values for strategies to generate. Sizes usually default to a small number like 20, so that the tool is quick. But if you want to really chew on a particular PO, you could increase this to (say) 1000 values or more. For example:
functions
f: nat1 -> nat
f(i) == let s = [a | a in set {1,...,1000}] in s(i);
> qc
PO #1, MAYBE in 0.074s
> qc -fixed:size 2000
PO #1, FAILED in 1.195s
Counterexample: i = 1001
----
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 4:48
(forall i:nat1 &
(let s:seq of (nat1) = [a | a in set {1, ... ,1000}] in
i in set inds s))
So you see here that the 2000 value run took nearly 50x longer than the default, but it successfully found a counterexample, whereas the default wasn't sure. So if the default settings don't find a counterexample, it may be worth trying a larger size like this, if you have time.
Hmm, so using a larger size is more likely to find a counterexample, but it will take longer?
Exactly.
I can guess what -random:seed
does, but what are the -fixed:create
and -fixed:file
options for?
The idea is that the fixed strategy uses fixed values for each type, hence the name. It will use sensible fixed values by default, but it is possible to use your own fixed values, as VDM-SL set expressions. This may help with very specialized cases, especially with complex types. You can probably imagine that an auto-generated record value, with a dozen fields, would be nonsense (albeit legal). So for cases like this, you can create a file with your own sensible values.
Ah! So that's what the -fixed:create
does?
Yes. You can either give it a filename argument or use the default, which is "ranges.qc" in the current directory.
Can we try that?
Sure...
> qc -fixed:create
Created 2 default ranges in 'ranges.qc'.
Check them! Then run 'qc -s fixed'
>
$ cat ranges.qc
-- in 'DEFAULT' (test.vdm) at line 3:16
s:seq of (nat) = 20;
-- in 'DEFAULT' (test.vdm) at line 3:16
i:nat1 = 20;
$
Okay wait, I can see you've listed the "ranges.qc" file, but I was expecting to see VDM values?
But if you look at the -fixed:create
output, it says it created two default values in the ranges.qc file, which you should check. If you just use a number like "20", it will generate 20 values internally. But you can use explicit VDM set expressions instead. So for example, I can change the file to be...
-- in 'DEFAULT' (test.vdm) at line 3:16
s:seq of (nat) = { [a, b, c] | a, b, c in set {1, ..., 10} };
-- in 'DEFAULT' (test.vdm) at line 3:16
i:nat1 = {1, ..., 100};
> qc
PO #1, FAILED in 0.054s
Counterexample: i = 4, s = [1, 1, 10]
----
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:16
(forall s:seq of (nat), i:nat1 &
i in set inds s)
Note that the filename defaults to "ranges.qc". We could use -fixed:file
to use a different ranges file. So you could have a collection of specialized range files to focus on certain subsets of the POs.
Okay, that makes sense. And you can also give specific PO numbers to qc
?
Yes, after the other options. By default, it will check all of the POs in the current module or class. But you can use special range.qc files for subsets of POs and then check them by passing the PO number(s). You can also give an inclusive range of PO numbers with a hyphen, like 100 - 120
, or you can give function or module/class name patterns to process all of the POs that match those patterns.
That could still get a bit tedious?
That's why we have the script
command, remember? So you can put all of these qc
commands into a script and execute them easily.
Cool! So is there anything else I should know about QuickCheck?
One point is that it can't process all POs. You can still run qc
, but those POs will show a status of UNCHECKED
.
Why can't you check them?
There are various reasons. For example, one common case is where you have a recursive function without a measure defined. That produces an obligation on the measure_f
call, but since there is no measure, that function is not defined. In effect it is inviting you to define a measure, and there would be a warning telling you this too. Another simple example is with operations that contain loops. Here the logic is too complicated to analyse at the moment, though it is an area for future improvement. You should get an explanation for why a PO is UNCHECKED.
PO #85, UNCHECKED (Obligation for missing measure function)
PO #96, UNCHECKED (Loop modifies state)
PO #98, UNCHECKED (Not yet supported for operations)
PO #100, UNCHECKED (Earlier statements may have updated state at 817:18)
Okay. Anything else?
You sometimes get a Note
with a MAYBE
result from qc
, which is hinting that there might be a problem, even though it could not find a counterexample. For example:
f: nat * map nat to nat -> nat
f(n, m) ==
if n > 1000
then m(n)
else n;
Proof Obligation 1: (Unproved)
f: map apply obligation in 'DEFAULT' (test.vdmsl) at line 5:10
(forall n:nat, m:map nat to nat &
((n > 1000) =>
n in set dom m))
> qc
PO #1, MAYBE (Note: does not check m?) in 0.007s
>
You see that it is saying that the obligation "does not check m
?"
So what is that telling me?
It means that although it could not find a counterexample to this obligation, it is suspicious that the specification does not appear to check the value of m
before the use of the map in m(n)
. So how can it meet the obligation that n in set dom m
for certain? Really it ought to have something like this domain test as a precondition, although there might be other good reasons why this is actually okay. It's a suspicious situation that might help you to determine whether the MAYBE
is safe or not.
Okay, that's worth knowing. Anything else I should know?
You may sometimes find that the evaluation of the PO with some particular values produces a runtime error in the PO itself. This is rare, but it can happen. In that case, you will get a normal VDMJ error message, as well as a counterexample and the PO. The error is to help you work out why the PO itself is failing, which usually indicates an underlying issue in your spec.
Here's a small example:
functions
f: seq of nat -> nat
f(a) == let s = g(a) in hd s;
g: seq of nat -> seq of nat
g(a) == [hd a] ^ tl a;
> qc
PO #1, FAILED in 0.012s
Counterexample: a = []
Causes Error 4010: Cannot take head of empty sequence in 'DEFAULT' (test.vdm) at line 6:14
----
f: non-empty sequence obligation in 'DEFAULT' (test.vdm) at line 3:32
(forall a:seq of (nat) &
(let s:seq of (nat) = g(a) in
s <> []))
> break f
Created break [1] in 'DEFAULT' (test.vdm) at line 3:13
3: f(a) == let s = g(a) in hd s;
> qcrun 1
=> print f([])
Stopped break [1] in 'DEFAULT' (test.vdm) at line 3:13
3: f(a) == let s = g(a) in hd s;
MainThread> s
6: g(a) == [hd a] ^ tl a;
MainThread> s
Error 4010: Cannot take head of empty sequence in 'DEFAULT' (test.vdm) at line 6:14
6: g(a) == [hd a] ^ tl a;
MainThread>
So you can see that on the way to checking s <> []
it has to call g(a)
, which causes an error if a
is empty. You get other POs that tell you that the argument to g(a)
cannot be empty, but until those are fixed, this PO fails during execution. The debugging session shows the qcrun
command, which will invoke a print
command from a PO with a counterexample. You can see in this case it made print f([])
, which we then step up to the point of the error.
Okay, so qcrun
looks useful for debugging counterexamples generally. Anything else I should know?
One point to remember is that, if you're changing your specification, the PO numbers will change. This could be annoying if you have prepared scripts with qc
commands for specific PO numbers. But remember you can give qc
a list of names, meaning "all the POs in this list of functions", which is more stable (until you rename the functions!). You can see the problem.
Yes, tricky to fix that one. Anything else?
One thing to mention is the @QuickCheck
annotation that comes with QC. This can be used to bind particular types to a polymorphic function's type parameters when QC checks it. By default, @T
parameters are bound to real
, but that may not be sensible for all functions. For example:
functions
-- @QuickCheck @T = bool, set of bool;
f[@T]: set of @T -> nat1
f(s) == card s;
> qc
PO #1, FAILED in 0.004s
Counterexample: s = {}, T = bool
----
f: subtype obligation in 'DEFAULT' (test.vdm) at line 3:5
(forall s:set of (@T) &
(card s) > 0)
Notice that the counterexample shows @T
bound to bool
, which was the first of the types given in the @QuickCheck
. This particular example would also fail with the default binding of @T
to real
, but you see the point.
Yes, okay. Anything else?
There's a timeout on the tool, so that really complex examples don't lock up the machine. By default it's 5000ms, which should be plenty of time for most POs, but you can get a result status of TIMEOUT
if a spec is very complex. This is roughly the same as MAYBE
. You can change the value using the -t
option to qc
. Setting it to zero means no timeout. There's also a -q
option to do the checks without much output, which is useful if you have another display of the results in a GUI tool, and a -v
option to be verbose, which can help if you're unsure what's happening. The -n option gives nominal output, which is just the status and the duration.
But otherwise, I think that's about it. I should probably add a description of how to write your own QuickCheck strategy at some point, but that's an advanced topic :-)
Well, if I get any strategy ideas, I'll let you know!
- TL;DR
- VDM Method
- VDMJ Overview
- VDMJ Docs for Specifiers
- VDMJ Docs for Tool Developers
- VDMJ Docs for LSP Developers