-
Notifications
You must be signed in to change notification settings - Fork 21
Using QuickCheck
So QuickCheck sounds like something that won't take long to explain?
Well, the basics are pretty quick yes, but the name really means that checks are performed quickly.
Okay, so what checks are we talking about here?
Do you remember what a Proof Obligation is, sometimes called a "PO"?
Mmmm, yeah you did mention that, but... why don't you remind me?
Sure. We'll need an example to work with anyway, so let start with that. Imagine you had this very small specification, with just one function:
functions
f: seq of nat * nat -> nat
f(s, i) == s(i);
That's pretty small. But it's also pointless, surely? Calling f(s, i)
is just the same as using s(i)
?
Yes, it's just a simple example to show you how QuickCheck works. As you say, all this function does is return the i'th element of the sequence passed in. So can you see anything wrong with this spec?
Well... not really. I mean, it just returns the i'th element. It looks fine.
So what would happen if I passed an empty sequence?
Ah yes, okay I see...
Or if I passed a valid sequence, but an index of zero or one beyond the end of the sequence?
Right, it would fail. But why don't the language tools tell me about this?
Well, the spec "isn't wrong" - it works perfectly if you use the function as intended. But if you aren't careful, there are problems. So really, the tools are saying that this is perfectly acceptable, BUT you have an obligation to call the function with sensible arguments.
That's quite a big BUT there. We're building formally defined systems, so we can't depend on callers "being careful", surely?
Absolutely not. So the tools can tell you about this.
I thought you said they didn't?
The default parse and typecheck doesn't raise this issue, because as I said, it "isn't wrong". But there is another level of checking that the tools can give you. In cases like this, the tools can produce "Proof Obligations" that effectively define what callers of this function must guarantee.
You've mentioned the Proof Obligation Generator before, I remember.
That's the one.
So what does a Proof Obligation look like?
Let's try with the example. You generate POs with the "pog" command:
> pog
Generated 1 proof obligation:
Proof Obligation 1: (Unproved)
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:16
(forall s:seq of (nat), i:nat &
i in set inds s)
Alright, hold on... that's just a VDM-SL expression?
We like to eat our own dogfood. But can you see what it's saying?
So it's a boolean expression for a start...
True.
And it's saying that for every possible seq of nat
and every possible nat
, the latter is a valid index of the former. Okay, so yes, this is saying that every number is a valid index of every sequence... but that's rubbish, surely?
Or "false", yes. There are some sequences and natural numbers that work together, but it isn't generally true for all sequences and numbers.
So, hold on. This "PO" is just a VDM expression. I can see how it relates to the spec, but it doesn't call f
to test it?
Right. It isn't like a unit test. It's an "obligation" on you that follows from the way you've written the function. So you must prove that this obligation is met - sometimes we use the term "discharge", meaning to show that the obligation expression is always true.
But it isn't true, surely?
I agree. But how do you know?
Well, just look at it! It's saying that all numbers are valid indexes for all sequences. That's rubbish. 12345 isn't a valid index of [1,2,3] for example, and zero isn't a valid index of anything. So it's obviously false.
Exactly. You know that the obligation isn't provable because you can think of a counterexample. That's definitely a way to show that an obligation cannot be met or discharged or proved or whatever you want to call it.
Okay, so the PO generator will show you the obligations, and then you have to try to think of counterexamples?
Well, that works for a simple example like this. But a spec doesn't have to get much more complicated before its POs are much harder to understand and discharge.
But can't the tools do that for me?
Sometimes, yes. That is what QuickCheck does. It tries to quickly find counterexamples to Proof Obligations.
Ah, hence the name! I see. So could QuickCheck find a counterexample here?
Let's try:
> qc
PO #1, FAILED in 0.001s: Counterexample: i = 0, s = []
----
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:16
(forall s:seq of (nat), i:nat &
i in set inds s)
Wow, that's quick!
It's a VERY small example, but yes, it is trying to take no more than a few seconds to find a counterexample.
And qc
just means QuickCheck?
Yes, you can actually type quickcheck
if you want, but... we're trying to be quick here.
So hold on, let me try to understand what it's saying. The PO "failed" - that's failed to verify in some way, I guess - and the counterexample is shown as a value for i
and s
. Those are from the forall
at the start of the PO. And sure enough, i
is not in the set inds s
. Fantastic!
Right, so we quickly identified a problem. But what should we do about it?
Why do we have to do anything about it?
If you leave it as it stands, and someone calls this function without meeting this obligation (and nothing is forcing them to), the specification will give undefined behaviour (in the real world, the system will crash). So until this is fixed, you're depending on human beings being careful. That's not good!
Okay, but surely if we fix this problem, won't it just mean that the callers of the function have to change? We've just shifted the problem somewhere else?
That's true. But if they have to change too, it means that there genuinely was a problem with what they were doing. And fixing that will allow us to find and fix problems with their callers, and so on. So the process of discharging POs ripples through your specification, bringing benefits throughout.
Okay I get that. So to fix this example, we have to tweak the spec somehow?
Yes, but how?
Hmmm. Well, that PO will never be true, so we have to change the spec so that it doesn't generate this PO, or perhaps doesn't generate any POs?
Basically, yes that's right. Either we have to tweak the spec so that it generates a PO that is always true, or so that there are simply no obligations placed on us. That will then have "tightened" the specification to the point where we are sure that it will always do what we expect, instead of giving undefined behaviour for some inputs. As you said before, this is a formal specification, so it should be well defined for all valid inputs.
So what can we tweak to fix this?
In cases like this, the problem is because the arguments passed to the function are not sufficiently constrained to guarantee what the function does. They're too loose, in a sense.
So we can tighten them?
Of course. Any idea how?
Hmm. One thing I thought of when you mentioned passing zero. The index to a sequence in VDM starts at 1. So really the i
parameter ought to be a nat1
. Would that be enough?
Let's try that. Here's the new obligation and qc
result...
> pog
Generated 1 proof obligation:
Proof Obligation 1: (Unproved)
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:16
(forall s:seq of (nat), i:nat1 &
i in set inds s)
> qc
PO #1, FAILED in 0.003s: Counterexample: i = 5, s = []
----
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:16
(forall s:seq of (nat), i:nat1 &
i in set inds s)
>
Okay, so I see that the i
is now a nat1
, but it's just switched the counterexample from 0 to 5. Why 5??
Yeah, it looks arbitrary. It's because qc
is using a random strategy to try to find counterexamples. You'll get a different small number every time you try it, unless you seed the random generator. It isn't completely random, but as long as it finds any counterexample that causes the PO to fail, that is good enough.
Okay. But we're still not done fixing the spec then? We still have to fix that obligation?
Yes. Any other ideas how?
All I can think of is spelling it out... adding a precondition that says the index matches the sequence?
Okay let's try that... I'll add a precondition that says i in set inds s
> pog
Generated 1 proof obligation:
Proof Obligation 1: (Unproved)
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:16
(forall s:seq of (nat), i:nat1 & pre_f(s, i) =>
i in set inds s)
> qc
PO #1, MAYBE in 0.027s
>
LOL, well I suppose MAYBE
is an improvement. But what does that mean?
MAYBE
means that it couldn't find any counterexamples.
But if there are no counterexamples, it must always be true, surely?
It only means it couldn't find any counterexamples. Maybe they exist, but it couldn't find one quickly.
Hmmm. I was hoping we could prove things. That's an important part of using formal specification, right?
Yes, if you want very high confidence in your specification, you need to prove its obligations. But that is a non-trivial process in general. The QuickCheck tool is trying to quickly catch the "stupid cases" where you have missed a check or a precondition and so on. That helps improve specifications, but a QuickCheck MAYBE
is not the same as a proof. It's necessary, but not sufficient.
So qc
can't really prove anything?
It is really designed to disprove POs. But there are also some cases where it can prove POs. For example when POs follow very simple patterns which are known always to be true, or if you have finite types, where it can check every possible value, or if the PO starts with "exists" rather than "forall", then one value is proof of existence.
So for example, if I wrap the s(i)
result in a check, like if i in set inds s then s(i) else 0
, you get the following:
> pog
Generated 1 proof obligation:
Proof Obligation 1: (Trivial by x in set s => x in set s)
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:40
(forall s:seq of (nat), i:nat1 &
((i in set (inds s)) =>
i in set inds s))
> qc
PO #1, TRIVIAL by x in set s => x in set s
>
You can see the PO says, "if the index works, the index works". Well duh, of course that's true, so the PO is "trivially" proved.
So why didn't that work with that clause in the precondition?
That's because the precondition uses pre_f(s, i)
as the test in the PO. So qc
can't quickly tell that it matches the simple pattern. A more powerful proof engine would be able to do that though.
And you said that finite types could be checked exhaustively? Does that lead to a proof?
Yes! It works when your Proof Obligation only uses finite types, and the size of the type must be less that 100,000 by default. But sometimes it can prove these POs. For example, imagine a function like this:
functions
f: set of set of bool -> nat1
f(s) ==
if s = {}
then 999
else card s;
> pog
Generated 1 proof obligation:
Proof Obligation 1: (Unproved)
f: subtype obligation in 'DEFAULT' (test.vdm) at line 2:5
(forall s:set of (set of (bool)) &
(if (s = {}) then 999 else (card s)) > 0)
> qc
PO #1, PROVED in 0.011s
>
Admittedly this is a contrived example, but you see that the obligation says that the body of the function must produce a nat1
for all sets of sets of booleans. But that is a finite type (there are only 16 members of the type), so it can easily try them all. And since there are no counterexamples found, and every value was tried, the PO must be PROVED
as you see.
Okay, so QuickCheck is small and fast and tries to find problems, but ultimately for complex cases you need more powerful tools?
Exactly. Though we can keep improving qc
to try more and more strategies, to find as many problems as we can. Ultimately, we can write a strategy that calls out to a more powerful solver.
What's a strategy?
Internally, qc
uses several strategies to find counterexamples. There are five of them built into the tool, which you can see with -help
:
> qc -help
Usage: quickcheck [-?|-help][-t <secs>][-s <strategy>]* [-<strategy:option>]* [<PO numbers/ranges/patterns>]
Enabled strategies:
search (no options)
random [-random:size <size>][-random:seed <seed>]
finite [-finite:size <size>]
trivial (no options)
Disabled strategies (add with -s <name>):
fixed [-fixed:file <file> | -fixed:create <file>][-fixed:size <size>]
>
So that shows the general usage, and that there are four enabled strategies ("search", "random", "finite" and "trivial") and one disabled one ("fixed"). More strategies can be added as plugins.
Why would you disable a strategy?
You might choose to disable strategies that are very specialized or expensive to run, or if they more or less duplicate an enabled one. For example, the "random" strategy chooses moderately random numbers for a nat1
; the "fixed" strategy chooses 1, 2, 3, ... Those are very similar, so the fixed one is disabled by default, as it is more complex to configure. If they were both enabled, it would still work but it would not be as quick.
Okay, I see. What does the "search" strategy do?
It takes a different approach. Instead of generating lots of values for each forall binding, it searches the body of the PO expression, looking for boolean checks that it might be able to violate. So if the PO has x <> 0
, it will try setting x
to zero, naively. There are a dozen or more of these cases that it can violate easily, without complex analysis.
Okay, I see how that could work. What about "finite"?
That one works for finite types - like set of bool
or <A>|<B>|<C>
that have a finite number of values. In these cases, if the type is small enough, we can enumerate all of the possible values, rather than selecting just a few as the other strategies do.
Is the "trivial" strategy as trivial as it sounds?
Pretty much, yes. This strategy takes advantage of the fact that some POs have a simple structure like <something> => <something>
. You get this because spec often guard against <something>
before doing something that requires it to be true. So the POs reflect that, but these are "trivially true". We saw an example of that earlier.
Okay, I see. And what do the extra options do, like -random:size
?
The size
option sets the number of values for strategies to generate. Sizes usually default to a small number like 20, so that the tool is quick. But if you want to really chew on a particular PO, you could increase this to (say) 1000 values or more. For example:
functions
f: nat1 -> nat
f(i) == let s = [a | a in set {1,...,1000}] in s(i);
> qc
PO #1, MAYBE in 0.074s
> qc -random:size 1000
PO #1, FAILED in 0.572s: Counterexample: i = 1067
----
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 4:48
(forall i:nat1 &
(let s:seq of (nat1) = [a | a in set {1, ... ,1000}] in
i in set inds s))
So you see here that the 1000 value run took over 7x longer than the default, but it successfully found a counterexample, whereas the default wasn't sure. So if the default settings don't find a counterexample, it may be worth trying a larger size like this, if you have time.
Hmm, so using a larger size is more likely to find a counterexample, but it will take longer?
Exactly.
I can guess what -random:seed
does, but what are the -fixed:create
and -fixed:file
options for?
The idea is that the fixed strategy uses fixed values for each type, hence the name. It will use sensible fixed values by default, but it is possible to use your own fixed values, as VDM-SL set expressions. This may help with very specialized cases, especially with complex types. You can probably imagine that an auto-generated record value, with a dozen fields, would be nonsense (albeit legal). So for cases like this, you can create a file with your own sensible values.
Ah! So that's what the -fixed:create
does?
Yes. You can either give it a filename argument or use the default, which is "ranges.qc" in the current directory.
Can we try that?
Sure...
> qc -s fixed -fixed:create
Created 2 default ranges in 'ranges.qc'.
Check them! Then run 'qc -s fixed'
>
$ cat ranges.qc
-- in 'DEFAULT' (test.vdm) at line 3:16
s:seq of (nat) = 20;
-- in 'DEFAULT' (test.vdm) at line 3:16
i:nat1 = 20;
$
Okay wait, why is that -s fixed
there?
Remember, the "fixed" strategy is disabled by default. So the -s fixed
enables it for this run, and disables all the others. If you want (say) "fixed" and "search" you would need to include two -s
options.
I see. And I can see you've listed the "ranges.qc" file, but I was expecting to see VDM values?
But if you look at the -fixed:create
output, it says it created two default values in the ranges.qc file, which you should check. If you just use a number like "20", it will generate 20 values internally. But you can use explicit VDM set expressions instead. So for example, I can change the file to be...
-- in 'DEFAULT' (test.vdm) at line 3:16
s:seq of (nat) = { [a, b, c] | a, b, c in set {1, ..., 10} };
-- in 'DEFAULT' (test.vdm) at line 3:16
i:nat1 = {1, ..., 100};
> qc -s fixed
Using ranges file ranges.qc
Expanding 2 ranges: ..
Ranges expanded in 0.123s
PO #1, FAILED in 0.054s: Counterexample: i = 4, s = [1, 1, 10]
----
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:16
(forall s:seq of (nat), i:nat1 &
i in set inds s)
Note that we need the -s fixed
to enable the strategy as before, but the filename defaults to "ranges.qc". We could use -fixed:file
to use a different ranges file. So you could have a collection of specialized range files to focus on certain subsets of the POs.
Okay, that makes sense. And you can also give specific PO numbers to qc
?
Yes, after the other options. By default, it will check all of the POs in the current module or class. But you can use special range.qc files for subsets of POs and then check them by passing the PO number(s). You can also give an inclusive range of PO numbers with a hyphen, like 100 - 120
, or you can give function or module/class name patterns to process all of the POs that match those patterns.
That could still get a bit tedious?
That's why we have the script
command, remember? So you can put all of these qc
commands into a script and execute them easily.
Cool! So is there anything else I should know about QuickCheck?
One point is that it can't process all POs. In particular, POs generated from operations and statements tend not to be processable. You can still run qc
, but those POs will show a status of UNCHECKED
.
Is that a work in progress?
Yes, though it's difficult - specifications with state are harder to reason about.
Okay. Anything else?
You may very rarely find that the evaluation of the PO with some particular values produces a runtime error in the PO itself. This is rare, but it can happen. In that case, you will get a normal VDMJ error message, as well as a counterexample and the PO. The error is to help you work out why the PO itself is failing, which usually indicates an underlying issue in your spec.
Here's a small example:
functions
f: seq of nat -> nat
f(a) == let s = g(a) in hd s;
g: seq of nat -> seq of nat
g(a) == [hd a] ^ tl a;
> qc
PO #1, FAILED in 0.012s: Counterexample: a = []
Causes Error 4010: Cannot take head of empty sequence in 'DEFAULT' (test.vdm) at line 6:14
----
f: non-empty sequence obligation in 'DEFAULT' (test.vdm) at line 3:32
(forall a:seq of (nat) &
(let s:seq of (nat) = g(a) in
s <> []))
So you can see that on the way to checking s <> []
it has to call g(a)
, which causes an error if a
is empty. You get other POs that tell you that the argument to g(a)
cannot be empty, but until those are fixed, this PO fails during execution.
Okay let's hope that never happens! Anything else?
One point to remember is that, if you're changing your specification, the PO numbers will change. This could be annoying if you have prepared scripts with qc
commands for specific PO numbers. But remember you can give qc
a list of names, meaning "all the POs in this list of functions", which is more stable (until you rename the functions!). You can see the problem.
Yes, tricky to fix that one. Anything else?
One thing to mention is the @QuickCheck
annotation that comes with QC. This can be used to bind particular types to a polymorphic function's type parameters when QC checks it. By default, @T
parameters are bound to real
, but that may not be sensible for all functions. For example:
functions
-- @QuickCheck @T = bool, set of bool;
f[@T]: set of @T -> nat1
f(s) == card s;
> qc
PO #1, FAILED in 0.004s: Counterexample: s = {}, T = bool
----
f: subtype obligation in 'DEFAULT' (test.vdm) at line 3:5
(forall s:set of (@T) &
(card s) > 0)
Notice that the counterexample shows @T
bound to bool
, which was the first of the types given in the @QuickCheck
. This particular example would also fail with the default binding of @T
to real
, but you see the point.
Yes, okay. Anything else?
There's a timeout on the tool, so that really complex examples don't lock up the machine. By default it's five seconds, which should be plenty of time for most POs, but you can get a result status of TIMEOUT
if a spec is very complex. This is roughly the same as MAYBE
. You can change the value using the -t
option to qc
.
But otherwise, I think that's about it. I should probably add a description of how to write your own QuickCheck strategy at some point, but that's an advanced topic :-)
Well, if I get any strategy ideas, I'll let you know!
- TL;DR
- VDM Method
- VDMJ Overview
- VDMJ Docs for Specifiers
- VDMJ Docs for Tool Developers
- VDMJ Docs for LSP Developers