Skip to content

Using QuickCheck

Nick Battle edited this page Jan 4, 2024 · 67 revisions

So QuickCheck sounds like something that won't take long to explain?

Well, the basics are pretty quick yes, but the name really means that checks are performed quickly.

Okay, so what checks are we talking about here?

Do you remember what a Proof Obligation is, sometimes called a "PO"?

Mmmm, yeah you did mention that, but... why don't you remind me?

Sure. We'll need an example to work with anyway, so let start with that. Imagine you had this very small specification, with just one function:

functions
    f: seq of nat * nat -> nat
    f(s, i) == s(i);

That's pretty small. But it's also pointless, surely? Calling f(s, i) is just the same as using s(i)?

Yes, it's just a simple example to show you how QuickCheck works. As you say, all this function does is return the i'th element of the sequence passed in. So can you see anything wrong with this spec?

Well... not really. I mean, it just returns the i'th element. It looks fine.

So what would happen if I passed an empty sequence?

Ah yes, okay I see...

Or if I passed a valid sequence, but an index of zero or one beyond the end of the sequence?

Right, it would fail. But why don't the language tools tell me about this?

Well, the spec "isn't wrong" - it works perfectly if you use the function as intended. But if you aren't careful, there are problems. So really, the tools are saying that this is perfectly acceptable, BUT you have an obligation to call the function with sensible arguments.

That's quite a big BUT there. We're building formally defined systems, so we can't depend on callers "being careful", surely?

Absolutely not. So the tools can tell you about this.

I thought you said they didn't?

The default parse and typecheck doesn't raise this issue, because as I said, it "isn't wrong". But there is another level of checking that the tools can give you. In cases like this, the tools can produce "Proof Obligations" that effectively define what callers of this function must guarantee.

You've mentioned the Proof Obligation Generator before, I remember.

That's the one.

So what does a Proof Obligation look like?

Let's try with the example. You generate POs with the "pog" command:

> pog
Generated 1 proof obligation:

Proof Obligation 1: (Unproved)
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:16
(forall s:seq of (nat), i:nat &
  i in set inds s)

Alright, hold on... that's just a VDM-SL expression?

We like to eat our own dogfood. But can you see what it's saying?

So it's a boolean expression for a start...

True.

And it's saying that for every possible seq of nat and every possible nat, the latter is a valid index of the former. Okay, so yes, this is saying that every number is a valid index of every sequence... but that's rubbish, surely?

Or "false", yes. There are some sequences and natural numbers that work together, but it isn't generally true for all sequences and numbers.

So, hold on. This "PO" is just a VDM expression. I can see how it relates to the spec, but it doesn't call f to test it?

Right. It isn't like a unit test. It's an "obligation" on you that follows from the way you've written the function. So you must prove that this obligation is met - sometimes we use the term "discharge", meaning to show that the obligation expression is always true.

But it isn't true, surely?

I agree. But how do you know?

Well, just look at it! It's saying that all numbers are valid indexes for all sequences. That's rubbish. 12345 isn't a valid index of [1,2,3] for example, and zero isn't a valid index of anything. So it's obviously false.

Exactly. You know that the obligation isn't provable because you can think of a counterexample. That's definitely a way to show that an obligation cannot be met or discharged or proved or whatever you want to call it.

Okay, so the PO generator will show you the obligations, and then you have to try to think of counterexamples?

Well, that works for a simple example like this. But a spec doesn't have to get much more complicated before its POs are much harder to understand and discharge.

But can't the tools do that for me?

Sometimes, yes. That is what QuickCheck does. It tries to quickly find counterexamples to Proof Obligations.

Ah, hence the name! I see. So could QuickCheck find a counterexample here?

Let's try:

> qc
PO #1, FAILED in 0.001s: Counterexample: i = 0, s = []
----
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:16
(forall s:seq of (nat), i:nat &
  i in set inds s)

Wow, that's quick!

It's a VERY small example, but yes, it is trying to take no more than a few seconds to find a counterexample.

And qc just means QuickCheck?

Yes, you can actually type quickcheck if you want, but... we're trying to be quick here.

So hold on, let me try to understand what it's saying. The PO "failed" - that's failed to verify in some way, I guess - and the counterexample is shown as a value for i and s. Those are from the forall at the start of the PO. And sure enough, i is not in the set inds s. Fantastic!

Right, so we quickly identified a problem. But what should we do about it?

Why do we have to do anything about it?

If you leave it as it stands, and someone calls this function without meeting this obligation (and nothing is forcing them to), the specification will give undefined behaviour (in the real world, the system will crash). So until this is fixed, you're depending on human beings being careful. That's not good!

Okay, but surely if we fix this problem, won't it just mean that the callers of the function have to change? We've just shifted the problem somewhere else?

That's true. But if they have to change too, it means that there genuinely was a problem with what they were doing. And fixing that will allow us to find and fix problems with their callers, and so on. So the process of discharging POs ripples through your specification, bringing benefits throughout.

Okay I get that. So to fix this example, we have to tweak the spec somehow?

Yes, but how?

Hmmm. Well, that PO will never be true, so we have to change the spec so that it doesn't generate this PO, or perhaps doesn't generate any POs?

Basically, yes that's right. Either we have to tweak the spec so that it generates a PO that is always true, or so that there are simply no obligations placed on us. That will then have "tightened" the specification to the point where we are sure that it will always do what we expect, instead of giving undefined behaviour for some inputs. As you said before, this is a formal specification, so it should be well defined for all valid inputs.

So what can we tweak to fix this?

In cases like this, the problem is because the arguments passed to the function are not sufficiently constrained to guarantee what the function does. They're too loose, in a sense.

So we can tighten them?

Of course. Any idea how?

Hmm. One thing I thought of when you mentioned passing zero. The index to a sequence in VDM starts at 1. So really the i parameter ought to be a nat1. Would that be enough?

Let's try that. Here's the new obligation and qc result...

> pog
Generated 1 proof obligation:

Proof Obligation 1: (Unproved)
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:16
(forall s:seq of (nat), i:nat1 &
  i in set inds s)

> qc
PO #1, FAILED in 0.003s: Counterexample: i = 1, s = []
----
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:16
(forall s:seq of (nat), i:nat1 &
  i in set inds s)

> 

Okay, so I see that the i is now a nat1, so it's just switched the counterexample from 0 to 1. But we're still not done fixing the spec then? We still have to fix that obligation?

Yes. Any other ideas how?

All I can think of is spelling it out... adding a precondition that says the index matches the sequence?

Okay let's try that... I'll add a precondition that says i in set inds s

> pog
Generated 1 proof obligation:

Proof Obligation 1: (Unproved)
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:16
(forall s:seq of (nat), i:nat1 & pre_f(s, i) =>
  i in set inds s)

> qc
PO #1, MAYBE in 0.001s
>

LOL, well I suppose MAYBE is an improvement. But what does that mean?

MAYBE means that it couldn't find any counterexamples.

But if there are no counterexamples, it must always be true, surely?

It only means it couldn't find any counterexamples. Maybe they exist, but it couldn't find one quickly.

Hmmm. I was hoping we could prove things. That's an important part of using formal specification, right?

Yes, if you want very high confidence in your specification, you need to prove its obligations. But that is a non-trivial process in general. The QuickCheck tool is trying to quickly catch the "stupid cases" where you have missed a check or a precondition and so on. That helps improve specifications, but a QuickCheck MAYBE is not the same as a proof. It's necessary, but not sufficient.

So qc can't really prove anything?

It is really designed to disprove POs. But there are also some cases where it can indicate POs should be provable. For example when POs follow very simple patterns which are known always to be true, or if you have finite types, where it can check every possible value, or if the PO starts with "exists" rather than "forall", then one witness value is proof of existence.

But this still only means that it should be provable, not that it is actually proved?

Yes. We have to be careful in what we claim about proof. QuickCheck is not a prover/solver and there are lots of subtle aspects to fully formal proof that the tool cannot analyse. But if we claim an obligation is PROVABLE it means we have reason to believe that a proof should be easy to find, and the tool may explain how it came to that conclusion. This is very useful, but it is not a formal proof.

So for example, if I wrap the s(i) result in a check, like if i in set inds s then s(i) else 0, you get the following:

> pog
Generated 1 proof obligation:

Proof Obligation 1: (Unproved)
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:40
(forall s:seq of (nat), i:nat1 &
  ((i in set (inds s)) =>
    i in set inds s))

> qc
PO #1, PROVABLE by trivial (i in set (inds s)) in 0.001s
> 

You can see the PO says, "if the index works, the index works". Well duh, of course that's true, so the PO is "trivially" provable, and it shows you the expression that it knew had to be true.

So why didn't that work with that same clause in the precondition?

That's because the precondition uses pre_f(s, i) as the test in the PO. So qc can't quickly tell that it matches the simple pattern. A more powerful proof engine would be able to do that though.

And you said that finite types could be checked exhaustively? Does that lead to a provable PO?

Yes! It works when your Proof Obligation only uses finite types, and the size of the type must be less that 100,000 by default. But sometimes it can indicate these POs are provable. For example, imagine a function like this:

functions
	f: set of set of bool -> nat1
	f(s) ==
		if s = {}
		then 999
		else card s;

> pog
Generated 1 proof obligation:

Proof Obligation 1: (Unproved)
f: subtype obligation in 'DEFAULT' (test.vdm) at line 2:5
(forall s:set of (set of (bool)) &
  (if (s = {}) then 999 else (card s)) > 0)

> qc
PO #1, PROVABLE by finite types in 0.011s
> 

Admittedly this is a contrived example, but you see that the obligation says that the body of the function must produce a nat1 for all sets of sets of booleans. But that is a finite type (there are only 16 members of the type), so it can easily try them all. And since there are no counterexamples found, and every value was tried, the PO should be PROVABLE as you see.

Okay, so QuickCheck is small and fast and tries to find problems, but ultimately for formal proof you need more powerful tools?

Exactly. Though we can keep improving qc to try more and more strategies, to find as many problems as we can. Ultimately, we can write a strategy that calls out to a more powerful solver.

What's a strategy?

Internally, qc uses several strategies to find counterexamples. There are five of them built into the tool, which you can see with -help:

> qc -help
Usage: quickcheck [-?|-help][-q|-v][-t <secs>][-i <status>]*[-s <strategy>]* [-<strategy:option>]* [<PO numbers/ranges/patterns>]

  -?|-help           - show command help
  -q|-v              - run with minimal or verbose output
  -t <secs>          - timeout in secs
  -i <status>        - only show this result status
  -s <strategy>      - enable this strategy (below)
  -<strategy:option> - pass option to strategy
  PO# numbers        - only process these POs
  PO# - PO#          - process a range of POs
  <pattern>          - process PO names or modules matching

Enabled strategies:
  fixed [-fixed:file <file> | -fixed:create <file>][-fixed:size <size>]
  search (no options)
  finite [-finite:size <size>]
  trivial (no options)
  direct (no options)

Disabled strategies (add with -s <name>):
  random [-random:size <size>][-random:seed <seed>]
> 

So that shows the general usage, and that there are five enabled strategies ("fixed", "search", "finite", "trivial" and "direct") and one disabled one ("random"). More strategies can be added as plugins.

Why would you disable a strategy?

You might choose to disable strategies that are very specialized or expensive to run, or if they more or less duplicate an enabled one. For example, the "random" strategy chooses moderately random numbers for a nat1; the "fixed" strategy chooses 1, 2, 3, ... Those are very similar, so the random one is disabled by default. If they were both enabled, it would still work but it would not be as quick.

Okay, I see. What does the "search" strategy do?

It takes a different approach. Instead of generating lots of values for each forall binding, it searches the body of the PO expression, looking for boolean checks that it might be able to violate. So if the PO has x <> 0, it will try setting x to zero, naively. There are a dozen or more of these cases that it can violate easily, without complex analysis.

Okay, I see how that could work. What about "finite"?

That one works for finite types - like set of bool or <A>|<B>|<C> that have a finite number of values. In these cases, if the type is small enough, we can enumerate all of the possible values, rather than selecting just a few as the other strategies do.

Is the "trivial" strategy as trivial as it sounds?

Pretty much, yes. This strategy takes advantage of the fact that some POs have a simple structure like <something> => <something>. You get this because spec often guard against <something> before doing something that requires it to be true. So the POs reflect that, but these are "trivially true". We saw an example of that earlier.

And "direct"?

This strategy is slightly different in that it is not trying to discharge obligations themselves. Rather, it works out what the obligation is trying to verify, and achieves the same effect "directly" - by a different analysis of the specification itself.

Umm... you'd better give me an example.

Well, for example, total function obligations are trying to verify that functions declared to be total genuinely produce a result for every possible argument value. So the PO says, "for all arguments, applying the function with those argument produces a value of the right type". It is possible to prove that obligation expression, but it is also possible to directly show that the body of the function is a total function - because it is comprised entirely of total operators and returns a well defined type, so it will always produce a result.

Okay, I see. And what do the extra options do, like -fixed:size?

The size option sets the number of values for strategies to generate. Sizes usually default to a small number like 20, so that the tool is quick. But if you want to really chew on a particular PO, you could increase this to (say) 1000 values or more. For example:

functions
    f: nat1 -> nat
    f(i) == let s = [a | a in set {1,...,1000}] in s(i);

> qc
PO #1, MAYBE in 0.074s

> qc -fixed:size 2000
PO #1, FAILED in 1.195s: Counterexample: i = 1001
----
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 4:48
(forall i:nat1 &
  (let s:seq of (nat1) = [a | a in set {1, ... ,1000}] in
    i in set inds s))

So you see here that the 2000 value run took nearly 50x longer than the default, but it successfully found a counterexample, whereas the default wasn't sure. So if the default settings don't find a counterexample, it may be worth trying a larger size like this, if you have time.

Hmm, so using a larger size is more likely to find a counterexample, but it will take longer?

Exactly.

I can guess what -random:seed does, but what are the -fixed:create and -fixed:file options for?

The idea is that the fixed strategy uses fixed values for each type, hence the name. It will use sensible fixed values by default, but it is possible to use your own fixed values, as VDM-SL set expressions. This may help with very specialized cases, especially with complex types. You can probably imagine that an auto-generated record value, with a dozen fields, would be nonsense (albeit legal). So for cases like this, you can create a file with your own sensible values.

Ah! So that's what the -fixed:create does?

Yes. You can either give it a filename argument or use the default, which is "ranges.qc" in the current directory.

Can we try that?

Sure...

> qc -fixed:create
Created 2 default ranges in 'ranges.qc'.
Check them! Then run 'qc -s fixed'
>

$ cat ranges.qc
-- in 'DEFAULT' (test.vdm) at line 3:16
s:seq of (nat) = 20;
-- in 'DEFAULT' (test.vdm) at line 3:16
i:nat1 = 20;
$

Okay wait, I can see you've listed the "ranges.qc" file, but I was expecting to see VDM values?

But if you look at the -fixed:create output, it says it created two default values in the ranges.qc file, which you should check. If you just use a number like "20", it will generate 20 values internally. But you can use explicit VDM set expressions instead. So for example, I can change the file to be...

-- in 'DEFAULT' (test.vdm) at line 3:16
s:seq of (nat) = { [a, b, c] | a, b, c in set {1, ..., 10} };
-- in 'DEFAULT' (test.vdm) at line 3:16
i:nat1 = {1, ..., 100};

> qc
PO #1, FAILED in 0.054s: Counterexample: i = 4, s = [1, 1, 10]
----
f: sequence apply obligation in 'DEFAULT' (test.vdm) at line 3:16
(forall s:seq of (nat), i:nat1 &
  i in set inds s)

Note that the filename defaults to "ranges.qc". We could use -fixed:file to use a different ranges file. So you could have a collection of specialized range files to focus on certain subsets of the POs.

Okay, that makes sense. And you can also give specific PO numbers to qc?

Yes, after the other options. By default, it will check all of the POs in the current module or class. But you can use special range.qc files for subsets of POs and then check them by passing the PO number(s). You can also give an inclusive range of PO numbers with a hyphen, like 100 - 120, or you can give function or module/class name patterns to process all of the POs that match those patterns.

That could still get a bit tedious?

That's why we have the script command, remember? So you can put all of these qc commands into a script and execute them easily.

Cool! So is there anything else I should know about QuickCheck?

One point is that it can't process all POs. In particular, POs generated from operations and statements tend not to be processable. You can still run qc, but those POs will show a status of UNCHECKED.

Is that a work in progress?

Yes, though it's difficult - specifications with state are harder to reason about.

Okay. Anything else?

You may very rarely find that the evaluation of the PO with some particular values produces a runtime error in the PO itself. This is rare, but it can happen. In that case, you will get a normal VDMJ error message, as well as a counterexample and the PO. The error is to help you work out why the PO itself is failing, which usually indicates an underlying issue in your spec.

Here's a small example:

functions
	f: seq of nat -> nat
	f(a) == let s = g(a) in hd s;
	
	g: seq of nat -> seq of nat
	g(a) == [hd a] ^ tl a;

> qc
PO #1, FAILED in 0.012s: Counterexample: a = []
Causes Error 4010: Cannot take head of empty sequence in 'DEFAULT' (test.vdm) at line 6:14
----
f: non-empty sequence obligation in 'DEFAULT' (test.vdm) at line 3:32
(forall a:seq of (nat) &
  (let s:seq of (nat) = g(a) in
    s <> []))

> break f
Created break [1] in 'DEFAULT' (test.vdm) at line 3:13
3:  	f(a) == let s = g(a) in hd s;

> qcrun 1
=> print f([])
Stopped break [1] in 'DEFAULT' (test.vdm) at line 3:13
3:  	f(a) == let s = g(a) in hd s;
MainThread> s
6:  	g(a) == [hd a] ^ tl a;
MainThread> s
Error 4010: Cannot take head of empty sequence in 'DEFAULT' (test.vdm) at line 6:14
6:  	g(a) == [hd a] ^ tl a;
MainThread> 

So you can see that on the way to checking s <> [] it has to call g(a), which causes an error if a is empty. You get other POs that tell you that the argument to g(a) cannot be empty, but until those are fixed, this PO fails during execution. The debugging session shows the qcrun command, which will invoke a print command from a PO with a counterexample. You can see in this case it made print f([]), which we then step up to the point of the error.

Okay, so qcrun looks useful for debugging counterexamples generally. Anything else I should know?

One point to remember is that, if you're changing your specification, the PO numbers will change. This could be annoying if you have prepared scripts with qc commands for specific PO numbers. But remember you can give qc a list of names, meaning "all the POs in this list of functions", which is more stable (until you rename the functions!). You can see the problem.

Yes, tricky to fix that one. Anything else?

One thing to mention is the @QuickCheck annotation that comes with QC. This can be used to bind particular types to a polymorphic function's type parameters when QC checks it. By default, @T parameters are bound to real, but that may not be sensible for all functions. For example:

functions
	-- @QuickCheck @T = bool, set of bool;
	f[@T]: set of @T -> nat1
	f(s) == card s;

> qc
PO #1, FAILED in 0.004s: Counterexample: s = {}, T = bool
----
f: subtype obligation in 'DEFAULT' (test.vdm) at line 3:5
(forall s:set of (@T) &
  (card s) > 0)

Notice that the counterexample shows @T bound to bool, which was the first of the types given in the @QuickCheck. This particular example would also fail with the default binding of @T to real, but you see the point.

Yes, okay. Anything else?

There's a timeout on the tool, so that really complex examples don't lock up the machine. By default it's five seconds, which should be plenty of time for most POs, but you can get a result status of TIMEOUT if a spec is very complex. This is roughly the same as MAYBE. You can change the value using the -t option to qc. There's also a -q option to do the checks without much output, which is useful if you have another display of the results in a GUI tool, and there's a -v option to be verbose, which can help if you're unsure what's happening.

But otherwise, I think that's about it. I should probably add a description of how to write your own QuickCheck strategy at some point, but that's an advanced topic :-)

Well, if I get any strategy ideas, I'll let you know!