Default runs too low #190
I like point 1). In fact I'd like to (in addition to saying "run for N runs" and "run for N seconds") be able to say "run indefinitely", although that could probably be emulated with a high enough number of runs or seconds.

Re 2): for integers the space is finite but huge, so this could work there, but what about lists, strings, and other collections? Would we arbitrarily decide "the input space is all lists below 50 elements"?

Re 3): I'm not completely sure these are related. The number of tests needed grows as the real distribution of the label nears the wanted distribution. (In Hughes' talk the numbers might be made up, but anyway, with distributions
This seems different from verification (with some probability p) that the test will never fail. Again, I don't know how we'd find the needed number of tests. The one metric that I believe would be able to tell us whether we've tested the program enough is coverage-guided generation (like AFL does), perhaps with some symbolic execution sprinkled in. If you went through all the meaningfully different paths (
Yeah, that would work only in addition to specifying some memory limit your application has to fit in. If you specify that your application has to fit into e.g. 100 MB of RAM, then all data structures are finite.
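For the collections question above, a size bound is at least expressible at the fuzzer level today. A minimal sketch, assuming a recent elm-explorations/test where `Fuzz.listOfLengthBetween` is available (older versions may lack it):

```elm
module BoundedFuzzers exposing (boundedIntList)

import Fuzz exposing (Fuzzer)


-- An "input space" deliberately capped at lists of at most 50 elements,
-- mirroring the arbitrary bound discussed above.
boundedIntList : Fuzzer (List Int)
boundedIntList =
    Fuzz.listOfLengthBetween 0 50 Fuzz.int
```

This only caps one structure, of course; it doesn't make the overall input space finite the way a memory limit would.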
A gripe I've had for a while is that the default of 100 runs is way too low to get decent coverage in most scenarios, and in my experience is too low for most test suites.
I usually recommend about 10,000 runs as a baseline, then adjusting based on the desired run time.
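For concreteness, raising the run count per test currently looks something like this. A sketch against elm-explorations/test; note that in newer versions `FuzzOptions` may carry additional fields beyond `runs`:

```elm
module RunCountExample exposing (reverseTwiceIsIdentity)

import Expect
import Fuzz
import Test exposing (Test, fuzzWith)


-- Run this property 10,000 times instead of the default 100.
reverseTwiceIsIdentity : Test
reverseTwiceIsIdentity =
    fuzzWith { runs = 10000 }
        (Fuzz.list Fuzz.int)
        "reversing a list twice yields the original list"
        (\list -> List.reverse (List.reverse list) |> Expect.equal list)
```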
I think from a DX perspective, having it specified as a number of runs should really be considered an abstraction leak. Property tests assert that a condition holds for all inputs meeting some criteria; the implementation detail behind verifying that assertion is generating a certain number of samples, but the user doesn't necessarily have a good mental model of how many samples that should be (indeed, understanding this requires some fairly non-trivial statistics, as well as knowledge of implementation details of the fuzzers, etc.).
Here are some practical suggestions on how to improve this:
A separate issue that could be resolved much more quickly (and is also a breaking change) is that `Test.fuzzWith` expects an absolute number of runs. I think this is un-ergonomic, since it's a value one needs to keep adjusting. A nicer design would be a multiplier of the globally configured value. This could be used both for "this test is super slow, so let's not waste too much time testing it" and for "this test has highly variable behaviour, so let's spend a lot of our time exploring the input space", while still letting the test runner influence the total number of tests to run.
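To illustrate the multiplier idea, here is a hypothetical helper built on today's `Test.fuzzWith`. The `baseRuns` argument stands in for the globally configured run count, which the library would ideally supply itself (`fuzzWithMultiplier` does not exist in elm-explorations/test; this is only a sketch of the proposed design):

```elm
module MultiplierSketch exposing (fuzzWithMultiplier, slowTest)

import Expect exposing (Expectation)
import Fuzz exposing (Fuzzer)
import Test exposing (Test)


-- Hypothetical helper: derive a test's run count from a shared base
-- instead of hard-coding an absolute number per test.
fuzzWithMultiplier : Int -> Float -> Fuzzer a -> String -> (a -> Expectation) -> Test
fuzzWithMultiplier baseRuns multiplier fuzzer description toExpectation =
    Test.fuzzWith
        { runs = max 1 (round (toFloat baseRuns * multiplier)) }
        fuzzer
        description
        toExpectation


-- Usage: run this slow test at a quarter of the (here, hand-passed) base.
slowTest : Test
slowTest =
    fuzzWithMultiplier 10000 0.25 (Fuzz.list Fuzz.int) "sorting is idempotent" <|
        \list -> List.sort (List.sort list) |> Expect.equal (List.sort list)
```

In the real design the runner, not the test author, would own `baseRuns`, which is what makes the multiplier composable with a globally tuned budget.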