Testing input sizes (and distributions) #1

Open
JanMatas opened this issue Nov 11, 2017 · 1 comment

@JanMatas

Hi,

I was wondering if it is possible to publish approximate sizes of testing inputs or at least some statistics about the test set (median and quantiles), so we know what amount of work we will be dealing with in our algorithms.

Thanks!

@m8pple
Contributor

m8pple commented Nov 11, 2017

Given that the goal is to look for good scaling across the spectrum of problem sizes, there is no fixed set of test inputs, because I don't know in advance how fast the best solutions will be.

Instead, the approach taken is to give each group a time budget per puzzle. Puzzle instances of increasing size are then executed until the total per-puzzle time budget is exceeded, so there is no upper bound on the scale that will be tested. The spacing of the puzzle scales is chosen heuristically for each puzzle to give a reasonable spread and resolution of points, while still making it possible for fast implementations to reach large scales (the heuristic sometimes gets adjusted as things proceed and solutions get faster).
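The budgeted-scaling scheme above can be sketched roughly as follows. This is a hypothetical illustration, not the actual test harness: `solver`, `make_instance`, and the scale schedule are stand-in names.

```python
import time

def run_scaling_test(solver, make_instance, budget_seconds, scales):
    """Run puzzle instances of increasing scale until the total
    per-puzzle time budget is exhausted; return (scale, seconds) pairs.

    `scales` is a heuristically spaced, increasing schedule of
    instance sizes. There is no upper bound on scale other than
    the time budget itself.
    """
    results = []
    elapsed_total = 0.0
    for scale in scales:
        instance = make_instance(scale)
        start = time.perf_counter()
        solver(instance)
        elapsed = time.perf_counter() - start
        elapsed_total += elapsed
        results.append((scale, elapsed))
        if elapsed_total > budget_seconds:
            break  # budget exceeded: larger scales are not attempted
    return results
```

A fast implementation simply gets further through the schedule before the budget runs out, which is why there is no fixed largest test input.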

The time budget in the auto-tests is also smaller than in the final tests, purely for financial reasons. There are roughly 360 puzzles to evaluate across all groups, so allocating 30 seconds per puzzle takes about 10800 CPU seconds, or about 3 hours, which is probably about right for intermediate runs. For the final run it will be higher though, as turnaround time doesn't matter much - maybe 5-10 minutes per puzzle, so 1.25 - 2.5 CPU days.
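The budget arithmetic above works out as follows (using the approximate figures quoted, not exact harness settings):

```python
# ~360 puzzles across all groups
puzzles = 360

# Auto-test runs: 30 seconds per puzzle
auto_test_seconds = puzzles * 30           # 10800 CPU seconds
auto_test_hours = auto_test_seconds / 3600 # about 3 hours

# Final run: 5-10 minutes per puzzle, expressed in CPU days
final_low_days = puzzles * 5 * 60 / 86400   # 1.25 CPU days
final_high_days = puzzles * 10 * 60 / 86400 # 2.5 CPU days
```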

So trying to optimise for execution times longer than around 5 minutes per puzzle instance is probably not worth it, as evaluating them all in a controlled environment would be too expensive in time. However, different groups will probably achieve quite different scales within that sort of time budget, in some cases differing by a couple of orders of magnitude.
