I was wondering if it is possible to publish approximate sizes of testing inputs or at least some statistics about the test set (median and quantiles), so we know what amount of work we will be dealing with in our algorithms.
Thanks!
Given that the goal is to look for good scaling across the spectrum of problem sizes, there is no fixed set of test inputs, as I don't know how good the best solution is going to be. Instead, each group is given a time budget per puzzle: puzzle instances of increasing size are executed until the total per-puzzle time budget is exceeded, so there is no upper bound on the scale that will be tested. The spacing of the puzzle scales is chosen heuristically for each puzzle to give a reasonable spread and resolution of points, while still making it possible for fast implementations to reach large scales (the heuristic sometimes gets adjusted as things proceed and solutions get faster).
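The budgeted-scaling procedure described above can be sketched roughly as follows. This is a minimal illustration, not the actual harness: `solve`, `make_instance`, and the `scales` schedule are hypothetical stand-ins for whatever a given puzzle provides.

```python
import time

def run_scaling_benchmark(solve, make_instance, budget_seconds, scales):
    """Run puzzle instances of increasing size until the total
    per-puzzle time budget is exhausted.

    solve         -- hypothetical solver taking one puzzle instance
    make_instance -- hypothetical generator mapping a scale n to an instance
    scales        -- heuristically spaced sizes, e.g. a geometric sequence
    """
    results = []
    remaining = budget_seconds
    for n in scales:
        instance = make_instance(n)
        start = time.perf_counter()
        solve(instance)
        elapsed = time.perf_counter() - start
        results.append((n, elapsed))  # record (scale, runtime) point
        remaining -= elapsed
        if remaining <= 0:  # total per-puzzle budget exceeded: stop
            break
    return results
```

A geometric schedule such as `scales = [2**k for k in range(4, 30)]` gives the kind of spread mentioned above: fast solvers burn through the small instances quickly and reach large scales before the budget runs out, while slow ones stop early.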
The time budget in the auto-tests is also smaller than in the final tests, purely for cost reasons. There are roughly 360 puzzles to evaluate across all groups, so allocating 30 seconds per puzzle takes about 10,800 CPU seconds, or roughly 3 hours, which is probably about right for intermediate runs. The final run will use a higher budget, as turnaround time matters less there: maybe 5-10 minutes per puzzle, or 1.25-2.5 CPU days in total.
So trying to optimise for execution times longer than around 5 minutes per puzzle instance is probably not worth it, as evaluating such runs in a controlled environment is too expensive in time. However, different groups will probably reach quite different scales within that sort of time budget, in some cases differing by a couple of orders of magnitude.