Constructing Bleachermarks with dynamic Benchmarks #1
nthiery wrote in a private email exchange: a potential alternative to the syntax:
could be:
Potential advantages:
Miguel and I discussed the following simple solution:
This is very close to nthiery's suggestion, but moves behaviour from optional arguments into more explicit structures. It's always a question of where to strike the balance between the two.
I imagine that you mean something like this, right?:
That is, the InfiniteBenchmarkList is parametrized by n (which will correspond to the size of the input generated at the beginning of the pipeline), and each one of those benchmarks uses the runid. Is that what you meant?
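For concreteness, here is a minimal sketch of what such a parametrized construction could look like; InfiniteBenchmarkList is not implemented yet, so the names and signatures below are assumptions taken from this thread, not an existing API:
```
import itertools

class InfiniteBenchmarkList:
    """Hypothetical: lazily yields one pipeline per n = 0, 1, 2, ..."""
    def __init__(self, pipeline_factory):
        self._factory = pipeline_factory   # maps n to a list of functions

    def __iter__(self):
        return (self._factory(n) for n in itertools.count())

# Toy pipeline stages, standing in for real ones:
def make_data_gen(n):
    return lambda runid: runid % (2**n)    # "input of size n" from a run id

def square(x):
    return x * x

bs = InfiniteBenchmarkList(lambda n: [make_data_gen(n), square])
```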
Haha, I mean the third option :-D
That is,
Gotcha. What about a fourth option?:
Hmm, yes, possibly that's the easiest, since that lambda already crept into my other suggestion; I was trying to avoid the lambdas. Also, this suggestion would allow returning pipelines of different lengths for various n.
The syntax I proposed was just one of potentially many; it's good to explore alternatives. I like the:
I am not sure I see the point of having to create an intermediate InfiniteBenchmarkList, and I am not in favor of the lambda variants. Here is another variant:
In cases where one wants to make use of the run id, data_gen could take it as an optional argument.
Then bleachermark would detect that optional argument by introspection.
Cheers,
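A sketch of how that detection could work, assuming the convention is an optional parameter literally named `runid`; the helper names are illustrative, not part of any existing API:
```
import inspect

def wants_runid(func):
    # Inspect the function's signature for an optional `runid` parameter.
    try:
        return "runid" in inspect.signature(func).parameters
    except (TypeError, ValueError):
        return False    # builtins etc. without introspectable signatures

def call_data_gen(data_gen, runid):
    # Pass the run id only to generators that declare they want it.
    if wants_runid(data_gen):
        return data_gen(runid=runid)
    return data_gen()
```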
My idea is that if the user wants to use a range, he can just concretely instantiate the list of pipelines with a comprehension: short and pythonic.
We will probably want to disallow different-sized pipelines in a single Bleachermark.
We argued that we don't really lose out: when do you want to benchmark
How would I then construct an InfiniteBenchmarkList where the
That's a neat idea; we might be able to use it with whatever structure we end up with.
Best,
I am having a very hard time following you guys. I feel I'm missing some common context you may have developed at SD75 while I was in a different room. For example, I do not understand what you mean by "adaptive", and I do not understand why the list of benchmarks needs a special syntax.
This was something Nicolas came up with: consider that you are trying to gauge how the running time of a function grows with the input size. The twist: perhaps you don't know in advance how the function develops in n, so you don't know up front which sizes are worth trying. That's what's covered by "adaptive", and why it makes sense to consider an infinite benchmark list (unbounded input size).
Best,
The way we are considering it, two pipelines that differ only in the size of the "data" generated by the first pipeline element should be two Benchmarks, run in the same Bleachermark. A "benchmark" is really "a single instance of problem, algorithm (pipeline) and input size", which is called many times only to average over random inputs (of the given size). Every time we call the first function of a benchmark's pipeline, we are serving it a run id.
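A rough sketch of that notion of Benchmark, under the assumption (from this thread) that the run id is handed to the first pipeline function to derive the random input; the class is illustrative only:
```
class Benchmark:
    """One instance of problem, algorithm (pipeline) and input size."""
    def __init__(self, pipeline):
        self.pipeline = pipeline   # [data_gen, stage1, stage2, ...]

    def run(self, runid):
        # The first function receives the run id and derives from it
        # a random input of the benchmark's fixed size.
        data = self.pipeline[0](runid)
        for stage in self.pipeline[1:]:
            data = stage(data)
        return data
```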
Ok, I'm starting to get it.
Isn't this pure Python?
```
Bleachermark([data_gen(2^n), f] for n in itertools.count())
```
However, something like
In this case, wouldn't it be safer to pass an instance of a random number generator? Or maybe, less brutal, a hex string.
On Tue, Sep 13, 2016 at 12:28:59AM -0700, Johan Rosenkilde wrote:
+1
+1. Luckily I did not mean to abuse that run_id :-) Let me try to clarify.
Cheers,
On Tue, Sep 13, 2016 at 05:17:15AM -0700, Luca De Feo wrote:
> Side note: with this syntax BleacherMark can't know that all the benchmarks share the same pipeline.

Yes, itertools.count() is a reasonable alternative to NN in Python.
What more can (realistically) BleacherMark do when it knows, compared to when it does not?
Mainly, store and present the information in a more coherent way (a homogeneous table, with meaningful labels for columns), which would also help with the basic statistical treatment.
> Mainly, store and present the information in a more coherent way (a homogeneous table, with meaningful labels for columns), which would also help with the basic statistical treatment.
Ok, but this has nothing specific to infinite benchmarks, so it can be
used against this argument of Johan's:
> My idea is that if the user wants to use a range, he can just concretely instantiate the list of pipelines with a comprehension: short and pythonic.
I personally have no opinion on what's the best option: Johan's wish
to keep things pythonic is obviously sane, but if being pythonic
limits BleacherMark, then I like Nicolas' `range=` proposal.
The only thing I'm 100% against is using two different idioms for
finite and infinite benchmarks.
Are we talking about two different notions of "the same pipeline" here? Miguel and I have been talking about the Bleachermark assuming that each pipeline has the same length and the parts are pairwise comparable: e.g. each Benchmark has the pipeline:
But the different Benchmarks might be different codes, different decoders and/or different channels. Nicolas seems to be talking about a much stronger notion of "the same", namely that it's actually the same list of functions, where the first function takes a "size" (or index) argument. That seems to fit Nicolas' use case fine, but I don't see how it could support e.g. the coding use case, where it could be any or all elements of the pipeline that differ. A pure algebra example could be cut-off points in an algorithm:
```
def multiply(cutoff):
    # returns a multiplication routine using the given cut-off point
    ...

bs = InfiniteBenchmarkList(lambda cutoff: [ gen_100_bit_numbers, multiply(cutoff) ])
```
Now Luca just replied, and I'll reply to that in a second :-P
I didn't know about itertools.count.
My suggestion is that Bleachermark takes an iterable of Benchmarks or list of functions (in the latter case, Bleachermark wraps the list of functions in Benchmarks itself).
My current problem with Nicolas' suggestion is that I don't see how it covers anything other than his use case.
```
def count():
    i = 0
    while True:
        yield i
        i += 1
```
If you need introspection, you can make a tiny wrapper around it.
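Such a wrapper might look like the following sketch (nothing like it exists in itertools itself; the attribute name is made up):
```
import itertools

class InfiniteCount:
    """itertools.count, plus a flag that consumers can introspect."""
    is_infinite = True

    def __iter__(self):
        return itertools.count()
```
A consumer could then check `getattr(iterable, "is_infinite", False)` before choosing a benchmarking strategy.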
Wrapping only itertools.count is not enough if you also want to make list comprehensions ranging over it. That was our motivation for the constructor InfiniteBenchmarkList: basically a list comprehension that knows it's infinite.
> Wrapping only itertools.count is not enough if you also want to make list comprehensions ranging over it. That was our motivation for the constructor InfiniteBenchmarkList: basically a list comprehension that knows it's infinite.
Right. This is better: you can test for `__len__()`. If the iterable
has it, and if it returns a finite value, then the iterable is finite.
If it does not, then it may be finite or infinite, and in this case you
can warn.
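In code, that test could be the following sketch:
```
import warnings

def declared_length(iterable):
    """Return a finite length if the iterable advertises one."""
    try:
        return len(iterable)        # lists, tuples, ranges, ...
    except TypeError:
        # No __len__: the iterable may be finite or infinite.
        warnings.warn("iterable may be infinite; cannot verify")
        return None
```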
Hmm, yes, except that then we throw warnings also in the case where the user passes a perfectly finite generator. Your suggestion is "safe": it warns on all the potentially bad cases, but also on some harmless ones. With an InfiniteBenchmarkList we (could) warn only if the user combines a declared-infinite list with a strategy that makes no sense for it. That being said, I think the arguments for InfiniteBenchmarkList are getting weaker.
Best,
On Tue, Sep 13, 2016 at 05:40:03AM -0700, miguelmarco wrote:
More than this in fact: knowing that the benchmarks are indexed by NN means that Bleachermark can try all sorts of adaptive heuristics to run only a meaningful subset of the benchmarks, a very crude one being to only try sizes of the form n=2^k or n=floor(a^k). Sure, this information could be retrieved from the hint="polynomial" optional argument as well, but still, it seems silly to have to run the iterator for all the intermediate values. Sure, this is a special case, but such a common one that it deserves a special treatment.
Cheers,
On Tue, Sep 13, 2016 at 05:48:46AM -0700, Johan Rosenkilde wrote:
Just to clarify: I totally see the point of Bleachermark supporting collections of benchmarks with different pipelines. I am just saying that the special case of a collection of benchmarks sharing the exact same pipeline and differing just by the "size" argument fed to the first function probably deserves special casing. Let me call this a homogeneous benchmark. I agree that it's probably not worth it to special-case infinite benchmark lists, though.
Cheers,
> More than this in fact: knowing that the benchmarks are indexed by NN
> means that Bleachermark can try all sorts of adaptive heuristics to run
> only a meaningful subset of the benchmarks, a very crude one being to
> only try sizes of the form n=2^k or n=floor(a^k). Sure, this
> information could be retrieved from the hint="polynomial" optional
> argument as well, but still, it seems silly to have to run the
> iterator for all the intermediate values.
That's not restricted to what you call homogeneous benchmarks at all. In
my example with multiplication you'd like exactly the same feature.
I do agree that an unfortunate side-effect of the iterator scheme is
that we would have to compute and locally cache every pipeline up to
the maximal size considered. I imagine that the adaptive strategy will
often run the benchmarks quite sparsely.
> Just to clarify: I totally see the point of Bleachermark supporting
> collections of benchmarks with different pipelines. I am just saying
> that the special case of a collection of benchmarks sharing the exact
> same pipeline and differing just by the "size" argument fed to the
> first function probably deserves special casing. Let me call this a
> homogeneous benchmark.
I'm not convinced that homogeneous vs inhomogeneous deserves special
casing. I still don't see what the Bleachermark gains from knowing that
the pipeline is always exactly the same (on top of knowing that the
pipelines are element-wise semantically comparable, as already
described).
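As a concrete illustration of the crude heuristic quoted above (trying only sizes of the form n = 2^k), here is a sketch; `run_benchmark` and the time budget are hypothetical:
```
import time

def run_doubling_sizes(run_benchmark, time_budget=60.0):
    # Run only sizes of the form n = 2^k until the budget is exhausted.
    spent, k = 0.0, 0
    while spent < time_budget:
        n = 2**k
        t0 = time.perf_counter()
        run_benchmark(n)            # one run of the benchmark at size n
        spent += time.perf_counter() - t0
        k += 1
```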
I have a new suggestion for supporting infinite Bleachermark. Just give
the pipeline constructor directly to the Bleachermark:
```
def gen_data(n):
    return lambda: randint(0, 2^n)   # random integer of about n bits

def f(X):
    return my_factoring_algo(X)

B = Bleachermark(lambda n: [ gen_data(n), f ])
B.run(strategy="adaptive", hint="exponential")
```
Best,
Johan
I am thinking that a finite bleachermark will almost always be constructed from lists or tuples, whereas infinite ones will be constructed from other iterators. We could use that to let the bleachermark know how to treat each case.
> I am thinking that a finite bleachermark will almost always be constructed from
> lists or tuples, whereas infinite ones will be constructed from other iterators.
> We could use that to let the bleachermark know how to treat each case.
One catch is that generator expressions are written like tuples but behave
exactly like an infinite iterator would:
```
iter_finite = ( [ gen_data(n), f ] for n in range(1, 10) )
iter_infinite = ( [ gen_data(n), f ] for n in itertools.count() )
```
The user might easily write the former instead of using `[ ... ]`.
An infinite Bleachermark has to include a lambda in some form, whatever
you do. We should just give that lambda directly to the Bleachermark
without wrappers or ado -- cut out the middle-man, as it were. I think
my latest suggestion is the simplest fully flexible and unambiguous
syntax for achieving this.
Thus I propose supporting basically two syntaxes: 1) explicit, finite
iterable, and 2) a lambda implying an infinite Bleachermark.
```
B_fin = Bleachermark([ gen_data(n), f ] for n in range(1, 10))
B_inf = Bleachermark(lambda n: [ gen_data(n), f ])
```
The first iterable is fully expanded at construction time. The second is
automatically treated as the infinite (for n in NN) Bleachermark. The
Bleachermark constructor wraps the pipelines in Benchmark objects
itself, so users usually never deal with Benchmarks directly.
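The constructor dispatch could then be as simple as this sketch (the Benchmark wrapper is a minimal stand-in, as in the earlier sketches in this thread):
```
import itertools

class Benchmark:
    """Minimal stand-in: wraps one pipeline."""
    def __init__(self, pipeline):
        self.pipeline = pipeline

class Bleachermark:
    def __init__(self, benchmarks):
        if callable(benchmarks):
            # A function n -> pipeline: the infinite case, indexed by NN.
            self.finite = False
            self._benchmarks = (Benchmark(benchmarks(n))
                                for n in itertools.count())
        else:
            # Any explicit iterable is fully expanded at construction time.
            self.finite = True
            self._benchmarks = [Benchmark(p) for p in benchmarks]
```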
I have been thinking about how to implement these nested schedulers, and so far I haven't found a satisfying design. My idea was that a scheduler is just some object that exposes a .run() method that returns a specific run of a specific benchmark each time it is called. The scheduler, in turn, relies on the .run() method of other schedulers to do its job, with benchmarks being the "primitive schedulers" in this model. Imagine the following (simple) scenario: we want to combine the RunN scheduler (which runs its arguments N times, with consecutive run ids) with RepeatN (which repeats the same run id several times). So, for instance,
would run something like benchmark1 with runid 0, ... whereas
would run benchmark1 with runid 0, ... So, if we want this behaviour, RunN has to pass some argument (runid) to RepeatN in the first case, whereas in the second case RunN doesn't make use of any such argument. But I think we should have a common interface for these schedulers, so that it is possible to nest them in all possible ways (although some nestings might not make any sense). So, what do you think? Should we include an argument in this interface and just make most schedulers ignore it?
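A sketch of what such a common interface could look like, with a runid argument that some schedulers use and others simply ignore; the names follow the RunN/RepeatN example above, but everything else here is made up:
```
class RunN:
    """Runs its child N times, generating consecutive run ids itself."""
    def __init__(self, child, n):
        self.child, self.n = child, n

    def run(self, runid=None):
        # Ignores any runid passed in from an enclosing scheduler.
        return [self.child.run(runid=i) for i in range(self.n)]

class RepeatN:
    """Repeats its child N times with the *same* run id."""
    def __init__(self, child, n):
        self.child, self.n = child, n

    def run(self, runid=None):
        # Forwards the runid it was given by the enclosing scheduler.
        return [self.child.run(runid=runid) for _ in range(self.n)]
```
With this, both nestings accept the same call signature, even if one of them makes little sense.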
Thanks for getting back to this, and sorry for being silent for so long.
Exactly my thought too.
I don't think you would ever want to repeat the same run id.
We should get a t-shirt with that. More precisely, a run id is personal to the "primitive schedulers". Perhaps it's easier to imagine that the runids are not even there. The semantics of RunN should, IMO, be to ask its benchmark (singular) for its runs.
RunN should not accept multiple benchmarks -- only a single one.
Except of course that RepeatN should not exist ;-)
I agree. The interface I propose has runids only in the return value
(or possibly, it will just be an iterator). A Benchmark has two methods:
The task master thread will keep querying the top-level scheduler for more runs.
Best,
Hi guys, I have done some occasional thinking on this. I am making some experiments with a design where schedulers are just iterators (so, no .run() method, just .next()). They are created by passing other iterators, and they just call those iterators in the appropriate order. The runids are then handled internally by benchmarks (so each benchmark takes care of being run each time with a different runid, which might be troublesome in parallel settings...). A basic example could be
So `Balanced(RunN(b, 4) for b in benchmarks)` would just run each benchmark once until all benchmarks have been run 4 times. I am not sure if this is the right design, but we have gone through so many ideas that I cannot even remember the pros and cons of each one. Thoughts?
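A minimal sketch of that iterator-based design, following the Balanced/RunN example above; this is experimental, as stated, and the Benchmark internals (in particular the internal runid counter) are assumptions:
```
class Benchmark:
    def __init__(self, pipeline):
        self.pipeline = pipeline
        self._next_runid = 0            # runids handled internally

    def run(self):
        runid, self._next_runid = self._next_runid, self._next_runid + 1
        data = self.pipeline[0](runid)
        for stage in self.pipeline[1:]:
            data = stage(data)
        return (runid, data)

def RunN(benchmark, n):
    # A scheduler is just an iterator: here, n runs of one benchmark.
    return (benchmark.run() for _ in range(n))

def Balanced(schedulers):
    # Interleave: one item from each child until all are exhausted.
    children = [iter(s) for s in schedulers]
    while children:
        for it in list(children):
            try:
                yield next(it)
            except StopIteration:
                children.remove(it)

# results = list(Balanced(RunN(b, 4) for b in benchmarks))
```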
It can be beneficial to specify an infinite list of benchmarks. Use cases include:
This issue is to discuss what a reasonable syntax for supporting this is, in the constructor of Bleachermark and possibly elsewhere. A related discussion is whether to (attempt to) protect the user from combining this with benchmarking strategies in a non-sensical way (e.g. interleaved benchmarking on an infinite list makes no sense).