Constructing Bleachermarks with dynamic Benchmarks #1

johanrosenkilde · 2016-09-12T08:37:56Z

It can be beneficial to specify an infinite list as benchmarks. Use cases include

Adaptatively run benchmarks for X seconds
Sequentially run benchmarks for X seconds, each benchmark run for Y seconds.

This issue is to discuss what a reasonable syntax for supporting this is, in the construct of Bleachermark and possibly elsewhere.

A related discussion is whether to (attempt to) protect the user from combining this with benchmarking strategies in a non-sensical way (e.g. interleaved benchmarking on an infinite list makes no sense).

The text was updated successfully, but these errors were encountered:

johanrosenkilde · 2016-09-12T08:46:25Z

nthiery wrote in private email exchange:

A potential alternative to the syntax:

    def my_benchmark(n): return Benchmark([data_generator, sort])
    BleacherMark(my_benchmark, adaptative=True, total_time=...)

could be:

    BleacherMark([data_generator, sort], range=iterable, adaptative=True, ...)

Potential advantages:

BleacherMark gets more information: it knows it's always the same pipeline.
Passing a range gives flexibility. Typical useful values:
- range(10)
- NN (in Sage ...)
- [2^n for n in range(10)]
- (2^n for n in NN)
adaptative="polynomial" could then provide a reasonable default
value for the range, like (2^n for n in NN)

johanrosenkilde · 2016-09-12T08:48:49Z

Miguel and I discussed the following simple solution:

A Bleachermark takes any iterable of iterables (we discussed aside that users should possibly not be forced to wrap pipelines in Benchmarks themselves).
An iterable can be infinite. We provide a simple class for providing such an infinite list, e.g.

In [1]: def data_gen(n):
            # generation of input data
            return randomint(2^n)
In [2]: def f(x):
            # computation I wish to benchmark
            return sqrt(x)
In [3]: bs = InfiniteBenchmarkList([ data_gen, lambda n: f ])
In [4]: B = Bleachermark(bs)
In [5]: B.strategy("adaptive", hint="polynomial_time")

This is very close to nthiery's suggestion, but takes behaviour from optional arguments into more explicit structures. It's always a choice where to strike the balance between those two.

miguelmarco · 2016-09-12T09:48:26Z

I imagine that you mean something like this, right?:

In [1]: def data_gen(n):
            def genrandom(i):
                # generation of input data
                random.seed(i)
                return randomint(2^n)
            return genrandom
In [2]: def f(x):
            # computation I wish to benchmark
            return sqrt(x)
In [3]: bs = InfiniteBenchmarkList([ data_gen, f])

That is, the InfiniteBenchmarkList is parametrized by n (that will correspond to the size of the input generated at the beginning of the pipeline. And each one of those benchmarks uses the runid (i) to set the random seed.

Is that what you meant?

johanrosenkilde · 2016-09-12T10:02:33Z

Haha, I mean the third option :-D

<what you wrote above>
In [3]: bs = InfiniteBenchmarkList([ data_gen, lambda n: f])

That is, InfiniteBenchmarkList wouldn't be able to distinguish f(x) from lambda n: f, so it couldn't know whether to apply n to it's arguments or not. So it should always do so, in which case you have to wrap a "constant" pipeline function in a lambda.

miguelmarco · 2016-09-12T11:30:18Z

Gotcha. What about a fourth option?:

In [3]: bs = InfiniteBenchmarkList(lambda n : [ data_gen(n),  f])

johanrosenkilde · 2016-09-12T11:37:46Z

Hmm, yes, possibly that's the easiest since that lambda already crept into my other suggestion. I was trying to avoid the lambdas. Also, this suggestion would allow returning different length pipelines for various n. But of course, we're semantically disallowing that anyway, so perhaps it's not imperative that we syntactically disallow it as well.

nthiery · 2016-09-12T18:40:02Z

Hi,

The syntax I proposed was just one of potentially many; it's good to
explore around!

I like the:

    B.strategy("adaptive", hint="polynomial_time").

I am not sure I see the point of having to create an intermediate
InfiniteBenchmarkList object (at least from a user interface point of
view; it could of course be useful internally). Also specifying a
range seems more flexible; but maybe that can be added later, with
"Infinite" above meaning for now "indexed by natural integers".

I am not in favor of the lambda n: xxx: they introduces some
additional complexity for the caller and hide information to
bleachermark. For example, the fact that the pipeline is explicitly
always the same is actually a feature that bleachermark may actually
want to make use of.

Here is another variant:

In [1]: def data_gen(n):
            return randomint(2^n)
In [2]: def f(x):
            return sqrt(x)
In [3]: bs = InfiniteBenchmarkList([ data_gen, f])

In cases where one wants to make use of the run id, data_gen could
take an optional argument:

In [1]: def data_gen(n, run_id=i):
            random.seed(i)
            return randomint(2^n)

Then bleachermark would detect that optional argument by
introspection, and pass down the run id.

Cheers,

johanrosenkilde · 2016-09-12T19:04:54Z

I am not sure I see the point of having to create an intermediate
InfiniteBenchmarkList object (at least from a user interface point of
view; it could of course be useful internally). Also specifying a
range seems more flexible; but maybe that can be added later, with
"Infinite" above meaning for now "indexed by natural integers".

My idea is that if the user wants to use a range, he can just concretely
instantiate the list of pipelines with a comprehension: short and
pythonic. The construction for InfiniteBenchmarkList is basically as a
convenient way to make an (enriched) generating expression ranging over
NN (which is not available in pure Python).

I am not in favor of the lambda n: xxx: they introduces some
additional complexity for the caller and hide information to
bleachermark. For example, the fact that the pipeline is explicitly
always the same is actually a feature that bleachermark may actually
want to make use of.

We will probably want to disallow different-sized pipelines in a
Bleachermark in any case. We had a discussion about this, but assuming
that the pipeline always has the same length and same "semantics", we
can simplify a lot:

We can assume the one set of labels across all benchmarks. So the
labels will be stored in the Bleachermark.
Data presentation is vastly simplified since the data fits much better
in a tabular way.

We argued that we don't really lose out: when do you want to benchmark
across two very different pipelines while still comparing the data? It's
probably possible to come up with something, but in most of those cases
you could probably modulate your "1-pipeline-fits-all" to encompass
both. For instance, if comparing two algorithms, one which had an
intermediate "optimise data" step; the other algorithm could just be
said to have the trivial "optimise data" step.

Here is another variant:

In [1]: def data_gen(n):
            return randomint(2^n)
In [2]: def f(x):
            return sqrt(x)
In [3]: bs = InfiniteBenchmarkList([ data_gen, f])

How would I then construct an InfiniteBenchmarkList where the
f-functions also depend on n?

In cases where one wants to make use of the run id, data_gen could
take an optional argument:
In [1]: def data_gen(n, run_id=i):
            random.seed(i)
            return randomint(2^n)
Then bleachermark would detect that optional argument by
introspection, and pass down the run id.

That's a neat idea we might be able to use whatever structure we end up
with. I've been annoyed at those run_ids all along - Miguel and I
discussed for an hour or so whether to have them at all ;-)

Best,
Johan

defeo · 2016-09-12T21:57:04Z

I am having a very hard time following you guys. I feel I'm missing some common context you may have developed at SD75 while I was in a different room.

For example, I do not understand what you mean by "adaptive".

I do not understand why the list of Benchmarks should be infinite, whereas Nicolas' use case, if I understand correctly, simply asks to run the same pipeline with inputs coming from an arbitrary iterable, rather than just from [0,N].

A syntax B.run(range=<iterable>), with B.run(N=100) being an alias for B.run(range=range(100)), seems good enough to me.

johanrosenkilde · 2016-09-13T07:24:16Z

For example, I do not understand what you mean by "adaptive".

This was something Nicolas came up with: consider that you are trying to gauge
how (the running time of) some algorithm develops on larger and larger input. So you'd like to run the algorithm on a "representative set" of input sizes that allows you to plot a size/speed curve with nicely spaced points.

The twist: 1) perhaps you don't know in advance how the function develops (n,
n^2, n^3, 2^n?), and you don't know how long you are willing to run the
computaiton. What you'd like is to just start running the benchmarks, go get a
cup of coffee, come back and interrupt the computation and inspect.

That's what's covered by "Adaptive" and why it makes sense to consider an infinite benchmark list (unbounded input size).

Best,
Johan

johanrosenkilde · 2016-09-13T07:28:59Z

The way we are considering it, two pipelines that differ only in the size of the "data" that is generated by the first pipeline element, they should be two Benchmarks, run in the same Bleachermark. A "benchmark" is really "a single instance of problem, algorithm (pipeline) and input size" which is called many times only to average over random inputs (of the given size).

Every time we call the first function of a benchmark's pipeline we are serving it a run_id. But this is only supposed to seed a random generator, look up a line number in a data input file, or similar. In particular, don't use it for encoding size of problem. That's abuse.

Thus Bleachermark.run(range=<iterable>) really has no sense. The range could range over run_id but that would only increase the probability that people will abuse the run_id for encoding sizes and such :-)

defeo · 2016-09-13T12:17:15Z

Ok, I'm starting to get it.

The construction for InfiniteBenchmarkList is basically as a convenient way to make an (enriched) generating expression ranging over NN (which is not available in pure Python).

Isn't this pure python ?

Bleachermark([data_gen(2^n), f] for n in itertools.count())

However something like InfinteBenchmarkList could be a handy utility function to generate the above.

Every time we call the first function of a benchmark's pipeline we are serving it a run_id. But this is only supposed to seed a random generator, look up a line number in a data input file, or similar. In particular, don't use it for encoding size of problem. That's abuse.

In this case, wouldn't it be safer to pass an instance of random.Random to the pipeline? That would prevent any abuse.

Or maybe, less brutal, a hex string.

nthiery · 2016-09-13T12:24:09Z

On Tue, Sep 13, 2016 at 12:28:59AM -0700, Johan Rosenkilde wrote:

The way we are considering it, two pipelines that differ only in the
size of the "data" that is generated by the first pipeline element,
they should be two Benchmarks, run in the same Bleachermark. A
"benchmark" is really "a single instance of problem, algorithm
(pipeline) and input size" which is called many times only to average
over random inputs (of the given size).

+1

Every time we call the first function of a benchmark's pipeline we are
serving it a run_id. But this is only supposed to seed a random
generator, look up a line number in a data input file, or similar. In
particular, don't use it for encoding size of problem. That's abuse.

+1

Luckily I did not mean to abuse that run_id :-) Let me try to clarify
this with a documentation. Note that this is about the semantic, not
the specific syntax:

BlearcherMark(pipeline, range=iterable)

    pipeline -- a pipeline of functions
    iterable -- an iterable of data

Return a collection of benchmarks sharing the same pipeline and
differing only by the data fed to the first item of the pipeline.

The first item `f` of the pipeline shall take a mandatory
argument. It can also accept a ``run_id`` optional argument.
Typically `f` takes as mandatory input a size (e.g represented by a
non negative integer) and generates some random data for that size.

EXAMPLES::

    sage: B = BleacherMark([random_list, sort], range=NN)
    sage: B.run(adaptative=True, hint="polynomial")

    sage: def data_gen(n, run_id=None):
    ....:     random.seed(run_id)
    ....:     return ...(n)
    sage: B = BleacherMark([data_gen, f], range=...)

Cheers,

nthiery · 2016-09-13T12:31:54Z

On Tue, Sep 13, 2016 at 05:17:15AM -0700, Luca De Feo wrote:

Bleachermark([data_gen(2^n), f] for n in itertools.count())

Side note: with this syntax BleacherMark can't know that all
benchmarks share the same pipeline.

Yes, itertools.count() is a reasonable alternative to NN in Python, as
long as BleacherMark does not try to detect that the range is NN to do
something specific in that case.

defeo · 2016-09-13T12:35:33Z

Side note: with this syntax BleacherMark can't know that all benchmarks share the same pipeline.

What more can (realistically) BleacherMark do when it knows, compared to when it does not?

miguelmarco · 2016-09-13T12:40:02Z

What more can (realistically) BleacherMark do when it knows, compared to when it does not?

Mainly, store and present the information in a more coherent way (a homogenous table, with meaningful labels for columns), which would also help with the basic statistical treatment.

defeo · 2016-09-13T12:47:30Z

Mainly, store and present the information in a more coherent way (a homogenous table, with meaningful labels for columns), which would also help with the basic statistical treatment.

Ok, but this has nothing specific to infinite benchmarks, so it can be used against this argument of Johan's:

My idea is that if the user wants to use a range, he can just concretely instantiate the list of pipelines with a comprehension: short and pythonic.

I personally have no opinion on what's the best option: Johan's will to keep things pythonic is obviously sane, but if being pythonic limits BleacherMark, then I like Nicolas' `range=` proposal. The only thing I'm 100% against, is using two different idioms for finite and infinite benchmarks.

johanrosenkilde · 2016-09-13T12:48:45Z

What more can (realistically) BleacherMark do when it knows, compared to when it does not?

Mainly, store and present the information in a more coherent way (a homogenous table, with meaningful > labels for columns), which would also help with the basic statistical treatment.

Are we talking about two different notions of "the same pipeline" here? Miguel and I have been talking about the Bleachermark assuming that each pipeline has the same length and the parts are pairwise comparable: e.g. each Benchmark has the pipeline:

   message_gen -> encoding -> channel -> decoding

But the different Benchmarks might be different codes, different decoders and/or different channels.

Nicolas seems to be talking about a much stronger notion of "the same", namely that it's actually the same list of functions, and where the first function takes a "size" (or index) argument.

That seems to fit Nicolas' use case fine, but I don't see how it could support e.g. the coding use case, where it could any or all elements of the pipeline which are different.

An pure algebra example could be cut-off points in an algorithm:

def multiply(cutoff):
def mult_karatsuba_w_cutoff_to_classical(a, b):
# multiply a and b, use karatsuba recursively until size < cutoff
return mult_karatsuba_w_cutoff_to_classical

bs = InfiniteBenchmarkList(lambda cutoff: [ gen_100_bit_numbers, multiply(cutoff) ])
B = Bleachermark(bs)

Now Luca just replied, and I'll reply to that in a second :-P

johanrosenkilde · 2016-09-13T12:54:08Z

Bleachermark([data_gen(2^n), f] for n in itertools.count())

I didn't know itertools.count() -- that's cool. As Nicolas said, it has limited introspection, which means we would have no way of detecting an infinite list (and hence perhaps warn the user if his testing strategy is senseless for an infinite list).

The only thing I'm 100% against, is using two different idioms for finite and infinite benchmarks.

My suggestion is that Bleachermark takes an iterable of Benchmarks or list of functions (in the latter case, Bleachermark wraps the list of functions in Benchmarks itself).

InfiniteList was just a suggestion to support infinite iterables before I knew of itertools.count() - so it's not really a different idiom. I'm not sure I'm for it anymore. The reason should be for introspection purposes. In any case, itertools.count() would still work but you would just not give Bleachermark a chance to check sanity etc.

My current problem with Nicolas' suggestion is that I don't see how it covers anything else than his use case.

defeo · 2016-09-13T13:15:07Z

InfiniteList was just a suggestion to support infinite iterables before I knew of itertools.count() - so it's not really a different idiom. I'm not sure I'm for it anymore. The reason should be for introspection purposes. In any case, itertools.count() would still work but you would just not give Bleachermark a chance to check sanity etc.

itertools.count() is nothing else than

def count():
    i = 0
    while True:
        yield i
        i += 1

If you need introspection, you can make a tiny wrapper around it in
BleacherMark.

johanrosenkilde · 2016-09-13T13:24:58Z

If you need introspection, you can make a tiny wrapper around it in BleacherMark.

Wrapping only itertools.count is not enough if you also want to make list comprehensions ranging over it. That was our motivation for the constructor InfiniteBenchmarkList: basically a list comprehension that knows it's infinite.

defeo · 2016-09-13T13:30:34Z

Wrapping only itertools.count is not enough if you also want to make list comprehensions ranging over it. That was our motivation for the constructor InfiniteBenchmarkList: basically a list comprehension that knows it's infinite.

Right. This is better: you can test for `__len__()`. If the iterable has it, and if it returns a finite value, then the iterable is finite. If it does not, then it may be finite or infinte, and in this case you can warn.

johanrosenkilde · 2016-09-13T13:42:17Z

Wrapping only itertools.count is not enough if you also want to make list comprehensions ranging over it. That was our motivation for the constructor InfiniteBenchmarkList: basically a list comprehension that knows it's infinite.

Right. This is better: you can test for __len__(). If the iterable
has it, and if it returns a finite value, then the iterable is finite.
If it does not, then it may be finite or infinte, and in this case you
can warn.

Hmm, yes, except that then we throw warnings also in the case where the
user used soft parentheses instead of brackets for his finite
comprehension.

Your suggestion is "safe": it warns on all the potentially bad cases,
and many more.

With an InfiniteBenchmarkList we (could) warn only if the user
explicitly declared "infinite" using that constructor, but choose not to
warn otherwise since "it's probably finite, or the user knows what he's
doing".

That being said, I think the arguments for InfiniteBenchmarkList is
running out. I never cared too much for it anyway; I like writing ( (f, g) for n in NN). I just didn't know about itertools.count() :-)

Best,
Johan

nthiery · 2016-09-13T16:55:18Z

On Tue, Sep 13, 2016 at 05:40:03AM -0700, miguelmarco wrote:

 What more can (realistically) BleacherMark do when it knows,
 compared to when it does not?
Mainly, store and present the information in a more coherent way (a
homogenous table, with meaningful labels for columns), which would also
help with the basic statistical treatment.

More than this in fact: knowing that the benchmarks are indexed by NN
means that Beachermark can try all sorts of adaptive heuristics to run
only a meaningful subset of the benchmarks, a very crude one being to
only try sizes of the form n=2^k or n=floor(a^k). Sure, this
information could be retrieved from the hint="polynomial" optional
argument as well, but still, it seems silly to have to run the
iterator for all the intermediate values.

Sure, this is a special case, but a so common one that it deserves a
specific treatment.

Cheers,

nthiery · 2016-09-13T17:02:03Z

On Tue, Sep 13, 2016 at 05:48:46AM -0700, Johan Rosenkilde wrote:

Nicolas seems to be talking about a much stronger notion of "the same",
namely that it's actually the same list of functions, and where the
first function takes a "size" (or index) argument.

That seems to fit Nicolas' use case fine, but I don't see how it could
support e.g. the coding use case, where it could any or all elements of
the pipeline which are different.

Just to clarify: I totally see the point of Beachermark supporting
collections of benchmarks with different pipelines. I am just saying
that the special case of a collection of benchmarks sharing the exact
same pipeline and differing just by the "size" argument fed to the
first function probably deserves special casing. Let me call this a
homogenous beachmark.

I agree that it's probably not worth it to special case infinite
homogeneous beachermarks vs finite ones. On the other hand, it's
probably worth special casing homogenenous beachermarks indexed by
NN. Or at least making appropriate provisions in the syntax to allow
for such special casing.

Cheers,

johanrosenkilde · 2016-09-14T08:45:22Z

More than this in fact: knowing that the benchmarks are indexed by NN means that Beachermark can try all sorts of adaptive heuristics to run only a meaningful subset of the benchmarks, a very crude one being to only try sizes of the form n=2^k or n=floor(a^k). Sure, this information could be retrieved from the hint="polynomial" optional argument as well, but still, it seems silly to have to run the iterator for all the intermediate values.

That's not restricted to what you call homogeneous benchmarks at all. In my example with multiplication you'd like exactly the same feature. I do agree that an unfortunate side-effect of the iterator scheme is that we would have to compute and locally cache every pipeline up to the maximally considered. I imagine that the adaptive strategy will often quite sparsely run the benchmarks.

Just to clarify: I totally see the point of Beachermark supporting collections of benchmarks with different pipelines. I am just saying that the special case of a collection of benchmarks sharing the exact same pipeline and differing just by the "size" argument fed to the first function probably deserves special casing. Let me call this a homogenous beachmark.

I'm not convinced that homogeneous vs inhomogeneous deserves special casing. I still don't see what the Bleachermark gains from knowing that the pipeline is always exactly the same (on top of knowing that the pipeline are element-wise semantically comparable, as already described). I have a new suggestion for supporting infinite Bleachermark. Just give the pipeline constructor directly to the Bleachermark: ``` def gen_data(n): return lambda (): randint(2^n) def f(X): my_factoring_algo(X) B = Bleachermark(lambda n: [ gen_data(n), f ]) B.run(strategy="adaptive", hint="exponential") ``` Best, Johan

miguelmarco · 2016-09-16T09:36:23Z

I am thinking that finite bleachermark will almost always be constructed from lists or tuples, whereas infinite ones will be constructed from other iterators. We could use that to let the bleachermark know if it must treat the case one way or another.

johanrosenkilde · 2016-09-19T06:20:11Z

I am thinking that finite bleachermark will almost always be constructed from lists or tuples, whereas infinite ones will be constructed from other iterators. We could use that to let the bleachermark know if it must treat the case one way or another.

One catch is that generating expressions write like tuples but behave exactly like an infinite iterator would: ``` iter_finite = ( [ gen_data(n), f ] for n in range(1, 10) ) iter_infinite = ( [ gen_data(n), f ] for n in itertools.count() ) ``` The user might easily write the former instead of using `[ ... ]`. An infinite Bleachermark has to include a lambda in some form, whatever you do. We should just give that lambda directly to the Bleachermark without wrappers or ados -- cut out the middle-man as it were. I think my latest suggestion is the simplest while fully flexible, unambiguous syntax for achieving this. Thus I propose supporting basically two syntaxes: 1) explicit, finite iterable, and 2) a lambda implying an infinite Bleachermark. ``` B_fin = Bleachermark([ gen_data(n), f ] for n in range(1, 10)) B_inf = Bleachermark(lambda n: [ gen_data(n), f ]) ``` The first iterable is fully expanded on construction time. The second is automatically treated as the infinite (for n in NN) Bleachermark. The Bleachermark constructor wraps the pipelines in Benchmark objects itself, so users usually never deal with Benchmarks directly.

miguelmarco · 2016-10-20T11:15:38Z

I have been thinking about how to implement these nested schedulers, and so far I haven't found a satisfying design.

My idea was that a scheduler is just some object that exposes a .run() method that will return a specific run of a specific benchmark each time it is called. The scheduler, in its turn, will rely on the .run() method of other schedulers to do its job, being benchmarks the "primitive schedulers" in this model.

Imagine the following (simple) scenario: we want to combine the RunN scheduler (that runs its arguments N times, with consecutive run id's) with the RepeatN (that repeats the same run id several times).

So, for instance

RunN(RepeatN(b, 2) for b in benchmarks, 5)

would run something like:

benchmark1 with runid 0
benchmark1 with runid 0
benchmark2 with runid 0
benchmark2 with runid 0
benchmark1 with runid 1
benchmark1 with runid 1
benchmark2 with runid 1
benchmark2 with runid 1
...
benchmark1 with runid 4
benchmark1 with runid 4
benchmark2 with runid 4
benchmark2 with runid 4

Whereas

RepeatN(RunN(b, 5) for b in benchmarks, 2)

would run

benchmark1 with runid 0
benchmark2 with runid 0
benchmark1 with runid 1
benchmark2 with runid 1
benchmark1 with runid 2
benchmark2 with runid 2
benchmark1 with runid 3
benchmark2 with runid 3
benchmark1 with runid 4
benchmark2 with runid 4
benchmark1 with runid 0
benchmark2 with runid 0
benchmark1 with runid 1
benchmark2 with runid 1
benchmark1 with runid 2
benchmark2 with runid 2
benchmark1 with runid 3
benchmark2 with runid 3
benchmark1 with runid 4
benchmark2 with runid 4

So, if we want this behaviour, RunN has to pass some argument (runid) to RepeatN in the first case, whereas in the second case RunN doesn't make use of any such argument.

But I think we should have a common interface in this schedulers, in such a way that it is possible to nest them in all possible ways (although some of them might not make any sense). So what do you think? we include an argument in this interface and we just make most schedulers ignore it?

johanrosenkilde · 2016-10-20T12:07:54Z

Thanks for getting back to this, and sorry for being silent for so long.
Another free-time project has taken total precedence all of October.

My idea was that a scheduler is just some object that exposes a .run()
method that will return a specific run of a specific benchmark each
time it is called. The scheduler, in its turn, will rely on the .run()
method of other schedulers to do its job, being benchmarks the
"primitive schedulers" in this model.

Exactly my thought too.

Imagine the following (simple) scenario: we want to combine the RunN
scheduler (that runs its arguments N times, with consecutive run id's)
with the RepeatN (that repeats the same run id several times).

I don't think you would ever want to repeat the same run id, and you
don't want to give the user any control over the run id.

The run id shall not be abused

We should get a t-shirt with that.

More precisely, a run id is personal to the "primitive schedulers" and
have no meaning except as identifying one particular run of that
Benchmark. The user can choose to use the run id as a seed for his
random generator, or he can choose to ignore it.

Perhaps it's easier to imagine that the runids are not even there. You
suggested that at one point, just to force the user not to abuse it, and
there might be a point to it. It just hinders reproducibility somewhat
due to the random seed thing.

The semantics of RunN should, IMO, be to ask its benchmark (singular,
see below) for exactly N runs and then raise StopIteration. No mention
of runids. I don't see a possible semantics of RepeatN.

So, for instance

RunN(RepeatN(b, 2) for b in benchmarks, 5)

RunN should not accept multiple benchmarks -- only a single one. The
reason is that there's many orderings you could imagine, so there will
be dedicated schedulers for taking care of that. What you wrote above
should be:

RunN(Sequential( RepeatN(b, 2) for b in benchmarks ), 5)

Except of course that RepeatN should not exist ;-)

But I think we should have a common interface in this schedulers, in
such a way that it is possible to nest them in all possible ways
(although some of them might not make any sense). So what do you
think? we include an argument in this interface and we just make most
schedulers ignore it?

I agree. The interface I propose contains have runids only in the return
values. More precisely, a scheduler is an object with a single method
get_run():

get_run: () -> benchmark-id * run-id

(or possibly, it will just be an iterator)

A Benchmark has two methods:

get_run (to implement the Scheduler interface)
run: run-id -> results

The task master thread will keep querying the top-level scheduler for
get_run(). He will feed the (benchmark_id, run_id) to the runner.

If the runner is sequential, the same thread will look up the
Benchmark B with the given id and call B.run(run_id) and save the
results.
If the runner is parallel, it will decide a thread T to give the task,
and send (benchmark_id, run_id) to T. T has already received a list of
the Benchmarks, and can now find the B with the benchmark_id and call
B.run_id(). Then asynchronously return the results to the main thread.

Note that the use of benchmark_ids avoid sending the Benchmark objects
to the threads more than once, which might be costly.

Best,
Johan

miguelmarco · 2016-11-25T13:15:03Z

Hi guys,

I have done some occasional thinking on this. I am making some experiments with a design where schedulers are just iterators (so, no .run() method, just .next()). They are created by passing other iterators, and they just call those iterators in the appropriate order. The runid's are then handled internally by benchmarks (so each benchmark takes care of being run each time with a different runid, which might be troublesome in parallel settings...)

A basic example could be

class Scheduler:
    def __init__(self, scheds):
        self._scheds = scheds
    
    def __iter__(self):
        return self
    
    def next(self):
        raise StopIteration
    
    __next__ = next
    

class RunN(Scheduler):
    r"""
    Repeats the passed scheduler n times.
    """
    def __init__(self, sched, n):
        self._n = n
        self._sched = sched
        self._runs = 0
        
    def next(self):
        if self._runs >= self._n:
            raise StopIteration
        else:
            self._runs += 1
            return self._sched
        
    __next__ = next
        

class Balanced(Scheduler):
    r"""
    Runs the passed schedulers keeping balanced the number of runs of each one
    """
    def __init__(self, schedulers):
        r"""
        INPUT: an iterator that returns schedulers. It can be infinite in theory, but then each one would be run once.
        """
        self._listsched = []
        self._itersched = schedulers
        self._is_exhausted = False
        self._current = 0
        
    def next(self):
        r"""
        returns the next scheduler to be run
        """
        while not self._is_exhausted:
            try:
                sched = self._itersched.next()
                try:
                    res = sched.next()
                    self._listsched.append(sched)
                    return res
                except StopIteration:
                    pass
                
            except StopIteration:
                self._is_exhausted = True
                
        while len(self._listsched) > 0:
            try:
                res = self._listsched[self._current].next()
                self._current += 1
                if self._current >= len(self._listsched):
                    self._current = 0
                return res
            except StopIteration:
                self._listsched.pop(self._current)
    
        raise StopIteration
    
    __next__ = next

So Balanced(RunN(b, 4) for b in benchmarks) would just run each benchmark once until all benchmarks have been run 4 times. I am not sure if this is the right design, but we have gone through so many ideas that I cannot even remenber the pros and cons of each one.

Thoughts?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Constructing Bleachermarks with dynamic Benchmarks #1

Constructing Bleachermarks with dynamic Benchmarks #1

johanrosenkilde commented Sep 12, 2016 •

edited

Loading

johanrosenkilde commented Sep 12, 2016

johanrosenkilde commented Sep 12, 2016

miguelmarco commented Sep 12, 2016

johanrosenkilde commented Sep 12, 2016

miguelmarco commented Sep 12, 2016

johanrosenkilde commented Sep 12, 2016

nthiery commented Sep 12, 2016

johanrosenkilde commented Sep 12, 2016

defeo commented Sep 12, 2016

johanrosenkilde commented Sep 13, 2016

johanrosenkilde commented Sep 13, 2016

defeo commented Sep 13, 2016

nthiery commented Sep 13, 2016

nthiery commented Sep 13, 2016

defeo commented Sep 13, 2016 •

edited

Loading

miguelmarco commented Sep 13, 2016

defeo commented Sep 13, 2016 via email

johanrosenkilde commented Sep 13, 2016

johanrosenkilde commented Sep 13, 2016

defeo commented Sep 13, 2016

johanrosenkilde commented Sep 13, 2016

defeo commented Sep 13, 2016 via email

johanrosenkilde commented Sep 13, 2016

nthiery commented Sep 13, 2016

nthiery commented Sep 13, 2016

johanrosenkilde commented Sep 14, 2016 via email

miguelmarco commented Sep 16, 2016

johanrosenkilde commented Sep 19, 2016 via email

miguelmarco commented Oct 20, 2016

johanrosenkilde commented Oct 20, 2016

miguelmarco commented Nov 25, 2016

Constructing Bleachermarks with dynamic Benchmarks #1

Constructing Bleachermarks with dynamic Benchmarks #1

Comments

johanrosenkilde commented Sep 12, 2016 • edited Loading

johanrosenkilde commented Sep 12, 2016

johanrosenkilde commented Sep 12, 2016

miguelmarco commented Sep 12, 2016

johanrosenkilde commented Sep 12, 2016

miguelmarco commented Sep 12, 2016

johanrosenkilde commented Sep 12, 2016

nthiery commented Sep 12, 2016

johanrosenkilde commented Sep 12, 2016

defeo commented Sep 12, 2016

johanrosenkilde commented Sep 13, 2016

johanrosenkilde commented Sep 13, 2016

defeo commented Sep 13, 2016

nthiery commented Sep 13, 2016

nthiery commented Sep 13, 2016

defeo commented Sep 13, 2016 • edited Loading

miguelmarco commented Sep 13, 2016

defeo commented Sep 13, 2016 via email

johanrosenkilde commented Sep 13, 2016

johanrosenkilde commented Sep 13, 2016

defeo commented Sep 13, 2016

johanrosenkilde commented Sep 13, 2016

defeo commented Sep 13, 2016 via email

johanrosenkilde commented Sep 13, 2016

nthiery commented Sep 13, 2016

nthiery commented Sep 13, 2016

johanrosenkilde commented Sep 14, 2016 via email

miguelmarco commented Sep 16, 2016

johanrosenkilde commented Sep 19, 2016 via email

miguelmarco commented Oct 20, 2016

johanrosenkilde commented Oct 20, 2016

miguelmarco commented Nov 25, 2016

johanrosenkilde commented Sep 12, 2016 •

edited

Loading

defeo commented Sep 13, 2016 •

edited

Loading