improve benchmarks for Data.IntMap #657

jwaldmann · 2019-07-04T20:51:51Z

Benchmarks should

- use several sets of data (currently: just one, contiguous keys [1 .. 2^12])
- test bulk operations (union, intersection) - currently, they don't? https://github.com/haskell/containers/blob/master/containers-tests/benchmarks/IntMap.hs

NB: these bulk ops are the main reason for IntMap? if we only operate by-element, we could use hashmaps?
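
For readers less familiar with the bulk operations in question, here is a minimal sketch of `union` and `intersection` on `Data.IntMap.Strict`. The maps `left` and `right` and their contents are made up for illustration; only the `Data.IntMap` functions come from the containers library.

```haskell
import qualified Data.IntMap.Strict as IM

-- Two small maps with overlapping keys (values chosen only for illustration).
left, right :: IM.IntMap String
left  = IM.fromList [(1, "a"), (2, "b"), (3, "c")]
right = IM.fromList [(2, "B"), (3, "C"), (4, "D")]

main :: IO ()
main = do
  -- union is left-biased: on shared keys the value from the first map wins.
  print (IM.toList (IM.union left right))
  -- intersection keeps only keys present in both maps, with values from the first.
  print (IM.toList (IM.intersection left right))
```

These operations combine whole maps in one call, which is the case the benchmarks would need to cover in addition to by-element `lookup` and `insert`.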


int-e · 2019-07-06T19:50:23Z

Here's a potential source for inspiration: https://gist.github.com/int-e/36578cb04d0a187252c366b0b45ddcb6#file-intmapfal-hs-L20-L45

jwaldmann · 2019-07-07T00:03:32Z

yeah, also https://github.com/jwaldmann/containers/blob/intmap-fromList/containers-tests/benchmarks/IntMap.hs

sjakobi · 2019-07-29T23:08:24Z

@jwaldmann That looks like a nice improvement! Why don't you simply make a PR?

jwaldmann · 2019-07-31T08:54:37Z

"Why don't you.." - because it's a drastic change that should be discussed first? Current benchmark:

  defaultMain
    [ bench "lookup" $ whnf (lookup keys) m ...

my proposal

  defaultMain $ do
    e <- [ 10, 15 .. 25 ]
    return $ bgroup ("2^" <> show e)
      [ bulk
        [ ("contiguous/overlapping", [1..2^e], [1..2^e]) ...
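
To make the shape of such a proposal concrete, here is a rough sketch of how the inputs might be enumerated in the list monad. It is not the code from the linked branch: only the "contiguous/overlapping" case appears above, while `Case`, `casesFor`, `groups`, `unionSize` and the two extra cases are hypothetical. The benchmark harness is left out so the sketch depends only on base and containers; in a real benchmark each case would be wrapped in `bench` and each group in `bgroup`.

```haskell
import qualified Data.IntMap.Strict as IM

-- One named input: a label plus the key lists for the two operand maps.
type Case = (String, [Int], [Int])

-- Hypothetical helper: the inputs for size 2^e.
-- Only the first case is taken from the proposal; the others are assumptions.
casesFor :: Int -> [Case]
casesFor e =
  [ ("contiguous/overlapping", [1 .. n],        [1 .. n])
  , ("contiguous/disjoint",    [1 .. n],        [n + 1 .. 2 * n])
  , ("interleaved",            [1, 3 .. 2 * n], [2, 4 .. 2 * n])
  ]
  where n = 2 ^ e

-- List-monad enumeration, mirroring the shape of the proposal:
-- one group per exponent, one entry per case.
groups :: [Int] -> [(String, [Case])]
groups exponents = do
  e <- exponents
  return ("2^" <> show e, casesFor e)

-- Build both maps and run the bulk operation under test.
unionSize :: Case -> Int
unionSize (_, xs, ys) = IM.size (IM.union (toMap xs) (toMap ys))
  where toMap ks = IM.fromList [(k, ()) | k <- ks]

main :: IO ()
main = mapM_ report (groups [4, 8])
  where
    report (g, cs) = mapM_ (line g) cs
    line g c@(name, _, _) =
      putStrLn (g <> "/" <> name <> ": " <> show (unionSize c))
```

Small exponents are used in `main` so the sketch runs quickly; the proposal itself goes up to 2^25.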

sjakobi · 2019-07-31T10:47:40Z

@jwaldmann I guess a PR would be the perfect platform for that discussion! :)

gereeter · 2019-12-28T09:43:17Z

> test bulk operations (union, intersection) - currently, they don't?

These are tested in the set-operations-intmap benchmark, which also uses a variety of data sets.

sjakobi · 2020-08-14T12:55:57Z

For reference: In #653, there's some performance work on fromList[WithKey] that needs better benchmarks with more realistic inputs.

sjakobi · 2020-08-14T13:02:20Z

For reference: In #653, there's some performance work on fromList[WithKey] that needs better benchmarks with more realistic inputs.

Also related: #652

sjakobi · 2020-08-14T13:09:43Z

Regarding the problem of realistic inputs for fromList and friends: How about using e.g. splitmix to generate them randomly? Certainly that's not very realistic for many applications, but it adds another data point.

prettyprinter has a benchmark that is similarly based on randomly generated data: https://github.com/quchen/prettyprinter/blob/ab2c09419cca51fcc37760e71ef6861d26753e94/prettyprinter/bench/LargeOutput.hs#L173-L181
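
A rough sketch of what randomly generated input for `fromList` could look like. To stay self-contained it does not use the splitmix package; `next` is a hand-rolled copy of the SplitMix64 step function, and `randomKeys`, `randomMap` and the seed are hypothetical names and values chosen for illustration.

```haskell
import Data.Bits (shiftR, xor)
import qualified Data.IntMap.Strict as IM
import Data.List (unfoldr)
import Data.Word (Word64)

-- One step of the SplitMix64 generator: returns an output and the next state.
-- Hand-rolled stand-in for the splitmix package, for illustration only.
next :: Word64 -> (Word64, Word64)
next s = (z3, s')
  where
    s' = s + 0x9e3779b97f4a7c15
    z1 = (s' `xor` (s' `shiftR` 30)) * 0xbf58476d1ce4e5b9
    z2 = (z1 `xor` (z1 `shiftR` 27)) * 0x94d049bb133111eb
    z3 = z2 `xor` (z2 `shiftR` 31)

-- n pseudo-random keys from a fixed seed, so benchmark runs are reproducible.
randomKeys :: Word64 -> Int -> [Int]
randomKeys seed n = take n (unfoldr (Just . step) seed)
  where step s = let (w, s') = next s in (fromIntegral w, s')

-- Input for a fromList benchmark: random keys paired with their position.
randomMap :: Word64 -> Int -> IM.IntMap Int
randomMap seed n = IM.fromList (zip (randomKeys seed n) [0 ..])

main :: IO ()
main = print (IM.size (randomMap 2020 (2 ^ 12)))
```

A fixed seed keeps the key list identical across runs, so timings for different `fromList` implementations are compared on the same input.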

sjakobi added benchmarking IntMap labels Jul 15, 2020

sjakobi mentioned this issue Aug 14, 2020

Speed up fromList for IntMap #653

Draft

sjakobi pinned this issue Aug 14, 2020
