Can we improve fromDistinctAscList for IntSet and IntMap #654

treeowl · 2019-07-04T03:31:32Z

The control flow of fromDistinctAscList for IntSet and IntMap is rather ... opaque. It operates using an explicit stack that seems to represent a sort of zipper for the tree being constructed. That stack is, of course, allocated on the heap, which doesn't seem ideal. The fact that the code is just a bit more complicated than I can really understand is unfortunate. My guess is that it's probably possible to shift that explicit stack onto GHC's native reduction stack, which seems likely to make the code less unreadable and more efficient.

The text was updated successfully, but these errors were encountered:

treeowl · 2019-07-04T03:32:26Z

@int-e I know you were involved in writing this code many years ago; I wonder if you might be interested in revisiting it.

treeowl · 2019-07-04T04:33:14Z

The reason this came up: #653 seems to make fromList for ascending and descending lists almost as fast as fromDistinctAscList for these types, which hints that the latter should be able to do better.

int-e · 2019-07-06T00:46:38Z

I wouldn't expect a huge speedup... most of the speedup of fromDistrinctAscList comes from avoiding the repeated traversal and recreation of paths all the way from the root, and with the improved fromList you get that same effect. That said, I'm seeing a small speedup over the existing fromDistinctAscList with the following code:

fromDistinctAscList1 :: [(Key,a)] -> IntMap a
fromDistinctAscList1 []              = Nil
fromDistinctAscList1 ((kx,vx) : zs0) = addAll kx (Tip kx vx) zs0
  where
    -- invariant for `addAll` and `addMany`: `kx` is a key in the map `tx`.
    addAll !kx !tx []
        = tx
    addAll !kx !tx ((ky,vy) : zs)
        | m <- branchMask kx ky
        , Inserted ty zs' <- addMany m ky (Tip ky vy) zs
        = addAll kx (link ky ty kx tx) zs'

    -- addMany adds all elements that have the same prefix as `kx` w.r.t.
    -- the branch mask `m` to `tx`.
    addMany !m !kx tx []
        = Inserted tx []
    addMany !m !kx tx zs0@((ky,vy) : zs)
        | mask kx m /= mask ky m
        = Inserted tx zs0
        | Inserted ty zs' <- addMany (branchMask kx ky) ky (Tip ky vy) zs
        = addMany m kx (link ky ty kx tx) zs'

data Inserted a = Inserted !(IntMap a) ![(Key,a)]

There is some wasted effort here, since most of the time, one can predict which way the link will go. But annoyingly, the Bin has to be flipped if both negative and positive keys occur.

For IntSet, I imagine there's a bit more potential for speedup from having a specialized inner loop that combines the bits for keys with the same prefix.

Finally, regarding fromAscList, it's currently a two-pass algorithm; for raw speed it may be worthwhile to bite the bullet and fuse them manually...

treeowl · 2019-07-06T01:05:27Z

I would imagine that CPU branch prediction could handle the link thing. It would, of course, be nicer if we had a branchless way to order the subtrees, but I doubt we'll get one soon. I personally find your proposed code much easier to read, so as long as it's not slower, I want it. Since you say it's a little faster, so much the better!

treeowl · 2019-07-06T01:29:44Z

You write

Invariant for addAll and addMany: kx is a key in the map tx.

That doesn't sound right, since you call addAll with mask kx m. What did you mean exactly?

treeowl · 2019-07-06T02:01:56Z

Would it be worth writing a separate version for the special case where the first element is non-negative? That is a pretty common scenario.

int-e · 2019-07-06T04:02:40Z

Invariant for addAll and addMany: kx is a key in the map tx.

That doesn't sound right, since you call addAll with mask kx m. What did you mean exactly?

That was unintended. In fact, any value that agrees with the prefix of the tree tx is fine... initially I passed in the actual prefix, and the mask kx m was a leftover of that. But keys are more readily available.

int-e · 2019-07-06T04:42:36Z

Would it be worth writing a separate version for the special case where the first element is non-negative? That is a pretty common scenario.

Hmm, actually the link calls inside addMany are predictable, so this would work:

fromDistinctAscList1a :: [(Key,a)] -> IntMap a
fromDistinctAscList1a []              = Nil
fromDistinctAscList1a ((kx,vx) : zs0) = addAll kx (Tip kx vx) zs0
  where
    -- invariant for `addAll` and `addMany`: `kx` is a key in the map `tx`.
    addAll !kx tx []
        = tx
    addAll !kx tx ((ky,vy) : zs)
        | m <- branchMask kx ky
        , Inserted ty zs' <- addMany m ky (Tip ky vy) zs
        = addAll kx (link ky ty kx tx) zs'

    -- addMany adds all elements that have the same prefix as `kx` w.r.t.
    -- the branch mask `m` to `tx`.
    addMany !m !kx tx []
        = Inserted tx []
    addMany !m !kx tx zs0@((ky,vy) : zs)
        | mask kx m /= mask ky m
        = Inserted tx zs0
        | m' <- branchMask kx ky
        , Inserted ty zs' <- addMany m' ky (Tip ky vy) zs
        = addMany m kx (Bin (mask kx m') m' tx ty) zs'

However, I don't see a speedup compared to fromDistinctAscList1. But my benchmarking may be too naive:

main = do
    print $ foldl' (+) 0 $ Prelude.map size $ do
        sz <- [2^i | i <- [0..10]]
        shift <- [v * 2^i | i <- [0..20], v <- [-1,1]]
        mult <- [1..100]
        [fromDistinctAscList1 [(shift + mult * i, ()) | i <- [0..sz]]]

Btw, it's interesting to ponder when fromDistinctAscList1 works: it works for inputs such that, for each prefix, the keys corresponding to that prefix occur consecutively in the list, and the list has no duplicate keys. So in particular it works for both increasing and decreasing lists.

int-e · 2019-07-06T14:09:43Z

I'm working on this. Here are some detailed benchmarks obtained using gauge (section [n,1] uses keys [0..n] as input keys)... fromList1 is the code from #653, while fromList is the current implementation:

containers/containers/src/Data/IntMap/Internal.hs

Lines 3080 to 3084 in 9231fd7

    
           fromList :: [(Key,a)] -> IntMap a 
        
           fromList xs 
        
             = Foldable.foldl' ins empty xs 
        
             where 
        
               ins t (k,x)  = insert k x t

[1,1]/fromList                           mean 19.90 ns  ( +- 103.2 ps  )
[1,1]/fromList1                          mean 24.60 ns  ( +- 27.25 ps  )
[1,1]/fromAscList                        mean 34.58 ns  ( +- 44.19 ps  )
[1,1]/fromDistinctAscList                mean 25.65 ns  ( +- 37.96 ps  )
[1,1]/fromDistinctAscList1               mean 20.15 ns  ( +- 32.51 ps  )
[1,1]/fromDistinctAscList1a              mean 19.90 ns  ( +- 24.34 ps  )

[10,1]/fromList                          mean 137.5 ns  ( +- 2.841 ns  )
[10,1]/fromList1                         mean 162.2 ns  ( +- 2.894 ns  )
[10,1]/fromAscList                       mean 294.3 ns  ( +- 5.420 ns  )
[10,1]/fromDistinctAscList               mean 186.4 ns  ( +- 3.141 ns  )
[10,1]/fromDistinctAscList1              mean 144.5 ns  ( +- 1.696 ns  )
[10,1]/fromDistinctAscList1a             mean 149.5 ns  ( +- 3.939 ns  )

[100,1]/fromList                         mean 2.064 μs  ( +- 68.65 ns  )
[100,1]/fromList1                        mean 1.740 μs  ( +- 43.99 ns  )
[100,1]/fromAscList                      mean 3.035 μs  ( +- 92.46 ns  )
[100,1]/fromDistinctAscList              mean 1.837 μs  ( +- 79.14 ns  )
[100,1]/fromDistinctAscList1             mean 1.773 μs  ( +- 16.59 ns  )
[100,1]/fromDistinctAscList1a            mean 1.605 μs  ( +- 55.38 ns  )

[1000,1]/fromList                        mean 36.36 μs  ( +- 744.9 ns  )
[1000,1]/fromList1                       mean 18.21 μs  ( +- 325.1 ns  )
[1000,1]/fromAscList                     mean 33.16 μs  ( +- 918.7 ns  )
[1000,1]/fromDistinctAscList             mean 21.37 μs  ( +- 346.2 ns  )
[1000,1]/fromDistinctAscList1            mean 18.13 μs  ( +- 482.7 ns  )
[1000,1]/fromDistinctAscList1a           mean 17.60 μs  ( +- 372.6 ns  )

[10000,1]/fromList                       mean 1.162 ms  ( +- 6.459 μs  )
[10000,1]/fromList1                      mean 256.8 μs  ( +- 4.090 μs  )
[10000,1]/fromAscList                    mean 1.013 ms  ( +- 14.06 μs  )
[10000,1]/fromDistinctAscList            mean 326.7 μs  ( +- 3.084 μs  )
[10000,1]/fromDistinctAscList1           mean 256.2 μs  ( +- 5.021 μs  )
[10000,1]/fromDistinctAscList1a          mean 248.9 μs  ( +- 3.901 μs  )

[100000,1]/fromList                      mean 13.93 ms  ( +- 465.4 μs  )
[100000,1]/fromList1                     mean 9.134 ms  ( +- 231.1 μs  )
[100000,1]/fromAscList                   mean 11.49 ms  ( +- 350.1 μs  )
[100000,1]/fromDistinctAscList           mean 10.02 ms  ( +- 338.8 μs  )
[100000,1]/fromDistinctAscList1          mean 9.199 ms  ( +- 278.7 μs  )
[100000,1]/fromDistinctAscList1a         mean 9.111 ms  ( +- 358.4 μs  )

treeowl · 2019-07-06T16:28:49Z

Have you tried with non-contiguous keys at all?

treeowl · 2019-07-06T16:32:03Z

It would also be good to get larger sizes in the mix.

int-e · 2019-07-06T19:10:23Z

As already mentioned in #658, I have more benchmark results here: https://gist.github.com/int-e/36578cb04d0a187252c366b0b45ddcb6. These include sparse lists (the second test parameter is the distance between consecutive keys). Sparse lists make a huge difference for IntSet, but for IntMap, the effect is pretty small, which is why I didn't include them above.

I've tested maps with 1M entries, but gauge started spewing warnings about the tests taking long, and there was nothing interesting to see, so I dropped them. But in any case, you have the source code now, so feel free to play around yourself.

int-e mentioned this issue Jul 6, 2019

Improve fromAscList and friends. #658

Merged

treeowl mentioned this issue Dec 29, 2019

Rewrite Data.IntMap to be faster and use less memory #340

Open

sjakobi added IntMap IntSet performance labels Jul 15, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can we improve fromDistinctAscList for IntSet and IntMap #654

Can we improve fromDistinctAscList for IntSet and IntMap #654

treeowl commented Jul 4, 2019

treeowl commented Jul 4, 2019

treeowl commented Jul 4, 2019 •

edited

Loading

int-e commented Jul 6, 2019 •

edited

Loading

treeowl commented Jul 6, 2019

treeowl commented Jul 6, 2019

treeowl commented Jul 6, 2019

int-e commented Jul 6, 2019

int-e commented Jul 6, 2019

int-e commented Jul 6, 2019

treeowl commented Jul 6, 2019

treeowl commented Jul 6, 2019

int-e commented Jul 6, 2019 •

edited

Loading

Can we improve fromDistinctAscList for IntSet and IntMap #654

Can we improve fromDistinctAscList for IntSet and IntMap #654

Comments

treeowl commented Jul 4, 2019

treeowl commented Jul 4, 2019

treeowl commented Jul 4, 2019 • edited Loading

int-e commented Jul 6, 2019 • edited Loading

treeowl commented Jul 6, 2019

treeowl commented Jul 6, 2019

treeowl commented Jul 6, 2019

int-e commented Jul 6, 2019

int-e commented Jul 6, 2019

int-e commented Jul 6, 2019

treeowl commented Jul 6, 2019

treeowl commented Jul 6, 2019

int-e commented Jul 6, 2019 • edited Loading

treeowl commented Jul 4, 2019 •

edited

Loading

int-e commented Jul 6, 2019 •

edited

Loading

int-e commented Jul 6, 2019 •

edited

Loading