More efficient Eq, Ord for Set, Map #1017

Merged
merged 1 commit into from
Aug 25, 2024

Conversation

meooow25
Contributor

@meooow25 meooow25 commented Aug 6, 2024

A more efficient implementation of Eq and Ord for Set and Map that moves away from the toList-based approach.

For #1016.
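
For context, a toList-based comparison amounts to turning both containers into ascending lists and comparing those lists. The following is a minimal sketch against the public Data.Set API, not the library's internal code: the names `eqViaList` and `compareViaList` are hypothetical, and the size check is an assumed cheap early exit.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Hypothetical helper: compare sizes first, then the ascending element lists.
eqViaList :: Eq a => Set a -> Set a -> Bool
eqViaList s1 s2 =
  Set.size s1 == Set.size s2 && Set.toAscList s1 == Set.toAscList s2

-- Hypothetical helper: lexicographic ordering of the ascending element lists.
compareViaList :: Ord a => Set a -> Set a -> Ordering
compareViaList s1 s2 = compare (Set.toAscList s1) (Set.toAscList s2)
```

Both helpers materialize list cells for each container as the comparison proceeds, which is the allocation this PR aims to avoid.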


Benchmarks, on GHC 9.6.3:

Set before:
  eq:      OK
    74.9 μs ± 5.6 μs, 447 KB allocated,  86 B  copied, 7.0 MB peak memory
  compare: OK
    61.9 μs ± 5.9 μs, 447 KB allocated,  85 B  copied, 7.0 MB peak memory

Set after:
  eq:      OK
    29.2 μs ± 2.7 μs, 128 KB allocated,  11 B  copied, 7.0 MB peak memory, 61% less than baseline
  compare: OK
    29.1 μs ± 2.8 μs, 128 KB allocated,  11 B  copied, 7.0 MB peak memory, 53% less than baseline

Map before:
  eq:      OK
    123  μs ±  11 μs, 637 KB allocated, 163 B  copied, 9.0 MB peak memory
  compare: OK
    168  μs ±  15 μs, 637 KB allocated, 163 B  copied, 9.0 MB peak memory

Map after:
  eq:      OK
    38.5 μs ± 3.6 μs, 159 KB allocated,  15 B  copied, 9.0 MB peak memory, 68% less than baseline
  compare: OK
    39.1 μs ± 3.4 μs, 159 KB allocated,  15 B  copied, 9.0 MB peak memory, 76% less than baseline

Note: The improvement is smaller for Set because the benchmarks use Set Int, and there happen to be specializations for Eq [Int] and Ord [Int] that speed up the old list-based path. Without those specializations, the improvement for Set is 70-75%, just like Map (see the numbers in #1016).

@meooow25 meooow25 force-pushed the fast-eq-ord branch 2 times, most recently from aa6ecd0 to a0ce261 Compare August 6, 2024 20:01
@meooow25
Contributor Author

meooow25 commented Aug 7, 2024

For the record, if we have

foo :: Set Int -> Set Int -> Ordering
foo = compare

(with GHC 9.6.3) it gets compiled to the Core

Rec {
-- RHS size: {terms: 101, types: 97, coercions: 33, joins: 0/0}
$wgo1 [InlPrag=[2], Occ=LoopBreaker]
  :: Set Int -> Iterator Int -> (# Ordering, Iterator Int #)
[GblId[StrictWorker([!, !])],
 Arity=2,
 Str=<1L><1L>,
 Unf=OtherCon []]
$wgo1
  = \ (ds_s5eh :: Set Int) (eta_s5ei :: Iterator Int) ->
      case ds_s5eh of {
        Bin bx_a3pH k_a3pI ds1_a3pJ ds2_a3pK ->
          case k_a3pI of { I# x#_s5zS ->
          case bx_a3pH of {
            __DEFAULT ->
              case $wgo1 ds1_a3pJ eta_s5ei of wild2_X1E
              { (# ww_s5vE, ww1_s5vF #) ->
              case ww_s5vE of {
                __DEFAULT -> wild2_X1E;
                EQ ->
                  case ww1_s5vF `cast` <Co:2> :: ... of {
                    Push x1_a3qi r1_a3qj stk'_a3qk ->
                      case x1_a3qi of { I# y#_s5zV ->
                      case iterDown @Int r1_a3qj stk'_a3qk of nt_a3qm { __DEFAULT ->
                      case <# x#_s5zS y#_s5zV of {
                        __DEFAULT ->
                          case ==# x#_s5zS y#_s5zV of {
                            __DEFAULT -> (# GT, nt_a3qm `cast` <Co:3> :: ... #);
                            1# -> $wgo1 ds2_a3pK (nt_a3qm `cast` <Co:3> :: ...)
                          };
                        1# -> (# LT, nt_a3qm `cast` <Co:3> :: ... #)
                      }
                      }
                      };
                    Nada -> (# GT, (Nada @Int) `cast` <Co:3> :: ... #)
                  }
              }
              };
            1# ->
              case eta_s5ei `cast` <Co:2> :: ... of {
                Push x_a3qr r1_a3qs stk'_a3qt ->
                  case x_a3qr of { I# y#_s5A1 ->
                  case iterDown @Int r1_a3qs stk'_a3qt of nt_a3qv { __DEFAULT ->
                  case <# x#_s5zS y#_s5A1 of {
                    __DEFAULT ->
                      case ==# x#_s5zS y#_s5A1 of {
                        __DEFAULT -> (# GT, nt_a3qv `cast` <Co:3> :: ... #);
                        1# -> (# EQ, nt_a3qv `cast` <Co:3> :: ... #)
                      };
                    1# -> (# LT, nt_a3qv `cast` <Co:3> :: ... #)
                  }
                  }
                  };
                Nada -> (# GT, (Nada @Int) `cast` <Co:3> :: ... #)
              }
          }
          };
        Tip ->
          case eta_s5ei `cast` <Co:2> :: ... of nt_s54V { __DEFAULT ->
          (# EQ, nt_s54V `cast` <Co:3> :: ... #)
          }
      }
end Rec }

-- RHS size: {terms: 20, types: 23, coercions: 5, joins: 0/0}
foo [InlPrag=INLINABLE] :: Set Int -> Set Int -> Ordering
[GblId,
 Arity=2,
 Str=<1L><1L>,
 Unf=Unf{Src=<vanilla>, TopLvl=True,
         Value=True, ConLike=True, WorkFree=True, Expandable=True,
         Guidance=IF_ARGS [0 0] 110 20}]
foo
  = \ (s1_a3oU :: Set Int) (s2_a3oV :: Set Int) ->
      case $wgo1
             s1_a3oU ((iterDown @Int s2_a3oV (Nada @Int)) `cast` <Co:3> :: ...)
      of
      { (# ww_s5vE, ww1_s5vF #) ->
      case ww_s5vE of wild1_a3qC {
        __DEFAULT -> wild1_a3qC;
        EQ ->
          case ww1_s5vF `cast` <Co:2> :: ... of {
            Push ds_a4A6 ds1_a4A7 ds2_a4A8 -> LT;
            Nada -> EQ
          }
      }
      }

Which looks pretty good to me!
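
Read back into source-level Haskell, the worker in the Core above behaves roughly like the sketch below. This is an illustration under assumptions, not the library source: `Tree` is a toy stand-in for Set without the size field (so the size-1 fast path in the Core is not reproduced), `Stack` plays the role of the iterator that the Core reaches through casts, an ordinary tuple replaces the unboxed tuple, and `go` and `compareTree` are hypothetical names.

```haskell
-- Toy stand-in for Set: a plain binary search tree (no size field, no balancing).
data Tree a = Tip | Bin (Tree a) a (Tree a)

-- Stack of pending (element, right subtree) pairs; the next element is on top.
data Stack a = Nada | Push a (Tree a) (Stack a)

-- Descend the left spine, pushing each node so elements come up in ascending order.
iterDown :: Tree a -> Stack a -> Stack a
iterDown Tip stk = stk
iterDown (Bin l x r) stk = iterDown l (Push x r stk)

-- Walk the first tree in order, consuming one iterator element per node.
-- Returns the ordering decided so far together with the remaining iterator.
go :: Ord a => Tree a -> Stack a -> (Ordering, Stack a)
go Tip it = (EQ, it)
go (Bin l x r) it = case go l it of
  (EQ, it') -> case it' of
    Nada -> (GT, Nada)                  -- second tree ran out first
    Push y r' stk -> case compare x y of
      EQ -> go r (iterDown r' stk)      -- still equal: continue into the right subtree
      o  -> (o, iterDown r' stk)        -- decided: propagate the result
  res -> res                            -- already decided in the left subtree

compareTree :: Ord a => Tree a -> Tree a -> Ordering
compareTree t1 t2 = case go t1 (iterDown t2 Nada) of
  (EQ, Push _ _ _) -> LT                -- first tree exhausted, second has elements left
  (o, _)           -> o
```

No intermediate list is built for either tree: the first is traversed by direct recursion and the second through the explicit stack.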

@meooow25
Contributor Author

meooow25 commented Aug 7, 2024

@treeowl, would appreciate a review if you have the time for it.

* Add tests and benchmarks.
* Implement Eq and Ord using foldMap + iterator. Effect on benchmark
  times, using GHC 9.6.3:
  Set Int, eq:          -61%
  Set Int, compare:     -53%
  Map Int Int, eq:      -68%
  Map Int Int, compare: -76%
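
The "foldMap + iterator" idea from the commit message can be sketched for equality as follows. This is a hedged illustration on a toy tree, not the code in the commit: `Tree`, `Stack`, `iterNext`, `EqM` and `eqTree` are hypothetical names, and because the toy tree has no size field, the final check that the iterator is exhausted is an assumption of this sketch.

```haskell
-- Toy stand-in for Set: a plain binary search tree.
data Tree a = Tip | Bin (Tree a) a (Tree a)

-- In-order foldMap over the toy tree.
instance Foldable Tree where
  foldMap _ Tip = mempty
  foldMap f (Bin l x r) = foldMap f l <> f x <> foldMap f r

-- Iterator state: pending (element, right subtree) pairs, next element on top.
data Stack a = Nada | Push a (Tree a) (Stack a)

iterDown :: Tree a -> Stack a -> Stack a
iterDown Tip stk = stk
iterDown (Bin l x r) stk = iterDown l (Push x r stk)

-- Pop the next element in ascending order, if any.
iterNext :: Stack a -> Maybe (a, Stack a)
iterNext Nada = Nothing
iterNext (Push x r stk) = Just (x, iterDown r stk)

-- A state-passing step: take the iterator, return a verdict and the rest of it.
newtype EqM a = EqM { runEqM :: Stack a -> (Bool, Stack a) }

-- Sequencing short-circuits as soon as one step reports a mismatch.
instance Semigroup (EqM a) where
  EqM f <> EqM g = EqM $ \it -> case f it of
    (True, it') -> g it'
    res         -> res

instance Monoid (EqM a) where
  mempty = EqM $ \it -> (True, it)

-- Fold over the first tree; each element consumes one element of the second.
eqTree :: Eq a => Tree a -> Tree a -> Bool
eqTree t1 t2 = case runEqM (foldMap step t1) (iterDown t2 Nada) of
  (True, Nada) -> True   -- all matched and the second tree is exhausted
  _            -> False
  where
    step x = EqM $ \it -> case iterNext it of
      Nothing       -> (False, it)
      Just (y, it') -> (x == y, it')
```

The monoid lets foldMap drive the traversal of the first container, while the iterator is threaded through as state for the second.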
@meooow25 meooow25 merged commit 4af12df into haskell:master Aug 25, 2024
11 checks passed
@meooow25 meooow25 deleted the fast-eq-ord branch August 25, 2024 07:42
@meooow25
Contributor Author

Thanks for reviewing!

@meooow25 meooow25 mentioned this pull request Aug 25, 2024
8 tasks