
Haskell counters that can be safely incremented from multiple threads


sergv/atomic-counter


Synopsis

Mutable cells that hold an integer value and can be safely modified from multiple threads. They support only a few operations: read, write, +, -, and bitwise and, or, xor and nand.

A good use case is a shared counter that multiple threads increment.

Operations translate to atomic CPU instructions, which involve a memory barrier. For the limited set of operations this package provides, they tend to be faster in a concurrent setting than atomic modify operations on IORefs and other shared variable types. The value inside the Counter type is unboxed, which contributes to outperforming the other shared variables because they box the value they store. For integers boxing is especially costly.

For single-threaded use this package will likely not outperform vanilla IORefs or STRefs. That depends on whether GHC adds a memory barrier for them, cf https://gitlab.haskell.org/ghc/ghc/-/issues/22764; if it does end up adding the same memory barrier, this package and IORef should perform equally. All in all, the operations this package provides incur a guaranteed memory barrier, which serves no purpose in a single-threaded setting, so for that use case this package is probably not the best choice. The intended use of this package is the concurrent setting, where it does seem to be a clear winner (please see the benchmarks below).

Still, even in a single-threaded scenario the Counter can serve as an efficient mutable integer cell that does not box the integer.

Benchmark

Summary

Depending on the number of threads, Counter from this package can be up to 10 times faster than the next best, TVar from the stm package.

Details

The benchmark spawns N threads, each of which increments the same counter by 1 for a number of iterations.
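The shape of this workload can be sketched with base alone. The sketch below is only an illustration, not the package's benchmark code: the name `contend` is hypothetical, and it uses `atomicModifyIORef'` as the shared cell, which is assumed to roughly correspond to the "IORef atomic" variant.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (replicateM, replicateM_)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)

-- | Hypothetical helper: spawn @threads@ workers that each increment one
-- shared cell @iterations@ times, wait for all of them, and return the
-- final value of the cell.
contend :: Int -> Int -> IO Int
contend threads iterations = do
  ref <- newIORef (0 :: Int)
  dones <- replicateM threads $ do
    done <- newEmptyMVar
    _ <- forkIO $ do
      -- Atomic read-modify-write, so no increments are lost.
      replicateM_ iterations (atomicModifyIORef' ref (\n -> (n + 1, ())))
      putMVar done ()
    pure done
  -- Block until every worker has signalled completion.
  mapM_ takeMVar dones
  readIORef ref

main :: IO ()
main = contend 4 1000 >>= print
```

Because every increment is atomic, the final value equals threads times iterations regardless of scheduling.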

Test setup: Intel i5-9600K CPU @ 3.70GHz, 6 cores, no hyperthreading, GHC 9.6.1.

NB: IORef inconsistent is a benchmark that just reads and writes the IORef with the fastest functions IORef supports. It gives wrong results when multiple threads access the IORef this way, since these reads and writes are not synchronized. It is included only for speed comparison. All the others properly synchronize increments between threads. The current package is shown as Counter.
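To make the difference concrete, here is a minimal sketch of the two styles of IORef increment. The function names are hypothetical, and the exact functions used by the benchmark are an assumption; the point is only that a separate read and write can lose updates under concurrency, while an atomic modify cannot.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef, writeIORef)

-- | Unsynchronized increment: a separate read followed by a write.
-- Another thread may write between the two steps, and that update is
-- then overwritten.
incInconsistent :: IORef Int -> IO ()
incInconsistent ref = do
  n <- readIORef ref
  writeIORef ref $! n + 1

-- | Synchronized increment: read and write happen as one atomic step.
incAtomic :: IORef Int -> IO ()
incAtomic ref = atomicModifyIORef' ref (\n -> (n + 1, ()))

main :: IO ()
main = do
  ref <- newIORef 0
  -- Single-threaded, both styles behave the same.
  incInconsistent ref
  incAtomic ref
  readIORef ref >>= print
```

In a single thread both functions are equivalent; they diverge only when several threads share the same IORef.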

$ cabal run bench -- -j1 --timeout 300 --stdev 2
All
  Correctness:          OK (13.97s)
    +++ OK, passed 10000 tests.
  Read/write contention with 10 iterations and 1 threads
    Counter:            OK (14.23s)
      839  ns ±  33 ns
    IORef inconsistent: OK (1.08s)
      1.03 μs ±  36 ns, 1.23x
    IORef atomic:       OK (1.30s)
      1.25 μs ±  40 ns, 1.48x
    MVar:               OK (10.53s)
      1.24 μs ±  13 ns, 1.48x
    TMVar:              OK (0.65s)
      1.18 μs ±  30 ns, 1.41x
    TVar:               OK (0.33s)
      1.13 μs ±  45 ns, 1.35x
    Addr:               OK (3.37s)
      818  ns ±  28 ns, 0.97x
  Read/write contention with 100 iterations and 1 threads
    Counter:            OK (1.47s)
      1.35 μs ±  24 ns
    IORef inconsistent: OK (17.27s)
      2.09 μs ±  47 ns, 1.55x
    IORef atomic:       OK (0.45s)
      3.22 μs ± 122 ns, 2.38x
    MVar:               OK (3.15s)
      2.97 μs ±  30 ns, 2.20x
    TMVar:              OK (1.24s)
      4.65 μs ±  64 ns, 3.44x
    TVar:               OK (2.43s)
      4.61 μs ± 158 ns, 3.41x
    Addr:               OK (1.16s)
      1.09 μs ±  22 ns, 0.80x
  Read/write contention with 1000 iterations and 1 threads
    Counter:            OK (0.47s)
      6.48 μs ± 235 ns
    IORef inconsistent: OK (0.78s)
      2.76 μs ±  88 ns, 0.43x
    IORef atomic:       OK (0.60s)
      17.5 μs ± 399 ns, 2.69x
    MVar:               OK (2.09s)
      15.8 μs ± 208 ns, 2.43x
    TMVar:              OK (0.40s)
      48.3 μs ± 1.9 μs, 7.46x
    TVar:               OK (0.55s)
      32.4 μs ± 713 ns, 5.00x
    Addr:               OK (0.34s)
      4.73 μs ± 189 ns, 0.73x
  Read/write contention with 10000 iterations and 1 threads
    Counter:            OK (0.50s)
      58.4 μs ± 2.1 μs
    IORef inconsistent: OK (0.42s)
      23.8 μs ± 829 ns, 0.41x
    IORef atomic:       OK (0.73s)
      168  μs ± 3.8 μs, 2.87x
    MVar:               OK (0.66s)
      157  μs ± 4.6 μs, 2.69x
    TMVar:              OK (2.09s)
      512  μs ± 7.2 μs, 8.78x
    TVar:               OK (1.39s)
      321  μs ± 6.7 μs, 5.50x
    Addr:               OK (47.12s)
      43.8 μs ± 562 ns, 0.75x
  Read/write contention with 10 iterations and 2 threads
    Counter:            OK (1.10s)
      7.85 μs ± 144 ns
    IORef inconsistent: OK (1.93s)
      7.65 μs ± 201 ns, 0.97x
    IORef atomic:       OK (0.54s)
      8.16 μs ± 235 ns, 1.04x
    MVar:               OK (7.81s)
      7.81 μs ±  72 ns, 0.99x
    TMVar:              OK (0.49s)
      7.98 μs ± 292 ns, 1.02x
    TVar:               OK (1.91s)
      7.98 μs ± 181 ns, 1.02x
    Addr:               OK (2.01s)
      7.64 μs ± 213 ns, 0.97x
  Read/write contention with 100 iterations and 2 threads
    Counter:            OK (1.00s)
      8.50 μs ± 290 ns
    IORef inconsistent: OK (8.00s)
      8.13 μs ± 133 ns, 0.96x
    IORef atomic:       OK (2.80s)
      13.3 μs ± 358 ns, 1.56x
    MVar:               OK (4.64s)
      19.2 μs ± 473 ns, 2.25x
    TMVar:              OK (5.52s)
      32.3 μs ± 1.1 μs, 3.80x
    TVar:               OK (83.27s)
      30.5 μs ± 279 ns, 3.59x
    Addr:               OK (4.38s)
      8.71 μs ± 333 ns, 1.03x
  Read/write contention with 1000 iterations and 2 threads
    Counter:            OK (28.57s)
      43.7 μs ± 106 ns
    IORef inconsistent: OK (0.91s)
      17.2 μs ± 669 ns, 0.39x
    IORef atomic:       OK (130.48s)
      223  μs ±  10 μs, 5.09x
    MVar:               OK (1.19s)
      5.11 ms ± 173 μs, 116.88x
    TMVar:              OK (0.33s)
      295  μs ± 7.7 μs, 6.75x
    TVar:               OK (1.44s)
      337  μs ±  10 μs, 7.71x
    Addr:               OK (0.47s)
      49.3 μs ± 915 ns, 1.13x
  Read/write contention with 10000 iterations and 2 threads
    Counter:            OK (0.51s)
      466  μs ±  17 μs
    IORef inconsistent: OK (13.49s)
      190  μs ± 1.2 μs, 0.41x
    IORef atomic:       OK (71.77s)
      1.80 ms ±  33 μs, 3.86x
    MVar:               OK (7.23s)
      60.1 ms ± 505 μs, 128.83x
    TMVar:              OK (1.50s)
      2.39 ms ±  39 μs, 5.13x
    TVar:               OK (108.24s)
      3.13 ms ± 405 μs, 6.70x
    Addr:               OK (0.29s)
      531  μs ±  20 μs, 1.14x
  Read/write contention with 10 iterations and 4 threads
    Counter:            OK (2.33s)
      15.5 μs ± 384 ns
    IORef inconsistent: OK (2.41s)
      15.6 μs ±  52 ns, 1.01x
    IORef atomic:       OK (1.56s)
      19.5 μs ± 487 ns, 1.26x
    MVar:               OK (1.51s)
      28.9 μs ± 647 ns, 1.87x
    TMVar:              OK (2.88s)
      19.9 μs ± 439 ns, 1.29x
    TVar:               OK (0.38s)
      18.2 μs ± 332 ns, 1.18x
    Addr:               OK (0.59s)
      15.3 μs ± 477 ns, 0.99x
  Read/write contention with 100 iterations and 4 threads
    Counter:            OK (1.63s)
      25.3 μs ± 723 ns
    IORef inconsistent: OK (0.35s)
      17.6 μs ± 451 ns, 0.69x
    IORef atomic:       OK (0.46s)
      154  μs ± 2.1 μs, 6.08x
    MVar:               OK (0.62s)
      1.20 ms ±  38 μs, 47.27x
    TMVar:              OK (0.12s)
      191  μs ± 7.4 μs, 7.54x
    TVar:               OK (0.14s)
      221  μs ± 6.6 μs, 8.71x
    Addr:               OK (0.75s)
      21.0 μs ± 228 ns, 0.83x
  Read/write contention with 1000 iterations and 4 threads
    Counter:            OK (0.17s)
      242  μs ± 6.7 μs
    IORef inconsistent: OK (0.15s)
      103  μs ± 2.7 μs, 0.43x
    IORef atomic:       OK (2.28s)
      1.71 ms ±  36 μs, 7.07x
    MVar:               OK (0.41s)
      12.9 ms ± 338 μs, 53.16x
    TMVar:              OK (0.29s)
      2.08 ms ±  65 μs, 8.61x
    TVar:               OK (0.17s)
      2.23 ms ±  73 μs, 9.21x
    Addr:               OK (0.58s)
      234  μs ± 1.9 μs, 0.97x
  Read/write contention with 10000 iterations and 4 threads
    Counter:            OK (0.17s)
      2.24 ms ±  60 μs
    IORef inconsistent: OK (9.34s)
      1.00 ms ±  31 μs, 0.45x
    IORef atomic:       OK (70.87s)
      22.0 ms ± 820 μs, 9.80x
    MVar:               OK (0.74s)
      126  ms ± 1.9 ms, 56.07x
    TMVar:              OK (0.33s)
      19.0 ms ± 670 μs, 8.48x
    TVar:               OK (0.09s)
      20.8 ms ± 732 μs, 9.27x
    Addr:               OK (0.18s)
      2.45 ms ±  97 μs, 1.09x
  Read/write contention with 10 iterations and 6 threads
    Counter:            OK (6.77s)
      23.9 μs ± 813 ns
    IORef inconsistent: OK (0.42s)
      23.4 μs ± 712 ns, 0.98x
    IORef atomic:       OK (0.70s)
      39.4 μs ± 747 ns, 1.65x
    MVar:               OK (1.32s)
      50.0 μs ± 1.0 μs, 2.09x
    TMVar:              OK (0.52s)
      35.3 μs ± 449 ns, 1.47x
    TVar:               OK (3.93s)
      32.6 μs ±  93 ns, 1.37x
    Addr:               OK (0.42s)
      23.4 μs ± 381 ns, 0.98x
  Read/write contention with 100 iterations and 6 threads
    Counter:            OK (1.20s)
      47.8 μs ± 866 ns
    IORef inconsistent: OK (0.24s)
      26.9 μs ± 983 ns, 0.56x
    IORef atomic:       OK (1.46s)
      483  μs ±  17 μs, 10.10x
    MVar:               OK (2.01s)
      1.86 ms ±  13 μs, 38.97x
    TMVar:              OK (0.49s)
      519  μs ± 4.5 μs, 10.86x
    TVar:               OK (14.78s)
      521  μs ±  14 μs, 10.91x
    Addr:               OK (0.56s)
      43.1 μs ± 1.6 μs, 0.90x
  Read/write contention with 1000 iterations and 6 threads
    Counter:            OK (13.77s)
      459  μs ± 8.8 μs
    IORef inconsistent: OK (0.23s)
      207  μs ± 4.1 μs, 0.45x
    IORef atomic:       OK (2.10s)
      3.48 ms ±  70 μs, 7.58x
    MVar:               OK (0.64s)
      19.3 ms ± 555 μs, 42.03x
    TMVar:              OK (1.07s)
      5.55 ms ± 161 μs, 12.08x
    TVar:               OK (0.27s)
      4.85 ms ± 189 μs, 10.56x
    Addr:               OK (27.53s)
      461  μs ± 4.8 μs, 1.00x
  Read/write contention with 10000 iterations and 6 threads
    Counter:            OK (0.52s)
      4.88 ms ± 142 μs
    IORef inconsistent: OK (0.25s)
      2.25 ms ±  24 μs, 0.46x
    IORef atomic:       OK (66.59s)
      75.5 ms ±  11 ms, 15.46x
    MVar:               OK (2.85s)
      193  ms ± 4.2 ms, 39.54x
    TMVar:              OK (0.16s)
      54.9 ms ± 1.6 ms, 11.25x
    TVar:               OK (10.04s)
      47.1 ms ± 339 μs, 9.64x
    Addr:               OK (0.52s)
      4.94 ms ±  89 μs, 1.01x
  Read/write contention with 10 iterations and 8 threads
    Counter:            OK (1.84s)
      27.3 μs ± 1.0 μs
    IORef inconsistent: OK (1.98s)
      27.5 μs ± 1.0 μs, 1.01x
    IORef atomic:       OK (0.91s)
      56.1 μs ± 787 ns, 2.06x
    MVar:               OK (0.69s)
      94.9 μs ± 3.2 μs, 3.48x
    TMVar:              OK (1.35s)
      56.1 μs ± 1.5 μs, 2.06x
    TVar:               OK (0.36s)
      51.7 μs ± 1.5 μs, 1.89x
    Addr:               OK (0.24s)
      26.3 μs ± 672 ns, 0.96x
  Read/write contention with 100 iterations and 8 threads
    Counter:            OK (1.41s)
      63.1 μs ± 1.2 μs
    IORef inconsistent: OK (8.38s)
      33.3 μs ± 503 ns, 0.53x
    IORef atomic:       OK (0.59s)
      709  μs ±  21 μs, 11.25x
    MVar:               OK (1.21s)
      2.31 ms ±  49 μs, 36.65x
    TMVar:              OK (4.67s)
      667  μs ±  18 μs, 10.58x
    TVar:               OK (4.81s)
      690  μs ± 6.5 μs, 10.94x
    Addr:               OK (0.69s)
      59.6 μs ± 1.7 μs, 0.94x
  Read/write contention with 1000 iterations and 8 threads
    Counter:            OK (0.16s)
      583  μs ±  13 μs
    IORef inconsistent: OK (1.13s)
      304  μs ± 3.5 μs, 0.52x
    IORef atomic:       OK (93.87s)
      7.12 ms ± 158 μs, 12.22x
    MVar:               OK (1.54s)
      24.4 ms ± 541 μs, 41.83x
    TMVar:              OK (0.09s)
      6.32 ms ± 241 μs, 10.84x
    TVar:               OK (0.67s)
      6.36 ms ± 221 μs, 10.92x
    Addr:               OK (0.56s)
      540  μs ±  20 μs, 0.93x
  Read/write contention with 10000 iterations and 8 threads
    Counter:            OK (5.13s)
      5.67 ms ±  68 μs
    IORef inconsistent: OK (0.16s)
      3.14 ms ±  69 μs, 0.55x
    IORef atomic:       OK (35.04s)
      148  ms ± 319 μs, 26.14x
    MVar:               OK (1.58s)
      230  ms ± 8.7 ms, 40.61x
    TMVar:              OK (26.99s)
      65.9 ms ± 4.1 ms, 11.62x
    TVar:               OK (3.41s)
      66.6 ms ± 779 μs, 11.75x
    Addr:               OK (0.16s)
      5.84 ms ± 164 μs, 1.03x
  Read/write contention with 10 iterations and 12 threads
    Counter:            OK (0.54s)
      30.5 μs ± 452 ns
    IORef inconsistent: OK (0.29s)
      32.4 μs ± 820 ns, 1.06x
    IORef atomic:       OK (7.93s)
      70.3 μs ± 436 ns, 2.30x
    MVar:               OK (0.44s)
      232  μs ± 8.7 μs, 7.59x
    TMVar:              OK (13.74s)
      80.4 μs ± 2.3 μs, 2.64x
    TVar:               OK (1.73s)
      79.2 μs ± 2.7 μs, 2.60x
    Addr:               OK (0.26s)
      29.6 μs ± 1.0 μs, 0.97x
  Read/write contention with 100 iterations and 12 threads
    Counter:            OK (0.49s)
      98.0 μs ± 1.1 μs
    IORef inconsistent: OK (0.66s)
      48.2 μs ± 561 ns, 0.49x
    IORef atomic:       OK (0.88s)
      1.07 ms ±  39 μs, 10.89x
    MVar:               OK (0.86s)
      3.18 ms ±  61 μs, 32.48x
    TMVar:              OK (0.21s)
      1.00 ms ±  21 μs, 10.19x
    TVar:               OK (0.12s)
      1.06 ms ±  29 μs, 10.77x
    Addr:               OK (0.13s)
      94.0 μs ± 2.7 μs, 0.96x
  Read/write contention with 1000 iterations and 12 threads
    Counter:            OK (6.78s)
      955  μs ± 5.9 μs
    IORef inconsistent: OK (0.22s)
      466  μs ±  16 μs, 0.49x
    IORef atomic:       OK (2.46s)
      11.1 ms ± 109 μs, 11.62x
    MVar:               OK (1.14s)
      34.3 ms ± 923 μs, 35.98x
    TMVar:              OK (8.60s)
      10.7 ms ± 226 μs, 11.22x
    TVar:               OK (4.18s)
      9.82 ms ±  42 μs, 10.29x
    Addr:               OK (0.42s)
      1.00 ms ±  40 μs, 1.05x
  Read/write contention with 10000 iterations and 12 threads
    Counter:            OK (1.99s)
      9.45 ms ± 116 μs
    IORef inconsistent: OK (7.78s)
      4.74 ms ± 7.4 μs, 0.50x
    IORef atomic:       OK (82.99s)
      304  ms ±  66 ms, 32.15x
    MVar:               OK (11.29s)
      343  ms ± 8.3 ms, 36.34x
    TMVar:              OK (0.33s)
      116  ms ± 1.9 ms, 12.23x
    TVar:               OK (10.63s)
      105  ms ± 1.5 ms, 11.15x
    Addr:               OK (3.97s)
      9.47 ms ± 361 μs, 1.00x
  Read/write contention with 10 iterations and 16 threads
    Counter:            OK (0.20s)
      42.1 μs ± 1.3 μs
    IORef inconsistent: OK (0.18s)
      40.5 μs ± 1.4 μs, 0.96x
    IORef atomic:       OK (1.08s)
      153  μs ± 662 ns, 3.63x
    MVar:               OK (1.56s)
      408  μs ± 9.1 μs, 9.69x
    TMVar:              OK (2.43s)
      128  μs ± 1.9 μs, 3.04x
    TVar:               OK (0.16s)
      135  μs ± 5.0 μs, 3.19x
    Addr:               OK (0.32s)
      40.6 μs ± 940 ns, 0.96x
  Read/write contention with 100 iterations and 16 threads
    Counter:            OK (0.32s)
      131  μs ± 2.1 μs
    IORef inconsistent: OK (1.68s)
      75.4 μs ± 1.6 μs, 0.58x
    IORef atomic:       OK (4.05s)
      1.27 ms ±  45 μs, 9.70x
    MVar:               OK (4.53s)
      4.39 ms ±  85 μs, 33.50x
    TMVar:              OK (0.16s)
      1.48 ms ±  40 μs, 11.30x
    TVar:               OK (1.20s)
      1.45 ms ± 5.5 μs, 11.06x
    Addr:               OK (4.85s)
      131  μs ± 1.3 μs, 1.00x
  Read/write contention with 1000 iterations and 16 threads
    Counter:            OK (0.15s)
      1.05 ms ±  32 μs
    IORef inconsistent: OK (0.28s)
      632  μs ±  12 μs, 0.60x
    IORef atomic:       OK (53.85s)
      16.3 ms ± 1.5 ms, 15.58x
    MVar:               OK (2.98s)
      46.6 ms ± 672 μs, 44.57x
    TMVar:              OK (0.09s)
      15.2 ms ± 472 μs, 14.54x
    TVar:               OK (0.36s)
      14.6 ms ± 354 μs, 13.96x
    Addr:               OK (0.15s)
      1.38 ms ±  47 μs, 1.32x
  Read/write contention with 10000 iterations and 16 threads
    Counter:            OK (0.17s)
      13.3 ms ± 301 μs
    IORef inconsistent: OK (0.16s)
      5.73 ms ± 119 μs, 0.43x
    IORef atomic:       OK (117.97s)
      464  ms ±  38 ms, 34.81x
    MVar:               OK (14.95s)
      481  ms ± 4.5 ms, 36.11x
    TMVar:              OK (1.80s)
      155  ms ± 2.7 ms, 11.65x
    TVar:               OK (7.20s)
      146  ms ± 663 μs, 10.93x
    Addr:               OK (5.34s)
      12.8 ms ± 187 μs, 0.96x
  Read/write contention with 10 iterations and 20 threads
    Counter:            OK (0.19s)
      49.3 μs ± 1.9 μs
    IORef inconsistent: OK (0.20s)
      48.2 μs ± 1.7 μs, 0.98x
    IORef atomic:       OK (5.11s)
      188  μs ± 5.2 μs, 3.82x
    MVar:               OK (4.08s)
      536  μs ± 5.4 μs, 10.86x
    TMVar:              OK (1.44s)
      159  μs ± 531 ns, 3.23x
    TVar:               OK (0.37s)
      176  μs ± 3.0 μs, 3.57x
    Addr:               OK (0.82s)
      50.6 μs ± 1.0 μs, 1.03x
  Read/write contention with 100 iterations and 20 threads
    Counter:            OK (0.10s)
      171  μs ± 5.7 μs
    IORef inconsistent: OK (0.25s)
      90.0 μs ± 2.1 μs, 0.52x
    IORef atomic:       OK (9.07s)
      1.52 ms ±  59 μs, 8.87x
    MVar:               OK (2.90s)
      5.61 ms ± 159 μs, 32.73x
    TMVar:              OK (1.57s)
      1.85 ms ±  54 μs, 10.77x
    TVar:               OK (0.19s)
      1.88 ms ±  28 μs, 10.98x
    Addr:               OK (0.18s)
      169  μs ± 3.8 μs, 0.98x
  Read/write contention with 1000 iterations and 20 threads
    Counter:            OK (0.74s)
      1.68 ms ±  65 μs
    IORef inconsistent: OK (0.18s)
      881  μs ±  24 μs, 0.52x
    IORef atomic:       OK (73.70s)
      21.3 ms ± 600 μs, 12.66x
    MVar:               OK (0.94s)
      61.0 ms ± 1.4 ms, 36.31x
    TMVar:              OK (7.19s)
      18.1 ms ± 472 μs, 10.75x
    TVar:               OK (7.26s)
      18.1 ms ± 494 μs, 10.79x
    Addr:               OK (0.09s)
      1.74 ms ±  43 μs, 1.04x
  Read/write contention with 10000 iterations and 20 threads
    Counter:            OK (6.72s)
      16.2 ms ± 174 μs
    IORef inconsistent: OK (6.36s)
      7.73 ms ± 247 μs, 0.48x
    IORef atomic:       OK (21.41s)
      641  ms ± 264 μs, 39.55x
    MVar:               OK (38.82s)
      609  ms ±  21 ms, 37.53x
    TMVar:              OK (4.56s)
      192  ms ± 648 μs, 11.82x
    TVar:               OK (1.13s)
      193  ms ± 7.5 ms, 11.91x
    Addr:               OK (0.43s)
      16.8 ms ± 581 μs, 1.03x
  Read/write contention with 10 iterations and 32 threads
    Counter:            OK (1.01s)
      73.7 μs ± 1.3 μs
    IORef inconsistent: OK (0.51s)
      72.0 μs ± 777 ns, 0.98x
    IORef atomic:       OK (0.24s)
      178  μs ± 3.8 μs, 2.41x
    MVar:               OK (0.91s)
      919  μs ±  17 μs, 12.47x
    TMVar:              OK (2.27s)
      277  μs ± 4.4 μs, 3.76x
    TVar:               OK (0.15s)
      299  μs ±  10 μs, 4.06x
    Addr:               OK (0.50s)
      71.8 μs ± 2.6 μs, 0.97x
  Read/write contention with 100 iterations and 32 threads
    Counter:            OK (0.29s)
      280  μs ± 6.4 μs
    IORef inconsistent: OK (1.44s)
      150  μs ± 3.2 μs, 0.53x
    IORef atomic:       OK (0.25s)
      1.74 ms ±  56 μs, 6.22x
    MVar:               OK (1.19s)
      9.06 ms ± 269 μs, 32.31x
    TMVar:              OK (4.80s)
      2.98 ms ±  36 μs, 10.64x
    TVar:               OK (19.94s)
      3.10 ms ±  27 μs, 11.04x
    Addr:               OK (0.55s)
      272  μs ± 9.4 μs, 0.97x
  Read/write contention with 1000 iterations and 32 threads
    Counter:            OK (2.27s)
      2.69 ms ±  48 μs
    IORef inconsistent: OK (2.18s)
      1.27 ms ±  23 μs, 0.47x
    IORef atomic:       OK (62.21s)
      35.2 ms ± 4.3 ms, 13.08x
    MVar:               OK (5.90s)
      89.9 ms ± 233 μs, 33.44x
    TMVar:              OK (2.96s)
      30.7 ms ± 470 μs, 11.40x
    TVar:               OK (5.98s)
      30.4 ms ± 124 μs, 11.30x
    Addr:               OK (0.29s)
      2.75 ms ±  93 μs, 1.02x
  Read/write contention with 10000 iterations and 32 threads
    Counter:            OK (0.08s)
      28.8 ms ± 1.0 ms
    IORef inconsistent: OK (0.32s)
      12.9 ms ± 351 μs, 0.45x
    IORef atomic:       OK (82.13s)
      1.180 s ±  65 ms, 41.02x
    MVar:               OK (128.00s)
      982  ms ±  14 ms, 34.15x
    TMVar:              OK (0.17s)
      299  ms ± 7.6 ms, 10.41x
    TVar:               OK (7.51s)
      316  ms ±  12 ms, 11.00x
    Addr:               OK (1.36s)
      27.0 ms ± 871 μs, 0.94x
  Read/write contention with 10 iterations and 64 threads
    Counter:            OK (0.84s)
      132  μs ± 3.3 μs
    IORef inconsistent: OK (0.44s)
      132  μs ± 2.4 μs, 1.00x
    IORef atomic:       OK (0.18s)
      323  μs ± 6.5 μs, 2.45x
    MVar:               OK (0.47s)
      1.89 ms ±  74 μs, 14.33x
    TMVar:              OK (4.18s)
      557  μs ± 1.7 μs, 4.22x
    TVar:               OK (2.19s)
      575  μs ± 1.6 μs, 4.36x
    Addr:               OK (0.19s)
      125  μs ± 2.8 μs, 0.95x
  Read/write contention with 100 iterations and 64 threads
    Counter:            OK (0.28s)
      568  μs ±  13 μs
    IORef inconsistent: OK (0.32s)
      284  μs ± 2.9 μs, 0.50x
    IORef atomic:       OK (0.21s)
      2.62 ms ±  95 μs, 4.62x
    MVar:               OK (2.37s)
      18.1 ms ± 688 μs, 31.81x
    TMVar:              OK (0.15s)
      6.24 ms ± 173 μs, 10.98x
    TVar:               OK (5.00s)
      6.30 ms ± 162 μs, 11.08x
    Addr:               OK (2.06s)
      555  μs ± 7.4 μs, 0.98x
  Read/write contention with 1000 iterations and 64 threads
    Counter:            OK (0.59s)
      5.80 ms ± 111 μs
    IORef inconsistent: OK (2.10s)
      2.49 ms ±  96 μs, 0.43x
    IORef atomic:       OK (2.36s)
      75.8 ms ± 396 μs, 13.06x
    MVar:               OK (2.88s)
      188  ms ± 4.8 ms, 32.41x
    TMVar:              OK (5.95s)
      61.7 ms ± 176 μs, 10.63x
    TVar:               OK (1.52s)
      63.8 ms ± 2.4 ms, 11.00x
    Addr:               OK (0.28s)
      5.52 ms ± 167 μs, 0.95x
  Read/write contention with 10000 iterations and 64 threads
    Counter:            OK (2.74s)
      56.6 ms ± 2.2 ms
    IORef inconsistent: OK (10.26s)
      26.0 ms ± 746 μs, 0.46x
    IORef atomic:       OK (100.70s)
      3.018 s ± 328 ms, 53.35x
    MVar:               OK (30.56s)
      1.994 s ±  16 ms, 35.25x
    TMVar:              OK (0.36s)
      665  ms ± 3.4 ms, 11.76x
    TVar:               OK (1.79s)
      653  ms ± 2.4 ms, 11.55x
    Addr:               OK (11.06s)
      57.0 ms ± 546 μs, 1.01x
  Read/write contention with 10 iterations and 128 threads
    Counter:            OK (0.67s)
      240  μs ± 5.9 μs
    IORef inconsistent: OK (1.43s)
      247  μs ± 8.5 μs, 1.03x
    IORef atomic:       OK (1.38s)
      642  μs ± 6.2 μs, 2.68x
    MVar:               OK (1.92s)
      3.83 ms ±  34 μs, 15.94x
    TMVar:              OK (16.12s)
      1.11 ms ±  23 μs, 4.63x
    TVar:               OK (4.27s)
      1.15 ms ±  36 μs, 4.78x
    Addr:               OK (0.16s)
      230  μs ± 6.9 μs, 0.96x
  Read/write contention with 100 iterations and 128 threads
    Counter:            OK (0.26s)
      1.15 ms ±  42 μs
    IORef inconsistent: OK (0.31s)
      560  μs ±  18 μs, 0.49x
    IORef atomic:       OK (0.31s)
      7.80 ms ± 274 μs, 6.81x
    MVar:               OK (0.55s)
      36.2 ms ± 1.2 ms, 31.54x
    TMVar:              OK (19.40s)
      12.4 ms ±  20 μs, 10.78x
    TVar:               OK (0.62s)
      13.1 ms ± 411 μs, 11.39x
    Addr:               OK (0.13s)
      1.12 ms ±  24 μs, 0.98x
  Read/write contention with 1000 iterations and 128 threads
    Counter:            OK (0.15s)
      8.93 ms ± 264 μs
    IORef inconsistent: OK (0.52s)
      5.28 ms ± 104 μs, 0.59x
    IORef atomic:       OK (80.71s)
      310  ms ±  57 ms, 34.68x
    MVar:               OK (5.86s)
      378  ms ± 3.1 ms, 42.38x
    TMVar:              OK (1.51s)
      131  ms ± 3.5 ms, 14.62x
    TVar:               OK (3.07s)
      129  ms ± 981 μs, 14.44x
    Addr:               OK (4.47s)
      11.3 ms ± 139 μs, 1.27x
  Read/write contention with 10000 iterations and 128 threads
    Counter:            OK (1.37s)
      112  ms ± 3.3 ms
    IORef inconsistent: OK (5.13s)
      53.3 ms ± 571 μs, 0.47x
    IORef atomic:       OK (11.14s)
      6.952 s ±  54 ms, 61.84x
    MVar:               OK (12.70s)
      4.075 s ± 154 ms, 36.25x
    TMVar:              OK (3.51s)
      1.268 s ± 5.7 ms, 11.27x
    TVar:               OK (0.71s)
      1.306 s ±  50 ms, 11.62x
    Addr:               OK (11.18s)
      117  ms ± 1.7 ms, 1.04x

All 309 tests passed (2397.33s)

Memory overhead

In pure Haskell (i.e. with the no-cmm flag enabled) each unlifted value of type Counter is a singleton mutable array from GHC primitives under the hood. It therefore occupies at least the platform integer size plus the array size field plus the header, which should typically be at least 3 machine words. Lifted values may occupy more, depending on optimizations.
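The idea of a one-element mutable array updated with atomic primops can be sketched as follows. This is an illustration under assumptions, not the package's source: the names `Cell`, `newCell`, `addCell` and `readCell` are hypothetical, and the choice of a byte array with `fetchAddIntArray#` is one plausible way to build such a cell from GHC primitives.

```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}

import Foreign.Storable (sizeOf)
import GHC.Exts
  ( Int (I#), MutableByteArray#, RealWorld
  , atomicReadIntArray#, fetchAddIntArray#, newByteArray#, writeIntArray# )
import GHC.IO (IO (..))

-- | Hypothetical cell: a byte array with room for exactly one Int.
data Cell = Cell (MutableByteArray# RealWorld)

-- | Allocate the array and store the initial value at index 0.
newCell :: Int -> IO Cell
newCell (I# initial) =
  case sizeOf (0 :: Int) of
    I# bytes -> IO $ \s0 ->
      case newByteArray# bytes s0 of
        (# s1, arr #) ->
          case writeIntArray# arr 0# initial s1 of
            s2 -> (# s2, Cell arr #)

-- | Atomically add to the cell and return the value it held before.
addCell :: Cell -> Int -> IO Int
addCell (Cell arr) (I# delta) = IO $ \s0 ->
  case fetchAddIntArray# arr 0# delta s0 of
    (# s1, old #) -> (# s1, I# old #)

-- | Atomically read the current value.
readCell :: Cell -> IO Int
readCell (Cell arr) = IO $ \s0 ->
  case atomicReadIntArray# arr 0# s0 of
    (# s1, n #) -> (# s1, I# n #)

main :: IO ()
main = do
  c <- newCell 10
  old <- addCell c 5
  new <- readCell c
  print (old, new)
```

The integer lives directly in the array payload, so no boxed Int is allocated per update; the array's header and size field account for the extra words mentioned above.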

By default CMM is enabled, which should save one word of overhead: there is no array any more, hence no need to store the trivial size.