Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instance Arbitrary Double uses lots of memory with high sizes #418

Open
ChickenProp opened this issue Dec 31, 2024 · 2 comments · May be fixed by #419
Open

Instance Arbitrary Double uses lots of memory with high sizes #418

ChickenProp opened this issue Dec 31, 2024 · 2 comments · May be fixed by #419

Comments

@ChickenProp
Copy link

ChickenProp commented Dec 31, 2024

My code has been doing scale (^ 8) to generate Doubles in between ±10^16 instead of in between ±100. But at that size, memory usage is very high since QuickCheck-2.14.3.

$ ghci -package QuickCheck-2.14.2 +RTS -M100M -RTS
GHCi, version 9.6.6: https://www.haskell.org/ghc/  :? for help
ghci> import Test.QuickCheck
ghci> flip traverse [1..100] $ \_ -> length . show <$> generate (scale (const 100_000_000) $ arbitrary @Double)
[20,20,19,19,19,19,20,20,21,20,20,19,20,17,20,19,21,19,20,21,18,19,20,20,20,18,20,19,20,19,19,18,19,21,20,19,18,19,20,21,20,19,20,19,19,19,18,20,19,19,21,19,19,17,20,19,16,21,20,19,18,19,20,20,19,19,21,20,20,21,20,20,19,19,19,20,20,19,19,19,19,20,19,19,17,20,20,20,20,20,18,19,20,19,19,19,20,20,20,19]

$ ghci -package QuickCheck-2.14.3 +RTS -M100M -RTS
GHCi, version 9.6.6: https://www.haskell.org/ghc/  :? for help
ghci> import Test.QuickCheck
ghci> flip traverse [1..100] $ \_ -> length . show <$> generate (scale (const 100_000_000) $ arbitrary @Double)
[20,*** Exception: heap overflow

(But I've also seen the heap overflow with -M4G.)

I guess this is due to smallDenominators introduced in #297, which picks a number up to size and then uses O(n) time and memory to index into a tree of rationals.

Some possibilities that come to mind:

  • Specify that Arbitrary instances might not expect sizes that large, and users need to use other generators if they want that kind of thing. I don't think it would cause me big problems, but I don't love the idea. ("I want bigger numbers -> I'll use a bigger size" is pretty natural, which is how I came across this problem; and if the numbers are embedded in some other data structure, specifying another generator might be awkward.)
  • In smallDenominators, limit the size to some relatively small max.
  • It's possible to calculate the tree values without enumerating the tree. I guess this would make it O(log n) time and memory (effectively O(1) because size is Int not Integer). I expect this would be a lot faster and more memory efficient for large n. If it's too slow for small n, then "build the first _ tree values, calculate directly from then on" could work.
@phadej
Copy link
Contributor

phadej commented Jan 4, 2025

I can only say that size argument in QC is very much ad-hoc and particular Arbitrary instance specific. Without "breaking" (i.e. significantly changing) existing code, you just need to know how particular instance is behaving. This is especially true for compositional instances. Try to generate [[[[Int]]]]], it will blow up memory even at small sizes. In particular you cannot hint "generate small lists with big ints" nor "generate small lists of big ints"; the size is not "budgeted", it's copied.

That said, the

In smallDenominators, limit the size to some relatively small max.

feels the best solution. I.e. fix the particular instance in ad-hoc way. It's ad-hoc instance (size-wise) anyway.

ChickenProp added a commit to proda-ai/quickcheck that referenced this issue Jan 6, 2025
    ghci> flip traverse [1..100] $ \_ -> length . show <$> generate (scale (const 100000000) $ (arbitrary :: Gen Double))

on master, this is slow and memory hungry. (I couldn't figure out how to
pass RTS options to ghci using `cabal repl`, but I've seen it take over
5 minutes and 10 GB.) With this commit it's fast.

Closes nick8325#418.
@ChickenProp
Copy link
Author

Nod. I've opened #419 for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants