
Add function for generating Float/Double in [0,1] range #149

Merged
merged 11 commits into from
Jun 15, 2020

Conversation

@Shimuuar Shimuuar commented Jun 8, 2020

There are functions for Float and Double. They come in two variants: one that
samples in the [0,1] interval and one that samples in (0,1]. The former saves a
few floating point operations when one needs values in the [0,1] range, which is
a very commonly used primitive.

The latter is important when the output of the generator is fed into a function
which is not defined at 0. This is again very common. For Float, sampling zero
is feasible. For Double it's improbable, but it could in principle happen with a
bad generator/seed combination or with malicious generator initialization.

Created as a result of the discussion in haskell#62
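The [0,1] sampling scheme under discussion can be sketched roughly like this (a hypothetical standalone function for illustration, not the exact code in this PR):

```haskell
import Data.Word (Word64)

-- Rough sketch of the [0,1] variant: map a uniformly drawn Word64
-- onto [0,1] by dividing by the largest Word64 value. Both endpoints
-- are reachable, so 0 can be produced.
word64ToDouble01 :: Word64 -> Double
word64ToDouble01 w = fromIntegral w / m
  where
    m = fromIntegral (maxBound :: Word64) :: Double
```

The smallest input that maps to a nonzero result is 1, which lands near 2^-64; that value is what the (0,1] variant's shift constant is derived from.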

src/System/Random/Internal.hs (outdated; resolved)
m = fromIntegral (maxBound :: Word64) :: Double
{-# INLINE word64ToDoubleInUnitInterval #-}
Collaborator
Have you run the benchmarks on this? I have a feeling that uniformDouble01M might need to be INLINE to avoid a regression, but I'm not sure.

Author

Yes. No change at all. I played a bit with strictness, but I wasn't able to find any measurable change.

src/System/Random/Internal.hs (outdated; resolved)
src/System/Random/Internal.hs (outdated; resolved)
src/System/Random/Internal.hs (outdated; resolved)

-- | Generate uniformly distributed 'Double' in the range (0, 1]. That
-- is, the result is guaranteed to be positive.
uniformDoublePos01M :: StatefulGen g m => g -> m Double
Collaborator

When I see "pos", I think "position". How about uniformDoublePositive01M? Longer but unambiguous.

Additional thought: "positive 0" actually refers to +0, as opposed to "negative zero", -0, in IEEE 754. So people might think that something called uniformDoublePos01M or uniformDoublePositive01M includes positive zero. To avoid "positive" and "0" following each other in the name of the function, perhaps the scheme could be

  • uniformDouble01
  • uniformDouble01Positive
  • uniformFloat01
  • uniformFloat01Positive

Comment on lines +732 to +734
-- We add small constant to shift generated value from zero. It's
-- selected as 1/2 of smallest possible nonzero value
d = 2.710505431213761e-20 -- 2**(-65)
Collaborator

This is the largest value that can be added to 1.0 such that the result is still 1.0. There are many smaller values that also have this property and that would also result in never getting 0 as a result of this function. My guess is that the goal here was to avoid "excessively small values" rather than just avoiding zero, is that correct? If so, I would suggest extending the description a little to justify why this is 2^-64 and not e.g. 5e-324, which would also avoid getting zero but would not exclude very small numbers from being returned.

Author

It's a good question.

No. It's 1/2 of the smallest nonzero value that this algorithm can generate. The largest ε such that 1+ε=1 is 2^{-53}. The choice is somewhat arbitrary. Another possibility is to pick ε·2^{-64}, that is, the largest number that will leave all other values unchanged.
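The ε figure can be checked directly (a quick sketch, assuming standard IEEE 754 binary64 with round-to-nearest-even):

```haskell
-- 2^-53 is the largest value that, added to 1.0, still rounds back
-- to 1.0 (the tie goes to the even mantissa); machine epsilon 2^-52
-- does not. Using (^^) keeps the powers of two exact.
eps53, eps52 :: Double
eps53 = 2 ^^ (-53)
eps52 = 2 ^^ (-52)
```

Evaluating `1 + eps53 == 1` and `1 + eps52 == 1` in GHCi distinguishes the two cases.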

Collaborator

Right, thanks for the correction.

I guess the question is: under what circumstances do people want to avoid getting a zero here, and under those circumstances, would they also want to avoid getting extremely small values?

My default here would be to exclude only the zero. Everything else should have a good reason IMO, since excluding more numbers results in a larger effect on the outcome distribution.

Author

When the result is fed into a function that isn't defined at 0. The most common choices are log and 1/xⁿ. For log, anything nonzero works. For 1/x it's better to have some distance between underflow and 1/2^64. Denormalized numbers are sort of useless here, since 1/x will yield the same ∞ as it does for zero.

ε·2^{-64} looks like a good choice, since it doesn't change the rest of the distribution and is still quite far from underflow.
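A rough standalone sketch of such a shifted (0,1] variant (hypothetical function name; the constant mirrors the d from the diff under review):

```haskell
import Data.Word (Word64)

-- Sketch of the (0,1] variant: shift the [0,1] result away from zero
-- by d = 2^-65, i.e. half the smallest nonzero value the division
-- algorithm can produce (~1/2^64), so the result is always positive.
-- Adding d to 1.0 rounds back to 1.0, so the upper endpoint is kept.
word64ToDoublePositive01 :: Word64 -> Double
word64ToDoublePositive01 w = d + fromIntegral w / m
  where
    m = fromIntegral (maxBound :: Word64) :: Double
    d = 2.710505431213761e-20  -- 2^(-65)
```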

Collaborator

1/2 of smallest possible nonzero value

I don't think that's right. The smallest positive double, or the smallest absolute value that is non-zero, is 5e-324. So "1/2 of smallest possible nonzero value" would be 2.5e-324.

I don't have a very strong opinion on what this number should be, but the comment should be updated.

Owner

I think I can explain why what @Shimuuar suggests is actually very good. Suppose you have floating point numbers with only 2 bits for the mantissa and 2 bits for the exponent; then, using the proposed algorithm:

word32ToFloatInUnitInterval :: Word32 -> Float
word32ToFloatInUnitInterval w32 = f / m
  where
    f = fromIntegral w32 :: Float
    m = fromIntegral (maxBound :: Word32) :: Float

and manually calculating for our tiny floating point values

map ((/4) . fromIntegral) [0,1,2,3]
[0.0,0.25,0.5,0.75]

The smallest value the RNG algorithm can generate is 0.25 and half of this is 0.125 so we will get

map ((+(0.25/2)) . (/4) . fromIntegral) [0,1,2,3]
[0.125,0.375,0.625,0.875]

which has the nice property that it is symmetrical in the interval. N.B. these values really are available in the floating point representation:

normals :: [Double]
normals = [ 1/2^e * (1 + m0 * (1/2^1) + m1 * (1/2^2)) | e <- [2,1], m0 <- [0,1], m1 <- [0,1] ]
[0.25,0.3125,0.375,0.4375,0.5,0.625,0.75,0.875]

so no rounding is happening (that was what I was worried about).

Owner

Of course this nice property doesn't really manifest itself with Double and even Float - you would have to sample a lot of Double to notice the mean is now correct :)

Collaborator

Ah okay that clears it up.

1/2 of smallest possible nonzero value

"smallest possible" here does not refer to the IEEE 754 encoding but to the division algorithm we use to generate doubles. Cool. Let's put that in the comment.

For concreteness' sake, here is my suggestion:

Add a small constant shift away from zero. This constant is 1/2 of the smallest positive value generated by 'uniformDouble01M'.

(And something similar for Float).

@idontgetoutmuch idontgetoutmuch dismissed their stale review June 10, 2020 15:47

I want to check how this changes this distribution for 8 bit floating point numbers just to convince myself.

Collaborator

@lehins lehins left a comment

I can only recommend adding two minor things:

  • A note in the haddock for uniformFloat01M/uniformDouble01M saying that they are used for implementing uniformM for Float/Double respectively.
  • @since 1.2.0 for all newly added functions in this PR

@Shimuuar
Author

@lehins I've added @since annotations and expanded the documentation for uniform{Float,Double}01M. I left the positive variants as they are until we figure out how they should work.

Unrelated. Should we add @since for all functions added in 1.2?

@lehins
Collaborator

lehins commented Jun 10, 2020

Should we add @since for all functions added in 1.2?

Yes, we should add it for all new functions except the ones that are only exported from the Internal module. As far as I know this is already the case, but of course it is possible that we missed some.

@lehins
Collaborator

lehins commented Jun 10, 2020

lol. The @SInCE guy probably hates all the Haskellers by now for tagging him in all the issues on GitHub 😄

@Shimuuar
Author

Instances certainly don't have such annotations.

It could've been worse. Like being tagged by java programmers.

Owner

@idontgetoutmuch idontgetoutmuch left a comment

@Shimuuar I assume the extra constant addition has almost no effect on performance?

@Shimuuar
Author

I'll add benchmarks for the new generators and we'll compare them with uniformRM.

@Shimuuar
Author

The first round of benchmarks is very confusing. It seems we mostly measure the overhead of threading state:

unbounded/Float                         mean 50.07 μs  ( +- 902.2 ns  )
unbounded/Double                        mean 24.64 μs  ( +- 156.8 ns  )
floating/IO/uniformFloat01M             mean 49.29 μs  ( +- 253.2 ns  )
floating/IO/uniformFloatPositive01M     mean 49.76 μs  ( +- 410.6 ns  )
floating/IO/uniformDouble01M            mean 49.21 μs  ( +- 294.3 ns  )
floating/IO/uniformDoublePositive01M    mean 49.56 μs  ( +- 532.6 ns  )
floating/St/uniformFloat01M             mean 73.86 μs  ( +- 472.0 ns  )
floating/St/uniformFloatPositive01M     mean 73.53 μs  ( +- 276.9 ns  )
floating/St/uniformDouble01M            mean 73.31 μs  ( +- 300.2 ns  )
floating/St/uniformDoublePositive01M    mean 73.51 μs  ( +- 303.9 ns  )

The benchmark for uniformRM runs twice as fast as the benchmark for uniformDouble01M, despite the fact that it's defined using uniformDouble01M. Running the benchmark in the State monad is 1.5 times slower than running it in StateT IO.

Even worse, benchmark runtimes jump around randomly when Main.hs changes. A few benchmarks were added, and yet slow benchmarks became fast and fast ones became slow.

pure/uniformR/unbounded/Float            mean 24.82 μs  ( +- 1.546 μs  )
pure/uniformR/unbounded/Double           mean 49.23 μs  ( +- 290.0 ns  )
-----
pure/uniformR/unbounded/Float            mean 50.07 μs  ( +- 902.2 ns  )
pure/uniformR/unbounded/Double           mean 24.64 μs  ( +- 156.8 ns  )

It seems that we're at the mercy of the optimizer here.

@Shimuuar
Copy link
Author

I've added benchmarks which thread state manually (via getMany) and the results are just as confusing. I played with INLINE pragmas.

-- NO INLINE
uniformFloat01M                 mean 49.66 μs  ( +- 916.7 ns  )
uniformFloatPositive01M         mean 25.04 μs  ( +- 376.6 ns  )
uniformDouble01M                mean 25.05 μs  ( +- 283.4 ns  )
uniformDoublePositive01M        mean 49.93 μs  ( +- 1.098 μs  )
-- INLINE
uniformFloat01M                 mean 24.74 μs  ( +- 178.5 ns  )
uniformFloatPositive01M         mean 24.68 μs  ( +- 185.8 ns  )
uniformDouble01M                mean 49.44 μs  ( +- 420.2 ns  )
uniformDoublePositive01M        mean 24.59 μs  ( +- 94.56 ns  )

What's more worrying, GHC is not able to cut through the state monad at -O1 and we get about a 10x slowdown (Haskell is usually compiled with -O1, not -O2):

IO/uniformFloat01M              mean 224.5 μs  ( +- 1.399 μs  )
IO/uniformFloatPositive01M      mean 229.2 μs  ( +- 1.536 μs  )
IO/uniformDouble01M             mean 228.9 μs  ( +- 5.127 μs  )
IO/uniformDoublePositive01M     mean 227.8 μs  ( +- 1.462 μs  )
St/uniformFloat01M              mean 220.6 μs  ( +- 1.263 μs  )
St/uniformFloatPositive01M      mean 219.7 μs  ( +- 856.4 ns  )
St/uniformDouble01M             mean 219.0 μs  ( +- 567.8 ns  )
St/uniformDoublePositive01M     mean 220.7 μs  ( +- 5.145 μs  )
pure/uniformFloat01M            mean 24.81 μs  ( +- 207.7 ns  )
pure/uniformFloatPositive01M    mean 24.81 μs  ( +- 200.5 ns  )
pure/uniformDouble01M           mean 49.05 μs  ( +- 172.8 ns  )
pure/uniformDoublePositive01M   mean 24.59 μs  ( +- 166.3 ns  )

@curiousleo
Collaborator

Those are very valuable findings! Benchmarking the IO and ST based APIs is sorely lacking right now. Ideally we would be able to benchmark the full [Pure, IO, ST] x Uniform instances matrix.

@idontgetoutmuch
Owner

What's more worrying GHC is not able to cut through state monad at -O1 and we get about 10x slowdown (haskell is usually compiled with -O1, not -O2):

I thought stack defaulted to -O2?

@Shimuuar
Author

I don't know about stack, but cabal uses -O by default. It also complains about -O2. Here is what cabal thinks about math-functions:

Warning: These warnings may cause trouble when distributing the package:
Warning: 'ghc-options: -O2' is rarely needed. Check that it is giving a real
benefit and not just imposing longer compile times on your users.

@lehins
Collaborator

lehins commented Jun 15, 2020

I don't know about stack but cabal uses -O by default.

Stack uses the Cabal library underneath, and it does the same thing by default: -O.

@curiousleo
Collaborator

I don't know about stack but cabal uses -O by default.

Stack uses Cabal the library underneath and it does the same thing by default: -O

Just to check my mental model: if I'm writing code in some project, and I depend on random, does my project have to be compiled with -O2 for inlining and other optimisations that we sort of rely on to take effect?

Related - do stack and cabal respect the ghc-options in the .cabal file of (transitive) dependencies? That is, what is the effect of adding -O2 here?

-Wall

@lehins
Collaborator

lehins commented Jun 15, 2020

It is bad practice to add optimization flags in the library section; in fact, Cabal will give you a warning about it if you do. It is the end executable that really matters. So if you have a project foo that depends on random, and foo defines some sort of executable component (benchmarks, tests or an executable), then you need to specify all the GHC flags relevant for your executable, including optimizations, and they will be applied to the random code as well.

In order for inlining to be triggered, -O is enough. I believe -O2 makes GHC a bit more aggressive with all its optimizations.
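As an illustration of where such flags belong (hypothetical project and stanza names), the flags go on the consuming executable's stanza rather than on a library:

```
-- in foo.cabal (hypothetical project); Cabal warns if a library
-- section sets -O2, so put it on the executable that matters:
executable foo-bench
  main-is:        Bench.hs
  build-depends:  base, random
  ghc-options:    -O2
```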

@Shimuuar
Author

I changed the flags in the benchmark but didn't touch the random library itself. So yes, client code should enable -O2.

It is a bad practice to add optimization flags in the library section

It depends. Anything that is not inlined into user code will not benefit from -O2, which may or may not be a problem.

P.S. I think this PR should be merged. It doesn't introduce any regressions; it only showed that we may have some performance problems unrelated to it.

Comment on lines +191 to +192
[ bench "uniformFloat01M" $ nfIO $ runStateGenT_ (mkStdGen 1337) $ \g ->
replicateM_ sz $ do !_ <- uniformFloat01M g
Collaborator

Is it possible to make this more generic, i.e. to run all benchmarks with runStateGenT_ rather than just this specific one? I would really like us to have a full benchmark matrix.

Author

Surely it's possible. However, I would prefer to do it in a separate PR, in order to do one thing at a time.

Collaborator

Cool. Filed #161.

@curiousleo
Collaborator

I'd still like to see #149 (comment) addressed. Otherwise LGTM!

@Shimuuar
Author

I've just updated the haddocks. Do you like the updated version?


@lehins
Collaborator

lehins commented Jun 15, 2020

Fancy :)

@curiousleo curiousleo merged commit 2ee2c58 into idontgetoutmuch:v1.2-proposal Jun 15, 2020
@curiousleo
Collaborator

@Shimuuar I squashed the commits but kept the commit message of the initial commit, see 2ee2c58.
