Why does random for Float and Double produce exactly 24 or 53 bits? #58

Zemyla · 2020-01-05T18:37:34Z

It seems like we could exploit the fact that those types have more precision closer to 0. The algorithm would look like this:

randomDouble :: RandomGen g => g -> (Double, g)
randomDouble = rr where
  b :: Word64
  b = bit 53
  mask = b - 1
  r = 1.0 / fromIntegral b

  rr g | g `seq` False = undefined
  rr g = case randomR (0, mask) g of
    (i, g') | testBit i 52 -> seq x (x, g') where
      x = r * fromIntegral i
    (0, g') -> go0 r 53 g'
    (i, g') -> let
      cs = countLeadingZeros i - 11
      in case randomR (0, bit cs - 1) g' of
        (k, g'') -> seq x (x, g'') where
          x = r * fromIntegral (unsafeShiftL i cs .|. k) / fromIntegral ((bit cs) :: Word64)

  go0 rc sh g | rc `seq` sh `seq` g `seq` False = undefined
  -- Stop before hitting denormals, because those are a pain.
  go0 rc sh g | sh >= 1022 = (0.0, g)
  go0 rc sh g = case randomR (0, mask) g of
    (i, g') | testBit i 52 -> seq x (x, g') where
      x = rc * fromIntegral i
    (0, g') -> go0 (rc * r) (sh + 53) g'
    (i, g')
      | sh + cs >= 1022 -> (0.0, g')
      | otherwise -> case randomR (0, bit cs - 1) g' of
          (k, g'') -> seq x (x, g'') where
            x = rc * fromIntegral (unsafeShiftL i cs .|. k) / fromIntegral ((bit cs) :: Word64)
      where
        cs = countLeadingZeros i - 11

And then randomRFloating could be defined to take advantage of the increased precision near 0:

randomRFloating :: (Fractional a, Ord a, Random a, RandomGen g) => (a, a) -> g -> (a, g)
randomRFloating = rrf0 where
  rrf0 (l, h) g = case compare l h of
    LT -> rrf l h g
    EQ -> (l, g)
    GT -> rrf h l g

  rrf l h g | l `seq` h `seq` g `seq` False = undefined
  rrf l h g | l >= 0 = case random g of
    (coef, g') -> seq x (x, g') where
      x = 2.0 * (0.5 * l + coef * (0.5 * h - 0.5 * l))
  rrf l h g | h <= 0 = case random g of
    (coef, g') -> seq x (x, g') where
      x = 2.0 * (0.5 * h + coef * (0.5 * l - 0.5 * h))
  -- Here, l < 0 < h. We randomly choose one side and then generate a random number on that side.
  rrf l h g = let
    rdiv = 1 - toRational h / toRational l
    in seq rdiv $ case randomR (0, denominator rdiv - 1) g of
      (i, g') | i < numerator rdiv -> go g' where
        -- Don't generate 0 on the lower end.
        go gc = case random gc of
          (r, gc')
            | x == 0 = go gc'
            | otherwise = (x, gc')
            where
              x = r * l
      (i, g') -> case random g' of
        (r, g'') -> seq x (x, g'') where
          x = r * h

cartazio · 2020-01-05T19:43:26Z

its a bug/problem in current random

cartazio · 2020-01-05T19:45:00Z

a better algorithm with decent perf can be found here https://github.com/cartazio/old-random/blob/v1.3.0/src/Data/Distribution/FloatingInterval.hs

cartazio · 2020-01-05T19:58:42Z

you raised a very good point @Zemyla :)

cartazio · 2020-01-05T19:59:14Z

i've been in a rabbit hole on some ghc patching experiments for vector, i'll dust this stuff off post haste :)

…ings Instances and warnings

Instances and warnings

author Alexey Kuleshevich <[email protected]> 1581472095 +0300 committer Leonhard Markert <[email protected]> 1590493894 +0200 This patch is mostly backwards compatible. See "Breaking Changes" below for the full list of backwards incompatible changes. This patch fixes quality and performance issues, addresses additional miscellaneous issues, and introduces a monadic API. Issues addressed ================ Priority issues fixed in this patch: - Title: "The seeds generated by split are not independent" Link: haskell#25 Fixed: changed algorithm to SplitMix, which provides a robust 'split' operation - Title: "Very low throughput" Link: haskell#51 Fixed: see "Performance" below Additional issues addressed in this patch: - Title: "Add Random instances for tuples" Link: haskell#26 Addressed: added 'Uniform' instances for up to 6-tuples - Title: "Add Random instance for Natural" Link: haskell#44 Addressed: added 'UniformRange' instance for 'Natural' - Title: "incorrect distribution of randomR for floating-point numbers" Link: haskell#53 Addressed: see "Regarding floating-point numbers" below - Title: "System/Random.hs:43:1: warning: [-Wtabs]" Link: haskell#55 Fixed: no more tabs - Title: "Why does random for Float and Double produce exactly 24 or 53 bits?" Link: haskell#58 Fixed: see "Regarding floating-point numbers" below - Title: "read :: StdGen fails for strings longer than 6" Link: haskell#59 Addressed: 'StdGen' is no longer an instance of 'Read' Regarding floating-point numbers: with this patch, the relevant instances for 'Float' and 'Double' sample more bits than before but do not sample every possible representable value. The documentation now clearly spells out what this means for users. Quality (issue 25) ================== The algorithm [1] in version 1.1 of this library fails empirical PRNG tests when used to generate "split sequences" as proposed in [3]. SplitMix [2] passes the same tests. This patch changes 'StdGen' to use the SplitMix implementation provided by the splitmix package. Test batteries used: dieharder, TestU1, PractRand. [1]: P. L'Ecuyer, "Efficient and portable combined random number generators". https://doi.org/10.1145/62959.62969 [2]: G. L. Steele, D. Lea, C. H. Flood, "Fast splittable pseudorandom number generators". https://doi.org/10.1145/2714064.2660195 [3]: H. G. Schaathun, "Evaluation of splittable pseudo-random generators". https://doi.org/10.1017/S095679681500012X Performance (issue 51) ====================== The "improvement" column in the following table is a multiplier: the improvement for 'random' for type 'Float' is 1038, so this operation is 1038 times faster with this patch. | Name | Mean (1.1) | Mean (patch) | Improvement| | ----------------------- | ---------- | ------------ | ---------- | | pure/random/Float | 30 | 0.03 | 1038| | pure/random/Double | 52 | 0.03 | 1672| | pure/random/Integer | 43 | 0.33 | 131| | pure/uniform/Word8 | 14 | 0.03 | 422| | pure/uniform/Word16 | 13 | 0.03 | 375| | pure/uniform/Word32 | 21 | 0.03 | 594| | pure/uniform/Word64 | 42 | 0.03 | 1283| | pure/uniform/Word | 44 | 0.03 | 1491| | pure/uniform/Int8 | 15 | 0.03 | 511| | pure/uniform/Int16 | 15 | 0.03 | 507| | pure/uniform/Int32 | 22 | 0.03 | 749| | pure/uniform/Int64 | 44 | 0.03 | 1405| | pure/uniform/Int | 43 | 0.03 | 1512| | pure/uniform/Char | 17 | 0.49 | 35| | pure/uniform/Bool | 18 | 0.03 | 618| | pure/uniform/CChar | 14 | 0.03 | 485| | pure/uniform/CSChar | 14 | 0.03 | 455| | pure/uniform/CUChar | 13 | 0.03 | 448| | pure/uniform/CShort | 14 | 0.03 | 473| | pure/uniform/CUShort | 13 | 0.03 | 457| | pure/uniform/CInt | 21 | 0.03 | 737| | pure/uniform/CUInt | 21 | 0.03 | 742| | pure/uniform/CLong | 43 | 0.03 | 1544| | pure/uniform/CULong | 42 | 0.03 | 1460| | pure/uniform/CPtrdiff | 43 | 0.03 | 1494| | pure/uniform/CSize | 43 | 0.03 | 1475| | pure/uniform/CWchar | 22 | 0.03 | 785| | pure/uniform/CSigAtomic | 21 | 0.03 | 749| | pure/uniform/CLLong | 43 | 0.03 | 1554| | pure/uniform/CULLong | 42 | 0.03 | 1505| | pure/uniform/CIntPtr | 43 | 0.03 | 1476| | pure/uniform/CUIntPtr | 42 | 0.03 | 1463| | pure/uniform/CIntMax | 43 | 0.03 | 1535| | pure/uniform/CUIntMax | 42 | 0.03 | 1493| API changes =========== MonadRandom ----------- This patch adds a class 'MonadRandom': -- | 'MonadRandom' is an interface to monadic pseudo-random number generators. class Monad m => MonadRandom g s m | g m -> s where {-# MINIMAL freezeGen,thawGen,(uniformWord32|uniformWord64) #-} type Frozen g = (f :: Type) | f -> g freezeGen :: g s -> m (Frozen g) thawGen :: Frozen g -> m (g s) uniformWord32 :: g s -> m Word32 -- default implementation in terms of uniformWord64 uniformWord64 :: g s -> m Word64 -- default implementation in terms of uniformWord32 -- plus methods for other word sizes and for byte strings -- all have default implementations so the MINIMAL pragma holds Conceptually, in 'MonadRandom g s m', 'g s' is the type of the generator, 's' is the state type, and 'm' the underlying monad. Via the functional dependency 'g m -> s', the state type is determined by the generator and monad. 'Frozen' is the type of the generator's state "at rest". It is defined as an injective type family via 'f -> g', so there is no ambiguity as to which 'g' any 'Frozen g' belongs to. This definition is generic enough to accommodate, for example, the 'Gen' type from the 'mwc-random' package, which itself abstracts over the underlying primitive monad and state token. The documentation shows the full 'MonadRandom Gen' instance. Four 'MonadRandom' instances ("monadic adapters") are provided for pure generators to enable their use in monadic code. The documentation describes them in detail. 'Uniform' and 'UniformRange' ---------------------------- The 'Random' typeclass has conceptually been split into 'Uniform' and 'UniformRange'. The 'Random' typeclass is still included for backwards compatibility. 'Uniform' is for types where it is possible to sample from the type's entire domain; 'UniformRange' is for types where one can sample from a specified range. Breaking Changes ================ This patch introduces these breaking changes: * requires 'base >= 4.10' (GHC-8.2) * 'StdGen' is no longer an instance of 'Read' * 'randomIO' and 'randomRIO' where extracted from the 'Random' class into separate functions In addition, there may be import clashes with new functions, e.g. 'uniform' and 'uniformR'. Deprecations ============ This patch introduces 'genWord64', 'genWord32' and similar methods to the 'RandomGen' class. The significantly slower method 'next' and its companion 'genRange' are now deprecated. Co-authored-by: Alexey Kuleshevich <[email protected]> Co-authored-by: idontgetoutmuch <[email protected]> Co-authored-by: Leonhard Markert <[email protected]>

This patch is mostly backwards compatible. See "Breaking Changes" below for the full list of backwards incompatible changes. This patch fixes quality and performance issues, addresses additional miscellaneous issues, and introduces a monadic API. Issues addressed ================ Priority issues fixed in this patch: - Title: "The seeds generated by split are not independent" Link: haskell#25 Fixed: changed algorithm to SplitMix, which provides a robust 'split' operation - Title: "Very low throughput" Link: haskell#51 Fixed: see "Performance" below Additional issues addressed in this patch: - Title: "Add Random instances for tuples" Link: haskell#26 Addressed: added 'Uniform' instances for up to 6-tuples - Title: "Add Random instance for Natural" Link: haskell#44 Addressed: added 'UniformRange' instance for 'Natural' - Title: "incorrect distribution of randomR for floating-point numbers" Link: haskell#53 Addressed: see "Regarding floating-point numbers" below - Title: "System/Random.hs:43:1: warning: [-Wtabs]" Link: haskell#55 Fixed: no more tabs - Title: "Why does random for Float and Double produce exactly 24 or 53 bits?" Link: haskell#58 Fixed: see "Regarding floating-point numbers" below - Title: "read :: StdGen fails for strings longer than 6" Link: haskell#59 Addressed: 'StdGen' is no longer an instance of 'Read' Regarding floating-point numbers: with this patch, the relevant instances for 'Float' and 'Double' sample more bits than before but do not sample every possible representable value. The documentation now clearly spells out what this means for users. Quality (issue 25) ================== The algorithm [1] in version 1.1 of this library fails empirical PRNG tests when used to generate "split sequences" as proposed in [3]. SplitMix [2] passes the same tests. This patch changes 'StdGen' to use the SplitMix implementation provided by the splitmix package. Test batteries used: dieharder, TestU1, PractRand. [1]: P. L'Ecuyer, "Efficient and portable combined random number generators". https://doi.org/10.1145/62959.62969 [2]: G. L. Steele, D. Lea, C. H. Flood, "Fast splittable pseudorandom number generators". https://doi.org/10.1145/2714064.2660195 [3]: H. G. Schaathun, "Evaluation of splittable pseudo-random generators". https://doi.org/10.1017/S095679681500012X Performance (issue 51) ====================== The "improvement" column in the following table is a multiplier: the improvement for 'random' for type 'Float' is 1038, so this operation is 1038 times faster with this patch. | Name | Mean (1.1) | Mean (patch) | Improvement| | ----------------------- | ---------- | ------------ | ---------- | | pure/random/Float | 30 | 0.03 | 1038| | pure/random/Double | 52 | 0.03 | 1672| | pure/random/Integer | 43 | 0.33 | 131| | pure/uniform/Word8 | 14 | 0.03 | 422| | pure/uniform/Word16 | 13 | 0.03 | 375| | pure/uniform/Word32 | 21 | 0.03 | 594| | pure/uniform/Word64 | 42 | 0.03 | 1283| | pure/uniform/Word | 44 | 0.03 | 1491| | pure/uniform/Int8 | 15 | 0.03 | 511| | pure/uniform/Int16 | 15 | 0.03 | 507| | pure/uniform/Int32 | 22 | 0.03 | 749| | pure/uniform/Int64 | 44 | 0.03 | 1405| | pure/uniform/Int | 43 | 0.03 | 1512| | pure/uniform/Char | 17 | 0.49 | 35| | pure/uniform/Bool | 18 | 0.03 | 618| | pure/uniform/CChar | 14 | 0.03 | 485| | pure/uniform/CSChar | 14 | 0.03 | 455| | pure/uniform/CUChar | 13 | 0.03 | 448| | pure/uniform/CShort | 14 | 0.03 | 473| | pure/uniform/CUShort | 13 | 0.03 | 457| | pure/uniform/CInt | 21 | 0.03 | 737| | pure/uniform/CUInt | 21 | 0.03 | 742| | pure/uniform/CLong | 43 | 0.03 | 1544| | pure/uniform/CULong | 42 | 0.03 | 1460| | pure/uniform/CPtrdiff | 43 | 0.03 | 1494| | pure/uniform/CSize | 43 | 0.03 | 1475| | pure/uniform/CWchar | 22 | 0.03 | 785| | pure/uniform/CSigAtomic | 21 | 0.03 | 749| | pure/uniform/CLLong | 43 | 0.03 | 1554| | pure/uniform/CULLong | 42 | 0.03 | 1505| | pure/uniform/CIntPtr | 43 | 0.03 | 1476| | pure/uniform/CUIntPtr | 42 | 0.03 | 1463| | pure/uniform/CIntMax | 43 | 0.03 | 1535| | pure/uniform/CUIntMax | 42 | 0.03 | 1493| API changes =========== MonadRandom ----------- This patch adds a class 'MonadRandom': -- | 'MonadRandom' is an interface to monadic pseudo-random number generators. class Monad m => MonadRandom g s m | g m -> s where {-# MINIMAL freezeGen,thawGen,(uniformWord32|uniformWord64) #-} type Frozen g = (f :: Type) | f -> g freezeGen :: g s -> m (Frozen g) thawGen :: Frozen g -> m (g s) uniformWord32 :: g s -> m Word32 -- default implementation in terms of uniformWord64 uniformWord64 :: g s -> m Word64 -- default implementation in terms of uniformWord32 -- plus methods for other word sizes and for byte strings -- all have default implementations so the MINIMAL pragma holds Conceptually, in 'MonadRandom g s m', 'g s' is the type of the generator, 's' is the state type, and 'm' the underlying monad. Via the functional dependency 'g m -> s', the state type is determined by the generator and monad. 'Frozen' is the type of the generator's state "at rest". It is defined as an injective type family via 'f -> g', so there is no ambiguity as to which 'g' any 'Frozen g' belongs to. This definition is generic enough to accommodate, for example, the 'Gen' type from the 'mwc-random' package, which itself abstracts over the underlying primitive monad and state token. The documentation shows the full 'MonadRandom Gen' instance. Four 'MonadRandom' instances ("monadic adapters") are provided for pure generators to enable their use in monadic code. The documentation describes them in detail. 'Uniform' and 'UniformRange' ---------------------------- The 'Random' typeclass has conceptually been split into 'Uniform' and 'UniformRange'. The 'Random' typeclass is still included for backwards compatibility. 'Uniform' is for types where it is possible to sample from the type's entire domain; 'UniformRange' is for types where one can sample from a specified range. Breaking Changes ================ This patch introduces these breaking changes: * requires 'base >= 4.10' (GHC-8.2) * 'StdGen' is no longer an instance of 'Read' * 'randomIO' and 'randomRIO' where extracted from the 'Random' class into separate functions In addition, there may be import clashes with new functions, e.g. 'uniform' and 'uniformR'. Deprecations ============ This patch introduces 'genWord64', 'genWord32' and similar methods to the 'RandomGen' class. The significantly slower method 'next' and its companion 'genRange' are now deprecated. Co-authored-by: Alexey Kuleshevich <[email protected]> Co-authored-by: idontgetoutmuch <[email protected]> Co-authored-by: Leonhard Markert <[email protected]>

This patch is mostly backwards compatible. See "Breaking Changes" below for the full list of backwards incompatible changes. This patch fixes quality and performance issues, addresses additional miscellaneous issues, and introduces a monadic API. Issues addressed ================ Priority issues fixed in this patch: - Title: "The seeds generated by split are not independent" Link: haskell#25 Fixed: changed algorithm to SplitMix, which provides a robust 'split' operation - Title: "Very low throughput" Link: haskell#51 Fixed: see "Performance" below Additional issues addressed in this patch: - Title: "Add Random instances for tuples" Link: haskell#26 Addressed: added 'Uniform' instances for up to 6-tuples - Title: "Add Random instance for Natural" Link: haskell#44 Addressed: added 'UniformRange' instance for 'Natural' - Title: "incorrect distribution of randomR for floating-point numbers" Link: haskell#53 Addressed: see "Regarding floating-point numbers" below - Title: "System/Random.hs:43:1: warning: [-Wtabs]" Link: haskell#55 Fixed: no more tabs - Title: "Why does random for Float and Double produce exactly 24 or 53 bits?" Link: haskell#58 Fixed: see "Regarding floating-point numbers" below - Title: "read :: StdGen fails for strings longer than 6" Link: haskell#59 Addressed: 'StdGen' is no longer an instance of 'Read' Regarding floating-point numbers: with this patch, the relevant instances for 'Float' and 'Double' sample more bits than before but do not sample every possible representable value. The documentation now clearly spells out what this means for users. Quality (issue 25) ================== The algorithm [1] in version 1.1 of this library fails empirical PRNG tests when used to generate "split sequences" as proposed in [3]. SplitMix [2] passes the same tests. This patch changes 'StdGen' to use the SplitMix implementation provided by the splitmix package. Test batteries used: dieharder, TestU1, PractRand. [1]: P. L'Ecuyer, "Efficient and portable combined random number generators". https://doi.org/10.1145/62959.62969 [2]: G. L. Steele, D. Lea, C. H. Flood, "Fast splittable pseudorandom number generators". https://doi.org/10.1145/2714064.2660195 [3]: H. G. Schaathun, "Evaluation of splittable pseudo-random generators". https://doi.org/10.1017/S095679681500012X Performance (issue 51) ====================== The "improvement" column in the following table is a multiplier: the improvement for 'random' for type 'Float' is 1038, so this operation is 1038 times faster with this patch. | Name | Mean (1.1) | Mean (patch) | Improvement| | ----------------------- | ---------- | ------------ | ---------- | | pure/random/Float | 30 | 0.03 | 1038| | pure/random/Double | 52 | 0.03 | 1672| | pure/random/Integer | 43 | 0.33 | 131| | pure/uniform/Word8 | 14 | 0.03 | 422| | pure/uniform/Word16 | 13 | 0.03 | 375| | pure/uniform/Word32 | 21 | 0.03 | 594| | pure/uniform/Word64 | 42 | 0.03 | 1283| | pure/uniform/Word | 44 | 0.03 | 1491| | pure/uniform/Int8 | 15 | 0.03 | 511| | pure/uniform/Int16 | 15 | 0.03 | 507| | pure/uniform/Int32 | 22 | 0.03 | 749| | pure/uniform/Int64 | 44 | 0.03 | 1405| | pure/uniform/Int | 43 | 0.03 | 1512| | pure/uniform/Char | 17 | 0.49 | 35| | pure/uniform/Bool | 18 | 0.03 | 618| | pure/uniform/CChar | 14 | 0.03 | 485| | pure/uniform/CSChar | 14 | 0.03 | 455| | pure/uniform/CUChar | 13 | 0.03 | 448| | pure/uniform/CShort | 14 | 0.03 | 473| | pure/uniform/CUShort | 13 | 0.03 | 457| | pure/uniform/CInt | 21 | 0.03 | 737| | pure/uniform/CUInt | 21 | 0.03 | 742| | pure/uniform/CLong | 43 | 0.03 | 1544| | pure/uniform/CULong | 42 | 0.03 | 1460| | pure/uniform/CPtrdiff | 43 | 0.03 | 1494| | pure/uniform/CSize | 43 | 0.03 | 1475| | pure/uniform/CWchar | 22 | 0.03 | 785| | pure/uniform/CSigAtomic | 21 | 0.03 | 749| | pure/uniform/CLLong | 43 | 0.03 | 1554| | pure/uniform/CULLong | 42 | 0.03 | 1505| | pure/uniform/CIntPtr | 43 | 0.03 | 1476| | pure/uniform/CUIntPtr | 42 | 0.03 | 1463| | pure/uniform/CIntMax | 43 | 0.03 | 1535| | pure/uniform/CUIntMax | 42 | 0.03 | 1493| API changes =========== StatefulGen ----------- This patch adds a class 'StatefulGen': -- | 'StatefulGen' is an interface to monadic pseudo-random number generators. class Monad m => StatefulGen g m where uniformWord32 :: g -> m Word32 -- default implementation in terms of uniformWord64 uniformWord64 :: g -> m Word64 -- default implementation in terms of uniformWord32 -- plus methods for other word sizes and for byte strings -- all have default implementations so the MINIMAL pragma holds In 'StatefulGen g m', 'g' is the type of the generator and 'm' the underlying monad. Four 'StatefulGen' instances ("monadic adapters") are provided for pure generators to enable their use in monadic code. The documentation describes them in detail. FrozenGen --------- This patch also introduces a class 'FrozenGen': -- | 'FrozenGen' is designed for stateful pseudo-random number generators -- that can be saved as and restored from an immutable data type. class StatefulGen (MutableGen f m) m => FrozenGen f m where type MutableGen f m = (g :: Type) | g -> f freezeGen :: MutableGen f m -> m f thawGen :: f -> m (MutableGen f m) 'f' is the type of the generator's state "at rest" and 'm' the underlying monad. 'MutableGen' is defined as an injective type family via 'g -> f' so for any generator 'g', the type 'f' of its at-rest state is well-defined. Both 'StatefulGen' and 'FrozenGen' are generic enough to accommodate, for example, the 'Gen' type from the 'mwc-random' package, which itself abstracts over the underlying primitive monad and state token. The documentation shows the full instances. 'Uniform' and 'UniformRange' ---------------------------- The 'Random' typeclass has conceptually been split into 'Uniform' and 'UniformRange'. The 'Random' typeclass is still included for backwards compatibility. 'Uniform' is for types where it is possible to sample from the type's entire domain; 'UniformRange' is for types where one can sample from a specified range. Breaking Changes ================ This patch introduces these breaking changes: * requires 'base >= 4.10' (GHC-8.2) * 'StdGen' is no longer an instance of 'Read' * 'randomIO' and 'randomRIO' where extracted from the 'Random' class into separate functions In addition, there may be import clashes with new functions, e.g. 'uniform' and 'uniformR'. Deprecations ============ This patch introduces 'genWord64', 'genWord32' and similar methods to the 'RandomGen' class. The significantly slower method 'next' and its companion 'genRange' are now deprecated. Co-authored-by: Alexey Kuleshevich <[email protected]> Co-authored-by: idontgetoutmuch <[email protected]> Co-authored-by: Leonhard Markert <[email protected]>

This patch is mostly backwards compatible. See "Breaking Changes" below for the full list of backwards incompatible changes. This patch fixes quality and performance issues, addresses additional miscellaneous issues, and introduces a monadic API. Issues addressed ================ Priority issues fixed in this patch: - Title: "The seeds generated by split are not independent" Link: haskell#25 Fixed: changed algorithm to SplitMix, which provides a robust 'split' operation - Title: "Very low throughput" Link: haskell#51 Fixed: see "Performance" below Additional issues addressed in this patch: - Title: "Add Random instances for tuples" Link: haskell#26 Addressed: added 'Uniform' instances for up to 6-tuples - Title: "Add Random instance for Natural" Link: haskell#44 Addressed: added 'UniformRange' instance for 'Natural' - Title: "incorrect distribution of randomR for floating-point numbers" Link: haskell#53 Addressed: see "Regarding floating-point numbers" below - Title: "System/Random.hs:43:1: warning: [-Wtabs]" Link: haskell#55 Fixed: no more tabs - Title: "Why does random for Float and Double produce exactly 24 or 53 bits?" Link: haskell#58 Fixed: see "Regarding floating-point numbers" below - Title: "read :: StdGen fails for strings longer than 6" Link: haskell#59 Addressed: 'StdGen' is no longer an instance of 'Read' Regarding floating-point numbers: with this patch, the relevant instances for 'Float' and 'Double' sample more bits than before but do not sample every possible representable value. The documentation now clearly spells out what this means for users. Quality (issue 25) ================== The algorithm [1] in version 1.1 of this library fails empirical PRNG tests when used to generate "split sequences" as proposed in [3]. SplitMix [2] passes the same tests. This patch changes 'StdGen' to use the SplitMix implementation provided by the splitmix package. Test batteries used: dieharder, TestU1, PractRand. [1]: P. L'Ecuyer, "Efficient and portable combined random number generators". https://doi.org/10.1145/62959.62969 [2]: G. L. Steele, D. Lea, C. H. Flood, "Fast splittable pseudorandom number generators". https://doi.org/10.1145/2714064.2660195 [3]: H. G. Schaathun, "Evaluation of splittable pseudo-random generators". https://doi.org/10.1017/S095679681500012X Performance (issue 51) ====================== The "improvement" column in the following table is a multiplier: the improvement for 'random' for type 'Float' is 1038, so this operation is 1038 times faster with this patch. | Name | Mean (1.1) | Mean (patch) | Improvement| | ----------------------- | ---------- | ------------ | ---------- | | pure/random/Float | 30 | 0.03 | 1038| | pure/random/Double | 52 | 0.03 | 1672| | pure/random/Integer | 43 | 0.33 | 131| | pure/uniform/Word | 44 | 0.03 | 1491| | pure/uniform/Int | 43 | 0.03 | 1512| | pure/uniform/Char | 17 | 0.49 | 35| | pure/uniform/Bool | 18 | 0.03 | 618| API changes =========== StatefulGen ----------- This patch adds a class 'StatefulGen': -- | 'StatefulGen' is an interface to monadic pseudo-random number generators. class Monad m => StatefulGen g m where uniformWord32 :: g -> m Word32 -- default implementation in terms of uniformWord64 uniformWord64 :: g -> m Word64 -- default implementation in terms of uniformWord32 -- plus methods for other word sizes and for byte strings -- all have default implementations so the MINIMAL pragma holds In 'StatefulGen g m', 'g' is the type of the generator and 'm' the underlying monad. Four 'StatefulGen' instances ("monadic adapters") are provided for pure generators to enable their use in monadic code. The documentation describes them in detail. FrozenGen --------- This patch also introduces a class 'FrozenGen': -- | 'FrozenGen' is designed for stateful pseudo-random number generators -- that can be saved as and restored from an immutable data type. class StatefulGen (MutableGen f m) m => FrozenGen f m where type MutableGen f m = (g :: Type) | g -> f freezeGen :: MutableGen f m -> m f thawGen :: f -> m (MutableGen f m) 'f' is the type of the generator's state "at rest" and 'm' the underlying monad. 'MutableGen' is defined as an injective type family via 'g -> f' so for any generator 'g', the type 'f' of its at-rest state is well-defined. Both 'StatefulGen' and 'FrozenGen' are generic enough to accommodate, for example, the 'Gen' type from the 'mwc-random' package, which itself abstracts over the underlying primitive monad and state token. The documentation shows the full instances. 'Uniform' and 'UniformRange' ---------------------------- The 'Random' typeclass has conceptually been split into 'Uniform' and 'UniformRange'. The 'Random' typeclass is still included for backwards compatibility. 'Uniform' is for types where it is possible to sample from the type's entire domain; 'UniformRange' is for types where one can sample from a specified range. Breaking Changes ================ This patch introduces these breaking changes: * requires 'base >= 4.8' (GHC-7.10) * 'StdGen' is no longer an instance of 'Read' * 'randomIO' and 'randomRIO' where extracted from the 'Random' class into separate functions In addition, there may be import clashes with new functions, e.g. 'uniform' and 'uniformR'. Deprecations ============ This patch introduces 'genWord64', 'genWord32' and similar methods to the 'RandomGen' class. The significantly slower method 'next' and its companion 'genRange' are now deprecated. Co-authored-by: Alexey Kuleshevich <[email protected]> Co-authored-by: idontgetoutmuch <[email protected]> Co-authored-by: Leonhard Markert <[email protected]>

Bodigrim · 2020-06-24T23:28:42Z

@Zemyla could you please compare your algorithm to random-1.2.0? I believe we generate more than 24/53 bits now.

random/src/System/Random/Internal.hs

Lines 771 to 790 in 11464aa

    
           -- | See [Floating point number caveats](System-Random-Stateful.html#fpcaveats). 
        
           instance UniformRange Double where 
        
             uniformRM (l, h) g 
        
               | l == h = return l 
        
               | otherwise = do 
        
                 x <- uniformDouble01M g 
        
                 return $ x * l + (1 -x) * h 
        
           -- | Generates uniformly distributed 'Double' in the range \([0, 1]\). 
        
           --   Numbers are generated by generating uniform 'Word64' and dividing 
        
           --   it by \(2^{64}\). It's used to implement 'UniformR' instance for 
        
           --   'Double'. 
        
           -- 
        
           -- @since 1.2.0 
        
           uniformDouble01M :: StatefulGen g m => g -> m Double 
        
           uniformDouble01M g = do 
        
             w64 <- uniformWord64 g 
        
             return $ fromIntegral w64 / m 
        
             where 
        
               m = fromIntegral (maxBound :: Word64) :: Double

This approach does not cover denormalized floats (@curiousleo correct me, if I'm wrong, please), which is discussed in #53, but AFAIU yours does not as well.

curiousleo · 2020-06-25T06:24:32Z

I don't claim to fully understand @Zemyla's code, but my impression is that it is based on ideas similar to http://allendowney.com/research/rand/ -- the countLeadingZeros here can be used to read a uniform bitstring as a geometric distribution, which is how the exponent of a uniformly random IEEE float ought to be generated.

I would like to see something along those lines in a future version of random, either as the default way to generate floating point numbers or as an optional "more accurate but slower" method, depending on how it performs.

However, code like this needs to be tested thoroughly. In addition, while generating numbers in the unit interval is nice, translating the unit interval into an arbitrary target interval via addition and multiplication leads to a loss in precision. So ideally, the uniform floating point generation method would be able to directly generate numbers in an arbitrary interval.

This is what https://gitlab.com/christoph-conrads/rademacher-fpl/ does. Note how thoroughly tested it is - which is necessary, because the code gets pretty complex.

@Zemyla if this is something you're interested in, you may also be interested in the discussions we had in the run-up to the 1.2.0 release: idontgetoutmuch#113 (comment) (and the whole surrounding issue) as well as idontgetoutmuch#105. I am experimenting with a Haskell implementation of a "uniform floating point number in an arbitrary range generator" here: https://github.com/curiousleo/random-float.

idontgetoutmuch mentioned this issue Apr 29, 2020

Interface to performance idontgetoutmuch/random#1

Open

curiousleo pushed a commit to curiousleo/random that referenced this issue May 5, 2020

Merge pull request haskell#58 from idontgetoutmuch/instances-and-warn…

7173a80

…ings Instances and warnings

cartazio pushed a commit that referenced this issue May 19, 2020

Merge pull request #58 from idontgetoutmuch/instances-and-warnings

1611123

Instances and warnings

idontgetoutmuch mentioned this issue May 20, 2020

V1.2 proposal #61

Merged

lehins mentioned this issue Jun 4, 2020

V1.2 proposal #62

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Why does random for Float and Double produce exactly 24 or 53 bits? #58

Why does random for Float and Double produce exactly 24 or 53 bits? #58

Zemyla commented Jan 5, 2020

cartazio commented Jan 5, 2020

cartazio commented Jan 5, 2020

cartazio commented Jan 5, 2020

cartazio commented Jan 5, 2020

Bodigrim commented Jun 24, 2020

curiousleo commented Jun 25, 2020

Why does random for Float and Double produce exactly 24 or 53 bits? #58

Why does random for Float and Double produce exactly 24 or 53 bits? #58

Comments

Zemyla commented Jan 5, 2020

cartazio commented Jan 5, 2020

cartazio commented Jan 5, 2020

cartazio commented Jan 5, 2020

cartazio commented Jan 5, 2020

Bodigrim commented Jun 24, 2020

curiousleo commented Jun 25, 2020