Losing the battle to convert a string to double #129

vinniefalco · 2019-11-16T16:13:09Z

Going from double to string is amazing using this lib, but I am having trouble efficiently and correctly parsing a string into a double. This is what I have, any ideas? Is there some ulfjack magic that could be worked here?

https://github.com/vinniefalco/json/blob/develop/include/boost/json/detail/number.ipp

ulfjack · 2019-11-17T09:16:35Z

If you know your input is the "shortest" format of a floating-point value of a given width, then there is in fact a fast algorithm to parse these values. I have some code for that. I'll need to do a bit of work in order to commit it here, but it shouldn't be much (famous last words).

plokhotnyuk · 2019-11-17T10:40:51Z

@ulfjack There is an amazing library for Rust that parses float/double values precisely and fast

I've ported the moderate path from it to our library for Scala

ulfjack · 2019-11-17T13:22:17Z

I'm aware of the article. I think we can do better.

ulfjack · 2019-11-17T14:57:56Z

I have pushed some code to the https://github.com/ulfjack/ryu/tree/parsing branch. There are still big holes in the code, but that's the general shape. Note that it can only handle up to ~18 decimal digits (which is fine for input that was formatted as shortest), and I need to double-check that the tables have enough bits for this usage.

ulfjack · 2019-11-18T19:11:05Z

I pushed another commit filling in some of the missing pieces.

ulfjack · 2019-12-04T12:49:00Z

I pushed a couple of fixes to the branch. Before it can be merged, I need to figure out error handling, make it work on Mac/Win, and double-check that the bit widths are ok. More tests would also be good, but IMO not a blocker for merging into master.

plokhotnyuk · 2019-12-04T13:49:43Z

@ulfjack please consider using of SWAR techniques for parsing the decimal representation, like here: sirthias/borer#114

A limited implementation for up to 17-digit numbers. See #129. * Move some d2s intrinsics to d2s_intrinsics.h * Rudimentary string to double parsing * Implement more pieces for parsing - parse exponents - parse dot - handle negative exponents - handle rounding - add some basic tests for these things * Fix constants * Use __builtin_clzll Thanks to expnkx for finding this. See #133. * Allow exponent to start with '+' * Correctly handle mantissa overflow * Increase the table size Denormal numbers require a larger table because the decimal exponent can go down to -326 (we're using the table inversely to how it's originally used for shortest). * Handle most overflow/underflow situations * Increase table size some more * More tests close to the underflow These are almost exactly halfway between 0 and the smallest representable float. * Implement error handling Mention the restrictions on the input, and that this implementation is experimental. * Add another caveat to the documentation * Make it compile with -DRYU_ONLY_64_BIT_OPS This mode was not defining mulShift. We just use the same implementation as if it's not set. * Use _BitScanReverse64 on Windows * Rename mulShift to mulShift{32,64} Since I moved the mulShift implementations to d2s_intrinsics, they conflict with the ones defined in f2s.c. This is only an issue when -DRYU_OPTIMIZE_SIZE is set, but I think it's better to disambiguate anyway. * Rename max to max32 to avoid name conflicts on MSVC * MSVC: Always use _BitScanReverse64 The HAS_64_BIT_INTRINSICS macro is false if RYU_ONLY_64_BIT_OPS is set, so using that is incorrect. Instead, only check for _MSC_VER. I can't seem to find a macro to check for existence of __builtin_clzll.

ulfjack · 2020-01-15T09:33:38Z

@vinniefalco posted on the PR: "How about 0.X1E1000 where X is a thousand zeroes"

Looks like you found a bug!

vinniefalco · 2020-01-15T12:38:41Z

We were able to fix our number parser:

https://github.com/vinniefalco/json/blob/5aae31dc74d055d84a7f13e438d80cdf6005c670/include/boost/json/detail/impl/number.ipp#L155

However, it fails to produce the same result as stod for some cases. It is off by just one bit in those cases, which I guess is what all of this complicated looking code in s2d.c is for, which is to identify the cases where it needs to round up or down by one bit to find the nearest double? And this is all because converting from a base 10 exponent to a base 2 exponent is lossy?

plokhotnyuk · 2020-01-15T13:26:27Z

@vinniefalco
ICYMI: https://www.exploringbinary.com/17-digits-gets-you-there-once-youve-found-your-way/

vinniefalco · 2020-01-15T13:38:28Z

Well.... it would have been nice to read that before I struggled to write the code !!!

nickaleks · 2020-01-15T15:43:11Z

@vinniefalco @ulfjack I am currently looking at fast string-to-double conversion, and reading this discussion it's not clear whether current ryu implementation is correct, or whether json implementation is good to use. Can one of those solutions be used yet?

vinniefalco · 2020-01-15T16:20:54Z

The JSON implementation seems to work (i.e. passes tests) but it only implements the "fast path" conversion. Which means it can be off by one bit for some strings. And note that it is not officially part of Boost yet, and is still being worked on.

ulfjack mentioned this issue Dec 4, 2019

Ryu-like String to Float Parsing #138

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Losing the battle to convert a string to double #129

Losing the battle to convert a string to double #129

vinniefalco commented Nov 16, 2019

ulfjack commented Nov 17, 2019

plokhotnyuk commented Nov 17, 2019 •

edited

Loading

ulfjack commented Nov 17, 2019

ulfjack commented Nov 17, 2019

ulfjack commented Nov 18, 2019

ulfjack commented Dec 4, 2019

plokhotnyuk commented Dec 4, 2019

ulfjack commented Jan 15, 2020

vinniefalco commented Jan 15, 2020

plokhotnyuk commented Jan 15, 2020 •

edited

Loading

vinniefalco commented Jan 15, 2020

nickaleks commented Jan 15, 2020

vinniefalco commented Jan 15, 2020

Losing the battle to convert a string to double #129

Losing the battle to convert a string to double #129

Comments

vinniefalco commented Nov 16, 2019

ulfjack commented Nov 17, 2019

plokhotnyuk commented Nov 17, 2019 • edited Loading

ulfjack commented Nov 17, 2019

ulfjack commented Nov 17, 2019

ulfjack commented Nov 18, 2019

ulfjack commented Dec 4, 2019

plokhotnyuk commented Dec 4, 2019

ulfjack commented Jan 15, 2020

vinniefalco commented Jan 15, 2020

plokhotnyuk commented Jan 15, 2020 • edited Loading

vinniefalco commented Jan 15, 2020

nickaleks commented Jan 15, 2020

vinniefalco commented Jan 15, 2020

plokhotnyuk commented Nov 17, 2019 •

edited

Loading

plokhotnyuk commented Jan 15, 2020 •

edited

Loading