-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slightly improved API. Dramatically improved implementation #1
base: master
Are you sure you want to change the base?
Conversation
The API has no reason to pass the range arguments by reference. Doing so slows down the function. The arguments are not return values, and the first thing the code does is dereference them and copy them into stack variables. Passing the input ranges by value skips this step and has fewer errors. Furthermore, the arguments are small enough (2 primatives) that putting them onto the stack by value can make the function call inline even faster.
Same change as before: change input arguments to stack based arguments for efficiency. This also allows us to skip the case where the inputs == NULL in the error check
I re-wrote the encode and decode functions to provide a DRAMATICALLY faster interface that does no branching and uses only a few 64-bit integer operations with unrolled loops to do the interleaving/deinterleaving using binary magic numbers for efficient morton codes. The code should go much much much faster now.
I got a compile error geohash.c: In function ‘geohash_decode’: And IMO it's better to keep the old simple&stupid implementation, since this optimize one is hard to understand. |
And about the 'geohash_decode' function, origin implementation and this optimize one have different result for same input. |
I finally understand your implementation, it's brilliant. |
So, sorry about the compile errors. I imagine they are easy to fix. I wrote this at work without access to a C compiler. With regard to not matching the identical input and output, I didn't test this because, again, no C compiler. However, any differences should be minor If you manage to fix the differences (which should be trivial, I'll do it if you don't), then I STRONGLY recommend not having two new functions. There's no reason to have two different APIs to do exactly the same thing but just one is different. The unix philosophy: "There should be one, and exactly one, obvious way to do something". Users of your API shouldn't have to choose between the 'fast' way and the 'slow' way...it's confusing. Just provide one way. |
For comparison: the original encode took 140 ns on my machine (compiled with clang -O2) and the new encode now takes 0.57 ns to encode. Decodes went from 100 ns to the same 0.57 ns. (So, prior performance was 7 million encodes per second and 10 million decodes per second; now we can do 1.7 billion per second with either encodes or decodes). New version performance comparison versus compiler optimization levels:
|
mattsta: can you post the assembly for the encode-decode on clang -O1 vs gcc -O3? I'm amazed there is almost a factor of 30 difference in performance and I really want to see what the compiler is doing. |
Thanks for offering to check. I put files at https://gist.github.com/mattsta/cd338cb0253e85d8dfe0. It will be easier to just clone the gist and compare locally ( The assembly files are from the version of geohash-int I keep at https://github.com/mattsta/geohash-int Assembly was generated with: The timing results were found using my dumb looping timing tester: https://github.com/mattsta/krmt/blob/master/geo/test.c It's possible the higher optimization levels were removing some useful steps since I didn't use the results of the function anywhere again. |
I tested with gcc 3.4.5/4.8.2 in linux, it's strange that the new decode function is slightly slower with gcc3.4.5 -O2, it seems that new glibc is much faster with math function 'ldexp'. I put all test record at |
@Steve132 @mattsta Now I use simple divide & shift operations instead of math function 'ldexp', the decode is about 300% faster than using ' ldexp'. |
Using the new divide & shift decode, my time for a decode (gcc-4.9 -O2) is now 29 ns from the previous 50 ns. Good fix! |
Try this out:
|
That's some crazy compiler voodoo there. After fixing 8 syntax errors, it decodes at 36 ns compared to 29 ns for the all-shifting version. (gcc-4.9 -O2) I think we've found the optimal results for now with the divide and shift operations. :) |
I dramatically improved the implementation of encode and decode to use efficient bit-twiddling and scaling to perform the morton-code encoding and decoding in constant time with only a few operations. The resulting code should be significantly faster.
I also changed the api to use pass-by-value where it was more efficient and more appropriate for a C-API. The resulting call-by-value change decreased error checking and should increase performance over pass-by-reference