Slightly improved API. Dramatically improved implementation #1

Open · wants to merge 4 commits into base: master
Conversation

Steve132

I dramatically improved the implementation of encode and decode to use efficient bit-twiddling and scaling, performing the Morton-code encoding and decoding in constant time with only a few operations. The resulting code should be significantly faster.

I also changed the API to use pass-by-value where it was more efficient and more appropriate for a C API. The call-by-value change removes a class of error checking and should outperform pass-by-reference.

The API has no reason to pass the range arguments by reference; doing so slows down the function.
The arguments are not return values, and the first thing the code does is dereference them and copy them into stack variables. Passing the input ranges by value skips this step and removes a source of errors. Furthermore, the arguments are small enough (two primitives) that passing them on the stack by value can make an inlined call even faster.
Same change as before: pass the input arguments on the stack for efficiency. This also lets us drop the inputs == NULL case from the error check.
I rewrote the encode and decode functions to provide a DRAMATICALLY faster implementation that does no branching and uses only a few 64-bit integer operations, with unrolled loops doing the interleaving/deinterleaving via binary magic numbers for efficient Morton codes. The code should run much, much faster now.
@yinqiwen
Owner

I got a compile error

geohash.c: In function ‘geohash_decode’:
geohash.c:161:5: error: ‘ilato’ undeclared (first use in this function)
ilato = xyhilo; //get back the original integer coordinates
^
geohash.c:161:5: note: each undeclared identifier is reported only once for each function it appears in
geohash.c:162:5: error: ‘ilono’ undeclared (first use in this function)
ilono = xyhilo >> 32;
^
geohash.c:170:42: warning: incompatible implicit declaration of built-in function ‘ldexp’ [enabled by default]
area->latitude.min = lat_range.min + ldexp(ilato, -step) * lat_scale;

And IMO it's better to keep the old simple-and-stupid implementation, since this optimized one is hard to understand.

@yinqiwen
Owner

And about the 'geohash_decode' function: the original implementation and this optimized one return different results for the same input.

@yinqiwen
Owner

I finally understand your implementation; it's brilliant.
I'll merge this by hand and add two functions, 'geohash_decode_fast'/'geohash_encode_fast'.

@Steve132
Author

So, sorry about the compile errors. I imagine they are easy to fix. I wrote this at work without access to a C compiler.

With regard to the output not matching identically, I didn't test this because, again, no C compiler. However, any differences should be minor.

If you manage to fix the differences (which should be trivial; I'll do it if you don't), then I STRONGLY recommend not adding two new functions. There's no reason to have two different APIs that do exactly the same thing, just differently. The Unix philosophy: "There should be one, and exactly one, obvious way to do something."

Users of your API shouldn't have to choose between the 'fast' way and the 'slow' way...it's confusing. Just provide one way.

@mattsta

mattsta commented Sep 16, 2014

For comparison: the original encode took 140 ns on my machine (compiled with clang -O2) and the new encode now takes 0.57 ns to encode.

Decodes went from 100 ns to the same 0.57 ns.

(So, prior performance was 7 million encodes per second and 10 million decodes per second; now we can do 1.7 billion per second with either encodes or decodes).

New version performance comparison versus compiler optimization levels:

  • clang -O0: encode = 2 ns; decode = 0.57 ns
  • gcc -O0: encode = 101 ns; decode = 108 ns
  • clang -O1 and higher: encode = 0.57 ns; decode = 0.57 ns
  • gcc -O1: encode = 36 ns; decode = 54 ns
  • gcc -O2: encode = 30 ns; decode = 49 ns
  • gcc -O3: encode = 19 ns; decode = 21 ns

@Steve132
Author

mattsta: can you post the assembly for the encode-decode on clang -O1 vs gcc -O3? I'm amazed there is almost a factor of 30 difference in performance and I really want to see what the compiler is doing.

@mattsta

mattsta commented Sep 16, 2014

Thanks for offering to check. I put files at https://gist.github.com/mattsta/cd338cb0253e85d8dfe0. It will be easier to just clone the gist and compare locally (git clone https://gist.github.com/mattsta/cd338cb0253e85d8dfe0). If you're feeling dangerous, you can try to open the gist in your browser, but the page is very large: Direct links: clang -O1 and gcc -O3.

The assembly files are from the version of geohash-int I keep at https://github.com/mattsta/geohash-int

Assembly was generated with: [gcc-4.9 | clang] geohash.c -o [compiler-options].s -O[0,1,2,3] -S -g -fverbose-asm

The timing results were found using my dumb looping timing tester: https://github.com/mattsta/krmt/blob/master/geo/test.c

It's possible the higher optimization levels were removing some useful steps since I didn't use the results of the function anywhere again.

@yinqiwen
Owner

I tested with gcc 3.4.5/4.8.2 on Linux; strangely, the new decode function is slightly slower with gcc 3.4.5 -O2. It seems newer glibc is much faster with the math function 'ldexp'. I put all test records at
https://gist.github.com/yinqiwen/c6d0be258356996bd16b

@yinqiwen
Owner

@Steve132 @mattsta I now use simple divide & shift operations instead of the math function 'ldexp'; decode is about 300% faster than with 'ldexp'.
Test record: https://gist.github.com/yinqiwen/c6d0be258356996bd16b

@mattsta

mattsta commented Sep 18, 2014

Using the new divide & shift decode, my time for a decode (gcc-4.9 -O2) is now 29 ns from the previous 50 ns. Good fix!

@Steve132
Author

Try this out:

static inline double fastldiexp(double d, uint8_t step)
{
    /* Multiply by 2^-step by subtracting 'step' directly from the biased
       exponent field. Assumes little-endian IEEE-754 doubles; not valid
       for d == 0, subnormals, or exponent underflow. */
    union
    {
        double fval;
        struct
        {
            uint64_t m:52;  /* mantissa */
            uint64_t e:11;  /* biased exponent */
            uint64_t s:1;   /* sign */
        } bits;
    } u;
    u.fval = d;
    u.bits.e -= step;
    return u.fval;
}


area->latitude.min = lat_range.min+fastldiexp(ilato,step)*lat_scale;
area->latitude.max = lat_range.min+fastldiexp(ilato+1,step)*lat_scale;
area->longitude.min = lon_range.min+fastldiexp(ilono,step)*lon_scale;
area->longitude.max = lon_range.min+fastldiexp(ilono+1,step)*lon_scale;

@mattsta
Copy link

mattsta commented Sep 19, 2014

That's some crazy compiler voodoo there. After fixing 8 syntax errors, it decodes at 36 ns compared to 29 ns for the all-shifting version. (gcc-4.9 -O2)

I think we've found the optimal results for now with the divide and shift operations. :)
