Dispatcher decompressing data when not needed #31

Open
dimitrijejankov opened this issue Jul 7, 2018 · 6 comments

@dimitrijejankov
Collaborator

Hi, I've been adding functionality to the Dispatcher to get my new algorithm running.
I noticed that the dispatcher decompresses the data even though it never uses the decompressed result.

In the file
https://github.com/riceplinygroup/plinycompute/blob/master/pdb/src/serverFunctionalities/source/DispatcherServer.cc

Lines 88 to 95 decompress the snappy-compressed data stored in tempPage, and the decompressed result is stored in readToHere. But if we look at lines 130 to 136, only tempPage (the compressed data) is sent, so we are just doing useless work on the dispatcher.
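
For reference, a minimal sketch of the flow being described, using the issue's variable names; the snappy calls are the real snappy C++ API, while dispatchPage and forwardToStorage are hypothetical stand-ins for the surrounding DispatcherServer code:

```cpp
#include <snappy.h>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for the code that forwards the page to storage.
void forwardToStorage(const char* data, size_t numBytes) { /* send elided */ }

void dispatchPage(const char* tempPage, size_t compressedSize) {
    // Lines ~88-95: decompress the snappy-compressed page into readToHere.
    size_t uncompressedSize = 0;
    snappy::GetUncompressedLength(tempPage, compressedSize, &uncompressedSize);
    std::unique_ptr<char[]> readToHere(new char[uncompressedSize]);
    snappy::RawUncompress(tempPage, compressedSize, readToHere.get());

    // Lines ~130-136: only tempPage (the compressed buffer) is actually
    // sent, so the decompressed bytes in readToHere are never used.
    forwardToStorage(tempPage, compressedSize);
}
```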

@dimitrijejankov
Collaborator Author

Another issue: even when we are sending the uncompressed bytes of a shallow copy, bool DispatcherServer::sendBytes has a hardcoded value of true for the compressedOrNot indicator, so it will probably not work.

@jiazou-bigdata
Collaborator

Thanks Dimitrije.

(1) The reason it decompresses the data before forwarding the compressed data is to give the dispatcher a chance to look into the content, for example to do size checking or other things like the checks at lines 98 to 108 (see the sketch below).

(2) I once wrote an example that successfully uses a shallow copy and sends uncompressed bytes directly; please check it here: https://github.com/riceplinygroup/plinycompute/blob/master/applications/TPCHBench/tpchDataGenerator.cc

Let me know if you have further questions.
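
To make point (1) concrete, here is a hedged sketch of the kind of content-level check that requires decompression: the compressed bytes are opaque, so nothing like this can run on tempPage directly. The PageHeader layout is purely illustrative; the real code at lines 98 to 108 inspects PDB's own record format:

```cpp
#include <cstddef>
#include <cstring>
#include <iostream>

// Hypothetical header layout at the front of a decompressed page; an
// assumption for illustration only, not the repo's actual format.
struct PageHeader {
    size_t numObjects;
};

bool passesSizeCheck(const char* readToHere, size_t numBytes) {
    if (numBytes < sizeof(PageHeader)) {
        std::cerr << "page is too small to contain a header\n";
        return false;
    }
    PageHeader header;
    std::memcpy(&header, readToHere, sizeof(header));
    if (header.numObjects == 0) {
        std::cerr << "dispatcher received an empty page\n";
        return false;
    }
    return true;
}
```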

@dimitrijejankov
Collaborator Author

Hi Jia,
Thanks for the reply.

(1) The only check we do is whether the vector is empty, and that could just as well be done on the receiving side. The type-validation code (lines 267-292), which is called from line 110, is commented out, so it will always pass.

(2) If you look at the sendBytes method (lines 349-352), you will see that the second-to-last argument is set to true, which corresponds to the compressedOrNot parameter in StorageAddData. This means the data will always be marked as compressed, regardless of whether the ENABLE_COMPRESSION flag is enabled.
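
A minimal sketch of the change point (2) implies: pass the actual compression state through rather than hardcoding true. The request type below is a hypothetical stand-in that mirrors the compressedOrNot parameter, not the actual StorageAddData signature:

```cpp
// Hypothetical stand-in for the StorageAddData request.
struct StorageAddDataRequest {
    bool compressedOrNot;
};

// Before (lines ~349-352): the flag is hardcoded, so every send is
// marked as compressed even when the bytes were never compressed.
StorageAddDataRequest buildRequestHardcoded() {
    return StorageAddDataRequest{ /*compressedOrNot=*/ true };
}

// After: the caller passes the real state, e.g. driven by whether
// ENABLE_COMPRESSION was in effect when the bytes were produced.
StorageAddDataRequest buildRequest(bool isCompressed) {
    return StorageAddDataRequest{ isCompressed };
}
```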

@jiazou-bigdata
Collaborator

Hey Dimitrije,

I'm not entirely sure what exactly your concerns are or how important this is. Just to clarify a bit and share some information and ideas from my side:

For your first point, we could easily change the code to drop the size check and forward the compressed bytes directly, but I'm afraid that for more complex dispatching features, such as partitioning or building indexes while dispatching data, decompression is unavoidable.
In addition, size checking and other validation on the dispatcher side are often reasonable, since it is much easier to obtain and track error logs there than across many worker servers. Logging on the client side is a separate matter from the server design.

For your second point, I guess it would only require a small code change to make the flag in sendBytes configurable, and I suspect it once worked that way.
Some additional comments: in most of our experiments for the SIGMOD paper, which process hundreds of gigabytes of intermediate data (also our target machine-learning scenarios), adding compression gave better performance; even in cases where the compression ratio was poor, the (de)compression overhead did not seem significant. So if asked for a default configuration, I would recommend keeping compression on, unless our target scenario changes.

Hope this helps.
Jia

@dimitrijejankov
Collaborator Author

Hi Jia,
Thanks for your time. Good point: if the (de)compression is not a large overhead, we can keep it. For the second issue, is it fine if I go ahead and change that?

@jiazou-bigdata
Collaborator

Sure.
