-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Just-in-time deserialization #353
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-0.17 #353 +/- ##
==============================================
Coverage ? 58.36%
==============================================
Files ? 19
Lines ? 1561
Branches ? 0
==============================================
Hits ? 911
Misses ? 650
Partials ? 0 Continue to review full report at Codecov.
|
@madsbk so it seems that the reason it fails to |
Make sense, thanks @pentschev for investigating this! |
edf503e
to
40058cd
Compare
Well Pandas is required by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mads, this is some great work you put here, it's great to see the incredible speedup you've achieved!
Functionality-wise, I don't have much to comment on, some parts are much beyond my understanding, and it seems like there was enough testing by you and others with TPCx-BB. Therefore, my comments/requests revolve around style and documentations only.
return sys.getsizeof(self._obj_pxy_deserialize()) | ||
|
||
def __len__(self): | ||
return len(self._obj_pxy_deserialize()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this deserialize the entire object just to check its length? If so, wouldn't it make sense to store an attribute with the length during serialization and just return it to avoid this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes currently len(x)
will deserialize x
. What you suggest, we are already doing with x.name
, which I added because worker will access x.name
before execution tasks.
My plan is to address this issue in a following PR when we have some more experience using JIT deserialization. I suspect we will need to handle a range of attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, thanks for the continued work on it.
return complex(self._obj_pxy_deserialize()) | ||
|
||
def __index__(self): | ||
return operator.index(self._obj_pxy_deserialize()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are all these dunder method implementations here because they are going to be needed in practice, or are they here just to match a full implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some are needed for ProxyObject
to pass-through cudf and tpcx-bb workflows but many of them are just here for completeness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, given they're already there I won't suggest we remove them, but for future reference I think we can try to keep code shorter instead of solving all possible cases that are probably unnecessary, this helps with future maintainability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree but since ProxyObject
is exposed to the end user, I think it is reasonable to support most common operations. Say a user writes a Dask tasks that uses Numpy arrays, the user should be able to use most, if not all, Numpy operations.
"subclass": subclass, | ||
"serializers": serializers, | ||
} | ||
self._obj_pxy_lock = threading.RLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious, why isn't a simple threading.Lock
sufficient here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is to handle all the methods that acquires the lock before calling _obj_pxy_deserialize()
, which also acquires the lock.
dask_cuda/proxy_object.py
Outdated
List of frames that makes up the serialized object | ||
""" | ||
with self._obj_pxy_lock: | ||
assert serializers is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we replace runtime assertions to raising more appropriate exceptions? For example here I think it would make more sense to raise a ValueError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure, the check is there to assert the internal logic of the class. The _obj_pxy_*
methods are not supposed to be called by the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that, but nevertheless I think of assert
as a debug statement, so I'm not sure it makes much sense in runtime code. I know Python doesn't distinguish between "debug" and "release" builds, like we can with C, but regardless I think it's clearer to have errors be specific, one could also argue that assertions would substitute virtually any exceptions we could raise, but the exceptions let us be clearer about our intent.
In the interest of avoiding nitpicking, I will leave the final decision up to you, but I think there's real value in being more specific instead of using assertions, whenever we want to raise an exception if something goes wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I agree completely. I like to use assert
in internal logic in order to both catch bugs while developing and document the expected state of an object.
However, in this case I agree with you. Since it can be triggered by user input, it is preferable to raise ValueError
.
We'll give this a test in a fresh environment on TPCxBB workloads on constrained memory systems as well |
@pentschev thanks for the review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @madsbk , I've suggested some styling and typo fixes, feel free to ignore those you disagree with. It's a lot of minor things, but at least GH will make it easy for you to apply them. :)
Also, I think we can leave this open until EOD or Monday so people who are interested can still review or test it before merging, but otherwise it looks good to me! Thanks again @madsbk for the work you put into this, it's really great! |
Co-authored-by: Peter Andreas Entschev <[email protected]>
It is all in :)
Yes, no hurry. I also like to wait on a verdict from @beckernick :P |
I am running this currently :) |
Thanks for all of your work on this Mads! And everyone for the reviews! 😄 |
Thanks @beckernick, I don't know exactly why we see the memory spikes but I will make a follow up PR to address it when I find the time :) |
This PR implements Just-in-time deserialization of device memory by wrapping
DeviceHostFile
items in a proxy class:ProxyObject
. The deserializaion of the items are then delayed until the items are accessed.Fixes #342
Ref. #57
Use
In order to enable JIT deserializaion, use the new
jit_unspill
argument when creatingLocalCUDACluster
, set--enable-jit-unspill
when starting a CUDAWorker, or set the environment variableDASK_JIT_UNSPILL=True
TODO
ProxyObject
(should come up with a better name)ProxyObject
in order to allow the communication spilled dataProxyObject
in order to avoid re-spilling of data in the case whereProxyObject
wraps un-spilled device data.ProxyObject