-
Notifications
You must be signed in to change notification settings - Fork 912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eagerly populate the class dict for cudf.pandas proxy types #14534
Eagerly populate the class dict for cudf.pandas proxy types #14534
Conversation
…populate-class-dict
…populate-class-dict
…populate-class-dict
…populate-class-dict
…populate-class-dict
…populate-class-dict
…populate-class-dict
/okay to test |
/okay to test |
@wence- I've reverted the vast majority of the changes I made to this PR and this is now ready for review. Since I've made a couple of minor fixes to this pr I'd like to get it reviewed by someone other than me before merging. Something to note, the drop in test pass rate is expected and are valid failures that are not in the scope of fixing in this PR. |
Co-authored-by: Matthew Roeschke <[email protected]>
if instance is not None: | ||
return _maybe_wrap_result( | ||
getattr(instance._fsproxy_slow, self._name), | ||
None, # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to adjust the func
typing in _maybe_wrap_result as func: Callable | None
and raise an exception in is_final_type
and is_intermediate_type
branches if func is None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to be straight-forward, while testing this change I noticed 4k
additional failures. Maybe we can revisit this in another PR.
@@ -273,6 +277,9 @@ def Index__new__(cls, *args, **kwargs): | |||
"__new__": Index__new__, | |||
"_constructor": _FastSlowAttribute("_constructor"), | |||
"__array_ufunc__": _FastSlowAttribute("__array_ufunc__"), | |||
"_accessors": set(), | |||
"_data": _FastSlowAttribute("_data", private=True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So these attributes we are wanting the fast path to have a chance to evaluate and not automatically be _Unusable()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah so in this case since it's private
we won't try to operate on e.g. cudf.Index._data
which is a ColumnAccessor
and instead reach for cudf.Index._data
which is a numpy ndarray which we can wrap.
This part of the eager population I'm not too thrilled about. These private variables are not guaranteed to be stable. It would be nice if any slow attribute was attempted to return a proxy object so we didn't have to specify attributes like this but I suppose that can be looked at in a follow up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there are still such cases we can address in follow-ups. At this point the scope of this PR has kept on expanding, that's why I haven't addressed additional issues in this PR.
/okay to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My typing request and concern about having to define private attributes in our proxies can be addressed in a follow up
/merge |
Rather than dynamically looking up class attributes (and methods), this PR makes it so that we eagerly populate the class with all known methods and attributes (by inspecting the "slow" class).
This solves a number of problems:
getattr
trivially inexpensive (no dynamic__getattr__
for each attribute access)DataFrame.max
Series.list
super().
to access attributes of base types