Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eagerly populate the class dict for cudf.pandas proxy types #14534

Merged
merged 95 commits into from
May 17, 2024

Conversation

shwina
Copy link
Contributor

@shwina shwina commented Nov 30, 2023

Rather than dynamically looking up class attributes (and methods), this PR makes it so that we eagerly populate the class with all known methods and attributes (by inspecting the "slow" class).

This solves a number of problems:

  • it makes getattr trivially inexpensive (no dynamic __getattr__ for each attribute access)
  • it ensures the same object is returned every time you do, e.g., DataFrame.max
  • it makes tab completion fast because the attributes don't have to be computed each time
  • it no longer exposes attributes that are specific to cuDF - for example Series.list
  • it allows subclassing of proxy types to work better. For example, derived types can now call super(). to access attributes of base types

@github-actions github-actions bot added the Python Affects Python cuDF API. label Nov 30, 2023
Copy link

copy-pr-bot bot commented Feb 22, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@shwina shwina changed the base branch from branch-24.02 to branch-24.04 February 22, 2024 19:50
@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue conda and removed Python Affects Python cuDF API. labels Feb 22, 2024
@shwina shwina removed libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue conda labels Feb 23, 2024
@github-actions github-actions bot removed the ci label May 16, 2024
@galipremsagar
Copy link
Contributor

/okay to test

@galipremsagar
Copy link
Contributor

/okay to test

@galipremsagar
Copy link
Contributor

galipremsagar commented May 17, 2024

@wence- I've reverted the vast majority of the changes I made to this PR and this is now ready for review. Since I've made a couple of minor fixes to this pr I'd like to get it reviewed by someone other than me before merging.

Something to note, the drop in test pass rate is expected and are valid failures that are not in the scope of fixing in this PR.

if instance is not None:
return _maybe_wrap_result(
getattr(instance._fsproxy_slow, self._name),
None, # type: ignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to adjust the func typing in _maybe_wrap_result as func: Callable | None and raise an exception in is_final_type and is_intermediate_type branches if func is None

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be straight-forward, while testing this change I noticed 4k additional failures. Maybe we can revisit this in another PR.

@@ -273,6 +277,9 @@ def Index__new__(cls, *args, **kwargs):
"__new__": Index__new__,
"_constructor": _FastSlowAttribute("_constructor"),
"__array_ufunc__": _FastSlowAttribute("__array_ufunc__"),
"_accessors": set(),
"_data": _FastSlowAttribute("_data", private=True),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So these attributes we are wanting the fast path to have a chance to evaluate and not automatically be _Unusable()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah so in this case since it's private we won't try to operate on e.g. cudf.Index._data which is a ColumnAccessor and instead reach for cudf.Index._data which is a numpy ndarray which we can wrap.

This part of the eager population I'm not too thrilled about. These private variables are not guaranteed to be stable. It would be nice if any slow attribute was attempted to return a proxy object so we didn't have to specify attributes like this but I suppose that can be looked at in a follow up

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there are still such cases we can address in follow-ups. At this point the scope of this PR has kept on expanding, that's why I haven't addressed additional issues in this PR.

@mroeschke
Copy link
Contributor

/okay to test

Copy link
Contributor

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My typing request and concern about having to define private attributes in our proxies can be addressed in a follow up

@galipremsagar
Copy link
Contributor

/merge

@rapids-bot rapids-bot bot merged commit e6e6761 into rapidsai:branch-24.06 May 17, 2024
70 checks passed
@galipremsagar galipremsagar added the 5 - Ready to Merge Testing and reviews complete, ready to merge label May 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Tab completion broken in ipython
7 participants