-
Notifications
You must be signed in to change notification settings - Fork 917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Dask cuDF legacy code #17205
Refactor Dask cuDF legacy code #17205
Conversation
# For dask>2024.2.0, we can silence the loud deprecation | ||
# warning before importing `dask.dataframe` (this won't | ||
# do anything for dask==2024.2.0) | ||
config.set({"dataframe.query-planning-warning": False}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer relevant.
python/dask_cudf/dask_cudf/_expr.py
Outdated
_PATCHED = False | ||
|
||
|
||
def _patch_dask_expr(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of this PR is just moving around existing code. This change is actually new - We used to implicitly "patch" dask-expr when the expr
module was imported. I'm trying to make the patching a bit more explicit (but I only want to allow this function to do its thing once).
If |
Question: is there any user-visible change to this refactoring, such that we must warn users about the deprecation? Or is it that we can just swap out the implementation from under them? |
I'm happy to do whatever others prefer. I just want to choose some convention that helps us move in the direction of making everything other than
There may be some code that people were importing directly from With that said, I'll try to make some changes to better-warn people who try to import things from "deprecated" paths. |
@rjzamora agree, the important thing is to mark them private.
|
Sounds good @madsbk - I made this change. |
def raise_not_implemented_error(attr_name): | ||
def _deprecated_api(old_api, new_api=None, rec=None): | ||
def inner_func(*args, **kwargs): | ||
if new_api: | ||
# Use alternative | ||
msg = f"{old_api} is now deprecated. " | ||
msg += rec or f"Please use {new_api} instead." | ||
warnings.warn(msg, FutureWarning) | ||
new_attr = new_api.split(".") | ||
module = import_module(".".join(new_attr[:-1])) | ||
return getattr(module, new_attr[-1])(*args, **kwargs) | ||
|
||
# No alternative - raise an error | ||
raise NotImplementedError( | ||
f"Top-level {attr_name} API is not available for dask-expr." | ||
f"{old_api} is no longer supported. " + (rec or "") | ||
) | ||
|
||
return inner_func |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I decided to expand this utility to help deal with the many dask_cudf.io
APIs that are no longer directly accessible with query-planning on.
|
||
to_orc = _deprecated_api( | ||
"dask_cudf.to_orc", | ||
new_api="dask_cudf._legacy.io.to_orc", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The legacy to_orc API is actually "fine" (it's compatible with query-planning stuff), but we might as well discourage this kind of usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks @rjzamora
/merge |
Description
The "legacy" DataFrame API is now deprecated (dask/dask#11437). The main purpose of this PR is to start isolating legacy code in Dask cuDF.
Old layout:
New layout:
Notes
dask_cudf.core
(previously led to a "silent" bug when query-planning was enabled).dask_cudf.io
(and sub-modules likedask_cudf.io.parquet
), but they will typically get a deprecation warning.dask_cudf.io.*
. Also, thegroupby
,sorting
, andaccessors
modules have simply moved. It should be uncommon for down-stream code to import from these modules. It's also worth noting that query-planning was already causing problems for these users if they were doing this.Checklist