-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extensions API for third-party table/catalog providers #43
base: main
Are you sure you want to change the base?
Conversation
|
||
/// A composite extension registry for enabled extensions. | ||
#[derive(Debug)] | ||
pub(crate) struct Extensions(Box<[Box<dyn Extension>]>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not Vec<Box<dyn Extension>>
? What is the advantage of Box<[T]>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit lighter (hardly relevant in this scenario) and it's immutable by design (arguably also irrelevant, though I personally prefer it when I can choose). I think I got this habit after watching this video. :)
But I don't mind switching to Vec
if you think that would improve readability.
dfc48a5
to
c6a9535
Compare
c6a9535
to
4fa7371
Compare
4fa7371
to
ad26fc8
Compare
Rebased on top of the main branch and extracted the |
if let Ok(py_plan) = py_plan.to_object(py).downcast_bound::<PyExecutionPlan>(py) { | ||
// For session contexts created with datafusion_ray.extended_session_context(), the inner | ||
// execution plan can be used as such (and the enabled extensions are all available). | ||
Ok(py_plan.borrow().plan.clone()) | ||
} else { | ||
// The session context originates from outside our library, so we'll grab the protobuf plan | ||
// by calling the python method with no extension codecs. | ||
let py_proto = py_plan.call_method0("to_proto")?; | ||
let plan_bytes: &[u8] = py_proto.extract()?; | ||
let plan_node = protobuf::PhysicalPlanNode::try_decode(plan_bytes).map_err(|e| { | ||
PyRuntimeError::new_err(format!( | ||
"Unable to decode physical plan protobuf message: {}", | ||
e | ||
)) | ||
})?; | ||
|
||
let runtime = RuntimeEnv::default(); | ||
let registry = SessionContext::new(); | ||
plan_node | ||
.try_into_physical_plan(®istry, &runtime, Extensions::codec()) | ||
.map_err(|e| e.into()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of my goals was to remove the datafusion-python
dependency from datafusion-ray
so that you wouldn't have hard requirements about having the exact same version between the two. It can be worse in that you also have to have the same compiler for both. Now for datafusion-ray
we may be able to get away with it for official releases since we control the build pipeline for both. This does place a restriction on end users in that they have to make sure they keep these versions synced on their machine. In my opinion it would be better to lean on things like the FFI interface that is coming in datafusion-python
43.0.0. I know that right now doesn't solve the problem of having all extensions, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @timsaucer !
Makes sense, and I'm entirely onboard with this goal. Which is why I tried to preserve this guarantee by keeping the existing code within the else
branch here, where no assumption is made about the datafusion-python
version/compiler.
The "embedded" datafusion-python
dependency is only supposed to be an opt-in alternative for users who decide to call the new function for creating an ABI compatible context preconfigured with the enabled extensions (that's the only way the above downcast can succeed, if I'm not mistaken).
So any existing or future code that doesn't switch to using the new extended_session_context()
function will continue to work without any compatibility restrictions.
However, if I somehow failed to preserve this guarantee or if I missed something that introduces any potential risks, please let me know so I'll revisit the approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to hold up progress, but this feels like a step in the wrong direction. But I also don't have enough time right now to give a better solution. I think I'd want to do something like use the FFI_ExecutionPlan in df43 to share these across packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I'll look into that. Thanks!
Add support for third party table/catalog providers, to allow integrations with databases, table formats etc. to be enabled with dedicated features.
This patch adds an extensions API and a first extension for Flight SQL tables.
As a side note, the Delta Lake extension can be added once the datafusion dependency in the official project is upgraded to 42+.
create external table
CompositeCodec