-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support for Arrow PyCapsule interface #23
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,12 +6,14 @@ | |
|
||
import anywidget | ||
import duckdb | ||
import pyarrow as pa | ||
import traitlets | ||
|
||
from ._util import ( | ||
arrow_table_from_dataframe_protocol, | ||
arrow_table_from_ipc, | ||
get_columns, | ||
has_pycapsule_stream_interface, | ||
is_arrow_ipc, | ||
table_to_ipc, | ||
) | ||
|
@@ -37,7 +39,10 @@ def __init__(self, data, *, table: str = "df"): | |
conn = data | ||
else: | ||
conn = duckdb.connect(":memory:") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you know if the DuckDB There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I believe it spills to disk There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought I was talking to someone recently who asserted There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi just stopping by; in-memory databases will spill to disk, it will create/use the You can test this out by setting a relatively low memory limit and creating a temp table that exceeds this limit. |
||
if is_arrow_ipc(data): | ||
if has_pycapsule_stream_interface(data): | ||
# arrow_table = pa.RecordBatchReader.from_stream(data) | ||
arrow_table = pa.table(data) | ||
Comment on lines
+48
to
+49
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here this is not ideal because If I uncomment the Note that it shows 11k rows in the bottom right, but then doesn't display anything. So I think you need to persist the Arrow stream input manually. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah I figured this is not ideal... I just wasn't sure how to wire up duckdb to query something consistently. I'm guessing something like "CREATE VIEW" from the input stream might allow us register some tables that duckdb can continuously query. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not 100% certain but I don't think you could always use I think we're running into the same issue as in ibis: ibis-project/ibis#9663 (comment) . In particular, the presence of the dunder method doesn't tell you whether the input data is already materialized or a stream. If you know that the input data is an in-memory table, then you can call |
||
elif is_arrow_ipc(data): | ||
arrow_table = arrow_table_from_ipc(data) | ||
else: | ||
arrow_table = arrow_table_from_dataframe_protocol(data) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These branches aren't mutually exclusive (we grab arrow from the duckdb so that it is captured in the next block), and we don't want to raise a
ValueError
for displaying stuff that can't be visualized with quak.Right now
"hello world"
Would raise a value error since it doesn't match any of the conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see; this is probably the error I was hitting when I tried to
repr
something else (why I tried to unload quak)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general note, why do you convert it to Arrow if you're passing it back to duckdb? I think duckdb can do replacement scans on
DuckDBPyRelation
objects.