Arrow string_view raises DuckDB NotImplementedException #41
Comments
😢 I'd argue this is a DuckDB bug, since DuckDB says it supports Arrow input, and so it should check for these data types. The Arrow PyCapsule Interface actually has a mechanism to fix this: it allows data consumers to check the input schema and ask the producer to cast the data to a desired output schema. For now, it shouldn't be too hard to manually work around this using pyarrow APIs |
All good!
So once we get back the table we should cast some columns? |
Yeah, we should be able to check for any string view (probably also binary view, and maybe run-end-array) and cast those to duckdb-supported types. It would be nice if there were a resource in DuckDB that said which Arrow types it supports. |
Ok working on a PR! |
Looks like it's coming from here: |
Oh so the latest version (maybe unreleased) of duckdb should support it: https://github.com/duckdb/duckdb/blame/4e3a192ce94a793510f11b598805f104d7531c15/src/function/table/arrow.cpp#L88-L89 |
Actually, it looks like it's here lol https://github.com/duckdb/duckdb/blob/4e3a192ce94a793510f11b598805f104d7531c15/src/function/table/arrow.cpp#L88-L89 |
Oh... yup lol you beat me to it |
I made a little note in my argument for DuckDB to support the interface: duckdb/duckdb#10716 (comment) |
Made a PR, but the implementation isn't working since the |
😱 how is string_view -> string not a valid cast?? I would suggest temporarily using arro3 to do the cast, but arrow-rs won't support view types until the next major release (in ~one month). The other option, to support polars specifically for now, is to call polars' to_arrow method, which allows Polars to cast to non-view array types if you set the |
Lol I know...
I would defer to you here as the arrow guru :) What do you think would be the best choice for the time being? I could definitely do the polars patch today... but I am not familiar enough with arro3 yet :) |
Just checking re duckdb/duckdb#10716 (reply in thread) you're on the latest DuckDB version? |
It turns out that was a simple repro. This fails:
import polars as pl
import duckdb
import pyarrow as pa
df = pl.DataFrame({"a": ["a", "b", "c"]})
table = pa.table(df)
duckdb.from_arrow(table)
With polars 1.4.1, duckdb 1.0.0, pyarrow 17.0.0. |
Yes, and just tried out with nightly |
I don't think there's a good way today to handle string view -> string data type casting. I suppose the best workaround right now is to hard-code support for polars, check for a Even though this goes against what I want with the pycapsule interface, which is for consumers to not have to think about where the data is coming from 😛 arro3 isn't capable of this until the next arrow-rs release |
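The polars-specific workaround could look roughly like the sketch below. The helper names are hypothetical, and the details are assumptions rather than what the thread's PR actually does: it detects a polars DataFrame by class module and name (so polars stays an optional dependency), and it assumes a polars release where `to_arrow` accepts `compat_level` and `pl.CompatLevel.oldest()` exists, which asks polars to export non-view Arrow types.

```python
from typing import Any


def is_polars_dataframe(obj: Any) -> bool:
    """Detect a polars DataFrame without importing polars.

    Hypothetical helper: matches on the class's top-level module and name.
    """
    cls = type(obj)
    return cls.__module__.split(".")[0] == "polars" and cls.__name__ == "DataFrame"


def to_arrow_for_duckdb(obj: Any) -> Any:
    """Return Arrow data without view types when `obj` is a polars DataFrame.

    Any other input is passed through unchanged.
    """
    if is_polars_dataframe(obj):
        # Importing here is safe: obj already comes from polars.
        import polars as pl

        # Assumption: `compat_level` / `CompatLevel.oldest()` are available
        # in the installed polars version.
        return obj.to_arrow(compat_level=pl.CompatLevel.oldest())
    return obj
```

This hard-codes knowledge of the producer, which is exactly the trade-off noted above: it sidesteps the missing cast, at the cost of the consumer caring where the data came from.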
Done, see #42 |
Oh, it works for me with this same nightly. And the upstream issue was closed as fixed on latest main: duckdb/duckdb#13424 |
ok maybe I didn't restart my jupyter kernel :( |
Yup, can confirm that. I'll revert #42 after the next python release
#23 led to a regression for string data types, specifically with the string_view Arrow data type not being recognized by DuckDB. I'm going to add some tests to ensure our Arrow/DuckDB connection code works, as we should have end-to-end tests to catch this.
cc: @kylebarron