PyArrow input should result in PyArrow output? #187
This is an interesting question. While none of the internal plumbing hard-codes a specific data type (very intentionally), many of the transforms were designed to work with pandas or sparse data types. They do have a mechanism (single dispatch) for customising the behaviour for other data types, but it isn't implemented for most transforms. Obviously we could cast back to a PyArrow data type at the end if we wanted to. At least historically, I'd always viewed Arrow as an interchange format, since there were few routines that ran directly on the Arrow data structures themselves. I think this is changing, so I'm totally open to thinking through this more. Do you have specific use cases where having the output be an Arrow table would make more sense for you?
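To illustrate the single-dispatch pattern mentioned above, here's a minimal sketch using Python's `functools.singledispatch`. The `center` transform and `_centered` helper are hypothetical and not part of Formulaic's API, and `list`/`tuple` are only stand-ins for real data types such as pandas DataFrames and PyArrow tables. An unregistered input type falls through to the generic implementation, which always returns its own "native" type. A registered override can run the same computation and cast the result back to the caller's type.

```python
from functools import singledispatch


def _centered(values):
    """Hypothetical helper: subtract the mean from each value, returning a list."""
    values = list(values)
    mean = sum(values) / len(values)
    return [v - mean for v in values]


@singledispatch
def center(data):
    # Generic fallback for any unregistered iterable: the output is always
    # a list, whatever container type was passed in.
    return _centered(data)


@center.register
def _(data: tuple):
    # Registered override: same computation, but the result is cast back
    # to the caller's container type.
    return tuple(_centered(data))


print(center(range(3)))   # no override for range, so a list comes back
print(center((1, 2, 3)))  # tuple override, so a tuple comes back
```

In this sketch, preserving the input type for a new data type means registering one more override per transform, which is why the behaviour only changes where an implementation has actually been added.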
Thanks for your response!
I think if a user passes in Polars, they expect to get back Polars. And as I was looking into preserving the input data class for Polars, I noticed that for PyArrow the input data class isn't preserved. If you're open to it, would you like me to put up a PR demonstrating how Narwhals could work here, as suggested in #160 (comment)? No obligations or hard feelings if it then gets rejected, of course; it just looks like a good use case (for Polars in particular it would be good to keep things Polars-native if possible... maybe they can also stay lazy, not sure yet).
Hi @MarcoGorelli! I was toying with Narwhals a bit this morning, and it looks great. I'm still leveling up, but I have most of an implementation working now in Formulaic that can use it as the materialization backend. Given your heavy involvement in Narwhals, I suspect you will know various tricks that I don't, so when I put up a PR soon, I'll let you chime in on it (and feel free at that time to make further contributions :)).
Cool, thanks!
😄 I'm the original author (maybe I should make that clearer somewhere)
Sounds great! And feel free to join our Discord if you have any questions or requests that don't quite fit into a GitHub issue.
Given that we have now merged in (experimental) support for Narwhals, I'll close this issue in favour of specific bugs that may arise :). Thanks again for your help @MarcoGorelli!
If I run the README example with PyArrow input, I get pandas output:
I think I'd have expected
I'm asking in the context of #160, because there I think Polars input should probably result in Polars output?