Support polars #160
Hi @deanm0000! Thanks for the suggestion. There's a way to trivially implement support (i.e. how we currently implement support for pyarrow). Of course, to get the performance benefits, converting everything to pandas defeats the purpose. Do you have any instances where you are performance bottlenecked? Or is this more just a quality of life feature request?
I guess, in those terms, it's a quality of life improvement. From a pure usability perspective it isn't hard to convert to pandas. I didn't realize that the pyarrow input just converted to pandas under the hood. I poked around really quickly and I couldn't find where in the code the transformations happen. Could you point me to that, like if I did
The lazy arrow -> pandas conversion happens here: https://github.com/matthewwardrop/formulaic/blob/main/formulaic/materializers/arrow.py . In practice, under the hood, the data can sometimes pass through this conversion uncopied, but then compute is done in numpy arrays or pandas Series depending on the transform. Again, the framework is datatype agnostic, so it is happy with other types... but we'd need to go through and update the transforms (like contrast encodings) to make sure they have implementations for these types.
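To illustrate the idea of a lazy conversion as described above, here is a minimal, library-free sketch. It is not the code in `arrow.py`: the class name `LazyColumnProxy`, the `convert` callable, and the dict-of-tuples stand-in for a table are all hypothetical, chosen only to show that a column is converted on first access and then reused.

```python
from typing import Any, Callable, Dict, Mapping


class LazyColumnProxy:
    """Hypothetical wrapper that converts a column only when it is first requested."""

    def __init__(self, columns: Mapping[str, Any], convert: Callable[[Any], Any]):
        self._columns = columns  # source data, left untouched until accessed
        self._convert = convert  # stand-in for e.g. an arrow -> pandas conversion
        self._cache: Dict[str, Any] = {}  # converted columns, keyed by name

    def __contains__(self, name: object) -> bool:
        # Membership is answered from the source without converting anything.
        return name in self._columns

    def __getitem__(self, name: str) -> Any:
        # Convert on first access, then serve the cached result.
        if name not in self._cache:
            self._cache[name] = self._convert(self._columns[name])
        return self._cache[name]


calls = []  # records every conversion that actually runs


def to_list(column):
    calls.append(1)
    return list(column)


# Assumed toy "table": tuples play the role of the source column type.
table = {"x": (1, 2, 3), "y": (4, 5, 6)}
proxy = LazyColumnProxy(table, to_list)

first = proxy["x"]
second = proxy["x"]  # served from the cache, no second conversion
```

In this sketch only the columns a formula actually references would ever be converted, which is why unused columns cost nothing. Any transform that runs afterwards still sees the converted type, so per-type implementations of the transforms would remain a separate piece of work.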
Maybe one thing to consider here is the effort to come up with a DataFrame API: https://data-apis.org/dataframe-api/draft/ It could be handy for writing DataFrame-agnostic code.
Hi @matthewwardrop - would you be open to using Narwhals for this? Altair recently adopted it for this purpose (vega/altair#3452), as did scikit-lego. Happy to put up a POC if you'd be interested (just checking first!)
Polars is a (relatively) new dataframe library that is gaining more popularity and blows pandas away in performance using arrow memory in the backend.