Add support for byte shuffle as a codec so that it can be used as a Zarr filter #260
Comments
Hi @pbranson, just to say this sounds like a good idea to me, PR welcome.
Thanks for the encouragement @rsignell-usgs! I have made a start but am getting hit by a pre-Christmas crunch. Hopefully I will have a PR ready for review by early next week.
I have had a go at exporting the internal blosc shuffle functions. However, the buffers used by those functions use an unsigned char* pointer, presumably because of the byte shuffle, and this produces warnings at compile time because the Buffer convenience class uses a char* pointer. The C function header is: I have some branches of numcodecs
which outputs:
Clearly the encode output buffer isn't being written to. I tried a few different ways of declaring the Cython pointer as I'm wondering whether this path is worth pursuing, or whether directly coding a simpler shuffle that doesn't make use of hardware optimisations might be more achievable and sufficient for now. This is my first foray into CPython/Cython, so I would be grateful for any advice.
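For reference, the simpler shuffle without hardware optimisations could look roughly like the pure-Python sketch below. It is only an illustration of the byte-shuffle idea (group byte 0 of every element, then byte 1, and so on); the names `byte_shuffle` and `byte_unshuffle` are hypothetical and are not part of c-blosc or numcodecs, and this is not the code from any of the branches mentioned above.

```python
import struct


def byte_shuffle(buf, elementsize):
    """Group byte 0 of every element, then byte 1, and so on (hypothetical helper)."""
    data = bytes(buf)
    if elementsize < 1 or len(data) % elementsize:
        raise ValueError("buffer length must be a multiple of elementsize")
    # data[i::elementsize] picks the i-th byte of every element.
    return b"".join(data[i::elementsize] for i in range(elementsize))


def byte_unshuffle(buf, elementsize):
    """Invert byte_shuffle (hypothetical helper)."""
    data = bytes(buf)
    if elementsize < 1 or len(data) % elementsize:
        raise ValueError("buffer length must be a multiple of elementsize")
    count = len(data) // elementsize
    out = bytearray(len(data))
    for i in range(elementsize):
        # The i-th block of `count` bytes holds the i-th byte of every element.
        out[i::elementsize] = data[i * count:(i + 1) * count]
    return bytes(out)


# Three little-endian uint16 values: bytes 02 01 04 03 06 05.
raw = struct.pack("<3H", 0x0102, 0x0304, 0x0506)
shuffled = byte_shuffle(raw, 2)  # low bytes first, then high bytes
assert byte_unshuffle(shuffled, 2) == raw
```

A version like this trades speed for simplicity: it copies the buffer and runs in the interpreter, so it would mainly be useful as a reference to check an optimised implementation against.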
For deep C-related issues in numcodecs I nearly always tag @jakirkham! 😉
Sorry this has taken a while to get back to! I abandoned the approach of trying to use c-blosc as I couldn't see a clear path forward, and resorted to using numba. If one of the main devs could take a look at PR #273 and give some guidance on whether this is an acceptable approach, that would be great.
Closed by #273
Enhancement Request
Following on from discussion here:
fsspec/kerchunk#11
It would be great to expose the blosc library shuffle operations as a numcodecs Codec so that shuffle could be included as a Zarr filter.
This will assist with efforts to expand the functionality provided by the fsspec ReferenceFileSystem to a broader range of datasets stored in HDF format on S3.
The plan would be to expose https://github.com/Blosc/c-blosc/blob/9fae1c9acb659159321aca69aefcdbce663e2374/blosc/shuffle.h as a Cython shuffle.pyx module in numcodecs.
If this sounds like a good enhancement, I would be happy to try making a PR for this.
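As a rough illustration of why a standalone shuffle filter is useful, the sketch below applies a byte shuffle before a general-purpose compressor and compares the compressed sizes. Everything here is an assumption made for illustration: `ShuffleFilter` is a hypothetical class whose `encode`/`decode` methods only mimic the shape of a codec, it does not use numcodecs or c-blosc, and `zlib` from the standard library stands in for whatever compressor follows the filter.

```python
import struct
import zlib


class ShuffleFilter:
    """Hypothetical codec-shaped byte shuffle; not the numcodecs implementation."""

    def __init__(self, elementsize):
        self.elementsize = elementsize

    def encode(self, buf):
        data = bytes(buf)
        if len(data) % self.elementsize:
            raise ValueError("buffer length must be a multiple of elementsize")
        # Byte i of every element is gathered into one contiguous block.
        return b"".join(data[i::self.elementsize] for i in range(self.elementsize))

    def decode(self, buf):
        data = bytes(buf)
        if len(data) % self.elementsize:
            raise ValueError("buffer length must be a multiple of elementsize")
        count = len(data) // self.elementsize
        out = bytearray(len(data))
        for i in range(self.elementsize):
            out[i::self.elementsize] = data[i * count:(i + 1) * count]
        return bytes(out)


# Slowly varying 32-bit integers: the high bytes are almost always zero.
raw = struct.pack("<1000I", *range(1000))
filt = ShuffleFilter(elementsize=4)

plain = zlib.compress(raw)                   # compressor only
shuffled = zlib.compress(filt.encode(raw))   # filter, then compressor

print(len(raw), len(plain), len(shuffled))

# Reading reverses the pipeline: decompress, then unshuffle.
assert filt.decode(zlib.decompress(shuffled)) == raw
```

Because the filter is separate from the compressor, the same shuffle step can be paired with any compressor, which is what reading shuffled HDF chunks through a reference file system would require.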