[Question] Comparison with the zarr format? #527
Comments
Hello. I don't represent Hugging Face or its position on the issue. If I'm wrong, please correct me.
Thanks for the answer! I believe that zarr offers the same as safetensors and more (chunking, different compressions, and others), except perhaps some of the specific dtypes such as bf16. Thank you!
You're welcome.
I had the exact same question as @julioasotodv. It's interesting to see how HDF5 is described in the README.
Zarr is heavily inspired by HDF5's data model, and so for all the same reasons would also be a great fit for this problem.
An ML-specific format is only an advantage over a more general format if there is some feature that the ML domain needs that the general format cannot provide. Otherwise Hugging Face are just duplicating effort only to end up with a tool with fewer features and a smaller community. People have used Zarr in an ML context at scale - I believe Google used it to checkpoint model weights when training Gemini.
Can you elaborate on this? The relevant questions (and my stab at answers) are:
In fact, one might even imagine using the "virtual zarr" approach to make existing data in safetensor form accessible via zarr library implementations, which would be interesting as a way to very efficiently read safetensor data into applications which understand zarr without duplicating the safetensor data...
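As a rough sketch of what that "virtual zarr" idea could look like (not a tested implementation), one could build a kerchunk-style reference mapping directly from a safetensors header, assuming the standard safetensors layout (8-byte little-endian header length, JSON header whose data_offsets are relative to the start of the data section) and zarr v2 metadata keys. The file name is a placeholder:

```python
import json
import struct

# Map safetensors dtypes to numpy type strings (subset; bf16 has no standard numpy code).
DTYPES = {"F64": "<f8", "F32": "<f4", "F16": "<f2", "I64": "<i8", "I32": "<i4", "U8": "|u1"}

def safetensors_to_zarr_refs(path: str, url: str) -> dict:
    """Build a kerchunk-style reference dict exposing each tensor as an
    uncompressed, single-chunk zarr v2 array pointing at byte ranges in `url`."""
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]
        header = json.loads(f.read(header_len))

    data_start = 8 + header_len  # tensor bytes begin right after the header
    refs = {".zgroup": json.dumps({"zarr_format": 2})}

    for name, info in header.items():
        if name == "__metadata__":
            continue
        shape = info["shape"]
        start, end = info["data_offsets"]  # relative to data_start
        refs[f"{name}/.zarray"] = json.dumps({
            "zarr_format": 2,
            "shape": shape,
            "chunks": shape or [1],  # one chunk == the whole tensor
            "dtype": DTYPES[info["dtype"]],
            "compressor": None,
            "filters": None,
            "fill_value": None,
            "order": "C",
        })
        chunk_key = ".".join("0" for _ in shape) or "0"
        refs[f"{name}/{chunk_key}"] = [url, data_start + start, end - start]

    return {"version": 1, "refs": refs}
```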
I would be very curious to hear an opinion from Hugging Face! :)
I wasn't aware at all of zarr. HDF5 was getting deprecated in TF/Keras at the time of writing this lib, which means building upon it didn't look like a great choice. HDF5's main issue at the time was lack of support for BF16, which is ubiquitous now in ML (and was already quite used at the time). The same probably goes for FP8 (4,5 variants).

Zarr seems to be trying to store compression schemes. This is imho a counterproductive thing. Currently there are several alternatives for quantization, and every month there's a new one, with always very different information needed, layout/constraints, and I'm not mentioning alignment. It seems other libraries are currently able to build on top of safetensors for these. The issue with trying to support those is that, since they are not standard, they do not have any equivalent in most libraries (GGUF compressed tensors do not exist in torch, for instance). This makes it super hard to both cover everything and stay simple, and it makes things harder for other libraries trying to read the format.

Currently, the format hasn't changed a bit since inception, except for the addition of FP8 (which is a real hardware dtype).

zarr seems to have issues in its specs: https://zarr-specs.readthedocs.io/en/latest/v3/data-types.html Are there other sources?

For the detailed questions:
Safetensors too. In ML, tensors are not compressible by design; safetensors is zero-copy, which is pretty hard to beat.
Yes, it is. If zarr supports compression, you cannot guarantee that you can do that though (whereas safetensors is guaranteed to be zero-copy).
If Zarr implements zip, then it is vulnerable; there's no way around it. Your implementation may not be vulnerable, but that's implementation-specific: any other implementor might be, and so might users who unzip manually.
Sure.
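For context, this is roughly what the zero-copy access pattern mentioned above looks like with the safetensors Python API. A minimal sketch, assuming the numpy binding of `safe_open`; the file and tensor names are placeholders:

```python
import numpy as np
from safetensors import safe_open
from safetensors.numpy import save_file

# Write a small file so the example is self-contained.
save_file({"weight": np.random.rand(1024, 1024).astype(np.float32)}, "demo.safetensors")

# Reading never decompresses or copies the whole file: the header says where each
# tensor lives, and get_slice() materialises only the requested rows.
with safe_open("demo.safetensors", framework="numpy") as f:
    first_rows = f.get_slice("weight")[:16]  # touches only the bytes backing rows 0..15
    print(first_rows.shape)                  # (16, 1024)
```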
Thank you for the detailed answer!
Yeah I'm not advocating for HDF5, just pointing out the similarities.
Zarr v3 is still in beta. Zarr v2 is widely used though (spec here). For the purposes of this discussion the difference isn't very important - other than that one of the aims of v3 is to support adding new custom dtypes.
Interesting. It would be interesting to know whether these different quantization approaches could all be considered examples of Zarr codecs or not. I can see there might be a difference in project scope here though, with other libraries building on top of safetensors to handle quantization.
Yeah makes sense. I would have thought a Zarr reader implementation could be zero-copy for the case of uncompressed data too though.
I see. Obviously compression is useful for the other users of Zarr, but it is optional. So perhaps one could imagine simply defining "safe Zarr" as any Zarr store that is not compressed using certain codecs. You could also check for the presence of these codecs before attempting to decompress anything, and implement a "zero-copy safe Zarr reader" that just refuses to open compressed data.
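A minimal sketch of that idea against the zarr-python v2 API (v3 exposes codecs differently, so this is illustrative only; the store and array name are placeholders):

```python
import zarr

def open_uncompressed(store, name):
    """Refuse to read a zarr array unless it is stored raw (no compressor, no filters),
    so the read path can stay zero-copy and never runs codec code on untrusted data."""
    arr = zarr.open_group(store, mode="r")[name]
    if arr.compressor is not None or arr.filters:
        raise ValueError(f"{name} uses codecs {arr.compressor}/{arr.filters}; refusing to open")
    return arr
```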
Zarr supports zlib compression, but if I understand correctly it doesn't implement zip in general. I have asked upstream to check my understanding though, see zarr-developers/zarr-python#2625.
FYI the relevant project to read about is Google TensorStore, which is a C++ implementation of Zarr, used for checkpointing LLMs at scale.
A key advantage of Safetensors over Zarr is its ability to handle chunks "dynamically", with virtually no overhead. In contrast, Zarr relies on a static chunking strategy (which now also introduces sharding) set at creation time. When Zarr chunks perfectly align with the user's request, both SafeTensors and Zarr deliver comparable performance; Safetensors is still slightly faster, but maybe due to its Rust implementation compared to Zarr's pure-Python code. (Benchmark: "Safetensor is as fast as zarr".)
However, if the user's request doesn't perfectly match Zarr chunks, the performance difference can be pretty big. Safetensors can easily be 10 to 100 times faster than Zarr in many cases. (Benchmark: "Safetensor is 41 times faster than Zarr for the same data request".)
https://colab.research.google.com/drive/1ewxNez6R-q9tWtGBp4JsQrKMFrTS5BG_?usp=sharing

Over a network, if your server supports multiple range requests, you can perform any partial read from a Safetensors file using just 3 GET requests, without needing caching.
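For reference, a sketch of what those 3 requests look like, given the safetensors layout (8-byte little-endian header length, JSON header, then raw tensor bytes, with data_offsets relative to the end of the header). The URL and tensor name are placeholders, and the server must honour Range headers:

```python
import json
import struct
import requests

url = "https://example.com/model.safetensors"  # placeholder

def fetch_range(start, end):
    # `end` is inclusive in HTTP Range semantics
    r = requests.get(url, headers={"Range": f"bytes={start}-{end}"})
    r.raise_for_status()
    return r.content

# GET 1: the first 8 bytes give the header length.
header_len = struct.unpack("<Q", fetch_range(0, 7))[0]

# GET 2: the JSON header maps each tensor to its dtype, shape and byte range.
header = json.loads(fetch_range(8, 8 + header_len - 1))

# GET 3: pull exactly the bytes of one tensor.
start, end = header["model.embed_tokens.weight"]["data_offsets"]  # placeholder name
data_base = 8 + header_len
tensor_bytes = fetch_range(data_base + start, data_base + end - 1)
```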
Google TensorStore is more than a C++ implementation of Zarr; Zarr is just one of the many drivers it supports. https://google.github.io/tensorstore/driver/index.html
@csaybar I repeated the benchmarks on my local machines (a newer Mac M-series and an older Lenovo ThinkPad running Ubuntu), and also on Colab. Based on the variability of these results, I would be hesitant to draw any definitive statements from these benchmarks. In particular, the claim that Safetensors is 10 to 100 times faster than Zarr does not match my personal experience. (But it's quite possible that it's just because I have very limited personal experience!) I'm also hesitant to conclude that any differences between the formats relate to the chunking strategy. I'm interested in seeing more benchmarks in the community comparing the two formats.
Hi @nenb, you're absolutely right! I'm not very familiar with the Zarr Python implementation, but it seems the default compression is not None. The documentation is messy due to the transition between v2 and v3. To make a fair comparison, I'll disable compression.

Results: https://colab.research.google.com/drive/1-yuHw5Id2bJDpKkilHBQ7QeRPfM8u8VB?usp=sharing
Safetensors doesn't have a chunk strategy; everything is fixed by default. Check the number of function calls Zarr has to make for the same request: 85,053 (Zarr) vs 5 (Safetensors). If Zarr were coded in C or Rust, it could potentially match Safetensors' speed at "perfect alignment". HDF5 is faster than Zarr because its core is in C. Zarr's main devs prefer to keep everything in Python, I'm not sure why.
In your runs you are profiling the import time of the libraries as well. To have a realistic benchmark, I would suggest using larger data chunks so that the imports do not distort the results, and/or rewriting the benchmark so as to remove these imports from the results. Because in practice, I don't really mind if there is a cost of tens of milliseconds on the very first read - the rest of my I/O times will completely dwarf this.
My point was that the benchmark results vary quite a bit across machines, so I'd be hesitant to attribute the difference to the chunking strategy.
I am a user and not a dev, and so I can only speculate. But there are multiple implementations of Zarr in other languages (TensorStore's C++ implementation, for example).
The profiling is specifically conducted within the context manager, not considering the import time. I added the imports below for completeness:

```python
# safetensors benchmark
from safetensors import safe_open
from safetensors.numpy import save_file
import numpy as np
import cProfile
import pstats
```

```python
# zarr benchmark
import numpy as np
import cProfile
import zarr
import pstats
```

See https://docs.python.org/3/library/profile.html
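For anyone wanting to reproduce the safetensors side locally, a self-contained sketch along those lines (the array size and file name are arbitrary placeholders; the full benchmarks are in the linked Colab):

```python
import cProfile
import pstats

import numpy as np
from safetensors import safe_open
from safetensors.numpy import save_file

# Write a test tensor once, outside the profiled region.
save_file({"cube": np.random.rand(512, 512, 16).astype(np.float32)}, "cube.safetensors")

profiler = cProfile.Profile()
profiler.enable()
with safe_open("cube.safetensors", framework="numpy") as f:  # only the read is profiled
    block = f.get_slice("cube")[:128, :128]
profiler.disable()

pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```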
Something "realistic" will depend on the use case. The example I used corresponds to a typical Sentinel-2 data cube, which you can download using stackstac, for instance. But in my experience, bigger will be worse for Python Zarr.
In my opinion, in a server context the only thing that matters is minimising the number of GET operations, especially when serving millions of clients. Safetensors is also better here, since the layout is fixed and fully described by the header. To illustrate the GET optimization, consider this example:
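(A minimal sketch of the idea; in practice the byte ranges would come from a real safetensors header, here they are illustrative.)

```python
def coalesce(offsets, gap=0):
    """Merge tensor byte ranges [(start, end), ...] that are adjacent (or within
    `gap` bytes) so several tensors can be served by a single Range request."""
    merged = []
    for start, end in sorted(offsets):
        if merged and start <= merged[-1][1] + gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

# Three tensors stored back-to-back collapse into one byte range -> one GET.
print(coalesce([(0, 4096), (4096, 8192), (8192, 12288)]))  # [[0, 12288]]
```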
You can do the same in Zarr and GeoTIFF too, but at "chunk" level; Safetensors permits you to do it at "byte" level. In a machine learning context compression is irrelevant, and that is why safetensors shines. In a geoscience context, data usually has high autocorrelation, so you can easily get a 10x compression ratio for some datasets (especially climate-related ones) with a lossless compressor. This is the reason why formats like Safetensors are not popular in the geoscience community, IMO.
Beyond Python-Zarr, other implementations lack support.
This is not correct; you can see this in the profiling output itself. I think it's important that a lot of the statements that have been made are also backed up with benchmarks (and I'll admit that I am also guilty of this!). There are only two benchmarks available in this thread. I wanted to point out that they i) vary quite a bit across machines and ii) when some subtleties are accounted for, the differences are not as large as originally suggested.
I do want to address this though, as I have seen it mentioned before, and I am not sure what it relates to. If we consider one of the most downloaded LLMs on HF, compressing its weights does yield a meaningful reduction. I don't mean to claim that this is true for all use-cases, but if it's true for one of the most downloaded LLMs on HF, then I don't think it's correct to say that compression is irrelevant.
I disagree here. To me, this sounds like a lack of optimization. But I understand your point and I think it is a valid claim.
Exactly! That’s why I shared some Colab notebook. It would be very interesting to see counterexamples.
In my experience (maybe not best to talk about this), if you have a high compression ratio, quantizing your model may be the best approach. I have been playing with Qwen2.5-1.5B-Instruct-GPTQ-Int8: I just compressed it with zip and tar.xz, and the ratio is close to 1 in both cases compared to the original safetensors file.
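For anyone who wants to sanity-check that kind of claim themselves, a quick way to measure it (the file path is a placeholder, and zlib stands in for whichever lossless compressor you care about):

```python
import zlib
from pathlib import Path

raw = Path("model.safetensors").read_bytes()  # placeholder path
ratio = len(raw) / len(zlib.compress(raw, level=6))
print(f"lossless compression ratio: {ratio:.2f}x")  # close to 1.0x in the experiment above
```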
Hi,
I know that safetensors are widely used nowadays in HF, and the comparisons made in this repo's README file make a lot of sense.
However, I am now surprised to see that there is no comparison with zarr, which is probably the most widely used format for storing tensors in a universal, compressed and scalable way.
Is there any particular reason why safetensors was created instead of just using zarr, which has been around for longer (and has nice benefits such as good performance in object storage reads and writes)?
Thank you!