Given:

i.e. ints is 4-byte integers [0, 10000000] in big endian, and ints2 is the same in little endian. Then after

I see

So, besides integer endianness utterly defeating the compressor, (a) xz -7 ints is significantly smaller than every other setting, and (b) the output size grows with the compression level for ints2.

It'd be nice if that were the other way around, I think.

Testing on bookworm (5.4.1-0.2).
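The generation commands and the table of output sizes are not reproduced above. A rough sketch of how to recreate the setup (only the file names ints and ints2 and the value range come from the report; everything else is assumed):

```
# Write the integers 0..10000000 as 4-byte big-endian (ints) and
# little-endian (ints2) values.
python3 -c 'import sys
for i in range(10_000_001):
    sys.stdout.buffer.write(i.to_bytes(4, "big"))' > ints
python3 -c 'import sys
for i in range(10_000_001):
    sys.stdout.buffer.write(i.to_bytes(4, "little"))' > ints2

# Compress both files at several presets and compare the compressed sizes.
for level in 1 3 5 7 9; do
    xz -c -"$level" ints  > "ints.$level.xz"
    xz -c -"$level" ints2 > "ints2.$level.xz"
done
ls -l ints*.xz
```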
It's a curious result. LZMA SDK 24.09 produces results that show roughly similar behavior.
I didn't investigate why it happens. Typically big endian compresses better than little endian. Here, however, little endian might benefit from the fact that the most random byte always comes right after a 0x00 byte; but again, I didn't actually investigate.
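(To illustrate with the actual numbers: every value is below 2^24, so the most significant byte of each 4-byte integer is always 0x00. For example, 10000000 is 0x00989680, stored as 00 98 96 80 in big endian but 80 96 98 00 in little endian, so in the little-endian file the fast-changing low byte of each integer immediately follows the 0x00 of the previous one.)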
The differences between compression presets are weirder. If one changes only the dictionary size, keeping the other settings the same, in some cases a smaller dictionary makes the file a lot smaller. The same happens with the latest LZMA SDK.
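For example, one can keep the preset fixed and override only the dictionary size (a sketch; the dictionary sizes here are arbitrary):

```
# Same LZMA2 settings except for the dictionary size.
xz -c --lzma2=preset=9,dict=1MiB  ints2 > ints2.dict1.xz
xz -c --lzma2=preset=9,dict=64MiB ints2 > ints2.dict64.xz
```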
I'm not sure if this is only a funny anomaly where a specific input tricks the encoder onto a wrong path, or if there is something worth improving based on these results. Artificial files like this don't represent real-world files well at all. I tried with zstd --ultra -22 too, and that produces a much smaller result from the big endian file: 2.49 MiB (BE) vs. 7.80 MiB (LE). With gzip -9 it's 20.3 MiB (BE) vs. 13.2 MiB (LE).
When nearby bytes have values that are close to each other (these two files, bitmap images, PCM audio, timestamps in a log), a simple delta filter makes a big difference.
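For example, something along these lines (a sketch of the invocation, not the exact command or results from this comment):

```
# Put a delta filter with a 4-byte distance in front of LZMA2.
xz -c --delta=dist=4 --lzma2=preset=9 ints  > ints.delta.xz
xz -c --delta=dist=4 --lzma2=preset=9 ints2 > ints2.delta.xz
```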
(Typically one wants a 4-byte distance paired with matching LZMA2 options pb=2,lp=2,lc=2, but it doesn't matter above with such an extreme input file.)
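Spelled out, that would look roughly like this (again only a sketch):

```
# Delta distance of 4 bytes plus LZMA2 properties matched to the 4-byte alignment.
xz -c --delta=dist=4 --lzma2=preset=9,pb=2,lp=2,lc=2 ints2 > ints2.delta-aligned.xz
```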
So when you have a special kind of data, specializing the compression method helps. For example, with PCM audio, Delta+LZMA2 is better than plain LZMA2. But FLAC and other special-purpose compressors produce much smaller results and do it much faster too.
The encoder in XZ Utils is based on an old LZMA SDK version. Some day it should be updated. Any encoder tweaks need to wait for that, and I likely won't touch the encoder in the very near future.