Skip to content
This repository has been archived by the owner on Nov 19, 2024. It is now read-only.

sz: Support S2 (close #429) #431

Merged
merged 4 commits into from
Nov 19, 2024
Merged

sz: Support S2 (close #429) #431

merged 4 commits into from
Nov 19, 2024

Conversation

mholt
Copy link
Owner

@mholt mholt commented Nov 18, 2024

@CompuRoot maybe you can take a look at this. I'll likely just merge this and then we can change it later if needed.

@klauspost -- Thank you for yet again, another great package! (One minor request: It could be useful if the compression levels were exported.)

@klauspost
Copy link
Contributor

@mholt "next version" - meaning the replacement will :)

@klauspost
Copy link
Contributor

For now here is a teaser :)

image

@mholt
Copy link
Owner Author

mholt commented Nov 18, 2024

Wow! So, do you think I should wait for that? (Before I make new exported API)

sz.go Outdated Show resolved Hide resolved
if sz.S2.MaxBlockSize != 0 {
opts = append(opts, s2.ReaderMaxBlockSize(sz.S2.MaxBlockSize))
}
return io.NopCloser(s2.NewReader(r, opts...)), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit sad the API is truncated to an io.ReadCloser, but I guess the users can flip back to the org package for more extensive API.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah, I guess this makes type asserting difficult doesn't it. Sorry :(

I went through all the formats just now to see if the Closer API was really necessary but it does seem a few formats use it, so I think we need it to stay in order to keep the API advantages.

Copy link

@CompuRoot CompuRoot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for picking it so quickly !

The only suggestion is to change file extension .s2 to .sz2 or .s2z to be on pair with other and to reflect relation to Snappy

@CompuRoot
Copy link

For now here is a teaser :)

And bellow is result of testing a small (140Mb) .vdi file with kopia:

     Compression                Compressed   Throughput   Allocs   Usage
------------------------------------------------------------------------------------------------
  0. s2-parallel-8              62.9 MB      1.5 GB/s     1023     197 MB
  1. s2-default                 62.9 MB      1.1 GB/s     1050     195.9 MB
  2. s2-parallel-4              62.9 MB      1 GB/s       995      187.5 MB
  3. pgzip-best-speed           58.9 MB      816.1 MB/s   1352     214.1 MB
  4. s2-better                  60.6 MB      739.4 MB/s   1019     182.4 MB
  5. pgzip                      56.9 MB      474.1 MB/s   1294     210.9 MB
  6. lz4                        64.7 MB      433.7 MB/s   27       204.1 MB (deprecated)
  7. zstd-fastest               54.4 MB      352.6 MB/s   6311     212.4 MB
  8. deflate-best-speed         58.9 MB      322.3 MB/s   38       135 MB
  9. zstd                       52.4 MB      322.3 MB/s   3266     156.3 MB
 10. deflate-default            56.9 MB      186.6 MB/s   38       135.3 MB
 11. gzip-best-speed            58.4 MB      185.6 MB/s   44       135.4 MB
 12. zstd-better-compression    51.6 MB      108.9 MB/s   3222     176 MB
 13. gzip                       55.9 MB      47.3 MB/s    44       135 MB
 14. zstd-best-compression      47 MB        31.8 MB/s    3367     207.5 MB (deprecated)
 15. pgzip-best-compression     56.1 MB      31.2 MB/s    1299     171.6 MB
 16. deflate-best-compression   56.1 MB      8.6 MB/s     39       135.4 MB
 17. gzip-best-compression      55.9 MB      7.3 MB/s     43       135 MB

Co-authored-by: Klaus Post <[email protected]>
@klauspost
Copy link
Contributor

@CompuRoot I am not religious about it, but I have used .s2 for my commandline tool.

The snappy relation is not too important, except for backwards compatibility of reading snappy data.

@CompuRoot
Copy link

@CompuRoot I am not religious about it, but I have used .s2 for my commandline tool.

The snappy relation is not too important, except for backwards compatibility of reading snappy data.

You right, if there already exists CLI tool with this extension, then keeping .s2 extension for compatibility is a better solution.

@mholt mholt merged commit e1f152f into master Nov 19, 2024
6 checks passed
@mholt mholt deleted the s2 branch November 19, 2024 15:36
@mholt
Copy link
Owner Author

mholt commented Nov 19, 2024

Thanks both for chiming in!

Given that S2 is largely compatible with Snappy, I'm keeping it as part of the Snappy format, with the option of writing Snappy-incompatible data if desired.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants