-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade go-msgpack to v2 2.1.1 #23750
Conversation
7ccdebf
to
3a88232
Compare
Build Results: |
go.mod
Outdated
|
||
replace github.com/hashicorp/raft => /Users/swenson/projects/raft | ||
|
||
replace github.com/hashicorp/raft-boltdb => /Users/swenson/projects/raft-boltdb | ||
|
||
replace github.com/hashicorp/raft-boltdb/v2 => /Users/swenson/projects/raft-boltdb/v2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@swenson I think these may have made it in by accident.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more of a reminder that I still need to merge those before this is done.
Thanks for your diligent work on this painful bit of tech debt @swenson ❤️ .
I wonder if we should find a way to quantify this specifically for Vault's usage. I think we only use msgpack for the Raft wire transport (i.e. the thin bit of RPC framing around each Raft RPC with actual log entries already protobuf encoded in the byte slice) and similar log framing for the LogStore (i.e. bit of msgpack boilerplate around each log entries proto bytes to encode the other metadata like index/term/appendedAt time etc. So it should be relatively easy to write a Benchmark test in our physical/raft package that exercises both network and disk storage and the compare results of several runs with This is more me musing aloud that requesting you do all this work! If you'd like to I think it would be valuable but we can also probably have someone else try it! |
Thanks @banks. Time-permitting, I can try to write a benchmark this week. |
CI Results: |
1bd2bbd
to
39db02f
Compare
@banks I added a benchmark in 02f613b tl;dr there was no performance difference in the benchmark Though I am suspicious as to why Linux is 10x faster than macOS -- I thought the fsync issue was that macOS didn't do them properly and was supposed to be faster? 🤔
|
Amazing 🤩 Thanks Christopher. Great to confirm that.
It's a long answer but... since Go 1.12 But tl;dr, for reason I'm still fuzzy on despite researching a few times, Macs are typically much slower to Fsync if you actually force the flush properly (since the fsync system call alone doesn't) and Go since 1.12 always does. |
54048fb
to
3de7c35
Compare
physical/raft/msgpack.go
Outdated
@@ -9,5 +9,5 @@ package raft | |||
// on the library, which allows us to pin the version in go.mod. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this file now, right? Or at least update the comment?
3de7c35
to
2e3e6bf
Compare
Thanks everyone. I updated this to remove the extra tests and related code. |
And set the `time.Time` option to use the go-msgpack-1.1.5-compatible encoding in all the places, since that is the (now previous) version in `go.mod`. v2 2.1.1 was specifically designed to honor backwards compatibility with 1.1.5 and 0.5.5, and to clean up the code base to be more maintainable. There may performance lost with the 1.1.5 to 2.1.1 migration since the fastpath code was removed, but the increased safety is probably worth it. See [the release notes for go-msgkack 2.1.0](https://github.com/hashicorp/go-msgpack/releases/tag/v2.1.0) for more details. I tested this by running this code, and booting up a cluster with a node also running Vault 1.15.0 (before the upgrade). Before I made the changes to set the right `time.Time` option, the previous-version node would throw a bunch of time-decoding errors. After fixing the option, the node came up smoothly, even after changing leadership between them. This relies on - [ ] hashicorp/raft-boltdb#38 - [ ] hashicorp/raft#577 and will need to be updated after they are merged to get the `go.mod` fixes removed.
I added a simple pair of benchmarks (one with a final sync, one without) and ran them before and after on both my Mac (M2 Max) laptop and my Linux (AMD Threadripper 3970X) desktop. tl;dr There was no performance difference for this benchmark. ``` goos: darwin goarch: arm64 pkg: github.com/hashicorp/vault/physical/raft │ a.txt │ b.txt │ │ sec/op │ sec/op vs base │ RaftWithNetwork-10 58.65m ± 2% 58.62m ± 2% ~ (p=0.937 n=6) ``` ``` goos: linux goarch: amd64 pkg: github.com/hashicorp/vault/physical/raft cpu: AMD Ryzen Threadripper 3970X 32-Core Processor │ c.txt │ d.txt │ │ sec/op │ sec/op vs base │ RaftWithNetwork-64 5.861m ± 1% 5.837m ± 0% ~ (p=0.240 n=6) ```
2e3e6bf
to
9239f71
Compare
Thanks! |
And set the
time.Time
option to use the go-msgpack-1.1.5-compatible encoding in all the places, since that is the (now previous) version ingo.mod
.v2 2.1.1 was specifically designed to honor backwards compatibility with 1.1.5 and 0.5.5, and to clean up the code base to be more maintainable. There may performance lost with the 1.1.5 to 2.1.1 migration since the fastpath code was removed, but the increased safety is probably worth it. See
the release notes for go-msgkack 2.1.0 for more details.
I tested this by running this code, and booting up a cluster with a node also running Vault 1.15.0 (before the upgrade). Before I made the changes to set the right
time.Time
option, the previous-version node would throw a bunch of time-decoding errors. After fixing the option, the node came up smoothly, even after changing leadership between them.This relies on
and will need to be updated after they are merged to get the
go.mod
fixes removed.