-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
go api: use byte slices for deck parsing - reuse deck parsing results for subsequent reqs #1
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing all this! Really exciting stuff! Mostly left some opinionated formatting thoughts for the WIP.
|
||
// ensures work is only done once even when racing for deck parse. Once its loaded, will be checked | ||
// using atomic.LoadUint32 instead of a mutex lock | ||
parseOnce sync.Once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of scope: shame we aren't on go1.21, its the next version up and would give us https://pkg.go.dev/sync#OnceValue
Given the work-in-progress part of this PR and our total control over the repo I wonder if it's worth a quick bump.
|
||
// region deck parsing unstable | ||
|
||
// deck designed to be parsed once and then used for lookups. The load of the parsing is in the request context as to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit throughout: Go comments (and especially GoDoc comments) should be complete sentences ending in punctuation https://go.dev/wiki/CodeReviewComments#comment-sentences.
// not block the sub | ||
type deck struct { | ||
// raw data not parsed, after parsing, raw data buf is freed and replaced by ready-to-use result | ||
buf []byte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit throughout: GoDoc comments should begin with the name of the thing being described https://go.dev/wiki/CodeReviewComments#comment-sentences.
return err | ||
} | ||
|
||
parts := bytes.Split(decompressed, []byte(";;;")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question for throughout: is it safe to assume we never have unicode things to worry about? In general I believe this is unsafe, but I haven't thought too deeply about if these particular byte sequences we use around this file will be safe (they very well might be!).
} | ||
|
||
slices.SortFunc(keys, func(i, j string) bool { | ||
// may allocate some extra space in some cases, but we will shrink after we are done formatting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be missing it but where is this shrink? (I don't think there's any benefit to shrinking here probably, but just making sure I'm following).
|
||
for i := len(compressionDict) - 1; i >= 0; i-- { | ||
word := compressionDict[i] | ||
// TODO: this is the source of lots of allocs and CPU cycles, probably no need for regexp here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this was the most naive 1-1 move haha. You inspired me to see how the naive string replacer would do and I put that up here: Spireblight#20
return text, nil | ||
} | ||
|
||
type delimCB func(val []byte) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: for type names especially could we type out fuller names? I think this stands for callback, but it took me a bit to guess.
} | ||
|
||
// no guarentee of not copying here in map lookup, so unsafe it out. Is valid as long as `decompressed` is still valid and static | ||
cardStr := unsafe.String(&card[0], len(card)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsafe is pretty spooky. For a single string copy is this measurably important?
} | ||
|
||
// Bytes use to get parsed deck | ||
func (d *deck) Bytes() (res []byte, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now it is fairly obvious when the expensive part of this happens: is it worth doing something (maybe even just renaming this) to make it clear this is the expensive call?
As listed in this pull request, the optimization can be taken a bit further.
Added deck struct to manage parsing logic.
Converted string parsing functions into their byte counterparts.
Added functionality for reusing parsed deck data between reqs.
Added tests for the bytes deck parsing logic (left the old for now, but can remove have just been using for comparisons for now).
Added tests for the deck API handler (bumped up coverage by a bit)
Do not have as good of a computer for the benchmarks, but here is the comparison from my machine:
Unsure of why the small deck is faster as of writing this, but the main change I wanted to implement was reuse of the request results.
Seems like even in the "large deck" cases we end up not storing that much in the decompressed buffer, was <2Kb for the large case per-item.