Profile-Guided Optimization (PGO) benchmark report #5

Closed
zamazan4ik opened this issue Sep 29, 2024 · 5 comments · Fixed by #6
@zamazan4ik
Hi!

As I have done many times before, I decided to test the Profile-Guided Optimization (PGO) technique to optimize the library's performance. For reference, results for other projects are available at https://github.com/zamazan4ik/awesome-pgo . Since PGO has helped many other libraries, I decided to apply it to serde-brief to see whether a performance win (or loss) can be achieved. Here are my benchmark results. For benchmarking, I used these benchmarks since they were mentioned in the Reddit post.

This information can be interesting for anyone who wants to achieve more performance with the library in their use cases.

Test environment

  • Fedora 40
  • Linux kernel 6.10.11
  • AMD Ryzen 9 5900x
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Compiler - Rustc 1.81.0
  • serde-brief version: used in the benchmark above (I guess the latest for the moment)
  • Disabled Turbo boost

Benchmark

For PGO optimization I use the cargo-pgo tool. I got the release bench results with the `taskset -c 0 cargo bench --no-default-features --features serde-brief` command. The PGO training phase is done with `taskset -c 0 cargo pgo bench -- --no-default-features --features serde-brief`, and the PGO optimization phase with `taskset -c 0 cargo pgo optimize bench -- --no-default-features --features serde-brief`.
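For convenience, here is the same workflow as a single shell snippet (this assumes cargo-pgo is installed, e.g. via `cargo install cargo-pgo`, and that the `llvm-tools-preview` rustup component is available for the profile tooling):

```bash
# One-time setup (assumes a rustup-managed toolchain)
cargo install cargo-pgo
rustup component add llvm-tools-preview

# Baseline (non-PGO) release benchmarks, pinned to one core
taskset -c 0 cargo bench --no-default-features --features serde-brief

# PGO training phase: build instrumented benches and collect profiles
taskset -c 0 cargo pgo bench -- --no-default-features --features serde-brief

# PGO optimization phase: rebuild using the collected profiles and re-run the benches
taskset -c 0 cargo pgo optimize bench -- --no-default-features --features serde-brief
```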

taskset -c 0 is used for reducing the OS scheduler's influence on the results. All measurements are done on the same machine, with the same background "noise" (as much as I can guarantee).

Results

I got the following results:

According to the results, PGO measurably improves the library's performance.

Further steps

I understand that the steps above can be time-consuming and hard to implement in practice. At the very least, the library's users can find this performance report and decide to enable PGO for their applications if they care about the library's performance in their workloads. Maybe a small note somewhere in the documentation (the README file?) will be enough to raise awareness about this work.

Please don't treat the issue like an actual issue - it's just a benchmark report (since Discussions are disabled for the repo).

Thank you.

@FlixCoder
Owner

Thank you! Sounds helpful and like a considerable speed-up.

Feel free to make a PR adding documentation for it. Or if you don't want to, I can try to do it as well. (I imagine a new Markdown file in the docs folder, which can be linked in the README and included in the library's docs module :D). I don't want to take your contribution away :)

@FlixCoder
Owner

Oh, and as an addition: I expect there are more improvements possible by optimizing the source code itself. Postcard is about 3x faster at serialization right now. It won't be possible to reach that, since we do encode more data, but I bet it is possible to halve the time needed (I hope).

Just need to learn how to do it :D

@zamazan4ik
Author

> Feel free to make a PR adding documentation for it. Or if you don't want to, I can try to do it as well.

I think it would be better if you could create such a document - in this case, it will be written consistently with the rest of the project's documentation. As a reference, I have several examples of such PGO docs from other projects (applications and libraries): https://github.com/zamazan4ik/awesome-pgo?tab=readme-ov-file#project-specific-documentation-about-pgo . I hope it will be helpful.

> I don't want to take your contribution away

Oh, no worries about that at all! I am already happy that one more person is interested in PGO!

> I expect there are more improvements possible by optimizing the source code itself. Postcard is about 3x faster at serialization right now. It won't be possible to reach that, since we do encode more data, but I bet it is possible to halve the time needed (I hope).

Yep, I understand. You can try to get some insights about possible optimizations from PGO too, since you can compare flamegraphs before and after PGO to check the difference (or even compare the resulting assembly/LLVM IR before and after PGO). It could be time-consuming, though. A nice thing about PGO is that all that "boring optimization stuff" is done semi-automatically by the compiler. You focus on high-level optimizations, and the compiler does the "dirty" low-level optimization stuff ;)
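As a rough sketch of such a comparison (the bench target name `serialization` and the profile path are placeholders, and this assumes the cargo-flamegraph tool is installed via `cargo install flamegraph`):

```bash
# Flamegraph of the regular release benches
# (the bench name "serialization" is hypothetical; use the crate's real bench target)
cargo flamegraph --bench serialization -o before-pgo.svg -- --bench

# Flamegraph of the PGO-optimized build: point rustc at the merged profile data
# produced during the training phase (the path below is a placeholder)
RUSTFLAGS="-Cprofile-use=/path/to/merged.profdata" \
    cargo flamegraph --bench serialization -o after-pgo.svg -- --bench
```

Comparing the two SVGs side by side can show where PGO changed inlining decisions and hot paths.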

@FlixCoder
Owner

Alright, it is documented in #6. I would be interested in your feedback if you have the time :)

The PR will close this, but it is linked in the docs, so users can find the information :)

@zamazan4ik
Author

zamazan4ik commented Oct 3, 2024

> I would be interested in your feedback if you have the time :)

Sure! Just did it ;)
