Facebook BOLT binary optimizer #184
-
That link is private. Could you summarize?
-
FYI, the BOLT project may be merged into the LLVM project in the near future.
-
A bit more detail from a Pyston dev with only limited knowledge of BOLT: it is important that the binary is compiled with […] And yes, it seems very close to getting merged into LLVM ✨.
-
BOLT has been merged into the LLVM codebase! https://www.phoronix.com/scan.php?page=news_item&px=LLVM-Lands-BOLT
-
FYI, I am preparing a BOLT experiment with the latest LLVM branch.
-
For the record, I succeeded in running BOLT with the -instrument option on CPython:
$ llvm-bolt ./python -instrument -o ./python.bolt
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 1bb0caf561688681be67cc91560348c9e43fcbf3
BOLT-INFO: first alloc address is 0x400000
BOLT-INFO: creating new program header table at address 0xc00000, offset 0x800000
BOLT-WARNING: debug info will be stripped from the binary. Use -update-debug-sections to keep it.
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
BOLT-INFO: enabling -align-macro-fusion=all since no profile was specified
BOLT-INFO: enabling lite mode
BOLT-WARNING: .annobin_elf_init.c_end/1 (0x646745) does not have any section
BOLT-WARNING: .annobin___libc_csu_fini.end/1 (0x646745) does not have any section
BOLT-WARNING: split function detected on input : bytes_richcompare.cold.36/1. The support is limited in relocation mode.
BOLT-WARNING: Ignored 4 functions due to cold fragments.
BOLT-INSTRUMENTER: Number of indirect call site descriptors: 1184
BOLT-INSTRUMENTER: Number of indirect call target descriptors: 5258
BOLT-INSTRUMENTER: Number of function descriptors: 5258
BOLT-INSTRUMENTER: Number of branch counters: 117099
BOLT-INSTRUMENTER: Number of ST leaf node counters: 59143
BOLT-INSTRUMENTER: Number of direct call counters: 0
BOLT-INSTRUMENTER: Total number of counters: 176242
BOLT-INSTRUMENTER: Total size of counters: 1409936 bytes (static alloc memory)
BOLT-INSTRUMENTER: Total size of string table emitted: 111961 bytes in file
BOLT-INSTRUMENTER: Total size of descriptors: 8563332 bytes in file
BOLT-INSTRUMENTER: Profile will be saved to file /tmp/prof.fdata
BOLT-INFO: 0 out of 5349 functions in the binary (0.0%) have non-empty execution profile
BOLT-INFO: the input contains 871 (dynamic count : 0) opportunities for macro-fusion optimization that are going to be fixed
BOLT-INFO: 409751 instructions were shortened
BOLT-INFO: removed 892 empty blocks
BOLT-INFO: UCE removed 21413 blocks and 1493326 bytes of code.
BOLT-INFO: SCTC: patched 0 tail calls (0 forward) tail calls (0 backward) from a total of 0 while removing 0 double jumps and removing 0 basic blocks totalling 0 bytes of code. CTCs total execution count is 0 and the number of times CTCs are taken is 0.
<unknown>:0: error: Undefined temporary symbol .Ltmp1394
BOLT-INFO: output linked against instrumentation runtime library, lib entry point is 0x170ad20
BOLT-INFO: clear procedure is 0x1709b70
BOLT-INFO: setting _end to 0xaaff18
BOLT-INFO: setting _end to 0xaaff18
BOLT-INFO: patched build-id (flipped last bit)
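The BOLT-INSTRUMENTER summary in the log above is internally consistent, which helps when reading it: the three counter kinds add up to the total (117099 + 59143 + 0 = 176242), and the static allocation divides out to exactly 8 bytes per counter (1409936 / 176242 = 8). That is consistent with 64-bit counters, but this is an inference from the numbers, not something the log states. A small Python sketch that parses those lines (copied verbatim from the log) and checks the arithmetic; the helper name parse_instrumenter_stats is hypothetical:

```python
import re

# Verbatim excerpt of the BOLT-INSTRUMENTER summary from the CPython log above.
LOG = """\
BOLT-INSTRUMENTER: Number of branch counters: 117099
BOLT-INSTRUMENTER: Number of ST leaf node counters: 59143
BOLT-INSTRUMENTER: Number of direct call counters: 0
BOLT-INSTRUMENTER: Total number of counters: 176242
BOLT-INSTRUMENTER: Total size of counters: 1409936 bytes (static alloc memory)
"""


def parse_instrumenter_stats(log: str) -> dict:
    """Map each 'BOLT-INSTRUMENTER: <label>: <number>' line to {label: number}.

    Hypothetical helper; it only relies on the line shape seen in this log.
    """
    stats = {}
    for m in re.finditer(r"^BOLT-INSTRUMENTER: (.+?): (\d+)", log, re.MULTILINE):
        stats[m.group(1)] = int(m.group(2))
    return stats


stats = parse_instrumenter_stats(LOG)

# Sum of the per-kind counters should equal the reported total.
per_kind = (stats["Number of branch counters"]
            + stats["Number of ST leaf node counters"]
            + stats["Number of direct call counters"])
total = stats["Total number of counters"]

# Static counter memory divided by counter count (8.0 suggests 64-bit counters).
bytes_per_counter = stats["Total size of counters"] / total

print(per_kind == total, bytes_per_counter)
```

The same parser works on any instrumentation log with this line format, so it can be used to compare counter counts between builds.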
-
FYI, I cleaned up Pyston's usage of BOLT and submitted it as python/cpython#95908. Compared to #224, I think there was a fairly significant improvement on the macrobenchmarks when some extra compilation flags were used.
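For readers new to BOLT, the instrumentation-based flow discussed in this thread has three steps: instrument the binary, run a training workload so the instrumented binary writes a profile (the log above shows the default path /tmp/prof.fdata), then rerun llvm-bolt with that profile. The Python sketch below only builds the command lines and does not execute anything. It is not the code from python/cpython#95908 and does not include the extra compilation flags mentioned above. The output names (.bolt_inst, .bolt) and the example workload are assumptions; -instrument, -o and -update-debug-sections appear in the log, while -instrumentation-file and -data are standard llvm-bolt options whose exact behavior may vary between BOLT versions.

```python
import shlex


def bolt_pipeline(binary: str, workload: list,
                  fdata: str = "/tmp/prof.fdata") -> list:
    """Return argv lists for an instrument -> train -> optimize BOLT run.

    Sketch only: nothing is executed here. Run each list in order, e.g. with
    subprocess.run(argv, check=True), once the paths match your build.
    """
    instrumented = binary + ".bolt_inst"  # hypothetical name for the instrumented copy
    optimized = binary + ".bolt"          # hypothetical name for the final output
    return [
        # 1. Instrument: like the invocation in the log, plus an explicit profile path.
        ["llvm-bolt", binary, "-instrument", "-o", instrumented,
         f"-instrumentation-file={fdata}"],
        # 2. Train: run a representative workload; the profile is written on exit.
        [instrumented, *workload],
        # 3. Optimize: feed the profile back in; keep debug info (see the BOLT-WARNING in the log).
        ["llvm-bolt", binary, "-o", optimized, f"-data={fdata}",
         "-update-debug-sections"],
    ]


# Example workload (an assumption): CPython's regression-test PGO task.
for argv in bolt_pipeline("./python", ["-m", "test", "--pgo"]):
    print(shlex.join(argv))
```

Keeping command construction separate from execution makes it easy to inspect or tweak flags before committing to a long training run.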
-
Thanks! I hope @corona10 can review and merge this, and maybe @pablogsal will be willing to backport it to 3.11.
-
Thanks @corona10 for the review! Thanks @kmod for the patch and for working with Dong-Hee.
-
See https://discord.com/channels/768122496351993886/768122496351993890/920257709968859136
The Pyston folks say they are using BOLT (https://github.com/facebookincubator/BOLT/), which makes their binaries faster. It might be worth looking into what they are actually doing.
(In the same thread they describe doing something slightly different for PGO and LTO that we might copy more easily.)