Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Another cmov_int16 proposal #78

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

aguinetsb
Copy link

@aguinetsb aguinetsb commented Jun 8, 2024

This proposal does not conditionally load a value, which could be less future proof. Indeed, having only one "virtual branch" loading a value is what could cause a compiler to create a branch, as it can avoid a load from memory given a condition - which is usually considered as a profitable optimization.

It's not only a speculation as this is what seemed to have originally happened with clang 18. Let's indeed look at clang's optimization pipeline using compiler explorer's very nice viewer on the original poly_frommsg function (https://godbolt.org/z/eYrM3scbe):

  • before the X86 cmov conversion pass, we actually have "virtual" cmov instructions:
    image

  • after the X86 cmov conversion pass, we have the introduction of branches:
    image

Let's look at what this pass is doing: https://github.com/llvm/llvm-project/blob/768118d1ad38bf13c545828f67bd6b474d61fc55/llvm/lib/Target/X86/X86CmovConversion.cpp#L10-L31. I think we hit all the conditions in that case:

  1. we have cmovs in a loop
  2. it is considered as profitable to do the transformation, because I think it manages to save a mov to a register (see the comments in checkForProfitableCmovCandidates: https://github.com/llvm/llvm-project/blob/768118d1ad38bf13c545828f67bd6b474d61fc55/llvm/lib/Target/X86/X86CmovConversion.cpp#L421-L443 that give more details). It would be interesting to debug that analysis to really understand what is considered more profitable here, as the generated assembly doesn't look very efficient to me:

image

What we can note though is that if it figures out one of the "branch" of the cmov has a slower latency than the other, it will generate a branch. This is why I think the original patch is asking for troubles, and that we should avoid exposing such patterns.

As to when this has been introduced in Clang, for what I've checked clang 15 is already doing that optimization, so it's been "a while". Checking LLVM's git log could help answer that question more precisely if that's of interest.

This patch also defines the cmov_int16 function as noinline. Indeed, Link-Time Optimization (LTO) could inline that function: the compiler could then realize the select argument is just one bit, which could then generate a select from the LLVM IR point of view, and we could then have the x86 cmov conversion pass be tempted to generate a branch again.

As an idea for the future, we could verify codegen in a test by verifying that the code generated by the compiler does not contain any branch. This could be done by using LLVM's FileCheck utility - which is basically an enhanced grep that can reason with context. The test would basically check that in the general textual assembly, we don't have any test or branch instructions (depending on the targeted architecture). We could also check that on the LLVM IR, but:

  1. that would only work for clang-based compilers
  2. LLVM's codegen is the one generating a branch here (as stated above), so we wouldn't catch the issue.

This test could be run only if that tool is available in the environment, so that we don't make the build system more complex for everyone. I can make a first tentative if that's something of interest to the maintainers!

This proposal does not conditionally load a value, which could be less
future proof.

Indeed, having only one "branch" load a value is what could cause a
compiler to create a branch, as it can avoid a load from memory given a
condition - which is usually considered as an effective optimization.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant