Integer multiply high (mulh) equivalent in SimdInt and SimdUint #440

Open
ds84182 opened this issue Sep 18, 2024 · 3 comments
Labels
C-feature-request Category: a feature request, i.e. not implemented / a PR

Comments

ds84182 commented Sep 18, 2024

LLVM already recognizes this pattern as a multiply-high, but the optimization is fairly shaky: it doesn't make it through isel (instruction selection) when the result is followed by a left or right shift. https://rust.godbolt.org/z/a6eexcE6z

#![no_std]
#![feature(portable_simd)]

use core::simd::prelude::*;

// Generates vpmulhw
pub fn mulhw(a: i16x16, b: i16x16) -> i16x16 {
    (((a.cast::<i32>()) * (b.cast::<i32>())) >> 16).cast::<i16>()
}

// Generates a mess
pub fn mulhw_and_shift(a: i16x16, b: i16x16) -> i16x16 {
    mulhw(a, b) >> 1
}

A dedicated function that multiplies two integer vectors and returns the high half of each product would be very beneficial for fixed-point arithmetic. Hopefully it would also have a better chance of surviving optimization passes.
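
For reference, the per-lane semantics being requested can be sketched in scalar Rust on stable. This is only an illustration of "widen, multiply, keep the high half"; the names `mulh_i16`, `mulh_u16`, and `mulh_i16_lanes` are hypothetical and are not part of `core::simd`, and plain arrays stand in for SIMD vectors.

```rust
/// Signed 16-bit multiply-high: widen to 32 bits, multiply,
/// and keep the upper 16 bits of the product.
fn mulh_i16(a: i16, b: i16) -> i16 {
    // The shift on i32 is arithmetic, so the sign is preserved.
    ((i32::from(a) * i32::from(b)) >> 16) as i16
}

/// Unsigned counterpart: the widened product always fits in a u32.
fn mulh_u16(a: u16, b: u16) -> u16 {
    ((u32::from(a) * u32::from(b)) >> 16) as u16
}

/// Lane-wise version over plain arrays (hypothetical helper that
/// stands in for a SIMD vector type such as i16x16).
fn mulh_i16_lanes<const N: usize>(a: [i16; N], b: [i16; N]) -> [i16; N] {
    std::array::from_fn(|i| mulh_i16(a[i], b[i]))
}
```

A dedicated SIMD method would presumably compute the same result for every lane, without relying on the optimizer to recognize the widen, multiply, shift, narrow sequence.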

ds84182 added the C-feature-request label Sep 18, 2024

calebzulawski (Member) commented Sep 18, 2024

This is perhaps not very intuitive, but things like this work when you use a swizzle: https://rust.godbolt.org/z/zGMb4b5PK
I've done something similar here: https://github.com/calebzulawski/autobahn-hash/blob/f35d18565b996a162d1cfbc18abd268b940f4ced/src/lib.rs#L83-L91

programmerjake (Member) commented

> This is perhaps not very intuitive, but things like this work when you use a swizzle: https://rust.godbolt.org/z/zGMb4b5PK

That breaks when AVX2 is enabled: https://rust.godbolt.org/z/h84KsKEnx

calebzulawski (Member) commented

Interesting. I think that's a missing instruction-combine rule, which is a separate issue from trying to optimize the shifts.
