
Fix {to,from}_array UB when repr(simd) produces padding #342

Merged: 2 commits, Apr 27, 2023

Conversation

calebzulawski
Member

@calebzulawski commented Apr 23, 2023

Addresses #341 (but doesn't introduce an intrinsic).

// FIXME: We currently use a pointer store instead of `transmute_copy` because `repr(simd)`
// results in padding for non-power-of-2 vectors (so vectors are larger than arrays).
// It results in better codegen with optimizations disabled, but we should
// probably just use `transmute` once that works on const generic types.
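As context for the FIXME: `repr(simd)` (a nightly-only attribute) pads non-power-of-2 vectors up to a larger size, so the vector type is bigger than the array it wraps. A minimal stable-Rust sketch of the size mismatch, using a hypothetical `PaddedF32x3` with an alignment attribute as a stand-in for a 3-lane SIMD vector:

```rust
use core::mem::size_of;

// Hypothetical stand-in: real code uses #[repr(simd)], which is nightly-only.
// The alignment rounds the struct's size from 12 up to 16 bytes of storage.
#[repr(C, align(16))]
#[derive(Clone, Copy)]
struct PaddedF32x3([f32; 3]);

fn main() {
    assert_eq!(size_of::<[f32; 3]>(), 12);
    assert_eq!(size_of::<PaddedF32x3>(), 16);
    // A whole-type copy such as transmute_copy::<[f32; 3], PaddedF32x3> would
    // read 4 bytes past the end of the source array: undefined behavior.
}
```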
Member


If vectors are larger than arrays and more aligned (or the same size and same align), then I think the read is always fine for this?

Because if not, we'll have to remove https://doc.rust-lang.org/std/simd/struct.Simd.html#method.as_mut_array.

(Which, come to think of it, means that we can always implement `to_array` as `*self.as_array()`, since we currently require `T: Copy`.)
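A sketch of that idea, using a hypothetical `FakeSimd` wrapper as a stand-in (the real `Simd` uses nightly-only `repr(simd)`): `as_array` reinterprets a reference to the vector as a reference to its leading array, and `to_array` is then just a dereference, which copies only the array bytes and never the tail padding:

```rust
// Hypothetical stand-in for Simd<T, N>; the alignment attribute mimics the
// padding that #[repr(simd)] would add for non-power-of-2 lane counts.
#[repr(C, align(16))]
#[derive(Clone, Copy)]
struct FakeSimd<T, const N: usize>([T; N]);

impl<T: Copy, const N: usize> FakeSimd<T, N> {
    fn as_array(&self) -> &[T; N] {
        // Valid: the array is the prefix of the vector, and the vector's
        // alignment is at least the array's, so the cast stays in bounds.
        unsafe { &*(self as *const Self as *const [T; N]) }
    }

    fn to_array(self) -> [T; N] {
        // With T: Copy, this dereference copies exactly N elements.
        *self.as_array()
    }
}

fn main() {
    let v = FakeSimd([1u8, 2, 3]);
    assert_eq!(v.to_array(), [1, 2, 3]);
}
```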

Member Author


Good point; I changed it to `*self.as_array()`.

Member

@scottmcm left a comment


I'm not a reviewer here, but FWIW this looks good to me.

///
/// # Safety
/// Writing to `ptr` must be safe, as if by `<*mut [T; N]>::write_unaligned`.
const unsafe fn store(self, ptr: *mut [T; N]) {
Member


I think you'll want to explicitly copy `self` to a temporary first, and then memcpy from the temporary. As far as I can tell, LLVM optimizes that into a vector load/store because the temporary produces vector instructions; just calling memcpy generates only a memcpy, so LLVM sees no vector operations and may generate less efficient code.

Compare `store` vs. `store2` (`store2` has the temporary):
https://godbolt.org/z/jWsc6ezP4
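The suggested pattern, sketched against a hypothetical `PaddedSimd` stand-in type (the real code is generic over `Simd<T, N>` and uses nightly `repr(simd)`):

```rust
use core::mem::MaybeUninit;

// Hypothetical stand-in for a padded SIMD vector type.
#[repr(C, align(16))]
#[derive(Clone, Copy)]
struct PaddedSimd<T, const N: usize>([T; N]);

impl<T: Copy, const N: usize> PaddedSimd<T, N> {
    /// The pattern from the review: copy `self` into an explicit temporary,
    /// then memcpy exactly `N` elements from it. The temporary lets the
    /// optimizer see the vector value; the memcpy never touches padding.
    ///
    /// # Safety
    /// `ptr` must be valid for writes of `[T; N]` (may be unaligned).
    unsafe fn store(self, ptr: *mut [T; N]) {
        let tmp = self; // explicit temporary, as suggested
        core::ptr::copy_nonoverlapping(tmp.0.as_ptr(), ptr.cast::<T>(), N);
    }

    fn to_array(self) -> [T; N] {
        let mut out = MaybeUninit::<[T; N]>::uninit();
        // SAFETY: `out` is valid for writes of `[T; N]`, and `store`
        // initializes all N elements before `assume_init`.
        unsafe {
            self.store(out.as_mut_ptr());
            out.assume_init()
        }
    }
}

fn main() {
    let v = PaddedSimd([1i32, 2, 3]);
    assert_eq!(v.to_array(), [1, 2, 3]);
}
```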

Member


Isn't it intentional, though, that it's not a vector read? I read the OP as saying that a vector read would be over-long.

Member

@programmerjake programmerjake Apr 24, 2023


No, because LLVM IR defines vector load/store instructions to never read/write the padding (when the alignment is small enough), as I explained on Zulip (see #341). `read_unaligned` has no such guarantee.

Member


So we intentionally want an LLVM load/store instruction with the element alignment, not the full SIMD type's alignment.

Member Author


It's not written as a vector store, because that would be wrong (it would write past the end of the array), but it should lower to one. I added the temporary with a comment explaining why it's necessary.

Member


Oh, I see, I was mixing up the LLVM type and the Rust type.

self.store(tmp.as_mut_ptr());
tmp.assume_init()
}
*self.as_array()
Member


This doesn't generate a vector store instruction, so LLVM may produce less optimal code.

@calebzulawski calebzulawski merged commit 195d4ca into master Apr 27, 2023