Fix {to,from}_array UB when repr(simd) produces padding #342
Conversation
// it results in better codegen with optimizations disabled, but we should
// probably just use `transmute` once that works on const generic types.
// FIXME: We currently use a pointer store instead of `transmute_copy` because `repr(simd)`
// results in padding for non-power-of-2 vectors (so vectors are larger than arrays).
If vectors are larger than arrays and more aligned (or the same size and same alignment), then I think the read is always fine for this? Because if not, we'll have to remove https://doc.rust-lang.org/std/simd/struct.Simd.html#method.as_mut_array. (Which, come to think of it, means that we can always implement `to_array` as `*self.as_array()`, since we currently require `T: Copy`.)
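The `*self.as_array()` suggestion can be sketched on a stand-in type. This is an illustration only: `PaddedVec3` is hypothetical (the real `Simd` type uses `repr(simd)`, which requires nightly), but it mimics the key property that the vector's in-memory representation is larger than `[T; N]` for non-power-of-2 lane counts.

```rust
// Illustration only: a hypothetical vector-like type whose in-memory
// representation is larger than `[f32; 3]`, mimicking the padding that
// `repr(simd)` adds for non-power-of-2 lane counts.
#[repr(C)]
struct PaddedVec3 {
    elems: [f32; 3],
    _pad: f32, // padding lane; must never be read as part of the array
}

impl PaddedVec3 {
    // Hands out a reference to the array *prefix* only, so dereferencing it
    // reads exactly `size_of::<[f32; 3]>()` bytes and never the padding.
    fn as_array(&self) -> &[f32; 3] {
        &self.elems
    }

    // The suggested implementation: because the element type is `Copy`,
    // `to_array` can simply copy out of `as_array`.
    fn to_array(&self) -> [f32; 3] {
        *self.as_array()
    }
}

fn main() {
    let v = PaddedVec3 { elems: [1.0, 2.0, 3.0], _pad: 0.0 };
    assert_eq!(v.to_array(), [1.0, 2.0, 3.0]);
}
```

The point of the shape is that the dereference is typed as `[f32; 3]`, so the copy is exactly array-sized regardless of how much padding the vector representation carries.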
Good point, I changed it to `*self.as_array()`.
I'm not a reviewer here, but FWIW this looks good to me.
///
/// # Safety
/// Writing to `ptr` must be safe, as if by `<*mut [T; N]>::write_unaligned`.
const unsafe fn store(self, ptr: *mut [T; N]) {
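The contract referenced in this safety comment can be illustrated standalone. This sketch assumes the requirement is exactly that of `<*mut [T; N]>::write_unaligned` (valid for writes, but no alignment requirement); `store_into` is a hypothetical, non-generic helper, not the method under review.

```rust
// Sketch of the documented contract: `dst` must be valid for a write of
// `[u32; 3]`, but, as with `write_unaligned`, it need not be aligned.
unsafe fn store_into(src: [u32; 3], dst: *mut [u32; 3]) {
    // No alignment requirement: this lowers to an unaligned store.
    dst.write_unaligned(src);
}

fn main() {
    // Deliberately misaligned destination: one byte into a byte buffer.
    let mut buf = [0u8; 16];
    let dst = unsafe { buf.as_mut_ptr().add(1) } as *mut [u32; 3];
    unsafe {
        store_into([1, 2, 3], dst);
        assert_eq!(dst.read_unaligned(), [1, 2, 3]);
    }
}
```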
I think you'll want to explicitly copy `self` to a temporary first, and then memcpy from the temporary. AFAICT, LLVM optimizes that to a vector load/store because of the vector instructions generated by the temporary, whereas just calling memcpy generates only a memcpy, so LLVM doesn't see any vector operations and may generate less-efficient code. Compare `store` vs. `store2` -- `store2` has the temporary: https://godbolt.org/z/jWsc6ezP4
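The two shapes being compared can be sketched as follows. `SimdLike` is a hypothetical stand-in (the real type is `repr(simd)`, which is what actually makes LLVM see vector operations when the temporary is materialized); the sketch only shows the structural difference between the two functions.

```rust
use core::ptr;

// Hypothetical stand-in for a padded SIMD type; the real one is `repr(simd)`.
#[derive(Clone, Copy)]
#[repr(C)]
struct SimdLike {
    elems: [f32; 3],
    _pad: f32,
}

impl SimdLike {
    // Shape 1 (`store` in the godbolt link): memcpy straight out of `self`.
    // LLVM only ever sees a memcpy, so it has no vector operations to fold
    // the copy into.
    unsafe fn store(self, dst: *mut [f32; 3]) {
        ptr::copy_nonoverlapping(self.elems.as_ptr(), dst.cast::<f32>(), 3);
    }

    // Shape 2 (`store2`): copy `self` to a local temporary first. For a
    // `repr(simd)` type, materializing the temporary emits vector
    // instructions, letting LLVM lower the memcpy to a vector load/store.
    unsafe fn store2(self, dst: *mut [f32; 3]) {
        let tmp = self; // the explicit temporary
        ptr::copy_nonoverlapping(tmp.elems.as_ptr(), dst.cast::<f32>(), 3);
    }
}

fn main() {
    let v = SimdLike { elems: [1.0, 2.0, 3.0], _pad: 0.0 };
    let (mut a, mut b) = ([0.0f32; 3], [0.0f32; 3]);
    unsafe {
        v.store(&mut a);
        v.store2(&mut b);
    }
    // Both shapes write the same bytes; they differ only in codegen.
    assert_eq!(a, b);
}
```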
Isn't it intentional, though, that it's not a vector read? Because I read the OP as saying that that would generate an over-long read.
No, because LLVM IR defines vector `load`/`store` instructions to never read/write the padding (when `align` is small enough), as I explained on Zulip (see #341). `read_unaligned` has no such guarantee.
So we intentionally want an LLVM `load`/`store` instruction with `align <elem-align>`, not the full SIMD type's alignment.
It's not written as a vector store, because that would be wrong (writing past the end of the array), but it should lower as one. I added the temporary with a comment as to why it's necessary.
Oh, I see, I was mixing up the LLVM type and the Rust type.
crates/core_simd/src/vector.rs
Outdated
self.store(tmp.as_mut_ptr());
tmp.assume_init()
}
*self.as_array()
This doesn't generate a vector `store` instruction, so LLVM may produce less optimal code.
Force-pushed from cc3e99b to c504f01
Addresses #341 (but doesn't introduce an intrinsic).