-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Can FFI code use memory_order_consume
#528
Comments
Are there any C or C++ compilers that actually implement Does yours? As far as I know, all of the major compilers found it too hard to preserve dependencies through the optimization process, gave up, and currently resort to treating Certainly this is true for Clang. And LLVM's inability to preserve dependencies also applies to rustc. Therefore, if the Rust part of your example is compiled using rustc with its default backend, it can be guaranteed to work if and only if the C++ part of the example is compiled with a compiler that implements Therefore, if Rust were to make a language-level guarantee that the example works, it would be tantamount to assuming that no compiler (that rustc cares about interoperating with) will ever implement Which might actually be a valid assumption! But it also feels dangerous. Arguably this is something that platform ABIs ought to specify one way or the other. After all, you could imagine a scenario similar to your example, but instead of Rust calling into C++, it's 'C++ compiled by compiler A' calling into 'C++ compiled by compiler B'. In that case, the example would be defined behavior, but for it to work, the two C++ compilers would have to agree on how On the other hand, you could argue that, from a C and C++ perspective, atomics are a standard library feature, and 'how to use standard library functionality' isn't usually part of the ABI specifications… In any case, I haven't seen any ABI specifications go into detail about this. Of the ones I checked, the x86-64 and ARM ABI specifications don't explicitly define a mapping from the C++ atomics model to the instruction set at all, even though there are multiple possible mappings. I guess the standard mapping is just an informal convention. In contrast, the RISC-V ABI specification does define a mapping, but that document conspicuously omits any mention of |
It may be reasonable for a compiler to handle consume optimizations like acquire, but lower to hardware instructions that are only correct for the former (ARM's default load/stores come to mind). This is something I've considered for lccc, yes (though xlang does omit a A more targetted question is whether replacing the C++ code with assembly code on, e.g. ARM, that just does a default load, is correct. Such a load can't be considered to be an acquire load, but it's UB if the load in the Rust AM is only a Relaxed load. As a note about platform ABI, I am currently adding atomics code to the psABI for my own architecture, and while it doesn't mention consume by name, it only has - and needs - special casing for seq_cst, as all lesser orders are just achieved by virtue of the least strict instruction sequence possible for each operation. |
I would phrase this as, nobody has even given a proper definition of what it would mean to preserve these dependencies. I consider So in that sense I don't think any code can use If you believe that there is a well-defined meaning to a C++ program with
If you just have the "consume" load in the assembly, then I don't think that's any better than pretending that If you have a larger assembly block that has both the "consume" load and the other operation that depends on it, and you can argue in the ARM memory model that because of the dependency of the 2nd operation on the first this is properly synchronized, that should be fine.
I this the answer to that is "no" even with inline assembly, for the reasons spelled out above: there is no well-defined notion of "dependency-ordered before" in Rust (or C++) so it can't flow into the Rust code after the inline assembly code either. This relation can only exist inside a single inline asm block. |
I'd argue that dependency-ordered before is itself well-defined (in that it has semantics that can be determined), but compiler vendors have decided (rightsly so, also in my opinion) that properly implementing it is a right pain in the ass, so universally elect to implement as the stronger
Given this, would it be fair to say that, assuming my position about the definition of the operation itself, it would be well-defined then? |
As I said, I don't think FFI changes anything. But I can't help you figure out how memory_order_consume interacts with anything since I don't think it is well-defined to begin with. (Or, well, there is of course a simple way to define a syntactic notion of "dependency" on the source code. It's just entirely meaningless and hence may as well not exist. I'm not very interested in spending my time discussing models that clearly do not model reality. "All models are wrong, but some are useful" -- I think this one is not useful.) |
As to how it interacts with compilers existing, I don't think there's an issue there. I can imagine two scenarios here:
So there would be no correct compilation in any case that doesn't obey the AM-AM correspondance here - namely, that it would define that the (To be clear, the same reasoning would apply to any other tuple of |
To be clear, that is exactly how the C++ spec defines "dependency". So it's well-defined, just difficult to implement.
But that's not the issue. Let me be more explicit. You may understand all of this already, but for clarity if nothing else… In your example, as compiled by a typical optimizing compiler, the dependency ordering would typically work in practice. What follows is a modified example where it would not. The key difference is that instead of loading a pointer and then loading from that pointer, we load an integer and involve it in pointer arithmetic, but in a way that's trivially optimized out: let ptr = addr_of!(OTHER_DATA).wrapping_add(important_data).wrapping_sub(important_data); Full example: use std::sync::atomic::{AtomicUsize, Ordering};
use std::ptr::addr_of;
#[no_mangle]
pub static IMPORTANT_DATA: AtomicUsize = AtomicUsize::new(0);
static mut OTHER_DATA: i32 = 0;
#[no_mangle]
unsafe fn thread1() {
OTHER_DATA = 10;
IMPORTANT_DATA.store(20, Ordering::Release);
}
extern "C" {
fn get_important_data() -> usize;
}
#[no_mangle]
unsafe fn thread2() {
let important_data = get_important_data();
assert_eq!(important_data, 20);
let ptr = addr_of!(OTHER_DATA)
.wrapping_add(important_data)
.wrapping_sub(important_data);
let val = *ptr;
println!("{val}");
} extern "C" {
extern std::atomic<size_t>) IMPORTANT_DATA;
size_t get_important_data() {
return IMPORTANT_DATA.load(std::memory_order::consume);
}
} For this to be safe, the load from We can easily be assured that the compiler won't reorder them, because:
However, to prevent the hardware from reordering the loads, on a typical non-x86 architecture, one of the following needs to be true:
At a source level (syntactically), the address of the second load does have a value dependency on the result of the first load. But a typical optimizer will remove The way Clang solves this is by having [Edit: The following is wrong; see my later comment.] With consume ordering implemented as originally designed, however, the barrier would probably be located not in the callee To be frank, that seems completely insane. Given that dependency chains can be maintained through memory, I don't see how a compiler could avoid having to emit a barrier before and after almost every external function call. Maybe I'm missing something. Or maybe that's why nobody was willing to implement this scheme. But the details don't matter for our purposes. What matters is that rustc's LLVM output does not follow this scheme. It never preserves dependency chains. It never emits barriers before or after external function calls. So if a C++ compiler implements consume in a way that puts any such requirements on function callers, and rustc output makes FFI calls into that compiler's output, then dependency chains are just not going to be preserved properly. (In theory rustc could add a compatibility mode for such a compiler that did emit barriers around every external function call, but that seems horrible.) Therefore, if Rust made a language-level guarantee that dependency chains are preserved, it would be tantamount to assuming that rustc will never need to interoperate with such a C++ compiler. Footnotes
|
Fair, though here specifically the function doesn't have |
Well, that is the case where I was suggesting the caller would need a barrier. But now I realize I was wrong. Sigh. I have just reread one of the original proposals for consume ordering, and I realize that I misinterpreted the previously-linked example from the spec. It's still not spelled out very explicitly, but I am pretty sure the actual intended strategy is the following: The compiler tracks which expressions 'carry dependencies' (loosely defined as "might be based on a
If an expression that carries dependencies is:
then the compiler emits a memory barrier at that point. This is much less insane. And under this strategy, as long as Rust does not support FFI into or out of In particular, in the So I changed my mind. Making guarantees about your example is probably fine. Still, it's hard to be sure without seeing any concrete implementations that don't just treat consume as acquire. |
For
Atomic
s and our Multithreaded Memory model, we incorporate the C++ rules by reference in Minirust's definition. We do not defineOrdering::Consume
, but I do not believe we exclude its existance. Would it thus be valid for FFI code (or assembly) to perform a consume operation that reads from a release operation, and would the consume behaviour (dependency-ordered before) follow into rust code that executes after?Formally, is the following code defined behaviour, given no other synchronization, where thread1 and thread2 are entrypoints for separate threads:
(or equivalent C18 code).
The text was updated successfully, but these errors were encountered: