ill-typed unused FFI declarations can cause UB #46188

Open
arielb1 opened this issue Nov 22, 2017 · 55 comments
Labels
A-codegen Area: Code generation A-FFI Area: Foreign function interface (FFI) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@arielb1
Contributor

arielb1 commented Nov 22, 2017

This compiles and prints "p is not null and 0x0":

pub mod bad {
    #[allow(improper_ctypes)]
    extern {
        pub fn malloc(x: usize) -> &'static mut ();
    }
    
    #[no_mangle]
    pub fn bar() {
        let _m = malloc as unsafe extern "C" fn(usize) -> &'static mut ();
    }
}

pub mod good {
    extern {
        fn malloc(x: usize) -> *const u8;
    }
    
    pub fn foo() {
        unsafe {
            let p = malloc(0x13371337deadbeef); // your computer doesn't have enough memory
            if p.is_null() {
                panic!("p is null");
            } else {
                panic!("p is not null and {:?}", p);
            }
        }
    }
}

fn main() {
    bad::bar();
    good::foo();
}

The problem is that we have two declarations of the "malloc" symbol, but LLVM uses a global namespace for these. So during codegen, the 2nd declaration we generate overwrites the first. In this case, the "ill-typed" malloc declaration (bad::malloc) comes last and ends up putting a nonnull attribute on malloc, which causes mod good to be miscompiled.
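To make the overwrite concrete, here is a minimal sketch of a single global namespace keyed by symbol name. It is a toy model, not rustc's or LLVM's actual data structures; `ToyModule`, `Decl`, and `replay_issue` are hypothetical names introduced only for illustration.

```rust
use std::collections::HashMap;

/// Toy stand-in for a function declaration in a backend module.
#[derive(Debug, Clone, PartialEq)]
pub struct Decl {
    /// Whether the return value is assumed to be non-null.
    pub ret_nonnull: bool,
}

/// Toy module: one global namespace keyed by symbol name.
#[derive(Default)]
pub struct ToyModule {
    decls: HashMap<String, Decl>,
}

impl ToyModule {
    /// Declaring a symbol again replaces the earlier entry ("last declaration wins").
    pub fn declare(&mut self, symbol: &str, decl: Decl) {
        self.decls.insert(symbol.to_string(), decl);
    }

    /// What a call through *any* import of `symbol` gets to assume.
    pub fn assumed_nonnull(&self, symbol: &str) -> bool {
        self.decls.get(symbol).map_or(false, |d| d.ret_nonnull)
    }
}

/// Replays the order of declarations described above.
pub fn replay_issue() -> bool {
    let mut module = ToyModule::default();
    // good::malloc returns *const u8: no nonnull assumption.
    module.declare("malloc", Decl { ret_nonnull: false });
    // bad::malloc returns &'static mut (): nonnull, and it is declared last.
    module.declare("malloc", Decl { ret_nonnull: true });
    module.assumed_nonnull("malloc")
}
```

In this model `replay_issue()` reports that calls through good::malloc inherit the nonnull assumption, which is what lets the `p.is_null()` branch be removed.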

Here's another example that does not involve malloc. It does not get miscompiled currently, but it demonstrates the issue.

See here for a document that summarizes why this happens, and what our options are.

@arielb1 arielb1 added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness labels Nov 22, 2017
@arielb1
Contributor Author

arielb1 commented Nov 22, 2017

I haven't actually seen this in practice, but in brendanzab/gl-rs#438 I think someone has (erroneously) declared GetProcAddress as returning a (non-null) function pointer, which might cause other random code in the same executable to break because the erroneous nonnull reaches them.

@nikomatsakis
Contributor

Some questions:

  • Is it possible for us to have two distinct definitions in LLVM for malloc, such that the call which is not dead does not inherit the non-null attribute?
  • Alternatively, or perhaps just additionally, we could detect and lint against the mismatch.
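A mismatch check of the kind suggested in the second bullet could look roughly like the sketch below. It groups declarations by the symbol the linker sees and reports symbols declared with more than one signature. `ForeignFn` and `clashing_symbols` are hypothetical names, and comparing printed signatures textually is a simplifying assumption; a real lint would need a proper notion of compatibility.

```rust
use std::collections::BTreeMap;

/// Toy description of one foreign function import.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ForeignFn {
    /// Rust path of the item, e.g. "bad::malloc".
    pub item_path: String,
    /// Symbol name seen by the linker.
    pub link_name: String,
    /// Printed signature; compared textually in this sketch.
    pub signature: String,
}

/// Returns every symbol that is imported with at least two different signatures.
pub fn clashing_symbols(decls: &[ForeignFn]) -> Vec<String> {
    let mut by_symbol: BTreeMap<&str, Vec<&ForeignFn>> = BTreeMap::new();
    for d in decls {
        by_symbol.entry(d.link_name.as_str()).or_default().push(d);
    }
    by_symbol
        .into_iter()
        .filter(|(_, group)| group.iter().any(|d| d.signature != group[0].signature))
        .map(|(symbol, _)| symbol.to_string())
        .collect()
}
```

The sketch takes one flat list of declarations, so it only sees what is handed to it; catching clashes between crates would mean collecting declarations from the whole crate graph.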

@nikomatsakis
Contributor

triage: P-medium

We should fix this, but not a burning issue.

@rust-highfive rust-highfive added P-medium Medium priority and removed I-nominated labels Jan 25, 2018
@XAMPPRocky XAMPPRocky added the C-bug Category: This is a bug. label Feb 19, 2018
@Centril Centril added the A-FFI Area: Foreign function interface (FFI) label Nov 20, 2018
@gnzlbg
Contributor

gnzlbg commented Aug 25, 2019

Note that this is not restricted to extern "C", but it also impacts Rust function declarations: https://llvm.godbolt.org/z/03s-aL

I wondered whether, in some cases, LLVM could merge the attributes of the declarations with the definition, introducing UB in safe Rust code like this:

fn foo(x: *mut u8) { ... }
extern "Rust" {
    fn foos_mangled_name(x: &mut u8);
}

The above link shows that, currently, when merging noalias, the noalias of the declaration can end up being removed when a definition without noalias is present in the same module.

It is unclear to me whether making the definition noalias would be a sound optimization of LLVM-IR. noalias is modeled after C's restrict, and in C, the behavior of programs with a declaration like the above, which doesn't match the "restrict"/"noalias" qualifier of the definition, is undefined. So LLVM can assume that it never happens, and if it does, merging noalias in any way would be "ok" for C (not for Rust, and unclear whether it's ok for LLVM-IR, hopefully not).

@gnzlbg
Contributor

gnzlbg commented Aug 25, 2019

Alternatively, or perhaps just additionally, we could detect and lint against the mismatch.

@nikomatsakis if the incorrect declaration is in a different crate, could it become visible here due to LTO ? If so, then warning against this would need some kind of whole-program analysis.

@Elinvynia Elinvynia added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jun 9, 2020
@LeSeulArtichaut LeSeulArtichaut removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jun 9, 2020
@lahwran

lahwran commented Feb 19, 2021

my review of current status: running the initial example in the playground, it looks like a warning is emitted. running the example through MIRI also in the playground does not produce any additional warnings besides that the allocation is too large. I see that in the rust-lang/unsafe-code-guidelines issue which mentions this one there is interesting discussion, but neither the docs in unsafe-code-guidelines nor the rustonomicon ffi page appear to mention this in the currently built versions of their docs. it doesn't appear that there's a ready-to-implement suggested fix in either this thread or that thread.

@RalfJung
Member

Miri checks the UB listed here, and so far there is no Rust-level UB related to these invalid declarations.

So arguably, right now, the code is fine but the LLVM backend handles it incorrectly. However, I am not sure if there is any way for the LLVM backend to prevent LLVM from "merging" these prototypes, or if there is any hope of adjusting LLVM to not do that any more. Ideally, if "merging" has to happen, the merged definition would have the minimal assumptions of both inputs, and if the inputs are incompatible, that should just be an error.
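The "minimal assumptions" merge described here can be sketched as an intersection of attribute sets, with incompatible signatures rejected outright. This is a toy model and not how LLVM represents prototypes: `Prototype`, `MergeError`, `proto`, and `merge` are hypothetical, and attributes are flattened into one set per function rather than being attached to individual parameters or the return value.

```rust
use std::collections::BTreeSet;

/// Toy function prototype: types are plain names, attributes one flat set.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Prototype {
    pub params: Vec<String>,
    pub ret: String,
    pub attrs: BTreeSet<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum MergeError {
    IncompatibleSignature,
}

/// Convenience constructor for the sketch.
pub fn proto(params: &[&str], ret: &str, attrs: &[&str]) -> Prototype {
    Prototype {
        params: params.iter().map(|s| s.to_string()).collect(),
        ret: ret.to_string(),
        attrs: attrs.iter().map(|s| s.to_string()).collect(),
    }
}

/// Keeps only the assumptions that *both* inputs make; errors if the shapes differ.
pub fn merge(a: &Prototype, b: &Prototype) -> Result<Prototype, MergeError> {
    if a.params != b.params || a.ret != b.ret {
        return Err(MergeError::IncompatibleSignature);
    }
    Ok(Prototype {
        params: a.params.clone(),
        ret: a.ret.clone(),
        attrs: a.attrs.intersection(&b.attrs).cloned().collect(),
    })
}
```

Under this rule, merging a `nonnull` import of malloc with a plain one yields a declaration without `nonnull`, so neither import can make the other's calls UB.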

@Veykril
Member

Veykril commented Oct 21, 2022

Is this still an issue? It looks like since 1.63, rust no longer removes the branch for the shown example https://godbolt.org/z/n9PYEKexh

@RalfJung
Member

I don't see any reason to assume that this changed in any fundamental way. Examples that show UB are always fragile, but LLVM to my knowledge still assumes the one-definition-rule.

@briansmith
Contributor

rust-lang/unsafe-code-guidelines#198 discusses the issue. There isn't a "one definition rule" applicable here; the issue is about compatible/composite declarations and comes from LLVM's C language semantics. I'm not going to repeat everything I said there, but it's all relevant to this.

It is surprising that adding an unused declaration affects the compilation of a Rust program. However, unused declarations affecting the compilation are explicitly allowed by ISO C compatible declaration rules. It seems like rustc could avoid telling LLVM about unused declarations to avoid subjecting Rust programs to those rules. Or, rustc could just emit a different stronger warning for seemingly-unused extern declarations telling the user that even though the declaration went unused, the declaration may still affect compilation. (Indeed, because of this issue, there is really no such thing as an unused extern function declaration.)

@bjorn3
Member

bjorn3 commented Oct 22, 2022

How do you define unused declarations? Are they unused if there are no references, if all referencing functions are dead code, or if all references aren't used during the current execution (but may be during another execution)?

@RalfJung
Member

Focusing on unused declarations seems very fragile. The case of two used declarations with different attributes is just as dangerous, where the attributes of one declaration might be applied to the unsuspecting other declaration.

@briansmith
Contributor

rust-lang/unsafe-code-guidelines#198 (comment) explains why this is especially dangerous if things are really operating as described, using an example of how one can publish a crate with no unsafe but have it still affect the safety of the program. Static analysis tools looking for unsafe would need to be extended to consider extern declarations as well.

@Lokathor
Contributor

There's a lang team thing about handling this by having people put unsafe extern, but I'd honestly prefer if this could be fixed by LLVM and/or rustc.

@bjorn3
Member

bjorn3 commented Oct 22, 2022

Cranelift doesn't allow multiple function declarations in the same module with different signatures (at clif ir level) at all. It returns an IncompatibleSignature error. So far I have only seen a single instance where a function was defined with two different signatures in the same module. I fixed it in #95088.

@briansmith
Contributor

@bjorn3 I think "same module" matters for non-LTO builds but for LTO builds then all the declarations in the whole program matter.

@thomcc
Member

thomcc commented Oct 23, 2022

So far I have only seen a single instance where a function was defined with two different signatures in the same module

What about this?

extern "C" {
    fn objc_msgSend(obj: NsId, sel: Sel) -> NsId;
    #[allow(clashing_extern_declarations)]
    #[link_name = "objc_msgSend"]
    fn objc_msgSend_ul(obj: NsId, sel: Sel, i: libc::c_ulong) -> NsId;
}

Or does that not count because they have different names (it would be pretty easy to imagine code that didn't use different names for this, though...)

@RalfJung
Member

I opened a PR for this: #128247. However, I think it's only a partial solution... I have no idea what to do for the general case, where there are two imports of the same function with arbitrarily different signature. See the PR description.

@RalfJung RalfJung added the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Aug 5, 2024
@RalfJung
Member

RalfJung commented Aug 5, 2024

Turns out this might be too hard for me to fix. I am also increasingly skeptical whether LLVM will cope well with functions declared as @func_name = external global [0 x i8] -- aren't functions special e.g. in how they are treated by the linker on some platforms, and other things like that? Also this will hit entirely untested codepaths in LLVM.

Nominating for t-compiler discussion to see what they are thinking about how to best resolve this issue. We may also want to re-consider the priority of this, as 4 years ago when the priority was set, it wasn't yet clear what the actual issue is.

@pnkfelix
Member

@rustbot label: +I-lang-nominated

The proposed fixes for this issue might not be practical, at least not without some help from the LLVM side, so we should consider what we want to do about this as a language.

Members of T-compiler in particular suspect that merely issuing a hard error in response to a pair of FFI declarations with distinct types may not suffice; we suspect there are users who are relying on being able to do that, and there is some underlying notion of "compatible" pairs of types that is being used to justify such code.

@rustbot rustbot added the I-lang-nominated Nominated for discussion during a lang team meeting. label Aug 15, 2024
@RalfJung
Member

The next plan for #128247 is to try declaring functions with an "empty" signature (no arguments, void return type). That seems less likely to cause issues than declaring them as empty arrays.

The main concern with that approach seems to be that LLVM's special treatment for specific library functions relies on having the signature in the declaration. malloc is a typical example but shouldn't concern us much since we have our own allocation functions and Rust code doesn't usually call malloc directly, but I am wondering e.g. about all the libm functions we expose on our float types and whether they could still be const-folded.

@RalfJung
Member

So... what happens when I import the same symbol name as an extern static and as a function?

Right now, it seems like the static "wins":

@testname = external global i32

; Function Attrs: nonlazybind uwtable
define void @mytest() unnamed_addr #0 !dbg !7 {
start:
  %_2 = load i32, ptr @testname, align 4, !dbg !13
  call void @testname(i32 %_2), !dbg !14
  ret void, !dbg !15
}

This code is obviously UB, but if I make the static a 1-ZST there's not actually going to be any memory access there.

@RalfJung
Member

The proposed fixes for this issue might not be practical, at least not without some help from the LLVM side, so we should consider what we want to do about this as a language.

Note that the cranelift backend fundamentally cannot represent such different signatures. I am not sure if there's any way to use a scheme like "declare all functions as taking no arguments and then define the rest for each call site" there -- @bjorn3?
The GCC backend can likely use a trick similar to what #128247 now does with LLVM.

On the other hand, given cross-crate monomorphization, it is really hard to ensure that we'll never get this cranelift error -- we have to ensure that the signature is consistent (at least insofar as that would affect cranelift function types) across the entire crate graph. If two crates use signatures that are "meaningfully" different (whatever exactly that means), they may work fine in isolation but the program may then fail to build when they are both used in the same dependency tree. That's pretty bad...

@bjorn3
Member

bjorn3 commented Aug 18, 2024

Note that the cranelift backend fundamentally cannot represent such different signatures. I am not sure if there's any way to use a scheme like "declare all functions as taking no arguments and then define the rest for each call site" there -- @bjorn3?

For functions it would be possible to declare them with an empty signature and then after referencing it in a function overwrite the signature for this reference, but that would require knowing all function definitions in the codegen unit in advance (as the definition still needs to match the declaration), which is incompatible with anything like LTO as there you'd have to make the choice for function signature in one module and when you merge it with another module containing the definition the function signature has to match. For declaring a function and a static with the same name, this hack wouldn't work either. You can't reference a function/static as the other type from a module.

@RalfJung
Member

For functions it would be possible to declare them with an empty signature and then after referencing it in a function overwrite the signature for this reference, but that would require knowing all function definitions in the codegen unit in advance

I am confused. What does "overwrite" mean here? What I meant above (what #128247 does in LLVM) is to always keep the declaration with an empty signature, it never gets changed or "overwritten" in that sense. But when we call the function we give it some arguments anyway and that works.

Maybe this could be encoded in cranelift by taking the global function, casting the function pointer to a different type that matches what this call site uses, and then calling that?
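As a source-level analogy for "keep one untyped symbol, pick the signature at the call site", the sketch below stores a function's address without a type and casts it to a concrete function pointer type only where it is called. This is not what #128247 emits and not Cranelift API; `add_one`, `untyped_symbol`, and `call_as_i32_to_i32` are hypothetical names, and `add_one` is defined locally only so the example links on its own.

```rust
use std::mem;

/// Stand-in for an imported function.
pub extern "C" fn add_one(x: i32) -> i32 {
    x + 1
}

/// The shared "declaration": just a code address with no signature attached.
pub fn untyped_symbol() -> *const () {
    let f: extern "C" fn(i32) -> i32 = add_one;
    f as *const ()
}

/// A call site that supplies its own signature for the untyped symbol.
///
/// # Safety
/// `sym` must point to a function whose real ABI and signature are
/// `extern "C" fn(i32) -> i32`; anything else is undefined behaviour.
pub unsafe fn call_as_i32_to_i32(sym: *const (), arg: i32) -> i32 {
    // The signature is chosen here, per call site, not at the declaration.
    let f: extern "C" fn(i32) -> i32 = unsafe { mem::transmute(sym) };
    f(arg)
}
```

Every call made this way goes through a function pointer, so a backend that encodes it literally would emit an indirect call rather than a direct one.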

@bjorn3
Member

bjorn3 commented Aug 18, 2024

In Cranelift IR a Function contains a list of all references to external functions with function signatures and an index into the list of function declarations stored in the module. This is separate from the signature stored for the function declaration in the module. When you reference an external function from cranelift IR you "import" the function declaration from the module. This normally inherits the function signature, but you can overwrite it after the fact.

You can't use an empty signature for a declaration of a function that will be locally defined however. There is a check that the function signature of the Function matches the declaration in the module when defining the function.

@RalfJung
Member

RalfJung commented Aug 19, 2024

You can't use an empty signature for a declaration of a function that will be locally defined however. There is a check that the function signature of the Function matches the declaration in the module when defining the function.

That seems to contradict your previous statement where you said "after referencing it in a function overwrite the signature for this reference", so I am still confused. Can the per-function import of the per-module table have a different type than the per-module table or not? If no then this entire thing doesn't seem relevant here as anyway it can't be used. And even if it can, one could have calls of two different imports of the same external function inside one function so a per-caller type signature doesn't help -- we need a per-call-site signature. (Though it would help insofar as it would make the signature consistency check more local, at least if we ignore inlining.)

But I take it the upshot is that cranelift tries to be somewhat typesafe and if a function is declared with a given signature, you have to call it like that. However, presumably it is also possible to create function pointers and cast them to a different type, so that way it'd still be possible to declare them with one signature but call them with another?

@tmiasko
Contributor

tmiasko commented Aug 19, 2024

Incorrect static declarations can also introduce immediate undefined behaviour, even when they are never used at runtime. From LLVM reference:

Either global variable definitions or declarations may have an explicit section to be placed in and may have an optional explicit alignment specified. If there is a mismatch between the explicit or inferred section information for the variable declaration and its definition the resulting behavior is undefined.

For global variable declarations, as well as definitions that may be replaced at link time (linkonce, weak, extern_weak and common linkage types), the allocation size and alignment of the definition it resolves to must be greater than or equal to that of the declaration or replaceable definition, otherwise the behavior is undefined.

An example where an incorrect size of an unused static introduces undefined behaviour:
extern {
    // Suppose that the actual definition is zero sized.
    static S: usize;
}
// Requires: a == 0
#[inline(never)]
pub unsafe fn f(mut a: usize) -> usize {
    let mut s = 0;
    while a != 0 {
        s += unsafe { S };
        a -= 1;
    }
    s
}

Is optimized to:

define noundef i64 @f(i64 noundef %0) unnamed_addr #0 {
start:
  %_4 = load i64, ptr @S, align 8
  %_4.fr = freeze i64 %_4
  %1 = mul i64 %_4.fr, %0
  ret i64 %1
}

@GoldsteinE
Contributor

Incorrect static declarations can also introduce immediate undefined behaviour, even when they are never used at runtime.

Does this affect linkme? The LINKME_STOP static is a non-zero-sized static placed at the end of the section, after any actual elements, so it does not necessarily point to any valid memory: https://github.com/dtolnay/linkme/blob/0e5e28bd673014b2b8d0a7337eab9489f74d2624/impl/src/declaration.rs#L139-L143

In other words, is this still a problem if the static is never accessed, even in dead code? (It only has its address taken to find out where the section ends).

@RalfJung
Member

RalfJung commented Aug 19, 2024 via email

@bjorn3
Member

bjorn3 commented Aug 19, 2024

That seems to contradict your previous statement where you said "after referencing it in a function overwrite the signature for this reference", so I am still confused. Can the per-function import of the per-module table have a different type than the per-module table or not? If no then this entire thing doesn't seem relevant here as anyway it can't be used. And even if it can, one could have calls of two different imports of the same external function inside one function so a per-caller type signature doesn't help -- we need a per-call-site signature. (Though it would help insofar as it would make the signature consistency check more local, at least if we ignore inlining.)

Functions would be represented something like the following in clif ir:

function u0:0(i64) {
    fn0 = u0:1(i64) -> i32

block0(v0: i64):
    v1 = call fn0(v0)
    return
}

Here fn0 = u0:1(i64) -> i32 is a reference to an external function with u0:1 as index into the module and (i64) -> i32 as function signature. It is possible to override the latter such that it doesn't match the signature in the module, so declaring u0:1 as () (function signature with no arguments and return value) in the module is possible. The function u0:0 for which this is providing a definition will have to be declared as (i64) (i64 as argument and no return value) however due to the signature check. So for example the following module contents (made up syntax) would be possible:

u0:0 = local "my_function": (i64)
u0:1 = import "imported_function": ()

However if you want to also import "imported_function" as static, you'd get an error as "imported_function" is already declared as a function. In addition if you were to merge the above module with a module containing a definition of "imported_function":

u0:0 = export "imported_function": (i64) -> i32

that wouldn't be possible as you get two different declarations of "imported_function".

This is not much of an issue currently as Cranelift currently doesn't benefit from LTO (which would be the main reason to want to merge modules) at all due to all optimizations being strictly within a single function, but it might in the future and I don't want to exclude support for it.

But I take it the upshot is that cranelift tries to be somewhat typesafe and if a function is declared with a given signature, you have to call it like that. However, presumably it is also possible to create function pointers and cast them to a different type, so that way it'd still be possible to declare them with one signature but call them with another?

Cranelift has separate instructions for calling a function directly and calling a function pointer. The latter is less efficient as it doesn't allow directly encoding the target address in the call instruction. (By the way cranelift doesn't have typed function pointers, it only has pointer-sized integers. Every use of the call_indirect instruction has to specify the function signature.)

@RalfJung
Member

Okay, so encoding calls with the "wrong" signatures via indirect calls works but is inefficient. Makes sense. Maybe it could still be done just for the very rare case where there's a signature mismatch, but it's not pretty. It also seems likely that cranelift will outright reject merging such modules if they have a signature mismatch rather than introducing such casts.

From a Rust perspective, the cranelift signature captures only a small part of what makes up a type, so we could allow some mismatches but not arbitrary mismatches. It would probably have to boil down to something like, the signatures have to be ABI-compatible.
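To illustrate how a backend-level signature keeps only part of the Rust type, here is a toy lowering in which two different Rust signatures collapse to the same machine-level one. The type names and the 64-bit lowering rules are assumptions made for this sketch, not Cranelift's actual types or rustc's ABI computation.

```rust
/// A few Rust-level types that can appear in an FFI signature (toy subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RustTy {
    Usize,
    RawPtr,    // e.g. *const u8
    Reference, // e.g. &'static mut ()
    I32,
    NonZeroI32,
}

/// What a backend signature might retain after lowering (toy subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MachineTy {
    I32,
    I64,
}

/// Lowers a Rust-level type, assuming a 64-bit target.
pub fn lower(ty: RustTy) -> MachineTy {
    match ty {
        RustTy::Usize | RustTy::RawPtr | RustTy::Reference => MachineTy::I64,
        RustTy::I32 | RustTy::NonZeroI32 => MachineTy::I32,
    }
}

/// Rust-level signature of an imported function.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RustSig {
    pub params: Vec<RustTy>,
    pub ret: RustTy,
}

/// Lowers a whole signature; nullability and validity information is lost here.
pub fn lower_sig(sig: &RustSig) -> (Vec<MachineTy>, MachineTy) {
    (sig.params.iter().copied().map(lower).collect(), lower(sig.ret))
}

/// Two imports are "backend compatible" if they lower to the same signature.
pub fn backend_compatible(a: &RustSig, b: &RustSig) -> bool {
    lower_sig(a) == lower_sig(b)
}
```

In this model the two malloc imports from the original report are backend compatible even though their Rust types differ, while an import that takes an `i32` instead of a `usize` is not.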

The other point regarding the "signature check" you mention is something I haven't even considered yet for the LLVM case -- merging two modules where one module defines a function and the other one uses it. In that case we'll always want the definition to win. I assume the LLVM encoding via void @fn() will do that.

But IIUC the upshot is, non-ABI-compatible imports of the same symbol anywhere in the crate graph are likely never going to be fully supported in cranelift (as that would require cranelift merging modules with incompatible signatures, and doing the required adjustments to make the result a well-defined module).

@apiraino apiraino removed the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Aug 22, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 3, 2024
…try>

codegen: do not set attributes on foreign function imports

Currently we are setting all the usual attributes when translating an FFI import, like
```rust
    extern {
        pub fn myfun(x: NonZeroI32) -> &'static mut ();
    }
```
into LLVM. However, if the same function is imported multiple times as different items in Rust, that all becomes a single LLVM declaration. The attributes set on any one of them may then apply to calls done via other imported items. For instance, the import above, *even if it is never called*, can introduce UB if there is another part of the program that also imports `myfun`, calls it via that import, and passes 0 or `myfun` returns NULL.

To fix that, this PR changes the function declarations we emit to all look like `declare @fn()`. The one exception is Rust's own allocator functions, which have a bunch of attributes that LLVM currently only honors in the declaration, not the call site -- though with llvm/llvm-project@1953629 we can maybe avoid that in the future.

The hope is that this fixes rust-lang#46188. The only cases I can still think of where we emit different declarations with the same name and one of them "wins" are:
- Rust's native allocation functions. Even then, the "wrong" declarations will just have no argument and return value so I don't think this should cause problems.
- Two `extern static` with the same name but different type, or an `extern static` with the same name as an imported function. Again I doubt the "type of the static" will lead LLVM to deduce things about the function or vice versa, so probably the worst that can happen is LLVM crashes. (And of course if this static actually ends up resolving to a function, that's UB unless the static has size 0. And conversely, if it resolves to a static instead of code then calling the function is UB.)

Fixes rust-lang/miri#3581 by making this not UB.

Cc `@nikic`
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 4, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 11, 2024
try-job: x86_64-msvc
try-job: i686-msvc
@RalfJung
Member

Turns out this is more tricky than expected -- some calling conventions put the number of arguments into the function name, so LLVM has to know the actual signature.

Not sure if LLVM can be fixed to use the arguments at the call site for this, rather than the arguments at declaration site. But it does look increasingly like there's a lot of technical hurdles for importing the same name with different signatures into one module -- this isn't fundamentally anything that should be hard, it's just that LLVM (and cranelift) are making assumptions that make this hard.

@RalfJung
Member

I intend to propose a t-lang design meeting about this problem. Here's the writeup for that: https://hackmd.io/cEi0aL6XSk6DDDASwpVH-w
Please let me know if I missed anything important. :)

@tmandry
Member

tmandry commented Sep 18, 2024

Thanks for writing that doc @RalfJung. We discussed during the lang team meeting and my proposal to make incremental progress (which the rest of the team seemed roughly aligned with) is the following:

  • Accept for now that the conflict is UB (as it already is), but
  • Lint against cross-crate incompatibilities as best as we can. Consider making these lints error-by-default. Improve them over time so their definition of "compatibility" becomes better.
  • Lint against using extern instead of unsafe extern in older editions, so it's clearer where users are making important assertions that might lead to UB.
  • If/when the cross-crate incompatibility lint matches exactly the behavior of our backend that we want to expose, consider upgrading it to a hard error, possibly over an edition boundary.

Does this seem like a reasonable path forward? If so, maybe we can put off using a design meeting for this.

The details of the compatibility rules are of course the difficult part, but I'm not sure if T-lang will be very helpful in working those out. I think that has to be informed by a combination of implementation details, the backends we use, the targets we support, and the use cases we discover in the wild. It seems to me that this is a complex enough space that it has to be worked out over time instead of designed all at once.

Put another way, if the question is "what definitions of compatibility are we prepared to accept", my answer would be roughly "whatever is most likely to catch actual bugs while staying out of the way in places it isn't helping". I see some value in getting the lang team to sign off on that as a general direction, but doubt we need a full design meeting to do so.
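For readers unfamiliar with the `unsafe extern` form mentioned in the list above, here is a small sketch. It assumes a toolchain new enough to accept `unsafe extern` blocks (Rust 1.82 or later) and a platform whose C library provides `strlen`; the wrapper `c_string_len` is a hypothetical name used only for illustration.

```rust
use std::ffi::{c_char, CStr};

// Writing `unsafe extern` makes it explicit that whoever wrote this block
// vouches for the signature matching the real C function. A wrong
// signature here is exactly the kind of assertion that can lead to UB.
unsafe extern "C" {
    pub unsafe fn strlen(s: *const c_char) -> usize;
}

/// Hypothetical safe wrapper around the imported `strlen`.
fn c_string_len(s: &CStr) -> usize {
    // SAFETY: `s` points to a valid NUL-terminated string that lives for
    // the duration of the call.
    unsafe { strlen(s.as_ptr()) }
}

fn main() {
    let hello = CStr::from_bytes_with_nul(b"hello\0").unwrap();
    println!("{}", c_string_len(hello));
}
```

The point of the keyword is reviewability: the obligation to get the signature right is visible at the declaration, not only at the call sites.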

@RalfJung
Member

RalfJung commented Sep 18, 2024

@tmandry I agree the exact compatibility rules need to be figured out "on the ground". I was mostly asking for t-lang input on the bigger conceptual questions, like -- should we lint/error against the cross-crate case of this? We generally spend a lot of effort to ensure that two different crates that build independently can always be linked together further downstream (e.g., the orphan rule). But here we'd introduce exactly the kind of thing the orphan rule intends to prevent -- two crates work fine in isolation but cannot be used together.

OTOH, that is the nature of this UB, so our hands are a bit tied. But this seems like a fundamental enough point that I wanted to get some t-lang input. If it doesn't need a design meeting that's fine for me. :)

@tmandry
Member

tmandry commented Sep 19, 2024

> We generally spend a lot of effort to ensure that two different crates that build independently can always be linked together further downstream (e.g., the orphan rule). But here we'd introduce exactly the kind of thing the orphan rule intends to prevent -- two crates work fine in isolation but cannot be used together.

Good point. I do agree with what you said here.

> OTOH, that is the nature of this UB, so our hands are a bit tied.

And specifically it's because there's a global namespace shared by all (public) symbols linked into a binary.

I still think the best thing we can do for now is lint on it, letting the lints be allowed anywhere the conflicting crates are brought together, or at the final linked crate. Since it's allowable we can do this without thinking too hard about the implications, and consider them again if we decide to move to a hard error.
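As a rough model of what a cross-crate check over a shared symbol namespace could look like, the sketch below groups foreign declarations by symbol name and reports names that appear with more than one signature. Everything here is hypothetical: the `Import` struct, the `clashing_symbols` function, and the use of signature strings are assumptions made for illustration, not rustc's data structures or lint implementation.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One foreign-item declaration as seen by this toy model.
struct Import<'a> {
    krate: &'a str,     // crate that contains the declaration
    symbol: &'a str,    // linker-level symbol name
    signature: &'a str, // signature, compared textually in this sketch
}

/// Return the symbols that are declared with more than one distinct
/// signature anywhere in the crate graph, in sorted order.
fn clashing_symbols(imports: &[Import]) -> Vec<String> {
    let mut by_symbol: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for imp in imports {
        by_symbol.entry(imp.symbol).or_default().insert(imp.signature);
    }
    by_symbol
        .into_iter()
        .filter(|(_, sigs)| sigs.len() > 1)
        .map(|(name, _)| name.to_string())
        .collect()
}

fn main() {
    let imports = [
        Import { krate: "good", symbol: "malloc", signature: "fn(usize) -> *const u8" },
        Import { krate: "bad", symbol: "malloc", signature: "fn(usize) -> &'static mut ()" },
        Import { krate: "good", symbol: "free", signature: "fn(*mut u8)" },
    ];
    for imp in &imports {
        println!("{} imports {}: {}", imp.krate, imp.symbol, imp.signature);
    }
    println!("clashes: {:?}", clashing_symbols(&imports));
}
```

Because each crate looks fine on its own, a clash like this only becomes visible at the point where both crates are brought together, which is where an allowable lint would fire.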

@RalfJung
Member

Okay. :)

So the next steps would be:

  • adjust codegen to emit fewer attributes for these imports, so that more imports become "compatible" (e.g. if they only differ in a `nonnull` attribute)
  • tweak the lint to reduce false positives (e.g. from differences behind pointer indirection)
  • make the lint cross-crate

I don't know how to do the last step -- is there a way to iterate all imports from all crates in the current crate graph?
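The first two steps can be sketched with a toy compatibility check that treats all pointer-like types as interchangeable, ignoring both nullability and the pointee type. The `Ty` and `Sig` types and the functions `ty_compatible` and `sig_compatible` are hypothetical simplifications, assumed only for this illustration; the real rules would have to follow what the backends and targets actually require.

```rust
/// A drastically simplified model of FFI-relevant types (hypothetical).
#[derive(Debug)]
enum Ty {
    Int { bits: u32 },
    /// Raw pointer, may be null.
    RawPtr(Box<Ty>),
    /// Reference, never null.
    Ref(Box<Ty>),
    Unit,
}

/// A simplified foreign function signature (hypothetical).
#[derive(Debug)]
struct Sig {
    args: Vec<Ty>,
    ret: Ty,
}

/// Two types are compatible if they agree at the top level. Pointers and
/// references are compatible with each other regardless of pointee, which
/// models both "no `nonnull` on imports" and "ignore differences behind
/// pointer indirection".
fn ty_compatible(a: &Ty, b: &Ty) -> bool {
    match (a, b) {
        (Ty::Int { bits: x }, Ty::Int { bits: y }) => x == y,
        (Ty::RawPtr(_) | Ty::Ref(_), Ty::RawPtr(_) | Ty::Ref(_)) => true,
        (Ty::Unit, Ty::Unit) => true,
        _ => false,
    }
}

/// Signatures are compatible if they have the same arity and all argument
/// and return types are pairwise compatible.
fn sig_compatible(a: &Sig, b: &Sig) -> bool {
    a.args.len() == b.args.len()
        && a.args.iter().zip(&b.args).all(|(x, y)| ty_compatible(x, y))
        && ty_compatible(&a.ret, &b.ret)
}

fn main() {
    // Modeled on the two `malloc` declarations from the top of this issue.
    let good = Sig {
        args: vec![Ty::Int { bits: 64 }],
        ret: Ty::RawPtr(Box::new(Ty::Int { bits: 8 })),
    };
    let bad = Sig {
        args: vec![Ty::Int { bits: 64 }],
        ret: Ty::Ref(Box::new(Ty::Unit)),
    };
    println!("{:?} vs {:?}: {}", good, bad, sig_compatible(&good, &bad));
}
```

Under these assumed rules the two `malloc` declarations would count as compatible, which is only sound if codegen really does stop attaching `nonnull` to the import.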

@traviscross added the T-lang (Relevant to the language team, which will review and decide on the PR/issue.) label and removed the I-lang-nominated (Nominated for discussion during a lang team meeting.) label on Sep 25, 2024