Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tracking issue] union field access inside const fn #51909

Closed
oli-obk opened this issue Jun 29, 2018 · 51 comments · Fixed by #85769
Closed

[tracking issue] union field access inside const fn #51909

oli-obk opened this issue Jun 29, 2018 · 51 comments · Fixed by #85769
Labels
A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@oli-obk
Copy link
Contributor

oli-obk commented Jun 29, 2018

union field accesses allow arbitrary transmutations inside const fn and are thus marked as unstable for now to allow us more time to think about the semantics of this without blocking the stabilization of const fn on it.

This is behind the const_fn_union feature gate.

Blocked on rust-lang/const-eval#14.

@oli-obk oli-obk added A-const-fn A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) labels Jun 29, 2018
@Centril Centril added T-lang Relevant to the language team, which will review and decide on the PR/issue. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC labels Dec 1, 2018
@oli-obk
Copy link
Contributor Author

oli-obk commented May 31, 2019

Now that we don't permit any new functions from getting promoted, I guess we can lift this restriction? In acutal const contexts we check the final value. Although it does mean that users can transmute arbitrary input to safe references and thus dereference them, so via projection we should allow dereferencing raw pointers at the same time.

MSxDOS added a commit to MSxDOS/ntapi that referenced this issue Jul 16, 2019
It may now be used in constants e.g. for static assertions (but not inside constant functions, yet, see rust-lang/rust#51909)
@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 20, 2019

cc @RalfJung @Centril opinions on the above comment?

@Centril
Copy link
Contributor

Centril commented Aug 20, 2019

There's still the pesky run-time issue in that we get the ptr-to-int operation if we stabilize this now. So we get unconst operations without having settled whether it is UB for const fn to act non-deterministically at run-time (and I am sure folks will abuse this if we don't settle it). That is, there's still the "education" aspect that is lacking.

However, we might reasonably use this only in the standard library for now proviso that we can at least agree on the determinism angle in lib{core,alloc,std} and review carefully (which didn't work super well re. promotability before, but at least now we know better...).

@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 20, 2019

if we stabilize this now

we will get this problem whenever we stabilize. If we want to prevent it we'd need to introduce "blessed transmutes", but that just moves the complexity elsewhere and lock us into layouts.

We could set it up in a manner where the feature is still unstable but doesn't block stabilizing const fns that use it, but we explicitly opted not to do so. We could hijack #[allow_internal_unstable(const_fn_union)] and make it work for const fns, too.

@Centril
Copy link
Contributor

Centril commented Aug 20, 2019

we will get this problem whenever we stabilize.

Granted, but timing is key here imo -- we'll need to work towards having the "education" available.

We could set it up in a manner where the feature is still unstable but doesn't block stabilizing const fns that use it, but we explicitly opted not to do so.

Yep, and it has served us well but perhaps the pragmatic choice is to make an exception now.

We could hijack #[allow_internal_unstable(const_fn_union)] and make it work for const fns, too.

It'll need to be followed by a comment justifying each use but otherwise it seems reasonable.

@RalfJung
Copy link
Member

RalfJung commented Aug 24, 2019

ptr-to-int is not itself non-deterministic though. Allocation and pointer comparison to my knowledge are the only non-deterministic operations (and for ptr comparison I have not been able to observe this non-determinism in a single execution).

However, ptr-to-int makes the allocator non-determinism observable, maybe that's what you mean? I suppose you are worried about functions like (maybe using hacks to avoid unstable features):

const fn mk_int() -> usize {
  let v = 42;
  &v as *const _ as usize
}

I think we cannot reasonably make such functions UB -- undefined behavior should be decidable dynamically (e.g. by an interpreter like Miri), but "this function is deterministic" is not decidable. So this can be a safety invariant at best. Good enough for unsafe code to rely on it when interfacting with safe code, but not good enough for the compiler to exploit it.

@Centril
Copy link
Contributor

Centril commented Aug 24, 2019

However, ptr-to-int makes the allocator non-determinism observable, maybe that's what you mean?

Yep.

(maybe using hacks to avoid unstable features):

We intentionally tried to prevent that; would be very unfortunate to find out we failed.

I think we cannot reasonably make such functions UB -- undefined behavior should be decidable dynamically (e.g. by an interpreter like Miri), but "this function is deterministic" is not decidable. So this can be a safety invariant at best. Good enough for unsafe code to rely on it when interfacting with safe code, but not good enough for the compiler to exploit it.

Good points; although it would be unfortunate not to take advantage of CSE for calls to const fn. But at least the type system should be able to assume deterministic execution of const fns. In either case, if it is a safety condition, it will still need good educational material.

@RalfJung
Copy link
Member

Good points; although it would be unfortunate not to take advantage of CSE for calls to const fn

It would be unfortunate indeed, but non-decidable UB would also be unfortunate.

But at least the type system should be able to assume deterministic execution of const fns.

What do you mean by this?

@Centril
Copy link
Contributor

Centril commented Aug 24, 2019

What do you mean by this?

For example:

const fn foo(x: usize) -> usize { /* stuff */ }

fn bar() {
    let n = /* stuff */;
    let x: [u8; dyn foo(n)] = /* stuff */;
    let y: [u8; dyn foo(n)] = x; // OK.
}

@RalfJung
Copy link
Member

For example:

Ah. Well for executing const fn during CTFE I think we can guarantee that they behave deterministically. This does not even require any static checks. The Miri engine just doesn't have a source of non-determinism. (The AllocId of a Pointer is non-deterministic, but it is not observable by the program.)

@tema3210

This comment has been minimized.

@oli-obk

This comment has been minimized.

@powerboat9
Copy link

Since this issue is holding back a few other issues, what needs to be done to fix it?

@RalfJung
Copy link
Member

RalfJung commented Jun 1, 2020

Quoting from a Zulip discussion:

We are talking about unsafe code, so we should compare unsafe const-code with unsafe runtime-code. Unsafe runtime-code has certain rules you have to follow, but it is up to you to follow them -- that's the point of unsafe, there is no guaranteed safety. Violating those rules is commonly called "Undefined Behavior". Also see here.

With unsafe const-code, we have the same problem -- but the rules are different! So some code could be perfectly fine unsafe runtime-code but still be wrong unsafe const-code. And the compiler cannot tell you, because it's unsafe.

IOW, the biggest concern here is how tot each people to use unsafe code in const fn in a way that does not break CTFE. The issue is that "just" following the normal rules for unsafe is not enough.

There's some more background on this at https://github.com/rust-lang/const-eval/blob/master/const_safety.md.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jun 3, 2020

Slightly stupid idea: lint about all unconst operations (with the lint specifically explaining each case individually) and require users to write #[allow(unconst_ops)] on functions or items performing such actions.

This may lead to ppl just adding #![allow(unconst_ops)] to the entire crate instead, which is uncool, and I don't think hijacking the lint system to make this one lint behave differently is a good idea.

It could also be an unconditional warning that can be silenced by using #[unconst] on such a function, const item, static item or expression.

@RalfJung
Copy link
Member

RalfJung commented Jun 3, 2020

Seems strange to me to have a completely different system for unsafe and unconst, when they are conceptually basically the same thing (just for two different languages: full Rust vs const Rust).

@oli-obk
Copy link
Contributor Author

oli-obk commented Jun 3, 2020

Yea, I just noticed we could go even more fine-grained:

#[unconst(raw_ptr_deref)].

But this made me realize that any unsafe operation can likely end up doing unconst things in some cases... so we're back to the root problem that we'd just end up marking everything, so there's no useful distinction between unconst and unsafe.

@powerboat9
Copy link

Ok. How do unions act differently in compile time and run time?

@RalfJung
Copy link
Member

RalfJung commented Jun 5, 2020

They don't? Not sure where you got the idea.

I linked to a discussion about this above. I think the main concern is that "well-behaved" const code simply has different rules than to be well-behaved "normal" code. Also see this document.

Accordingly, I am somewhat confused by @oli-obk's focus on differences between CTFE and runtime. Our CTFE checks should already make sure that does not happen. (Smells like an XY problem to me -- some of the reasons const has different rules is that during CTFE we cannot do certain operations, and the reason for that has to do with not knowing the actual base address of an allocation. But that's already two layers down the rabbit hole from the key point, which is that the rules are different.)

@oli-obk
Copy link
Contributor Author

oli-obk commented Jun 6, 2020

Our CTFE checks should already make sure that does not happen.

They do right now, but if we ever want to allow [T]::eq, we'll end up having something in CTFE that will yield different results than runtime. The important thing is that this is not observable from safe code. Unsafe code may observe this internally for optimizations.

And yes, the root problem for this is that we don't know the base address of an allocation, so for anything CTFE related, the base addresses must be irrelevant. An operation that would observe the base addresses is const-unsafe because you can't let safe code ever observe that base address. "Observing a base address" can e.g. be the fact that we'd return false for &42 as *const u8 == 1234500 as *const i32 even if that theoretically may be true during runtime.

@RalfJung
Copy link
Member

RalfJung commented Jun 6, 2020

They do right now, but if we ever want to allow [T]::eq, we'll end up having something in CTFE that will yield different results than runtime.

I don't think so. Slice equality itself does not actually differ. The only issue is the pointer equality test fast-path, and I've seen a PR that proposed removing it (it seems to be a perf win in some cases and a perf loss in others). If needed we could have a perma-unstable hack to skip the ptr check in CTFE.

Also, I see no connection of this problem to transmutes/unions. The problem you mention is (I think) related to "how can we implement ptr equality binops in CTFE without breaking anything"; that question is fully orthogonal to "how do we teach people to not misuse transmutes in CTFE". Transmutes cause problems even with our current non-implementation of ptr binops, and ptr binops cause problems even without transmute.

The kind of misuse I have in mind is something like

const fn foo(x: usize, y: &i32) -> usize {
  let y: usize = unsafe { transmute(y) };
  x * y
}

This is a sound function as far as runtime Rust is concerned, but it is const-unsound ("unconst") because it can cause a const-failure with const-safe inputs.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jun 6, 2020

Ok. Let's stop focussing on hypothetical differences and just address the part where a function is unsound if it causes an error at compile time while being sound at runtime. I'll adjust the text accordingly

@RalfJung
Copy link
Member

RalfJung commented Jun 6, 2020

a function is unsound if it causes an error at compile time while being sound at runtime.

Yeah, basically. Of course compile-time functions also can make stronger assumptions about their inputs (e.g., usize are "proper integers" and not pointer values). So it's not actually about comparing behavior of a function with the same inputs between CTFE and RT. It's about having both of these properties:

  • for all RT-safe inputs, the function behaves in an RT-safe way and returns a RT-safe output (that's normal runtime soundness)
  • for all CTFE-safe inputs, the function behaves in a CTFE-safe way (does not cause "unsupported" or "UB" errors) and returns a CTFE-safe output.

@tema3210
Copy link

tema3210 commented Jun 6, 2020

We have dataflow analysis already, why not to use it to permit what is already known to be sound? For example we can check usage of alocations to forbid pointer to numbers casts which change the output. So we actually want much better check of function purity. I don't think we want to forbid use of entire language primitive because it could cause UB.

@tema3210
Copy link

tema3210 commented Jun 6, 2020

Also we may want to make the ops const someday, but i'm not sure which ones.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jun 6, 2020

why not to use it to permit what is already known to be sound?

That's the whole point! We want to permit things that are sound, but there's no way to make it safe in general, just like we can't make transmute safe in general. This entire discussion is about allow things, but trusting the developer to know the const rules. You can still transmute a reference to a usize and back, and that would be sound I think. That doesn't mean transmuting a reference to a usize is sound in general.

All we're trying to say is that the rules are stricter (or there are more rules) at compile time than at rumtime, and devs need to be aware of that.

@RalfJung
Copy link
Member

RalfJung commented Jun 6, 2020

@tema3210 this entire discussion is about unsafe code, which the compiler cannot check for correctness. If it could, it would be safe code. ;) So no static analysis is going to help.

@powerboat9
Copy link

We already have a mechanism for preventing conversions from pointers and references to regular numbers, shown here and here. My question is why that only works when not in a function.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jun 8, 2020

The difference is that const fn can also be executed at runtime. So we need to address what the differences between invoking a const fn at runtime and at compile time are. Ideally there's no difference, but in practice that doesn't work. Thus, we want to define that a const fn must behave the same at runtime and at compile time. This means that if you invoke foo() at runtime and it succeeds, invoking foo() at compile time must succeed, too, and must even return the same value (if foo is a safe const fn).

A constant item is always interpreted at compile-time, so we do not have this duality. Only its final value is used at runtime, and we have checks ensuring that said final value is const sound.

@RalfJung
Copy link
Member

RalfJung commented Jun 8, 2020

@powerboat9 those are easy to check as it's a single constant that we can just evaluate. But to do the same with a function, we'd have to run it with all possible inputs, which clearly is not possible.

@powerboat9
Copy link

I don't really see how unions would behave differently during runtime and during compile time. If a union throws a miri evaluation error, then the code never compiles and thus the function can never be called at runtime.

@powerboat9
Copy link

And if a function could error at compile time but doesn't, then it will be impossible to observe running at compile time and no difference between runtime and compile time behavior can be observed.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jun 22, 2020

TLDR: the examples shown have unspecified behaviour depending on compiler versions and optimization levels, which is the exact same behaviour that the already known set of runtime UB exhibits at compile-time.

It's not about causing UB at runtime due to UB within a constant evaluation (we prevent that via static checks on constants). Let's just consider the regular UB things that the rustonomicon defines. If you have a function like

const fn foo() -> u32 {
    unsafe {
		let x: usize = mem::uninitialized();
		[5, 6, 7][x]
	}
}

and you evaluate it at compile-time, there's no clear behaviour that can come of this. Either the function

  • returns 5, 6 or 7
  • returns a random value
  • errors during evaluation because the length check failed
  • errors during evaluation because the length check succeeded, but then an OOB value was read
  • errors during evaluation because undefined bytes were compared with something else (current behaviour)

The function will do the same thing no matter how many times and in what circumstances you call it during CTFE. That is, for the same compiler with the same compiler flags. Any change in optimizations or compiler versions can and will change what the function returns at CTFE. Calling it at runtime is UB anyway, so we can ignore that, that has been covered thoroughly by lots of articles.

Now, let's consider the additional UB proposed in this issue:

const fn foo() -> u32 {
    let x = 42;
    let y = &x as *const i32 as usize;
    50000 / y
}

The above is considered UB in CTFE (as suggested in this issue), because you are inspecting a pointer beyond your ability to dereference or offset it. So, what can happen here? The function can either

  • return a random value
  • error during evaluation because the division triggered the division by zero panic
  • error during evaluation because the div-by-zero check wasn't triggered, causing a division by zero, which is UB even at runtime
  • error during the division because the value of y isn't known (current behaviour)
  • error during the conversion to usize or at least at the assignment of y

The argument was that we can make a guaranteed detection of the UB, so it's not a problem. But that's the thing about UB: once bring optimizations into the mix, even deterministic UB detection will become nondeterministic depending on whatever the optimizations are up to. In the above example the optimizer detecting UB can resonably replace the entire function by return 0xDEAFBEES, which can still sting because they see and smell you. Jokes aside, you now have a function which behaves eratically depending on compiler version and optimization levels. Sounds a lot like UB, doesn't it?

@RalfJung
Copy link
Member

you now have a function which behaves eratically depending on compiler version and optimization levels. Sounds a lot like UB, doesn't it?

Somewhat. It is much more constrained in that the effects of UB are "returns a weird result from this one const-eval query", and there is no chance of it deleting your files on its own, but that's about it.

return a random value

One possibility here is also "returns an uninitialized value", which is different from picking any fixed value non-deterministically.

@RalfJung
Copy link
Member

Citing what I said on Zulip:

We could make CTFE misbehavior "unspecified behavior". that means we should list what the possible behaviors are, which I guess is "const-eval will return some arbitrary thing and leave arbitrary stuff in the memory that gets interned". but the compiler won't crash or eat your kittens.

@oli-obk basically extended on what that would mean, I think.
This closes one possible option in the Rust CTFE design space: JIT'ing MIR and running it unsandboxed/unchecked during compilation. I don't think we want to do that, but it feels worth pointing out. If we say behavior is "unspecified", we have to say what the possible behaviors are that implementations can pick from, and they have to stay in those bounds. This means e.g. mandatory bounds checks during CTFE evaluation, even when evaluating unsafe code.

@powerboat9
Copy link

Perhaps have Scalar::Uninit for currently uninitialized values?

@oli-obk
Copy link
Contributor Author

oli-obk commented Jun 30, 2020

@powerboat9 I don't know what your comment refers to and what you are suggesting where such a Scalar::Uninit would be used. Please elaborate (ideally with code examples where you think these values would occur).

@RalfJung
Copy link
Member

We have ScalarMaybeUninit to represent potentially uninitialized values. But anyway I don't think these implementation details have anything to do with the discussion around unions. The question is not how to implement union accesses -- they have been implemented years ago -- the question is how to best expose them on stable without causing issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants