rust-lang · chorman0773 · Oct 25, 2024 · Oct 25, 2024 · Oct 25, 2024 · Nov 11, 2024
diff --git a/src/memory-model.md b/src/memory-model.md
@@ -1,5 +1,66 @@
 # Memory model
 
-Rust does not yet have a defined memory model. Various academics and industry professionals
-are working on various proposals, but for now, this is an under-defined place
-in the language.
+r[memory]
+
+The memory model of Rust is incomplete and not fully decided. The following describes the parts that have been worked out so far.
+
+## Bytes
+
+r[memory.byte]
+
+r[memory.byte.intro]
+The most basic unit of memory in Rust is a byte. All values in Rust are computed from 0 or more bytes read from an allocation.
+
+> [!NOTE]
+> While bytes in Rust are typically lowered to hardware bytes, Rust uses an "abstract"
+> notion of bytes that can make distinctions which are absent in hardware,
+> such as being uninitialized, or storing part of a pointer.
+
+r[memory.byte.contents]
+Each byte may have one of the following values:
+
+r[memory.byte.init]
+* An initialized byte containing a `u8` value and optional [provenance][type.pointer.provenance].
+
+r[memory.byte.uninit]
+* An uninitialized byte.
+
+> [!NOTE]
+> Uninitialized bytes do not have a value and do not have a pointer fragment.
+
+> [!NOTE]
+> The above list is not yet guaranteed to be exhaustive.
+
+## Value Encoding
+
+r[memory.encoding]
+
+r[memory.encoding.intro]
+Each type in Rust has zero or more values, on which operations can be performed. Values are represented in memory by encoding them as sequences of bytes.
+
+> [!NOTE]
+> `0u8`, `1337i16`, and `Foo{bar: "baz"}` are all values.
+
+r[memory.encoding.op]
+Each type defines a pair of operations which, together, define the representation of values of the type. The *encode* operation takes a value of the type and converts it into a sequence of bytes equal in length to the size of the type, and the *decode* operation takes such a sequence of bytes and optionally converts it into a value. Encoding occurs when a value is written to memory, and decoding occurs when a value is read from memory.
+
+> [!NOTE]
+> Only certain byte sequences may decode into a value of a given type. For example, a byte sequence consisting of all zeroes does not decode to a value of a reference type.
+
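The following is a minimal model of the encode and decode operations, not an implementation used by any compiler. `AbstractByte` and `Repr` are hypothetical names introduced only for this sketch; provenance is omitted, and `u16` is encoded in the native byte order. Decoding returns `None` for byte sequences that do not represent a value, which is where an actual read would instead be undefined behavior.

```rust
/// Hypothetical model of an abstract byte (not a real standard library type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbstractByte {
    /// An initialized byte; provenance is left out of this sketch.
    Init(u8),
    /// An uninitialized byte, which has no `u8` value.
    Uninit,
}

/// Hypothetical trait pairing the encode and decode operations of a type.
trait Repr: Sized {
    /// Converts a value into exactly `size_of::<Self>()` abstract bytes.
    fn encode(&self) -> Vec<AbstractByte>;
    /// Returns `None` when `bytes` does not represent any value of `Self`.
    fn decode(bytes: &[AbstractByte]) -> Option<Self>;
}

impl Repr for bool {
    fn encode(&self) -> Vec<AbstractByte> {
        vec![AbstractByte::Init(*self as u8)]
    }

    fn decode(bytes: &[AbstractByte]) -> Option<Self> {
        match bytes {
            [AbstractByte::Init(0x00)] => Some(false),
            [AbstractByte::Init(0x01)] => Some(true),
            // Any other byte value, an uninitialized byte, or a wrong length.
            _ => None,
        }
    }
}

impl Repr for u16 {
    fn encode(&self) -> Vec<AbstractByte> {
        // Native byte order, matching how the value is laid out in memory.
        self.to_ne_bytes().iter().map(|&b| AbstractByte::Init(b)).collect()
    }

    fn decode(bytes: &[AbstractByte]) -> Option<Self> {
        match bytes {
            // Every pair of initialized bytes represents some `u16`.
            [AbstractByte::Init(b0), AbstractByte::Init(b1)] => Some(u16::from_ne_bytes([*b0, *b1])),
            // Wrong length, or an uninitialized byte.
            _ => None,
        }
    }
}
```

Because `decode` returns at most one value for a given input, the model also reflects that the representation relation is functional.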
+r[memory.encoding.representation]
+A sequence of bytes is said to represent a value of a type if the decode operation for that type produces that value from that sequence of bytes. The representation of a type is the partial relation between byte sequences and the values those sequences represent.
+This relation is functional, i.e., a given byte sequence represents at most one value.
+
+> [!NOTE]
+> The representation of a type is determined by its layout. For instance, the layout of a struct defines the offsets of all its fields, and this in turn defines the representation of struct values as sequences of bytes.
+
+r[memory.encoding.symmetric]
+The result of encoding a given value of a type is a sequence of bytes that represents that value.
+
+> [!NOTE]
+> This means that a value can be copied into memory and copied back out, and the result is the same value.
+> The reverse is not necessarily true: a sequence of bytes that is read as a value and then written to another location (called a typed copy) will not necessarily produce the same sequence of bytes. For example, a typed copy of a struct type will leave the padding bytes of that struct uninitialized.
+
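To illustrate why a typed copy does not preserve bytes, here is a sketch that models a hypothetical struct `Pair` with an assumed layout: `a` at offset 0, one padding byte at offset 1, and `b` at offsets 2 and 3 in little-endian order. `AbstractByte`, `Pair`, `encode`, `decode`, and `typed_copy` are illustrative names, not Reference or standard library APIs.

```rust
/// Hypothetical abstract byte: `Init` carries a `u8`, `Uninit` carries nothing.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbstractByte {
    Init(u8),
    Uninit,
}

/// Hypothetical struct with an assumed 4-byte layout:
/// offset 0 = `a`, offset 1 = padding, offsets 2..4 = `b` (little-endian).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Pair {
    a: u8,
    b: u16,
}

fn encode(p: Pair) -> [AbstractByte; 4] {
    let [b0, b1] = p.b.to_le_bytes();
    // The padding byte at offset 1 is always encoded as uninitialized.
    [AbstractByte::Init(p.a), AbstractByte::Uninit, AbstractByte::Init(b0), AbstractByte::Init(b1)]
}

fn decode(bytes: [AbstractByte; 4]) -> Option<Pair> {
    // Byte 1 is padding and is ignored; the field bytes must be initialized.
    match bytes {
        [AbstractByte::Init(a), _, AbstractByte::Init(b0), AbstractByte::Init(b1)] => {
            Some(Pair { a, b: u16::from_le_bytes([b0, b1]) })
        }
        _ => None,
    }
}

/// A typed copy: decode the source bytes as a `Pair`, then encode that value again.
fn typed_copy(src: [AbstractByte; 4]) -> Option<[AbstractByte; 4]> {
    decode(src).map(encode)
}
```

Decoding ignores the padding byte and re-encoding always writes it as uninitialized, so the copied bytes differ from the source even though both decode to the same value.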
+r[memory.encoding.decode]
+If a value of type `T` is decoded from a sequence of bytes that does not represent any value, the behavior is undefined.
+
+> [!NOTE]
+> For example, it is undefined behavior to read a `0x02` byte as `bool`.
diff --git a/src/types/array.md b/src/types/array.md
@@ -31,6 +31,10 @@ always bounds-checked in safe methods and operators.
 > Note: The [`Vec<T>`] standard library type provides a heap-allocated resizable
 > array type.
 
+r[type.array.repr]
+An array value is represented by each element in ascending index order, placed immediately adjacent in memory.
+
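The adjacency guarantee can be observed from element addresses. This sketch uses a hypothetical helper `element_stride`, which assumes the slice has at least two elements; the distance between consecutive elements equals `size_of::<T>()`, so no bytes separate one element from the next.

```rust
use std::mem::size_of;

/// Hypothetical helper: byte distance between the first two elements of `items`.
fn element_stride<T>(items: &[T]) -> usize {
    assert!(items.len() >= 2, "need at least two elements");
    let first = &items[0] as *const T as usize;
    let second = &items[1] as *const T as usize;
    second - first
}

/// An array of `n` elements occupies exactly `n * size_of::<T>()` bytes.
fn array_size_matches() -> bool {
    size_of::<[u16; 3]>() == 3 * size_of::<u16>()
}
```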
+
 [_Expression_]: ../expressions.md
 [_Type_]: ../types.md#type-expressions
 [`usize`]: numeric.md#machine-dependent-integer-types

diff --git a/src/types/boolean.md b/src/types/boolean.md
@@ -21,9 +21,10 @@ r[type.bool.layout]
 An object with the boolean type has a [size and alignment] of 1 each.
 
 r[type.bool.repr]
-The value false has the bit pattern `0x00` and the value true has the bit pattern
-`0x01`. It is [undefined behavior] for an object with the boolean type to have
-any other bit pattern.
+A `bool` is represented as a single initialized byte with a value of `0x00` corresponding to `false` and a value of `0x01` corresponding to `true`.
+
+> [!NOTE]
+> No other byte sequences represent a `bool`. Reading any other byte as type `bool` is undefined behavior.
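
As a sketch of what this means in practice: reading the byte of any `bool` as a `u8` is sound, while the reverse direction needs a check. `bool_to_byte` and `byte_to_bool` are hypothetical helper names; the checked decoder rejects exactly the bytes whose transmutation to `bool` would be undefined behavior.

```rust
/// Hypothetical helper: reads the single byte that represents `b`.
fn bool_to_byte(b: bool) -> u8 {
    // SAFETY: `bool` and `u8` have the same size, and every `bool`
    // is represented by an initialized byte (`0x00` or `0x01`).
    unsafe { std::mem::transmute::<bool, u8>(b) }
}

/// Hypothetical helper: checked decoding of a byte as `bool`.
/// Only `0x00` and `0x01` represent a `bool`; all other bytes yield `None`.
fn byte_to_bool(byte: u8) -> Option<bool> {
    match byte {
        0x00 => Some(false),
        0x01 => Some(true),
        _ => None,
    }
}
```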
 
 r[type.bool.usage]
 The boolean type is the type of many operands in various [expressions]:
@@ -126,14 +127,6 @@ r[type.bool.expr.cmp.less]
 r[type.bool.expr.cmp.less-eq]
 * `a <= b` is the same as `a == b | a < b`
 
-## Bit validity
-
-r[type.bool.validity]
-
-The single byte of a `bool` is guaranteed to be initialized (in other words,
-`transmute::<bool, u8>(...)` is always sound -- but since some bit patterns
-are invalid `bool`s, the inverse is not always sound).
-
 [boolean logic]: https://en.wikipedia.org/wiki/Boolean_algebra
 [enumerated type]: enum.md
 [expressions]: ../expressions.md

diff --git a/src/types/enum.md b/src/types/enum.md
@@ -14,14 +14,31 @@ unit-like struct.
 r[type.enum.constructor]
 New instances of an `enum` can be constructed with a [struct expression].
 
-r[type.enum.value]
-Any `enum` value consumes as much memory as the largest variant for its
-corresponding `enum` type, as well as the size needed to store a discriminant.
-
 r[type.enum.name]
 Enum types cannot be denoted *structurally* as types, but must be denoted by
 named reference to an [`enum` item].
 
+## Enum values and representation
+
+r[type.enum.value]
+
+r[type.enum.value.intro]
+An enum value corresponds to exactly one variant of the enum, and consists of the fields of that variant.
+
+> [!NOTE]
+> An enum with no variants therefore has no values.
+
+r[type.enum.value.value-padding]
+A byte is a [padding][type.union.value.padding] byte of an enum if that byte is not part of the representation of the discriminant of the enum, and in each variant it either:
+* Does not overlap with a field of the variant, or
+* Overlaps with a padding byte in a field of that variant.
+
+r[type.enum.value.repr]
+The representation of a value of an enum type includes the representation of each field of the variant at the appropriate offsets. When encoding a value of an enum type, each byte that is not used to store a field of the variant or the discriminant is uninitialized. In the case of a [`repr(C)`][layout.repr.c.adt] or a [primitive-repr][layout.repr.primitive.adt] enum, the discriminant of the variant is represented as though by the appropriate integer type stored at offset 0.
+
+> [!NOTE]
+> Most `repr(Rust)` enums will also store a discriminant in the representation of the enum, but the exact placement or type of the discriminant is unspecified, as is the byte sequence that represents each variant.
+
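For a primitive-repr enum, the discriminant stored at offset 0 can be read directly. The enum `Shape` and the helper `discriminant_byte` are hypothetical; the sketch relies on `repr(u8)` placing a `u8` discriminant at offset 0, which is initialized for every value of the enum.

```rust
/// Hypothetical enum with a primitive representation and explicit discriminants.
#[allow(dead_code)]
#[repr(u8)]
enum Shape {
    Circle(u32) = 3,
    Square(u16) = 7,
}

/// Hypothetical helper: reads the discriminant byte of `s` directly from memory.
fn discriminant_byte(s: &Shape) -> u8 {
    // SAFETY: for a `repr(u8)` enum, the byte at offset 0 is the `u8`
    // discriminant, which is always initialized.
    unsafe { *(s as *const Shape as *const u8) }
}
```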
 [^enumtype]: The `enum` type is analogous to a `data` constructor declaration in
              Haskell, or a *pick ADT* in Limbo.
 

diff --git a/src/types/function-pointer.md b/src/types/function-pointer.md
@@ -62,6 +62,12 @@ let bo: Binop = add;
 x = bo(5,7);
 ```
 
+r[type.fn-pointer.value]
+A value of a function pointer type consists of a non-null address. A function pointer value is represented the same as its address represented as a value of the unsigned integer type with the same width as the function pointer.
+
+> [!NOTE]
+> Whether or not a function pointer value has provenance, and whether or not this provenance is represented as pointer fragments, is not yet decided.
+
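Because a function pointer value is never null, `Option` of a function pointer needs no extra space; the standard library documents that `Option<fn(..)>` has the same size as the function pointer itself. This sketch reuses the `add` and `Binop` names from the example above; `fn_addr` and `option_adds_no_space` are hypothetical helpers.

```rust
use std::mem::size_of;

fn add(x: i32, y: i32) -> i32 {
    x + y
}

type Binop = fn(i32, i32) -> i32;

/// Hypothetical helper: the address of a function pointer as an integer.
fn fn_addr(f: Binop) -> usize {
    f as usize
}

/// The non-null guarantee lets `None` be represented without extra bytes.
fn option_adds_no_space() -> bool {
    size_of::<Option<Binop>>() == size_of::<Binop>()
}
```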
 ## Attributes on function pointer parameters
 
 r[type.fn-pointer.attributes]

diff --git a/src/types/never.md b/src/types/never.md
@@ -10,6 +10,9 @@ r[type.never.intro]
 The never type `!` is a type with no values, representing the result of
 computations that never complete.
 
+> [!NOTE]
+> Because `!` has no values, reading it from memory (or otherwise producing a value of the type at runtime) is immediate undefined behavior.
+
 r[type.never.coercion]
 Expressions of type `!` can be coerced into any other type.
 

diff --git a/src/types/numeric.md b/src/types/numeric.md
@@ -59,9 +59,46 @@ r[type.numeric.int.size.minimum]
 > pointer support is limited and may require explicit care and acknowledgment
 > from a library to support.
 
-## Bit validity
+## Representation
 
-r[type.numeric.validity]
+r[type.numeric.repr]
 
-For every numeric type, `T`, the bit validity of `T` is equivalent to the bit
-validity of `[u8; size_of::<T>()]`. An uninitialized byte is not a valid `u8`.
+r[type.numeric.repr.integer]
+Each value of an integer type is a whole number. For unsigned types, this is a positive integer or `0`. For signed types, this is a positive integer, a negative integer, or `0`.
+
+r[type.numeric.repr.integer-width]
+The range of values an integer type can represent depends on its signedness and its width, in bits. The width of type `uN` or `iN` is `N`. The width of type `usize` or `isize` is the value of the `target_pointer_width` property.
+
+> [!NOTE]
+> There are exactly `1<<N` unique values of an integer type of width `N`.
+> In particular, for an unsigned type, these values are in the range `0..(1<<N)`, and for a signed type, they are in the range `-(1<<(N-1))..(1<<(N-1))`, using Rust range syntax.
+
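The ranges in the note can be checked against the constants of the standard integer types. `expected_ranges` is a hypothetical helper that computes the inclusive minimum and maximum from the width `n` using `i128` arithmetic, so in this sketch it is only meaningful for widths from 1 to 64.

```rust
/// Hypothetical helper: `((unsigned_min, unsigned_max), (signed_min, signed_max))`
/// for width `n`, from the exclusive ranges `0..(1 << n)` and
/// `-(1 << (n - 1))..(1 << (n - 1))`. Valid for `1 <= n <= 64` here.
fn expected_ranges(n: u32) -> ((i128, i128), (i128, i128)) {
    let unsigned = (0, (1i128 << n) - 1);
    let signed = (-(1i128 << (n - 1)), (1i128 << (n - 1)) - 1);
    (unsigned, signed)
}
```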
+r[type.numeric.repr.unsigned]
+A value `i` of an unsigned integer type `U` is represented by a sequence of initialized bytes, where the `m`th successive byte according to the byte order of the platform is `(i >> (m*8)) as u8`, for each `m` from `0` up to, but not including, `size_of::<U>()`. None of the bytes produced by encoding an unsigned integer has a pointer fragment.
+
+> [!NOTE]
+> The two primary byte orders are `little` endian, where the bytes are ordered from lowest memory address to highest, and `big` endian, where the bytes are ordered from highest memory address to lowest.
+> The `cfg` predicate `target_endian` indicates the byte order.
+
+> [!WARNING]
+> On `little` endian, the order of bytes used to decode an integer type is the same as the natural order of a `u8` array: the `m`th successive byte is at index `m` of a same-sized `u8` array. On `big` endian, however, the order is reversed: the `m`th successive byte is at index `size_of::<T>() - 1 - m` of that array.
+
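The following sketch computes the successive bytes of a `u32` with the formula above and then places them at array indices according to `target_endian`; the result should agree with `u32::to_ne_bytes`. The helper names `successive_bytes` and `in_memory_order` are hypothetical.

```rust
/// Hypothetical helper: the `m`th successive byte of `i` is `(i >> (m * 8)) as u8`.
fn successive_bytes(i: u32) -> [u8; 4] {
    std::array::from_fn(|m| (i >> (m * 8)) as u8)
}

/// Hypothetical helper: places successive byte `m` at index `m` on little
/// endian, or at index `len - 1 - m` on big endian.
fn in_memory_order(i: u32) -> [u8; 4] {
    let bytes = successive_bytes(i);
    let len = bytes.len();
    std::array::from_fn(|index| {
        // Find which successive byte `m` lives at this array index.
        let m = if cfg!(target_endian = "little") { index } else { len - 1 - index };
        bytes[m]
    })
}
```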
+r[type.numeric.repr.signed]
+A value `i` of a signed integer type with width `N` is represented the same as the value of its unsigned counterpart type that is congruent to `i` modulo `2^N`.
+
+> [!NOTE]
+> This encoding of signed integers is known as two's complement.
+
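The congruence can be computed directly: `rem_euclid` yields the unique value in `0..2^16` congruent to `i`. This is a sketch with hypothetical helpers `congruent_u16` and `encode_i16`; an `as` cast between integer types of the same width performs the same reinterpretation.

```rust
/// Hypothetical helper: the `u16` value congruent to `i` modulo 2^16.
fn congruent_u16(i: i16) -> u16 {
    (i as i32).rem_euclid(1 << 16) as u16
}

/// Hypothetical helper: encodes `i` as the bytes of its congruent `u16`.
fn encode_i16(i: i16) -> [u8; 2] {
    congruent_u16(i).to_ne_bytes()
}
```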
+r[type.numeric.repr.float-width]
+Each floating-point type has a width. The type `fN` has a width of `N`.
+
+r[type.numeric.repr.float]
+A floating-point value is represented by the following decoding:
+* The byte sequence is decoded as the unsigned integer type with the same width as the floating-point type,
+* The resulting integer is decoded according to [IEEE 754-2019] into the format used for the type.
+
+
+r[type.numeric.repr.float-format]
+The [IEEE 754-2019] `binary32` format is used for `f32`, and the `binary64` format is used for `f64`. The set of values for each floating-point type is determined by the respective format.
+
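The two decoding steps for floating-point types map onto `from_ne_bytes` and `from_bits` in the standard library. `decode_f32` and `decode_f64` are hypothetical helper names; `f32::from_bits` and `f64::from_bits` interpret the integer as a `binary32` or `binary64` bit pattern respectively.

```rust
/// Hypothetical helper: bytes -> same-width unsigned integer -> binary32 value.
fn decode_f32(bytes: [u8; 4]) -> f32 {
    let bits = u32::from_ne_bytes(bytes);
    f32::from_bits(bits)
}

/// Hypothetical helper: bytes -> same-width unsigned integer -> binary64 value.
fn decode_f64(bytes: [u8; 8]) -> f64 {
    let bits = u64::from_ne_bytes(bytes);
    f64::from_bits(bits)
}
```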
+[IEEE 754-2019]: https://ieeexplore.ieee.org/document/8766229
diff --git a/src/types/pointer.md b/src/types/pointer.md
@@ -81,19 +81,60 @@ r[type.pointer.smart]
 
 The standard library contains additional 'smart pointer' types beyond references and raw pointers.
 
-## Bit validity
+## Pointer values and representation
 
-r[type.pointer.validity]
+r[type.pointer.value]
 
-r[type.pointer.validity.pointer-fragment]
-Despite pointers and references being similar to `usize`s in the machine code emitted on most platforms,
-the semantics of transmuting a reference or pointer type to a non-pointer type is currently undecided.
-Thus, it may not be valid to transmute a pointer or reference type, `P`, to a `[u8; size_of::<P>()]`.
+r[type.pointer.value.thin]
+Each thin pointer consists of an address and an optional [provenance][type.pointer.provenance]. The address identifies the byte the pointer points to. The provenance determines which bytes the pointer is allowed to access, and the allocation those bytes are within.
 
-r[type.pointer.validity.raw]
-For thin raw pointers (i.e., for `P = *const T` or `P = *mut T` for `T: Sized`),
-the inverse direction (transmuting from an integer or array of integers to `P`) is always valid.
-However, the pointer produced via such a transmutation may not be dereferenced (not even if `T` has size zero).
+> [!NOTE]
+> A pointer that does not have provenance may be called an invalid or dangling pointer.
+
+r[type.pointer.value.thin-repr]
+The representation of a value of a thin pointer is a sequence of initialized bytes with `u8` values given by the representation of its address as a value of type `usize`, and pointer fragments corresponding to its provenance, if present.
+
+r[type.pointer.value.thin-ref]
+A thin reference to `T` consists of a non-null, well-aligned address, and provenance for `size_of::<T>()` bytes starting from that address. The representation of a thin reference to `T` is the same as that of the pointer with the same address and provenance.
+
+> [!NOTE]
+> This is true for both shared and mutable references. There are additional constraints enforced by the aliasing model that are not yet fully decided.
+
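A minimal sketch of the address part of a thin reference: casting a reference to a raw pointer and then to `usize` yields its address, while the provenance is not carried by the integer. `address_of` and `is_well_aligned` are hypothetical helpers introduced for illustration.

```rust
use std::mem::align_of;

/// Hypothetical helper: the address of `r` as an integer (no provenance).
fn address_of<T>(r: &T) -> usize {
    r as *const T as usize
}

/// Hypothetical helper: checks the alignment requirement on a reference's address.
fn is_well_aligned<T>(r: &T) -> bool {
    address_of(r) % align_of::<T>() == 0
}
```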
+r[type.pointer.value.wide]
+A wide pointer or reference consists of a data pointer or reference, and a pointee-specific metadata value.
+
+r[type.pointer.value.wide-reference]
+The data pointer of a wide reference has a non-null address, well aligned for `align_of_val(self)`, and with provenance for `size_of_val(self)` bytes.
+
+r[type.pointer.value.wide-repr]
+A wide pointer or reference is represented the same as `struct WidePointer<M>{data: *mut (), metadata: M}` where `M` is the pointee metadata type, and the `data` and `metadata` fields are the corresponding parts of the pointer.
+
+> [!NOTE]
+> The `WidePointer` struct has no guarantees about layout, and has the default representation.
+> In particular, it is not guaranteed that you can write a struct type with the same layout as `WidePointer<M>`.
+
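For a slice, the metadata is the length. This sketch splits a `&[u32]` into its data pointer and length, and rebuilds a wide raw pointer with `std::ptr::slice_from_raw_parts`. The helpers `split_wide`, `rebuild_wide`, and `data_pointer_ok` are hypothetical, and the sketch says nothing about the field order or layout of the actual wide pointer.

```rust
use std::mem::align_of_val;

/// Hypothetical helper: data pointer and metadata (length) of a slice reference.
fn split_wide(s: &[u32]) -> (*const u32, usize) {
    (s.as_ptr(), s.len())
}

/// Hypothetical helper: rebuilds a wide raw pointer from its two parts.
fn rebuild_wide(data: *const u32, len: usize) -> *const [u32] {
    std::ptr::slice_from_raw_parts(data, len)
}

/// Hypothetical helper: the data pointer of a wide reference is non-null
/// and aligned for `align_of_val(s)`.
fn data_pointer_ok(s: &[u32]) -> bool {
    let (data, _len) = split_wide(s);
    !data.is_null() && data as usize % align_of_val(s) == 0
}
```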
+## Pointer Provenance
+
+r[type.pointer.provenance]
+
+r[type.pointer.provenance.intro]
+Pointer provenance is a term that refers to additional data carried by pointer values in Rust, beyond their address. When a pointer is stored in memory, its provenance is encoded in the pointer fragment part of each byte of the pointer.
+
+r[type.pointer.provenance.allocation]
+Whenever a pointer to a particular allocation is produced by using the reference or raw reference operators, or when a pointer is returned from an allocation function, the resulting pointer has provenance that refers to that allocation.
+
+> [!NOTE]
+> There is additional information encoded by provenance, but the exact scope of this information is not yet decided.
+
+r[type.pointer.provenance.dangling]
+A pointer is dangling if it has no provenance, or if it has provenance to an allocation that has since been deallocated. Any access through a dangling pointer, other than an access of size zero, is undefined behavior.
+
+> [!NOTE]
+> Allocations include local and static variables, as well as temporaries. Local variables and temporaries are deallocated when they go out of scope.
+
+> [!WARNING]
+> The above is necessary, but not sufficient, to avoid undefined behavior. The full requirements for pointer access are not yet decided.
+> A reference obtained in safe code is guaranteed to be valid for its usable lifetime, unless interfered with by unsafe code.
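
Accesses of size zero are the exception described above. This sketch uses `NonNull::dangling`, which returns a non-null, well-aligned pointer that must not be used for accesses of nonzero size, to read a `()` and to build an empty slice, as the `slice::from_raw_parts` documentation permits for zero-length slices. `zero_sized_read` and `empty_slice` are hypothetical helpers.

```rust
use std::ptr::NonNull;

/// Hypothetical helper: a read of size zero through a dangling pointer.
fn zero_sized_read() {
    let p: *const () = NonNull::<()>::dangling().as_ptr();
    // SAFETY: `()` has size zero, so this read touches no bytes;
    // the pointer is non-null and well aligned.
    unsafe { std::ptr::read(p) }
}

/// Hypothetical helper: an empty slice whose data pointer is dangling.
fn empty_slice() -> &'static [u64] {
    let p: *const u64 = NonNull::<u64>::dangling().as_ptr();
    // SAFETY: the pointer is non-null and aligned, and the length is zero,
    // so no bytes are ever accessed through it.
    unsafe { std::slice::from_raw_parts(p, 0) }
}
```

Reading a `u64` (a nonzero-sized access) through the same dangling pointer would be undefined behavior.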
 
 [Interior mutability]: ../interior-mutability.md
 [_Lifetime_]: ../trait-bounds.md

diff --git a/src/types/struct.md b/src/types/struct.md
@@ -29,6 +29,17 @@ A _unit-like struct_ type is like a struct type, except that it has no fields.
 The one value constructed by the associated [struct expression] is the only
 value that inhabits such a type.
 
+## Struct values and representation
+
+r[type.struct.value]
+
+r[type.struct.value.intro]
+A value of a struct type consists of a value for each of its fields.
+
+r[type.struct.value.encode-decode]
+When a value of a struct type is encoded, each field of the struct is encoded at its corresponding offset, and each byte that is not within a field of the struct is uninitialized.
+When a value of a struct type is decoded, each field of the struct is decoded from its corresponding offset. Each byte that is not within a field of the struct is ignored.
+
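The field offsets, and therefore the bytes that an encoded struct leaves uninitialized, can be inspected with `std::mem::offset_of!`. This sketch uses a hypothetical `repr(C)` struct `Header` so that the offsets follow the C layout rules; the padding offsets `1..4` expected below assume a target where `u32` has 4-byte alignment.

```rust
use std::mem::{offset_of, size_of};

/// Hypothetical `repr(C)` struct used only to show field offsets.
#[allow(dead_code)]
#[repr(C)]
struct Header {
    tag: u8,  // offset 0
    len: u32, // offset 4 where `u32` is 4-byte aligned; bytes 1..4 are padding
}

/// Hypothetical helper: byte offsets of `Header` not covered by any field.
/// Encoding a `Header` leaves exactly these bytes uninitialized.
fn padding_offsets() -> Vec<usize> {
    let tag = offset_of!(Header, tag)..offset_of!(Header, tag) + size_of::<u8>();
    let len = offset_of!(Header, len)..offset_of!(Header, len) + size_of::<u32>();
    (0..size_of::<Header>())
        .filter(|i| !tag.contains(i) && !len.contains(i))
        .collect()
}
```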
 [^structtype]: `struct` types are analogous to `struct` types in C, the
     *record* types of the ML family, or the *struct* types of the Lisp family.
 

diff --git a/src/types/textual.md b/src/types/textual.md
@@ -10,10 +10,13 @@ A value of type `char` is a [Unicode scalar value] (i.e. a code point that is
 not a surrogate), represented as a 32-bit unsigned word in the 0x0000 to 0xD7FF
 or 0xE000 to 0x10FFFF range.
 
-r[type.text.char-precondition]
-It is immediate [undefined behavior] to create a
-`char` that falls outside this range. A `[char]` is effectively a UCS-4 / UTF-32
-string of length 1.
+> [!NOTE]
+> It is immediate [undefined behavior] to create a
+> `char` that falls outside this range. A `[char]` is effectively a UCS-4 / UTF-32
+> string of length 1.
+
+r[type.text.char-repr]
+A value of type `char` is represented as the value of type `u32` with value equal to the code point that it represents.
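
A sketch of encoding and decoding `char` through its `u32` code point. `encode_char` and `decode_char` are hypothetical helpers; `char::from_u32` returns `None` for surrogates and for values above `0x10FFFF`, which are exactly the `u32` values that do not represent a `char`.

```rust
/// Hypothetical helper: encodes a `char` as the bytes of its code point.
fn encode_char(c: char) -> [u8; 4] {
    u32::from(c).to_ne_bytes()
}

/// Hypothetical helper: decodes bytes as a `char`, rejecting byte
/// sequences whose `u32` value is not a Unicode scalar value.
fn decode_char(bytes: [u8; 4]) -> Option<char> {
    char::from_u32(u32::from_ne_bytes(bytes))
}
```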
 
 r[type.text.str-value]
 A value of type `str` is represented the same way as `[u8]`, a slice of
@@ -26,18 +29,6 @@ r[type.text.str-unsized]
 Since `str` is a [dynamically sized type], it can only be instantiated through a
 pointer type, such as `&str`.
 
-## Layout and bit validity
-
-r[type.text.layout]
-
-r[type.layout.char-layout]
-`char` is guaranteed to have the same size and alignment as `u32` on all platforms.
-
-r[type.layout.char-validity]
-Every byte of a `char` is guaranteed to be initialized (in other words,
-`transmute::<char, [u8; size_of::<char>()]>(...)` is always sound -- but since
-some bit patterns are invalid `char`s, the inverse is not always sound).
-
 [Unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value
 [undefined behavior]: ../behavior-considered-undefined.md
 [dynamically sized type]: ../dynamically-sized-types.md
diff --git a/src/types/tuple.md b/src/types/tuple.md
@@ -49,6 +49,15 @@ Furthermore, various expressions will produce the unit value if there is no othe
 r[type.tuple.access]
 Tuple fields can be accessed by either a [tuple index expression] or [pattern matching].
 
+r[type.tuple.repr]
+The values and representation of a tuple type are the same as a [struct type][type.struct.value] with the same fields and layout.
+
+> [!NOTE]
+> In general, it is not guaranteed that any particular struct type will match the layout of a given tuple type.
+
+r[type.tuple.padding]
+A tuple has the same [padding bytes][type.union.value.padding] as a struct type with the same fields and layout.
+
 [^1]: Structural types are always equivalent if their internal types are equivalent.
       For a nominal version of tuples, see [tuple structs].