From 21ad05163dd9d956ea1ec884484376d5abb34b3f Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Fri, 25 Oct 2024 15:30:06 -0400 Subject: [PATCH 01/22] Add Values and Representation chapter --- src/SUMMARY.md | 1 + src/values.md | 175 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 176 insertions(+) create mode 100644 src/values.md diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 91f343b8d..8513bcfc4 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -99,6 +99,7 @@ - [Type coercions](type-coercions.md) - [Destructors](destructors.md) - [Lifetime elision](lifetime-elision.md) + - [Values and Representation](values.md) - [Special types and traits](special-types-and-traits.md) diff --git a/src/values.md b/src/values.md new file mode 100644 index 000000000..a6ca29dc1 --- /dev/null +++ b/src/values.md @@ -0,0 +1,175 @@ +# Values and Representation + +r[value] + +## Bytes + +r[value.byte] + +r[value.byte.intro] +The Most basic unit of Memory in Rust is a Byte. All values in Rust are computed from 0 or more bytes read from an allocation. + +> [!NOTE] +> While bytes in Rust are typically lowered to hardware bytes, they may contain additional values, +> such as being uninitialized, or storing part of a pointer. + +r[value.byte.init] +Each byte may be initialized, and contain a value of type `u8`, as well as an optional pointer fragment. + +r[value.byte.uninit] +Each byte may be uninitialized. + +> [!NOTE] +> Uninitialized bytes do not have a value and do not have a pointer fragment. + +## Value Encoding + +r[value.encoding] + +r[value.encoding.intro] +Each type in Rust has 0 or more values, which can have operations performed on them + +> [!NOTE] +> `0u8`, `1337i16`, and `Foo{bar: "baz"}` are all values + +r[value.encoding.op] +Each value of a type can be encoded into a sequence of bytes, and decoded from a sequence of bytes, which has a length equal to the size of the type. 
+The operation to encode or decode a value is determined by the representation of the type. + +> [!NOTE] +> Representation is related to, but is not the same property as, the layout of the type. + +r[value.encoding.decode] +If a value of type `T` is decoded from a sequence of bytes that does not correspond to a defined value, the behavior is undefined. If a value of type `T` is decoded from a sequence of bytes that contain pointer fragments, which are not used to represent the value, the pointer fragments are ignored. + +## Primitive Values + +r[value.primitive] + +r[value.primitive.integer] +Each value of an integer type is a whole number. For unsigned types, this is a positive integer or `0`. For signed types, this can either be a positive integer, negative integer, or `0`. + +r[value.primtive.integer-width] +The range of values an integer type can represent depends on its signedness and its width, in bits. The width of type `uN` or `iN` is `N`. The width of type `usize` or `isize` is the value of the `target_pointer_width` property. + +r[value.primitive.integer-range] +The range of an unsigned integer type of width `N` is between `0` and `1< [!NOTE] +> There are exactly `1<> (m*8)) as u8`, where `m` is between `0` and the size of `U`. None of the bytes produced by encoding an unsigned integer has a pointer fragment. + +> [!NOTE] +> The two primary byte orders are `little` endian, where the bytes are ordered from lowest memory address to highest, and `big` endian, where the bytes are ordered from highest memory address to lowest. +> The `cfg` predicate `target_endian` indicates the byte order + +> [!WARN] +> On `little` endian, the order of bytes used to decode an integer type is the same as the natural order of a `u8` array - that is, the `m` value corresponds with the `m` index into a same-sized `u8` array. On `big` endian, however, the order is the opposite of this order - that is, the `m` value corresponds with the `size_of::() - m` index in that array. 
+ +r[value.primitive.signed-repr] +A value `i` of a signed integer type with width `N` is represented the same as the corresponding value of the unsigned counterpart type which is congruent modulo `2^N`. + +r[value.primitive.char] +Each value of type `char` is a Unicode Scalar Value, between `U+0000` and `U+10FFFF` (excluding the surrogate range `U+D800` through `U+DFFF`). + +r[value.primitive.char-repr] +The representation of type `char` is the same as the representation of the `u32` corresponding to the Code Point Number encoding by the `char`. + +r[value.primitive.bool] +The two values of type `bool` are `true` and `false`. The representation of `true` is an initialized byte with value `0x01`, and the representation of `false` is an initialized byte with value `0x00`. Neither value is represented with a pointer fragment. + +## Pointer Value + +r[value.pointer] + +r[value.pointer.thin] +Each thin pointer consists of an address and an optional provenance. The address refers to which byte the pointer points to. The provenance refers to which bytes the pointer is allowed to access, and the allocation those bytes are within. + +> [!NOTE] +> A pointer that does not have a provenance may be called an invalid or dangling pointer. + +r[value.pointer.thin-repr] +The representation of a value of a thin pointer is a sequence of initialized bytes with `u8` values given by the representation of its address as a value of type `usize`, and pointer fragments corresponding to its provenance, if present. + +r[value.pointer.thin-ref] +A thin reference to `T` consists of a non-null, well aligned address, and provenance for `size_of::<T>()` bytes starting from that address. The representation of a thin reference to `T` is the same as the pointer with the same address and provenance. + +> [!NOTE] +> This is true for both shared and mutable references. There are additional constraints enforced by the aliasing model.
+ +r[value.pointer.fat] +A fat pointer or reference consists of a data pointer or reference, and a pointee-specific metadata value. + +r[value.pointer.fat-reference] +The data pointer of a fat reference has a non-null address, well aligned for `align_of_val(self)`, and with provenance for `size_of_val(self)` bytes. + +r[value.pointer.fat-representation] +A fat pointer or reference is represented the same as `struct FatPointer{data: *mut (), metadata: M}` where `M` is the pointee metadata type, and the `data` and `metadata` fields are the corresponding parts of the pointer. + +> [!NOTE] +> The `FatPointer` struct has no guarantees about layout, and has the default representation. + +r[value.pointer.fn] +A value of a function pointer type consists of an non-null address. A function pointer value is represented the same as an address represented as an unsigned integer type with the same width as the function pointer. + +> [!NOTE] +> Whether or not a function pointer value has provenance, and whether or not this provenance is represented as pointer fragments, is not yet decided. + +## Aggregate Values + +r[value.aggregate] + +r[value.aggregate.value-bytes] +A byte `b` in the representation of an aggregate is a value byte if there exists a field of that aggregate such that: +* The field has some type `T`, +* The offset of that field `o` is such that `b` falls at an offset in `o..(o+size_of::<T>())`, +* Either `T` is a primitive type or the offset of `b` within the field is a value byte in the representation of `T`. + +> [!NOTE] +> A byte in a union is a value byte if it is a value byte in *any* field. + +r[value.aggregate.padding] +Every byte in an aggregate which is not a value byte is a padding byte. + +r[value.aggregate.struct] +A value of a struct type consists of the values of each of its fields. +The representation of such a struct contains the representation of the value of each field at its corresponding offset.
+ +r[value.aggregate.union] +A value of a union type consists of a sequence of bytes, corresponding to each value byte. The value bytes of a union are represented exactly. + +> [!NOTE] +> When a union value is constructed or a field is read/written to, the value of that field is encoded or decoded appropriately. + +r[value.aggregate.padding-uninit] +When a value of an aggregate is encoded, each padding byte is left as uninit + +> [!NOTE] +> It is valid for padding bytes to hold a value other than uninit when decoded, and these bytes are ignored when decoding an aggregate. + +r[value.aggregate.tuple-array] +The fields of a tuple or an array are the elements of that tuple or array. + +## Enum Values + +r[value.enum] + +r[value.enum.intro] +An enum value corresponds to exactly one variant of the enum, and consists of the fields of that variant + +> [!NOTE] +> An enum with no variants therefore has no values. + +r[value.enum.variant-padding] +A byte is a padding byte in a variant `V` if the byte is not used for computing the discriminant, and the byte would be a padding byte in a struct consisting of the fields of the variant at the same offsets. + +r[value.enum.value-padding] +A byte is a padding byte of an enum if it is a padding byte in each variant of the enum. A byte that is not a padding byte of an enum is a value byte. + +r[value.enum.repr] +The representation of a value of an enum type includes the representation of each field of the variant at the appropriate offsets. When encoding a value of an enum type, each byte which is a padding byte in the variant is set to uninit. In the case of a [`repr(C)`][layout.repr.c.adt] or a [primitive-repr][layout.repr.primitive.adt] enum, the discriminant of the variant is represented as though by the appropriate integer type stored at offset 0. 
+ +> [!NOTE] +> Most `repr(Rust)` enums will also store a discriminant in the representation of the enum, but the exact placement or type of the discriminant is unspecified, as is the value that represents each variant. From 0e3d2efcf24709ccb04ed3c186d54f801c0d2344 Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Fri, 25 Oct 2024 15:43:25 -0400 Subject: [PATCH 02/22] Specify representation of floating-point types --- src/values.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/src/values.md b/src/values.md index a6ca29dc1..78229f58a 100644 --- a/src/values.md +++ b/src/values.md @@ -80,6 +80,17 @@ The representation of type `char` is the same as the representation of the `u32` r[value.primitive.bool] The two values of type `bool` are `true` and `false`. The representation of `true` is an initialized byte with value `0x01`, and the representation of `false` is an initialized byte with value `0x00`. Neither value is represented with a pointer fragment. +r[value.primitive.float] +A floating-point value consists of either a rational number, which is within the range and precision dictated by the type, an infinity, or a NaN value. + +r[value.primitive.float-repr] +A floating-point value is represented the same as a value of the unsigned integer type with the same width given by its [IEEE 754-2019] encoding. + +r[value.primitive.float-format] +The [IEEE 754-2019] `binary32` format is used for `f32`, and the `binary64` format is used for `f64`. 
+ +[IEEE 754-2019]: https://ieeexplore.ieee.org/document/8766229 + ## Pointer Value r[value.pointer] From c24fb7cd7c736eaeb6d27dfb844eb9e81631c25b Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Fri, 25 Oct 2024 15:47:29 -0400 Subject: [PATCH 03/22] Fix lines must not end with spaces --- src/values.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/values.md b/src/values.md index 78229f58a..1d28f156f 100644 --- a/src/values.md +++ b/src/values.md @@ -84,7 +84,7 @@ r[value.primitive.float] A floating-point value consists of either a rational number, which is within the range and precision dictated by the type, an infinity, or a NaN value. r[value.primitive.float-repr] -A floating-point value is represented the same as a value of the unsigned integer type with the same width given by its [IEEE 754-2019] encoding. +A floating-point value is represented the same as a value of the unsigned integer type with the same width given by its [IEEE 754-2019] encoding. r[value.primitive.float-format] The [IEEE 754-2019] `binary32` format is used for `f32`, and the `binary64` format is used for `f64`. 
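Illustration (not part of the patch series): a minimal sketch of the floating-point representation rule stated in the patch above, assuming only the standard library's `to_bits`, `from_bits`, and `to_ne_bytes` methods. The helper names `f32_bytes` and `f32_bits_bytes` are hypothetical and exist only for this example.

```rust
/// Bytes of `x` as stored in memory, in the target's byte order.
fn f32_bytes(x: f32) -> [u8; 4] {
    x.to_ne_bytes()
}

/// Bytes of the `u32` that carries the binary32 encoding of `x`,
/// in the target's byte order.
fn f32_bits_bytes(x: f32) -> [u8; 4] {
    x.to_bits().to_ne_bytes()
}

fn main() {
    // 1.0 in binary32: sign 0, biased exponent 127, zero mantissa.
    assert_eq!(1.0f32.to_bits(), 0x3F80_0000);

    // The float's bytes match the bytes of the same-width unsigned integer.
    assert_eq!(f32_bytes(1.0), f32_bits_bytes(1.0));

    // -2.0 in binary64: sign 1, biased exponent 1024, zero mantissa.
    let y: f64 = -2.0;
    assert_eq!(y.to_bits(), 0xC000_0000_0000_0000);
    assert_eq!(y.to_ne_bytes(), y.to_bits().to_ne_bytes());

    // Infinities and NaNs are values too, each with a defined encoding.
    assert_eq!(f32::from_bits(0x7F80_0000), f32::INFINITY);
    assert!(f32::from_bits(0x7FC0_0000).is_nan());
}
```

The comparison uses native-endian bytes on both sides, so the check holds on little-endian and big-endian targets alike.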
From ab0702b7a43c5c26abdbd9441fab0d2de9eec67f Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Mon, 11 Nov 2024 11:47:54 -0500 Subject: [PATCH 04/22] Apply requested changes from PR Reviews --- src/SUMMARY.md | 2 +- src/values.md | 16 ++++++++-------- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 8513bcfc4..ace93715a 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -99,7 +99,7 @@ - [Type coercions](type-coercions.md) - [Destructors](destructors.md) - [Lifetime elision](lifetime-elision.md) - - [Values and Representation](values.md) + - [Values and representation](values.md) - [Special types and traits](special-types-and-traits.md) diff --git a/src/values.md b/src/values.md index 1d28f156f..0230d60e2 100644 --- a/src/values.md +++ b/src/values.md @@ -1,4 +1,4 @@ -# Values and Representation +# Values and representation r[value] @@ -110,17 +110,17 @@ A thin reference to `T` consists of a non-null, well aligned address, and proven > [!NOTE] > This is true for both shared and mutable references. There are additional constraints enforced by the aliasing model. -r[value.pointer.fat] -A fat pointer or reference consists of a data pointer or reference, and a pointee-specific metadata value. +r[value.pointer.wide] +A wide pointer or reference consists of a data pointer or reference, and a pointee-specific metadata value. -r[value.pointer.fat-reference] -The data pointer of a fat reference has a non-null address, well aligned for `align_of_val(self)`, and with provenance for `size_of_val(self)` bytes. +r[value.pointer.wide-reference] +The data pointer of a wide reference has a non-null address, well aligned for `align_of_val(self)`, and with provenance for `size_of_val(self)` bytes. 
-r[value.pointer.fat-representation] -A fat pointer or reference is represented the same as `struct FatPointer{data: *mut (), metadata: M}` where `M` is the pointee metadata type, and the `data` and `metadata` fields are the corresponding parts of the pointer. +r[value.pointer.wide-representation] +A wide pointer or reference is represented the same as `struct WidePointer{data: *mut (), metadata: M}` where `M` is the pointee metadata type, and the `data` and `metadata` fields are the corresponding parts of the pointer. > [!NOTE] -> The `FatPointer` struct has no guarantees about layout, and has the default representation. +> The `WidePointer` struct has no guarantees about layout, and has the default representation. r[value.pointer.fn] A value of a function pointer type consists of an non-null address. A function pointer value is represented the same as an address represented as an unsigned integer type with the same width as the function pointer. From c79edfbea8b457ea98f8a5a966162c12c14217b1 Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Mon, 11 Nov 2024 11:51:12 -0500 Subject: [PATCH 05/22] Update src/values.md Co-authored-by: Ruby Lazuli --- src/values.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/values.md b/src/values.md index 0230d60e2..63386c910 100644 --- a/src/values.md +++ b/src/values.md @@ -7,7 +7,7 @@ r[value] r[value.byte] r[value.byte.intro] -The Most basic unit of Memory in Rust is a Byte. All values in Rust are computed from 0 or more bytes read from an allocation. +The most basic unit of memory in Rust is a byte. All values in Rust are computed from 0 or more bytes read from an allocation. > [!NOTE] > While bytes in Rust are typically lowered to hardware bytes, they may contain additional values, From 42f626f806b14605b4861df2981ba0a3d63b27ff Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Mon, 11 Nov 2024 12:07:18 -0500 Subject: [PATCH 06/22] Add section giving a brief explainer of provenance. 
--- src/values.md | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/src/values.md b/src/values.md index 63386c910..72143a31b 100644 --- a/src/values.md +++ b/src/values.md @@ -42,6 +42,29 @@ The operation to encode or decode a value is determined by the representation of r[value.encoding.decode] If a value of type `T` is decoded from a sequence of bytes that does not correspond to a defined value, the behavior is undefined. If a value of type `T` is decoded from a sequence of bytes that contain pointer fragments, which are not used to represent the value, the pointer fragments are ignored. +## Pointer Provenance + +r[value.provenance] + +r[value.provenance.intro] +Pointer Provenance is a term that refers to additional data carried by pointer values in Rust, beyond its address. When stored in memory, Provenance is encoded in the Pointer Fragment part of each byte of the pointer. + +r[value.provenance.allocation] +Whenever a pointer to a particular allocation is produced by using the reference or raw reference operators, or when a pointer is returned from an allocation function, the resulting pointer has provenance that refers to that allocation. + +> [!NOTE] +> There is additional information encoded by provenance, but the exact scope of this information is not yet decided. + +r[value.provenance.dangling] +A pointer is dangling if it has no provenance, or if it has provenance to an allocation that has since been deallocated. An access, except for an access of size zero, using a dangling pointer, is undefined behavior. + +> [!NOTE] +> Allocations include local and static variables, as well as temporaries. Local Variables and Temporaries are deallocated when they go out of scope. + +> [!WARN] +> The above is necessary, but not sufficient, to avoid undefined behavior. The full requirements for pointer access is not yet decided. 
+> A reference obtained in safe code is guaranteed to be valid for its usable lifetime, unless interfered with by unsafe code. + ## Primitive Values r[value.primitive] From 66b7fc694c91de5251163114c7f3dfcdcddd8e06 Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Mon, 11 Nov 2024 12:13:57 -0500 Subject: [PATCH 07/22] Fix "Line Must End with Spaces" --- src/values.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/values.md b/src/values.md index 72143a31b..d93a3b43b 100644 --- a/src/values.md +++ b/src/values.md @@ -59,7 +59,7 @@ r[value.provenance.dangling] A pointer is dangling if it has no provenance, or if it has provenance to an allocation that has since been deallocated. An access, except for an access of size zero, using a dangling pointer, is undefined behavior. > [!NOTE] -> Allocations include local and static variables, as well as temporaries. Local Variables and Temporaries are deallocated when they go out of scope. +> Allocations include local and static variables, as well as temporaries. Local Variables and Temporaries are deallocated when they go out of scope. > [!WARN] > The above is necessary, but not sufficient, to avoid undefined behavior. The full requirements for pointer access is not yet decided. From 290efc1ff70142ec272769892a5b2d24903bce04 Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Thu, 21 Nov 2024 15:17:41 -0500 Subject: [PATCH 08/22] Move value definitions to appropriate chapters under `types`. 
--- src/memory-model.md | 46 ++++++++++- src/types/boolean.md | 7 +- src/types/function-pointer.md | 7 ++ src/types/numeric.md | 35 ++++++++ src/types/pointer.md | 62 +++++++++++--- src/types/textual.md | 11 ++- src/values.md | 149 ---------------------------------- 7 files changed, 148 insertions(+), 169 deletions(-) diff --git a/src/memory-model.md b/src/memory-model.md index 404240db8..adac052fe 100644 --- a/src/memory-model.md +++ b/src/memory-model.md @@ -1,5 +1,45 @@ # Memory model -Rust does not yet have a defined memory model. Various academics and industry professionals -are working on various proposals, but for now, this is an under-defined place -in the language. +r[memory] + +The Memory Model of Rust is incomplete and not fully decided. The following is some of the detail worked out so far. + +## Bytes + +r[memory.byte] + +r[memory.byte.intro] +The most basic unit of memory in Rust is a byte. All values in Rust are computed from 0 or more bytes read from an allocation. + +> [!NOTE] +> While bytes in Rust are typically lowered to hardware bytes, they may contain additional values, +> such as being uninitialized, or storing part of a pointer. + +r[memory.byte.init] +Each byte may be initialized, and contain a value of type `u8`, as well as an optional pointer fragment. When present, the pointer fragment carries [provenance][type.pointer.provenance] information. + +r[memory.byte.uninit] +Each byte may be uninitialized. + +> [!NOTE] +> Uninitialized bytes do not have a value and do not have a pointer fragment. + +## Value Encoding + +r[memory.encoding] + +r[memory.encoding.intro] +Each type in Rust has 0 or more values, which can have operations performed on them + +> [!NOTE] +> `0u8`, `1337i16`, and `Foo{bar: "baz"}` are all values + +r[memory.encoding.op] +Each value of a type can be encoded into a sequence of bytes, and decoded from a sequence of bytes, which has a length equal to the size of the type. 
+The operation to encode or decode a value is determined by the representation of the type. + +> [!NOTE] +> Representation is related to, but is not the same property as, the layout of the type. + +r[memory.encoding.decode] +If a value of type `T` is decoded from a sequence of bytes that does not correspond to a defined value, the behavior is undefined. If a value of type `T` is decoded from a sequence of bytes that contain pointer fragments, which are not used to represent the value, the pointer fragments are ignored. diff --git a/src/types/boolean.md b/src/types/boolean.md index 10c6e5de1..1ebc73588 100644 --- a/src/types/boolean.md +++ b/src/types/boolean.md @@ -21,9 +21,10 @@ r[type.bool.layout] An object with the boolean type has a [size and alignment] of 1 each. r[type.bool.repr] -The value false has the bit pattern `0x00` and the value true has the bit pattern -`0x01`. It is [undefined behavior] for an object with the boolean type to have -any other bit pattern. +A `bool` is represented as a single initialized byte with a value of `0x00` corresponding to `false` and a value of `0x01` corresponding to `true`. This byte does not have a pointer fragment. + +> [!NOTE] +> No other representations are valid for `bool`. Undefined Behaviour occurs when any other byte is read as type `bool`. r[type.bool.usage] The boolean type is the type of many operands in various [expressions]: diff --git a/src/types/function-pointer.md b/src/types/function-pointer.md index d7950b159..0d388f61a 100644 --- a/src/types/function-pointer.md +++ b/src/types/function-pointer.md @@ -62,6 +62,13 @@ let bo: Binop = add; x = bo(5,7); ``` +r[type.fn-pointer.value] +A value of a function pointer type consists of an non-null address. A function pointer value is represented the same as an address represented as an unsigned integer type with the same width as the function pointer. 
+ +> [!NOTE] +> Whether or not a function pointer value has provenance, and whether or not this provenance is represented as pointer fragments, is not yet decided. + + ## Attributes on function pointer parameters r[type.fn-pointer.attributes] diff --git a/src/types/numeric.md b/src/types/numeric.md index 88178d123..f2bab4624 100644 --- a/src/types/numeric.md +++ b/src/types/numeric.md @@ -29,6 +29,7 @@ Type | Minimum | Maximum `i128` | -(2127) | 2127-1 + ## Floating-point types r[type.numeric.float] @@ -65,3 +66,37 @@ r[type.numeric.validity] For every numeric type, `T`, the bit validity of `T` is equivalent to the bit validity of `[u8; size_of::()]`. An uninitialized byte is not a valid `u8`. + +## Representation + +r[type.numeric.repr] + +r[type.numeric.repr.integer] +Each value of an integer type is a whole number. For unsigned types, this is a positive integer or `0`. For signed types, this can either be a positive integer, negative integer, or `0`. + +r[type.numeric.repr.integer-width] +The range of values an integer type can represent depends on its signedness and its width, in bits. The width of type `uN` or `iN` is `N`. The width of type `usize` or `isize` is the value of the `target_pointer_width` property. + +> [!NOTE] +> There are exactly `1<> (m*8)) as u8`, where `m` is between `0` and the size of `U`. None of the bytes produced by encoding an unsigned integer has a pointer fragment. + +> [!NOTE] +> The two primary byte orders are `little` endian, where the bytes are ordered from lowest memory address to highest, and `big` endian, where the bytes are ordered from highest memory address to lowest. +> The `cfg` predicate `target_endian` indicates the byte order + +> [!WARN] +> On `little` endian, the order of bytes used to decode an integer type is the same as the natural order of a `u8` array - that is, the `m` value corresponds with the `m` index into a same-sized `u8` array. 
On `big` endian, however, the order is the opposite of this order - that is, the `m` value corresponds with the `size_of::<U>() - m` index in that array. + +r[type.numeric.repr.signed] +A value `i` of a signed integer type with width `N` is represented the same as the corresponding value of the unsigned counterpart type which is congruent modulo `2^N`. + +r[type.numeric.repr.float] +A floating-point value is represented the same as a value of the unsigned integer type with the same width given by its [IEEE 754-2019] encoding. + +r[type.numeric.repr.float-format] +The [IEEE 754-2019] `binary32` format is used for `f32`, and the `binary64` format is used for `f64`. + +[IEEE 754-2019]: https://ieeexplore.ieee.org/document/8766229 \ No newline at end of file diff --git a/src/types/pointer.md b/src/types/pointer.md index 0f24d6bce..1992f550f 100644 --- a/src/types/pointer.md +++ b/src/types/pointer.md @@ -81,19 +81,61 @@ r[type.pointer.smart] The standard library contains additional 'smart pointer' types beyond references and raw pointers. -## Bit validity +## Pointer values and representation -r[type.pointer.validity] +r[type.pointer.value] -r[type.pointer.validity.pointer-fragment] -Despite pointers and references being similar to `usize`s in the machine code emitted on most platforms, -the semantics of transmuting a reference or pointer type to a non-pointer type is currently undecided. -Thus, it may not be valid to transmute a pointer or reference type, `P`, to a `[u8; size_of::<P>()]`. +r[type.pointer.value.thin] +Each thin pointer consists of an address and an optional [provenance][type.pointer.provenance]. The address refers to which byte the pointer points to. The provenance refers to which bytes the pointer is allowed to access, and the allocation those bytes are within. + +> [!NOTE] +> A pointer that does not have a provenance may be called an invalid or dangling pointer. + +r[type.pointer.value.thin-repr] +The representation of a value of a thin pointer is a sequence of initialized bytes with `u8` values given by the representation of its address as a value of type `usize`, and pointer fragments corresponding to its provenance, if present. + +r[type.pointer.value.thin-ref] +A thin reference to `T` consists of a non-null, well aligned address, and provenance for `size_of::<T>()` bytes starting from that address. The representation of a thin reference to `T` is the same as the pointer with the same address and provenance. + +> [!NOTE] +> This is true for both shared and mutable references. There are additional constraints enforced by the aliasing model that are not yet fully decided. + +r[type.pointer.value.wide] +A wide pointer or reference consists of a data pointer or reference, and a pointee-specific metadata value. + +r[type.pointer.value.wide-reference] +The data pointer of a wide reference has a non-null address, well aligned for `align_of_val(self)`, and with provenance for `size_of_val(self)` bytes. + +r[type.pointer.value.wide-representation] +A wide pointer or reference is represented the same as `struct WidePointer{data: *mut (), metadata: M}` where `M` is the pointee metadata type, and the `data` and `metadata` fields are the corresponding parts of the pointer. + +> [!NOTE] +> The `WidePointer` struct has no guarantees about layout, and has the default representation.
+ + +## Pointer Provenance + +r[type.pointer.provenance] + +r[type.pointer.provenance.intro] +Pointer Provenance is a term that refers to additional data carried by pointer values in Rust, beyond its address. When stored in memory, Provenance is encoded in the Pointer Fragment part of each byte of the pointer. + +r[type.pointer.provenance.allocation] +Whenever a pointer to a particular allocation is produced by using the reference or raw reference operators, or when a pointer is returned from an allocation function, the resulting pointer has provenance that refers to that allocation. + +> [!NOTE] +> There is additional information encoded by provenance, but the exact scope of this information is not yet decided. + +r[type.pointer.provenance.dangling] +A pointer is dangling if it has no provenance, or if it has provenance to an allocation that has since been deallocated. An access, except for an access of size zero, using a dangling pointer, is undefined behavior. + +> [!NOTE] +> Allocations include local and static variables, as well as temporaries. Local Variables and Temporaries are deallocated when they go out of scope. + +> [!WARN] +> The above is necessary, but not sufficient, to avoid undefined behavior. The full requirements for pointer access is not yet decided. +> A reference obtained in safe code is guaranteed to be valid for its usable lifetime, unless interfered with by unsafe code. -r[type.pointer.validity.raw] -For thin raw pointers (i.e., for `P = *const T` or `P = *mut T` for `T: Sized`), -the inverse direction (transmuting from an integer or array of integers to `P`) is always valid. -However, the pointer produced via such a transmutation may not be dereferenced (not even if `T` has size zero). 
[Interior mutability]: ../interior-mutability.md [_Lifetime_]: ../trait-bounds.md diff --git a/src/types/textual.md b/src/types/textual.md index 294c791fd..1348746b5 100644 --- a/src/types/textual.md +++ b/src/types/textual.md @@ -10,10 +10,13 @@ A value of type `char` is a [Unicode scalar value] (i.e. a code point that is not a surrogate), represented as a 32-bit unsigned word in the 0x0000 to 0xD7FF or 0xE000 to 0x10FFFF range. -r[type.text.char-precondition] -It is immediate [undefined behavior] to create a -`char` that falls outside this range. A `[char]` is effectively a UCS-4 / UTF-32 -string of length 1. +> [!NOTE] +> It is immediate [undefined behavior] to create a +> `char` that falls outside this range. A `[char]` is effectively a UCS-4 / UTF-32 +> string of length 1. + +r[type.text.char-repr] +A value of type `char` is represented as the value of type `u32` with value equal to the code point that it represents. r[type.text.str-value] A value of type `str` is represented the same way as `[u8]`, a slice of diff --git a/src/values.md b/src/values.md index d93a3b43b..3af8c602b 100644 --- a/src/values.md +++ b/src/values.md @@ -2,155 +2,6 @@ r[value] -## Bytes - -r[value.byte] - -r[value.byte.intro] -The most basic unit of memory in Rust is a byte. All values in Rust are computed from 0 or more bytes read from an allocation. - -> [!NOTE] -> While bytes in Rust are typically lowered to hardware bytes, they may contain additional values, -> such as being uninitialized, or storing part of a pointer. - -r[value.byte.init] -Each byte may be initialized, and contain a value of type `u8`, as well as an optional pointer fragment. - -r[value.byte.uninit] -Each byte may be uninitialized. - -> [!NOTE] -> Uninitialized bytes do not have a value and do not have a pointer fragment. 
- -## Value Encoding - -r[value.encoding] - -r[value.encoding.intro] -Each type in Rust has 0 or more values, which can have operations performed on them - -> [!NOTE] -> `0u8`, `1337i16`, and `Foo{bar: "baz"}` are all values - -r[value.encoding.op] -Each value of a type can be encoded into a sequence of bytes, and decoded from a sequence of bytes, which has a length equal to the size of the type. -The operation to encode or decode a value is determined by the representation of the type. - -> [!NOTE] -> Representation is related to, but is not the same property as, the layout of the type. - -r[value.encoding.decode] -If a value of type `T` is decoded from a sequence of bytes that does not correspond to a defined value, the behavior is undefined. If a value of type `T` is decoded from a sequence of bytes that contain pointer fragments, which are not used to represent the value, the pointer fragments are ignored. - -## Pointer Provenance - -r[value.provenance] - -r[value.provenance.intro] -Pointer Provenance is a term that refers to additional data carried by pointer values in Rust, beyond its address. When stored in memory, Provenance is encoded in the Pointer Fragment part of each byte of the pointer. - -r[value.provenance.allocation] -Whenever a pointer to a particular allocation is produced by using the reference or raw reference operators, or when a pointer is returned from an allocation function, the resulting pointer has provenance that refers to that allocation. - -> [!NOTE] -> There is additional information encoded by provenance, but the exact scope of this information is not yet decided. - -r[value.provenance.dangling] -A pointer is dangling if it has no provenance, or if it has provenance to an allocation that has since been deallocated. An access, except for an access of size zero, using a dangling pointer, is undefined behavior. - -> [!NOTE] -> Allocations include local and static variables, as well as temporaries. 
Local Variables and Temporaries are deallocated when they go out of scope. - -> [!WARN] -> The above is necessary, but not sufficient, to avoid undefined behavior. The full requirements for pointer access is not yet decided. -> A reference obtained in safe code is guaranteed to be valid for its usable lifetime, unless interfered with by unsafe code. - -## Primitive Values - -r[value.primitive] - -r[value.primitive.integer] -Each value of an integer type is a whole number. For unsigned types, this is a positive integer or `0`. For signed types, this can either be a positive integer, negative integer, or `0`. - -r[value.primtive.integer-width] -The range of values an integer type can represent depends on its signedness and its width, in bits. The width of type `uN` or `iN` is `N`. The width of type `usize` or `isize` is the value of the `target_pointer_width` property. - -r[value.primitive.integer-range] -The range of an unsigned integer type of width `N` is between `0` and `1< [!NOTE] -> There are exactly `1<> (m*8)) as u8`, where `m` is between `0` and the size of `U`. None of the bytes produced by encoding an unsigned integer has a pointer fragment. - -> [!NOTE] -> The two primary byte orders are `little` endian, where the bytes are ordered from lowest memory address to highest, and `big` endian, where the bytes are ordered from highest memory address to lowest. -> The `cfg` predicate `target_endian` indicates the byte order - -> [!WARN] -> On `little` endian, the order of bytes used to decode an integer type is the same as the natural order of a `u8` array - that is, the `m` value corresponds with the `m` index into a same-sized `u8` array. On `big` endian, however, the order is the opposite of this order - that is, the `m` value corresponds with the `size_of::() - m` index in that array. 
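The byte-order note above can be illustrated with a minimal sketch. It assumes the shift-and-truncate formulation `(x >> (m * 8)) as u8` and compares it with the standard `to_le_bytes` and `to_ne_bytes` methods; the helper names `bytes_lsb_first` and `bytes_in_memory` are hypothetical and not part of the patch.

```rust
/// Splits a `u32` into bytes, least significant byte first,
/// using `(x >> (m * 8)) as u8` for `m` in `0..4`.
fn bytes_lsb_first(x: u32) -> [u8; 4] {
    let mut out = [0u8; 4];
    for m in 0..4 {
        out[m] = (x >> (m * 8)) as u8;
    }
    out
}

/// Returns the bytes of `x` in the order they occupy memory on this target.
fn bytes_in_memory(x: u32) -> [u8; 4] {
    x.to_ne_bytes()
}

fn main() {
    let x = 0x1122_3344u32;
    let lsb_first = bytes_lsb_first(x);

    // Least-significant-first is exactly the little-endian order.
    assert_eq!(lsb_first, x.to_le_bytes());

    // On a big-endian target the in-memory order is the reverse.
    let mut expected = lsb_first;
    if cfg!(target_endian = "big") {
        expected.reverse();
    }
    assert_eq!(bytes_in_memory(x), expected);

    println!("{:02x?}", bytes_in_memory(x));
}
```

Code that needs a fixed byte order regardless of target would typically use `to_le_bytes` or `to_be_bytes` rather than relying on the native order.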
- -r[value.primitive.signed-repr] -A value `i` of a signed integer type with width `N` is represented the same as the corresponding value of the unsigned counterpart type which is congruent modulo `2^N`. - -r[value.primitive.char] -Each value of type `char` is a Unicode Scalar Value, between `U+0000` and `U+10FFFF` (excluding the surrogate range `U+D800` through `U+DFFF`). - -r[value.primitive.char-repr] -The representation of type `char` is the same as the representation of the `u32` corresponding to the Code Point Number encoding by the `char`. - -r[value.primitive.bool] -The two values of type `bool` are `true` and `false`. The representation of `true` is an initialized byte with value `0x01`, and the representation of `false` is an initialized byte with value `0x00`. Neither value is represented with a pointer fragment. - -r[value.primitive.float] -A floating-point value consists of either a rational number, which is within the range and precision dictated by the type, an infinity, or a NaN value. - -r[value.primitive.float-repr] -A floating-point value is represented the same as a value of the unsigned integer type with the same width given by its [IEEE 754-2019] encoding. - -r[value.primitive.float-format] -The [IEEE 754-2019] `binary32` format is used for `f32`, and the `binary64` format is used for `f64`. - -[IEEE 754-2019]: https://ieeexplore.ieee.org/document/8766229 - -## Pointer Value - -r[value.pointer] - -r[value.pointer.thin] -Each thin pointer consists of an address and an optional provenance. The address refers to which byte the pointer points to. The provenance refers to which bytes the pointer is allowed to access, and the allocation those bytes are within. - -> [!NOTE] -> A pointer that does not have a provenance may be called an invalid or dangling pointer. 
- -r[value.pointer.thin-repr] -The representation of a value of a thin pointer is a sequence of initialized bytes with `u8` values given by the representation of its address as a value of type `usize`, and pointer fragments corresponding to its provenance, if present. - -r[value.pointer.thin-ref] -A thin reference to `T` consists of a non-null, well aligned address, and provenance for `size_of::()` bytes starting from that address. The representation of a thin reference to `T` is the same as the pointer with the same address and provenance. - -> [!NOTE] -> This is true for both shared and mutable references. There are additional constraints enforced by the aliasing model. - -r[value.pointer.wide] -A wide pointer or reference consists of a data pointer or reference, and a pointee-specific metadata value. - -r[value.pointer.wide-reference] -The data pointer of a wide reference has a non-null address, well aligned for `align_of_val(self)`, and with provenance for `size_of_val(self)` bytes. - -r[value.pointer.wide-representation] -A wide pointer or reference is represented the same as `struct WidePointer{data: *mut (), metadata: M}` where `M` is the pointee metadata type, and the `data` and `metadata` fields are the corresponding parts of the pointer. - -> [!NOTE] -> The `WidePointer` struct has no guarantees about layout, and has the default representation. - -r[value.pointer.fn] -A value of a function pointer type consists of an non-null address. A function pointer value is represented the same as an address represented as an unsigned integer type with the same width as the function pointer. - -> [!NOTE] -> Whether or not a function pointer value has provenance, and whether or not this provenance is represented as pointer fragments, is not yet decided. 
- ## Aggregate Values r[value.aggregate] From d565328d5c5659bd971c2338317f45eaa5c1378b Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Thu, 21 Nov 2024 15:19:07 -0500 Subject: [PATCH 09/22] Remove double line break issues --- src/types/function-pointer.md | 1 - src/types/numeric.md | 1 - src/types/pointer.md | 2 -- 3 files changed, 4 deletions(-) diff --git a/src/types/function-pointer.md b/src/types/function-pointer.md index 0d388f61a..466167650 100644 --- a/src/types/function-pointer.md +++ b/src/types/function-pointer.md @@ -68,7 +68,6 @@ A value of a function pointer type consists of an non-null address. A function p > [!NOTE] > Whether or not a function pointer value has provenance, and whether or not this provenance is represented as pointer fragments, is not yet decided. - ## Attributes on function pointer parameters r[type.fn-pointer.attributes] diff --git a/src/types/numeric.md b/src/types/numeric.md index f2bab4624..08a6808e9 100644 --- a/src/types/numeric.md +++ b/src/types/numeric.md @@ -29,7 +29,6 @@ Type | Minimum | Maximum `i128` | -(2127) | 2127-1 - ## Floating-point types r[type.numeric.float] diff --git a/src/types/pointer.md b/src/types/pointer.md index 1992f550f..1cda4a404 100644 --- a/src/types/pointer.md +++ b/src/types/pointer.md @@ -112,7 +112,6 @@ A wide pointer or reference is represented the same as `struct WidePointer{da > [!NOTE] > The `WidePointer` struct has no guarantees about layout, and has the default representation. - ## Pointer Provenance r[type.pointer.provenance] @@ -136,7 +135,6 @@ A pointer is dangling if it has no provenance, or if it has provenance to an all > The above is necessary, but not sufficient, to avoid undefined behavior. The full requirements for pointer access is not yet decided. > A reference obtained in safe code is guaranteed to be valid for its usable lifetime, unless interfered with by unsafe code. 
- [Interior mutability]: ../interior-mutability.md [_Lifetime_]: ../trait-bounds.md [_TypeNoBounds_]: ../types.md#type-expressions From b2e8c301784b5c557f79ef9e3e573a29a50c46a2 Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Thu, 21 Nov 2024 15:20:24 -0500 Subject: [PATCH 10/22] Fix "File must end with a newline" --- src/types/numeric.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/types/numeric.md b/src/types/numeric.md index 08a6808e9..45edd95f5 100644 --- a/src/types/numeric.md +++ b/src/types/numeric.md @@ -98,4 +98,4 @@ A floating-point value is represented the same as a value of the unsigned intege r[type.numeric.repr.float-format] The [IEEE 754-2019] `binary32` format is used for `f32`, and the `binary64` format is used for `f64`. -[IEEE 754-2019]: https://ieeexplore.ieee.org/document/8766229 \ No newline at end of file +[IEEE 754-2019]: https://ieeexplore.ieee.org/document/8766229 From e634550de7cdc5a1d7dc0ae2a297304457eb45b9 Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Thu, 21 Nov 2024 21:26:15 -0500 Subject: [PATCH 11/22] Move aggregate values into appropriate chapters --- src/SUMMARY.md | 1 - src/types/array.md | 3 +++ src/types/enum.md | 26 +++++++++++++++++--- src/types/struct.md | 26 ++++++++++++++++++++ src/types/tuple.md | 3 +++ src/types/union.md | 6 +++++ src/values.md | 60 --------------------------------------------- 7 files changed, 60 insertions(+), 65 deletions(-) delete mode 100644 src/values.md diff --git a/src/SUMMARY.md b/src/SUMMARY.md index ace93715a..91f343b8d 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -99,7 +99,6 @@ - [Type coercions](type-coercions.md) - [Destructors](destructors.md) - [Lifetime elision](lifetime-elision.md) - - [Values and representation](values.md) - [Special types and traits](special-types-and-traits.md) diff --git a/src/types/array.md b/src/types/array.md index ef54af1f3..160fd6dca 100644 --- a/src/types/array.md +++ b/src/types/array.md @@ -31,6 +31,9 @@ always 
bounds-checked in safe methods and operators. > Note: The [`Vec`] standard library type provides a heap-allocated resizable > array type. +r[type.array.repr] +The values and representation of a tuple type are the same as a [struct type][type.struct.value] with `N` fields of type `T` corresponding to each index in order, where the fields are layed out according to the [`C` representation][layout.repr.c]. + [_Expression_]: ../expressions.md [_Type_]: ../types.md#type-expressions [`usize`]: numeric.md#machine-dependent-integer-types diff --git a/src/types/enum.md b/src/types/enum.md index a3ae2878f..f0b50831f 100644 --- a/src/types/enum.md +++ b/src/types/enum.md @@ -14,14 +14,32 @@ unit-like struct. r[type.enum.constructor] New instances of an `enum` can be constructed with a [struct expression]. -r[type.enum.value] -Any `enum` value consumes as much memory as the largest variant for its -corresponding `enum` type, as well as the size needed to store a discriminant. - r[type.enum.name] Enum types cannot be denoted *structurally* as types, but must be denoted by named reference to an [`enum` item]. +## Enum values and representation + +r[type.enum.value] + +r[type.enum.value.intro] +An enum value corresponds to exactly one variant of the enum, and consists of the fields of that variant + +> [!NOTE] +> An enum with no variants therefore has no values. + +r[type.enum.value.variant-padding] +A byte is a padding byte in a variant `V` if the byte is not used for computing the discriminant, and the byte would be a padding byte in a struct consisting of the fields of the variant at the same offsets. + +r[type.enum.value.value-padding] +A byte is a padding byte of an enum if it is a padding byte in each variant of the enum. A byte that is not a padding byte of an enum is a value byte. + +r[type.enum.value.repr] +The representation of a value of an enum type includes the representation of each field of the variant at the appropriate offsets. 
When encoding a value of an enum type, each byte which is a padding byte in the variant is set to uninit. In the case of a [`repr(C)`][layout.repr.c.adt] or a [primitive-repr][layout.repr.primitive.adt] enum, the discriminant of the variant is represented as though by the appropriate integer type stored at offset 0. + +> [!NOTE] +> Most `repr(Rust)` enums will also store a discriminant in the representation of the enum, but the exact placement or type of the discriminant is unspecified, as is the value that represents each variant. + [^enumtype]: The `enum` type is analogous to a `data` constructor declaration in Haskell, or a *pick ADT* in Limbo. diff --git a/src/types/struct.md b/src/types/struct.md index 6a672f7af..54060fb58 100644 --- a/src/types/struct.md +++ b/src/types/struct.md @@ -29,6 +29,32 @@ A _unit-like struct_ type is like a struct type, except that it has no fields. The one value constructed by the associated [struct expression] is the only value that inhabits such a type. +## Struct and aggregate values + +r[type.struct.value] + +r[type.struct.value.value-bytes] +A byte `b` in the representation of an aggregate is a value byte if there exists a field of that aggregate such that: +* The field has some type `T`, +* The offset of that field `o` is such that `b` falls at an offset in `o..(o+size_of::<T>())`, +* Either `T` is a primitive type or the offset of `b` within the field is a value byte in the representation of `T`. + +> [!NOTE] +> A byte in a union is a value byte if it is a value byte in *any* field. + +r[type.struct.value.padding] +Every byte in an aggregate which is not a value byte is a padding byte. + +r[type.struct.value.struct] +A value of a struct type consists of the values of each of its fields. +The representation of such a struct contains the representation of the value of each field at its corresponding offset.
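The struct rule above, that each field's representation sits at that field's offset, can be sketched as follows. This is an illustration under assumptions: `Pair` and `value_bytes` are hypothetical names, `repr(C)` is used only to make the layout predictable, and `offset_of!` requires Rust 1.77 or later.

```rust
use std::mem::{offset_of, size_of};

// Hypothetical example type; `repr(C)` fixes the field order.
#[allow(dead_code)]
#[repr(C)]
struct Pair {
    tag: u8,
    value: u32,
}

/// Reads the four bytes that hold the `value` field directly out of a `Pair`.
fn value_bytes(p: &Pair) -> [u8; 4] {
    let base = (p as *const Pair).cast::<u8>();
    // SAFETY: the offset and length stay inside `*p`, the bytes of `value`
    // are initialized, and `[u8; 4]` has alignment 1.
    unsafe { base.add(offset_of!(Pair, value)).cast::<[u8; 4]>().read() }
}

fn main() {
    let p = Pair { tag: 7, value: 0xDEAD_BEEF };

    // The field lies entirely within the struct.
    assert!(offset_of!(Pair, value) + size_of::<u32>() <= size_of::<Pair>());

    // The bytes at the field's offset are the encoding of the field's value.
    assert_eq!(value_bytes(&p), p.value.to_ne_bytes());

    println!("tag = {}, value bytes = {:02x?}", p.tag, value_bytes(&p));
}
```

The sketch deliberately reads only the bytes of `value`; the bytes between `tag` and `value` are padding and may be uninit, so reading them as `u8` would not be sound.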
+ +r[type.struct.value.padding-uninit] +When a value of an aggregate is encoded, each padding byte is left as uninit + +> [!NOTE] +> It is valid for padding bytes to hold a value other than uninit when decoded, and these bytes are ignored when decoding an aggregate. + [^structtype]: `struct` types are analogous to `struct` types in C, the *record* types of the ML family, or the *struct* types of the Lisp family. diff --git a/src/types/tuple.md b/src/types/tuple.md index 073fbd193..6beb4ad5e 100644 --- a/src/types/tuple.md +++ b/src/types/tuple.md @@ -49,6 +49,9 @@ Furthermore, various expressions will produce the unit value if there is no othe r[type.tuple.access] Tuple fields can be accessed by either a [tuple index expression] or [pattern matching]. +r[type.tuple.repr] +The values and representation of a tuple type are the same as a [struct type][type.struct.value] with the same fields and layout. + [^1]: Structural types are always equivalent if their internal types are equivalent. For a nominal version of tuples, see [tuple structs]. diff --git a/src/types/union.md b/src/types/union.md index c8801ee2f..08d78dfd8 100644 --- a/src/types/union.md +++ b/src/types/union.md @@ -24,5 +24,11 @@ The memory layout of a `union` is undefined by default (in particular, fields do *not* have to be at offset 0), but the `#[repr(...)]` attribute can be used to fix a layout. +r[type.union.value] +A value of a union type consists of a sequence of bytes, corresponding to each [value byte][type.struct.value.value-bytes]. The value bytes of a union are represented exactly. Each [padding byte][type.struct.value.padding] is set to uninit. + +> [!NOTE] +> When a union value is constructed or a field is read/written to, the value of that field is encoded or decoded appropriately. 
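The note on union construction and field access can be sketched like this: constructing the union encodes one field into its bytes, and reading another field decodes those same bytes at a different type. The `Word` union and both helper functions are hypothetical names chosen for illustration.

```rust
// Hypothetical union whose two fields cover the same four bytes.
#[repr(C)]
union Word {
    int: u32,
    bytes: [u8; 4],
}

/// Encodes `x` through the `int` field, then decodes the bytes via `bytes`.
fn int_to_bytes(x: u32) -> [u8; 4] {
    let w = Word { int: x };
    // SAFETY: all four bytes were initialized by encoding the `u32`,
    // and every initialized byte sequence is a valid `[u8; 4]`.
    unsafe { w.bytes }
}

/// Encodes `b` through the `bytes` field, then decodes the bytes via `int`.
fn bytes_to_int(b: [u8; 4]) -> u32 {
    let w = Word { bytes: b };
    // SAFETY: all four bytes are initialized, and every initialized
    // four-byte sequence is a valid `u32`.
    unsafe { w.int }
}

fn main() {
    let x = 0x0102_0304u32;
    assert_eq!(int_to_bytes(x), x.to_ne_bytes());
    assert_eq!(bytes_to_int(x.to_ne_bytes()), x);
    println!("{:02x?}", int_to_bytes(x));
}
```

Both reads are sound here only because neither field type has padding or invalid bit patterns; reading a `bool` or reference field after writing arbitrary bytes would not be.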
+ [`Copy`]: ../special-types-and-traits.md#copy [item]: ../items/unions.md diff --git a/src/values.md b/src/values.md deleted file mode 100644 index 3af8c602b..000000000 --- a/src/values.md +++ /dev/null @@ -1,60 +0,0 @@ -# Values and representation - -r[value] - -## Aggregate Values - -r[value.aggregate] - -r[value.aggregate.value-bytes] -A byte `b` in the representation of an aggregate is a value byte if there exists a field of that aggregate such that: -* The field has some type `T`, -* The offset of that field `o` is such that `b` falls at an offset in `o..(o+size_of::<T>())`, -* Either `T` is a primitive type or the offset of `b` within the field is a value byte in the representation of `T`. - -> [!NOTE] -> A byte in a union is a value byte if it is a value byte in *any* field. - -r[value.aggregate.padding] -Every byte in an aggregate which is not a value byte is a padding byte. - -r[value.aggregate.struct] -A value of a struct type consists of the values of each of its fields. -The representation of such a struct contains the representation of the value of each field at its corresponding offset. - -r[value.aggregate.union] -A value of a union type consists of a sequence of bytes, corresponding to each value byte. The value bytes of a union are represented exactly. - -> [!NOTE] -> When a union value is constructed or a field is read/written to, the value of that field is encoded or decoded appropriately. - -r[value.aggregate.padding-uninit] -When a value of an aggregate is encoded, each padding byte is left as uninit - -> [!NOTE] -> It is valid for padding bytes to hold a value other than uninit when decoded, and these bytes are ignored when decoding an aggregate. - -r[value.aggregate.tuple-array] -The fields of a tuple or an array are the elements of that tuple or array.
- -## Enum Values - -r[value.enum] - -r[value.enum.intro] -An enum value corresponds to exactly one variant of the enum, and consists of the fields of that variant - -> [!NOTE] -> An enum with no variants therefore has no values. - -r[value.enum.variant-padding] -A byte is a padding byte in a variant `V` if the byte is not used for computing the discriminant, and the byte would be a padding byte in a struct consisting of the fields of the variant at the same offsets. - -r[value.enum.value-padding] -A byte is a padding byte of an enum if it is a padding byte in each variant of the enum. A byte that is not a padding byte of an enum is a value byte. - -r[value.enum.repr] -The representation of a value of an enum type includes the representation of each field of the variant at the appropriate offsets. When encoding a value of an enum type, each byte which is a padding byte in the variant is set to uninit. In the case of a [`repr(C)`][layout.repr.c.adt] or a [primitive-repr][layout.repr.primitive.adt] enum, the discriminant of the variant is represented as though by the appropriate integer type stored at offset 0. - -> [!NOTE] -> Most `repr(Rust)` enums will also store a discriminant in the representation of the enum, but the exact placement or type of the discriminant is unspecified, as is the value that represents each variant. From 98d4d8ec3c9ae081fc7a61e0b31b4e91ba76ab0c Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Fri, 22 Nov 2024 13:06:07 -0500 Subject: [PATCH 12/22] Add note about producing `!` at runtime being UB --- src/types/never.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/types/never.md b/src/types/never.md index 702281db2..6897a0bad 100644 --- a/src/types/never.md +++ b/src/types/never.md @@ -10,6 +10,9 @@ r[type.never.intro] The never type `!` is a type with no values, representing the result of computations that never complete. 
+> [!NOTE] +> Because `!` has no values, reading it from memory (or otherwise producing a value of the type at runtime) is immediate undefined behaviour. + r[type.never.coercion] Expressions of type `!` can be coerced into any other type. From 97778c30e5128810289c7bd2e5ed7cf4af97f0e6 Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Fri, 22 Nov 2024 13:14:24 -0500 Subject: [PATCH 13/22] Redefine array layout directly --- src/types/array.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/types/array.md b/src/types/array.md index 160fd6dca..27e25d4ac 100644 --- a/src/types/array.md +++ b/src/types/array.md @@ -32,7 +32,7 @@ always bounds-checked in safe methods and operators. > array type. r[type.array.repr] -The values and representation of a tuple type are the same as a [struct type][type.struct.value] with `N` fields of type `T` corresponding to each index in order, where the fields are layed out according to the [`C` representation][layout.repr.c]. +An array value is represented by each element in ascending index order, placed immediately adjacent in memory. 
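The revised array rule, elements in ascending index order and immediately adjacent in memory, can be checked with a small sketch. The helper `element_offset` is a hypothetical name; it only compares addresses and never dereferences a raw pointer.

```rust
use std::mem::size_of;

/// Byte distance from the start of `arr` to element `i`.
fn element_offset<T, const N: usize>(arr: &[T; N], i: usize) -> usize {
    let base = arr.as_ptr() as usize;
    let elem = &arr[i] as *const T as usize;
    elem - base
}

fn main() {
    let a = [10u16, 20, 30, 40];

    // Element `i` starts exactly `i * size_of::<T>()` bytes into the array.
    for i in 0..a.len() {
        assert_eq!(element_offset(&a, i), i * size_of::<u16>());
    }

    // Adjacent placement means the array has no bytes beyond its elements.
    assert_eq!(size_of::<[u16; 4]>(), 4 * size_of::<u16>());

    println!("offset of a[3] = {}", element_offset(&a, 3));
}
```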
[_Expression_]: ../expressions.md [_Type_]: ../types.md#type-expressions From 695d5170be89648c2255aafd76721e636f80ffdd Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Fri, 22 Nov 2024 13:30:06 -0500 Subject: [PATCH 14/22] Remove redundant sections on bit validity --- src/types/boolean.md | 8 -------- src/types/numeric.md | 7 ------- src/types/textual.md | 12 ------------ 3 files changed, 27 deletions(-) diff --git a/src/types/boolean.md b/src/types/boolean.md index 1ebc73588..218983a2a 100644 --- a/src/types/boolean.md +++ b/src/types/boolean.md @@ -127,14 +127,6 @@ r[type.bool.expr.cmp.less] r[type.bool.expr.cmp.less-eq] * `a <= b` is the same as `a == b | a < b` -## Bit validity - -r[type.bool.validity] - -The single byte of a `bool` is guaranteed to be initialized (in other words, -`transmute::<bool, u8>(...)` is always sound -- but since some bit patterns -are invalid `bool`s, the inverse is not always sound). - [boolean logic]: https://en.wikipedia.org/wiki/Boolean_algebra [enumerated type]: enum.md [expressions]: ../expressions.md diff --git a/src/types/numeric.md b/src/types/numeric.md index 45edd95f5..e597df701 100644 --- a/src/types/numeric.md +++ b/src/types/numeric.md @@ -59,13 +59,6 @@ r[type.numeric.int.size.minimum] > pointer support is limited and may require explicit care and acknowledgment > from a library to support. -## Bit validity - -r[type.numeric.validity] - -For every numeric type, `T`, the bit validity of `T` is equivalent to the bit -validity of `[u8; size_of::<T>()]`. An uninitialized byte is not a valid `u8`. - ## Representation r[type.numeric.repr] diff --git a/src/types/textual.md b/src/types/textual.md index 1348746b5..85504d748 100644 --- a/src/types/textual.md +++ b/src/types/textual.md @@ -29,18 +29,6 @@ r[type.text.str-unsized] Since `str` is a [dynamically sized type], it can only be instantiated through a pointer type, such as `&str`.
-## Layout and bit validity - -r[type.text.layout] - -r[type.layout.char-layout] -`char` is guaranteed to have the same size and alignment as `u32` on all platforms. - -r[type.layout.char-validity] -Every byte of a `char` is guaranteed to be initialized (in other words, -`transmute::<char, [u8; size_of::<char>()]>(...)` is always sound -- but since -some bit patterns are invalid `char`s, the inverse is not always sound). - [Unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value [undefined behavior]: ../behavior-considered-undefined.md [dynamically sized type]: ../dynamically-sized-types.md From 5bc13a0eb24d8771feda0bf4124f3dd3e50ec344 Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Fri, 22 Nov 2024 13:30:59 -0500 Subject: [PATCH 15/22] Elaborate on how union constructors produce union values --- src/types/union.md | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/src/types/union.md b/src/types/union.md index 08d78dfd8..dbc71ad22 100644 --- a/src/types/union.md +++ b/src/types/union.md @@ -25,10 +25,16 @@ The memory layout of a `union` is undefined by default (in particular, fields do fix a layout. r[type.union.value] -A value of a union type consists of a sequence of bytes, corresponding to each [value byte][type.struct.value.value-bytes]. The value bytes of a union are represented exactly. Each [padding byte][type.struct.value.padding] is set to uninit. +A value of a union type consists of a sequence of bytes, corresponding to each [value byte][type.struct.value.value-bytes]. The value bytes of a union are represented exactly. Each [padding byte][type.struct.value.padding] is set to uninit when encoded. > [!NOTE] -> When a union value is constructed or a field is read/written to, the value of that field is encoded or decoded appropriately. +> A given value byte is guaranteed allowed to be uninit if it is padding in any field, recursively expanding union fields.
Whether a byte of a union is allowed to be uninit in any other case is not yet decided. + +r[type.union.constructor] +The constructor of a union type encodes the initialized field value into the corresponding bytes of the union, and sets all bytes that are not used by the field to uninit. + +r[type.union.field-access] +When a field is written to by a field access expression, the value written is encoded into the corresponding bytes of the union. When a field is read, the value of that field is decoded from the corresponding bytes. [`Copy`]: ../special-types-and-traits.md#copy [item]: ../items/unions.md From 082331369dc68abd05a0b7ba0d55e48df0f72d62 Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Tue, 10 Dec 2024 11:36:48 -0500 Subject: [PATCH 16/22] Change definition of memory.encoding in response to PR comments --- src/memory-model.md | 33 +++++++++++++++++++++++++++------ 1 file changed, 27 insertions(+), 6 deletions(-) diff --git a/src/memory-model.md b/src/memory-model.md index adac052fe..4f7c9f1e1 100644 --- a/src/memory-model.md +++ b/src/memory-model.md @@ -15,31 +15,52 @@ The most basic unit of memory in Rust is a byte. All values in Rust are computed > While bytes in Rust are typically lowered to hardware bytes, they may contain additional values, > such as being uninitialized, or storing part of a pointer. +r[memory.byte.contents] +Each byte may have one of the following values: + r[memory.byte.init] -Each byte may be initialized, and contain a value of type `u8`, as well as an optional pointer fragment. When present, the pointer fragment carries [provenance][type.pointer.provenance] information. +* An initialized byte containing a `u8` value and optional [provenance][type.pointer.provenance], r[memory.byte.uninit] -Each byte may be uninitialized. +* An uninitialized byte. > [!NOTE] > Uninitialized bytes do not have a value and do not have a pointer fragment. +> [!NOTE] +> The above list is not yet guaranteed to be exhaustive. 
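The distinction between an initialized and an uninitialized byte can be sketched with the standard `MaybeUninit` type, which is the usual way to hold a possibly-uninit byte in Rust source. The function name `init_then_read` is hypothetical, and the sketch says nothing about provenance.

```rust
use std::mem::MaybeUninit;

/// Starts from an uninitialized byte, initializes it, then reads it as `u8`.
fn init_then_read(v: u8) -> u8 {
    // The byte in `slot` is uninitialized: it has no `u8` value yet,
    // so decoding it as `u8` at this point would be undefined behavior.
    let mut slot = MaybeUninit::<u8>::uninit();

    // Writing encodes `v`, making the byte an initialized byte.
    slot.write(v);

    // SAFETY: `slot` was initialized by the `write` call above.
    unsafe { slot.assume_init() }
}

fn main() {
    assert_eq!(init_then_read(0x2A), 0x2A);
    println!("{:#04x}", init_then_read(0x2A));
}
```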
+ ## Value Encoding r[memory.encoding] r[memory.encoding.intro] -Each type in Rust has 0 or more values, which can have operations performed on them +Each type in Rust has 0 or more values, which can have operations performed on them. Values are represented in memory by encoding them > [!NOTE] > `0u8`, `1337i16`, and `Foo{bar: "baz"}` are all values r[memory.encoding.op] -Each value of a type can be encoded into a sequence of bytes, and decoded from a sequence of bytes, which has a length equal to the size of the type. -The operation to encode or decode a value is determined by the representation of the type. +Each type defines a pair of properties which, together, define the representation of values of the type. The *encode* operation takes a value of the type and converts it into a sequence of bytes equal in length to the size of the type, and the *decode* operation takes such a sequence of bytes and optionally converts it into a value. Encoding occurs when a value is written to memory, and decoding occurs when a value is read from memory. + +> [!NOTE] +> Only certain byte sequences may decode into a value of a given type. For example, a byte sequence consisting of all zeroes does not decode to a value of a reference type. + +r[memory.encoding.representation] +A sequence of bytes is said to represent a value of a type, if the decode operation for that type produces that value from that sequence of bytes. The representation of a type is the partial relation between byte sequences and values those sequences represent. > [!NOTE] > Representation is related to, but is not the same property as, the layout of the type. +r[memory.encoding.symmetric] +The result of encoding a given value of a type is a sequence of bytes that represents that value. + +> [!NOTE] +> This means that a value can be copied into memory and copied out and the result is the same value. 
+> The reverse is not necessarily true, a sequence of bytes read as a value then written to another location (called a typed copy) will not necessarily yield the same sequence of bytes. For example, a typed copy of a struct type will leave the padding bytes of that struct uninitialized. + r[memory.encoding.decode] -If a value of type `T` is decoded from a sequence of bytes that does not correspond to a defined value, the behavior is undefined. If a value of type `T` is decoded from a sequence of bytes that contain pointer fragments, which are not used to represent the value, the pointer fragments are ignored. +If a value of type `T` is decoded from a sequence of bytes that does not represent any value, the behavior is undefined. + +> [!NOTE] +> For example, it is undefined behavior to read a `0x02` byte as `bool`. From 2cbdb587d7ab780156d22732aafd869854baae19 Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Tue, 10 Dec 2024 20:52:28 -0500 Subject: [PATCH 17/22] Fix "Line must not end with spaces" for the 3rd time this PR --- src/memory-model.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/memory-model.md b/src/memory-model.md index 4f7c9f1e1..4b4fb69da 100644 --- a/src/memory-model.md +++ b/src/memory-model.md @@ -35,7 +35,7 @@ r[memory.byte.uninit] r[memory.encoding] r[memory.encoding.intro] -Each type in Rust has 0 or more values, which can have operations performed on them. Values are represented in memory by encoding them +Each type in Rust has 0 or more values, which can have operations performed on them. Values are represented in memory by encoding them > [!NOTE] > `0u8`, `1337i16`, and `Foo{bar: "baz"}` are all values @@ -44,7 +44,7 @@ r[memory.encoding.op] Each type defines a pair of properties which, together, define the representation of values of the type. 
The *encode* operation takes a value of the type and converts it into a sequence of bytes equal in length to the size of the type, and the *decode* operation takes such a sequence of bytes and optionally converts it into a value. Encoding occurs when a value is written to memory, and decoding occurs when a value is read from memory. > [!NOTE] -> Only certain byte sequences may decode into a value of a given type. For example, a byte sequence consisting of all zeroes does not decode to a value of a reference type. +> Only certain byte sequences may decode into a value of a given type. For example, a byte sequence consisting of all zeroes does not decode to a value of a reference type. r[memory.encoding.representation] A sequence of bytes is said to represent a value of a type, if the decode operation for that type produces that value from that sequence of bytes. The representation of a type is the partial relation between byte sequences and values those sequences represent. @@ -56,11 +56,11 @@ r[memory.encoding.symmetric] The result of encoding a given value of a type is a sequence of bytes that represents that value. > [!NOTE] -> This means that a value can be copied into memory and copied out and the result is the same value. +> This means that a value can be copied into memory and copied out and the result is the same value. > The reverse is not necessarily true, a sequence of bytes read as a value then written to another location (called a typed copy) will not necessarily yield the same sequence of bytes. For example, a typed copy of a struct type will leave the padding bytes of that struct uninitialized. r[memory.encoding.decode] -If a value of type `T` is decoded from a sequence of bytes that does not represent any value, the behavior is undefined. +If a value of type `T` is decoded from a sequence of bytes that does not represent any value, the behavior is undefined. > [!NOTE] -> For example, it is undefined behavior to read a `0x02` byte as `bool`. 
+> For example, it is undefined behavior to read a `0x02` byte as `bool`. From 9349cd3f69f22afc0046a6f1915807cd21afbfe6 Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Tue, 10 Dec 2024 21:14:25 -0500 Subject: [PATCH 18/22] Refactor definition of struct values/representation --- src/types/struct.md | 17 ++++++++++------- src/types/tuple.md | 1 + 2 files changed, 11 insertions(+), 7 deletions(-) diff --git a/src/types/struct.md b/src/types/struct.md index 54060fb58..b3e857963 100644 --- a/src/types/struct.md +++ b/src/types/struct.md @@ -29,10 +29,13 @@ A _unit-like struct_ type is like a struct type, except that it has no fields. The one value constructed by the associated [struct expression] is the only value that inhabits such a type. -## Struct and aggregate values +## Struct r[type.struct.value] +r[type.struct.value.intro] +A value of a struct type consists of a list of values for each field. + r[type.struct.value.value-bytes] A byte `b` in the representation of an aggregate is a value byte if there exists a field of that aggregate such that: * The field has some type `T`, @@ -45,15 +48,15 @@ A byte `b` in the representation of an aggregate is a value byte if there exists r[type.struct.value.padding] Every byte in an aggregate which is not a value byte is a padding byte. -r[type.struct.value.struct] -A value of a struct type consists of the values of each of its fields. -The representation of such a struct contains the representation of the value of each field at its corresponding offset. +> [!NOTE] +> Enum types can also have padding bytes. -r[type.struct.value.padding-uninit] -When a value of an aggregate is encoded, each padding byte is left as uninit +r[type.struct.value.encode-decode] +When a value of a struct type is encoded, each field of the struct is encoded at its corresponding offset and each byte that is not within a field of the struct is set to uninit. 
+When a value of a struct type is decoded, each field of the struct is decoded from its corresponding offset. > [!NOTE] -> It is valid for padding bytes to hold a value other than uninit when decoded, and these bytes are ignored when decoding an aggregate. +> It is valid for padding bytes to hold a value other than uninit when decoded, and these bytes are ignored when decoding an struct value. [^structtype]: `struct` types are analogous to `struct` types in C, the *record* types of the ML family, or the *struct* types of the Lisp family. diff --git a/src/types/tuple.md b/src/types/tuple.md index 6beb4ad5e..fc8f355fd 100644 --- a/src/types/tuple.md +++ b/src/types/tuple.md @@ -52,6 +52,7 @@ Tuple fields can be accessed by either a [tuple index expression] or [pattern ma r[type.tuple.repr] The values and representation of a tuple type are the same as a [struct type][type.struct.value] with the same fields and layout. + [^1]: Structural types are always equivalent if their internal types are equivalent. For a nominal version of tuples, see [tuple structs]. From 900ee9309c02b76fec05e08f7c33e9bffc1e7f80 Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Thu, 12 Dec 2024 14:51:42 -0500 Subject: [PATCH 19/22] Attempt to define certain representations in less confusing ways. 
--- src/memory-model.md | 1 + src/types/array.md | 1 + src/types/boolean.md | 2 +- src/types/enum.md | 9 ++++----- src/types/numeric.md | 20 +++++++++++++++++--- src/types/pointer.md | 3 ++- src/types/struct.md | 14 -------------- src/types/textual.md | 3 +++ src/types/tuple.md | 5 +++++ src/types/union.md | 21 ++++++++++++++++++++- 10 files changed, 54 insertions(+), 25 deletions(-) diff --git a/src/memory-model.md b/src/memory-model.md index 4b4fb69da..6e7781efa 100644 --- a/src/memory-model.md +++ b/src/memory-model.md @@ -51,6 +51,7 @@ A sequence of bytes is said to represent a value of a type, if the decode operat > [!NOTE] > Representation is related to, but is not the same property as, the layout of the type. +> A type has a unique representation when each value is represented by exactly one byte sequence. Most primitive types have unique representations. r[memory.encoding.symmetric] The result of encoding a given value of a type is a sequence of bytes that represents that value. diff --git a/src/types/array.md b/src/types/array.md index 27e25d4ac..8f239d0b9 100644 --- a/src/types/array.md +++ b/src/types/array.md @@ -34,6 +34,7 @@ always bounds-checked in safe methods and operators. r[type.array.repr] An array value is represented by each element in ascending index order, placed immediately adjacent in memory. + [_Expression_]: ../expressions.md [_Type_]: ../types.md#type-expressions [`usize`]: numeric.md#machine-dependent-integer-types diff --git a/src/types/boolean.md b/src/types/boolean.md index 218983a2a..a50f5c6c9 100644 --- a/src/types/boolean.md +++ b/src/types/boolean.md @@ -21,7 +21,7 @@ r[type.bool.layout] An object with the boolean type has a [size and alignment] of 1 each. r[type.bool.repr] -A `bool` is represented as a single initialized byte with a value of `0x00` corresponding to `false` and a value of `0x01` corresponding to `true`. This byte does not have a pointer fragment. 
+A `bool` is represented as a single initialized byte with a value of `0x00` corresponding to `false` and a value of `0x01` corresponding to `true`. > [!NOTE] > No other representations are valid for `bool`. Undefined Behaviour occurs when any other byte is read as type `bool`. diff --git a/src/types/enum.md b/src/types/enum.md index f0b50831f..b6e1153f1 100644 --- a/src/types/enum.md +++ b/src/types/enum.md @@ -28,14 +28,13 @@ An enum value corresponds to exactly one variant of the enum, and consists of th > [!NOTE] > An enum with no variants therefore has no values. -r[type.enum.value.variant-padding] -A byte is a padding byte in a variant `V` if the byte is not used for computing the discriminant, and the byte would be a padding byte in a struct consisting of the fields of the variant at the same offsets. - r[type.enum.value.value-padding] -A byte is a padding byte of an enum if it is a padding byte in each variant of the enum. A byte that is not a padding byte of an enum is a value byte. +A byte is a [padding][type.union.value.padding] byte of an enum if that byte is not part of the representation of the discriminant of the enum, and in each variant it either: +* Does not overlap with a field of the variant, or +* Overlaps with a padding byte in a field of that variant. r[type.enum.value.repr] -The representation of a value of an enum type includes the representation of each field of the variant at the appropriate offsets. When encoding a value of an enum type, each byte which is a padding byte in the variant is set to uninit. In the case of a [`repr(C)`][layout.repr.c.adt] or a [primitive-repr][layout.repr.primitive.adt] enum, the discriminant of the variant is represented as though by the appropriate integer type stored at offset 0. +The representation of a value of an enum type includes the representation of each field of the variant at the appropriate offsets. 
When encoding a value of an enum type, each byte which is not used to store a field of the variant or the discriminant is set to uninit. In the case of a [`repr(C)`][layout.repr.c.adt] or a [primitive-repr][layout.repr.primitive.adt] enum, the discriminant of the variant is represented as though by the appropriate integer type stored at offset 0. > [!NOTE] > Most `repr(Rust)` enums will also store a discriminant in the representation of the enum, but the exact placement or type of the discriminant is unspecified, as is the value that represents each variant. diff --git a/src/types/numeric.md b/src/types/numeric.md index e597df701..6695c2219 100644 --- a/src/types/numeric.md +++ b/src/types/numeric.md @@ -70,7 +70,8 @@ r[type.numeric.repr.integer-width] The range of values an integer type can represent depends on its signedness and its width, in bits. The width of type `uN` or `iN` is `N`. The width of type `usize` or `isize` is the value of the `target_pointer_width` property. > [!NOTE] -> There are exactly `1< There are exactly `1< In particular, for an unsigned type, these values are in the range `0..(1<> (m*8)) as u8`, where `m` is between `0` and the size of `U`. None of the bytes produced by encoding an unsigned integer has a pointer fragment. @@ -82,13 +83,26 @@ A value `i` of an unsigned integer type `U` is represented by a sequence of init > [!WARN] > On `little` endian, the order of bytes used to decode an integer type is the same as the natural order of a `u8` array - that is, the `m` value corresponds with the `m` index into a same-sized `u8` array. On `big` endian, however, the order is the opposite of this order - that is, the `m` value corresponds with the `size_of::() - m` index in that array. + r[type.numeric.repr.signed] A value `i` of a signed integer type with width `N` is represented the same as the corresponding value of the unsigned counterpart type which is congruent modulo `2^N`.
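The congruence rule for signed integers stated above can be checked with a short sketch. This is an illustration only and not part of the patch: the helper names `unsigned_counterpart` and `encode_i16` are hypothetical, and the sketch assumes that the standard `as` cast and `to_ne_bytes` reflect the representation being described.

```rust
/// Hypothetical helper: the unsigned value congruent to `v` modulo 2^16.
fn unsigned_counterpart(v: i16) -> u16 {
    // An `as` cast between same-width integer types keeps the bit pattern.
    v as u16
}

/// Hypothetical helper: the bytes that encode `v`, in the platform byte order.
fn encode_i16(v: i16) -> [u8; 2] {
    v.to_ne_bytes()
}

fn main() {
    let v: i16 = -2;
    // -2 is congruent to 65534 modulo 2^16.
    assert_eq!(unsigned_counterpart(v), 65534);
    // The signed value and its unsigned counterpart encode to the same bytes.
    assert_eq!(encode_i16(v), unsigned_counterpart(v).to_ne_bytes());
    println!("{:?}", encode_i16(v));
}
```

The comparison uses `to_ne_bytes` on both sides, so the check does not depend on the byte order of the target.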
+> [!NOTE] +> This encoding of signed integers is known as the 2s complement encoding. + +r[type.numeric.repr.float-width] +Each floating-point type has a width. The type `fN` has a width of `N`. + r[type.numeric.repr.float] -A floating-point value is represented the same as a value of the unsigned integer type with the same width given by its [IEEE 754-2019] encoding. +A floating-point value is represented by the following decoding: +* The byte sequence is decoded as the unsigned integer type with the same width as the floating-point type, +* The resulting integer is decoded according to [IEEE 754-2019] into the format used for the type. + +> [!NOTE] +> The representation of each finite number and infinity is unique as a result of this. +> The exact behaviour of encoding and decoding NaNs is not yet decided r[type.numeric.repr.float-format] -The [IEEE 754-2019] `binary32` format is used for `f32`, and the `binary64` format is used for `f64`. +The [IEEE 754-2019] `binary32` format is used for `f32`, and the `binary64` format is used for `f64`. The set of values for each floating-point type are determined by the respective format. [IEEE 754-2019]: https://ieeexplore.ieee.org/document/8766229 diff --git a/src/types/pointer.md b/src/types/pointer.md index 1cda4a404..ba2a670b4 100644 --- a/src/types/pointer.md +++ b/src/types/pointer.md @@ -106,11 +106,12 @@ A wide pointer or reference consists of a data pointer or reference, and a point r[type.pointer.value.wide-reference] The data pointer of a wide reference has a non-null address, well aligned for `align_of_val(self)`, and with provenance for `size_of_val(self)` bytes. -r[type.pointer.value.wide-representation] +r[type.pointer.value.wide-repr] A wide pointer or reference is represented the same as `struct WidePointer{data: *mut (), metadata: M}` where `M` is the pointee metadata type, and the `data` and `metadata` fields are the corresponding parts of the pointer. 
> [!NOTE] > The `WidePointer` struct has no guarantees about layout, and has the default representation. +> In particular, it is not guaranteed that you can write a struct type with the same layout as `WidePointer`. ## Pointer Provenance diff --git a/src/types/struct.md b/src/types/struct.md index b3e857963..8d28b45b7 100644 --- a/src/types/struct.md +++ b/src/types/struct.md @@ -36,20 +36,6 @@ r[type.struct.value] r[type.struct.value.intro] A value of a struct type consists of a list of values for each field. -r[type.struct.value.value-bytes] -A byte `b` in the representation of an aggregate is a value byte if there exists a field of that aggregate such that: -* The field has some type `T`, -* The offset of that field `o` is such that `b` falls at an offset in `o..(o+size_of::())`, -* Either `T` is a primitive type or the offset of `b` within the field is a value byte in the representation of `T`. - -> [!NOTE] -> A byte in a union is a value byte if it is a value byte in *any* field. - -r[type.struct.value.padding] -Every byte in an aggregate which is not a value byte is a padding byte. - -> [!NOTE] -> Enum types can also have padding bytes. r[type.struct.value.encode-decode] When a value of a struct type is encoded, each field of the struct is encoded at its corresponding offset and each byte that is not within a field of the struct is set to uninit. diff --git a/src/types/textual.md b/src/types/textual.md index 85504d748..8f0fb3154 100644 --- a/src/types/textual.md +++ b/src/types/textual.md @@ -18,6 +18,9 @@ or 0xE000 to 0x10FFFF range. r[type.text.char-repr] A value of type `char` is represented as the value of type `u32` with value equal to the code point that it represents. +> [!NOTE] +> The representation of `char` is unique. + r[type.text.str-value] A value of type `str` is represented the same way as `[u8]`, a slice of 8-bit unsigned bytes. 
However, the Rust standard library makes extra assumptions diff --git a/src/types/tuple.md b/src/types/tuple.md index fc8f355fd..4f0b07c38 100644 --- a/src/types/tuple.md +++ b/src/types/tuple.md @@ -52,6 +52,11 @@ Tuple fields can be accessed by either a [tuple index expression] or [pattern ma r[type.tuple.repr] The values and representation of a tuple type are the same as a [struct type][type.struct.value] with the same fields and layout. +> [!NOTE] +> In general, it is not guaranteed that any particular struct type will match the layout of a given tuple type. + +r[type.tuple.padding] +A tuple has the same [padding bytes][type.union.value.padding] as a struct type with the same fields and layout. [^1]: Structural types are always equivalent if their internal types are equivalent. For a nominal version of tuples, see [tuple structs]. diff --git a/src/types/union.md b/src/types/union.md index dbc71ad22..6054d70c2 100644 --- a/src/types/union.md +++ b/src/types/union.md @@ -24,8 +24,27 @@ The memory layout of a `union` is undefined by default (in particular, fields do *not* have to be at offset 0), but the `#[repr(...)]` attribute can be used to fix a layout. +## Union Values + r[type.union.value] -A value of a union type consists of a sequence of bytes, corresponding to each [value byte][type.struct.value.value-bytes]. The value bytes of a union are represented exactly. Each [padding byte][type.struct.value.padding] is set to uninit when encoded. + +r[type.union.value.value-bytes] +A byte `b` in the representation of a struct or union is a value byte if there exists a field of that aggregate such that: +* The field has some type `T`, +* The offset of that field `o` is such that `b` falls at an offset in `o..(o+size_of::())`, +* Either `T` is a primitive type or the offset of `b` within the field is not a padding byte in the representation of `T`. + +> [!NOTE] +> A byte in a union is a value byte if it is a value byte in *any* field. 
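The value-byte definition above can be illustrated with a small sketch for a struct. This is a hedged example rather than text from the patch: the type `Pair` and the helper `is_value_byte` are hypothetical, the struct uses `repr(C)` so that field offsets are specified, and the comments assume a target where `u32` has an alignment of 4. Both fields are primitive types, so the recursive case of the definition does not arise here.

```rust
use std::mem::{offset_of, size_of};

/// Hypothetical example type; `repr(C)` fixes the field order and offsets.
#[allow(dead_code)]
#[repr(C)]
struct Pair {
    a: u8,
    b: u32,
}

/// Hypothetical helper: true if the byte at `offset` lies within a field of `Pair`.
fn is_value_byte(offset: usize) -> bool {
    let a = offset_of!(Pair, a)..offset_of!(Pair, a) + size_of::<u8>();
    let b = offset_of!(Pair, b)..offset_of!(Pair, b) + size_of::<u32>();
    a.contains(&offset) || b.contains(&offset)
}

fn main() {
    // Assuming `u32` is 4-byte aligned, the bytes between `a` and `b` are padding.
    for offset in 0..size_of::<Pair>() {
        let kind = if is_value_byte(offset) { "value" } else { "padding" };
        println!("byte {offset}: {kind}");
    }
}
```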
+ +r[type.struct.value.padding] +Every byte in an struct or union which is not a value byte is a padding byte. [Enum types][type.enum.value.value-padding], [tuple types][type.tuple.padding], and other types may also have padding bytes. + +> [!NOTE] +> Primitive types, such as integer types, do not have padding bytes. + +r[type.union.value.encoding] +A value of a union type consists of a sequence of bytes, corresponding to each [value byte][type.union.value.value-bytes]. The value bytes of a union are represented exactly. Each [padding byte][type.union.value.padding] is set to uninit when encoded. > [!NOTE] > A given value byte is guaranteed allowed to be uninit if it is padding in any field, recursively expanding union fields. Whether a byte of a union is allowed to be uninit in any other case is not yet decided. From 41ca2637ba848f096f828639af1fdf4050c2b4c6 Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Thu, 12 Dec 2024 14:59:58 -0500 Subject: [PATCH 20/22] Fix "Line must not end with spaces" and link error --- src/types/numeric.md | 10 +++++----- src/types/struct.md | 6 +----- src/types/union.md | 4 ++-- 3 files changed, 8 insertions(+), 12 deletions(-) diff --git a/src/types/numeric.md b/src/types/numeric.md index 6695c2219..f46c3d589 100644 --- a/src/types/numeric.md +++ b/src/types/numeric.md @@ -70,7 +70,7 @@ r[type.numeric.repr.integer-width] The range of values an integer type can represent depends on its signedness and its width, in bits. The width of type `uN` or `iN` is `N`. The width of type `usize` or `isize` is the value of the `target_pointer_width` property. > [!NOTE] -> There are exactly `1< There are exactly `1< In particular, for an unsigned type, these values are in the range `0..(1< [!NOTE] -> This encoding of signed integers is known as the 2s complement encoding. +> This encoding of signed integers is known as the 2s complement encoding. r[type.numeric.repr.float-width] Each floating-point type has a width. 
The type `fN` has a width of `N`. @@ -96,11 +96,11 @@ Each floating-point type has a width. The type `fN` has a width of `N`. r[type.numeric.repr.float] A floating-point value is represented by the following decoding: * The byte sequence is decoded as the unsigned integer type with the same width as the floating-point type, -* The resulting integer is decoded according to [IEEE 754-2019] into the format used for the type. +* The resulting integer is decoded according to [IEEE 754-2019] into the format used for the type. > [!NOTE] -> The representation of each finite number and infinity is unique as a result of this. -> The exact behaviour of encoding and decoding NaNs is not yet decided +> The representation of each finite number and infinity is unique as a result of the definition of [IEEE 754-2019]. +> The exact behaviour of encoding and decoding NaNs is not yet decided r[type.numeric.repr.float-format] The [IEEE 754-2019] `binary32` format is used for `f32`, and the `binary64` format is used for `f64`. The set of values for each floating-point type are determined by the respective format. diff --git a/src/types/struct.md b/src/types/struct.md index 8d28b45b7..27252bc94 100644 --- a/src/types/struct.md +++ b/src/types/struct.md @@ -36,13 +36,9 @@ r[type.struct.value] r[type.struct.value.intro] A value of a struct type consists of a list of values for each field. - r[type.struct.value.encode-decode] When a value of a struct type is encoded, each field of the struct is encoded at its corresponding offset and each byte that is not within a field of the struct is set to uninit. -When a value of a struct type is decoded, each field of the struct is decoded from its corresponding offset. - -> [!NOTE] -> It is valid for padding bytes to hold a value other than uninit when decoded, and these bytes are ignored when decoding an struct value. +When a value of a struct type is decoded, each field of the struct is decoded from its corresponding offset. 
Each byte that is not within a field of the struct is ignored. [^structtype]: `struct` types are analogous to `struct` types in C, the *record* types of the ML family, or the *struct* types of the Lisp family. diff --git a/src/types/union.md b/src/types/union.md index 6054d70c2..be21bb85e 100644 --- a/src/types/union.md +++ b/src/types/union.md @@ -24,7 +24,7 @@ The memory layout of a `union` is undefined by default (in particular, fields do *not* have to be at offset 0), but the `#[repr(...)]` attribute can be used to fix a layout. -## Union Values +## Union Values r[type.union.value] @@ -37,7 +37,7 @@ A byte `b` in the representation of a struct or union is a value byte if there e > [!NOTE] > A byte in a union is a value byte if it is a value byte in *any* field. -r[type.struct.value.padding] +r[type.union.value.padding] Every byte in an struct or union which is not a value byte is a padding byte. [Enum types][type.enum.value.value-padding], [tuple types][type.tuple.padding], and other types may also have padding bytes. > [!NOTE] From 0bd1163acc62404cc078711cf0977f5c6d1febd0 Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Thu, 12 Dec 2024 15:01:16 -0500 Subject: [PATCH 21/22] I missed some ;( --- src/types/boolean.md | 2 +- src/types/pointer.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/types/boolean.md b/src/types/boolean.md index a50f5c6c9..d1cce5ff0 100644 --- a/src/types/boolean.md +++ b/src/types/boolean.md @@ -21,7 +21,7 @@ r[type.bool.layout] An object with the boolean type has a [size and alignment] of 1 each. r[type.bool.repr] -A `bool` is represented as a single initialized byte with a value of `0x00` corresponding to `false` and a value of `0x01` corresponding to `true`. +A `bool` is represented as a single initialized byte with a value of `0x00` corresponding to `false` and a value of `0x01` corresponding to `true`. > [!NOTE] > No other representations are valid for `bool`. 
Undefined Behaviour occurs when any other byte is read as type `bool`. diff --git a/src/types/pointer.md b/src/types/pointer.md index ba2a670b4..2d1e212d8 100644 --- a/src/types/pointer.md +++ b/src/types/pointer.md @@ -111,7 +111,7 @@ A wide pointer or reference is represented the same as `struct WidePointer{da > [!NOTE] > The `WidePointer` struct has no guarantees about layout, and has the default representation. -> In particular, it is not guaranteed that you can write a struct type with the same layout as `WidePointer`. +> In particular, it is not guaranteed that you can write a struct type with the same layout as `WidePointer`. ## Pointer Provenance From 03aa1fcd3705dcbc2ae1a8a331c41fd445ac101b Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Thu, 12 Dec 2024 15:16:13 -0500 Subject: [PATCH 22/22] Remove "Unique Representation" Note. The definition is not useful and none of the types that noted uniqueness actually upholds that definition if pointer fragments are allowed to be read as integer types. --- src/memory-model.md | 1 - src/types/numeric.md | 4 ---- src/types/textual.md | 3 --- 3 files changed, 8 deletions(-) diff --git a/src/memory-model.md b/src/memory-model.md index 6e7781efa..4b4fb69da 100644 --- a/src/memory-model.md +++ b/src/memory-model.md @@ -51,7 +51,6 @@ A sequence of bytes is said to represent a value of a type, if the decode operat > [!NOTE] > Representation is related to, but is not the same property as, the layout of the type. -> A type has a unique representation when each value is represented by exactly one byte sequence. Most primitive types have unique representations. r[memory.encoding.symmetric] The result of encoding a given value of a type is a sequence of bytes that represents that value.
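The symmetry rule above, together with the `bool` representation, can be modelled in safe code. The sketch below is illustrative only: `encode_bool` and `decode_bool` are hypothetical names, and `decode_bool` returns `None` for the byte values where an actual typed read as `bool` would be undefined behavior rather than a recoverable error.

```rust
/// Hypothetical model of the *encode* operation for `bool`.
fn encode_bool(value: bool) -> u8 {
    // `false` casts to 0 and `true` casts to 1.
    value as u8
}

/// Hypothetical model of the *decode* operation for `bool` as a partial function.
/// `None` marks bytes that do not represent any `bool` value.
fn decode_bool(byte: u8) -> Option<bool> {
    match byte {
        0x00 => Some(false),
        0x01 => Some(true),
        _ => None,
    }
}

fn main() {
    // Encoding a value always yields a byte that decodes back to that value.
    for value in [false, true] {
        assert_eq!(decode_bool(encode_bool(value)), Some(value));
    }
    // The reverse direction is partial: 0x02 represents no `bool` value.
    assert_eq!(decode_bool(0x02), None);
}
```

Modelling decode as `Option` keeps the example in safe Rust. It is a teaching device and does not claim that the language reports invalid bytes this way.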
diff --git a/src/types/numeric.md b/src/types/numeric.md index f46c3d589..79fb887aa 100644 --- a/src/types/numeric.md +++ b/src/types/numeric.md @@ -83,7 +83,6 @@ A value `i` of an unsigned integer type `U` is represented by a sequence of init > [!WARN] > On `little` endian, the order of bytes used to decode an integer type is the same as the natural order of a `u8` array - that is, the `m` value corresponds with the `m` index into a same-sized `u8` array. On `big` endian, however, the order is the opposite of this order - that is, the `m` value corresponds with the `size_of::() - m` index in that array. - r[type.numeric.repr.signed] A value `i` of a signed integer type with width `N` is represented the same as the corresponding value of the unsigned counterpart type which is congruent modulo `2^N`. @@ -98,9 +97,6 @@ A floating-point value is represented by the following decoding: * The byte sequence is decoded as the unsigned integer type with the same width as the floating-point type, * The resulting integer is decoded according to [IEEE 754-2019] into the format used for the type. -> [!NOTE] -> The representation of each finite number and infinity is unique as a result of the definition of [IEEE 754-2019]. -> The exact behaviour of encoding and decoding NaNs is not yet decided r[type.numeric.repr.float-format] The [IEEE 754-2019] `binary32` format is used for `f32`, and the `binary64` format is used for `f64`. The set of values for each floating-point type are determined by the respective format. diff --git a/src/types/textual.md b/src/types/textual.md index 8f0fb3154..85504d748 100644 --- a/src/types/textual.md +++ b/src/types/textual.md @@ -18,9 +18,6 @@ or 0xE000 to 0x10FFFF range. r[type.text.char-repr] A value of type `char` is represented as the value of type `u32` with value equal to the code point that it represents. -> [!NOTE] -> The representation of `char` is unique. 
- r[type.text.str-value] A value of type `str` is represented the same way as `[u8]`, a slice of 8-bit unsigned bytes. However, the Rust standard library makes extra assumptions