Add Values and Representation chapter #1664
base: master
@@ -1,5 +1,66 @@
# Memory model

Rust does not yet have a defined memory model. Various academics and industry professionals
are working on various proposals, but for now, this is an under-defined place
in the language.

r[memory]

The memory model of Rust is incomplete and not fully decided. The following describes some of the details worked out so far.

## Bytes

r[memory.byte]

r[memory.byte.intro]
The most basic unit of memory in Rust is a byte. All values in Rust are computed from zero or more bytes read from an allocation.

> [!NOTE]
> While bytes in Rust are typically lowered to hardware bytes, they may carry additional state,
> such as being uninitialized or storing part of a pointer.

r[memory.byte.contents]
Each byte may have one of the following values:

r[memory.byte.init]
* An initialized byte containing a `u8` value and optional [provenance][type.pointer.provenance].

r[memory.byte.uninit]
* An uninitialized byte.

> [!NOTE]
> Uninitialized bytes do not have a value and do not have a pointer fragment.

> [!NOTE]
> The above list is not yet guaranteed to be exhaustive.

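One way to picture the list above is as a small Rust enum. This is a minimal modelling sketch, not an API: `AbstractByte` and `Provenance` are hypothetical names, and provenance is reduced to an opaque integer purely for illustration, since real provenance cannot be inspected this way.

```rust
/// Hypothetical stand-in for pointer provenance: an opaque allocation ID.
/// Real provenance is not observable like this; this is only a model.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Provenance(u64);

/// Hypothetical model of one byte of Rust memory, following the list above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbstractByte {
    /// An initialized byte: a `u8` value plus optional provenance.
    Init(u8, Option<Provenance>),
    /// An uninitialized byte: it carries no value and no provenance.
    Uninit,
}

impl AbstractByte {
    /// Returns the `u8` value if the byte is initialized.
    fn value(self) -> Option<u8> {
        match self {
            AbstractByte::Init(v, _) => Some(v),
            AbstractByte::Uninit => None,
        }
    }
}

fn main() {
    let plain = AbstractByte::Init(0x2a, None);
    let ptr_part = AbstractByte::Init(0x10, Some(Provenance(7)));
    let hole = AbstractByte::Uninit;

    assert_eq!(plain.value(), Some(0x2a));
    assert_eq!(ptr_part.value(), Some(0x10));
    assert_eq!(hole.value(), None);
}
```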
## Value Encoding

r[memory.encoding]

r[memory.encoding.intro]
Each type in Rust has zero or more values, on which operations can be performed. Values are represented in memory by encoding them as sequences of bytes.

> [!NOTE]
> `0u8`, `1337i16`, and `Foo { bar: "baz" }` are all values.

r[memory.encoding.op]
Each type defines a pair of operations which, together, define the representation of values of the type. The *encode* operation takes a value of the type and converts it into a sequence of bytes equal in length to the size of the type, and the *decode* operation takes such a sequence of bytes and optionally converts it into a value. Encoding occurs when a value is written to memory, and decoding occurs when a value is read from memory.

> [!NOTE]
> Only certain byte sequences may decode into a value of a given type. For example, a byte sequence consisting of all zeroes does not decode to a value of a reference type.

r[memory.encoding.representation]
A sequence of bytes is said to represent a value of a type if the decode operation for that type produces that value from that sequence of bytes. The representation of a type is the partial relation between byte sequences and the values those sequences represent.

Review comment: If we do decide to define decoding as the unique minimum value, we could possibly even reduce this definition to just say, for each type there is a representation function, which is a partial function that decodes a sequence of bytes into a corresponding value. And then we once-and-for-all define the corresponding encode operation across all types.

> [!NOTE]
> Representation is related to, but is not the same property as, the layout of the type.

Review comment: I find this more confusing than helpful. What even exactly is the "layout" of a type, i.e. what is the mathematical type of a "layout"? I am not sure what you are trying to achieve with this sentence. Here's my attempt at rewording:

Review comment: I'm attempting to explain something to less mathematically-focused consumers of the reference. This is a non-normative note and therefore serves to explain or exemplify something about the preceding paragraph.

Review comment: What I wrote wasn't even very mathematical. ;) I find your current note confusing. I don't know what you are trying to say, so I made a guess. I am also fine with removing it entirely, but if you want to keep it then please explain which point you are trying to make. There's no point in keeping a note which is confusing to an expert; it will likely confuse many novices as well.

r[memory.encoding.symmetric]
The result of encoding a given value of a type is a sequence of bytes that represents that value.

> [!NOTE]
> This means that a value can be written to memory and read back, and the result is the same value.
> The reverse is not necessarily true: a sequence of bytes read as a value and then written to another location (called a typed copy) will not necessarily yield the same sequence of bytes. For example, a typed copy of a struct type will leave the padding bytes of that struct uninitialized.

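For a primitive integer type, the standard `to_ne_bytes` and `from_ne_bytes` methods produce and consume the byte sequence in the platform's native byte order, so they give a concrete, if narrow, illustration of the round-trip property. The helper `roundtrip_u32` is a hypothetical name used only for this sketch.

```rust
use std::mem::size_of;

/// Encodes `value` into its native-endian bytes, then decodes them again.
fn roundtrip_u32(value: u32) -> u32 {
    let bytes: [u8; 4] = value.to_ne_bytes(); // encode
    u32::from_ne_bytes(bytes) // decode
}

fn main() {
    let value: u32 = 0xDEAD_BEEF;

    // The encoded byte sequence is as long as the size of the type.
    assert_eq!(value.to_ne_bytes().len(), size_of::<u32>());

    // Encoding and then decoding yields the same value.
    assert_eq!(roundtrip_u32(value), value);

    // For `u32`, every sequence of 4 initialized bytes decodes to some value,
    // so decoding followed by encoding also round-trips for this type.
    let arbitrary = [1u8, 2, 3, 4];
    assert_eq!(u32::from_ne_bytes(arbitrary).to_ne_bytes(), arbitrary);
}
```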
r[memory.encoding.decode]
If a value of type `T` is decoded from a sequence of bytes that does not represent any value of `T`, the behavior is undefined.

> [!NOTE]
> For example, it is undefined behavior to read a `0x02` byte as `bool`.

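A safe way to respect this rule is to validate a byte before treating it as a `bool`. The hypothetical `decode_bool` function below models the decode operation for `bool` as a partial function: only `0x00` and `0x01` represent `bool` values (`false` and `true`), and every other byte yields `None` rather than being read as a `bool`.

```rust
/// Hypothetical model of decoding a single byte as `bool`.
/// Returns `None` for bytes that do not represent any `bool` value.
fn decode_bool(byte: u8) -> Option<bool> {
    match byte {
        0x00 => Some(false),
        0x01 => Some(true),
        // e.g. 0x02: reading this byte as a `bool` would be undefined behavior
        _ => None,
    }
}

fn main() {
    assert_eq!(decode_bool(0x00), Some(false));
    assert_eq!(decode_bool(0x01), Some(true));
    assert_eq!(decode_bool(0x02), None);

    // The encoding of each `bool` value decodes back to that value.
    for b in [false, true] {
        assert_eq!(decode_bool(b as u8), Some(b));
    }
}
```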
@@ -14,14 +14,31 @@ unit-like struct.
r[type.enum.constructor]
New instances of an `enum` can be constructed with a [struct expression].

r[type.enum.value]
Any `enum` value consumes as much memory as the largest variant for its
corresponding `enum` type, as well as the size needed to store a discriminant.

r[type.enum.name]
Enum types cannot be denoted *structurally* as types, but must be denoted by
named reference to an [`enum` item].

## Enum values and representation

r[type.enum.value]

r[type.enum.value.intro]
An enum value corresponds to exactly one variant of the enum, and consists of the fields of that variant.

> [!NOTE]
> An enum with no variants therefore has no values.

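As a sketch of what having no values means in practice, assuming a hypothetical variant-less enum `Void`: a `match` on such a value needs no arms, and a `Result<T, Void>` can only ever hold `Ok`.

```rust
/// An enum with no variants: no value of this type can ever exist.
enum Void {}

/// Because `Void` has no values, a match on it needs no arms,
/// and this function can never actually be called.
fn absurd(v: Void) -> ! {
    match v {}
}

fn main() {
    // A `Result<u32, Void>` can only ever be `Ok`, since no `Void` exists.
    let r: Result<u32, Void> = Ok(5);
    let n = match r {
        Ok(n) => n,
        Err(v) => absurd(v),
    };
    assert_eq!(n, 5);
}
```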
r[type.enum.value.value-padding]
A byte is a [padding][type.union.value.padding] byte of an enum if that byte is not part of the representation of the discriminant of the enum, and in each variant it either:
* Does not overlap with a field of the variant, or
* Overlaps with a padding byte in a field of that variant.

r[type.enum.value.repr]
The representation of a value of an enum type includes the representation of each field of the variant at the appropriate offsets. When encoding a value of an enum type, each byte which is not used to store a field of the variant or the discriminant is uninitialized. In the case of a [`repr(C)`][layout.repr.c.adt] or a [primitive-repr][layout.repr.primitive.adt] enum, the discriminant of the variant is represented as though by the appropriate integer type stored at offset 0.

> [!NOTE]
> Most `repr(Rust)` enums will also store a discriminant in the representation of the enum, but the exact placement or type of the discriminant is unspecified, as is the value that represents each variant.

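For a primitive-repr enum, the rule above means that the discriminant can be read from the first byte of a value. The sketch below uses the pointer-cast technique that the Reference describes for primitive representations; the `Shape` enum and its `discriminant` method are hypothetical names chosen for illustration. The same cast would not be meaningful for a `repr(Rust)` enum, whose discriminant placement is unspecified.

```rust
/// A primitive-representation enum: the discriminant is stored as a `u8`
/// at offset 0 of every value, and the variant's fields follow it.
#[repr(u8)]
#[allow(dead_code)] // the fields only exist to give the variants payloads
enum Shape {
    Empty = 1,
    Circle(u16) = 2,
    Rect { w: u16, h: u16 } = 3,
}

impl Shape {
    /// Reads the discriminant by casting a pointer to the enum into a
    /// pointer to its leading `u8` tag.
    fn discriminant(&self) -> u8 {
        // SAFETY: for a `#[repr(u8)]` enum, the first byte of every value
        // is the initialized `u8` discriminant.
        unsafe { *(self as *const Self as *const u8) }
    }
}

fn main() {
    assert_eq!(Shape::Empty.discriminant(), 1);
    assert_eq!(Shape::Circle(10).discriminant(), 2);
    assert_eq!(Shape::Rect { w: 3, h: 4 }.discriminant(), 3);
}
```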
[^enumtype]: The `enum` type is analogous to a `data` constructor declaration in
Haskell, or a *pick ADT* in Limbo.

@@ -62,6 +62,12 @@ let bo: Binop = add;
x = bo(5,7);
```

r[type.fn-pointer.value]
A value of a function pointer type consists of a non-null address. A function pointer value is represented the same as its address would be when represented as a value of the unsigned integer type with the same width as the function pointer.

> [!NOTE]
> Whether or not a function pointer value has provenance, and whether or not this provenance is represented as pointer fragments, is not yet decided.

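A small sketch of the non-null guarantee, using only stable standard-library behavior: casting a function pointer to `usize` yields its address, and because null never represents a function pointer, `Option<fn(...)>` has the same size as the bare function pointer. The example deliberately says nothing about provenance, which the note leaves undecided; `add` is a hypothetical function.

```rust
use std::mem::size_of;

fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    let f: fn(i32, i32) -> i32 = add;

    // The address carried by a function pointer is never null.
    let addr = f as usize;
    assert_ne!(addr, 0);

    // Since null never represents a function pointer, `Option` can use the
    // null bit pattern for `None`, so wrapping adds no space.
    assert_eq!(
        size_of::<Option<fn(i32, i32) -> i32>>(),
        size_of::<fn(i32, i32) -> i32>()
    );

    assert_eq!(f(5, 7), 12);
}
```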
## Attributes on function pointer parameters

r[type.fn-pointer.attributes]

@@ -59,9 +59,46 @@ r[type.numeric.int.size.minimum]
> pointer support is limited and may require explicit care and acknowledgment
> from a library to support.

## Bit validity
## Representation

r[type.numeric.validity]
r[type.numeric.repr]

For every numeric type, `T`, the bit validity of `T` is equivalent to the bit
validity of `[u8; size_of::<T>()]`. An uninitialized byte is not a valid `u8`.

r[type.numeric.repr.integer]
Each value of an integer type is a mathematical integer. For unsigned types, this is a positive integer or `0`. For signed types, this can be a positive integer, a negative integer, or `0`.

r[type.numeric.repr.integer-width]
The range of values an integer type can represent depends on its signedness and its width, in bits. The width of type `uN` or `iN` is `N`. The width of type `usize` or `isize` is the value of the `target_pointer_width` configuration option.

Review comment: Shouldn't this say how it depends on that? Specifically, unsigned integers carry values in

> [!NOTE]
> There are exactly `1<<N` unique values of an integer type of width `N`.
> In particular, for an unsigned type, these values are in the range `0..(1<<N)`, and for a signed type, they are in the range `-(1<<(N-1))..(1<<(N-1))`, using Rust's half-open range syntax.

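The ranges in the note can be checked against the standard `MIN`/`MAX` constants for width `N = 8`; the same reasoning applies to every width. The `usize` checks are guarded by `cfg` and only assert on 32- and 64-bit targets.

```rust
fn main() {
    // Unsigned width N: values 0..(1 << N), here with N = 8.
    assert_eq!(u8::MIN as u32, 0);
    assert_eq!(u8::MAX as u32, (1u32 << 8) - 1);

    // Signed width N: values -(1 << (N - 1))..(1 << (N - 1)), here with N = 8.
    assert_eq!(i8::MIN as i32, -(1i32 << 7));
    assert_eq!(i8::MAX as i32, (1i32 << 7) - 1);

    // Both ranges contain exactly 1 << N values.
    let unsigned_count = (u8::MAX as i32) - (u8::MIN as i32) + 1;
    let signed_count = (i8::MAX as i32) - (i8::MIN as i32) + 1;
    assert_eq!(unsigned_count, 1 << 8);
    assert_eq!(signed_count, 1 << 8);

    // The width of `usize` matches `target_pointer_width`.
    #[cfg(target_pointer_width = "64")]
    assert_eq!(usize::BITS, 64);
    #[cfg(target_pointer_width = "32")]
    assert_eq!(usize::BITS, 32);
}
```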
r[type.numeric.repr.unsigned]
A value `i` of an unsigned integer type `U` is represented by a sequence of initialized bytes, where the `m`th successive byte according to the byte order of the platform is `(i >> (m*8)) as u8`, for each `m` from `0` up to, but not including, the size of `U`. None of the bytes produced by encoding an unsigned integer has a pointer fragment.

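The formula can be checked directly for `u32`. The hypothetical helper `bytes_by_significance` lists the bytes with `m = 0` first; on a little-endian target that is also the order in memory, which the standard `to_le_bytes` method confirms independently of the host's byte order.

```rust
use std::mem::size_of;

/// Computes the bytes of `i` in significance order, following the rule above:
/// byte `m` is `(i >> (m * 8)) as u8`, for `m` in `0..size_of::<u32>()`.
fn bytes_by_significance(i: u32) -> [u8; 4] {
    let mut out = [0u8; size_of::<u32>()];
    for (m, byte) in out.iter_mut().enumerate() {
        *byte = (i >> (m * 8)) as u8;
    }
    out
}

fn main() {
    let i: u32 = 0x1234_5678;
    let by_sig = bytes_by_significance(i);
    assert_eq!(by_sig, [0x78, 0x56, 0x34, 0x12]);

    // Little-endian storage uses exactly this order.
    assert_eq!(i.to_le_bytes(), by_sig);
}
```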
> [!NOTE]
> The two primary byte orders are `little` endian, where successive bytes (starting with the least significant, `m = 0`) are stored from the lowest memory address to the highest, and `big` endian, where they are stored from the highest memory address to the lowest.
> The `target_endian` configuration option indicates the byte order of the target.

> [!WARNING]
> On `little` endian, the order of bytes used to decode an integer type is the same as the natural order of a `u8` array: the `m` value corresponds with the `m` index into a same-sized `u8` array. On `big` endian, however, the order is reversed: the `m` value corresponds with the `size_of::<T>() - 1 - m` index in that array.

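The index mapping can be expressed with `cfg!(target_endian = "little")`: on little endian the byte with significance `m` sits at index `m` of the native-order array, and on big endian at index `size_of::<T>() - 1 - m`. `native_index` is a hypothetical helper for `u32`; the assertion holds on either byte order because it compares against the platform's own `to_ne_bytes` output.

```rust
use std::mem::size_of;

/// Returns the index into the native-endian byte array of a `u32` that holds
/// the `m`-th byte in significance order (`m = 0` is the least significant).
fn native_index(m: usize) -> usize {
    if cfg!(target_endian = "little") {
        m
    } else {
        // Big endian: the order is reversed, and indices stop at size - 1.
        size_of::<u32>() - 1 - m
    }
}

fn main() {
    let i: u32 = 0x1234_5678;
    let native = i.to_ne_bytes();
    for m in 0..size_of::<u32>() {
        assert_eq!(native[native_index(m)], (i >> (m * 8)) as u8);
    }
}
```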
r[type.numeric.repr.signed]
A value `i` of a signed integer type with width `N` is represented the same as the unique value of the unsigned counterpart type that is congruent to `i` modulo `2^N`.

> [!NOTE]
> This encoding of signed integers is known as two's complement.

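The congruence rule can be checked exhaustively for `i16` with `rem_euclid`, which reduces `i` into the range `0..2^16`: the result is the unsigned counterpart value, and its bytes match those of the signed value.

```rust
fn main() {
    // -1 in i8 is congruent to 255 modulo 2^8, so it shares 255u8's bytes.
    assert_eq!((-1i8).to_ne_bytes(), 255u8.to_ne_bytes());

    // The same holds for the most negative value: -128 is congruent to 128.
    assert_eq!(i8::MIN.to_ne_bytes(), 128u8.to_ne_bytes());

    // In general, the unsigned counterpart value is `i` reduced modulo 2^N.
    for i in i16::MIN..=i16::MAX {
        let congruent = (i as i32).rem_euclid(1 << 16) as u16;
        assert_eq!(i.to_ne_bytes(), congruent.to_ne_bytes());
        assert_eq!(i as u16, congruent);
    }
}
```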
r[type.numeric.repr.float-width]
Each floating-point type has a width. The type `fN` has a width of `N`.

r[type.numeric.repr.float]
A sequence of bytes is decoded as a floating-point value as follows:
* The byte sequence is decoded as the unsigned integer type with the same width as the floating-point type.
* The bits of the resulting integer are interpreted according to [IEEE 754-2019] in the format used for the type.

r[type.numeric.repr.float-format]
The [IEEE 754-2019] `binary32` format is used for `f32`, and the `binary64` format is used for `f64`. The set of values for each floating-point type is determined by the respective format.

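The two decoding steps correspond to the standard `from_bits`/`to_bits` methods on `f32` and `f64`, which reinterpret an unsigned integer of the same width as an IEEE 754 value. The sketch below decodes native-order bytes both ways and spot-checks the well-known `binary32` and `binary64` encodings of `1.0`.

```rust
fn main() {
    // binary32: 1.0 has sign 0, biased exponent 127, fraction 0.
    assert_eq!(1.0f32.to_bits(), 0x3F80_0000);
    // binary64: 1.0 has sign 0, biased exponent 1023, fraction 0.
    assert_eq!(1.0f64.to_bits(), 0x3FF0_0000_0000_0000);

    // Decoding bytes as f32: decode them as a u32, then interpret the bits.
    let bytes = (-2.5f32).to_ne_bytes();
    let via_int = f32::from_bits(u32::from_ne_bytes(bytes));
    assert_eq!(via_int, f32::from_ne_bytes(bytes));
    assert_eq!(via_int, -2.5);

    // Distinct bit patterns can still compare equal: 0.0 and -0.0.
    assert_ne!(0.0f32.to_bits(), (-0.0f32).to_bits());
    assert_eq!(0.0f32, -0.0f32);
}
```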
[IEEE 754-2019]: https://ieeexplore.ieee.org/document/8766229

@@ -29,6 +29,17 @@ A _unit-like struct_ type is like a struct type, except that it has no fields.
The one value constructed by the associated [struct expression] is the only
value that inhabits such a type.

## Struct

r[type.struct.value]

Review comment: This should say somewhere that a value of struct type is a list of values with one value for each field of the struct.

r[type.struct.value.intro]
A value of a struct type consists of a list of values, one for each field of the struct.

r[type.struct.value.encode-decode]
When a value of a struct type is encoded, each field of the struct is encoded at its corresponding offset, and each byte that is not within a field of the struct is set to uninitialized.
When a value of a struct type is decoded, each field of the struct is decoded from its corresponding offset. Each byte that is not within a field of the struct is ignored.

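Padding bytes can be located with `std::mem::offset_of!` (stable since Rust 1.77). The sketch uses a hypothetical `#[repr(C)]` struct `Header` so that field order and offsets are predictable; the exact padding assumes `u32` has 4-byte alignment, which holds on common targets but is platform-specific, so the concrete numbers are only checked in that case.

```rust
use std::mem::{align_of, offset_of, size_of};

/// `repr(C)` fixes the field order, so the padding position is predictable.
#[repr(C)]
struct Header {
    tag: u8,  // always at offset 0
    len: u32, // placed at the next multiple of `u32`'s alignment
}

fn main() {
    assert_eq!(offset_of!(Header, tag), 0);

    let len_offset = offset_of!(Header, len);
    assert_eq!(len_offset % align_of::<u32>(), 0);

    // Bytes not covered by any field are padding: encoding leaves them
    // uninitialized, and decoding ignores them.
    let padding = size_of::<Header>() - size_of::<u8>() - size_of::<u32>();

    // With 4-byte-aligned `u32`, bytes 1..4 are padding.
    if align_of::<u32>() == 4 {
        assert_eq!((len_offset, size_of::<Header>(), padding), (4, 8, 3));
    }

    let h = Header { tag: 1, len: 42 };
    assert_eq!((h.tag, h.len), (1, 42));
}
```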
[^structtype]: `struct` types are analogous to `struct` types in C, the
*record* types of the ML family, or the *struct* types of the Lisp family.

Review comment: Is "value" defined anywhere?

Review comment: Values are defined constructively, with each "class" of types. As mentioned in another comment, these are present in different chapters at the request of T-lang-doc and T-spec.