Support UUIDv6, UUIDv7, and UUIDv8 from RFC 9562 #89083
Three new types of UUIDs have been proposed in the latest draft of the next version of RFC4122. Full text of that draft is in [1] (published 21 April 2021; draft period ends 21 Oct 2021). Support for these should be included in uuid.py for Python 3.11, with backport for 3.9 and 3.10. The timetable for Python 3.11 should fit with the end of the IETF draft period. Implementation should be similar to the existing UUID classes in uuid.py, the prototypes in [2], or even parts of my own uuid6 version [3]. [1] https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format |
It is a new feature, and we usually do not backport new features to old Python versions, so it can only be included in Python 3.11 (backports can be provided by third-party libraries). Do you want to create a PR? |
Is there anyone currently working on this? If not I'd like to have a look at implementing this. |
Note: the spec for UUIDv6 - UUIDv8 is still a draft and is still being revised: Therefore, it is too early to add this to the Python standard library. |
UUIDv6, UUIDv7, and UUIDv8 are now in a standards-track RFC: |
I'll make a PR for this (I'm interested in those versions). |
FYI - there are PyPI packages from people in the community attempting to come up with ways to use UUID v6-8 today:
What we'd be seeking to do within the stdlib is settle upon how these should fit as features into the standard library's existing |
Actually, I first tried an implementation based on those packages, but after reading the RFC again I wondered: "what is the best course of action for the standard library?" I therefore picked the (only) possible variant of v6 for which the implementation is RFC-compliant (and then I hit the issue with the fields...). For v7 and v8, I first took the generic one (and made an alternative for v7 using monotonicity, as specified in the RFC alternatives). I did not decide anything on v8 since a discussion should happen first.
Note that oittaa's v7 is more or less like #120650 (non-monotonic sub-second v7) since it follows the basic RFC, but Simmons' v7 seems to follow the alternative (Method 3) combined with Method 1, §6.2 (Fixed Bit-Length Dedicated Counter), whereas #120830 is Method 3 combined with Method 2, §6.2 (Monotonic Random). I say "seems to" because it is not really clear whether the RFC allows mixing Method 1 and Method 3 (Method 1 forces the counter to immediately follow the 48-bit timestamp part, but Method 3 says that the sub-second precision should be at that place, so...). Method 2 explicitly tells me that I need to use the last 62 bits to make whatever I need, so it is closer to RFC compliance.
Actually, there are more prototypes that I found last week: https://github.com/uuid6/prototypes, and they differ in their implementations of v7 and v8... For v6, the implementation is decided by the RFC, so we don't need to bother with a discussion, just the other issue on the fields. For v7/v8, do you think we need a Discourse thread (different from https://discuss.python.org/t/add-uuid7-in-uuid-module-in-standard-library/44390/7) and perhaps a PEP?
There's also https://github.com/uuid-rs/uuid, which uses the same techniques that I presented in the first PR (namely, UUIDv7 has 80-bit security and UUIDv8 has custom chunks). |
I've opened https://discuss.python.org/t/rfc-4122-9562-uuid-version-7-and-8-implementation/56725 to discuss the implementations more in detail. |
Just to provide a little more information about what I've discovered, in case you go the C-implementation route: I created a small uuid7 C function based on RFC Method 3 (Replace Leftmost Random Bits with Increased Clock Precision) with nanosecond precision, which otherwise works a bit like the functions in util-linux libuuid, because I wanted to benchmark it across a few systems. Modern Linux systems provide very nicely distributed UUIDs with their nanosecond-precision system timers even with this relatively simple function, but I noticed that at least on macOS the system timer provides only microsecond precision, which basically "wastes" more than a byte of randomness in the "normal" use case where the system isn't generating millions of UUIDs every second. Even though the code for Method 1 (Fixed Bit-Length Dedicated Counter, leftmost counter bit initialized as zero) is more complicated, it should be more robust across the board, from embedded systems to high-powered servers, and it basically only "wastes" one bit by setting it to zero. If you really want to make complicated code, you could detect the capabilities of the system. :)
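To make the timer-precision point concrete, here is a minimal Python sketch that reports what the local system clock advertises and what it actually delivers. The helper name `clock_report` is made up for illustration; only `time.get_clock_info` and `time.time_ns` are real standard-library calls, and the numbers will differ per platform.

```python
import time


def clock_report(name="time"):
    """Return (advertised resolution in seconds, smallest observed tick in ns).

    Hypothetical helper: it only illustrates how much sub-millisecond
    precision a Method 3 style UUIDv7 could really use on this machine.
    """
    info = time.get_clock_info(name)
    smallest = None
    prev = time.time_ns()
    for _ in range(100_000):
        now = time.time_ns()
        step = now - prev
        # Ignore repeated readings and any backwards clock adjustment.
        if step > 0 and (smallest is None or step < smallest):
            smallest = step
        prev = now
    return info.resolution, smallest


if __name__ == "__main__":
    resolution, tick = clock_report()
    print(f"advertised resolution: {resolution} s, smallest observed tick: {tick} ns")
```

If the smallest observed tick is around 1000 ns, the low bits of a nanosecond timestamp carry no information, which is the "wasted" randomness described above.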
@picnixz, RFC 9562 author here. My vision for an update to the UUID standard library was as follows:
I am sure I am forgetting something, but this is what I can think of off the top of my head as it pertains to the content of RFC 9562 and this standard library. I will edit if I think of anything else. I am happy to take a stab at PRs for the different sections of this checklist so there are not tons of changes to review. Some of these are easier than others. Edit1: cited the errata for the test vector. The final text UUID is okay; it's the binary that does not line up. Edit2: The description for |
@kyzer-davis Thank you for your reply! it helps a lot and I get a better idea of what the RFC's authors were thinking about now. When you edit your post, please make it a new comment and ping me instead so that I get the notification.
This should be addressed in #120878 and after implementing the different versions I think.
This is #120650.
IMO, this may or may not be an optional feature that we would add after v6 is implemented. We will need a separate issue.
I like this approach but this would make the size of the module blow up. I'm not sure whether we want to expose every possible way to create a UUID in the standard library. For now, I think we can have one version, and more versions could be added in separate PRs, which are easier to review. In addition, we could have four separate functions if this is required, or a class acting as a factory.
This is also a follow-up PR and a separate feature. We should be able to do it for other versions as well, though the functionality is somewhat exposed (partially and incorrectly for newer versions) by the current UUID class. As such, I think we should discuss this in #120878 or in a separate issue as well (but not before we have the implementations of v6, v7 and v8).
I forgot about this PR. I think I have one ready (it's really easy to make one though). By the way, anyone should be able to generate v8, not just an "admin". See #123224.
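For readers wondering what a generic v8 amounts to, here is a minimal sketch assuming the RFC 9562 layout of three custom fields (48, 12 and 62 bits) around the version and variant bits. The function name `sketch_uuid8` and its parameter names are illustrative, not the API of any PR.

```python
import uuid


def sketch_uuid8(custom_a: int, custom_b: int, custom_c: int) -> uuid.UUID:
    """Pack three caller-chosen fields into a UUIDv8 (illustrative only)."""
    if not 0 <= custom_a < (1 << 48):
        raise ValueError("custom_a must fit in 48 bits")
    if not 0 <= custom_b < (1 << 12):
        raise ValueError("custom_b must fit in 12 bits")
    if not 0 <= custom_c < (1 << 62):
        raise ValueError("custom_c must fit in 62 bits")
    value = (
        (custom_a << 80)    # bits 127..80
        | (0x8 << 76)       # version nibble = 8
        | (custom_b << 64)  # bits 75..64
        | (0b10 << 62)      # RFC variant bits
        | custom_c          # bits 61..0
    )
    # The version bits are set by hand because older uuid.UUID constructors
    # reject version=8 as an argument.
    return uuid.UUID(int=value)


print(sketch_uuid8(0x123456789ABC, 0xDEF, 0x0123456789ABCDEF))
```

Nothing here requires special privileges or system state, which is why any caller can produce a v8 value.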
I think the standard library is responsible for providing something standard. The appendix gives the possibility to use sha256 but IIRC, that's only an example. Instead, we could allow users to specify the hash function they want, with a minimum number of output bytes that we would truncate (we could also construct a PRF out of any hash function using the same techniques as in TLS 1.2 and HMAC to construct a digest of suitable size, but it might be an overkill).
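A rough sketch of the "let the user pick the hash" idea, assuming the construction is analogous to v5: hash the namespace bytes followed by the name, keep the first 128 bits, then overwrite the version and variant bits. The function name and the `hash_name` parameter are hypothetical, and only fixed-length `hashlib` algorithms are handled.

```python
import hashlib
import uuid


def sketch_uuid8_from_hash(namespace: uuid.UUID, name: str,
                           hash_name: str = "sha256") -> uuid.UUID:
    """Name-based UUIDv8 from a caller-chosen hash (illustrative only)."""
    digest = hashlib.new(hash_name, namespace.bytes + name.encode("utf-8")).digest()
    if len(digest) < 16:
        raise ValueError("hash output must be at least 128 bits")
    value = int.from_bytes(digest[:16], "big")  # truncate to 128 bits
    value = (value & ~(0xF << 76)) | (0x8 << 76)  # force version 8
    value = (value & ~(0x3 << 62)) | (0x2 << 62)  # force RFC variant
    return uuid.UUID(int=value)


print(sketch_uuid8_from_hash(uuid.NAMESPACE_DNS, "www.example.com"))
```

Two parties only get the same UUID if they agree on the hash out of band, which is one argument for the standard library fixing a single algorithm instead.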
This is a three-line function or it can be made as an
Yes, I'll add them in the current PRs. They are indeed important enough.
What would UUID.NIL represent? and UUID.MAX? If they are documented in the RFC (that I don't know by heart), then those are easy to add, though I don't know why we would add them. This should be a separate feature request by the way.
AFAIK, 9562 is an extension of 4122 so v1, v3, v4, and v5 should be ok if they conform to 4122. If this is not the case, then we've been having issues for the last decades that no one ever spotted. But we have tests that should have detected them (or maybe not?). Synchronization issues might happen but we're probably very unlucky if we ever hit them (or if someone noticed it). If anyone wants to check this, they can do it, but I'd prefer them to only open an issue once it's been confirmed that there is an issue. |
Thanks, I found your PRs after I submitted my writeup! (I will comment on #120878 rather than here)
Will do. But generally I agree. Each item I mention should be a PR. Some come before others but they all have some benefit. I will address a few of the comments to expand on them: Support all v7 counter methods:
I don't think it will add that much overhead. All Methods 1-3:
Fixed Bit-Length Dedicated Counter (Method 1):
Monotonic Random (Method 2):
Replace Leftmost Random Bits with Increased Clock Precision (Method 3):
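As a rough illustration of how small one of these methods can be, here is a sketch of Method 2 (Monotonic Random), assuming the 74 random bits are treated as one counter that is reseeded on each new millisecond and bumped by a random positive amount otherwise. The function name, the module-level state and the 32-bit increment size are my own choices, not taken from any PR.

```python
import secrets
import threading
import time
import uuid

_lock = threading.Lock()
_last_ms = -1
_last_rand = 0


def sketch_uuid7_monotonic_random() -> uuid.UUID:
    """UUIDv7 where the random part doubles as a counter (illustrative only)."""
    global _last_ms, _last_rand
    with _lock:
        now_ms = time.time_ns() // 1_000_000
        if now_ms > _last_ms:
            # New millisecond tick: start from fresh random bits.
            _last_ms = now_ms
            _last_rand = secrets.randbits(74)
        else:
            # Same tick (or clock went backwards): add a random positive step.
            _last_rand += 1 + secrets.randbits(32)
            if _last_rand >= 1 << 74:
                # Overflow guard: borrow the next millisecond and reseed.
                _last_ms += 1
                _last_rand = secrets.randbits(74)
        ms, rand = _last_ms, _last_rand
    rand_a = rand >> 62               # upper 12 bits
    rand_b = rand & ((1 << 62) - 1)   # lower 62 bits
    value = (ms << 80) | (0x7 << 76) | (rand_a << 64) | (0x2 << 62) | rand_b
    return uuid.UUID(int=value)
```

Values generated in one process come out strictly increasing, at the cost of a lock and two pieces of shared state.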
v8
Yeah, that is what I meant. Admins==Users in my world.
That's true. This specific example, while illustrative, was an ask of many folks who want to use v5 but for one reason or another can't use sha1. The request came so late into the spec writing that we didn't get to give it a proper version number. This was a simple way of providing some extensibility to v5 via the v8 space. With the level of chatter around it that I saw while authoring the document, I think folks will find it useful if there was a way to generate a sha256 name-based UUIDv8 using that specific example. Maybe we could label it Note: I wrote that example by actually modifying uuidv5 in But I tend to agree that v8 name-based (using whatever algo) is probably overkill. If we create the v8 generic function, whoever creates that library can piggyback off our
We have that here for general checks: For counter rollover (shouldn't happen with the methods built-in guards but maybe it does) the guidance is here:
https://www.rfc-editor.org/rfc/rfc9562.html#name-nil-uuid https://www.rfc-editor.org/rfc/rfc9562.html#name-max-uuid
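For reference, both special values can already be built with the existing `uuid.UUID` constructor: the Nil UUID is all 128 bits zero and the Max UUID is all 128 bits one. The constant names below are illustrative only.

```python
import uuid

# All 128 bits cleared / all 128 bits set, per the two RFC sections linked above.
NIL_UUID = uuid.UUID(int=0)
MAX_UUID = uuid.UUID(int=(1 << 128) - 1)

print(NIL_UUID)  # 00000000-0000-0000-0000-000000000000
print(MAX_UUID)  # ffffffff-ffff-ffff-ffff-ffffffffffff
```

Neither value carries a meaningful version, so `version` is `None` for both.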
We did make one small change to v1 (and v6), which is that we now encourage random nodes over MAC-based nodes.
The default for this lib is if IMO we should add a new v1/v6 flag that is like Note: I do not think we need to add the MAC address randomization techniques [IEEE802.11bh] to this logic. More on nodes: there is lots of logic added for other types of IEEE 802 node derivation (can be handled in some other PR or not at all. Just guidance for what to do if we obtain these values and how to convert them into a node ID) Lastly, I looked at my notes and I did check v3/v5 when performing various checks during the authoring. |
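Until a dedicated flag exists, a caller can opt into a random node through the existing `node` parameter of `uuid.uuid1`. A minimal sketch, assuming the usual convention of setting the multicast bit (the least significant bit of the first octet) so the value cannot collide with a real MAC address; `random_node_id` is a made-up helper name.

```python
import secrets
import uuid


def random_node_id() -> int:
    """48 random bits with the multicast bit set (illustrative helper)."""
    # Bit 40 of the 48-bit integer is the least significant bit of octet 0.
    return secrets.randbits(48) | (1 << 40)


node = random_node_id()
u = uuid.uuid1(node=node)
print(u, hex(u.node))
```

Generating the node once per process and reusing it keeps the UUIDs from that process grouped under one node value.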
Updated my original comment with a link to the errata that change a binary bit from a test vector. |
For synchronization and lock mechanism, there is actually one for v1, but a bit obscured. When an implementation in C exists (using unix-tools underneath), it can call a separate process, if it is running on the system, called This method is for now not present for UUID v6, as the UUID v7 would not need such treatment, as each process should generate a different random value, which is used in v7 instead of a mac address. |
Although fault tolerance requires that each microservice writes to its own database tables, in practice this requirement is often violated. The implementation of UUIDv7 for PostgreSQL had to switch from Method 1 to Method 3 (Increased Clock Precision with a 12-bit sub-millisecond timestamp fraction) to synchronize the UUIDv7s generated by different microservices for the same database table. This turned out to be simpler than the autoincrement-like analogue. See the C implementation v27-0001-Implement-UUID-v7.patch of Method 3 at the page as a reference. The entire timestamp acts as a counter in the rare case when more than about 4 identifiers per microsecond are generated. This implementation also added the ability to offset the timestamp by a specified interval to hide the record creation time for information security. If the offset would cause the timestamp to be outside the allowed range, it should not be applied. It would be nice to add such a special UUIDv7 function for microservices. |
Edited my original comment to feature a note about |
From my understanding of the v7 spec, it's possible to have both sub-millisecond precision AND a counter. Also, the sub-millisecond part DOESN'T need to be at least 12 bits; that is only recommended since it's the size of the I was planning on doing my own implementation (I can share it if you are interested in including it), and my idea was to have a |
When using submillisecond precision, I advise you to use the whole timestamp as a counter if the timestamps for consecutive UUIDs have not increased. This will eliminate the need for a separate counter, and therefore it will be possible to preserve a sufficiently long random segment (at least 32 bits) to make attacks difficult by sequential brute force of UUID values. As for the length of the submillisecond part, it is necessary to take into account not only the available precision of time sources in operating systems, but also the maximum performance of recording in the DBMS, which does not require nanosecond accuracy. |
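The advice above can be sketched in a few lines of Python, assuming a 12-bit sub-millisecond fraction stored right after the version nibble (Method 3) and 62 random bits after the variant. The function name and the module-level state are illustrative; this is only loosely modeled on the description in this thread, not on the PostgreSQL patch itself.

```python
import secrets
import threading
import time
import uuid

_lock = threading.Lock()
_last_ts60 = -1  # 48-bit milliseconds followed by a 12-bit fraction


def sketch_uuid7_submillisecond() -> uuid.UUID:
    """UUIDv7 whose 60-bit timestamp doubles as the counter (illustrative)."""
    global _last_ts60
    with _lock:
        ms, rem_ns = divmod(time.time_ns(), 1_000_000)
        fraction = (rem_ns * 4096) // 1_000_000  # scale 0..999_999 ns to 0..4095
        ts60 = (ms << 12) | fraction
        if ts60 <= _last_ts60:
            # Clock did not advance: step the whole timestamp instead of
            # keeping a separate counter.
            ts60 = _last_ts60 + 1
        _last_ts60 = ts60
    unix_ts_ms = ts60 >> 12
    sub_ms = ts60 & 0xFFF
    value = (
        (unix_ts_ms << 80)
        | (0x7 << 76)
        | (sub_ms << 64)
        | (0x2 << 62)
        | secrets.randbits(62)  # the random segment stays well above 32 bits
    )
    return uuid.UUID(int=value)
```

Each 12-bit step is roughly a quarter of a microsecond, which matches the "about 4 identifiers per microsecond" figure mentioned earlier in the thread.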
You're right that the RFC does not give a lower bound but it gives an upper bound (emphasis mine):
That being said, I chose a non-modular implementation of v7 for the standard library as a first draft. It's always possible to make it extensible in the future, but I think the standard library should propose one way (if you want more, then I think 3rd-party libs should be used). The issue with specifying a flexible counter bit length is that we need to keep track of multiple global variables (for instance, all UUIDv7 objects generated with a counter of, say, 7 bits will have their own global timestamp and counter for synchronization). This is quite easy since you would create a dictionary entry with its state, each entry being a possible configuration. But I don't really want to go there. What could be done, however, is to create a factory of UUID factories. In other words, you specify a configuration for your UUIDv7 algorithm and you get an object that creates UUIDv7 objects according to that specific configuration. The factory object would have a single method, namely This would probably be the easiest way to have a flexible UUIDv7 implementation included in the standard library. The standard library would however expose by default the UUIDv7 implementation using the RFC recommended methods. |
My comments were about making the v7 implementation follow the spec, but I'm OK with having base support with just the 48 timestamp bits and no counters or already-enabled opt-in features, as long as the API is designed to allow adding and enabling those features in upcoming versions. That said, I don't want the implementation to be opinion-based but spec-based; for example, 12-bit sub-millisecond precision being all or nothing when the spec allows it to be variable, or being forced to choose between sub-millisecond precision and a counter when the spec allows both. Call me a purist if you want. |
I'll work on designing a separate factory for that. What I want to more or less ensure is to be consistent with other languages if possible (e.g., PHP/Rust, which can both be used for microservices and/or backends) rather than having Python follow its own rules.
I won't call you that because that's what I would personally do for my personal projects. However, for the stdlib we sometimes need to make design and implementation choices. But part of me does like a flexible implementation (especially if it is desired by the community). Now, another example of this is actually the |
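A minimal sketch of what such a factory could look like, assuming Method 1 (Fixed Bit-Length Dedicated Counter) with a configurable counter width. The class name `UUID7Factory`, the `counter_bits` parameter and the `generate()` method are all hypothetical; the 42-bit upper bound follows the RFC's guidance for Method 1, while the lower bound of 2 is just what this sketch needs.

```python
import secrets
import threading
import time
import uuid


class UUID7Factory:
    """Per-configuration UUIDv7 generator (illustrative only)."""

    def __init__(self, counter_bits: int = 12):
        if not 2 <= counter_bits <= 42:
            raise ValueError("counter_bits must be between 2 and 42")
        self._counter_bits = counter_bits
        self._random_bits = 74 - counter_bits
        # State lives on the instance, so each configuration has its own
        # timestamp and counter instead of sharing module-level globals.
        self._lock = threading.Lock()
        self._last_ms = -1
        self._counter = 0

    def _seed(self) -> int:
        # Leftmost counter bit starts at zero to leave rollover headroom.
        return secrets.randbits(self._counter_bits - 1)

    def generate(self) -> uuid.UUID:
        with self._lock:
            now_ms = time.time_ns() // 1_000_000
            if now_ms > self._last_ms:
                self._last_ms = now_ms
                self._counter = self._seed()
            else:
                self._counter += 1
                if self._counter >> self._counter_bits:
                    # Counter overflow: borrow the next millisecond.
                    self._last_ms += 1
                    self._counter = self._seed()
            ms, counter = self._last_ms, self._counter
        payload = (counter << self._random_bits) | secrets.randbits(self._random_bits)
        rand_a = payload >> 62
        rand_b = payload & ((1 << 62) - 1)
        value = (ms << 80) | (0x7 << 76) | (rand_a << 64) | (0x2 << 62) | rand_b
        return uuid.UUID(int=value)


factory = UUID7Factory(counter_bits=7)
print(factory.generate())
```

A module-level default could then simply be one pre-built factory, which keeps the parameterless entry point while leaving room for other configurations.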
I have created a full implementation of the latest frozen UUIDv7 spec (RFC 9562) at https://github.com/piranna/UUIDv7, with 100% test coverage. The most complex part was understanding that, in practice, having a monotonic random makes the counter a sort of guard, since the spec text is confusing when explaining the relationship between Methods 1 and 2. After that, the implementation was easy-ish, and I think I ended up with a very simple API that is complete and at the same time unopinionated. I have done it with the intention of it being considered for inclusion as a built-in library in the Python batteries. Besides adding (more and better) documentation, what else would I need to get it included? |
Thank you for that, but I already (and completely) implemented v7. I'll have a look at yours, though. We can update my PR and avoid having two different PRs, but not today (I'm currently travelling). The issue here is not really the API but rather which standard implementation to choose by default. I followed what other languages decided to do (and will update it accordingly) but we will probably have another interface for more versatility. In general, the Python library does not like having a single function with lots of parameters that change the implementation, and rather prefers different functions with different names (but I'm not sure if this principle applies here; it did for the (yet to be accepted) fnmatch.filterfalse function). This is also the reason why I first wanted a parameterless uuidv7 function as a first implementation. |
My own one works without arguments by default; all of them are optional, and that just provides a 48-bit timestamp + 74 random bits. Later you can provide the arguments to enable the different methods, tune them, or define explicit values for each of the fields. |
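For comparison, the parameterless baseline described here (48-bit millisecond timestamp plus 74 random bits) fits in a few lines. This is a sketch with a made-up function name, not the code of either implementation discussed in this thread, and it makes no monotonicity guarantee within a millisecond.

```python
import secrets
import time
import uuid


def sketch_uuid7_basic() -> uuid.UUID:
    """UUIDv7 with no counter and no shared state (illustrative only)."""
    unix_ts_ms = (time.time_ns() // 1_000_000) & ((1 << 48) - 1)
    rand_a = secrets.randbits(12)
    rand_b = secrets.randbits(62)
    value = (unix_ts_ms << 80) | (0x7 << 76) | (rand_a << 64) | (0x2 << 62) | rand_b
    return uuid.UUID(int=value)


print(sketch_uuid7_basic())
```

Because it needs no lock and no global state, this variant is the easiest to reason about, which is part of the appeal of starting with it.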
Actually this is what we try to avoid. One reason is that it makes maintenance harder (if any) and optimization harder as well (you create if-branches due to that). Finally, the fact that there are assertions / checks for checking whether the parameters combinations are good or not is something I would like to avoid (I think it's easier to make multiple functions rather than a single one; but we can make a single class with multiple class methods acting as a factory, which is what I originally had in mind). I had a quick look at the API and I think we'll need to rethink the UUID class itself and the class itself should only be a view and not be responsible for generating the value. What I can suggest is: we first decide on a default implementation that is really parameterless, namely |
I also think the UUID API needs to be rethought. It seems like it was a uuid1 that was later refitted to support the other versions. A UUID base class and several UUIDx child classes with their own properties would be better. A parameterless version of UUID would only create the current timestamp + random bits, which can work pretty much as a replacement for uuid4. I think we can start with that; it's just that in my use case I needed to set the timestamp explicitly too, and while I was there I wanted to go the extra (ten :-P) miles :-D And I'm glad you liked that I did it :-) |
I have a separate issue for tracking the UUID interface itself (buried in this huge conversation): #120878 (I added it to the issue; should have done that earlier...). It's only about the time fields, but this is roughly one thing that is annoying (namely, some attributes are not supported or have different meanings depending on the version). |
… column
This would start a new convention with the v2 APIs and models in order to have consistent, clear naming, particularly when it comes to FK references. We currently have the `uuid` field as the self-referential FK column on the `Workspace` model. More details around the impetus for changing the naming around IDs can be found in RedHatInsights#1257.
These changes offer an alternate approach, since we have no data in stage/production, where we no longer use the `uuid` as the `lookup_field` in Django, but rather use a `uuid` as the `id` format. The rationale for not doing this, and having an explicit `uuid`, was primarily for having sequential integers as the PK/FK relations. However, UUID7 is a time-ordered UUID, eliminating index issues and solving the need for having distributed ID values across our services.
We're using `uuid-utils` [1], which is a compliant implementation using Rust's UUID library. There's also an open proposal [2,3] to add it to Python's standard library.
This updates the model, view and serializer. In order to move the `id` from int to uuid, we need two migrations:
- one to move the current `id` column, and the `parent` column (because of the FK ref) as well as making the current `uuid` column the PK
- a second to then rename the `uuid` column to `id` and add the `parent` FK ref/column back
[1] https://github.com/aminalaee/uuid-utils
[2] python/cpython#89083
[3] https://discuss.python.org/t/add-uuid7-in-uuid-module-in-standard-library/44390
Co-authored-by: Hugo van Kemenade <[email protected]>
Change 03924b5 added |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Related
`fields` and `time_*` properties must not be used on UUIDs that are time-agnostic. #120878