design-proposal: improved handling of the VMI firmware UUID #347

dasionov · 2024-11-06T01:27:47Z

What this PR does / why we need it:

This proposal introduces a mechanism to persist the firmware UUID of a Virtual Machine Instance (VMI) in KubeVirt. By storing the firmware UUID, we ensure that it remains consistent across VMI restarts, which is
crucial for applications and services that rely on the UUID for identification or licensing purposes.

relates-to: kubevirt/kubevirt#13156, kubevirt/kubevirt#13158

Special notes for your reviewer:

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Design: A design document was considered and is present (link) or not required
PR: The PR description is expressive enough and will help future contributors
Code: Write code that humans can understand and Keep it simple
Refactor: You have left the code cleaner than you found it (Boy Scout Rule)
Upgrade: Impact of this change on upgrade flows was considered and addressed if required
Testing: New code requires new unit tests. New features and bug fixes require at least on e2e test
Documentation: A user-guide update was considered and is present (link) or not required. You want a user-guide update if it's a user facing feature / API change.
Community: Announcement to kubevirt-dev was considered

Release note:

None

kubevirt-bot · 2024-11-06T01:27:51Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign vladikr for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

dankenigsberg

Thanks @dasionov for stepping up to create a design proposal. It would be wonderful if you can get this old issue sorted.

However, I think that you should discuss possible alternatives for a solution before deciding one implementation (persisting UUID in status).

design-proposals/persist-vmi-firmware-uuid.md

iholder101

Thanks a lot @dasionov for picking this issue up! It's indeed important to solve and keeps coming back!

I very much agree with most of what @dankenigsberg commented above, especially regarding the need to provide a few alternative solutions. This way we can discuss which would be the best way moving forward.

Let me try to help you with that. Here are a few alternatives that I have in mind:

Keep the UUID in the VM's spec even if the user did not request it
Partially implemented here: kubevirt/kubevirt#13158.

Main idea: The UUID would be generated, as of now, for every VM once it boots. After generation, the VM's spec would be updated with the generated UUID.
Pros: Easy to implement, avoid adding new API fields, fully backward compatible.
Cons: Abusing the spec. The spec should only contain the user's desired state. In most situations, the VM owner does not care what the UUID is, hence it is not a part of the desired state from the user's POV. This will complicate and make the VM's definition longer, hard to read, and hard to reason about.

Generate a default random UUID via a webhook and set it to the spec
Partially implemented here: kubevirt/kubevirt#12085.

Main idea: If the user did not provide a desired UUID, a webhook will generate a random UUID and will set it to the VM's spec.
Pros: Same as the previous suggestion.
Cons: Same argument from above regarding abusing the spec. In addition, using webhooks for supplying defaults is a bad practice for many reasons, for example it might hurt performance and scalability if a lot of VMs are being started at the same time, since then virt-api might become a bottle neck.

Create a new firmware UUID field under status

Main idea: Just as today, UUID would be generated during the VM's first boot. After it is generated, it would be saved to a dedicated firmware UUID status field.
Pros: No need to abuse spec and we can keep it clean.
Cons: Adding a new API field which is generally not valuable outside the scope of this issue. New API fields need to be added with extreme caution, as they are very hard to remove, might increase load and hurt performance, and make the API less readable and compact. (We can consider minimizing this con by setting the UUID in a VM condition as suggested here).

Introduce a breaking change

Main idea: Just as today, UUID would be generated during the VM's first boot, but it would generate the UUID based on both name and namespace, which will be a breaking change. Before doing so, we would warn the users for a few versions (via logs/alerts/etc) and possibly providing tooling to add the current firmware UUID to the VM's spec in order to protect the workload.
Pros: No need to abuse spec and we can keep it clean, very easy to implement.
Cons: Demands user intervention in order to avoid breaking current workloads.

Obviously, feel free to add more alternatives that you can think of, or express your opinion on the above alternatives.

design-proposals/persist-vmi-firmware-uuid.md

Acedus

Thanks! This looks like a solid foundation. One concern I have is around VM snapshots—I think it's worth mentioning that even if the firmware UUID of an existing VM is set to persist, it won’t be retained during a restore. This could potentially cause functionality issues.

design-proposals/persist-vmi-firmware-uuid.md

Acedus · 2024-11-07T12:49:24Z

design-proposals/persist-vmi-firmware-uuid.md

+   - Keeps the spec clean.
+   - Focuses on VM status for generated information.
+
+   **Cons:**


A drawback here in my view is that an object's status is often dynamic and therefore somewhat fragile. This makes sense, as controllers manipulate the status field to represent the object's current state, while they rarely—if ever—modify the object's spec.

A drawback here in my view is that an object's status is often dynamic and therefore somewhat fragile. This makes sense, as controllers manipulate the status field to represent the object's current state

It is a matter of how we implement the field and the controllers that would interact with it.
I mean, as I see it, this status field will be only ever assigned once then never modified again by any controller. In this scenario I don't think it would be highly dynamic and fragile.

while they rarely—if ever—modify the object's spec.

And for a good reason :)
Ideally, only a human should ever modify a VM's spec.

Multiple k8s objects rely on their status fields to function. It's not that fragile and is not editable for users.

With this option, IMO, the flow simplifies: on VM start, the VM controller stores either the user-specified spec.template.spec.firmware.uuid or vm.metadata.uid.
No webhooks are required.
We can also trigger events for running VMs without a firmware UUID in the status to restart to get a stable UUID across shutdowns or just raise the restart required condition.

With this option, IMO, the flow simplifies: on VM start, the VM controller stores either the user-specified spec.template.spec.firmware.uuid or vm.metadata.uid.
No webhooks are required.
We can also trigger events for running VMs without a firmware UUID in the status to restart to get a stable UUID across shutdowns or just raise the restart required condition.

Just to ensure that I understood correctly. In this MO, existing VMs would not persist the firmware UUID specified in their VMI counterparts but rather set the status firmware UUID to (for example) vm.metadata.uid and raise the restart required condition? This acknowledges that disruption (i.e. firmware UUID changes post-restart to affect anything that relies on it) is imminent, correct?

This acknowledges that disruption (i.e. firmware UUID changes post-restart to affect anything that relies on it) is imminent, correct?

This disruption has to be prevented. Otherwise it will be a breaking change.
In order to prevent it I believe we'd have to apply a multi-phased approach:

Continue with the current firmware UUID calculation, but store the UUID in status.

As @vladikr suggested, use alerts / VM conditions in order to notify the users to restart VMs that do not have firmware UUID in their status.

After a while, let the community know (i.e. via a mail to mailing list) that in the next version firmware UUID must reside in status in order to prevent breaking workloads.

Switch to using vm.metadata.uid as the new firmware UUID (unless a UUID is already provided in the VM's status, therefore keeping backward compatibility).

As @vladikr suggested, use alerts / VM conditions in order to notify the users to restart VMs that do not have firmware UUID in their status.

So at least a restart is required to derive the firmware UUID from the VMI (including shutdown VMs) so that it would persist in status, correct?

Switch to using vm.metadata.uid as the new firmware UUID (unless a UUID is already provided in the VM's status, therefore keeping backward compatibility).

Assuming a scenario where n existing VMs in cluster(s) suffer from the issue of overlapping firmware UUIDs, what is the recommended approach for users to rectify the issue post-upgrade? Recreate the VMs? Change the spec.firmware.uuid to what appears in vm.metadata.uid and restart? Is user action mandatory to resolve the issue?

So at least a restart is required to derive the firmware UUID from the VMI (including shutdown VMs) so that it would persist in status, correct?

Now that I'm re-thinking about it, a restart might not be necessary, as virt-controller could simply copy spec.firmware.uuid to the status field if exists, and if not, re-calc the hash and assign it to status.

Assuming a scenario where n existing VMs in cluster(s) suffer from the issue of overlapping firmware UUIDs, what is the recommended approach for users to rectify the issue post-upgrade? Recreate the VMs? Change the spec.firmware.uuid to what appears in vm.metadata.uid and restart? Is user action mandatory to resolve the issue?

TBH I'm not sure that we can redeem this situation. If two VMs are already running with the same firmware UUID, I think one has to be re-created. In any case, I think it shouldn't be in the scope of this proposal which only aims to prevent this situation from happening in the future.

TBH I'm not sure that we can redeem this situation. If two VMs are already running with the same firmware UUID, I think one has to be re-created. In any case, I think it shouldn't be in the scope of this proposal which only aims to prevent this situation from happening in the future.

SGTM, thanks.

design-proposals/persist-vmi-firmware-uuid.md

iholder101 · 2024-11-07T13:49:13Z

One concern I have is around VM snapshots—I think it's worth mentioning that even if the firmware UUID of an existing VM is set to persist, it won’t be retained during a restore.

@Acedus would you mind elaborating on this one? why won't it be retained during a restore?

Acedus · 2024-11-07T13:53:22Z

One concern I have is around VM snapshots—I think it's worth mentioning that even if the firmware UUID of an existing VM is set to persist, it won’t be retained during a restore.

@Acedus would you mind elaborating on this one? why won't it be retained during a restore?

I'm referring to VM backups taken before the introduction of the changes described in this DP. Correct me if I'm wrong, but the VM snapshot content won't be updated as part of this change (e.g., whether through spec or status), and if it isn't, once the new UUID generation mechanism is introduced and a restore is attempted the UUID will change.

vladikr · 2024-11-07T17:53:51Z

design-proposals/persist-vmi-firmware-uuid.md

+   firmwareUUID: "123e4567-e89b-12d3-a456-426614174000"
+```
+
+or persist via status condition


I'd argue against a condition. Conditions are generally needed to manage the lifecycle or to be consumed by other controllers.

conditions are not for persistence.

design-proposals/persist-vmi-firmware-uuid.md

fabiand · 2024-11-11T08:45:42Z

design-proposals/persist-vmi-firmware-uuid.md

+**Pros:**
+- The upgrade-specific code can be safely removed after two or three releases.
+- Simple logic post-upgrade, with a straightforward UUID assignment.
+- Limited disturbance to GitOps workflows, affecting only pre-existing VMs with the current method.


This is only true if gitops is setting metadata.uid, correct?

@fabiand @iholder101 What if the VM controller would dynamically add this annotation during VM creation if vm.spec.template.spec.Firmware.UUID is not defined?

For example, when a VM is created (via GitOps or using oc/kubectl), the controller would set the following annotation:

apiVersion: kubevirt.io/v1 kind: VirtualMachine metadata: annotations: kubevirt.io/host-uuid: "<host-uuid>"

Does that align with your understanding? Thank you!

@jangel97 what is the benefit of storing this in a new annotation, instead of an already-existing API?

I believe leveraging existing solutions would be the most effective approach. However, we should also consider scenarios where a system has a host UUID, and this breaking change could potentially alter it. In that case, if I understand correctly, wouldn't it be necessary to retain the "old" host UUID somewhere?

fabiand · 2024-11-11T08:47:51Z

design-proposals/persist-vmi-firmware-uuid.md

+   - May potentially bottleneck `virt-api`.
+
+3. **Add a Firmware UUID Field to VM Status**  
+   **Description:** Store the generated UUID within the VM’s status.


Status is only live, it must not be used to store information.

aka any informatoin in the status in only relevant and "persisted" on the runtime platform.
Object smust not be created with a populated status (only empty status is what we want: status: {})

Object smust not be created with a populated status (only empty status is what we want: status: {})

A VM object won't be created with a populated status, but a VM that was created, booted, then shut off would have a non-empty status.

Am I missing anything?

ping @fabiand :)
just want to understand if I get you correctly.

fabiand · 2024-11-11T08:49:04Z

design-proposals/persist-vmi-firmware-uuid.md

+        type: Ready
+   created: true
+   runStrategy: Once
+   firmwareUUID: "123e4567-e89b-12d3-a456-426614174000"


.status ins only for erporting, not for persitence.

Can you elaborate?
How is spec more persistent than status? Aren't they both persisted in etcd?

.status ins only for erporting, not for persitence.

@fabiand I don't know where this is coming from. Numerous objects across k8s use the status exactly for that purpose, including objects in KubeVirt.

The VM object, until it is deleted would have a status - it simply stores the current state of the VM, while the spec holds the desired.

Conceptually, it is not what status is meant to show.

Status is meant to represent the current state. However, the FW UUID cannot be "read" from the current state of the VM, because it is stopped.
There is a name to persisting such information on the status: caching.
Status is not about caching, it is about representing the actual. But here the suggestion is to store the actual so it will be the input for a future actual (which comes in the opposite direction).

We actually had this implemented in the network interfaces status and it caused a mess: To calculate the new status on a reconcile cycle, the previous data was considered, potentially dragging incorrect state over cycles indefinitely. We have since dropped this behavior, calculating the interfaces status without depending on the previous state.
But this is still tricky business even today, because multiple components attempt to touch the same list.

Bottom line, while this could be a good solution, caching the data on the status can potentially bite back. There is logic that uses this status information to set the UUID on the next VM instantiation and an opposite logic that reads the VM instantiation spec (or status?) to reflect that actual back to the same status field. Sounds like a loop to me.

Even if the VM is stopped the status shows the current state at this UUID is stored on the VM boot disk
As long as the VM exists this is its current state, regardless of its phase.

fabiand · 2024-11-11T08:49:16Z

design-proposals/persist-vmi-firmware-uuid.md

+   firmwareUUID: "123e4567-e89b-12d3-a456-426614174000"
+```
+
+or persist via status condition


conditions are not for persistence.

dankenigsberg · 2024-11-13T11:45:44Z

design-proposals/persist-vmi-firmware-uuid.md

+**Pros:**
+- The upgrade-specific code can be safely removed after two or three releases.
+- Simple logic post-upgrade, with a straightforward UUID assignment.
+- Limited disturbance to GitOps workflows, affecting only pre-existing VMs with the current method.


@jangel97 what is the benefit of storing this in a new annotation, instead of an already-existing API?

dankenigsberg · 2024-11-13T11:48:03Z

design-proposals/persist-vmi-firmware-uuid.md

+
+**Cons:**
+- Requires additional code (likely in `virt-operator`) to handle the upgrade-specific persistence.
+- Patching the spec of a running VM may trigger a "restart required" condition.


Can this be easily avoided in code, if we find that vm's Firmware.UUID has NOT differed from the vmi's?

Generally, patching object specs after the admission is not a good practice that hides the user intent.

Generally, patching object specs after the admission is not a good practice that hides the user intent.

Ack. And I'd say that patching during admission is just as bad in this respect.
However in this case, the user never intended us to have a buggy UUID, but must have it persisted. So I don't see here any issue, beside the (limited) disturbance to git-ops.

Can this be easily avoided in code, if we find that vm's Firmware.UUID has NOT differed from the vmi's?

@dasionov would you refer to this? I believe that the Con should say

- Code should be added to the computation of the "restart required" condition so that it is not raised if vm.Template.Firmware.UUID equals vmi.Firmware.UUID.

sry for the delay, fixed

dankenigsberg · 2024-11-14T10:42:48Z

design-proposals/persist-vmi-firmware-uuid.md

+
+### 4. Introduce a Breaking Change to Ensure Universal Uniqueness
+
+**Description:** Modify UUID generation to use a combination of VM name and namespace, ensuring unique UUIDs within the cluster.


what about universal uniqueness? How would it be ensured?

what about universal uniqueness? How would it be ensured?

Using both name + namespace as a hash will ensure uniqueness AFAICT

not universal uniqueness. a different vm on a different cluster would obtain the same uuid if it has the same name and namespace.

Ah, got you.
We can always add something "random" to the hash, like creation timestamp. WDYT?

Would it be possible to clarify the impact of VMs in different clusters having the same UUID if they share the same name and namespace?
Are there specific cases where cross-cluster uniqueness is essential for the UUID to function correctly?

I guess that @dankenigsberg is referring migrating a VM from a different cluster, e.g. via importing/exporting disks

Are there specific cases where cross-cluster uniqueness is essential for the UUID to function correctly?

By definition, UUID has to be universally-unique. A lot of services would break if two different VMs present matching identifiers.

Just use the pod medata.UID?
https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids

you mean the vm.metadata.uid?

xpivarc · 2024-11-18T14:20:52Z

/label sig/compute

kubevirt-bot · 2024-11-18T14:20:56Z

@xpivarc: The label(s) /label sig/compute cannot be applied. These labels are supported: good-first-issue, needs-approver-review. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/label sig/compute

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

iholder101 · 2024-11-18T14:25:22Z

/label sig/compute

/sig compute
(yeah, confusing as hell)

Acedus · 2024-11-19T09:26:10Z

design-proposals/persist-vmi-firmware-uuid.md

+- May necessitate tooling to facilitate UUID preservation for compatibility with existing workloads.
+
+
+### 5. Upgrade-Specific Persistence of Firmware UUID


I don’t necessarily object to this approach—in fact, it’s likely unavoidable to prevent disruption to existing workloads—but the process for implementing it should be more thoroughly detailed IMO.

Correct me if I'm wrong, but if VMs are stopped during the upgrade, they cannot rely on VMI objects to derive and persist the firmware UUID, as those objects won’t exist. As a result, existing VMs without a defined spec.firmware.uuid field would need to populate the persistent UUID field (whatever that may be) using the current method (based on the name). The new method should then be applied exclusively to new VMs.

While this doesn’t cover all use cases, it’s reasonable to assume it addresses the majority—except for scenarios involving (for example) backup and restore, which present a separate set of challenges.

I don't understand the reservation that you are raising. Option 5 does not distinguish between running and non-running VMs. During upgrade, they are all changed to persist the buggy uuid. After upgrade, all VMs receive a fresh uuid.

The problem of restoring old VMs that did not persist their buggy uuid sounds important, but not specific to this option. @dasionov can you address it? in the design? I don't have any good idea...

IIRC the original idea was to persist the UUID in existing VMs by obtaining it from the VMIs, therefore my comment. Perhaps I mixed up the proposal with implementation, my bad, you can disregard the comment in that case.

iholder101 · 2024-11-19T10:09:10Z

design-proposals/persist-vmi-firmware-uuid.md

+### 5. Upgrade-Specific Persistence of Firmware UUID
+
+**Description:** Before upgrading KubeVirt, persist the `Firmware.UUID` of existing VMsin `vm.spec`.
+After the upgrade, any VM without `Firmware.UUID` is considered new.
+For these new VMs, use `vm.metadata.uid` as the firmware UUID if `vm.spec.template.spec.Firmware.UUID` is not defined.


Another problem with this approach is upgrades that skip versions.
IOW, a user can upgrade Kubevirt v1.0 straight to v1.3, skipping v1.1 and v1.2. With this approach, such users will break their VMs.

Do we support skipping intermediate minor releases? I don't think that we test that.
Anyway, the proposal says below

Upgrade-specific code can be removed after two or three releases.

so a 1.4->1.7 would still be safe in this regard.

enp0s3

@dasionov Hi, I would suggest to simplify the PR subject and call it "improved handling of the VM firmware UUID", sine I see that you want to cover more than only the UUID persistence subject.

design-proposals/persist-vmi-firmware-uuid.md

aburdenthehand · 2024-11-20T15:14:44Z

/cc @EdDev

vladikr · 2024-11-22T15:09:39Z

design-proposals/persist-vmi-firmware-uuid.md

+**Pros:**
+- Straightforward implementation.
+- Avoids the need to introduce new API fields.
+- Fully backward compatible.


Wouldn't this break existing VMs? After a shutdown, these will get a new UUID.

After a shutdown, these will get a new UUID.

IIUC the idea here is a multi-phased approach, in which UUID will first be saved to the spec and then (after a few releases) the way of computing the UUID will change. IOW this approach is similar to the status field approach but with a spec field.

vladikr · 2024-11-22T15:15:08Z

design-proposals/persist-vmi-firmware-uuid.md

+**Cons:**
+- Introduces a new API field, which could increase API surface area and requires long-term support.
+- New fields may impact system performance and resource load.
+- Adds to API complexity and may reduce readability.


I don't understand this con, can you elaborate?

vladikr · 2024-11-22T15:15:14Z

design-proposals/persist-vmi-firmware-uuid.md

+
+**Cons:**
+- Introduces a new API field, which could increase API surface area and requires long-term support.
+- New fields may impact system performance and resource load.


xpivarc · 2024-11-18T15:11:50Z

design-proposals/persist-vmi-firmware-uuid.md

+**Cons:**
+- Potential misuse of the spec field, which ideally should only reflect the user’s intended configuration.
+- The majority of users may not consider the UUID part of the desired VM state, making this field irrelevant from their perspective.
+- Can lead to longer and more complex VM definitions, impacting readability and management.


How come? Can we elaborate?

xpivarc · 2024-11-22T15:06:30Z

design-proposals/persist-vmi-firmware-uuid.md

+### 1. Persist UUID in the VM's Spec Field
+
+**Description:** The UUID is generated when the VMI is being created and is then saved to the VM's spec field.
+
+**Pros:**
+- Straightforward implementation.
+- Avoids the need to introduce new API fields.
+- Fully backward compatible.
+
+**Cons:**
+- Potential misuse of the spec field, which ideally should only reflect the user’s intended configuration.
+- The majority of users may not consider the UUID part of the desired VM state, making this field irrelevant from their perspective.
+- Can lead to longer and more complex VM definitions, impacting readability and management.


How do you ensure backward compatibility here?

Please take a look at #347 (comment).
@dasionov perhaps it's better to add this to the document (under several of the approaches)

xpivarc · 2024-11-22T15:10:01Z

design-proposals/persist-vmi-firmware-uuid.md

+### 2. Generate a Default UUID via Webhook and Set it to the Spec Field
+
+**Description:** A webhook generates a UUID if the user does not provide one. This UUID is then set in the VM's spec field.
+
+**Pros:**
+- Retains backward compatibility and avoids introducing new API fields.
+
+**Cons:**
+- Same concerns as above about the spec field.
+- Using webhooks to assign defaults can degrade performance, especially with high volumes of VMs, as `virt-api` may become a bottleneck.
+- May compromise scalability, as generating a UUID via webhook adds overhead when multiple VMs start simultaneously.
+
+---


You should describe why is it backward compatible (The controller uses same function to "generate" UUID in case of old vm where old vm is equal to vm that does not have the UUID set)

xpivarc · 2024-11-22T15:10:22Z

design-proposals/persist-vmi-firmware-uuid.md

+
+**Cons:**
+- Same concerns as above about the spec field.
+- Using webhooks to assign defaults can degrade performance, especially with high volumes of VMs, as `virt-api` may become a bottleneck.


How does this degrade a performance if the webhook already exists? How is this a bottleneck if the API can be scaled?

i guess maybe not everyone would want to scale the entire virt-api component just to get more web-hooks power IMHO

xpivarc · 2024-11-22T15:12:10Z

design-proposals/persist-vmi-firmware-uuid.md

+### 3. Create a New Firmware UUID Field in VM Status
+
+**Description:** A new field is added under VM status to store the generated firmware UUID upon first VM boot.
+
+**Pros:**
+- Keeps the spec clean, avoiding unnecessary fields in the user-defined configuration.
+- Provides a clear separation between user configuration (spec) and system-generated information (status).
+
+**Cons:**
+- Introduces a new API field, which could increase API surface area and requires long-term support.
+- New fields may impact system performance and resource load.
+- Adds to API complexity and may reduce readability.
+
+*Note:* To minimize this con, consider setting the UUID as a condition under VM status instead of introducing a separate field.
+


Again how do we ensure the existing VMs retain stable UUID?

I think the flow of option 3 is not well explained - at least from what I had in mind.
I see this working the following:

VM controller:

On start:
if the user explicitly set the vm.spec.template.spec.Firmware.UUID - propagate this to vmi.spec.Firmware.UUID

Otherwise (user didn't set firmware UUID explicitly):

sets the vmi.spec.Firmware.UUID to vm.metadata.uid if vm.Status.FirmwareUUID is empty (because it's a newly created VM - no VMI existed before)

if vm.status.FirmwareUUID is set - propagate it to vmi.spec.Firmware.UUID

On Sync:
if vm.status.FirmwareUUID is empty set it using vmi.spec.Firmware.UUID

This approach preserves backward compatibility and is still gitops friendly

xpivarc · 2024-11-22T15:12:37Z

design-proposals/persist-vmi-firmware-uuid.md

+
+---
+
+### 4. Introduce a Breaking Change to Ensure Universal Uniqueness


xpivarc · 2024-11-22T15:13:55Z

design-proposals/persist-vmi-firmware-uuid.md

+- May necessitate tooling to facilitate UUID preservation for compatibility with existing workloads.
+
+
+### 5. Upgrade-Specific Persistence of Firmware UUID


This would require having 2 releases to fix this issue or some kind of coordination of the upgrade that is not available today.

xpivarc · 2024-11-22T15:16:04Z

design-proposals/persist-vmi-firmware-uuid.md

+should not affect updates / rollbacks.
+
+## Functional Testing Approach
+Verify that a newly created VMI has a unique firmware UUID assigned and that this UUID persists across VMI restarts.


Don't we have this test already?
What we miss is upgrade compatibility.

can we even test that?

You could express that some parts are tested as e2e tests and others as unit tests.
You could detail the scenarios which needs coverage.

EdDev

Thank you, I like this proposal.
While I think there is room to improve some parts, the base is good and triggers conversation and ideas.

Thank you!

EdDev · 2024-11-24T13:01:27Z

design-proposals/persist-vmi-firmware-uuid.md

+
+
+## Definition of Users
+End Users: Individuals or organizations running VMs and VMIs on KubeVirt who require consistent firmware UUIDs for their applications.


We usually define VM owners (e.g. namespace/project admins) or/and cluster admins.
I guess you mean here to the first.

EdDev · 2024-11-24T13:04:49Z

design-proposals/persist-vmi-firmware-uuid.md

+End Users: Individuals or organizations running VMs and VMIs on KubeVirt who require consistent firmware UUIDs for their applications.
+
+## User Stories
+As an end-user, I expect my VMI to maintain its identity across restarts.


This does not look like a "story" to me.
And there are several scenarios, from creating and starting a new VM, starting an old VM (that was not running), restoring an old VM, restoring a new VM, etc.
Please list here all the cases that you intend to support and mention the ones you do not intend to support for this design.

EdDev · 2024-11-24T13:06:26Z

design-proposals/persist-vmi-firmware-uuid.md

@@ -0,0 +1,191 @@
+# Overview
+This proposal introduces a mechanism to persist the firmware UUID of a Virtual Machine Instance (VMI) in KubeVirt.


I was under the impression that we persist VM (not VMI) FW ID.

EdDev · 2024-11-24T13:32:01Z

design-proposals/persist-vmi-firmware-uuid.md

+**Description:** A webhook generates a UUID if the user does not provide one. This UUID is then set in the VM's spec field.
+This approach ensures backward compatibility because the controller will use the same function to generate a UUID for VMs that do not already have the UUID set.


It is not clear to me the added value of a webhook compared to a similar logic at the VM controller.
All changes to the VM resource passes through the VM controller, so please elaborate how a webhook can do more. If it is just an option, with identical pros as one in the controller, then please clarify that explicitly.

I think this got drop in the last update, so the comment is irrelevant now.

EdDev · 2024-11-24T13:40:01Z

design-proposals/persist-vmi-firmware-uuid.md

+## Repos
+Kubevirt/kubevirt
+
+# Design


There is a need to choose one solution and then list the others as alternatives.

EdDev · 2024-11-24T15:41:42Z

design-proposals/persist-vmi-firmware-uuid.md

+
+---
+
+## What Happens to the Firmware UUID During a Restore?


There are potentially to issues in this domain, one extends an existing problem in general and another may be specific to restoration (and cloning?):

There is a potential collision with an existing identical UUID, but this is rare enough so one could intentionally ignore.

If the same VM can be restored twice, then it has the potential to collide with a VM defined in the system.
Can a VM be restored twice, resulting in two cone VMs?

In the same context, cloning needs to be addressed.

It can be cloned, which will remove the VM specific data, but not restored twice.

EdDev · 2024-11-24T16:05:38Z

design-proposals/persist-vmi-firmware-uuid.md

+        type: Ready
+   created: true
+   runStrategy: Once
+   firmwareUUID: "123e4567-e89b-12d3-a456-426614174000"


Conceptually, it is not what status is meant to show.

Status is meant to represent the current state. However, the FW UUID cannot be "read" from the current state of the VM, because it is stopped.
There is a name to persisting such information on the status: caching.
Status is not about caching, it is about representing the actual. But here the suggestion is to store the actual so it will be the input for a future actual (which comes in the opposite direction).

We actually had this implemented in the network interfaces status and it caused a mess: To calculate the new status on a reconcile cycle, the previous data was considered, potentially dragging incorrect state over cycles indefinitely. We have since dropped this behavior, calculating the interfaces status without depending on the previous state.
But this is still tricky business even today, because multiple components attempt to touch the same list.

Bottom line, while this could be a good solution, caching the data on the status can potentially bite back. There is logic that uses this status information to set the UUID on the next VM instantiation and an opposite logic that reads the VM instantiation spec (or status?) to reflect that actual back to the same status field. Sounds like a loop to me.

EdDev · 2024-11-24T16:06:41Z

design-proposals/persist-vmi-firmware-uuid.md

+The proposed changes have no anticipated impact on scalability capabilities of the KubeVirt framework
+
+## Update/Rollback Compatibility
+should not affect updates / rollbacks.


This is where the content of ## What Happens to the Firmware UUID During a Restore? should land.
Or you should remove this part.

EdDev · 2024-11-24T16:08:28Z

design-proposals/persist-vmi-firmware-uuid.md

+should not affect updates / rollbacks.
+
+## Functional Testing Approach
+Verify that a newly created VMI has a unique firmware UUID assigned and that this UUID persists across VMI restarts.


You could express that some parts are tested as e2e tests and others as unit tests.
You could detail the scenarios which needs coverage.

EdDev · 2024-11-24T16:13:02Z

design-proposals/persist-vmi-firmware-uuid.md

+- Based on the selected design, either introduce a new field or utilize an existing one 
+(such as in spec, status) to store the firmware UUID.
+- Update controller logic to check for and persist the UUID, ensuring it is generated only once per VM.
+- Testing: add unit and functional tests


I do not think these are valid phases.

You cannot add any logic without unit tests and possibly basic e2e tests.
And you cannot add the API change without utilizing it, not even as a separate PR.

Usually phasing means that you provide a basic functionality that works, then extend it in the next phases. It can also include feature lifecycle stages, i.e. alpha, beta and GA planning.
BTW, you should mention explicitly that there is no plan to protect this new logic with a FG (the fact it was not mentioned, implied to me that this is the intention).

This proposal introduces a mechanism to persist the firmware UUID of a Virtual Machine in KubeVirt. By storing the firmware UUID, we ensure that it remains consistent across VM restarts. which is crucial for applications and services that rely on the UUID for identification or licensing purposes. Signed-off-by: Daniel Sionov <[email protected]>

kubevirt-bot added the dco-signoff: yes Indicates the PR's author has DCO signed all their commits. label Nov 6, 2024

kubevirt-bot requested review from dhiller and jobbler November 6, 2024 01:27

kubevirt-bot added the size/M label Nov 6, 2024

dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch 2 times, most recently from 4257d15 to b7735b0 Compare November 6, 2024 14:20

dankenigsberg suggested changes Nov 7, 2024

View reviewed changes

kubevirt-bot assigned dankenigsberg Nov 7, 2024

iholder101 reviewed Nov 7, 2024

View reviewed changes

design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved

iholder101 reviewed Nov 7, 2024

View reviewed changes

design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved

This was referenced Nov 7, 2024

Hashing the VM name for the UUID leads to duplicates kubevirt/kubevirt#11166

Open

Persistent firmware UUID kubevirt/kubevirt#13158

Closed

dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch from b7735b0 to 15c34f8 Compare November 7, 2024 11:32

kubevirt-bot added size/L and removed size/M labels Nov 7, 2024

iholder101 mentioned this pull request Nov 7, 2024

Same system UUID shared between nodes when migrate failed kubevirt/kubevirt#12305

Closed

Acedus reviewed Nov 7, 2024

View reviewed changes

dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch from 15c34f8 to c35b4a4 Compare November 7, 2024 13:45

dankenigsberg suggested changes Nov 7, 2024

View reviewed changes

design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved

design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved

dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch 2 times, most recently from 3751565 to 7201c75 Compare November 7, 2024 14:36

vladikr reviewed Nov 7, 2024

View reviewed changes

dankenigsberg suggested changes Nov 7, 2024

View reviewed changes

design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved

design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved

dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch from 7201c75 to 1279cbf Compare November 10, 2024 12:25

fabiand reviewed Nov 11, 2024

View reviewed changes

dankenigsberg suggested changes Nov 13, 2024

View reviewed changes

dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch from 1279cbf to b79c2d0 Compare November 14, 2024 10:34

dankenigsberg reviewed Nov 14, 2024

View reviewed changes

dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch from b79c2d0 to 3d599f9 Compare November 14, 2024 13:06

kubevirt-bot added the sig/compute label Nov 18, 2024

Acedus reviewed Nov 19, 2024

View reviewed changes

iholder101 reviewed Nov 19, 2024

View reviewed changes

dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch from 3d599f9 to f1166a8 Compare November 19, 2024 11:46

enp0s3 reviewed Nov 20, 2024

View reviewed changes

design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved

design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved

dasionov changed the title ~~design-proposal: persist-vmi-firmware-uuid~~ improved handling of the VMI firmware UUID Nov 20, 2024

dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch from f1166a8 to 4da1ce6 Compare November 20, 2024 11:38

dasionov changed the title ~~improved handling of the VMI firmware UUID~~ design-proposal: improved handling of the VMI firmware UUID Nov 20, 2024

dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch from 4da1ce6 to 45bdad4 Compare November 20, 2024 11:53

kubevirt-bot requested a review from EdDev November 20, 2024 15:14

vladikr reviewed Nov 22, 2024

View reviewed changes

xpivarc reviewed Nov 22, 2024

View reviewed changes

dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch 3 times, most recently from bc9b3ff to 58738d1 Compare November 24, 2024 15:13

EdDev reviewed Nov 24, 2024

View reviewed changes

dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch from 58738d1 to 2d28927 Compare November 25, 2024 14:21


		### 4. Introduce a Breaking Change to Ensure Universal Uniqueness

		Description: Modify UUID generation to use a combination of VM name and namespace, ensuring unique UUIDs within the cluster.

		- May necessitate tooling to facilitate UUID preservation for compatibility with existing workloads.


		### 5. Upgrade-Specific Persistence of Firmware UUID



		## Definition of Users
		End Users: Individuals or organizations running VMs and VMIs on KubeVirt who require consistent firmware UUIDs for their applications.

		@@ -0,0 +1,191 @@
		# Overview
		This proposal introduces a mechanism to persist the firmware UUID of a Virtual Machine Instance (VMI) in KubeVirt.

		Description: A webhook generates a UUID if the user does not provide one. This UUID is then set in the VM's spec field.
		This approach ensures backward compatibility because the controller will use the same function to generate a UUID for VMs that do not already have the UUID set.

design-proposal: improved handling of the VMI firmware UUID #347

Are you sure you want to change the base?

design-proposal: improved handling of the VMI firmware UUID #347

Conversation

dasionov commented Nov 6, 2024 • edited Loading

kubevirt-bot commented Nov 6, 2024

dankenigsberg left a comment

Choose a reason for hiding this comment

iholder101 left a comment • edited Loading

Choose a reason for hiding this comment

Acedus left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Acedus Nov 19, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

iholder101 commented Nov 7, 2024

Acedus commented Nov 7, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dankenigsberg Nov 13, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

iholder101 Nov 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dasionov Nov 19, 2024 • edited Loading

Choose a reason for hiding this comment

xpivarc commented Nov 18, 2024

kubevirt-bot commented Nov 18, 2024

iholder101 commented Nov 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

enp0s3 left a comment

Choose a reason for hiding this comment

aburdenthehand commented Nov 20, 2024

Choose a reason for hiding this comment

iholder101 Nov 24, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

iholder101 Nov 24, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dasionov Nov 23, 2024 • edited Loading

dasionov commented Nov 6, 2024 •

edited

Loading

iholder101 left a comment •

edited

Loading

Acedus Nov 19, 2024 •

edited

Loading

dankenigsberg Nov 13, 2024 •

edited

Loading

iholder101 Nov 14, 2024 •

edited

Loading

dasionov Nov 19, 2024 •

edited

Loading

iholder101 commented Nov 18, 2024 •

edited

Loading

iholder101 Nov 24, 2024 •

edited

Loading

iholder101 Nov 24, 2024 •

edited

Loading

dasionov Nov 23, 2024 •

edited

Loading

vladikr Nov 22, 2024 •

edited

Loading