Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

design-proposal: improved handling of the VMI firmware UUID #347

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dasionov
Copy link

@dasionov dasionov commented Nov 6, 2024

What this PR does / why we need it:

This proposal introduces a mechanism to persist the firmware UUID of a Virtual Machine Instance (VMI) in KubeVirt. By storing the firmware UUID, we ensure that it remains consistent across VMI restarts, which is
crucial for applications and services that rely on the UUID for identification or licensing purposes.

relates-to: kubevirt/kubevirt#13156, kubevirt/kubevirt#13158

Special notes for your reviewer:

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note:

None

@kubevirt-bot kubevirt-bot added the dco-signoff: yes Indicates the PR's author has DCO signed all their commits. label Nov 6, 2024
@kubevirt-bot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign vladikr for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dasionov dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch 2 times, most recently from 4257d15 to b7735b0 Compare November 6, 2024 14:20
Copy link
Member

@dankenigsberg dankenigsberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dasionov for stepping up to create a design proposal. It would be wonderful if you can get this old issue sorted.

However, I think that you should discuss possible alternatives for a solution before deciding one implementation (persisting UUID in status).

design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved
design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved
design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved
design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved
Copy link
Contributor

@iholder101 iholder101 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @dasionov for picking this issue up! It's indeed important to solve and keeps coming back!

I very much agree with most of what @dankenigsberg commented above, especially regarding the need to provide a few alternative solutions. This way we can discuss which would be the best way moving forward.

Let me try to help you with that. Here are a few alternatives that I have in mind:

Keep the UUID in the VM's spec even if the user did not request it
Partially implemented here: kubevirt/kubevirt#13158.

Main idea: The UUID would be generated, as of now, for every VM once it boots. After generation, the VM's spec would be updated with the generated UUID.
Pros: Easy to implement, avoid adding new API fields, fully backward compatible.
Cons: Abusing the spec. The spec should only contain the user's desired state. In most situations, the VM owner does not care what the UUID is, hence it is not a part of the desired state from the user's POV. This will complicate and make the VM's definition longer, hard to read, and hard to reason about.

Generate a default random UUID via a webhook and set it to the spec
Partially implemented here: kubevirt/kubevirt#12085.

Main idea: If the user did not provide a desired UUID, a webhook will generate a random UUID and will set it to the VM's spec.
Pros: Same as the previous suggestion.
Cons: Same argument from above regarding abusing the spec. In addition, using webhooks for supplying defaults is a bad practice for many reasons, for example it might hurt performance and scalability if a lot of VMs are being started at the same time, since then virt-api might become a bottle neck.

Create a new firmware UUID field under status

Main idea: Just as today, UUID would be generated during the VM's first boot. After it is generated, it would be saved to a dedicated firmware UUID status field.
Pros: No need to abuse spec and we can keep it clean.
Cons: Adding a new API field which is generally not valuable outside the scope of this issue. New API fields need to be added with extreme caution, as they are very hard to remove, might increase load and hurt performance, and make the API less readable and compact. (We can consider minimizing this con by setting the UUID in a VM condition as suggested here).

Introduce a breaking change

Main idea: Just as today, UUID would be generated during the VM's first boot, but it would generate the UUID based on both name and namespace, which will be a breaking change. Before doing so, we would warn the users for a few versions (via logs/alerts/etc) and possibly providing tooling to add the current firmware UUID to the VM's spec in order to protect the workload.
Pros: No need to abuse spec and we can keep it clean, very easy to implement.
Cons: Demands user intervention in order to avoid breaking current workloads.

Obviously, feel free to add more alternatives that you can think of, or express your opinion on the above alternatives.

design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved
Copy link

@Acedus Acedus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks like a solid foundation. One concern I have is around VM snapshots—I think it's worth mentioning that even if the firmware UUID of an existing VM is set to persist, it won’t be retained during a restore. This could potentially cause functionality issues.

design-proposals/persist-vmi-firmware-uuid.md Show resolved Hide resolved
design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved
design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved
design-proposals/persist-vmi-firmware-uuid.md Show resolved Hide resolved
design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved
- Keeps the spec clean.
- Focuses on VM status for generated information.

**Cons:**
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A drawback here in my view is that an object's status is often dynamic and therefore somewhat fragile. This makes sense, as controllers manipulate the status field to represent the object's current state, while they rarely—if ever—modify the object's spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A drawback here in my view is that an object's status is often dynamic and therefore somewhat fragile. This makes sense, as controllers manipulate the status field to represent the object's current state

It is a matter of how we implement the field and the controllers that would interact with it.
I mean, as I see it, this status field will be only ever assigned once then never modified again by any controller. In this scenario I don't think it would be highly dynamic and fragile.

while they rarely—if ever—modify the object's spec.

And for a good reason :)
Ideally, only a human should ever modify a VM's spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple k8s objects rely on their status fields to function. It's not that fragile and is not editable for users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this option, IMO, the flow simplifies: on VM start, the VM controller stores either the user-specified spec.template.spec.firmware.uuid or vm.metadata.uid.
No webhooks are required.
We can also trigger events for running VMs without a firmware UUID in the status to restart to get a stable UUID across shutdowns or just raise the restart required condition.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this option, IMO, the flow simplifies: on VM start, the VM controller stores either the user-specified spec.template.spec.firmware.uuid or vm.metadata.uid.
No webhooks are required.
We can also trigger events for running VMs without a firmware UUID in the status to restart to get a stable UUID across shutdowns or just raise the restart required condition.

Just to ensure that I understood correctly. In this MO, existing VMs would not persist the firmware UUID specified in their VMI counterparts but rather set the status firmware UUID to (for example) vm.metadata.uid and raise the restart required condition? This acknowledges that disruption (i.e. firmware UUID changes post-restart to affect anything that relies on it) is imminent, correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This acknowledges that disruption (i.e. firmware UUID changes post-restart to affect anything that relies on it) is imminent, correct?

This disruption has to be prevented. Otherwise it will be a breaking change.
In order to prevent it I believe we'd have to apply a multi-phased approach:

  1. Continue with the current firmware UUID calculation, but store the UUID in status.
  2. As @vladikr suggested, use alerts / VM conditions in order to notify the users to restart VMs that do not have firmware UUID in their status.
  3. After a while, let the community know (i.e. via a mail to mailing list) that in the next version firmware UUID must reside in status in order to prevent breaking workloads.
  4. Switch to using vm.metadata.uid as the new firmware UUID (unless a UUID is already provided in the VM's status, therefore keeping backward compatibility).

Copy link

@Acedus Acedus Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @vladikr suggested, use alerts / VM conditions in order to notify the users to restart VMs that do not have firmware UUID in their status.

So at least a restart is required to derive the firmware UUID from the VMI (including shutdown VMs) so that it would persist in status, correct?

Switch to using vm.metadata.uid as the new firmware UUID (unless a UUID is already provided in the VM's status, therefore keeping backward compatibility).

Assuming a scenario where n existing VMs in cluster(s) suffer from the issue of overlapping firmware UUIDs, what is the recommended approach for users to rectify the issue post-upgrade? Recreate the VMs? Change the spec.firmware.uuid to what appears in vm.metadata.uid and restart? Is user action mandatory to resolve the issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So at least a restart is required to derive the firmware UUID from the VMI (including shutdown VMs) so that it would persist in status, correct?

Now that I'm re-thinking about it, a restart might not be necessary, as virt-controller could simply copy spec.firmware.uuid to the status field if exists, and if not, re-calc the hash and assign it to status.

Assuming a scenario where n existing VMs in cluster(s) suffer from the issue of overlapping firmware UUIDs, what is the recommended approach for users to rectify the issue post-upgrade? Recreate the VMs? Change the spec.firmware.uuid to what appears in vm.metadata.uid and restart? Is user action mandatory to resolve the issue?

TBH I'm not sure that we can redeem this situation. If two VMs are already running with the same firmware UUID, I think one has to be re-created. In any case, I think it shouldn't be in the scope of this proposal which only aims to prevent this situation from happening in the future.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I'm not sure that we can redeem this situation. If two VMs are already running with the same firmware UUID, I think one has to be re-created. In any case, I think it shouldn't be in the scope of this proposal which only aims to prevent this situation from happening in the future.

SGTM, thanks.

@dasionov dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch from 15c34f8 to c35b4a4 Compare November 7, 2024 13:45
design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved
design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved
@iholder101
Copy link
Contributor

One concern I have is around VM snapshots—I think it's worth mentioning that even if the firmware UUID of an existing VM is set to persist, it won’t be retained during a restore.

@Acedus would you mind elaborating on this one? why won't it be retained during a restore?

@Acedus
Copy link

Acedus commented Nov 7, 2024

One concern I have is around VM snapshots—I think it's worth mentioning that even if the firmware UUID of an existing VM is set to persist, it won’t be retained during a restore.

@Acedus would you mind elaborating on this one? why won't it be retained during a restore?

I'm referring to VM backups taken before the introduction of the changes described in this DP. Correct me if I'm wrong, but the VM snapshot content won't be updated as part of this change (e.g., whether through spec or status), and if it isn't, once the new UUID generation mechanism is introduced and a restore is attempted the UUID will change.

@dasionov dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch 2 times, most recently from 3751565 to 7201c75 Compare November 7, 2024 14:36
firmwareUUID: "123e4567-e89b-12d3-a456-426614174000"
```

or persist via status condition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue against a condition. Conditions are generally needed to manage the lifecycle or to be consumed by other controllers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conditions are not for persistence.

design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved
design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved
@dasionov dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch from 7201c75 to 1279cbf Compare November 10, 2024 12:25
**Pros:**
- The upgrade-specific code can be safely removed after two or three releases.
- Simple logic post-upgrade, with a straightforward UUID assignment.
- Limited disturbance to GitOps workflows, affecting only pre-existing VMs with the current method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only true if gitops is setting metadata.uid, correct?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabiand @iholder101 What if the VM controller would dynamically add this annotation during VM creation if vm.spec.template.spec.Firmware.UUID is not defined?

For example, when a VM is created (via GitOps or using oc/kubectl), the controller would set the following annotation:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    kubevirt.io/host-uuid: "<host-uuid>"

Does that align with your understanding? Thank you!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jangel97 what is the benefit of storing this in a new annotation, instead of an already-existing API?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe leveraging existing solutions would be the most effective approach. However, we should also consider scenarios where a system has a host UUID, and this breaking change could potentially alter it. In that case, if I understand correctly, wouldn't it be necessary to retain the "old" host UUID somewhere?

- May potentially bottleneck `virt-api`.

3. **Add a Firmware UUID Field to VM Status**
**Description:** Store the generated UUID within the VM’s status.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Status is only live, it must not be used to store information.

aka any informatoin in the status in only relevant and "persisted" on the runtime platform.
Object smust not be created with a populated status (only empty status is what we want: status: {})

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Object smust not be created with a populated status (only empty status is what we want: status: {})

A VM object won't be created with a populated status, but a VM that was created, booted, then shut off would have a non-empty status.

Am I missing anything?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @fabiand :)
just want to understand if I get you correctly.

type: Ready
created: true
runStrategy: Once
firmwareUUID: "123e4567-e89b-12d3-a456-426614174000"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.status ins only for erporting, not for persitence.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate?
How is spec more persistent than status? Aren't they both persisted in etcd?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.status ins only for erporting, not for persitence.

@fabiand I don't know where this is coming from. Numerous objects across k8s use the status exactly for that purpose, including objects in KubeVirt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The VM object, until it is deleted would have a status - it simply stores the current state of the VM, while the spec holds the desired.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually, it is not what status is meant to show.

Status is meant to represent the current state. However, the FW UUID cannot be "read" from the current state of the VM, because it is stopped.
There is a name to persisting such information on the status: caching.
Status is not about caching, it is about representing the actual. But here the suggestion is to store the actual so it will be the input for a future actual (which comes in the opposite direction).

We actually had this implemented in the network interfaces status and it caused a mess: To calculate the new status on a reconcile cycle, the previous data was considered, potentially dragging incorrect state over cycles indefinitely. We have since dropped this behavior, calculating the interfaces status without depending on the previous state.
But this is still tricky business even today, because multiple components attempt to touch the same list.

Bottom line, while this could be a good solution, caching the data on the status can potentially bite back. There is logic that uses this status information to set the UUID on the next VM instantiation and an opposite logic that reads the VM instantiation spec (or status?) to reflect that actual back to the same status field. Sounds like a loop to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if the VM is stopped the status shows the current state at this UUID is stored on the VM boot disk
As long as the VM exists this is its current state, regardless of its phase.

firmwareUUID: "123e4567-e89b-12d3-a456-426614174000"
```

or persist via status condition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conditions are not for persistence.

**Pros:**
- The upgrade-specific code can be safely removed after two or three releases.
- Simple logic post-upgrade, with a straightforward UUID assignment.
- Limited disturbance to GitOps workflows, affecting only pre-existing VMs with the current method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jangel97 what is the benefit of storing this in a new annotation, instead of an already-existing API?


**Cons:**
- Requires additional code (likely in `virt-operator`) to handle the upgrade-specific persistence.
- Patching the spec of a running VM may trigger a "restart required" condition.
Copy link
Member

@dankenigsberg dankenigsberg Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be easily avoided in code, if we find that vm's Firmware.UUID has NOT differed from the vmi's?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, patching object specs after the admission is not a good practice that hides the user intent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, patching object specs after the admission is not a good practice that hides the user intent.

Ack. And I'd say that patching during admission is just as bad in this respect.
However in this case, the user never intended us to have a buggy UUID, but must have it persisted. So I don't see here any issue, beside the (limited) disturbance to git-ops.

Can this be easily avoided in code, if we find that vm's Firmware.UUID has NOT differed from the vmi's?

@dasionov would you refer to this? I believe that the Con should say

- Code should be added to the computation of the "restart required" condition so that it is not raised if vm.Template.Firmware.UUID equals vmi.Firmware.UUID.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sry for the delay, fixed

@dasionov dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch from 1279cbf to b79c2d0 Compare November 14, 2024 10:34

### 4. Introduce a Breaking Change to Ensure Universal Uniqueness

**Description:** Modify UUID generation to use a combination of VM name and namespace, ensuring unique UUIDs within the cluster.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about universal uniqueness? How would it be ensured?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about universal uniqueness? How would it be ensured?

Using both name + namespace as a hash will ensure uniqueness AFAICT

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not universal uniqueness. a different vm on a different cluster would obtain the same uuid if it has the same name and namespace.

Copy link
Contributor

@iholder101 iholder101 Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, got you.
We can always add something "random" to the hash, like creation timestamp. WDYT?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to clarify the impact of VMs in different clusters having the same UUID if they share the same name and namespace?
Are there specific cases where cross-cluster uniqueness is essential for the UUID to function correctly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that @dankenigsberg is referring migrating a VM from a different cluster, e.g. via importing/exporting disks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there specific cases where cross-cluster uniqueness is essential for the UUID to function correctly?

By definition, UUID has to be universally-unique. A lot of services would break if two different VMs present matching identifiers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

@dasionov dasionov Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean the vm.metadata.uid?

@dasionov dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch from b79c2d0 to 3d599f9 Compare November 14, 2024 13:06
@xpivarc
Copy link
Member

xpivarc commented Nov 18, 2024

/label sig/compute

@kubevirt-bot
Copy link

@xpivarc: The label(s) /label sig/compute cannot be applied. These labels are supported: good-first-issue, needs-approver-review. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/label sig/compute

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@iholder101
Copy link
Contributor

iholder101 commented Nov 18, 2024

/label sig/compute

/sig compute
(yeah, confusing as hell)

- May necessitate tooling to facilitate UUID preservation for compatibility with existing workloads.


### 5. Upgrade-Specific Persistence of Firmware UUID
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t necessarily object to this approach—in fact, it’s likely unavoidable to prevent disruption to existing workloads—but the process for implementing it should be more thoroughly detailed IMO.

Correct me if I'm wrong, but if VMs are stopped during the upgrade, they cannot rely on VMI objects to derive and persist the firmware UUID, as those objects won’t exist. As a result, existing VMs without a defined spec.firmware.uuid field would need to populate the persistent UUID field (whatever that may be) using the current method (based on the name). The new method should then be applied exclusively to new VMs.

While this doesn’t cover all use cases, it’s reasonable to assume it addresses the majority—except for scenarios involving (for example) backup and restore, which present a separate set of challenges.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the reservation that you are raising. Option 5 does not distinguish between running and non-running VMs. During upgrade, they are all changed to persist the buggy uuid. After upgrade, all VMs receive a fresh uuid.

The problem of restoring old VMs that did not persist their buggy uuid sounds important, but not specific to this option. @dasionov can you address it? in the design? I don't have any good idea...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the original idea was to persist the UUID in existing VMs by obtaining it from the VMIs, therefore my comment. Perhaps I mixed up the proposal with implementation, my bad, you can disregard the comment in that case.

Comment on lines 101 to 99
### 5. Upgrade-Specific Persistence of Firmware UUID

**Description:** Before upgrading KubeVirt, persist the `Firmware.UUID` of existing VMsin `vm.spec`.
After the upgrade, any VM without `Firmware.UUID` is considered new.
For these new VMs, use `vm.metadata.uid` as the firmware UUID if `vm.spec.template.spec.Firmware.UUID` is not defined.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another problem with this approach is upgrades that skip versions.
IOW, a user can upgrade Kubevirt v1.0 straight to v1.3, skipping v1.1 and v1.2. With this approach, such users will break their VMs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we support skipping intermediate minor releases? I don't think that we test that.
Anyway, the proposal says below

  • Upgrade-specific code can be removed after two or three releases.

so a 1.4->1.7 would still be safe in this regard.

@dasionov dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch from 3d599f9 to f1166a8 Compare November 19, 2024 11:46
Copy link
Contributor

@enp0s3 enp0s3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dasionov Hi, I would suggest to simplify the PR subject and call it "improved handling of the VM firmware UUID", sine I see that you want to cover more than only the UUID persistence subject.

design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved
design-proposals/persist-vmi-firmware-uuid.md Outdated Show resolved Hide resolved
@dasionov dasionov changed the title design-proposal: persist-vmi-firmware-uuid improved handling of the VMI firmware UUID Nov 20, 2024
@dasionov dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch from f1166a8 to 4da1ce6 Compare November 20, 2024 11:38
@dasionov dasionov changed the title improved handling of the VMI firmware UUID design-proposal: improved handling of the VMI firmware UUID Nov 20, 2024
@dasionov dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch from 4da1ce6 to 45bdad4 Compare November 20, 2024 11:53
@aburdenthehand
Copy link
Member

/cc @EdDev

**Pros:**
- Straightforward implementation.
- Avoids the need to introduce new API fields.
- Fully backward compatible.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this break existing VMs? After a shutdown, these will get a new UUID.

Copy link
Contributor

@iholder101 iholder101 Nov 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a shutdown, these will get a new UUID.

IIUC the idea here is a multi-phased approach, in which UUID will first be saved to the spec and then (after a few releases) the way of computing the UUID will change. IOW this approach is similar to the status field approach but with a spec field.

**Cons:**
- Introduces a new API field, which could increase API surface area and requires long-term support.
- New fields may impact system performance and resource load.
- Adds to API complexity and may reduce readability.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this con, can you elaborate?


**Cons:**
- Introduces a new API field, which could increase API surface area and requires long-term support.
- New fields may impact system performance and resource load.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how?

**Cons:**
- Potential misuse of the spec field, which ideally should only reflect the user’s intended configuration.
- The majority of users may not consider the UUID part of the desired VM state, making this field irrelevant from their perspective.
- Can lead to longer and more complex VM definitions, impacting readability and management.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come? Can we elaborate?

Comment on lines 35 to 47
### 1. Persist UUID in the VM's Spec Field

**Description:** The UUID is generated when the VMI is being created and is then saved to the VM's spec field.

**Pros:**
- Straightforward implementation.
- Avoids the need to introduce new API fields.
- Fully backward compatible.

**Cons:**
- Potential misuse of the spec field, which ideally should only reflect the user’s intended configuration.
- The majority of users may not consider the UUID part of the desired VM state, making this field irrelevant from their perspective.
- Can lead to longer and more complex VM definitions, impacting readability and management.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you ensure backward compatibility here?

Copy link
Contributor

@iholder101 iholder101 Nov 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look at #347 (comment).
@dasionov perhaps it's better to add this to the document (under several of the approaches)

Comment on lines 51 to 68
### 2. Generate a Default UUID via Webhook and Set it to the Spec Field

**Description:** A webhook generates a UUID if the user does not provide one. This UUID is then set in the VM's spec field.

**Pros:**
- Retains backward compatibility and avoids introducing new API fields.

**Cons:**
- Same concerns as above about the spec field.
- Using webhooks to assign defaults can degrade performance, especially with high volumes of VMs, as `virt-api` may become a bottleneck.
- May compromise scalability, as generating a UUID via webhook adds overhead when multiple VMs start simultaneously.

---
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should describe why is it backward compatible (The controller uses same function to "generate" UUID in case of old vm where old vm is equal to vm that does not have the UUID set)


**Cons:**
- Same concerns as above about the spec field.
- Using webhooks to assign defaults can degrade performance, especially with high volumes of VMs, as `virt-api` may become a bottleneck.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this degrade a performance if the webhook already exists? How is this a bottleneck if the API can be scaled?

Copy link
Author

@dasionov dasionov Nov 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess maybe not everyone would want to scale the entire virt-api component just to get more web-hooks power IMHO

Comment on lines 65 to 92
### 3. Create a New Firmware UUID Field in VM Status

**Description:** A new field is added under VM status to store the generated firmware UUID upon first VM boot.

**Pros:**
- Keeps the spec clean, avoiding unnecessary fields in the user-defined configuration.
- Provides a clear separation between user configuration (spec) and system-generated information (status).

**Cons:**
- Introduces a new API field, which could increase API surface area and requires long-term support.
- New fields may impact system performance and resource load.
- Adds to API complexity and may reduce readability.

*Note:* To minimize this con, consider setting the UUID as a condition under VM status instead of introducing a separate field.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again how do we ensure the existing VMs retain stable UUID?

Copy link
Member

@vladikr vladikr Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the flow of option 3 is not well explained - at least from what I had in mind.
I see this working the following:

VM controller:

  • On start:
    if the user explicitly set the vm.spec.template.spec.Firmware.UUID - propagate this to vmi.spec.Firmware.UUID

    Otherwise (user didn't set firmware UUID explicitly):

    • sets the vmi.spec.Firmware.UUID to vm.metadata.uid if vm.Status.FirmwareUUID is empty (because it's a newly created VM - no VMI existed before)
    • if vm.status.FirmwareUUID is set - propagate it to vmi.spec.Firmware.UUID
  • On Sync:
    if vm.status.FirmwareUUID is empty set it using vmi.spec.Firmware.UUID

This approach preserves backward compatibility and is still gitops friendly


---

### 4. Introduce a Breaking Change to Ensure Universal Uniqueness
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1

- May necessitate tooling to facilitate UUID preservation for compatibility with existing workloads.


### 5. Upgrade-Specific Persistence of Firmware UUID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would require having 2 releases to fix this issue or some kind of coordination of the upgrade that is not available today.

should not affect updates / rollbacks.

## Functional Testing Approach
Verify that a newly created VMI has a unique firmware UUID assigned and that this UUID persists across VMI restarts.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have this test already?
What we miss is upgrade compatibility.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we even test that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could express that some parts are tested as e2e tests and others as unit tests.
You could detail the scenarios which needs coverage.

@dasionov dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch 3 times, most recently from bc9b3ff to 58738d1 Compare November 24, 2024 15:13
Copy link
Member

@EdDev EdDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I like this proposal.
While I think there is room to improve some parts, the base is good and triggers conversation and ideas.

Thank you!



## Definition of Users
End Users: Individuals or organizations running VMs and VMIs on KubeVirt who require consistent firmware UUIDs for their applications.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually define VM owners (e.g. namespace/project admins) or/and cluster admins.
I guess you mean here to the first.

End Users: Individuals or organizations running VMs and VMIs on KubeVirt who require consistent firmware UUIDs for their applications.

## User Stories
As an end-user, I expect my VMI to maintain its identity across restarts.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not look like a "story" to me.
And there are several scenarios, from creating and starting a new VM, starting an old VM (that was not running), restoring an old VM, restoring a new VM, etc.
Please list here all the cases that you intend to support and mention the ones you do not intend to support for this design.

@@ -0,0 +1,191 @@
# Overview
This proposal introduces a mechanism to persist the firmware UUID of a Virtual Machine Instance (VMI) in KubeVirt.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression that we persist VM (not VMI) FW ID.

Comment on lines 57 to 58
**Description:** A webhook generates a UUID if the user does not provide one. This UUID is then set in the VM's spec field.
This approach ensures backward compatibility because the controller will use the same function to generate a UUID for VMs that do not already have the UUID set.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me the added value of a webhook compared to a similar logic at the VM controller.
All changes to the VM resource passes through the VM controller, so please elaborate how a webhook can do more. If it is just an option, with identical pros as one in the controller, then please clarify that explicitly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this got drop in the last update, so the comment is irrelevant now.

## Repos
Kubevirt/kubevirt

# Design
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a need to choose one solution and then list the others as alternatives.


---

## What Happens to the Firmware UUID During a Restore?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are potentially to issues in this domain, one extends an existing problem in general and another may be specific to restoration (and cloning?):

  • There is a potential collision with an existing identical UUID, but this is rare enough so one could intentionally ignore.
  • If the same VM can be restored twice, then it has the potential to collide with a VM defined in the system.
    Can a VM be restored twice, resulting in two cone VMs?

In the same context, cloning needs to be addressed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be cloned, which will remove the VM specific data, but not restored twice.

type: Ready
created: true
runStrategy: Once
firmwareUUID: "123e4567-e89b-12d3-a456-426614174000"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually, it is not what status is meant to show.

Status is meant to represent the current state. However, the FW UUID cannot be "read" from the current state of the VM, because it is stopped.
There is a name to persisting such information on the status: caching.
Status is not about caching, it is about representing the actual. But here the suggestion is to store the actual so it will be the input for a future actual (which comes in the opposite direction).

We actually had this implemented in the network interfaces status and it caused a mess: To calculate the new status on a reconcile cycle, the previous data was considered, potentially dragging incorrect state over cycles indefinitely. We have since dropped this behavior, calculating the interfaces status without depending on the previous state.
But this is still tricky business even today, because multiple components attempt to touch the same list.

Bottom line, while this could be a good solution, caching the data on the status can potentially bite back. There is logic that uses this status information to set the UUID on the next VM instantiation and an opposite logic that reads the VM instantiation spec (or status?) to reflect that actual back to the same status field. Sounds like a loop to me.

The proposed changes have no anticipated impact on scalability capabilities of the KubeVirt framework

## Update/Rollback Compatibility
should not affect updates / rollbacks.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the content of ## What Happens to the Firmware UUID During a Restore? should land.
Or you should remove this part.

should not affect updates / rollbacks.

## Functional Testing Approach
Verify that a newly created VMI has a unique firmware UUID assigned and that this UUID persists across VMI restarts.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could express that some parts are tested as e2e tests and others as unit tests.
You could detail the scenarios which needs coverage.

Comment on lines 164 to 167
- Based on the selected design, either introduce a new field or utilize an existing one
(such as in spec, status) to store the firmware UUID.
- Update controller logic to check for and persist the UUID, ensuring it is generated only once per VM.
- Testing: add unit and functional tests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think these are valid phases.

You cannot add any logic without unit tests and possibly basic e2e tests.
And you cannot add the API change without utilizing it, not even as a separate PR.

Usually phasing means that you provide a basic functionality that works, then extend it in the next phases. It can also include feature lifecycle stages, i.e. alpha, beta and GA planning.
BTW, you should mention explicitly that there is no plan to protect this new logic with a FG (the fact it was not mentioned, implied to me that this is the intention).

This proposal introduces a mechanism to persist the firmware UUID of a
Virtual Machine in KubeVirt.
By storing the firmware UUID, we ensure that it remains consistent
across VM restarts.
which is crucial for applications and services that rely on the UUID for
identification or licensing purposes.

Signed-off-by: Daniel Sionov <[email protected]>
@dasionov dasionov force-pushed the add_persist_vmi_firmware_uuid_in_vm_status_design_proposal branch from 58738d1 to 2d28927 Compare November 25, 2024 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dco-signoff: yes Indicates the PR's author has DCO signed all their commits. sig/compute size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.