CA-402901: Update leaked dp to Sr #6169

Merged
15 changes: 14 additions & 1 deletion ocaml/xapi/storage_smapiv1_wrapper.ml
@@ -453,6 +453,9 @@ functor
List.fold_left perform_one vdi_t ops

let perform_nolock context ~dbg ~dp ~sr ~vdi ~vm this_op =
debug "%s dp=%s, sr=%s, vdi=%s, vm=%s, op=%s" __FUNCTION__ dp
(s_of_sr sr) (s_of_vdi vdi) (s_of_vm vm)
(Vdi_automaton.string_of_op this_op) ;
match Host.find sr !Host.host with
| None ->
raise (Storage_error (Sr_not_attached (s_of_sr sr)))
@@ -473,6 +476,15 @@ functor
superstate to superstate'. These may fail: if so we revert the
datapath+VDI state to the most appropriate value. *)
let ops = Vdi_automaton.( - ) superstate superstate' in
debug "%s %s -> %s: %s" __FUNCTION__
(Vdi_automaton.string_of_state superstate)
(Vdi_automaton.string_of_state superstate')
(String.concat ", "
(List.map
(fun (op, _) -> Vdi_automaton.string_of_op op)
ops
)
) ;
side_effects context dbg dp sr sr_t vdi vdi_t vm ops
with e ->
let e =
@@ -529,7 +541,8 @@
)
with e ->
if not allow_leak then (
ignore (Vdi.add_leaked dp vdi_t) ;
Sr.add_or_replace vdi (Vdi.add_leaked dp vdi_t) sr_t ;
@minglumlu (Member) on Dec 11, 2024:
Just to help myself understand...
This is only for SMAPIv1.
There are two cases in which destroy_datapath_nolock will be called:

  1. DP.destroy_sr: this happens on SR.detach or VBD.unplug.
  2. remove_datapaths_andthen_nolock: this happens before all VDI-related operations.

In case 1, the datapath is removed deliberately. With this fix, if the removal fails due to an exception from e.g. SM, the datapath is recorded as leaked in both memory and the state file, so that in case 2 it can be identified and removed again (a sketch of this record-and-retry idea follows after this comment). If the removal fails again in case 2, the subsequent VDI operation will fail and eventually surface the error. I think this is expected, as it would be bad to ignore an error that might cause more issues.
After all, from a user's perspective: if case 1 is part of a VM.reboot or VM.shutdown, the failure leaves the VM in the halted state, and a VM.start gets the VM running again; if case 1 is part of a VM.pool_migrate or VDI.pool_migrate, the VM can be started on the destination again after a toolstack restart on the source host.
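
A minimal, self-contained OCaml sketch of this record-and-retry idea, assuming (as the diff suggests) that Vdi.add_leaked returns an updated record rather than mutating in place. The names below (vdi_t, add_leaked, record_leak_old, record_leak_new, persist, sr_t) are simplified stand-ins for illustration, not the real xapi API; the point is that the old code computed the updated record and discarded it, whereas the fix writes it back into the SR table and persists it.

```ocaml
(* Hypothetical simplification of the wrapper's state; not the xapi types. *)
type vdi_t = {leaked: string list} (* dps whose teardown failed earlier *)

(* add_leaked returns a new record; it does not mutate its argument. *)
let add_leaked dp (t : vdi_t) = {leaked= dp :: t.leaked}

(* In-memory SR table; in xapi this state is also written to a file so the
   leaked dps survive a toolstack restart. *)
let sr_t : (string, vdi_t) Hashtbl.t = Hashtbl.create 16

let persist () =
  Hashtbl.iter
    (fun vdi t ->
      Printf.printf "persist %s leaked=[%s]\n" vdi
        (String.concat "; " t.leaked)
    )
    sr_t

(* Old behaviour: the updated record is computed and then thrown away, so the
   leak is never visible to later operations. *)
let record_leak_old ~dp ~vdi_t = ignore (add_leaked dp vdi_t)

(* New behaviour: write the updated record back into the SR table and persist
   it, so a later VDI operation can find the leaked dp and retry its removal. *)
let record_leak_new ~dp ~vdi ~vdi_t =
  Hashtbl.replace sr_t vdi (add_leaked dp vdi_t) ;
  persist ()

let () =
  let t = {leaked= []} in
  record_leak_old ~dp:"dp-a" ~vdi_t:t ;
  (* nothing recorded *)
  record_leak_new ~dp:"dp-b" ~vdi:"vdi-1" ~vdi_t:t
  (* prints: persist vdi-1 leaked=[dp-b] *)
```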

Member:
But in the VM.pool_migrate or VDI.pool_migrate case, the toolstack restart only removes the DPs used in migrations; would it also remove the leaked DPs of the VM? The VM does not actually reside on the source host, but the leaked DPs still block the VM attach on the destination host.

Member:
> This is only for SMAPIv1.

On master (XS 8), yes, as Storage_smapiv1_wrapper covers only SMAPIv1 SRs. Previously the module was called Storage_impl and covered both SMAPIv1 and SMAPIv3; it was changed following an update to how qemu-dp is used inside SMAPIv3 plugins.

Member:
> But in the VM.pool_migrate or VDI.pool_migrate case, the toolstack restart only removes the DPs used in migrations; would it also remove the leaked DPs of the VM? The VM does not actually reside on the source host, but the leaked DPs still block the VM attach on the destination host.

If a dp was leaked during a VM.reboot, it will now be properly recorded as such. A subsequent VM.pool_migrate includes a VDI.deactivate call on the source host, where the wrapper will now notice the leaked dp and get rid of it before proceeding. This happens before VDI.activate is called on the remote host (a sketch of this ordering follows at the end of the diff).

Everything.to_file !host_state_path (Everything.make ()) ;
raise e
) else (
(* allow_leak means we can forget this dp *)
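
As referenced in the last reply, here is a small, self-contained OCaml sketch of the described ordering: a VDI operation first cleans up any leaked dps recorded for that VDI, then runs the real work, so a dp leaked during a VM.reboot is removed by the VDI.deactivate step on the source host before VDI.activate runs on the destination. The helpers below (leaked, destroy_datapath, with_leaked_cleanup, migrate) are hypothetical simplifications, not the wrapper's actual remove_datapaths_andthen_nolock code.

```ocaml
(* Leaked datapaths recorded per VDI on the source host (hypothetical model). *)
let leaked : (string, string list) Hashtbl.t = Hashtbl.create 4

let destroy_datapath dp = Printf.printf "destroying leaked dp %s\n" dp

(* Clean up any leaked dps for this VDI before running the operation itself.
   If cleanup raises, the exception propagates and the operation fails,
   surfacing the error instead of hiding it. *)
let with_leaked_cleanup ~vdi f =
  ( match Hashtbl.find_opt leaked vdi with
  | Some dps ->
      List.iter destroy_datapath dps ;
      Hashtbl.remove leaked vdi
  | None ->
      ()
  ) ;
  f ()

(* Simplified migration flow: deactivate on the source (cleaning up leaked dps
   first), then activate on the destination. *)
let migrate ~vdi =
  with_leaked_cleanup ~vdi (fun () ->
      Printf.printf "VDI.deactivate %s on the source host\n" vdi
  ) ;
  Printf.printf "VDI.activate %s on the destination host\n" vdi

let () =
  Hashtbl.replace leaked "vdi-1" ["dp-leaked-during-reboot"] ;
  migrate ~vdi:"vdi-1"
```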