
Merge master to pool licensing feature branch #6185

Conversation

minglumlu (Member)

No description provided.

danilo-delbusso and others added 30 commits October 1, 2024 11:25
This enables extending the type without causing performance issues,
and should reduce the work for the garbage collector.

Signed-off-by: Edwin Török <[email protected]>
Instead of getting one timestamp for all collectors, get them closer to the
actual measurement time.

Signed-off-by: Andrii Sultanov <[email protected]>
Instead of timestamps taken at read time in xcp-rrdd, propagate timestamps
taken by plugins (or collectors in xcp-rrdd itself) and use these when
updating values.

Also process datasources without flattening them all into one list.

This allows datasources with the same timestamp (provided by the plugin or the
dom0 collector in xcp-rrdd) to be processed at once.

Co-authored-by: Edwin Török <[email protected]>
Signed-off-by: Edwin Török <[email protected]>
Signed-off-by: Andrii Sultanov <[email protected]>
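The per-plugin timestamp grouping described above could be sketched as follows. This is a minimal stdlib-only illustration; `group_by_timestamp` and the string datasource names are hypothetical, not xapi's actual API:

```ocaml
(* Batch datasources by the timestamp their plugin reported, so each
   batch can be applied to the RRD in one update. *)
let group_by_timestamp dss =
  let tbl = Hashtbl.create 16 in
  List.iter
    (fun (ts, ds) ->
      let prev = try Hashtbl.find tbl ts with Not_found -> [] in
      Hashtbl.replace tbl ts (ds :: prev))
    dss ;
  Hashtbl.fold (fun ts dss acc -> (ts, List.rev dss) :: acc) tbl []
  |> List.sort compare

let () =
  assert (
    group_by_timestamp [(1.0, "cpu"); (2.0, "mem"); (1.0, "net")]
    = [(1.0, ["cpu"; "net"]); (2.0, ["mem"])] )
```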
RRDs can have different owners, and new ones can show up without new domids
being involved (SRs, VMs are also owner types).

Signed-off-by: Andrii Sultanov <[email protected]>
ds_update* functions previously relied on being called with the entire set of
datasources of a given RRD. Instead, operate on chunks of datasources
provided by the same plugin. These chunks are not contiguous (a more detailed
explanation is in the comments in the code), so indices into rrd_dss and the
associated structures need to be carried with each datasource.

Also turns the value*transform tuple into a type for more explicit accesses.

Signed-off-by: Andrii Sultanov <[email protected]>
…and archiving

This ensures that the interval between the current timestamp and previous
update is calculated correctly.

Keep the same external interface through rrdd_http_handler by adding an
'internal' parameter to xml exporting functions.

Signed-off-by: Andrii Sultanov <[email protected]>
Add an "unsafe" version of rrd_add_ds to avoid checking twice whether the
datasource already exists.

Signed-off-by: Andrii Sultanov <[email protected]>
…m explicitly

If an RRD's datasource disappears and is not reported by the plugin (or collected
by xcp-rrdd), then it's not going to be refreshed, keeping its old value and
timestamp. Previously, this was handled nicely because the whole RRD was
updated at once, and updates were defaulted to VT_Unknown, Identity before
being assigned actually collected values. After RRD collection was changed to
process datasources in chunks per source, we need to explicitly find the ones
that weren't updated on this iteration and reset them.

(This is not just a theoretical exercise: it can happen when a VIF gets
unplugged or destroyed, for example, with its stats not being reset and
continuing to display old values.)

Signed-off-by: Andrii Sultanov <[email protected]>
Instead of writing Int64 (truncated from timestamp floats) into the
memory-mapped files, keep the precision of the timestamp.

Signed-off-by: Andrii Sultanov <[email protected]>
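A small illustration of the precision loss this commit avoids (the timestamp value is arbitrary): truncating a float timestamp to Int64 drops the sub-second component.

```ocaml
(* Int64.of_float truncates toward zero, discarding the fraction. *)
let ts = 1700000000.75
let truncated = Int64.of_float ts

let () =
  assert (Int64.to_float truncated = 1700000000.) ;
  (* the 0.75 s of precision would be lost on the way to the mmap file *)
  assert (ts -. Int64.to_float truncated = 0.75)
```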
RRDD, aside from the OCaml library, exposes two interfaces. This changes the
Python one. The C one lives in rrd-client-lib and needs to be changed at the
same time, with the update coordinated.

Signed-off-by: Andrii Sultanov <[email protected]>
When a host starts, the systemd service xapi-wait-init-complete waiting
on the creation of the xapi init cookie file may fail on timeout for a
matter of seconds. This patch adds 1 minute (300 seconds total) to the
timeout passed to the script xapi-wait-init-complete.

Signed-off-by: Thierry Escande <[email protected]>
Without the change, the tests only pass when running on a runner with UTC as a timezone

Signed-off-by: Danilo Del Busso <[email protected]>
Previously the SM feature check was done in two parts: fetch all the SM
refs from the coordinator, and then fetch their information, such as types
and features, from the coordinator based on the refs. This can cause race
conditions, where the previously fetched refs might have been deleted
by the time we fetch the SM features. The deletion might happen due to the way
SM registration works for shared SRs such as GFS2, where each
`PBD.plug` will trigger a deregister of the existing SM and a re-register
(which will delete existing SMs), and create another (very likely) identical
SM.

To avoid this race, instead of fetching SM refs and their features
separately, do this in one go so we get a consistent snapshot of the db
state.

Also add a bit more debugging messages.

Signed-off-by: Vincent Liu <[email protected]>
Bake in assumptions that have been constant ever since xenstore was created:
getdomainpath always returns "/local/domain/<domid>", and /local/domain/<domid>/vm
returns "/vm/<uuid>", so there's no need to look up that path to get the uuid
again.

This reduces the number of xenstore accesses from 3 to 1 with no functional
change.

As suggested in: xapi-project#6068 (review)

Signed-off-by: Andrii Sultanov <[email protected]>
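The baked-in invariants amount to constructing the paths directly instead of reading them back from xenstore. A sketch of the path shapes only; `domain_path` and `vm_path` are illustrative helpers, not the code in this PR:

```ocaml
(* The two path shapes that have been stable since xenstore was created. *)
let domain_path domid = Printf.sprintf "/local/domain/%d" domid
let vm_path uuid = Printf.sprintf "/vm/%s" uuid

let () =
  assert (domain_path 5 = "/local/domain/5") ;
  assert (vm_path "0af2e15e" = "/vm/0af2e15e")
```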
…pi-project#6127)

Previously the SM feature check was done in two parts: fetch all the SM
refs from the coordinator, and then fetch their information, such as types
and features, from the coordinator based on the refs. This can cause race
conditions, where the previously fetched refs might have been deleted
by the time we fetch the SM features. The deletion might happen due to the way
SM registration works for shared SRs such as GFS2, where each
`PBD.plug` will trigger a deregister of the existing SM and a re-register
(which will delete existing SMs), and create another (very likely)
identical SM.

To avoid this race, instead of fetching SM refs and their features
separately, do this in one go so we get a consistent snapshot of the db
state.
Completes a 15+ year-old TODO, at the expense of removing an ultimate
example of a "not invented here" attitude.

Signed-off-by: Andrii Sultanov <[email protected]>
Issue an alert about a broken host kernel if bits 4, 5, 7, 9, or 14 are set in
/proc/sys/kernel/tainted, indicating that some kind of error was encountered and
the future behaviour of the kernel might no longer be predictable or safe
(though it generally should recover reasonably).

Only one alert per tainted bit per boot can be present (more than one can be
issued if the user dismisses the alerts and restarts the toolstack).

Distinguish between Major (4, 5, 7 - these are all things that might cause a
host crash, but are unlikely to corrupt whatever data has been written out) and
Warning (9, 14 - might be a concern and could be raised to Support, but usually
not severe enough to crash the host) levels of errors, as
suggested by the Foundations team.

This should serve as an indicator during issue investigation to look for the
cause of the taint.

Signed-off-by: Andrii Sultanov <[email protected]>
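The classification described above could be sketched like this. The bit numbers come from the commit message; `classify` and the variant names are illustrative, not the PR's actual code:

```ocaml
(* Test whether bit [b] is set in the /proc/sys/kernel/tainted mask. *)
let bit_set mask b = Int64.logand mask (Int64.shift_left 1L b) <> 0L

(* Map each monitored tainted bit to an alert severity. *)
let classify mask =
  List.filter (bit_set mask) [4; 5; 7; 9; 14]
  |> List.map (fun b -> (b, if b = 9 || b = 14 then `Warning else `Major))

let () =
  (* mask with bits 7 and 9 set: 2^7 + 2^9 = 640 *)
  assert (classify 640L = [(7, `Major); (9, `Warning)])
```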
The manual notes that `Lazy.from_fun` "should only be used if the function f
is already defined. In particular it is always less efficient to write
`from_fun (fun () -> expr)` than `lazy expr`."

So, replace `Lazy.from_fun` with `lazy` in this particular case.

Compare the lambda dump for `lazy (fun () -> [| 1;2;3 |] |> Array.map (fun x -> x+1))`:
```
(seq
  (let
    (l/269 =
       (function param/319[int]
         (apply (field_imm 12 (global Stdlib__Array!))
           (function x/318[int] : int (+ x/318 1)) (makearray[int] 1 2 3))))
    (setfield_ptr(root-init) 0 (global Main!) l/269))
  0)
```

with the lambda dump of the `let x = Lazy.from_fun (fun () -> [| 1;2;3 |] |>
Array.map (fun x -> x+1))`:
```
(seq
  (let
    (x/269 =
       (apply (field_imm 5 (global Stdlib__Lazy!))
         (function param/332[int]
           (apply (field_imm 12 (global Stdlib__Array!))
             (function x/331[int] : int (+ x/331 1)) (makearray[int] 1 2 3)))))
    (setfield_ptr(root-init) 0 (global Main!) x/269))
  0)
```

See: https://patricoferris.github.io/js_of_ocamlopt/#code=bGV0IGwgPSBsYXp5IChmdW4gKCkgLT4gW3wgMTsyOzMgfF0gfD4gQXJyYXkubWFwIChmdW4geCAtPiB4KzEpKQoKbGV0IHggPSBMYXp5LmZyb21fZnVuIChmdW4gKCkgLT4gW3wgMTsyOzMgfF0gfD4gQXJyYXkubWFwIChmdW4geCAtPiB4KzEpKQ%3D%3D

Signed-off-by: Andrii Sultanov <[email protected]>
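Both forms build an equivalent suspension; as the lambda dumps show, only the generated code differs. A minimal value-level check (the array example is illustrative):

```ocaml
(* Same observable behaviour; [lazy] avoids the extra closure and the
   call into Stdlib.Lazy visible in the second lambda dump. *)
let a = lazy (Array.map succ [|1; 2; 3|])
let b = Lazy.from_fun (fun () -> Array.map succ [|1; 2; 3|])

let () =
  assert (Lazy.force a = [|2; 3; 4|]) ;
  assert (Lazy.force b = [|2; 3; 4|])
```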
A module binding appeared to be unused but was being evaluated for its
effect alone. We reintroduce it here and don't bind a name.

Signed-off-by: Colin James <[email protected]>
last-genius and others added 27 commits December 6, 2024 08:49
…6159)

Default empty `allowed_operations` on a cloned VBD means that XenCenter
does not display the DVD option in the console tab for VMs cloned from
templates, for example.

Follow the practice in `xapi_vbd`, and `update_allowed_operations`
immediately after `Db.VBD.create`.

---

I've tested this fix and XC displays the ISO selection on the cloned VM
now, and allowed operations on the cloned VBDs are correct as well.
In some environments the time ranges checked are too strict, causing tests to
fail.
Previously the maximum accepted error was 10 ms; increase it to 50 ms.
Also increase timeouts to reduce the error/value ratio.

Signed-off-by: Frediano Ziglio <[email protected]>
The old implementation had various issues.
Just use mutexes and condition variables via C stubs.
The current OCaml Condition module does not support waiting on a condition
with a timeout, so use C stubs.
This implementation does not require opening a pipe, which reduces the
possibility of errors when calling "wait" to zero. Most of the time it
does not require kernel calls.
When a host starts, the systemd service xapi-wait-init-complete waiting
on the creation of the xapi init cookie file may fail on timeout for a
matter of seconds. This patch adds 1 minute (300 seconds total) to the
timeout passed to the script xapi-wait-init-complete.
In some environments the time ranges checked are too strict, causing tests
to fail.
Previously the maximum accepted error was 10 ms; increase it to 50 ms.
Also increase timeouts to reduce the error/value ratio.
Test that the event is correctly executed.

Signed-off-by: Frediano Ziglio <[email protected]>
Do not use ">" or other operators to compare Mtime.t values: they are intended
to be unsigned and should be compared with Int64.unsigned_compare, as the
Mtime functions do.

Signed-off-by: Frediano Ziglio <[email protected]>
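A stdlib-only illustration of why signed operators are wrong here (Mtime.t is documented as an unsigned 64-bit value carried in an Int64):

```ocaml
(* Once the value passes Int64.max_int, signed comparison inverts the
   ordering; Int64.unsigned_compare keeps it correct. *)
let a = Int64.max_int (* a large monotonic timestamp *)
let b = Int64.add a 1L (* the next tick wraps to Int64.min_int *)

let () =
  (* signed comparison claims b < a, which would reorder events *)
  assert (Int64.compare b a < 0) ;
  (* unsigned comparison preserves the intended order b > a *)
  assert (Int64.unsigned_compare b a > 0)
```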
The parameter was false only to support an internal usage; external users
should always alert the thread loop.

Signed-off-by: Frediano Ziglio <[email protected]>
- Do not wait a huge amount of time if the queue is empty;
  use Delay.wait if possible;
- Fix deletion of periodic events. While an event is being processed
  it is removed from the queue; previously remove_from_queue was
  not able to mark such an event as removed;
- Do not race between checking the first event and removing it.
  These 2 actions were done in 2 separate critical sections; now
  they are done in a single one.

Signed-off-by: Frediano Ziglio <[email protected]>
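The single-critical-section fix from the last bullet could be sketched like this. The names are hypothetical and a plain int list stands in for the event queue; the point is that the check of the head and its removal happen under one lock:

```ocaml
let m = Mutex.create ()
let queue = ref [1; 2; 3]

(* Check-and-pop under one lock: no other thread can remove the event
   between the check and the removal. *)
let pop_if_ready ~ready =
  Mutex.lock m ;
  let r =
    match !queue with
    | e :: rest when ready e -> queue := rest ; Some e
    | _ -> None
  in
  Mutex.unlock m ; r

let () =
  assert (pop_if_ready ~ready:(fun e -> e = 1) = Some 1) ;
  (* the head is now 2, which is not ready *)
  assert (pop_if_ready ~ready:(fun e -> e = 1) = None)
```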
Potentially a periodic event can be cancelled while it is being processed.
Simulate this behaviour by removing the event inside the handler.
This was fixed by the previous commit.

Signed-off-by: Frediano Ziglio <[email protected]>
Previously, if the queue was empty and the loop thread was active,
the scheduler took quite some time to pick up a new event.
Check that this is done in a timely fashion to avoid regressions in the code.

Signed-off-by: Frediano Ziglio <[email protected]>
Similar to the test_empty test, but the state of the loop should be different.

Signed-off-by: Frediano Ziglio <[email protected]>
Some plugins may not store the client-set metadata, and return a static value
when replying to the update. This would override the values that a client
used when the SR was created, or set afterwards, which is unexpected.

Now name_label and name_description fields returned by the plugins are ignored
on update.

The current set_name_label and set_name_description rely on the update
mechanism to work. Instead, add a database call at the end of the methods to
ensure both xapi and the SR backend are synchronized, even when the latter
fails to update the values.

Signed-off-by: Pau Ruiz Safont <[email protected]>
Though the majority of completions were already using set_completions and the
like to add completion suggestions, there were two leftovers still needlessly
changing COMPREPLY themselves. This caused bugs, as in the case of

xe vm-import filename=<TAB>

autocompleting all of the filenames into the prompt instead of presenting
a choice. Let only these functions operate on COMPREPLY directly.

Signed-off-by: Andrii Sultanov <[email protected]>
**xe-cli completion: Use grep -E instead of egrep**

Otherwise newer packages in XS9 issue
"egrep: warning: egrep is obsolescent; using grep -E"
warnings all over the place.

---

**xe-cli completion: Hide COMPREPLY manipulation behind functions**

Though the majority of completions were already using set_completions and the
like to add completion suggestions, there were two leftovers still needlessly
changing COMPREPLY themselves. This caused bugs, as in the case of

`xe vm-import filename=<TAB>`

autocompleting all of the filenames into the prompt instead of presenting
the choice. Let only these functions operate on COMPREPLY directly.
…roject#6165)

Some plugins may not store the client-set metadata, and return a static value
when replying to the update. This would override the values that a client
used when the SR was created, or set afterwards, which is unexpected.

Now name_label and name_description fields returned by the plugins are
ignored on update.

Current set_name_label and set_name_description rely on the update mechanism
to work. Instead, add database call at the end of the methods to ensure both
xapi and the SR backend are synchronized, even when the latter fails to
update the values.

Tested on GFS2 tests (JR 4175192), as well as ring3 bvt + bst (209177),
and storage validation tests (SR 209180)
For the scan retry, previously we were comparing the entire VDI data
structure from the database using the (<>) operator. This is a bit
wasteful and not very stable. Instead, let us just compare the VDI refs,
since the race here comes from `VDI.db_{introduce,forget}`, which would
only add/remove VDIs from the db, but not change their actual data
structure.

Also add a bit more logging when retrying, since this should not happen
very often.

Signed-off-by: Vincent Liu <[email protected]>
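The ref-versus-record comparison could be sketched as below. The `vdi` type and field names are illustrative stand-ins, not xapi's actual database types:

```ocaml
(* A ref is a stable key; the rest of the record may legitimately change. *)
type vdi = {ref_: string; name: string}

let refs vdis = List.sort compare (List.map (fun v -> v.ref_) vdis)
let same_vdi_set a b = refs a = refs b

let () =
  let a = [{ref_= "r1"; name= "old"}]
  and b = [{ref_= "r1"; name= "renamed"}] in
  (* structural (<>) would trigger a spurious retry; comparing refs
     only reacts to VDIs being added or removed *)
  assert (a <> b) ;
  assert (same_vdi_set a b)
```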
When adding a leaked datapath:
1. add the leaked datapath to Sr.vdis
2. write it to the db file
3. enhance logging

If there are storage exceptions raised when destroying a datapath,
the procedure fails and the state of the VDI becomes incorrect,
which leads to various abnormal results in subsequent operations.
To handle this, the leaked datapath is designed to re-destroy the
datapath and refresh the state before the next storage operation via
the function remove_datapaths_andthen_nolock. But this mechanism
doesn't take effect in the current code.
This commit fixes this bug: a leaked datapath should be added
to Sr.vdis for the mechanism to really work, and written to the
db file to avoid losing the leaked datapath if xapi restarts.

Signed-off-by: Changlei Li <[email protected]>
There were some issues in the current code, from structure corruption to
thread safety.
Add some tests and fix the discovered issues.

More details in the commit messages.
When adding a leaked datapath:
1. add the leaked datapath to Sr.vdis
2. write it to the db file
3. enhance logging

If there are storage exceptions raised when destroying a datapath,
the procedure fails and the state of the VDI becomes incorrect,
which leads to various abnormal results in subsequent operations.
To handle this, the leaked datapath is designed to re-destroy the
datapath and refresh the state before the next storage operation via
the function remove_datapaths_andthen_nolock. But this mechanism
doesn't take effect in the current code.
This commit fixes this bug: a leaked datapath should be added
to Sr.vdis for the mechanism to really work, and written to the
db file to avoid losing the leaked datapath if xapi restarts.
For the scan retry, previously we were comparing the entire VDI data
structure from the database using the (<>) operator. This is a bit
wasteful and not very stable. Instead, let us just compare the VDI refs,
since the race here comes from `VDI.db_{introduce,forget}`, which would
only add/remove VDIs from the db, but not change their actual data
structure.

Also add a bit more logging when retrying, since this should not happen
very often.
QEMU orders devices by the time of plugging. Parallelizing them introduces
randomness, which breaks the assumption that devices are ordered in a
deterministic way.

Serialize all PCI and VUSB plugs to restore behaviour.

Signed-off-by: Pau Ruiz Safont <[email protected]>
QEMU orders devices by the time of plugging. Parallelizing them
introduces randomness, which breaks the assumption that devices are
ordered in a deterministic way.

Serialize all PCI and VUSB plugs to restore behaviour.

This fix has been manually tested by @freddy77 against an affected VM
relying on the ordering.
@minglumlu minglumlu merged commit a889151 into xapi-project:feature/pool-licensing Dec 13, 2024
15 checks passed