-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge master to pool licensing feature branch #6185
Merged
minglumlu
merged 118 commits into
xapi-project:feature/pool-licensing
from
minglumlu:private/mingl/merge_master_to_feature
Dec 13, 2024
Merged
Merge master to pool licensing feature branch #6185
minglumlu
merged 118 commits into
xapi-project:feature/pool-licensing
from
minglumlu:private/mingl/merge_master_to_feature
Dec 13, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Danilo Del Busso <[email protected]>
Signed-off-by: Danilo Del Busso <[email protected]>
This enables extending the type without causing performance issues, and should reduce the work for the garbage collector. Signed-off-by: Edwin Török <[email protected]>
Signed-off-by: Andrii Sultanov <[email protected]>
Instead of getting one timestamp for all collectors, get them closer to the actual measurement time. Signed-off-by: Andrii Sultanov <[email protected]>
…them Signed-off-by: Andrii Sultanov <[email protected]>
Instead of timestamps taken at read time in xcp-rrdd, propagate timestamps taken by plugins (or collectors in xcp-rrdd itself) and use these when updating values. Also process datasources without flattening them all into one list. This allows to process datasources with the same timestamp (provided by the plugin or dom0 collector in xcp_rrdd) at once. Co-authored-by: Edwin Török <[email protected]> Signed-off-by: Edwin Török <[email protected]> Signed-off-by: Andrii Sultanov <[email protected]>
RRDs can have different owners, and new ones can show up without new domids being involved (SRs, VMs are also owner types). Signed-off-by: Andrii Sultanov <[email protected]>
ds_update* functions previously relied on being called for the entirety of datasources of a certain RRD. Instead, operate on chunks of datasources provided by the same plugin. These chunks are not contiguous (more detailed explanation in the comments in the code), so indices to rrd_dss and the associated structures need to be carried with each datasource. Also turns the value*transform tuple into a type for more explicit accesses. Signed-off-by: Andrii Sultanov <[email protected]>
…and archiving This ensures that the interval between the current timestamp and previous update is calculated correctly. Keep the same external interface through rrdd_http_handler by adding an 'internal' parameter to xml exporting functions. Signed-off-by: Andrii Sultanov <[email protected]>
Add an "unsafe" version of rrd_add_ds to avoid checking if the datasource already exists twice. Signed-off-by: Andrii Sultanov <[email protected]>
…m explicitly If RRD's datasource disappears and is not reported by the plugin (or collected by xcp-rrdd), then it's not going to be refreshed, keeping its old value and timestamp. Previously, this was handled nicely because the whole RRD was updated at once, and updates were defaulted to VT_Unknown, Identity before being assigned actually collected values. After RRD collection was changed to process datasources in chunks per source, we need to explicitly find the ones that weren't updated on this iteration, and explicitly reset them. (This is not just a theoretical exercise, it can happen when a VIF gets unplugged or destroyed, for example, with its stats not being reset and continuing to display old values) Signed-off-by: Andrii Sultanov <[email protected]>
Instead of writing Int64 (truncated from timestamp floats) into the memory-mapped files, keep the precision of the timestamp. Signed-off-by: Andrii Sultanov <[email protected]>
RRDD, aside from the OCaml library, exposes two interfaces. This changes the Python one. The C one lives in rrd-client-lib and needs to be changed at the same time and coordinated during updating. Signed-off-by: Andrii Sultanov <[email protected]>
Signed-off-by: Andrii Sultanov <[email protected]>
Signed-off-by: Andrii Sultanov <[email protected]>
Signed-off-by: Andrii Sultanov <[email protected]>
When a host starts, the systemd service xapi-wait-init-complete waiting on the creation of the xapi init cookie file may fail on timeout for a matter of seconds. This patch adds 1 minute (300 seconds total) to the timeout passed to the script xapi-wait-init-complete. Signed-off-by: Thierry Escande <[email protected]>
Signed-off-by: Danilo Del Busso <[email protected]>
Signed-off-by: Danilo Del Busso <[email protected]>
Signed-off-by: Danilo Del Busso <[email protected]>
Without the change, the tests only pass when running on a runner with UTC as a timezone Signed-off-by: Danilo Del Busso <[email protected]>
Previously the SM feature check was done in two parts, fetch all the SM ref from the coordinator, and then fetch their information such as types and features from the coordinator basedon the refs. This can cause race conditions, where the previously fetched refs might have been deleted when we fetch the SM features. The deletion might happen due to the way SM registeration works for shared SRs such as GFS2, where each `PBD.plug` will trigger a deregister of existing SM and re-register (which will delete existing SMs), and create another (very likely) identical SM. To avoid this race, instead of fetch SM refs and their features separately, do this in one go so we get a consistent snapshot of the db state. Also add a bit more debugging messages. Signed-off-by: Vincent Liu <[email protected]>
Bake in assumptions that have been constant ever since xenstore was created: getdomainpath always returns "/local/domain/<domid>", /local/domain/domid/vm returns "/vm/<uuid>", so there's no need to look up that path to get the uuid again This reduces the number of xenstore accesses from 3 to 1 with no functional change. As suggested in: xapi-project#6068 (review) Signed-off-by: Andrii Sultanov <[email protected]>
…pi-project#6127) Previously the SM feature check was done in two parts, fetch all the SM ref from the coordinator, and then fetch their information such as types and features from the coordinator basedon the refs. This can cause race conditions, where the previously fetched refs might have been deleted when we fetch the SM features. The deletion might happen due to the way SM registeration works for shared SRs such as GFS2, where each `PBD.plug` will trigger a deregister of existing SM and re-register (which will delete existing SMs), and create another (very likely) identical SM. To avoid this race, instead of fetch SM refs and their features separately, do this in one go so we get a consistent snapshot of the db state.
Completes the 15+ years old TODO, at the expense of removing an ultimate example of a "not invented here" attitude. Signed-off-by: Andrii Sultanov <[email protected]>
Issue an alert about a broken host kernel if bits 4, 5, 7, 9, or 14 are set in /proc/sys/kernel/tainted, indicating some kind of error was encountered and the future behaviour of the kernel might not be predictable or safe anymore (though it generally should reasonably recover). Only one alert per tainted bit per boot can be present (more than one can be issued, if the user dismissed the alerts and restarted the toolstack). Distinguish between Major (4,5,7 - these are all things that might cause a host crash, but are unlikely to corrupt whatever data has been written out) and Warning (9, 14 - might be a concern and could be raised to Support but usually are not severe enough to crash the host) levels of errors as suggested by the Foundations team. This should serve as an indicator during issue investigation to look for the cause of the taint. Signed-off-by: Andrii Sultanov <[email protected]>
The manual notes that `Lazy.from_fun` "should only be used if the function f is already defined. In particular it is always less efficient to write `from_fun (fun () -> expr)` than `lazy expr`. So, replace `Lazy.from_fun` with `lazy` in this particular case Compare the lambda dump for `lazy (fun () -> [| 1;2;3 |] |> Array.map (fun x -> x+1))`: ``` (seq (let (l/269 = (function param/319[int] (apply (field_imm 12 (global Stdlib__Array!)) (function x/318[int] : int (+ x/318 1)) (makearray[int] 1 2 3)))) (setfield_ptr(root-init) 0 (global Main!) l/269)) 0) ``` with the lambda dump of the `let x = Lazy.from_fun (fun () -> [| 1;2;3 |] |> Array.map (fun x -> x+1))`: ``` (seq (let (x/269 = (apply (field_imm 5 (global Stdlib__Lazy!)) (function param/332[int] (apply (field_imm 12 (global Stdlib__Array!)) (function x/331[int] : int (+ x/331 1)) (makearray[int] 1 2 3))))) (setfield_ptr(root-init) 0 (global Main!) x/269)) 0) ``` See: https://patricoferris.github.io/js_of_ocamlopt/#code=bGV0IGwgPSBsYXp5IChmdW4gKCkgLT4gW3wgMTsyOzMgfF0gfD4gQXJyYXkubWFwIChmdW4geCAtPiB4KzEpKQoKbGV0IHggPSBMYXp5LmZyb21fZnVuIChmdW4gKCkgLT4gW3wgMTsyOzMgfF0gfD4gQXJyYXkubWFwIChmdW4geCAtPiB4KzEpKQ%3D%3D Signed-off-by: Andrii Sultanov <[email protected]>
A module binding appeared to be unused but was being evaluated for its effect alone. We reintroduce it here and don't bind a name. Signed-off-by: Colin James <[email protected]>
A module binding appeared to be unused but was being evaluated for its effect alone. We reintroduce it here and don't bind a name.
…6159) Default empty `allowed_operations` on a cloned VBD means that XenCenter does not display the DVD option in the console tab for VMs cloned from templates, for example. Follow the practice in `xapi_vbd`, and `update_allowed_operations` immediately after `Db.VBD.create`. --- I've tested this fix and XC displays the ISO selection on the cloned VM now, and allowed operations on the cloned VBDs are correct as well.
In some environments the time ranges checked are too strict causing test to fail. Previously the maximum error accepted was 10 ms, increase to 50 ms. Also increase timeouts to reduce error/value ratio. Signed-off-by: Frediano Ziglio <[email protected]>
Old implementation had different issues. Just use mutexes and conditions using C stubs. Current Ocaml Condition module does not have support for timeout waiting for conditions, so use C stubs. This implementation does not require opening a pipe. This reduces the possibilities of errors calling "wait" to zero. Mostly of the times it does not require kernel calls.
When a host starts, the systemd service xapi-wait-init-complete waiting on the creation of the xapi init cookie file may fail on timeout for a matter of seconds. This patch adds 1 minute (300 seconds total) to the timeout passed to the script xapi-wait-init-complete.
In some environments the time ranges checked are too strict causing test to fail. Previously the maximum error accepted was 10 ms, increase to 50 ms. Also increase timeouts to reduce error/value ratio.
Test that the event is correctly executed. Signed-off-by: Frediano Ziglio <[email protected]>
Signed-off-by: Frediano Ziglio <[email protected]>
Do not use ">" or other operators to compare Mtime.t, the value is intended to be unsigned and should be compared with Int64.unsigned_compare as Mtime functions do. Signed-off-by: Frediano Ziglio <[email protected]>
Signed-off-by: Frediano Ziglio <[email protected]>
The parameter was false only to support an internal usage, external users should always alert the thread loop. Signed-off-by: Frediano Ziglio <[email protected]>
- Do not wait huge amount of time if the queue is empty but use Delay.wait if possible; - Fix delete of periodic events. In case the event is processed it's removed from the queue. Previously remove_from_queue was not able to mark this event as removed; - Do not race between checking the first event and removing it. These 2 actions were done in 2 separate critical section, now they are done in a single one. Signed-off-by: Frediano Ziglio <[email protected]>
Potentially a periodic event can be cancelled while this is processed. Simulate this behavior removing the event inside the handler. This was fixed by previous commit. Signed-off-by: Frediano Ziglio <[email protected]>
Previously if the queue was empty and the loop thread was active the scheduler took quite some time to pick up the new event. Check that this is done in a timely fashion to avoid regressions in code. Signed-off-by: Frediano Ziglio <[email protected]>
Signed-off-by: Pau Ruiz Safont <[email protected]>
Similar to test_empty test however the state of the loop should be different. Signed-off-by: Frediano Ziglio <[email protected]>
Some plugins may not store the client-set metadata, and return a static value when replying to the update. This would override the values that a client used when the SR was created, or set afterwards, which is unexpected. Now name_label and name_description fields returned by the plugins are ignored on update. Current set_name_label and set_name_description rely on the update mechanism to work. Instead add database call at the end of the methods to ensure both xapi and the SR backend are synchronized, even when the latter fails to update the values. Signed-off-by: Pau Ruiz Safont <[email protected]>
Though the majority of completions were already using set_completions and the like to add completion suggestions, there were two leftovers still needlessly changing COMPREPLY themselves. This caused bugs, as in the case of xe vm-import filename=<TAB> autocompleting all of the filenames into the prompt instead of presenting the choice. Let only these functions operate on COMPREPLY directly. Signed-off-by: Andrii Sultanov <[email protected]>
**xe-cli completion: Use grep -E instead of egrep** Otherwise newer packages in XS9 issue "egrep: warning: egrep is obsolescent; using grep -E" warnings all over the place --- **xe-cli completion: Hide COMPREPLY manipulation behind functions** Though the majority of completions were already using set_completions and the like to add completion suggestions, there were two leftovers still needlessly changing COMPREPLY themselves. This caused bugs, as in the case of `xe vm-import filename=<TAB>` autocompleting all of the filenames into the prompt instead of presenting the choice. Let only these functions operate on COMPREPLY directly.
…roject#6165) Some plugins may not store the client-set metadata, and return a static value when replying to the update. This would override the values that a client used when the SR was created, or set afterwards, which is unexpected. Now name_label and name_description fields returned by the plugins are ignored on update. Current set_name_label and set_name_description rely on the update mechanism to work. Instead, add database call at the end of the methods to ensure both xapi and the SR backend are synchronized, even when the latter fails to update the values. Tested on GFS2 tests (JR 4175192), as well as ring3 bvt + bst (209177), and storage validation tests (SR 209180)
For the scan retry, previously we were comparing the entire vdi data structure from the database using the (<>) operator. This is a bit wasteful and not very stable. Instead let us just compare the vdi refs, since the race here comes from `VDI.db_{introduce,forget}`, which would only add/remove vdis from the db, but not change its actual data structure. Also add a bit more logging when retrying, since this should not happen very often. Signed-off-by: Vincent Liu <[email protected]>
When add leaked datapath: 1. add leaked datapath to Sr.vdis 2. write to db file 3. log enhance If there are storage exceptions raised when destroying datapath, the procedure fails and the state of VDI becomes incorrect, which leads to various abnormalresults in subsequent operations. To handle this, the leaked datapath is designed to redestroy the datapath and refresh the state before next storage operation via function remove_datapaths_andthen_nolock. But this mechanism doesn't take effect in current code. This commit is to fix this bug. Leaked datapath should be added to Sr.vdis to make the leaked datapath really work. And write to db file to avoid losing the leaked datapath if xapi restarts. Signed-off-by: Changlei Li <[email protected]>
There was some issue in the current code, from structure corruption to thread safety. Add some test and fix discovered issues. More details on commit messages.
When add leaked datapath: 1. add leaked datapath to Sr.vdis 2. write to db file 3. log enhance If there are storage exceptions raised when destroying datapath, the procedure fails and the state of VDI becomes incorrect, which leads to various abnormalresults in subsequent operations. To handle this, the leaked datapath is designed to redestroy the datapath and refresh the state before next storage operation via function remove_datapaths_andthen_nolock. But this mechanism doesn't take effect in current code. This commit is to fix this bug. leaked datapath should be added to Sr.vdis to make the leaked datapath really work. And write to db file to avoid losing the leaked datapath if xapi restarts.
For the scan retry, previously we were comparing the entire vdi data structure from the database using the (<>) operator. This is a bit wasteful and not very stable. Instead let us just compare the vdi refs, since the race here comes from `VDI.db_{introduce,forget}`, which would only add/remove vdis from the db, but not change its actual data structure. Also add a bit more logging when retrying, since this should not happen very often.
QEMU orders devices by the time of plugging. Parallelizing them introduces randomness, which breaks the assumption that devices are ordered in a deterministic way. Serialize all PCI and VUSB plugs to restore behaviour. Signed-off-by: Pau Ruiz Safont <[email protected]>
QEMU orders devices by the time of plugging. Parallelizing them introduces randomness, which breaks the assumption that devices are ordered in a deterministic way. Serialize all PCI and VUSB plugs to restore behaviour. This fix has been manually tested by @freddy77 against an affected VM relying on the ordering
liulinC
approved these changes
Dec 13, 2024
gangj
approved these changes
Dec 13, 2024
minglumlu
merged commit Dec 13, 2024
a889151
into
xapi-project:feature/pool-licensing
15 checks passed
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.