Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test and improve Xapi periodic scheduler #6155

Merged
merged 9 commits into from
Dec 11, 2024

Conversation

freddy77
Copy link
Collaborator

@freddy77 freddy77 commented Dec 4, 2024

There was some issue in the current code, from structure corruption to thread safety.
Add some test and fix discovered issues.

More details on commit messages.

@freddy77 freddy77 changed the title Test and Xapi periodic scheduler Test and improve Xapi periodic scheduler Dec 4, 2024
@edwintorok
Copy link
Contributor

edwintorok commented Dec 5, 2024

There are some changes in common here with my changes (feature/perf...edwintorok:xen-api:pr/mtime, not quite ready looks like I committed some files I shouldn't have, but nearly there), for which I was about to open a PR (and haven't opened one before because there were too many PRs open).

I think we should probably open all PRs that we have, in a draft form so that we're aware of each-other's work. Otherwise we're just duplicating work. We have the Mtime changes done by 3 different people so far (@psafont , you and myself).

Copy link
Contributor

@edwintorok edwintorok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'Do not wait huge amount of time if the queue is empty' was already fixed by https://github.com/xapi-project/xen-api/pull/6122/files#diff-5c3ab2a7766ba1f2c349d7985a633259d244dd8d5b2a3dfaf948337dc9a45221 (although that is on feature/perf, so likely the 2 changes here will conflict.)

Copy link
Contributor

@edwintorok edwintorok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be useful to split the bugfixes into separate commits (like I've done in https://github.com/xapi-project/xen-api/pull/6161/commits), that way it'll be easier to see which changes are duplicate (and one of the commits can be dropped from one of the PRs, and which ones are still needed).

@freddy77
Copy link
Collaborator Author

freddy77 commented Dec 6, 2024

@edwintorok I'll try to split the fixes but they are really convoluted, they could introduce some artificial weird artifacts

@edwintorok
Copy link
Contributor

The ones that are difficult to split you can leave as is. I'll rebase my changes on top of this one, once it is merged to master.

@freddy77
Copy link
Collaborator Author

freddy77 commented Dec 6, 2024

Rebased on master and split the multiple fix patch.

@freddy77 freddy77 requested a review from edwintorok December 6, 2024 10:27
Test that the event is correctly executed.

Signed-off-by: Frediano Ziglio <[email protected]>
Do not use ">" or other operators to compare Mtime.t, the value is intended to
be unsigned and should be compared with Int64.unsigned_compare as Mtime
functions do.

Signed-off-by: Frediano Ziglio <[email protected]>
The parameter was false only to support an internal usage, external users
should always alert the thread loop.

Signed-off-by: Frediano Ziglio <[email protected]>
- Do not wait huge amount of time if the queue is empty but
  use Delay.wait if possible;
- Fix delete of periodic events. In case the event is processed
  it's removed from the queue. Previously remove_from_queue was
  not able to mark this event as removed;
- Do not race between checking the first event and removing it.
  These 2 actions were done in 2 separate critical section, now
  they are done in a single one.

Signed-off-by: Frediano Ziglio <[email protected]>
Potentially a periodic event can be cancelled while this is processed.
Simulate this behavior removing the event inside the handler.
This was fixed by previous commit.

Signed-off-by: Frediano Ziglio <[email protected]>
Previously if the queue was empty and the loop thread was active
the scheduler took quite some time to pick up the new event.
Check that this is done in a timely fashion to avoid regressions in code.

Signed-off-by: Frediano Ziglio <[email protected]>
Similar to test_empty test however the state of the loop should be different.

Signed-off-by: Frediano Ziglio <[email protected]>
@freddy77
Copy link
Collaborator Author

freddy77 commented Dec 9, 2024

Rebased on master, added a new test.

Copy link
Contributor

@edwintorok edwintorok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets leave it the global pending_event as it is for now, and I'll revisit this when I introduce a Picos-based scheduler in XAPI.
Picos uses a more efficient psq based implementation, rather than the array based implementation in XAPI's Ipq (which afaict needs to be grown and resized every time we add and remove elements).

And we can probably implement 'removing from the queue' of a periodic task as canceling the Fiber that is running the periodic task (which internally uses scheduler delays to reschedule itself).

@freddy77
Copy link
Collaborator Author

Lets leave it the global pending_event as it is for now, and I'll revisit this when I introduce a Picos-based scheduler in XAPI. Picos uses a more efficient psq based implementation, rather than the array based implementation in XAPI's Ipq (which afaict needs to be grown and resized every time we add and remove elements).

No, it does not need to resize every time. But the lookup is O(n) instead of O(log(n)).

Yes, calling add_periodic_pending is not done while extracting the minimal to avoid regressions.

@freddy77
Copy link
Collaborator Author

@edwintorok @psafont can this PR be merged ?

@psafont psafont added this pull request to the merge queue Dec 11, 2024
Merged via the queue into xapi-project:master with commit 8941c9d Dec 11, 2024
15 checks passed
@freddy77 freddy77 deleted the scheduler branch December 11, 2024 10:49
@edwintorok
Copy link
Contributor

I thought I pressed the merge button, but maybe it didn't work for some reason. This means I'll need to rebase my PR again because it probably didn't include this change yet #6161

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants