i#7044 sched deadlock: Fix theoretical deadlock #7087

Merged: 4 commits into master from i7044-avoid-lock-issue on Nov 19, 2024

derekbruening
Contributor

Adds a check for whether the caller already holds the previous input's lock before acquiring it to retrieve the previous workload. The existing code acquired the new lock without checking caller_holds_cur_input_lock, which could hang. It happened not to, because the only caller that sets caller_holds_cur_input_lock to true also sets the current input to invalid, and set_cur_input() returns prior to this point.

Issue: #7044
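
The pattern described above can be illustrated with a minimal sketch. This is not the scheduler's actual code: `input_info_t`, `read_prev_workload`, and `caller_holds_prev_lock` are hypothetical names chosen for illustration, and a plain `std::mutex` is assumed as the lock type. The point is only that a non-recursive lock must not be acquired again by a thread that already owns it, so the acquisition is made conditional on the caller's flag.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a per-input record; not the real scheduler type.
struct input_info_t {
    // Non-recursive: locking it twice from one thread is undefined behavior
    // and in practice usually hangs.
    std::mutex lock;
    int workload = -1;
};

// Reads the previous input's workload. The lock is acquired here only when
// the caller does not already hold it (hypothetical flag, modeled on the
// caller_holds_cur_input_lock idea in the description).
int
read_prev_workload(input_info_t &prev, bool caller_holds_prev_lock)
{
    // Deferred: the guard owns nothing until lock() is called, and its
    // destructor unlocks only if it owns the mutex.
    std::unique_lock<std::mutex> guard(prev.lock, std::defer_lock);
    if (!caller_holds_prev_lock)
        guard.lock();
    return prev.workload;
}

// Exercises both paths: lock not held, and lock already held by the caller.
void
usage_example()
{
    input_info_t input;
    input.workload = 3;
    assert(read_prev_workload(input, false) == 3);
    std::lock_guard<std::mutex> held(input.lock);
    assert(read_prev_workload(input, true) == 3);
}
```

In this sketch, passing `false` while the caller actually holds the lock would reproduce the kind of self-deadlock the description refers to, which is why the flag has to be checked at every acquisition point.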

Contributor

@edeiana edeiana left a comment

Is there an easy way to add a test for this that would otherwise "hang" (hence timeout) if this change wasn't in place?

@derekbruening
Contributor Author

> Is there an easy way to add a test for this that would otherwise "hang" (hence timeout) if this change wasn't in place?

No, since the only code path today where the caller holds the input lock is on replay, and it doesn't hit either of these points. This would only happen after a future code change where some new call site holds the lock. So I think code inspection is the best we can do, but that is better than just leaving known possible problems in place.

@derekbruening
Contributor Author

The a64 failure is #5641, a timeout in sigmask-noalarm.

@derekbruening derekbruening merged commit b158d1f into master Nov 19, 2024
16 of 17 checks passed
@derekbruening derekbruening deleted the i7044-avoid-lock-issue branch November 19, 2024 21:00