Thread holding spinlock can be blocked. #9531
We are trying to call sched_lock() inside the critical section (e.g., for a spinlock), but it isn't easy because:
Even if a thread running on one CPU triggers scheduling on another CPU, the target CPU that called spin_lock_irqsave() will postpone the request until spin_unlock_irqrestore() is called, since the IPI that triggers the scheduling is also blocked.
Perhaps that critical section in sched_lock() might be replaced with something similar to spin_lock_irqsave(). The problem is that logic like spin_setbit(), spin_clrbit() and the call to nxsched_merge_prioritized() does require a critical section. So it is inescapable in the current design.
Does enter_critical_section() now call sched_lock()? I don't see that call, but I know we talked about that in the past. Since spin_lock() already calls enter_critical_section(), my concern in this case will be fixed when the general issue is fixed. Also, my preference would be to lose a bit of performance if that is what is necessary to avoid the hard performance bug that would occur if a thread is blocked indefinitely while holding a spinlock.
Let's say you want to explore the fringe: tune and create a kind of high-performance, lock-free system/mode within a specific and simple hardware environment, say two CPUs. You could set a hard limit on spin cycles, then test clock timing with light, medium, and heavy jobs; each time you hit the hard limit, you log it. From that data you could derive a policy/control for a thread going to lalaland in that setup (see the sketch below).
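A minimal sketch of that bounded-spin experiment, written in plain C11 rather than NuttX code; the names (bounded_spin_lock, MAX_SPIN_CYCLES, the counter) are illustrative assumptions, not an existing API:

```c
/* Sketch of the bounded-spin experiment described above: spin for at most
 * MAX_SPIN_CYCLES, and log each time the limit is hit so that a policy for
 * runaway threads can be derived from the collected data.
 */

#include <stdio.h>
#include <stdbool.h>
#include <stdatomic.h>

#define MAX_SPIN_CYCLES 100000ul

static atomic_flag g_lock = ATOMIC_FLAG_INIT;
static unsigned long g_timeouts;      /* How often the budget was exceeded */

static bool bounded_spin_lock(void)
{
  for (unsigned long i = 0; i < MAX_SPIN_CYCLES; i++)
    {
      if (!atomic_flag_test_and_set_explicit(&g_lock, memory_order_acquire))
        {
          return true;                /* Acquired within the spin budget */
        }
    }

  g_timeouts++;                       /* Hit the hard limit: record it */
  printf("spin budget exceeded (%lu times so far)\n", g_timeouts);
  return false;                       /* Caller decides how to back off */
}

static void bounded_spin_unlock(void)
{
  atomic_flag_clear_explicit(&g_lock, memory_order_release);
}
```

Running the light/medium/heavy jobs against this and comparing the timeout counts is one way to shape the policy the comment describes.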
Suspending the holder of a spinlock can cause a myriad of failures. Suspending the holder does not free the spinlock; the spinlock remains locked while the holder is blocked. In the worst case, this can cause deadlocks since the holder of the spinlock is blocked and may not have the priority to run again to release it. Another spinlock performance issue: #1488. The point of this issue, your comments @mu578, and #1488 is that better management of spinlocks is needed for them to really work as needed in SMP. And, as @xiaoxiang781216 has pointed out, there are some tricky design issues that would have to be addressed to accomplish that. Perhaps we could do something as simple as adding an in-a-spinlock flag in the TCB which would very temporarily lock the scheduler, just like lockcount > 0? Very simple, but also kind of kludgey.
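A rough sketch of that in-a-spinlock flag idea. The flag name, the stand-in TCB structure, and the preemption check below are hypothetical, not the real NuttX TCB layout or scheduler hooks; they only illustrate treating a spinlock holder like a thread with lockcount > 0:

```c
/* Hypothetical "in-a-spinlock" TCB flag sketch. */

#define TCB_FLAG_IN_SPINLOCK (1 << 0)   /* Assumed flag bit */

struct tcb_like_s                       /* Stand-in for the real TCB */
{
  unsigned int flags;
  unsigned int lockcount;
};

static void mark_spinlock_held(struct tcb_like_s *tcb)
{
  tcb->flags |= TCB_FLAG_IN_SPINLOCK;   /* Set when the spinlock is taken */
}

static void mark_spinlock_released(struct tcb_like_s *tcb)
{
  tcb->flags &= ~TCB_FLAG_IN_SPINLOCK;  /* Clear when it is released */
}

/* The scheduler would then refuse to suspend a thread while either the
 * normal lockcount is non-zero or the in-a-spinlock flag is set.
 */

static int preemption_allowed(const struct tcb_like_s *tcb)
{
  return tcb->lockcount == 0 && (tcb->flags & TCB_FLAG_IN_SPINLOCK) == 0;
}
```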
I found that in the Zephyr project, taking a spinlock disables interrupts and releasing the spinlock re-enables them.
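For reference, the Zephyr usage pattern looks roughly like this (written from memory; the header path varies by Zephyr version, so consult the Zephyr docs for the exact include). The key returned by k_spin_lock() carries the saved interrupt state, so interrupts on the local CPU stay masked for the whole critical section:

```c
/* Zephyr-style spinlock usage: k_spin_lock() masks local interrupts and
 * spins; k_spin_unlock() releases the lock and restores interrupt state.
 */

#include <zephyr/spinlock.h>

static struct k_spinlock my_lock;
static int shared_counter;

void bump_counter(void)
{
  k_spinlock_key_t key = k_spin_lock(&my_lock);   /* IRQs off + spin */

  shared_counter++;                               /* Protected section */

  k_spin_unlock(&my_lock, key);                   /* Unlock + restore IRQs */
}
```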
In SMP mode, disabling local interrupts will not prevent the thread from being blocked. But locking the scheduler with sched_lock() will. |
There are two ways to enter the scheduler: 1) the thread blocks itself voluntarily, or 2) scheduling is triggered from an interrupt.
We only prevent scheduling from interrupts; the first case has to be guaranteed by the programmer.
Also, a more generalized case of your 2) is: a thread running on a different CPU causes a higher priority thread to become ready to run, which then preempts the thread holding the spinlock.
Disabling interrupts, of course, has bad effects on real time performance. Interrupts would have to be disabled throughout the time that the spinlock is held, so I don't like Zephyr's solution very much. In SMP machines, different interrupts can be handled by different CPUs, so disabling local CPU interrupts doesn't work. NuttX does have enter/leave_critical_section(), which will disable interrupts on all CPUs, but it is very expensive (it uses more spinlocks and some inter-CPU messaging). sched_lock() works better than disabling interrupts. It prevents the thread from being blocked on the CPU without disabling interrupts. Interrupts can still occur and kernel threads can still respond to interrupt events (but on different CPUs), so there should be no loss of real time performance. Giving up the CPU to momentarily service an interrupt is usually OK; when the interrupt returns, it will always return to the thread holding the spinlock. The thread stays in place if the scheduler is locked. The only way that the thread could be blocked is if, as you say in 1), the thread holding the spinlock voluntarily blocks itself. That would be a very bad situation in a real time system and should be avoided.
Hmmm... this might not be true. I would need to review the code. At least at one time, the logic locked all threads in place on all CPUs, so new ready-to-run threads couldn't run but would go into the pending task list until the scheduler is unlocked.
That was my point: indeed, start with a narrowed environment, then experiment with different isolated strategies. I don't think one can solve this kind of model with just a theoretical approach; it is too practice-driven. Maybe two different types of scheduling for high and low priority groups would be a solution.
@hujun260 look at this issue. |
@MasayukiIshikawa @masayuki2009 I was just looking at the implementation of spin_lock_irqsave().
spin_lock_irqsave() basically just: 1) disables interrupts on the local CPU (saving the interrupt state), and 2) spins, waiting until the spinlock becomes available.
The waiting thread then holds the lock and the following code section is protected.
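In other words, the behavior is roughly equivalent to the following simplified sketch; the wrapper names are hypothetical and the real NuttX implementation also handles a NULL lock argument and SMP bookkeeping:

```c
/* Simplified sketch of what spin_lock_irqsave()/spin_unlock_irqrestore()
 * boil down to: disable local interrupts, then spin on the lock.
 */

#include <nuttx/config.h>
#include <nuttx/irq.h>
#include <nuttx/spinlock.h>

irqstate_t my_spin_lock_irqsave(FAR volatile spinlock_t *lock)
{
  irqstate_t flags = up_irq_save();   /* 1) Disable local CPU interrupts */
  spin_lock(lock);                    /* 2) Spin until the lock is taken */
  return flags;
}

void my_spin_unlock_irqrestore(FAR volatile spinlock_t *lock,
                               irqstate_t flags)
{
  spin_unlock(lock);                  /* Release the lock */
  up_irq_restore(flags);              /* Restore local interrupt state */
}
```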
However, is it possible that another thread running on a different CPU could cause a higher priority thread to be started, and that could cause the thread holding the spinlock to be blocked? It looks like that could happen and, if so, that would be bad. Am I missing something?
A simple solution would be to call sched_lock() from spin_lock_irqsave() while holding the spinlock and call sched_unlock() when the spinlock is unlocked. That is reasonably lightweight and should guarantee that the thread holding the spinlock runs on the CPU until it relinquishes the spinlock.
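A minimal sketch of that proposal under the assumption that it is expressed as wrappers; the _nopreempt names are hypothetical, and the real change would presumably go inside the existing spin_lock_irqsave()/spin_unlock_irqrestore():

```c
/* Sketch of the proposed fix: keep the scheduler locked for as long as the
 * spinlock is held, so the holder cannot be suspended while it owns the lock.
 */

#include <nuttx/config.h>
#include <nuttx/irq.h>
#include <nuttx/sched.h>
#include <nuttx/spinlock.h>

irqstate_t spin_lock_irqsave_nopreempt(FAR volatile spinlock_t *lock)
{
  irqstate_t flags = spin_lock_irqsave(lock);  /* Existing behavior */
  sched_lock();                                /* Keep the holder in place */
  return flags;
}

void spin_unlock_irqrestore_nopreempt(FAR volatile spinlock_t *lock,
                                      irqstate_t flags)
{
  sched_unlock();                              /* Allow preemption again */
  spin_unlock_irqrestore(lock, flags);         /* Existing behavior */
}
```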
I suppose that issue would not be unique to spin_lock_irqsave() but would really apply to all uses of spinlocks.
This is similar to another issue that added a call to sched_lock() in enter_critical_section() because the thread holding the critical section was being suspended. In that case, there is no real, hard error, but the results could be confusing and counterintuitive. This seems like the same problem, but the results could be a real problem in certain cases where keeping the spinlock locked indefinitely affects system performance.
@xiaoxiang781216 has more experience with that related issue. Please comment on this issue.