Thread holding spinlock can be blocked. #9531

patacongo · 2023-06-13T16:51:01Z

@MasayukiIshikawa @masayuki2009 I was just looking at the implementation of spin_lock_irqsave().

spin_lock_irqsave() basically just:

Calls irqsave() to disable local interrupts, and
Calls spin_lock() to lock the spinlock.

The waiting thread then holds the lock and the following code section is protected.

However, it is possible that another thread running on a different CPU could cause a higher priority thread to be started and that could cause the thread holding the spinlock to be blocked? It looks like that could happen and, if so, that would be bad. Am I missing something?

A simple solution would be to call sched_lock() from spin_irq_lock() while holding the spinlock and call sched_unlock() when the spinlock is unlocked. That is reasonably lightweight and should guarantee that the thread holding the spinlock runs on the CPU until it relinquishes the spinlock.

I suppose that issue would not be unique to spin_lock_irqsave() but would really apply to all use of spinlocks.

This is similar to another issue that added a call to sched_lock() in enter_critical_section because the thread holding the critical section was being suspended. In that case, there is no real, hard error, but the results could be confusing and counterintuitive. This seems like the same problem but the results could be a real problem in certain cases where keeping the spinlock locked indefinitely effects system performance.

@xiaoxiang781216 has more experience with that related issue. Please comment on this issue.

xiaoxiang781216 · 2023-06-13T17:58:26Z

we are trying to call sched_lock in critical section(could be for spin lock), but it isn't very easy because:

sched_lock call enter_critical_section internally
The code of sched_lock is very long, directly call it in spinlock_irqsave reduce the performance a lot.

However, isn't it possible that another thread running on a different CPU could cause a higher priority thread to be started and that could cause the thread holding the spinlock to be blocked? It looks like that could happen and, if so, that would be bad. Am I missing something?

Even thread running on one CPU trigger a scheduling on other CPU, the target CPU which call spin_lock_irqsave will postpone the request after spin_unlock_restore is called since IPI triggering schedule is also blocking too.

patacongo · 2023-06-13T20:52:12Z

sched_lock call enter_critical_section internally

Perhaps that critical section in sched_lock() might be replaced with something similar spin_lock_irqsave(). The problem is that logic like spin_setbit(), spin_clrbit() and the call to nxsched_merge_prioritized() do require a critical section. So it is inescapable in the current design.

The code of sched_lock is very long, directly call it in spinlock_irqsave reduce the performance a lot.

Does enter_critical_section() now call sched_lock()? I don't see that call but I know we talked about that in the past.

Since spin_lock() already calls enter_critical_section(), my concern in this case will be fixed when the general issue is fixed.

Also, my preference would be lose a bit of performance if that is what is necessary to avoid the hard performance bug that would occur if the a thread is blocked indefinitely while holding a spinlock.

mu578 · 2023-06-14T16:45:43Z

Let's say; you want to explore fringe; let's say you want to tune and create a kind of high performance lock-free system / mode, such within a specific and simple hardware/environment; let's say having two CPUs ;

you could have a hard-limit on spin-cycles; then test clock timing ; light-job, mid-job, heavy-job etc ; each time you hit hard-limit you log ;

thus, you could draw a policy / control for a thread going lalaland in that setup.

patacongo · 2023-06-14T16:59:12Z

Suspending the holder of a spinlock can cause a myriad of failures. Suspending the holder does not free the spinlock; the spinlock remains locked while the holder is blocked. In the worst case, this can cause deadlocks since the holder of the spinlock is blocked and may not have the priority to run again to release it.

Another spinlock performance issue: #1488

The point of this issue, @mu578 your comments, and #1488 is that better management of spinlocks is needed for them to really work as needed in SMP. And, as @xiaoxiang781216 has pointed out, there are some tricky design issues that would have to be addressed to accomplish that.

Perhaps we could do something as simple as adding a in-a-spinlock flag in the TCB which would very temporarily lock the scheduler just like lockcount > 0? Very simple but also kind of kludgey.

TaiJuWu · 2023-09-15T01:18:15Z

I found taking spinlock will disable interrupt and enable interrupt when releasing spinlock in zephyr project.
It can avoid the spinlock holder be blocked. Maybe we could follow them?
reference: zephyr

patacongo · 2023-09-15T02:12:53Z

I found taking spinlock will disable interrupt and enable interrupt when releasing spinlock in zephyr project.
It can avoid the spinlock holder be blocked. Maybe we could follow them?
reference: zephyr

In SMP mode, disabling local interrupts will not prevent the thread from being blocked. But locking the scheduler with sched_lock() will.

TaiJuWu · 2023-09-15T02:41:15Z

I found taking spinlock will disable interrupt and enable interrupt when releasing spinlock in zephyr project.
It can avoid the spinlock holder be blocked. Maybe we could follow them?
reference: zephyr

In SMP mode, disabling local interrupts will not prevent the thread from being blocked. But locking the scheduler with sched_lock() will.

There are two method enter scheduler:

thread give up its cpu by himself (like mutex, event group)
interrupts force cpu to give up cpu (like timer, IPI...)

We just prevent scheduling from interrupt and first one should be guarantee by programmer.
If I am wrong, please correct me.

patacongo · 2023-09-15T03:46:42Z

There are two method enter scheduler:
1. thread give up its cpu by himself (like mutex, event group)

2. interrupts force cpu to give up cpu (like timer, IPI...)
We just prevent scheduling from interrupt and first one should be guarantee by programmer. If I am wrong, please correct me.

Also, a more generalized case of your 2) is:

A higher priority thread needs to run in SMP mode and the thread holding the semaphore becomes the lowest priority running thread. Then the thread holding the spinlock is blocked.

Disabling interrupts, of course, has bad effects on real time performance. Interrupts would have to be disabled throughout the time that the spinlock is held. So I don't like Zephyr's solution very much.

In SMP machines, different interrupts can be handled by different CPUs. So disabling local CPU interrupts doesn't work. NuttX does have enter/leave_critical_section() which will disable interrupts on all CPUs, but it is very expensive (it uses more spinlocks and some inter-CPU messaging).

sched_lock() works better than disabling interrupts. It prevents the thread from being blocked on the CPU without disabling interrupts. Interrupts can still occur and the kernel threads can still respond to interrupt event (but on a different CPUs). So there should be no loss of real time performance.

Giving up the CPU to momentarily service an interrupt is usually OK; when the interrupt returns, it will always return to the thread holding the spinlock. The thread stays in place if the scheduler is locked.

The only way that the thread could be blocked is if, as you say, in 1), if the thread holding the spinlock voluntarily blocks itself. That would be very bad situation in a real time system and should be avoided.

patacongo · 2023-09-15T03:52:31Z

sched_lock() works better than disabling interrupts. It prevents the thread from being blocked on the CPU without disabling interrupts. Interrupts can still occur and the kernel threads can still respond to interrupt event (but on a different CPUs). So there should be no loss of real time performance.

Hmmm... this might not be true. I would need to review the code. At least at one time, the logic locked all threads in place on all CPUs. So new ready-to-run threads couldn't run but would go in the pending task list until the scheduler is unlocked.

mu578 · 2023-09-16T02:08:23Z

Suspending the holder of a spinlock can cause a myriad of failures. Suspending the holder does not free the spinlock; the spinlock remains locked while the holder is blocked. In the worst case, this can cause deadlocks since the holder of the spinlock is blocked and may not have the priority to run again to release it.

Another spinlock performance issue: #1488

The point of this issue, @mu578 your comments, and #1488 is that better management of spinlocks is needed for them to really work as needed in SMP. And, as @xiaoxiang781216 has pointed out, there are some tricky design issues that would have to be addressed to accomplish that.

Perhaps we could do something as simple as adding a in-a-spinlock flag in the TCB which would very temporarily lock the scheduler just like lockcount > 0? Very simple but also kind of kludgey.

that was my point, indeed starting with a narrowed environment, then experiencing different isolated strategies, I don't think one can solve this kind of model by just a theoretical approach, it is too practical driven, maybe two different types of scheduling for high and low priority groups would be a solution.

xiaoxiang781216 · 2024-11-18T07:37:25Z

@hujun260 look at this issue.

patacongo changed the title ~~Can thread holding spinlock be blocked?~~ Thread holding spinlock can be blocked. Jun 14, 2023

TaiJuWu mentioned this issue Sep 12, 2023

Implement ticket spinlock #10605

Merged

xiaoxiang781216 linked a pull request Nov 18, 2024 that will close this issue

spin_lock_irqsave+sched_lock #14578

Open

xiaoxiang781216 self-assigned this Nov 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Thread holding spinlock can be blocked. #9531

Thread holding spinlock can be blocked. #9531

patacongo commented Jun 13, 2023 •

edited

Loading

xiaoxiang781216 commented Jun 13, 2023 •

edited

Loading

patacongo commented Jun 13, 2023 •

edited

Loading

mu578 commented Jun 14, 2023

patacongo commented Jun 14, 2023 •

edited

Loading

TaiJuWu commented Sep 15, 2023

patacongo commented Sep 15, 2023

TaiJuWu commented Sep 15, 2023 •

edited

Loading

patacongo commented Sep 15, 2023

patacongo commented Sep 15, 2023

mu578 commented Sep 16, 2023

xiaoxiang781216 commented Nov 18, 2024

Thread holding spinlock can be blocked. #9531

Thread holding spinlock can be blocked. #9531

Comments

patacongo commented Jun 13, 2023 • edited Loading

xiaoxiang781216 commented Jun 13, 2023 • edited Loading

patacongo commented Jun 13, 2023 • edited Loading

mu578 commented Jun 14, 2023

patacongo commented Jun 14, 2023 • edited Loading

TaiJuWu commented Sep 15, 2023

patacongo commented Sep 15, 2023

TaiJuWu commented Sep 15, 2023 • edited Loading

patacongo commented Sep 15, 2023

patacongo commented Sep 15, 2023

mu578 commented Sep 16, 2023

xiaoxiang781216 commented Nov 18, 2024

patacongo commented Jun 13, 2023 •

edited

Loading

xiaoxiang781216 commented Jun 13, 2023 •

edited

Loading

patacongo commented Jun 13, 2023 •

edited

Loading

patacongo commented Jun 14, 2023 •

edited

Loading

TaiJuWu commented Sep 15, 2023 •

edited

Loading