Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

armv7/8m: use pendsv to handle context switch #13606

Merged
merged 6 commits into from
Oct 9, 2024

Conversation

GUIDINGLI
Copy link
Contributor

@GUIDINGLI GUIDINGLI commented Sep 25, 2024

Summary

armv6/7/8m: use pendsv to handle context switch

This PR support Nested interrupt in armv6/7/8m:

There are two types of nested interrupt model:

  1. Zero latency nested interrupt
Interrupt           Priority            Note
Data abort          Highest
SVC                 0x50
High irq1           0x60             ISR can't access system API
irq_save()          0x70
High irq2           0x80             ISR can't access system API
normal irq3         0xB0

We have already support this mode before this PR

  1. Nested interrupt which interrupt level lower than up_irq_save()
Interrupt           Priority            Note
Data abort          Highest
SVC                 0x70
irq_save()          0x80
High irq1           0x90              ISR can access system API
High irq2           0xA0              ISR can access system API
normal irq3         0xB0

Now, this PR can support this mode

Impact

arm-m

Testing

qemu

./tools/configure.sh mps2-an500:nsh -j8

which described in:
apache/nuttx-apps#2535

@github-actions github-actions bot added Arch: arm Issues related to ARM (32-bit) architecture Arch: arm64 Issues related to ARM64 (64-bit) architecture Arch: ceva Issues related to CEVA architecture Arch: risc-v Issues related to the RISC-V (32-bit or 64-bit) architecture Arch: x86 Issues related to the x86 architecture Arch: x86_64 Issues related to the x86_64 architecture Arch: xtensa Issues related to the Xtensa architecture Size: M The size of the change in this PR is medium labels Sep 25, 2024
@nuttxpr
Copy link

nuttxpr commented Sep 25, 2024

[Experimental Bot, please feedback here]

NuttX PR Requirements Review:

Verdict: The PR does not fully meet the NuttX requirements.

Reasoning:

  • Summary:
    • While the summary describes the technical implementation ("use pendsv to handle context switch"), it lacks a clear explanation of why this change is necessary.
    • What specific problems did the lack of nested interrupt support cause?
    • The summary should explicitly state that this enhances nested interrupt support for ARMv6/7/8m architectures.
  • Impact:
    • Impact on user: The PR states "arm-m" as the impact. This is insufficient. It needs to clarify if users need to adapt to this change. Will existing code using interrupts behave differently?
    • Other impacts: The PR is missing information about potential impacts on:
      • Build process
      • Documentation (does this require updates?)
      • Security (are there any implications of changing interrupt handling?)
      • Compatibility (any backward/forward compatibility concerns?)
  • Testing:
    • The testing section lacks details.
    • Build Hosts: Specify the OS, CPU architecture, and compiler versions used for testing.
    • Target(s): Provide more specific information about the "airoha chip" and the QEMU configuration used.
    • Logs: Instead of just stating "your testing logs here," include relevant snippets of the logs demonstrating the functionality before and after the change. Highlight the key differences and improvements.

Recommendations:

  1. Improve the Summary: Explain the "why" behind the change. Be explicit about the benefits of nested interrupt support and how this PR addresses them.
  2. Expand Impact Assessment: Address all impact points with specific details. If an impact area is not applicable, state it explicitly (e.g., "Impact on build: N/A").
  3. Provide Thorough Testing Information: Include specific details about the testing environment, target hardware/configurations, and relevant log snippets showcasing the improvement.

By addressing these points, the PR will be more informative and better aligned with NuttX's contribution guidelines.

@github-actions github-actions bot added Size: L The size of the change in this PR is large and removed Size: M The size of the change in this PR is medium labels Sep 25, 2024
@acassis
Copy link
Contributor

acassis commented Sep 25, 2024

@GUIDINGLI did you test High Priority / Zero Latency support (it is called highpri board config) ?

@masayuki2009
Copy link
Contributor

airoha chip & qemu

I think airoha chip is not included to the NuttX upstream.
Why don't you test this PR with supported boards such as STM32F4Discovery?

Also, please specify which configuration you tested with QEMU.

@GUIDINGLI
Copy link
Contributor Author

GUIDINGLI commented Sep 26, 2024

airoha chip & qemu

I think airoha chip is not included to the NuttX upstream. Why don't you test this PR with supported boards such as STM32F4Discovery?

Also, please specify which configuration you tested with QEMU.

Why must use STM32F4Discovery board ? Did the irq driver support HIGH prio irq in this board ?

If some one chip wants the nested irq which described in the Summary second model, the irq handler must obey some rules:

  1. the hw irq prio should not upon irq_save()
  2. the ISR should consider interrupted by other irq, (not a atomic operation any more)

So, I don't think the STM32F4Discovery chip driver have consider this.

And I confirm the airoha irq driver have handled the rules in above.
This mainly tell the reviewer, I have test the PR in a real chip, a real project, a real device.

Also you want run the PR in your own environment.
Then I provide the Qemu test case.
See the Summary.

@masayuki2009
Copy link
Contributor

Why must use STM32F4Discovery board ? Did the irq driver support HIGH prio irq in this board ?

@GUIDINGLI
So, do you mean that this PR ** does not affect ** the boards that do not use the HIGH prio irq?

@GUIDINGLI
Copy link
Contributor Author

Why must use STM32F4Discovery board ? Did the irq driver support HIGH prio irq in this board ?

@GUIDINGLI So, do you mean that this PR ** does not affect ** the boards that do not use the HIGH prio irq?

It does, I also have test the normal mode in other board, but not STM32F4Discovery.

In fact, we have a test team, and do the automation daily test in several boards, like:
BES board on xiaomi watch, BES board on smart speaker, and so on.
This can cover the normal prio irq.

NO STM32F4Discovery current now, next step I can suggest the test team add the this board to automation test

@GUIDINGLI
Copy link
Contributor Author

@xiaoxiang781216
@masayuki2009
@acassis
@pkarashchenko
@raiden00pl

Please help to review this PR

@raiden00pl
Copy link
Member

could you move all changes other than related to pendsv to a separate PR ? This is a change with a potentially huge impact, so it's better to have it isolated if possible.

@GUIDINGLI
Copy link
Contributor Author

could you move all changes other than related to pendsv to a separate PR ? This is a change with a potentially huge impact, so it's better to have it isolated if possible.

OK, split to:
#13651

I will rebase this PR, after 13651 merged.

@acassis
Copy link
Contributor

acassis commented Sep 26, 2024

@GUIDINGLI High Performance / Zero Latency is something different and I think you should test it too:
https://cwiki.apache.org/confluence/display/NUTTX/High+Performance%2C+Zero+Latency+Interrupts

@raiden00pl
Copy link
Member

What about this workaround for high priority interrupts which is based on PendSV ? https://nuttx.apache.org/docs/latest/guides/zerolatencyinterrupts.html#getting-back-into-the-game

If PendSV is used for something else, then it's not possible.

@github-actions github-actions bot removed Arch: arm64 Issues related to ARM64 (64-bit) architecture Arch: ceva Issues related to CEVA architecture Arch: risc-v Issues related to the RISC-V (32-bit or 64-bit) architecture labels Sep 27, 2024
@github-actions github-actions bot added the Area: Documentation Improvements or additions to documentation label Oct 9, 2024
@xiaoxiang781216 xiaoxiang781216 merged commit c00d477 into apache:master Oct 9, 2024
33 checks passed
@masayuki2009
Copy link
Contributor

@GUIDINGLI

I noticed that assertion happens during signest_test with spresense:smp (NCPUS=3).

signest_test: Simple case:
  Total signalled 1240  Odd=620 Even=620
  Total handled   1240  Odd=620 Even=620
  Total nested    0    Odd=0   Even=0  
signest_test: With task locking
  Total signalled 2480  Odd=1240 Even=1240
  Total handled   2480  Odd=1240 Even=1240
  Total nested    0    Odd=0   Even=0  
[CPU0] dump_assert_info: Current Version: NuttX  12.0.0 24527c1a00 Oct 11 2024 08:20:50 arm
[CPU0] dump_assert_info: Assertion failed : at file: :0 task(CPU0): ostest process: ostest 0xd010b19
[CPU0] up_dump_register: R0: 2d0305c0 R1: 00000001 R2: 00000000  R3: 0e002000
[CPU0] up_dump_register: R4: 00000000 R5: 2d033af8 R6: 00000000  FP: 00000000
[CPU0] up_dump_register: R8: 00000006 SB: 2d030090 SL: 2d0305c0 R11: 000000d8
[CPU0] up_dump_register: IP: 00000000 SP: 2d0387c0 LR: 0d004cff  PC: 0d004cff
[CPU0] up_dump_register: xPSR: 60000000 BASEPRI: 00000080 CONTROL: 00000006
[CPU0] up_dump_register: EXC_RETURN: 00000000
[CPU0] dump_stackinfo: User Stack:
[CPU0] dump_stackinfo:   base: 0x2d0369c0
[CPU0] dump_stackinfo:   size: 00008120
[CPU0] dump_stackinfo:     sp: 0x2d0387c0
[CPU0] stack_dump: 0x2d0387a0: 0d02b93f 2d0305c0 2d033af8 0d02ad45 0d02ad45 00000006 2d030090 0d004dbd
[CPU0] stack_dump: 0x2d0387c0: 0d02ad45 00000000 00000000 2d033ba4 2d033ba4 0d010b19 00000000 00000000
[CPU0] stack_dump: 0x2d0387e0: 00000000 00000000 2d033af8 2d0305c0 00000000 00000000 00000000 7474754e
[CPU0] stack_dump: 0x2d038800: 00000058 00000000 00000000 00000000 00000002 2d033b9c 2d032ea8 2d02fef8
[CPU0] stack_dump: 0x2d038820: 00000000 0d006773 0d006788 21000000 2e323100 00302e30 00000000 00000000
[CPU0] stack_dump: 0x2d038840: 00000000 34320000 63373235 30306131 74634f20 20313120 34323032 3a383020
[CPU0] stack_dump: 0x2d038860: 353a3032 ffff0030 2d02e7e0 2d033b4c 00000000 ffffffea 6d726100 ffffff00
[CPU0] stack_dump: 0x2d038880: 00000000 ffffff92 2d031d84 2d031d74 2d031d70 00000002 2d031d84 2d031d74
[CPU0] stack_dump: 0x2d0388a0: 2d031d70 0000002b 2d031d9c 0000000a 0000002a 0d00aa53 00000000 0d012bb5
[CPU0] stack_dump: 0x2d0388c0: 0000005a 1dcd6500 2d033af8 2d031d6c 2d031d78 0d012e33 ffffffff 00000066
[CPU0] stack_dump: 0x2d0388e0: 0000003a 0000003b 00010066 00000000 00000000 00002000 2d033cc8 00000005
[CPU0] stack_dump: 0x2d038900: 2d031c80 00000000 2d036988 0d02da9b 2d0369b6 00000000 00000000 0d010d2f
[CPU0] stack_dump: 0x2d038920: 0014d040 00000001 0000003f 00147680 000059c0 00147680 00005b28 00000000
[CPU0] stack_dump: 0x2d038940: 00000000 0d010b19 00000005 2d036988 00000000 00000000 00000000 0d00ad77
[CPU0] stack_dump: 0x2d038960: 0d010b19 0d007811 00000000 00000000 00000000 00000000 00000000 00000000

To reproduce the assertion, please apply the following changes.

diff --git a/boards/arm/cxd56xx/spresense/configs/smp/defconfig b/boards/arm/cxd56xx/spresense/configs/smp/defconfig
index 4209329e9e..e171235ea5 100644
--- a/boards/arm/cxd56xx/spresense/configs/smp/defconfig
+++ b/boards/arm/cxd56xx/spresense/configs/smp/defconfig
@@ -28,6 +28,8 @@ CONFIG_CXD56_I2C=y
 CONFIG_CXD56_SPI4=y
 CONFIG_CXD56_SPI5=y
 CONFIG_CXD56_SPI=y
+CONFIG_DEBUG_ASSERTIONS=y
+CONFIG_DEBUG_FEATURES=y
 CONFIG_DEBUG_FULLOPT=y
 CONFIG_DEBUG_SYMBOLS=y
 CONFIG_EXAMPLES_HELLO=y
@@ -36,6 +38,7 @@ CONFIG_FS_PROCFS_REGISTER=y
 CONFIG_HAVE_CXX=y
 CONFIG_HAVE_CXXINITIALIZE=y
 CONFIG_INIT_ENTRYPOINT="spresense_main"
+CONFIG_NDEBUG=y
 CONFIG_NSH_ARCHINIT=y
 CONFIG_NSH_BUILTIN_APPS=y
 CONFIG_NSH_READLINE=y
@@ -47,7 +50,7 @@ CONFIG_RR_INTERVAL=200
 CONFIG_RTC=y
 CONFIG_RTC_DRIVER=y
 CONFIG_SMP=y
-CONFIG_SMP_NCPUS=2
+CONFIG_SMP_NCPUS=3
 CONFIG_SPI=y
 CONFIG_START_DAY=3
 CONFIG_START_MONTH=10
@@ -57,5 +60,6 @@ CONFIG_SYSTEM_NSH=y
 CONFIG_SYSTEM_SYSTEM=y
 CONFIG_SYSTEM_TASKSET=y
 CONFIG_TESTING_OSTEST=y
+CONFIG_TESTING_OSTEST_LOOPS=10
 CONFIG_TESTING_SMP=y
 CONFIG_UART1_SERIAL_CONSOLE=y

@GUIDINGLI
Copy link
Contributor Author

Working on it, trying to find a sony board...

So, this issue can reproduce on Qemu ?

@masayuki2009
Copy link
Contributor

So, this issue can reproduce on Qemu ?

It can not be reproducible on QEMU since armv7-m SMP target is not available on QEMU.

@jasonbu
Copy link
Contributor

jasonbu commented Oct 11, 2024

@GUIDINGLI

I noticed that assertion happens during signest_test with spresense:smp (NCPUS=3).

signest_test: Simple case:
  Total signalled 1240  Odd=620 Even=620
  Total handled   1240  Odd=620 Even=620
  Total nested    0    Odd=0   Even=0  
signest_test: With task locking
  Total signalled 2480  Odd=1240 Even=1240
  Total handled   2480  Odd=1240 Even=1240
  Total nested    0    Odd=0   Even=0  
[CPU0] dump_assert_info: Current Version: NuttX  12.0.0 24527c1a00 Oct 11 2024 08:20:50 arm
[CPU0] dump_assert_info: Assertion failed : at file: :0 task(CPU0): ostest process: ostest 0xd010b19
[CPU0] up_dump_register: R0: 2d0305c0 R1: 00000001 R2: 00000000  R3: 0e002000
[CPU0] up_dump_register: R4: 00000000 R5: 2d033af8 R6: 00000000  FP: 00000000
[CPU0] up_dump_register: R8: 00000006 SB: 2d030090 SL: 2d0305c0 R11: 000000d8
[CPU0] up_dump_register: IP: 00000000 SP: 2d0387c0 LR: 0d004cff  PC: 0d004cff
[CPU0] up_dump_register: xPSR: 60000000 BASEPRI: 00000080 CONTROL: 00000006
[CPU0] up_dump_register: EXC_RETURN: 00000000
[CPU0] dump_stackinfo: User Stack:
[CPU0] dump_stackinfo:   base: 0x2d0369c0
[CPU0] dump_stackinfo:   size: 00008120
[CPU0] dump_stackinfo:     sp: 0x2d0387c0
[CPU0] stack_dump: 0x2d0387a0: 0d02b93f 2d0305c0 2d033af8 0d02ad45 0d02ad45 00000006 2d030090 0d004dbd
[CPU0] stack_dump: 0x2d0387c0: 0d02ad45 00000000 00000000 2d033ba4 2d033ba4 0d010b19 00000000 00000000
[CPU0] stack_dump: 0x2d0387e0: 00000000 00000000 2d033af8 2d0305c0 00000000 00000000 00000000 7474754e
[CPU0] stack_dump: 0x2d038800: 00000058 00000000 00000000 00000000 00000002 2d033b9c 2d032ea8 2d02fef8
[CPU0] stack_dump: 0x2d038820: 00000000 0d006773 0d006788 21000000 2e323100 00302e30 00000000 00000000
[CPU0] stack_dump: 0x2d038840: 00000000 34320000 63373235 30306131 74634f20 20313120 34323032 3a383020
[CPU0] stack_dump: 0x2d038860: 353a3032 ffff0030 2d02e7e0 2d033b4c 00000000 ffffffea 6d726100 ffffff00
[CPU0] stack_dump: 0x2d038880: 00000000 ffffff92 2d031d84 2d031d74 2d031d70 00000002 2d031d84 2d031d74
[CPU0] stack_dump: 0x2d0388a0: 2d031d70 0000002b 2d031d9c 0000000a 0000002a 0d00aa53 00000000 0d012bb5
[CPU0] stack_dump: 0x2d0388c0: 0000005a 1dcd6500 2d033af8 2d031d6c 2d031d78 0d012e33 ffffffff 00000066
[CPU0] stack_dump: 0x2d0388e0: 0000003a 0000003b 00010066 00000000 00000000 00002000 2d033cc8 00000005
[CPU0] stack_dump: 0x2d038900: 2d031c80 00000000 2d036988 0d02da9b 2d0369b6 00000000 00000000 0d010d2f
[CPU0] stack_dump: 0x2d038920: 0014d040 00000001 0000003f 00147680 000059c0 00147680 00005b28 00000000
[CPU0] stack_dump: 0x2d038940: 00000000 0d010b19 00000005 2d036988 00000000 00000000 00000000 0d00ad77
[CPU0] stack_dump: 0x2d038960: 0d010b19 0d007811 00000000 00000000 00000000 00000000 00000000 00000000

To reproduce the assertion, please apply the following changes.

diff --git a/boards/arm/cxd56xx/spresense/configs/smp/defconfig b/boards/arm/cxd56xx/spresense/configs/smp/defconfig
index 4209329e9e..e171235ea5 100644
--- a/boards/arm/cxd56xx/spresense/configs/smp/defconfig
+++ b/boards/arm/cxd56xx/spresense/configs/smp/defconfig
@@ -28,6 +28,8 @@ CONFIG_CXD56_I2C=y
 CONFIG_CXD56_SPI4=y
 CONFIG_CXD56_SPI5=y
 CONFIG_CXD56_SPI=y
+CONFIG_DEBUG_ASSERTIONS=y
+CONFIG_DEBUG_FEATURES=y
 CONFIG_DEBUG_FULLOPT=y
 CONFIG_DEBUG_SYMBOLS=y
 CONFIG_EXAMPLES_HELLO=y
@@ -36,6 +38,7 @@ CONFIG_FS_PROCFS_REGISTER=y
 CONFIG_HAVE_CXX=y
 CONFIG_HAVE_CXXINITIALIZE=y
 CONFIG_INIT_ENTRYPOINT="spresense_main"
+CONFIG_NDEBUG=y
 CONFIG_NSH_ARCHINIT=y
 CONFIG_NSH_BUILTIN_APPS=y
 CONFIG_NSH_READLINE=y
@@ -47,7 +50,7 @@ CONFIG_RR_INTERVAL=200
 CONFIG_RTC=y
 CONFIG_RTC_DRIVER=y
 CONFIG_SMP=y
-CONFIG_SMP_NCPUS=2
+CONFIG_SMP_NCPUS=3
 CONFIG_SPI=y
 CONFIG_START_DAY=3
 CONFIG_START_MONTH=10
@@ -57,5 +60,6 @@ CONFIG_SYSTEM_NSH=y
 CONFIG_SYSTEM_SYSTEM=y
 CONFIG_SYSTEM_TASKSET=y
 CONFIG_TESTING_OSTEST=y
+CONFIG_TESTING_OSTEST_LOOPS=10
 CONFIG_TESTING_SMP=y
 CONFIG_UART1_SERIAL_CONSOLE=y

without any change, will not reproduce problem.

and only have to change CONFIG_SMP_NCPUS=3
able to reproduce problem.
make signest_test the first one run ostest will keep reproduce. doing more detail locating.

@GUIDINGLI
Copy link
Contributor Author

@masayuki2009
Expected to be completed by next Monday.

@jasonbu
Copy link
Contributor

jasonbu commented Oct 12, 2024

@masayuki2009
do you have method or guide how to connect debugger to the spresense board, Jlink or other method, as we cannot re-produce it in sim or qemu environments.

@jasonbu
Copy link
Contributor

jasonbu commented Oct 14, 2024

@masayuki2009
hi, hope you see the message.
It take us more than expected time to make the re-produce stable, in some cases, log will make scenes disappear. and trying to debug by add critical logs to know what exactly happening inside this case.
we are still working on it. If it's urgent. we can share more half progress code to you.

@jasonbu
Copy link
Contributor

jasonbu commented Oct 16, 2024

So, this issue can reproduce on Qemu ?

It can not be reproducible on QEMU since armv7-m SMP target is not available on QEMU.

hi @masayuki2009 ,please review if #14363 can fix your problem.

@raiden00pl
Copy link
Member

@GUIDINGLI @xiaoxiang781216 hi, this PR broke nrf52832-dk/timer example (nrf52840 also broken). No crash visible, system resets for no reason.

image

@raiden00pl
Copy link
Member

raiden00pl commented Oct 19, 2024

timer example is also broken on nucleo-g070rb (cortex M0+) so we can assume that more boards are affected.
But in this case we have a visible crash:

image

@xiaoxiang781216
Copy link
Contributor

@jasonbu could you look at the new issue reported by @raiden00pl ?

@jasonbu
Copy link
Contributor

jasonbu commented Oct 21, 2024

timer example is also broken on nucleo-g070rb (cortex M0+) so we can assume that more boards are affected. But in this case we have a visible crash:

image

@raiden00pl hi, can you provide more information how to re-produce the problem, need nucleo-g070rb board? or we possible make it reproduce in qemu?

@raiden00pl
Copy link
Member

@jasonbu

nucleo-g070rb/nsh configuration from upstream and just run built-in timer application.
I don't know if this can be reproduced on QEMU.

@jasonbu
Copy link
Contributor

jasonbu commented Oct 21, 2024

@jasonbu

nucleo-g070rb/nsh configuration from upstream and just run built-in timer application. I don't know if this can be reproduced on QEMU.

will try later if it can re-produce in
boards/arm/mps/mps2-an500/configs/nsh/defconfig & qemu,

if you can speed up will be very appreciate.

@raiden00pl
Copy link
Member

@jasonbu there is no support for the timer driver (timer_register()) in mps2-an500 so I doubt you can reproduce this issue with this platform. It's possible that timer drivers are broken also for other arm chips, maybe you have any stm32 nearby?

@jasonbu
Copy link
Contributor

jasonbu commented Oct 24, 2024

@jasonbu there is no support for the timer driver (timer_register()) in mps2-an500 so I doubt you can reproduce this issue with this platform. It's possible that timer drivers are broken also for other arm chips, maybe you have any stm32 nearby?

there is a STM32H7 nearby, will try a little later.

@raiden00pl
Copy link
Member

@jasonbu any progres ?

@jasonbu
Copy link
Contributor

jasonbu commented Nov 4, 2024

@jasonbu any progres ?

let's talk in your issue #14514

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arch: arm Issues related to ARM (32-bit) architecture Area: Documentation Improvements or additions to documentation Size: M The size of the change in this PR is medium
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants