arch/xtensa: Add a call0 ABI variant #62117

andyross · 2023-08-30T15:48:18Z

New context layer for building Zephyr with the call0 ABI. The system call overhead of having to spill 64 GPRs, and the general complexity of the existing asm2 layer, is prohibitive when trying to build a userspace kernel entry path for Xtensa.

Note: it still uses the windowed registers, but as a quick-switch register set for optimized interrupt entry. This is not a mechanism for building Zephyr on hardware that actually lacks windows (though with some work it could be).

This is much (much) leaner in terms of entry/exit overhead, is significantly smaller, and should scale better with changes needed for userspace. There is a cost in terms of function call code size and execution speed, but in practice it seems manageable (I'm measuring ~4% with a no-frame-pointer build, Max Filippov reports more like ~13% in Linux builds -- both numbers are well under the kind of overhead we see enabling userspace at all).

andyross · 2023-08-30T15:49:26Z

See docs in the PR itself, but I'll paste here for visibility:

Zephyr with the Xtensa CALL0 ABI

The Xtensa register window mechanism is a poor fit for memory
protection. The existing Zephyr optimizations, in particular the
cross-stack call we do on interrupt entry, can't be used. The way
Xtensa windows spill is to write register state through the stack
pointers it finds in the hidden caller registers, which are wholely
under the control of the userspace app that was interrupted. The
kernel can't be allowed to do that, it's a huge security hole.

Naively, the kernel would have to write out the entire 64-entry
register set on every entry from a context with a PS.RING value other
than zero, including system calls. That's not really an acceptable
performance or complexity cost.

Instead, for userspace apps, Zephyr builds using the "call0" ABI,
where only a fixed set of 16 GPRs (see section 10 of the Cadence
Xtensa Instruction Set Architecture Summary for details on the ABI).
Kernel traps can then use the remaining GPRs for their own purposes,
greatly speeding up entry speed.

Toolchain

Existing Xtensa toolchains support a -mabi=call0 flag to generate
code with this ABI, and it works as expected. It sets a
XTENSA_CALL0_ABI preprocessor flag and the Xtensa HAL code and our
crt1.S file are set up to honor it appropriately (though there's a
glitch in the Zephyr HAL integration, see below).

Unfortunately that doesn't extend to binary artifacts. In particular
the libgcc.a files generated for our existing SDK toolchains are using
the windowed ABI and will fail at runtime when you hit 64 bit math.

Cadence toolchains have an automatic multilib scheme and will select a
compatible libgcc automatically.

But for now, you have to have a separate toolchain (or at least a
separately-built libgcc) for building call0 apps. I'm using this
script and it works fine, pending proper SDK integration:

#!/bin/sh
set -ex

TC=$1
if [ -z "$TC" ]; then
TC=sample_controller
fi

# Grab source (these are small)
git clone https://github.com/zephyrproject-rtos/sdk-ng
git clone https://github.com/crosstool-ng/crosstool-ng

# Build ct-ng itself
cd crosstool-ng
./bootstrap
./configure --enable-local
make -j$(nproc)

# Configure for the desired toolchain
ln -s ../sdk-ng/overlays
cp ../sdk-ng/configs/xtensa-${TC}_zephyr-elf.config .config

grep -v CT_TARGET_CFLAGS .config > asdf
mv asdf .config
echo CT_TARGET_CFLAGS="-mabi=call0" >> .config
echo CT_LIBC_NONE=y >> .config

./ct-ng olddefconfig
./ct-ng build.$(nproc)

echo "##"
echo "## Set these values to enable this toolchain:"
echo "##"
echo export CROSS_COMPILE=$HOME/x-tools/xtensa-${TC}_zephyr-elf/bin/xtensa-${TC}_zephyr-elf-
echo export ZEPHYR_TOOLCHAIN_VARIANT=cross-compile

Note you don't really need to use the toolchain that was built, it's
enough to take the libgcc.a and drop it on top of the one in your SDK,
just be careful to save the original, etc...

Longer term we need proper integration into the SDK, obviously.
Ideally we would just rebuild libgcc in isolation, but I couldn't
figure out how to do that except in the context of a full gcc build.
We could do the separate build in sdk-ng and then copy the files to a
multilib scheme of our own devising, which is something existing
Zephyr platforms sometimes need to do. Also I note that other gcc
architectures have already solved problems like this (c.f. x86_64 and
x32 working in the same toolchain, etc...), but couldn't find any
Xtensa-specific support.

Also as a quick start: it's possible to build Zephyr in such a way as
to avoid (almost/maybe) all calls to libgcc via disabling kconfigs
that trigger 64 bit code. I found I could get full coverage of the
new call0 code with a custom test rig and -- -DCONFIG_TIMEOUT_64BIT=n -DCONFIG_CBPRINTF_REDUCED_INTEGRAL=y -DCONFIG_CBPRINTF_NANO=y

Bugs and Missing Features

No support in the SDK. See above about toolchains.

The existing syscall implementation is a mock, because this happened
in parallel with the new MMU-enabled dc233c platform. The only code
difference is that it uses a single static region loaded from a
"_mock_priv_stack" symbol instead of a real thread kernel stack. In
theory everything will work when you enable CONFIG_USERSPACE, but...

Likewise the MMU integration is a blindly-copied stub and unexercised.
It's really simple though ("just read from PTEVADDR with PS.EXCM=1 to
prime the TLB"), so hopefully it won't need much fixing.

There's no C implementation of the Zephyr "call syscalls" API, just
assembly. The interface is deliberately based on the call0 function
call ABI though, so this should be trivially easy.

For simplicity, this code is written to save context to the arch
region of the thread struct and not the stack (except for nested
interrupts, obviously). That's a common pattern in Zephyr, but for
KERNEL_COHERENCE (SMP) platforms it's actually a performance headache,
because the stack is cached where the thread struct is not. We should
move this back to the stack, but that requires doing some logic in the
assembly to check that the resulting stack pointer doesn't overflow
the protected region, which is slightly non-trivial (or at least needs
a little inspiration).

Right now the Zephyr HAL integration doesn't build when the call0 flag
is set because of a duplicated symbol. I just disabled the build at
the cmake level for now, but this needs to be figured out.

The ARCH_EXCEPT() handling is a stub and doesn't actually do anything.
Really this isn't any different than the existing code, it just lives
in the asm2 files that were disabled and I have to find a shared
location for it.

Backtrace logging is likewise specific to the older frame format and
ABI and doesn't work. The call0 ABI is actually much simpler, though,
so this shouldn't be hard to make work.

FPU support isn't enabled. Likewise this isn't any different (except
trivially -- the context struct has a different layout). We just need
to copy code and find compatible hardware to test it on.

A good security audit is needed to make sure all the holes are
plugged. At the very least, note that when the PS.WOE bit is set,
user thread contexts are able to "rotate" registers and write to their
contexts by doing regular function calls. We need to be 100% sure no
RING 1+ code ever runs with this set. (Also the ISA spec is a little
ambiguous as to how the hardware treats CALLn/ENTRY/RETW instructions
when WOE=0, I worry a tiny bit there might be hardware that has
unpluggable holes...)

I realized when writing this that our existing handling for NMI
exceptions (strictly any level > EXCM_LEVEL) isn't correct, as we
generate ZSR-style EPS/EPC register usage in those handlers that isn't
strictly separated from the code it may/will have interrupted. This
is actually a bug with asm2 also, but it's going to make
high-priority/DEBUG/NMI exceptions unreliable unless fixed (I don't
think there are any Zephyr users currently though).

andyross · 2023-08-30T15:55:58Z

The context follows discussion from #61303 so be sure to check there also.

Basically: it's real and it works. The code is comparatively tiny vs. the maybe-too-clever-for-its-own-good asm2 integration (which I still really like, but really isn't a good fit for privilege separation). I'm very happy on the whole. The libgcc issue is going to be a pain, and I'm really hoping someone with more SDK fu and toolchain interest can help with that? @jcmvbkbc 's name is all over the GNU toolchain patches for the architecture, so he surely has suggestions on how to make this work best.

The only missing feature still causing any test failures I can find is ARCH_EXCEPT(), which I should have patched up RSN (it's just a code motion thing). The other missing stuff is all in the nice-to-have category and IMHO shouldn't block merge or further work.

This was all done against sample_controller, with a few tries for validation on my mt8195 at various stages. It should ("should") work everywhere but in practice there will be burps. First up on my list is to get it working on dc233c to validate the MMU handling, though that will need to wait a few days while I catch up on my day job tasks which have been woefully neglected while I work on fun stuff.

andyross · 2023-08-30T16:32:04Z

(added a selection of Xtensa stakeholders, please pull in anyone I've forgotten)

Also it's worth pointing out that at least a little thought has been given to a re-unification path, since I don't think people will want to give up on windows for platforms where they provide value, and no one wants to maintain two completely separate context layers in the long term.

Most of the code isn't involved in the "register-twirl on interrupt" trick at all. We can partition things out into paths like "pre-call context save" (which can be e.g. save-all-64, or twirl, etc...), "do the call" (which can be a simple ABI call or the cross stack trick), "post-call-switch", "post-call-no-switch", etc... The state needed to discriminate that is really just the RING bits of PS, so the overhead isn't awful even if getting it to generate optimally might be a pain. Definitely not for the first version but maybe at some point.

andyross · 2023-08-30T17:09:59Z

arch/xtensa/core/xtensa-win0.S

+	 */
+	movi a1, ~(PS_RING_MASK | PS_UM_MASK | PS_INTLEVEL_MASK)
+	and a0, a0, a1
+	addi a0, a0, 1 /* INTLEVEL = 1 */


I'm realizing while re-reading this post-submit that this is actually wrong. I blindly copied the need to adjust intlevel from the L1 handler, but in fact syscalls (after setup) are thread contexts and should be running at an interrupt level of zero, obviously. My mock rig didn't catch that, but obviously the test suite will start flagging errors if you're not able to receive L1 interrupts from syscalls.

Should work to just remove that one line and adjust all the comments talking about this.

dcpleung · 2023-08-30T17:43:46Z

FYI, I did a quick tinkering with sdk-ng trying to build with multilib. However, the usually way of specifying MULTILIB_OPTIONS was not successful. More investigation is needed to see if it is even possible to have multilib support.

andyross · 2023-08-30T17:51:21Z

FYI, I did a quick tinkering with sdk-ng trying to build with multilib.

You got farther than I did. I discovered those MULTILIB_ argument hooks via google, saw that they weren't touched by the xtensa stuff, and gave up hoping quite frankly that you'd know how to fix it. :) But we also have Max's attention here who's (I assume) the maintainer, so maybe there's an even better resource. @jcmvbkbc ?

jcmvbkbc · 2023-08-30T21:01:01Z

maybe there's an even better resource

Not at the moment, unfortunately. In linux it was solved by getting rid of the libgcc dependency and adding all required helper sources to the kernel.

andyross · 2023-08-30T22:03:29Z

In linux it was solved by getting rid of the libgcc dependency and adding all required helper sources to the kernel.

Oh hey, well look at that, there they are. That's really clever. Unfortunately we're likely to run into licensing issues with importing GPL code in Zephyr (but I'm really not the expert on this stuff); any idea if the source is available in Apache-compatible form anywhere? But the code in the Linux tree is a TON simpler (and smaller -- looks like most of libgcc is never used by xtensa compiler output?) than the original libgcc, so if nothing else it would be a good guide for us building our own artifacts for the SDK without building a whole compiler to throw away.

jcmvbkbc · 2023-08-31T00:24:35Z

any idea if the source is available in Apache-compatible form anywhere?

I only know of GPL-licensed version of this code. OTOH LLVM should have something equivalent, perhaps under different license.

New ABI, much simpler, slightly bigger/slower code, but MUCH cleaner syscall entry. Signed-off-by: Andy Ross <[email protected]>

This register alias was originally introduced to allow A0 to be used as a scratch register when handling exceptions from MOVSP instructions. (It replaced some upstream code from Cadence that hard-coded EXCSAVE1). Now the MMU code is now using too, and for exactly the same purpose. Calling it "ALLOCA" is only confusing. Rename it to make it clear what it's doing. Signed-off-by: Andy Ross <[email protected]>

This file doesn't need the asm2 header, and the preprocessor logic around whether to include the backtrace header is needless (all it does is declare functions). Signed-off-by: Andy Ross <[email protected]>

New context layer for building Zephyr with the call0 ABI. The system call overhead of having to spill 64 GPRs, and the general complexity of the existing asm2 layer, is prohibitive when trying to build a userspace kernel entry path for Xtensa. Note: it still uses the windowed registers, but as a quick-switch register set for optimized interrupt entry. This is not a mechanism for building Zephyr on hardware that actually lacks windows (though with some work it could be). This is much (much) leaner in terms of entry/exit overhead, is significantly smaller, and should scale better with changes needed for userspace. There is a cost in terms of function call code size and execution speed, but in practice it seems manageable (I'm measuring ~4% with a no-frame-pointer build, Max Filippov reports more like ~13% in Linux builds -- both numbers are well under the kind of overhead we see enabling userspace at all). Signed-off-by: Andy Ross <[email protected]>

The backtrace mechanism is specific to the windowed call ABI currently, so doesn't work with call0. This was hidden previously because backtraces were disabled by default at the kconfig level for the old ("sample_controller") qemu_xtensa, but are on now with dc233c for essentially arbitrary reasons. There was some protection against this already, extend it to the case where "we have windows, but aren't using them". Signed-off-by: Andy Ross <[email protected]>

There was a system call handler added as part of the call0 ABI integration. Wire it up to the standard Zephyr userspace system call API. Signed-off-by: Andy Ross <[email protected]>

The linkage for the "after vector" RAM region was specifying files explicitly, which breaks when other files (or new files) want to add code to these regions. In particular call0 needs to use iram to store local jump targets for the vector entry. There's really no good spec on this. Conventionally xtensa platforms have always put the .iram and iram0 sections (and their derived sections like literals) immediately after the vector table, but I doubt there's a promise to that extent anywhere. Just try to adhere to what convention there is. (I note there's also a ".iram1" section referenced here that I'm not moving because I don't know what's in it) Signed-off-by: Andy Ross <[email protected]>

The job of this file is almost exclusively page table management. MMU hardware interaction is in there but a small part, and they don't interact except in a few small APIs. Rename it in advance of doing work to the hardware side. Signed-off-by: Andy Ross <[email protected]>

This is a reworked MMU layer, sitting cleanly below the page table handling in the OS, intended to integrate with call0 ABI handling and to be more performant and efficient. Notable differences from the original work: + Significantly smaller code and simpler API (just three functions to be called from the OS/userspace/ptable layer). + Big README-MMU document containing my learnings over the process, so hopefully fewer people need to go through this in the future. + No TLB flushing needed. Clean separation of ASIDs, just requires that the upper levels match the ASID to the L1 page table page consistently. + Validation layer to check page table caching attributes. Caching the PTEs is poison to SMP, as it requires a data cache flush on every CPU for every change of a page table anywhere. Even on single-core systems, having one mapping be write-through and the other writeback (which is what we had!) can break things. Hopefully this catches all the mistakes. + Vector mapping is done with a 4k page and not a 4M page, leading to much more flexibility with hardware memory layout. The original scheme required that the 4M region containing vecbase be mapped virtually to a location other than the hardware address, which makes confusing linkage with call0 and difficult initialization constraints where the exception vectors run at different addresses before and after MMU setup (effectively forcing them to be PIC code). + More provably correct initialization, all MMU changes happen in a single asm block with no memory accesses which would generate a refill. + Requires CONFIG_XTENSA_CALL0_ABI. This isn't strictly an issue of behavior: the MMU code doesn't care about reguster use per se. But with the register-windowed context management code in the older asm2, it's impossible to provide security across memory domains. Just rip the bandaid off. Signed-off-by: Andy Ross <[email protected]>

We were trying to initialize the MMU extremely early (out of crt1.S), which was forcing some odd gymnastics with splitting up the initialization paths. Just don't do that. The MMU boots with a pefectly good mapping for physical memory. Use that until we're set up and about to launch the post-arch initialization in z_cstart(). Just in general, trying to run compiled C code before z_cstart() calls kernel_arch_init() is asking for trouble. We do it only in extreme situations, and we don't need it here. Signed-off-by: Andy Ross <[email protected]>

andyross · 2023-10-28T21:03:10Z

Another spin, now with full MMU support for dc233c.

This is intended to go under a rework of the userspace PR (which I
need to find a place to put). As it is it splits the existing MMU
code into a mostly-unchanged "page table" layer and an entirely new
hardware driver for the bottom bits. Paste major deltas from the
commit message:

Significantly smaller code and simpler API (just three functions to
be called from the OS/userspace/ptable layer).
Big README-MMU document containing my learnings over the process, so
hopefully fewer people need to go through this in the future.
No TLB flushing needed. Clean separation of ASIDs, just requires
that the upper levels match the ASID to the L1 page table page
consistently.
Validation layer to check page table caching attributes. Caching
the PTEs is poison to SMP, as it requires a data cache flush on
every CPU for every change of a page table anywhere. Even on
single-core systems, having one mapping be write-through and the
other writeback (which is what we had!) can break things. Hopefully
this catches all the mistakes.
Vector mapping is done with a 4k page and not a 4M page, leading to
much more flexibility with hardware memory layout. The original
scheme required that the 4M region containing vecbase be mapped
virtually to a location other than the hardware address, which makes
confusing linkage with call0 and difficult initialization
constraints where the exception vectors run at different addresses
before and after MMU setup (effectively forcing them to be PIC
code).
More provably correct initialization, all MMU changes happen in a
single asm block with no memory accesses which would generate a
refill.
Requires CONFIG_XTENSA_CALL0_ABI. This isn't strictly an issue of
behavior: the MMU code doesn't care about reguster use per se. But
with the register-windowed context management code in the older
asm2, it's impossible to provide security across memory domains.
Just rip the bandaid off.

ceolin · 2023-10-30T22:49:43Z

arch/xtensa/core/xtensa-win0.S

+ *
+ * + System calls have their own optimized handler
+ * + MMU TLB exceptions are handled directly, as they must not
+ *   re-enable interrupts at any point before returning to the trapping


what you mean here ? syscall handler is expected to run with interruptions enabled
https://github.com/zephyrproject-rtos/zephyr/blob/main/tests/kernel/mem_protect/userspace/src/main.c#L934

Maybe the typography is confusing? Two items in a bulleted list:

System calls have their own optimizer handler

TLB exceptions don't unmask EXCM

Syscalls absolutely do unmask interrupts, obviously.

ceolin · 2023-10-30T22:53:22Z

arch/xtensa/core/xtensa-win0.S

+	s32i a0, a10, ___thread_t_arch_OFFSET + __xtensa_win0_ctx_t_user_ps_OFFSET
+
+	/* Further cook PS to set RING=0 and UM=0, then set our sp and unmask */
+	movi a1, ~(PS_RING_MASK | PS_UM_MASK)


why are you clearing up UM ? As far as I could see kernel threads and user threads are using user vector mode, the exception is only here ...

In other hand, you have to unmask EXCM, since syscall handlers expects it.

Just performance, really. The user vector has more work to do than the kernel vector: it needs to check for syscall entry. Obviously system call handlers won't be making syscalls of their own, so there's no point.

And the code already removed EXCM from this the PS value in a0 a few lines above.

Oh, and actually in this version of the PR, all threads are initialized with a PS of zero, so all threads are kernel mode by default. Shimming the userspace stuff on top as we speak, at which point we'll be consistent between RING and UM (i.e. you either have RING=3/UM=1 or RING=0/UM=0, and no other states)

Oh ok, just noticed that thread ctx ps is not set :P

ceolin · 2023-10-30T22:57:14Z

arch/xtensa/core/xtensa-win0.S

+	l32i a1, a1, 0
+#endif
+	wsr a0, PS
+


assuming that you have to re-enable interruptions / exceptions before jump into the handler
you can be switched after setting PS and syscall parameters are not saved in this context stack ...

The interrupted userspace a0/a1/pc/ps values (the ones we need to preserve) are already stashed in the thread struct at this point. All a context switch arriving now has to do is save our kernel space GPRs.

that happens only with interrutptions ritght ? inside must_switch

I remember having seen case where a syscall triggers a context switch without an interruption happens and AFAIU xtensa_switch does not save them.

You can do a cooperative context switchin inside a syscall too (consider e.g. k_sleep()). But again system calls aren't interrupts to the userspace code, we can rely on userspace having preserved its own state too. In this particular case, the system call is just a regular ABI call: we're responsible for preserving the return value in a0 and the stack pointer, and a12-a15 (which are already preserved in kernel space by the ABI call we make to the handler). And we restore the PC and PS values upon returning too. So we only need four registers of state.

(Actually writing this I realize that it's 3: the return PC is just the address of the RET instruction in the stub that's going to jump to a0, we don't really need to store both and could just restore to the value in a0. Will add an optimization note)

ceolin · 2023-10-31T04:39:55Z

once I built the toolchain, I had to add +CONFIG_XTENSA_CALL0_ABI=y in boards/xtensa/qemu_xtensa/qemu_xtensa_mmu_defconfig to build the sample. Trying to run the sample I get:

 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU qemu-system-xtensa: terminating on signal 2
ninja: build stopped: interrupted by user.

it seems to be happening in printk("Launching crashing thread %p\n", &boom);

Removing some printks the code runs a little bit further but it stucks with:

Flagging medium priority interrupt                                       
swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint!
 (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nest
ed = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = s
wint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! 
(nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (ne
sted = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested =
 swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint
! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nes
ted = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = 
swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint!
 (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nest
ed = swint! (nested = swint! (nested = swin

It seems I am missing something to reproduce the sample, just after that I noticed that the build it generates a lot of warnings:

/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: /home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/libgcc.a(_floatsidf.o): warning: incompatible Xtensa configuration (ABI do
es not match)                                                                                                                                                                                                                                                                                          
/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: /home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/libgcc.a(_floatdidf.o): warning: incompatible Xtensa configuration (ABI do
es not match)                                                                                                                                                                                                                                                                                          
/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: /home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/libgcc.a(_truncdfsf2.o): warning: incompatible Xtensa configuration (ABI d
oes not match)                                                                                                                                                                                                                                                                                         
/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: /home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/libgcc.a(_extendsfdf2.o): warning: incompatible Xtensa configuration (ABI 
does not match)                                                                                                                                                                                                                                                                                        
/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: /home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/libgcc.a(_divdi3.o): warning: incompatible Xtensa configuration (ABI does 
not match)                                                               
/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: /home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/libgcc.a(_moddi3.o): warning: incompatible Xtensa configuration (ABI does 
not match)                                                               
/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: /home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/libgcc.a(_udivdi3.o): warning: incompatible Xtensa configuration (ABI does
 not match)                                                              
/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: /home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/libgcc.a(_umoddi3.o): warning: incompatible Xtensa configuration (ABI does
 not match)                                                              
/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: warning: zephyr/zephyr.elf has a LOAD segment with RWX permissions
/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: warning: zephyr/zephyr.elf has a LOAD segment with RWX permissions

What is missing in the toolchain ?

andyross · 2023-10-31T14:19:20Z

@ceolin:

It seems I am missing something to reproduce the sample, just after that I noticed that the build it generates a lot of warnings:

Yeah, that's what you get if you have a windowed toolchain (strictly the libgcc.a in the toolchain) building against a call0 Zephyr, or vice versa. It will crash any time it hits the ENTRY instruction at the start of a libgcc instruction (because PS.WOE=0). And indeed printk() is one of the big offenders, as it uses 64 bit math by default. If you look way upthread, there are some kconfigs I found that suppress enough 64 bit-isms to make the test run, and that probably works still.

Another option is to use a cadence toolchain, as those are multilib and don't have this problem.

Finally remember @dcpleung had patches to the SDK working which build a multilib gcc; I never got to trying them myself but for sure that's a better solution than the hand-built one in my instructions above. Maybe he could just hand you a copy?

ceolin · 2023-10-31T16:21:55Z

@ceolin:

It seems I am missing something to reproduce the sample, just after that I noticed that the build it generates a lot of warnings:

Yeah, that's what you get if you have a windowed toolchain (strictly the libgcc.a in the toolchain) building against a call0 Zephyr, or vice versa. It will crash any time it hits the ENTRY instruction at the start of a libgcc instruction (because PS.WOE=0). And indeed printk() is one of the big offenders, as it uses 64 bit math by default. If you look way upthread, there are some kconfigs I found that suppress enough 64 bit-isms to make the test run, and that probably works still.

Yep, I know it. The point is that I follow the whole instructions to get sdk-ng and crosstool and to build them. I thought with that we had a proper libgcc ...

Another option is to use a cadence toolchain, as those are multilib and don't have this problem.

One of the things you pointed in the other pr is not having a way to test in CI, that is where we are here ...

andyross · 2023-10-31T16:28:31Z

Yep, I know it. The point is that I follow the whole instructions to get sdk-ng and crosstool and to build them. I thought with that we had a proper libgcc ...

Can you check that the relevant CFLAGS kconfig is in the ct-ng .config file? It can get dropped if you reconfigure it via menuconfig or whatever, IIRC. You can also check the disassembly of libgcc.a, if you see any CALLXn/ENTRY/RETW instructions, it's a windowed library.

One of the things you pointed in the other pr is not having a way to test in CI, that is where we are here ...

I know. :) This is still a DNM PR for sure, pending a toolchain release.

I swear I'm not trying to monopolize this or turn it into a fight. We need one codebase that meets all requirements (though to be fair this one has/will-have some genuine advantages in performance/size). Clearly neither is there yet, the intent is to turn this one into a merged PR with the userspace support from the other. But I'm very happy to go in the other direction too.

dcpleung · 2023-10-31T17:29:29Z

As for toolchain, grab zephyrproject-rtos/sdk-ng#701 and make sure do a submodule update to grab the GCC patch.

ceolin · 2023-11-09T17:22:38Z

arch/xtensa/core/mmu.c

+	regs->ptevaddr = CONFIG_XTENSA_MMU_PTE_BASE + user_asid * 0x400000;
+
+	/* The ptables code doesn't add the mapping for the l1 page itself */
+	l1_page[l1_idx(regs->ptevaddr)] = (uint32_t) l1_page;


Manipulating the page table here does not look nice if the goal is to split the mmu / page table logic

the cache bits may not match as well ...

ceolin · 2023-11-09T17:28:15Z

arch/xtensa/core/mmu.c

+	__ASSERT_NO_MSG(user_asid != 1 && user_asid < CONFIG_XTENSA_MMU_ASID_MAX);
+
+	/* We don't use ring 1/2, ring 0 ASID must be 1 */
+	regs->rasid = (user_asid << 24) | 0x000001;


As far as I know each ring in RASID must be different, we can't leave rings 1/2 with the same value

github-actions · 2024-02-04T00:30:20Z

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

andyross added the DNM This PR should not be merged (Do Not Merge) label Aug 30, 2023

andyross requested review from nashif, ceolin, MaureenHelm and dcpleung August 30, 2023 15:48

andyross requested a review from tejlmand as a code owner August 30, 2023 15:48

zephyrbot added area: Build System area: Xtensa Xtensa Architecture area: Console area: Toolchains Toolchains labels Aug 30, 2023

zephyrbot requested review from 57300, jeremybettis, nordicjm and stephanosio August 30, 2023 15:48

zephyrbot assigned dcpleung Aug 30, 2023

andyross requested review from dbaluta, sylvioalves, glaubermaroto and iuliana-prodan August 30, 2023 16:14

andyross mentioned this pull request Aug 30, 2023

xtensa: Enable userspace with Xtensa MMU #61303

Merged

andyross commented Aug 30, 2023

View reviewed changes

This comment was marked as outdated.

Sign in to view

andyross added 11 commits October 28, 2023 05:27

arch/xtensa: Add XTENSA_CALL0_ABI kconfig

7c3705d

New ABI, much simpler, slightly bigger/slower code, but MUCH cleaner syscall entry. Signed-off-by: Andy Ross <[email protected]>

arch/xtensa: #include cleanup

058f21d

This file doesn't need the asm2 header, and the preprocessor logic around whether to include the backtrace header is needless (all it does is declare functions). Signed-off-by: Andy Ross <[email protected]>

arch/xtensa: Add call0-based syscall interface

28d3b5a

There was a system call handler added as part of the call0 ABI integration. Wire it up to the standard Zephyr userspace system call API. Signed-off-by: Andy Ross <[email protected]>

hello_world rig for exercising xtensa-win0

4901dc5

andyross force-pushed the xtensa-win0 branch from 5ed75a7 to 4901dc5 Compare October 28, 2023 21:03

andyross requested review from carlescufi and galak as code owners October 28, 2023 21:03

zephyrbot added the area: Samples Samples label Oct 28, 2023

ceolin reviewed Oct 30, 2023

View reviewed changes

ceolin reviewed Nov 9, 2023

View reviewed changes

nordicjm removed their request for review December 5, 2023 08:41

github-actions bot added the Stale label Feb 4, 2024

github-actions bot closed this Feb 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

arch/xtensa: Add a call0 ABI variant #62117

arch/xtensa: Add a call0 ABI variant #62117

andyross commented Aug 30, 2023

andyross commented Aug 30, 2023

andyross commented Aug 30, 2023

andyross commented Aug 30, 2023

andyross Aug 30, 2023

dcpleung commented Aug 30, 2023

andyross commented Aug 30, 2023

jcmvbkbc commented Aug 30, 2023

andyross commented Aug 30, 2023

jcmvbkbc commented Aug 31, 2023

This comment was marked as outdated.

andyross commented Oct 28, 2023

ceolin Oct 30, 2023

andyross Oct 30, 2023

ceolin Oct 30, 2023

andyross Oct 30, 2023

andyross Oct 30, 2023

ceolin Oct 31, 2023

ceolin Oct 30, 2023

andyross Oct 30, 2023

ceolin Oct 31, 2023

andyross Oct 31, 2023

ceolin commented Oct 31, 2023

andyross commented Oct 31, 2023

ceolin commented Oct 31, 2023

andyross commented Oct 31, 2023 •

edited

Loading

dcpleung commented Oct 31, 2023

ceolin Nov 9, 2023

ceolin Nov 9, 2023

ceolin Nov 9, 2023

github-actions bot commented Feb 4, 2024

arch/xtensa: Add a call0 ABI variant #62117

arch/xtensa: Add a call0 ABI variant #62117

Conversation

andyross commented Aug 30, 2023

andyross commented Aug 30, 2023

Zephyr with the Xtensa CALL0 ABI

Toolchain

Bugs and Missing Features

andyross commented Aug 30, 2023

andyross commented Aug 30, 2023

Choose a reason for hiding this comment

dcpleung commented Aug 30, 2023

andyross commented Aug 30, 2023

jcmvbkbc commented Aug 30, 2023

andyross commented Aug 30, 2023

jcmvbkbc commented Aug 31, 2023

This comment was marked as outdated.

andyross commented Oct 28, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ceolin commented Oct 31, 2023

andyross commented Oct 31, 2023

ceolin commented Oct 31, 2023

andyross commented Oct 31, 2023 • edited Loading

dcpleung commented Oct 31, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Feb 4, 2024

andyross commented Oct 31, 2023 •

edited

Loading