Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arch/xtensa: Add a call0 ABI variant #62117

Closed
wants to merge 12 commits into from

Conversation

andyross
Copy link
Contributor

New context layer for building Zephyr with the call0 ABI. The system call overhead of having to spill 64 GPRs, and the general complexity of the existing asm2 layer, is prohibitive when trying to build a userspace kernel entry path for Xtensa.

Note: it still uses the windowed registers, but as a quick-switch register set for optimized interrupt entry. This is not a mechanism for building Zephyr on hardware that actually lacks windows (though with some work it could be).

This is much (much) leaner in terms of entry/exit overhead, is significantly smaller, and should scale better with changes needed for userspace. There is a cost in terms of function call code size and execution speed, but in practice it seems manageable (I'm measuring ~4% with a no-frame-pointer build, Max Filippov reports more like ~13% in Linux builds -- both numbers are well under the kind of overhead we see enabling userspace at all).

@andyross
Copy link
Contributor Author

See docs in the PR itself, but I'll paste here for visibility:

Zephyr with the Xtensa CALL0 ABI

The Xtensa register window mechanism is a poor fit for memory
protection. The existing Zephyr optimizations, in particular the
cross-stack call we do on interrupt entry, can't be used. The way
Xtensa windows spill is to write register state through the stack
pointers it finds in the hidden caller registers, which are wholely
under the control of the userspace app that was interrupted. The
kernel can't be allowed to do that, it's a huge security hole.

Naively, the kernel would have to write out the entire 64-entry
register set on every entry from a context with a PS.RING value other
than zero, including system calls. That's not really an acceptable
performance or complexity cost.

Instead, for userspace apps, Zephyr builds using the "call0" ABI,
where only a fixed set of 16 GPRs (see section 10 of the Cadence
Xtensa Instruction Set Architecture Summary for details on the ABI).
Kernel traps can then use the remaining GPRs for their own purposes,
greatly speeding up entry speed.

Toolchain

Existing Xtensa toolchains support a -mabi=call0 flag to generate
code with this ABI, and it works as expected. It sets a
XTENSA_CALL0_ABI preprocessor flag and the Xtensa HAL code and our
crt1.S file are set up to honor it appropriately (though there's a
glitch in the Zephyr HAL integration, see below).

Unfortunately that doesn't extend to binary artifacts. In particular
the libgcc.a files generated for our existing SDK toolchains are using
the windowed ABI and will fail at runtime when you hit 64 bit math.

Cadence toolchains have an automatic multilib scheme and will select a
compatible libgcc automatically.

But for now, you have to have a separate toolchain (or at least a
separately-built libgcc) for building call0 apps. I'm using this
script and it works fine, pending proper SDK integration:

#!/bin/sh
set -ex

TC=$1
if [ -z "$TC" ]; then
TC=sample_controller
fi

# Grab source (these are small)
git clone https://github.com/zephyrproject-rtos/sdk-ng
git clone https://github.com/crosstool-ng/crosstool-ng

# Build ct-ng itself
cd crosstool-ng
./bootstrap
./configure --enable-local
make -j$(nproc)

# Configure for the desired toolchain
ln -s ../sdk-ng/overlays
cp ../sdk-ng/configs/xtensa-${TC}_zephyr-elf.config .config

grep -v CT_TARGET_CFLAGS .config > asdf
mv asdf .config
echo CT_TARGET_CFLAGS="-mabi=call0" >> .config
echo CT_LIBC_NONE=y >> .config

./ct-ng olddefconfig
./ct-ng build.$(nproc)

echo "##"
echo "## Set these values to enable this toolchain:"
echo "##"
echo export CROSS_COMPILE=$HOME/x-tools/xtensa-${TC}_zephyr-elf/bin/xtensa-${TC}_zephyr-elf-
echo export ZEPHYR_TOOLCHAIN_VARIANT=cross-compile

Note you don't really need to use the toolchain that was built, it's
enough to take the libgcc.a and drop it on top of the one in your SDK,
just be careful to save the original, etc...

Longer term we need proper integration into the SDK, obviously.
Ideally we would just rebuild libgcc in isolation, but I couldn't
figure out how to do that except in the context of a full gcc build.
We could do the separate build in sdk-ng and then copy the files to a
multilib scheme of our own devising, which is something existing
Zephyr platforms sometimes need to do. Also I note that other gcc
architectures have already solved problems like this (c.f. x86_64 and
x32 working in the same toolchain, etc...), but couldn't find any
Xtensa-specific support.

Also as a quick start: it's possible to build Zephyr in such a way as
to avoid (almost/maybe) all calls to libgcc via disabling kconfigs
that trigger 64 bit code. I found I could get full coverage of the
new call0 code with a custom test rig and -- -DCONFIG_TIMEOUT_64BIT=n -DCONFIG_CBPRINTF_REDUCED_INTEGRAL=y -DCONFIG_CBPRINTF_NANO=y

Bugs and Missing Features

No support in the SDK. See above about toolchains.

The existing syscall implementation is a mock, because this happened
in parallel with the new MMU-enabled dc233c platform. The only code
difference is that it uses a single static region loaded from a
"_mock_priv_stack" symbol instead of a real thread kernel stack. In
theory everything will work when you enable CONFIG_USERSPACE, but...

Likewise the MMU integration is a blindly-copied stub and unexercised.
It's really simple though ("just read from PTEVADDR with PS.EXCM=1 to
prime the TLB"), so hopefully it won't need much fixing.

There's no C implementation of the Zephyr "call syscalls" API, just
assembly. The interface is deliberately based on the call0 function
call ABI though, so this should be trivially easy.

For simplicity, this code is written to save context to the arch
region of the thread struct and not the stack (except for nested
interrupts, obviously). That's a common pattern in Zephyr, but for
KERNEL_COHERENCE (SMP) platforms it's actually a performance headache,
because the stack is cached where the thread struct is not. We should
move this back to the stack, but that requires doing some logic in the
assembly to check that the resulting stack pointer doesn't overflow
the protected region, which is slightly non-trivial (or at least needs
a little inspiration).

Right now the Zephyr HAL integration doesn't build when the call0 flag
is set because of a duplicated symbol. I just disabled the build at
the cmake level for now, but this needs to be figured out.

The ARCH_EXCEPT() handling is a stub and doesn't actually do anything.
Really this isn't any different than the existing code, it just lives
in the asm2 files that were disabled and I have to find a shared
location for it.

Backtrace logging is likewise specific to the older frame format and
ABI and doesn't work. The call0 ABI is actually much simpler, though,
so this shouldn't be hard to make work.

FPU support isn't enabled. Likewise this isn't any different (except
trivially -- the context struct has a different layout). We just need
to copy code and find compatible hardware to test it on.

A good security audit is needed to make sure all the holes are
plugged. At the very least, note that when the PS.WOE bit is set,
user thread contexts are able to "rotate" registers and write to their
contexts by doing regular function calls. We need to be 100% sure no
RING 1+ code ever runs with this set. (Also the ISA spec is a little
ambiguous as to how the hardware treats CALLn/ENTRY/RETW instructions
when WOE=0, I worry a tiny bit there might be hardware that has
unpluggable holes...)

I realized when writing this that our existing handling for NMI
exceptions (strictly any level > EXCM_LEVEL) isn't correct, as we
generate ZSR-style EPS/EPC register usage in those handlers that isn't
strictly separated from the code it may/will have interrupted. This
is actually a bug with asm2 also, but it's going to make
high-priority/DEBUG/NMI exceptions unreliable unless fixed (I don't
think there are any Zephyr users currently though).

@andyross
Copy link
Contributor Author

The context follows discussion from #61303 so be sure to check there also.

Basically: it's real and it works. The code is comparatively tiny vs. the maybe-too-clever-for-its-own-good asm2 integration (which I still really like, but really isn't a good fit for privilege separation). I'm very happy on the whole. The libgcc issue is going to be a pain, and I'm really hoping someone with more SDK fu and toolchain interest can help with that? @jcmvbkbc 's name is all over the GNU toolchain patches for the architecture, so he surely has suggestions on how to make this work best.

The only missing feature still causing any test failures I can find is ARCH_EXCEPT(), which I should have patched up RSN (it's just a code motion thing). The other missing stuff is all in the nice-to-have category and IMHO shouldn't block merge or further work.

This was all done against sample_controller, with a few tries for validation on my mt8195 at various stages. It should ("should") work everywhere but in practice there will be burps. First up on my list is to get it working on dc233c to validate the MMU handling, though that will need to wait a few days while I catch up on my day job tasks which have been woefully neglected while I work on fun stuff.

@andyross
Copy link
Contributor Author

(added a selection of Xtensa stakeholders, please pull in anyone I've forgotten)

Also it's worth pointing out that at least a little thought has been given to a re-unification path, since I don't think people will want to give up on windows for platforms where they provide value, and no one wants to maintain two completely separate context layers in the long term.

Most of the code isn't involved in the "register-twirl on interrupt" trick at all. We can partition things out into paths like "pre-call context save" (which can be e.g. save-all-64, or twirl, etc...), "do the call" (which can be a simple ABI call or the cross stack trick), "post-call-switch", "post-call-no-switch", etc... The state needed to discriminate that is really just the RING bits of PS, so the overhead isn't awful even if getting it to generate optimally might be a pain. Definitely not for the first version but maybe at some point.

*/
movi a1, ~(PS_RING_MASK | PS_UM_MASK | PS_INTLEVEL_MASK)
and a0, a0, a1
addi a0, a0, 1 /* INTLEVEL = 1 */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm realizing while re-reading this post-submit that this is actually wrong. I blindly copied the need to adjust intlevel from the L1 handler, but in fact syscalls (after setup) are thread contexts and should be running at an interrupt level of zero, obviously. My mock rig didn't catch that, but obviously the test suite will start flagging errors if you're not able to receive L1 interrupts from syscalls.

Should work to just remove that one line and adjust all the comments talking about this.

@dcpleung
Copy link
Member

FYI, I did a quick tinkering with sdk-ng trying to build with multilib. However, the usually way of specifying MULTILIB_OPTIONS was not successful. More investigation is needed to see if it is even possible to have multilib support.

@andyross
Copy link
Contributor Author

FYI, I did a quick tinkering with sdk-ng trying to build with multilib.

You got farther than I did. I discovered those MULTILIB_ argument hooks via google, saw that they weren't touched by the xtensa stuff, and gave up hoping quite frankly that you'd know how to fix it. :) But we also have Max's attention here who's (I assume) the maintainer, so maybe there's an even better resource. @jcmvbkbc ?

@jcmvbkbc
Copy link
Contributor

maybe there's an even better resource

Not at the moment, unfortunately. In linux it was solved by getting rid of the libgcc dependency and adding all required helper sources to the kernel.

@andyross
Copy link
Contributor Author

In linux it was solved by getting rid of the libgcc dependency and adding all required helper sources to the kernel.

Oh hey, well look at that, there they are. That's really clever. Unfortunately we're likely to run into licensing issues with importing GPL code in Zephyr (but I'm really not the expert on this stuff); any idea if the source is available in Apache-compatible form anywhere? But the code in the Linux tree is a TON simpler (and smaller -- looks like most of libgcc is never used by xtensa compiler output?) than the original libgcc, so if nothing else it would be a good guide for us building our own artifacts for the SDK without building a whole compiler to throw away.

@jcmvbkbc
Copy link
Contributor

any idea if the source is available in Apache-compatible form anywhere?

I only know of GPL-licensed version of this code. OTOH LLVM should have something equivalent, perhaps under different license.

@dcpleung

This comment was marked as outdated.

New ABI, much simpler, slightly bigger/slower code, but MUCH cleaner
syscall entry.

Signed-off-by: Andy Ross <[email protected]>
This register alias was originally introduced to allow A0 to be used
as a scratch register when handling exceptions from MOVSP
instructions. (It replaced some upstream code from Cadence that
hard-coded EXCSAVE1).  Now the MMU code is now using too, and for
exactly the same purpose.

Calling it "ALLOCA" is only confusing.  Rename it to make it clear
what it's doing.

Signed-off-by: Andy Ross <[email protected]>
This file doesn't need the asm2 header, and the preprocessor logic
around whether to include the backtrace header is needless (all it
does is declare functions).

Signed-off-by: Andy Ross <[email protected]>
New context layer for building Zephyr with the call0 ABI.  The system
call overhead of having to spill 64 GPRs, and the general complexity
of the existing asm2 layer, is prohibitive when trying to build a
userspace kernel entry path for Xtensa.

Note: it still uses the windowed registers, but as a quick-switch
register set for optimized interrupt entry.  This is not a mechanism
for building Zephyr on hardware that actually lacks windows (though
with some work it could be).

This is much (much) leaner in terms of entry/exit overhead, is
significantly smaller, and should scale better with changes needed for
userspace.  There is a cost in terms of function call code size and
execution speed, but in practice it seems manageable (I'm measuring
~4% with a no-frame-pointer build, Max Filippov reports more like ~13%
in Linux builds -- both numbers are well under the kind of overhead we
see enabling userspace at all).

Signed-off-by: Andy Ross <[email protected]>
The backtrace mechanism is specific to the windowed call ABI
currently, so doesn't work with call0.  This was hidden previously
because backtraces were disabled by default at the kconfig level for
the old ("sample_controller") qemu_xtensa, but are on now with dc233c
for essentially arbitrary reasons.  There was some protection against
this already, extend it to the case where "we have windows, but aren't
using them".

Signed-off-by: Andy Ross <[email protected]>
There was a system call handler added as part of the call0 ABI
integration.  Wire it up to the standard Zephyr userspace system call
API.

Signed-off-by: Andy Ross <[email protected]>
The linkage for the "after vector" RAM region was specifying files
explicitly, which breaks when other files (or new files) want to add
code to these regions.  In particular call0 needs to use iram to store
local jump targets for the vector entry.

There's really no good spec on this.  Conventionally xtensa platforms
have always put the .iram and iram0 sections (and their derived
sections like literals) immediately after the vector table, but I
doubt there's a promise to that extent anywhere.

Just try to adhere to what convention there is.

(I note there's also a ".iram1" section referenced here that I'm not
moving because I don't know what's in it)

Signed-off-by: Andy Ross <[email protected]>
The job of this file is almost exclusively page table management.  MMU
hardware interaction is in there but a small part, and they don't
interact except in a few small APIs.  Rename it in advance of doing
work to the hardware side.

Signed-off-by: Andy Ross <[email protected]>
This is a reworked MMU layer, sitting cleanly below the page table
handling in the OS, intended to integrate with call0 ABI handling and
to be more performant and efficient.  Notable differences from the
original work:

+ Significantly smaller code and simpler API (just three functions to
  be called from the OS/userspace/ptable layer).

+ Big README-MMU document containing my learnings over the process, so
  hopefully fewer people need to go through this in the future.

+ No TLB flushing needed.  Clean separation of ASIDs, just requires
  that the upper levels match the ASID to the L1 page table page
  consistently.

+ Validation layer to check page table caching attributes.  Caching
  the PTEs is poison to SMP, as it requires a data cache flush on
  every CPU for every change of a page table anywhere.  Even on
  single-core systems, having one mapping be write-through and the
  other writeback (which is what we had!) can break things.  Hopefully
  this catches all the mistakes.

+ Vector mapping is done with a 4k page and not a 4M page, leading to
  much more flexibility with hardware memory layout.  The original
  scheme required that the 4M region containing vecbase be mapped
  virtually to a location other than the hardware address, which makes
  confusing linkage with call0 and difficult initialization
  constraints where the exception vectors run at different addresses
  before and after MMU setup (effectively forcing them to be PIC
  code).

+ More provably correct initialization, all MMU changes happen in a
  single asm block with no memory accesses which would generate a
  refill.

+ Requires CONFIG_XTENSA_CALL0_ABI.  This isn't strictly an issue of
  behavior: the MMU code doesn't care about reguster use per se.  But
  with the register-windowed context management code in the older
  asm2, it's impossible to provide security across memory domains.
  Just rip the bandaid off.

Signed-off-by: Andy Ross <[email protected]>
We were trying to initialize the MMU extremely early (out of crt1.S),
which was forcing some odd gymnastics with splitting up the
initialization paths.  Just don't do that.

The MMU boots with a pefectly good mapping for physical memory.  Use
that until we're set up and about to launch the post-arch
initialization in z_cstart().

Just in general, trying to run compiled C code before z_cstart() calls
kernel_arch_init() is asking for trouble.  We do it only in extreme
situations, and we don't need it here.

Signed-off-by: Andy Ross <[email protected]>
@andyross
Copy link
Contributor Author

Another spin, now with full MMU support for dc233c.

This is intended to go under a rework of the userspace PR (which I
need to find a place to put). As it is it splits the existing MMU
code into a mostly-unchanged "page table" layer and an entirely new
hardware driver for the bottom bits. Paste major deltas from the
commit message:

  • Significantly smaller code and simpler API (just three functions to
    be called from the OS/userspace/ptable layer).

  • Big README-MMU document containing my learnings over the process, so
    hopefully fewer people need to go through this in the future.

  • No TLB flushing needed. Clean separation of ASIDs, just requires
    that the upper levels match the ASID to the L1 page table page
    consistently.

  • Validation layer to check page table caching attributes. Caching
    the PTEs is poison to SMP, as it requires a data cache flush on
    every CPU for every change of a page table anywhere. Even on
    single-core systems, having one mapping be write-through and the
    other writeback (which is what we had!) can break things. Hopefully
    this catches all the mistakes.

  • Vector mapping is done with a 4k page and not a 4M page, leading to
    much more flexibility with hardware memory layout. The original
    scheme required that the 4M region containing vecbase be mapped
    virtually to a location other than the hardware address, which makes
    confusing linkage with call0 and difficult initialization
    constraints where the exception vectors run at different addresses
    before and after MMU setup (effectively forcing them to be PIC
    code).

  • More provably correct initialization, all MMU changes happen in a
    single asm block with no memory accesses which would generate a
    refill.

  • Requires CONFIG_XTENSA_CALL0_ABI. This isn't strictly an issue of
    behavior: the MMU code doesn't care about reguster use per se. But
    with the register-windowed context management code in the older
    asm2, it's impossible to provide security across memory domains.
    Just rip the bandaid off.

*
* + System calls have their own optimized handler
* + MMU TLB exceptions are handled directly, as they must not
* re-enable interrupts at any point before returning to the trapping
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what you mean here ? syscall handler is expected to run with interruptions enabled
https://github.com/zephyrproject-rtos/zephyr/blob/main/tests/kernel/mem_protect/userspace/src/main.c#L934

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the typography is confusing? Two items in a bulleted list:

  • System calls have their own optimizer handler
  • TLB exceptions don't unmask EXCM

Syscalls absolutely do unmask interrupts, obviously.

s32i a0, a10, ___thread_t_arch_OFFSET + __xtensa_win0_ctx_t_user_ps_OFFSET

/* Further cook PS to set RING=0 and UM=0, then set our sp and unmask */
movi a1, ~(PS_RING_MASK | PS_UM_MASK)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you clearing up UM ? As far as I could see kernel threads and user threads are using user vector mode, the exception is only here ...

In other hand, you have to unmask EXCM, since syscall handlers expects it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just performance, really. The user vector has more work to do than the kernel vector: it needs to check for syscall entry. Obviously system call handlers won't be making syscalls of their own, so there's no point.

And the code already removed EXCM from this the PS value in a0 a few lines above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and actually in this version of the PR, all threads are initialized with a PS of zero, so all threads are kernel mode by default. Shimming the userspace stuff on top as we speak, at which point we'll be consistent between RING and UM (i.e. you either have RING=3/UM=1 or RING=0/UM=0, and no other states)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok, just noticed that thread ctx ps is not set :P

l32i a1, a1, 0
#endif
wsr a0, PS

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assuming that you have to re-enable interruptions / exceptions before jump into the handler
you can be switched after setting PS and syscall parameters are not saved in this context stack ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interrupted userspace a0/a1/pc/ps values (the ones we need to preserve) are already stashed in the thread struct at this point. All a context switch arriving now has to do is save our kernel space GPRs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that happens only with interrutptions ritght ? inside must_switch

I remember having seen case where a syscall triggers a context switch without an interruption happens and AFAIU xtensa_switch does not save them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do a cooperative context switchin inside a syscall too (consider e.g. k_sleep()). But again system calls aren't interrupts to the userspace code, we can rely on userspace having preserved its own state too. In this particular case, the system call is just a regular ABI call: we're responsible for preserving the return value in a0 and the stack pointer, and a12-a15 (which are already preserved in kernel space by the ABI call we make to the handler). And we restore the PC and PS values upon returning too. So we only need four registers of state.

(Actually writing this I realize that it's 3: the return PC is just the address of the RET instruction in the stub that's going to jump to a0, we don't really need to store both and could just restore to the value in a0. Will add an optimization note)

@ceolin
Copy link
Member

ceolin commented Oct 31, 2023

once I built the toolchain, I had to add +CONFIG_XTENSA_CALL0_ABI=y in boards/xtensa/qemu_xtensa/qemu_xtensa_mmu_defconfig to build the sample. Trying to run the sample I get:

 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU  ** FATAL EXCEPTION
 ** CPU qemu-system-xtensa: terminating on signal 2
ninja: build stopped: interrupted by user.

it seems to be happening in printk("Launching crashing thread %p\n", &boom);

Removing some printks the code runs a little bit further but it stucks with:

Flagging medium priority interrupt                                       
swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint!
 (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nest
ed = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = s
wint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! 
(nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (ne
sted = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested =
 swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint
! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nes
ted = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = 
swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint!
 (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nested = swint! (nest
ed = swint! (nested = swint! (nested = swin

It seems I am missing something to reproduce the sample, just after that I noticed that the build it generates a lot of warnings:

/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: /home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/libgcc.a(_floatsidf.o): warning: incompatible Xtensa configuration (ABI do
es not match)                                                                                                                                                                                                                                                                                          
/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: /home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/libgcc.a(_floatdidf.o): warning: incompatible Xtensa configuration (ABI do
es not match)                                                                                                                                                                                                                                                                                          
/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: /home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/libgcc.a(_truncdfsf2.o): warning: incompatible Xtensa configuration (ABI d
oes not match)                                                                                                                                                                                                                                                                                         
/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: /home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/libgcc.a(_extendsfdf2.o): warning: incompatible Xtensa configuration (ABI 
does not match)                                                                                                                                                                                                                                                                                        
/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: /home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/libgcc.a(_divdi3.o): warning: incompatible Xtensa configuration (ABI does 
not match)                                                               
/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: /home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/libgcc.a(_moddi3.o): warning: incompatible Xtensa configuration (ABI does 
not match)                                                               
/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: /home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/libgcc.a(_udivdi3.o): warning: incompatible Xtensa configuration (ABI does
 not match)                                                              
/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: /home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/libgcc.a(_umoddi3.o): warning: incompatible Xtensa configuration (ABI does
 not match)                                                              
/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: warning: zephyr/zephyr.elf has a LOAD segment with RWX permissions
/home/ceolin/x-tools/xtensa-dc233c_zephyr-elf/lib/gcc/xtensa-dc233c_zephyr-elf/13.2.0/../../../../xtensa-dc233c_zephyr-elf/bin/ld.bfd: warning: zephyr/zephyr.elf has a LOAD segment with RWX permissions

What is missing in the toolchain ?

@andyross
Copy link
Contributor Author

@ceolin:

It seems I am missing something to reproduce the sample, just after that I noticed that the build it generates a lot of warnings:

Yeah, that's what you get if you have a windowed toolchain (strictly the libgcc.a in the toolchain) building against a call0 Zephyr, or vice versa. It will crash any time it hits the ENTRY instruction at the start of a libgcc instruction (because PS.WOE=0). And indeed printk() is one of the big offenders, as it uses 64 bit math by default. If you look way upthread, there are some kconfigs I found that suppress enough 64 bit-isms to make the test run, and that probably works still.

Another option is to use a cadence toolchain, as those are multilib and don't have this problem.

Finally remember @dcpleung had patches to the SDK working which build a multilib gcc; I never got to trying them myself but for sure that's a better solution than the hand-built one in my instructions above. Maybe he could just hand you a copy?

@ceolin
Copy link
Member

ceolin commented Oct 31, 2023

@ceolin:

It seems I am missing something to reproduce the sample, just after that I noticed that the build it generates a lot of warnings:

Yeah, that's what you get if you have a windowed toolchain (strictly the libgcc.a in the toolchain) building against a call0 Zephyr, or vice versa. It will crash any time it hits the ENTRY instruction at the start of a libgcc instruction (because PS.WOE=0). And indeed printk() is one of the big offenders, as it uses 64 bit math by default. If you look way upthread, there are some kconfigs I found that suppress enough 64 bit-isms to make the test run, and that probably works still.

Yep, I know it. The point is that I follow the whole instructions to get sdk-ng and crosstool and to build them. I thought with that we had a proper libgcc ...

Another option is to use a cadence toolchain, as those are multilib and don't have this problem.

One of the things you pointed in the other pr is not having a way to test in CI, that is where we are here ...

@andyross
Copy link
Contributor Author

andyross commented Oct 31, 2023

Yep, I know it. The point is that I follow the whole instructions to get sdk-ng and crosstool and to build them. I thought with that we had a proper libgcc ...

Can you check that the relevant CFLAGS kconfig is in the ct-ng .config file? It can get dropped if you reconfigure it via menuconfig or whatever, IIRC. You can also check the disassembly of libgcc.a, if you see any CALLXn/ENTRY/RETW instructions, it's a windowed library.

One of the things you pointed in the other pr is not having a way to test in CI, that is where we are here ...

I know. :) This is still a DNM PR for sure, pending a toolchain release.

I swear I'm not trying to monopolize this or turn it into a fight. We need one codebase that meets all requirements (though to be fair this one has/will-have some genuine advantages in performance/size). Clearly neither is there yet, the intent is to turn this one into a merged PR with the userspace support from the other. But I'm very happy to go in the other direction too.

@dcpleung
Copy link
Member

As for toolchain, grab zephyrproject-rtos/sdk-ng#701 and make sure do a submodule update to grab the GCC patch.

regs->ptevaddr = CONFIG_XTENSA_MMU_PTE_BASE + user_asid * 0x400000;

/* The ptables code doesn't add the mapping for the l1 page itself */
l1_page[l1_idx(regs->ptevaddr)] = (uint32_t) l1_page;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manipulating the page table here does not look nice if the goal is to split the mmu / page table logic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the cache bits may not match as well ...

__ASSERT_NO_MSG(user_asid != 1 && user_asid < CONFIG_XTENSA_MMU_ASID_MAX);

/* We don't use ring 1/2, ring 0 ASID must be 1 */
regs->rasid = (user_asid << 24) | 0x000001;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know each ring in RASID must be different, we can't leave rings 1/2 with the same value

@nordicjm nordicjm removed their request for review December 5, 2023 08:41
Copy link

github-actions bot commented Feb 4, 2024

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Feb 4, 2024
@github-actions github-actions bot closed this Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Build System area: Console area: Samples Samples area: Toolchains Toolchains area: Xtensa Xtensa Architecture DNM This PR should not be merged (Do Not Merge) Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants