Elfloader: NVIDIA Jetson Orin support #190

andybui01 · 2024-02-08T03:55:44Z

Elfloader portion of the Jetson Orin port effort (seL4/seL4#1135)

This patch set is quite big and might be hard to review, if the reviewers would like me to break this into separate PRs please let me know.

andybui01 · 2024-02-09T11:35:26Z

Ok, failing on all the ARMv8 32 bit builds, the change in sys_boot.c needs to be simplified since really all we changed was ARMv8 64 bit code.

This commit also adds support for the tegra194-tcu uart device. Signed-off-by: Andy Bui <[email protected]>

of set/way. Cleaning a cache by set/way only cleans the CPU-local caches. System caches can only be cleaned by virtual address, thus the elfloader is changed to clean the data cache by VA instead of set/way. This is important as the point-of-coherency (PoC) may lie beyond the system cache, in which case we cannot clean to PoC unless we use a clean by va. An ifdef is used for the implementation of continue_boot(), as we expect this to diverge from the legacy (ARMv7 right now) implementation slowly. Signed-off-by: Andy Bui <[email protected]>

EFI may boot the elfloader with caches disabled on the secondary cores, we want the value of non_boot_lock to be visible. Some barriers are added to stabilize SMP booting in the elfloader. Co-authored-by: Yanyan Shen <[email protected]> Co-authored-by: Matthias Rosenfelder <[email protected]> Signed-off-by: Andy Bui <[email protected]>

Move init_boot_vspace so that we can clean to PoC. This is only doable by va, hence we initialize the kernel page tables before the MMU is reset. Signed-off-by: Andy Bui <[email protected]>

Since we do not have to branch to another label, there is no need to follow the ABI here. This removes 2 memory access before and after changing the state of the MMU, which should overall reduce the chance of any speculative fetches going wrong. Signed-off-by: Andy Bui <[email protected]>

bootloader parameters. Signed-off-by: Tw <[email protected]>

(type mismatch). This chops off the aff3 level on Aarch64. Why does the compiler not warn??? Because the own header was not included. If you just include the header (without the change of the return value in the header), we get: seL4_tools/elfloader-tool/src/arch-arm/64/cpuid.c:14:10: error: conflicting types for 'read_cpuid_mpidr' 14 | uint32_t read_cpuid_mpidr(void) | ^~~~~~~~~~~~~~~~ In file included from /home/mro/nvos_neu2/tools/seL4_tools/ elfloader-tool/src/arch-arm/64/cpuid.c:9: elfloader-tool/include/arch-arm/cpuid.h:15:8: note: previous declaration of 'read_cpuid_mpidr' was here 15 | word_t read_cpuid_mpidr(void); | ^~~~~~~~~~~~~~~~ [190/200] Building C object elfloader-tool/CMakeFiles/elfloader.dir/ src/arch-arm/smp_boot.c.obj ninja: build stopped: subcommand failed. Signed-off-by: Matthias Rosenfelder <[email protected]>

pagetables. The 64 kiB alignment is a maximum requirement for a stage2 concatenated pagetable. See Table G5-4 in ARM DDI 0487I.a, page G5-9186. Note: Both comments at the top of the file as well as in line 85 say "64 kiB". 2^14 is unfortunately only 16 kiB. Note2: This code is not executed on AArch64, because the finish_relocation() function panics on AArch64. The latter always takes the "shortcut" via continue_boot(). Signed-off-by: Matthias Rosenfelder <[email protected]>

shadowing. The variable "dtb_size" is of type "size_t" and is defined in src/arch-arm/sys_boot.c, line 36. "size_t" is most certainly NOT the same size as "uint32_t", even on 32-bit architectures. Thus, the declaration in smp_boot.c is incorrect, since it does not match the definition in sys_boot.c. Why even create a local declaration - put this in a common header file and you will see those problems right away. Single point of maintenance! This may lead to an incorrectly sized memory access, that only happens to be correct by chance in Little-Endian mode. For ARM in Big-Endian mode this is a bug and will most likely result in an incorrect DTB size of zero. This fixes c573511 ("elfloader: pass DTB from bootloader to seL4 on ARM"). Moreover, remove the shadowing of global variables by defining local ones with the same name => Rename the local one in src/common.c. This could have been detected with "-Wshadow". Practically speaking our DTBs are always (a lot) smaller than 32-bit. Thus, continue to pass a 32-bit size to the kernel in order to not change the API here. Signed-off-by: Matthias Rosenfelder <[email protected]>

The size calculation was incorrect, unfortunately. That lead to an incorrect memory map provided by UEFI. When switching on EFI_DEBUG one can see that the memory range occupied by the ELF-loader ("paddr=" line) is not fully marked as used by UEFI and the last few pages of the ELF-loader are actually marked as being free. Output before: ELF-loader started on Image ranges: paddr=[7f9bc0000..7fcde1fff] text[7f9bc0000..7f9bd1e8f] data[7f9bd2000..7fcddd9ff] bss[7f9bd2850..7f9c0a4cf] edata[7fcddda00..7fcde1fff] [...] [paddr=0x180023000-0x7f9bbffff] [type = Conventional, attr: Normal <- ok [paddr=0x7f9bc0000-0x7fcdddfff] [type = Loader Code, attr: Normal <- Not ok: end address too low Should be 0x4000 higher [paddr=0x7fcdde000-0x7ffffffff] [type = Conventional, attr: Normal <- Not ok: start address too low Should be 0x4000 higher After: ELF-loader started on Image ranges: paddr=[7f9bbc000..7fcdddfff] text[7f9bbc000..7f9bcde8f] data[7f9bce000..7fcdd99ff] bss[7f9bce850..7f9c064cf] edata[7fcdd9a00..7fcdddfff] [...] [paddr=0x180023000-0x7f9bbbfff] [type = Conventional, attr: Normal <- ok [paddr=0x7f9bbc000-0x7fcdddfff] [type = Loader Code, attr: Normal <- ok (same as above) [paddr=0x7fcdde000-0x7ffffffff] [type = Conventional, attr: Normal <- ok (starts one after paddr end) Note: You don't have that debug output (EFI_DEBUG) in your code, that prints the UEFI memory map. So you have to believe me that this is the actual output. This fixes 030d83b ("elfloader: improve EFI support"). Signed-off-by: Matthias Rosenfelder <[email protected]>

The driver list section was part of the "*(COMMON)" section that was placed *after* the image payload (kernel, rootserver etc.). The driver list is data and should be placed adjacent to other data belonging to the ELFloader. This is clearly visible in the mapfile (which you don't generate). The driver list entry is present in the aarch32 linker script for EFI, why was it missing for 64 bit? No functional change. Signed-off-by: Matthias Rosenfelder <[email protected]>

Use existing defines to make the code more descriptive. For this move some defines out of the assembler-only file. This is a preparation patch for upcoming patches. No functional change. Signed-off-by: Matthias Rosenfelder <[email protected]>

Regarding right shifts the standard says: "The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand." Corresponding GCC warning (if used on a "small" type like uint8_t): main.c:25:39: warning: right shift count >= width of type [-Wshift-count-overflow] \#define GET_PGD_INDEX(x) (((x) >> (ARM_2MB_BLOCK_BITS + PMD_BITS + PUD_BITS)) & MASK(PGD_BITS)) main.c:46:39: note: in expansion of macro ‘GET_PGD_INDEX’ 46 | printf("GET_PGD_INDEX(x): %lu\n", GET_PGD_INDEX(x)); | ^~~~~~~~~~~~~ Thus, make sure that we never exceed/reach it by explicitly casting to a 64-bit type. It also allows using a pointer as macro parameter. This is a preparation patch for upcoming patches. Signed-off-by: Matthias Rosenfelder <[email protected]>

This change sets up pagetables individually for: - The ELFloader image (Normal memory) - The DTB, whether supplied by EFI, cpio or u-boot (Normal mem) - The UART MMIO range (Strongly-Ordered mem) Thus, it removes the bulk 512 GiB 1:1 mapping that was there before. This resulted in problems, since the kernel image was mapped with Normal memory, but the same physical memory was part of the 1:1 Strongly-Ordered mapping. This fulfills the definition of "Mismatched memory attributes" from the ARM Architecture specification (ARM DDI 0487I.a, section B2.8). Even though I am currently unable to see where there would *occur* such a mismatched access, having such a mapping situation is certainly not desirable and should be avoided. Moreover, it is unclear whether there could arise problems from establishing the (Strongly-ordered) mapping if there is nothing behind a physical address (which is certainly true for some parts of the 512 GiB range). This commit solves the sporadics hangs while booting after the "Enabling MMU and ..." message. Tests on several different Orins (Muc and SJ) show promising results, i.e. no "hangs" occurred anymore. Note: The code in arm_switch_to_hyp_tables() still disables and re-enables both MMU & caches, but there are no memory accesses in between. That section has been engineered to be as short as possible and no memory accesses happen in between. Several barriers and code to invalidate instruction caches have been added, too, in order to be on the safe side. However, tests with just adding *that* code still showed the problem being present. The only change that showed behavior change was the change of translation tables. Thus, this *is* the actual solution to the instability problems. Moreover, we need to support crossing a 1 GiB page for placement of the ELFloader. This is due to the latest firmware on Orin0 in MUC, named "Jetson UEFI firmware (version 4.1-33958178)", which puts our image closely below a 1 GiB boundary. Only for tiny image sizes the boundary will not be crossed. Thus, we do not hard-code the writing of tables, because the logic for doing so while crossing a 1 GiB boundary is too complicated. Instead, we use a fully dynamic approach that walks the pagetables in software for a given VA and inserts missing levels on demand from a preallocated pool of pages. Only the two top-level pagetables are fixed. This allows for re-use of all pagetable code, where we only need to distinguish in one (!) place between hypervisor and non-hyp (or VHE). Signed-off-by: Matthias Rosenfelder <[email protected]>

UEFI is an operating system that hides as a bootloader. UEFI is in control of the machine as long as we didn't call exit_boot_services. For instance, UEFI may set up timers to interrupt us while we're fiddling with hardware and UEFI is fiddling with hardware itself and UEFI may be fiddling with the exact same hardware that we are fiddling with, while we're being preempted. That is not good. The previous state of ELFloader is that before exiting UEFI boot services, we already called platform_init() in main(), which may fiddle around with all kinds of hardware. Thus, we should have already exited UEFI boot services when main() is called. Note that exit_boot_services now still executes on the UEFI stack (since we switch the stack in _start()). But so did e.g. the clear_bss() function. I don't see a problem here. It's more a question the other way around: Previously, we called into UEFI with exit_boot_services on our own, potentially too small, stack. Do we have enough space for UEFI to execute? How are we supposed to know that? The UEFI implementation can change, so we can never be sure. But it would be unreasonable for UEFI to start us with a stack that is too small to call any UEFI API, including exit_boot_services. So we can safely assume that there is enough space when using the UEFI stack (since our use of stack to this point is minimal). Also, mask all exceptions until we are about to enter the kernel. We do not want to run with whatever state the bootloader set us up before, do we? We only re-enable the asyncs and debugs; interrupts and FIQs are still masked when entering the kernel. What would we gain from that? We don't expect any. Asyncs (SErrors), however, can indicate that we e.g. touched memory that we shouldn't have touched (secure memory). Signed-off-by: Matthias Rosenfelder <[email protected]>

There are better ways to loop in C. No functional change. Signed-off-by: Matthias Rosenfelder <[email protected]>

size mismatch. The UEFI specification 2.10 says in section 7 for EFI_BOOT_SERVICES.GetMemoryMap(): "The GetMemoryMap() function also returns the size and revision number of the EFI_MEMORY_DESCRIPTOR. The DescriptorSize represents the size in bytes of an EFI_MEMORY_DESCRIPTOR array element returned in MemoryMap. The size is returned to allow for future expansion of the EFI_MEMORY_DESCRIPTOR in response to hardware innovation. The structure of the EFI_MEMORY_DESCRIPTOR may be extended in the future but it will remain backwards compatible with the current definition. Thus OS software must use the DescriptorSize to find the start of each EFI_MEMORY_DESCRIPTOR in the MemoryMap array." This mismatch is the case on (our) Orin UEFI. The compiled size of a memory descriptor is 40 Bytes, but the Orin UEFI implementation uses 48 Bytes per descriptor. Thus, due to the requirement to use a larger size than the returned total size (due to the fact that the buffer allocation itself may lead to one more entry in the memory map), we must increase by the size (in terms of number of descriptors), but use the number of bytes that UEFI uses for one memory map entry, not what we think it might be. Some other people already stumbled over this: https://forum.osdev.org/viewtopic.php?f=1&t=32953 Based on the comment in the existing code, the author seems to not have understood how the size of the memory map can be determined. Just read the spec! So we better update that misleading comment. Signed-off-by: Matthias Rosenfelder <[email protected]>

Type PTE. The ARM spec (ARM DDI 0487J.a) says on page B2-216 ("Aarch64 Application Level Memory Model"): "Hardware does not prevent speculative instruction fetches from a memory location with any of the Device memory attributes unless the memory location is also marked as execute-never for all Exception levels." and "Failure to mark a memory location with any Device memory attribute as execute-never for all Exception levels is a programming error." Similar statements can be found in the chapter about the Aarch32 Application Level Memory Model for aarch32 mode. Signed-off-by: Andy Bui <[email protected]>

Inner Shareable. We noticed a lot of instability with the Jetson Orin port when running in UP + HYP mode. Setting the shareability to Inner Shareable (IS) unconditionally is the fix, however, we're unable to find the exact line in Arm documentation that explains why this is the case. Potential theory: there are other agents on the Jetson Orin that require cache coherency and live in the IS domain. Being more permissive with memory resolves the prefetching and translation faults we were getting. For reference, Arm states "Arm expects operating systems to mark the majority of DRAM memory as Normal Write-back cacheable, Inner shareable" (102376_0200_01_en version 2). Signed-off-by: Andy Bui <[email protected]>

andybui01 · 2024-02-26T05:09:25Z

Two failures on the HW run:

TX2 clang (all configs): cannot connect to mq?
TQMA, passes all tests but output is cut short, CI thinks it failed?

Besides that, there are positive signs that the elfloader changes (and especially the aarch64 changes) don't introduce any regressions.

cc: @kent-mcleod @axel-h @Indanz I'd appreciate your reviews, let me know if the diff can be minimized or cleaned up

kent-mcleod · 2024-02-26T10:30:06Z

I'd appreciate your reviews, let me know if the diff can be minimized or cleaned up

I'd find it easier to fit in time and attention for reviewing if the PR was split up and merged incrementally.

The EFI commits and many of the bugfix commits that are self-contained fixes can likely be reviewed and merged with limited discussion required through their own PR. Specifically:
- 0108b77
- 7b3c2f3
- 1aa2fb0
- c329bef
- e8a15ec
- dea6924
- 65b49ae
- fa81fff
- 33f00c8
- 5ebb1fc
- 80b8902
- 0e0a99d
The Orin platform port commit (9f2a8d0) is trivial once all the dependent changes have been landed. This can probably be merged in advance of the other changes.
The remaining commits will probably require the bulk of discussion in my opinion as they change structure of common code.

So if you are able to split out the above commits into a separate PR and leave the rest in this one I think that will help us proceed.

Indanz · 2024-02-26T11:30:34Z

elfloader-tool/src/arch-arm/drivers/smp-psci.c

-    dmb();
+#if defined(CONFIG_ARCH_AARCH64)
+    /* If the secondary core caches are off, need to make sure that the info
+     * is clean to the physical memory so that the sedcondary cores can read it.


Typo: "sedcondary".

Indanz · 2024-02-26T11:32:09Z

elfloader-tool/src/arch-arm/smp_boot.c

+        /* The compiler may optimize this loop away, add a dsb()
+         * to force a reload. */
+        dsb();


This also applies to non-aarch64, so this doesn't look like the right solution. Using standard atomics may be better and more portable here.

Edit: The problem may be masked by cpu_idle() in 32-bit, but this still feels wrong.

Indanz · 2024-02-26T11:34:29Z

elfloader-tool/src/arch-arm/smp_boot.c

+#if defined(CONFIG_ARCH_AARCH64)
+    dsb();
+    non_boot_lock = 1;
+    /* Secondary CPUs may still run with MMU & caches off. Force the update to be visible. */


The same is true for 32-bit, right?

Indanz · 2024-02-26T11:39:44Z

elfloader-tool/src/binaries/efi/gnuefi/elf_aarch64_efi.lds

@@ -61,6 +61,7 @@ SECTIONS
  .dynstr   : { *(.dynstr) }
  . = ALIGN(4096);
  .note.gnu.build-id : { *(.note.gnu.build-id) }
+  _end = .;


This should be part of commit dea6924 I think.

Indanz · 2024-02-26T11:42:14Z

elfloader-tool/include/arch-arm/elfloader.h

+/* Switches MMU-related stuff: pagetables, MAIR & TCR etc. Works also if the MMU
+ * was off initially. EL2 translation regime only.
+ */
+extern void arm_switch_to_hyp_tables(void);


Perhaps call it aarch64_switch_to_hyp_tables if there is no 32-bit version of it.

Indanz · 2024-02-26T11:45:30Z

I agree with Kent.

I think 4996de7 is the most controversial change and hardest to review for correctness, all other commits seem fairly straightforward.

andybui01 · 2024-02-27T00:58:54Z

No problem, thanks guys.

What I'll do is split this into 3 PRs then:

General stability/EFI changes
The series of commits that changes ARM booting and splits aarch64 out, including the humongous page table commit (4996de7)
Finally, the Jetson Orin port

Indanz · 2024-02-27T11:26:55Z

I would prefer the problematic commit to be in its own PR, so we can discuss it there without holding up the rest.

andybui01 force-pushed the andyb/orin-port branch 3 times, most recently from b62f20a to e299d4c Compare February 8, 2024 05:54

lsf37 added the hw-test enable sel4test hardware builds + runs label Feb 9, 2024

andybui01 force-pushed the andyb/orin-port branch from e299d4c to 167a3b6 Compare February 23, 2024 05:18

Andy Bui and others added 19 commits February 26, 2024 10:54

elfloader: add support for Jetson Orin

9f2a8d0

This commit also adds support for the tegra194-tcu uart device. Signed-off-by: Andy Bui <[email protected]>

elfloader: arm: move kernel page table init code

0cdf9c2

Move init_boot_vspace so that we can clean to PoC. This is only doable by va, hence we initialize the kernel page tables before the MMU is reset. Signed-off-by: Andy Bui <[email protected]>

elfloader: arm: fix incorrect sp when restoring

7b3c2f3

bootloader parameters. Signed-off-by: Tw <[email protected]>

elfloader: arm: do not hard-code values

fa81fff

Use existing defines to make the code more descriptive. For this move some defines out of the assembler-only file. This is a preparation patch for upcoming patches. No functional change. Signed-off-by: Matthias Rosenfelder <[email protected]>

elfloader: Rewrite loop: Do not use goto

80b8902

There are better ways to loop in C. No functional change. Signed-off-by: Matthias Rosenfelder <[email protected]>

andybui01 force-pushed the andyb/orin-port branch from 167a3b6 to 84a55cd Compare February 25, 2024 23:56

Indanz reviewed Feb 26, 2024

View reviewed changes

andybui01 mentioned this pull request Feb 28, 2024

elfloader: improve stability #191

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Elfloader: NVIDIA Jetson Orin support #190

Elfloader: NVIDIA Jetson Orin support #190

andybui01 commented Feb 8, 2024

andybui01 commented Feb 9, 2024

andybui01 commented Feb 26, 2024

kent-mcleod commented Feb 26, 2024

Indanz Feb 26, 2024

Indanz Feb 26, 2024 •

edited

Loading

Indanz Feb 26, 2024

Indanz Feb 26, 2024

Indanz Feb 26, 2024

Indanz commented Feb 26, 2024

andybui01 commented Feb 27, 2024

Indanz commented Feb 27, 2024

Elfloader: NVIDIA Jetson Orin support #190

Are you sure you want to change the base?

Elfloader: NVIDIA Jetson Orin support #190

Conversation

andybui01 commented Feb 8, 2024

andybui01 commented Feb 9, 2024

andybui01 commented Feb 26, 2024

kent-mcleod commented Feb 26, 2024

Indanz Feb 26, 2024

Choose a reason for hiding this comment

Indanz Feb 26, 2024 • edited Loading

Choose a reason for hiding this comment

Indanz Feb 26, 2024

Choose a reason for hiding this comment

Indanz Feb 26, 2024

Choose a reason for hiding this comment

Indanz Feb 26, 2024

Choose a reason for hiding this comment

Indanz commented Feb 26, 2024

andybui01 commented Feb 27, 2024

Indanz commented Feb 27, 2024

Indanz Feb 26, 2024 •

edited

Loading