-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Part 1 of #9013 #9024
Part 1 of #9013 #9024
Conversation
We're skipping unused 0x8000 bytes of data at the beginning of the module, so we have to reduce the amount of data to be copied accordingly. Signed-off-by: Guennadi Liakhovetski <[email protected]>
Some image formats, notably relocatable objects, don't create ELF segments. To work around this use the object data size, calculated by Zephyr. Signed-off-by: Guennadi Liakhovetski <[email protected]>
We need to tell Zephyr whether or not to perform local relocations. Check persistent library data to distinguish the first instantiation from the following ones. Signed-off-by: Guennadi Liakhovetski <[email protected]>
@marc-hb uhm, yeah, I was too quick this time - this part does require a Zephyr update, which is now separate in #9019 |
SOFCI TEST |
https://sof-ci.01.org/sofpr/PR9024/build3900/devicetest/index.html was green for nocodec and SDW, but HDA didn't run, let's re-run |
Add support for relocatable objects to the llext module build system. In such builds no ELF segments are created, so we need to process all sections individually. Signed-off-by: Guennadi Liakhovetski <[email protected]>
Relocatable ELF objects don't contain segments, update rimage to prevent it from aborting in such cases. Signed-off-by: Guennadi Liakhovetski <[email protected]>
The Cadence Xtensa cross-toolchain cannot build shared libraries, when using it we build relocatable ELF objects instead. Update smart-amp-test to support both formats. Signed-off-by: Guennadi Liakhovetski <[email protected]>
The fuzzer failure https://github.com/thesofproject/sof/actions/runs/8657935406/job/23740891264?pr=9024 is curious, but I'm rather sure it's unrelated. The whole library management depends on IPC4, and all these run-time changes only affect it. The only potentially relevant for IPC3 change is the change for rimage, which makes it less sensitive to invalid ELF files, but that's a build-time change, and as far as I understand the fuzzer doesn't attempt invalid builds. @andyross could you check? |
@andyross I downloaded the input "crash" file (attached again here for convenience and to avoid expiration), debugged it and noticed something that looks bad. The IPC3 Most of the time it surprisingly does not crash (lack of defensive programming?) but it's not too surprising if that crashes from time to time - as it just did in https://github.com/thesofproject/sof/actions/runs/8657935406/job/23740891264?pr=9024 EDIT: if confirmed, this issue should be filed elsewhere. |
Actually, AMD, IMX and mediatek keep trying to process that NULL like posix/ipc.c does. Not sure about Intel |
To clarify: you're seeing a NULL header passed to ipc_cmd() because of an invalid parse (an expected failure, this is fuzzing aftera all) that happens in mailbox_validate()? I'll admit I'm not sure exactly what the specified behavior there is. The POSIX code was written to try to duplicate the original scheme, but... yeah, it's pretty confusing how this is intended to work. There's a big comment in posix/ipc.c trying to explain what I think is happening. FWIW, src/ipc/ipc-zephyr.c looks broadly similar: I see ipc_compact_read_msg() returning an uninspected outptut of mailbox_validate(), which ends up in ipc_platform_do_cmd() where it's passed straight to ipc_cmd(). Basically the same code path? My gut says this is a core issue and that ipc_cmd() is expected to be tolerant of NULLs at runtime, I guess? But we could absolutely disallow it and check before the call too. |
I think you must be right. Now I suspect the timeout logic in the fuzzing framework. I suspect it because the failing run lasted exactly as long as the passing runs. I ran it a second time and it passed https://github.com/thesofproject/sof/actions/runs/8657935406/job/23776321942 I also run the crash input locally a million times and no problem were ever found. Let's ignore this. We still have the .zip attached if we ever need to get back to it. |
@@ -53,7 +53,16 @@ target_compile_options(${MODULE} PRIVATE | |||
-save-temps -O2 | |||
) | |||
|
|||
if("${ZEPHYR_TOOLCHAIN_VARIANT}" STREQUAL "zephyr") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a "FIXME: replace this hack with new add_llext_target()
in Zephyr"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -170,7 +171,7 @@ static int llext_manager_link(struct sof_man_fw_desc *desc, struct sof_man_modul | |||
uint32_t module_id, struct module_data *md, const void **buildinfo, | |||
const struct sof_man_module_manifest **mod_manifest) | |||
{ | |||
size_t mod_size = desc->header.preload_page_count * PAGE_SZ; | |||
size_t mod_size = desc->header.preload_page_count * PAGE_SZ - 0x8000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Magic number. Needs at least a comment in the source.
try: | ||
offset = elf.get_section_by_name(sections[i]).header.sh_offset | ||
size = elf.get_section_by_name(sections[i]).header.sh_size | ||
except: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, just catch the exception you're expecting. Copy/paste its name from the stack trace, it should take seconds. Also, use pylint
.
https://pylint.pycqa.org/en/latest/user_guide/messages/warning/broad-exception-caught.html
@@ -7,9 +7,11 @@ | |||
# portable way to do that. Therefore we pass the linker path and all the command | |||
# line parameters to this script and call the linker directly. | |||
|
|||
import os |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this used in this commit?
@marc-hb these are all minor improvements, the main benefit is getting this used and @lyakh can circle back and do the updates when he's back from vacation. I'm going to merge so we can start looking at the next steps for testing, CI and signing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lllext_link_helpe.py points are valid, but otherwise this is ready to merge.
These can be updated in subsequent PRs as feature is upstreamed.
Actually, "broad-except" is not a minor warning and this is the type of coding mistakes that gets fixed only if/when being hit hard by it and wasting a lot of time. Same as ignoring errors in C. |
a split up of fb6286d . Can be merged independently since this code isn't used yet, but to work it'll need the rest of #9013, including a Zephyr update, which is now available as #9019