Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Part 1 of #9013 #9024

Merged
merged 6 commits into from
Apr 16, 2024
Merged

Part 1 of #9013 #9024

merged 6 commits into from
Apr 16, 2024

Conversation

lyakh
Copy link
Collaborator

@lyakh lyakh commented Apr 10, 2024

a split up of fb6286d . Can be merged independently since this code isn't used yet, but to work it'll need the rest of #9013, including a Zephyr update, which is now available as #9019

lyakh added 3 commits April 10, 2024 14:32
We're skipping unused 0x8000 bytes of data at the beginning of the
module, so we have to reduce the amount of data to be copied
accordingly.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
Some image formats, notably relocatable objects, don't create ELF
segments. To work around this use the object data size, calculated by
Zephyr.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
We need to tell Zephyr whether or not to perform local relocations.
Check persistent library data to distinguish the first instantiation
from the following ones.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
@marc-hb
Copy link
Collaborator

marc-hb commented Apr 10, 2024

@lyakh
Copy link
Collaborator Author

lyakh commented Apr 11, 2024

Lots of build failures in https://sof-ci.01.org/sofpr/PR9024/build3859/build/index.html and https://github.com/thesofproject/sof/actions/runs/8633265152/job/23665921289?pr=9024

@marc-hb uhm, yeah, I was too quick this time - this part does require a Zephyr update, which is now separate in #9019
so need to wait for that to be merged...

@lyakh
Copy link
Collaborator Author

lyakh commented Apr 11, 2024

SOFCI TEST

@lyakh
Copy link
Collaborator Author

lyakh commented Apr 11, 2024

https://sof-ci.01.org/sofpr/PR9024/build3900/devicetest/index.html was green for nocodec and SDW, but HDA didn't run, let's re-run

lyakh added 3 commits April 12, 2024 08:26
Add support for relocatable objects to the llext module build system.
In such builds no ELF segments are created, so we need to process all
sections individually.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
Relocatable ELF objects don't contain segments, update rimage to
prevent it from aborting in such cases.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
The Cadence Xtensa cross-toolchain cannot build shared libraries,
when using it we build relocatable ELF objects instead. Update
smart-amp-test to support both formats.

Signed-off-by: Guennadi Liakhovetski <[email protected]>
@lyakh
Copy link
Collaborator Author

lyakh commented Apr 12, 2024

The fuzzer failure https://github.com/thesofproject/sof/actions/runs/8657935406/job/23740891264?pr=9024 is curious, but I'm rather sure it's unrelated. The whole library management depends on IPC4, and all these run-time changes only affect it. The only potentially relevant for IPC3 change is the change for rimage, which makes it less sensitive to invalid ELF files, but that's a build-time change, and as far as I understand the fuzzer doesn't attempt invalid builds. @andyross could you check?

@marc-hb
Copy link
Collaborator

marc-hb commented Apr 12, 2024

@andyross I downloaded the input "crash" file (attached again here for convenience and to avoid expiration), debugged it and noticed something that looks bad.

IPC3-fuzz9024-logs.zip

The IPC3 mailbox_validate() function returns NULL when the hdr is malformed. I'm not sure what the "real" IPC3 code does with that NULL (I didn't look) but what you did in commit c50eddc and src/platform/posix/ipc.c does not look correct: you keep going like nothing happened?!

Most of the time it surprisingly does not crash (lack of defensive programming?) but it's not too surprising if that crashes from time to time - as it just did in https://github.com/thesofproject/sof/actions/runs/8657935406/job/23740891264?pr=9024

EDIT: if confirmed, this issue should be filed elsewhere.

@marc-hb
Copy link
Collaborator

marc-hb commented Apr 12, 2024

I'm not sure what the "real" IPC3 code does with that NULL (I didn't look)

Actually, AMD, IMX and mediatek keep trying to process that NULL like posix/ipc.c does. Not sure about Intel

@andyross
Copy link
Contributor

To clarify: you're seeing a NULL header passed to ipc_cmd() because of an invalid parse (an expected failure, this is fuzzing aftera all) that happens in mailbox_validate()? I'll admit I'm not sure exactly what the specified behavior there is. The POSIX code was written to try to duplicate the original scheme, but... yeah, it's pretty confusing how this is intended to work. There's a big comment in posix/ipc.c trying to explain what I think is happening.

FWIW, src/ipc/ipc-zephyr.c looks broadly similar: I see ipc_compact_read_msg() returning an uninspected outptut of mailbox_validate(), which ends up in ipc_platform_do_cmd() where it's passed straight to ipc_cmd(). Basically the same code path?

My gut says this is a core issue and that ipc_cmd() is expected to be tolerant of NULLs at runtime, I guess? But we could absolutely disallow it and check before the call too.

@marc-hb
Copy link
Collaborator

marc-hb commented Apr 13, 2024

My gut says this is a core issue and that ipc_cmd() is expected to be tolerant of NULLs at runtime, I guess?

I think you must be right.

Now I suspect the timeout logic in the fuzzing framework. I suspect it because the failing run lasted exactly as long as the passing runs.

I ran it a second time and it passed https://github.com/thesofproject/sof/actions/runs/8657935406/job/23776321942

I also run the crash input locally a million times and no problem were ever found.

Let's ignore this. We still have the .zip attached if we ever need to get back to it.

@@ -53,7 +53,16 @@ target_compile_options(${MODULE} PRIVATE
-save-temps -O2
)

if("${ZEPHYR_TOOLCHAIN_VARIANT}" STREQUAL "zephyr")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a "FIXME: replace this hack with new add_llext_target() in Zephyr"

Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -170,7 +171,7 @@ static int llext_manager_link(struct sof_man_fw_desc *desc, struct sof_man_modul
uint32_t module_id, struct module_data *md, const void **buildinfo,
const struct sof_man_module_manifest **mod_manifest)
{
size_t mod_size = desc->header.preload_page_count * PAGE_SZ;
size_t mod_size = desc->header.preload_page_count * PAGE_SZ - 0x8000;
Copy link
Collaborator

@marc-hb marc-hb Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic number. Needs at least a comment in the source.

try:
offset = elf.get_section_by_name(sections[i]).header.sh_offset
size = elf.get_section_by_name(sections[i]).header.sh_size
except:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, just catch the exception you're expecting. Copy/paste its name from the stack trace, it should take seconds. Also, use pylint.

https://pylint.pycqa.org/en/latest/user_guide/messages/warning/broad-exception-caught.html

@@ -7,9 +7,11 @@
# portable way to do that. Therefore we pass the linker path and all the command
# line parameters to this script and call the linker directly.

import os
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this used in this commit?

@lgirdwood
Copy link
Member

https://pylint.pycqa.org/en/latest/user_guide/messages/warning/broad-except.html

@marc-hb these are all minor improvements, the main benefit is getting this used and @lyakh can circle back and do the updates when he's back from vacation. I'm going to merge so we can start looking at the next steps for testing, CI and signing.

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lllext_link_helpe.py points are valid, but otherwise this is ready to merge.

@lgirdwood lgirdwood dismissed marc-hb’s stale review April 16, 2024 14:29

These can be updated in subsequent PRs as feature is upstreamed.

@lgirdwood lgirdwood merged commit c5230a9 into thesofproject:main Apr 16, 2024
46 of 47 checks passed
@marc-hb
Copy link
Collaborator

marc-hb commented Apr 16, 2024

Actually, "broad-except" is not a minor warning and this is the type of coding mistakes that gets fixed only if/when being hit hard by it and wasting a lot of time. Same as ignoring errors in C.

@lyakh lyakh deleted the reloc-p1 branch April 29, 2024 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants