Load Access Fault in nxsem_trywait Due to Invalid Semaphore Pointer on RISC-V NuttX #15178
Status: Open
Labels: Arch: risc-v, Area: Kernel, OS: Linux, Type: Bug
Expected Behavior
The program should execute without any memory access violations or crashes. Specifically, the nxsem_trywait function should correctly attempt to wait on a semaphore without causing system instability.
Actual Behavior
Instead of executing normally, the program crashes due to a load access fault when trying to read a half-word from what seems to be an invalid or inaccessible memory location during the execution of the nxsem_trywait function. The crash happens at the address 0x800078da.
Description
The exception state reported at the time of the crash includes:
Exception PC: 0x800078da (program counter at the exception, inside nxsem_trywait)
MTVAL: 0x2 (the value associated with the exception; for a load access fault on RISC-V this is normally the faulting address)
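As a quick sanity check on the values above, the sketch below tests whether the reported MTVAL is naturally aligned for a half-word access. It assumes MTVAL holds the faulting address, which the report does not confirm. The helper name is_naturally_aligned and both macros are illustrative and not part of NuttX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if addr is naturally aligned for an access of `size` bytes.
 * `size` must be a nonzero power of two (1, 2, 4, 8, ...).
 */
static bool is_naturally_aligned(uintptr_t addr, size_t size)
{
  assert(size != 0 && (size & (size - 1)) == 0);
  return (addr & (uintptr_t)(size - 1)) == 0;
}

/* Values taken from the report: MTVAL and the width of a half-word load.
 * Treating MTVAL as the faulting address is an assumption.
 */
#define REPORTED_MTVAL ((uintptr_t)0x2)
#define HALFWORD_SIZE  ((size_t)2)
```

Under that assumption, is_naturally_aligned(REPORTED_MTVAL, HALFWORD_SIZE) is true, so a misaligned half-word access looks like the less likely explanation and an unmapped or near-NULL pointer the more likely one.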
Querying the nut-img file reveals that the exception happened during the execution of the nxsem_trywait function, specifically at offset +0x54 from its start. The error log indicates that the system was attempting a load half-word unsigned (lhu) operation from the address held in register s0, which appears to be the cause of the fault.
The assembly instruction at this location is lhu a1,0(s0), which loads an unsigned half-word into register a1 from the address pointed to by s0. Given that MTVAL contains 0x2, there might be an alignment issue, or the address being accessed does not exist or has no valid mapping in the page tables, leading to the access fault.
Debug Logs
The debug logs show that the system was executing various system calls before encountering the exception. The calls are executed sequentially by number (call_num) until a final, very large value appears: call_num = 18446744073709551615. This is the maximum value of an unsigned 64-bit integer (equivalently, -1 interpreted as unsigned), which could indicate an overflow or an invalid argument.
Upon reaching the nxsem_trywait function, the system attempted to load a half-word from the address stored in s0. However, this address appears to be invalid or out of bounds, resulting in the load access fault.
Steps to Reproduce
To reproduce this issue, one can use Syzkaller to execute system calls against the NuttX kernel. The specific sequence leading up to the crash includes calls such as syz_sem_timedwait, syz_putenv, syz_setenv, and syz_sem_timedwait, culminating in the problematic call to nxsem_trywait.
The corresponding syscall-specific implementation code is as follows:
Suggested Fix
The nxsem_trywait function is in sched/semaphore/sem_trywait.c with the following code:
After analysis, the following recommendations were given:
Pointer Validation: Ensure that the pointer passed to nxsem_trywait is properly initialized and aligned according to RISC-V's requirements. Check whether the semaphore structure (sem) is corrupted, or whether memory allocation or deallocation patterns could lead to accessing invalid addresses.
Semaphore Initialization: Verify that all semaphores are correctly initialized before use. Uninitialized or improperly initialized semaphores can cause undefined behavior, including invalid memory accesses.
Memory Alignment: Review the memory alignment of data structures used in conjunction with nxsem_trywait. Misaligned data can result in faults on architectures like RISC-V, where implementations may enforce strict alignment requirements.
Runtime Checks: The problem is a load access fault when accessing sem->flags or NXSEM_COUNT(sem). This indicates that the incoming sem pointer may be invalid (e.g., NULL, uninitialized, or pointing to an illegal memory address). To fix this, the sem pointer must be properly initialized and point to a valid memory location before it is used.
We introduce an auxiliary function, is_valid_semaphore(), which is assumed to exist, to check whether a semaphore structure is valid. This function can be defined on an implementation-specific basis, e.g., to check whether the semaphore has been properly initialized.
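A minimal, self-contained sketch of such a runtime check is shown below. It is not the NuttX implementation: struct demo_sem_s is a stand-in for the kernel semaphore type, demo_sem_trywait is a simplified model of a try-wait, and the body of is_valid_semaphore() as well as the DEMO_LOWEST_VALID_ADDR bound are assumptions chosen for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel semaphore type so the sketch compiles on its own.
 * The real NuttX sem_t has a different layout.
 */
struct demo_sem_s
{
  int16_t semcount;   /* > 0 means the semaphore is available */
  uint8_t flags;
};

/* HYPOTHETICAL lower bound: addresses below this are treated as unmapped. */
#define DEMO_LOWEST_VALID_ADDR ((uintptr_t)0x1000)

/* HYPOTHETICAL helper: rejects NULL, near-NULL, and misaligned pointers.
 * It cannot detect every dangling or corrupted pointer.
 */
static bool is_valid_semaphore(const struct demo_sem_s *sem)
{
  uintptr_t addr = (uintptr_t)sem;

  if (sem == NULL || addr < DEMO_LOWEST_VALID_ADDR)
    {
      return false;
    }

  return (addr % _Alignof(struct demo_sem_s)) == 0;
}

/* Simplified try-wait: validate the pointer before the first dereference. */
static int demo_sem_trywait(struct demo_sem_s *sem)
{
  if (!is_valid_semaphore(sem))
    {
      return -EINVAL;   /* invalid pointer: fail instead of faulting */
    }

  if (sem->semcount > 0)
    {
      sem->semcount--;  /* semaphore acquired */
      return 0;
    }

  return -EAGAIN;       /* semaphore not available */
}
```

With this guard, a pointer such as 0x2 (the value reported in MTVAL) is rejected with -EINVAL before any load from it is attempted. A check of this kind only filters obviously bad values; validating user-supplied pointers at the system call boundary may still be needed.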
On which OS does this issue occur?
[OS: Linux]
What is the version of your OS?
Ubuntu 20.04
NuttX Version
2ff2b82
Issue Architecture
[Arch: risc-v]
Issue Area
[Area: Kernel]
Verification