Skip to content

Calculates the maximum stack depth required by GCC-compiled code

License

Notifications You must be signed in to change notification settings

R34P3R-H09/stack_usage

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Stack Usage

One of the more tricky problems when coding for an embedded system is ensuring that each task has enough stack space for its needs.

Most microprocessor control units (MCUs) have limited amounts of RAM; on a general-purpose machine, for example a Linux desktop, it would in theory be possible for a potential stack overflow to be recognised and for additional stack space to be allocated, and even if that's not possible, a large stack can be pre-allocated using virtual memory without any immediate cost.

This can't be done if the MCU only has real memory. Stack overflow may be caught or not. If you're lucky there'll be enough information left behind for you to use the debugger to find the reason for the resulting crash straight away, if not it could take a while. Under FreeRTOS you're very likely to get a hard fault to add to the joy (stacks and task control blocks live close together, so running off the end of your stack is likely to trample on the stored state of another task, resulting in it trying to access invalid memory).

The Python program stack_usage.py is intended to help with this (it's not a panacea, though! if you have AdaCore support, you'll be better off using GNATstack).

The initial motivation for this work was a hard fault encountered while writing a test program to check that Ada timing events work properly (well, usably) with the FreeRTOS-based Cortex GNAT RTS.

Requirements

stack_usage has been developed on macOS Mojave using Python 2.7 and 3.7 and PLY (Python Lex and Yacc). To install PLY,

pip install --user ply

It relies on the information generated by the GCC compiler (FSF GCC 10 or later, GNAT GPL 2015 or later) using the switch -fcallgraph-info=su,da.

Use

usage: stack_usage.py [flags] <input .ci files>
flags:
-h, --help:        output this message
-s, --save=file:   save data in file
-l, --load=file:   restore previously saved data
-o, --output=file: file for CSV output (D=stack_usage.csv)
-d, --diagnostics: output diagnostic info on parsing

Notes:

  • When data is saved, it's in Python's pickle form.
  • The diagnostics output is only useful while developing the parsing information.

Case study

As noted above, the initial motivation for this work was a hard fault encountered while writing a test program to check that Ada timing events work properly (well, usably) with the FreeRTOS-based Cortex GNAT RTS.

Timing events should, per ARM D.15(25), "be executed directly by the real-time clock interrupt mechanism". This wasn't done in Cortex GNAT RTS, because of ensuring proper handling of inter-task and inter-interrupt protection (especially tricky with the micro:bit, because it uses a cortex-M0 part); instead, a highest-priority task checks timings and runs handlers when required.

Handlers are protected procedures, and code for them is generated as for any protected procedure, with the proper locking mechanisms.

The Timer task was specified as

   task Timer is
      pragma Priority (System.Priority'Last);
   end Timer;

implemented, comments elided, as

   task body Timer is
      Next : Time := Time_First;
      Period : constant Time_Span := Milliseconds (5);
   begin
      loop
         Process_Queued_Events (Next_Event_Time => Next);
         delay until (if Next = Time_First then Clock + Period else Next);
      end loop;
   end Timer;

In Cortex GNAT RTS, an Ada task is created as a FreeRTOS task whose initial procedure is a wrapper which allocates any secondary stack required out of the task's stack (in this case, 10% of the task's default 768 bytes) and invokes a compiler-generated wrapper round the user-written task body.

The event handlers for this software are specified as

   protected LED_Event_Handling is
      pragma Interrupt_Priority;
      procedure Handle
        (Event : in out Ada.Real_Time.Timing_Events.Timing_Event);
   private
      ...
   end LED_Event_Handling;

and are implemented by the compiler as a wrapper, which performs any locking and calls the user-written handler body. This wrapper is called directly by the timer task, so any stack used by the handler body comes out of the timer task's stack.

It turned out that a lot of the problem was that the standard compilation options used included -O0 (no optimisation), and this led to the wrapper procedure (see above) using 300 bytes more stack than the 40 bytes it used with -Og (which "offer[s] a reasonable level of optimization while maintaining fast compilation and a good debugging experience").

A further improvement was to eliminate the Timer task's secondary stack and bump its main stack:

   task Timer
   with
     Priority             => System.Priority'Last,
     Storage_Size         => 1024,
     Secondary_Stack_Size => 0
   is
      pragma Task_Name ("events_timer");
   end Timer;

At this point, it seemed like a good idea to implement stack_usage.py. I also added callgraph code generation as an option to the Cortex GNAT RTS build process.

To generate raw callgraph info for the micro:bit RTS,

cd ~/cortex-gnat-rts/microbit
make CALLGRAPH=yes clean all install

To generate raw callgraph info for the events test program,

cd ~/cortex-gnat-rts/test_microbit
make clean
gprbuild -p -P testbed events -cargs -Og -fcallgraph-info=su,da

To process and save the RTS's information,

cd ~/stack_usage
./stack_usage.py \
    --save=microbit.p \
    ~/cortex-gnat-rts/microbit/.build/*.ci

To combine this saved RTS information with the events program's data,

./stack_usage.py \
    --load=microbit.p \
    --output=events.csv \
    ~/cortex-gnat-rts/test-microbit/.build/*.ci

Looking at events.csv, we find that ada.real_time.timing_events.timerTKVIP (called during elaboration to create the Timer task) uses 264 bytes (from the environment task's stack). The system.tasking.restricted.stages.wrapper procedure uses 40 bytes of the Timer task's stack, and calls (via a pointer, so invisibly to stack_usage.py) ada.real_time.timing_events.timerTKB, the task's body, which uses 296 bytes (including 64 bytes used by ada.real_time.timing_events.process_queued_events).

Again, stack_usage.py can't trace into the handler procedure, because it's called via a pointer:

   if Handler /= null then
      Handler.all (Timing_Event (Next_Event.all));
   end if;

so we have to use our knowledge of the actual handlers to find that it's event_support.led_event_handling.handleN (312 bytes), wrapped by event_support.led_event_handling.handleP (which organises locking) for a total of 328 bytes.

The total stack usage for the timer task is then predicted to be 40 + 296 + 328 = 664 bytes.

This is a worst-case value: on measuring the actual free space, the actual peak usage was 424 bytes.

Why?!

The worst-case usage for a particular subprogram is the amount it uses for its own purposes (register storage, local variables) plus the maximum used by any of the subprograms it calls. If a particular execution pattern doesn't call that maximal subprogram, then the actual usage will be lower. In this case, one of the most expensive called subprograms was 64-bit scaled arithmetic division, at 192 bytes, called by Ada.Real_Time.Time_Of at 248 bytes. I'm assuming that this was never actually invoked.

Restrictions/To Do

Ignored subprograms

The tool currently ignores

  • calls to subprograms not compiled with the required options (e.g. memcmp)
  • dynamic objects (for example, declare blocks with variables sized at runtime)
  • dispatching calls

At the very least it should mark the subprograms where there's a potential problem.

Detailed reports

It might be helpful to see details of which subprograms are called by a given subprogram.

About

Calculates the maximum stack depth required by GCC-compiled code

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%