A GCC plugin to dump the final layout of a struct and all types it references.
I started this project to support my Linux Kernel MicroPython port - I wanted to have an easy, Pythonic way to access kernel structures. That's why even the dump format is Python :)
This project consists of the GCC plugin itself under gcc_plugin and of Python helpers under python that act as "data accessors" using the plugin output.
You can use the plugin without the accessors, though: they served my specific purpose, but the plugin is quite useful by itself.
Just hit make.
You can build in debug mode with make DEBUG=1; you'll get debugging information printed to stderr (basically the internal GCC tree object of every field processed).
There's a test_struct struct in tests/test_struct.c. This struct exploits many of the peculiarities allowed in struct definitions. You can check it out, then hit make run to dump that weird struct and see how the different fields ended up in the generated dump.
On a specific struct my_struct from a specific file myfile.c:
$ gcc -fplugin=./struct_layout.so -fplugin-arg-struct_layout-output=layout.txt -fplugin-arg-struct_layout-struct=my_struct myfile.c -c
You'll have your results in layout.txt.
You can omit -fplugin-arg-struct_layout-struct to dump all defined structs instead (all structs defined in your C file, and all structs defined in all included headers).
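If you drive the plugin from a script, the two invocations above can be assembled programmatically. This is only a sketch: the helper name plugin_command and its parameters are made up for illustration, and only the GCC flags themselves come from the command line shown above.

```python
def plugin_command(source, output, struct=None, plugin="./struct_layout.so"):
    """Build the GCC argument list for one plugin run (hypothetical helper)."""
    cmd = [
        "gcc",
        "-fplugin=" + plugin,
        "-fplugin-arg-struct_layout-output=" + output,
    ]
    # Without the struct argument, the plugin dumps every struct it sees.
    if struct is not None:
        cmd.append("-fplugin-arg-struct_layout-struct=" + struct)
    cmd += [source, "-c"]
    return cmd


one = plugin_command("myfile.c", "layout.txt", struct="my_struct")
everything = plugin_command("myfile.c", "layout.txt")
print(" ".join(one))
print(" ".join(everything))
```

The resulting list can be handed to subprocess.run(cmd, check=True) to actually run GCC.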
Output is printed as Python objects, for easier handling later.
A dictionary is printed, with Struct objects created for each struct / union. There's no distinction between structs and unions in this aspect - unions will simply have different offsets for their fields.
The object holds the name and size of the struct/union, plus a dictionary of the fields.
The dictionary maps field names to tuples of (offset, field type), with offsets given in bits. For unions, the offset is always 0.
The objects & field types are defined in python/fields.py.
All types have a total_size attribute, with their total size in bits. Other attributes vary between field types:
Scalar - scalars, they also have their basic type, like int or char or unsigned long int, and a boolean sign field (True signed / False unsigned).
Bitfield - used for bitfields, these have the number of bits they occupy and a sign field.
StructField - struct/union fields, these have the struct name they are referencing. If the field is based on an anonymous struct, then its Struct object itself is given.
Pointer - for all types of pointers, these have their "pointee" type, which may be e.g. Scalar or another Pointer.
Void - void type, for example in void *. This has size 0.
Function - pointee type in case of function pointers. This has size 0.
Array - for arrays, these have the number of elements and the type of each element (similar to the pointee type of Pointer).
For example, the definition struct s { int x; unsigned char y; void *p; }; evaluates on my x86-64 to:
structs = {
    's': Struct('s', 128, {
        'x': (0, Scalar(32, 'int', True)),
        'y': (32, Scalar(8, 'unsigned char', False)),
        'p': (64, Pointer(64, Void())),
    }),
}
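Since the dump is Python source, it can be loaded by executing it with the field classes in scope. The sketch below uses minimal stand-in classes whose constructor arguments are inferred from the example dump above; the real classes live in python/fields.py and may differ, and attribute names such as type and pointee are assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical stand-ins; positional arguments mirror the example dump.
Struct = namedtuple("Struct", "name total_size fields")
Scalar = namedtuple("Scalar", "total_size type sign")
Pointer = namedtuple("Pointer", "total_size pointee")


class Void:
    total_size = 0


DUMP = """
structs = {
    's': Struct('s', 128, {
        'x': (0, Scalar(32, 'int', True)),
        'y': (32, Scalar(8, 'unsigned char', False)),
        'p': (64, Pointer(64, Void())),
    }),
}
"""

# Executing the dump builds the objects. Only do this with dumps you trust.
namespace = {"Struct": Struct, "Scalar": Scalar, "Pointer": Pointer, "Void": Void}
exec(DUMP, namespace)
s = namespace["structs"]["s"]


def byte_layout(struct):
    """Return (name, byte offset, byte size) rows sorted by offset.

    Offsets and sizes are stored in bits; dividing by 8 is only exact
    for fields that are not bitfields.
    """
    rows = [(name, offset // 8, ftype.total_size // 8)
            for name, (offset, ftype) in struct.fields.items()]
    return sorted(rows, key=lambda row: row[1])


layout = byte_layout(s)
for name, offset, size in layout:
    print(f"{name}: offset {offset}, size {size}")
```

The 3 bytes of padding after y and the 8-byte alignment of p are visible directly in the converted offsets.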
As I said, I originally intended this for Linux so it must be easy to generate the structs here :)
To generate for a specific struct:
$ python linux/dump_structs.py layout.txt --struct task_struct --header linux/sched.h
You can set the KDIR environment variable to run against a specific kernel tree (by default, it runs against your local one).
$ KDIR=/path/to/kernel python linux/dump_structs.py ...
To dump all structs (based on a set of headers I've collected in include_all.c) you can run:
$ python linux/dump_structs.py all.txt
When including headers to dump their defined types, you may see some structs missing from the
output (although they are fully defined in the headers).
Apparently GCC doesn't complete the processing of structs that have only a typedef name (structs of the format typedef struct { ... } ..;) until they are used at least once. I didn't verify it in GCC's code though.
Thus, the emitted event for finished types is not generated for them, and the plugin doesn't know of them.
A quick workaround for this problem: in the dummy .c file you're handing to GCC, define a dummy, named struct referencing the types you want.
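One way to apply the workaround is to generate the dummy file with a few lines of Python. This is a sketch under assumptions: the helper dummy_source, the header mylib.h, the typedef my_typedef_t and the dummy struct name are all hypothetical, and referencing each type through a by-value member is my reading of "referencing"; I haven't checked which kinds of use GCC needs.

```python
def dummy_source(headers, typedef_names, dummy_name="force_layout_dummy"):
    """Return C source that includes the headers and uses each typedef once."""
    lines = ["#include <%s>" % header for header in headers]
    lines.append("")
    # A named struct with one member per typedef-only type, so that each
    # type is used at least once.
    lines.append("struct %s {" % dummy_name)
    for index, type_name in enumerate(typedef_names):
        lines.append("    %s member_%d;" % (type_name, index))
    lines.append("};")
    return "\n".join(lines) + "\n"


text = dummy_source(["mylib.h"], ["my_typedef_t"])
print(text)
```

Write the returned text to the .c file you pass to GCC; the dummy struct itself will also show up in the dump and can be ignored.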
Paired with the structs generated by the plugin, the accessors allow very convenient handling of structured data in Python code.
Basically you need to provide the base memory accessors (functions that read/write a u8/u16/u32/u64 pointer), and the accessors handle the rest (fields, pointers, arrays, bitfields, signedness, ...).
You can see how test_accessor.py does it.
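To make the idea of base memory accessors concrete, here is a minimal sketch backed by a bytearray instead of real memory. It is not the interface the accessors under python expect (test_accessor.py is the reference for that); the class BufferMemory, the helpers to_signed and read_bitfield, and the little-endian byte and bit order are all assumptions for illustration.

```python
import struct


class BufferMemory:
    """Hypothetical backing store: 'addresses' are indexes into a bytearray."""

    _FORMATS = {8: "<B", 16: "<H", 32: "<I", 64: "<Q"}  # little-endian, unsigned

    def __init__(self, size):
        self.data = bytearray(size)

    def read(self, bits, addr):
        return struct.unpack_from(self._FORMATS[bits], self.data, addr)[0]

    def write(self, bits, addr, value):
        struct.pack_into(self._FORMATS[bits], self.data, addr, value)


def to_signed(value, bits):
    """Reinterpret an unsigned value of the given width as two's complement."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def read_bitfield(mem, base, bit_offset, nbits, signed=False):
    """Read nbits located bit_offset bits after base (little-endian bit order)."""
    addr = base + bit_offset // 8
    shift = bit_offset % 8
    nbytes = (shift + nbits + 7) // 8
    raw = 0
    for i in range(nbytes):
        raw |= mem.read(8, addr + i) << (8 * i)
    value = (raw >> shift) & ((1 << nbits) - 1)
    return to_signed(value, nbits) if signed else value


mem = BufferMemory(16)
mem.write(32, 0, 0xFFFFFFFE)
x = to_signed(mem.read(32, 0), 32)    # a signed 32-bit scalar at bit offset 0
mem.write(8, 4, 0b10110100)
bits = read_bitfield(mem, 0, 34, 3)   # a 3-bit field at bit offset 34
print(x, bits)
```

Everything above the raw read and write calls (sign handling, bit offsets) is the kind of work the accessors take off your hands once they know the layout.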
This was tested on GCC 7.4.0, GCC 9.2.0, GCC 10.2.0. Oh, and Python 3, of course.
$ make test