This is a `gcc` plugin written to ease porting C software to Cosmopolitan Libc.
The general idea is to reduce manual changes to the source code of any external
software when attempting to build it with Cosmopolitan Libc -- ideally, you
would need to customize only the build process, but make zero changes to the
source code.
Licensed under ISC License.
I ended up writing a `gcc` patch with the code from this plugin. My patch is also licensed under ISC. The patched `gcc` does a lot less work than this plugin (and avoids almost all of the counterexamples) because I avoid using the macro hack and just patch the AST before the parser complains. (Per my current understanding, `gcc` does not provide plugin access to the program AST during its construction in the parsing process, which is why I wrote the patch instead. However, plugins provide a sufficiently large surface to figure out what a problem requires before diving into the depths of `gcc` internals.)
Note: this plugin has not yet been fully tested -- please check the compiled
`.o` file, the generated ASM, or errors in your test suite to confirm the
correctness of the transformations. When in doubt, transform the code manually.
See the Counterexamples section for more details.
- Install the necessary `gcc` plugin headers (you need `gcc` to be able to use its plugin architecture)
- Clone this repository and run `make`
- Create a small shell script that uses `/usr/bin/gcc` with this plugin (i.e. add `-O2 -fplugin=/location/of/portcosmo.so -include /location/of/tmpconst.h`) and use that as `CC` when building software.
For building software with Cosmopolitan Libc and this plugin, you will need to use
this branch, where I've been trying to ensure I change as little of Cosmopolitan
Libc as possible in order to make this work. And it does work! This branch of
CPython 3.11.0rc1 builds with Cosmopolitan Libc, and I didn't have to modify any
`switch` statements.
Cosmopolitan Libc contains system-level constants (for example, signal constants
like `SIGABRT`) defined as follows:
extern const int SIGABRT;
#define SIGABRT ACTUALLY(SIGABRT)
This plugin activates upon finding ` ACTUALLY(` (note the space) within a
defined macro, and (re-)defines `ACTUALLY` as follows:
#define ACTUALLY(X) __tmpcosmo_##X
and records the location in the source file every time a macro containing
`ACTUALLY(` is used. In `tmpconst.h`, there is a huge list of constants starting
with the `__tmpcosmo_` prefix.
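As a rough illustration of the macro step only, the sketch below puts the three pieces from the text side by side and shows which identifier the parser ends up seeing. The single `__tmpcosmo_SIGABRT` entry and its value `-961` are assumptions borrowed from the example further down; the real `tmpconst.h` may look different. The function name `placeholder_for_sigabrt` is made up for this sketch.

```c
/* Guard so the sketch also compiles next to <signal.h>. */
#undef SIGABRT

/* Assumed stand-in for one tmpconst.h entry (value is illustrative). */
static const int __tmpcosmo_SIGABRT = -961;

/* How Cosmopolitan Libc declares the constant (from the text above). */
extern const int SIGABRT;
#define SIGABRT ACTUALLY(SIGABRT)

/* The plugin's (re-)definition of ACTUALLY (from the text above). */
#define ACTUALLY(X) __tmpcosmo_##X

/* Hypothetical helper: after preprocessing, the body reads
 * `return __tmpcosmo_SIGABRT;`, so the parser sees a placeholder
 * instead of the extern variable. */
int placeholder_for_sigabrt(void) {
    return SIGABRT;
}
```

The placeholder is what lets the parser accept the code; restoring the real `SIGABRT` afterwards is the job of the AST pass.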
After every (valid) macro usage has been recorded, this plugin walks through the
entire AST of the source file to find each usage, and substitutes the
appropriate `extern` variable name at the location where the macro was used. It
does so via the following two components:
- `ifswitch` -- rearrange `switch` statements if the case labels would otherwise raise the `case label is not constant` error.
- `initstruct` -- update definitions of variables, `struct`s, and arrays if their initialization would otherwise raise the `initializer element is not constant` error (can handle `static` and global variables).
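To make the two components more concrete, here is a hand-written sketch of what equivalent source-level rewrites could look like. It is not the plugin's actual output (the plugin works on the AST), and every name in it (`classify_signal`, `sig_name`, `abort_entry`, and so on) is hypothetical. The definition `const int SIGABRT = 6;` is only a stand-in so the sketch builds on its own.

```c
/* Assumption: stand-in definition; with Cosmopolitan Libc the value
 * would come from the libc, not from this file. */
#undef SIGABRT /* in case <signal.h> already defined a macro */
const int SIGABRT = 6;

/* ifswitch-style rewrite. Intended source, rejected by the parser
 * because SIGABRT is a variable:
 *
 *     switch (sig) {
 *         case 0:       return 0;
 *         case SIGABRT: return 1;
 *         default:      return -1;
 *     }
 *
 * Here the constant labels stay in the switch and the non-constant
 * comparison is hoisted in front of it. */
int classify_signal(int sig) {
    if (sig == SIGABRT) return 1;
    switch (sig) {
        case 0:  return 0;
        default: return -1;
    }
}

/* initstruct-style rewrite. Intended source, rejected for a static
 * or global object:
 *
 *     static struct sig_name abort_entry = { SIGABRT, "abort" };
 *
 * Here the constant parts are initialized statically and the
 * non-constant field is patched once at startup. */
struct sig_name { int sig; const char *name; };

static struct sig_name abort_entry = { 0, "abort" };

void init_abort_entry(void) {
    abort_entry.sig = SIGABRT;
}

int abort_entry_sig(void) {
    return abort_entry.sig;
}
```

Hoisting the comparison like this only works when the case does not rely on fallthrough from or into neighbouring labels; cases that do need more careful handling.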
The plugin errors out if the `ACTUALLY` macro was improperly used, or if it is
unable to confirm that all the macro usage records were substituted successfully.
At the end of compilation, the plugin provides a note of how many substitutions
were made when compiling the file.
There might be other ways to check for such incorrect statements, but any
method to rearrange these `switch` statements would need to incorporate a C
preprocessor and parser, and any source code transformations would need to
remain valid even if `ifdef`s are mixed within the C source code.

Mixing `ifdef`s is quite a common occurrence in `switch` statements -- oftentimes
you see handlers for `errno` having a bunch of `ifdef`s (and fallthroughs!) to
allow for different kinds of errno values based on the operating system.
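The kind of code meant here looks roughly like the sketch below. It is an invented example written for a conventional libc where the errno values are preprocessor integer constants, not code from any particular project; `is_retryable` is a hypothetical name.

```c
#include <errno.h>

/* Hypothetical helper: returns 1 if the error suggests retrying. */
int is_retryable(int err) {
    switch (err) {
        case EAGAIN:
#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
        case EWOULDBLOCK: /* a distinct value only on some systems */
#endif
#ifdef EINTR
        case EINTR:
#endif
            return 1; /* every label above falls through to here */
        default:
            return 0;
    }
}
```

A purely textual rewrite of this `switch` would have to understand both the conditional blocks and the fallthrough, which is why working after the preprocessor is attractive.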
The best place to handle these statements is after the preprocessor has done
its work, so that the focus can be solely on the AST. `gcc` comes with a
battle-tested C preprocessor, parser, decent optimizations, and plugin support,
so why not a `gcc` plugin?
While this plugin can traverse the code AST and modify almost all uses of the macro, there are a few cases where it may not be able to do so:
- Using `gcc -O0`, i.e. if you disable all optimizations, then `gcc` will not perform constant-folding and will error out with `case label is not constant` on source code like
case __tmpcosmo_SIGABRT:
This can likely be fixed; it's just a matter of enabling the right optimization
flag in `gcc`. Better yet: we can figure out how to use `__tmpcosmo_SIGABRT` as
a macro that can be defined during runtime, instead of a `static const int` in
`tmpconst.h`, which would circumvent this problem. Edit: I ended up
patching `gcc` with the code from this plugin, so this problem is avoided.
- `case` labels with ranges, something like:
case SIGABRT ... 0:
Yes, I know it's possible to make this work, but I haven't seen any real-life C code that does something like this yet.
- constant-folding algebra:
static const int e = SIGABRT;
/* few lines later... */
func(e);
Under `gcc`'s optimization flags, `e` will be constant-folded, and its value
will be used everywhere instead. The plugin has not recorded all the locations
where `e` could have been used, so it just bails out when seeing a declaration
like this. Edit: I ended up patching `gcc` with the code from this plugin, so
this problem is avoided.
int x = SIGABRT+42;
if(j < SIGABRT+42)
case SIGABRT+42:
for(int i=SIGABRT-1; i < 0; ++i)
Under `gcc`'s optimization flags, all of the above statements will have been
constant-folded, and even though the plugin has recorded where the macro was
used, it does not know what expression was simplified, so it bails out if it was
unable to substitute a constant in any expression. Edit: I ended up
patching `gcc` with the code from this plugin, so this problem is avoided.
- magical things like Duff's device -- I don't know if any C code uses Duff's device with `SIGABRT`, would be fun to find out. Edit: I ended up patching `gcc` with the code from this plugin, so this problem is avoided.
- substituting the incorrect location due to a bad pick of constant: Suppose we have some code which uses a lot of integer constants, and some of them are on the same line as one of our recorded macro substitutions; then the plugin will likely substitute the constant at the wrong location. See the example below:
/* suppose tmpconst.h has the below value */
static const int __tmpcosmo_SIGABRT = -961;
/* and your code has something like */
func(-961, SIGABRT);
/* the macro will modify it to */
func(-961, __tmpcosmo_SIGABRT);
/* and record the location of the modification */
/* but gcc will constant-fold it to */
func(-961, -961);
/* the AST will be INCORRECTLY transformed into */
func(SIGABRT, -961);
/* whereas the second param should actually be transformed */
func(-961, SIGABRT);
It might be possible to fix this via a hash table of some sort, because we can just check the function call/expression at a marked location to confirm that it no longer contains the constant we just substituted (i.e. our substitution actually fixed the macro use and not some other constant in the source code).
This can also be fixed if we had more precise location checking. At present, suppose your source code has a function call like
func(27, __tmpcosmo_SIGABRT);
In terms of line information, we only know that the `CALL_EXPR` with `func`
starts on line 42 (and sometimes also where it ends) -- we do not know the
locations of the individual parameters `27` and `-961`, which would be useful to
match with the location we have saved from when the macro was used.
Edit: I ended up patching `gcc` with the code from this plugin, so this problem is avoided in most situations (I haven't found an example of this problem in real-life code yet). It can still happen if you're initializing a struct or writing a `switch` case with the clashing values, but my current belief is that the latter is quite rare (a `switch` whose options include both errno constants and other unrelated negative values), and the former is still uncommon, and would be caught by a simple test. Either way, the fix is the same as always: use different constants, or do the AST patching by hand.
- The `gcc` Internals documentation -- this document, along with the `gcc` headers for plugin writers, provides everything you need to know about what plugins can do.
- History of C
- C99 `switch` constraints and semantics -- see page 92, $\S 6.8.4.2$
- C11 `switch` constraints and semantics -- see page 149, $\S 6.8.4.2$
- C17 final draft `switch` constraints and semantics -- see page 108, $\S 6.8.4.2$
- Assert Rewriting in `gcc`
- An Introduction to `gcc` and `gcc`'s plugins
- LWN: Randomizing Structure Layout
- Source code of the `randstruct` plugin used in the Linux kernel -- this is to understand how much can be done at the `gcc` plugin level.
- `gcc` OpenMP Runtime Wiki -- need to understand how `pragma`s can be used to alert a plugin