Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

__LINE__ in macro can cause several functions to be changed #630

Open
stilscher opened this issue Mar 8, 2022 · 8 comments
Open

__LINE__ in macro can cause several functions to be changed #630

stilscher opened this issue Mar 8, 2022 · 8 comments
Assignees
Labels
performance Analysis time, memory usage preprocessing C preprocessing

Comments

@stilscher
Copy link
Member

When benchmarking zstd I discovered that for some commits the runtime of the incremental analysis with reluctant destabilization was noticeable worse than for the basic incremental run. For these commits also several functions were detected as being changed even though only a very little number of relevant lines of code had changed. The reason for this seems to be that

  • the changed file contains a macro using __LINE__
  • an added/removed line somewhere in the file entails that at all following occurrences of the macro, a change is detected because the line number is different
  • this leads to a bunch of changed functions in the respective file for which new ids are generated
  • if one of them contains a malloc, a different heap variable is created
  • we suspect that because of that, even with reluctant destabilization, the changed function containing the malloc will need to be fully destabilized

A commit where this problem can for example be observed is facebook/zstd@a8adfa7. It has 2 relevant changed lines of code, 17 changed functions, took ~260s to analyze incrementally and ~390s with reluctant destabilization (the non-incremental analysis of the parent commit took ~1220s).

@sim642 sim642 added performance Analysis time, memory usage preprocessing C preprocessing labels Mar 8, 2022
@sim642
Copy link
Member

sim642 commented Mar 8, 2022

--set pre.cppflags[+] -D__LINE__=0 works (and similarly for others, e.g. __FILE__) and just makes the preprocessor whine a bit. Not sure if there's any better solution.

EDIT: We could also supply a custom header that forces corresponding #defines, but it should probably be disabled by default.

@sim642
Copy link
Member

sim642 commented Mar 8, 2022

The standard might not like that though:

Attempts to redefine or undefine these macros result in undefined behavior.

(https://en.cppreference.com/w/c/preprocessor/replace)

@michael-schwarz
Copy link
Member

Redefining __LINE__ might work, but it is a bit risky. It may, e.g. (as Sarah pointed out to me yesterday), lead to miscompilation for Macros that contain labels and use __LINE__ to make sure the label in each expansion of the macro has a unique name.
With these really huge codebases, such things do happen and while this will hopefully cause CIL to just reject the file, I'm a bit worried CIL might just continue with a warning, which will make us unsound. (Not even talking about the obvious unsoundness if you have sth like int i = __LINE__;

We were thinking about some less brutal options, such as silently patching constant int arguments to functions that do not have a body, and for which we do not use the special transfer function (we should be able to decide this ahead of our analysis based on our list of library functions). This would of course not cover all instances, but at least the one we were seeing in the benchmark, and maybe that is good enough for now?

@sim642
Copy link
Member

sim642 commented Mar 9, 2022

We were thinking about some less brutal options, such as silently patching constant int arguments to functions that do not have a body, and for which we do not use the special transfer function (we should be able to decide this ahead of our analysis based on our list of library functions). This would of course not cover all instances, but at least the one we were seeing in the benchmark, and maybe that is good enough for now?

I don't really understand how this is related because __LINE__ gets replaced with integer constants by the preprocessor, before CIL and let alone Goblint sees anything. At that point those constants are indistinguishable from literal constants in the original source.

In zstd's case, it might be easiest to just disable DEBUG, so these __LINE__ constants should get totally preprocessed away: https://github.com/facebook/zstd/blob/698f261b35eb256921d95a36ce5af0c29e4df45e/zlibWrapper/examples/zwrapbench.c#L88-L102.

@michael-schwarz
Copy link
Member

michael-schwarz commented Mar 9, 2022

At that point those constants are indistinguishable from literal constants in the original source.

Yes, but for functions for which we do not have a body (and which are not special, i.e.malloc and friends), we don't do anything with constant int arguments, so we can just silently patch it (to the new value) and treat the code as unchanged, right?

@sim642
Copy link
Member

sim642 commented Mar 9, 2022

Kind of off-topic, but the code in facebook/zstd@a8adfa7 looks suspiciously weird:

    if (fileSize == UTIL_FILESIZE_UNKNOWN) 
    	EXM_THROW(32, "This file format is not supported : Dictionary file %s\n", fileName);
    {
        // ...
    }

That block isn't even attached to the if. Maybe it's intentional, but that's Apple's SSL exploit-level misleading style for sure.

@sim642
Copy link
Member

sim642 commented Mar 9, 2022

Yes, but for functions for which we do not have a body (and which are not special, i.e.malloc and friends), we don't do anything with constant int arguments, so we can just silently patch it (to the new value) and treat the code as unchanged, right?

So if I'm getting this right, what you have in mind is very specific to zstd's case, where __LINE__ is used directly as the argument of fprintf, which no analysis currently handles?

I'd rather just preprocess to disable those incrementally-fragile macros of a particular project (which has zero unsoundness risk!) than bake lots of assumptions into Goblint:

  1. __LINE__ is used as a direct argument of an undefined function (like fprintf).
  2. No Goblint analysis (including in the future) cares about the integer constants of special call arguments. Besides malloc and friends, it's not unimaginable that one day we might have a better out of bounds analysis, which covers a few standard memory management functions or things like snprintf that also rely on integer constants in their arguments.
  3. The function call is direct, not via a pointer, which we cannot syntactically detect.
  4. The project doesn't use a custom defined output function with a line argument. Any wrapper function would completely defeat this hack.
  5. Output based on __LINE__ isn't constructed via the preprocessor itself. For example (classical preprocessing technique):
    #define STR_HELPER(x) #x
    #define STR(x) STR_HELPER(x)
    char* i = "line:" STR(__LINE__); // string literal "line:3"

@michael-schwarz
Copy link
Member

I mean when one describes it like that, I think it sounds worse then it is 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Analysis time, memory usage preprocessing C preprocessing
Projects
None yet
Development

No branches or pull requests

3 participants