Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FFmpeg #18

Open
michael-schwarz opened this issue Feb 21, 2022 · 5 comments
Open

FFmpeg #18

michael-schwarz opened this issue Feb 21, 2022 · 5 comments
Labels
goblint Goblint-specific problem parsing-succeeds project Project to analyze

Comments

@michael-schwarz
Copy link
Member

michael-schwarz commented Feb 21, 2022

FFmpeg (https://github.com/FFmpeg/FFmpeg)

  • ./configure --disable-x86asm --disable-ffplay --disable-ffprobe --disable-inline-asm --disable-doc
  • compiledb make
  • CIL issue: int i, ret, __attribute__((unused))(version), nb_curves; (Support for comma-separated list of declarations with attributes in non-leading position cil#76) and __builtin_nanf("") in global initializers cil#78
  • ./goblint ../../bench-repos/FFmpeg --disable ana.base.context.non-ptr --disable sem.unknown_function.spawn --set ana.thread.domain plain --enable exp.earlyglobs --set ana.base.privatization none --enable dbg.print_dead_code --set cppflags[+] -DGOBLINT_NO_BSEARCH --set cppflags[+] -DGOBLINT_NO_ASSERT --set cppflags[+] -UHAVE_INLINE_ASM --set result json-messages -v
  • Remaining issue seems to be that the wrong main is picked fftools/ffmpeg.c:4825: Warning: def'n of func main at fftools/ffmpeg.c:4825 (sum 10757610) conflicts with the one at doc/print_options.c:121 (sum 2656015); keeping the one at doc/print_options.c:121.
    • Manually remove doc/print_options.c from compilation database
  • Requires Speedup connectedness by doing it on a smaller hashtable analyzer#603 or turning on merging of inlines
@sim642 sim642 added the goblint Goblint-specific problem label Feb 22, 2022
@michael-schwarz
Copy link
Member Author

michael-schwarz commented Feb 22, 2022

A first run on these benchmarks (merging of inlines enabled) is thoroughly disappointing:

  • Goblint runs for approx 3.5h on this benchmark, gobbling up ~140GB of RAM

  • vars: 331270, evals: 498201 (with 17245 contexts for 960 functions. Top 5 functions:)

  • An overwhelming majority of the code is dead once again

    Found dead code on 394320 lines (including 393571 in uncalled functions)!
    Total lines (logical LoC): 410450
    
  • We seem to produce so many warnings that trying to write them out in json produces a stack overflow in List.map

  • Global initializers take a non-trivial amount of time (Initializing 10815 globals. Executing 552721 assigns.)

  • Timings:

    TOTAL                          103.599 s
    parse                          46.593 s
    convert to CIL                 57.006 s
    analysis                        0.000 s
     makeCFG                         1.833 s
     connect                         3.710 s
     global_inits                   3893.714 s
     solving                        7973.467 s
       S.Dom.equal                     0.423 s
       postsolver                     2942.472 s
     warn_global                     0.451 s
       access                          0.413 s
    
  • Race Warnings:

    Summary for all memory locations:
      safe:         3359
      vulnerable:    433
      unsafe:       2091
      -------------------
      total:        5883
    
  • 76 instances of [Warning][Unknown] Unknown function ptr called

@sim642
Copy link
Member

sim642 commented Feb 22, 2022

  • We seem to produce so many warnings that trying to write them out in json produces a stack overflow in List.map

We should use BatList.map whereever we have possibly large lists. That is implemented using constant stack (and without reversal!), unlike Stdlib.List.map, which is just naive.

@michael-schwarz
Copy link
Member Author

michael-schwarz commented Feb 24, 2022

The stackoverflow happens inside

      Yojson.Safe.pretty_to_channel ~std:true out json

(at https://github.com/goblint/analyzer/blob/36a5c5d20e75c06c979a46996c202ed7384ec0ef/src/framework/analyses.ml#L278).
So just using BatList.map would not cut it as the call happens inside a library.
But I guess this a problem for the future once we actually manage to analyze FFmpeg and get meaningful results.

@sim642
Copy link
Member

sim642 commented Feb 24, 2022

The stackoverflow happens inside

      Yojson.Safe.pretty_to_channel ~std:true out json

(at https://github.com/goblint/analyzer/blob/36a5c5d20e75c06c979a46996c202ed7384ec0ef/src/framework/analyses.ml#L278).
So just using BateList.map would not cut it as the call happens inside a library.

Do you have a full stack trace of where inside the library? Might be worth reporting or fixing.

Other than that, maybe it's just specific to pretty_to_channel. Could be that the minimal output of to_channel doesn't suffer from the same issue.

@michael-schwarz
Copy link
Member Author

I ran another run with the fix for non privatization and ctx insens base:

Timings:
TOTAL                          106.970 s
  parse                          47.924 s
  convert to CIL                 59.046 s
  analysis                        0.000 s
    makeCFG                         2.077 s
    connect                         4.217 s
    global_inits                   268.657 s
    solving                        99596.327 s
      S.Dom.equal                     5.553 s
      postsolver                     1454.500 s
    warn_global                     1.611 s
      access                          1.500 s
Timing used
Memory statistics: total=203083937.10MB, max=35359.30MB, minor=203055616.41MB, major=3504974.32MB, promoted=3476653.63MB
    minor collections=96826212  major collections=462 compactions=0

So we land at around 27h and 35GB of RAM here. We once again have the issue with the stackoverflow inside the json-messages output.

Another observation is that global inits is now a lot faster, presumably because of small local states?

Also what is a little disturbing is that despite all of this and the analysis now taking considerably longer, most code is still dead (it dead in fact not change at all):

Found dead code on 393400 lines (including 392979 in uncalled functions)!
Total lines (logical LoC): 410495

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
goblint Goblint-specific problem parsing-succeeds project Project to analyze
Projects
None yet
Development

No branches or pull requests

2 participants