Improve the CFG reconstruction #520

Merged
merged 2 commits into from
Jan 13, 2025

@sonmarcho sonmarcho commented Jan 10, 2025

This PR drastically improves the way we reconstruct the CFG.

The new algorithm

What this PR changes is the way we find the best candidates for what I call the "switch exits", which are the blocks where the control-flow should rejoin after a switch, and should thus be placed just after the switch when reconstructing the CFG. For instance, in the snippet of code below, the switch exit is s2:

if b { s0 } else { s1 }
s2; // this block is the switch exit

The "best" switch exit is the block where the maximum number of paths rejoin; if several blocks tie, we pick the one which comes earliest in the topological order. For instance, consider the following code:

switch x {
  0 => (),   // B
  1 => (),   // C
  2 => {     // D
    switch y {
      0 => (),     // E
      1 => (),     // F
      2 => return, // G
    }
  }
}
...; // H0
...; // H1

The CFG is as follows:

A
|
-------------------
|        |        |
B        C        D
|        |        |
|        |        |------------------
|        |        |        |        |
|        |        E        F        G
|        |        |        |
|        |        ----------
|        |        |
------------------|
|
H0
|
H1

Note that the node where the maximum number of paths rejoin is H0.

The way I compute this is as follows.

We consider the CFG from which we removed the backward edges (of the loops): this graph is acyclic. Imagine you take a volume of water equal to 1 and pour it in at the block where the switch is (e.g., A in the example above), which is the highest point, and this water flows down by following the edges of the graph. Whenever there is a branching, the flow of water is divided equally between the different paths. Whenever two paths join, the resulting flow is the sum of the flows coming from the different paths. On the graph above, it gives the following (I annotated the nodes with the flow of water, which is a fraction):

A:1
|
-------------------
|        |        |
B:1/3    C:1/3    D:1/3
|        |        |
|        |        |------------------
|        |        |        |        |
|        |        E:1/9    F:1/9    G:1/9
|        |        |        |
|        |        ----------
|        |        | 1/9+1/9=2/9
------------------|
|
H0:1/3+1/3+2/9=8/9
|
H1:8/9

For our switch exit we then simply pick the node with the highest flow (which is 8/9 here) and which comes earliest in the topological order (that is: H0).
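The computation described above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not the code from this PR: the names `Frac`, `compute_flows`, `best_exit` and `example_cfg` are hypothetical, the graph is given as successor lists with back-edges already removed, and node indices are assumed to already form a topological order (every edge goes from a lower index to a higher one). Flows are kept as exact fractions so that ties such as H0 and H1 compare equal.

```rust
/// Exact fraction (numerator, denominator), kept in lowest terms.
/// Hypothetical helper type for this sketch.
type Frac = (u64, u64);

fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 { a } else { gcd(b, a % b) }
}

/// Sum of two fractions, reduced to lowest terms.
fn add(a: Frac, b: Frac) -> Frac {
    let (n, d) = (a.0 * b.1 + b.0 * a.1, a.1 * b.1);
    let g = gcd(n, d);
    (n / g, d / g)
}

/// Fraction divided by a positive integer, reduced to lowest terms.
fn div(a: Frac, k: u64) -> Frac {
    let (n, d) = (a.0, a.1 * k);
    let g = gcd(n, d);
    (n / g, d / g)
}

/// Returns true if `a` is strictly greater than `b`.
fn greater(a: Frac, b: Frac) -> bool {
    a.0 * b.1 > b.0 * a.1
}

/// Pours a flow of 1 into `start` and propagates it downwards.
/// Assumption: `succs[i]` lists the successors of node `i` in the acyclic
/// graph, and indices are a topological order.
fn compute_flows(succs: &[Vec<usize>], start: usize) -> Vec<Frac> {
    let mut flow: Vec<Frac> = vec![(0, 1); succs.len()];
    flow[start] = (1, 1);
    for node in start..succs.len() {
        if flow[node].0 == 0 || succs[node].is_empty() {
            continue;
        }
        // A branching splits the incoming flow equally between its successors.
        let share = div(flow[node], succs[node].len() as u64);
        for &s in &succs[node] {
            // A join sums the flows coming from its predecessors.
            flow[s] = add(flow[s], share);
        }
    }
    flow
}

/// Picks the node (other than `start`) with the highest flow; the strict
/// comparison keeps the earliest node in topological order on ties.
fn best_exit(succs: &[Vec<usize>], start: usize) -> Option<usize> {
    let flow = compute_flows(succs, start);
    let mut best: Option<usize> = None;
    for node in (start + 1)..succs.len() {
        if flow[node].0 == 0 {
            continue; // not reachable from the switch
        }
        match best {
            Some(b) if !greater(flow[node], flow[b]) => {}
            _ => best = Some(node),
        }
    }
    best
}

/// The example CFG: A=0, B=1, C=2, D=3, E=4, F=5, G=6, H0=7, H1=8.
fn example_cfg() -> Vec<Vec<usize>> {
    vec![
        vec![1, 2, 3], // A -> B, C, D
        vec![7],       // B -> H0
        vec![7],       // C -> H0
        vec![4, 5, 6], // D -> E, F, G
        vec![7],       // E -> H0
        vec![7],       // F -> H0
        vec![],        // G returns
        vec![8],       // H0 -> H1
        vec![],        // H1
    ]
}
```

On the example graph, `compute_flows(&example_cfg(), 0)` gives 8/9 for both H0 and H1, and `best_exit(&example_cfg(), 0)` returns the index of H0. Excluding the switch block itself from the candidates is a choice made for this sketch.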

Side remark

I'm pretty sure the algorithm which computes the quantities above has a name, but I haven't managed to find it. For instance, this post on StackOverflow is looking for exactly the same thing:
https://stackoverflow.com/questions/78221666/algorithm-for-total-flow-through-weighted-directed-acyclic-graph

@sonmarcho sonmarcho requested a review from Nadrieril January 10, 2025 22:13
@sonmarcho

@protz I believe this solves #507 - could you confirm?
Note that I added the minimized examples as tests, but I don't know how to check that the ML-DSA example is properly reconstructed.

@sonmarcho sonmarcho merged commit fd47c64 into main Jan 13, 2025
5 checks passed
@sonmarcho sonmarcho deleted the son/cfg1 branch January 13, 2025 13:24