Thanks for writing this. I had hoped to use it on some older Java code obfuscated with Soot's JBCO. Alas, I'm getting an NPE at 258 due to a null Frame. I'm guessing the code there is trying to do something that only makes sense for JDK7+?
I've now looked at the class file spec and I see you have to reconstruct the valid range (in terms of bytecode addresses) for the variables, as well as infer their types. I guess if this info is wrong, a debugger trying to use it will throw an exception when it tries to get the value of some local var.
I'm still not sure how to handle the unreachable (basic) blocks though, because they can sit in between reachable blocks and be skipped by a jump. I should probably take a look at how Soot does this sort of thing, as I see it also does local variable (type) reconstruction (in a different module than the one that does obfuscation, of course).
Looking at https://shipilev.net/jvm/anatomy-quarks/8-local-var-reachability/ it seems to me that LVT info can safely/simply extend every local variable to the end of the method (because javac -g actually emits that), even if a use analysis would indicate less extent. So perhaps the code in question here can be simplified to just record the first write bytecode and just 'spam' end of method bytecode for end on all vars.
Also, looking at the output of some obfuscators (JBCO) some more, they definitely move the first write to some locals into a block at the end of the method, which then gotos back to the beginning (see the GIA pass in JBCO: "The final method now looks like { goto P1, P2, P1, goto P2}"). So any algorithm that only does forward extension of LV scope will produce incorrect info in that case. Thus probably the safest option (for obfuscated code) would be to spam whole-method scope in the LVT for every LV, as this doesn't require (complicated) control-flow analysis, which it appears to me latte is not doing anyway...
Duh, I see that it's valid in the JVM spec for a LV to have different (incompatible) types at various points in the same method, as long as there are no reads after a merge that produces an undef/top type. So one can have multiple LVT entries for such variables, each pointing to a different type for a different bytecode address range. That surely complicates matters and makes the problem impossible to solve without reaching-definitions analysis. (Apparently ASM itself cannot give you that, but https://github.com/saeg/asm-defuse can.)
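For what it's worth, the "spam the whole method" idea could be sketched roughly as below, in plain Java with no ASM dependency. `LvtEntry`, `WholeMethodLvt`, and the placeholder names `var0`, `var1`, ... are all hypothetical and not part of latte or ASM. The sketch also assumes a single descriptor per slot is already known, which is the hard part and may not even hold for every method.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for one LocalVariableTable entry (field meanings follow the class file spec). */
final class LvtEntry {
    final int startPc;       // first bytecode offset at which the variable is in scope
    final int length;        // scope covers offsets [startPc, startPc + length)
    final String name;       // placeholder name, since obfuscated code has none
    final String descriptor; // field descriptor, e.g. "I" or "Ljava/lang/String;"
    final int index;         // slot in the local variable array

    LvtEntry(int startPc, int length, String name, String descriptor, int index) {
        this.startPc = startPc;
        this.length = length;
        this.name = name;
        this.descriptor = descriptor;
        this.index = index;
    }
}

final class WholeMethodLvt {
    /**
     * Builds one entry per used slot, each spanning the whole method body.
     * slotDescriptors[i] is the assumed descriptor of slot i, or null when the
     * slot is unused or is the second half of a long/double.
     */
    static List<LvtEntry> build(int codeLength, String[] slotDescriptors) {
        List<LvtEntry> entries = new ArrayList<>();
        for (int slot = 0; slot < slotDescriptors.length; slot++) {
            String descriptor = slotDescriptors[slot];
            if (descriptor == null) {
                continue; // nothing to describe for this slot
            }
            // No control-flow analysis: scope starts at offset 0 and covers all code.
            entries.add(new LvtEntry(0, codeLength, "var" + slot, descriptor, slot));
        }
        return entries;
    }
}
```

Calling `WholeMethodLvt.build(42, new String[] {"I", "J", null, "Ljava/lang/String;"})` would give three entries, all with start 0 and length 42. Because no ranges are computed, block reordering of the JBCO kind cannot make the ranges wrong, only overly wide.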
But the Java class spec says:
There may be no more than one LocalVariableTable attribute per local variable in the attributes table of a Code attribute.
So something seems a bit broken, conceptually, in Java land. Basically, bytecode verification-wise:
To merge two local variable array states, corresponding pairs of local variables are compared. The value of the merged local variable is computed using the rules above, except that the corresponding values are permitted to be different primitive types. In that case, the verifier records that the merged local variable contains an unusable value.
However, this doesn't cause verification to fail (unlike the similar situation with stacks), at least in the spec. But that state of affairs seems unrepresentable in the LVT.
And I see at the end of the latte App that it relies entirely on AbstractInsnNode.FRAME, i.e. StackMap frames, which are a class version >= 50 feature (only), so the latte app will do nothing for older Java (version <= 49) class files...
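To make the merge behaviour concrete, here is a deliberately simplified sketch of merging two local variable array states. `LocalsMerge` and the `TOP` marker are hypothetical names. The sketch also treats any mismatch as unusable, whereas a real verifier would look for a common superclass when both values are references, so take it as an illustration of the quoted rule only.

```java
final class LocalsMerge {
    /** Marker for a slot whose merged value is unusable (undef/top). */
    static final String TOP = "TOP";

    /**
     * Merges two local variable array states slot by slot.
     * Types are given as descriptors; TOP may already appear in either input.
     * Simplification: mismatched reference types also become TOP here.
     */
    static String[] merge(String[] a, String[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("states must have the same number of locals");
        }
        String[] merged = new String[a.length];
        for (int slot = 0; slot < a.length; slot++) {
            // Same type on both paths stays usable; anything else is recorded
            // as unusable. The merge itself never fails, only a later read would.
            merged[slot] = a[slot].equals(b[slot]) ? a[slot] : TOP;
        }
        return merged;
    }
}
```

For example, merging `{"I", "F"}` with `{"I", "I"}` leaves slot 0 as `I` and marks slot 1 as `TOP`. A single LVT entry for slot 1 spanning both predecessors and the merge point would then have no one descriptor that is right everywhere.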
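A tool could at least detect this case up front by reading the version from the class file header before relying on frames. A minimal sketch, assuming the raw class bytes are at hand; `ClassVersion` and its method names are hypothetical and not part of latte or ASM:

```java
import java.nio.ByteBuffer;

final class ClassVersion {
    /** Reads major_version from the class file header: u4 magic, u2 minor, u2 major. */
    static int majorVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default, like class files
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file (bad magic)");
        }
        buf.getShort();                 // skip minor_version
        return buf.getShort() & 0xFFFF; // major_version as an unsigned 16-bit value
    }

    /** StackMap frames can only be present from class version 50 (Java 6) onwards. */
    static boolean mayHaveStackMapFrames(byte[] classBytes) {
        return majorVersion(classBytes) >= 50;
    }
}
```

For version <= 49 input the tool could then report that it has nothing to work with (or fall back to its own analysis) instead of silently doing nothing or hitting a null Frame.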