Add support for Yul AST nodes #163

d1ll0n · 2022-11-06T09:44:20Z

Description

I've made my own kind of janky mixins to force solc-typed-ast to work with Yul in my projects, but I'd like to directly integrate it in the package so that it works smoothly with all the tooling you've made for solidity. I'm assuming the current lack of support for Yul is due to time and priorities rather than intentionally being excluded, but please let me know if I'm wrong.

I have a good idea of what all I need to update, but there are a few questions with regard to your preferences and general repository understanding. I'd love any additional feedback on this or requirements you would have to get these changes merged into the repo.

For clarity, I imagine adding support for Yul isn't a top priority for you, so I'm not requesting any time from your team to integrate these features beyond any input you'd be willing to give and then a code review when it's ready.

Todo

Create classes for every Yul AST node type.
Add processor files for every Yul AST type to src/ast/modern
Add make and copy support to ASTNodeFactory
Update scope resolution methods to recognize YulBlock, YulForLoop, YulFunctionDefinition as scopes and recognize YulVariableDeclaration and YulFunctionDefinition as definitions.
Modify the Yul AST writers to use the new classes and produce structured SrcDesc outputs with documentation like the other writers, rather than strings.
Add pretty printers in src/types

Questions

I have a couple specific things I'd like additional input on:

1. Missing requirements
Am I missing anything in the todo list above re: critical areas that will need to be updated?

2. External reference expressions
solc does not produce any specific details about the expression used to reference external identifiers, e.g. if you set a calldata value's offset with x.offset := 10 the AST node will simply be a YulIdentifier with { name: "x.offset" }, whereas a solidity AST node would wrap that with a MemberAccess. Any thoughts for how to handle this? I am planning to add a referencedDeclaration field to YulIdentifier, but this should ideally always point to an actual declaration. Maybe there could be another property externalReferenceExpression with a full AST node for the expression used?

I also have some questions regarding your (Consensys team's) preferences:

3. Documentation
I don't believe solc recognizes natspec/documentation for Yul nodes. Are you generally opposed to extending the AST node classes beyond the original types, or would it be alright to add documentation support to these nodes? It looks like there is already a post-processor to pull additional documentation beyond what solc produces, so I assume this is fine.

4. Referenced declarations
solc does not record any information about referenced declarations for Yul identifiers. I was planning to integrate an identifier resolution step as a post-processor to populate that information in the nodes. Do you have any issue with this? I ask because it'd add a fair number of additional steps to generate an AST beyond what most of the processors do.

5. Older versions of solc
Since older versions of solc simply produce strings for yul blocks, how should this be handled? I'd assume you'd want to leave those as is and only process into objects for modern AST outputs.

6. Directory structure
I was thinking I'd add a new yul directory under src/ast/implementation with statement and expression subdirectories. Any preferences?

Context

Yul AST nodes currently have minimal support in the library, the data is just copied from solc's output into the parent InlineAssembly object and cast to a very vague YulNode type. Because they are supported by the writer, one can still manipulate yul nodes manually and translate back to solidity code, but because they are not instantiated with their own classes the way other nodes are, none of the tools in solc-typed-ast can be applied to them without significant work-arounds.

The text was updated successfully, but these errors were encountered:

d1ll0n · 2022-11-08T01:55:40Z

I started working on this here: https://github.com/d1ll0n/solc-typed-ast/tree/yul-ast

blitz-1306 · 2022-11-09T17:42:29Z

Hello @d1ll0n.

First of all, thanks for the interest in the project and your time investments. Also, sorry for late response - due to business, I'm not able to invest monitor incoming messages from all of the projects. I would try to do this more regularly.

Now, to your questions:

Missing requirements

We originally implemented Yul support for writing AST back to source. There were no additial requirements aside of this, and we also did not invested much time due to lack of functionality requests or ideas on Yul processing.

The root in Solidity AST is InlineAssembly node where we have two cases:

Legacy Solidity returns Yul sections as source string (not sure about particular compiler version).
Modern Solidity returns Yul AST.

So beware that AST support would be limited to certian compiler version... Otherwise it would require to write a parser to also handle strings with Yul content.

The only other Yul-related component is located in ast-to-source writer that also would need to be updated.

UPD:

I think that we should skip scope resolution changes for AST due to it would impact BC for consumers (mainly, https://github.com/ConsenSys/scribble). The option here is to introduce a standalone function for scope resolution for YulNode subtree.

External reference expressions

I'm have no ideas also. I tried to do the same thing some time ago, but there was something that prevented me to do this. AST node contains some references to identifiers, that are used in assembly (externalReferences). I wanted to introduce a list all of declarations, that are referenced in InlineAssembly due to it may be beneficial for analysis software to have ability to ensure if some variables are modified via assembly. I dropped to do this, but do not recall why. Probably that I didn't have much time to finalize my work and were unsure how we can use it.

Documentation

I do not mind to pull documentation, but beware that solc-typed-ast does not always have an access to the source. We pull documentation if source is accessible. Also our current approach is not perfect. Also a would suggest to go with iterations and do not try to introduce all-at-once.

Another this that I'm think of, is that the YulNode is not limited to be a descendant of an ASTNode. They may have different properties and maybe root methods. Basically, you are free on conceptual part. What we can do is to prepare a prototipe, do a couple of reviews and figure out how do we merge codebase to have consistent interface.

Referenced declarations

This was a particular reason why we picked compiler AST over ANTLR some time ago. Compiler AST already produces referencedDeclaration property for common nodes... However it also sometimes was error-prone so we do some fixes in post-processing.

In Yul this is a compilcated thing... Mostly because it would require to model how compiler does this. We sometimes look under-the-hood to figure out what compiler does in certain cases. But here I guess it would need to invest time into test harnesses to be sure that AST matches the compiler rules.

Older versions of solc

+1 for not touch legacy compilers yet. Also related to what I tried to explain in answer to question 1. Investing time into a parser for legacy code without a reason seem to be an overkill (at least for now).

Directory structure

We can start with this and reconsider if there would be a reason. For now it seem to be fine decision.

Other options that I can think of is to rename implementation to solidity and place yul at the same level (ast/solidity and ast/yul).

Regards. Let me know what you think. Thanks.

blitz-1306 · 2022-11-09T17:55:52Z

Also, I guess @cd1m0 could also review your points and share an opinion. We are trying carefully test changes to not break dependent packages much.

d1ll0n · 2022-11-10T07:31:30Z

@blitz-1306 Thanks a lot for getting back to me on this! Couple of follow-ups if you don't mind:

I think that we should skip scope resolution changes for AST due to it would impact BC for consumers

Sorry, what does BC stand for?

I think it makes sense to separate scope resolution for Yul nodes, particularly since you need to handle lookups slightly differently.

Would you still be fine with running this inside of a post-processor for all modern Yul AST nodes? As long as it works properly I don't think it'd negatively impact any dependent packages.

Another this that I'm think of, is that the YulNode is not limited to be a descendant of an ASTNode.

At the moment I think it'll be easiest to have it as a descendant to ensure all the other tools continue to work, e.g. it's useful to be able to do a child search from a normal ASTNode that can yield a YulASTNode and so on. It is an interesting idea though and maybe we'll run into something that makes keeping the ASTNode base prohibitive.

In Yul this is a complicated thing... Mostly because it would require to model how compiler does this.

Ha, indeed. I've been reading the yul package in solc and going through changelogs / testing stuff in remix, and there are lots of little undocumented oddities. One example is that in 0.6 yul identifiers could have a period, but the prefix couldn't shadow existing declarations unless they were functions.

Aside from things that cause compiler errors though, the basic scope resolution process isn't too bad. Should have something ready next week.

But here I guess it would need to invest time into test harnesses to be sure that AST matches the compiler rules.

I was thinking about this today, and I think the only way to do this properly is with code evaluation. If I add support for yul AST to the constant evaluation methods, we could test the scope resolution for Yul nodes by comparing the evaluation output to the output from @ethereumjs/vm when actually running the code.

blitz-1306 · 2022-11-10T18:10:20Z

Hello @d1ll0n.

Sorry, what does BC stand for?

Backward compatibility. If we change something, even if we maintain API, some of these changes may break or impact dependent packages. In this case we should be careful, if we change lookupInScope() or related functions that may affect resolveAny() (it is used in other packages).

I think it makes sense to separate scope resolution for Yul nodes, particularly since you need to handle lookups slightly differently.

And here I'm +1 with idea to develop other function for Yul scope resolution.

Would you still be fine with running this inside of a post-processor for all modern Yul AST nodes? As long as it works properly. I don't think it'd negatively impact any dependent packages.

Post-processing idea is fine if you want to compute only initial referenced declarations. If you want different, that would also work during tree modifications - you might need something more powerful (something like lookupInScope() for Yul). But if you want only initial data - postprocessor seem to be able handle this. And yeah, it seems that it will not impact dependencies.

At the moment I think it'll be easiest to have it as a descendant to ensure all the other tools continue to work, e.g. it's useful to be able to do a child search from a normal ASTNode that can yield a YulASTNode and so on.

Beware that other tools may not expect to get new nodes. But lets see what it may lead into.

I was thinking about this today, and I think the only way to do this properly is with code evaluation.

We have some VM-related tests in Scribble. I suggest to try to come up with a sample source, that would serve as a proof that references are fine and code behaves as expected.

Still, the implementation have higher priority now. We can keep testing ideas in mind for now, then discuss when time would be more appropriate.

Regards.

d1ll0n · 2022-11-11T03:11:55Z

Backward compatibility. If we change something, even if we maintain API, some of these changes may break or impact dependent packages. In this case we should be careful, if we change lookupInScope() or related functions that may affect resolveAny() (it is used in other packages).

Totally understand. I'm going to continue to try to modify as little behavior as I can with the existing AST nodes so that the only change is to add features rather than introduce any new issues in existing ones. I'm making rapid progress on the basic functionality needed for these changes, once I get a little further along I will focus more on unit testing. So far all of the tests pass other than the formatting output for copy, some notes on that here 18e4ec1

Post-processing idea is fine if you want to compute only initial referenced declarations

Ah so to clarify on this point, I just meant run it in the post-processor to populate the initial vReferencedDeclaration for every identifier. My plan was to make a resolveAny / lookupInScope equivalent for Yul, run it in the post-processor so you know what everything points to, and also expose that for manual tree modifications.

We have some VM-related tests in Scribble. I suggest to try to come up with a sample source, that would serve as a proof that references are fine and code behaves as expected.

Perfect, thanks!

I made a pr at #166 to track the progress on this. One thing I'd love to get feedback on is the way I am currently handling Yul node ids e7bc8ca. Some of the copy tests check for hard-coded ids in the nodes after copying them, so if we generate ids sequentially from the highest one in the original AST, we'd need to modify the tests.

At the moment, I am just sidestepping this by offsetting all yul ids by 1e5, but I wrote a utility a8167f3 which allows the ASTReader to locate the highest node id when it first reads in an AST from the compiler, then use that as the first ID for all new nodes. This ensures there are no conflicts with any of the AST nodes in the original sources imported, however it would prevent importing fragments of AST data sequentially since IDs would already be taken.

Given that the ASTContext is separate for every compilation, I think this should be ok, but I am not familiar with how other projects are using this. Can you think of any reason that it would be important for backwards compatibility (or generally) to be able to read in additional compiler outputs after the fact? They would have to already have ids consistent with the previous ones, so I would assume there is never going to be a situation where someone would do reader.read(sources_fragment0), reader.read(sources_fragment1). They all have to have been produced at the same time or they will be incompatible with each other.

Aside from doing multiple imports, which seems unlikely, and rigid tests that could be modified, is there any reason that the specific values of AST nodes' ids (other than the ones initially loaded from the compiler output) would be relevant?

blitz-1306 · 2022-11-21T05:25:56Z

@d1ll0n Hey, Dillon.

Sorry it took me a week to take a look into what you submitted. I left one very raw idea here.

I would try to separate ID spaces, due to compiler is free to do whatever it wants and bring complete havok in terms of ID generation. We have two languages, so closest thing I think we can do is to try to reflect / replicate the language ID space difference in the logic. Still, I'm unsure if consequences would be applicable practially. It worth a reasearch...

blitz-1306 added the enhancement New feature or request label Nov 9, 2022

d1ll0n mentioned this issue Nov 11, 2022

(WIP) Add support for Yul AST nodes (addresses #163) #166

Open

12 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for Yul AST nodes #163

Add support for Yul AST nodes #163

d1ll0n commented Nov 6, 2022 •

edited

Loading

d1ll0n commented Nov 8, 2022

blitz-1306 commented Nov 9, 2022 •

edited

Loading

blitz-1306 commented Nov 9, 2022

d1ll0n commented Nov 10, 2022 •

edited

Loading

blitz-1306 commented Nov 10, 2022 •

edited

Loading

d1ll0n commented Nov 11, 2022 •

edited

Loading

blitz-1306 commented Nov 21, 2022 •

edited

Loading

Add support for Yul AST nodes #163

Add support for Yul AST nodes #163

Comments

d1ll0n commented Nov 6, 2022 • edited Loading

Description

Todo

Questions

Context

d1ll0n commented Nov 8, 2022

blitz-1306 commented Nov 9, 2022 • edited Loading

blitz-1306 commented Nov 9, 2022

d1ll0n commented Nov 10, 2022 • edited Loading

blitz-1306 commented Nov 10, 2022 • edited Loading

d1ll0n commented Nov 11, 2022 • edited Loading

blitz-1306 commented Nov 21, 2022 • edited Loading

d1ll0n commented Nov 6, 2022 •

edited

Loading

blitz-1306 commented Nov 9, 2022 •

edited

Loading

d1ll0n commented Nov 10, 2022 •

edited

Loading

blitz-1306 commented Nov 10, 2022 •

edited

Loading

d1ll0n commented Nov 11, 2022 •

edited

Loading

blitz-1306 commented Nov 21, 2022 •

edited

Loading