Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binding rules to support contract inheritance #1075

Merged
merged 40 commits into from
Sep 13, 2024

Conversation

ggiraldez
Copy link
Contributor

@ggiraldez ggiraldez commented Aug 12, 2024

This PR adds binding rules to resolve contract inheritance and virtual method lookups (both direct and via super). It does that by introducing a generalized strategy to deal with references that resolve to multiple definitions. It ranks the definitions according to selector tags found in the definitions. Currently implements two ranking algorithms:

  • A simple one, for aliasing definitions such as import aliases for symbols. In this case, the aliased definition is marked down, prioritizing the actual definitions the aliases point to.
  • One based on C3 linearisation, that can resolve calls to virtual methods in a hierarchy of contracts (either directly or via super).

For the second algorithm, a substantial amount of extra infrastructure is added to the stack graph, in the form of extra attributes that the bindings system tracks. The newly introduced graph attributes are:

  • tag to mark definitions and references with a tag to change the behaviour of the ranking algorithm.
  • parents which is applicable to both definitions and references, contain an ordered list of definitions and references to construct a hierarchy. This is relevant in contract definitions, to enumerate the base contracts through their references in the inheritance list, and in function call expressions, to indicate from which contract they are being done.
  • export_node applies to contract definitions and provides a connection point for parent contracts to access virtual members defined in derived contracts.
  • imports_node also applies to contract definitions and provide the analogous connection point used for base contracts.

These last two attributes are used in conjunction with a new property of the bindings module context which can be used to indicate the resolver what is the current binding context (ie. what contract is being compiled). With virtual methods, the actual definition of a reference can only be correctly resolved knowing the full inheritance hierarchy of the contract being compiled. When setting the context, the bindings module with patch its internal stack graph and create new links between the context contract and all the parents in its hierarchy. These are "reverse" links, in the sense that allow a reference in a base contract to resolve to a definition in a sub-contract down the hierarchy.

All this machinery (parent links, context and import/export nodes) provide the necessary information for a reference to resolve to all possible definitions in the compilation contract hierarchy. Then, the C3 linearisation ranking algorithm helps decide which one to choose.

In particular, for Solidity:

  • Contract definitions are linked to their bases via the parents attribute.
  • All contract functions definitions are linked to the contract via the parents attribute, and are also tagged with the value c3.
  • Function call references that occur in contract functions are also linked to the enclosing contract via the parents attribute.
  • Super call references are tagged with the value super.
  • All contracts have export_node set to its @contract.members scope node.
  • All contracts have imports_node set to both its @contract.lexical_scope and a super_import (which is similar to @contract.super_scope) scope nodes.
  • Import alias definitions are tagged with the value alias.

Issues remaining:

  • Generalize the detection of super calls. Currently that is done crudely by inspecting the stack graph path from the reference to the definition and searching for a push node with the symbol super.
  • Decide what to return when there are multiple definitions and the ranking algorithm cannot disambiguate them. Currently we panic and crash the program, which should be fine for well-formed Solidity programs. Probably makes sense to return a Result<Definition, ResolutionError> instead of an Option<Definition>.

Copy link

changeset-bot bot commented Aug 12, 2024

⚠️ No Changeset found

Latest commit: caf4bc0

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@ggiraldez ggiraldez mentioned this pull request Aug 15, 2024
33 tasks
@ggiraldez ggiraldez force-pushed the inheritance branch 3 times, most recently from 4eff986 to 456570b Compare August 23, 2024 21:37
@AntonyBlakey
Copy link
Contributor

AntonyBlakey commented Aug 28, 2024

I think this is totally correct. The issue is, with multiple superclasses with multiple impls of foo(), does it pick the right one according to C3, which is merely the first one found in C3 traversal order.

@ggiraldez
Copy link
Contributor Author

I think this is totally correct. The issue is, with multiple superclasses with multiple impls of foo(), does it pick the right one according to C3, which is merely the first one found in C3 traversal order.

I don't think it works with C3 linearisation in the general case, because of the way the graph is constructed. It will prioritize right-most declared base and then go depth-first.

@ggiraldez
Copy link
Contributor Author

ggiraldez commented Aug 28, 2024

I think this is totally correct. The issue is, with multiple superclasses with multiple impls of foo(), does it pick the right one according to C3, which is merely the first one found in C3 traversal order.

I don't think it works with C3 linearisation in the general case, because of the way the graph is constructed. It will prioritize right-most declared base and then go depth-first.

Here's an example that illustrates this problem:

contract A {
    function foo() public virtual {
        //   ^def:1
    }
}

contract B is A {
    function foo() public virtual override {
        //   ^def:2
    }
}

contract C is A {}

contract D is B, C {
    function foo() public virtual override {
        super.foo();
        //    ^ref:2  !!! This FAILS because it incorrectly resolves to ref:1
    }
}

The way the graph is constructed, the super resolution goes through C because it has higher precedence than B, and it keeps going deeper until it reaches A's implementation.

The `parents` attribute can now be applied both to definitions and
references (eg. to provide context for resolving `super` calls) and the
`selector` attribute can take the `alias` and `c3` values for now.

Scoring algorithm for C3 is not implemented yet, but should use the topology
defined via the `parents` attributes to linearise the definitions.
…ethods

We want to reach all available virtual method definitions when setting a
compilation context for a contract, even when the calling site occurs in a base
contract. A call to `foo()` in a contract `Base` should be able to reach the
implementation in a sub-contract `Derived` even though there is not direct link
from `Base` to `Derived`.
`import_nodes` and `export_node` will be used in post-processing to link the
virtual method definitions in a contract definition being compiled with all its
base contracts (recursively). This allows any call to a virtual method anywhere
in the hierarchy to be able to reach all the definitions available for that
method. This in combination with the ranking by linearisation should allow us to
choose the correct implementation when resolving a virtual call.
This also fixes an issue with multi-part files where we were not properly
trimming the contents of each part and it resulted in a leading newline (part of
the separator) being added to all parts.
Setting the bindings context we indicate the bindings resolution that we're
working on a specific definition (eg. we're compiling a specific Contract type
in Solidity). This means that when resolving references, to account for virtual
method lookups, the universe of reachable definitions can now be expanded to
include implementations defined in more derived types in the hierarchy, up to
the set context.
@ggiraldez ggiraldez marked this pull request as ready for review September 11, 2024 22:21
@ggiraldez ggiraldez requested a review from a team as a code owner September 11, 2024 22:21
Copy link
Contributor

@AntonyBlakey AntonyBlakey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except for the panic on invalid input. A specific requirement is that we do best-effort binding on input that has errors, which will likely form the majority of the inputs we see in e.g. an LSP that tracks real-time code edits.

This allows to differentiate the unresolved case from the multiple ambiguous
definitions found. Given that we need to tolerate invalid inputs (common to
intermediate states such as LSPs), we should definitely not panic when there are
ambiguous multiple definitions.
Selector was never a good name to begin with. Tag is a bit too generic, but so
is selector for that matter. And since we're using this attribute to annotate
definitions for filtering, I think it's fitting.
This removes the code that we used to detect if a reference identifier was a
`super` call that inspected the stack graph paths found looking for the symbol.
The implemented solution is simpler and more extensible.
@AntonyBlakey AntonyBlakey added this pull request to the merge queue Sep 13, 2024
Merged via the queue into NomicFoundation:main with commit 2924b53 Sep 13, 2024
1 check passed
crates/metaslang/bindings/src/builder/mod.rs Show resolved Hide resolved
crates/metaslang/bindings/src/lib.rs Show resolved Hide resolved
owner: self.owner,
handle: path.end_node(),
})
pub fn jump_to_definition(&self) -> Result<Definition<'a, KT>, ResolutionError<'a, KT>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit suggestion: using resolve_definition() instead of jump_to_definition(). The latter is an IDE feature name, and only one use case of resolving definitions. Outside IDEs, there isn't an active editor/viewing panel to "jump" in.

Copy link
Contributor Author

@ggiraldez ggiraldez Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I'll rename it. Although, shouldn't it be named resolve_reference() (or just resolve() since it's a Reference method)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the reference is already "resolved". We created/returned a handle to it.
What we are resolving here is the definition it belongs to, which may or may not exist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... we may be overloading the word resolve. With resolve I mean finding the Definition (or Definitions in case there is ambiguity) that a Reference should bind to. I think you also refer to resolve as retrieving a Reference/Definition from a Cursor?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are talking about Reference::jump_to_definition() here, right?
IIUC, user already has a Reference handle to invoke this method, so IMO it is a bit confusing to resolve_reference(), considering that it is already resolved (from the identifier). The thing that needs resolving (and the user doesn't have yet) is the definition...
Similarly, I think find_definition() would work here as well..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I like find_definition(). It seems to be the most unambiguous.

crates/metaslang/bindings/src/lib.rs Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants