(opt): Multiple inline cache slots per call-site #31

Open
wants to merge 1 commit into
base: master

Conversation

Hirevo
Owner

@Hirevo Hirevo commented Jan 27, 2023

This PR follows the work done in #13 to improve the performance of message sends using inline caches for method lookups.

The current implementation uses a single never-evicted cache slot per call-site for the inline caches, which was a notable performance improvement compared to not using any cache at all.

This PR experiments with adding more cache slots per call-site (currently: 3 slots per call-site), with the goal of further improving method dispatch performance.
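
To make the idea concrete, here is a minimal sketch of a multi-slot inline cache for one call-site. It is not the code from this PR: `ClassId`, `MethodId`, `InlineCache`, and the `slow_lookup` callback are hypothetical names, and the policy of never evicting once all slots are full is an assumption carried over from the single-slot design.

```rust
// Illustrative sketch only. `ClassId`, `MethodId`, and `InlineCache` are
// hypothetical names, not identifiers taken from the interpreter.
type ClassId = usize;
type MethodId = usize;

/// Number of cache slots per call-site (3, as in the experiment).
const SLOTS: usize = 3;

#[derive(Default)]
struct InlineCache {
    /// Each slot maps one receiver class to the method it resolved to.
    slots: [Option<(ClassId, MethodId)>; SLOTS],
    hits: usize,
    misses: usize,
}

impl InlineCache {
    /// Returns the method for `class`, consulting the slots first and
    /// calling `slow_lookup` (the full method lookup) only on a miss.
    fn lookup(
        &mut self,
        class: ClassId,
        slow_lookup: impl FnOnce(ClassId) -> MethodId,
    ) -> MethodId {
        // Linear scan over the filled slots.
        for &(cached_class, method) in self.slots.iter().flatten() {
            if cached_class == class {
                self.hits += 1;
                return method;
            }
        }

        self.misses += 1;
        let method = slow_lookup(class);

        // Assumption: slots are never evicted. Once all are filled,
        // further receiver classes always take the slow path.
        if let Some(free) = self.slots.iter_mut().find(|slot| slot.is_none()) {
            *free = Some((class, method));
        }
        method
    }
}
```

At a monomorphic call-site only the first slot is ever filled, so the extra slots add a slightly longer scan on misses without producing any additional hits.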

The initial performance assessment is that, in its current state, the additional slots do not measurably improve performance.

For reference, here are the benchmarking numbers that led to this conclusion:

image

The goal of this PR is to continue exploring how to potentially improve the implementation to make better use of these additional cache slots.

@Hirevo Hirevo added M-interpreter Module: Interpreter P-medium Priority: Medium C-performance Category: Performance improvements labels Jan 27, 2023
@Hirevo Hirevo self-assigned this Jan 27, 2023
@smarr
Contributor

smarr commented Jan 27, 2023

This is because, in our benchmarks, the large majority of calls are monomorphic.

I don't have full data for all the SOM benchmarks, but the ones that are in the AreWeFastYet benchmarks have some data here: https://github.com/smarr/are-we-fast-yet/blob/master/docs/metrics.md#dynamic-metrics

@Hirevo
Owner Author

Hirevo commented Jan 27, 2023

Oh, I see.
I was wondering whether I had done something that slowed down the lookups and cancelled out any performance improvement, but it makes sense that most calls are indeed non-polymorphic.

Thanks for the link, this is very interesting data.

Though I have no idea what is meant by "target polymorphism" in the document.
If I am not mistaken, "observed receiver polymorphism" is when the receiver is observed to have different types at the same call-site, but I am not sure what "observed target polymorphism" would be.
Is it when the receiver type and call-site stay the same but the message sent is observed to change (like when using Object>>#perform: or Method>>#invokeOn:with:)?

@smarr
Contributor

smarr commented Jan 27, 2023

For the long version :) check https://stefan-marr.de/downloads/dls22-kaleba-et-al-analyzing-the-run-time-call-site-behavior-of-ruby-applications.pdf

The short version: target polymorphism is when not just the receiver type differs, but a different method is actually activated. The classic example is a class hierarchy where a method is implemented at the top of the hierarchy, but the call-site always sees different subclasses. The method is the same, but you still see different receivers. So, it's polymorphic over the receivers, but actually monomorphic over the targets that are activated.
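
The distinction can be sketched in a few lines of Rust. This is only an illustration under assumed names: `Class`, `CallSiteProfile`, the `describe` selector, and the `Shape`/`Circle`/`Square` hierarchy are hypothetical and do not come from the linked paper or from any SOM implementation.

```rust
use std::collections::HashSet;

// Illustrative sketch only; all names here are hypothetical.
struct Class {
    /// Index of the superclass in the class table, if any.
    parent: Option<usize>,
    /// Selectors defined directly on this class.
    methods: HashSet<&'static str>,
}

/// Walks up the hierarchy and returns the index of the class that
/// actually defines `selector` (the "target" of the send).
fn lookup(classes: &[Class], mut class: usize, selector: &str) -> Option<usize> {
    loop {
        if classes[class].methods.contains(selector) {
            return Some(class);
        }
        class = classes[class].parent?;
    }
}

/// Per-call-site profile: distinct receiver classes seen and distinct
/// target methods activated.
#[derive(Default)]
struct CallSiteProfile {
    receivers: HashSet<usize>,
    targets: HashSet<usize>,
}

impl CallSiteProfile {
    fn record(&mut self, classes: &[Class], receiver: usize, selector: &str) {
        if let Some(target) = lookup(classes, receiver, selector) {
            self.receivers.insert(receiver);
            self.targets.insert(target);
        }
    }
}

/// 0 = Shape (defines `describe`), 1 = Circle, 2 = Square (both inherit it).
fn example_hierarchy() -> Vec<Class> {
    vec![
        Class { parent: None, methods: HashSet::from(["describe"]) },
        Class { parent: Some(0), methods: HashSet::new() },
        Class { parent: Some(0), methods: HashSet::new() },
    ]
}
```

Recording sends of `describe` to both `Circle` and `Square` at one call-site yields two distinct receivers but a single target, since both resolve to the method defined on `Shape`: polymorphic over receivers, monomorphic over targets.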

2 participants