Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

faster sharing-aware cost extraction #10

Merged
merged 2 commits into from
Sep 13, 2023
Merged

Conversation

TrevorHansen
Copy link
Contributor

@TrevorHansen TrevorHansen commented Sep 12, 2023

This:
(a) speeds up the current greedy extraction, and
(b) adds a faster extraction based on the code of @Bastacyclop & @mwillsey

In particular, changing the hashmap library to the rust standard collection reduces the runtime of the greedy extraction from 195 seconds to 47 seconds on my machine using all the test cases in main along with those in the open PRs:
https://github.com/TrevorHansen/extraction-gym/tree/main/data

This PR also adds a new extractor that takes about 9 seconds. It's based on the greedy extractor in #8 combined with the simple extractor in main. This doesn't produce the same results though as the current greedy extractor:

Loaded 422 jsons.
extractors: ['faster-greedy-dag', 'greedy-dag']
data/tensat/nasneta_acyclic.json 16023614866670.953 16023517195541.84
data/tensat/nasrnn.json 1158.6070294405508 930.8220652824966
data/egg/math_simplify_add.json 3 7
Cummulative time for faster-greedy-dag: 8839ms
Cummulative time for greedy-dag: 47155ms
faster-greedy-dag / greedy-dag
geo mean
tree: 0.9970
dag: 1.0002
micros: 0.9750
quantiles
tree:   0.4286, 1.0000, 1.0000, 1.0000, 1.2447
dag:    0.9957, 1.0000, 1.0000, 1.0000, 1.0225
micros: 0.0910, 0.5888, 1.0630, 1.5546, 3.0305

I think you get different answers depending on the order in which you visit the nodes.

I'm happy to clean this up as required.

@mwillsey
Copy link
Member

Oh interesting. I think this is implying that there are a couple heavy hitters in terms of time spent in extraction. The geometric mean reports that this code is about as fast, but in absolute time it's much faster. Any idea what's going on there?

@TrevorHansen
Copy link
Contributor Author

TrevorHansen commented Sep 13, 2023

Oh interesting. I think this is implying that there are a couple heavy hitters in terms of time spent in extraction. The geometric mean reports that this code is about as fast, but in absolute time it's much faster. Any idea what's going on there?

Yes, Max, you're right about that. With the 211 inputs, these 4 problems provide about 95% of the time improvement:

file faster greedy dag extractor (ms) old extractor with std hashmap (ms)
data/tensat/nasrnn_acyclic.json 312 2,560
data/tensat/nasneta.json 1,392 15,290
data/tensat/nasneta_acyclic.json 2,602 17,613
data/tensat/nasrnn.json 3,163 8,904
total 7,469 44,367

The other 207 problems take 1.3 seconds to be solved by the new extractor and 2.8 seconds for the old extractor. So there are a lot more easy problems than hard problems, which hides the improvement in the geometric average.

@mwillsey mwillsey merged commit ee68161 into egraphs-good:main Sep 13, 2023
2 checks passed
@mwillsey
Copy link
Member

Awesome, thanks for the PR! Do you happen to have additional benchmarking data as well? As it is, it's quite hard to actually see the benefit of cost-sharing extraction, since few of the benchmarks seem to have it (or at least not that we have found).

@TrevorHansen
Copy link
Contributor Author

Awesome, thanks for the PR! Do you happen to have additional benchmarking data as well? As it is, it's quite hard to actually see the benefit of cost-sharing extraction, since few of the benchmarks seem to have it (or at least not that we have found).

Yes, it makes sense that you'd like more benchmarks that show the benefit of sharing-aware extraction given that it's so expensive to undertake. I'll make some, but will be slow, probably 6 weeks before I can them ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants