-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor row major matrix #624
Conversation
This is a great effort! As some moment I am also curious what would be the impact if we remove MaybeUint feature and use raw vector as we only have one implementation. I just do a very quick tried on this branch against master branch on riscv_add benchmark with command on remote ceno benchmark machine, with command
and the result turns out to be ~+6% slower on e2e latency.
I believed with fibonacchi example it might be even much slower. So it's still nessesary to this feature for keeping high performance. |
If we just talked about capture this unssignment error, I think we can achieve it from different path: in unittest we support init vector into 2 different default value with same execution trance, and then compare 2 witnesses which should be identical. An unequality indicate there are some unassigment bug. And with that, we can capture the unassignment error in early stage. |
Thank you! This is still work in progress; I'm examining some ways to make it faster while keeping it clean/safe. My bench results are a bit better for some reason:
What regression % (evaluated at yours) would you consider acceptable? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking nice so far 👍
Could you please add to the PR description a very brief overview over what we are doing, and more importantly why? (The what can be really brief, because the text of the PR answers that question in detail.) Thanks! |
Yes, I have been rather suspicious of |
I noticed the different with yours vs remote ceno benchmark server probably on environment number of cores. To me any regression of % is somehow unacceptable. So I would suggest to go from another way: in unittest initialized with 2 different default value with same witness, then check the witness polynomial equality. This help us to identify the problem while not sacrificing performance. |
If you want some inspiration for how to do this kind of change, you might want to have a look at how plonky3 changed the organisation of its data compared to plonky2. (It might be a bit much too read, though.) |
@hero78119 There is a draft of the approach based on testing here: #597 |
To get back to optimal performance without unsafe types, there could be another constructor that accepts a vector or an iterator. Callers can built their Vec with e.g. flatten and collect. See also the issue #600 |
171ca51
to
7b11a81
Compare
### Context Existing benchmark `riscv_add` are just for testing single opcode performance. This PR aim to have a first e2e bench which is representative to a workload in real world. ### What has been cover - add fibonacci bench with sp1 platform hardcode - refactor e2e flow into sub-function to maximize code sharing ### what's not being covered Hey @mcalancea, key generation + witness assignment are still encaptulated in same function and not including in bench latency meature. Break down key generation and extract witness assignment will expose massive intermediate variables in between, so probably need another refactor and design to support #624
6e0960b
to
e13ef15
Compare
f81730d
to
f179d5d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome job, and this PR bring more insighful takeaways for the future implementation guidelines 🔥 🔥
.par_iter_mut() | ||
.skip(i) | ||
.step_by(self.num_col) | ||
.map(|v| unsafe { mem::replace(v, mem::MaybeUninit::uninit()).assume_init() }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cited from @mcalancea: mutable vector in place change probably trigger false sharing
In comparison, new design just allocated new vector and collect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Just noting it's pure speculation, I haven't investigated it. But false sharing is perhaps something to keep in mind throughout the code-base.
Benchmarks result is awesome 🔥 🔥 🔥 On fibonacchi witness generation vs master branch => ~ -20% throughput improvement command
results (run result twice, each time flushing with complete new baseline)
On fibonacchi e2e with 2^20 instance: ~ -15% throughput improvement
results (run result twice, each time flushing with complete new baseline)
|
bdf705c
to
da45ffc
Compare
da45ffc
to
94001c9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
epic works and LGTM!
(just one comment of TODO cleanup)
Written December 10th, most content and conversation predates this description.
Fixes #600.
Makes several changes to the way
RowMajorMatrix
works:MaybeUninit
. Rather, its contents are initialized toT::default()
using parallel iterators.InstancePaddingStrategy
argument which describes the type of padding that they want. This padding is then performed at the very latest stage (in the call toself.into_mles()
).self.into_mles()
does more than before.de_interleaving
is removed because the indirection through a bi-dimensional matrix is unnecessary. Notably,self.into_mles()
parallelizes the work differently thande_interleaving
used to.