Better batch commit and switch to Reed Solomon code. #155
Conversation
Just finished switching to Reed-Solomon code. All tests passed.

Remaining to be done:

- Benchmarks with the RS code, for polynomials over the base field.
- Benchmarks with the RS code, for extension field polynomials.

For a clearer comparison, the rate bit is set to 1 (rate = 2), and the number of queries is set to 973 accordingly, for base field polynomials.
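As a quick sanity check on the 973 figure, the query-count formula from the security analysis below reproduces it (a sketch only; it assumes `num_vars = 20` and `basecode_size_log = 7`, the same values used in that analysis):

```python
import math

# Recompute the number of queries for security_bit=100, rate_bit=1,
# num_vars=20, basecode_size_log=7, using the formula from the analysis below.
security_bit, rate_bit, num_vars, basecode_size_log = 100, 1, 20, 7
d = num_vars - basecode_size_log
dist = 1 - 1 / (2 ** rate_bit)
gamma = math.pow(2 * d / (2 ** (128 - security_bit)), 1 / 3)
up0 = (d * gamma + dist) / 3
up1 = 1 - math.sqrt(1 - (1 - math.sqrt(1 - dist * (1 - gamma))) * (1 - gamma))
diff = min(up0, up1) - d * gamma
print(security_bit / (-math.log(1 - diff)))  # ≈ 973
```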
**Security bits analysis for BaseFold**

The following is copied from JupyterLab.

BaseFold parameter selection: we need to choose the parameter `gamma` and the code distance `dist`. For our case, since we are using RS code, the distance is `1 - 2**(-rate_bit)`. For the first part, `gamma` is taken as `(2*d / 2**(128 - security_bit))**(1/3)` (the same formula used in the `queries` function below); with `d = 13` and a 100-bit security target this equals `13**(1/3) / 2**9`, about 0.0046:

```python
import math

math.pow(13, 1/3) / (2**9)
```

Now let's choose a few candidate values of `gamma` and compare the resulting bounds for each distance:

```python
for gamma in [0.004, 0.004592, 0.005, 0.01, 0.02, 0.03, 0.04]:
    d = 13
    for dist in [1/2, 3/4, 7/8]:
        up0 = (d * gamma + dist) / 3
        up1 = 1 - math.sqrt(1 - (1 - math.sqrt(1 - dist * (1 - gamma))) * (1 - gamma))
        print(
            f'dist={dist}, gamma={gamma}: ',
            d * gamma,
            up0,
            up1,
            min(up0, up1) - d * gamma
        )
```
Obviously, a larger `gamma` makes the first soundness error term, roughly `2*d / (gamma**3 * 2**128)`, smaller. We want this term to be around `2**-100`; with `gamma = 0.004`:

```python
math.log(26 / ((0.004)**3) / (2**128), 2)
```

Quite close to 100 bits, so it's acceptable. Then, obviously, there is a tradeoff between the distance and the soundness error for the second term (the `diff` values here are the last column printed above for `gamma = 0.004`):

```python
for diff in [0.10557166178759929, 0.21533333333333338, 0.257]:
    print(100 / (-math.log(1 - diff)))
```
So the number of queries is very large. 336 is already the minimum we can achieve, and that is already with RS code and expansion factor 8. This number is larger than the one specified in the current code. Now let's summarize an algorithm for computing the number of required queries:

```python
def queries(security_bit, rate_bit, num_vars, basecode_size_log, dist=None):
    d = num_vars - basecode_size_log
    if dist is None:
        dist = 1 - (1 / (2 ** rate_bit))
    gamma = math.pow(2 * d / (2 ** (128 - security_bit)), 1/3)
    up0 = (d * gamma + dist) / 3
    up1 = 1 - math.sqrt(1 - (1 - math.sqrt(1 - dist * (1 - gamma))) * (1 - gamma))
    diff = min(up0, up1) - d * gamma
    q = security_bit / (-math.log(1 - diff))
    return q

print(queries(100, 3, 20, 7))
print(queries(100, 3, 20, 8))
print(queries(100, 2, 20, 8))
print(queries(100, 3, 20, 7, 0.557))
```
Here 0.557 is the code distance of the original code in BaseFold with expansion factor 8; overriding the distance with it gives the required number of queries for that code. Therefore, the required number of queries for the original code in BaseFold is actually 766. The 260 queries in the current code only provide roughly 50 to 60 bits of security.

```python
def proof_size(security_bit, rate_bit, num_vars, basecode_size_log, dist=None):
    q = queries(security_bit, rate_bit, num_vars, basecode_size_log, dist)
    d = num_vars - basecode_size_log
    merkle_path_num_hashes = (num_vars + rate_bit + (basecode_size_log - 1) + rate_bit) * d / 2
    merkle_path_size = merkle_path_num_hashes * 16
    commitments_size = (d - 1) * 16
    final_message_size = (2 ** basecode_size_log) * 16
    return merkle_path_size * q + commitments_size + final_message_size

print(proof_size(100, 3, 20, 7))
```
So the proof size for RS code is roughly a bit more than 1 MB. In comparison, the proof size for the original BaseFold code is

```python
print(proof_size(100, 3, 20, 7, 0.557))
```

which is more than twice as large as with the RS code.

According to the experiment results in the BaseFold paper, the proof size for 20 variables is approximately 4 MB, which is even bigger than the result of the analysis above. The paper does not mention which parameters were selected for BaseFold, so maybe they were chosen to favor the prover over the verifier and the proof size.
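As a convenience, the two helpers above can be swept over a few parameter choices in one go (a sketch only; it assumes the `queries` and `proof_size` definitions from this comment are in scope, and reports sizes in MiB):

```python
# Sweep a few (rate_bit, basecode_size_log) choices at the 100-bit target
# with 20 variables, using the queries() and proof_size() helpers above.
for rate_bit in [1, 2, 3]:
    for basecode_size_log in [7, 8]:
        q = queries(100, rate_bit, 20, basecode_size_log)
        size_mib = proof_size(100, rate_bit, 20, basecode_size_log) / 2**20
        print(
            f"rate_bit={rate_bit}, basecode_size_log={basecode_size_log}: "
            f"queries≈{q:.0f}, proof size≈{size_mib:.2f} MiB"
        )
```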
```rust
let res = poly
    .par_chunks_exact(message_size)
    .map(|chunk| {
        let mut target = vec![F::ZERO; message_size * rate];
        // Just Reed-Solomon code, but with the naive domain
        target
            .iter_mut()
            .enumerate()
            .for_each(|(i, target)| *target = horner(chunk, &domain[i]));
        target
    })
    .collect::<Vec<Vec<F>>>();
```
This part seems to be at least quadratic in complexity, because `target.iter_mut()` loops over `message_size * rate` entries and each `horner` call over a chunk costs O(`message_size`).
That's right. But the quadratic complexity is only for the small chunks of constant size. The overall time is linear.
How much is `message_size`?
It seems like the complexity is quadratic, so maybe mark it with a `// FIXME: it's expensive` comment.
`message_size` is roughly 2^7.
Added the FIXME. I think we can leave optimizing this for later because this code is currently not in use anyway.
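To put rough numbers on the discussion above, here is a back-of-the-envelope cost model (a sketch only; it assumes `message_size = 2**7` as stated in this thread, a rate of 8, and a 2^20-coefficient polynomial, and counts only the field multiplications inside `horner`):

```python
# Rough operation count for the naive Horner-based encoding above.
message_size = 2 ** 7          # per-chunk message length (from this thread)
rate = 8                       # expansion factor (assumed)
num_coeffs = 2 ** 20           # example total polynomial size (assumed)

num_chunks = num_coeffs // message_size
# Each chunk is evaluated at message_size * rate domain points, and each
# Horner evaluation costs about message_size multiplications.
per_chunk_mults = (message_size * rate) * message_size
total_mults = per_chunk_mults * num_chunks

print(per_chunk_mults)   # 131072 (= 2**17) per chunk
print(total_mults)       # 1073741824 (= 2**30) in total
# Quadratic per chunk, but linear in num_coeffs for a fixed message_size:
# total ≈ (message_size * rate) * num_coeffs.
```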
```rust
let mut cipher = Aes128Ctr64LE::new(
    GenericArray::from_slice(&key[..]),
    GenericArray::from_slice(&iv[..]),
);
```
What's this?
The BaseFold encoding scheme uses random twist factors for the folding. To allow the verifier to derive the same factors as the prover efficiently, these random factors are generated using AES.
Is that the same as extracting them from the transcript?
I don't think so. These parameters should be determined at least before committing to the polynomial.
OK got you.
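For illustration, here is a minimal Python sketch of the AES-derived twist factors described above, not the actual Rust implementation: with a fixed public AES key and IV, the prover and the verifier can each expand the same AES-CTR keystream and map it to field elements (the key, IV, and the reduction modulo the Goldilocks prime below are placeholder choices):

```python
# Sketch: both parties derive identical pseudo-random folding factors from a
# shared AES key/IV, so no interaction is needed to agree on them.
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

GOLDILOCKS_P = 2**64 - 2**32 + 1  # the Goldilocks prime

def twist_factors(key: bytes, iv: bytes, n: int) -> list[int]:
    # AES-CTR over an all-zero message yields the raw keystream.
    keystream = Cipher(algorithms.AES(key), modes.CTR(iv)).encryptor().update(b"\x00" * (8 * n))
    # Reduce each 8-byte block into a field element (placeholder mapping).
    return [int.from_bytes(keystream[8 * i : 8 * i + 8], "little") % GOLDILOCKS_P for i in range(n)]

# Prover and verifier run this independently with the same public key/IV.
assert twist_factors(b"\x01" * 16, b"\x02" * 16, 4) == twist_factors(b"\x01" * 16, b"\x02" * 16, 4)
```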
I think

What's the difference between the two tables? Also, there's no need to put so much data.
I think we discussed this before I started aligning the mpcs code with the other parts of Ceno. It was designed so, but with complex template types to make the same code work for both types of polynomials. Then the design was changed to the current one after using the new

This ??? should be satisfied by both ChallengeField and ChallengeField::BaseField. A more general trait, e.g., Originally, both GoldilocksExt2 and Goldilocks implement the

This bunch of stuff will be carried everywhere whenever you define a new function that invokes

Finally, I do think polynomials of mixed types may be opened together. Maybe not in GKR, but in a typical PIOP, so keeping it flexible may be of some value. For example, in PLONK, the first committed witnesses are over the base field, then the prover commits to some extension field polynomials after receiving challenges. Finally, all these polynomials are opened together.
One is for polynomials over the base field, the other for polynomials over the extension field. Updated the description. I'll use fewer numbers of variables and batch sizes later.
LGTM
Main tasks accomplished by this PR:

- [x] Replace the naive batch commit (committing to individual polys) with a real batch commit, i.e., committing to multiple polynomials in a single Merkle tree.
- [x] Add the `simple_batch_prove` and `simple_batch_verify` methods. These methods support opening:
  - One commitment that commits to multiple polynomials of the same size.
  - One opening point.
- [x] Switch the encoding algorithm from the one in the BaseFold paper to Reed-Solomon code. The encoding algorithm of RS code is much faster, and RS code has better distance, so it allows better parameters.
- [x] Estimate the appropriate parameters for RS code.

(The original `batch_prove` and `batch_verify` methods support opening multiple commitments at multiple points with a flexible combination between polys and points, but only allow each input commitment to contain a single polynomial.)

---------

Co-authored-by: Wisdom Ogwu <[email protected]>
Co-authored-by: dreamATD <[email protected]>