Torus-acceleration for multiexponentiation on GT #485

Merged
merged 16 commits into from
Dec 1, 2024

Conversation

mratsim
Owner

@mratsim mratsim commented Nov 27, 2024

This project was sponsored by the Ethereum Foundation under the name
Implementing & Accelerating Torus-based cryptography for SSLE, FY24-1672
and is a collaboration with Robert Granger (https://www.surrey.ac.uk/people/robert-granger) and Antonio Sanso (@asanso).

Overview

Currently, Ethereum validators are known up to 6.4 minutes in advance (1 Epoch = 32 slots = 32*12 seconds).
In theory a malicious actor may do denial-of-service attacks on upcoming validators to prevent block production.

To prevent this, a Single Secret Leader Election (SSLE) protocol may be used to keep the identity of a block producer under wraps.

The protocol identified for Ethereum is Whisk:

However, it was initially instantiated on elliptic curve groups (G1 or G2), and pairings could unravel the whole scheme (similar, I guess, to the MOV attack https://crypto.stanford.edu/pbc/notes/elliptic/movattack.html ).
To prevent this attack vector, the scheme can be instantiated on the pairing group GT.

This means computing on Fp12 (12 Fp coordinates) instead of G1 (2 Fp coordinates), so we should expect computations to be about 6x more expensive.

Full overview: https://ethresear.ch/t/the-return-of-torus-based-cryptography-whisk-and-curdleproof-in-the-target-group/16678

The goal of this PR is to reduce the overhead of GT multiexponentiation from the initial 5x (https://ethresear.ch/t/the-return-of-torus-based-cryptography-whisk-and-curdleproof-in-the-target-group/16678/3), ideally down to 3x, to make Whisk viable latency-wise.
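For readers unfamiliar with the operation, a multiexponentiation computes the product of g_i^e_i over many (base, exponent) pairs at once. Below is a minimal sketch of one common approach, the bucket (Pippenger-style) method, written over integers modulo a number as a stand-in for GT. The function name `multiexp`, the window size `c` and the integer group are illustrative assumptions, not Constantine's API or its actual implementation.

```python
def multiexp(bases, exps, modulus, c=4):
    """Compute prod(bases[i] ** exps[i]) mod modulus with c-bit windows.

    Toy stand-in: integers mod `modulus` replace GT elements.
    """
    max_bits = max((e.bit_length() for e in exps), default=0)
    num_windows = (max_bits + c - 1) // c
    mask = (1 << c) - 1
    result = 1 % modulus

    # Process windows from the most significant to the least significant.
    for w in reversed(range(num_windows)):
        # Shift the accumulator by one window: c squarings.
        for _ in range(c):
            result = result * result % modulus

        # Accumulate each base into the bucket selected by its window digit.
        buckets = [1] * (1 << c)
        for g, e in zip(bases, exps):
            digit = (e >> (w * c)) & mask
            if digit:
                buckets[digit] = buckets[digit] * g % modulus

        # Combine buckets as prod(bucket[j] ** j) using a running product,
        # which needs only multiplications, no exponentiations.
        running, window = 1, 1
        for j in range(mask, 0, -1):
            running = running * buckets[j] % modulus
            window = window * running % modulus

        result = result * window % modulus
    return result
```

The cost is dominated by group multiplications and squarings, which is why a cheaper GT multiplication (the subject of this PR) translates directly into a cheaper multiexponentiation.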

Implementation

  • This adds torus-based cryptography for GT, which can be combined with 4-way endomorphism acceleration
    • It uses projective coordinates to delay the "affine" conversion, similar to affine/projective coordinates on elliptic curves
  • It adds a Toom-Cook+DFT multiplication/squaring algorithm for Fp6 (unused at the moment because it is unexpectedly slower, see the Future work section)
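The idea behind the torus representation with projective coordinates can be sketched on a toy example. The sketch below uses the standard T2 torus parametrization g = (a + w)/(a - w) of norm-1 elements in a quadratic extension, with a kept as a fraction N/D so that no inversion is needed per multiplication. It assumes a tiny prime field Fp with Fp2 = Fp[w]/(w² + 1) in place of Fp12 over Fp6, and all names (`P`, `GAMMA`, `torus_mul`, `decompress`) are hypothetical; this is not Constantine's code.

```python
P = 103          # toy prime with P % 4 == 3, so -1 is a quadratic non-residue
GAMMA = P - 1    # w^2 = GAMMA = -1 (mod P)


def inv(x):
    # Modular inverse (Python 3.8+ accepts a negative exponent in pow).
    return pow(x, -1, P)


def fp2_mul(g, h):
    # Ordinary multiplication in Fp2: (x1 + y1*w) * (x2 + y2*w).
    (x1, y1), (x2, y2) = g, h
    return ((x1 * x2 + GAMMA * y1 * y2) % P, (x1 * y2 + x2 * y1) % P)


def torus_mul(a, b):
    # a = N1/D1 and b = N2/D2 in projective form; the torus product is
    # (a*b + GAMMA) / (a + b), computed without any inversion.
    (n1, d1), (n2, d2) = a, b
    return ((n1 * n2 + GAMMA * d1 * d2) % P, (n1 * d2 + n2 * d1) % P)


def decompress(a):
    # Map N/D back to g = (N + D*w) / (N - D*w)
    #                   = (N + D*w)^2 / (N^2 - GAMMA*D^2).
    # This is the single "affine" conversion, done once at the end.
    n, d = a
    s = inv((n * n - GAMMA * d * d) % P)
    return ((n * n + GAMMA * d * d) * s % P, 2 * n * d * s % P)


if __name__ == "__main__":
    a, b = (5, 1), (7, 1)
    # Multiplying compressed values matches multiplying in Fp2.
    assert decompress(torus_mul(a, b)) == fp2_mul(decompress(a), decompress(b))
```

In this representation the identity corresponds to D = 0, so the projective form also covers the case where a + b vanishes, and a multiplication costs a handful of base-field multiplications instead of a full extension-field multiplication.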

Note: the Torus acceleration as implemented seems to be valid only for curves with 1+𝑖 as sextic non-residue, with 𝑖 = √-1. This is the case for BLS12-381 and BN254-Nogami but not for BN254-Snarks (Zcash, Ethereum) or BLS12-377 (Aleo).

Benchmarks

We compare to BLST MSM on G1, since BLST is used by all consensus clients today. The BLST benchmarks are from status-im/nim-blscurve#183.

The multiexp size is expected to be 128 or 256. We implement it using 2 towering schemes. Note that Torus acceleration is only valid for Fp12 over Fp6 over Fp2.

The machine is a Ryzen 9 9950X, overclocked at 5.9GHz single-threaded.

BLST MSM G1
image

Constantine MEXP GT
image

For 128 points, the ratio G1/GT is only 3x.
For 256 points, the ratio G1/GT is 3.28x.

Future work

Over the summer I have been investigating some performance bugs where Constantine starts with a 1.7x perf advantage in field arithmetic that gets reduced to 1x at the elliptic curve level, or even worse after Fp2 -> Fp6 -> Fp12 towering:

Solving this might reduce the gap to 2.4x (a 20% improvement).

Another line of work is to use SIMD for multiexp to compute on 4x 64-bit integers (AVX2) or 8x 64-bit integers (AVX512) per instruction, which should conservatively bring at least a 2x or 4x perf improvement respectively.

@burdges

burdges commented Nov 28, 2024

Why not use Ristretto or similar for Whisk?

We've a similar issue in the ring VRFs used by Sassafras on Polkadot, since my optimized design has VUF outputs on G1, but then you could move them onto a "sister curve", à la cfrg/draft-irtf-cfrg-bls-signature#30. Alistair later noticed a Plonk-ish ring VRF is plenty fast enough for our validator set size, which used VUF outputs on bandersnatch, so no requirement. I suppose a Ristretto bullet-proof ring VRF may be even faster for us too. Afaik none of that helps you, not without replacing Whisk by Sassafras.

@asanso

asanso commented Nov 28, 2024

> Why not use Ristretto or similar for Whisk?
>
> We've a similar issue in the ring VRFs used by Sassafras on Polkadot, since my optimized design has VUF outputs on G1, but then you could move them onto a "sister curve", à la cfrg/draft-irtf-cfrg-bls-signature#30. Alistair later noticed a Plonk-ish ring VRF is plenty fast enough for our validator set size, which used VUF outputs on bandersnatch, so no requirement. I suppose a Ristretto bullet-proof ring VRF may be even faster for us too. Afaik none of that helps you, not without replacing Whisk by Sassafras.

Because, as highlighted in https://ethresear.ch/t/the-return-of-torus-based-cryptography-whisk-and-curdleproof-in-the-target-group/16678, we want to take the easy way and use the validator’s secret signing key k and its associated public key kG1 for bootstrapping.

@mratsim mratsim merged commit bc3845a into master Dec 1, 2024
12 checks passed
@mratsim mratsim deleted the gt branch December 1, 2024 14:58

3 participants