Torus-acceleration for multiexponentiation on GT #485

mratsim · 2024-11-27T22:56:03Z

This project was sponsored by the Ethereum Foundation under the name
Implementing & Accelerating Torus-based cryptography for SSLE, FY24-1672
and is a collaboration with Robert Granger (https://www.surrey.ac.uk/people/robert-granger) and Antonio Sanso (@asanso).

Overview

Currently, Ethereum validators are known up to 6.4 minutes in advance (1 Epoch = 32 slots = 32*12 seconds).
In theory, a malicious actor could launch denial-of-service attacks against upcoming validators to prevent block production.

To prevent this, a Single Secret Leader Election (SSLE) protocol may be used to keep the identity of the block producer under wraps.

The protocol identified for Ethereum is Whisk:

- https://ethresear.ch/t/whisk-a-practical-shuffle-based-ssle-protocol-for-ethereum/11763
- [WIP] Introduce consensus code for Whisk (SSLE) ethereum/consensus-specs#2800, [WIP] Introduce consensus code for Whisk (SSLE) with Curdleproofs ethereum/consensus-specs#3205
- https://github.com/asn-d6/curdleproofs/blob/1aa27a6/doc/curdleproofs.pdf

However, it was initially instantiated on elliptic curve groups (G1 or G2), and pairings could unravel the whole scheme (similar, I guess, to the MOV attack https://crypto.stanford.edu/pbc/notes/elliptic/movattack.html ).
To prevent this attack vector, the scheme can be instantiated on the pairing group GT.

This means computations on Fp12 instead of G1 (2x Fp), so we should expect computations to be 6x more expensive.

Full overview: https://ethresear.ch/t/the-return-of-torus-based-cryptography-whisk-and-curdleproof-in-the-target-group/16678

The goal of this PR is to reduce the overhead of GT multiexponentiation from the initial 5x (https://ethresear.ch/t/the-return-of-torus-based-cryptography-whisk-and-curdleproof-in-the-target-group/16678/3), ideally down to 3x, to make Whisk viable latency-wise.

Implementation

This adds torus-based cryptography for GT that can be combined with 4-way endomorphism acceleration.
- It uses projective coordinates to delay "affine" conversion, similar to elliptic curve affine/projective coordinates.
- It adds a Toom-Cook+DFT multiplication/squaring algorithm for Fp6 (unused at the moment as it is unexpectedly slower, see the Future work section).
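
To make the projective-torus idea concrete, here is a minimal toy sketch in Python. It is not the code from this PR: it replaces Fp12 over Fp6 with the tiny quadratic extension F_103[w]/(w² + 1), and every name in it (`compress`, `mul_mixed`, `torus_multiexp`, ...) is hypothetical. A norm-1 element a + b·w is represented by the single coordinate g = (1 + a)/b, i.e. as (g + w)/(g - w). Keeping the accumulator as a fraction (X : Z) means the only inversion happens at the final decompression, and multiplying by an "affine" input costs 2 base-field products in this toy model.

```python
# Toy model: F_p^2 = F_p[w]/(w^2 - GAMMA) stands in for Fp12 = Fp6[w]/(w^2 - gamma).
P = 103          # small prime with P % 4 == 3, so -1 is a quadratic non-residue
GAMMA = P - 1    # w^2 = -1 (assumption for this toy field only)

def fp2_mul(x, y):
    """Schoolbook product of a + b*w and c + d*w."""
    (a, b), (c, d) = x, y
    return ((a * c + GAMMA * b * d) % P, (a * d + b * c) % P)

def fp2_pow(x, e):
    """Square-and-multiply in F_p^2."""
    acc = (1, 0)
    while e > 0:
        if e & 1:
            acc = fp2_mul(acc, x)
        x = fp2_mul(x, x)
        e >>= 1
    return acc

def to_norm1(x):
    """Map x != 0 to x^(P-1), which has norm 1 (toy analogue of the cyclotomic subgroup)."""
    return fp2_pow(x, P - 1)

def compress(x):
    """Affine torus coordinate g = (1 + a) / b of a norm-1 element a + b*w, b != 0."""
    a, b = x
    return (1 + a) * pow(b, -1, P) % P

def mul_proj(p1, p2):
    """(X1 : Z1) * (X2 : Z2), both projective, no inversion."""
    (X1, Z1), (X2, Z2) = p1, p2
    return ((X1 * X2 + GAMMA * Z1 * Z2) % P, (X1 * Z2 + X2 * Z1) % P)

def mul_mixed(proj, g):
    """(X : Z) * g with g affine (Z2 = 1): two base-field products."""
    X, Z = proj
    return ((X * g + GAMMA * Z) % P, (X + g * Z) % P)

def decompress(proj):
    """(X : Z) -> (X + Z*w) / (X - Z*w), with a single inversion at the very end."""
    X, Z = proj
    inv = pow((X * X - GAMMA * Z * Z) % P, -1, P)
    return ((X * X + GAMMA * Z * Z) * inv % P, 2 * X * Z * inv % P)

def torus_multiexp(gs, es):
    """Naive interleaved square-and-multiply for prod g_i^e_i in torus coordinates."""
    acc = (1, 0)  # (1 : 0) is the point at infinity, i.e. the identity
    for bit in reversed(range(max(es).bit_length())):
        acc = mul_proj(acc, acc)
        for g, e in zip(gs, es):
            if (e >> bit) & 1:
                acc = mul_mixed(acc, g)
    return acc

xs = [to_norm1(v) for v in [(3, 5), (7, 2), (11, 13)]]
es = [5, 17, 42]
reference = (1, 0)
for x, e in zip(xs, es):
    reference = fp2_mul(reference, fp2_pow(x, e))
assert decompress(torus_multiexp([compress(x) for x in xs], es)) == reference
```

The final assertion checks that the torus-coordinate multiexp agrees with the direct computation in the extension field. In a real setting the per-input inversions in `compress` would presumably be amortized with a batch inversion, and the naive loop would be replaced by a windowed or bucket-based multiexp.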

Note: the Torus acceleration as implemented seems to be valid only for curves with 1+𝑖 as sextic non-residue, with 𝑖 = √-1. This is the case for BLS12-381 and BN254-Nogami but not for BN254-Snarks (Zcash, Ethereum) or BLS12-377 (Aleo).

Benchmarks

We compare against BLST's G1 MSM. BLST is used by all consensus clients today; the benchmarks are from status-im/nim-blscurve#183.

The multiexp size is expected to be 128 or 256. We implement it using 2 towering schemes. Note that Torus acceleration is only valid for Fp12 over Fp6 over Fp2.

Machine is a Ryzen 9 9950XE, overclocked at 5.9GHz single-threaded

BLST MSM G1

Constantine MEXP GT

For 128 points, the ratio G1/GT is only 3x.
For 256 points, the ratio G1/GT is 3.28x.

Future work

Over the summer I investigated some performance bugs where Constantine starts with a 1.7x perf advantage in field arithmetic that gets reduced to 1x at the elliptic curve level, or even worse after Fp2 -> Fp6 -> Fp12 towering.

Solving this might reduce the gap from 3x to 2.4x (a 20% improvement).

Another line of work is to use SIMD for multiexp to compute on 4x 64-bit integers (AVX2) or 8x 64-bit integers (AVX512) per instruction which should conservatively bring at least a 2x / 4x perf improvement respectively.

burdges · 2024-11-28T13:22:32Z

Why not use Ristretto or similar for Whisk?

We have a similar issue in the ring VRFs used by Sassafras on Polkadot, since my optimized design has VUF outputs on G1, but then you could move them onto a "sister curve", à la cfrg/draft-irtf-cfrg-bls-signature#30. Alistair later noticed that a Plonk-ish ring VRF, which uses VUF outputs on Bandersnatch, is plenty fast enough for our validator set size, so there is no such requirement. I suppose a Ristretto Bulletproofs ring VRF may be even faster for us too. Afaik none of that helps you, not without replacing Whisk with Sassafras.

asanso · 2024-11-28T13:31:15Z

> Why not use Ristretto or similar for Whisk?
>
> We have a similar issue in the ring VRFs used by Sassafras on Polkadot, since my optimized design has VUF outputs on G1, but then you could move them onto a "sister curve", à la cfrg/draft-irtf-cfrg-bls-signature#30. Alistair later noticed that a Plonk-ish ring VRF, which uses VUF outputs on Bandersnatch, is plenty fast enough for our validator set size, so there is no such requirement. I suppose a Ristretto Bulletproofs ring VRF may be even faster for us too. Afaik none of that helps you, not without replacing Whisk with Sassafras.

Because, as highlighted in https://ethresear.ch/t/the-return-of-torus-based-cryptography-whisk-and-curdleproof-in-the-target-group/16678, we want to take the easy way and use the validator's secret signing key k and its associated public key kG1 for bootstrapping.

mratsim mentioned this pull request Nov 27, 2024

CI: drop old nim compiler versions #486

Merged

mratsim added 14 commits November 28, 2024 06:04

gt-torus: add Fp6 mul/sqr with Toom-Cook-3 + DFT

ef56e18

initial support of torus-based crypto

9c7d483

gt: add torus tests and benchmarks, make cyclotomic/pairing proc towe…

6812c37

…ring agnostic

gt: batch conversion

a867553

gt: stash progress, Fp12 over Fp6 fails ref or opt multiexp while Fp1…

04914c0

…2 over Fp4 doesn't (without Torus)

gt: add preliminary benchmarks for Torus based cryptography

9912fd1

gt: fix exponentiation by 1 and GT torus conversion

3b09ad7

gt: fix aliasing issue in mixed torus multiplication

f98af81

gt: add torus optimization to optimized GT multiexp

e989067

gt: combine endomorphism acceleration and Torus acceleration

475cc7d

gt: parallel torus multiexp

823b9f4

gt: enable endomorphism + torus

ff92dd0

gt: rework conversion

b2e9b38

test: add GT multiexp to test suite

4ba5512

mratsim force-pushed the gt branch from db2e9c9 to 4ba5512 Compare November 28, 2024 06:04

GT: fix memory leak

4cdd2aa

windows: aligned alloc need explicit aligned dealloc

463414c

mratsim merged commit bc3845a into master Dec 1, 2024
12 checks passed

mratsim deleted the gt branch December 1, 2024 14:58
