Optimize Montgomery multiplication #402
@prestwich Would love to see this merged! This makes Ruint competitive with Arkworks-ff for finite field math.
Cool, I'm currently traveling, so it's been taking me a minute to get to this. I'm going to try to sit down with the linked algorithm documentation this weekend.
I've found concrete performance to be better with […]. I'm not sure how far we should push this observation. E.g. would we want to do this for the division and GCD algorithms as well? Right now the library is mostly instantiated with a small compile-time […]. It may be possible to abstract over this with something like […]
The CI failure is due to a backwards-incompatible change in proptest, and can be smoothed over by pinning to 1.5.0. Gonna investigate what's wrong upstream.
The idea being to allow expansion and do algorithm-switching based on the current size, like Go's […]
My naive question is whether with small […]
I went ahead and pushed a commit to do this and opened #409, as I'm looking to merge this branch today.
There are a couple of directions: […]
When working with large-ish sizes, e.g. U4096, stack size may be limited and one of approaches 2-4 is required. When implementing runtime-negotiated cryptographic protocols (I stumbled on this in the […]) […]. To approximate natural numbers and have virtually unlimited size, approach 3 or 4 is required.
This is orthogonal to the problem of algorithm selection, which depends purely on size and would be the same for all approaches. Practically, though, limited stack size prevents you from using approach 1 at sizes where anything other than base-case algorithms is relevant (GMP doesn't do fancy multiplication until 20-30 limbs on modern hardware: https://gmplib.org/devel/thres/MUL_TOOM22_THRESHOLD).
The main downside of 2-4 is that we lose the […]
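To make the algorithm-selection point concrete, here is a minimal sketch of size-based dispatch. Everything in it is hypothetical: the `MulAlgorithm` enum, the `select_mul_algorithm` and `limbs_for_bits` helpers, and the `KARATSUBA_THRESHOLD` value (picked from the 20-30 limb range quoted above) are illustration only and not part of the library. It only chooses an algorithm; it does not implement the multiplications themselves.

```rust
/// Multiplication strategies a big-integer library might switch between.
/// (Hypothetical names, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MulAlgorithm {
    /// Quadratic base-case multiplication.
    Schoolbook,
    /// Karatsuba (Toom-2,2) multiplication.
    Karatsuba,
}

/// Assumed crossover point, in 64-bit limbs. A real value would be
/// tuned by benchmarking on the target hardware.
const KARATSUBA_THRESHOLD: usize = 24;

/// Number of 64-bit limbs needed to hold `bits` bits.
const fn limbs_for_bits(bits: usize) -> usize {
    (bits + 63) / 64
}

/// Selects an algorithm purely from the operand size, independent of
/// how the limbs are stored (fixed array, slice, or heap allocation).
fn select_mul_algorithm(limbs: usize) -> MulAlgorithm {
    if limbs < KARATSUBA_THRESHOLD {
        MulAlgorithm::Schoolbook
    } else {
        MulAlgorithm::Karatsuba
    }
}
```

Under this assumed threshold, a 256-bit operand (4 limbs) stays on the base case, while a 4096-bit operand (64 limbs) would cross over, which is the size range where the storage question raised above starts to matter.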
Slightly adjusted my own Cargo.toml changes; approving and setting this to auto-merge.
Motivation
[…] mul_redc, which is critical to efficient prime field implementations.
[…] modular: Make mul_redc alloc-free #284.
Solution
PR Checklist
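For readers unfamiliar with the operation named in the Motivation, the following is a minimal single-limb sketch of Montgomery multiplication with R = 2^64. It is not the code from this PR and not the library's API: the function names (`neg_inv`, `to_montgomery`, `mul_redc_u64`, `mul_mod_via_montgomery`) and their signatures are assumptions made for illustration, and the modulus is assumed to be odd and below 2^63 so the intermediate sum fits in a `u128`.

```rust
/// Returns -m^{-1} mod 2^64 for odd `m`, via Newton iteration.
fn neg_inv(m: u64) -> u64 {
    // For odd m, m * m == 1 (mod 8), so `m` is its own inverse to 3 bits.
    let mut inv = m;
    // Each step doubles the number of correct bits: 3, 6, 12, 24, 48, 96.
    for _ in 0..5 {
        inv = inv.wrapping_mul(2u64.wrapping_sub(m.wrapping_mul(inv)));
    }
    inv.wrapping_neg()
}

/// Converts `a` to Montgomery form: a * 2^64 mod m.
fn to_montgomery(a: u64, m: u64) -> u64 {
    (((a as u128) << 64) % (m as u128)) as u64
}

/// Computes a * b * 2^{-64} mod m for a, b < m.
/// Assumes `m` is odd, m < 2^63, and `m_prime == neg_inv(m)`.
fn mul_redc_u64(a: u64, b: u64, m: u64, m_prime: u64) -> u64 {
    debug_assert!(m & 1 == 1 && m < (1u64 << 63));
    debug_assert!(a < m && b < m);
    let t = (a as u128) * (b as u128);
    // Choose q so that t + q * m is divisible by 2^64.
    let q = (t as u64).wrapping_mul(m_prime);
    // The sum is below 2^128 because m < 2^63; the result is below 2m.
    let u = ((t + (q as u128) * (m as u128)) >> 64) as u64;
    if u >= m { u - m } else { u }
}

/// Plain modular multiplication routed through Montgomery form.
fn mul_mod_via_montgomery(a: u64, b: u64, m: u64) -> u64 {
    let m_prime = neg_inv(m);
    let a_mont = to_montgomery(a, m);
    let b_mont = to_montgomery(b, m);
    let ab_mont = mul_redc_u64(a_mont, b_mont, m, m_prime);
    // Multiplying by 1 strips the remaining factor of 2^64.
    mul_redc_u64(ab_mont, 1, m, m_prime)
}
```

In a prime field implementation, values would normally stay in Montgomery form across many operations, so the conversions in `mul_mod_via_montgomery` are paid once rather than per multiplication; a multi-limb version interleaves the same reduction step with the limb products.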