
Runtime of estimate() for SEAL parameter set #57

Open · TabOg opened this issue Nov 16, 2022 · 10 comments
TabOg commented Nov 16, 2022

When calling LWE.estimate(SEAL22_32768), where SEAL22_32768 is the largest SEAL parameter set given in schemes, the program has a very long runtime: I've been running on 4 CPUs (with jobs=32) for 4 hours and haven't seen any estimates, only the following error message:
Algorithm <estimator.lwe_bkw.CodedBKW object at ---> on LWEParameters(n=32768, q=4127301024497384737127654569660285988428494734657199391624693039270889863724412964643884811622321780427143710884821317803768340308614730759769835769241715444596770968742227220068214981847081570726751819595399909407406471037121576084674975771617472472542994965871984641, Xs=D(σ=0.82), Xe=D(σ=3.19), m=+Infinity, tag='SEAL22_32768') failed with cannot compute ceil(log(4127301024497384737127654569660285988428494734657199391624693039270889863724412964643884811622321780427143710884821317803768340308614730759769835769241715444596770968742227220068214981847081570726751819595399909407406471037121576084674975771617472472542994965871984641)/log(2)) using 256 bits of precision
For parameters this large, should only LWE.estimate.rough be used?
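As an aside, the failing step is just ceil(log(q)/log(2)) evaluated at 256-bit floating-point precision; for an integer modulus the same quantity can be computed exactly with integer arithmetic. A minimal sketch in plain Python, using a short stand-in value rather than the full SEAL modulus:

```python
# The failing step is ceil(log(q)/log(2)) at fixed floating-point precision.
# For an integer q, the exact value needs no floats at all:
q = 4127301024497384737127654569660285988  # stand-in; any integer modulus works

bits = q.bit_length()  # arbitrary precision, never loses accuracy
# ceil(log2(q)) equals bit_length(q), except when q is an exact power of two
ceil_log2 = bits - 1 if q == 1 << (bits - 1) else bits
```

This sidesteps the precision question entirely for the bit-size computation, though it would of course have to be wired into the estimator's own code paths.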

malb (Owner) commented Nov 16, 2022

Ideally, the estimate would complete in a reasonable amount of time, so I consider this a bug. It might be useful to put things on the deny list to figure out which algorithm is so slow?

TabOg (Author) commented Nov 16, 2022

To get it to finish in under 2 minutes, I need to skip arora-gb and all of the bdd* algorithms; bkw hits the precision error above.

bencrts (Collaborator) commented Nov 16, 2022

It makes sense to me that the hybrid attacks would be slow for large n, but bdd seems to behave weirdly as n grows:

sage: %time LWE.primal_bdd(schemes.SEAL20_1024)
CPU times: user 3.83 s, sys: 12.8 ms, total: 3.84 s
Wall time: 3.84 s
rop: ≈2^71.3, red: ≈2^71.1, svp: ≈2^68.5, β: 137, η: 167, d: 1952, tag: bdd

sage: %time LWE.primal_bdd(schemes.SEAL20_2048)
CPU times: user 15.2 s, sys: 27 ms, total: 15.2 s
Wall time: 15.2 s
rop: ≈2^72.4, red: ≈2^72.2, svp: ≈2^69.9, β: 137, η: 172, d: 4030, tag: bdd

sage: %time LWE.primal_bdd(schemes.SEAL20_4096)
CPU times: user 1min 1s, sys: 235 ms, total: 1min 2s
Wall time: 1min 2s
rop: ≈2^72.1, red: ≈2^72.1, svp: ≈2^66.3, β: 133, η: 159, d: 7851, tag: bdd

primal_usvp takes ~2ms for each of the above param sets and returns essentially the same estimate (as we might expect).

TabOg (Author) commented Nov 16, 2022

Quadrupling as n doubles? So runtime proportional to n**2?
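A quick back-of-the-envelope fit of the wall times reported above is consistent with that guess (this is just illustrative arithmetic on the measurements from the previous comment):

```python
import math

# Wall times for LWE.primal_bdd from the measurements above (seconds).
times = {1024: 3.84, 2048: 15.2, 4096: 62.0}

# If t ~ n**e, then e = log(t2/t1) / log(n2/n1) for each doubling of n.
exponents = [
    math.log(times[2 * n] / times[n]) / math.log(2)
    for n in (1024, 2048)
]
# Both exponents land close to 2, consistent with roughly n**2 scaling.
```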

bencrts (Collaborator) commented Nov 17, 2022

Plus there's the fact that it is much more expensive than primal_usvp, even though they should be doing roughly the same thing. I've started to look at bdd over here; some small tweaks seem to drop the running time, although the stability of the resulting estimates needs a more in-depth look.

sage: %time LWE.primal_bdd(schemes.SEAL20_1024)
CPU times: user 2.02 s, sys: 12.4 ms, total: 2.03 s
Wall time: 2.03 s
rop: ≈2^71.2, red: ≈2^71.1, svp: ≈2^67.4, β: 137, η: 163, d: 2015, tag: bdd

sage: %time LWE.primal_bdd(schemes.SEAL20_2048)
CPU times: user 6.02 s, sys: 56.1 ms, total: 6.07 s
Wall time: 6.12 s
rop: ≈2^72.6, red: ≈2^72.4, svp: ≈2^69.6, β: 138, η: 171, d: 3858, tag: bdd

sage: %time LWE.primal_bdd(schemes.SEAL20_4096)
CPU times: user 22 s, sys: 58.7 ms, total: 22 s
Wall time: 22 s
rop: ≈2^72.2, red: ≈2^72.1, svp: ≈2^68.2, β: 133, η: 166, d: 7783, tag: bdd

The best solution might be to separate bdd and primal_hybrid entirely.

joerowell (Contributor) commented Dec 2, 2022

It looks like the cost comes entirely from the calls to svp_dimension in lwe_primal.py.

If I run the following script (seal_20_4096.py):

from estimator import *
LWE.primal_bdd(schemes.SEAL20_4096)

via cProfile:

sage --python -m cProfile -s tottime seal_20_4096.py &> seal_20_4096_stats

I get these results:

         4453292 function calls (4411138 primitive calls) in 53.418 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       23   50.367    2.190   50.410    2.192 lwe_primal.py:257(svp_dimension)
       31    0.685    0.022    0.731    0.024 simulator.py:101(<listcomp>)
       ....

So the vast majority of that 53 seconds comes from a few expensive calls to svp_dimension.

My guess is that this is actually an fpylll problem: it's likely that the actual gaussian_heuristic call is expensive, possibly due to the range of values.

For example, if I print out the values of r inside svp_dimension for these parameters then the first few seem quite large:

[9.67371330140938e114, 9.36982916809639e114, 9.07549106572299e114, 8.79039912109246e114,
 8.51426288104086e114, 8.24680101652289e114, 7.98774103599343e114, 7.73681900778966e114,
 7.49377929123748e114, 7.25837427620310e114, 7.03036413082657e114, 6.80951655717978e114,
 6.59560655459990e114, 6.38841619045855e114, 6.18773437812932e114, 5.99335666193247e114,
 5.80508500883504e114, 5.62272760669236e114, 5.44609866882982e114, 5.27501824476213e114,
 5.10931203685889e114, 4.94881122276919e114, 4.79335228342441e114, 4.64277683644481e114,
 ... 10.0497632947441]

There's probably some work that can be saved here: for example, fpylll's gaussian_heuristic recomputes the log norm of each entry on every call, but here that isn't needed since r doesn't change between calls. Depending on how much of the cost comes from the logs versus the exponentiation, a different (batched) Gaussian-heuristic computation may be the way to go. cProfile doesn't give us that breakdown, so just trying it is probably the best way to find out.

joerowell (Contributor) commented

@malb Do you have an opinion on whether an amortized Gaussian-heuristic function should live inside fpylll or the estimator? I'm thinking of a function that essentially takes this loop fragment:

d = len(r)
for i, _ in enumerate(r):
    if gaussian_heuristic(r[i:]) < D.stddev**2 * (d - i):

and computes certain repeated costs (e.g. the logs of the r values) once, rather than on each call. It seems like something fpylll should have, but that would tie users of the estimator to a particular version of fpylll.
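To make the idea concrete, here is a self-contained sketch in plain Python, not fpylll's actual implementation: it follows the usual convention that r holds squared Gram–Schmidt norms, takes the logs once, and sweeps a running suffix sum so that each suffix's heuristic costs O(1) after one O(d) pass.

```python
import math

def log_unit_ball_vol(d):
    # log of the volume of the d-dimensional Euclidean unit ball
    return (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)

def gh_squared_suffixes(r):
    """Squared Gaussian-heuristic norm for every suffix r[i:], where r
    holds squared norms. The logs are taken once and reused via a
    running suffix sum, instead of being recomputed on each call."""
    d = len(r)
    log_r = [math.log(x) for x in r]  # the repeated cost, paid once
    out = [0.0] * d
    suffix = 0.0
    for i in range(d - 1, -1, -1):
        suffix += log_r[i]
        k = d - i                      # dimension of the suffix block
        log_vol = suffix / 2           # log volume of the suffix block
        out[i] = math.exp(2 * (log_vol - log_unit_ball_vol(k)) / k)
    return out
```

The loop in svp_dimension could then test gh_squared_suffixes(r)[i] < D.stddev**2 * (d - i) with no per-iteration log computations; whether this belongs in the estimator or in fpylll is exactly the question here.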

malb (Owner) commented Jan 5, 2023

To me this "feels" rather estimator-y, no? Why do you think it might live in FPyLLL?

joerowell (Contributor) commented

Because it's also pretty fpylll-y, no?

We'd need to include the logic for computing the Gaussian heuristic in the estimator. IIRC, the Gaussian-heuristic computation in fpylll leans on a few internal constants (e.g. the log volume of a ball) to make things faster.

I'm not sure if you'd want to reproduce that in the estimator.

malb (Owner) commented Jan 5, 2023

To me this fits quite naturally here; I'd just "open up" gaussian_heuristic and do the appropriate caching "by hand". But I'm not married to that.
