forked from cisco/hash-sigs
-
Notifications
You must be signed in to change notification settings - Fork 0
/
read.me
597 lines (532 loc) · 36 KB
/
read.me
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
This code attempts to be a usable implementation of of the LMS Hash Based
Signature Scheme from RFC 8554.
This is a signature scheme (that is, someone with the private key can sign
messages, someone with the public key can verify sigantures, but cannot
generate signatures) that is based on a hash function (that is, the only
cryptographical assumption is that hash function is secure; in this case,
that you can't find preimages or second preimages). This particular
scheme is stateful (that is, the signer keeps state that must be updated
for each signature); there are stateless schemes, but they have much larger
signatures.
Here's how to use it: general workflow for the signer:
Step 1: generate a public/private keypair.
This is the hss_generate_private_key function; you give it the parameter
set you're interested in (see below for guidelines as to what's
appropriate here), a function to get randomness, a function to write
out the private key, a buffer to hold the public key, and the optional
aux data.
The below step can be repeated as needed (e.g. because of a program reload):
Step 2: load the private key into memory
This is the hss_load_working_key function; you give the private key,
a budget of the amount of memory to use, and optionally the aux data
generated during the hss_generate_private_key. This memory budget is
used as a guide for some space/time trade-offs; we'll never fail because
you haven't authorized enough memory (if you haven't, we'll reduce the
amount of memory as much as possible, hence passing a 0 will minimize the
amount of memory, which is appropriate if you'll use the private keys only
a handful of times). It turns out minimal space doesn't generate
signatures that slowly; if you have memory to burn, you could allow it to
allocate 100k and it would sign faster; if we allow enough space so that we
can place the entire bottom level LMS tree in memory, the speed up is
significant in nonthreaded mode (in threaded mode, we generate the
signature at the same time, and so we don't save as much wallclock time).
The aux data is optional (the load process will work if you don't provide
it); it'll be faster if you do (if you don't provide this, this'll
take at least as long as the original key generation). Also, the aux data
is integrity checked (and so if it was modified, this'll ignore it).
If you just generated a key, and want to sign a message, you'll need to
load the private key (even if you're not interested in saving the private
key between reloads). If this is what you're doing, I suggest you have a
moderate (e.g. 1k) array that you use to hold the aux data; without that,
the load process would need to redo almost all the computations that were
done when initially generating the private key.
Once you've expanded the key, you can use it to, say, actually sign messages.
Here are the steps you an do (and you don't need to reload the key after
every message you've signed):
Step 3: actually generate the signatures
This is the hss_generate_signature function; you give it the working key,
a function to update the private key, and the message to sign, and it
generates the signature.
Step 3a: reserve N signatures
This is the hss_reserve_signature function; this allows you to advance
the 'current counter' in private RAM by N (and so the update_private_key
function won't need to be called for the next N signature generations).
Of course, if the application exits without using all those N signatures,
those will still be considered used.
There are actually two variants of this; you could either has that N
signatures be reserved now (hss_reserve_signature), alternatively, you
can give a standing order that when we run out, we reserve extra
signatures (hss_set_autoreserve). The former works well if you have
some idle time periodically (and so this write-to-disk process mostly
happens when youre not busy); the latter works better if you don't have
idle time (and want to reduce the number of writes to disk).
The workflow for the verifier is easy: you pass the message, the public key
and the signature to hss_validate_signature; that returns 1 if the signature
validates, 0 if it doesn't. We provide APIs both to sign/validate the
signature in one shot (hss_generate_signature/hss_validate_signature), and
also init/update/finalize routines in case we have the message in multiple
pieces (see hss_sign_inc, hss_validate_inc for the details of those APIs).
The makefile generates three .a files; hss_lib.a, which includes the above
routines; hss_lib_threaded.a, which is the same, but with threading enabled
(and so -lpthread is required to link), and hss_verify.a, which just includes
the verification logic (nonthreaded; threading speeds up verification only
somewhat; if you want a threaded verification code, it should be fairly
obvious how to make it yourself).
Now, the practical problems that a (stateful) hash based signature has are:
- Statefulness; each signature effectively has a nonce, which must not be
used to sign two different messages. If we use a private key over reboots,
that means that we need to track the state in longterm storage somehow.
- Size of the signature; depending on the parameter set, the signatures might
be several k long
- There's an upper bound on the number of signatures that can be generated
with a specific public key.
This package tries to address all of them, as follows:
Statefulness; we do several things to mitigate this issue:
- Everytime we need to update the state (either during initial key
generation, signature generation or reservation (see below)), we have
the application provide a function that is supposed to write the new
state to secure storage; the signature is not released to the application
(actually, not even generated) unless that function claims success.
- There's quite a lot of state involved with this package. However, instead
of writing all that to secure storage, what we do is write a summary.
On a program reload, we read ("load") that summary back into memory,
reconstruct the full state, and continue on. Because this summary is
short, this makes it easier to store, and we have less to worry about
partial updates (say, if we crashed in the middle of an update).
- If writing the state to secure storage is expensive, what we can do is
"reserve" a number of signatures; that will update the state on secure
storage; however, if we reserve N signatures, that means that for the
next N signature operations, we won't have to actually perform the
update. Hence, we can move much of the work to time where we would have
been idle.
Signature size; we can't actually do anything about the signature format
(that's fixed in the draft), however what we can do is try to make selecting
a parameter set with a short signature reasonable:
- Efficiency; we try to make it efficient; efficiency allows you to use
more aggressive parameter sets (chiefly, ones with W=8, and relatively few
Merkle levels), which makes the signature shorter.
- We support multithreading; that is, during the key generation and key
reload procedures, we can use multiple threads to speed things up.
Speeding these up allows up to use considerably larger trees (and hence
fewer of them).
- We support 'auxiliary data' for the top level Merkle tree. Our model is
that you don't mind spending quite a bit of time during key gemeration (as
that is a rate event); you are far more sensitive to the operation
operations (such as the key reload, which you might do whenever you
restart the program). What the aux data allows you program to do is
generate the top level Merkle tree once (and store some intermediate
nodes); this means that we can practically use a very large top level
Merkle tree, without costing much (other than ken gen time). In fact,
if you're happy with 1-30 million signatures maximum, you could go with
a single level Merkle tree, which yields relatively short (<2k) signatures
- After the initial key gen and load, we compute everything incrementally.
That means that, just because we have a level 20 tree, we never have to
sit down and compute 2**20 operations at once; we spread that work over
the time we spent generating the signatures from the previous tree.
Upper bound on signatures; again, we're stuck with the upper limit on any
parameter set; however we try to make it reasonable to select a parameter set
with a large number of signatures. We gave most of the efficiency features
above; we give a table below listing various parameter sets that are
reasonably efficient, and have large upper signature bounds. For example,
if 1 billion signatures are enough, then a 20/10 parameter set might be
reasonable.
Now, here is another listing of the features of this package (in a
feature-centric order, rather than a problem-centric one:
- It tries to deal with state management. It does not assume that the
signing computer will be running without a reboot for the entire time that
the private key is valid, and it does not assume that we're able to do a
large scale update of secret state to disk (which would cause problems if we
rebooted in the middle of an update). It addresses this by dividing the
private key into three parts:
- A short section which is meant to be saved in secure storage (and this is
the part which is referred to as the 'private key'; this includes the
number of signatures generated so far. It's currently 48 bytes long
(with the part that needs to be actually updated only 8 bytes), we assume
that it is updated atomically. These 48 bytes consist of an 8 byte count
of the number of sigantures generated so far, 8 bytes of a compressed
version of the parameter set, and 32 bytes of 'seed' (a random value that
is used to derive every secret)
- A longer section (the "working key") that resides in memory; we assume
that this is private, but need not be saved into long term (nonvolatile)
storage. This section include the Merkle tree contents (actually, a
subsection, as below we implement a time/memory trade-off), the next
Merkle tree contents (incremental computation), I values, signatures for
internal public keys; for most parameter sets, this easily fits into
10-30k (time/memory trade-offs will allow it to be larger to give better
signing performannce). For example, a level 20/10 tree (maximum of 2**30
signatures), we can use as little as 16kbytes of ephemeral state (and the
time isn't all that bad after the working key has been regenerated).
The idea by the above split is that we keep the short permament key somewhere
safe (and which would survive a restart should we need to); at any time (for
example, after a restart), we can regenerate the longer section, which we
would keep in memory. We use the version of memory to actually generate the
signatures (and update the short section as needed). If we restart, we
regenerate the large working section, and carry on. In addition, by using
reservations (see below), we don't actually need to update the secure storage
every time.
By keeping the short section short, that means that we don't have to worry
about partial updates (which we would if that the private part was a
multikilobyte explicit tree); we can also use limited private space (e.g. HSM).
We also have an optional third section to the private key:
- Auxiliary data; this is an optional section that contains the node values
of parts of the top level Merkle tree; this is generated at keygen time,
and is read-only thereafter; the point of this is to speed up the process
of loading the private key into memory. This aux data is (usually)
persistent, but need not be secret.
- For a level 20 top level tree, 10916 bytes of aux data allows us to
regen the top level tree in circa a second. That means that, for a
20/10 level tree, generating the private key is expensive (3 minutes
on my test machine, a 3.3GHz Xeon), but restarts after that can be done
in a second or two.
- We also authenticate the auxiliary data; that means that an adversary
who can corrupt the auxiliary data can increase the time it takes for
us to load the working key; however they don't get any other advantage.
- I did say that the auxiliary data is usually persistent. However, if
we can't store it permamently, it still can be used to speed up the
initial program load (that is, immediately after the private key
generation); we'd use a moderate (several k) temp array to both
the key generation and first reload; if we need to do a reload
afterwards, we would not pass the aux array (as we wouldn't save it).
This'll speed up the initial load process (as it wouldn't have to redo
most of the computations that the key generation did); it won't speed
up the later reloads, however at least we got some benefit from it.
- We include a function that takes the short section, and recreate the working
portion, we refer to this as 'loading the private key into memory'. This is
meant to be called on a reload, and (assuming that you use the aux data) can
be significantly faster than generating the key in the first place.
- We incrementally compute the next trees (so we don't hit a bump when we
generate the 1025th signature)
- Time/memory trade-offs available (so we don't need so much memory for the
loaded private key)
- The 'load the private key' function (hss_load_private_key) takes a
parameter while means "try to limit the memory used to X bytes"; this
allows some flexibility in designing the trade-off (although less than I
expected when I first designed the API; the 'use as little memory as
possible' option isn't that slow, and the 'd*mn-the-memory-
full-speed-ahead' mode doesn't use that much memory (unless the bottom
level Merkle tree is huge). Yes, there's a delta, but not a big one.
- For any function that can update the private key, we have the application
(the program that is using the package to sign messages) pass a function
("update_private_key") and a context pointer. If this is non-NULL, then
when we update the private key, we call this function, what this function is
expected to do is write the private key into secure storage (and pass
nonzero on success); if you pass a NULL function pointer, then we assume
that passed context pointer is actually the pointer to the private key.
- Explicit implementation of the reservation primitive. Right now, we have
an API that states "make sure that we have N signatures reserved (so that
we won't have to update the private key for the next N signature operations).
- We can also perform operations in parallel (pthread's); do get this, you
use the hss_lib_thread.a library (and include -lpthread on the link); if
you use the hss_lib.a library, then everything is done in the main thread.
Using multiple threads speeds up the key generation and key loading process
significantly (15x in my experience, assuming you have enough CPU cores);
we also can use it during the signing and verification process, however the
speed up there is less radical (perhaps 2x).
- I believe the code is fully compliant C99 (that is, it should work on any
C99 implementation that can handle the program/data size; even silly ones
with 13 bit chars and 26 bit int's and things like that); in fact, the
private keys should be portable to different architectures. I haven't
tested out either of these claims, though.
I've included a simple program (demo.c) that uses the above facility to
sign files, and verify them. I've also included a test harness (test_hss) to
do regression testing of HSS; however I would rather advise you not to look
at how those regression tests use the library as best practice; they are
designed to push the library's corner cases, and so sometimes do things that
real applications really ought not do.
General notes:
- Advice about parameter sets: while this supports all the HSS parameter sets
currently allowed in draft-mcgrew-hash-sigs-08, some work better than others
The general recommendation is to use a few large trees (that is, with lots
of levels), rather than using a number of levels of smaller trees; this
package tries to make large trees not that costly (and reducing the number
of trees certainly cuts down on the signature size). If you have a
reasonable amount of auxiliary data available (and ideally, multithreading),
then making the top level tree H=20 (or even H=25) is doable; the main cost
is the time to do the initial key generation (which in most cases is a rare
event). On my test machine (a 3.3GHz Xeon), I can generate a key with a
H=20 tree on top (with W=8) in 3 minutes (multithreaded); loading that key
into memory (with 10k of aux data, with a single H=10 tree below it) takes
1-2 seconds. An H=25 tree on top took 1.5 hours; loading that key into
memory (with 1Meg of aux data) took a second. Increasing the lower tree to
H=15 makes loading take 6 seconds (without affecting the key generation
time). Those parameter sets give reasonable speed, with 3.3-3.7k
signatures, and allow 1 billion to 1 trillion signatures per private key
(which I expect should be sufficient for most uses; at 1000 signatures per
second, that's enough to last you between 10 days (top H=20, bottom H=10)
to 3 decades (top H=25, bottom H=15).
Here's a table that goes through a list of various parameter sets.
They all have W=8 (which minimizes signature sizes, while increasing time),
and were measured on my test platform (with threading enabled; times would
be considerably larger without threading). These are here to give a
guideline as to what's possible; for the computational time, your mileage
may vary, depending on the computing resources you have. The machine I
have does not have the SHA-256 extensions; you could possibly do
significantly better.
ParmSet KeyGenTime KeyLoadTime AuxSize SigSize KeyLifetime
15 6 sec 3 sec 100 1616 30 seconds
20 3 min 1 sec 10916 1776 16 minutes
25 1.5 hour 18 sec 21860 1936 9 hours
15/10 6 sec 1 sec 356 3172 9 hours
15/15 6 sec 10 sec 356 3332 12 days
20/10 3 min 2 sec 10916 3332 12 days
20/15 3 min 10 sec 2724 3492 1 year
25/10 1.5 hour 7 sec 87396 3492 1 year
25/10 1.5 hour 1 sec 349540 3492 1 year
25/15 1.5 hour 13 sec 87396 3652 34 years
ParmSet: this is the height of the Merkle tree(s); parameter sets listed as
a single integer consists of a single Merkle tree of that height;
parameter sets that consist of two trees are listed as x/y, with
x being the height of the top level Merkle tree, and y being the
bottom level.
KeyGenTime: the measured key generation time; this is the operation that is
done once per private key.
KeyLoadTime: the measured key load time (assuming the auxsize of the
specified amount); this is the operation that is done on program reload
to continue using an existing private key.
AuxSize: the size of the aux data that was used to test load time, the value
listed for AuxSize is in bytes. You can use smaller sizes, however that
would increase the key load time (as can be seen by the two different
KeyLoadTimes given for ParmSet 25/10)
SigSize: the size of a signature (in bytes)
KeyLifetime: the lifetime of a key, assuming we generated 1000 signatures
per second. In practice, we're not likely to get anywhere close to
1000 signatures per second sustained; if you have a more appropriate
figure for your scenario, this column is pretty easy to recompute.
As for signature generation or verification times, those are moderately
insensitive to the above parameter settings (except for the Winternitz
setting, and the number of Merkle trees for verification). Tests on the same
machine (without multithreading) gave approximately 4msec to sign a short
message, 2.6msec to verify (for this test, we gave the working key lots of
memory; if you skimp on that, the parameter sets with the larger bottom
trees slow down somewhat more than the ones with less; we also used a two
level ParmSet; a single level would approximately halve the verification
time). All times can be significantly improved (by maybe a factor of 8) by
using a parameter set with W=4; however that also about doubles the
signature size. In addition, we allow you to adjust the amount of
threading on a per-call basis (byte the hss_extra_info parameter);
I suspect that typically you'll run with the defaults.
- Portability of the private keys: I believe that the private keys are
portable to different CPU architectures (that is, picking up a private
key on one CPU and moving it to another will work)
However, I don't currently promise that private keys generated by this
version of the package will work on future versions; we reserve the right
to make changes that break current created private keys. Eventually, we'll
need to preserve this (possibly by including a version flag in the private
key); however, I don't believe that this package is mature enough yet.
After all, essentially everything involved with signing (with the sole
exception of the bottom C signature randomizer) must be derived in the exact
same way from the top level seed; once we start doing versioning, that means
we must forever support that way of deriving internal values from the secret
seed.
- Same thing with the API; eventually, we'll need to stabilize the API;
however, I don't believe we're there yet (although with the extra_info
structure, we may be getting close; any new parameter where most
applications are happy with a reasonable default can be placed into the
extra_info structure); I don't want to lock us in to what we have right now.
- This package will allow at most 18446744073709551615 signatures per private
key (that's 2**64-1), even if the parameter set would allow more. If you
think you need more signatures than that, well, dude, what are you smoking?
- The API has an optional struct hss_extra_info* parameter; this is to allow
you to convey information that typically is not needed (but on occasion,
might be). You can always pass a NULL pointer (in which the package uses
reasonable defaults); the current parameters that are exchanged:
- num_threads; this allows the application to tell this packet about the
number of concurrent threads that it is allowed to use; 1 means not to
use threading at all; 2-16 means that many threads, and 0 means the
default.
- last_signature; if the signature generation routine detects that it has
just signed the last signature it is allowed to, it'll set this flag.
Hence, if the application cares about that, then it can pass an
extra info structure, and on return, test this flag.
- error_code: if some operation fails, the field is set to a value that
indicates why it failed (similar to errno, except not a global).
Hence, if the application cares about that, then it can pass an extra
info structure, and on failure return, test this flag.
Just a word of warning; if you do pass in an hss_extra_info structure,
the fields must be initialized, passing in a structure containing random
stack garbage is begging Murphy to perform his magic. You can do this
by either:
struct hss_extra_info info = { 0 };
or
struct hss_extra_info info;
hss_init_extra_info( &info );
Future ideas; I'm not against future ideas, but this package is getting
sufficiently complex that I'm not likely to do anything that adds even more
complexity unless someone has a real need. If you do have a real need (for
one of the below ideas, or something else), plesae tell me; otherwise, I'm
likely not to implement it:
- Extend multithreading to generate the higher level Merkle signatures when we
step to the next lower level Merkle tree. We do multithreading for most of
the expensive operations, but not this one; when we need to generate a
nonbottom signature, that's done by the main thread (while none of the other
threads are doing anything). It's about half the cost of creating an OTS
public key, but it's still not free. We do this only rarely (once every
1024 signatures of the bottom level tree is of height 10), however it does
increase the worse-case time (which can be significant for some
applications). One issue is that we can't sign the NEXT tree until we've
completed building all of the subtrees; if one of the tasks is finiahing
that up, well, we might be stuck.
- Perhaps we can have support for SIMD hash implementations (e.g. AVX) that
can, in parallel, compute N independent hashes. I suspect that it wouldn't
be that difficult to add it to the Winternitz routines (which is where the
bulk of the time is spent anyways). However, it would appear that Intel's
SHA-256 instructions would be faster than AVX (and doesn't add any
complexity; recent versions of OpenSSL already include it); hence this would
be of interest only for CPUs with AVX but not SHA-256-NI; currently, that's
common, but I expect that'd change over time; it's not clear that this
amount of code would be warrented for a short term fix. On the other hand,
if we can get support for a GPU, well, that'd be a really big win (albeit
at a significantly higher complexity cost)
- Q: restarting at some points (say, at the start of a Merkle tree) is
cheaper than restarting in the middle, or near the end. Should we do an
auto-reserve if we're getting close to the end (so that, if we do restart,
that's cheaper)? One the other hand, with parallel processing on the load
function, loading anywhere is already fairly cheap.
- We might want to explicitly support a block-type interface (with read/writes)
to handle the aux data, rather than insist that it fits into a contiguous
memory segment; this may make it easier for an implementation to use a
significant sized aux data. My major complaints about this is "yet more
code complexity? Don't we have enough?" and "yet another nonobvious API to
document". On the other hand, except for H=25, we never really need more
than 10k or so of aux data; would this (code and documentation) complexity
actually be worth it?
- The hss_load_private_key takes the 'memory budget' as a parameter, which
allows the user to select the time/memory trade-off. It might be considered
sporting to give the user some context as to what the trade-offs actually
are; perhaps a way of generating a table of memory-size vs. hash compression
operations (or OTS public key generations) expected per signature; it
depends on the parameter set, and so we can't just publish a table (although
a table for typical parameter sets on a typical compiler might not be that
hideous).
- Should we make this package valid C++ as well? One issue is the malloc's
(which aren't casted), we could insert the "extern "C" {" markers (to say
these functions use the C ABI), are there other issues?
- As for going the other way (making this valid C89), well the biggest issue
there is the 64 bit int's we use (sequence_t); we do declare automatics
after executable statements, but those wouldn't be hard to move. Yes,
we could program around the 64 bit int's, however I don't see the point
unless someone really needs this.
- The logic to distribute updates to the upper level trees (which does the
work when we're close to the end of the current bottom level tree, when the
bottom level tree runs out of BUILDING_TREEs to update) rather assumes that
an upper level OTS is about the same cost as a bottom level OTS. This is
actually true if we use the same OTS parameter set (Winternitz parameter)
everywhere; however if the upper levels use a costlier one (e.g. the bottom
level tree uses W=4, the upper level trees use W=8), it's less true. It
might make sense to try to potentially split up the work of updating the
upper level tree into more signature operations; at least, in this case
(and it'd be an ideal way to add even more complexity into this package).
Of course, we'd have to ask: how common are these mixed-winternitz settings
anyways?
- Another thing that I could see being useful in some scenarios is providing
a way to safely share the same private key between multiple signers.
Here are two ways to do this:
1. There'd be a master holder of the private key (which would hold the real
private key); a secondary signer would ask for the permission to sign N
signatures; the master would pass it a 'subkey' that would give it
permission to sign using sequence numbers X and X+N-1 (where X is the
current private key value), and the master would increment its copy of X
by N. The client would load the subkey and we're good to go. It's not
clear how the master would advance its working key by N; a first cut may
be that the master does a fresh hss_generate_working_key; there are more
efficient ways to do it, but there are lots of corner cases that would
require testing to make sure we've covered everything. The subkey we
send to the client would be the private key plus one parameter (the
allowed maximum); we might want to modify our private key format to
include this maximum if we're serious about this (so that the secondary
signer could run the standard HSS code). There are likely a number of
gotcha's involved; however less than the below idea.
2. We'd also have a master holder, but instead of giving the entire secret
key to the secondary signer (and tell him "please only use sequence
numbers from X to X+N-1"), we instead gave him an entire subtree (without
giving him enough information to sign outside that subtree). Note that we
can construct a subtree of size 2**n, for any n (given a tree-based
method to derive OTS keys; that would help with side channel resistance);
it need not be an entire Merkle tree; such a method would waste up to
2**n - 1 sequence numbers of the main tree (as it assumes that the
initial sequence number we give to the secondary server is a multiple of
2**n), unless the master tried something tricky and remember those
weren't used yet). This approach would not prevent an attacker who
obtains that subkey from signing anything he wants for as many times as
he wants; we assume that valid signers will not repeat sequence numbers,
but an attacker might not feel so constrained. It would allow us, given
such a breach, to figure out who leaked their subkey; would that be
considered enough of an advantage to make up for the extra complexity?
For this to be a reasonable alternative, we'd need a facility to log who
we gave what (otherwise, what's the point?)
- Should we worry about implementations that don't support malloc? Or, have
a 'secure_malloc' that reserves memory that is never paged out (and we'd
prefer placing our keying data there)? The candidates for doing a secure
malloc would be the hss_working_key and the merkle_level (because they
both contain seeds); no other malloc'ed data structure contain any
secret data. On the other hand, there doesn't appear to be a great deal
of point in doing a secure malloc unless we know that the automatic
data is also secure (or we move all keying data out of automatic data;
could be problematic).
- Should the verifier cache previous successful verifications (at least, the
top level trees)? I personally don't think it's worth the bother (or the
complexity).
- We could design a signature format that's somewhat shorter; (although the
proof may need fiddling in some cases):
- Not having the intermediate root hashes in the signature
- A C randomizer of 32 bytes is overkill (with 16 bytes, we still make the
'guess C, generate a collision based on that' attack as difficult as a
second preimage attack)
- We don't really need explicit intermediate C randomizers for the message
hashing (as an attacker can't select that message value in advance)
- We could have the lower level I values a determanistic function of the top
level I (rather than expressing them in the signature)
- We don't need to express the parameter types for the top level signature
in the signatures
- We can omit L from the signature, and shorten q
These ideas would (for a 2 level parm set) reduce the siganture size by
about 112 bytes (we could drop 3 bytes from the public key; probably not
worth doing). Should we have an API for this alternative format?
With a bit of cleverness, we can support both (all signatures would be in a
form that's compatible with the short format; we'd be able to express them
in either format as required). On the other hand, with the current format,
a 15/10 W=8 parm set has a pk+signature length of 3184 bytes total (pk is
60 bytes, sig is 3124 bytes); this would reduce this total to 3072 bytes.
Is a <4% space reduction really worth the effort? This idea gave more
impressive savings back when we have 64 byte I values...
- Some of our internal functions have fairly generic names (e.g.
get/put_bigendian). We should go through, and give them names that are less
likely to conflict with other functions we're linked with. This is one
place where C++ namespaces would be nice...
- Some of the computation for a signing operation could be precomputed; we
could implement an API that says "call this during idle time, and your
next siggen will be cheaper". I personally wouldn't expect a lot of call
(pun intended!) for this sort of thing (and, in any case, we can't
precompute the OTS signature of the message, hence there are limits to how
much benefit we can give). And, maxing out the memory allowed in a working
key gives most of the same benefit...
- We have starting building some regression tests; we should do more
- We support incremental signing; there is a potential security problem. If
the caller plays games, he can cause bad things to happen; a) if he modifies
the signature between init and final, he could trick us into reusing an OTS
signature index, and b) if he copies the context, he could trick us in using
the same index to sign two different messages. The first would be fairly
easy to block (generate a MAC of the parts of the signature we care about,
and store that in the ctx); the second is harder. The only thing I can
think of to block it would be to have the working_key track those signature
indicies it has issued incremental signatures for, and are still pending
(this would block the first attack as well). That would likely require
dynamic memory allocation (unless we put a tight cap on the maximum number
of contexts that could be pending). My question is: should we go to that
extent? After all, we already need to trust the application already (as we
allow them to sign anything they want, using the API).
- Should we have better protection against the application accidentally
passing in an uninitialized/freed buffer as a hss_working_key, or a
hss_*_inc context structure? Right now, the value we use to say 'this is a
good structure' is 0 (hss_error_none), which is far too likely a value to
be seen in an uninitialized buffer. Should we also include a magic value
as well, for some simple validity checking? Something like that certainly
wouldn't be fool-proof; however it might catch some simple mistakes.
- Right now, we pass the update_private_key to every hss_generate_signature
operation. Would it make more sense to pass it instead when we
hss_load_private_key, and just store it in the working_key? Would it
make sense to use the same context for read_private_key and
update_private_key?
- Right now, hss_extra_info has a last_signature flag. Would it be more
useful to have a 'signatures_remaining' count instead (and when it hits
0, that'd mean we just wrote the last signature)?