-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OpenCL: Default to table based AES, now backed in local memory #5613
OpenCL: Default to table based AES, now backed in local memory #5613
Conversation
ROR with 8, 16 or 24 can be made using byte_perm instruction. I saw no gain so left it disabled but it might be nice to have it sitting there for reference / future testing.
6154359
to
dffd323
Compare
Up to 6.6x boost seen on super's AMD, and a mere 3x on the nvidias. I was hoping to achieve 10x but it turns out the bitsliced AES we used was too good as baseline for that to happen :) Anyway we seem to be on par with hashcat now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool stuff, but impossible to review for real without diving into it.
As a minor suggestion, maybe the moving of tables to a separate file can be a separate commit?
Also use same file for bitlocker format, which had another copy of them. The whole commit is effectively a no-op.
This revised version pushes 1.4 Tbps of AES-128 decryption (axcrypt) or 914 Gbps of AES-256 encryption (keepass) on a 4070ti. The bitsliced code we defaulted to before is really good but it's register hungry. It has some merits when two or more blocks are encrypted/decrypted at once (does two in parallel) but still is slower than table based now. We can still opt in to use it. Note: This commit switches all formats to table-based AES without actually enabling the copying to local until next commit where all formats are adapted to use it. This very commit thus makes for a performance regression. See openwall#5594
Enable local memory for table-based AES. Closes openwall#5594 Bitlocker format is not affected as it has it's own implementation, but AES performance is insignificant for it anyway.
dffd323
to
493a6a5
Compare
The important changes are to opencl_aes_plain.h and are surprisingly few due to macros.
I fail to see the point of that, but did so now. BTW the bitlocker format was also changed to use that table file but was otherwise not changed - it has its own copy of more or less identical (afaics) table based AES but did not gain anything from using local memory as the AES part of it is insignificant. So I did not commit any such changes to it. |
Add dmg-opencl, rar-opencl to the list of problematic formats. Side effect of openwall/john#5613. Document all formats that fail and therefore need to be disabled during testing. Signed-off-by: Claudio André <[email protected]>
@@ -53,10 +53,13 @@ typedef struct { | |||
uint32_t cracked; | |||
} result; | |||
|
|||
#define AES_MAXNR 14 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this addition of AES_MAXNR
to opencl_keepass_fmt_plug.c
looks unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, that's a remnant from the older state struct.
Thank you for doing it. Looks cleaner to me that way, and makes the actual changes (in other commits) stand out. |
@@ -162,7 +163,7 @@ __kernel void sevenzip_aes(__constant sevenzip_salt *salt, | |||
/* Early rejection if possible (only decrypt last 16 bytes) */ | |||
if (pad > 0 && salt->length >= 32) { | |||
uint8_t buf[16]; | |||
AES_KEY akey; | |||
AES_KEY akey; akey.lt = < |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh BTW I first had this as AES_KEY akey = { .lt = < };
everywhere, but some device did not like that. Unfortunately I forget which. Good to know, perhaps I should start a wiki page listing knowledge like that. Another example would be the static
vs inline
vs static inline
that we had to add a workaround for in opencl_misc.h.
Add raw-SHA512-free-opencl to the list of problematic formats. Side effect of openwall/john#5613 and openwall/john#5615. Document all formats that fail and therefore need to be disabled during testing. Signed-off-by: Claudio André <[email protected]>
Add raw-SHA512-free-opencl to the list of problematic formats. Side effect of openwall/john#5613 and openwall/john#5615. Document all formats that fail and therefore need to be disabled during testing. Signed-off-by: Claudio André <[email protected]>
This revised version pushes 1.4 Tbps of AES-128 decryption (axcrypt) or 914 Gbps of AES-256 encryption (keepass) on a 4070ti.
The bitsliced code we defaulted to before is really good but it's register hungry. It has some merits when two or more blocks are encrypted/decrypted at once (does two in parallel) but still is slower than table based now. We can still opt in to use it.
Closes #5594