Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenCL: Default to table based AES, now backed in local memory #5613

Merged
merged 5 commits into from
Dec 17, 2024

Conversation

magnumripper
Copy link
Member

This revised version pushes 1.4 Tbps of AES-128 decryption (axcrypt) or 914 Gbps of AES-256 encryption (keepass) on a 4070ti.

The bitsliced code we defaulted to before is really good but it's register hungry. It has some merits when two or more blocks are encrypted/decrypted at once (does two in parallel) but still is slower than table based now. We can still opt in to use it.

Closes #5594

magnumripper and others added 2 commits December 12, 2024 14:11
ROR with 8, 16 or 24 can be made using byte_perm instruction.  I saw
no gain so left it disabled but it might be nice to have it sitting
there for reference / future testing.
@magnumripper
Copy link
Member Author

Up to 6.6x boost seen on super's AMD, and a mere 3x on the nvidias. I was hoping to achieve 10x but it turns out the bitsliced AES we used was too good as baseline for that to happen :) Anyway we seem to be on par with hashcat now.

Copy link
Member

@solardiz solardiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool stuff, but impossible to review for real without diving into it.

As a minor suggestion, maybe the moving of tables to a separate file can be a separate commit?

magnumripper and others added 3 commits December 17, 2024 11:23
Also use same file for bitlocker format, which had another copy
of them.

The whole commit is effectively a no-op.
This revised version pushes 1.4 Tbps of AES-128 decryption (axcrypt) or
914 Gbps of AES-256 encryption (keepass) on a 4070ti.

The bitsliced code we defaulted to before is really good but it's register
hungry. It has some merits when two or more blocks are encrypted/decrypted
at once (does two in parallel) but still is slower than table based now. We
can still opt in to use it.

Note: This commit switches all formats to table-based AES without actually
enabling the copying to local until next commit where all formats are adapted
to use it.  This very commit thus makes for a performance regression.

See openwall#5594
Enable local memory for table-based AES.  Closes openwall#5594

Bitlocker format is not affected as it has it's own implementation, but
AES performance is insignificant for it anyway.
@magnumripper
Copy link
Member Author

impossible to review for real without diving into it.

The important changes are to opencl_aes_plain.h and are surprisingly few due to macros.

As a minor suggestion, maybe the moving of tables to a separate file can be a separate commit?

I fail to see the point of that, but did so now. BTW the bitlocker format was also changed to use that table file but was otherwise not changed - it has its own copy of more or less identical (afaics) table based AES but did not gain anything from using local memory as the AES part of it is insignificant. So I did not commit any such changes to it.

@magnumripper magnumripper merged commit 4c5de1b into openwall:bleeding-jumbo Dec 17, 2024
35 of 36 checks passed
@magnumripper magnumripper deleted the opencl-aes-local branch December 17, 2024 12:26
claudioandre-br added a commit to openwall/john-packages that referenced this pull request Dec 17, 2024
Add dmg-opencl, rar-opencl to the list of problematic formats.

Side effect of openwall/john#5613.

Document all formats that fail and therefore need to be disabled
during testing.

Signed-off-by: Claudio André <[email protected]>
@@ -53,10 +53,13 @@ typedef struct {
uint32_t cracked;
} result;

#define AES_MAXNR 14
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, this addition of AES_MAXNR to opencl_keepass_fmt_plug.c looks unused.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, that's a remnant from the older state struct.

@solardiz
Copy link
Member

As a minor suggestion, maybe the moving of tables to a separate file can be a separate commit?

I fail to see the point of that, but did so now.

Thank you for doing it. Looks cleaner to me that way, and makes the actual changes (in other commits) stand out.

@@ -162,7 +163,7 @@ __kernel void sevenzip_aes(__constant sevenzip_salt *salt,
/* Early rejection if possible (only decrypt last 16 bytes) */
if (pad > 0 && salt->length >= 32) {
uint8_t buf[16];
AES_KEY akey;
AES_KEY akey; akey.lt = &lt;
Copy link
Member Author

@magnumripper magnumripper Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh BTW I first had this as AES_KEY akey = { .lt = &lt }; everywhere, but some device did not like that. Unfortunately I forget which. Good to know, perhaps I should start a wiki page listing knowledge like that. Another example would be the static vs inline vs static inline that we had to add a workaround for in opencl_misc.h.

claudioandre-br added a commit to openwall/john-packages that referenced this pull request Dec 18, 2024
Add raw-SHA512-free-opencl to the list of problematic formats.

Side effect of openwall/john#5613 and openwall/john#5615.

Document all formats that fail and therefore need to be disabled
during testing.

Signed-off-by: Claudio André <[email protected]>
claudioandre-br added a commit to openwall/john-packages that referenced this pull request Dec 25, 2024
Add raw-SHA512-free-opencl to the list of problematic formats.

Side effect of openwall/john#5613 and openwall/john#5615.

Document all formats that fail and therefore need to be disabled
during testing.

Signed-off-by: Claudio André <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Faster (by a magnitude) OpenCL AES code
2 participants