Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MD5_std.c: Use HH2() also in its 2x interleaved code #5636

Merged
merged 1 commit into from
Dec 29, 2024

Conversation

magnumripper
Copy link
Member

Closes #5635

@magnumripper
Copy link
Member Author

@solardiz I have no idea what formats use this? My only testing was adding a temprary #warning showing that the code was indeed compiled in, then a full -test=0 -format=cpu. And bots seem happy. Consequently I don't know if there was a boost or regression...

Copy link
Member

@solardiz solardiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also change step 48 to use HH2. The corresponding non-interleaved version has HH2 there.

@solardiz
Copy link
Member

I have no idea what formats use this?

In scalar builds, the md5crypt format uses this. Also some/most/all(?) dynamic formats that use MD5 will probably use this.

Looking at the dynamic formats stuff, quite possibly this is also frequently used even in SIMD builds, in cases where SIMD-enabled MD5 is somehow unusable/unused by a given dynamic format.

The main md5crypt format will have X2 in its algorithm name. Unfortunately, it looks like this reporting isn't propagated into the dynamic formats.

My only testing was adding a temprary #warning showing that the code was indeed compiled in

Was that in scalar or SIMD builds?

@magnumripper
Copy link
Member Author

My only testing was adding a temprary #warning showing that the code was indeed compiled in

Was that in scalar or SIMD builds?

SIMD, and I too think the dynamic formats use this a lot.

@magnumripper
Copy link
Member Author

magnumripper commented Dec 25, 2024

I fixed step 48 now but testing with md5crypt indicate a regression (not bc step 48 but the whole thing) with 1-2% on M1 macbook. I can't remember whether the original change in the same file was ever benchmarked. Very slight regression (or just noise) seen on raw-md5. This should be benchmarked on some actual non-SIMD arch though.

EDIT: Obviously non-simd build now

@magnumripper
Copy link
Member Author

md5crypt .. regression .. with 1-2% on M1

3% regression on super single thread or 8 threads, but 0-0.5% boost running 32 threads.

@solardiz
Copy link
Member

Besides speeds, I'd also look at size MD5_std.o.

This should be benchmarked on some actual non-SIMD arch though.

I suggest you try on cfarm SPARC machines https://portal.cfarm.net/machines/list/ as we don't have SIMD code for SPARC. On the other hand, the newer ones of those have MD5 instructions used by OpenSSL, so e.g. our md5crypt-long performs better than our md5crypt there.

Oh, also they have POWER7, where we broke our SIMD when we started requiring POWER8+ because of our use of 64-bit vector elements. We should ideally fix that separately, but meanwhile scalar code is used on POWER7.

Copy link
Member

@solardiz solardiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source code change looks good to me now.

@solardiz solardiz merged commit bd5bdbc into openwall:bleeding-jumbo Dec 29, 2024
35 of 36 checks passed
@solardiz
Copy link
Member

In a SIMD build, this code is compiled in, but is apparently unreachable at least with our pre-defined dynamic formats. Running -te -form=@md5 did not hit it. Tried AVX512BW and XOP builds. I also checked code size of MD5_std.o in those builds - it's either unchanged (although the code itself changes) or becomes smaller with this PR's changes. So I think this is good to keep, and if time permits we may re-benchmark on some machine where this code matters.

@solardiz
Copy link
Member

I also checked code size of MD5_std.o in those builds - it's either unchanged (although the code itself changes) or becomes smaller with this PR's changes.

Oh, I'm wrong - it's sometimes a slight code size increase.

@magnumripper magnumripper deleted the interleaved-md5-hh2 branch December 30, 2024 03:04
@magnumripper
Copy link
Member Author

magnumripper commented Dec 30, 2024

Took a while to get access to cfarm, I think I'm set now but I haven't had time to test this yet. I did find a dynamic format with "x2" in its name in a SIMD build: dynamic_18 [md5($s.Y.$p.0xF7.$s) (Post.Office MD5) 32/64 x2]. I haven't verified the code path but if it isn't correct we should fix that.

On a side note, --format=@x2 works fine on my M1 (it hits bcrypt plus dynamic 18) but on an AVX2 build, it hits nearly every format. Spaces counts though, so --format=@" x2" works around that.

On another other note, my wildcard code isn't used for algo selection so --format=@md5*x2 (or --format=@"md5* x2") doesn't work for un-matching bcrypt. I could fix that some rainy day but this is the first time I needed it so maybe it would just be bloat - unless of course I clean that code up and manage to make a canonical function of it. I kinda like that idea, useful or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MD5_std.c should use HH2() also in its 2x interleaved code
2 participants