Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

argon2-opencl fails on CPU and MIC #5417

Open
solardiz opened this issue Jan 6, 2024 · 17 comments
Open

argon2-opencl fails on CPU and MIC #5417

solardiz opened this issue Jan 6, 2024 · 17 comments
Labels

Comments

@solardiz
Copy link
Member

solardiz commented Jan 6, 2024

A known shortcoming/bug of the argon2-opencl format is that it fails self-test on CPU(-like) devices, as tested with ancient Intel OpenCL and AMD APP SDK that we have on our online dev boxes and with recent Intel OpenCL that @alainesp has on his laptop. We don't know exactly why - a guess is this has something to do with our usage of local memory.

The format works on most GPUs, the only exception identified so far being Intel HD Graphics, where it also fails.

The failures on CPUs and Intel GPU are FAILED (cmp_one(1)). The failure on MIC includes segfaults.

@solardiz solardiz added the bug label Jan 6, 2024
@solardiz
Copy link
Member Author

solardiz commented Jan 6, 2024

FWIW, the contents of out after the pre_processing kernel on Intel and AMD OpenCL on CPU match GPU's (so must be correct). On Intel HD Graphics, they don't match, so we seem to have/trigger a separate bug there.

So, not surprisingly, the main issue appears to be beyond pre-processing. This is consistent with this format already failing on CPUs before @alainesp moved the pre-processing from host to device.

@solardiz
Copy link
Member Author

solardiz commented Jan 6, 2024

Overriding these didn't make a difference (still works on GPUs, fails on CPUs):

#define upsample(a, b) (((ulong)(a) << 32) | (b))
#define mul_hi(a, b) ((ulong)(a) * (b) >> 32)

@solardiz
Copy link
Member Author

solardiz commented Jan 6, 2024

With the below hack and shmemSize forced to 32 KiB, it still works on a GPU, but still fails on CPUs like before:

-       uint warp   = (get_local_id(1) * get_local_size(0) + get_local_id(0)) / THREADS_PER_LANE;
+       uint warp   = (get_global_id(1) * get_global_size(0) + get_global_id(0)) / THREADS_PER_LANE;

So the issue is probably not specific to behavior of get_local_* on CPU.

@alainesp
Copy link
Contributor

alainesp commented Jan 6, 2024

Maybe we should print a warning to the user when detecting CPU or Intel GPUs besides the self-test fail? Explain the situation a little more.

@solardiz
Copy link
Member Author

solardiz commented Apr 6, 2024

In #5420, @magnumripper shows a macOS system where the format works for the first few test vectors on HD Graphics (edit: specifically, on Intel(R) UHD Graphics 630), only failing at FAILED (cmp_one(10)).

@magnumripper
Copy link
Member

magnumripper commented Dec 23, 2024

I stumbled upon some things like this

Device 6: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
Testing: argon2-opencl, Argon2 [BlaMka OpenCL]... Options used: -I opencl -cl-mad-enable -D__GPU__ -DDEVICE_INFO=138 -D__SIZEOF_HOST_SIZE_T__=8 -DDEV_VER_MAJOR=1800 -DDEV_VER_MINOR=5 -D_OPENCL_COMPILER -DUSE_WARP_SHUFFLE=0 ../run/opencl/argon2_kernel.cl
Build time: 19.913 ms
Build log: "/tmp/OCLrVNPK3.cl", line 101: error: an "asm" declaration is not allowed here
        asm("" ::: "memory");
        ^

1 error detected in the compilation of "/tmp/OCLrVNPK3.cl".
Frontend phase failed compilation.

Error building kernel ../run/opencl/argon2_kernel.cl. DEVICE_INFO=138
0: OpenCL CL_BUILD_PROGRAM_FAILURE (-11) error in opencl_common.c:1286 - clBuildProgram

A very weird detail is that code path should not be active on that device:

#if !gpu_nvidia(DEVICE_INFO) && !gpu_amd(DEVICE_INFO)
	barrier(CLK_LOCAL_MEM_FENCE);
#elif !__OS_X__ && gpu_amd(DEVICE_INFO) && DEV_VER_MAJOR < 2500
	asm("" ::: "memory");
#endif

It's not AMD!? Or maybe it is (the runtime) but then it's not a GPU!

@solardiz
Copy link
Member Author

It's not AMD!? Or maybe it is (the runtime) but then it's not a GPU!

That's weird, because we do check for GPU:

#define gpu(n)                      ((n & DEV_GPU) == (DEV_GPU))
#define gpu_amd(n)                  ((n & DEV_AMD) && gpu(n))

I tried reviewing the host code as well and don't see a bug that would cause DEV_GPU to be defined on non-GPU. However, I do see that load_device_info sets those flags by += instead of |=, and this is risky - if a flag is ever set more than once, we'll get carry. I think we should patch that for robustness.

@magnumripper
Copy link
Member

load_device_info sets those flags by += instead of |=, and this is risky

Good catch, I'll fix that right away (although I'm not expecting it to be the problem here).

@magnumripper
Copy link
Member

There was a //Copied from opencl_common.h comment in run/opencl/opencl_device_info.h which had me wondering - but that was an obsolete comment, the host code sources tha OpenCL header. I'm removing that comment while at it.

@solardiz
Copy link
Member Author

There's also this weirdly formatted comment (is interpreted as a // comment):

	//** Get execution time **//

and weird line wrap here:

        if gpu_amd
        (device_info[sequential_id]) {

@magnumripper
Copy link
Member

BTW in my output above we can see -DDEVICE_INFO=138. That means 128 + 8 + 2, meaning DEV_AMD_GCN_10 + DEV_AMD + DEV_GPU. But the device was Device 6: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz. Something is amiss for sure. I need to reproduce before I believe what I see now.

@magnumripper
Copy link
Member

and weird line wrap here:

        if gpu_amd
        (device_info[sequential_id]) {

And no outer parens... the macro makes up for that but I'll be fixing that code so it doesn't look so weird.

@solardiz
Copy link
Member Author

in my output above we can see -DDEVICE_INFO=138. That means 128 + 8 + 2, meaning DEV_AMD_GCN_10 + DEV_AMD + DEV_GPU.

Yes, and also -D__GPU__ and the driver version appears to correspond to the GPU driver.

@magnumripper
Copy link
Member

I need to reproduce before I believe what I see now.

Yup, this was a red herring. Here's a snippet from my terminal history:

Device 6: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
(...)
Testing: rar-opencl, RAR3 (length 5) [SHA1 OpenCL AES]... (8xOMP) Device 3: Tahiti [AMD Radeon HD 7900 Series]
Testing: AndroidBackup-opencl [PBKDF2-SHA1 AES OpenCL]... PASS
Testing: agilekeychain-opencl, 1Password Agile Keychain [PBKDF2-SHA1 AES OpenCL]... PASS
Testing: ansible-opencl, Ansible Vault [PBKDF2-SHA256 HMAC-SHA256 OpenCL]... FAILED (cmp_all(1))
Testing: argon2-opencl, Argon2 [BlaMka OpenCL]... Options used: -I opencl -cl-mad-enable -D__GPU__ -DDEVICE_INFO=138 -D__SIZEOF_HOST_SIZE_T__=8 -DDEV_VER_MAJOR=1800 -DDEV_VER_MINOR=5 -D_OPENCL_COMPILER -DUSE_WARP_SHUFFLE=0 ../run/opencl/argon2_kernel.cl
Build time: 19.913 ms
Build log: "/tmp/OCLrVNPK3.cl", line 101: error: an "asm" declaration is not allowed here
        asm("" ::: "memory");

Looking closer, in the middle of one of the first lines (rar-opencl) it switches to the Tahiti Device 3 after a segfault (not caught by console redirection) with device 6 (I was running something like for device in 1 6 3 5 ; do ../run/john -dev=$device -test (...))

It shouldn't fail on Tahiti though, was that a known problem?

@solardiz
Copy link
Member Author

It shouldn't fail on Tahiti though, was that a known problem?

Not a known problem. In fact, I think I had added this asm memory barrier specifically for that old driver. I'll need to retest.

magnumripper added a commit to magnumripper/john that referenced this issue Dec 25, 2024
Also fix a few weird syntaxes in opencl_common.c

See openwall#5417
@solardiz
Copy link
Member Author

solardiz commented Dec 25, 2024

... I've just retested, and it just works on well's devices 3, 4, 5 from my account. It also builds for devices 1, 2, 6 but fails self-test.

@magnumripper
Copy link
Member

Yeah I can't reproduce. Doing some forensics it looks like I was on the branch for #5638 which is now merged and works fine. I have no idea what happened there but let's assume something was off with my testing, or perhaps a "missing" kernel-cache-clean somehow led to an odd state of things?

magnumripper added a commit that referenced this issue Dec 25, 2024
Also fix a few weird syntaxes in opencl_common.c

See #5417
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants