Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenCL Add a DECLSPEC or INLINE macro #5618

Closed
magnumripper opened this issue Dec 18, 2024 · 8 comments · Fixed by #5638
Closed

OpenCL Add a DECLSPEC or INLINE macro #5618

magnumripper opened this issue Dec 18, 2024 · 8 comments · Fixed by #5638
Assignees

Comments

@magnumripper
Copy link
Member

magnumripper commented Dec 18, 2024

We've had this in opencl_misc.h for ages:

/*
 * Some runtimes/drivers breaks on using inline, others breaks on lack of it,
 * yet others require use of static as well.
 */
#if __MESA__
#define inline	// empty!
#elif __POCL__
// Do nothing (POCL complains if we redefine)
#elif gpu_amd(DEVICE_INFO) // We really target ROCM here
#define inline	static inline
#else
// Do nothing
#endif

This is suboptimal because 1) it's not obvious in affected functions that inline may be changed, and 2) it would change static inline to static static inline. Also, the whole thing might be a long gone problem - we do have code that use neither or just static and there's no reports of it breaking anything. On the other hand, just maybe #5456 is affected by what syntax (or none) is used - I recall a few formats do not emit those warnings, while most do.

FWIW hashcat has this:

/**
 * function declarations can have a large influence depending on the opencl runtime
 * fast but pure kernels on rocm is a good example
 */

#ifdef NO_INLINE
#define HC_INLINE
#else
#define HC_INLINE inline static
#endif

#if defined IS_AMD && defined IS_GPU
#define DECLSPEC HC_INLINE
#elif defined IS_HIP
#define DECLSPEC __device__ HC_INLINE
#else
#define DECLSPEC
#endif

...then all their functions start with DECLSPEC unless they're already ifdef'ed for a specific platform.

@magnumripper
Copy link
Member Author

it's not obvious in affected functions that inline may be changed

I guess the very least we should do about that is rename our inline macro to MAYBE_INLINE like in host code. I like the DECLSPEC though.

@solardiz I'm not touching this until I hear your opinion, as I suspect you may have one. Except I'll investigate at some point in time whether static/inline/both/none (or the case of not using any functions) is affecting #5456.

@magnumripper
Copy link
Member Author

magnumripper commented Dec 18, 2024

FWIW the "correct" way of writing it is either static inline or nothing at all (and most runtimes would inline anyway). No underscores or attribute should be needed.

For NOT inlining, __attribute__((noinline)) might be the best bet but I would expect it to be ignored more often than not.

@magnumripper
Copy link
Member Author

I think I'll just go with this:

#define INLINE      static inline
#define NOINLINE    __attribute__((noinline))

@magnumripper
Copy link
Member Author

I'm also considering dropping most uses of inline, relying on it mostly happening anyway, but using the INLINE macro for things like our LUT3 function and stuff like that. Then benchmark all formats before and after, comparing ptxas output and binary size and see if/what changes. Imported subdir code such as ed25519 should mostly be kept as-is.

@magnumripper magnumripper changed the title OpenCL Add a DECLSPEC or similar macro OpenCL Add a DECLSPEC or INLINE macro Dec 22, 2024
@magnumripper
Copy link
Member Author

Then benchmark all formats before and after, comparing ptxas output and binary size and see if/what changes.

Tried this (with nvidia). Nothing changed, ptxas output identical.

@magnumripper
Copy link
Member Author

magnumripper commented Dec 22, 2024

Next is to experiment with what affects #5456 and what doesn't. This has to wait a week or so, I'm stuck abroad with a laptop.

magnumripper added a commit to magnumripper/john that referenced this issue Dec 24, 2024
Define NOINLINE as "__attribute__((noinline))" which is has been seen
working. Replace all uses of the latter with the macro.
Drop the questionable "inline" macro and instead define INLINE as 'static
inline', which should be the right thing. Use this only for inlines that
were replaced by the old inline macro.

Closes openwall#5618
magnumripper added a commit to magnumripper/john that referenced this issue Dec 24, 2024
Define NOINLINE as "__attribute__((noinline))" which has been seen
working. Replace all uses of the latter with the macro.
Drop the questionable "inline" macro and instead define INLINE as 'static
inline', which should be the right thing. Use this only for inlines that
were replaced by the old inline macro.

Closes openwall#5618
magnumripper added a commit to magnumripper/john that referenced this issue Dec 25, 2024
Define NOINLINE as "__attribute__((noinline))" which has been seen
working. Replace all uses of the latter with the macro.

Drop the questionable "inline" macro and instead define INLINE as 'static
inline', which should be the right thing, for anything put POCL and MESA
which we've seen problems with in the past. Then use this macro only for
inlines that were replaced by the old inline macro (for now).

Closes openwall#5618
magnumripper added a commit to magnumripper/john that referenced this issue Dec 25, 2024
Define NOINLINE as "__attribute__((noinline))" which has been seen
working. Replace all uses of the latter with the macro.

Drop the questionable "inline" macro and instead define INLINE as 'static
inline', which should be the right thing, for anything put POCL and MESA
which we've seen problems with in the past. Then use this macro only for
inlines that were replaced by the old inline macro (for now).

Closes openwall#5618
@solardiz
Copy link
Member

@solardiz I'm not touching this until I hear your opinion, as I suspect you may have one.

Just like you, I'm unhappy we were redefining inline, so let's move away from that, like you do in your PR now.

As to the rest, I'm not sure. You experimented with it just recently and continue to do so. So I'll defer to you on it.

@magnumripper
Copy link
Member Author

I'm merging the nearly cosmetical #5638 for now. #5456 is more interesting and being able to spot where the macro is used helps a lot with that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants