-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make safe/unsafe foreign calls size dependent #282
Comments
One might say that, for large enough buffers, performance-sensitive applications should not be using strict bytestrings, and use I am not sure that we should introduce automatic switching based on size, but we could perhaps leave the choice to the caller. If we were to provide a switch, the cuttoff buffer size should probably be larger than with SHA and the like, because copying data is much cheaper than computing a cryptographic hash. |
I agree that high performance applications probably should use something other than large strict bytestrings and they do already mention that it'll take O(N) time. They don't mention that (for example) Apparently the difference between |
Right, I'm just hesitant to assume that the cutoff is comparable on all platforms, so would be fairly conservative with the buffer size (>= 128k? More?) to be sure that we're not too strongly biased to just X86_64 platforms. Do you have measurements of how long it takes to copy N bytes for a few values of N on X86 and perhaps ARM? How big are the samples that motivated this issue? |
Also what is the cost of moving all the other threads off the capability so they get a chance elsewhere while the copy is happening? How does that scale with the number of threads? |
I was thinking more >512 MB or something like that 😅. The crypto example is really very low since it's so cpu intensive even for fairly small chunks of data. For memcmp and memcpy I'd expect the cutoff to be much higher. To me, it's not so much about squeezing every last bit of performance out of this, just preventing some worst case scenario's when people are accidentally being stupid. |
OK, so you're on board for a fairly conservative cutoff, perhaps not quite that high, but some multiple MB sounds about right. We'll have to see what others think of this idea. I'm not opposed, though it will cost an extra branch on every dispatch, I don't know whether that's ever an issue... |
|
We should really add memcpy/memset/memmove primops into GHC. GHC already has thresholds to inline internal calls (see https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/CmmToAsm/X86/CodeGen.hs#L2194 for memcpy on X86). It could insert unsafe/safe calls with new thresholds if needed. |
Indeed, see #274 |
In https://github.com/haskell/bytestring/blob/master/Data/ByteString/Internal.hs#L747, we import
memcpy
as an unsafe FFI call, which means that it runs "inline" and will block all other Haskell threads running on the same capability in the threaded runtime. If the Bytestring being copied is very large, this might block the other Haskell threads for longer than is desired.In the crypto libraries, there are both safe and unsafe imports which are switched between depending on the size of the input. This trick might be useful here as well.
Interested to hear your opinions about this, I can pick it up if this is something desired.
The text was updated successfully, but these errors were encountered: