System.Runtime.Intrinsics.X86 #28
Hello Mark, I haven't looked into any of the new .NET Core 3.0 features yet, but I find the hardware intrinsics one very interesting; this is the first time I've heard about it. Looking at your PoC, you seem to know more about the subject... I could create a branch where we can experiment with these features until they are stable and ready to merge (the master branch will serve as a reference point for the benchmarks). Would you be interested?
That sounds like a good plan; I'm interested.
Great! 😃 I've created a new branch named "intrinsics" and added you as a collaborator.
That's a very good improvement in both speed and allocations! It looks very promising indeed. Very well done!
Thanks for the feedback! I was thinking about what the ultimate goal is: getting close to the performance of libsodium, which is probably the fastest implementation. So I hacked together a very thin DllImport wrapper around libsodium and benchmarked it. For small packet sizes (the range I'm interested in), the intrinsics branch is very close. For large sizes, it is at least within the same order of magnitude (before, it was significantly slower). My next area of intrinsics experimentation is Poly1305; after that I can benchmark the ChaCha20Poly1305 suite.
I'm curious: do you have any benchmarks of the version without intrinsics against libsodium? It would be interesting to see where it stands.
This microbenchmark run compares ChaCha20 encryption in NaCl.Core and libsodium across three runtimes: .NET Framework 4.7.2, .NET Core 2.2, and .NET Core 3.0. Interpretation of results:
Thanks a lot for sharing the benchmarks; they give a really good point of reference for where the library stands right now, both without and with intrinsics, compared to libsodium. The difference between vanilla and intrinsics is huge!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in one week if no further activity occurs.
Keeping the issue alive!
Hello,
My library that uses NaCl.Core has two variants: one that uses NaCl.Core, and one that uses the libsodium native library. The libsodium variant runs 4 times faster, but it has deployment pitfalls: you have to ensure the correct native file is used for the processor architecture and OS. I've worked around those pitfalls, but it got me wondering about optimizing NaCl.Core, since a fully managed solution has less friction.
I was wondering whether you had looked at the System.Runtime.Intrinsics.X86 namespace in .NET Core 3.0? I am new to intrinsics, so don't consider the following authoritative, but I thought I'd present my experience as a data point for using intrinsics.
XOR proof-of-concept
I did a proof of concept on an easy part to update, the XOR in Snuffle.cs (this doesn't improve performance much).
At the top:
In Process:
New method at the bottom:
That compiled and benchmarked successfully, running at approximately the same speed, perhaps slightly faster.
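To make the idea concrete, here is a minimal sketch of a vectorized keystream XOR, written in C with SSE2 intrinsics (System.Runtime.Intrinsics.X86.Sse2 exposes the same instructions). It is not the proof-of-concept code itself: the helper name xor_keystream is hypothetical, and it assumes the keystream has already been generated into a buffer at least as long as the input. It XORs 16 bytes per instruction and finishes the remainder byte by byte; in .NET Core 3.0 the load, XOR and store steps should roughly correspond to Sse2.LoadVector128, Sse2.Xor and Sse2.Store.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2: _mm_loadu_si128, _mm_xor_si128, _mm_storeu_si128 */
#endif

/* Hypothetical helper: output[i] = input[i] ^ keystream[i] for i < len.
   Buffers may be unaligned; output may be the same buffer as input. */
static void xor_keystream(uint8_t *output, const uint8_t *input,
                          const uint8_t *keystream, size_t len)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Vector path: 16 bytes per iteration using unaligned loads/stores. */
    for (; i + 16 <= len; i += 16) {
        __m128i data = _mm_loadu_si128((const __m128i *)(input + i));
        __m128i ks   = _mm_loadu_si128((const __m128i *)(keystream + i));
        _mm_storeu_si128((__m128i *)(output + i), _mm_xor_si128(data, ks));
    }
#endif
    /* Scalar tail: the last len % 16 bytes (or everything without SSE2). */
    for (; i < len; i++)
        output[i] = (uint8_t)(input[i] ^ keystream[i]);
}
```

Since this step is one cheap instruction per 16 bytes while keystream generation does most of the work, only a small gain would be expected from it; that is an assumption about where the time goes, not a measurement.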
Conclusion
It seems that to fully implement intrinsics in the Snuffle.cs class (and hence get large performance gains), ProcessKeyStreamBlock would be the place to start, and hence the implementation of u0.h/u1.h/u4.h/u8.h in this directory.
Whilst I don't have the bandwidth for a pull request that is a mass update to include intrinsics, if you were to establish a style/methodology for including intrinsics, I could contribute in parts when time allows.
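For the ProcessKeyStreamBlock starting point, a minimal sketch of a vectorized ChaCha20 block function in C with SSE2 intrinsics, under stated assumptions: it follows the IETF variant from RFC 8439 (32-byte key, 32-bit block counter, 12-byte nonce); the names chacha20_block_scalar and chacha20_block_sse2 are hypothetical and are neither NaCl.Core nor libsodium APIs; and it is not a port of the u0.h/u1.h/u4.h/u8.h files. It handles one 64-byte block with each 128-bit register holding one row of the 4x4 state, so a single vector quarter round performs all four column quarter rounds, and rotating rows 1 to 3 by one, two and three lanes lines the diagonals up as columns. In .NET Core 3.0, _mm_add_epi32, _mm_xor_si128, _mm_slli_epi32/_mm_srli_epi32 and _mm_shuffle_epi32 should roughly correspond to Sse2.Add, Sse2.Xor, Sse2.ShiftLeftLogical/Sse2.ShiftRightLogical and Sse2.Shuffle.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* RFC 8439 (IETF) ChaCha20 state: row 0 = constants, rows 1-2 = key,
   row 3 = 32-bit block counter followed by the 96-bit nonce. */
static uint32_t load32_le(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void init_state(uint32_t s[16], const uint8_t key[32], uint32_t counter,
                       const uint8_t nonce[12]) {
    s[0] = 0x61707865; s[1] = 0x3320646e; s[2] = 0x79622d32; s[3] = 0x6b206574;
    for (int i = 0; i < 8; i++) s[4 + i] = load32_le(key + 4 * i);
    s[12] = counter;
    for (int i = 0; i < 3; i++) s[13 + i] = load32_le(nonce + 4 * i);
}

static void store_le(uint8_t out[64], const uint32_t x[16]) {
    for (int i = 0; i < 64; i++) out[i] = (uint8_t)(x[i / 4] >> (8 * (i % 4)));
}

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
#define QR(a, b, c, d) do { \
    a += b; d ^= a; d = ROTL32(d, 16); c += d; b ^= c; b = ROTL32(b, 12); \
    a += b; d ^= a; d = ROTL32(d, 8);  c += d; b ^= c; b = ROTL32(b, 7); } while (0)

/* Scalar reference: one quarter round per QR invocation. */
static void chacha20_block_scalar(uint8_t out[64], const uint8_t key[32],
                                  uint32_t counter, const uint8_t nonce[12]) {
    uint32_t s[16], x[16];
    init_state(s, key, counter, nonce);
    memcpy(x, s, sizeof x);
    for (int r = 0; r < 10; r++) {  /* 10 double rounds = 20 rounds */
        QR(x[0], x[4], x[8], x[12]);  QR(x[1], x[5], x[9], x[13]);   /* columns */
        QR(x[2], x[6], x[10], x[14]); QR(x[3], x[7], x[11], x[15]);
        QR(x[0], x[5], x[10], x[15]); QR(x[1], x[6], x[11], x[12]);  /* diagonals */
        QR(x[2], x[7], x[8], x[13]);  QR(x[3], x[4], x[9], x[14]);
    }
    for (int i = 0; i < 16; i++) x[i] += s[i];  /* feed-forward */
    store_le(out, x);
}

#if defined(__SSE2__)
#define VROTL(v, n) _mm_or_si128(_mm_slli_epi32((v), (n)), _mm_srli_epi32((v), 32 - (n)))
#define VQR(a, b, c, d) do { \
    a = _mm_add_epi32(a, b); d = VROTL(_mm_xor_si128(d, a), 16); \
    c = _mm_add_epi32(c, d); b = VROTL(_mm_xor_si128(b, c), 12); \
    a = _mm_add_epi32(a, b); d = VROTL(_mm_xor_si128(d, a), 8);  \
    c = _mm_add_epi32(c, d); b = VROTL(_mm_xor_si128(b, c), 7); } while (0)

/* Row-vectorized: register rk holds state words 4k..4k+3, so one VQR runs
   all four column quarter rounds. Rotating rows 1-3 left by 1, 2, 3 lanes
   turns the diagonals into columns; the rotation is undone afterwards. */
static void chacha20_block_sse2(uint8_t out[64], const uint8_t key[32],
                                uint32_t counter, const uint8_t nonce[12]) {
    uint32_t s[16], x[16];
    init_state(s, key, counter, nonce);
    __m128i r0 = _mm_loadu_si128((const __m128i *)(s + 0));
    __m128i r1 = _mm_loadu_si128((const __m128i *)(s + 4));
    __m128i r2 = _mm_loadu_si128((const __m128i *)(s + 8));
    __m128i r3 = _mm_loadu_si128((const __m128i *)(s + 12));
    for (int r = 0; r < 10; r++) {
        VQR(r0, r1, r2, r3);                                 /* column round */
        r1 = _mm_shuffle_epi32(r1, _MM_SHUFFLE(0, 3, 2, 1)); /* words 5,6,7,4 */
        r2 = _mm_shuffle_epi32(r2, _MM_SHUFFLE(1, 0, 3, 2)); /* words 10,11,8,9 */
        r3 = _mm_shuffle_epi32(r3, _MM_SHUFFLE(2, 1, 0, 3)); /* words 15,12,13,14 */
        VQR(r0, r1, r2, r3);                                 /* diagonal round */
        r1 = _mm_shuffle_epi32(r1, _MM_SHUFFLE(2, 1, 0, 3)); /* back to 4,5,6,7 */
        r2 = _mm_shuffle_epi32(r2, _MM_SHUFFLE(1, 0, 3, 2)); /* back to 8,9,10,11 */
        r3 = _mm_shuffle_epi32(r3, _MM_SHUFFLE(0, 3, 2, 1)); /* back to 12..15 */
    }
    _mm_storeu_si128((__m128i *)(x + 0), r0);
    _mm_storeu_si128((__m128i *)(x + 4), r1);
    _mm_storeu_si128((__m128i *)(x + 8), r2);
    _mm_storeu_si128((__m128i *)(x + 12), r3);
    for (int i = 0; i < 16; i++) x[i] += s[i];  /* feed-forward */
    store_le(out, x);
}
#endif
```

A caller would pick the SSE2 path when the CPU supports it and fall back to the scalar one otherwise (in .NET, a check on Sse2.IsSupported plays that role). Processing four or eight blocks in parallel, with each register holding the same state word from different blocks, avoids the per-round shuffles and is a common next step for larger inputs; whether that matches what u4.h/u8.h do should be checked against their source.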