-
Notifications
You must be signed in to change notification settings - Fork 364
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance optimizations #530
Conversation
In chacha20-poly1305, would it make sense to use a different mechanism for I've also seen that there is a multithreaded implementation of chacha20-poly1305 which seems to be much faster, not sure if there's anything that could be useful. |
Also chacha20-poly1305 is supported since JDK 11. How do both performs ? |
Haven't looked at further optimizations. Most of the time is spent in Poly1305.processBlock() anyway. We could continue looking at trying to make this cipher and mac faster, but for now I think a 40% improvement and being nearly as fast as native AES (in my benchmarks) is good enough. I don't think the Java built-in ChaCha20-Poly1305 can be used for SSH. SSH uses a 64bit nonce and 64bit counter, while the JDK uses a 96bit nonce and 32bit counter. |
However, using On Java 11, benchmarking shows a speed improvement of 5%, on Java 17 of 9%. (Again, with commit e152cc3 as baseline.) The problem is that on Java 8 the On Java 11 or newer, But we could make the choice at run-time, based on the Java version we're running on, using Either way, I'd prefer not to attempt that in this PR. If we do something like this, let's do it in some
The idea is that one can pre-compute the ChaCha20 key stream asynchronously. This may indeed give some |
Indeed it cannot, but mainly because the Poly1305 setup is done differently. The IETF version from RFC 8439 is just not the same as the SSH version of ChaCha20-Poly1305. But on Java 11 or newer, one can use the simple "ChaCha20" cipher from the standard SunJCE provider as a replacement for our ChaChaEngine, and do the special Poly1305 setup as we do now. The 96-32bit vs. 64-64bit difference can be dealt with. Unfortunately SunJCE does not provide a plain Poly1305 mac (see JDK-8253394), so that one would still be hand-written code. Such a ChaCha20-Poly1305 based on For Java 8, I have not found anything better than the version from commit e152cc3. |
@gnodet: could we do a bug fix release first before merging this? See the mailing list. |
Sure, I was assuming you wanted to include that one too. |
Unroll the permutation loop of ChaCha20 and simplify the crypt() method to work on bytes instead of ints. Unroll and optimize the pack/unpack methods in ChaCha20 and in Poly1305. Don't use Integer.toUnsignedLong(): avoid the extra method call by doing (int & 0xFFFF_FFFFL) inline. In Poly1305.processBlock(), use ints instead of longs where possible (variables t0-t3). All this brings a speed-up of about 40% for the encryption/decryption. Of the time spent in ChaCha20-Poly1305 about one third is spent in the ChaChaEngine and two thirds in Poly1305.processBlock(). Bug: apache#524
Simplify the implementation, and make sure it works when called directly with a non-zero timeout.
For ByteArrayBuffer, optimize the getInt()/putInt() operations to work directly in the buffer without copying into/from a local work area. Simplify some of the operations; in particular, avoid extending bytes to long and then shifting. Bug: apache#524
In ChannelAsyncOutputStream, do not create a new SSH packet if the caller already had passed in an SSH packet buffer, and we're writing it fully. This avoids allocating a new buffer, which gives GC less to do, and it also avoid needlessly copying the data around. Instead the prepared SSH packet buffer is encoded and written directly. Bug: apache#524
Give SftpOutPutStreamAsync a transferFrom operation to be able to read directly into a prepared SSH packet uffer. Previously, we always had to read into a general byte buffer, and then copy into the SSH packet. This extra data copy can be avoided. If the copy buffer size is such that the data fits into a single SSH packet, then this packet buffer can be encoded and sent directly without any further data copying. Simplify the implementation: since the stream always uses a PacketBuffer the code paths for other general byte buffers can simply be removed. Also enable reading the next buffer while the previous one is being sent. Use two buffers alternatingly; one being sent, the other being filled. Use this implementation also in SftpRemotePathChannel in its transferFrom() implementation. _Do_ close the stream there, otherwise the final ACKs may not be checked. Bug: apache#524
A utility method to upload a file via SFTP.
Bouncy Castle apparently only has native implementations in its "LTS" releases. BC 1.78.1 has none. SunJCE uses native code. The "security registrar" architecture in Apache MINA sshd prefers registered providers over the platform providers, and it automatically registers Bouncy Castle if it's present. So with Bouncy Castle on the classpath and an SSH connection for which an AES cipher was specified, Apache MINA sshd would use the much slower Bouncy Castle AES. Add a new "SunJCEWrapper" registrar that brings the SunJCE AES and HmacSHA* to the front of the list, so that they are used even if Bouncy Castle is also registered. The new registrar can be disabled via a system property as usual, and it is only enabled if the JVM indeed has a "SunJCE" security provider registered. See also https://issues.apache.org/jira/browse/SSHD-719 (issue closed as "not needed"). Bug: apache#524
The update() implementation copied _all_ bytes (successively) first into an internal 16-byte buffer and then processed that buffer. This is no needed if the input is long. Use the internal 16-byte buffer only for inputs shorter than 16 bytes, or if there is a leftover of less than 16 bytes at the end of a long input. In between process 16-byte chunks directly from the input byte array. For 32kB inputs this saves us some 2048 calls to System.arraycopy() copying all those 32kB. The speedup is minimal but noticeable in benchmarking. Bug: apache#524
A series of performance optimizations:
SftpClient.put()
for uploading.