performance #58
Comments
You should expect latency slightly higher than native, with maybe 5-10 us of overhead at most.
I refactored the original benchmark code you reference into separate
but I still use the CustomClientEndpoint. CustomClientEndpoint::init() pre-posts a recv request as its last line: SimpleClient::initiated recv
I recommend running more than just 1000 loops. A total runtime of 38 ms is probably not enough to get stable, representative numbers. Keep in mind that in Java it takes a while until all code paths are JIT compiled, so initially there is a lot more overhead.
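The warm-up advice above can be sketched as a measurement loop that runs a number of unmeasured iterations before recording any samples. This is only a minimal sketch: `work` is a hypothetical stand-in for one send/recv round trip (it is not a DISNI call), and the warm-up and iteration counts are assumptions to tune for the actual benchmark.

```java
import java.util.Arrays;

public class WarmupLatencyBench {
    // Hypothetical stand-in for one round trip; replace with the real operation.
    static long work(long x) {
        return x * 31 + 7;
    }

    // Runs `warmup` unmeasured iterations so the JIT can compile the hot path,
    // then records the latency of each of `iterations` calls in nanoseconds.
    static long[] measure(int warmup, int iterations) {
        long sink = 0;
        for (int i = 0; i < warmup; i++) {
            sink += work(i);
        }
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            sink += work(i);
            samples[i] = System.nanoTime() - start;
        }
        // Use the result so the JIT cannot discard the measured work.
        if (sink == 42) {
            System.out.println(sink);
        }
        return samples;
    }

    // Nearest-rank percentile (p in 0..100) over a copy of the samples.
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, Math.min(sorted.length - 1, idx))];
    }

    public static void main(String[] args) {
        long[] samples = measure(100_000, 1_000_000);
        System.out.println("median ns: " + percentile(samples, 50));
        System.out.println("p99 ns:    " + percentile(samples, 99));
    }
}
```

Reporting the median and a high percentile rather than total runtime divided by loop count makes it easier to see whether a few slow early iterations dominate the result.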
In fact, I increased the loop count to 1,000,000 and the buffer size to 32 * 64 in the RDMAvsTcpBenchmarkClient and RDMAvsTcpBenchmarkServer tests, but DISNI's throughput and latency are still not what I expected; they are close to those of TCP. Is this reasonable?
I do see a difference when I run it:
That said, this benchmark is not good for comparison, as it only uses one outstanding posted receive for RDMA (it's more a ping-pong test than a good benchmark). I recommend you use SendRecvClient/Server if you are interested in send/recv numbers. While it doesn't allow setting preposted receives independently from sends, it at least gives you an idea of what performance can be like with higher QDs. If you want a "real" RDMA benchmark, i.e. one using one-sided operations like RDMA read, use ReadClient/Server instead of send/recv. I see around 3 us read latency with that benchmark.
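To illustrate why a ping-pong test with a single outstanding receive says little about throughput, here is a rough back-of-the-envelope model based on Little's law: with a fixed per-operation latency, the achievable rate is at most queue depth divided by latency. The class and method names are hypothetical, the 3 us figure is taken from the read latency mentioned above, and the model ignores link bandwidth and NIC limits, so treat the output as an upper bound rather than a prediction.

```java
public class QueueDepthModel {
    // Upper bound on operations per second when `queueDepth` operations
    // are kept in flight and each takes `latencyMicros` to complete.
    static double opsPerSecond(int queueDepth, double latencyMicros) {
        return queueDepth / (latencyMicros * 1e-6);
    }

    public static void main(String[] args) {
        double latencyMicros = 3.0; // assumed per-operation latency
        for (int qd : new int[] {1, 4, 16, 64}) {
            System.out.printf("QD %2d: up to %.0f ops/s%n",
                    qd, opsPerSecond(qd, latencyMicros));
        }
    }
}
```

With a queue depth of 1 the rate is bounded by the round-trip latency alone, which is why the ping-pong numbers look so similar across transports once software overhead dominates.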
I am using the RDMA benchmark code to perform a latency test for send/recv using 100Gb/s Mellanox cards directly connected.
I am seeing 100-700 us for a small string (12 bytes), but qperf rc_lat reports about 4 us.
Is this what I should expect?