OnLoad sends invalid SEQ numbers in trade_sim sample #148

Open
ttsiodras opened this issue May 31, 2023 · 0 comments
ttsiodras commented May 31, 2023

Hello. I am using OnLoad with x3522 boards:

$ onload --version
Onload 8.0.2.51
Copyright 2019-2022 Xilinx, 2006-2019 Solarflare Communications, 2002-2005 Level 5 Networks
Built: Mar 29 2023 10:23:04 (release)
Build profile header: <ci/internal/transport_config_opt_extra.h>
Kernel module: 8.0.2.51

I tried the trade_sim example. On one side (the server) I run the exchange app:

$ onload -p latency-best ./exchange -s enp1s0f0np0 

...and on the other, the trading app:

$ onload -p latency-best ./trader_onload_ds_efvi -s 10 -r 10 -d ens224np0 10.201.65.23 

As shown above, it is configured to send small (10-byte) packets, using delegated sends.
The server reports at the end:

$ onload -p latency-best ./exchange -s enp1s0f0np0 
oo:exchange[1561]: Using Onload 8.0.2.51 [2]
oo:exchange[1561]: Copyright 2019-2022 Xilinx, 2006-2019 Solarflare Communications, 2002-2005 Level 5 Networks
Waiting for client to connect
Accepted client connection
Starting event loop
n_lost_msgs:  0
n_samples:    50000
latency_mean: 5457
latency_min:  4943
latency_max:  28408

All seems fine.

But if you record the TCP traffic on the exchange side via onload_tcpdump, you see this (attached picture).

[Attached screenshot: 2023-05-31_14-57]

It is clear from the color highlighting that there's a problem.

  • We begin with the usual 3-way SYN/SYN-ACK/ACK handshake.
  • After the trigger via UDP (not shown in the screenshot, which filters for TCP alone), the trading side (10.201.65.21) sends us a TCP pkt of length 4, with SEQ and ACK both set to 1, which is fine at this point.
  • It then sends 4 more bytes, with SEQ 5 and ACK still 1. Also OK.
  • We (10.201.65.23) respond with a no-tcp-payload pkt with SEQ 1 and an ACK of 9. All good so far: we (the server) have acknowledged the first 8 bytes of payload, and 9 is the next SEQ we expect (the relative numbering starts the payload at 1, because the SYN consumes one sequence number, as a FIN does).
  • The two 32-bit values we received were metadata. The actual data transmission now starts...
  • The server loop now begins receiving the actual payload from the trading side, i.e. the 10-byte packets.
  • So we get a SEQ 9, ACK 1 pkt, carrying 10 bytes.
  • We send back a single byte of payload (SEQ 1, ACK 19, LEN 1), which acknowledges everything sent so far: 19 is the next SEQ we expect.
  • This repeats, over and over...
  • ...until we get to timestamp 0.012438, where the server acknowledges everything up to SEQ 109: SEQ 10, ACK 109, LEN 1.
  • ...and then the trading side sends us a SEQ 119, ACK 11, LEN 0. This is the first time we receive a no-tcp-payload ACK from the trading client. Its ACK of 11 is correct, since the last pkt sent from the server was SEQ 10 and had a LEN of 1...
  • ...but this new pkt has a SEQ of 119.

Why 119?

At this stage, an empty ACK from the trading client should have been a SEQ 109, ACK 11, LEN 0
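
The arithmetic behind that expectation can be replayed in a few lines. This is only a sketch of the sequence-number bookkeeping, not of anything OnLoad does internally; it assumes relative sequence numbers (first payload byte is 1) and that ten 10-byte messages and ten 1-byte replies had been exchanged by that point, which is what the ACK of 109 and the server SEQ of 10 imply. The helper name `next_seq` is made up for illustration.

```python
# Relative sequence numbers: the first payload byte of each direction is 1.
def next_seq(seq, payload_len):
    """SEQ the sender should use after sending payload_len bytes starting at seq."""
    return seq + payload_len

# Trading client side of the capture described above.
client_seq = 1
client_seq = next_seq(client_seq, 4)    # first 32-bit metadata value
client_seq = next_seq(client_seq, 4)    # second 32-bit metadata value
for _ in range(10):                     # ten 10-byte messages (assumed count)
    client_seq = next_seq(client_seq, 10)

# Exchange (server) side: ten 1-byte replies.
server_seq = 1
for _ in range(10):
    server_seq = next_seq(server_seq, 1)

# An empty ACK from the client carries its own next SEQ and acknowledges
# the server's next SEQ.
expected_empty_ack = (client_seq, server_seq, 0)   # (SEQ, ACK, LEN)
observed_empty_ack = (119, 11, 0)                  # values from the capture

print(expected_empty_ack)                                # (109, 11, 0)
print(observed_empty_ack[0] - expected_empty_ack[0])     # 10
```

The observed SEQ is ahead by exactly 10, i.e. the size of one message that had not yet appeared on the wire.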

And indeed, as if realising its mistake, the trading side sends another packet 93 microseconds later - with SEQ 109, ACK 11 - that in fact carries 10 more bytes.

Wireshark marks this second pkt as a "retransmission", because it sees the SEQ going down - but this is no retransmission. What has apparently happened is that the OnLoad stack running on the trading side has sent an empty ACK packet (which it has every right to do) but using an invalid SEQ number from the future.

If I go in with a hex editor and modify the SEQ counter to decrement it by 10, Wireshark stops highlighting the packet as an error.
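
For anyone who wants to repeat that experiment without a hex editor, the same patch can be sketched in Python. The sequence number is the 32-bit big-endian field at byte offset 4 of the TCP header. The function name `patch_tcp_seq` and the example header (ports, absolute SEQ/ACK values) are hypothetical, and the sketch leaves the TCP checksum untouched, so a tool that validates checksums would flag the patched packet.

```python
import struct

def patch_tcp_seq(tcp_header: bytes, delta: int) -> bytes:
    """Return a copy of a raw TCP header with its sequence number shifted by delta."""
    # Bytes 0-3 hold the source and destination ports; bytes 4-7 hold the SEQ.
    (seq,) = struct.unpack_from("!I", tcp_header, 4)
    patched = bytearray(tcp_header)
    # Sequence numbers wrap around modulo 2**32.
    struct.pack_into("!I", patched, 4, (seq + delta) % 2**32)
    return bytes(patched)

# Hypothetical 20-byte header: all field values here are made up for illustration.
header = struct.pack(
    "!HHIIBBHHH",
    40000,      # source port
    5000,       # destination port
    1000119,    # absolute SEQ (the "from the future" value)
    2000011,    # absolute ACK
    5 << 4,     # data offset: 5 words, no options
    0x10,       # flags: ACK only
    65535,      # window
    0,          # checksum (not recomputed in this sketch)
    0,          # urgent pointer
)

fixed = patch_tcp_seq(header, -10)
print(struct.unpack_from("!I", fixed, 4)[0])   # 1000109
```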

This happens deterministically, every time I test: the empty ACKs have wrong SEQ numbers.

So... is this an OnLoad bug?

ttsiodras changed the title from "OnLoad sends invalid SEQ numbers in exchange/trading_ef_vi sample" to "OnLoad sends invalid SEQ numbers in trade_sim sample" on May 31, 2023