Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: address "unknown TX stage" errors #372

Open
wants to merge 3 commits into
base: development
Choose a base branch
from

Conversation

redlicha
Copy link

@redlicha redlicha commented Nov 25, 2024

This PR aims to address "unknown TX stage" errors observed in the TCP data path after network problems, which is not easily reproducible. Eventually inspection of such a case with gdb succeeded and showed that tx_ready_tasks_num and tx_ready_list of struct xio_tcp_transport were out of sync. The reason for that isn't clear, but this PR takes the approach of preventing this altogether by removing the double bookkeeping and relying solely on tx_ready_list as the source of truth.
It also emerged that there are uses of list_first_entry_or_null on a list member (instead of the list head) in order to find the next element in the list if any. This will never return NULL but instead misinterpret the list head.

This hasn't seen much testing yet; I'm posting it as RFC to discuss if this strategy is valid.

Using list_first_entry_or_null on a list member will never return NULL,
but rather either the next item if there is one or the head of the
list if there's no item. Use list_next_entry_or_null instead.

Signed-off-by: Arne Redlich <[email protected]>
`tx_ready_tasks_num` and `tx_ready_list` going out of sync after
network problems is the source of hitting "unknown TX stage" errors in
`xio_tcp_xmit`, which is caused by dereferencing an invalid `tcp_task`
pointer, since the list iteration is based on `tx_ready_tasks_num` and
not on the actual list:

```
(gdb) print tcp_hndl->tx_ready_tasks_num
$4 = 1
(gdb) print tcp_hndl->tx_ready_list
$5 = {next = 0x7fc02ecf2778, prev = 0x7fc02ecf2778}
```

Prior to [1] this would result in an infinite loop, with that commit
the connection is disconnected.

The exact reason for `tx_ready_tasks_num` and `tx_ready_list` going out
of sync is not clear, but code inspection suggests that in most places
we only need to know whether the `tx_ready_list` is empty or not, with
the exception of the following two annotated snippets in
`xio_tcp_xmit` that look at the actual value of `tx_ready_tasks_num`
while handling batching:

```
int xio_tcp_xmit(struct xio_tcp_transport *tcp_hndl)
{
	while (likely(tcp_hndl->tx_ready_tasks_num &&
		      (tcp_hndl->tx_comp_cnt < COMPLETION_BATCH_MAX))) {
		next_task = list_first_entry_or_null(&task->tasks_list_entry,
						     struct xio_task,
						     tasks_list_entry);
		next_tcp_task = next_task ?
			(struct xio_tcp_task *)next_task->dd_data : NULL;
                // [...]
		switch (tcp_task->txd.stage) {
                // [...]
		case XIO_TCP_TX_IN_SEND_CTL:
                        // [...]
			++batch_count;
			// - tx_ready_tasks_num > 0 (otherwise we'd
                        //   not be in the loop anymore)
			// - batch_count == tx_ready_tasks_num when
			//   next_task == NULL
			// -> we can drop the batch_count != tx_ready_tasks_num
			//    check
			if (batch_count != batch_nr &&
			    batch_count != tcp_hndl->tx_ready_tasks_num &&
 			    next_task &&
			    (next_tcp_task->txd.stage ==
			    XIO_TCP_TX_IN_SEND_DATA) &&
			    (next_tcp_task->txd.msg.msg_iovlen +
			    tcp_hndl->tmp_work.msg_len) < IOV_MAX) {
   				task = next_task;
				break;
			}
		// [...]
		case XIO_TCP_TX_IN_SEND_DATA:
                        // [...]
			// - tx_ready_tasks_num > 0 (otherwise we'd
                        //   not be in the loop anymore)
			// - batch_count == tx_ready_tasks_num when
			//   next_task == NULL
			// -> we can drop the batch_count != tx_ready_tasks_num
			//    check
			++batch_count;
			if (batch_count != batch_nr &&
			    batch_count != tcp_hndl->tx_ready_tasks_num &&
			    next_task &&
			    (next_tcp_task->txd.stage ==
			    XIO_TCP_TX_IN_SEND_DATA) &&
			    (next_tcp_task->txd.msg.msg_iovlen +
			    tcp_hndl->tmp_work.msg_len) < IOV_MAX) {
				task = next_task;
				break;
			}
		// [...]
	}
}
```

[1] "9d5f92077e645a6c7461dd4ab030df3a75417b2f accelio: disconnect on
tx stage error"

Signed-off-by: Arne Redlich <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant